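A while ago I gave an introductory walkthrough on building an application with Apache Solr, and I am writing it up on this blog so that newcomers can follow along. The running example is searching forum posts.
1. Download Apache Solr 1.3 from http://apache.etoak.com/lucene/solr/1.3.0/apache-solr-1.3.0.zip and unpack it to, for example, E:\apache-solr-1.3.0.
2. Download Apache Tomcat 6.0.18 from http://labs.xiaonei.com/apache-mirror/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip and unpack it to, for example, E:\apache-tomcat-6.0.18.
3. Install Solr into Tomcat. Edit E:\apache-tomcat-6.0.18\conf\server.xml and add URIEncoding="UTF-8" to the Connector for port 8080, so that the block reads: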
- <Connector port="8080" protocol="HTTP/1.1"
- connectionTimeout="20000"
- redirectPort="8443" URIEncoding="UTF-8"/>
Save the following as E:\apache-tomcat-6.0.18\conf\Catalina\localhost\solr.xml; if the directory does not exist, create it yourself.
- <Context docBase="E:/apache-solr-1.3.0/dist/apache-solr-1.3.0.war" reloadable="true" >
- <Environment name="solr/home" type="java.lang.String" value="E:/apache-solr-1.3.0/example/solr" override="true" />
- </Context>
For other ways to install Solr, see: solr install
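To illustrate why step 3 sets URIEncoding="UTF-8", here is a minimal Python sketch. It assumes the client sends the search term percent-encoded as UTF-8, and it only simulates the decoding step; the function name `decode_query` is made up for this example and is not part of Tomcat or Solr. Tomcat 6 decodes request URIs as ISO-8859-1 unless told otherwise, which turns Chinese search terms into garbage.

```python
from urllib.parse import quote, unquote


def decode_query(raw: str, uri_encoding: str) -> str:
    """Simulate how a server decodes a percent-encoded query value
    under a given URI encoding (hypothetical helper, illustration only)."""
    return unquote(raw, encoding=uri_encoding)


query = "论坛"                 # the word "forum"
encoded = quote(query)         # percent-encode the UTF-8 bytes of the term

# With URIEncoding="UTF-8" the original term is recovered.
decoded_utf8 = decode_query(encoded, "utf-8")

# With the ISO-8859-1 default, every byte becomes its own character.
decoded_latin1 = decode_query(encoded, "iso-8859-1")

print(encoded)                   # %E8%AE%BA%E5%9D%9B
print(decoded_utf8 == query)     # True
print(decoded_latin1 == query)   # False
```

The same six bytes yield two characters under UTF-8 and six unrelated characters under ISO-8859-1, so without the Connector setting a Chinese query would not match the indexed text.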
4. The installation is now complete. Start Tomcat and open http://localhost:8080/solr/admin/ to take a look at the admin interface.
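If you would rather check from a script than from a browser, the following is a rough sketch using only the Python standard library. The helper names `admin_url` and `solr_is_up` are invented for this example, and the host, port, and webapp name are assumed to match the setup described in the steps above.

```python
from urllib.error import URLError
from urllib.request import urlopen


def admin_url(host: str = "localhost", port: int = 8080, webapp: str = "solr") -> str:
    """Build the admin page URL for the Tomcat deployment described above."""
    return f"http://{host}:{port}/{webapp}/admin/"


def solr_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any
    connection or HTTP error (hypothetical helper, illustration only)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False
```

Calling `solr_is_up(admin_url())` after Tomcat has started should return True if the webapp deployed correctly; a False result usually means Tomcat is not running or the Context file was not picked up.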
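5. Design the index structure for the forum post search application:
Field | Description |
---|---|
id | Post ID |
user | Username or UserId of the author |
title | Title |
content | Body of the post |
timestamp | Time the post was published |
text | Holds both the title and the content, so that the two can be searched together. |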
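As a rough illustration of what a document with these fields looks like when it is sent to Solr, the sketch below builds an XML `<add>` message for one post. The function name `post_to_add_xml` and the sample values are made up for this example. It assumes that the `text` field is filled on the Solr side rather than by the client, and that timestamps are sent as UTC in the form 1995-12-31T23:59:59Z, which is what Solr's date fields expect.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET


def post_to_add_xml(post_id: int, user: str, title: str,
                    content: str, posted_at: datetime) -> str:
    """Build a Solr <add> message for one forum post.

    posted_at must be a timezone-aware datetime; it is converted to UTC.
    The catch-all "text" field is deliberately not sent (assumed to be
    populated by Solr itself).
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    fields = {
        "id": str(post_id),
        "user": user,
        "title": title,
        "content": content,
        "timestamp": posted_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes <, > and & on output
    return ET.tostring(add, encoding="unicode")


xml_message = post_to_add_xml(
    1, "alice", "Solr & Tomcat", "How do I install Solr?",
    datetime(2009, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(xml_message)
```

A message like this would typically be sent with an HTTP POST to the update handler of the webapp (http://localhost:8080/solr/update in this setup), followed by a `<commit/>` message to make the post searchable.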
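6. Tell Solr about the index structure above by overwriting E:\apache-solr-1.3.0\example\solr\conf\schema.xml with the content below (you may want to back the file up first, so that you can refer to the official example later):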
- <?xml version="1.0" encoding="UTF-8" ?>
- <schema name="example" version="1.1">
- <types>
- <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
- <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
- <!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
- is a more restricted form of the canonical representation of dateTime
- http://www.w3.org/TR/xmlschema-2/#dateTime
- The trailing "Z" designates UTC time and is mandatory.
- Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
- All other components are mandatory.
- Expressions can also be used to denote calculations that should be
- performed relative to "NOW" to determine the value, ie...
- NOW/HOUR
- ... Round to the start of the current hour
- NOW-1DAY
- ... Exactly 1 day prior to now
- NOW/DAY+6MONTHS+3DAYS
- ... 6 months and 3 days in the future from the start of
- the current day
- Consult the DateField javadocs for more information.
- -->
- <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
- <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
- <analyzer>
- <tokenizer class="solr.CJKTokenizerFactory"/>
- </analyzer>
- </fieldType>
- </types>
- <fields>
- <field name="id" type="sint" indexed="true" stored="true" required="true" />
- <field name="user" type="string" indexed="true" stored="true"/>
- <field name="title" type="text" indexed="true" stored="true"/>
- <field name="content" type="text" indexed="true" stored="true" />
- <field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>
- <!-- catchall field, containing all other searchable text fields (implemented
- via copyField further on in this schema -->
- <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
- </fields>
- <!-- Field to use to determine and enforce document uniqueness.
- Unless this field is marked with required="false", it will be a required field
- -->
- <uniqueKey>id</uniqueKey>
- <!-- field for the QueryParser to use when an explicit fieldname is absent -->
- <defaultSearchField>text</defaultSearchField>
- <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
- <solrQueryParser defaultOperator="AND"/>
- <!-- copyField commands copy one field to another at the time a document
- is added to the index. It's used either to index the same field differently,
- or to add multiple fields to the same field for easier/faster searching. --