Full-Text Search Platform: Solr and Python Working Together

System Installation

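  • Download and install Solr
  • Default data directory location:
  • Download and install the solrpy library
  • Steps for installing under Tomcat:
    • Create solr.xml in the [Tomcat install directory]/conf/Catalina/localhost directory with the following content: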
      <?xml version="1.0" encoding="UTF-8" ?>
      
      <Context path="/solr">
      	<Environment 
      		name="solr/home" 
      		type="java.lang.String" 
      		value="[Solr install directory]/example/solr" 
      		override="true" />
      </Context>
      
    • Copy all files under [Solr install directory]/example/lib/ext to [Tomcat install directory]/lib.
    • Copy [Solr install directory]/example/resources/log4j.properties to [Tomcat install directory]/lib.
    • Copy [Solr install directory]/example/webapps/solr.war to [Tomcat install directory]/webapps.
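
The context file from the first Tomcat step can also be generated by a script. The following is a minimal sketch, assuming Python 3; build_context_xml and the /opt/solr path are hypothetical names used only for illustration and are not part of Solr or Tomcat.

```python
from xml.sax.saxutils import quoteattr


def build_context_xml(solr_install_dir):
    """Return the text of a Tomcat context file that points solr/home
    at the example Solr home under solr_install_dir (a placeholder path)."""
    solr_home = solr_install_dir.rstrip("/") + "/example/solr"
    return (
        '<?xml version="1.0" encoding="UTF-8" ?>\n'
        '<Context path="/solr">\n'
        '\t<Environment\n'
        '\t\tname="solr/home"\n'
        '\t\ttype="java.lang.String"\n'
        # quoteattr adds the surrounding quotes and escapes &, <, >
        '\t\tvalue=' + quoteattr(solr_home) + '\n'
        '\t\toverride="true" />\n'
        '</Context>\n'
    )


if __name__ == "__main__":
    # /opt/solr is an assumed install location. The output belongs in
    # [Tomcat install directory]/conf/Catalina/localhost/solr.xml
    print(build_context_xml("/opt/solr"))
```

Generating the file keeps the Solr path in one place when the install directory changes.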

{
  "id": 1200,
  "category": "Systems",
  "title": "IBM servers and storage drive cloud and big data initiatives.",
  "description": "Want to extract the value of big data and analytics? Looking to remake your enterprise IT infrastructure for the era of cloud? IBM offers servers and storage solutions to meet your strategic imperatives. From mainframes to Flash, our breakthrough systems help transform your business.",
  "features": "PureFlex System: Combine the flexibility of a general purpose system, elasticity of cloud and simplicity of an appliance in integrated systems with built-in expertise."
}
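
A record like the one above maps onto the keyword arguments that solrpy's add() takes. The following is a minimal sketch, assuming Python 3 and that id is the unique key in the schema; json_to_fields is a hypothetical helper, the sample record is abbreviated, and the solrpy calls are left as comments so the snippet runs without a server.

```python
import json


def json_to_fields(text):
    """Parse one JSON record into a dict of field name -> value."""
    fields = json.loads(text)
    if not isinstance(fields, dict):
        raise ValueError("expected a single JSON object")
    if "id" not in fields:
        raise ValueError("record has no 'id' field")
    # Collapse runs of whitespace left over from copy/paste in string values.
    return {
        name: " ".join(value.split()) if isinstance(value, str) else value
        for name, value in fields.items()
    }


# Abbreviated sample record (illustration only).
record = '{"id": 1200, "category": "Systems", "title": "IBM servers   and storage"}'
fields = json_to_fields(record)
print(fields)

# With a running server, the fields could then be indexed via solrpy:
#   solrInst = solr.SolrConnection('http://localhost:8983/solr')
#   solrInst.add(**fields)
#   solrInst.commit()
```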

Runtime Notes

  • How to run: java -jar start.jar
  • Default service port: 8983
  • Default collection: collection1

  • Schema definition: collection1/conf/schema.xml

  • Admin interface: http://localhost:8983/solr/
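
The defaults listed above combine into the URLs that a client calls. The following is a minimal sketch, assuming Python 3, Solr's standard select handler and the wt=json response format; select_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # default port from the notes above
COLLECTION = "collection1"                # default collection


def select_url(query, rows=10, base=SOLR_BASE, collection=COLLECTION):
    """Build a search URL against the collection's select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return "%s/%s/select?%s" % (base, collection, params)


print(select_url("title:infrastructure"))
```

The resulting URL can be pasted into a browser to check that the server is answering before any client library is involved.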

Python Programming

# -*- coding: utf-8 -*-

import solr

# create a connection to a solr server
solrInst = solr.SolrConnection('http://localhost:8983/solr')

# add a document to the index
solrInst.add(
    id="1600",
    category="Perspectives",
    title="Experts debate: Does IT infrastructure determine business success?",
    description=(
        "The world has changed. It's far different from two years ago, let alone 10. "
        "Businesses must act faster than competitors, respond quicker to customer needs, "
        "create and market products and services more nimbly, and be always-on to stay "
        "relevant in an always-on global economy. In the most recent debate, experts "
        "discussed which aspects of IT infrastructure matter most, to which industries, "
        "and in what scenarios infrastructure plays a key role."
    ),
)

# make the new document visible to searches
solrInst.commit()

# do a search, faceting on category and title
response = solrInst.query('title:infrastructure', facet='true', facet_field=['category', 'title'])

# print the category and title of each hit, tab-separated
for hit in response.results:
    print("%s\t%s" % (hit['category'], hit['title']))
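
When the search term in a query such as title:infrastructure comes from user input, characters that the Lucene query syntax treats as operators need a backslash in front of them. The following is a minimal sketch, assuming Python 3; escape_term and field_query are hypothetical helpers, and the character list is taken from the Lucene query parser syntax and should be checked against the Solr version in use.

```python
import re

# Characters (and the two-character operators && and ||) that have
# special meaning in the Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(term):
    """Prefix each query-syntax special character in term with a backslash."""
    return _SPECIAL.sub(r"\\\1", term)


def field_query(field, term):
    """Build a 'field:term' query string with the term escaped."""
    return "%s:%s" % (field, escape_term(term))


print(field_query("title", "success?"))
```

The result can be passed as the first argument of solrInst.query(). Whitespace is not escaped here, so a multi-word term is still parsed as several clauses.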

Common Commands

Delete all data: http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
Commit the change: http://localhost:8983/solr/update?stream.body=<commit/>
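
When these requests are issued from a program rather than typed into a browser, the XML in stream.body should be percent-encoded. The following is a minimal sketch, assuming Python 3 and that the server accepts stream.body as shown above; update_url is a hypothetical helper, and the lines that actually send the requests are left as comments.

```python
from urllib.parse import urlencode

UPDATE_URL = "http://localhost:8983/solr/update"


def update_url(body, base=UPDATE_URL):
    """Return an update URL carrying an XML command in stream.body."""
    return base + "?" + urlencode({"stream.body": body})


delete_all = update_url("<delete><query>*:*</query></delete>")
commit = update_url("<commit/>")

print(delete_all)
print(commit)

# To send them (needs a running server):
#   from urllib.request import urlopen
#   urlopen(delete_all).read()
#   urlopen(commit).read()
```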

Advanced Examples


External Resources

  • Solr query syntax
  • Simple Solr query tool
Attachment: solr-test.py (1k), 李智, January 10, 2015, 5:28 PM