postmodern-text-searching
Postmodern Examples Table of Contents Arrays
Searching
Introductions to indexing and searching
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-query.html
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
Text Indexing
There are a couple of different ways to index a posgresql database, You could do the following to create an index, but the whole thing must be reindexed or it goes out of date. (And reindexing takes a long time)
CREATE INDEX idx_countries_text ON countries USING gin (to_tsvector('english'::regconfig, text))
A different way of indexing is to add a ts_vector column to the table as discussed in http://linuxgazette.net/164/sephton.html. Then you search against the ts_vector column, not the index.
ALTER TABLE countries ADD COLUMN tsv tsvector; UPDATE countries SET tsv = to_tsvector('english', text); CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON countries FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', text);
Using this method, the search query is simpler, using the tsv column.:
(query "select id,name,text from countries where tsv @@ to_tsquery('english', $1)" "search-term1 & searchterm2")
Or using the :raw keyword
(postmodern:query (:select 'id 'name :from 'countries :where
(:raw
"to_tsvector('english' , text ) @@ to_tsquery('english' , $1)")) final-search-string)
Searching
Postgres provides 2 functions for matching against tsvectors: plainto_tsquery and to_tsquery. plainto_tsquery takes every word in the search string and assumes that they should all be connected by an "and" operator ('&').
to_tsquery() provides boolean abilities, but each word in the search string must be separated by a boolean operator. ('&' '|' or '!') (and, or, not). Unfortunately, it will throw up on text which doesn't have the separators (no phrases). It will not accept text which is not separated by '&', '|' or '!'. It does accept braces, which are used to indicate operator precidence, but two tokens which translate to different lexemes directly after one another will cause totsquery() to fail.
select to_tsquery('wonderful text'); ERROR: syntax error in tsquery: "wonderful text" select to_tsquery('wonderful | text'); to_tsquery ----------------------
'wonderful' | 'text'
Simple search
(query "select id,name from countries where to_tsvector('english', text) @@ to_tsquery('english', 'oid')")(query (:select 'id 'name :from 'countries :where
(:raw
"to_tsvector('english', text) @@ to_tsquery('english', 'oid')")))
After indexing, this seems to work. The following postmodern query will return the obvious records quickly.
(query "select id,name from countries where to_tsvector('english', text) @@ to_tsquery('english', $1)" "technical & India")
The string "fat & (rat | cat) & ! dog" will look for records that have the word "fat" and either the word "rat" or the word "cat" but cannot also have the word "dog".
The string "super:*" will look for records with a words that begins with "super".
Here is a function searching against a code database:
(defun search-code (search-string &optional (limit 20))
"Tries to run a text search on a string through source-code-functions2. By using plainto_tsquery any words passed
to search code are treated as connected by 'and'. So a search-string like 'graphviz edge ignore' will return the
source code for functions which have all three words."
(postmodern:query (:limit (:order-by (:select 'id 'name 'package-name 'code
:from (:as (:select 'id 'name 'package-name 'code 'tsv
:from 'source-code-functions2
(:as (:plainto_tsquery '$1) 'q)
:where (:@@ 'tsv 'q))
't1))
(:desc (:ts-rank-cd 't1.tsv (:plainto-tsquery '$2))))
'$3)
search-string search-string limit))
Ranking
select id,name,ts_rank_cd(to_tsvector(countries.text), to_tsquery('stamp')) as rank from countries where to_tsvector(countries.text) @@ to_tsquery('stamp')
Snippets with the search text in bold
select id,name,ts_headline(countries.text, to_tsquery('stamp')) as snippet from countries where to_tsvector(countries.text) @@ to_tsquery('stamp')
More information at Datatype text search