Postmodern Examples Table of Contents Arrays


Text Indexing

There are a couple of different ways to index a posgresql database, You could do the following to create an index, but the whole thing must be reindexed or it goes out of date. (And reindexing takes a long time)

CREATE INDEX idx_countries_text ON countries USING gin (to_tsvector('english'::regconfig, text))

A different way of indexing is to add a ts_vector column to the table as discussed in Then you search against the ts_vector column, not the index.

ALTER TABLE countries ADD COLUMN tsv tsvector; UPDATE countries SET tsv = to_tsvector('english', text); CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON countries FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', text);

Using this method, the search query is simpler, using the tsv column.:

(query "select id,name,text from countries where tsv @@ to_tsquery('english', $1)" "search-term1 & searchterm2")

Or using the :raw keyword

(postmodern:query (:select 'id 'name :from 'countries :where (:raw "to_tsvector('english' , text ) @@ to_tsquery('english' , $1)")) final-search-string)


Postgres provides 2 functions for matching against tsvectors: plainto_tsquery and to_tsquery. plainto_tsquery takes every word in the search string and assumes that they should all be connected by an "and" operator ('&').

to_tsquery() provides boolean abilities, but each word in the search string must be separated by a boolean operator. ('&' '|' or '!') (and, or, not). Unfortunately, it will throw up on text which doesn't have the separators (no phrases). It will not accept text which is not separated by '&', '|' or '!'. It does accept braces, which are used to indicate operator precidence, but two tokens which translate to different lexemes directly after one another will cause totsquery() to fail.

select to_tsquery('wonderful text'); ERROR: syntax error in tsquery: "wonderful text" select to_tsquery('wonderful | text'); to_tsquery ---------------------- 'wonderful' | 'text'

Simple search

(query "select id,name from countries where to_tsvector('english', text) @@ to_tsquery('english', 'oid')") (query (:select 'id 'name :from 'countries :where (:raw "to_tsvector('english', text) @@ to_tsquery('english', 'oid')")))

After indexing, this seems to work. The following postmodern query will return the obvious records quickly.

(query "select id,name from countries where to_tsvector('english', text) @@ to_tsquery('english', $1)" "technical & India")

The string "fat & (rat | cat) & ! dog" will look for records that have the word "fat" and either the word "rat" or the word "cat" but cannot also have the word "dog".

The string "super:*" will look for records with a words that begins with "super".


select id,name,ts_rank_cd(to_tsvector(countries.text), to_tsquery('stamp')) as rank from countries where to_tsvector(countries.text) @@ to_tsquery('stamp')

Snippets with the search text in bold

select id,name,ts_headline(countries.text, to_tsquery('stamp')) as snippet from countries where to_tsvector(countries.text) @@ to_tsquery('stamp')

More information at Datatype text search