Load SQL after running philoload without SQL?

Fortunately, the SQL files are generated even if you choose not to load them at philoload time, so they are still available. To enable SQL after a non-SQL load, do the following (all commands are run inside $SYSDIR):

  • Load the SQL files load.database.sql and load.subdoctables.sql:
mysql -u mysqlusername -p < load.database.sql
mysql -u mysqlusername -p < load.subdoctables.sql
  • Move gimme and subdocgimme out of the way, and replace them with the SQLified versions:
mv gimme gimme.old
mv subdocgimme subdocgimme.old
cp gimme.sql gimme
cp subdocgimme.sql subdocgimme
  • Edit philo-db.cfg to enable SQL by setting these lines:
$SQLenabled = 1;
# mySQL interface variables. CONNECTSTRING, USER, and PASSWD (from
# philologic.cfg are passed to gimme.sql in order to open a perl
# DBI connection to the mySQL server.
$HOST = "localhost";
# $PASSWD = ""; # This should be set in /etc/philologic/philologic.cfg
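The file swap in the second step can be wrapped in a small shell function so that it stops before touching anything when a file is missing or a backup already exists. This is only a sketch: the function name enable_sql_gimme is made up for illustration, and it assumes the four file names used above all live in the directory you pass in. It does not run the mysql commands, since those need your own credentials.

```shell
# Hypothetical helper: swap in the SQL-enabled gimme scripts inside one
# database directory, keeping the originals as gimme.old and subdocgimme.old.
enable_sql_gimme() {
    sysdir=$1
    for name in gimme subdocgimme; do
        # Stop early if the SQL variant is missing.
        if [ ! -f "$sysdir/$name.sql" ]; then
            echo "missing $sysdir/$name.sql" >&2
            return 1
        fi
        # Stop early rather than overwrite a backup from an earlier run.
        if [ -e "$sysdir/$name.old" ]; then
            echo "$sysdir/$name.old already exists" >&2
            return 1
        fi
    done
    for name in gimme subdocgimme; do
        mv "$sysdir/$name" "$sysdir/$name.old" &&
            cp "$sysdir/$name.sql" "$sysdir/$name" || return 1
    done
}
```

Call it as enable_sql_gimme "$SYSDIR" after the two SQL files have been loaded. To undo the change, move gimme.old and subdocgimme.old back over gimme and subdocgimme.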

Create a philologic mySQL database and user

Assuming you have installed mySQL on your system, connect to the server as the root user and run:

mysql> create database philologic;
Query OK, 1 row affected (0.03 sec)

mysql> GRANT ALL on philologic.* TO 'philologic'@localhost IDENTIFIED BY 'YOURPHILOPASSWD';
Query OK, 0 rows affected (0.00 sec)

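and that should do it. On MySQL 8.0 and later, GRANT no longer accepts IDENTIFIED BY, so create the user with CREATE USER first and then run the GRANT without that clause.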

Use XCES or formats other than TEI-Lite with PhiloLogic?

PhiloLogic's parser does not hard-code a single input format: it locates metadata through configurable lists of element paths, so it can be adapted to parse other kinds of input data.

  • First, open the file newextract.plin (usually in /var/lib/philologic/utils).
  • You will see lines like this:
@xptitles =  (
"teiheader/filedesc/sourcedesc/bibl/title/", # ARTFL
  • @xptitles defines where PhiloLogic looks for the title of a document. The path "teiheader/filedesc/sourcedesc/bibl/title/" means: get the title from the tag "title", inside a tag "bibl", inside a tag "sourcedesc", inside a tag "filedesc", inside a tag "teiheader". If PhiloLogic doesn't find this tag, it tries the next path, "teiheader/filedesc/sourcedesc/biblstruct/monogr/title/", and so on.
  • There are 17 such options (authors, date, etc.) that you can change by adding your own XPaths.
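To find the lists you can edit, it may help to print every array declaration in the file with its line number. The sketch below assumes that each list is declared at the start of a line with a name beginning "@xp", as @xptitles is. That naming pattern is a guess based on the one example shown here, so adjust the pattern if your copy differs. The function name list_xpath_arrays is made up for illustration.

```shell
# Print the XPath array declarations found in an extract file, with line numbers.
# Assumption: every list is declared at the start of a line as "@xp...".
list_xpath_arrays() {
    grep -n '^@xp' "$1"
}

# Example call (path taken from the text above; adjust for your install):
# list_xpath_arrays /var/lib/philologic/utils/newextract.plin
```

Each line of output gives the line number and the array name, which is where you would add your own paths.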

Enter searches using upper-case accented vowels and have them match the same in the text

PhiloLogic folds some upper-case characters to lower case at load time, including some accented characters. This causes problems when you search for these characters in capitals, because in words.R they are stored in lower case, and the same case folding is not normally applied to search strings. We can make it happen by adding the subs lowercaseify and up2low and calling lowercaseify at the end of clean_word_pattern:

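# Lower-case the upper-case accented letters U+00C0-U+00DE, which UTF-8
# encodes as \xc3 followed by one byte in the range \x80-\x9E.
sub lowercaseify {
    local ($theword);
    $theword = $_[0];
    # $theword =~ s/(\p{IsUpper})/\l$1/g;
    $theword =~ s/\xc3([\x80-\x9E])/&up2low($1)/ge;
    return ($theword);
}

# Map the second byte of an upper-case letter to its lower-case
# counterpart (\x80-\x9E becomes \xA0-\xBE) and restore the \xc3 lead byte.
sub up2low {
    local ($onechar, $rtn);
    $onechar = $_[0];
    $onechar =~ tr/\x80-\x9E/\xA0-\xBE/;
    $rtn = "\xc3" . $onechar;
    return $rtn;
}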
# ----------------------------------------------------------------------
# clean_word_pattern: this modifies the user supplied word to search
# for. We may add a line to transform " OR " in to "|", etc.
# Called from: search2t
# ----------------------------------------------------------------------
sub clean_word_pattern {
    local ($word);
    $word = $_[0];
    $word =~ s/%20/\+/g;
    $word =~ s/[ \+]+OR[ \+]+/\|/g;
    $word =~ s/[ \+]+AND[ \+]+/\+/g;
    $word =~ s/%../pack("H2", substr($&,1))/ge;  # Decode %XX escapes
    $word = &postfix2UTF8($word);
    $word =~ s/^\++//;          # Strip leading spaces
    $word =~ s/\++$//;          # Strip trailing spaces
    $word =~ s/\++/\+/g;        # Collapse multiple spaces
    $word =~ s/\++\|\++/\|/g;   # Delete spaces around "|"
    # Convert * to .* where the preceding character is not a ".", or
    # "]" or ")" in order to allow the normal construct word* rather than
    # requiring word.*
    $word =~ s/([^\.\]\)])\*/$1\.\*/g;
    $word =~ s/^\*/\.\*/;
    $word = &lowercaseify($word);   # Fold upper-case accented letters
    return $word;
}
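The byte ranges in lowercaseify and up2low work because of how UTF-8 encodes these letters: an upper-case accented letter is \xc3 followed by a second byte, and its lower-case form is \xc3 followed by that byte plus 0x20. A quick way to check this from the shell, as a sketch (the helper name utf8_hex is made up for illustration):

```shell
# Print the bytes of a string as lower-case hex with no separators.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# Build the two letters from octal escapes so the check does not depend
# on the encoding of this script: \303\211 is É and \303\251 is é.
upper=$(printf '\303\211')
lower=$(printf '\303\251')
printf '%s -> %s\n' "$(utf8_hex "$upper")" "$(utf8_hex "$lower")"
# prints: c389 -> c3a9
```

Note that the range \x80-\x9E also contains \x97, the multiplication sign ×, so with the subs above a search containing × is rewritten to ÷ (\xc3\xb7).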

How to tokenize user-entered keywords for metadata searching

This is turned off by default. To turn it on, look for these lines in gimme or subdocgimme and uncomment the two substitutions at the end:

# A poor man's tokenizer could be put in here by uncommenting the
# two lines below. This takes the "@" and translates it into
# a pattern to force word matching between non-alpha characters or
# at the beginning of a string, or end at the end of a string. You
# would probably need to check values, but this is known to work
# as is. Should this be set in the configuration?
# $query1 =~ s/\@([a-zA-Z0-9\[])/(\[^a-zA-Z0-9\]|^)$1/g;
# $query1 =~ s/([a-zA-Z0-9\]])\@/$1\([^a-zA-Z0-9\]|\$)/g;

This lets you tokenize your search using '@' as a word-boundary anchor. For example, to find the word 'fun' but not 'funny' or 'refund', type @fun@
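To see what the two substitutions do to a search term, they can be replayed outside PhiloLogic. The sketch below restates them as sed expressions and then tries the resulting pattern with grep -E, which only stands in for whatever regular expression matching your installation applies to metadata. The function name expand_anchors and the sample titles are made up for illustration.

```shell
# Expand '@' anchors the way the two commented-out substitutions do.
expand_anchors() {
    printf '%s\n' "$1" | sed \
        -e 's/@\([a-zA-Z0-9[]\)/([^a-zA-Z0-9]|^)\1/g' \
        -e 's/\([]a-zA-Z0-9]\)@/\1([^a-zA-Z0-9]|$)/g'
}

pattern=$(expand_anchors '@fun@')
printf '%s\n' "$pattern"
# prints: ([^a-zA-Z0-9]|^)fun([^a-zA-Z0-9]|$)

# Only the line containing 'fun' as a separate word is printed.
printf '%s\n' 'more fun ahead' 'funny stories' 'no refund' | grep -E "$pattern"
# prints: more fun ahead
```

A leading '@' requires the start of the string or a non-alphanumeric character before the term, and a trailing '@' requires the end of the string or a non-alphanumeric character after it, so either anchor can also be used on its own.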