Patches

1 Patch: Bibliographic subset searches fail due to unordered doc IDs
2 Patch: object merge failure in word searching: OS-X only.
3 Patch: KWIC hit centering in artfl_kwic and artfl_sorted_kwic
4 Patch: Problem searching divlevel=note using all-caps "NOTES"
5 Patch: Collocation tables with Greek or other unicode not displaying properly (ie, at all!)
6 Patch: Fixing a message typo
7 Patch: Previous and Next arrows are not aligned properly, and styles don't apply properly to them
8 Patch: Typo in search3t in PhiloLogic 3.1 (multilingual version)
9 Patch: Dublin Core metadata fails to get a date in some cases
10 Concordance reports include header elements
11 Enabling metadata searches containing ampersands (&)
12 Div level searches (headwords) that don't run when using ^ and $
13 Patch: Similarity Search Word Select Checkboxes Fail to Work
14 Patch: install hangs while making search forms
15 Patch: Line 992 in search3t says 'sprint' instead of 'sprintf'
16 Patch: concheadlinedico and getKWICtitledico in philosubs.pl
17 subdocgimme: add parent object to children for searching
18 Patch: Field-delimited sort Semantics Broken in OSX 10.5
19 Displaying multiple versions of a text
20 Search3t: Fails to print number of hits
21 Missing/broken print statements for sprintf

Patch: Bibliographic subset searches fail due to unordered doc IDs

The problem: multidoc searching can return incomplete results if the selected document ids are passed to search3t out of ascending numeric order.

Put the patch for this in two places in search3t:

if ($gotmultidocid) {

open (TTFILE, ">$CORPUS");

if ($PhiloDeBugDump) {

print "<p><tt>DEBUG: multidocid: ";

}

@mphilodocids = sort numerically @mphilodocids; ## the patch

foreach $multidocid (@mphilodocids){

if ($PhiloDeBugDump) {

print " $multidocid ";

}

if ($multidocid =~ /^[0-9]*$/) {

print TTFILE pack("i", $multidocid);

}

And at the very bottom of the file:

sub numerically { $a <=> $b };

This patch can be found in the theme-rheme script, too, FYI.
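
Why the ordering matters can be seen in a small standalone sketch. The document ids below are made up for illustration; only the numerically comparator comes from the patch. Perl's default sort compares strings, so an unpatched sort (or no sort at all) does not give ascending numeric order:

```perl
use strict;
use warnings;

# Hypothetical document ids, in the order a form might submit them.
my @ids = (10, 9, 100, 2);

# The default sort compares strings, so "100" sorts before "2".
my @lexical = sort @ids;

# The comparator from the patch: numeric comparison with <=>.
sub numerically { $a <=> $b }
my @numeric = sort numerically @ids;

print "lexical: @lexical\n";   # lexical: 10 100 2 9
print "numeric: @numeric\n";   # numeric: 2 9 10 100
```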

Patch: object merge failure in word searching: OS-X only.

Word searches that combine bibliographic and div-level objects fail to merge in search3t on Mac OS X. This may be due to slight differences in Perl's unpack function under OS X; we have not seen the problem on Linux installations. Symptom: combining bibliographic and div-level object searching fails to limit the search to the selected bibliographic items.

PATCH: in search3t subroutine searchsubdoctablei:

CHANGE $y = unpack("s", $bindocid);

TO $y = unpack("i", $bindocid);

CHANGE $y = unpack("s", $thisdiv);

TO $y = unpack("i", $thisdiv);

This should work for all installations and will be put into the next release. Test on Linux before installing.
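
The difference between the two templates can be checked in isolation. This is a minimal sketch, assuming the ids were written with pack("i", ...) as in the first patch; the id value is made up. The "s" template reads only a two-byte short, while "i" reads the whole native integer. One possible explanation for the OS X-only failure (a guess, not confirmed here) is byte order: which two bytes "s" sees depends on the machine's endianness.

```perl
use strict;
use warnings;

my $docid  = 70000;                # hypothetical id, too large for 16 bits
my $packed = pack("i", $docid);    # native signed integer, as in the first patch

my $as_int   = unpack("i", $packed);   # reads the whole integer back
my $as_short = unpack("s", $packed);   # reads only the first two bytes

print "unpack i: $as_int\n";     # round-trips to the original id
print "unpack s: $as_short\n";   # some other value; which one depends on byte order
```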

Patch: KWIC hit centering in artfl_kwic and artfl_sorted_kwic

Both of these have a routine that tries to center a hit on the line by padding its left side with multiple spaces (0x20 characters). Browsers collapse runs of spaces, so the padding has no effect (as the comment indicates). At about line 185, fix the section

# actually, padding doesn't seem to work ;(

$padding = ($lhalf - $title_length - $left_length - $pagenum_length);

$padding = "%-" . $padding . "s";

$left = sprintf ($padding, $left);

by replacing

$padding = "%-" . $padding . "s";

$left = sprintf ($padding, $left);

with

$left = " " x $padding . $left;

And yes, feel free to remove the comment indicating that this did not seem to work. :-) Thanks to Michael Beddow for this one.

Patch: Problem searching divlevel=note using all-caps "NOTES"

Searching divlevel=NOTES produces skewed results (too many) because of the way subdocgimme and gimme evaluate regular expressions with NOT: the test /^NOT/ also matches a value that merely begins with those letters, such as NOTES. Simple patch:

change

elsif ($qval =~ /^NOT/) {

$qval =~ s/^NOT //g;

$qval =~ s/[ACEINOUY]/$ACCENTS{$&}/ge;

$query1 .= " $qnam NOT regexp \"$qval\" ";

}

by adding a space after the first NOT:

elsif ($qval =~ /^NOT /) {

$qval =~ s/^NOT //g;

$qval =~ s/[ACEINOUY]/$ACCENTS{$&}/ge;

$query1 .= " $qnam NOT regexp \"$qval\" ";

}

in both gimme and subdocgimme.
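
The effect of the added space can be checked on its own. In this sketch the two helper subs are hypothetical names that wrap only the two patterns from the patch:

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the old and the patched test.
sub is_negation_old { return $_[0] =~ /^NOT/  ? 1 : 0 }
sub is_negation_new { return $_[0] =~ /^NOT / ? 1 : 0 }

for my $qval ("NOTES", "NOT preface") {
    printf "%-12s old=%d new=%d\n",
        $qval, is_negation_old($qval), is_negation_new($qval);
}
# NOTES        old=1 new=0
# NOT preface  old=1 new=1
```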

Patch: Collocation tables with Greek or other unicode not displaying properly (ie, at all!)

Patch: artfl_pole.pl

Two locations, I think:

if ($w =~/[a-z]/) {

$leftfreq{$w}++;

$allfreq{$w}++;

}

if ($w =~/[a-z]/) {

$rightfreq{$w}++;

$allfreq{$w}++;

}

In both cases, I was checking that the word contains at least one letter, but [a-z] never matches Greek Unicode text, so the test fails.

The patch is to also accept the high-bit bytes (\200-\377) that make up non-ASCII characters:

if ($w =~/[a-z\200-\377]/) {

$leftfreq{$w}++;

$allfreq{$w}++;

}

if ($w =~/[a-z\200-\377]/) {

$rightfreq{$w}++;

$allfreq{$w}++;

}
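
A quick way to see the difference, assuming the words reach the script as raw UTF-8 byte strings (if they were decoded into Perl character strings, a class such as \p{L} would be the more usual test):

```perl
use strict;
use warnings;

# UTF-8 bytes of a Greek word (lambda, omicron with tonos, gamma,
# omicron, final sigma), written as escapes so this is a byte string.
my $w = "\xce\xbb\xcf\x8c\xce\xb3\xce\xbf\xcf\x82";

my $old = ($w =~ /[a-z]/)           ? 1 : 0;   # original test
my $new = ($w =~ /[a-z\200-\377]/)  ? 1 : 0;   # patched test

print "old=$old new=$new\n";   # old=0 new=1
```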

Patch: Fixing a message typo

Table of Contents pages say this at top:

"Click here to run a search on selected parts of this documents."

"documents" should be singular.

Fix in english.messages.pl $philomessage[240]

Patch: Previous and Next arrows are not aligned properly, and styles don't apply properly to them

There are unnecessary paragraph tags in the Previous and Next links, and for some reason styles don't seem to apply to them very well. You can fix that by making contextualize.pl and getobject.pl look more like this:

if ($pagenumber) {

if ( !$ChainLinksRestricted ) {

$pagetag = $o_t . "." . $doc . ":" . ($pagenumber - 2) . "." . $dbname;

$href = "<A class='navlink' HREF=\"" . $PAGESERVER . "?$pagetag\">";

$previouslink = $href . sprintf($philomessage[306], $ptmvo1) . "</A>\n";

}

elsif ( ($seq-1 < 0 ? 1-$seq : $seq-1) <= $ChainLinksLimit ) {

$pagetag = $o_t . "." . $hlist . "." . $dbname . "." .

$number . "." . $conj . "." . $wn . "." . ($seq - 1);

$href = "<A class='navlink' HREF=\"" . "$PHILOCGI/contextualize.pl" . "?$pagetag\">";

$previouslink = $href . sprintf($philomessage[306], $ptmvo1) . "</A>\n";

}

} else {

$previouslink = "";

}

if ( $forward ) {

if ( !$ChainLinksRestricted ) {

$pagetag = $o_t . "." . $doc . ":" . ($pagenumber) . "." . $dbname;

$href = "<A class='navlink' HREF=\"" . $PAGESERVER . "?$pagetag\">";

$nextlink = $href . sprintf($philomessage[307], $ptmvo1) . "</A>\n";

}

elsif ( ($seq+1 < 0 ? -$seq-1 : $seq+1) <= $ChainLinksLimit ) {

$pagetag = $o_t . "." . $hlist . "." . $dbname . "." .

$number . "." . $conj . "." . $wn . "." . ($seq + 1);

$href = "<A class='navlink' HREF=\"" . "$PHILOCGI/contextualize.pl" . "?$pagetag\">";

$nextlink = $href . sprintf($philomessage[307], $ptmvo1) . "</A>\n";

}

} else {

$nextlink = "";

}

Just cut out the <p> tags and add class='navlink' to the anchors. Then add a definition for that style to your header.

Patch: Typo in search3t in PhiloLogic 3.1 (multilingual version)

Bug: word search in div or subdiv object fails silently.

At or about line 960 in search3t change

print "<p>" . sprint($philomessage[35], $divindexarg2display);

To

print "<p>" . sprintf($philomessage[35], $divindexarg2display);

Note the missing "f".

Patch: Dublin Core metadata fails to get a date in some cases

When the value of DC.date contains non-numeric characters, no date is propagated.

In utils/dublin.extract.plin change

$creationdate = $dublincore{"date"};

$sourcedate = $creationdate;

if ($sourcedate =~ /[A-Za-z\-\[\]\.\?\*\;\/ \<\>\,]/) {

if ($publicationdate =~ m/([0-9][0-9][0-9][0-9])/) {

$sourcedate = $1;

}

to

$creationdate = $dublincore{"date"};

$publicationdate = $creationdate;

$sourcedate = $creationdate;

if ($sourcedate =~ /[A-Za-z\-\[\]\.\?\*\;\/ \<\>\,]/) {

if ($sourcedate =~ m/([0-9][0-9][0-9][0-9])/) {

$sourcedate = $1;

}

Reason: the original code evaluates $publicationdate when it is not set. Note that the source date is set to the first string of 4 digits. The original date as given in the DC is put in fields 10 (and 9, in fact) in the bibliography file. We want a four-digit year in this field because it is evaluated as an integer by the MySQL loader.
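
The extraction rule can be tried out separately. This is a simplified sketch: source_year is a hypothetical helper, it treats any non-digit as "non-numeric" rather than using the exact character class from the patch, and the sample dates are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: if the date contains anything other than digits,
# keep the first run of four digits; otherwise return it unchanged.
sub source_year {
    my ($date) = @_;
    if ($date =~ /[^0-9]/ && $date =~ /([0-9]{4})/) {
        return $1;
    }
    return $date;
}

for my $date ("c. 1789", "[1850?]", "1832", "s.d.") {
    printf "%-8s -> %s\n", $date, source_year($date);
}
# c. 1789  -> 1789
# [1850?]  -> 1850
# 1832     -> 1832
# s.d.     -> s.d.
```

A value with no four-digit run, such as "s.d.", passes through unchanged, so the loader would still need to cope with it.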

Concordance reports include header elements

Concordance reports read bytes without respect to the XML document structure. This means that you can get crud from the header when your hit is close to the beginning of the document. In philosubs.pl, in ConcFormat, add right at the top:

# A Hall of Shame, right off the start

if ($bf =~ /<\/teiheader>/i) {

$bf =~ s/<\/teiheader>/XXXXZZZZ/i;

($temp, $bf) = split (/XXXXZZZZ/, $bf) ;

}

Yes, this is ugly. For some reason, we had problems trying to match everything up to the tag. With a real parser we could do this nicely, but why parse an arbitrary string of bytes?
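
For comparison, here is a sketch of a single-substitution variant. It is not the code used in philosubs.pl, and strip_tei_header is a hypothetical name. One possible reason (a guess) that a direct match gave trouble is that, without the /s modifier, the dot does not match newlines, so a multi-line header would never match.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything up to and including the first
# closing teiheader tag. /i ignores case; /s lets . match newlines.
sub strip_tei_header {
    my ($bf) = @_;
    $bf =~ s/\A.*?<\/teiheader>//is;
    return $bf;
}

my $chunk = "<title>x</title>\n</teiHeader>\n<body>the hit";
print strip_tei_header($chunk), "\n";
```

A buffer with no closing header tag is returned unchanged.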

Enabling metadata searches containing ampersands (&)

A title or some other field may contain an ampersand. If the search string for such a field contains the & character, the search will not run without the following patch to gimme and philosubs. (I am assuming that the entry in "bibliography" renders the ampersand as an SGML entity.)

First, in philosubs, in sub clean_corpus_pattern, add this substitution:

$gimme_arg =~ s/\+\&\+/\+\\\&\+/g; # Escape "&" in titles... grrrr.

This sends the escaped amp to gimme, where you need this:

# Now, run the query and pump out the results. This should be

# split up...

if ($query1 =~ / \& /i) {

$query1 =~ s/ \& / \&amp; /gi;

}

&go;

Try it. You'll like it.

Div level searches (headwords) that don't run when using ^ and $

When a headword search using ^ (especially) yields no results, it could be that the original div head contains extra junk, such as extra tags or extra spaces. If the divheads in divindex.raw are indeed messy, just run a little Perl script to clean them up.

For example, I wrote this to get rid of extra tags, spaces, and other stuff for a dictionary:

#! /usr/bin/perl

while (<>) {

$incoming = $_;

$incoming =~ s/<([^>]*)>//gi;

$incoming =~ s/\t */\t/gi;

$incoming =~ s/ \) //gi;

$incoming =~ s/ \( //gi;

$incoming =~ s/ +/ /g;

$incoming =~ s/ , /, /gi;

print $incoming;

}

Before you run it, move the dividx.raw to something like dividx.raw-orig. Then run your clean up on it, outputting to dividx.raw. Reload and enjoy successful results.

Patch: Similarity Search Word Select Checkboxes Fail to Work

In word similarity searches, you may see a list of words, frequencies, and a checkbox for each one. You should be able to select one or more and search for these (with or without bibliographic criteria). If you select one or more words and this fails, returning a bibliography instead, check the output to see whether each checkbox entry is given a value.

If the value is empty, edit the subroutine "SimilarWord" in philosubs.pl:

($freq, $searchword) = split(/ /,$asimilarword); # This split

$displword = $searchword; # fails if uniq

$simwordbuffer .= $leftpaddings . $freq . " "; # uses tab

Change the space in the split statement to a tab ("\t"). Some versions of the UNIX command uniq use a tab rather than a space to delimit counts from entries (words).
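
If you would rather not depend on which delimiter your uniq emits, a split on any whitespace handles both. This is only a sketch: split_count_line is a hypothetical helper, not part of philosubs.pl, and the sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split "count<delimiter>word" whether the
# delimiter is a tab or a space, ignoring leading padding.
sub split_count_line {
    my ($line) = @_;
    $line =~ s/^\s+//;                            # drop leading padding
    my ($freq, $word) = split(/\s+/, $line, 2);   # at most two fields
    return ($freq, $word);
}

for my $line ("     12\tamour", "     12 amour") {
    my ($freq, $word) = split_count_line($line);
    print "freq=$freq word=$word\n";   # freq=12 word=amour (both times)
}
```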

Patch: install hangs while making search forms

This is probably a bug that existed in the first version of Philologic 3. We kept trying to get examples and if there weren't any at all to be found (or if you are very, very, very, very unlucky) this could go on forever. You can get rid of this by changing your philologic install. Edit philologic/installdir/lib/makeforms_gold.pl.plin and philologic/installdir/lib/makeforms_gold.pl:

while ($bibexample eq '') {

$randomindex = rand $bibindex;

$bibexample = $bibarray[$randomindex][$colindex];

}

should be changed to:

$trialcounter = 0;

while ($bibexample eq '' && $trialcounter < 100) {

$trialcounter++;

$randomindex = rand $bibindex;

$bibexample = $bibarray[$randomindex][$colindex];

}

Although this bug was fixed in a later Philo3 release, another very similar patch hasn't made it into the distribution. The same hanging problem can occur if you have subdiv objects that don't have any examples to be found. The fix is very similar: change the following, in philologic/installdir/lib/makeforms_gold.pl and philologic/installdir/lib/makeforms_gold.pl.plin:

while ($subexample eq '') {

$randomindex = rand $subindex;

$subexample = $subarray[$randomindex][$colindex];

}

to:

$trialcounter = 0;

while ($subexample eq '' && $trialcounter < 100) {

$trialcounter++;

$randomindex = rand $subindex;

$subexample = $subarray[$randomindex][$colindex];

}

Same with:

foreach $col (@divhasvalues) {

if ($col) {

$divexample = '';

while ($divexample eq '') {

$randomindex = rand $divindex;

$divexample = $divarray[$randomindex][$colindex];

}

$divexamples[$colindex] = $divexample;

}

$colindex++;

}

...should be...

foreach $col (@divhasvalues) {

if ($col) {

$divexample = '';

$trialcounter = 0;

# Try 100 times to get a random example

while ($divexample eq '' && $trialcounter < 100) {

$trialcounter++;

$randomindex = rand $divindex;

$divexample = $divarray[$randomindex][$colindex];

}

$divexamples[$colindex] = $divexample;

}

$colindex++;

}
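
The three loops share one pattern: try a random row a bounded number of times, then give up with an empty example. A standalone sketch of that pattern follows; pick_example and its arguments are hypothetical and are not part of makeforms_gold.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: return a non-empty value from column $col of a
# random row, or '' after $max_tries unsuccessful attempts.
sub pick_example {
    my ($rows, $col, $max_tries) = @_;
    return '' unless @$rows;
    for (1 .. $max_tries) {
        my $value = $rows->[ int(rand(@$rows)) ][$col];
        return $value if defined $value && $value ne '';
    }
    return '';   # nothing found: give up instead of looping forever
}

# A column with no values at all: the loop still terminates.
my @empty_rows = (['', ''], ['', '']);
my $example = pick_example(\@empty_rows, 0, 100);
print "example: '$example'\n";   # example: ''
```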

Patch: Line 992 in search3t says 'sprint' instead of 'sprintf'

Patch: concheadlinedico and getKWICtitledico in philosubs.pl

Two patches actually to address a couple of problems.

1) SQL divmetadata lookups. In both "Dictionary" text search result head generators, we are loading the entire contents of the divindex database into a hash keyed on the PhiloLogic div object id (nn:yy:...) and then looking up information about the title using the hash, such as:

if ( !$mvoreaddividx ) {

open( MVOBIB, $SYSTEM_DIR . "divindex.raw" );

while ( $mvobilioLine = <MVOBIB> ) {

@mvobibsplit = split( "\t", $mvobilioLine );

$BAYLEDIVIDX{ $mvobibsplit[0] } = $mvobibsplit[1];

$mvoreaddividx++;

}

close(MVOBIB);

}

[.....]

$conctit = $BAYLEDIVIDX{$x};

$x is the div object id. For large tables, this produces a noticeable lag. Replace this with an SQL lookup instead:

$z = GetDivObjectMetadata($x);

@mvobibsplit = split("\t", $z);

$conctit = $mvobibsplit[1];

which calls the new subroutine:

sub GetDivObjectMetadata {

local ($metadataline, $res, $query, $objectid, $thistable, $sth, $r);

$objectid = $_[0];

if (!$DBHdivmetadata) {

$DBHdivmetadata = DBI->connect ($CONNECTSTRING, $USER, $PASSWD);

}

$thistable = $SQLDIVTABLENAME;

$query = "select * from $thistable where dgphilodivid = \"$objectid\" ";

if ($sth = $DBHdivmetadata->prepare($query)) {

$sth->execute();

while (@res = $sth->fetchrow ) {

foreach $r (@res) {

$metadataline .= $r . "\t";

}

$metadataline .= "\n";

}

return ($metadataline);

}

}

This pushes an SQL search and eliminates the creation of the hash. Also note that performance is further enhanced by adding an INDEX in load.subdoctables.sql:

create table TABLENAME (

dgphilodivid VARCHAR(250),

[....]

INDEX (dgphilodivid));

The combination greatly improves performance on large tables, such as the Encyclopedie article database (77,000 records). This applies to both Conc and KWIC headline functions.

2) In concheadlinedico hit highlighting bytes are not preserved for links to objects and pages. This is easily remedied by reverting to the original notation. Replace:

$headline =

'<a href="getobject.pl?c.'

. $x . '.'

. $dbname . '">'

. $conctit

. '</a>: ';

with

$headline = $ohref[1] . " " . $conctit . "</a>"; # MVO 03/08

subdocgimme: add parent object to children for searching

In subdocgimme, when an object such as an article has children such as subarticles, I forgot to include the parent object's address for word searching. All children of that object are searched, but the parent itself is not. This is a subtle bug that usually shows up when searching for a single headword.

To fix this (we are still testing it), add a line in subdocgimme. Change:

if ($inkids) {

&PrintDivChildren($inkids);

}

to

if ($inkids) {

$inkids = $thisid . "|" . $inkids; # Patch MVO 03/06/08

&PrintDivChildren($inkids);

}

which simply adds the parent to the child object list. This applies to both SQL and egrep variants of subdocgimme.

Patch: Field-delimited sort Semantics Broken in OSX 10.5

This one is nasty. Leopard uses a version of GNU sort that breaks backwards compatibility with our loader. You need to edit two makefiles to fix this, so, as always, be very attentive to tabs and empty lines.

in /var/lib/philologic/loader.xmake:

#SORTFLAGS= -T . -y +0 -1 +1 -2n +2 -3n +3 -4n +4 -5n +5 -6n +6 -7n +7 -8n

SORTFLAGS= -T . -k 1,1 -k 2,2n -k 3,3n -k 4,4n -k 5,5n -k 6,6n -k 7,7n -k 8,8n

in /var/lib/philologic/Makefile:

work/plain.files: bibliography

# ./utils/newdocs "TEXTS" | $(SORT) -n +0 -1 > $@

./utils/newdocs "TEXTS" | $(SORT) -n -k 1,1 > $@

And then in search3t (and search3torth), you need to make the following 5 edits.

around line 1806:

#open (SORTED, "| uniq -c | sort -nr +0 -1 > $PHILOTMP/generated.$$");

open (SORTED, "| uniq -c | sort -nr -k 1,1 > $PHILOTMP/generated.$$");

ca. 1950:

# open(MVOSORTED, "| sort -nr +0 -1 > $PHILOTMP/mvogenerated.$$");

open(MVOSORTED, "|sort -nr -k 1,1 > $PHILOTMP/mvogenerated.$$");

ca. 2025:

# open MVOSORTED, "| sort -nr +0 -1 > $PHILOTMP/mvogenerated.$$";

open MVOSORTED, "| sort -nr -k 1,1 > $PHILOTMP/mvogenerated.$$";

ca. 2119:

# open(MVOSORTED, "| sort -nr +0 -1 > $PHILOTMP/mvogenerated.$$");

open(MVOSORTED, "| sort -nr -k 1,1 > $PHILOTMP/mvogenerated.$$");

ca. 2404:

# open (SORTED, "| uniq -c | sort -nr +0 -1 > $PHILOTMP/generated.$$");

open (SORTED, "| uniq -c | sort -nr -k 1,1 > $PHILOTMP/generated.$$");

search3torth will need the same edits, if you use it.
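
To check what the uniq -c | sort -nr -k 1,1 pipelines should produce, independently of which sort your system ships, the same ranking can be computed in Perl. This is only a sanity-check sketch, not a replacement for the pipes in search3t; count_and_rank is a hypothetical helper, and it assumes the input words arrive already grouped, as uniq -c expects.

```perl
use strict;
use warnings;

# Hypothetical helper: count each word and list "count word" lines,
# highest count first (ties broken alphabetically).
sub count_and_rank {
    my @words = @_;
    my %count;
    $count{$_}++ for @words;
    return map  { "$count{$_} $_" }
           sort { $count{$b} <=> $count{$a} || $a cmp $b }
           keys %count;
}

print "$_\n" for count_and_rank(qw(a a b c c c));
# 3 c
# 2 a
# 1 b
```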

Displaying multiple versions of a text

Some texts exist in several forms. The most common case is language: the same text in English, French, Spanish, and so on. But the forms can also be successive versions of the same work, or a text and its glosses. One way to handle this is to split the document into parts of limited size (10 to 20 lines at most) and to split every version (language, form) along the same boundaries; then create a single document in which each part corresponds to a <div1>; within each <div>, place the various forms using the <p> tag, specifying the nature of the form, e.g.: <p lang="german" n="2">...

After that, the file /var/lib/philologic/utils/xml-sgmlloader.plin needs a slight modification.

At lines 411 ff., after

print "sent $docid $DIV1 $DIV2 $DIV3 $PARA $SENT $parabyte"

add:

if ($thetag =~ /lang=/) {

if ($printsqlsubdivtable) {

&makesqlsubdivrecord($thetag);

}

}

If you run a search without further specification, it covers the whole file, but you can restrict the search through the subdiv options. When the occurrences are displayed in their "context", just click on "section" to get a full display of the whole part concerned, which makes it very easy to compare the various languages or versions. (Feature by Mark Olsen!)

Search3t: Fails to print number of hits

On or about line 1289 of search3t change

sprintf($philomessage[176], $length);

to

print sprintf($philomessage[176], $length);

Missing/broken print statements for sprintf

There are several missing print statements for sprintf functions, so certain messages do not appear. We will make a list below:

search3t

Line 2462: sprintf($philomessage[187], int($length/200)+1);

Line 916: printf sprintf($philomessage[30], $length, $showthem) . "\n";

Change first printf to print (does not seem to matter, but let's make sure :-).
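
The underlying point is that sprintf only builds a string; nothing appears until that string is printed. A minimal sketch, using a made-up format in place of a $philomessage entry and an in-memory filehandle so the output can be inspected:

```perl
use strict;
use warnings;

my $format = "%d hits found\n";   # hypothetical stand-in for a $philomessage entry

# Write to an in-memory filehandle so the result can be examined.
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";

my $text = sprintf($format, 42);   # builds the string, emits nothing
print {$fh} $text;                 # print is what actually emits it
close($fh);

print "captured: $out";            # captured: 42 hits found
```

As for the printf at line 916: printf treats the string returned by sprintf as a format a second time, so a literal % in the message would be read as a directive. Using print avoids that.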
