PhiloLogic Version 3 Release Notes
All test samples are loaded without any database specific
customizations (except to point dictionary linking to appropriate
resources), using the load argument
from the directory containing the texts. We want this to run out of the box for this set of tests.
All language arrays are copied into each database lib directory. Selection of base language is done in philo-db.cfg. Please note that we have NOT translated system generated search forms. We have found that search forms and headers are frequently heavily modified by users and administrators. We have also opted not to support dynamic selection in the distribution, but this would be a trivial function. If we find we need to do it, we will add a patch to the PhiloLogic wiki. If you add this, please let us know.
http://philologic.uchicago.edu/philologic3/docsouth.html
Try "christ.* BUTNOT christ[im].*" (no quotes). Concise notation: "christ.*|-christ[im].*". (Note the final ".*" is required to distinguish it from "[im]*".)
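The effect of BUTNOT on word expansion can be illustrated with a short Python sketch. This is not PhiloLogic's search code: the function name expand_butnot and the sample word list are hypothetical, and the sketch assumes each pattern must match a whole index word. Under that assumption it also shows why the final ".*" matters in the excluded pattern.

```python
import re

def expand_butnot(words, include, exclude):
    """Return the words that fully match `include` but not `exclude`.

    Hypothetical helper for illustration only; whole-word matching
    is an assumption, not documented PhiloLogic behaviour.
    """
    inc = re.compile(include)
    exc = re.compile(exclude)
    return [w for w in words if inc.fullmatch(w) and not exc.fullmatch(w)]

# Made-up sample of index words.
words = ["christ", "christian", "christmas", "christs", "christopher", "church"]

# With the trailing ".*", every form starting "christi" or "christm" is excluded.
print(expand_butnot(words, r"christ.*", r"christ[im].*"))

# Without it, "[im]*" also matches zero characters, so only the bare
# word "christ" is excluded and "christian" and "christmas" remain.
print(expand_butnot(words, r"christ.*", r"christ[im]*"))
```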
Ajax note handler ... a configuration-selectable note displayer that uses Ajax to get the note from the server and toggle its display in the text rather than in a pop-up browser window. Example:
http://cassat.uchicago.edu/cgi-bin/philologic/getobject.pl?c.8:3.lincoln
Works for ARTFL TEI formatted note tags and ATE (see below).
An experimental OS-X GUI loader. For those allergic to command line computing, this is an alternative to the command line loader and offers options. Proof-of-concept at this point.
NON-TEI encoding scheme support: ATE, DocBook, and plaintext.
Plaintext by popular demand. Yes indeed, we have had people ask for it to be included in PhiloLogic. Tested on Gutenberg and Liberliber documents. Character conversion to UTF-8 from whatever character encoding you might get them in is *strongly recommended*, because we can have result pages that mix materials from different documents. Example, Gutenberg Spanish and German documents:
http://philologic.uchicago.edu/philo3002demo/gutenberg.form.html
Note that PhiloLogic will handle earlier character representations, but some modification to headers, etc. would be required.
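For the recommended conversion to UTF-8 before loading, a minimal Python sketch follows. It is not part of the PhiloLogic distribution. The function names and the default source encoding (Latin-1) are assumptions, so check what encoding your documents actually use before converting.

```python
from pathlib import Path

def to_utf8_bytes(raw, source_encoding="latin-1"):
    """Decode bytes from the source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

def convert_file(src, dst, source_encoding="latin-1"):
    """Write a UTF-8 copy of `src` to `dst` (both paths are hypothetical)."""
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(to_utf8_bytes(raw, source_encoding))

# Example: a Latin-1 byte string becomes its UTF-8 equivalent.
sample = "se\u00f1or".encode("latin-1")
print(to_utf8_bytes(sample))
```

Working on raw bytes avoids any newline translation, so line breaks in the source text are preserved exactly.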
DocBook (Prototype support). Could be extended if there is interest. Again, proposed by a PhiloLogic user/hacker. Example:
http://philologic.uchicago.edu/philo3002demo/docbooklit.form.html
This is not fully supported, but could be if there is sufficient demand.
ATE: ARTFL Text Encoding. This is/was an intermediate internal encoding scheme consisting of Dublin Core headers, HTML (reasonably handled), with optional tagging for things like pages, notes, sentences, and so on, very lightly documented at http://philologic.uchicago.edu/ATE/ Example:
http://philologic.uchicago.edu/philo3002demo/lincoln.form.html
Caveat emptor: PhiloLogic will probably load arbitrary HTML, but this may not always work, particularly if you have use of
New internal search engine (search3). Resolves the library incompatibility bugs with new Linux releases noted in search2. Extensible in new ways and supports full object searching. The Linux and OS-X installations now have 64-bit index addressing, so this should be able to handle about a terabyte of TEI encoded text data.
NOT text search operator: Try "NOT christ jesus NOT christ" (no quotes) as a test in docsouth or EEBO. Concise regex notation: !chr.st.? jesus !chr.st.?
Searching for and in divs by type, head, as well as fields extracted from opener/closer, author/signed, dateline, salutation. The table for divs also has fields for id, n, and lang -- being populated if found -- and placename, classification and partofspeech (not populated at the moment, future use). Merges biblio and object searches.
Full word searching on selected subdiv objects: lg, note, epigraph, sp, and a couple of others. You can search on tag -- lg -- and type (hymn). Merges biblio and object searches. Fields in this table are tag, type, n, id, who, lang, which are being populated when data is found.
SQL subdoc object management. This includes dynamic terms buttons which give frequencies of values with other values selected in the same object level. This is also required to support standoff nested object mark-up.
Automatic generation of "whizbang" search form templates with examples drawn from your data.
Reimplemented "more hits" ... a sliding list of twenty blocks. The block size and number of blocks are set in philo-db.cfg.
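One plausible reading of the sliding block list is sketched below in Python. This is an illustration, not the PhiloLogic implementation: the function name block_window, the centring rule, and the block size of 50 used in the example are all assumptions. The real values come from philo-db.cfg.

```python
def block_window(total_hits, current_block, block_size, max_blocks=20):
    """Return (first_hit, last_hit) ranges for the blocks listed around
    `current_block` (0-based). Hypothetical sketch: the window is roughly
    centred on the current block and clamped to the available blocks."""
    total_blocks = -(-total_hits // block_size)  # ceiling division
    if total_blocks == 0:
        return []
    first = max(0, min(current_block - max_blocks // 2, total_blocks - max_blocks))
    last = min(total_blocks, first + max_blocks)
    return [
        (b * block_size + 1, min((b + 1) * block_size, total_hits))
        for b in range(first, last)
    ]

# 120 hits in blocks of 50: three blocks, the last one partial.
print(block_window(120, 0, 50))
```

With 5000 hits and a block size of 50 there are 100 blocks, and the list shows only the 20 nearest the current position, which is the sense in which it slides.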
No limit on search results ... well, a million. This is set in the general philo configuration.
In single document searching, the user may select any object. Multiply included objects (for example, selecting a div1 and then a div3 in that div1) are filtered out to avoid repeats.
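The filtering of multiply included objects can be sketched as follows in Python. The representation is an assumption made for illustration: each object is a tuple of integers giving its path within the document, so (3,) is a div1 and (3, 1, 2) is a div3 inside it. The function name drop_nested is hypothetical and this is not the code PhiloLogic uses.

```python
def drop_nested(selected):
    """Keep only selections that are not contained in another selection.

    An object is dropped when any proper prefix of its path was also
    selected, since searching the ancestor already covers it.
    """
    chosen = set(selected)
    kept = []
    for obj in selected:
        has_ancestor = any(obj[:n] in chosen for n in range(1, len(obj)))
        if not has_ancestor and obj not in kept:
            kept.append(obj)
    return kept

# The div3 at (3, 1, 2) lies inside the selected div1 at (3,), so it is dropped.
print(drop_nested([(3,), (3, 1, 2), (4, 2)]))
```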
KWIC resorting option on left and/or right contexts as well as selected bibliographic information.
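As an illustration of resorting on context, here is a minimal Python sketch. It assumes each hit is a (left, keyword, right) tuple of strings, and that left-context sorting reads outward from the keyword, starting with the word nearest to it. Both are assumptions for the example, as are the function name sort_kwic and the sample hits.

```python
def sort_kwic(hits, side="right"):
    """Sort KWIC hits on the words of the chosen context, ignoring case."""
    def key(hit):
        left, _keyword, right = hit
        if side == "left":
            # Nearest word to the keyword first, then moving leftwards.
            return [w.lower() for w in reversed(left.split())]
        return [w.lower() for w in right.split()]
    return sorted(hits, key=key)

# Made-up sample hits.
hits = [
    ("in the name of", "christ", "our lord"),
    ("the church of", "christ", "jesus said"),
    ("glory to", "christ", "alone"),
]

for left, keyword, right in sort_kwic(hits, side="right"):
    print(f"{left:>20} {keyword} {right}")
```

Sorting on selected bibliographic information would follow the same pattern, with the key function returning the chosen fields instead of the context words.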
Extensive debugging information, enabled only from philo-db.cfg as a security measure.
Standard support for ARTFL TEI Lite recommendations, including metadata, notes, etc. Consult our local encoding recommendations.
Metadata extraction in the poor man's extractor for TEI, MEP, and CES. Textload is known to handle all three. ARTFL Text Encoding (ATE) is in another set of recognizers.
Textload now has a configuration file in the philologic home, which allows you to define parameters for the load.
Word count per document (standard) and FREQUENCY PACKAGE.
ADD-ONS: Full support for various dictionary look-ups. Enabled from configuration.
# Enable dictionary look-up function. Set to 0 to turn it off.
Additional code in "goodies" to hook up PhiloLogic to TaporWare and to force a similarity search for "dirty OCR applications".
NOTE: Some nested subdiv objects, most notably sp, lg, and stage tags, may conflict with one another. This has to do with preset object depths as a holdover from the PhiloLogic2 series. See FUTURE DEVELOPMENTS below. Slight modifications to rules will fix most of these, but long term requires deeper object index function.
Discussing: subdiv object report generator. Currently, if you search for subdiv objects (in a selected set of docs or whole database) it will simply report these by types and attributes by document. Not sure how these should be handled. Suggestions?
textload.cfg has an instruction to dump pretty raw XPATHS in div and subdiv tables. Might be useful.
Testing Needed: Well, everything, of course. But especially internal document navigation. This is based on a single object link table. Seems to work. We are using this for notes and other internal cross references, such as tables of contents, indexes, etc.
NOT Implemented at this time:
Future Development in rough order of priority: