New: PhiloLogic 3.2 (beta) release available (August 2010)
What is PhiloLogic?
PhiloLogic™ is the primary full-text search, retrieval and analysis tool developed by the ARTFL Project and the Digital Library Development Center (DLDC) at the University of Chicago. This is a Free Software implementation of PhiloLogic for large TEI-Lite document collections. The wide array of XML data specifications and the recent deployment of basic XML processing tools provide an important opportunity for the collaborative development of higher-level, interoperable tools for Humanities Computing applications. The sophistication and power of the TEI-XML encoding specification support extremely rich textual data representations, which encourage, if not require, sets of tools that exploit features of the encoded text to perform particular tasks. One general tool may never fit all possible uses of encoded documents, but a set of more specialized, interoperable tools should provide a cost-effective way to deploy end-user applications.
As the ARTFL Project's contribution to the collaborative development of these tools, PhiloLogic has been enhanced to support a wide variety of TEI-Lite (XML and SGML) encoded documents, optionally using the Unicode character specification. We feel that Humanities Computing applications are particularly well suited to open source development by a community with wide-ranging technical abilities, one that is not well supported by the commercial sector. Our goal is to provide as many features as possible without requiring significant administrative or development work to use them effectively.
Originally implemented to support large databases of French literature, PhiloLogic has been extended to support a wide variety of textual and hypermedia databases in collaboration with numerous academic institutions and, more recently, commercial organizations. PhiloLogic is a modular system in which a textbase is treated as a set of coordinated or related databases. These typically include an object database (objects being units of text such as a letter, scene, or document), a word forms database, a word concordance index mapped to textual objects, and an object manager that maps text objects to byte offsets in data files. Each of these databases is stored and managed using its own subsystem.
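To make the relationship between these four databases concrete, here is a minimal sketch in Python. It is an illustration only, not PhiloLogic's actual storage format or code: the names ToyTextbase, build_textbase, fetch_object and search are hypothetical, everything is held in memory rather than in separate subsystems, and the tokenizer is a deliberately naive regular expression.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyTextbase:
    """Hypothetical in-memory stand-in for a set of coordinated databases."""
    data: bytearray = field(default_factory=bytearray)         # the "data file"
    object_db: dict = field(default_factory=dict)              # object id -> metadata
    object_manager: dict = field(default_factory=dict)         # object id -> (start, end) bytes
    word_forms: dict = field(default_factory=lambda: defaultdict(int))    # form -> frequency
    concordance: dict = field(default_factory=lambda: defaultdict(list))  # form -> [(id, offset)]


def build_textbase(objects):
    """Load (kind, label, text) tuples into the four toy databases."""
    tb = ToyTextbase()
    for obj_id, (kind, label, text) in enumerate(objects):
        raw = text.encode("utf-8")
        start = len(tb.data)
        tb.data.extend(raw + b"\n")  # newline separates objects in the data file
        tb.object_db[obj_id] = {"kind": kind, "label": label}
        tb.object_manager[obj_id] = (start, start + len(raw))
        for match in re.finditer(r"\w+", text):
            form = match.group().lower()
            # Convert the character position to a byte offset in the data file.
            offset = start + len(text[:match.start()].encode("utf-8"))
            tb.word_forms[form] += 1
            tb.concordance[form].append((obj_id, offset))
    return tb


def fetch_object(tb, obj_id):
    """Use the object manager's byte offsets to pull an object's text."""
    start, end = tb.object_manager[obj_id]
    return bytes(tb.data[start:end]).decode("utf-8")


def search(tb, word):
    """Look a word up in the concordance and report (object label, byte offset)."""
    hits = tb.concordance.get(word.lower(), [])
    return [(tb.object_db[obj_id]["label"], offset) for obj_id, offset in hits]


if __name__ == "__main__":
    tb = build_textbase([
        ("letter", "Letter 1", "Liberté, égalité"),
        ("scene", "Scene 2", "La liberté guide"),
    ])
    for label, offset in search(tb, "liberté"):
        print(label, offset)
    print(fetch_object(tb, 1))
```

In this sketch a word search goes through the concordance to find object ids and byte offsets, the object database supplies the descriptive label, and the object manager locates the text itself, which mirrors the division of labor described above.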
Reasons to use PhiloLogic:
light, fast, robust, extensively used and tested
few dependencies, basic installation almost wholly self-contained
out-of-the-box operation with many configuration options
TEI-Lite XML/SGML (and variants such as MEP and CES) with Unicode support
support for plaintext, Dublin Core/HTML, and DocBook
MySQL back-end for bibliographic searching
optional XML-aware or non-XML bibliographic loaders
interoperability across certain systems
Check out our examples page for some of the running databases loaded using PhiloLogic. Many thanks to the Brown University Women Writers Project, the Margaret Sanger Papers Project, the Victorian Women Writers Project, Martin Mueller's Nameless Shakespeare Project, and the British Women Romantic Poets Project for providing us with these texts. Access to three of these databases is still pending approval from the data providers.
You may also be interested in the extensions to PhiloLogic, which are available as open source releases. PhiloMine provides an interactive environment for a wide range of machine learning and text data mining functions. PhiloLine is an extension that uses a simple sequence alignment algorithm to detect similar passages.