Capabilities

Core Functionality

At its heart, Philologic is a full-text search and retrieval engine: it is primarily geared toward indexing, searching and retrieving text. As text is loaded into Philologic, it is extensively indexed (tech details here?). This indexing allows fast searching by keyword or by any of a number of characteristics of the texts.
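As a rough illustration of why indexing at load time makes keyword search fast, here is a minimal sketch of a positional inverted index in Python. It is not Philologic's actual index format; the function name build_index and the sample documents are hypothetical, and tokenization is reduced to lowercased word characters.

```python
from collections import defaultdict
import re


def build_index(docs):
    """Map each lowercased word to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Tokenize very simply: runs of word characters, lowercased.
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].append((doc_id, position))
    return index


# Hypothetical sample texts, used only for illustration.
docs = {
    "doc1": "The quick brown fox",
    "doc2": "The lazy dog and the quick cat",
}
index = build_index(docs)

# A keyword lookup is now a single dictionary access instead of a scan of every text.
quick_hits = index["quick"]
```

Because the word positions are stored once at load time, a search only consults the index rather than rereading the texts.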

Text Input

Philologic can accept texts in a variety of formats, ranging from plain text with very little markup, to full TEI in UTF-8 with complex structure.

Formats accepted:

    • TEI
    • XML
    • ATE
    • ????

Output

Text that has been loaded into Philologic can be searched using the forms generated for each database at load time or customized versions thereof. The full text of each document is available and can be displayed by paragraph, page, section or subsection.

Complex queries can include the following:

    • Boolean operators AND and NOT
    • Bibliographic criteria in conjunction with keywords
    • Similarity searching

Data presentation is also very flexible and offers a number of formats:

    • Frequency of keyword by title, author, date or headword
    • KWIC reports
    • Collocation tables
    • Time series and frequency displays
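To make the KWIC (keyword in context) format concrete, the following Python sketch builds a small report from a single string. It only illustrates the idea and is not how Philologic generates its reports; the function kwic, its width parameter and the sample sentence are all hypothetical.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, match, right) context tuples for each hit of keyword."""
    words = re.findall(r"\w+", text)
    rows = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            # Take up to `width` words on each side of the hit.
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, word, right))
    return rows


# Hypothetical sample sentence, used only for illustration.
sample = "The cat sat on the mat while the other cat slept"
rows = kwic(sample, "cat", width=2)

# Right-align the left context so that the keywords line up in one column.
for left, match, right in rows:
    print(f"{left:>20}  {match}  {right}")
```

Aligning the keyword in a fixed column is what lets a reader scan many occurrences of a word and compare their contexts at a glance.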

SQL Metadata / SubDoc Object Management

Using MySQL to manage metadata allows for fast and simple searching across large amounts of text based on bibliographic information. It is especially useful with corpora in which much of the metadata is repeated across documents.
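The sketch below shows the general idea of narrowing a search by bibliographic criteria before any text is examined. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a database server; it does not use MySQL, and the table name, column names and sample rows are assumptions made for illustration, not Philologic's actual schema.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL metadata store.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row of bibliographic metadata per document.
conn.execute(
    "CREATE TABLE documents ("
    "doc_id INTEGER PRIMARY KEY, author TEXT, title TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO documents (author, title, year) VALUES (?, ?, ?)",
    [
        ("Voltaire", "Candide", 1759),
        ("Voltaire", "Zadig", 1747),
        ("Rousseau", "Emile", 1762),
    ],
)

# Select the documents matching the bibliographic criteria; a keyword
# search would then be restricted to these document ids.
rows = conn.execute(
    "SELECT doc_id FROM documents WHERE author = ? AND year < ? ORDER BY doc_id",
    ("Voltaire", 1750),
).fetchall()
matching_ids = [row[0] for row in rows]
conn.close()
```

Filtering on metadata first keeps the full-text search confined to a small set of documents, which is where the speed gain on large collections would come from.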