CasualTreeTagger

CasualTreeTagger is a simple GUI front end for TreeTagger by Helmut Schmid, a multi-language part-of-speech and lemma tagging tool.  CasualTreeTagger simply uses TreeTagger to add part-of-speech (POS) tags to your text.  You can install TreeTagger from within CasualTreeTagger.

Due to some changes introduced with El Capitan, the installer in the previous version no longer works, so I decided to release a work-in-progress version.  This version does not have all the functions of the previous version, but the installer and the tagging functions should work.

Download: CasualTreeTagger (on Google Drive)

Last Updated: 2023/06/06

System Requirements: Mac with macOS 10.14 Mojave or later

If you have already installed TreeTagger, click Select TreeTagger Folder and select the folder where you installed TreeTagger.

If you haven't done so, click Install TreeTagger. The Installer window will appear.

Select parameter/chunker file(s) you want to install by checking the check box(es).

Once you have selected all the parameter/chunker files you want to install, click Install.

CasualTreeTagger creates a folder named treetagger in the Documents folder of your home directory and installs TreeTagger there.  Do not delete this folder, or you will need to install TreeTagger again.

If you want to add parameter files of another language or chunker parameter files, you can go to Menu -> Tools -> TreeTagger Downloader.

Single File Process

Click Single tab to use the Single File Mode.  In this mode, you can open a single file and POS-tag the text.

To open a file, go to Menu -> File -> Open... (or command + O).

Or you can click the Open icon.

The text will appear in the left text area.  Click Process to process the text. The processed text will appear on the right.  You can select an output type in Preferences (see below).

To save tagged text, go to Menu -> File -> Save (or command + S).

Or you can click the Save icon.

Checking unknown words

You can check <unknown> words by clicking Unknown Word List.  A panel listing the words that were not assigned a lemma will appear.

For later reference, you can save this list as a tab-delimited text file by clicking Save.  You can remove an entry by clicking Remove.

Clicking the Go button takes you to the unknown word in the tagged text.

You can edit the New POS and Lemma columns.  You can then click Replace to replace the entry (first click Go to find the unknown word in the text), or click Replace All to replace all instances of the selected unknown word.

If you check the box(es) next to unknown words and click Add to, the checked words will be added to a lexicon list. 

Select the lexicon file to which the selected unknown words should be added.  English and German are supported.  (See below for more information.)

Lexicon List

The TreeTagger distribution supports lexicon extension for English and German.  CasualTreeTagger helps you manage the lexicon extension file.

To open the Lexicon Window, either add checked unknown words from the Unknown Word List or go to Menu -> Window -> Lexicon Window.

When the Lexicon Window opens, the content of the lexicon extension file (in the lib folder) is read and shown in the table.  If you added unknown words, they are appended to the list.

If you want to add a new unknown word, type the word in the left text box and the lemma(s) in the right text box.  Each lemma entry is a POS tag and a lemma joined by a single-byte space character.  To add more than one lemma (for a word like 'record', which can be a verb or a noun), separate the entries with a comma [,].  For example,

Word -> records

Lemmas -> NNS record,VBZ record

Then click Add to add a new entry to the list.

Click Duplicate to copy a selected entry.  Click Delete to delete a selected entry.

You need to Save the list for the changes to take effect in the next tagging process.
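The Lemmas field format described above (POS tag, a single space, the lemma, with entries separated by commas) can be checked with a few lines of Python.  This is only a sketch of the text-box format as this page describes it; the function name parse_lemmas is made up for illustration, and nothing here reflects how CasualTreeTagger parses the field internally or how the lexicon extension file is laid out on disk.

```python
def parse_lemmas(field):
    """Split a Lemmas field such as 'NNS record,VBZ record' into (tag, lemma) pairs.

    Hypothetical helper: it only mirrors the text-box format described above.
    """
    pairs = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        # The POS tag comes first, then a single space, then the lemma.
        tag, sep, lemma = item.partition(" ")
        if not sep or not lemma.strip():
            raise ValueError(f"expected 'POS lemma', got {item!r}")
        pairs.append((tag, lemma.strip()))
    return pairs


print(parse_lemmas("NNS record,VBZ record"))
```

A check like this can catch an entry with a missing lemma or tag before you type it into the Lexicon Window.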

Abbreviation List

This is an experimental feature because I'm not sure what this list does.

TreeTagger uses an abbreviation list to process abbreviations (to recognize a period as a part of an abbreviation, I believe).  You can manage the list in CasualTreeTagger.

To use this function, go to Menu -> Window -> Abbreviation Window.

You are prompted to select a language.

The content of the selected abbreviation file will appear in the table.

You can add an abbreviation and save the new list to the abbreviation file.

Checking incompatible characters

Some of the TreeTagger parameter files, including English, are prepared in ISO Latin 1, so characters that are not in the ISO Latin 1 character table (for example, multi-byte characters such as curly quotation marks) will not be processed properly.  CasualTreeTagger automatically converts the text to ISO Latin 1 when it is processed, but those characters will be replaced by similar characters or '?'.

You can check which characters will not be processed by clicking the Check Chars button.

A panel with a list of characters (with context) will appear.  Selecting a line will take you to the position of the character in the original text.  You might want to replace these characters before you process the text.
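If you want to pre-check a text outside the application, the same idea can be sketched in Python: list every character that cannot be encoded as ISO Latin 1, together with its position and some context.  The function names and the context width are assumptions for illustration, and this is not CasualTreeTagger's own check.  Note also that Python's "replace" error handler always substitutes '?', whereas CasualTreeTagger may pick a similar character instead.

```python
def find_incompatible(text, context=10):
    """Return (position, character, context) for characters outside ISO Latin 1."""
    hits = []
    for i, ch in enumerate(text):
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            start = max(0, i - context)
            hits.append((i, ch, text[start:i + context + 1]))
    return hits


def to_latin1_preview(text):
    """Show what the text looks like if unencodable characters become '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


sample = "He said \u201chello\u201d at the caf\u00e9."
for pos, ch, ctx in find_incompatible(sample):
    print(pos, repr(ch), repr(ctx))
print(to_latin1_preview(sample))
```

In the sample, the two curly quotation marks are reported, while the accented e in "café" is part of ISO Latin 1 and passes through unchanged.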

Batch Process

If you want to process multiple files, use Batch mode.

To add files to the table, go to Menu -> File -> Open... (or command + O) or drag and drop files onto the table.

For plain text files, you can select an encoding.

In Batch mode, you need to check whether the text files in the table contain incompatible characters by clicking Check Chars.  Files containing incompatible characters will be marked with a check.

You can open a selected file in the built-in Editor by right-clicking it and selecting Open File in Editor.

The Editor window will appear.

You can click Check Chars and check incompatible characters just as in the Single File mode.

You can save the changes by clicking Save Changes.

You can overwrite the file (Save) or save it as a new file (Save As...).

If you open a non-plain text file, you can only save it as a new file.

Once you are ready to process the files, decide whether you want to save the output to the original folders.  If so, check Save to Original Folders.  All the processed files will then be saved in the same folders as the originals, with '_tagged' added to the file names.
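To give an idea of what the output names look like, here is a minimal Python sketch that builds a '_tagged' file name next to the original.  It assumes the suffix is placed before the file extension, which this page does not state; the function name tagged_path and the example path are made up for illustration.

```python
from pathlib import Path


def tagged_path(original):
    """Return a sibling path with '_tagged' added to the file name.

    Assumption: the suffix goes before the extension (essay01.txt -> essay01_tagged.txt).
    """
    p = Path(original)
    return p.with_name(f"{p.stem}_tagged{p.suffix}")


print(tagged_path("corpus/essay01.txt"))
```

Because the result stays in the same folder as the original, a later script can pair each source file with its tagged counterpart by name.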

Click Process to start the batch process.

If you DID NOT check Save to Original Folders, you can choose whether to save all the processed files to a single folder or to keep the folder structure of the original folders.  If you want to save all the files in the same folder, check Single Folder.  If you check Check Replace Char, the list of all the characters that are replaced in the process will be saved as a text file (this may not work).

Regular Expression Find/Replace Panel

In addition to the built-in Find panel, CasualTreeTagger has a Regular Expression Find/Replace function as in other Casual~ applications.

To use this function, go to Menu -> Edit -> Regex Find (command + Shift + F).

This is essentially the same as the panel found in CasualTextractor and other applications.

Preferences

General

- Tool: you can select either Tagger or Chunker.

- Process Summary: with the Default tag type in Tagger, you can create a summary of the process.

    - Word List: a simple word frequency list will be created.

    - Word List with POS: a word-POS combination frequency list will be created.

    - POS List: a POS frequency list will be created.

    - Bigram POS List: a bigram list of POS tags will be created.

In Single mode, you can click the Summary button to see the lists.  You can export the lists shown in the tables.

In Batch mode, PROCESS REPORT file(s) will be created.  If you choose to save the output files in the same folder(s) as the originals, a folder will be created on the Desktop and the report file will be saved there.
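The four Process Summary lists are ordinary frequency counts, so they can be reproduced from tagged output with Python's collections.Counter.  The sketch below assumes one token per line in the form word, tab, POS, tab, lemma; that layout, the function name summarize, and the dictionary keys are assumptions for illustration, and the application's own summary may differ (for example in how it treats case or punctuation).

```python
from collections import Counter


def summarize(tagged_text):
    """Build word, word-POS, POS, and POS-bigram frequency lists.

    Assumes each line is 'word<TAB>POS<TAB>lemma'; other lines are skipped.
    """
    rows = [line.split("\t") for line in tagged_text.splitlines() if "\t" in line]
    words = [r[0] for r in rows]
    tags = [r[1] for r in rows]
    return {
        "word": Counter(words),
        "word_pos": Counter(zip(words, tags)),
        "pos": Counter(tags),
        # Pair each tag with the one that follows it.
        "pos_bigram": Counter(zip(tags, tags[1:])),
    }


sample = "The\tDT\tthe\ncat\tNN\tcat\nsees\tVBZ\tsee\nthe\tDT\tthe\ndog\tNN\tdog\n"
summary = summarize(sample)
for tag, count in summary["pos"].most_common():
    print(tag, count)
```

Each Counter can be written out as a tab-delimited file if you want something similar to the exported tables.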

- TreeTagger directory: you can change the folder in which TreeTagger is installed.  The default is the treetagger folder in the Documents folder of your home directory.

- Delete punctuation tags: if checked, tags on non-letter characters will be removed.

- Ignore File Info: if your text files begin with file information, you can exclude that part from tagging by checking this box and entering the string that marks the end of that part in End of Info Tag.  The string does not have to be in tag format (any string works).

- Apply Replace Chars: you can specify characters you want to replace in the process.  To use this function, check the box.  This is designed to replace multi-byte non-alphabet characters often used in Web pages and word processor documents with the corresponding single-byte characters so that TreeTagger can recognize them properly.  Four pairs are registered by default.  You can add or remove pairs.
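The kind of substitution that Apply Replace Chars performs can be sketched in Python with str.translate.  The pairs below (curly quotation marks mapped to straight ones) are hypothetical examples chosen for illustration; they are not the application's registered defaults, which this page does not list.

```python
# Hypothetical replacement pairs: multi-byte character -> single-byte equivalent.
REPLACE_PAIRS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def apply_replace_chars(text, pairs=REPLACE_PAIRS):
    """Replace each character listed in pairs with its single-byte counterpart."""
    return text.translate(str.maketrans(pairs))


print(apply_replace_chars("\u201cDon\u2019t panic,\u201d she said."))
```

Replacing such characters before tagging keeps them from being turned into '?' when the text is converted to ISO Latin 1.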

Tagger/Chunker

This is still experimental, but you can select a tag type other than the default output.  You can also select a language if you have installed more than one language parameter file.