CasualConc

© 2008-2009 Yasu Imao

Lemma

This is an experimental feature, so it might not work as intended. When it does, you can apply the lemmas specified in a file to your analysis. Lemmatization is mainly for Word Count and for context words in Cluster and Collocation. Lemmatization of search words has also been added, but it is not tested.

To use the lemmatization function, you need to prepare a lemma list file. You can create your own or find one elsewhere. For example, you can download Someya's e-lemma file from the WordSmith Tools web site: go to online manual -> Word List -> lemmas and click the 'choosing lemma file' link on that page. The lemma file should be encoded in UTF-8 if it contains non-ASCII characters, so you might need to re-encode the e-lemma file.
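If you need to re-encode a lemma file, a short script can do it. The following is only a sketch: the function name is made up for illustration, and the source encoding is something you have to find out yourself (this page does not say which encoding the e-lemma file uses).

```python
def reencode_to_utf8(src, dst, src_encoding):
    """Read `src` using `src_encoding` and write the same text to `dst` as UTF-8.

    `src_encoding` is an assumption supplied by the caller (e.g. "latin-1"
    or "utf-16"); a wrong value produces garbled text or a decode error.
    """
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=src_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

For example, `reencode_to_utf8("e_lemma.txt", "e_lemma_utf8.txt", "latin-1")` would write a UTF-8 copy, assuming the original really is Latin-1. Both file names here are placeholders.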


Lemma Handling


To use the lemmatization function, check Lemmatize and choose your lemma file by clicking Select Lemma File. You will be prompted to select a file. The lemma file should be a text file (.txt) encoded in ASCII or UTF-8. There are two preset entry types, or you can specify your own.

- lemma -> word,word,word,...
This is the e-lemma format. The lemma is followed by -> and the words that belong to the lemma, separated by commas (,) with no spaces in between. A word that is the same as the lemma doesn't have to be included.

e.g. lemma SEE: see -> sees,seen,saw,seeing

- lemma word word word ...
This format simply lists the words included in a lemma, separated by spaces. The first word is treated as the lemma for the group.

e.g. lemma SEE: see sees seen saw seeing

- Other (Specify)

You can also specify your own format. For example, if you enter a colon (:) for Between lemma and words and a comma (,) for Between words, an entry (one line) will look like:

see:sees,seen,saw,seeing

If your lemma file has note/comment lines that start with a certain character, you can specify that character in First character of lines to ignore. For example, if you use # to mark a comment line, enter # in the box.
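To make the entry formats above concrete, here is a minimal Python sketch of how such lines could be read into a lookup table. It is not CasualConc's actual code; the function name and parameters are invented for illustration, and lowercasing the word forms is an assumption.

```python
from collections import Counter


def parse_lemma_lines(lines, lemma_sep="->", word_sep=",", comment_char=None):
    """Return a dict mapping each lowercased word form to its lemma.

    lemma_sep=None selects the space-separated "lemma word word ..." format.
    All names and defaults here are hypothetical, not CasualConc settings.
    """
    form_to_lemma = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and lines starting with the comment character.
        if not line or (comment_char and line.startswith(comment_char)):
            continue
        if lemma_sep is None:
            lemma, *forms = line.split()
        else:
            lemma, _, rest = line.partition(lemma_sep)
            forms = rest.split(word_sep)
        lemma = lemma.strip()
        # The lemma always maps to itself, even if not listed as a form.
        form_to_lemma[lemma.lower()] = lemma
        for form in forms:
            form = form.strip()
            if form:
                form_to_lemma[form.lower()] = lemma
    return form_to_lemma


# A lemmatized word count: every form is counted under its lemma.
table = parse_lemma_lines(["# my notes", "see -> sees,seen,saw,seeing"],
                          comment_char="#")
tokens = "I saw what she sees".lower().split()
counts = Counter(table.get(token, token) for token in tokens)
# counts["see"] is 2 ("saw" and "sees")
```

The custom format from the example would be read with `parse_lemma_lines(lines, lemma_sep=":", word_sep=",")`.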

Apply Lemmatization to Search Word

With this checked, your search word(s) will be lemmatized. This also applies (if it works) to Concord. For example, if you turn on both Lemmatize and this option and search for the word 'see' in Concord, all instances of see, sees, seen, saw, and seeing will be displayed. The same applies to the other tools.
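The effect of a lemmatized search can be pictured as expanding the search word into all of its forms. The sketch below does this with a regular expression; the lemma table and function name are hypothetical, and CasualConc may well do it differently.

```python
import re

# Hypothetical in-memory lemma table: lemma -> other forms.
LEMMA_FORMS = {"see": ["sees", "seen", "saw", "seeing"]}


def lemma_search_pattern(word, lemma_forms):
    """Build a whole-word pattern matching `word` and every form of its lemma."""
    forms = [word] + lemma_forms.get(word.lower(), [])
    alternation = "|".join(re.escape(form) for form in forms)
    # Case-insensitive matching is an assumption made for this sketch.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


text = "I saw it. She sees what we have seen, seeing is believing. See?"
pattern = lemma_search_pattern("see", LEMMA_FORMS)
hits = [m.group(0) for m in pattern.finditer(text)]
# hits == ["saw", "sees", "seen", "seeing", "See"]
```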


Keyword Grouping


The idea is to create groups of words you want to search for in one go (days of the week, etc.). The formatting is the same as for lemma files, but the headword is not included in the search. To use the Keyword Grouping function, check Grouped Keywords. Here's how it works.

First, create a grouping file in one of the preset formats or in your own (I recommend the e-lemma format). See above (Lemma) for more detail.

For example, if you want to group the days of the week,

DoW -> Monday,Tuesday,Wednesday,Thursday,Friday,Saturday,Sunday

The headword (grouping word) is case sensitive, but the words included in the group are not. So in this example, dow or DOW does not work.

In the search box of Concord, Cluster, or Collocation/Cooccurrence, add @@ at the beginning of the group's headword. For example, to search for the 'DoW' group above, type @@DoW in the search box. You can also use a group within a phrase, such as 'on @@DoW'.
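As a rough illustration of the behavior described above, the following Python sketch expands @@ group names in a query into a regular expression. The names are hypothetical and this is not CasualConc's implementation. The headword lookup is case sensitive, the group members match case-insensitively, and the headword itself is not searched. Matching the rest of the phrase case-sensitively is an assumption of this sketch.

```python
import re

# Hypothetical group table, as it might be read from a grouping file.
GROUPS = {
    "DoW": ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"],
}


def group_query_pattern(query, groups):
    """Turn a query such as 'on @@DoW' into a compiled whole-word pattern."""
    # re.split with one capture group alternates: literal, name, literal, ...
    parts = re.split(r"@@(\w+)", query)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))  # ordinary text in the query
        else:
            members = groups[part]  # KeyError if the headword's case differs
            alternation = "|".join(re.escape(w) for w in members)
            pieces.append("(?i:" + alternation + ")")  # members ignore case
    return re.compile(r"\b" + "".join(pieces) + r"\b")


pattern = group_query_pattern("on @@DoW", GROUPS)
text = "We met on Monday and again on FRIDAY, not on holiday."
hits = [m.group(0) for m in pattern.finditer(text)]
# hits == ["on Monday", "on FRIDAY"]
```

With this sketch, `group_query_pattern("@@dow", GROUPS)` raises a KeyError, which mirrors the case-sensitive headword.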


Let me know if this works for you.