CasualConc (2017/06/09)

Posted: Jun 08, 2017 3:56:58 PM

I received a few bug reports and have fixed some of them. I have also started working on some experimental features.

Bug Fixes

File Info

- Standard TTR values were doubled in parallel processing

- Values of Basic File Info were sometimes inconsistent in parallel processing
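For readers unfamiliar with the measure, here is a minimal sketch of a standardized TTR calculation. It assumes the conventional definition (the mean type-token ratio over consecutive fixed-size chunks, with the incomplete final chunk dropped). It is not CasualConc's actual code, and the function name and the `chunk_size` default are hypothetical.

```python
def standardized_ttr(tokens, chunk_size=1000):
    """Mean type-token ratio over consecutive full chunks of tokens.

    Assumption: the trailing partial chunk is ignored, as in the
    conventional definition; CasualConc's own handling may differ.
    """
    # Start indices of every full chunk; a trailing partial chunk is skipped.
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    if not chunks:
        # Text shorter than one chunk: fall back to the plain TTR.
        return len(set(tokens)) / len(tokens) if tokens else 0.0
    ratios = [len(set(chunk)) / chunk_size for chunk in chunks]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    sample = ["a", "b", "c", "a", "d", "d", "d", "d"]
    print(standardized_ttr(sample, chunk_size=4))
```

Because each chunk contributes one ratio to the mean, a parallel implementation that processes chunks in separate workers has to make sure every chunk is counted exactly once when the results are merged.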


- Context word marking/coloring did not work when searching with context words on tagged text (with tags recognized)

- Search with context words did not work when the right-context text was short.


- A list of specified collocates can be searched


- Applying Lemma/Spelling Variation to search words did not work, especially in database mode. This is still a work in progress.


- The installer did not work because of changes to the TreeTagger site.

Random Forest Keyness

- When the frequency table was divided per specified words, the Gini index/Accuracy plot did not reflect the total

Experimental features

Tagging Process

- tagging process is available for Concord, Collocation, Cluster using TreeTagger and MeCab (if installed)

Batch Tagger

- porting the functionality of CasualTagger to batch tag text files using TreeTagger, MeCab, and Stanford CoreNLP


- searching collocations of specified node and collocates for multiple corpora/databases (total collocation counts only)

- collocates can be specified using wildcard characters or regular expressions; the matching words are counted separately
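As a rough illustration of counting pattern-matched collocates separately, the sketch below tallies each word form that matches a wildcard or regular expression within a fixed span around the node word. It is only a sketch under stated assumptions, not CasualConc's implementation: the function name, the `span` parameter, and the case-insensitive matching are all hypothetical.

```python
import re
from collections import Counter
from fnmatch import translate


def count_collocates(tokens, node, pattern, span=4, use_regex=False):
    """Count each collocate form matching `pattern` near `node`.

    Assumptions: a symmetric window of `span` tokens on each side,
    case-insensitive matching, and whole-word matches only.
    """
    # Wildcards such as "mak*" are converted to a regular expression.
    rx = re.compile(pattern if use_regex else translate(pattern), re.IGNORECASE)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.lower() != node.lower():
            continue
        # Left and right context, excluding the node itself.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        for word in window:
            if rx.fullmatch(word):
                counts[word.lower()] += 1  # each matching form is kept separate
    return counts


if __name__ == "__main__":
    text = ("she will make a decision and he makes a decision "
            "while they made a decision").split()
    print(count_collocates(text, "decision", "mak*", span=3))
    print(count_collocates(text, "decision", r"ma(k|d)e", span=3, use_regex=True))
```

Keeping a separate count per matched form (rather than one pooled total per pattern) mirrors the note above that matching words are counted separately.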

File Info

- the same collocation function for files/corpora/databases

- only specified collocates or groups of collocates are counted

- multiple comparisons of Keyword stats (standard, MWU) can be done with one file/corpus against the rest of the files/corpora

- a specified range of Word Frequency results can be saved

Cluster Analysis

- p-values can be calculated ('pvclust' package)

Label Groups

- Labels for files/corpora can be managed