CasualConc (2017/06/09)
Posted: Jun 08, 2017 3:56:58 PM
I received a few bug reports and fixed some of them. I have also started working on some experimental features.
Bug Fixes
File Info
- Standard TTR values were doubled in parallel processing
- Values of Basic File Info were sometimes unstable in parallel processing
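
For readers unfamiliar with the measure, here is a minimal Python sketch of a standardized type-token ratio, assuming "Standard TTR" means the mean TTR over consecutive fixed-size chunks of tokens. The function name, the chunk size, and the handling of the trailing partial chunk are illustrative assumptions, not CasualConc's actual implementation.

```python
def standardized_ttr(tokens, chunk_size=1000):
    """Mean type-token ratio (in %) over consecutive full chunks.

    Assumption: the trailing partial chunk is ignored, and None is
    returned when the text is shorter than one chunk.
    """
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        # TTR of one chunk: distinct word forms / number of tokens
        ratios.append(len(set(chunk)) / chunk_size * 100)
    if not ratios:
        return None
    return sum(ratios) / len(ratios)


# Tiny example with an artificially small chunk size
tokens = "the cat sat on the mat and the dog sat on the rug".split()
sttr = standardized_ttr(tokens, chunk_size=4)
```

Because the value is an average over chunks, it should come out the same whether the chunks are processed serially or in parallel.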
Concord
- Context word marking/coloring did not work when searching with context words on tagged text (with tags recognized)
- Search with context words did not work when the right context was short
Collocation
- A list of specified collocates can be searched
General
- Applying Lemma/Spelling Variation to search words did not work, especially in database mode. This is still a work in progress
TreeTagger
- The installer did not work because of changes to the TreeTagger site
Random Forest Keyness
- When the frequency table was divided per specified words, the Gini index/Accuracy plot did not reflect the total
Experimental features
Tagging Process
- tagging is available for Concord, Collocation, and Cluster using TreeTagger and MeCab (if installed)
Batch Tagger
- porting the functionality of CasualTagger to batch tag text files using TreeTagger, MeCab, and Stanford CoreNLP
Collocation
- searching collocations of specified node and collocates for multiple corpora/databases (total collocation counts only)
- collocates can be specified using wildcard characters or regular expressions; each matching word is counted separately
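
The idea of counting collocates given as a wildcard or regular expression, with each matching word form tallied separately, can be sketched in Python roughly as follows. The function name, the symmetric span, and the case-insensitive matching are assumptions made for illustration; CasualConc's own matching rules may differ.

```python
import re
from collections import Counter
from fnmatch import translate


def count_collocates(tokens, node, pattern, span=4, use_regex=False):
    """Count words matching `pattern` within `span` tokens of `node`.

    `pattern` is a shell-style wildcard (e.g. "decision*") unless
    use_regex is True. Each matching word form gets its own count.
    """
    # fnmatch.translate converts a wildcard into an equivalent regex
    source = pattern if use_regex else translate(pattern)
    matcher = re.compile(source, re.IGNORECASE)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.lower() != node.lower():
            continue
        # Tokens to the left and right of the node, excluding the node
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        for word in window:
            if matcher.fullmatch(word):
                counts[word.lower()] += 1
    return counts


text = "they make a decision we make decisions quickly you make tea"
tokens = text.split()
by_wildcard = count_collocates(tokens, "make", "decision*", span=2)
by_regex = count_collocates(tokens, "make", r"decisions?", span=2, use_regex=True)
```

Summing the values of the returned counter gives the total collocation count for the pattern as a whole.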
File Info
- the same collocation function for files/corpora/databases
- only specified collocates or groups of collocates are counted
- multiple comparisons of Keyword stats (standard, MWU) can be done with one file/corpus against the rest of the files/corpora
- a specified range of Word Frequency results can be saved
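
As a rough illustration of comparing one file/corpus against the rest, here is a minimal Python sketch that scores each word in a target corpus against all other corpora combined. It uses the log-likelihood statistic as a stand-in; the post does not say which keyness statistic is computed, so the statistic and all names here are assumptions.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood for a word with frequency a in a corpus of size c
    and frequency b in a reference corpus of size d."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def one_vs_rest_keyness(corpora):
    """corpora maps a name to a list of tokens.

    Returns name -> {word: score}, where each corpus is compared
    against the combined remainder of the corpora.
    """
    counts = {name: Counter(toks) for name, toks in corpora.items()}
    total = Counter()
    for freq in counts.values():
        total.update(freq)
    grand_size = sum(total.values())
    results = {}
    for name, target in counts.items():
        size = sum(target.values())
        rest_size = grand_size - size
        results[name] = {
            word: log_likelihood(target[word], total[word] - target[word],
                                 size, rest_size)
            for word in target
        }
    return results


corpora = {
    "a": ["x", "x", "y"],
    "b": ["y", "y", "z"],
    "c": ["z", "z", "x"],
}
results = one_vs_rest_keyness(corpora)
```

A word whose relative frequency in the target equals its relative frequency in the rest scores 0; larger scores indicate a bigger difference in either direction.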
Cluster Analysis
- p-values can be calculated ('pvclust' package)
Label Groups
- Labels for files/corpora can be managed