CasualConc

© 2008-2009 Yasu Imao

East Asian Language Support


CasualConc can handle East Asian languages (2-byte character languages such as Japanese, Korean, and Chinese) to some extent.  Most of the functions available in European Language modes A and B are covered, but because computers handle these languages differently, some functions behave differently or are not available.  Also, although technically any 2-byte character language can be handled, I have tested the functions only in Japanese, because that is the only East Asian language I can understand (and that is why the labels say 'Japanese').

Not all the features have been fully tested, so there may be some bugs.  If you find any, please report them.


Enable East Asian Language Support

To use East Asian languages properly, you need to switch to an appropriate language mode.  Currently, CasualConc has two modes for East Asian languages: Japanese (plain) and Japanese (wakachi).

You can switch modes either in Preferences -> Concord or by choosing one of them at the bottom right corner of the main window.


Also in Preferences -> Concord, you can specify any 2-byte characters that you do not want CasualConc to treat as part of a word.  Type the characters you want to ignore in the analysis with no spaces or commas between them.  If you click Sample J, some common non-word 2-byte characters will be added to the box.  The characters specified here will be ignored in all the tools.
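The effect of the ignore list can be pictured with a short Python sketch.  This is only an illustration of the idea, not CasualConc's actual code: the name IGNORE_CHARS, the function strip_ignored, and the four characters in the list are assumptions made for this example.

```python
# Hypothetical ignore list, typed with no spaces or commas between characters,
# in the same way as in the Preferences box.
IGNORE_CHARS = "。、「」"


def strip_ignored(text: str, ignore: str = IGNORE_CHARS) -> str:
    """Return `text` with every character listed in `ignore` removed."""
    return text.translate(str.maketrans("", "", ignore))


sample = "私はその人の記憶を呼び起すごとに、すぐ「先生」といいたくなる。"
cleaned = strip_ignored(sample)
print(cleaned)
```

Here the comma, the period, and the quotation brackets are dropped, and only the remaining characters would be counted or sorted.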




Japanese (plain)

plain is for text files that do not have spaces between words.  To some extent, you can also analyze wakachi-gaki files as if they were plain files by choosing Japanese (plain).

私はその人を常に先生と呼んでいた。だからここでもただ先生と書くだけで本名は打ち明けない。
これは世間を憚かる遠慮というよりも、その方が私にとって自然だからである。
私はその人の記憶を呼び起すごとに、すぐ「先生」といいたくなる。
筆を執っても心持は同じ事である。
よそよそしい頭文字などはとても使う気にならない。

Concord

Concord normally works at the word level, but a plain text has no spaces, so CasualConc cannot split it into words.  In a kwic search, you can simply type the word(s) in the search box and CasualConc will find them.  However, it cannot separate the words in the context, so context word coloring and sorting are character-based.



For the same reason, the context word search behaves differently.  Unlike in the other language modes, you cannot specify the span of context words.  Instead, you select whether you want to look for the context word(s) on the Left side, the Right side, or Both sides of the keyword.
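A rough Python sketch of a side-based context search follows.  It is an illustration under stated assumptions, not the program's implementation: the function name context_hits is hypothetical, the whole line on the chosen side is searched, and "both" is taken here to mean that the context word may appear on either side.

```python
def context_hits(line, keyword, context, side="both"):
    """Return (left, keyword, right) for each keyword hit in `line`
    where `context` occurs on the chosen side of the keyword."""
    hits = []
    start = line.find(keyword)
    while start != -1:
        left = line[:start]
        right = line[start + len(keyword):]
        found = {
            "left": context in left,
            "right": context in right,
            "both": context in left or context in right,  # assumed meaning
        }[side]
        if found:
            hits.append((left, keyword, right))
        start = line.find(keyword, start + 1)
    return hits


line = "私はその人を常に先生と呼んでいた。"
left_hits = context_hits(line, "先生", "常に", side="left")
right_hits = context_hits(line, "先生", "常に", side="right")
both_hits = context_hits(line, "先生", "常に", side="both")
print(left_hits, right_hits, both_hits)
```

Because no word boundaries are available, the context word is matched as a plain string of characters rather than as a word within a fixed span.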



You can set this in Preferences -> Concord.  Check Context Words in Japanese (plain).



Cluster

Cluster is also character-based.  Only the keyword is treated as a unit, whether it is a word, a phrase, or whatever else you searched for.



Just as in the other modes, you can search for a cluster in Concord.



The resulting kwic search behaves somewhat differently from a normal search.


As in this example, the search string is shown with spaces between the words/characters.  If you search for the same string by clicking the Search button, CasualConc will not find any instances of the cluster.  This is because a search started from Cluster ignores the non-word characters specified in Preferences -> Concord (see the top of this page).
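The difference between the two searches might be sketched in Python like this.  It is only my reading of the behavior described above: the names count_plain and count_from_cluster and the ignore list are hypothetical, and the real matching may differ.

```python
IGNORE_CHARS = "。、「」"  # hypothetical ignore list from Preferences


def count_plain(text, query):
    """Normal search: look for the string exactly as typed."""
    return text.count(query)


def count_from_cluster(text, cluster):
    """Search started from the Cluster table: drop the spaces shown in the
    cluster, then match against the text with ignored characters removed."""
    target = cluster.replace(" ", "")
    stripped = text.translate(str.maketrans("", "", IGNORE_CHARS))
    return stripped.count(target)


text = "私はその人の記憶を呼び起すごとに、すぐ「先生」といいたくなる。"
cluster = "す ぐ 先 生"  # a cluster as displayed, with spaces between characters
plain_count = count_plain(text, cluster)
cluster_count = count_from_cluster(text, cluster)
print(plain_count, cluster_count)
```

The plain search fails because the text contains neither the spaces nor an unbroken run of these four characters, while the cluster search skips over the bracket in between.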


Collocation/Cooccurrence

Collocation and Cooccurrence are also character-based.

Collocation


Cooccurrence


You can search for a keyword-context word pair in Concord by selecting a line and right-clicking it.  The search span is based on your choice in Preferences (Left, Right, or Both).


Word Count

Word Count and n-gram are also character-based.  See the examples below.  You can search for a selected character in Concord.
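As a minimal sketch of character-based counting, the following Python code counts single characters and 3-grams.  The function name char_ngrams and the ignore list are assumptions for this example, and letting an n-gram run across a removed character is a simplification that may not match what CasualConc does.

```python
from collections import Counter

IGNORE_CHARS = "。、「」"  # hypothetical ignore list from Preferences


def char_ngrams(text, n, ignore=IGNORE_CHARS):
    """Count character n-grams after dropping ignored characters and whitespace."""
    chars = [c for c in text if c not in ignore and not c.isspace()]
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


text = "私はその人を常に先生と呼んでいた。だからここでもただ先生と書くだけで本名は打ち明けない。"
grams1 = char_ngrams(text, 1)  # character count
grams3 = char_ngrams(text, 3)  # character 3-grams
print(grams1.most_common(5))
print(grams3.most_common(5))
```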

Word Count


3-gram



File Information

You can get the number of characters in the files.




Japanese (wakachi) 

wakachi is for text files that have spaces between words (wakachi-gaki).  Because of this, all the tools basically function as in European Language modes.

私 は その 人 を 常に 先生 と 呼ん で い た 。
だから ここ でも ただ 先生 と 書く だけ で 本名 は 打ち明け ない 。
これ は 世間 を 憚 かる 遠慮 と いう より も 、 その 方 が 私 にとって 自然 だ から で ある 。
私 は その 人 の 記憶 を 呼び 起す ごと に 、 すぐ 「 先生 」 と いい たく なる 。
筆 を 執っ て も 心持 は 同じ 事 で ある 。
よそよそしい 頭文字 など は とても 使う 気 に なら ない 。
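Since the words are already separated by spaces, a wakachi-gaki line can be split in the same way as a line of English text.  The Python sketch below illustrates this; the ignore list is an assumption for the example, not the program's default.

```python
IGNORE = set("。、「」")  # hypothetical ignore list from Preferences

line = "私 は その 人 を 常に 先生 と 呼ん で い た 。"

tokens = line.split()                          # split on whitespace
words = [t for t in tokens if t not in IGNORE]  # drop ignored characters

print(tokens)
print(words)
```

The period at the end of the line is a token of its own, so it can be dropped without affecting the neighboring words.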

Concord

The main difference between Japanese (wakachi) and European Languages A or B is that in Japanese (wakachi) mode, kwic and context texts are displayed without spaces between words.
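The idea of searching by word but displaying without spaces can be sketched in Python as follows.  The function name kwic and the span of three words are hypothetical choices for this illustration.

```python
def kwic(tokens, keyword, span=3):
    """Return (left, keyword, right) rows; context words are joined
    without spaces for display."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = "".join(tokens[max(0, i - span):i])
            right = "".join(tokens[i + 1:i + 1 + span])
            rows.append((left, tok, right))
    return rows


tokens = "私 は その 人 を 常に 先生 と 呼ん で い た 。".split()
rows = kwic(tokens, "先生")
print(rows)
```

The span is counted in words, as in the European Language modes, but each row reads like ordinary Japanese text.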



Context word search also functions as in the European Language modes.  Just as in plain mode, the characters that you specified to be ignored in Preferences will be ignored in the context.



Cluster

Cluster functions at word level.



If you search for a cluster from the cluster table, any characters in the cluster that you specified to be ignored will be ignored.  So even if you type the same string in Concord and click Search, you will not get the same result.



Collocation/Cooccurrence

Collocation and Cooccurrence also function at the word level.

Collocation



Cooccurrence



Concord search from Collocation




Word Count

Word Count and n-gram are also word-based.
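A word-based count can be sketched in the same way, using the space-separated words as units.  In this Python illustration the name word_ngrams and the ignore list are assumptions, and the n-grams are counted line by line, which may differ from the actual behavior.

```python
from collections import Counter

IGNORE = set("。、「」")  # hypothetical ignore list from Preferences


def word_ngrams(lines, n):
    """Count word n-grams line by line in wakachi-gaki text."""
    counts = Counter()
    for line in lines:
        words = [w for w in line.split() if w not in IGNORE]
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


lines = [
    "私 は その 人 を 常に 先生 と 呼ん で い た 。",
    "だから ここ でも ただ 先生 と 書く だけ で 本名 は 打ち明け ない 。",
]
grams2 = word_ngrams(lines, 2)
grams3 = word_ngrams(lines, 3)
print(grams2.most_common(3))
```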


Word Count


3-gram



File Information

You can get file information, including the numbers of n-letter words.