Corpus Text Type
As of the next minor update (0.9.4), there are four choices:
- European Language A: for languages written mostly in the standard alphabet (ASCII)
- European Language B: for languages with many non-standard characters
- Japanese (plain): for 2-byte character languages with no spaces between words (e.g. 今日もいい天気ですね。)
- Japanese (wakachi): for 2-byte character languages with a 1-byte space between words (wakachi-gaki) (e.g. 今日 も いい 天気 です ね 。)
The difference between European Languages A and B is how many 2-byte Unicode characters are used. Ruby handles strings as bytes (and CasualConc treats all strings internally as UTF-8), so even though accented and other non-standard characters look like single characters on screen, many of them take 2 bytes or more. Recognizing these multi-byte characters in Ruby takes more processing time, so the two types are kept separate. If there are fewer than about 10 to 15 2-byte Unicode characters in the left or right context (within the specified span), choose A. Otherwise choose B. You can also try A first and switch to B if concordance lines are not displayed properly or CasualConc crashes.
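The rule of thumb above can be sketched in a few lines of Ruby. This is only an illustration, not CasualConc's own code: the helper names are hypothetical, the threshold of 10 is an assumption taken from the low end of the "10 to 15" guideline, and the sketch assumes a current Ruby in which strings are encoding-aware.

```ruby
# Hypothetical helper: counts the characters that need more than one byte in UTF-8.
def multibyte_count(text)
  text.each_char.count { |ch| ch.bytesize > 1 }
end

# Hypothetical helper: suggests a Corpus Text Type from the left and right context.
# Assumption: A is suggested only when both sides stay under the threshold.
def suggest_text_type(left, right, threshold = 10)
  if multibyte_count(left) < threshold && multibyte_count(right) < threshold
    "European Language A"
  else
    "European Language B"
  end
end

puts suggest_text_type("we met at the café", "and stayed until noon")
```

If a corpus sits near the threshold, trying A first and falling back to B, as described above, is still the practical test.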
Incidentally, Japanese characters take 3 bytes each in UTF-8 but occupy 2 columns on screen. This is why support for Japanese (and other East Asian languages) is very limited and processing takes much longer.
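To see the gap between characters and bytes concretely, the following sketch compares the two counts for a few strings. It assumes a current Ruby with UTF-8 strings, and byte_profile is a hypothetical helper name used only for this illustration.

```ruby
# Hypothetical helper: returns [number of characters, number of bytes] for a UTF-8 string.
def byte_profile(text)
  [text.length, text.bytesize]
end

# "\u00E9" is the precomposed letter e with an acute accent.
["a", "\u00E9", "今", "今日もいい天気ですね。"].each do |s|
  chars, bytes = byte_profile(s)
  puts "#{s}: #{chars} character(s), #{bytes} byte(s)"
end
```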
Handles Wakachi text as Plain text
If this is enabled, CasualConc treats wakachi-gaki text as plain text in Concord when you select Japanese (plain). This means that you can search for any string in the text, which is otherwise not possible in wakachi-gaki text because of the spaces between words.
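A minimal sketch of the idea, assuming the conversion simply removes the 1-byte spaces (how CasualConc actually does it is not documented here, and wakachi_to_plain is a hypothetical name):

```ruby
# Hypothetical helper: removes the 1-byte (ASCII) spaces from wakachi-gaki text.
def wakachi_to_plain(text)
  text.delete(" ")
end

wakachi = "今日 も いい 天気 です ね 。"
plain = wakachi_to_plain(wakachi)

# A string that crosses word boundaries is found only in the plain form.
puts plain.include?("もいい")
puts wakachi.include?("もいい")
```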
2-byte characters not to be handled as a part of words
With Japanese (or other 2-byte characters), you can specify non-word characters here so that they are ignored. This is needed because the regular expression '\w' matches all 2-byte characters. Type the non-word (2-byte) characters without any spaces or commas in between. If you click Sample J, some common non-word 2-byte characters are entered in the box. You can add your own.
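As an illustration of how such a list might be applied, the sketch below removes every listed character from a string. The character list and the helper name are assumptions made for this example, not the contents of Sample J. Because the behavior of '\w' differs between Ruby versions, the sketch matches the listed characters directly instead of relying on it.

```ruby
# Assumed example list, written the same way as in the box: no spaces or commas in between.
NON_WORD_CHARS = "。、「」！？"

# Hypothetical helper: deletes every listed non-word character from the text.
def strip_non_word(text, non_word = NON_WORD_CHARS)
  pattern = Regexp.union(non_word.chars) # matches any single listed character
  text.gsub(pattern, "")
end

puts strip_non_word("「今日」は、いい天気。")
```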
Context Words in Japanese (plain)
If you choose Japanese (plain), context word search behaves differently from the other Corpus Text Types. Because there are no gaps between words, word-level search does not work. Instead, any string typed in the Context Search box is searched for as is. You can limit the search to both sides, left only, or right only.
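The behavior described above amounts to a substring search within a fixed number of characters on each side of the keyword. A rough sketch follows; the function name, its parameters, and the :left / :right / :both symbols are all hypothetical, and positions are counted in characters, as a current Ruby does for UTF-8 strings.

```ruby
# Hypothetical helper: checks whether `context` occurs within `span` characters
# to the left and/or right of the keyword at kw_start (length kw_len).
def context_match?(text, kw_start, kw_len, context, span, side = :both)
  left_from = [kw_start - span, 0].max
  left  = text[left_from...kw_start]
  right = text[kw_start + kw_len, span] || ""
  case side
  when :left  then left.include?(context)
  when :right then right.include?(context)
  else left.include?(context) || right.include?(context)
  end
end

text = "今日もいい天気ですね。"
# The keyword 天気 starts at character 5 and is 2 characters long.
puts context_match?(text, 5, 2, "いい", 3, :left)
```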
Font
There are currently only two choices: Courier and Courier New. Courier is the default font, but it turned out that some languages mix two-byte characters with single-byte characters. Courier New appears to display some of these languages properly (with monospaced glyphs for all the characters in a language). You can also change the font size. These changes are reflected in the preview table.
Coloring of Sort Words
If this is checked, sort words are colored. You can change the color of each sort word.
Click the color well for the one you want to change, and the color panel appears. Set the color and close the panel. The change is reflected in the preview table on the right.
Context Word Search History
You can specify how many context search words CasualConc remembers.
The following options are for educational use. You can create concordance lines with a limited number of hits and with blanks in place of the keyword. Number of Blanks specifies the number of 1-byte space characters.
Replace Keyword
You can replace the keyword (search word) with blank spaces enclosed in brackets. Choose a bracket type from the list.
Number of Blanks
You can specify the number of blank spaces (the width of the gap between the brackets).
Limit concordance output
You can specify the maximum number of lines in the result. This keeps teachers and students from being overwhelmed by the number of concordance lines, especially with very frequent words.
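Taken together, the educational options above (Replace Keyword, Number of Blanks, Limit concordance output) could be sketched as follows. The helper names and the sample lines are made up for illustration, and the bracket characters stand in for whichever bracket type is chosen from the list.

```ruby
# Hypothetical helper: replaces the first occurrence of the keyword with
# `blanks` 1-byte spaces enclosed in the chosen brackets.
def blank_keyword(line, keyword, blanks, left_bracket = "(", right_bracket = ")")
  line.sub(keyword) { left_bracket + " " * blanks + right_bracket }
end

# Hypothetical helper: keeps at most `max` concordance lines.
def limit_lines(lines, max)
  lines.first(max)
end

lines = ["the cat sat on the mat", "a cat is a cat", "my cat sleeps"]
limit_lines(lines, 2).each { |line| puts blank_keyword(line, "cat", 5, "[", "]") }
```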
Include context words (L5 - R5) in CSV output
By default, when you export the Concordance result as a CSV file, only the information shown in the table is included. If you check this box, context words are also included.