CasualConc

© 2008-2009 Yasu Imao

Concord



Corpus Text Type

Starting with the next minor update (0.9.4), there are four choices:

  - European Language A: for languages mostly with standard alphabets (ASCII)
  - European Language B:  for languages with many non-standard characters
  - Japanese (plain): 2-byte character languages with no spaces between words (e.g. 今日もいい天気ですね。)
  - Japanese (wakachi): 2-byte character languages with a 1-byte space between words (wakachi-gaki) (e.g. 今日 も いい 天気 です ね 。)

The difference between European Language A and B is how many 2-byte Unicode characters the text contains.  Ruby treats strings as sequences of bytes (and CasualConc handles all strings internally as UTF-8), so even though accented characters and other non-standard characters appear as single characters on the screen, many of them take 2 bytes or more.  Recognizing 2-byte Unicode characters in Ruby takes more processing time, which is why the two types are separate.  If there are fewer than 10~15 2-byte Unicode characters in either the left or the right context (within the specified span), choose A.  Otherwise, choose B.  You can also try A first and switch to B if concordance lines are not displayed properly or CasualConc crashes.

By the way, most Japanese characters take 3 bytes each in UTF-8 and are as wide as two 1-byte characters on the screen.  This is why support for Japanese (East Asian languages) is very limited and processing takes much longer.
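The byte counts described above can be checked directly.  The following is a minimal sketch, assuming Ruby 1.9 or later with UTF-8 strings; it is not CasualConc's own code.  The helper names `multibyte_count` and `suggest_text_type` are hypothetical, and the threshold of 10 is simply the lower end of the 10~15 guideline.

```ruby
# Count the characters that need more than one byte in UTF-8.
def multibyte_count(text)
  text.each_char.count { |ch| ch.bytesize > 1 }
end

# Hypothetical helper: suggest a Corpus Text Type for one context span.
# The default threshold (10) is an assumption based on the 10~15 guideline.
def suggest_text_type(context, threshold = 10)
  multibyte_count(context) < threshold ? "European Language A" : "European Language B"
end

puts "caf\u00E9".length    # 4 characters
puts "caf\u00E9".bytesize  # 5 bytes: the accented e takes 2 bytes in UTF-8
puts "天気".length          # 2 characters
puts "天気".bytesize        # 6 bytes: 3 bytes per character
```

A context span such as "plain ascii text" would give "European Language A" here, while a span with a dozen accented characters would give "European Language B".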


Handles Wakachi text as Plain text

If this is enabled, CasualConc treats wakachi-gaki text as plain text in Concord when you select Japanese (plain).  This means that you can search for any string in the text, which is otherwise not possible in wakachi-gaki text because of the spaces between words. 
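The effect of this option can be illustrated with a small sketch.  It assumes that treating wakachi-gaki text as plain text amounts to removing the 1-byte spaces; how CasualConc actually does this is not described here.

```ruby
wakachi = "今日 も いい 天気 です ね 。"

# Removing the 1-byte spaces gives the equivalent plain text.
plain = wakachi.delete(" ")

puts wakachi.include?("いい天気")  # false: a space separates いい and 天気
puts plain.include?("いい天気")    # true
puts plain.include?("気です")      # true: a match may start in the middle of a word
```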

2-byte characters not to be handled as a part of words

With Japanese (or other 2-byte characters), you can specify non-word characters here so that they are ignored.  Type the non-word characters (2-byte) without any spaces or commas in between.  This is necessary because the regular expression '\w' matches all 2-byte characters.  If you click Sample J, some common non-word 2-byte characters are entered in the box.  You can add your own.
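One way such a list of characters could be applied is sketched below.  This is an illustration under assumptions, not CasualConc's implementation: the helper `strip_non_word` and the sample character list are hypothetical.

```ruby
# Hypothetical helper: remove the listed non-word characters from each token
# and drop tokens that become empty.
def strip_non_word(tokens, non_word_chars)
  return tokens if non_word_chars.empty?
  # Build a character class from the setting; Regexp.escape protects any
  # character that would otherwise be special inside a regular expression.
  pattern = /[#{Regexp.escape(non_word_chars)}]/
  tokens.map { |token| token.gsub(pattern, "") }.reject(&:empty?)
end

tokens = "今日 も いい 天気 です ね 。".split(" ")
p strip_non_word(tokens, "。、「」！？")
# ["今日", "も", "いい", "天気", "です", "ね"]
```

The characters are given as one string with no separators, matching the way they are typed into the box.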


Context Words in Japanese (plain)


If you choose Japanese (plain), context word search behaves differently from the other Corpus Text Types.  Because there are no gaps between words, word-level search does not work.  Instead, in Japanese (plain), any string entered in the Context Search box is searched for.  You can limit the search to both sides, left only, or right only.
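The string-based context search can be pictured as a plain substring test on the text around the keyword.  In this sketch, the helper `context_match?` and its `side` argument are hypothetical, and "both sides" is assumed to mean that a match on either side counts.

```ruby
# Hypothetical helper: does a concordance line satisfy a context string search?
# left and right are the context strings before and after the keyword;
# side is :both, :left, or :right.
def context_match?(left, right, query, side = :both)
  case side
  when :left  then left.include?(query)
  when :right then right.include?(query)
  else left.include?(query) || right.include?(query)  # assumption: either side
  end
end

left, right = "今日もいい", "ですね。"       # context around the keyword 天気
puts context_match?(left, right, "です")          # true
puts context_match?(left, right, "です", :left)   # false
puts context_match?(left, right, "です", :right)  # true
```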



Font

There are currently only two choices: Courier and Courier New.  Courier is the default font, but it turned out that some languages use two-byte characters along with single-byte characters.  Courier New appears to display some of these languages properly (with all the characters in a language shown in monospace).  You can also change the font size.  These changes will be reflected in the preview table.


Coloring of Sort Words

If this is checked, sort words will be colored.  You can change the colors of sort words. 

Click the color well for the one you want to change, and the color panel appears.  Set the color and close the panel.  The change will be reflected in the preview table on the right.


Context Word Search History

    You can specify how many context search words CasualConc remembers.


The following are for educational use.  You can create concordance lines with limited hits and with blanks for the keyword.  Number of blanks specifies the number of 1-byte space characters.

Replace Keyword

You can replace the keyword (search word) with blank spaces enclosed in brackets.  Choose a bracket type from the list.


Number of Blanks

You can specify the number of blank spaces (the width of the gap between the brackets).
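As an illustration of how the bracket type and the number of blanks combine, here is a minimal sketch.  The helper `blank_out` and its arguments are hypothetical; the actual output format of CasualConc may differ.

```ruby
# Hypothetical helper: replace the keyword with a bracketed gap.
# brackets is a two-character string such as "()" or "[]".
def blank_out(left, right, blanks, brackets = "()")
  opening, closing = brackets.chars
  gap = " " * blanks  # 1-byte space characters
  "#{left}#{opening}#{gap}#{closing}#{right}"
end

puts blank_out("The weather is ", " today.", 5)
# The weather is (     ) today.
```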


Limit concordance output

You can specify the maximum number of lines in the result.  This keeps teachers and students from being overwhelmed by the number of concordance lines, especially with very frequent words. 

 

Include context words (L5 - R5) in CSV output

By default, when you export the Concordance result as a CSV file, only the information shown in the table is included.  If you check this box, context words are also included.
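The difference between the two kinds of output might look like the following sketch, which uses Ruby's standard csv library.  The column layout (left context, keyword, right context, then L5 to L1 and R1 to R5) and the helper names are assumptions made for illustration; the actual columns CasualConc writes are not specified here.

```ruby
require "csv"

# Hypothetical helper: the five words on each side of the keyword,
# ordered L5..L1 then R1..R5, padded with empty strings when the
# context is shorter than the span.
def context_columns(left, right, span = 5)
  l = left.split.last(span)
  r = right.split.first(span)
  Array.new(span - l.size, "") + l + r + Array.new(span - r.size, "")
end

# hits is an array of [left context, keyword, right context] triples.
def to_csv(hits, with_context_words)
  CSV.generate do |csv|
    hits.each do |left, keyword, right|
      row = [left, keyword, right]
      row += context_columns(left, right) if with_context_words
      csv << row
    end
  end
end

hits = [["it was a very", "fine", "day"]]
puts to_csv(hits, false)  # table columns only
puts to_csv(hits, true)   # table columns plus ten context word columns
```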