CasualMecab - Single

Single mode allows you to open a single text file and process the text with MeCab.

Single mode has two text boxes: Input (left) and Output (right).

Basic Functions

To open a file, go to Menu -> File -> Open... (command + O)

You are prompted to select a single file.

Supported file formats (experimental) are:

Plain Text (.txt)

Rich Text Format (.rtf, .rtfd)

MS Word (.doc, .docx)

HTML (.html, .htm)

Web Archive (.webarchive) from Safari [WebKit]

OpenOffice (.odt, .sxw)

When you open a plain text (.txt) file, you need to select an encoding. The following text encodings are supported:

UTF-8 - unicode (Mac Standard)

UTF-16 - Little endian(?)

SJIS - shift_jis (Windows Standard)

EUC - euc-jp (Unix Standard)

JIS - iso-2022-jp
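
As a rough illustration of what choosing an encoding involves, the Python sketch below maps the five labels above to Python's standard codec names and decodes raw bytes with the chosen one. The mapping and the function name decode_text are assumptions made for this example, not CasualMecab's own implementation. UTF-16 byte order is left to Python's BOM detection, since the list above is itself unsure about endianness.

```python
# Hypothetical mapping from the labels listed above to Python codec names.
ENCODINGS = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16",      # byte order is taken from the BOM when present
    "SJIS": "shift_jis",
    "EUC": "euc_jp",
    "JIS": "iso2022_jp",
}


def decode_text(data: bytes, label: str) -> str:
    """Decode raw file bytes using the codec selected by its label."""
    return data.decode(ENCODINGS[label])


sample = "私はその人を常に先生と呼んでいた。"
for label, codec in ENCODINGS.items():
    # Round-trip the sample sentence through each codec.
    assert decode_text(sample.encode(codec), label) == sample
```

Decoding with the wrong label typically raises UnicodeDecodeError or yields garbled text, which is why the encoding has to be selected when a plain text file is opened.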

The text file opens in the left view. If you open a file other than a plain text file, all formatting information is ignored and only the text is extracted, as plain text.

Delete any unnecessary parts in the view.

Once the original text is ready to process, click the Parse button.

If you open a file obtained from Aozora Bunko, CasualMecab offers some features for handling its formatting. See Aozora Bunko Features below.

By default, CasualMecab processes text into MeCab output.

The above text will be processed into this:

For other types of output, check Single Preferences below.

Once the text is processed, go to Menu -> File -> Save (command + S).

You can choose an encoding for the saved file. The file is saved as plain text.

Single Preferences

You can specify the output format (this setting is shared with Batch mode).

Choose one of the following six formats.

The sample outputs below use this input text:

私はその人を常に先生と呼んでいた。

MeCab

MeCab is the default format of MeCab.

私 名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ

は 助詞,係助詞,*,*,*,*,は,ハ,ワ

その 連体詞,*,*,*,*,*,その,ソノ,ソノ

人 名詞,一般,*,*,*,*,人,ヒト,ヒト

を 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ

常に 副詞,一般,*,*,*,*,常に,ツネニ,ツネニ

先生 名詞,一般,*,*,*,*,先生,センセイ,センセイ

と 助詞,格助詞,一般,*,*,*,と,ト,ト

呼ん 動詞,自立,*,*,五段・バ行,連用タ接続,呼ぶ,ヨン,ヨン

で 助詞,接続助詞,*,*,*,*,で,デ,デ

い 動詞,非自立,*,*,一段,連用形,いる,イ,イ

た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ

。 記号,句点,*,*,*,*,。,。,。
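
If you post-process this output yourself, each line can be split into a surface form and a list of comma-separated features. The sketch below assumes the field order seen in the sample above (IPA dictionary style: part of speech first, base form seventh, reading eighth). The Token type and parse_line function are hypothetical names for this example.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str   # word as it appears in the text
    pos: str       # top-level part of speech, e.g. 名詞
    base: str      # dictionary (base) form
    reading: str   # katakana reading


def parse_line(line: str) -> Token:
    """Split one line of the sample output above into its fields."""
    # The surface form comes first; everything after the first run of
    # whitespace is the comma-separated feature list.
    surface, features = line.split(None, 1)
    f = features.strip().split(",")
    return Token(surface, f[0], f[6], f[7])


tok = parse_line("呼ん 動詞,自立,*,*,五段・バ行,連用タ接続,呼ぶ,ヨン,ヨン")
print(tok.pos, tok.base, tok.reading)  # 動詞 呼ぶ ヨン
```

This simple split would misread a token whose surface form is itself a half-width comma, so treat it as a starting point rather than a complete parser.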

Chasen

Chasen is a Chasen-like format.

私 ワタシ 私 名詞-代名詞-一般

は ハ は 助詞-係助詞

その ソノ その 連体詞

人 ヒト 人 名詞-一般

を ヲ を 助詞-格助詞-一般

常に ツネニ 常に 副詞-一般

先生 センセイ 先生 名詞-一般

と ト と 助詞-格助詞-一般

呼ん ヨン 呼ぶ 動詞-自立 五段・バ行 連用タ接続

で デ で 助詞-接続助詞

い イ いる 動詞-非自立 一段 連用形

た タ た 助動詞 特殊・タ 基本形

。 。 。 記号-句点
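
Comparing the two samples suggests how the Chasen-like layout relates to the MeCab layout: surface form, reading, base form, the POS levels joined with hyphens, then the conjugation type and form when present. The sketch below reproduces that rearrangement. It is inferred from the samples only, and to_chasen_like is a hypothetical name, not part of CasualMecab.

```python
def to_chasen_like(mecab_line: str) -> str:
    """Rearrange one MeCab-format line into the Chasen-like layout shown above."""
    surface, features = mecab_line.split(None, 1)
    f = features.strip().split(",")
    # POS levels are joined with "-", skipping unused "*" slots.
    pos = "-".join(p for p in f[0:4] if p != "*")
    # Conjugation type and form are appended only when present.
    conjugation = [c for c in f[4:6] if c != "*"]
    return " ".join([surface, f[7], f[6], pos] + conjugation)


print(to_chasen_like("呼ん 動詞,自立,*,*,五段・バ行,連用タ接続,呼ぶ,ヨン,ヨン"))
# 呼ん ヨン 呼ぶ 動詞-自立 五段・バ行 連用タ接続
```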

Wakachi

Wakachi is the wakachi-gaki format, in which words are separated by spaces.

私 は その 人 を 常に 先生 と 呼ん で い た 。

Yomi

Yomi outputs the reading of the text in katakana.

ワタシハソノヒトヲツネニセンセイトヨンデイタ。
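
Both the Wakachi and the Yomi output can be derived from the MeCab-format lines: Wakachi joins the surface forms with spaces, and Yomi concatenates the reading field. The sketch below shows this on the sample sentence. It assumes the reading is the eighth feature, as in the MeCab sample earlier, and the function names are hypothetical.

```python
MECAB_OUTPUT = """\
私 名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
その 連体詞,*,*,*,*,*,その,ソノ,ソノ
人 名詞,一般,*,*,*,*,人,ヒト,ヒト
を 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
常に 副詞,一般,*,*,*,*,常に,ツネニ,ツネニ
先生 名詞,一般,*,*,*,*,先生,センセイ,センセイ
と 助詞,格助詞,一般,*,*,*,と,ト,ト
呼ん 動詞,自立,*,*,五段・バ行,連用タ接続,呼ぶ,ヨン,ヨン
で 助詞,接続助詞,*,*,*,*,で,デ,デ
い 動詞,非自立,*,*,一段,連用形,いる,イ,イ
た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。 記号,句点,*,*,*,*,。,。,。
"""


def wakachi(mecab_output: str) -> str:
    """Join the surface forms with single spaces."""
    return " ".join(line.split(None, 1)[0] for line in mecab_output.splitlines())


def yomi(mecab_output: str) -> str:
    """Concatenate the katakana reading (eighth feature) of every token."""
    readings = []
    for line in mecab_output.splitlines():
        features = line.split(None, 1)[1].split(",")
        readings.append(features[7])
    return "".join(readings)


print(wakachi(MECAB_OUTPUT))  # 私 は その 人 を 常に 先生 と 呼ん で い た 。
print(yomi(MECAB_OUTPUT))     # ワタシハソノヒトヲツネニセンセイトヨンデイタ。
```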

Tag

CasualMecab can output the following tag types.

<*>~</*>

Basic XML-style tags. The tag names are the POS tags you selected.

<名詞>私</名詞> <助詞>は</助詞> <連体詞>その</連体詞>

If you select more than one tag, the tags are joined with an underscore [_]. This is the same for the other tag types.

<名詞_代名詞>私</名詞_代名詞> <助詞_係助詞>は</助詞_係助詞> <連体詞>その</連体詞>

~_*

POS tags are added after underscore [_].

私_名詞 は_助詞 その_連体詞 人_名詞 を_助詞 常に_副詞 先生_名詞 と_助詞 呼ん_動詞 で_助詞 い_動詞 た_助動詞 。_記号

~/*

POS tags are added after slash [/].

私/名詞 は/助詞 その/連体詞 人/名詞 を/助詞 常に/副詞 先生/名詞 と/助詞 呼ん/動詞 で/助詞 い/動詞 た/助動詞 。/記号

~<*>

POS tags are added in angle brackets after each word.

私<名詞> は<助詞> その<連体詞> 人<名詞> を<助詞> 常に<副詞> 先生<名詞> と<助詞> 呼ん<動詞> で<助詞> い<動詞> た<助動詞> 。<記号>

<w label="*">~</w>

XML type tag with attributes.

<w pos="名詞">私</w> <w pos="助詞">は</w> <w pos="連体詞">その</w>

Two options are available for this tag type.

Word per Line - a single word is listed per line (this is also available for the basic XML type tags above).

<w pos="名詞">私</w>

<w pos="助詞">は</w>

<w pos="連体詞">その</w>

<w pos="名詞">人</w>

日本語ラベル - use Japanese for labels

<w 品詞="名詞">私</w> <w 品詞="助詞">は</w> <w 品詞="連体詞">その</w>
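
The five tag types differ only in how the word and its POS label are combined, which the following sketch makes explicit. The style names ("xml", "underscore", "slash", "bracket", "attribute"), the depth parameter standing in for how many POS levels are selected, and both function names are assumptions for illustration. They are not CasualMecab settings.

```python
TOKENS = [
    ("私", ["名詞", "代名詞"]),
    ("は", ["助詞", "係助詞"]),
    ("その", ["連体詞"]),
]


def tag_token(surface, pos_levels, style, depth=1, japanese_label=False):
    """Format one word in one of the tag styles described above."""
    # Multiple selected POS levels are joined with an underscore.
    pos = "_".join(pos_levels[:depth])
    if style == "xml":
        return f"<{pos}>{surface}</{pos}>"
    if style == "underscore":
        return f"{surface}_{pos}"
    if style == "slash":
        return f"{surface}/{pos}"
    if style == "bracket":
        return f"{surface}<{pos}>"
    if style == "attribute":
        label = "品詞" if japanese_label else "pos"
        return f'<w {label}="{pos}">{surface}</w>'
    raise ValueError(f"unknown style: {style}")


def tag_text(tokens, style, depth=1, japanese_label=False, word_per_line=False):
    """Format all tokens, one per line or separated by spaces."""
    separator = "\n" if word_per_line else " "
    return separator.join(
        tag_token(s, p, style, depth, japanese_label) for s, p in tokens
    )


print(tag_text(TOKENS, "xml", depth=2))
# <名詞_代名詞>私</名詞_代名詞> <助詞_係助詞>は</助詞_係助詞> <連体詞>その</連体詞>
```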

T Wakachi

With this option, the output is in wakachi-gaki, but each word is replaced by its selected POS tag.

名詞 助詞 連体詞 名詞 助詞 副詞 名詞 助詞 動詞 助詞 動詞 助動詞 。

If Skip 記号 is checked, POS tags for symbols are omitted from the output. This applies to all the tag type options.

私_名詞 は_助詞 その_連体詞 人_名詞 を_助詞 常に_副詞 先生_名詞 と_助詞 呼ん_動詞 で_助詞 い_動詞 た_助動詞 。
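
The effect of Skip 記号 can be sketched as follows: a symbol keeps its surface form and simply receives no tag. The sketch assumes that the T Wakachi sample above was also produced with Skip 記号 checked, since its final 。 appears untagged. The function and parameter names are hypothetical.

```python
TOKENS = [
    ("私", "名詞"), ("は", "助詞"), ("その", "連体詞"), ("人", "名詞"),
    ("を", "助詞"), ("常に", "副詞"), ("先生", "名詞"), ("と", "助詞"),
    ("呼ん", "動詞"), ("で", "助詞"), ("い", "動詞"), ("た", "助動詞"),
    ("。", "記号"),
]


def t_wakachi(tokens, skip_symbols=True):
    """Replace each word by its POS tag; symbols stay as-is when skipped."""
    return " ".join(
        surface if (skip_symbols and pos == "記号") else pos
        for surface, pos in tokens
    )


def underscore_tags(tokens, skip_symbols=True):
    """word_POS output; symbols are left untagged when skipped."""
    return " ".join(
        surface if (skip_symbols and pos == "記号") else f"{surface}_{pos}"
        for surface, pos in tokens
    )


print(t_wakachi(TOKENS))
# 名詞 助詞 連体詞 名詞 助詞 副詞 名詞 助詞 動詞 助詞 動詞 助動詞 。
print(underscore_tags(TOKENS))
# 私_名詞 は_助詞 その_連体詞 人_名詞 を_助詞 常に_副詞 先生_名詞 と_助詞 呼ん_動詞 で_助詞 い_動詞 た_助動詞 。
```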

Aozora Bunko Features

CasualMecab has some features to handle Aozora Bunko format files.

Kanji Substitution

Aozora Bunko files contain notes, which are often used in place of uncommon kanji. For example:

※[#「てへん+劣」、第3水準1-84-77]
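
Notes of this shape are easy to locate with a regular expression, as the sketch below shows. The pattern is inferred from the single example above and accepts both half-width and full-width brackets and "#" as a precaution. That choice and the function name find_notes are assumptions, not a description of how CasualMecab searches.

```python
import re

# "※" followed by a bracketed note that starts with "#".
# Half-width and full-width forms of the brackets and "#" are both accepted.
NOTE = re.compile(r"※[\[［][#＃][^\]］]*[\]］]")


def find_notes(text):
    """Return (start, end, note) for every note, in order of appearance."""
    return [(m.start(), m.end(), m.group(0)) for m in NOTE.finditer(text)]


sample_note = "※[#「てへん+劣」、第3水準1-84-77]"
text = "AAA" + sample_note + "BBB"
for start, end, note in find_notes(text):
    print(start, end, note)
```

The start and end offsets are what a panel would need in order to select the matching text in the main view.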

CasualMecab can store the substitute kanji for these notes and replace the notes in a batch.

Go to Menu -> Window -> Search Aozora Sub Kanji.

The Substitute Kanji panel appears.

By default, the Kanji column is empty. Double-click a row under the Kanji column and type (or copy and paste) the correct kanji as the substitute.

If you click a row, the corresponding text is selected in the main text. Here is a sample (the last row in the table above).

If there is more than one instance of the same substitute, click the Find Next button to go to the next match.

Once you have entered the correct kanji, select a row and click the Replace button. To replace every entry in the table, click Replace All.

The above sample is replaced by the correct kanji.

If you want to store the correct kanji for a substitute, click Add to Dictionary before you replace the text.

The selected combination is added to the dictionary, and when the same substitute appears in another file, its kanji is filled in automatically on the Substitute Kanji panel.
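
Conceptually, the dictionary is a mapping from a note to its kanji, and replacing all entries is a single pass over the text. The sketch below illustrates that idea with a plain Python dict. The storage format CasualMecab actually uses is not documented here, so the dict, the pattern, and the function name are assumptions, and "〓" is only a placeholder, not the real kanji for the sample note.

```python
import re

# Same note shape as in the example above; both bracket widths are accepted.
NOTE = re.compile(r"※[\[［][#＃][^\]］]*[\]］]")


def apply_dictionary(text, dictionary):
    """Replace every note found in the dictionary; leave unknown notes as-is."""
    return NOTE.sub(lambda m: dictionary.get(m.group(0), m.group(0)), text)


note = "※[#「てへん+劣」、第3水準1-84-77]"
dictionary = {note: "〓"}  # "〓" stands in for the correct kanji

print(apply_dictionary("前" + note + "後", dictionary))  # 前〓後
```

Leaving unknown notes untouched mirrors the panel's behavior of showing an empty Kanji column until a substitute is entered.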

To check what is in the dictionary (or delete an entry), go to Menu -> Window -> Open Sub Kanji Dictionary.

The Kanji Substitute Dictionary appears.

To delete an entry, select it and click the Delete button.

Ruby deletion

Aozora Bunko files often contain ruby (hiragana readings for kanji), and CasualMecab has a function to delete it.

Click the Delete Aozora Ruby button.

Before deletion

After deletion
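
For readers who want to do the same outside the application, the sketch below removes ruby with a regular expression. It assumes the usual Aozora Bunko notation, in which the reading is enclosed in 《 》 and a full-width vertical bar may mark where the annotated word begins. That notation is not described in this manual, and delete_ruby is a hypothetical name. The input string is an illustrative example.

```python
import re

# Ruby text enclosed in 《 》.
RUBY = re.compile(r"《[^》]*》")
# Full-width vertical bar that can mark the start of the annotated word.
BASE_MARK = "\uff5c"


def delete_ruby(text: str) -> str:
    """Remove ruby readings and the optional start-of-word marker."""
    return RUBY.sub("", text).replace(BASE_MARK, "")


before = "私《わたくし》はその人を常に先生と呼んでいた。"
print(delete_ruby(before))  # 私はその人を常に先生と呼んでいた。
```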