Natural Language Processing

Related news

Word cloud

Japanese text processing with Python

Japanese Natural Language Processing with Python

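This document publishes Chapter 12, "Python による日本語自然言語処理" (Japanese Natural Language Processing with Python), of Steven Bird, Ewan Klein, Edward Loper, translated by Masato Hagiwara, Takahiro Nakayama, and Takaaki Mizuno, 『入門自然言語処理』, O'Reilly Japan, 2010, under the same Creative Commons Attribution Noncommercial No Derivative Works 3.0 US License as the original book, Natural Language Processing with Python.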

MeCab

How to use MeCab
Installing MeCab on Google Colab (A)
mecab-ipadic-NEologd
Installing on Google Colab → see A

!pip install mecab-python3

!pip install unidic

⇒ Installs mecab-python3 and UniDic

!python -m unidic download

⇒ Downloads the dictionary
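
Once both packages and the dictionary are in place, the tagger can be pointed at the downloaded UniDic. The following is a minimal sketch, assuming the `unidic` package exposes the dictionary location as `unidic.DICDIR`. The helper name `tagger_args` is hypothetical (it is not part of MeCab), and the import is guarded so the cell still runs where MeCab or the dictionary is missing.

```python
def tagger_args(dicdir, wakati=False):
    """Build the option string passed to MeCab.Tagger (hypothetical helper)."""
    opts = []
    if wakati:
        opts.append('-Owakati')            # space-separated output
    opts.append('-d "{}"'.format(dicdir))  # dictionary directory
    return ' '.join(opts)

try:
    import MeCab
    import unidic
    # Assumption: unidic.DICDIR is the path filled by "python -m unidic download".
    tagger = MeCab.Tagger(tagger_args(unidic.DICDIR))
    print(tagger.parse('すもももももももものうち'))
except Exception as exc:
    # Covers both a missing package and a missing dictionary.
    print('MeCab/UniDic not available:', exc)
```

Keeping the option string in one helper makes it easy to switch dictionaries later without touching the rest of the notebook.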

The setup above already works, but to handle newer words, also install MeCab and mecab-unidic-NEologd:

!apt-get -q -y install mecab libmecab-dev file

!git clone --depth 1 https://github.com/neologd/mecab-unidic-neologd.git

!echo yes | mecab-unidic-neologd/bin/install-mecab-unidic-neologd -n

⇒ Installs MeCab and mecab-unidic-NEologd

To use this dictionary when running MeCab from the command line, specify it as follows:

$ mecab -d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-unidic-neologd

From Python, pass the same option, "-d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-unidic-neologd", when creating the MeCab.Tagger instance, as in the following example:

import MeCab

dicdir = '-d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-unidic-neologd'

tagger2 = MeCab.Tagger(dicdir)
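
A tagger created this way returns its analysis as text from `parse()`. The sketch below shows one way to turn that text into Python data, assuming MeCab's default output format (surface form, a tab, comma-separated features, and a final `EOS` line). The function name `parse_mecab_output` is hypothetical, and the sample string is handwritten to illustrate the format; it is not actual dictionary output.

```python
def parse_mecab_output(text):
    """Split MeCab's default output into (surface, features) pairs."""
    tokens = []
    for line in text.splitlines():
        if line == 'EOS' or not line:
            continue  # skip the end-of-sentence marker and blank lines
        surface, _, feature = line.partition('\t')
        tokens.append((surface, feature.split(',')))
    return tokens

# Handwritten sample in the default format (not real dictionary output).
sample = '日本\t名詞,固有名詞\nで\t助詞,格助詞\nEOS\n'
for surface, features in parse_mecab_output(sample):
    print(surface, features[0])
```

With a working installation, the same function could be applied to the tagger's result, for example `parse_mecab_output(tagger2.parse(text))`. The number and meaning of the feature columns depend on the dictionary in use.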

⇒ Example: MeCab + NEologd + word segmentation (wakati-gaki)

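CONTENT = '日本では呪術廻戦、全米では鬼滅の刃が人気だ'  # "Jujutsu Kaisen is popular in Japan, Demon Slayer across the US"

tagger = MeCab.Tagger(r'-Owakati -d "C:\mecab-ipadic-neologd"')  # wakati output and the dictionary are set in one option string (Windows path shown)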

parse = tagger.parse(CONTENT)

print(parse)
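
With `-Owakati`, `parse()` returns the tokens separated by spaces, so the result can be split into a list and counted. This is a minimal sketch: `wakati_tokens` is a hypothetical helper, and the sample string is a handwritten stand-in for `tagger.parse(CONTENT)`, since the actual segmentation depends on the installed dictionary.

```python
from collections import Counter

def wakati_tokens(parsed):
    """Turn -Owakati output (space-separated, trailing newline) into a list."""
    return parsed.split()

# Handwritten stand-in for tagger.parse(CONTENT); real output may differ.
sample = '日本 で は 呪術廻戦 、 全米 で は 鬼滅の刃 が 人気 だ \n'
tokens = wakati_tokens(sample)
counts = Counter(tokens)
print(tokens)
print(counts.most_common(2))
```

Frequency counts of this kind are a common starting point for a word cloud, usually after filtering out particles and punctuation.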

Google Colab, etc.