
Syncha 0.3.1

Syncha is a Japanese predicate-argument structure analyzer. It identifies the relations between a predicate (e.g. a verb or an adjective) and its arguments. It also resolves zero-anaphoric relations: it detects unrealized arguments of predicates (zero pronouns) and searches for their antecedents. In addition, it identifies NP coreference relations.

Recent updates:

- v0.3.1: now runs "cabocha" with the "-f 1 -n 1" options.

- v0.3: added a coreference resolution module.

Download

- syncha-0.3.1.tgz (size: 31MB)

Quickstart

- install the Japanese dependency parser CaboCha

- install lp_solve

- uncompress syncha-0.3.1.tgz

% tar xvfz syncha-0.3.1.tgz

- pipe raw text into the syncha program

(sample text is included in syncha-0.3.1/dat/sample/test.in)

% cat syncha-0.3.1/dat/sample/test.in | ./syncha-0.3.1/syncha

(the output looks as follows)

    * 0 4D 0/1 0.000000
    政府 名詞,一般,*,*,,,政府,セイフ,セイフ,, O id="2"
    は 助詞,係助詞,*,*,,,は,ハ,ワ,, O
    * 1 2D 2/3 2.292374
    低 接頭詞,名詞接続,*,*,,,低,テイ,テイ,, O
    所得 名詞,一般,*,*,,,所得,ショトク,ショトク,, O
    者 名詞,接尾,一般,*,,,者,シャ,シャ,, O id="1"
    を 助詞,格助詞,一般,*,,,を,ヲ,ヲ,, O
    * 2 3D 1/1 1.096540
    支援 名詞,サ変接続,*,*,,,支援,シエン,シエン,, O
    する 動詞,自立,*,*,サ変・スル,基本形,する,スル,スル,, O ga="2" o="1" type="pred"
    * 3 4D 0/1 0.000000
    計画 名詞,サ変接続,*,*,,,計画,ケイカク,ケイカク,, O ga="2" id="3" type="noun"
    を 助詞,格助詞,一般,*,,,を,ヲ,ヲ,, O
    * 4 -1D 1/2 0.000000
    発表 名詞,サ変接続,*,*,,,発表,ハッピョウ,ハッピョー,, O
    し 動詞,自立,*,*,サ変・スル,連用形,する,シ,シ,, O ga="2" o="3" type="pred"
    た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ,, O
    。 記号,句点,*,*,,,。,。,。,, O
    EOS
    * 0 1D 1/2 1.668537
    関係 名詞,サ変接続,*,*,,,関係,カンケイ,カンケイ,, O
    省庁 名詞,一般,*,*,,,省庁,ショウチョウ,ショーチョー,, O id="5"
    の 助詞,連体化,*,*,,,の,ノ,ノ,, O
    * 1 2D 0/1 0.000000
    協力 名詞,サ変接続,*,*,,,協力,キョウリョク,キョーリョク,, O ga="5" id="4" o="3" type="noun"
    を 助詞,格助詞,一般,*,,,を,ヲ,ヲ,, O
    * 2 -1D 1/1 0.000000
    要請 名詞,サ変接続,*,*,,,要請,ヨウセイ,ヨーセイ,, O
    する 動詞,自立,*,*,サ変・スル,基本形,する,スル,スル,, O ga="2" o="4" type="pred"
    。 記号,句点,*,*,,,。,。,。,, O
    EOS
    EOT

The output follows the CaboCha output format, with extra tags appended to some word lines. A line containing type="pred" is the head word of a predicate identified automatically by SynCha. Such a line also carries case tags: ga (nominative), o (accusative), and ni (dative). A predicate-argument relation is encoded by matching values: the word whose line carries a case tag (ga, o, or ni) is the head word of the predicate, and the word whose id tag has the same value is the head word of the argument. Lines containing type="noun" mark verbal nouns, whose arguments are identified in the same way. In the example above, 計画 (plan) appears as a noun, but its nominative argument 政府 (the government) was identified automatically. Coreference relations are represented with eq tags: words that have the same eq value belong to the same coreference cluster.
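The tag scheme described above can be read back with a few lines of Python. The following is a minimal sketch, not part of the Syncha distribution: the function names (parse_syncha, relations, coref_clusters) are hypothetical, and it assumes that columns are separated by whitespace, that tags appear as key="value" pairs at the end of a word line, and that id values are unique within one document (up to EOT). SAMPLE is an abridged excerpt of the output shown above.

```python
import re
from collections import defaultdict

# Assumption: tags are appended to a word line as key="value" pairs.
TAG_RE = re.compile(r'(\w+)="([^"]*)"')
# Chunk header lines look like "* 0 4D 0/1 0.000000".
CHUNK_RE = re.compile(r'^\* \d+ -?\d+D ')
CASES = ("ga", "o", "ni")  # nominative, accusative, dative

SAMPLE = """\
* 0 4D 0/1 0.000000
政府 名詞,一般,*,*,,,政府,セイフ,セイフ,, O id="2"
は 助詞,係助詞,*,*,,,は,ハ,ワ,, O
* 1 2D 2/3 2.292374
低 接頭詞,名詞接続,*,*,,,低,テイ,テイ,, O
所得 名詞,一般,*,*,,,所得,ショトク,ショトク,, O
者 名詞,接尾,一般,*,,,者,シャ,シャ,, O id="1"
を 助詞,格助詞,一般,*,,,を,ヲ,ヲ,, O
* 2 3D 1/1 1.096540
支援 名詞,サ変接続,*,*,,,支援,シエン,シエン,, O
する 動詞,自立,*,*,サ変・スル,基本形,する,スル,スル,, O ga="2" o="1" type="pred"
* 3 4D 0/1 0.000000
計画 名詞,サ変接続,*,*,,,計画,ケイカク,ケイカク,, O ga="2" id="3" type="noun"
を 助詞,格助詞,一般,*,,,を,ヲ,ヲ,, O
EOS
EOT
"""


def parse_syncha(text):
    """Return one dict per word line: surface form, sentence index, tags."""
    words = []
    sentence = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "EOT":
            continue
        if line == "EOS":          # end of sentence
            sentence += 1
            continue
        if CHUNK_RE.match(line):   # chunk header, carries no tags
            continue
        surface = line.split(None, 1)[0]
        tags = dict(TAG_RE.findall(line))
        words.append({"surface": surface, "sent": sentence, "tags": tags})
    return words


def relations(words):
    """List (predicate head, type, case, argument head) tuples."""
    by_id = {w["tags"]["id"]: w for w in words if "id" in w["tags"]}
    found = []
    for w in words:
        if "type" not in w["tags"]:
            continue
        for case in CASES:
            ref = w["tags"].get(case)
            if ref in by_id:  # skip values with no matching id
                found.append((w["surface"], w["tags"]["type"],
                              case, by_id[ref]["surface"]))
    return found


def coref_clusters(words):
    """Group surface forms by eq value (one list per coreference cluster)."""
    clusters = defaultdict(list)
    for w in words:
        if "eq" in w["tags"]:
            clusters[w["tags"]["eq"]].append(w["surface"])
    return dict(clusters)


if __name__ == "__main__":
    parsed = parse_syncha(SAMPLE)
    for rel in relations(parsed):
        print(rel)
    print(coref_clusters(parsed))
```

Because type="pred" sits on the head word line, the predicate 支援する is reported through its head word する. For the excerpt, the sketch links する to 政府 (ga) and 者 (o), and the verbal noun 計画 to 政府 (ga). The excerpt contains no eq tags, so no coreference clusters are returned.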

Reference

- Ryu Iida, Massimo Poesio. A Cross-Lingual ILP Solution to Zero Anaphora Resolution. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), pp. 804–813, 2011.

- Ryu Iida, Kentaro Inui, Yuji Matsumoto. Capturing Salience with a Trainable Cache Model for Zero-anaphora Resolution. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL-IJCNLP 2009), pp. 647–655, 2009.

- Ryu Iida, Mamoru Komachi, Kentaro Inui, Yuji Matsumoto. Annotating a Japanese Text Corpus with Predicate-Argument and Coreference Relations. In Proceedings of the ACL Workshop on the Linguistic Annotation Workshop (LAW), pp. 132–139, 2007.