Research Projects
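Research and Development of Multilingual Simultaneous Translation
Related Projects
Ministry of Internal Affairs and Communications R&D program "Research and Development on the Advancement of Multilingual Translation Technology" (co-investigator, 2020 to present)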
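Geospatial Information and Natural Language Processing
Related Projects
KAKENHI Grant-in-Aid for Scientific Research (B) "Basic Research on Grounding the Movement Trajectories of People in Text onto Real-World Maps and Its Applications" (research collaborator, 2022 to present)
National Institute for Japanese Language and Linguistics collaborative research project "Extending Diachronic Corpora through an Open Co-construction Environment" (project collaborator, 2023 to present)
The 30th Annual Meeting of the Association for Natural Language Processing, theme session "Information Processing of Language and Geographic Space" (co-proposer)
The 29th Annual Meeting of the Association for Natural Language Processing, theme session "Geospatial Information and Natural Language Processing" (co-proposer) [article]
Selected Publications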
Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, and Taro Watanabe. Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation. Findings of the Association for Computational Linguistics: EACL 2024, Malta, March, 2024. [paper] [dataset] [code]
Normalization of "Noisy" Text
Selected Publications
Shohei Higashiyama, Masao Utiyama, Taro Watanabe, and Eiichiro Sumita. A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization. In Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT), pp. 67-80, Online, November, 2021. [paper] Best Paper Award
Shohei Higashiyama, Masao Utiyama, Taro Watanabe, and Eiichiro Sumita. User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 5532-5541, Online, June 2021. [paper] [arXiv] [dataset]
Past Research Topics
Word Segmentation for Japanese and Chinese
Selected Publications
Shohei Higashiyama, Masao Ideuchi, Masao Utiyama, Yoshiaki Oida, and Eiichiro Sumita. A Japanese Corpus of Many Specialized Domains for Word Segmentation and Part-of-Speech Tagging. Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems (Eval4NLP), pp. 1-10, Online, November, 2022. [paper] [dataset]
Shohei Higashiyama, Masao Utiyama, Yuji Matsumoto, Taro Watanabe, and Eiichiro Sumita. Auxiliary Lexicon Word Prediction for Cross-Domain Word Segmentation. Journal of Natural Language Processing, Vol. 27, No. 3, pp. 573-598, September 2020. [paper]
Shohei Higashiyama, Masao Utiyama, Eiichiro Sumita, Masao Ideuchi, Yoshiaki Oida, Yohei Sakamoto, Isaac Okada, and Yuji Matsumoto. Character-to-Word Attention for Word Segmentation. Journal of Natural Language Processing, Vol. 27, No. 3, pp. 499-530, September 2020. [paper] Best Paper Award
Shohei Higashiyama, Masao Utiyama, Eiichiro Sumita, Masao Ideuchi, Yoshiaki Oida, Yohei Sakamoto, and Isaac Okada. Incorporating Word Attention into Character-Based Word Segmentation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 2699-2709, Minneapolis, USA, June 2019. [paper] [code]
Information Extraction and Knowledge Acquisition
Selected Publications
Shohei Higashiyama, Kunihiko Sadamasa, Takashi Onishi, and Yotaro Watanabe. Event Relation Acquisition Using Dependency Patterns and Confidence-Weighted Co-occurrence Statistics. In Proceedings of the 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), Annals of Computer Science and Information Systems, Vol. 11, pp. 339-345, Prague, Czech Republic, September 2017. [paper]
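Shohei Higashiyama, Kazuhiro Seki, and Kuniaki Uehara. 医療用語資源の語彙拡張と診療情報抽出への応用 (Vocabulary Expansion of Medical Term Resources and Its Application to Clinical Information Extraction; in Japanese). Journal of Natural Language Processing, Vol. 22, No. 2, pp. 77-106, June 2015. [paper]
Shohei Higashiyama, Mathieu Blondel, Kazuhiro Seki, and Kuniaki Uehara. カテゴリ階層を考慮した構造化パーセプトロンによる固有表現抽出 (Named Entity Recognition with a Structured Perceptron Considering the Category Hierarchy; in Japanese). IPSJ Transactions on Mathematical Modeling and Its Applications, Vol. 6, No. 3, pp. 43-52, December 2013. [paper]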