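Related Projects
National Institute for Japanese Language and Linguistics (NINJAL) collaborative research project "Extending the Diachronic Corpus through an Open Co-construction Environment" (Project Collaborator, 2023 to present)
Selected Publications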
Ayuki Katayama, Yusuke Sakai, Shohei Higashiyama, Hiroki Ouchi, Ayano Takeuchi, Ryo Bando, Yuta Hashimoto, Toshinobu Ogiso, and Taro Watanabe. Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities (NLP4DH), Miami, USA, November 2024. [paper] [dataset]
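Related Projects
JSPS KAKENHI Grant-in-Aid for Scientific Research (B) "Fundamental Research on Grounding the Movement Trajectories of People in Text onto Real-World Maps and Its Applications" (Research Collaborator, 2022 to March 2025)
Selected Publications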
Aitaro Yamamoto, Hiroyuki Otomo, Hiroki Ouchi, Shohei Higashiyama, Hiroki Teranishi, Hiroyuki Shindo, and Taro Watanabe. Graph-Structured Trajectory Extraction from Travelogues. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, July 2025. [paper] [dataset]
Hibiki Nakatani, Hiroki Teranishi, Shohei Higashiyama, Yuya Sawada, Hiroki Ouchi, and Taro Watanabe. A Text Embedding Model with Contrastive Example Mining for Point-of-Interest Geocoding. In Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), pp. 7279–7291, Abu Dhabi, UAE, January 2025. [paper] [code]
Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, and Taro Watanabe. Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation. In Findings of the Association for Computational Linguistics: EACL 2024, St. Julian’s, Malta, March 2024. [paper] [dataset] [code]
Normalization of Nonstandard Typed Text ("uchi-kotoba")
Selected Publications
Shohei Higashiyama and Masao Utiyama. Comprehensive Evaluation on Lexical Normalization: Boundary-Aware Approaches for Unsegmented Languages. In Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, November 2025. (to appear) [preprint]
Shohei Higashiyama, Masao Utiyama, Taro Watanabe, and Eiichiro Sumita. A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization. In Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT), pp. 67-80, Online, November, 2021. [paper] Best Paper Award
Shohei Higashiyama, Masao Utiyama, Taro Watanabe, and Eiichiro Sumita. User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 5532-5541, Online, June 2021. [paper] [arXiv] [dataset]
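Past Research Topics
Research and Development of Multilingual Simultaneous Translation (Automatic Simultaneous Interpretation) and Language Resource Development
Related Projects
Ministry of Internal Affairs and Communications (MIC) R&D program "Research and Development of Advanced Multilingual Translation Technology" (Co-Investigator, 2020 to March 2025)
Word Segmentation for Japanese and Chinese
Selected Publications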
Shohei Higashiyama, Masao Ideuchi, Masao Utiyama, Yoshiaki Oida, and Eiichiro Sumita. A Japanese Corpus of Many Specialized Domains for Word Segmentation and Part-of-Speech Tagging. In Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems (Eval4NLP), pp. 1-10, Online, November 2022. [paper] [dataset]
Shohei Higashiyama, Masao Utiyama, Yuji Matsumoto, Taro Watanabe, and Eiichiro Sumita. Auxiliary Lexicon Word Prediction for Cross-Domain Word Segmentation. Journal of Natural Language Processing, Vol. 27, No. 3, pp. 573-598, September 2020. [paper]
Shohei Higashiyama, Masao Utiyama, Eiichiro Sumita, Masao Ideuchi, Yoshiaki Oida, Yohei Sakamoto, Isaac Okada, and Yuji Matsumoto. Character-to-Word Attention for Word Segmentation. Journal of Natural Language Processing, Vol. 27, No. 3, pp. 499-530, September 2020. [paper] Best Paper Award
Shohei Higashiyama, Masao Utiyama, Eiichiro Sumita, Masao Ideuchi, Yoshiaki Oida, Yohei Sakamoto, and Isaac Okada. Incorporating Word Attention into Character-Based Word Segmentation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 2699-2709, Minneapolis, USA, June 2019. [paper] [code]
Information Extraction and Knowledge Acquisition
Selected Publications
Shohei Higashiyama, Kunihiko Sadamasa, Takashi Onishi, and Yotaro Watanabe. Event Relation Acquisition Using Dependency Patterns and Confidence-Weighted Co-occurrence Statistics. In Proceedings of the 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), Annals of Computer Science and Information Systems, Vol. 11, pp. 339-345, Prague, Czech Republic, September 2017. [paper]
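Shohei Higashiyama, Kazuhiro Seki, and Kuniaki Uehara. 医療用語資源の語彙拡張と診療情報抽出への応用 [Vocabulary Expansion of Medical Terminology Resources and Its Application to Clinical Information Extraction]. Journal of Natural Language Processing, Vol. 22, No. 2, pp. 77-106, June 2015. [paper]
Shohei Higashiyama, Mathieu Blondel, Kazuhiro Seki, and Kuniaki Uehara. カテゴリ階層を考慮した構造化パーセプトロンによる固有表現抽出 [Named Entity Recognition Using a Structured Perceptron That Takes the Category Hierarchy into Account]. IPSJ Transactions on Mathematical Modeling and Its Applications, Vol. 6, No. 3, pp. 43-52, December 2013. [paper]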
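Other Activities
The 31st Annual Meeting of the Association for Natural Language Processing, theme session "Humanities and Language Processing" (co-proposer)
The 30th Annual Meeting of the Association for Natural Language Processing, theme session "Information Processing of Language and Geographic Space" (co-proposer) [article]
The 29th Annual Meeting of the Association for Natural Language Processing, theme session "Geospatial Information and Natural Language Processing" (co-proposer) [article]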