Computational Psycholinguistics Tokyo (CPT)

Welcome! Computational Psycholinguistics Tokyo (CPT) is a platform where computational psycholinguists in the Tokyo area exchange and discuss any ideas related to computational approaches to natural languages in the human mind/brain.

Organizer:

Yohei Oseki (University of Tokyo/RIKEN AIP)

Email address:

comp.psycholing.tokyo 😄 gmail.com

Sponsors:

Japan Science and Technology Agency (JST) Precursory Research for Embryonic Science and Technology (PRESTO) "Building machines that process natural language like humans" (PI: Yohei Oseki).

Schedule

Upcoming events

Can modern LMs be truly polyglot? Language learnability and inequalities in NLP

Time: May 1 (Wednesday) 10:30~12:00, 2024

Place: Hybrid (University of Tokyo, Komaba Campus & Zoom)

Speaker: Arianna Bisazza (University of Groningen)

Abstract: Despite their impressive advances, modern Language Models (LMs) are still far from reaching language equality, i.e. comparable performance in all languages. The uneven amount of data available in different languages is often recognized as the main culprit. However, another obstacle to language equality is posed by the observation that some languages are intrinsically more difficult to model than others by modern LM architectures, even when training data size is controlled for. In this talk, I will present evidence supporting this observation, coming from different tasks and different evaluation methodologies (e.g. using natural versus synthetic languages). I will then argue for the usefulness of artificial languages to unravel the complex interplay between language properties and learnability by neural networks. Finally, I will provide an outlook on my upcoming project aimed at improving language modeling for (low-resource) morphologically rich languages, taking inspiration from child language acquisition.

Past events

The temporal structure of syntax

Time: July 28 (Friday) 10:30~12:00, 2023

Place: Hybrid (University of Tokyo, Komaba Campus & Zoom)

Speaker: Chia-Wen Lo (Max Planck Institute for Human Cognitive and Brain Sciences)

Abstract: Human language is compositional; language users create unbounded and novel phrases and sentences from a finite number of words. This compositional ability is highly structured; words must be combined according to syntactic rules to yield well-formed and interpretable phrases and sentences. Although many studies have provided neural evidence for when and where compositional processing takes place, how it is actually implemented in neural circuits remains largely underspecified. My research investigates the role of low-frequency neural activity in carrying out composition. In this talk, I will discuss the functional interpretation of delta-band frequencies (<4 Hertz), the temporal constraints of information processing in low-frequency neural activity, and the reconciliation between active segmentation and the processing of complex structures. Several ongoing projects and future directions will be discussed.

An introduction to computational psychiatry

Time: July 14 (Friday) 15:30~17:00, 2023

Place: Hybrid (University of Tokyo, Komaba Campus & Zoom)

Speaker: Tsukasa Okimura (Showa University, Medical Institute of Developmental Disabilities Research)

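Abstract: Psychiatry is the branch of medicine that deals with disorders of the mind and, at the same time, with dysfunction of the brain, which is regarded as the main substrate of the mind. Clinical features of mental disorders and biological findings about them have accumulated for more than 100 years, yet integrating the two remains difficult even today. As one possible solution, computational psychiatry, which applies a computational approach, has attracted attention in recent years. The computational approach treats the brain as a generative model that processes information with some purpose while interacting with the environment, and analyzes the brain from a macroscopic perspective. Computational psychiatry builds such generative models and models of their impairment, aiming to offer new theoretical insights into the integration of clinical symptoms and brain dysfunction. In this talk, I will give an overview of computational psychiatry. In addition, because language is very closely tied to mental disorders in both psychiatric research and clinical practice, I will present representative impairments of language function and thought in mental disorders, in the hope of stimulating a discussion that creates synergy with computational psycholinguistics.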

Visualizing semantic information in the brain using natural language processing, and its applications

Time: May 22 (Monday) 10:30~12:00, 2023

Place: Hybrid (University of Tokyo, Komaba Campus & Zoom)

Speaker: Satoshi Nishida (National Institute of Information and Communications Technology, NICT)

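Abstract: Experiences in the everyday world contain complex and diverse information. Our brains flexibly find perceptual and cognitive groupings in such information and assign meaning to them. Visualizing the semantic information represented in the brain should therefore help us understand how we segment, structure, and interpret the world. With the aim of visualizing such semantic information in the brain, our research group has been working on methods that model brain activity, measured with fMRI while participants watch diverse videos, using feature representations from natural language processing. Using this method, we quantified semantic representations in the brains of individuals and succeeded in detecting abnormalities in the brain's semantic representations in a population showing atypical semantic cognition. We also applied the method to brain decoding techniques that visualize the semantic content perceived for diverse videos. Furthermore, we are using the modeling method to develop techniques for integrating brain information into artificial intelligence. With this technique, we have also succeeded in making the behavior of existing natural language processing models more human-like. In this talk, I will present this series of research results.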

No free lunch: Why computational learning theory matters for language acquisition

Time: April 21 (Friday) 15:30~17:30, 2023

Place: Hybrid (University of Tokyo, Komaba Campus & Zoom)

Speaker: Adam Jardine (Rutgers University)

Abstract: In the age of ChatGPT and other large language models, it is tempting to conclude that the success of brute-force statistical techniques obviates the need to posit domain-specific learning mechanisms in humans, such as that provided by Universal Grammar. On the contrary, results in formal learning theory--the mathematical study of the logic of learning--show that *any* successful learner must 1) restrict the space of hypotheses it is willing to consider; 2) make assumptions about how data is being presented to it; or 3) both of these. In this talk, I take us through the major results that demonstrate this, starting with the seminal work of Gold (1967). I argue that not only does this provide ample motivation for a Universal Grammar, but that using these results as a springboard to study the biases of human language learners provides informative hypotheses for language acquisition. To this end, I also briefly outline the research of myself and my students along these lines.

Coordinating on meaning in communication

Time: December 16 (Friday) 15:30~17:30, 2022

Place: Hybrid (University of Tokyo, Komaba Campus & Zoom)

Speaker: Robert Hawkins (Princeton University)

Abstract: Language use differs dramatically from context to context (and user to user). To some degree, recent large language models like GPT-3 are able to account for such contextual effects by conditioning on a string of previous input text, or prompt. Yet prompting is ineffective when contexts are sparse, out-of-sample, or extra-textual; for instance, accounting for when and where the text was produced or who produced it. In this talk, I'll share two projects that aim to build more flexible and adaptive language models. First, I'll introduce a Bayesian framework for understanding adaptation as a form of meta-learning and a simple implementation of this idea for neural language models that can adapt to individual users in real-time interactions. Second, to address inefficiencies of this framework, we introduce the mixed-effects transformer (MET), a novel approach for learning prefixes --- lightweight modules prepended to the input --- to account for hierarchically-structured variation. Specifically, we show how the popular class of mixed-effects models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. This work begins to lay a foundation for an approach to communication rooted not simply in transmission, as in classical formulations, but continual learning and adaptation over multiple timescales.

Connectionist Reading Group #2

Time: November 29 (Tuesday) 15:00~17:00, 2022

Place: University of Tokyo, Komaba Campus

Paper #1: Smolensky (1988a) On the proper treatment of connectionism. Behavioral and Brain Sciences 11, 1-23.

https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/abs/on-the-proper-treatment-of-connectionism/4B8871A82A932DB96D183AAC9C0CF037

Paper #2: Smolensky (1988b) The constituent structure of connectionist mental states: A reply to Fodor and Pylyshyn. The Southern Journal of Philosophy 26, 137-161.

https://onlinelibrary.wiley.com/doi/10.1111/j.2041-6962.1988.tb00470.x

Connectionist Reading Group #1

Time: October 25 (Tuesday) 15:00~17:00, 2022

Place: University of Tokyo, Komaba Campus

Paper: Fodor & Pylyshyn (1988) Connectionism and cognitive architecture: A critical analysis. Cognition 28, 3-71.

https://www.sciencedirect.com/science/article/pii/0010027788900315

A cross-constructional comparison of the acquisition of English filler-gap dependencies by LSTM language models

Time: March 19 (Thursday) 13:00~15:00, 2022

Place: Zoom

Speaker: Satoru Ozaki (Carnegie Mellon University)

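Abstract: The acquisition of English filler-gap dependencies by LSTM language models has been attracting increasing attention in recent years (Chowdhury and Zamparelli 2018, 2019; Wilcox et al. 2018, 2019, et seq.; Da Costa and Chaves 2020; Chaves 2020). In particular, Wilcox et al. 2018 used a 2x2 design to show that two LSTM language models can learn the syntactic properties of embedded wh-questions and several island constraints. In this study, I first (1) revisit the data of Wilcox et al. 2018 and show that their language models perform better than they expected. Next, (2) I prepare data for five types of English filler-gap dependencies, including embedded wh-questions, and test whether the models of Wilcox et al. 2018 show the same acquisition results regardless of construction type. Given the finding that acquisition differs across constructions, I finally (3) test the hypothesis that the corpus frequency of a construction affects its acquisition. In the WSJ corpus of the Penn Treebank, construction frequency correlates with acquisition by the language models, but because the number of constructions used in the experiments is limited, this cannot be taken as a strong conclusion. In future work, I would like to examine the acquisition of filler-gap dependencies in other languages as well.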

The concept and implementation of CPSYMAP, a database of computational psychiatry papers

Time: February 24 (Wednesday) 10:30~12:30, 2021

Place: Zoom

Speaker: Ayaka Kato (University of Tokyo & RIKEN CBS)

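Abstract: In recent years, the field of computational psychiatry, which attempts to understand mental disorders using computational approaches, has attracted attention, and a growing number of researchers are interested in it. At the same time, computational psychiatry is an interdisciplinary area combining different types of fields, namely psychiatry, neuroscience, and mathematical modeling, so it is not easy to learn comprehensively about fields other than one's own or to organize previous research. We therefore built CPSYMAP, a database that lets users visually organize computational psychiatry papers on a two-dimensional map along the axes of psychiatry, neuroscience, and mathematical models. CPSYMAP (https://ncnp-cpsy-rmap.web.app/) is available to anyone as a web application and also has a community-science aspect. In this talk, I will discuss the concept of CPSYMAP and how it came to be implemented.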

Modeling thought and language with the cognitive architecture ACT-R

Time: December 25 (Friday) 13:00~15:00, 2020

Place: Zoom

Speaker: Junya Morita (Shizuoka University)

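Abstract: In this tutorial, I first introduce the ideas behind cognitive architectures as embodied in ACT-R, and then briefly introduce the current implementation of ACT-R. I then present a variety of models of language and thought assembled from the basic components implemented in ACT-R. Through the tutorial, I aim to show how ACT-R currently represents cognitive systems and how it is applied, and to sketch future prospects.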

Modeling human sentence processing with Recurrent Neural Network Grammars

Time: October 2 (Friday) 13:00~15:00, 2020

Place: Zoom

Speaker: Ryo Yoshida (University of Tokyo)

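Abstract: In recent years, many high-performance neural network language models have been proposed in natural language processing. However, typical neural language models process a sentence as a sequence of words without hierarchical structure, and their validity as models of human sentence processing is unclear. On the other hand, there are neural language models that take the hierarchical structure of natural language into account: Recurrent Neural Network Grammars (Dyer et al., 2016), a generative model of phrase structure, achieve high accuracy on both phrase-structure parsing and language modeling. These accuracies are based on engineering-oriented evaluation metrics, whereas this study evaluates neural language models with a cognitive-science-oriented metric. Specifically, based on surprisal theory (Hale, 2001), we evaluate how well the unpredictability (surprisal) of words and bunsetsu (phrasal units) estimated by a language model explains human reading times. The experiments confirmed that surprisal from neural language models that take hierarchical structure into account is more useful for modeling reading times than surprisal from neural language models that do not. This can be taken to show that models that take hierarchical structure into account are more valid as models of human sentence processing than models that do not.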
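
As a rough illustration of the surprisal measure mentioned in the abstract above, the following is a minimal sketch that assumes a toy bigram model with add-one smoothing rather than the neural language models used in the talk. The corpus, the function names, and the smoothing choice are all hypothetical and serve only to show how surprisal, -log2 P(word | context), is computed.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences (hypothetical toy model)."""
    contexts, bigrams = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent  # <s> marks the start of a sentence
        vocab.update(sent)
        for prev, word in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
    return contexts, bigrams, vocab


def surprisal(prev, word, contexts, bigrams, vocab):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing."""
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return -math.log2(p)


# Toy corpus (an assumption for illustration, not data from the talk).
corpus = [
    ["the", "dog", "barks"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "sleeps"],
]
contexts, bigrams, vocab = train_bigram(corpus)

# "dog" follows "the" more often than "cat" does, so it is less surprising.
s_dog = surprisal("the", "dog", contexts, bigrams, vocab)
s_cat = surprisal("the", "cat", contexts, bigrams, vocab)
print(f"surprisal(dog | the) = {s_dog:.3f} bits")
print(f"surprisal(cat | the) = {s_cat:.3f} bits")
```

In a reading-time study, per-word values like these would typically enter a regression model as a predictor of reading time.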

Message-Oriented Phonology in Japanese: Word Duration and Pitch Peak

Time: January 31 (Friday) 13:00~15:00, 2020

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Daiki Hashimoto (University of Tokyo)

Abstract: It has been widely demonstrated that a word is pronounced with lower phonetic redundancy when it has higher contextual predictability. For example, when a word is predictable given a preceding word and when a word has higher contextual predictability given a following word, it is pronounced with shorter duration. Likewise, words with higher contextual predictability are produced with centralized formant values. This probability-oriented reduction is known as “probabilistic reduction.”

This phenomenon can neatly be captured by Message-Oriented Phonology (MOP). MOP hypothesizes that a speaker balances the efficiency and accuracy of message transmission. When a word is contextually predictable, it can be conveyed successfully to an addressee, so the speaker can reduce the speech signal and improve the efficiency of the message transmission. On the other hand, when a word is less predictable, the message transmission is more likely to fail, and thus the speaker needs to invest more resources in the speech signal, with the result that the phonetic redundancy is increased.

The aim of this study is to explore whether probabilistic reduction can be extended to pitch values. Most previous literature discusses probabilistic reduction in relation to word duration, so, to the best of my knowledge, this is the first study to investigate the relationship between pitch values and the contextual predictability of a word. It will be demonstrated that a word is pronounced with a higher pitch value when it is contextually less predictable. This result is consistent with MOP.

NINJAL Workshop "The Present State of Computational Linguistics"

Time: December 6 (Friday) 10:30~18:00, 2019

Place: Ochanomizu University, Faculty of Science

Organizer: Yusuke Kubota (NINJAL)

Program:

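10:30~11:00 Introduction

11:00~12:00 "An overview of research on natural language grounding" Yusuke Miyao (宮尾祐介)

12:00~13:45 Lunch break

13:45~14:45 "The frontiers of computational semantics: Theory, implementation, and verification" Daisuke Bekki (戸次大介), Koji Mineshima (峯島宏次)

14:45~15:00 Break

15:00~15:30 "Computational semantics of comparative expressions: An approach based on CCG and automated theorem proving" Izumi Haruta (春田和泉)

15:30~16:00 "An analysis of disjunctive and interrogative sentences in dependent type semantics" Tomoki Watanabe (渡邉知樹)

16:00~16:15 Break

16:15~17:15 "The computational turn in psycholinguistics" Naho Orita (折田奈甫), Yohei Oseki (大関洋平)

17:15~18:00 General discussion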

Beyond pragmatics as theoretical linguistics: On a dynamic pragmatics of expressive meaning and its geometric interpretation

Time: November 22 (Friday) 14:00~16:00, 2019

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Akitaka Yamada (Georgetown University/Surugadai University)

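Abstract: Pragmatics is (i) one area of theoretical linguistics, with interfaces to syntax and semantics. But that is not all: (ii) because it proposes models of human inference, it has a strong affinity with research on sentence and language processing and with models of (machine) learning, and (iii) because many sociological factors are involved in such inference, it is also closely connected to sociolinguistic theory. However, as these three aspects developed autonomously within their own frameworks, this potentially highly interdisciplinary field became fragmented, and almost no common ground has been prepared for discussing them in a unified way. In this talk, taking as a case study the expressive meaning of Japanese addressee honorifics (teineigo) discussed in Yamada (to appear), I propose a perspective that connects these three areas organically. First, I review the interpretations of expressive meaning based on real values that have been widely used in previous work and show that they can be interpreted geometrically as learning on a manifold. Second, I explain the Bayesian dynamic pragmatics proposed in Yamada (2019, to appear) to overcome the empirical problems of this previous work, and show that this latter model, too, can be positioned as manifold learning within a parameter space. Given that the dynamic pragmatics model proposed as (i) theoretical linguistics (ii) receives a natural interpretation as a mechanism of online dynamic human inference and (iii) can be positioned as a reinterpretation, within Bayesian statistics, of mathematical models long proposed in sociolinguistics (variation theory), I will explore with the participants the developments we can expect in pragmatics and cognitive science.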

Improving the grammatical competence of language models using ungrammatical sentences

Time: October 25 (Friday) 13:00~15:00, 2019

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Hiroshi Noji (National Institute of Advanced Industrial Science and Technology)

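Abstract: Neural language models such as RNNs can generate fluent sentences and, despite being sequence models, appear to succeed to some extent in inductively learning the syntactic structure of sentences. However, they misjudge the grammaticality of sentences with complex syntactic structure, which raises doubts about whether grammar can be acquired through purely inductive learning. In this study, we show that by explicitly using ungrammatical sentences when training a neural language model, the model can acquire high robustness with respect to specific grammatical phenomena. This method can be seen as a way of giving the model deductive knowledge about phenomena that are difficult for it to learn purely inductively.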

Summer Reading Group 2019

Time: August 30 (Friday) & September 2 (Monday) 13:00~18:00, 2019

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Topic 1: Connectionist (Facilitator: Yusuke Kubota)

Plaut (2003) "Connectionist Modeling of Language: Examples and Implications"
Pater (2019) "Generative linguistics and neural networks at 60: foundation, friction, and fusion"

Topic 2: Bayesian (Facilitator: Naho Orita)

Tenenbaum and Griffiths (2001) "Generalization, similarity, and Bayesian inference"
Pearl (2018) "Modeling syntactic acquisition"

Topic 3: ACT-R (Facilitator: Hiroshi Noji)

Anderson et al. (2004) "An Integrated Theory of the Mind"
Lewis et al. (2006) "Computational principles of working memory in sentence comprehension"

Topic 4: Parsing (Facilitator: Yohei Oseki)

Hale (2018) "Models of Human Sentence Comprehension in Computational Psycholinguistics"
Hunter (2018) "Formal Methods in Experimental Syntax"

Recognizing textual entailment based on insights from natural language processing and formal semantics

Time: July 19 (Friday) 13:00~15:00, 2019

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Hitomi Yanaka (RIKEN AIP)

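Abstract: Recognizing textual entailment is the task of automatically determining whether a premise sentence entails the meaning of a hypothesis sentence, and it is being actively studied as a step toward natural language understanding by computers. In recent years, many methods that solve textual entailment with high accuracy using neural networks have been proposed in natural language processing. However, it is not at all clear how far deep learning methods can represent the meaning of natural language. Meanwhile, research in formal semantics has developed unified analyses of a wide range of semantic phenomena in natural language. In this talk, I will present my work to date on methods for recognizing textual entailment that combine the insights accumulated in natural language processing and formal semantics.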

Linguistic Society of Japan Workshop "Computational Psycholinguistics: Overview and Prospects"

Time: June 23 (Sunday) 10:00~12:00, 2019

Place: Hitotsubashi University, Kunitachi Campus

Organizers: Naho Orita (Tokyo University of Science) & Yohei Oseki (Waseda University & RIKEN AIP)

Speakers: Yohei Oseki (Waseda University & RIKEN AIP), Douglas Roland (Waseda University), Yusuke Kubota (NINJAL), Naho Orita (Tokyo University of Science), Yuichiro Matsubayashi (Tohoku University)

Commentator: Shigeto Kawahara (Keio University)

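Abstract: Computational psycholinguistics takes a computational approach to psycholinguistic questions about the mechanisms of human language processing and acquisition. It is characterized by consistently combining the construction of explicit hypotheses using mathematical models with the quantitative testing of those hypotheses using computers, and it has yielded new insights into various problems such as language acquisition, sentence and semantic processing, and speech perception. With the recent rapid progress in statistical learning models and language resources, further developments are expected in this field. However, the field has a relatively short history and can hardly be said to be sufficiently recognized and understood, especially within the linguistics community in Japan. This workshop surveys computational psycholinguistic research on problems in morphology, syntax, semantics, and discourse, and aims to deepen understanding of computational psycholinguistics while clarifying its connections to and differences from the neighboring fields of corpus linguistics, experimental linguistics, and natural language processing.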

Evaluating machine reading comprehension tasks from a benchmarking perspective

Time: June 21 (Friday) 13:00~15:00, 2019

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Saku Sugawara (University of Tokyo)

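Abstract: One of the tasks used to evaluate language understanding systems in natural language processing is machine reading comprehension. While systems achieve accuracy on par with humans on some well-known datasets, there have been few analyses of which aspects of reading comprehension the datasets themselves evaluate. In this talk, I will present my research to date on an approach that analyzes existing datasets in terms of the abilities required for reading comprehension.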

What can psycholinguistics and deep learning contribute to each other?

Time: May 24 (Friday) 11:00~12:00, 2019

Place: Meeting Room #1, 1F, Building 55N, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Tal Linzen (Johns Hopkins University)

Abstract: Deep learning systems with minimal or no explicit linguistic structure have recently proved to be surprisingly successful in language technologies. In this talk, I'll discuss ways in which psycholinguistics -- in particular, the study of humans' acquisition and comprehension of syntax -- can help guide these developments and simultaneously benefit from them. Illustrating one direction of this relationship, I will show how theories and experimental methods from psycholinguistics can be instrumental in identifying the remaining limitations of existing models and improving those models further. In the other direction, neural networks can be used to address classic questions in linguistics and psycholinguistics, in particular by (1) providing a platform for testing for the necessity and sufficiency of explicit structural biases in the acquisition of syntactic transformations, and (2) providing highly accurate and syntactically informed estimates of word predictability, which can serve to test theories that ascribe a central role to predictability in explaining human sentence processing.

Subregular morphology: Structures, grammars, and learning

Time: May 24 (Friday) 12:00~13:00, 2019

Place: Meeting Room #1, 1F, Building 55N, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Jonathan Rawski (Stony Brook University)

Abstract: This talk overviews recent advances in the computational nature of morphology. I will analyze the data structures and grammar expressivity necessary to characterize morphotactics, concatenative and nonconcatenative transformations, and provably efficient learning algorithms for these structured classes of grammars. I show how tradeoffs in data structures point to a unified view of memory requirements for morphological grammars, providing a clear way to test the cognitive representations characteristic of human morphological knowledge.

How well do neural NLP systems generalize?

Time: May 24 (Friday) 16:00~17:30, 2019

Place: 〒103-0027 Nihonbashi 1-chome Mitsui Building, 15th floor, 1-4-1 Nihonbashi, Chuo-ku, Tokyo

Speaker: Tal Linzen (Johns Hopkins University)

Abstract: Neural networks have rapidly become central to NLP systems. While such systems perform well on typical test set examples, their generalization abilities are often poorly understood. In this talk, I will demonstrate how experimental paradigms from psycholinguistics can help us characterize the gaps between the abilities of neural systems and those of humans, by focusing on interpretable axes of generalization from the training set rather than on average test set performance. I will show that recurrent neural network (RNN) language models are able to process syntactic dependencies in typical sentences with considerable success, but when evaluated on more complex syntactically controlled materials, their error rate increases sharply. Likewise, neural systems trained to perform natural language inference generalize much more poorly than their test set performance would suggest. Finally, I will discuss a novel method for measuring compositionality in neural network representations; using this method, we show that the sentence representations acquired by neural natural language inference systems are not fully compositional, in line with their limited generalization abilities.

On the omission of verb arguments and case particles in Japanese picture books

Time: April 5 (Friday) 13:00~15:00, 2019

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Naho Orita (Tokyo University of Science)

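Abstract: Japanese-speaking children have been reported to infer the meaning of unknown verbs using the number of arguments and case particles as cues (Matsuo et al. 2012, Suzuki and Kobayashi 2017). However, these cues are frequently omitted in child-directed speech (Rispoli 1995, Matsuo et al. 2012). This study analyzes children's picture books to reexamine this gap. In this talk, I report the distributions of verbs and their arguments, case particles, animacy, visual information, and so on in the Japanese picture book predicate-argument structure corpus that we built. Compared with child-directed speech, arguments and case particles were omitted less often in picture books, and this information significantly affected the prediction of verb transitivity. Based on the results of the analysis, I will also discuss the possibility that non-linguistic information, namely animacy and the number of candidate referents shown in the pictures, serves as a cue for inferring the presence of argument omission.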

Computational psycholinguistic research on language processing using large-scale corpora

Time: February 28 (Thursday) 13:00~15:00, 2019

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Yohei Oseki (Waseda University & RIKEN AIP)

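Abstract: This research aims to uncover the computational basis of human language processing by comparing and evaluating, via evaluation metrics proposed in information theory, the eye-movement data in BCCWJ-EyeTracking, which annotates BCCWJ through eye-tracking experiments, against computational models trained on the phrase-structure annotations in NPCMJ. Furthermore, we will newly develop BCCWJ-EEG, which adds annotations from EEG experiments, and by comparing and evaluating the EEG data it contains against computational models, we aim to clarify the computational and neural basis of language processing.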

Syntactically and semantically parsed corpora and their use in linguistic research

Time: November 9 (Friday) 15:00~17:00, 2018

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Ayaka Suzuki (National Institute for Japanese Language & Linguistics)

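Abstract: I give an overview of the NINJAL Parsed Corpus of Modern Japanese (NPCMJ), a corpus annotated with syntactic and semantic parse information that is currently under development at the National Institute for Japanese Language and Linguistics, and share the advantages and challenges of using this corpus in linguistic research. Specifically, I report on the user-friendly interfaces being developed in the corpus project and on the issues that have emerged from case studies using the corpus.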

Categorial grammar and linguistic theory

Time: October 12 (Friday) 15:00~17:00, 2018

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Yusuke Kubota (Tsukuba University)

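Abstract: I give an overview of categorial grammar, keeping in mind both its role as a linguistic theory and its role as the syntactic and semantic component in computational linguistics and natural language processing research, and roughly following its historical development. After explaining the basics of AB grammar, CCG, and Lambek grammar, I outline Hybrid Type-Logical Categorial Grammar, which I have proposed in recent work, and survey its application to several linguistic phenomena.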

Estimating sentence reading times based on word embeddings

Time: September 21 (Friday) 13:00~14:30, 2018

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Masayuki Asahara (National Institute for Japanese Language & Linguistics)

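Abstract: It has been hypothesized that frequency information affects reading times in human sentence processing [Hale 2001]. For Japanese, psycholinguistics evaluates reading times in units of bunsetsu, which correspond to base phrases, whereas corpus linguistics does not count frequencies per bunsetsu but instead counts them over uniform word units such as NINJAL Short Unit Words. This mismatch of units has made it difficult to analyze the effect of frequency information on Japanese reading times. To solve this problem, we estimate reading times using word embeddings [Mikolov+ 2013]. The experiments confirmed that Skip-gram word embeddings are effective for modeling reading times. We discuss the relation between the information in these word embeddings and surprisal theory.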

Summer Reading Group 2018

Time: September 10~12 13:00~17:00, 2018

Place: Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

We will read Gallistel and King (2009) "Memory and the computational brain". If you would like to participate, please let us know at the CPT email address.

Computational psycholinguistics with automata

Time: June 29 & July 6 (Friday) 13:00~14:30, 2018

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Yohei Oseki (Waseda University & RIKEN AIP)

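Abstract: In this talk, I survey the computational models that have been used mainly in the computational psycholinguistics of sentence processing. After briefly explaining the history of psycholinguistics, I discuss formal language and automata theory, probabilistic models and parsers, evaluation metrics, and their impact on psycholinguistics and cognitive science. Finally, I present my own research, which applies insights from the computational psycholinguistics of sentence processing to lexical processing.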

Computational psycholinguistics with Bayes

Time: May 18 (Friday) 13:00~14:30, 2018

Place: Room #12, 5F, Building 51, Faculty of Science and Engineering, Nishi-waseda campus, Waseda University

Speaker: Naho Orita (Tokyo University of Science)

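Abstract: I will explain the models I have used in my computational psycholinguistic research to date. I will talk, for an audience of linguists, about the basics of Bayesian methods and their applications (topic modeling, the Rational Speech Act model), the language resources I used, how the models were implemented, and so on. I would also like to discuss with everyone the pros and cons of tackling psycholinguistic problems with computational models.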