Abstract
Can advances in NLP techniques help advances in cognitive modeling? A growing number of recent papers seem to suggest so (Elsner et al., 2019; Linzen and Baroni, 2021; Baroni, 2021). This project focuses on replicating results from Kirov & Cotterell (2018), a study of the learnability of linguistic rules in the context of past tense verbal inflection.
Description
In this project, you’ll replicate and extend an experiment by Kirov & Cotterell (2018) aimed at showing the utility of modern NNs for the cognitive modeling of linguistic phenomena.
Briefly, the paper revisits the classic work by Rumelhart and McClelland (1986), in which a neural architecture was first trained to transduce English verb stems into their past tense forms (e.g. look→looked, go→went). That study and its critique (Pinker and Prince, 1988) sparked a famous debate on the crucial question of whether neural networks learn and use “rules”.
About 30 years later, Kirov & Cotterell (2018) bring substantial new evidence to the debate by showing that an LSTM encoder-decoder architecture with an attention mechanism (i) learns both regular and irregular past tense formation in the observed data very accurately, (ii) generalizes reasonably well to held-out verbs, and (iii) displays a developmental pattern that has also been observed in children (i.e. an oscillating learning curve for irregular verbs).
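For concreteness, the sketch below shows the general kind of model involved: an LSTM encoder-decoder with dot-product attention that transduces one symbol sequence into another. This is not the authors' implementation (their code is linked under Materials); the dimensions, vocabulary handling, and loss computation are illustrative placeholders.

```python
# Minimal sketch (not the paper's code): an LSTM encoder-decoder with
# dot-product attention for symbol-level transduction, in PyTorch.
import torch
import torch.nn as nn


class Seq2SeqWithAttention(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, src, tgt):
        # src, tgt: (batch, seq_len) tensors of symbol indices
        enc_out, state = self.encoder(self.embed(src))          # (B, S, H)
        dec_out, _ = self.decoder(self.embed(tgt), state)       # (B, T, H)
        # Dot-product attention of each decoder state over encoder states.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))    # (B, T, S)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_out)                   # (B, T, H)
        return self.out(torch.cat([dec_out, context], dim=-1))  # (B, T, V)


# Toy usage with teacher forcing; the random indices stand in for a real
# symbol vocabulary (phonemes plus special start/end symbols).
model = Seq2SeqWithAttention(vocab_size=40)
src = torch.randint(0, 40, (8, 10))   # e.g. phoneme sequence of the stem
tgt = torch.randint(0, 40, (8, 12))   # e.g. phoneme sequence of the past form
logits = model(src, tgt[:, :-1])
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 40), tgt[:, 1:].reshape(-1))
loss.backward()
```

In the actual experiments the input and output symbols would be phonemes, and test-time decoding would proceed greedily or with beam search rather than with teacher forcing.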
Your main goal is to replicate the results on past tense verbal inflection from Kirov & Cotterell (2018) (that is, experiments 1 and 2, for which code and data are available), and to extend the work in the research direction(s) of your preference:
Ideas for research directions:
LSTMs process sequences in a left-to-right fashion, and because of this they have been considered more “principled” than Transformer models for the processing of natural language. This did not prevent Transformers from replacing recurrent architectures and becoming ubiquitous in NLP. Do the insights of Kirov & Cotterell (2018) hold when using Transformer-based models? (A minimal Transformer sketch is given after this list.)
How does the amount of available training data affect the learning of irregular past tense forms? Is there a threshold in the amount of data, or in the ratio between regular and irregular forms, beyond which the learning process is hindered? (A data-subsampling sketch is given after this list.)
Kirov & Cotterell (2018) trained their models on the verbs’ phonetic transcription (phoneme sequences) rather than on their orthographic form (grapheme sequences) to simulate the natural process of acquiring language from speech input. Do models trained directly on graphemes perform differently, and why?
[Challenge 🏆] Replicate the experiments on a different language (any language with some inflectional irregularity, e.g. Dutch or Italian, is a good candidate). This involves: (1) creating a lexicon with phonetic transcriptions (IPA) starting from existing resources, and (2) semi-automatically marking irregular vs. regular forms. Useful tools and resources are provided below.
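For research direction 1, the sketch below shows a Transformer-based alternative built with PyTorch's nn.Transformer rather than any toolkit from the paper; the dimensions are illustrative, and positional encodings are omitted for brevity but should be added in real experiments.

```python
# Sketch for research direction 1 (not from the paper): the same transduction
# task with a small Transformer encoder-decoder instead of the LSTM model.
import torch
import torch.nn as nn


class TransformerTransducer(nn.Module):
    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # NB: positional encodings are omitted here for brevity; add them
        # (learned or sinusoidal) for real experiments.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=256, batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt):
        # Causal mask so each target position only attends to earlier ones.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(self.embed(src), self.embed(tgt),
                                  tgt_mask=tgt_mask)
        return self.out(hidden)


model = TransformerTransducer(vocab_size=40)
src = torch.randint(0, 40, (8, 10))
tgt = torch.randint(0, 40, (8, 12))
logits = model(src, tgt[:, :-1])   # teacher forcing, as in the LSTM sketch
```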
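For research direction 2, one simple way to control the regular/irregular ratio is to subsample the training lexicon. The sketch below assumes a hypothetical three-column TSV (stem, past form, regularity tag) and a hypothetical file name; adapt the reader to the actual format of the data you use.

```python
# Sketch for research direction 2: build training sets with a controlled
# regular/irregular ratio. The three-column TSV format and the file name
# "verbs.tsv" are hypothetical placeholders.
import csv
import random


def subsample(path, n_irregular, regular_ratio, seed=0):
    """Return n_irregular irregular pairs plus regular_ratio times as many regulars."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter="\t"))
    regulars = [r for r in rows if r[2] == "regular"]
    irregulars = [r for r in rows if r[2] == "irregular"]
    rng = random.Random(seed)
    sample = rng.sample(irregulars, n_irregular) \
        + rng.sample(regulars, int(n_irregular * regular_ratio))
    rng.shuffle(sample)
    return sample


# e.g. training sets where regulars outnumber irregulars 1:1, 5:1, 20:1
for ratio in (1, 5, 20):
    train = subsample("verbs.tsv", n_irregular=100, regular_ratio=ratio)
```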
Materials
Paper repository: https://github.com/ckirov/RevisitPinkerAndPrince. Note: You are free to use another NN toolkit for your experiments, as long as you report it.
Dutch data from the Sigmorphon 2020 Task 0: “Typologically Diverse Morphological Inflection”: https://github.com/sigmorphon2020
Tool to mine IPA transcriptions: https://github.com/CUNY-CL/wikipron (example usage, including for Dutch: https://sigmorphon.github.io/sharedtasks/2020/task1); a short usage sketch is given after this list.
Non-neural inflection baseline (may be useful for silver-labeling regular forms in a new language): https://aclanthology.org/K17-2001.pdf (Section 5); code: https://github.com/sigmorphon2020/task0-baselines/tree/master/nonneural
Italian data: https://github.com/sigmorphon/2019/tree/master/task1
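As an example of how WikiPron (listed above) could be used to build a lexicon for the challenge direction, the sketch below scrapes Dutch word-pronunciation pairs via the package's Python API. The call names follow the project's documented usage but may differ across versions, and the output file name is a placeholder; check the repository before relying on this.

```python
# Sketch: mine Dutch IPA transcriptions from Wiktionary with WikiPron.
# Based on WikiPron's documented Python API; verify options (e.g. dialect
# or casefolding settings) against the current repository documentation.
import wikipron

config = wikipron.Config(key="nld")            # "nld" = Dutch (ISO 639-3)
with open("nld_lexicon.tsv", "w", encoding="utf-8") as out:  # placeholder path
    for word, pron in wikipron.scrape(config):
        out.write(f"{word}\t{pron}\n")         # word <TAB> IPA transcription
```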
References
Kirov, C., & Cotterell, R. (2018). Recurrent neural networks in linguistic theory: Revisiting Pinker and Prince (1988) and the past tense debate. Transactions of the Association for Computational Linguistics, 6, 651–665.
Linzen, T., & Baroni, M. (2021). Syntactic structure from deep learning. Annual Review of Linguistics, 7, 195–212.
Baroni, M. (2021). On the proper role of linguistically-oriented deep net analysis in linguistic theorizing. arXiv preprint arXiv:2106.08694.
Elsner, M., Sims, A. D., Erdmann, A., Hernandez, A., Jaffe, E., Jin, L., ... & Stevens-Guille, S. (2019). Modeling morphological learning, typology, and change: What can the neural sequence-to-sequence framework contribute? Journal of Language Modelling, 7.
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition (Vol. 2, pp. 216–271). MIT Press.
Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28(1), 73–193.