Evaluation metrics in phonology

Roni Katzir (Tel Aviv University and MIT) and Ezer Rasin (MIT)

Tuesday January 24th and Thursday January 26th 2017, from 13:30 to 17:00
SFL (61 Rue Pouchet, 75017 Paris; directions are available here), room 124

For a companion presentation on learning and semantics on Wednesday January 25th, see here

We discuss the connection between the representational choices made in theoretical phonology and the inductive leaps that the child should make given the input data. To succeed in acquisition, the child must arrive at generalizations of the right level: not too inclusive, so as to avoid the so-called subset problem, but also not too exclusive, so as to avoid overfitting the data. Early work in generative phonology assumed an evaluation metric (best known as presented by Chomsky & Halle in SPE) that turned out to be overly inclusive. More recently, learning in OT has advocated a principle, related to the representational principle of Richness of the Base, that leads to overfitting. Other work avoids a direct evaluation of hypotheses altogether and relies on various heuristics that turn successful learning into an accident of the search procedure (e.g., constraint re-ranking approaches). We return to the motivation behind the early evaluation metric but with a formal variant -- due to Solomonoff (1964) and using the implementation of Minimum Description Length (Rissanen 1978; also closely related to Bayesian approaches) -- which addresses the challenge to Chomsky & Halle's original proposal. We will present this approach, which we refer to as compression-based learning, and show how it applies to both constraint-based and rule-based phonology. We present arguments in favor of compression-based learning and compare it with other approaches in the literature. In the second part, we show how compression-based learning can be used to choose between different representational hypotheses.

Part I: Compression-based learning

On any reasonable linguistic theory, speakers need to store a grammar, G, in memory, where storage follows the specifications set by UG. Speakers also use G to parse the input data D, leading to an encoding of D by G, D:G, which may also be stored in memory. Presumably, the child acquiring a language similarly stores both G and D:G for its currently hypothesized grammar at any given point. The overall storage space, |G|+|D:G|, is one of the few quantities available to the child for the evaluation of hypotheses. Using it, the child can traverse parts of the representational search space provided by UG and compare hypotheses given the data using simple considerations of compression. 
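The trade-off behind |G|+|D:G| can be illustrated with a minimal sketch. The encoding below is a toy assumption made purely for illustration (a fixed-length code over the segment alphabet plus an end-of-string delimiter) and is not the encoding used in the work presented; the function names and the two hypotheses are likewise hypothetical. One hypothesis is maximally permissive (a small G that allows any string, so each token must be spelled out in D:G), and the other memorizes the observed forms in a lexicon (a larger G, but each token in D:G is only a pointer into the lexicon).

```python
import math


def bits_per_symbol(alphabet_size):
    # Toy assumption: fixed-length code over the alphabet plus one
    # end-of-string delimiter.
    return math.log2(alphabet_size + 1)


def string_cost(word, alphabet_size):
    # Bits needed to spell out one string, including its delimiter.
    return (len(word) + 1) * bits_per_symbol(alphabet_size)


def permissive_hypothesis(data, alphabet):
    # G lists only the alphabet; every string over it is allowed,
    # so each token in D must be spelled out segment by segment.
    g = len(alphabet) * bits_per_symbol(len(alphabet))
    d_given_g = sum(string_cost(w, len(alphabet)) for w in data)
    return g, d_given_g


def lexicon_hypothesis(data, alphabet):
    # G lists the alphabet and stores every distinct observed form;
    # each token in D is then a fixed-length pointer into the lexicon.
    lexicon = set(data)
    g = len(alphabet) * bits_per_symbol(len(alphabet))
    g += sum(string_cost(w, len(alphabet)) for w in lexicon)
    pointer_bits = math.log2(len(lexicon)) if len(lexicon) > 1 else 0.0
    d_given_g = len(data) * pointer_bits
    return g, d_given_g


def total_length(hypothesis, data, alphabet):
    # The quantity being minimized: |G| + |D:G|.
    g, d_given_g = hypothesis(data, alphabet)
    return g + d_given_g


alphabet = "abkt"
small_sample = ["kat", "tak", "bat"]
large_sample = ["kat", "tak", "kat", "kat", "tak", "bat"] * 10

for name, sample in [("small", small_sample), ("large", large_sample)]:
    for hypothesis in (permissive_hypothesis, lexicon_hypothesis):
        g, d_given_g = hypothesis(sample, alphabet)
        print(name, hypothesis.__name__, round(g, 1), round(d_given_g, 1))
```

Under these toy assumptions, the permissive hypothesis has the shorter total description on the three-token sample, while the lexicon hypothesis wins once the same forms recur many times: the extra storage in G is repaid by the shorter encoding of D. A hypothesis is preferred only when the sum of the two terms is smaller, not when either term alone is.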

We will discuss the general motivation for compression-based learning (as presented in Katzir 2014) and then proceed to look in detail at the compression-based learner for OT phonology presented in Rasin & Katzir 2016. That learner focuses on the lexicon and the constraints, with the goal of modeling aspects of knowledge such as the English-speaking child's knowledge that the aspiration of the first segment in the word 'cat' is predictable, that [raiDer] is underlyingly /raiter/, and that [rai:Der] is underlyingly /raider/. The learner is the first we are aware of to succeed in obtaining such knowledge. Moreover, the generality of the evaluation metric allows us to learn additional parts of the grammar without changing our learner. We demonstrate this by learning not just the lexicon and the ranking of the constraints but also the constraints themselves. We then move on to the compression-based learner for SPE phonology presented in Rasin, Berger, & Katzir 2015. This work provides what to our knowledge is the first unsupervised learner that acquires both URs and a grammar of ordered phonological rules, including both optionality and rule interaction (both transparent and opaque), from distributional cues alone.

Part II: Learning-based theory comparison

Katzir 2014 suggests that the generality of compression-based learning and its minimal ontological commitment turn it into a tool for comparing different architectural choices in theoretical linguistics based on their predictions for learning. In particular, there will be cases in which different proposals are comparably successful in accounting for adult judgments but where their predictions for learnability and order of acquisition given compression-based learning diverge. (Under specialized learning approaches, such as Constraint Demotion, such comparisons are typically not possible.) 

We make use of the ability of compression-based learning to help the linguist choose between theories in the case of constraints on URs. In particular, and as pointed out in Rasin & Katzir 2015, 2016, we show that compression-based learning is incompatible with the OT principle of Richness of the Base: it can acquire patterns similar to English nasalization and aspiration but crucially only if it rejects Richness of the Base and employs language-specific constraints on URs (as in the Morpheme-Structure Constraints of early generative phonology). This incompatibility holds both when the learner starts with constraint schemata and has to acquire the constraints themselves as part of the learning process and when the constraints are given to the learner in advance. We use this to argue that phonological theory must allow some generalizations to be stated at the level of the lexicon and its alphabet (rather than only at the level of the constraints).

Suggested readings 

Katzir, R. (2014). A cognitively plausible model for grammar induction. Journal of Language Modelling, 2(2):213–248. 

Rasin, E. and Katzir, R. (2015). Compression-based learning for OT is incompatible with Richness of the Base. In Bui, T. and Özyıldız, D., editors, Proceedings of NELS 45, volume 2, pages 267–274. 

Rasin, E. and Katzir, R. (2016). On evaluation metrics in Optimality Theory. Linguistic Inquiry, 47(2):235–282. 

Rasin, E., Berger, I., and Katzir, R. (2015). Learning rule-based morpho-phonology. Ms., Tel Aviv University and MIT.