University of Trento
In this talk, I will argue for looking at existing models and evaluations in a new light. A number of datasets in the computational linguistics literature are considered 'solved', but they are solved in ways that are radically different from human cognition. To illustrate this, I will return to the notion of semantic competence, which includes basic linguistic skills encompassing both generic knowledge and referential phenomena, in particular a) the mastery of the lexicon, b) the ability to denote, and c) the ability to model one's language use on others. Even though each of these faculties has been extensively tested individually, there is still no computational model that accounts for their joint acquisition under the conditions experienced by a human. I will concentrate on one particular aspect of this problem: the amount of linguistic data available to the child or machine. I will show that access to individuated linguistic entities and their properties -- that is, learning from denotations rather than words -- considerably speeds up the acquisition of basic semantic skills, including non-referential lexical knowledge.