CogALex

Cognitive Aspects of Lexicon



Workshop co-located with COLING

Barcelona, Spain, September 13, 2020

https://www.softconf.com/coling2020/CogALex/

Workshop Description

Context and problem

The way we look at the lexicon has changed dramatically over the last few decades. Once considered an appendix to grammar, the lexicon has now moved to center stage. Indeed, there is hardly any task in NLP which can be conducted without it. Also, rather than being considered static entities (database view), dictionaries are now viewed as dynamic networks, akin to the human brain, whose 'nodes' and 'links' may change their weights (connection strengths) over time.

Words are important, and so is the place where they are stored (the dictionary). Because words are at the heart of many tasks, researchers are eager to find out how they are acquired, represented, and organized in the various media (books, computers, human brain). We may also wonder what a lexical resource (dictionary, thesaurus, ontology) should look like, and how it should be built in order to support encoding/decoding, or conceptualization.

Most people, including linguists, consider words to be products, i.e., holistic entities. This view is fine for practical purposes and off-line processing such as search in a dictionary or navigation in a lexical resource. However, it is not adequate for on-line processing. Word access, or word production by humans (or, more precisely, by their brains), is a process whose final products, words, are synthesized over time (Indefrey & Levelt, 2004). Starting from meanings, the speaker activates lemmata (abstract lexical forms devoid of phonological information) and only then phonological forms: sounds, syllables, phonemes. Like all processes, word production takes time (around 300 milliseconds/word). It is done stepwise (Dell, 1986; Levelt et al., 1999), and there is no guarantee whatsoever that its output will be perfect: silence, speech errors, and tip-of-the-tongue problems (incomplete activation) are evidence to the contrary.

"Speech is normally produced at a rate of about two to three words per second" (Levelt, 1989). This is quite an achievement, given the number of words 'stored' in our brain. Indeed, the speed at which our brain is able to 'locate' a specific word within such a huge store (the entire lexicon) is intriguing. This is why many of us are interested in the mental lexicon (Aitchison, 2003), trying to understand its structure and functioning, and asking whether it could be used as a blueprint for the dictionary of tomorrow. Yet, strange as it may be, the two communities who have contributed most to the understanding of the mental lexicon hardly ever communicate with each other. Most of the research has been conducted from two complementary viewpoints: one going from meaning to sound (Dell, 1986; Levelt et al., 1999), and the other going from some input (query) to its direct neighbors (relational or topological view). The question the latter asks is: given some input, what are its direct neighbors, and how do they relate to the source word (Miller et al., 1988)? While both communities think in terms of networks, they make quite different assumptions concerning the reality of words. For representatives of the first group, words are decomposed (meaning, form, sound), while for the second (relational view) they are holistic entities linked in various ways: topically (Roget, 1852), semantically (Fellbaum, 1998), or via various other sorts of associations (Kiss, 1968; Schvaneveldt, 1989; Nelson, McEvoy & Schreiber, 1999).

Apparently, the two communities work on different planes (vertical/horizontal) and on different time scales. Psychologists describe how words are synthesized in real time, i.e., on-line processing, while computational linguists present a map of the ‘mental lexicon’, allowing for off-line processing (navigation). Working on an extremely small scale (generally fewer than 100 words), psychologists cannot offer us a usable resource (map), while lexicographers still have a hard time dealing with the problem of semantic input, or providing a more complete picture or map of the mental lexicon. To do so, they would need to consider a much larger set of associations and draw on a large variety of corpora, i.e., a well-balanced set.

In sum, several communities are concerned with the cognitive aspects of the lexicon: not only psychologists and computational linguists/lexicographers, but also corpus linguists and specialists working on complex graphs (Siew et al., 2019; de Deyne et al., 2016; Wilks & Meara, 2002). They could all make very valuable contributions while benefitting from each other’s work. Alas, this is not yet quite the case, which is why we are organizing this workshop.

References

Aitchison, J. (2003). Words in the Mind: an Introduction to the Mental Lexicon. Oxford: Blackwell.

de Deyne, S., Verheyen, S. & Storms, G. (2016). Structure and organization of the mental lexicon: A network approach derived from syntactic dependency relations and word associations. In Mehler, A. et al. (Eds.). Towards a theoretical framework for analyzing complex linguistic networks (pp. 47–79). Berlin: Springer.

Dell, G. S. (1986). A spreading activation theory of retrieval in language production. Psychological Review, 93, 283–321.

Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database and some of its applications. Cambridge: MIT Press.

Indefrey, P., & Levelt, W. J. (2004). The spatial and temporal signatures of word production components. Cognition, 92(1-2), 101-144.

Kiss, G. R. (1968). Words, associations, and networks. Journal of Verbal Learning and Verbal Behavior, 7(4), 707-713.

Levelt, W. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.

Levelt, W., Roelofs, A. & Meyer, A. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1-75.

Meara, P. (2009). Connected words: Word associations and second language vocabulary acquisition (Vol. 24). John Benjamins Publishing.

Miller, G., Fellbaum, C., Kegl, J. & Miller, K. (1988). WordNet: An Electronic Lexical Reference System Based on Theories of Lexical Memory. Revue québécoise de linguistique, vol. 17, n° 2, pp. 181-212.

Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (1999). The University of South Florida Word Association, Rhyme and Fragment Norms. (http://w3.usf.edu/FreeAssociation/Intro.html).

Roget, P. (1852). Thesaurus of English Words and Phrases. London: Longman.

Schvaneveldt, R. (Ed.). (1989). Pathfinder Associative Networks: Studies in Knowledge Organization. Norwood, NJ.

Siew, C. S., Wulff, D. U., Beckage, N. M., & Kenett, Y. N. (2019). Cognitive Network Science: A review of research on cognition through the lens of network representations, processes, and dynamics. Complexity.

Wilks, C., & Meara, P. (2002). Untangling word webs: Graph theory and the notion of density in second language word association networks. Second Language Research, 18(4), 303-324.