Cognitive Aspects of the Lexicon
Workshop co-located with COLING
Barcelona, Spain, December 12, 2020
Context and problem
The way we look at the lexicon has changed dramatically over the last few decades. Once considered a mere appendix to grammar, the lexicon has now moved to center stage. Indeed, there is hardly any task in NLP which can be conducted without it. Also, rather than being considered static entities (database view), dictionaries are now viewed as dynamic networks, akin to the human brain, whose 'nodes' and 'links' may change their weights (connection strengths) over time.
Words are important, and so is the place where they are stored (the dictionary). Since words are at the heart of many tasks, researchers are eager to find out how they are acquired, represented, and organized in the various media (books, computers, human brain). We may also wonder what a lexical resource (dictionary, thesaurus, ontology) should look like, and how it should be built in order to support encoding/decoding, or conceptualization.
Most people, including linguists, consider words to be products, i.e., holistic entities. This view is fine for practical purposes and off-line processing, like search in a dictionary or navigation in a lexical resource. However, it is not adequate if we deal with on-line processing. Word access, or word production by humans (or, more precisely, by their brains), is a process whose final products, words, have been synthesized over time (Indefrey & Levelt, 2004). Starting from meanings, the speaker activates lemmata (abstract lexical forms devoid of phonological information) and only then phonological forms: sounds, syllables, phonemes. Like all processes, word production takes time (around 300 milliseconds/word). It is done stepwise (Dell, 1986; Levelt et al., 1999), and there is no guarantee whatsoever that its output will be perfect: silence, speech errors, and tip-of-the-tongue problems (incomplete activation) are evidence to the contrary.
"Speech is normally produced at a rate of about two to three words per second" (Levelt, 1989). This is quite an achievement, given the number of words 'stored' in our brain. Indeed, the speed at which our brain is able to 'locate' a specific word within such a huge store (the entire lexicon) is intriguing. This is why many of us are interested in the mental lexicon (Aitchison, 2003), trying to understand its structure and functioning, and asking whether it could be used as a blueprint for the dictionary of tomorrow. Yet, strange as it may be, the two communities who have contributed most to the understanding of the mental lexicon (organization; access or production of words) hardly ever communicate with each other.
Actually, their research is based on two different, yet complementary viewpoints: one starts from concepts, word forms being incrementally synthesized based on meanings (Dell, 1986; Levelt et al., 1999), while the other starts from word forms (Miller et al., 1988). In the latter case, given a query word (a term coming to the author’s mind while the target is still eluding him), one checks whether any of its direct neighbors is the target and, if not, which one of them is most closely related to it.
While the first approach consists in activating the relevant nodes in a multilayered network, simulating the normal situation of word access (meaning to sound), the second consists in navigating in a lexical network (topological view), which corresponds more to off-line processing (deliberate search in a lexical resource). The questions that members of this second community are asking are the following: given some input, what are its direct neighbors, and how do they relate to the source word (input)? Put differently, this community is trying to build a map of the mental lexicon (lexical graphs or association networks).
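The navigation view can be illustrated with a minimal sketch. It assumes a toy association network in which every word, link, and weight is invented for illustration, and it models the user through two hypothetical callbacks: `is_target`, which says whether a word is the one being sought, and `closest`, which picks the candidate felt to be most closely related to it. The names `LEXICON`, `direct_neighbors`, and `navigate` are likewise assumptions, not part of any existing resource.

```python
from typing import Callable, Dict, List, Optional

# Toy association network: word -> {neighbor: association strength}.
# Words and weights are invented for illustration only.
LEXICON: Dict[str, Dict[str, float]] = {
    "coffee": {"drink": 0.6, "espresso": 0.5, "black": 0.3},
    "espresso": {"coffee": 0.5, "italy": 0.4, "mocha": 0.3},
    "mocha": {"espresso": 0.3, "chocolate": 0.6},
}


def direct_neighbors(word: str) -> List[str]:
    """Return the direct neighbors of a word, strongest association first."""
    links = LEXICON.get(word, {})
    return sorted(links, key=links.get, reverse=True)


def navigate(
    query: str,
    is_target: Callable[[str], bool],
    closest: Callable[[List[str]], str],
    max_steps: int = 10,
) -> Optional[List[str]]:
    """Walk from the query word until the user recognizes the target.

    At each step, check whether any unvisited direct neighbor is the target;
    if not, move to the neighbor the user judges closest to it.
    Returns the path taken, or None if the search fails.
    """
    current = query
    visited = {query}
    path = [query]
    for _ in range(max_steps):
        candidates = [w for w in direct_neighbors(current) if w not in visited]
        if not candidates:
            return None
        for word in candidates:
            if is_target(word):
                return path + [word]
        current = closest(candidates)
        visited.add(current)
        path.append(current)
    return None


# Simulated user: looking for "mocha", and feeling that "espresso" is close to it.
path = navigate(
    "coffee",
    is_target=lambda w: w == "mocha",
    closest=lambda cands: "espresso" if "espresso" in cands else cands[0],
)
print(path)  # ['coffee', 'espresso', 'mocha']
```

In a real resource the two callbacks would be replaced by actual user interaction, and the quality of the search would depend on how many and which kinds of links the network encodes.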
While both communities think in terms of networks, they make quite different assumptions concerning the reality of words. For the representatives of the first group, words are decomposed (meaning, form, sound), while for the second (relational view) they are holistic entities, as in a traditional dictionary, but linked in various ways: topically (Roget, 1852), semantically (Fellbaum, 1998), or via various other sorts of associations (Kiss, 1968; Schvaneveldt, 1989; Nelson, McEvoy & Schreiber, 1999; Meara, 2009).
Apparently, the two communities work on different planes (vertical/horizontal) and on different time scales. Psychologists describe how words are synthesized in real time, i.e., on-line processing, while computational linguists present a map of the ‘mental lexicon’, allowing for off-line processing (navigation). Working on an extremely small scale (generally fewer than 100 words), psychologists cannot offer us a usable resource (map), while lexicographers still have a hard time dealing with the problem of semantic input, or providing us with a more complete picture or map of the mental lexicon. To do so, they would need to consider a much larger set of associations and draw on a large and well-balanced set of corpora.
In sum, several communities are concerned with the cognitive aspects of the lexicon: not only psychologists and computational linguists/lexicographers, but also corpus linguists and specialists working on association networks (Lafourcade & Joubert, 2015; de Deyne & Storms, 2015), complex graphs (Wilks & Meara, 2002; Gaume et al., 2006; de Deyne et al., 2016; Siew et al., 2019), and their sophisticated forms of linkage, i.e., multiplexity (Stella, 2017; Castro & Stella, 2019). They could all make very valuable contributions while benefitting from each other’s work. Alas, this is not yet quite the case, which is precisely one of the reasons why we organize this kind of workshop.
Aitchison, J. (2003). Words in the Mind: an Introduction to the Mental Lexicon. Oxford: Blackwell.
Castro, N., & Stella, M. (2019). The multiplex structure of the mental lexicon influences picture naming in people with aphasia. Journal of Complex Networks, 7(6), 913-931.
de Deyne, S., Verheyen, S. & Storms, G. (2016). Structure and organization of the mental lexicon: A network approach derived from syntactic dependency relations and word associations. In Mehler, A. et al. (Eds.). Towards a theoretical framework for analyzing complex linguistic networks (pp. 47–79). Berlin: Springer.
de Deyne, S. & Storms, G. (2015). Word associations. In J. R. Taylor (Ed.), The Oxford Handbook of the Word. Oxford: Oxford University Press.
Dell, G. S. (1986). A spreading activation theory of retrieval in language production. Psychological Review, 93, 283-321.
Fellbaum, C. (Ed.) (1998). WordNet: An electronic lexical database and some of its applications. Cambridge: MIT Press.
Gaume, B., Venant, F. & Victorri, B. (2006). Hierarchy in lexical organisation of natural languages. In Pumain, D. (Ed.), Hierarchy in natural and social sciences (pp. 121-142). Springer, Dordrecht.
Indefrey, P. & Levelt, W. J. (2004). The spatial and temporal signatures of word production components. Cognition, 92(1-2), 101-144.
Kiss, G. R. (1968). Words, associations, and networks. Journal of Verbal Learning and Verbal Behavior, 7(4), 707-713.
Lafourcade, M. & Joubert, A. (2015). TOTAKI: A help for lexical access on the TOT Problem. In Gala, N., Rapp, R. & Bel-Enguix, G. (Eds.), Language Production, Cognition, and the Lexicon (pp. 95-112). Dordrecht: Springer.
Levelt, W. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.
Levelt, W., Roelofs, A. & Meyer, A. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1-75.
Meara, P. (2009). Connected words: Word associations and second language vocabulary acquisition (Vol. 24). John Benjamins Publishing.
Miller, G., Fellbaum, C., Kegl, J. & Miller, K. (1988). WordNet: An Electronic Lexical Reference System Based on Theories of Lexical Memory. Revue québécoise de linguistique, vol. 17, n° 2, pp. 181-212.
Nelson, D. L., McEvoy, C. L. & Schreiber, T. A. (1999). The University of South Florida Word Association, Rhyme and Fragment Norms. (http://w3.usf.edu/FreeAssociation/Intro.html).
Roget, P. (1852). Thesaurus of English Words and Phrases. London: Longman.
Schvaneveldt, R. (Ed.) (1989). Pathfinder Associative Networks: Studies in Knowledge Organization. Norwood, NJ.
Siew, C. S., Wulff, D. U., Beckage, N. M. & Kenett, Y. N. (2019). Cognitive Network Science: A review of research on cognition through the lens of network representations, processes, and dynamics. Complexity (published online: https://www.hindawi.com/journals/complexity/2019/2108423/)
Stella, M. (2017). Network structure and dynamics of empirical multiplex systems, Doctoral dissertation, University of Southampton.
Wilks, C. & Meara, P. (2002). Untangling word webs: Graph theory and the notion of density in second language word association networks. Second Language Research, 18(4), 303-324.
Centre National de la Recherche Scientifique
Laboratoire d'Informatique & Systèmes
The Hong Kong Polytechnic University
University of Pisa
Institute of Language Communication and the Brain