Context and problem

If goals are the motor and grammar the skeleton, then words are the fuel.

Whenever we read a book, write a letter or perform a search (web, dictionary), we use words, the expressive shorthand (linguistic form) of more or less abstract thoughts. Yet words, or lexemes, are not only vehicles for expressing thoughts; they are also a means of conceiving them. They are mediators between language and thought, allowing us to move quickly from one idea to another, summarizing, expanding or specifying possibly underspecified thoughts. Of course, words can do a lot more: they allow us to organize, memorize and access knowledge, and they can even reveal hidden meanings via information contained in the target or its surrounding words (subliminal communication). No doubt, words are important.

Lexical items are generally viewed as objects, yet when it comes to speaking, reading or writing, they are processes. Indeed, the production of words, including their retrieval, is fundamentally a knowledge-based cognitive process: to access a word, its form must be stored. In external resources (paper or electronic dictionaries) this is done holistically, word forms being represented as single tokens; in the human brain it is done in a distributed fashion, the form being decomposed into syllables and phonemes. While knowing the form is important, its storage is by no means sufficient: we may still fail to access it when needed. More importantly, when consulting a dictionary (off-line processing), other types of knowledge are used, most prominently metaknowledge and cognitive states. Metaknowledge is revealed by the fact that search is generally initiated via a direct neighbor of the target word and then continued via one of the links connecting the source and the target (synonym, hypernym, ...). Cognitive states are revealed by the information given at the onset of the search; they express the information available at that very moment. As psychologists have shown, authors always know something concerning the target, even if its concrete form is eluding them. Alas, precisely what information is available in this state varies from person to person and from moment to moment. It is unpredictable knowledge. Hence the importance of building a resource flexible enough to accommodate any of these states, allowing the user to start from anywhere and to access the target via many diverse routes.
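To make the idea of navigating from a known neighbor to an elusive target more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the toy lexicon, its relation labels and the function name find_route are hypothetical and do not describe any existing resource. The sketch treats the lexicon as a graph of labeled links and uses a breadth-first search to find a shortest route from whatever word the user starts with to the target.

```python
from collections import deque

# Hypothetical toy lexicon (illustration only): each word maps to a list of
# (relation, neighbor) links, in the spirit of synonym/hypernym links.
TOY_LEXICON = {
    "coffee": [("hypernym", "beverage"), ("association", "cup")],
    "beverage": [("hyponym", "coffee"), ("hyponym", "tea"), ("hyponym", "mocha")],
    "cup": [("association", "coffee"), ("association", "saucer")],
    "tea": [("hypernym", "beverage")],
    "mocha": [("hypernym", "beverage")],
    "saucer": [("association", "cup")],
}


def find_route(lexicon, source, target):
    """Return a shortest list of (relation, word) steps from source to target.

    Uses breadth-first search over the labeled links; returns None when the
    target cannot be reached from the source.
    """
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        word, route = queue.popleft()
        if word == target:
            return route
        for relation, neighbor in lexicon.get(word, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, route + [(relation, neighbor)]))
    return None


if __name__ == "__main__":
    # Two different starting points leading to the same target.
    print(find_route(TOY_LEXICON, "coffee", "mocha"))
    print(find_route(TOY_LEXICON, "saucer", "mocha"))
```

The point of the sketch is simply that the same target can be reached from different entry points, which is what a resource meant to accommodate varying cognitive states would have to support; a realistic resource would of course need far richer links and some ranking of the candidate routes.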

Ironically, although many of these observations sound obvious once expressed, most of them have been overlooked by the various communities dealing with the lexicon, its acquisition, usage or modeling (e.g. lexicographers and computational linguists). Nevertheless, one must admit that things have changed quite a bit, and some of these changes are so deep, fast and wide-ranging that it is sometimes hard to keep track of them and to be aware of the new problems and possibilities. As this evolution is in full swing, and in order to contribute to its dynamism, we have organized a series of workshops.

Starting at Coling-2004 (Geneva) with the workshop Enhancing and Using Electronic Dictionaries [1], there have been four follow-up events, CogALex I-IV, each co-located with Coling (2008: Manchester [2], 2010: Beijing [3], 2012: Mumbai [4], and finally, 2014: Dublin [5]). Encouraged by the enthusiasm and interest expressed by the participants of the first four CogALex events, we propose another edition of the workshop. Our goal is to provide a forum for computational lexicographers, researchers in NLP, psychologists and users of lexical resources to share their knowledge and needs concerning the construction, organization and use of a lexicon by people (lexical access) and machines (NLP, IR, data-mining). As in the past, we will invite researchers to address various unsolved problems (see below).

This time, however, we will put stronger emphasis on distributional semantics, in particular because of its relevance as a cognitive model of the lexicon. The interest in distributional approaches has grown considerably over the last few years, both in computational linguistics and in the cognitive sciences. A further boost has been provided by the recent hype around deep learning and neural embeddings. While all these approaches seem to have great potential, their added value in addressing cognitive and semantic aspects of the lexicon still needs to be shown.

In the proposed 5th edition of the CogALex workshop we intend to gain a better understanding of the mental lexicon (organization) in order to integrate these findings into lexical resources. Given recent advances in the neurosciences, it appears timely to seek inspiration from neuroscientists studying the human brain. There is also a lot to be learned from other fields studying graphs and networks, even if their object of study is something other than language, for example biology, economics or society.

As in the past, we propose a 'shared task', or rather a “friendly competition”, concerning the corpus-based identification of semantic relations. The goal of this “competition between gentlemen” is less to discover the best system than to test the relative efficiency of different distributional models and other corpus-based approaches on a challenging semantic task. (For more details, see: https://sites.google.com/site/cogalex2016/home/shared-task).

[1] Workshop proceedings (in ACL anthology): http://aclweb.org/anthology-new/W/W04/#2100

[2] Workshop proceedings (in ACL anthology): http://www.aclweb.org/anthology/W/W08/#1900

[3] Workshop proceedings (in ACL anthology): http://aclweb.org/anthology-new/W/W10/#3400

[4] Workshop proceedings (in ACL anthology): aclweb.org/anthology/W/W12/W12-5100.pdf

[5] Workshop proceedings (in ACL anthology): aclweb.org/anthology/W/W14/W14-47.pdf