Abstracts of papers and posters

Keynote Speech

Gilles-Maurice de Schryver. Fine-tuning LLMs for lexicography

Oral Presentations 

Werner Winiwarter. CLAVELL - Cognitive Linguistic Annotation and Visualization Environment for Language Learning

In this paper, we introduce a novel sentence annotation method based on Radical Construction Grammar and Uniform Meaning Representation, covering multiple levels of linguistic analysis, ranging from interlinear morphemic glossing to PropBank rolesets, WordNet synsets, and Wikipedia page titles as concept identifiers. We visually enhance our annotation by using images to represent concepts, emojis for roles, and color-coding for constructions, the fundamental concept of Construction Grammar. The meaning representation is embedded into the syntactic parse by aligning all concepts with the surface tokens in the sentence. The main motivation for developing this type of representation was its use in second language acquisition as part of a Web-based language learning environment. By engaging in entertaining annotation tasks, students incrementally assemble the representation using a bottom-up strategy. Based on the language exposure from these exercises, we populate personal idiolectal constructions that represent each student's current level of second language comprehension. To showcase our system, we have implemented it for Japanese because of its soaring popularity in our language education program and the particular problems it poses to learners, especially Westerners.
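
A minimal sketch of what such a token-aligned annotation record might look like; the field names, identifier schemes, and the Japanese example are illustrative assumptions, not CLAVELL's actual schema:

```python
# Hypothetical token-aligned annotation for the Japanese sentence 猫が寝る
# ("the cat sleeps"); keys and identifier schemes are illustrative only.
annotation = [
    {"token": "猫", "gloss": "cat", "concept": "wikipedia:Cat", "role": "ARG0"},
    {"token": "が", "gloss": "NOM", "concept": None, "role": None},
    {"token": "寝る", "gloss": "sleep.NPST", "concept": "wordnet:sleep.v.01",
     "roleset": "sleep.01", "construction": "intransitive-clause"},
]
for tok in annotation:
    print(tok["token"], tok["gloss"], tok.get("concept"), tok.get("role"))
```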

Erica Biagetti, Martina Giuliani, Silvia Zampetta, Silvia Luraghi and Chiara Zanchi. Combining Neo-Structuralist and Cognitive Approaches to Semantics to Build Wordnets for Ancient Languages: Challenges and Perspectives.

This paper addresses challenges encountered in constructing lexical databases, specifically WordNets, for three ancient Indo-European languages: Ancient Greek, Latin, and Sanskrit. The difficulties partly arise from adapting concepts and methodologies designed for modern languages to the construction of lexical resources for ancient ones. A further significant challenge arises from the goal of creating WordNets that not only adhere to a neo-structuralist relational view of meaning but also integrate Cognitive Semantics notions, aiming for a more realistic representation of meaning. This integration is crucial for facilitating studies in diachronic semantics and lexicology, and representing meaning in such a nuanced manner becomes paramount when constructing language resources for theoretical research, rather than for applied tasks, as is the case with lexical resources for ancient languages. The paper delves into these challenges through a case study focused on the TEMPERATURE conceptual domain in the three languages. It outlines difficulties in distinguishing prototypical and non-prototypical senses, literal and non-literal ones, and, within non-literal meanings, between metaphorical and metonymic ones. Solutions adopted to address these challenges are presented, highlighting the necessity of achieving maximum granularity in meaning representation while maintaining a sustainable workflow for annotators.

Yuhan Xia, Qingqing Zhao, Yunfei Long, Ge Xu and Jia Wang. SensoryT5: Infusing Sensorimotor Norms into T5 for Enhanced Fine-grained Emotion Classification.

Sensory perception and emotion classification have traditionally been considered separate research domains. Yet, the significant influence of sensory experiences on emotional responses is undeniable. The natural language processing (NLP) community has often missed the opportunity to merge sensory knowledge with emotion classification. To address this gap, we propose SensoryT5, a neuro-cognitive approach that integrates sensory information into the T5 (Text-to-Text Transfer Transformer) model, designed specifically for fine-grained emotion classification. This methodology incorporates sensory cues into T5's attention mechanism, enabling a harmonious balance between contextual understanding and sensory awareness. The resulting model amplifies the richness of emotional representations. In rigorous tests across various fine-grained emotion classification datasets, SensoryT5 showcases improved performance, surpassing both the foundational T5 model and current state-of-the-art models. Notably, SensoryT5's success signifies a pivotal change in the NLP domain, highlighting the potential influence of neuro-cognitive data in refining machine learning models' emotional sensitivity.
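
The abstract does not spell out the fusion mechanism. As one hedged reading, a per-token sensory-norm vector could be projected into the model space and added to the attention keys; the PyTorch sketch below follows that assumption, and the layer structure, the 11-dimensional Lancaster-style norms, and the weight alpha are all illustrative, not the authors' implementation:

```python
# Minimal, hypothetical sketch of injecting token-level sensorimotor norms
# into an attention layer, in the spirit of (but not identical to) SensoryT5.
import torch
import torch.nn as nn

class SensoryAttention(nn.Module):
    def __init__(self, d_model: int, n_sensory: int = 11, alpha: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.sensory_proj = nn.Linear(n_sensory, d_model)  # norms -> model space
        self.alpha = alpha  # assumed fusion weight

    def forward(self, hidden: torch.Tensor, sensory: torch.Tensor) -> torch.Tensor:
        # hidden:  (batch, seq, d_model) contextual token states
        # sensory: (batch, seq, n_sensory) per-token sensorimotor ratings
        keys = hidden + self.alpha * self.sensory_proj(sensory)
        out, _ = self.attn(hidden, keys, hidden)
        return out

x = torch.randn(2, 5, 512)   # toy hidden states
s = torch.rand(2, 5, 11)     # toy Lancaster-style sensorimotor ratings
print(SensoryAttention(512)(x, s).shape)  # torch.Size([2, 5, 512])
```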

Posters

Markus J. Hofmann, Markus T. Jansen, Christoph Wigbels, Benny Briesemeister and Arthur M. Jacobs. Individual Text Corpora Predict Openness, Interests, Knowledge, and Level of Education.

Here we examine whether the personality dimension of openness to experience can be predicted from individual Google search histories. By web scraping, individual text corpora (ICs) were generated for 214 participants, with a mean of 5 million word tokens per IC. We trained word2vec models and used the similarities of each IC to label words derived from a lexical approach to personality. These IC-label-word similarities were used as predictive features in neural models. For training and validation, we relied on 179 participants and held out a test sample of 35 participants. A grid search over the number of predictive features, the number of hidden units, and the boost factor was performed. As the model selection criterion, we used R² in the validation samples, penalized by the absolute R² difference between training and validation. The selected neural model explained 35% of the openness variance in the test sample, while an ensemble model with the same architecture often provided slightly more stable predictions for intellectual interests, knowledge in the humanities, and level of education. Finally, a learning curve analysis suggested that around 500 training participants are required for generalizable predictions. We discuss ICs as a complement to or replacement of survey-based psychodiagnostics.
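
A minimal sketch of the feature-extraction step, assuming gensim's word2vec; the toy corpus and label words stand in for the real ICs and the lexical-approach openness markers:

```python
# Train word2vec on one individual text corpus (IC) and derive one
# IC-label-word similarity feature per personality label word.
from gensim.models import Word2Vec

ic_sentences = [                      # toy IC; real ICs average 5M tokens
    ["i", "love", "creative", "art", "projects"],
    ["curious", "about", "history", "museums", "and", "science"],
]
label_words = ["creative", "curious", "artistic"]  # toy openness markers

model = Word2Vec(ic_sentences, vector_size=50, min_count=1, epochs=50, seed=0)

features = []
for w in label_words:
    if w in model.wv:
        # mean similarity of the label word to all IC vocabulary items
        sims = [model.wv.similarity(w, v) for v in model.wv.index_to_key]
        features.append(sum(sims) / len(sims))
    else:
        features.append(0.0)  # label word absent from this IC's vocabulary
print(features)  # predictive features fed into the neural models
```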

Svenja Kenneweg, Brendan Balcerak Jackson, Joerg Deigmoeller, Julian Eggert and Philipp Cimiano. An Empirical Study on Vague Deictic Temporal Adverbials.

Temporal adverbial phrases such as "recently" and "some time ago" have a special function in communication and temporal cognition. These adverbials are deictic, in that their meaning is tied to their time of utterance; and they are vague, in that the time periods to which they apply are under-specified in comparison to expressions such as "yesterday", which precisely indicates the day before the day of utterance. Despite their vagueness, conversational participants have a mental image of when events described using these adverbials take place. Our study aims to quantify this mental image in terms of fuzzy or graded membership. To achieve this, we investigated the four English temporal adverbials "recently", "just", "some time ago", and "long time ago", as applied to types of events with different durations and frequencies, by conducting surveys to measure how speakers judge the different adverbials to apply in different time ranges. Our results suggest that it is possible to represent the meanings of deictic vague temporal adverbials geometrically in terms of graded membership within a temporal conceptual space.
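
A toy illustration of graded membership, with a Gaussian over log elapsed time standing in for the fitted survey-based function; the peak location and width are invented values, not the study's results:

```python
# Graded membership in [0, 1] for "recently", peaking around `peak_h` hours ago.
import math

def membership_recently(hours_ago: float, peak_h: float = 24.0, width: float = 1.2) -> float:
    d = (math.log(hours_ago) - math.log(peak_h)) / width
    return math.exp(-0.5 * d * d)

for h in (1, 24, 24 * 7, 24 * 365):
    print(f"{h:>6} hours ago -> membership {membership_recently(h):.2f}")
```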

Hani Guenoune and Mathieu Lafourcade. Symbolic Learning of Rules for Semantic Relation Types Identification in French Genitive Postnominal Prepositional Phrases.

We are interested in the semantic relations conveyed by polylexical entities in postnominal prepositional noun phrases of the form "A de B" (A of B). After identifying a relevant set of semantic relation types, we proceed, using generative AI, to build a collection of phrases for each semantic relation type identified. We propose an algorithm for creating rules that allow the selection of the relation between A and B in noun phrases of each type. These rules correspond to selecting, from a knowledge base, the appropriate neighborhood of a given term. For the phrase "désert d'Algérie", which carries the location relation, the term "désert" is identified as a geographical location and "Algérie" as a country. These constraints are used to automatically learn a set of rules for selecting the location relation for this type of example. Rules are not exclusive, as there may be instances that fall under multiple relations: in the phrase "portrait de sa mère" (the portrait of his/her mother), the depiction, possession, and producer types are all possible matches; see the sketch below.
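
A toy sketch of such type-constraint rules, with a miniature knowledge base standing in for a resource such as JeuxDeMots; the entries and rule format are illustrative assumptions:

```python
# A rule fires when its type constraints are subsets of the KB types of A and B.
KB = {"désert": {"geographical_location"}, "Algérie": {"country"},
      "portrait": {"artifact", "depiction"}, "mère": {"person"}}

RULES = [({"geographical_location"}, {"country"}, "location"),
         ({"depiction"}, {"person"}, "depiction")]

def relations(a: str, b: str) -> list[str]:
    return [rel for ta, tb, rel in RULES
            if ta <= KB.get(a, set()) and tb <= KB.get(b, set())]

print(relations("désert", "Algérie"))  # ['location']
print(relations("portrait", "mère"))   # ['depiction'] (rules are not exclusive)
```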

Špela Vintar, Mojca Brglez and Aleš Žagar. How Human-Like Are Word Associations in Generative Models? An Experiment in Slovene.

Large language models (LLMs) show extraordinary performance in a broad range of cognitive tasks, yet their capability to reproduce human semantic similarity judgements remains disputed. We report an experiment in which we fine-tune two LLMs for Slovene, a monolingual SloT5 and a multilingual mT5, as well as an mT5 for English, to generate word associations. The models are fine-tuned on human word association norms created within the Small World of Words project, which recently started to collect data for Slovene. Since our aim was to explore differences between human and model-generated outputs, the model parameters were minimally adjusted to fit the association task. We perform automatic evaluation using a set of methods to measure overlap and ranking; in addition, a subset of human- and model-generated responses was manually classified into four categories (meaning-, position-, and form-based, and erratic). Results show that the human-machine overlap is very small, but that the models produce a distribution of association categories similar to that of humans.
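
A minimal sketch of casting association generation as a text-to-text task, assuming a HuggingFace mT5 checkpoint and an invented prompt template (the actual fine-tuning data come from the Small World of Words norms):

```python
# One cue -> association training pair for mT5 (Slovene: "pes" = dog,
# "maček" = cat); fine-tuning minimizes this sequence-to-sequence loss.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

inputs = tok("asociacija: pes", return_tensors="pt")   # assumed prompt format
labels = tok("maček", return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss
print(float(loss))
```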

Irene Pagliai. Idiom Complexity in Apple-Pie Order: The Disentanglement of Decomposability and Transparency.

Both decomposability and transparency investigate the interplay between literality and figurativity in idioms. For this reason, they have often been merged. This study argues that idiom decomposability and transparency are related but conceptually different constructs, thus advocating for their distinction. Leveraging a normed lexicon of Italian and English idioms, the respective effects of decomposability and transparency on idiom meaning recognition are explored via statistical modeling. Results show that the two variables contribute differently to idiom meaning recognition in the two languages, while the absence of collinearity underscores their distinct contributions. Based on this empirical evidence, the study finally proposes FrameNet and MetaNet as computational tools for modeling idiom decomposability and transparency. This study thus not only substantiates the separation of idiom decomposability and transparency, but also sets a foundation for future interdisciplinary research to bridge the gap in idiom research between empirical psycholinguistics, cognitive linguistics, and computational applications.
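
A sketch of the kind of collinearity check involved, using variance inflation factors over simulated decomposability and transparency ratings; the study's actual models and data are not reproduced here:

```python
# Low VIFs (near 1) for the two predictors indicate absence of collinearity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
decomp = rng.normal(size=100)
transp = 0.3 * decomp + rng.normal(size=100)   # related but distinct constructs
X = sm.add_constant(np.column_stack([decomp, transp]))

for i, name in enumerate(["const", "decomposability", "transparency"]):
    print(name, round(variance_inflation_factor(X, i), 2))
```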

Seohyun IM and Chungmin Lee. What GPT-4 Knows about Aspectual Coercion: Focused on "Begin the Book"?

This paper explores whether Pre-trained Large Language Models (PLLMs) like GPT-4 can grasp profound linguistic insights into language phenomena such as Aspectual Coercion, through interaction with Microsoft's Copilot, which integrates GPT-4. Firstly, we examined Copilot's understanding of the co-occurrence constraints of the aspectual verb "begin" and the complex-type noun "book" using the classic illustration of Aspectual Coercion, "begin the book." Secondly, we verified Copilot's awareness of both the default interpretation of "begin the book" with no specific context and the contextually preferred interpretation. Ultimately, Copilot provided appropriate responses regarding potential interpretations of "begin the book" based on its distributional properties and context-dependent preferred interpretations. However, it did not furnish sophisticated explanations concerning these interpretations from a linguistic-theoretical perspective. On the other hand, by offering diverse interpretations grounded in distributional properties, language models like GPT-4 demonstrated their potential contribution to the refinement of linguistic theories. Furthermore, we suggested the feasibility of employing Language Models to construct language resources associated with language phenomena, including Aspectual Coercion.

Simon De Deyne, Chunhua Liu and Lea Frermann. Can GPT-4 Recover Latent Semantic Relational Information from Word Associations? A Detailed Analysis of Agreement with Human-annotated Semantic Ontologies.

Word associations, i.e., spontaneous responses to a cue word, provide not only a window into the human mental lexicon but have also been shown to be a repository of common-sense knowledge and can underpin efforts in lexicography and the construction of dictionaries. The latter tasks especially require knowledge about the relations underlying the associations (e.g., Taxonomic vs. Situational); however, to date, there is neither an established ontology of relations nor an effective labelling paradigm. Here, we test GPT-4's ability to infer semantic relations for human-produced word associations. We use four human-labelled data sets of word associations and semantic features, with differing relation inventories and various levels of annotator agreement. We directly prompt GPT-4 with detailed relation definitions without further fine-tuning or training. Our results show that while GPT-4 provided a good account of higher-level classifications (e.g., Taxonomic vs. Situational), prompting instructions alone cannot obtain similar performance for detailed classifications (e.g., superordinate, subordinate, or coordinate relations), despite high agreement among human annotators. This suggests that latent relations can at least be partially recovered from word associations and highlights ways in which LLMs could be improved and human annotation protocols could be adapted to reduce coding ambiguity.
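
A hedged sketch of direct prompting with relation definitions; the two-relation inventory and prompt wording below are simplified stand-ins for the authors' detailed definitions (requires OPENAI_API_KEY in the environment):

```python
# Zero-shot relation labelling of a cue-association pair with GPT-4.
from openai import OpenAI

client = OpenAI()
prompt = (
    "Classify the relation between a cue and its association.\n"
    "Taxonomic: category membership (e.g., dog - animal).\n"
    "Situational: co-occurrence in a scenario (e.g., dog - leash).\n\n"
    "Cue: dog\nAssociation: bone\nRelation:"
)
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(resp.choices[0].message.content)
```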

Bernard A. J. Jap, Yu-Yin Hsu, Lavinia Salicchi and Yu Xi Li. What’s in a Name? Electrophysiological Differences in Processing Proper Nouns in Mandarin Chinese. 

The current study examines how proper names and common nouns in Chinese are cognitively processed during sentence comprehension. EEG data were recorded while participants were presented with neutral contexts followed by either a proper name or a common noun. Proper names in Chinese often consist of characters that can function independently as words or be combined with other characters to form words, potentially benefiting from the semantic features carried by each character. Using cluster-based permutation tests, we found a larger N400 for common nouns compared to proper names. Our results suggest that the semantics of characters do play a role in facilitating the processing of proper names. This is consistent with previous behavioral findings on noun processing in Chinese, indicating that common nouns require more cognitive resources to process than proper names. Moreover, our results suggest that proper names are processed differently in Chinese than in alphabetic languages.
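
A sketch of a cluster-based permutation test in MNE-Python over simulated per-condition amplitudes; the study's actual EEG data and preprocessing are not reproduced here:

```python
# Contrast two conditions (proper names vs. common nouns) across participants.
import numpy as np
from mne.stats import permutation_cluster_test

rng = np.random.default_rng(0)
# (participants, time points): simulated amplitudes in an N400 window
proper = rng.normal(0.0, 1.0, (20, 200))
common = rng.normal(-0.5, 1.0, (20, 200))  # more negative, as for a larger N400

t_obs, clusters, cluster_pv, _ = permutation_cluster_test(
    [proper, common], n_permutations=1000, seed=0
)
print(cluster_pv)  # p-values for each candidate cluster
```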

Iuliia Zaitova, Irina Stenger, Muhammad Umer Butt and Tania Avgustinova. Cross-Linguistic Processing of Non-Compositional Expressions in Slavic Languages.

This study focuses on evaluating and predicting the intelligibility of non-compositional expressions within the context of five closely related Slavic languages: Belarusian, Bulgarian, Czech, Polish, and Ukrainian, as perceived by native speakers of Russian. Our investigation employs a web-based experiment in which native Russian respondents take part in free-response and multiple-choice translation tasks. Based on previous studies of mutual intelligibility and non-compositionality, we propose two predictive factors for reading comprehension of unknown but closely related languages: 1) linguistic distances, which include orthographic and phonological distances; and 2) surprisal scores obtained from monolingual Language Models (LMs). Our primary objective is to explore the relationship between these two factors and the intelligibility scores and response times from our web-based experiment. Our findings reveal that, while intelligibility scores from the experimental tasks exhibit a stronger correlation with phonological distances, LM surprisal scores appear to be better predictors of the time participants invest in completing the translation tasks.
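
A minimal sketch of the surprisal computation with a causal LM; the English gpt2 checkpoint and phrase stand in here for the study's monolingual Slavic models and stimuli:

```python
# Per-token surprisal (in bits) = -log2 p(token | preceding context).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("kick the bucket", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits
logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
nll = -logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
print(nll / torch.log(torch.tensor(2.0)))  # surprisal for each non-initial token
```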

Bram van Dijk, Max J. van Duijn, Li Kloostra, Marco Spruit and Barend Beekhuizen. Using Language Models to Unravel Semantic Development in Children's Use of Perception Verbs.

We employ a Language Model (LM) to gain insight into how the complex semantics of the Dutch Perception Verb (PV) zien ('to see') emerge in children. Using a Dutch LM as a representation of mature language use, we find for ages 4-12 that 1) the LM accurately predicts PV use in children's freely told narratives; 2) children's PV use is close to mature use; and 3) complex PV meanings with attentional and cognitive aspects can be found. Our approach illustrates how LMs can be meaningfully employed in studying language development, and hence takes a constructive position in the debate on the relevance of LMs in this context.
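
One hedged way to operationalize "the LM predicts PV use" is masked-verb prediction with a Dutch LM; the model choice (BERTje) and the sentence below are illustrative, not the paper's actual protocol:

```python
# Does a Dutch LM rank "zien" highly in a perception-verb slot?
from transformers import pipeline

fill = pipeline("fill-mask", model="GroNLP/bert-base-dutch-cased")
for cand in fill("Ik wil de film graag [MASK].")[:3]:
    print(cand["token_str"], round(cand["score"], 3))
```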

Ludovica Cerini, Alessandro Bondielli and Alessandro Lenci. Representing Abstract Concepts with Images: An Investigation with Large Language Models.

The multimodal metaphorical interpretation of abstract concepts has always been a debated problem in many research fields, including cognitive linguistics and NLP. With the dramatic improvements of Large Language Models (LLMs) and the increasing attention toward multimodal Vision-Language Models (VLMs), the conceptualization of abstract concepts has received pronounced attention. Nevertheless, a systematic scientific investigation is still lacking. This work introduces a framework designed to shed light on the indirect grounding mechanisms that anchor the meaning of abstract concepts to concrete situations (e.g., ability - a person skating), following the idea that abstract concepts acquire meaning from embodied and situated simulation. We assessed human and LLM performance on a situation generation task. Moreover, we assessed the figurative richness of images depicting concrete scenarios via a text-to-image retrieval task performed on LAION-400M.
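
A sketch of CLIP-style text-to-image retrieval for an abstract concept, scoring a small local image set rather than the full LAION-400M index; the model choice and file names are illustrative assumptions:

```python
# Retrieve the image that best matches a concrete situation for "ability".
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

query = model.encode("ability - a person skating")
img_emb = model.encode([Image.open(p) for p in ["skating.jpg", "reading.jpg"]])

hits = util.semantic_search(query, img_emb, top_k=1)
print(hits)  # index and score of the best-matching image
```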

Vadim A. Porvatov, Carlo Strapparava and Marina Tiuleneva. Big-Five Backstage: A Dramatic Dataset for Characters Personality Traits & Gender Analysis.

This paper introduces a novel textual dataset comprising fictional characters’ lines with annotations based on their gender and Big-Five personality traits. Using psycholinguistic findings, we compared texts attributed to fictional characters and real people with respect to their genders and personality traits. Our results indicate that imagined personae mirror most of the language categories observed in real people while demonstrating them in a more expressive manner. 

Yulia Zinova, Ruben van de Vijver and Anastasia Yablokova. Interaction of Semantics and Morphology in Russian Word Vectors.

In this paper, we explore how morphological information can be extracted from fastText embeddings for Russian nouns. We investigate the negative effects of syncretism and propose ways of modifying the vectors that can help to find better representations for morphological functions and thus for out-of-vocabulary words. In particular, we look at the effect of analysing shift vectors instead of the original vectors, discuss various possibilities of finding base forms to create shift vectors, and show that using only high-frequency data is beneficial when looking for structure with respect to the morphosyntactic functions in the embeddings.
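
A minimal sketch of the shift-vector idea with the public Russian fastText model; the word pairs are illustrative, and the first pair also happens to illustrate the syncretism problem, since книги doubles as genitive singular and nominative plural:

```python
# Shift vector = vector(inflected form) - vector(base form); shifts realizing
# the same morphological function should be closer than the raw word vectors.
import fasttext
import numpy as np

ft = fasttext.load_model("cc.ru.300.bin")  # public fastText model for Russian

def shift(inflected: str, base: str) -> np.ndarray:
    return ft.get_word_vector(inflected) - ft.get_word_vector(base)

s1 = shift("книги", "книга")  # 'book.GEN.SG' - 'book.NOM.SG'
s2 = shift("воды", "вода")    # 'water.GEN.SG' - 'water.NOM.SG'
cos = np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))
print(f"cosine between genitive shift vectors: {cos:.2f}")
```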

Vladislav Ivanovich Zubov and Elena Riekhakaynen. Listen, Repeat, Decide: Investigating Pronunciation Variation in Spoken Word Recognition among Russian Speakers.

Variability is one of the important features of natural speech and a challenge for spoken word recognition models and automatic speech recognition systems. We conducted two preliminary experiments aimed at finding out whether native Russian speakers treat certain types of pronunciation variation differently when the variants are equally acceptable according to orthoepic norms. In the first experiment, the participants had to repeat words with three different types of pronunciation variability. In the second experiment, we focused on the assessment of words with variable stress and words with only one standard stress. Our results support the hypothesis that listeners pay the most attention to words with variable stress, less to the variability of soft and hard consonants, and even less to the presence or absence of /j/. Assessing the correct pronunciation of words with variable stress takes significantly more time than assessing words that have only one correct pronunciation variant. These preliminary results show that pronunciation variants can provide new evidence on how a listener accesses the mental lexicon during natural speech processing and chooses among the variants stored in it.

Emese K. Molnár and Andrea Dömötör. The Mental Lexicon of Communicative Fragments and Contours: The Remix N-gram Method.

Classical mental lexicon models represent the lexicon as a list of words. Usage-based models describe the mental lexicon more dynamically, but they do not capture the real-time operation of speech production. In the linguistic model of Boris Gasparov, the notions of communicative fragment and contour provide a comprehensive description of the diversity of linguistic experience. Fragments and contours form linguistic structures larger than words, and speakers recognize them as whole units through their communicative profile. Fragments are prefabricated units that can be added to or merged with each other during speech production. Contours serve as templates for utterances by combining linguistic elements on specific and abstract levels. Based on this theoretical framework, our tool applies remix n-grams (combinations of word forms, lemmas, and POS tags) to identify similar linguistic structures in different texts that form the basic units of the mental lexicon.
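
A minimal sketch of remix n-gram generation: for each token of an n-gram, choose one of its word form, lemma, or POS tag and enumerate every mixed variant. The token analyses are toy examples standing in for tagged-corpus input:

```python
# Enumerate all remix variants of the trigram "she was singing".
from itertools import product

# (word form, lemma, POS) for each token
tokens = [("she", "she", "PRON"), ("was", "be", "AUX"), ("singing", "sing", "VERB")]

remix_ngrams = {" ".join(choice) for choice in product(*tokens)}
print(len(remix_ngrams))        # 18 (of at most 3**3 = 27; "she" equals its lemma)
print(sorted(remix_ngrams)[:3])
```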

Michael M. Flor. Three Studies on Predicting Word Concreteness with Embedding Vectors. 

Human-assigned concreteness ratings for words are commonly used in psycholinguistic and computational linguistic studies. Previous research has shown that such ratings can be modeled and extrapolated by using dense word-embedding representations. However, due to rater disagreement, a considerable proportion of the human ratings in published datasets is not reliable. We investigate how such unreliable data influences the modeling of concreteness with word embeddings. Study 1 compares fourteen embedding models over three datasets of concreteness ratings, showing that most models achieve high correlations with human ratings and exhibit low error rates on predictions. Study 2 investigates how the exclusion of the less reliable ratings influences the modeling results, indicating that improved results can be achieved when the data is cleaned. Study 3 adds additional conditions over those of Study 2 and indicates that the improved results hold only for the cleaned data, and that in the general case removing the less reliable data points is not useful.
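
A minimal sketch of the modeling setup behind such studies: fit a regressor from word vectors to human concreteness ratings and evaluate by cross-validation. The random vectors stand in for real embeddings and the tiny rating list stands in for published concreteness norms:

```python
# Extrapolate concreteness ratings from embedding vectors with ridge regression.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
words = ["table", "chair", "idea", "truth", "apple", "hope"]
ratings = np.array([4.9, 4.8, 1.8, 1.6, 5.0, 1.9])  # 1 = abstract, 5 = concrete
vectors = rng.normal(size=(len(words), 300))        # stand-in for real embeddings

model = Ridge(alpha=1.0)
print(cross_val_score(model, vectors, ratings, cv=3, scoring="r2"))
```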