Publications

preprints

Understanding the limits of language is a prerequisite for Large Language Models (LLMs) to act as theories of natural language. LLM performance in some language tasks presents both quantitative and qualitative differences from that of humans; however, it remains to be determined whether such differences can be overcome by increasing model size. This work investigates the critical role of model scaling, asking whether increases in size can make up for such differences between humans and models. We test three LLMs from different families (Bard, 137 billion parameters; ChatGPT-3.5, 175 billion; ChatGPT-4, 1.5 trillion) on a grammaticality judgment task featuring anaphora, center embedding, comparatives, and negative polarity. N=1,200 judgments are collected and scored for accuracy, stability, and improvements in accuracy upon repeated presentation of a prompt. Results of the best-performing LLM, ChatGPT-4, are compared to results of n=80 humans on the same stimuli. We find that increased model size may lead to better performance, but LLMs are still not as sensitive to (un)grammaticality as humans are. It seems possible but unlikely that scaling alone can fix this issue. We interpret these results by comparing language learning in vivo and in silico, identifying three critical differences concerning (i) the type of evidence, (ii) the poverty of the stimulus, and (iii) the occurrence of semantic hallucinations due to impenetrable linguistic reference.

The search surface is a foundational concept in the visual search literature. It describes the impact of target-distractor (TD) and distractor-distractor (DD) similarity on search efficiency. However, the shape of the search surface lacks direct quantitative support: it is a summary approximation of a wide range of lab-based results that generalise poorly to real-world scenarios. This study exploits convolutional neural networks to quantitatively assess the similarity effects in search tasks using real images as stimuli and to determine which levels of feature complexity the similarity effects rely on. Besides providing ecological converging evidence supporting the established search surface, our results reveal that TD and DD similarity mainly operate at two distinct layers of the network: DD similarity at the layer of coarse object features, and TD similarity at the layer of complex features used for classification. This suggests that these forms of similarity exert their major effects at two distinct levels of perceptual processing.

Data, Material, & Scripts

Computational models of semantic representations have long assumed and produced a single static representation for each word type, ignoring the influence of linguistic context on semantic representations. Recent Large Language Models (LLMs) introduced in Natural Language Processing, however, learn token-level contextualised representations, holding promise for studying how semantic representations change in different contexts. In this study we probe type- and token-level representations learned using a prominent example of such models, Bidirectional Encoder Representations from Transformers (BERT), for their ability to i) explain semantic effects found for isolated words (semantic relatedness and similarity ratings, lexical decision, and semantic priming), but critically also to ii) exhibit systematic interactions between lexical semantics and context, and iii) explain meaning modulations in context. Across a wide range of empirical studies on each of these topics, we show that BERT representations satisfy two desiderata for psychologically valid semantic representations: i) they have a stable semantic core which allows people to interpret words in isolation and prevents words from being used arbitrarily, and ii) they interact with sentence context in systematic ways, with representations shifting as a function of their semantic core and the context. This demonstrates that a single, comprehensive model which simultaneously learns abstract, type-level prototype representations as well as mechanisms of how these interact with context can explain both isolated word effects and context-dependent variations. Notably, these variations are not limited to discrete word senses, eschewing a strict dichotomy between exemplar and prototype models and re-framing traditional notions of polysemy.


in press

Pseudowords such as “knackets” or “spechy” – letter strings that are consistent with the orthotactical rules of a language but do not appear in its lexicon – are traditionally considered to be meaningless, and employed as such in empirical studies. However, recent studies that show specific semantic patterns associated with these words as well as semantic effects on human pseudoword processing have cast doubt on this view. While these studies suggest that pseudowords have meanings, they provide only extremely limited insight as to whether humans are able to ascribe explicit and declarative semantic content to unfamiliar word forms. In the present study, we employed an exploratory-confirmatory study design to examine this question. In a first exploratory study, we started from a pre-existing dataset of words and pseudowords alongside human-generated definitions for these items. Employing 18 different language models, we showed that the definitions actually produced for (pseudo)words were closer to their respective (pseudo)words than the definitions for the other items. Based on these initial results, we conducted a second, pre-registered, high-powered confirmatory study collecting a new, controlled set of (pseudo)word interpretations. This second study confirmed the results of the first one. Taken together, these findings support the idea that meaning construction is supported by a flexible form-to-meaning mapping system based on statistical regularities in the language environment that can accommodate novel lexical entries as soon as they are encountered.

2024

Link     Download     Data, Material, & Scripts

We investigate the onomasiological question of which words speakers actually use and produce when trying to convey an intended meaning. This is not limited to selecting the best-fitting available existing word, but also includes word formation, the coinage of novel words. In the first two experiments, we introduce the taboo game paradigm in which participants were instructed to produce a single-word substitution for different words so that others can later identify them. Using distributional semantic models with the capability to produce quantitative representations for existing and novel word responses, we find that (a) responses tend to be semantically close to the targets and (b) existing words were represented closer than novel words, but (c) even novel compounds were often closer than the targets’ free associates. In a final third experiment, we find that other participants are more likely to guess the correct original word (a) for responses closer to the original targets, and (b) for novel compound responses as compared to existing word responses. This shows that the production of both existing and novel words can be accurately captured in a unified computational framework of the semantic mechanisms driving word choice.


*shared first authorship    

Link/Download     Data, Material, & Scripts


The use of taboo words represents one of the most common and arguably universal linguistic behaviors, fulfilling a wide range of psychological and social functions. However, in the scientific literature, taboo language is poorly characterized, and how it is realized in different languages and populations remains largely unexplored. Here we provide a database of taboo words, collected from different linguistic communities (Study 1, N = 1,046), along with their speaker-centered semantic characterization (Study 2, N = 455 for each of six rating dimensions), covering 13 languages and 17 countries from all five permanently inhabited continents. Our results show that, in all languages, taboo words are mainly characterized by extremely low valence and high arousal, and very low written frequency. However, a significant amount of cross-country variability in words’ tabooness and offensiveness proves the importance of community-specific sociocultural knowledge in the study of taboo language.


Link/Download    Data, Material, & Scripts

Valence is a dominant semantic dimension, and it is fundamentally linked to basic approach-avoidance behavior within a broad range of contexts. Previous studies have shown that it is possible to approximate the valence of existing words based on several surface-level and semantic components of the stimuli. In parallel, recent studies have shown that even completely novel and (apparently) meaningless stimuli, like pseudowords, can be informative of meaning based on the information that they carry at the sub-word level. Here, we aimed to further extend this evidence by investigating whether humans can reliably assign valence to pseudowords and, additionally, to identify the factors explaining such valence judgments. In Experiment 1, we trained several models to predict valence judgments for existing words from their combined form and meaning information. Then, in Experiment 2 and Experiment 3, we extended the results by predicting participants’ valence judgments for pseudowords, using a set of models indexing different (possible) sources of valence, and selected the best-performing model in a completely data-driven procedure. Results showed that the model including basic surface-level information (i.e., the letters composing the pseudoword) and orthographic-neighbor information performed best, thus tracing pseudoword valence back to these components. These findings support perspectives on the non-arbitrariness of language and provide insights regarding how humans process the valence of novel stimuli.

Link/Download     Data, Material, & Scripts

In recent years, advancements in deep learning models for computer vision have led to a dramatic improvement in their image classification accuracy. However, models with a higher accuracy in the task they were trained on do not necessarily develop better image representations that allow them to also perform better in other tasks they were not trained on. To assess the representation learning capabilities of prominent high-performing computer vision models, we investigated how well they capture various indices of perceptual similarity from large-scale behavioral datasets. We find that higher image classification accuracy rates are not associated with better performance on these datasets; in fact, we observe no improvement in performance since GoogLeNet (released 2015) and VGG-M (released 2014). We speculate that more accurate classification may result from hyper-engineering towards very fine-grained distinctions between highly similar classes, which does not incentivize the models to capture overall perceptual similarities.

Link     Download     Data, Material, & Scripts

The human body is perhaps the most ubiquitous and salient visual stimulus that we encounter in our daily lives. Given the prevalence of images of human bodies in natural scene statistics, it is no surprise that our mental representations of the body are thought to strongly originate from visual experience. Yet, little is known about high-level cognitive representations of the body. Here, we retrieved a body map from natural language, taking this as a window into high-level cognitive processes. We first extracted a matrix of distances between body parts from natural language data and employed this matrix to extrapolate a body map. To test the effectiveness of this high-level body map, we then conducted a series of experiments in which participants were asked to classify the distance between pairs of body parts, presented either as words or images. We found that the high-level body map was systematically activated when participants were making these distance judgments. Crucially, the linguistic map explained participants’ performance over and above the visual body map, indicating that the former cannot be simply conceived as a by-product of perceptual experience. These findings, therefore, establish the existence of a behaviorally relevant, high-level representation of the human body.

Link     Download

We identify and analyze three caveats that may arise when analyzing the linguistic abilities of Large Language Models. The problem of unlicensed generalizations refers to the danger of interpreting performance in one task as predictive of the models’ overall capabilities, based on the assumption that because a specific task performance is indicative of certain underlying capabilities in humans, the same association holds for models. The human-like paradox refers to the problem of lacking human comparisons, while at the same time attributing human-like abilities to the models. Lastly, the problem of double standards refers to the use of tasks and methodologies that either cannot be applied to humans or are evaluated differently in models and humans. While we recognize the impressive linguistic abilities of LLMs, we conclude that specific claims about the models’ human-likeness in the grammatical domain are premature.

Link     Download     Data, Material, & Scripts

Word frequency is one of the best predictors of language processing. Typically, word frequency norms are entirely based on natural-language text data, thus representing what the literature typically refers to as purely linguistic experience. This study presents Flickr frequency norms as a novel word frequency measure from a domain-specific corpus inherently tied to extra-linguistic information: words used as image tags on social media. To obtain Flickr frequency measures, we exploited the photo-sharing platform Flickr (containing billions of photos) and extracted the number of uploaded images tagged with each of the words considered in the lexicon. Here we systematically examine the peculiarities of Flickr frequency norms and show that Flickr frequency is a hybrid metric, lying at the intersection between language and visual experience and with specific biases induced by being based on image-focused social media. Moreover, regression analyses indicate that Flickr frequency captures additional information beyond what is already encoded in existing norms of linguistic, sensorimotor, and affective experience. Therefore, these new norms capture aspects of language usage that are missing from traditional frequency measures: a portion of language usage capturing the interplay between language and vision, which, as this study demonstrates, has its own impact on word processing. The Flickr frequency norms are openly available on the Open Science Framework (https://osf.io/2zfs3/).

Link/Download     Data, Material, & Scripts     

Five experiments investigated the association between time and valence. In the first experiment, participants classified temporal expressions (e.g., past, future) and positively or negatively connoted words (e.g., glorious, nasty) based on temporal reference or valence. They responded slower and made more errors in the mismatched condition (positive/past mapped to one hand, negative/future to the other) compared with the matched condition (positive/future to one hand, negative/past to the other hand). Experiment 2 confirmed the generalization of the match effect to nonspatial responses, while Experiment 3 found no reversal of this effect for left-handers. Overall, the results of the three experiments indicate a robust match effect, associating the past with negative valence and the future with positive valence. Experiment 4 involved rating the valence of time-related words, showing higher ratings for future-related words. Additionally, Experiment 5 employed latent semantic analysis and revealed that linguistic experiences are unlikely to be the source of this time–valence association. An interactive activation model offers a quantitative explanation of the match effect, potentially arising from a favorable perception of the future over the past.

2023

Link     Download     Data, Material, & Scripts

Humans are universally good at providing stable and accurate judgments about what forms part of their language and what does not. Large Language Models (LMs) are claimed to possess human-like language abilities; hence, they are expected to emulate this behavior by providing both stable and accurate answers, when asked whether a string of words complies with or deviates from their next-word predictions. This work tests whether stability and accuracy are showcased by GPT-3/text-davinci-002, GPT-3/text-davinci-003, and ChatGPT, using a series of judgment tasks that tap into eight linguistic phenomena: plural attraction, anaphora, center embedding, comparatives, intrusive resumption, negative polarity items, order of adjectives, and order of adverbs. For every phenomenon, 10 sentences (5 grammatical and 5 ungrammatical) are tested, each randomly repeated 10 times, totaling 800 elicited judgments per LM (total n = 2,400). Our results reveal variable above-chance accuracy in the grammatical condition, below-chance accuracy in the ungrammatical condition, a significant instability of answers across phenomena, and a yes-response bias for all the tested LMs. Furthermore, we found no evidence that repetition helps the models converge on a processing strategy that culminates in stable answers, either accurate or inaccurate. We demonstrate that the LMs’ performance in identifying (un)grammatical word patterns is in stark contrast to what is observed in humans (n = 80, tested on the same tasks) and argue that adopting LMs as theories of human language is not motivated at their current stage of development.

Link     Download     Data, Material, & Scripts

Quantitative, data-driven models for mental representations have long enjoyed popularity and success in psychology (for example, distributional semantic models in the language domain), but have largely been missing for the visual domain. To overcome this, we present ViSpa (Vision Spaces), high-dimensional vector spaces that include vision-based representation for naturalistic images as well as concept prototypes. These vectors are derived directly from visual stimuli through a deep convolutional neural network (DCNN) trained to classify images, and allow us to compute vision-based similarity scores between any pair of images and/or concept prototypes. We successfully evaluate these similarities against human behavioral data in a series of large-scale studies, including off-line judgments – visual similarity judgments for the referents of word pairs (Study 1) and for image pairs (Study 2), and typicality judgments for images given a label (Study 3) – as well as on-line processing times and error rates in a discrimination (Study 4) and priming task (Study 5) with naturalistic image material. ViSpa similarities predict behavioral data across all tasks, which renders ViSpa a theoretically appealing model for vision-based representations and a valuable research tool for data analysis and the construction of experimental material: ViSpa allows for precise control over experimental material consisting of images (also in combination with words), and introduces a specifically vision-based similarity for word pairs. To make ViSpa available to a wide audience, this article a) includes (video) tutorials on how to use ViSpa in R, and b) presents a user-friendly web interface at http://vispa.fritzguenther.de.  
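The core idea behind ViSpa-style similarity scores can be sketched in a few lines. In the sketch below, random vectors stand in for DCNN-derived image representations, and a concept prototype is assumed to be the average of the concept's image vectors; all names and numbers are illustrative, not the actual ViSpa pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated stand-ins for DCNN-derived vectors of individual images
# of two concepts (real ViSpa vectors come from a trained network).
dog_images = rng.normal(loc=1.0, size=(50, 8))
cat_images = rng.normal(loc=-1.0, size=(50, 8))

# Assumption: a concept prototype is the average of its image vectors.
dog_prototype = dog_images.mean(axis=0)

def cosine(u, v):
    """Vision-based similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similarity of an image to a prototype, within vs. between concepts:
sim_within = cosine(dog_images[0], dog_prototype)
sim_between = cosine(cat_images[0], dog_prototype)
```

On this toy setup, an image of a concept is more similar to its own prototype than an image of a different concept is, which is the kind of typicality-style comparison the studies above evaluate against behavioral data.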

Link     Download     Data, Material, & Scripts

Many theories on the role of semantics in morphological representation and processing focus on the interplay between the lexicalized meaning of the complex word on the one hand, and the individual constituent meanings on the other hand. However, the constituent meaning representations at play do not necessarily correspond to the free-word meanings of the constituents: Role-dependent constituent meanings can be subject to sometimes substantial semantic shift from their corresponding free-word meanings (such as -bill in hornbill and razorbill, or step- in stepmother and stepson). While this phenomenon is extremely difficult to operationalize using the standard psycholinguistic toolkit, we demonstrate how these as-constituent meanings can be represented in a quantitative manner using a data-driven computational model. After a qualitative exploration, we validate the model against a large database of human ratings of the meaning retention of constituents in compounds. With this model at hand, we then proceed to investigate the internal semantic structure of compounds, focussing on differences in semantic shift and semantic transparency between the two constituents.

Link/Download

Language processing is influenced by sensorimotor experiences. Here, we review behavioral evidence for embodied and grounded influences in language processing across six linguistic levels of granularity. We examine (a) sub-word features, discussing grounded influences on iconicity (systematic associations between word form and meaning); (b) words, discussing boundary conditions and generalizations for the simulation of color, sensory modality, and spatial position; (c) sentences, discussing boundary conditions and applications of action direction simulation; (d) texts, discussing how the teaching of simulation can improve comprehension in beginning readers; (e) conversations, discussing how multi-modal cues improve turn taking and alignment; and (f) text corpora, discussing how distributional semantic models can reveal how grounded and embodied knowledge is encoded in texts. These approaches are converging on a convincing account of the psychology of language, but at the same time, there are important criticisms of the embodied approach and of specific experimental paradigms. The surest way forward requires the adoption of a wide array of scientific methods. By providing complementary evidence, a combination of multiple methods on various levels of granularity can help us gain a more complete understanding of the role of embodiment and grounding in language processing.

Link/Download     

Outdoor recreation provides vital interactions between humans and ecological systems with a range of mental and physical benefits for people. Despite the increased number of studies using crowdsourced online data to assess how people interact with the landscape during recreational activities, the focus remains largely on mapping the spatial distribution of visitors or analyzing the content of shared images, and little work has been done to quantify the perceptions and emotions people assign to the landscape. In this study, we used crowdsourced textual data from an outdoor activity-sharing platform (Wikiloc), and applied Natural Language Processing (NLP) methods and correlation analysis to capture hikers' perceptions associated with landscape features and physical outdoor activities. Our results indicate eight clusters based on the semantic similarity between words, ranging from four clusters describing landscape features (“ecosystems, animals & plants”, “geodiversity”, “climate & weather”, and “built cultural heritage”), to one cluster describing the range of physical outdoor activities and three clusters indicating hikers' perceptions and emotions (“aesthetics”, “joy & restoration” and “physical effort sensation”). The association analysis revealed that the cluster “ecosystems, animals & plants” is likely to stimulate all three identified perceptions, suggesting that these natural features are important for hikers during their outdoor experience. Moreover, hikers strongly associate the cluster “outdoor physical activities” with both “joy & restoration” and “physical effort sensation” perceptions, highlighting the health and well-being benefits of physical activities in natural landscapes. Our study shows the potential of Wikiloc as a valuable data source to assess human-nature interactions and how textual data can provide significant advances in understanding people's preferences and perceptions while recreating.
These findings can help inform outdoor recreation planners in the study region by focusing on the elements of the landscape that people perceive to be important (i.e. “ecosystems, animals & plants”).

2022

Link     Download     Data, Material, & Scripts

Theories of grounded cognition assume that conceptual representations are grounded in sensorimotor experience. However, abstract concepts such as jealousy or childhood have no directly associated referents with which such sensorimotor experience can be made; therefore, the grounding of abstract concepts has long been a topic of debate. Here, we propose (a) that systematic relations exist between semantic representations learned from language on the one hand and perceptual experience on the other hand, (b) that these relations can be learned in a bottom-up fashion, and (c) that it is possible to extrapolate from this learning experience to predict expected perceptual representations for words even where direct experience is missing. To test this, we implement a data-driven computational model that is trained to map language-based representations (obtained from text corpora, representing language experience) onto vision-based representations (obtained from an image database, representing perceptual experience), and apply its mapping function onto language-based representations for abstract and concrete words outside the training set. In three experiments, we present participants with these words, accompanied by two images: the image predicted by the model and a random control image. Results show that participants’ judgements were in line with model predictions even for the most abstract words. This preference was stronger for more concrete items and decreased for the more abstract ones. Taken together, our findings have substantial implications in support of the grounding of abstract words, suggesting that we can tap into our previous experience to create possible visual representations we don’t have.
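The logic of such a language-to-vision mapping can be sketched with simulated vectors: learn a linear map from paired text and vision vectors, then, for a word outside the training set, compare the predicted vision vector against a matching image and a random control. All vectors, dimensionalities, and the choice of plain least squares are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)
dim_text, dim_vis, n_train = 16, 12, 60

# Simulated paired training data: text- and vision-based vectors
# for concrete training words (all values illustrative).
M_true = rng.normal(size=(dim_text, dim_vis))
X_text = rng.normal(size=(n_train, dim_text))
Y_vis = X_text @ M_true + 0.05 * rng.normal(size=(n_train, dim_vis))

# Learn the text -> vision mapping by ordinary least squares.
M_hat, *_ = np.linalg.lstsq(X_text, Y_vis, rcond=None)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# For a word outside the training set, predict its expected vision
# vector and compare it with two candidate images.
x_word = rng.normal(size=dim_text)
predicted = x_word @ M_hat
matching_image = x_word @ M_true          # image that truly corresponds
random_image = rng.normal(size=dim_vis)   # random control image
```

In this toy setup the predicted vision vector ends up far more similar to the matching image than to the random control, mirroring the two-image choice participants faced in the experiments.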

Link     Download     Data, Material, & Scripts

Large-scale linguistic data is nowadays available in abundance. Using this source of data, previous research has identified redundancies between the statistical structure of natural language and properties of the (physical) world we live in. For example, it has been shown that we can gauge city sizes by analyzing their respective word frequencies in corpora. However, since natural language is always produced by human speakers, we point out that such redundancies can only come about indirectly and should necessarily be restricted to cases where human representations largely retain characteristics of the physical world. To demonstrate this, we examine the statistical occurrence of words referring to body parts in very different languages, covering nearly 4 billion native speakers. This is because the convergence between language and physical properties of the stimuli clearly breaks down for the human body (i.e., more relevant and functional body parts are not necessarily larger in size). Our findings indicate that the human body as extracted from language does not retain its actual physical proportions; instead, it resembles the distorted human-like figure known as the sensory homunculus, whose form depicts the amount of cortical area dedicated to sensorimotor functions of each body part (and, thus, their relative functional relevance). This demonstrates that the surface-level statistical structure of language opens a window into how humans represent the world they live in, rather than into the world itself.

Link     Download     Data, Material, & Scripts

While distributional semantic models that represent word meanings as high-dimensional vectors induced from large text corpora have been shown to successfully predict human behavior across a wide range of tasks, they have also received criticism from different directions. These include concerns over their interpretability (how can numbers specifying abstract, latent dimensions represent meaning?) and their ability to capture variation in meaning (how can a single vector representation capture multiple different interpretations for the same expression?). Here, we demonstrate that semantic vectors can indeed rise to these challenges, by training a mapping system (a simple linear regression) that predicts inter-individual variation in relational interpretations for compounds such as wood brush (for example brush FOR wood, or brush MADE OF wood) from (compositional) semantic vectors representing the meanings of these compounds. These predictions consistently beat different random baselines, both for familiar compounds (moon light, Experiment 1) as well as novel compounds (wood brush, Experiment 2), demonstrating that distributional semantic vectors encode variations in qualitative interpretations that can be decoded using techniques as simple as linear regression.
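The "techniques as simple as linear regression" point can be made concrete with a toy sketch: fit a least-squares map from compound vectors to a vector of relation-interpretation ratings, then predict interpretations for a new compound. The dimensionalities, the number of relation categories, and all data here are simulated placeholders, not the study's materials.

```python
import numpy as np

rng = np.random.default_rng(0)
n_compounds, dim, n_relations = 40, 20, 4

# Simulated stand-ins: each row of X is a (compositional) semantic
# vector for a compound; each row of Y holds ratings for relation
# interpretations (e.g. FOR, MADE OF, ...). Values are illustrative.
X = rng.normal(size=(n_compounds, dim))
W_true = rng.normal(size=(dim, n_relations))
Y = X @ W_true + 0.1 * rng.normal(size=(n_compounds, n_relations))

# Decode interpretations from semantic vectors with plain least squares.
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Predicted interpretation profile for a held-out novel compound.
x_new = rng.normal(size=(1, dim))
pred = x_new @ W_hat
```

Because the toy ratings really are a (noisy) linear function of the vectors, the fitted map recovers them well; the empirical question the paper answers is whether real human interpretation variation is linearly decodable in the same way.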

Link     Download     Data, Material, & Scripts

While a number of studies have repeatedly demonstrated an automatic activation of sensorimotor experience during language processing in the form of action-congruency effects, as predicted by theories of grounded cognition, more recent research has not found these effects for words that were just learned from linguistic input alone, without sensorimotor experience with their referents. In the present study, we investigate whether this absence of effects can be attributed to a lack of repeated experience and consolidation of the associations between words and sensorimotor experience in memory. To address these issues, we conducted four experiments in which (1 and 2) participants engaged in two separate learning phases in which they learned novel words from language alone, with an intervening period of memory-consolidating sleep, and (3 and 4) we employed familiar words whose referents speakers have no direct experience with (such as plankton). However, we again did not observe action-congruency effects in subsequent test phases in any of the experiments. This indicates that direct sensorimotor experience with word referents is a necessary requirement for automatic sensorimotor activation during word processing.

2021

Link     Download     Data, Material, & Scripts

In their strongest formulation, theories of grounded cognition claim that concepts are made up of sensorimotor information. Following such equivalence, perceptual properties of objects should consistently influence processing, even in purely linguistic tasks, where perceptual information is neither solicited nor required. Previous studies have tested this prediction in semantic priming tasks, but they have not observed perceptual influences on participants’ performances. However, those findings suffer from critical shortcomings, which may have prevented potential visually grounded/perceptual effects from being detected. Here, we investigate this topic by applying an innovative method expected to increase the sensitivity in detecting such perceptual effects. Specifically, we adopt an objective, data-driven, computational approach to independently quantify vision-based and language-based similarities for prime-target pairs on a continuous scale. We test whether these measures predict behavioural performance in a semantic priming mega-study with various experimental settings. Vision-based similarity is found to facilitate performance, but a dissociation between vision-based and language-based effects was also observed. Thus, in line with theories of grounded cognition, perceptual properties can facilitate word processing even in purely linguistic tasks, but the behavioural dissociation at the same time challenges strong claims of sensorimotor and conceptual equivalence.

Link/Download     Data, Material, & Scripts

In the current state-of-the-art distributional semantics model of the meaning of noun-noun compounds (such as chainsaw, butterfly, home phone), CAOSS (Marelli et al. 2017), the semantic vectors of the individual constituents are combined, and enriched by position-specific information for each constituent in its role as either modifier or head. Most recently, there have been attempts to include vision-based embeddings in these models (Günther et al., 2020b), using the linear architecture implemented in the CAOSS model. In the present paper, we extend this line of research and demonstrate that moving to nonlinear models improves the results for vision while linear models are a good choice for text. Simply concatenating text and vision vectors does not yet improve the prediction of human behavioral data over models using text- and vision-based measures separately.

Link     Data, Material, & Scripts

Conversational negation often behaves differently from negation as a logical operator: when rejecting a state of affairs, it does not present all members of the complement set as equally plausible alternatives, but it rather suggests some of them as more plausible than others (e.g., “This is not a dog, it is a wolf/*screwdriver”). Entities that are semantically similar to a negated entity tend to be judged as better alternatives (Kruszewski et al., 2016). In fact, Kruszewski et al. (2016) show that the cosine similarity scores between the distributional semantics representations of a negated noun and its potential alternatives are highly correlated with the negated noun-alternatives human plausibility ratings. In a series of cloze tasks, we show that negation likewise restricts the production of plausible alternatives to similar entities. Furthermore, completions to negative sentences appear to be even more restricted than completions to an affirmative conjunctive context, hinting at a peculiarity of negation.

2020

Link/Download

*shared first authorship

While morphemes are theoretically defined as linguistic units linking form and meaning, semantic effects in morphological processing are not reported consistently in the literature on derived and compound words. The lack of consistency in this line of research has often been attributed to methodological differences between studies or contextual effects. In this paper, we advance a different proposal where semantic effects emerge quite consistently if semantics is defined in a dynamic and flexible way, relying on distributional semantics approaches. In this light, we revisit morphological processing, taking a markedly cognitive perspective, as allowed by models that focus on morphology as systematic meaning transformation or that focus on the mapping between the orthographic form of words and their meanings. 

Link     Download     Data, Material, & Scripts

Theories of grounded cognition postulate that concepts are grounded in sensorimotor experience. But how can that be for concepts like Atlantis for which we do not have that experience? We claim that such concepts obtain their sensorimotor grounding indirectly, via already-known concepts used to describe them. Participants learned novel words referring to up or down concepts (mende = enhanced head or mende = bionic foot). In a first experiment, participants then judged the sensibility of sentences implying up or down actions (e.g., “You scratch your bionic foot”) by performing up or down hand movements. Reactions were faster when the hand movement matched the direction of the implied movement. In the second experiment, we observed the same congruency effect for sentences like, “You scratch your mende”, whose implied direction depended entirely on the learning phase. This offers a perspective on how concepts learned without direct experience can nonetheless be grounded in sensorimotor experience.

Link     Download     Data, Material, & Scripts

Previous studies found that an automatic meaning-composition process affects the processing of morphologically complex words, and related this operation to conceptual combination. However, research on embodied cognition demonstrates that concepts are more than just lexical meanings, rather being also grounded in perceptual experience. Therefore, perception-based information should also be involved in mental operations on concepts, such as conceptual combination. Consequently, we should expect to find perceptual effects in the processing of morphologically complex words. In order to investigate this hypothesis, we present the first fully implemented and data-driven model of perception-based (more specifically, vision-based) conceptual combination, and use the predictions of such a model to investigate processing times for compound words in four large-scale behavioral experiments employing three paradigms (naming, lexical decision, and timed sensibility judgments). We observe facilitatory effects of vision-based compositionality in all three paradigms, over and above a strong language-based (lexical and semantic) baseline, thus demonstrating for the first time perceptually grounded effects at the sub-lexical level. This suggests that perceptually grounded information is not only utilized according to specific task demands but rather automatically activated when available.

Link/Download     Data, Material, & Scripts

In the present study, we provide a comprehensive analysis and a multi-dimensional dataset of semantic transparency measures for 1,810 German compound words. Compound words are considered semantically transparent when the contribution of the constituents’ meaning to the compound meaning is clear (as in airport), but the degree of semantic transparency varies between compounds (compare strawberry or sandman). Our dataset includes both compositional and relatedness-based semantic transparency measures, also differentiated by constituents. The measures are obtained from a computational and fully implemented semantic model based on distributional semantics. We validate the measures using data from four behavioral experiments: Explicit transparency ratings, two different lexical decision tasks using different nonwords, and an eye-tracking study. We demonstrate that different semantic effects emerge in different behavioral tasks, which can only be captured using a multi-dimensional approach to semantic transparency. We further provide the semantic transparency measures derived from the model for a dataset of 40,475 additional German compounds, as well as for 2,061 novel German compounds.

Link     Download     Data, Material, & Scripts

Speakers of languages with synchronically productive compounding systems, such as English, are likely to encounter new compounds on a daily basis. These can only be useful for communication if speakers are able to rapidly compose their meanings. However, while compositional meanings can be obtained for some novel compounds such as bridgemill, this is far harder for others such as radiosauce; accordingly, processing speed should be affected by the ease of such a compositional process. To rigorously test this hypothesis, we employed a fully implemented computational model based on distributional semantics to quantitatively measure the degree of semantic compositionality of novel compounds. In two large-scale studies, we collected timed sensibility judgements and lexical decisions for hundreds of morphologically structured nonwords in English. Response times were predicted by the constituents’ semantic contribution to the compositional process, with slower rejections for more compositional nonwords. We found no indication of a difference in these compositional effects between the tasks, suggesting that speakers automatically engage in a compositional process whenever they encounter morphologically structured stimuli, even when it is not required by the task at hand. Such compositional effects in the processing of novel compounds have important implications for studies that employ such stimuli as filler material or “nonwords,” as response times for these items can differ greatly depending on their compositionality.

2019

Link     Download

Models that represent meaning as high-dimensional numerical vectors—such as latent semantic analysis (LSA), hyperspace analogue to language (HAL), bound encoding of the aggregate language environment (BEAGLE), topic models, global vectors (GloVe), and word2vec—have been introduced as extremely powerful machine-learning proxies for human semantic representations and have seen an explosive rise in popularity over the past 2 decades. However, despite their considerable advancements and spread in the cognitive sciences, one can observe problems associated with the adequate presentation and understanding of some of their features. Indeed, when these models are examined from a cognitive perspective, a number of unfounded arguments tend to appear in the psychological literature. In this article, we review the most common of these arguments and discuss (a) what exactly these models represent at the implementational level and their plausibility as a cognitive theory, (b) how they deal with various aspects of meaning such as polysemy or compositionality, and (c) how they relate to the debate on embodied and grounded cognition. We identify common misconceptions that arise as a result of incomplete descriptions, outdated arguments, and unclear distinctions between theory and implementation of the models. We clarify and amend these points to provide a theoretical basis for future research and discussions on vector models of semantic representation.

Link     Download     Data, Material, & Scripts

In morphological processing, research has repeatedly found different priming effects for English and German native speakers in the overt priming paradigm. In English, priming effects were found for word pairs with a morphological and semantic relation (SUCCESSFUL-success), but not for pairs without a semantic relation (SUCCESSOR-success). By contrast, morphological priming effects in German occurred for pairs both with a semantic relation (AUFSTEHEN-stehen, ‘stand up’-‘stand’) and without (VERSTEHEN-stehen, ‘understand’-‘stand’). These behavioural differences have been taken to indicate differential language processing and memory representations in these languages. We examine whether these behavioural differences can be explained by differences in language structure between English and German. To this end, we employed new developments in distributional semantics as a computational method to obtain both observed and compositional representations for transparent and opaque complex word meanings that can in turn be used to quantify the degree of semantic predictability of the morphological system of a language. We compared the similarities between transparent and opaque words and their stems, and observed a difference between German and English, with German showing a higher morphological systematicity. The present results indicate that the investigated cross-linguistic effect can be attributed to quantitatively-characterized differences in the speakers' language experience, as approximated by linguistic corpora.

Link     Download     Data, Material, & Scripts

Effects of semantic transparency, reflected in processing differences between semantically transparent (teabag) and opaque (ladybird) compounds, have received considerable attention in the investigation of the role of constituents in compound processing. However, previous studies have yielded inconsistent results. In the present article, we argue that this is due to semantic transparency’s often being conceptualized only as the semantic relatedness between the compound and constituent meanings as separate units. This neglects the fact that compounds are inherently productive constructions. We argue that compound processing is routinely impacted by a compositional process aimed at computing a compositional meaning, which would cause compositional semantic transparency effects to emerge in compound processing. We employ recent developments in compositional distributional semantics to quantify relatedness- as well as composition-based semantic transparency measures and use these to predict lexical decision times in a large-scale data set. We observed semantic transparency effects on compound processing that are not captured in relatedness terms but only by adopting a compositional perspective.

Link

Scoring divergent-thinking response sets has always been challenging because such responses are not only open-ended in terms of number of ideas, but each idea may also be expressed by a varying number of concepts and, thus, by a varying number of words (elaboration). While many current studies have attempted to score the semantic distance in divergent-thinking responses by applying latent semantic analysis (LSA), it is known from other areas of research that LSA-based approaches are biased according to the number of words in a response. Thus, the current article aimed to identify and demonstrate this elaboration bias in LSA-based divergent-thinking scores by means of a simulation. In addition, we show that this elaboration bias can be reduced by removing stop words (for example, “and”, “or”, and “for”) prior to analysis. Furthermore, the residual bias after stop word removal can be reduced by simulation-based corrections. Finally, we give an empirical illustration for alternate uses and consequences tasks. Results suggest that when both stop word removal and simulation-based bias correction are applied, convergent validity should be expected to be highest.
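The stop word removal step described above can be sketched as follows: a response is scored by averaging the vectors of its content words only, so that function words added by elaboration do not shift the semantic-distance score. The stop word list and the tiny embedding space are illustrative assumptions, not the article's actual materials:

```python
import numpy as np

STOP_WORDS = {"and", "or", "for", "the", "a", "an", "of", "to"}  # toy list

def response_vector(response, vectors):
    """Average the word vectors of a response after dropping stop words,
    so that longer, more elaborated answers are not scored differently
    merely because of their function words."""
    content = [w for w in response.lower().split() if w not in STOP_WORDS]
    return np.mean([vectors[w] for w in content if w in vectors], axis=0)

def semantic_distance(response, cue, vectors):
    """Distance between a response and the task cue: 1 - cosine similarity."""
    r, c = response_vector(response, vectors), vectors[cue]
    return 1.0 - (r @ c) / (np.linalg.norm(r) * np.linalg.norm(c))

# toy embedding space (illustrative values only)
vectors = {"brick": np.array([1.0, 0.0]),
           "paperweight": np.array([0.8, 0.6]),
           "doorstop": np.array([0.6, 0.8])}

# an alternate-uses response for the cue "brick"
d = semantic_distance("a paperweight and a doorstop", "brick", vectors)
```

The residual length bias that survives this step is what the article's simulation-based correction targets.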

2018

Link     Download     Data, Material, & Scripts

Theories of embodied cognition assume that concepts are grounded in non-linguistic, sensorimotor experience. In support of this assumption, previous studies have shown that upwards response movements are faster than downwards movements after participants have been presented with words whose referents are typically located in the upper vertical space (and vice versa for downwards responses). This is taken as evidence that processing these words reactivates sensorimotor experiential traces. This congruency effect was also found for novel words, after participants learned these words as labels for novel objects that they encountered either in their upper or lower visual field. While this indicates that direct experience with a word’s referent is sufficient to evoke said congruency effects, the present study investigates whether this direct experience is also a necessary condition. To this end, we conducted five experiments in which participants learned novel words from purely linguistic input: Novel words were presented in pairs with real up- or down-words (Experiment 1); they were presented in natural sentences where they replaced these real words (Experiment 2); they were presented as new labels for these real words (Experiment 3); and they were presented as labels for novel combined concepts based on these real words (Experiments 4 and 5). In all five experiments, we did not find any congruency effects elicited by the novel words; however, participants were always able to make correct explicit judgements about the vertical dimension associated with the novel words. These results suggest that direct experience is necessary for reactivating experiential traces, but this reactivation is not a necessary condition for understanding (in the sense of storing and accessing) the corresponding aspects of word meaning.

Link     Download     Data, Material, & Scripts

In the present study, we investigated to what extent compounding involves general-level cognitive abilities related to conceptual combination. If that were the case, the compounding mechanism should be largely invariant across different languages. Under this assumption, a compositional model trained on word representations in one language should be able to predict compound meanings in other languages. We investigated this hypothesis by training a word embedding-based compositional model on a set of English compounds, and subsequently applying this model to German and Italian test compounds. The model partially predicted compound meanings in German, but not in Italian.

2016

Link/Download     Data, Material, & Scripts

Noun compounds, consisting of two nouns (the head and the modifier) that are combined into a single concept, differ in terms of their plausibility: school bus is a more plausible compound than saddle olive. The present study investigates which factors influence the plausibility of attested and novel noun compounds. Distributional Semantic Models (DSMs) are used to obtain formal (vector) representations of word meanings, and compositional methods in DSMs are employed to obtain such representations for noun compounds. From these representations, different plausibility measures are computed. Three of those measures contribute to predicting the plausibility of noun compounds: the relatedness between the meaning of the head noun and the compound (Head Proximity), the relatedness between the meaning of the modifier noun and the compound (Modifier Proximity), and the similarity between the head noun and the modifier noun (Constituent Similarity). We find nonlinear interactions between Head Proximity and Modifier Proximity, as well as between Modifier Proximity and Constituent Similarity. Furthermore, Constituent Similarity interacts nonlinearly with the familiarity with the compound. These results suggest that a compound is perceived as more plausible if it can be categorized as an instance of the category denoted by the head noun, if the contribution of the modifier to the compound meaning is clear but not redundant, and if the constituents are sufficiently similar in cases where this contribution is not clear. Furthermore, compounds are perceived to be more plausible if they are more familiar, but mostly for cases where the relation between the constituents is less clear.
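The three predictive measures named above are all cosine similarities computed from the constituent vectors and the composed compound vector. A minimal sketch, using toy random vectors and simple additive composition as a stand-in for the actual DSM composition method:

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two semantic vectors."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def plausibility_measures(mod, head, compound):
    """The three measures found to predict compound plausibility:
    relatedness of the compound to the head (Head Proximity) and to the
    modifier (Modifier Proximity), plus the similarity between the two
    constituents themselves (Constituent Similarity)."""
    return {"head_proximity": cosine(head, compound),
            "modifier_proximity": cosine(mod, compound),
            "constituent_similarity": cosine(mod, head)}

# toy vectors; a real DSM would supply trained, high-dimensional embeddings
rng = np.random.default_rng(1)
mod, head = rng.normal(size=5), rng.normal(size=5)   # e.g., "school", "bus"
compound = 0.5 * mod + 0.5 * head  # additive composition as a placeholder

m = plausibility_measures(mod, head, compound)
```

The nonlinear interactions reported in the study are between these scalar measures, estimated over many compounds in a regression on plausibility ratings.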

Link/Download     Data, Material, & Scripts

In two experiments, we attempted to replicate findings by Günther, Dudschig & Kaup (2016) that word similarity measures obtained from distributional semantics models - Latent Semantic Analysis (LSA) and Hyperspace Analogue to Language (HAL) - predict lexical priming effects. To this end, we used the pseudo-random method introduced by Günther et al. to generate item material while systematically controlling for word similarities, based on LSA cosine similarities (Experiment 1) and HAL cosine similarities (Experiment 2). Contrary to the original study, we used semantic spaces created from far larger corpora, and implemented several additional methodological improvements. In Experiment 1, we only found a significant effect of HAL cosines on lexical decision times, while we found significant effects for both LSA and HAL cosines in Experiment 2. As further supported by an analysis of the pooled data from both experiments, this indicates that HAL cosines are a better predictor of priming effects than LSA cosines. Taken together, the results replicate the finding that priming effects can be predicted from distributional semantic similarity measures.

Link     Download     Data, Material, & Scripts

In distributional semantics models (DSMs) such as latent semantic analysis (LSA), words are represented as vectors in a high-dimensional vector space. This allows for computing word similarities as the cosine of the angle between two such vectors. In two experiments, we investigated whether LSA cosine similarities predict priming effects, in that higher cosine similarities are associated with shorter reaction times (RTs). Critically, we applied a pseudo-random procedure in generating the item material to ensure that we directly manipulated LSA cosines as an independent variable. We employed two lexical priming experiments with lexical decision tasks (LDTs). In Experiment 1 we presented participants with 200 different prime words, each paired with one unique target. We found a significant effect of cosine similarities on RTs. The same was true for Experiment 2, where we reversed the prime-target order (primes of Experiment 1 were targets in Experiment 2, and vice versa). The results of these experiments confirm that LSA cosine similarities can predict priming effects, supporting the view that they are psychologically relevant. The present study thereby provides evidence for qualifying LSA cosine similarities not only as a linguistic measure, but also as a cognitive similarity measure. However, it is also shown that other DSMs can outperform LSA as a predictor of priming effects.
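The core computation these priming studies rely on is the cosine between prime and target vectors, with higher values predicting shorter RTs. A minimal sketch with hypothetical three-dimensional "LSA" vectors (illustrative values, not a real semantic space):

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two word vectors: the similarity
    measure whose effect on reaction times the experiments test."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# toy 3-dimensional vectors (illustrative values only)
vectors = {"doctor": np.array([0.9, 0.3, 0.1]),
           "nurse":  np.array([0.8, 0.4, 0.2]),
           "carrot": np.array([0.1, 0.2, 0.9])}

related   = cosine(vectors["doctor"], vectors["nurse"])   # high cosine: faster RTs predicted
unrelated = cosine(vectors["doctor"], vectors["carrot"])  # low cosine: slower RTs predicted
```

The pseudo-random item-generation procedure treats this cosine as an independent variable, sampling prime-target pairs across its full range rather than contrasting only "related" and "unrelated" conditions.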

2015

Link     Download

In this article, the R package LSAfun is presented. This package provides a variety of functions and computations based on Vector Semantic Models such as Latent Semantic Analysis (LSA; Landauer, Foltz, & Laham, Discourse Processes 25:259–284, 1998), which are procedures to obtain a high-dimensional vector representation for words (and documents) from a text corpus. Such representations are thought to capture the semantic meaning of a word (or document) and allow for semantic similarity comparisons between words to be calculated as the cosine of the angle between their associated vectors. LSAfun uses pre-created LSA spaces and provides functions for (a) Similarity Computations between words, word lists, and documents; (b) Neighborhood Computations, such as obtaining a word’s or document’s most similar words; (c) plotting such a neighborhood, as well as similarity structures for any word lists, in a two- or three-dimensional approximation using Multidimensional Scaling; (d) Applied Functions, such as computing the coherence of a text, answering multiple choice questions, and producing generic text summaries; and (e) Composition Methods for obtaining vector representations for two-word phrases. The purpose of this package is to allow convenient access to computations based on LSA.