Quantitative, data-driven models of mental representations have long enjoyed popularity and success in psychology (for example, distributional semantic models in the language domain), but have largely been missing for the visual domain. To overcome this, we present ViSpa (Vision Spaces), high-dimensional vector spaces that include vision-based representations for naturalistic images as well as concept prototypes. These vectors are derived directly from visual stimuli through a deep convolutional neural network (DCNN) trained to classify images, and allow us to compute vision-based similarity scores between any pair of images and/or concept prototypes. We successfully evaluate these similarities against human behavioral data in a series of large-scale studies. These include off-line judgments (visual similarity judgments for the referents of word pairs in Study 1 and for image pairs in Study 2, and typicality judgments for images given a label in Study 3) as well as on-line processing times and error rates in a discrimination task (Study 4) and a priming task (Study 5) with naturalistic image material. ViSpa similarities predict behavioral data across all tasks, which makes ViSpa a theoretically appealing model of vision-based representations and a valuable research tool for data analysis and the construction of experimental material: ViSpa allows for precise control over experimental material consisting of images (also in combination with words), and introduces a specifically vision-based similarity measure for word pairs. To make ViSpa available to a wide audience, this article (a) includes (video) tutorials on how to use ViSpa in R, and (b) presents a user-friendly web interface at http://vispa.fritzguenther.de.
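The two kinds of comparison described above (image to image, and image to concept prototype) can be illustrated with a minimal Python sketch. It is not ViSpa's code or its R interface. It assumes cosine similarity as the similarity measure and assumes that a concept prototype is the centroid (element-wise mean) of the vectors of that concept's images; all vectors are tiny hypothetical stand-ins for DCNN-derived representations.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def prototype(image_vectors):
    """Assumed prototype: the element-wise mean (centroid) of a concept's image vectors."""
    n = len(image_vectors)
    return [sum(values) / n for values in zip(*image_vectors)]

# Hypothetical 4-dimensional image vectors; real DCNN-derived vectors are far larger.
dog_images = [
    [0.9, 0.1, 0.3, 0.0],
    [0.8, 0.2, 0.4, 0.1],
    [1.0, 0.0, 0.2, 0.1],
]
wolf_image = [0.7, 0.1, 0.5, 0.2]
screwdriver_image = [0.0, 0.9, 0.1, 0.8]

dog_prototype = prototype(dog_images)

# Image-prototype and image-image comparisons use the same similarity measure.
sim_wolf = cosine(wolf_image, dog_prototype)
sim_screwdriver = cosine(screwdriver_image, dog_prototype)
sim_pair = cosine(wolf_image, dog_images[0])
print(f"wolf vs. dog prototype:        {sim_wolf:.3f}")
print(f"screwdriver vs. dog prototype: {sim_screwdriver:.3f}")
print(f"wolf vs. first dog image:      {sim_pair:.3f}")
```

In this toy space the wolf image ends up far closer to the dog prototype than the screwdriver image does; with real vectors, the same two functions apply unchanged to any pair of images or prototypes.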
Large-scale linguistic data are now available in abundance. Using this source of data, previous research has identified redundancies between the statistical structure of natural language and properties of the (physical) world we live in. For example, it has been shown that city sizes can be gauged by analyzing the frequencies of the city names in corpora. However, since natural language is always produced by human speakers, we point out that such redundancies can only come about indirectly and should necessarily be restricted to cases where human representations largely retain characteristics of the physical world. To demonstrate this, we examine the statistical occurrence of words referring to body parts in very different languages, covering nearly 4 billion native speakers. We chose the human body because it is a domain where the convergence between language and the physical properties of the stimuli clearly breaks down (i.e., more relevant and functional body parts are not necessarily larger in size). Our findings indicate that the human body as extracted from language does not retain its actual physical proportions; instead, it resembles the distorted human-like figure known as the sensory homunculus, whose form depicts the amount of cortical area dedicated to sensorimotor functions of each body part (and, thus, their relative functional relevance). This demonstrates that the surface-level statistical structure of language opens a window into how humans represent the world they live in, rather than into the world itself.
While distributional semantic models that represent word meanings as high-dimensional vectors induced from large text corpora have been shown to successfully predict human behavior across a wide range of tasks, they have also received criticism from different directions. These include concerns over their interpretability (how can numbers specifying abstract, latent dimensions represent meaning?) and their ability to capture variation in meaning (how can a single vector representation capture multiple different interpretations for the same expression?). Here, we demonstrate that semantic vectors can indeed rise to these challenges, by training a mapping system (a simple linear regression) that predicts inter-individual variation in relational interpretations for compounds such as wood brush (for example, brush FOR wood or brush MADE OF wood) from (compositional) semantic vectors representing the meanings of these compounds. These predictions consistently beat different random baselines, both for familiar compounds (moon light, Experiment 1) and for novel compounds (wood brush, Experiment 2), demonstrating that distributional semantic vectors encode variations in qualitative interpretations that can be decoded using techniques as simple as linear regression.
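As a hedged illustration of the mapping idea (not the study's code, data, or full relation inventory), the sketch below fits an ordinary least-squares map from hypothetical two-dimensional compound vectors to the proportion of participants choosing each of two relations (FOR, MADE OF), and then applies it to an unseen vector. Real semantic vectors have hundreds of dimensions, for which a numerical library would be preferable to the hand-written elimination used here to stay dependency-free.

```python
def fit_linear_map(X, Y):
    """Ordinary least squares with an intercept: solves (X^T X) W = X^T Y for W."""
    Xb = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p, k = len(Xb[0]), len(Y[0])
    A = [[sum(r[i] * r[j] for r in Xb) for j in range(p)] for i in range(p)]
    B = [[sum(r[i] * y[c] for r, y in zip(Xb, Y)) for c in range(k)] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [A | B].
    M = [A[i] + B[i] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [value / scale for value in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[p:] for row in M]  # weights: p rows (intercept + dims) x k outputs

def predict(W, x):
    """Predicted interpretation proportions for one compound vector x."""
    xb = [1.0] + list(x)
    return [sum(xb[i] * W[i][c] for i in range(len(xb))) for c in range(len(W[0]))]

# Hypothetical 2-d compound vectors and the proportion of participants
# choosing each relation, in the order [FOR, MADE OF].
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[0.2, 0.8], [0.7, 0.3], [0.3, 0.7], [0.8, 0.2]]

W = fit_linear_map(X, Y)
novel = [0.5, 0.5]  # a hypothetical unseen compound vector
print(predict(W, novel))
```

Because the toy proportions are exactly linear in the inputs, the fitted map reproduces them and predicts an even split for the unseen vector; with real data, predictions would be compared against random baselines, as described above.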
While a number of studies have repeatedly demonstrated an automatic activation of sensorimotor experience during language processing in the form of action-congruency effects, as predicted by theories of grounded cognition, more recent research has not found these effects for words that were newly learned from linguistic input alone, without sensorimotor experience with their referents. In the present study, we investigate whether this absence of effects can be attributed to a lack of repeated experience and of consolidation of the associations between words and sensorimotor experience in memory. To address this question, we conducted four experiments in which (1 and 2) participants engaged in two separate learning phases in which they learned novel words from language alone, with an intervening period of memory-consolidating sleep, and (3 and 4) we employed familiar words whose referents speakers have no direct experience with (such as plankton). However, we again observed no action-congruency effects in the subsequent test phases of any of the experiments. This indicates that direct sensorimotor experience with word referents is a necessary requirement for automatic sensorimotor activation during word processing.
Many theories of the role of semantics in morphological representation and processing focus on the interplay between the lexicalized meaning of the complex word on the one hand, and the individual constituent meanings on the other hand. However, the constituent meaning representations at play do not necessarily correspond to the free-word meanings of the constituents: role-dependent constituent meanings can undergo sometimes substantial semantic shifts away from the corresponding free-word meanings (such as -bill in hornbill and razorbill, or step- in stepmother and stepson). While this phenomenon is extremely difficult to operationalize using the standard psycholinguistic toolkit, we demonstrate how these as-constituent meanings can be represented quantitatively using a data-driven computational model. After a qualitative exploration, we validate the model against a large database of human ratings of the meaning retention of constituents in compounds. With this model at hand, we then investigate the internal semantic structure of compounds, focusing on differences in semantic shift and semantic transparency between the two constituents.
In their strongest formulation, theories of grounded cognition claim that concepts are made up of sensorimotor information. If this equivalence holds, perceptual properties of objects should consistently influence processing, even in purely linguistic tasks where perceptual information is neither solicited nor required. Previous studies have tested this prediction in semantic priming tasks, but they have not observed perceptual influences on participants’ performance. However, those findings suffer from critical shortcomings, which may have prevented visually grounded (perceptual) effects from being detected. Here, we investigate this topic by applying a novel method designed to increase sensitivity to such perceptual effects. Specifically, we adopt an objective, data-driven, computational approach to independently quantify vision-based and language-based similarities for prime-target pairs on a continuous scale. We test whether these measures predict behavioural performance in a semantic priming mega-study with various experimental settings. Vision-based similarity was found to facilitate performance, but we also observed a dissociation between vision-based and language-based effects. Thus, in line with theories of grounded cognition, perceptual properties can facilitate word processing even in purely linguistic tasks, but the behavioural dissociation at the same time challenges strong claims of sensorimotor and conceptual equivalence.
In CAOSS (Marelli et al., 2017), the current state-of-the-art distributional semantic model of the meaning of noun-noun compounds (such as chainsaw, butterfly, or home phone), the semantic vectors of the individual constituents are combined and enriched by position-specific information for each constituent in its role as either modifier or head. Most recently, there have been attempts to include vision-based embeddings in these models (Günther et al., 2020b), using the linear architecture implemented in CAOSS. In the present paper, we extend this line of research and demonstrate that moving to nonlinear models improves the results for vision, whereas linear models remain a good choice for text. Simply concatenating text and vision vectors does not (yet) improve the prediction of human behavioral data over models using text- and vision-based measures separately.
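A minimal sketch of linear, role-specific composition follows. It assumes the CAOSS formulation in which the compound vector is c = M·m + H·h: the modifier vector m is multiplied by a modifier matrix M, the head vector h by a head matrix H, and the results are summed. The matrices, vectors, and function names are hypothetical toy choices; the nonlinear models compared in the paper are not shown, and the concatenation helper only illustrates what "simply concatenating" text and vision vectors means.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def compose_linear(modifier_vec, head_vec, M, H):
    """Linear, role-specific composition: c = M·modifier + H·head."""
    return [a + b for a, b in zip(matvec(M, modifier_vec), matvec(H, head_vec))]

def concatenate(text_vec, vision_vec):
    """Multimodal input formed by simply concatenating text and vision vectors."""
    return list(text_vec) + list(vision_vec)

# Hypothetical 2-d constituent vectors and role matrices; in practice M and H
# would be estimated from data (e.g., by regression onto observed compound vectors).
chain = [1.0, 0.0]
saw = [0.0, 1.0]
M = [[0.5, 0.0], [0.0, 0.5]]  # applied to the constituent in modifier position
H = [[0.2, 0.1], [0.0, 1.0]]  # applied to the constituent in head position

chainsaw = compose_linear(chain, saw, M, H)
saw_chain = compose_linear(saw, chain, M, H)  # swapping roles changes the result
print(chainsaw, saw_chain)
print(concatenate(chainsaw, [0.3, 0.7, 0.1]))  # text vector + hypothetical vision vector
```

Because M and H differ, the same two constituents yield different vectors depending on which one is the head, which is the position-specific information the architecture adds over plain addition.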
Conversational negation often behaves differently from negation as a logical operator: when rejecting a state of affairs, it does not present all members of the complement set as equally plausible alternatives, but rather suggests some of them as more plausible than others (e.g., “This is not a dog, it is a wolf/*screwdriver”). Entities that are semantically similar to a negated entity tend to be judged as better alternatives (Kruszewski et al., 2016). In fact, Kruszewski et al. (2016) show that the cosine similarities between the distributional semantic representations of a negated noun and its potential alternatives correlate highly with human plausibility ratings for these alternatives. In a series of cloze tasks, we show that negation likewise restricts the production of plausible alternatives to similar entities. Furthermore, completions of negative sentences appear to be even more restricted than completions in an affirmative conjunctive context, hinting at a peculiarity of negation.
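The similarity-based account can be made concrete by ranking candidate alternatives by their cosine similarity to the negated noun. A minimal sketch with hypothetical three-dimensional toy vectors (not vectors from any of the cited studies):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_alternatives(negated, candidates, space):
    """Order candidate alternatives by cosine similarity to the negated noun, most similar first."""
    target = space[negated]
    return sorted(candidates, key=lambda word: cosine(target, space[word]), reverse=True)

# Hypothetical 3-d word vectors.
space = {
    "dog": [0.9, 0.8, 0.1],
    "wolf": [0.8, 0.9, 0.2],
    "cat": [0.7, 0.5, 0.3],
    "screwdriver": [0.1, 0.0, 0.9],
}
ranking = rank_alternatives("dog", ["screwdriver", "cat", "wolf"], space)
print(ranking)  # ['wolf', 'cat', 'screwdriver']
```

For "This is not a dog, it is a ...", the ranking places wolf first and screwdriver last, mirroring the plausibility contrast in the example sentence.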
Theories of grounded cognition assume that conceptual representations are grounded in sensorimotor experience. However, abstract concepts such as jealousy or childhood have no directly associated referents with which such sensorimotor experience can be made; therefore, the grounding of abstract concepts has long been a topic of debate. Here, we propose (a) that systematic relations exist between semantic representations learned from language on the one hand and perceptual experience on the other hand, (b) that these relations can be learned in a bottom-up fashion, and (c) that it is possible to extrapolate from this learning experience to predict expected perceptual representations for words even where direct experience is missing. To test this, we implement a data-driven computational model that is trained to map language-based representations (obtained from text corpora, representing language experience) onto vision-based representations (obtained from an image database, representing perceptual experience), and apply its mapping function to language-based representations for abstract and concrete words outside the training set. In three experiments, we present participants with these words, accompanied by two images: the image predicted by the model and a random control image. Results show that participants’ judgements were in line with model predictions even for the most abstract words. This preference was stronger for more concrete items and weaker for more abstract ones. Taken together, our findings support the grounding of abstract words, suggesting that we can draw on previous experience to construct possible visual representations where direct experience is missing.
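The trial-construction logic can be sketched under stated assumptions: the language-to-vision mapping is taken to be an already-fitted linear map, the "image predicted by the model" is assumed to be the database image whose vision vector has the highest cosine similarity to the mapped vector, and the control image is drawn at random from the remaining images. All names, vectors, and the map itself are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def apply_map(W, language_vec):
    """Apply a previously learned linear map (rows of W) to a language-based vector."""
    return [sum(w * x for w, x in zip(row, language_vec)) for row in W]

def build_trial(language_vec, W, image_db, rng):
    """Return (model_image, control_image) for one two-alternative trial."""
    predicted = apply_map(W, language_vec)
    # Model image: the database image whose vision vector is closest to the prediction.
    model_image = max(image_db, key=lambda name: cosine(predicted, image_db[name]))
    # Control image: a random different image from the database.
    control_image = rng.choice([name for name in image_db if name != model_image])
    return model_image, control_image

W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # hypothetical 3-d language -> 2-d vision map
image_db = {"img_a": [1.0, 0.1], "img_b": [0.1, 1.0], "img_c": [0.6, 0.6]}
word_vec = [0.9, 0.0, 0.1]  # hypothetical vector for a word outside the training set

model_img, control_img = build_trial(word_vec, W, image_db, random.Random(1))
print(model_img, control_img)
```

Passing a seeded random.Random instance keeps the control-image draw reproducible across runs.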
While morphemes are theoretically defined as linguistic units linking form and meaning, semantic effects in morphological processing are not reported consistently in the literature on derived and compound words. This lack of consistency has often been attributed to methodological differences between studies or to contextual effects. In this paper, we advance a different proposal: semantic effects emerge quite consistently if semantics is defined in a dynamic and flexible way, as in distributional semantic approaches. In this light, we revisit morphological processing from a markedly cognitive perspective, as made possible by models that treat morphology as systematic meaning transformation or that focus on the mapping between the orthographic form of words and their meanings.
Theories of grounded cognition postulate that concepts are grounded in sensorimotor experience. But how can this be the case for concepts such as Atlantis, with which we have no such experience? We claim that such concepts obtain their sensorimotor grounding indirectly, via already-known concepts used to describe them. Participants learned novel words referring to up or down concepts (mende = enhanced head or mende = bionic foot). In the first experiment, participants then judged the sensibility of sentences implying up or down actions (e.g., “You scratch your bionic foot”) by performing up or down hand movements. Responses were faster when the hand movement matched the direction of the implied movement. In the second experiment, we observed the same congruency effect for sentences such as “You scratch your mende”, whose implied direction depended entirely on the learning phase. This offers a perspective on how concepts learned without direct experience can nonetheless be grounded in sensorimotor experience.
Previous studies found that an automatic meaning-composition process affects the processing of morphologically complex words, and related this operation to conceptual combination. However, research on embodied cognition demonstrates that concepts are more than just lexical meanings: they are also grounded in perceptual experience. Therefore, perception-based information should also be involved in mental operations on concepts, such as conceptual combination. Consequently, we should expect to find perceptual effects in the processing of morphologically complex words. To investigate this hypothesis, we present the first fully implemented and data-driven model of perception-based (more specifically, vision-based) conceptual combination, and use the predictions of this model to investigate processing times for compound words in four large-scale behavioral experiments employing three paradigms (naming, lexical decision, and timed sensibility judgments). We observe facilitatory effects of vision-based compositionality in all three paradigms, over and above a strong language-based (lexical and semantic) baseline, thus demonstrating for the first time perceptually grounded effects at the sub-lexical level. This suggests that perceptually grounded information is not merely used when specific task demands call for it, but is automatically activated whenever available.
In the present study, we provide a comprehensive analysis and a multi-dimensional dataset of semantic transparency measures for 1,810 German compound words. Compound words are considered semantically transparent when the contribution of the constituents’ meanings to the compound meaning is clear (as in airport), but the degree of semantic transparency varies between compounds (compare strawberry or sandman). Our dataset includes both compositional and relatedness-based semantic transparency measures, also broken down by constituent. The measures are obtained from a computational and fully implemented semantic model based on distributional semantics. We validate the measures using data from four behavioral experiments: explicit transparency ratings, two lexical decision tasks using different nonwords, and an eye-tracking study. We demonstrate that different semantic effects emerge in different behavioral tasks, which can only be captured using a multi-dimensional approach to semantic transparency. We further provide the semantic transparency measures derived from the model for a dataset of 40,475 additional German compounds, as well as for 2,061 novel German compounds.
Speakers of languages with synchronically productive compounding systems, such as English, are likely to encounter new compounds on a daily basis. These can only be useful for communication if speakers are able to rapidly compose their meanings. However, while compositional meanings can be obtained for some novel compounds such as bridgemill, this is far harder for others such as radiosauce; accordingly, processing speed should be affected by the ease of such a compositional process. To rigorously test this hypothesis, we employed a fully implemented computational model based on distributional semantics to quantitatively measure the degree of semantic compositionality of novel compounds. In two large-scale studies, we collected timed sensibility judgements and lexical decisions for hundreds of morphologically structured nonwords in English. Response times were predicted by the constituents’ semantic contribution to the compositional process, with slower rejections for more compositional nonwords. We found no indication of a difference in these compositional effects between the tasks, suggesting that speakers automatically engage in a compositional process whenever they encounter morphologically structured stimuli, even when it is not required by the task at hand. Such compositional effects in the processing of novel compounds have important implications for studies that employ such stimuli as filler material or “nonwords,” as response times for these items can differ greatly depending on their compositionality.
Models that represent meaning as high-dimensional numerical vectors, such as latent semantic analysis (LSA), hyperspace analogue to language (HAL), bound encoding of the aggregate language environment (BEAGLE), topic models, global vectors (GloVe), and word2vec, have been introduced as extremely powerful machine-learning proxies for human semantic representations and have seen an explosive rise in popularity over the past two decades. However, despite their considerable advancements and spread in the cognitive sciences, some of their features are not always adequately presented and understood. Indeed, when these models are examined from a cognitive perspective, a number of unfounded arguments tend to appear in the psychological literature. In this article, we review the most common of these arguments and discuss (a) what exactly these models represent at the implementational level and their plausibility as a cognitive theory, (b) how they deal with various aspects of meaning such as polysemy or compositionality, and (c) how they relate to the debate on embodied and grounded cognition. We identify common misconceptions that arise as a result of incomplete descriptions, outdated arguments, and unclear distinctions between theory and implementation of the models. We clarify and amend these points to provide a theoretical basis for future research and discussions on vector models of semantic representation.
Research on morphological processing has repeatedly found different priming effects for English and German native speakers in the overt priming paradigm. In English, priming effects were found for word pairs with a morphological and semantic relation (SUCCESSFUL-success), but not for pairs without a semantic relation (SUCCESSOR-success). By contrast, morphological priming effects in German occurred for pairs both with a semantic relation (AUFSTEHEN-stehen, ‘stand up’-‘stand’) and without (VERSTEHEN-stehen, ‘understand’-‘stand’). These behavioural differences have been taken to indicate differential language processing and memory representations in these languages. We examine whether these behavioural differences can be explained by differences in the language structure of English and German. To this end, we employed new developments in distributional semantics as a computational method to obtain both observed and compositional representations for transparent and opaque complex word meanings, which can in turn be used to quantify the degree of semantic predictability of the morphological system of a language. We compared the similarities between transparent and opaque words and their stems, and observed a difference between German and English, with German showing higher morphological systematicity. The present results indicate that the investigated cross-linguistic effect can be attributed to quantitatively characterized differences in the speakers' language experience, as approximated by linguistic corpora.
Effects of semantic transparency, reflected in processing differences between semantically transparent (teabag) and opaque (ladybird) compounds, have received considerable attention in the investigation of the role of constituents in compound processing. However, previous studies have yielded inconsistent results. In the present article, we argue that this is because semantic transparency is often conceptualized only as the semantic relatedness between the compound and constituent meanings as separate units. This neglects the fact that compounds are inherently productive constructions. We argue that compound processing is routinely affected by a compositional process aimed at computing a compositional meaning, which would cause compositional semantic transparency effects to emerge in compound processing. We employ recent developments in compositional distributional semantics to quantify relatedness-based as well as composition-based semantic transparency measures and use these to predict lexical decision times in a large-scale data set. We observed semantic transparency effects on compound processing that are not captured in relatedness terms but only by adopting a compositional perspective.
Scoring divergent-thinking response sets has always been challenging because such responses are not only open-ended in terms of the number of ideas, but each idea may also be expressed by a varying number of concepts and, thus, by a varying number of words (elaboration). While many current studies have attempted to score the semantic distance in divergent-thinking responses by applying latent semantic analysis (LSA), it is known from other areas of research that LSA-based approaches are biased according to the number of words in a response. Thus, the current article aims to identify and demonstrate this elaboration bias in LSA-based divergent-thinking scores by means of a simulation. In addition, we show that this elaboration bias can be reduced by removing stop words (for example, and, or, and for) prior to analysis. Furthermore, the residual bias after stop word removal can be reduced by simulation-based corrections. Finally, we give an empirical illustration for alternate uses and consequences tasks. Results suggest that convergent validity can be expected to be highest when both stop word removal and simulation-based bias correction are applied.
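The elaboration bias and the stop-word remedy can be illustrated with a deliberately simplified sketch. It assumes that a response is represented as the sum of its word vectors and that semantic distance is 1 minus the cosine to the prompt vector; the space, the stop-word list, and the prompt are hypothetical, and the simulation-based correction is not shown. In this toy space all function words share one direction, so each added stop word pulls the response vector away from the prompt.

```python
import math

# A tiny illustrative stop-word list (real analyses use much longer lists).
STOP_WORDS = {"a", "an", "and", "as", "for", "or", "the"}

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def response_vector(words, space):
    """Additive response representation: sum of the vectors of all words found in the space."""
    dims = len(next(iter(space.values())))
    total = [0.0] * dims
    for word in words:
        if word in space:
            total = [t + x for t, x in zip(total, space[word])]
    return total

def semantic_distance(prompt, response, space, remove_stop_words=True):
    """1 - cosine between the prompt vector and the (optionally stop-word-filtered) response vector."""
    words = response.lower().split()
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return 1.0 - cosine(space[prompt], response_vector(words, space))

# Hypothetical 3-d space in which all function words share one direction.
space = {
    "brick": [1.0, 0.0, 0.0],
    "doorstop": [0.6, 0.8, 0.0],
    "a": [0.0, 0.0, 1.0],
    "as": [0.0, 0.0, 1.0],
}
short_resp, long_resp = "doorstop", "as a doorstop"
for remove in (False, True):
    d_short = semantic_distance("brick", short_resp, space, remove)
    d_long = semantic_distance("brick", long_resp, space, remove)
    print(f"remove_stop_words={remove}: short={d_short:.3f}, long={d_long:.3f}")
```

With stop words kept, the elaborated response "as a doorstop" is scored as more distant than "doorstop" even though both express the same idea; after removal, both receive the same score.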
Theories of embodied cognition assume that concepts are grounded in non-linguistic, sensorimotor experience. In support of this assumption, previous studies have shown that upwards response movements are faster than downwards movements after participants have been presented with words whose referents are typically located in the upper vertical space (and vice versa for downwards responses). This is taken as evidence that processing these words reactivates sensorimotor experiential traces. This congruency effect was also found for novel words, after participants learned these words as labels for novel objects that they encountered either in their upper or lower visual field. While this indicates that direct experience with a word’s referent is sufficient to evoke such congruency effects, the present study investigates whether this direct experience is also a necessary condition. To this end, we conducted five experiments in which participants learned novel words from purely linguistic input: Novel words were presented in pairs with real up- or down-words (Experiment 1); they were presented in natural sentences where they replaced these real words (Experiment 2); they were presented as new labels for these real words (Experiment 3); and they were presented as labels for novel combined concepts based on these real words (Experiments 4 and 5). In all five experiments, we found no congruency effects elicited by the novel words; however, participants were always able to make correct explicit judgements about the vertical dimension associated with the novel words. These results suggest that direct experience is necessary for reactivating experiential traces, but that this reactivation is not a necessary condition for understanding (in the sense of storing and accessing) the corresponding aspects of word meaning.
In the present study, we investigated to what extent compounding involves general cognitive abilities related to conceptual combination. If so, the compounding mechanism should be largely invariant across languages. Under this assumption, a compositional model trained on word representations in one language should be able to predict compound meanings in other languages. We investigated this hypothesis by training a word-embedding-based compositional model on a set of English compounds and subsequently applying this model to German and Italian test compounds. The model partially predicted compound meanings in German, but not in Italian.
Noun compounds, consisting of two nouns (the head and the modifier) that are combined into a single concept, differ in terms of their plausibility: school bus is a more plausible compound than saddle olive. The present study investigates which factors influence the plausibility of attested and novel noun compounds. Distributional Semantic Models (DSMs) are used to obtain formal (vector) representations of word meanings, and compositional methods in DSMs are employed to obtain such representations for noun compounds. From these representations, different plausibility measures are computed. Three of these measures contribute to predicting the plausibility of noun compounds: the relatedness between the meaning of the head noun and the compound (Head Proximity), the relatedness between the meaning of the modifier noun and the compound (Modifier Proximity), and the similarity between the head noun and the modifier noun (Constituent Similarity). We find nonlinear interactions between Head Proximity and Modifier Proximity, as well as between Modifier Proximity and Constituent Similarity. Furthermore, Constituent Similarity interacts nonlinearly with familiarity with the compound. These results suggest that a compound is perceived as more plausible if it can be categorized as an instance of the category denoted by the head noun, if the contribution of the modifier to the compound meaning is clear but not redundant, and if the constituents are sufficiently similar in cases where this contribution is not clear. Furthermore, compounds are perceived to be more plausible if they are more familiar, but mostly for cases where the relation between the constituents is less clear.
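The three measures can be written down directly once a compound vector is available. In the sketch below, relatedness and similarity are both assumed to be cosine similarities, and the composition function is a swappable parameter: simple vector addition serves only as a placeholder and is not necessarily the compositional method used in the study. The vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def additive(modifier, head):
    """Placeholder composition function (simple vector addition)."""
    return [m + h for m, h in zip(modifier, head)]

def plausibility_measures(modifier, head, compose=additive):
    """Head Proximity, Modifier Proximity and Constituent Similarity for one compound."""
    compound = compose(modifier, head)
    return {
        "head_proximity": cosine(head, compound),
        "modifier_proximity": cosine(modifier, compound),
        "constituent_similarity": cosine(modifier, head),
    }

# Hypothetical 3-d vectors for the constituents of "school bus".
school = [0.8, 0.2, 0.1]
bus = [0.3, 0.9, 0.2]
for name, value in plausibility_measures(school, bus).items():
    print(f"{name}: {value:.3f}")
```

With additive composition the compound vector lies between its two constituents, so both proximities are at least as large as Constituent Similarity; other composition functions need not have this property.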
In two experiments, we attempted to replicate findings by Günther, Dudschig & Kaup (2016) that word similarity measures obtained from distributional semantics models (Latent Semantic Analysis, LSA, and Hyperspace Analogue to Language, HAL) predict lexical priming effects. To this end, we used the pseudo-random method introduced by Günther et al. to generate item material while systematically controlling for word similarities, based on LSA cosine similarities (Experiment 1) and HAL cosine similarities (Experiment 2). Unlike the original study, we used semantic spaces created from far larger corpora and implemented several additional methodological improvements. In Experiment 1, we only found a significant effect of HAL cosines on lexical decision times, while we found significant effects for both LSA and HAL cosines in Experiment 2. As further supported by an analysis of the pooled data from both experiments, this indicates that HAL cosines are a better predictor of priming effects than LSA cosines. Taken together, the results replicate the finding that priming effects can be predicted from distributional semantic similarity measures.
In distributional semantics models (DSMs) such as latent semantic analysis (LSA), words are represented as vectors in a high-dimensional vector space. This allows for computing word similarities as the cosine of the angle between two such vectors. In two experiments, we investigated whether LSA cosine similarities predict priming effects, in that higher cosine similarities are associated with shorter reaction times (RTs). Critically, we applied a pseudo-random procedure in generating the item material to ensure that we directly manipulated LSA cosines as an independent variable. We conducted two lexical priming experiments with lexical decision tasks (LDTs). In Experiment 1, we presented participants with 200 different prime words, each paired with one unique target. We found a significant effect of cosine similarities on RTs. The same was true for Experiment 2, where we reversed the prime-target order (primes of Experiment 1 were targets in Experiment 2, and vice versa). The results of these experiments confirm that LSA cosine similarities can predict priming effects, supporting the view that they are psychologically relevant. The present study thereby provides evidence for regarding LSA cosine similarities not only as a linguistic measure, but also as a cognitive similarity measure. However, we also show that other DSMs can outperform LSA as a predictor of priming effects.
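For reference, the cosine similarity between two n-dimensional word vectors u and v is the standard quantity:

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^{2}} \, \sqrt{\sum_{i=1}^{n} v_i^{2}}}
```

It ranges from -1 to 1 and depends only on the angle between the vectors, not on their lengths. As a worked example, for u = (1, 2) and v = (2, 1) the dot product is 1·2 + 2·1 = 4 and both norms are √5, so cos(u, v) = 4/5 = 0.8. The prediction tested here is that higher values go with shorter RTs.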
In this article, the R package LSAfun is presented. This package enables a variety of functions and computations based on Vector Semantic Models such as Latent Semantic Analysis (LSA; Landauer, Foltz, & Laham, Discourse Processes 25:259–284), which are procedures for obtaining high-dimensional vector representations for words (and documents) from a text corpus. Such representations are thought to capture the semantic meaning of a word (or document) and allow for semantic similarity between words to be calculated as the cosine of the angle between their associated vectors. LSAfun uses pre-created LSA spaces and provides functions for (a) Similarity Computations between words, word lists, and documents; (b) Neighborhood Computations, such as obtaining a word’s or document’s most similar words; (c) plotting such a neighborhood, as well as similarity structures for any word lists, in a two- or three-dimensional approximation using Multidimensional Scaling; (d) Applied Functions, such as computing the coherence of a text, answering multiple choice questions, and producing generic text summaries; and (e) Composition Methods for obtaining vector representations for two-word phrases. The purpose of this package is to allow convenient access to computations based on LSA.