Semantic transparency measures for English compounds


This dataset contains semantic transparency measures for a set of 1,865 English compound words, such as airport or ladybird. It is included as Supplemental Material to the following article:

Günther, F., & Marelli, M. (2018). Enter sand-man: Compound processing and semantic transparency in a compositional perspective. Journal of Experimental Psychology: Learning, Memory, and Cognition. doi: 10.1037/xlm0000677


Link to the dataset


The measures are obtained from an English distributional semantic space, in which each word is represented as a high-dimensional numerical vector. Using this semantic space, modifier relatedness (simMO) is computed as the cosine similarity between modifier and compound, cos(air, airport), and analogously head relatedness (simHO) as the cosine similarity between head and compound, cos(port, airport).
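As an illustration of the two relatedness measures, the following minimal sketch computes cosine similarities on toy vectors. The three-dimensional vectors and the names `space`, `sim_mo`, and `sim_ho` are invented for this example; they are not taken from the dataset or from the semantic space used in the article.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (assumption: real semantic spaces have hundreds of dimensions)
space = {
    "air": [1.0, 0.0, 1.0],
    "port": [0.0, 1.0, 1.0],
    "airport": [1.0, 1.0, 0.0],
}

# Modifier relatedness: cos(air, airport)
sim_mo = cosine(space["air"], space["airport"])
# Head relatedness: cos(port, airport)
sim_ho = cosine(space["port"], space["airport"])

print(sim_mo, sim_ho)
```

With real vectors, the same function would be applied to the modifier, head, and compound vectors looked up in the semantic space.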

Furthermore, a compositional model was employed to obtain compositional meaning representations for the compounds (i.e., the meaning that would be predicted given the combination of their constituents). With this compositional meaning, modifier composition (simMC) is computed as the cosine similarity between modifier and compositional compound, cos(air, [air+port]), head composition (simHC) as the cosine similarity between head and compositional compound, cos(port, [air+port]), and compound compositionality (simOC) as the cosine similarity between actual and compositional compound, cos(airport, [air+port]).
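The three composition-based measures can be sketched in the same way. This text does not specify the compositional model, so the sketch below uses plain vector addition as a stand-in for [air+port]; the model used in the article may differ. The toy vectors and the names `compose_additive`, `sim_mc`, `sim_hc`, and `sim_oc` are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def compose_additive(modifier, head):
    """Stand-in composition (assumption): element-wise sum of the constituent vectors."""
    return [m + h for m, h in zip(modifier, head)]

# Toy vectors, invented for this example
air = [1.0, 0.0, 1.0]
port = [0.0, 1.0, 1.0]
airport = [1.0, 1.0, 0.0]

composed = compose_additive(air, port)  # plays the role of [air+port]

sim_mc = cosine(air, composed)      # modifier composition: cos(air, [air+port])
sim_hc = cosine(port, composed)     # head composition: cos(port, [air+port])
sim_oc = cosine(airport, composed)  # compound compositionality: cos(airport, [air+port])

print(sim_mc, sim_hc, sim_oc)
```

Swapping `compose_additive` for another composition function leaves the three similarity computations unchanged.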

For more details, see the article (Günther & Marelli, 2018).