Carina Silberer


Universität Stuttgart 

Institut für Maschinelle Sprachverarbeitung 

Pfaffenwaldring 5 b 

70569 Stuttgart 

I have been a Juniorprofessor (Assistant Professor) of Computational Linguistics at the IMS, University of Stuttgart, since July 2020.

Before that, I was a postdoctoral researcher in the AMORE project of Gemma Boleda at the Universitat Pompeu Fabra in Barcelona and, earlier, in Prof. Dr. Manfred Pinkal's group at the Department of Computational Linguistics & Phonetics and the MMCI Cluster of Excellence, Saarland University.

I obtained my PhD at the Institute for Language, Cognition and Computation (ILCC) at the School of Informatics (University of Edinburgh), working with Mirella Lapata, and completed my Master's and Bachelor's degrees in Computational Linguistics at the Department of Computational Linguistics (University of Heidelberg, Germany).


Application deadline is 10 May 2023 

Two positions (one 3-year PhD position and one 1-year postdoc position) are available in multimodal emotion analysis. We want to understand how people communicate emotions on social media with images and text (for instance on Reddit) and how they choose the modality.

This project is a collaboration with Roman Klinger. [more information here]


My research interests lie in the area of Natural Language Processing. My focus is on learning semantic models from text data using machine learning, and on grounding language in vision by learning from multimodal data. The long-term goal of my research is to understand and model human language use in order to enable human-machine communication, interaction, and instruction in and with the physical (real) world.


Google scholar profile



The Causal-Action-Effect dataset of the paper Implicit Affordance Acquisition via Causal Action–Effect Modeling in the Video Domain (Yang & Silberer, 2023): [github repo]



The dataset of Hard Negative Captions for assessing how well visual-linguistic models comprehend fine-grained linguistic phenomena, introduced in Dönmez et al. (2023): [github repo]

MMEmo Corpus

Multi-Modal Emotion Recognition Corpus of Reddit Posts introduced in On the Complementarity of Images and Text for the Expression of Emotions in Social Media (Khlyzova et al., 2022): [MMEmo]

ManyNames: Dataset with Names for Concrete Objects in Images

Dataset introduced in Object Naming in Language and Vision: A Survey and a New Dataset (Silberer et al., 2020a): [ManyNames dataset and explorer]

See also Silberer et al. (2020b).

VisA: Dataset with Visual Attributes for Concepts 

This dataset contains visual attribute annotations for over 500 concrete (animate and inanimate) concepts. All concepts are represented in ImageNet and the feature production norms of McRae et al. (2005). 

Each concept is annotated with visual attributes based on a taxonomy of 636 attributes. 

See Silberer et al. (2013, 2017) for details. 

The download consists of a number of XML files, one per higher-level category (e.g., vehicles, animals): Download [.zip]

Semantic and Visual Similarity Judgements for Concept Pairs

This dataset contains similarity judgements for 7,576 word pairs representing 500 concrete basic-level concepts (the same ones found in VisA). All concepts are in ImageNet and the feature production norms of McRae et al. (2005). If you need superordinate categories for the basic-level concepts, see the VisA dataset above.

Each concept occurs in approximately 30 pairs. Similarity ratings were obtained using Amazon Mechanical Turk. Participants were asked to rate each word pair on two dimensions, visual and semantic similarity, using a Likert scale from 1 (highly dissimilar) to 5 (highly similar).

See Silberer & Lapata (2014) for details.

The download consists of a TSV file listing all concept pairs, with the mean semantic and visual similarity ratings in one column each.

similarity judgements [.tsv] (WordNet sense numbers: pairs_sensenums.tsv)
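As a rough illustration of how the TSV file could be read, here is a minimal Python sketch. It rests on assumptions that should be checked against the actual download: that each row holds the two words of a pair in the first two columns, followed by the mean semantic and the mean visual rating, and that the first row is a header. The function name `load_similarity_pairs` and the sample rows are hypothetical and made up for the example; they are not taken from the dataset.

```python
import csv
import io


def load_similarity_pairs(handle, has_header=True):
    """Read (word1, word2, semantic, visual) tuples from a TSV file handle.

    Assumed column order (not confirmed by the dataset description):
    word1, word2, mean semantic rating, mean visual rating.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the assumed header row
    pairs = []
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or incomplete lines
        pairs.append((row[0], row[1], float(row[2]), float(row[3])))
    return pairs


# Invented placeholder rows, used only to show the call; not real ratings.
sample = (
    "word1\tword2\tsemantic\tvisual\n"
    "apple\tpear\t4.2\t3.8\n"
    "car\tbanana\t1.1\t1.3\n"
)
pairs = load_similarity_pairs(io.StringIO(sample))
```

For the real file, the same function would be called on an opened file, for example `open(path, newline="", encoding="utf-8")`, with `path` pointing to the downloaded TSV; adjust the column indices if the layout differs.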