Carina Silberer

Contact

firstname.lastname@ims.uni-stuttgart.de

Universität Stuttgart

Institut für Maschinelle Sprachverarbeitung

Pfaffenwaldring 5 b

70569 Stuttgart

Since July 2020, I have been a Juniorprofessor (Assistant Professor) of Computational Linguistics at the IMS, University of Stuttgart.

Before that, I was a postdoctoral researcher in Gemma Boleda's AMORE project at the Universitat Pompeu Fabra in Barcelona and, prior to that, in Prof. Dr. Manfred Pinkal's group at the Department of Computational Linguistics & Phonetics and the MMCI Cluster of Excellence, Saarland University.

I obtained my PhD at the Institute for Language, Cognition and Computation (ILCC) at the School of Informatics (University of Edinburgh), working with Mirella Lapata, and completed my Master's and Bachelor's degrees in Computational Linguistics at the Department of Computational Linguistics (University of Heidelberg, Germany).

News

September 2024: I gave a one-week course on Multimodal CL and NLP at the Computational Linguistics Fall School 2024 in Passau, Germany.
September 2024: We organised the second edition of the LIMO Workshop on Linguistic Insights from and for Multimodal Language Processing at KONVENS 2024 in Vienna; the first edition took place at KONVENS 2023 (LIMO2023).
January 2024: I received a positive interim evaluation of my junior professorship.
January 2024: I gave an invited lecture at the Training School on Representation Mediated Multimodality -- Grounded Representation, Reasoning, and Learning for Interactive Interpretation in Malta.

I am hiring! (together with Roman Klinger)

Application deadline is 10 May 2023

Two positions (one 3-year PhD student and one 1-year postdoc position) in multimodal emotion analysis. We want to understand how people communicate emotions in social media with images and texts (for instance on Reddit) and how they choose the modality.

This project is a collaboration with Roman Klinger [more information here]

Research

My research interests lie in the area of Natural Language Processing. My focus is on learning semantic models from text data using machine learning, and on grounding language in vision by learning from multimodal data. The long-term goal of my research is to understand and model human language use in order to enable human-machine communication, interaction and instruction in and with the physical (real) world.

Publications

Google Scholar profile

Christopher Bagdon, Aidan Combs, Carina Silberer, Roman Klinger. 2025. Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). [pdf / bib] [project]
Sinan Kurtyigit, Diego Frassinelli, Carina Silberer, Sabine Schulte im Walde. 2025. A Couch Potato is not a Potato on a Couch: Prompting Strategies, Image Generation, and Compositionality Prediction for Noun Compounds. In Findings of the Association for Computational Linguistics: ACL 2025. [pdf / bib] [project]
Esra Dönmez, Pascal Tilli, Hsiu-Yu Yang, Thang Vu, Carina Silberer. 2023. HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities. In Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL), 364-388. [pdf / bib]
Hsiu-Yu Yang, Carina Silberer. 2023. Implicit Affordance Acquisition via Causal Action–Effect Modeling in the Video Domain. In Proceedings of the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 13th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2023), 846‑871. [pdf] [bib] [project]
Chong Shen, Carina Silberer. 2023. Combining Tradition with Modernness: Exploring Event Representations in Vision-and-Language Models for Visual Goal–Step Inference. In Proceedings of the 2023 Conference of the Association for Computational Linguistics: Student Research Workshop, 254-265. [pdf / bib]
Míriam Bellver, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, Xavier Giró-i-Nieto. 2023. A closer look at referring expressions for video object segmentation. In Multimedia Tools and Applications 82, 4419–4438. [pdf / bib]
Hsiu-Yu Yang, Carina Silberer. 2022. Are Visual-Linguistic Models Commonsense Knowledge Bases? In Proceedings of the 29th International Conference on Computational Linguistics (COLING), 5542-5559. [pdf / bib] [data]
Anna Khlyzova, Carina Silberer, Roman Klinger. 2022. On the Complementarity of Images and Text for the Expression of Emotions in Social Media. In Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, 1-15. [pdf / bib]
C Silberer, S Zarrieß, M Westera, G Boleda. 2020. Humans Meet Models on Object Naming: A New Dataset and Analysis. In Proceedings of the 28th International Conference on Computational Linguistics (COLING), 1893-1905. [bib] [pdf] [ManyNames dataset]
C Silberer, S Zarrieß, G Boleda. 2020. Object Naming in Language and Vision: A Survey and a New Dataset. In Proceedings of the 12th International Conference on Language Resources and Evaluation, 5792-5801. [bib] [pdf] [ManyNames dataset]
AM Herrera-Palacio, C Ventura, C Silberer, IT Sorodoc, G Boleda, X Giro-i-Nieto. 2019. Recurrent Instance Segmentation using Sequences of Referring Expressions. ViGIL Workshop at NeurIPS 2019. Vancouver, Canada. [pdf] [suppl-pdf]
L Aina, C Silberer, IT Sorodoc, M Westera, G Boleda. 2019. What do Entity-Centric Models Learn? Insights from Entity Linking in Multi-Party Dialogue. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 3772–3783. Minneapolis, USA. [bib] [pdf]
Carina Silberer and Manfred Pinkal. 2018. Grounding Semantic Roles in Images. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2616-2626. Brussels, Belgium. [bib] [pdf]
L Aina, C Silberer, IT Sorodoc, M Westera, G Boleda. 2018. AMORE-UPF at SemEval-2018 Task 4: BiLSTM with Entity Library. In Proceedings of the 12th International Workshop on Semantic Evaluation, 65-69. New Orleans, USA. [pdf / bib]
Carina Silberer, Jasper Uijlings, Mirella Lapata. 2018. Understanding Visual Scenes. Natural Language Engineering 24 (3), 441-465. [pdf]
Carina Silberer, Vittorio Ferrari, and Mirella Lapata. 2017. Visually Grounded Meaning Representations. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (11), 2284-2297.
Carina Silberer. 2017. Grounding the Meaning of Words with Visual Attributes. In: Feris R., Lampert C., Parikh D. (eds) Visual Attributes. Advances in Computer Vision and Pattern Recognition. Springer, 331-362.
Carina Silberer. 2015. Learning Visually Grounded Meaning Representations. PhD thesis. [pdf]
Carina Silberer and Mirella Lapata. 2014. Learning Grounded Meaning Representations with Autoencoders. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 721-732. Baltimore, USA. [bib] [pdf]
Carina Silberer, Vittorio Ferrari, and Mirella Lapata. 2013. Models of Semantic Representation with Visual Attributes. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 572-582. Sofia, Bulgaria. [bib] [pdf]
Carina Silberer and Mirella Lapata. 2012. Grounded Models of Semantic Representation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 1423-1433. Jeju Island, Korea. [bib] [pdf]
Carina Silberer and Anette Frank. 2012. Casting Implicit Role Linking as an Anaphora Resolution Task. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), 1-10. Montréal, Canada. [bib] [pdf]
Carina Silberer and Simone Paolo Ponzetto. 2010. UHD: Cross-lingual Word Sense Disambiguation Using Multilingual Co-occurrence Graphs. In Proceedings of the 5th International Workshop on Semantic Evaluation, 134-137. Uppsala, Sweden. [bib] [pdf]
W Wentland, J Knopp, C Silberer and M Hartung. 2008. Building a Multilingual Lexical Resource for Named Entity Disambiguation, Translation and Transliteration. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC), 3230-3237. Marrakech, Morocco. [bib] [pdf]

Resources

CAE

The Causal-Action-Effect dataset of the paper Implicit Affordance Acquisition via Causal Action–Effect Modeling in the Video Domain (Yang & Silberer, 2023): [github repo]

CWWV_IMG and CWWV_CLIP

The commonsense knowledge probing datasets of the paper Are Visual-Linguistic Models Commonsense Knowledge Bases? (Yang & Silberer, 2022): [github repo]

HNC

The dataset containing Hard Negative Captions to assess visual-linguistic models on their comprehension capabilities of fine-grained linguistic phenomena, introduced in Dönmez et al. (2023): [github repo]

MMEmo Corpus

Multi-Modal Emotion Recognition Corpus of Reddit Posts introduced in On the Complementarity of Images and Text for the Expression of Emotions in Social Media (Khlyzova et al., 2022): [MMEmo]

ManyNames: Dataset with Names for Concrete Objects in Images

Dataset introduced in Object Naming in Language and Vision: A Survey and a New Dataset (Silberer et al., 2020a): [ManyNames dataset and explorer]

VisA: Dataset with Visual Attributes for Concepts

This dataset contains visual attribute annotations for over 500 concrete (animate and inanimate) concepts. All concepts are represented in ImageNet and the feature production norms of McRae et al. (2005).

Each concept is annotated with visual attributes based on a taxonomy of 636 attributes.

See Silberer et al. (2013, 2017) for details.

The download consists of a number of XML files, one per higher-level category (e.g., vehicles, animals): Download [.zip]

Semantic and Visual Similarity Judgements for Concept Pairs

This dataset contains similarity judgements for 7,576 word pairs representing 500 concrete basic-level concepts (the same ones found in VisA). All concepts are in ImageNet and the feature production norms of McRae et al. (2005). If you need superordinate categories for the basic-level concepts, see the VisA dataset above.

Each concept occurs in approximately 30 pairs. Similarity ratings were obtained using Amazon Mechanical Turk. Participants were asked to rate each word pair on two dimensions, visual and semantic similarity, using a Likert scale from 1 (highly dissimilar) to 5 (highly similar).

See Silberer & Lapata (2014) for details.

The download consists of a TSV file listing all concept pairs, with the mean semantic and mean visual similarity ratings in one column each.

similarity judgements [.tsv] (WordNet sense numbers: pairs_sensenums.tsv)
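
As a rough illustration of how such a file could be read, here is a minimal Python sketch using only the standard library. The column names (WORD1, WORD2, SEM, VIS) and the sample rows are assumptions invented for this example, not taken from the actual download; please check the header of the real file and adjust the names accordingly.

```python
import csv
import io

# Hypothetical sample in the ASSUMED layout: the column names and the
# ratings below are placeholders, not values from the actual dataset.
SAMPLE_TSV = (
    "WORD1\tWORD2\tSEM\tVIS\n"
    "concept_a\tconcept_b\t4.0\t3.0\n"
    "concept_a\tconcept_c\t2.0\t1.0\n"
)


def load_pairs(handle, w1="WORD1", w2="WORD2", sem="SEM", vis="VIS"):
    """Read concept pairs and their mean ratings from a tab-separated file.

    Returns a list of (word1, word2, semantic_rating, visual_rating) tuples.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row[w1], row[w2], float(row[sem]), float(row[vis]))
        for row in reader
    ]


pairs = load_pairs(io.StringIO(SAMPLE_TSV))

# Average each rating dimension over all pairs.
mean_sem = sum(p[2] for p in pairs) / len(pairs)
mean_vis = sum(p[3] for p in pairs) / len(pairs)
print(f"{len(pairs)} pairs, mean semantic {mean_sem:.2f}, mean visual {mean_vis:.2f}")
```

To use this on the downloaded file, replace the `io.StringIO(SAMPLE_TSV)` argument with a file handle opened with `newline=""` and pass the actual column names as keyword arguments.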
