NLP Beyond Text

2nd International Workshop on Cross/Multi modal Natural Language Processing

co-located with The Web Conference 2021


  • The program of the workshop is out!

  • We're delighted to announce that we'll have 4 keynote speakers.

  • The list of accepted papers is out!

  • Submission Deadline extended

  • The First Call for Papers is out!

  • NLPBT 2021 will be co-located with The Web Conference 2021!


Humans interact with each other through several means (e.g., voice, gestures, written text, facial expressions), and a natural human-machine interaction system should preserve the same modalities (Kiela et al., 2018). However, traditional Natural Language Processing (NLP) focuses on analyzing textual input to solve language understanding and reasoning tasks, and other modalities are only partially targeted. This workshop aims to promote research in Multi/Cross-Modal NLP, i.e., the study of computational approaches that exploit the different modalities humans adopt to communicate. In particular, the workshop focuses on (i) studying how to bridge the gap between NLP on spoken and written language (Chung et al., 2018; Elizalde et al., 2019) and (ii) exploring how NLU models can be empowered by jointly analyzing multiple input sources, including language (spoken or written), vision (gestures and expressions) and acoustic (paralinguistic) modalities (Abouelenien et al., 2017; Madhyastha et al., 2018). The former comes from the observation that voice-based interaction, which is typical of conversational agents, poses new challenges to NLU. The latter addresses the way humans acquire and use language: this usually happens in a perceptually rich environment (Evtimova et al., 2017), where humans communicate using modalities that go beyond language itself. Therefore, extending NLP to modalities beyond written text is a fundamental step in allowing AI systems to reach human-like capabilities.

The workshop seeks papers on topics in cross-modal and multi-modal NLP. Topics of interest include, but are not limited to:

  • text preprocessing on ASR transcriptions (e.g., ASR error detection and correction);

  • cross-modal NLU from written text to speech transcription;

  • multi-modal sentiment analysis, emotion recognition and sarcasm detection;

  • multi-modal dialogue systems;

  • multi-modal machine translation;

  • multi-modal question answering.

Key Dates

All deadlines are at 11:59 pm GMT-12 (Anywhere on Earth).

  • Submission Deadline: Jan 23rd, 2021 (extended from Jan 16th, 2021)

  • Acceptance Notification: Feb 15th, 2021 (moved from Feb 8th, 2021)

  • Camera-ready version: Feb 22nd, 2021

  • Workshop: April 15th, 2021

References


Mohamed Abouelenien, Veronica Perez-Rosas, Rada Mihalcea, and Mihai Burzo. 2017. Multimodal gender detection. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, ICMI ’17. ACM.

Katrina Evtimova, Andrew Drozdov, Douwe Kiela, and Kyunghyun Cho. 2017. Emergent language in a multi-modal, multi-step referential game. ArXiv, abs/1705.10369.

Soujanya Poria, Erik Cambria, Devamanyu Hazarika, Navonil Majumder, Amir Zadeh, and Louis-Philippe Morency. 2017. Context-dependent sentiment analysis in user-generated videos. In Proceedings of the 55th ACL, Vancouver, Canada. Association for Computational Linguistics.

Douwe Kiela, Alexis Conneau, Allan Jabri, and Maximilian Nickel. 2018. Learning visually grounded sentence representations. In Proceedings of the 2018 NAACL. Association for Computational Linguistics.

Yu-An Chung, Wei-Hung Weng, Schrasing Tong, and James Glass. 2018. Unsupervised cross-modal alignment of speech and text embedding spaces. In Proceedings of the 32nd NIPS. Curran Associates Inc.

Pranava Madhyastha, Josiah Wang, and Lucia Specia. 2018. The role of image representations in vision to language tasks. Natural Language Engineering, 24(3):415–439.

Benjamin Elizalde, Shuayb Zarar, and Bhiksha Raj. 2019. Cross modal audio search and retrieval with joint embeddings based on text and audio. In ICASSP 2019, IEEE International Conference on Acoustics, Speech and Signal Processing.