Workshop Schedule

  • 09.30-09.45 Welcome and Opening Remarks
  • 09.45-10.30 Invited talk 1: Marco Baroni
  • 10.30-11.00 Coffee Break
  • 11.00-11.45 Invited talk 2: Mohit Bansal
  • 11.45-12.30 Invited talk 3: Raquel Fernandez
  • 12.30-14.00 Lunch
  • 14.00-14.45 Invited talk 4: Yulia Tsvetkov
  • 14.45-15.00 Outstanding Papers Spotlight Presentations
  • 15.00-16.30 Poster Session (including Coffee Break 15.30-16.00) + Drinks Reception
  • 16.30-17.30 Panel Discussion
  • 17.30-17.40 Closing Remarks + Best Paper Awards Announcement

Workshop slides (including panel discussion)

Marco Baroni received a PhD in Linguistics from the University of California, Los Angeles, in 2000. After holding positions in research and industry, he joined the Center for Mind/Brain Sciences of the University of Trento, where he became associate professor in 2013. In 2016, Marco joined the Facebook Artificial Intelligence Research team. In 2019, he became ICREA research professor, affiliated with the Linguistics Department of Pompeu Fabra University in Barcelona. Marco's work in the areas of multimodal and compositional distributed semantics has received widespread recognition, including a Google Research Award, an ERC Starting Grant and the IJCAI-JAIR best paper prize. Marco's current research focuses on better understanding artificial neural networks, in particular on what they can teach us about human language acquisition and processing.

Talk: Language emergence as representation learning

Human language is a powerful code representing objects, facts and concepts, used both to exchange information with others and to support one's own thinking. In my talk, I would like to explore this representational perspective on language by discussing current work on language emergence among deep neural network agents that have to jointly solve a task. Recent findings suggest that the language-like code developed by such agents both differs from and resembles natural language in interesting ways. For example, the emergent code does not naturally represent general concepts, but rather very specific invariances in the perceptual input, and it is not an efficient representational system in the way human language is. On the other hand, emergent communication between deep agents is subject to an information minimization pressure that is also present in human language, and that provides a beneficial form of representation regularization. I will conclude by discussing the implications of these results both for our understanding of natural language and for the development of language-endowed artificial agents.

Raquel Fernández is Associate Professor at the Institute for Logic, Language and Computation (ILLC), University of Amsterdam, where she leads the Dialogue Modelling Group. Her work and interests revolve around language use, encompassing topics that range from computational semantics and pragmatics to the dynamics of dialogue interaction, visually grounded language processing, and child language acquisition. She received her PhD from King's College London and has held research positions at the University of Potsdam and at CSLI, Stanford University. She has been awarded several prestigious personal grants by the Netherlands Organisation for Scientific Research (NWO) and is a recent recipient of an ERC Consolidator Grant.

Talk: Representations shaped by dialogue interaction (Slides)

When we use language to communicate with each other in conversation, we build an internal representation of our evolving common ground. In traditional dialogue systems, this is captured by an explicit dialogue state defined a priori. Can we develop dialogue agents that learn their own (joint) representations? How are these representations shaped by interaction? In this talk, I will discuss recent research in my group aimed at addressing these questions, with a focus on visually grounded scenarios. I will start by discussing how visual grounding can act as a testbed for task-oriented neural dialogue models that learn their own representations. Most current work in the field reports numeric results based solely on task success. I will argue that we can gain more insight by (i) analysing the linguistic output of alternative systems and (ii) probing the representations they learn.
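Probing, as mentioned above, usually means fitting a small classifier on a model's frozen internal representations and checking whether some property can be read off them. The sketch below is a hypothetical illustration, not the group's method: the two-dimensional "dialogue state" vectors and the red/blue referent labels are invented, and a nearest-centroid classifier stands in for the logistic-regression probes that are more common in practice.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fit_centroid_probe(reps: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Fit a nearest-centroid probe: one mean vector per label.

    `reps` are frozen representations (e.g. hidden dialogue states); the
    underlying model is never updated, only this tiny classifier is fit.
    """
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(reps, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(probe: Dict[str, Vector], vec: Vector) -> str:
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(probe, key=lambda label: sq_dist(probe[label], vec))


def probe_accuracy(probe: Dict[str, Vector], reps: Sequence[Vector], labels: Sequence[str]) -> float:
    """Fraction of held-out representations whose label the probe recovers."""
    correct = sum(predict(probe, vec) == label for vec, label in zip(reps, labels))
    return correct / len(labels)


# Hypothetical toy data: 2-d "dialogue states" labelled by the referent's colour.
train_reps = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["red", "red", "blue", "blue"]
test_reps = [[0.7, 0.3], [0.3, 0.6]]
test_labels = ["red", "blue"]

probe = fit_centroid_probe(train_reps, train_labels)
print(probe_accuracy(probe, test_reps, test_labels))  # 1.0
```

High held-out accuracy suggests the property is recoverable from the representations. Keeping the probe this simple (with two classes its decision boundary is a hyperplane) helps attribute that accuracy to the representations rather than to the probe's own capacity.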

Dr. Mohit Bansal is the Director of the UNC-NLP Lab and an assistant professor in Computer Science at UNC Chapel Hill. Prior to this, he was a research assistant professor at TTI-Chicago. He received his PhD from UC Berkeley (with Dan Klein) and his BTech from IIT Kanpur. His research expertise is in statistical natural language processing and machine learning, with a particular focus on multimodal, grounded, and embodied semantics (i.e., language with vision and speech, for robotics), human-like language generation and Q&A/dialogue, and interpretable and generalizable deep learning. He is a recipient of the 2019 Google Focused Research Award, 2018 ARO Young Investigator Award, 2017 DARPA Young Faculty Award, 2017 ACL Outstanding Paper Award, 2014 ACL Best Paper Award Honorable Mention, and 2018 COLING Area Chair Favorites Award. He is serving as Program Co-chair for CoNLL 2019.

Talk: Knowledgeable and Adversarially-Robust Representation Learning (Slides)

In this talk, I will discuss work on knowledgeable and robust representation learning for natural language processing and generation. First, I will describe our past work on incorporating syntactic, paraphrastic, and multilingual knowledge into embeddings for language forms of various granularities. Next, I will present our recent multi-task learning and reinforcement learning methods that harness auxiliary knowledge-skill tasks such as entailment, saliency, and video generation, and that self-learn their optimal curriculum as well as the choice of auxiliary tasks. I will also discuss models that learn to fill reasoning gaps in multi-hop generative QA using external commonsense knowledge. Lastly, I will discuss how to analyze NLP models' adversarial failures and how to make their representations more adversarially robust against, e.g., span distractors in QA, reasoning shortcuts in multi-hop QA, over-sensitivity and over-stability in dialogue models, and compositionality-insensitivity in NLI models.

Yulia Tsvetkov is an assistant professor in the Language Technologies Institute at Carnegie Mellon University. Her research interests lie at or near the intersection of natural language processing, machine learning, linguistics, and social science. Her current research projects focus on language generation, multilinguality, automated negotiation, and NLP for social good. Prior to joining LTI, Yulia was a postdoc in the Department of Computer Science at Stanford University; she received her PhD from Carnegie Mellon University.

Talk: Modeling Output Spaces in Continuous-Output Language Generation (Slides)

The softmax layer is used as the output layer of nearly all existing models for language generation. However, it is the computational bottleneck of these systems: it is the slowest layer to compute and has a huge memory footprint, and it constrains output vocabularies to discrete, often uninterpretable, semantically unrelated tokens. In this talk I'll introduce continuous-output generation, a general modification to seq2seq models for generating text sequences that does away with the softmax layer, replacing it with an embedding layer. I will present an exploration of objective functions for generating word embeddings, approaches to generating phrase embeddings, and approaches to modeling representations of morphologically rich output spaces, with applications in machine translation. In the second part of my talk, departing from machine translation, I'll describe our ongoing work on enabling generative adversarial networks (GANs) for text, a notoriously hard task specifically because of the softmax layer. Generating word embeddings instead of a softmax distribution makes GANs for text generation end-to-end differentiable and their training more stable. Unlike softmax-based GANs, which rely fully on language-model pretraining, continuous-output GANs generate better-quality and more diverse text when trained from scratch, without pretraining.
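The core idea of continuous-output generation can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the speaker's implementation: the three-word vocabulary and its embeddings are hypothetical, and the training objective is a simple cosine loss chosen for illustration (the talk explores several objective functions, which are not specified here). At each step the output head predicts one d-dimensional vector instead of one logit per vocabulary word; training pulls that vector toward the target word's pretrained embedding, and decoding returns the nearest vocabulary embedding.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical pretrained target embeddings; a real system would load
# pretrained word vectors for its whole output vocabulary.
EMBEDDINGS: Dict[str, Vector] = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "car": [0.0, 0.2, 0.95],
}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_loss(predicted: Vector, target_word: str) -> float:
    """Assumed training objective: 1 - cos(predicted, e_target).

    Unlike softmax cross-entropy, this never normalises over the vocabulary,
    so its cost does not depend on how many words the vocabulary holds.
    """
    return 1.0 - cosine(predicted, EMBEDDINGS[target_word])


def decode(predicted: Vector) -> str:
    """Inference: return the vocabulary word whose embedding is nearest by cosine."""
    return max(EMBEDDINGS, key=lambda word: cosine(predicted, EMBEDDINGS[word]))


# The model's output head would emit `predicted`: 3 numbers here, not one logit per word.
predicted = [0.1, 0.1, 0.9]
print(decode(predicted))  # car
print(cosine_loss(predicted, "car"))
```

Because the output head emits d numbers rather than one score per vocabulary item, its size no longer grows with the vocabulary; only the nearest-neighbour lookup at decoding time touches every embedding.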