Invited Talk Abstracts

Mehrnoosh Sadrzadeh: Ellipsis in Compositional Distributional Semantics

Ellipsis is a natural language phenomenon where part of a sentence is missing and its information must be recovered from the surrounding context, as in “Cats chase dogs and so do foxes.” Formal semantics offers different methods for resolving ellipsis and recovering the missing information, but the problem has not been considered for distributional semantics, where words have vector embeddings and combinations thereof provide embeddings for sentences. In elliptical sentences these combinations go beyond the linear, since the elided information must be copied. I will talk about recent results from our NAACL 2019 paper, joint with G. Wijnholds, where we develop different models for embedding VP-elliptical sentences using modal sub-exponential categorial grammars. We extend existing verb disambiguation and sentence similarity datasets to ones containing elliptical phrases and evaluate our models on these datasets for a variety of linear and non-linear combinations. Our results show that resolving ellipsis indeed improves the performance of vectors and tensors on these tasks, and also sheds some light on disambiguating the sloppy and strict readings of elliptical phrases.
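
To make the copying idea concrete, here is a toy sketch (not one of the models from the paper): a purely additive, linear composition of word vectors, contrasted with a resolved composition in which the vector for the elided verb phrase is reused in the second conjunct. The embeddings below are random placeholders standing in for trained vectors.

```python
import numpy as np

# Toy word embeddings (random for illustration; real models use trained vectors).
rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(50) for w in ["cats", "chase", "dogs", "foxes"]}

def additive(*vs):
    """A simple linear composition: the sum of the input vectors."""
    return np.sum(vs, axis=0)

# "Cats chase dogs and so do foxes."
# Without resolution, the elided VP "chase dogs" contributes nothing
# to the second conjunct:
unresolved = additive(emb["cats"], emb["chase"], emb["dogs"], emb["foxes"])

# With resolution, the VP vector is *copied* into the second conjunct.
# Using the same material twice is exactly the step that takes the
# composition beyond a linear combination of the words that appear once:
vp = additive(emb["chase"], emb["dogs"])
resolved = additive(emb["cats"], vp, emb["foxes"], vp)
```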

Ellie Pavlick: What should constitute natural language "understanding"?

Natural language processing has become indisputably good over the past few years. We can perform retrieval and question answering with purported super-human accuracy, and can generate full documents of text that seem good enough to pass the Turing test. In light of these successes, it is tempting to attribute the empirical performance to a deeper "understanding" of language that the models have acquired. Measuring natural language "understanding", however, is itself an unsolved research problem. In this talk, I will discuss recent work which attempts to illuminate what it is that state-of-the-art models of language are capturing. I will describe approaches which evaluate the models' inferential behavior, as well as approaches which rely on inspecting the models' internal structure directly. I will conclude with results on humans' linguistic inferences, which highlight the challenges involved in developing prescriptivist language tasks for evaluating computational models.
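
As one generic illustration of the second family of approaches (inspecting internal structure directly), a common tool is a probing classifier: a small model trained on frozen representations to test whether some linguistic property is linearly decodable from them. The sketch below is a minimal, hypothetical example, not a method from the talk; the dimensions and the probed property are placeholders.

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """A linear probe over frozen model representations."""
    def __init__(self, rep_dim=768, num_labels=2):
        super().__init__()
        self.clf = nn.Linear(rep_dim, num_labels)

    def forward(self, reps):
        return self.clf(reps)

def train_probe(probe, reps, labels, epochs=10, lr=1e-3):
    # reps:   (N, rep_dim) hidden states extracted from a frozen model
    # labels: (N,) gold annotations for the probed property (e.g., tense)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(reps), labels)
        loss.backward()
        opt.step()
    return probe
```

High probe accuracy suggests the property is encoded in the representations; behavioral evaluations, by contrast, ask whether the model actually uses such information when making inferences.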

Raffaella Bernardi: Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat

The development of conversational agents that ground language in visual information is a challenging problem that requires integrating dialogue management skills with multimodal understanding. Recently, visual dialogue settings have attracted attention in the Machine Learning and Computer Vision communities, thanks to the construction of visually grounded human-human dialogue datasets on which Neural Network (NN) models have been challenged.

I will present our work on GuessWhat?!, in which two NN agents interact with each other so that one of the two (the Questioner), by asking questions of the other (the Answerer), can guess which object the Answerer has in mind among all the entities in a given image. I will present our Questioner model: it encodes both visual and textual inputs, produces a multimodal representation, generates natural language questions, understands the Answerer's responses, and guesses the object. I will compare our model's dialogues with those of models that exploit much more complex learning paradigms, such as Reinforcement Learning, showing that more complex machine learning methods do not necessarily correspond to better dialogue quality or even better quantitative performance. A sketch of this kind of architecture is given below.
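
For illustration only, here is a minimal sketch of a Questioner-style architecture: a dialogue encoder and projected image features are fused into a shared multimodal state that feeds both a question-generation head and a guesser head. This is not the authors' implementation; all layer choices, dimensions, and names (Questioner, img_proj, guesser, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class Questioner(nn.Module):
    """Sketch of a GuessWhat?!-style Questioner: one multimodal encoder
    feeding both a question generator and a guesser. Hyperparameters and
    layer choices are illustrative, not those of the actual model."""

    def __init__(self, vocab_size, img_feat_dim=2048, hidden=512, emb=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.dialogue_enc = nn.LSTM(emb, hidden, batch_first=True)
        self.img_proj = nn.Linear(img_feat_dim, hidden)
        self.fuse = nn.Linear(2 * hidden, hidden)       # multimodal state
        self.generator = nn.Linear(hidden, vocab_size)  # next-token scores (ask)
        self.guesser = nn.Linear(hidden, hidden)        # scores candidate objects

    def forward(self, dialogue_tokens, img_feats, object_feats):
        # dialogue_tokens: (B, T) token ids of the dialogue so far
        # img_feats:       (B, img_feat_dim) pooled image features
        # object_feats:    (B, N, hidden) one vector per candidate object
        _, (h, _) = self.dialogue_enc(self.embed(dialogue_tokens))
        state = torch.tanh(
            self.fuse(torch.cat([h[-1], self.img_proj(img_feats)], dim=-1)))
        next_word_logits = self.generator(state)        # used to ask questions
        guess_logits = torch.einsum(
            "bh,bnh->bn", self.guesser(state), object_feats)  # used to guess
        return next_word_logits, guess_logits
```

The design point the talk speaks to is that both tasks share one multimodal representation, so dialogue quality and guessing accuracy can be studied jointly rather than optimized in isolation.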

The talk is based on work available at https://vista-unitn-uva.github.io/