May 24th Timetable

9.00-10.00: Keynote Speaker 1: Gemma Boleda

10.00-10.30: Lightning Talks (I) - see order below

10.30-11.30: Poster Session (I) and Coffee Break 

11.30-12.30: Keynote Speaker 2: Raquel Fernandez

12.30-14.00: Lunch

14.00-15.00: Keynote Speaker 3: Julie Weeds

15.00-15.30: Lightning Talks (II) - see order below

15.30-16.30: Poster Session (II) and Coffee Break

16.30-17.00: Panel Discussion

Titles and Abstracts for Keynote Talks

1- Talking about you: Deep Learning models of linguistic reference. Gemma Boleda

Abstract. We use language to talk about things. For instance, within the TV series Friends, "the brother of Monica Geller" can be used to refer to the character Ross Geller. Yet, most computational work on language lacks this connection to the reality that it is about: in Machine Translation, we get for instance "le frère de Monica Geller", with no link to the person it refers to. Instead, we want to model language in context. Our hypothesis is that jointly learning to represent language and the entities referred to will improve computational models of both. I will report on ongoing research testing this hypothesis with Deep Learning models. 

2- Analysing Language in Use with Vector Representations. Raquel Fernandez

Abstract. Distributed vector representations have become ubiquitous in computational semantics. Yet their application to aspects related to language interaction and dialogue is still limited. In this talk, I will present recent work on two lines of research connected to this. In the first part, I will focus on lexical meaning within online communities of practice. In the second part, I will zoom into dyadic interaction and present a case study on using Recurrent Neural Networks on synthetic dialogue data to investigate incremental language understanding.

3- On Composition vs Contextualisation of Distributed Word Representations: What’s the Point of Sentence Representations? Julie Weeds

Abstract. It is now common practice to represent words as points in some high-dimensional space where proximity in the space is taken to correlate with similarity in usage and, following the distributional hypothesis (Harris, 1954), often also in meaning. Researchers have also long been interested in how such representations can be combined to form representations of larger units of meaning including phrases, sentences and documents. However, despite a large body of work in this area, on many benchmark evaluation tasks it has been hard to beat simple models of composition which typically add, or sometimes multiply, word representations (Mitchell and Lapata, 2010). There have been recent advances in neural methods (Peters et al., 2018; Devlin et al., 2018) where the representation of a word is first effectively contextualised given the sentence in which it appears. However, typically, the final representation of a sentence is still related to the sum of word, or other intermediate layer, embeddings.

In this talk, I will address the question of whether it makes sense to represent a sentence as a single point in space or whether it is better represented as a collection of contextualised points. The answer is of course application-dependent and we must turn to tasks in discourse and dialogue analysis to discover which is the best approach. I will highlight the dangers of reducing sentence meaning to a single point in space and also discuss tasks where contextualisation rather than composition is crucial. I will also ask how we can better contextualise words for use in sentence representations and subsequent discourse and dialogue analysis. I will demonstrate the importance of syntax sensitivity and discuss the growing body of work on dependency-aware embeddings, including our own efforts to create a space which enables syntax-sensitive contextualisation based on the Anchored Packed Tree (APT) framework (Weir et al., 2016).

Titles and Abstracts for Lightning Talks/Poster Sessions

The lightning talks (max. 5 minutes each) will be presented in each session in the order below. The poster session that follows each set of talks will feature the posters corresponding to those talks:

Lightning Talks/Poster Session I

1- High-dimensional distributed semantic spaces for utterances. Jussi Karlgren (Gavagai and KTH Royal Institute of Technology). 

Abstract. High-dimensional distributed semantic spaces have proven useful and effective for aggregating and processing visual, auditory, and lexical information for many tasks related to human-generated data. The model proposed here represents both lexical and structural linguistic items in a common framework. This framework is a high-dimensional representation for utterance and text level data based on a mathematically principled and behaviourally plausible approach, which allows the configurations that a lexical item is observed in to be treated similarly to the lexical data. The implementation of the representation is a straightforward extension of Random Indexing models previously used for lexical linguistic items. The model is computationally habitable, and suitable as a bridge between symbolic representations such as dependency analysis and continuous representations such as classifiers or further machine-learning approaches.


2- Incremental Semantic Judgements. Mehrnoosh Sadrzadeh (Queen Mary University of London), Matthew Purver (Queen Mary University of London), Gijs Wijnholds (Queen Mary University of London), Julian Hough (Queen Mary University of London) and Ruth Kempson (King's College London). 

Abstract. We show how one develops a structure-preserving vector space semantic mapping for Dynamic Syntax trees. In order to do so, we use the tensor contraction mechanisms used in compositional distributional semantics, e.g. by Maillard et al. We then implement a word-by-word incremental vector semantics on a verb disambiguation task. The results show that as the context increases, i.e. as we increment the utterances from subject to subject-verb to subject-verb-object, their ambiguous verbs disambiguate better. The results also show that the copy-subject tensor model works best.


3- CoDE: Learning Composable Dependency Embeddings. Lorenzo Bertolini (University of Sussex), Julie Weeds (University of Sussex) and David Weir (University of Sussex). 

Abstract. Through the years, a restricted number of authors have tackled the non-trivial problem of encoding syntactic information in distributional representations by injecting dependency-relation knowledge directly into word embeddings. Although such representations should bring a clear advantage in complex representations, such as at the phrasal and sentence level, these models have been tested mainly through word-word similarity benchmarks or with rich neural architectures. Outside the embeddings' domain, the APT model has offered an effective resource for modelling compositionality via syntactic contextualization. In this work, we present a novel model, built on top of GloVe, to reduce APT representations to low-dimensional dense dependency-based vectors that showcase APT-like composition ability. We then propose a detailed investigation of the nature of these representations, as well as their usefulness and contribution in semantic composition.


4- Neural dialogue act recognition with transformer pre-training. Bill Noble (University of Gothenburg) and Vladislav Maraev (University of Gothenburg). 

Abstract. BERT, a multi-layer attention-based transformer, uses language model pre-training to achieve state-of-the-art results on a variety of NLP tasks. To assess its potential for dialogue applications, we propose a series of dialogue act recognition experiments with various utterance encoders, including BERT.


5- Why natural language models must be partial and shifting: a Dynamic Syntax with Vector Space Semantics perspective. Ruth Kempson (King's College London), Julian Hough (Queen Mary University of London), Christine Howes (University of Gothenburg), Matthew Purver (Queen Mary University of London), Patrick Healey (Queen Mary University of London), Arash Eshghi (Heriot-Watt University) and Eleni Gregoromichelaki (Dusseldorf University). 

Abstract. This paper brings together Dynamic Syntax (DS) and Vector Space semantics (VSS) with current work in cognitive neuroscience and evolution to argue that natural language (NL) models need to be defined in partial and shifting terms. We argue that by combining process-oriented language perspectives from DS and the open-endedness of meaning from VSS, the possibility of developing an integrated account of language behaviour and how it might have emerged becomes a more nearly realisable goal.

6- Using neurophysiological features for learning truth-conditional lexical meanings. Henk Zeevat (University of Amsterdam/Heinrich Heine University, Duesseldorf). 

Abstract. The paper sketches a path under exploration that might lead from flat text to symbolic descriptions of lexical meaning. It offers an alternative to vector-based representations by using these same vectors to abduce symbolic representations. The path starts from the observed similarities between the primitives of frame semantics and the semantic features that can be located by neuro-imaging on brain areas in Binder et al. (2012). Associations with Binder features can be estimated well from flat text and for each of the Binder features it is possible to find a frame template that explains the association with the Binder feature for a particular class of words. The method seems to perform well on first attempts, but it is obvious that it needs further techniques to become comparable with human knowledge of concepts.

Lightning Talks/Poster Session II

7- Evaluation of sentence embeddings transformations for estimating translation editing effort with Gaussian kernels. Ibai Roman (EHU/UPV), Roberto Santana (University of the Basque Country), Alexander Mendiburu (The University of the Basque Country) and Jose A. Lozano (The University of the Basque Country). 

Abstract. We focus on a practical NLP regression problem that is related to automatic translation of texts. In this domain, post-editing work is frequently required, and an estimation of the cost of the editing process (in terms of time, effort, and editing distance) is essential. Instead of manually defining the features, as in previous work, our approach relies on sentence embeddings, vector representations of the source and the automatically translated texts, to predict the post-editing effort that leads to the final text. Effort prediction is made by using Gaussian Processes (GP) with different choices of the kernel.


8- Discourse Complexity in Categorical Compositional Relational Semantics. Alexis Toumi (University of Oxford). 

Abstract. Categorical compositional distributional (DisCoCat) models give a semantics to sentences by turning their syntactic structure into a linear map for composing vector representations of words. We look at the variant RelCoCat (categorical compositional relational) models, where linear maps are replaced by relations, and show how they yield a Montague semantics for the fragment of natural language expressible in regular logic, which underlies conjunctive queries in database theory. RelCoCat allows us to go beyond the sentence boundary into the realm of large-scale discourse: we use tools from query complexity to study the asymptotic behaviour of natural language processing. We show how question answering and text summarisation reduce to conjunctive query containment and minimisation respectively: they are in fact NP-complete. We discuss two ways of taming the complexity of discourse semantics. The first is to restrict the structure of our discourse to tractable fragments of conjunctive queries such as bounded tree-width, which is well-motivated from a cognitive perspective. The second is to consider probabilistic databases and relax question answering to an approximation problem; this yields a principled way of going from Boolean to distributional semantics.


9- The Influence of Semantic and Syntactic Tags on Sentence Acceptability Judgments. Adam Ek (University of Gothenburg), Jean-Philippe Bernardy (University of Gothenburg) and Shalom Lappin (University of Gothenburg). 

Abstract. In this paper, we investigate the effect of enhancing LSTM language models (LM) with syntactic and semantic tags, whose vectors are combined with the lexical embedding vectors of the words to which they are assigned. We evaluate the effect on language modeling by comparing the enhanced LM's perplexity to that of a plain LM. Additionally, we use LMs to predict sentence acceptability judgments. The results show that syntactic tags lower the perplexity in LMs while semantic tags increase the perplexity. We also show that neither syntactic nor semantic tags improve the sentence acceptability predictions compared to human judgments.


10- Towards a multimodal vector analysis for verbal and non-verbal data. Saba Nazir (Queen Mary University of London), Mehrnoosh Sadrzadeh (Queen Mary University of London), Julian Hough (Queen Mary University of London) and Patrick Healey (Queen Mary University of London). 

Abstract. We provide an audio-text vectorial analysis of a toy dataset of contents of BBC programmes and an audio vectorial analysis of laughter, with the future goal of fusing the latter with text and employing it to derive dialogue semantics.


11- Vectors Under Discussion. Matthew Purver (Queen Mary University of London), Mehrnoosh Sadrzadeh (Queen Mary University of London) and Julian Hough (Queen Mary University of London). 

Abstract. We propose a vector-space approach to the Questions Under Discussion (QUD) model of Ginzburg (2012), which promises to fill a gap in recent implementations (Maraev et al., 2018) by providing a direct measure of question resolution while allowing fine-grained encoding of expectations about the answer.


12- Evaluating Composition Models for VP-Elliptical Sentence Embeddings. Gijs Wijnholds (Queen Mary University of London) and Mehrnoosh Sadrzadeh (Queen Mary University of London). 

Abstract. Ellipsis is a natural language phenomenon where part of a sentence is missing and its information must be recovered from its surrounding context, as in "Cats chase dogs and so do foxes." Formal semantics has different methods for resolving ellipsis and recovering the missing information, but the problem has not been considered for distributional semantics, where words have vector embeddings and combinations thereof provide embeddings for sentences. In elliptical sentences these combinations go beyond linear as copying of elided information is necessary. In this paper, we develop different models for embedding VP-elliptical sentences. We extend existing verb disambiguation and sentence similarity datasets to ones containing elliptical phrases and evaluate our models on these datasets for a variety of non-linear combinations and their linear counterparts. We compare results of these compositional models to state-of-the-art holistic sentence encoders. Our results show that non-linear addition and a non-linear tensor-based composition outperform the naive non-compositional baselines and the linear models, and that sentence encoders perform well on sentence similarity, but not on verb disambiguation.


13- A Frobenius Algebraic Analysis for Parasitic Gaps. Michael Moortgat (Utrecht University), Mehrnoosh Sadrzadeh (Queen Mary University of London) and Gijs Wijnholds (Queen Mary University of London). 

Abstract. We provide syntactic types in the language of modal Lambek Calculus and semantic types in vector spaces with Frobenius algebras to be able to present a vector space semantics for parasitic gapping. To our knowledge, this is the first time that parasitic gaps have been treated in a vector semantic setting. We develop string diagrams, category-theoretic Frobenius algebraic types, and normal and closed forms for this phenomenon. Experimenting with the results is left to future work.