Proceedings

Morning Talks

10:00 — Semantic Matching Against a Corpus: New Applications and Methods — (Withheld by author request)

In this paper, we exploit models of natural language entailment in a new way: matching a natural language proposition against a corpus. If successful, this use case could lead to new modes of interaction with text corpora, of particular interest to domain experts who wish to explore the extent to which an idea is expressed in text. We apply semantic matching methods to two domains (framing of policy issues and disaster recovery), and demonstrate the viability of a simple word-vector-averaging method in one case and the benefits of a syntax-based method in the other. Our user study confirms that semantic matching is effective and that there are potential users for this kind of application.

10:20 — Synthetic and Natural Noise Both Break Neural Machine Translation — Yonatan Belinkov and Yonatan Bisk.

Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems. Unfortunately, they are also very brittle and easily falter when presented with noisy data. In this paper, we confront NMT models with synthetic and natural sources of noise. We find that state-of-the-art models fail to translate even moderately noisy texts that humans have no trouble comprehending. We explore two approaches to increase model robustness: structure-invariant word representations and robust training on noisy texts. We find that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise.

10:40 — Syntactic Scaffolds for Semantic Structures — (Withheld by author request)

We introduce the syntactic scaffold, a framework to incorporate syntactic information into semantic tasks. Our approach avoids expensive syntactic processing at runtime, only making use of a treebank during training through a multitask objective. We evaluate the method on two tasks -frame-semantic role labeling and coreference resolution. We find that it achieves significant improvements over strong baselines, achieving a new state of the art for both tasks. Our code will be released as open source upon publication.

Short Talks

11:20 — Compositional Language Modeling for Icon-Based Augmentative and Alternative Communication — Shiran Dudy and Steven Bedrick.

Icon-based communication systems are widely used in the field of Augmentative and Alternative Communication. Typically, icon-based systems have lagged behind word- and character-based systems in terms of predictive typing functionality, due to the challenges inherent to training icon-based language models. We propose a method for synthesizing training data for use in icon-based language models, and explore two different modeling strategies.

11:32 — Keep your bearings: Lightly-supervised Information Extraction with Ladder Networks that avoids Semantic Drift — Ajay Nagesh and Mihai Surdeanu.

We propose a novel approach to semi-supervised learning for information extraction that uses ladder networks (Rasmus et. al., 2015). Our approach is simple, efficient and has the benefit of being robust to semantic drift, a dominant problem in most semi-supervised learning systems. We empirically demonstrate the superior performance of our system compared to the state-of-the-art on two standard datasets for named entity classification. We obtain around 62% and 200% improvement over the state-of-art baseline on these two datasets.

11:44 — Infrequent Discourse Relation Identification Using Data Programming — Xing Zeng, Giuseppe Carenini, Raymond Ng and Hyeju Jang.

Discourse parsing is an important task in natural language processing as it supports a wide range of downstream NLP task. While the overall performance of discourse parsing has been recently improved considerably, the performance on relatively infrequent discourse relations is still rather low. To resolve this gap between this performance between infrequent and frequent relations, we propose a novel method for discourse relation identification that builds a simple neural network model employing “a paradigm for the programmatic creation of training datasets,” called Data Programming (DP)

11:56 — Community Member Retrieval on Social Media using Textual Information — Aaron Jaech, Shobhit Hathi and Mari Ostendorf.

This paper addresses the problem of community membership detection using only text features in a scenario where a small number of positive labeled examples defines the community. The solution introduces an unsupervised proxy task for learning user embeddings: user re-identification. Experiments with 16 different communities show that the resulting embeddings are more effective for community membership identification than common unsupervised representations.

12:08 — Semantic similarity of conversational speech between children with and without ASD — Joel Adams and Alexandra Salem.

We apply distributional semantic models to consider the similarity of conversational speech between children with Autism Spectrum Disorder (ASD) and typically developing (TD) subjects. We start by projecting transcripts of ADOS administrations into a 300-dimension semantic vector space. We then take an empirical approach of considering each TD transcript in turn as a “gold standard” or ideal representation of the ADOS. We then examine the variability of semantic similarity between this “gold standard” and all other transcripts. We find that in general, for any selection of a “gold standard”, TD subjects use language that is more semantically similar to it than that of ASD subjects. In most cases the difference in similarity scores is sufficient to establish diagnostic group differences.

Afternoon Talks 1

14:30 — Annotation Artifacts in Natural Language Inference Data — Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman and Noah Smith.

Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to. We show that, in a significant portion of such data, this protocol leaves clues that make it possible to identify the label by looking only at the hypothesis, without observing the premise. Specifically, we show that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI (Bowman et.al 2015) and 53% of MultiNLI (Williams et. al, 2018). Our analysis reveals that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes. Our findings suggest that the success of natural language inference models to date has been overestimated, and that the task remains a hard open problem.

14:50 — Simulating Action Dynamics with Neural Process Networks — Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox and Yejin Choi.

Understanding procedural language requires anticipating the causal effects of actions, even when they are not explicitly stated. In this work, we introduce Neural Process Networks to understand procedural text through (neural) simulation of action dynamics. Our model complements existing memory architectures with dynamic entity tracking by explicitly modeling actions as state transformers. The model updates the states of the entities by executing learned action operators. Empirical results demonstrate that our proposed model can reason about the unstated causal effects of actions, allowing it to provide more accurate contextual information for understanding and generating procedural text, all while offering more interpretable internal representations than existing alternatives.

Poster Room A

A1 — Lexicalized Reordering for Left-to-Right Hierarchical Phrase-based Translation — Maryam Siahbani and Anoop Sarkar.

Phrase-based and hierarchical phrase-based (Hiero) translation models differ radically in the way reordering is modeled. Lexicalized reordering models play an important role in phrase-based MT and such models have been added to CKY-based decoders for Hiero. Watanabe et al. (2006) proposed a promising decoding algorithm for Hiero (LR-Hiero) that visits input spans in arbitrary order and produces the translation in left to right (LR) order which leads to far fewer language model calls and leads to a considerable speedup in decoding. We introduce a novel shift-reduce algorithm to LR-Hiero to decode with our lexicalized reordering model (LRM) and show that it improves translation quality for Czech-English, Chinese-English and German-English.

A2 — Simultaneous Translation using Optimized Segmentation — Maryam Siahbani, Hassan S. Shavarani, Ashkan Alinejad and Anoop Sarkar.

Previous simultaneous translation approaches either use a separate segmentation step followed by a machine translation decoder or rely on the decoder to segment and translate without training the segmenter to minimize delay or increase translation quality. We integrate a segmentation model and an incremental decoding algorithm to create an automatic simultaneous translation framework. Oda et al. (2014) propose a method to provide annotated data for sentence segmentation. This work uses this data to train a segmentation model that is integrated with a novel simultaneous translation decoding algorithm. We show that this approach is more accurate than previously proposed segmentation models when integrated with a translation decoder. Our results on the speech translation of TED talks from English to German show that our system can achieve translation quality close to the offline translation system while at the same time minimizing the delay in producing the translations incrementally. Our approach also outperforms other comparable simultaneous translation systems in terms of translation quality and latency.

A3 — Joint Prediction of Word Alignment with Alignment Types — Anahita Mansouri Bigvand, Te Bu and Anoop Sarkar.

Current word alignment models do not distinguish between different types of alignment links. In this paper, we provide a new probabilistic model for word alignment where word alignments are associated with linguistically motivated alignment types. We propose a novel task of joint prediction of word alignment and alignment types and propose novel semi-supervised learning algorithms for this task. We also solve a sub-task of predicting the alignment type given an aligned word pair. In our experimental results, the generative models we introduce to model alignment types significantly outperform the models without alignment types.

A4 — GraphNER: Using Corpus Level Similarities and Graph Propagation for Named Entity Recognition — Golnar Sheikhshab, Elizabeth Stark, Aly Karsan, Anoop Sarkar and Inanc Birol.

The rapidly growing amount of research papers in computational biology makes it difficult for researchers to keep up to date on new results. The motivation behind this paper is to use natural language processing to automatically understand relevant concepts from the large amount of text data in published papers in computational biology. We focus on the gene mention detection task that allows us to identify which genes are being discussed in each paper making it possible to search for concepts like genes rather than searching on words.In this paper we introduce GraphNER, a semi-supervised machine learning model for named entity recognition (NER). In particular, we use GraphNER to identify gene mentions in natural language data such as biomedical papers. It combines training data where the gene mentions are identified by human experts with unlabelled data that contains many other relevant gene mentions, but which have not been identified by human experts. The labeled and unlabeled data are linked together using similarities between n-grams that occur in the two data sources (an n-gram is a contiguous sequence of n words in the text). GraphNER uses the information gleaned from this graph,and combines it with a conditional random field (CRF) model for NER. We consider two different CRF-based NER systems on two different datasets combined with our graph model for semi-supervised learning for the task of gene mention detection.We show that GraphNER consistently improves the overall quality of gene mention detection due to its higher precision. GraphNER is freely available at http://www.bcgsc.ca/platform/bioinfo/software/graphne

A5 — Analysis of Social Media Texts Concerning Climate Change and Sustainable Development — Lydia Odilinye, Fred Popowich, Bdour Alzeer, Brie Hoffman, Volodymyr Kozyr and Chelsea Li.

We investigate natural language processing techniques for analyzing social media data related to climate change and sustainable development strategies. A collection of content analysis and sentiment analysis methods are used to identify and categorize the text, as well as opinions expressed in various social media accounts. The social media platforms used in the study include Twitter, Facebook, and Instagram written in both English and in French. The work also identifies key environmental or sustainability issues concerning stake-holders.

A6 — Exploring Discourse Coherence Features for Dementia Detection — Hyeju Jang, Vaden Masrani, Giuseppe Carenini, Raymond Ng, Gabriel Murray and Thalia Field.

In this paper, we propose novel discourse related features for dementia prediction, specifically focusing on discourse coherence. Discourse coherence "concerns the ways in which the components of the textual world, i.e. the configuration of concepts and relations which underline the surface text, are mutually accessible and relevant" (Beaugrande and Dressler, 1981). Previous literature has shown that people with dementia tend to have problems with discourse coherence including impairment in global coherence, disruptive topic shift, frequent use of filler phrases, and less use of connective words. Among them, we address the problem of global coherence and topic shift. Our analysis is ongoing. We plan to present experimental results and analysis either in the final version of our paper or at the workshop itself.

A7 — Partial Email Thread Summarization: Conversational Bayesian Surprise and Silver Standards — Jordon Johnson, Vaden Masrani, Giuseppe Carenini and Raymond Ng.

We define and motivate the problem of summarizing partial email threads. This problem introduces the challenge of generating reference summaries for partial threads when human annotation is only available for the threads as a whole, particularly when the human-selected sentences are not uniformly distributed within the threads. We propose an oracular algorithm for generating these reference summaries with arbitrary length. In addition, we apply a recent unsupervised method based on Bayesian Surprise that incorporates background knowledge into partial thread summarization, extend it with conversational features, and modify the mechanism by which it handles redundant information. Experiments with our method indicate improved performance over the baseline for shorter partial threads; and our results suggest that the potential benefits of background knowledge to partial thread summarization should be further investigated with larger datasets. Ongoing work includes evaluating a new candidate algorithm for generating reference summaries for partial email threads, as well as evaluating partial thread summaries over an additional dataset.

A8 — Mapping Distributional Semantics to Property Norms with Neural Networks — Dandan Li and Douglas Summers-Stay.

In recent years, word embeddings have shown great success in many natural language processing tasks, but they characterize the meaning of a word/concept by an uninterpretable “context signatures”(Baroni et al. 2010). In cognitive psychology, concepts are represented by their relations with properties. In this work, we present a neural network-based method for mapping a distributional semantic space onto a human-built property space automatically. Experimental results show that our method can achieve state-of-the-art performance on the widely used McRae dataset.

A9 — Extrapolative Models for Rich Text Generation — (Withheld by author request)

This work introduces extrapolative models for constructing coherent narratives from minimal linguistic inputs. Our models use intermediate interpretable conceptual representation of sentences and post- decoding enrichment to add novelty and interest to the generated texts. The intermediate representation comprises event and entity categories extracted from a large story corpus in a way to promote novelty, topicality, and coherence. The post-decoding system leverages reversed entailment relationships (anti-entailment) and Maximum Mutual Information (MMI) to reduce blandness and repetition in the generated texts.

Poster Room B

B1 — The entrainment of creaky voice in spoken dialogue — Courtney Mansfield.

Speakers adapt their speaking style to match that of their interlocutor in a process described as linguistic entrainment or accommodation. While previous studies have considered the entrainment of acoustic-prosodic features in spoken dialogue, the entrainment of phonation has not been examined. Using a hand-annotated and automatically classified dataset of phonation labels, I examine the entrainment of creaky voice in partners over several dimensions. I additionally consider how speaker and partner gender affect both rates of creak and entrainment in a cooperative dialogue.

B2 — Forecasting the Future using Diverse Social Media Sources — Katherine Porterfield, Dustin Arendt, Nathan Hodas and Svitlana Volkova.

Social media (SM) signals have been effectively used to predict real-world events e.g., protest and civil unrest, public opinions and elections, and influenza dynamics. However, the majority of existing social media analytics focuses on predictive rather than forecasting capabilities. Moreover, the existing approaches fail to provide rigorous quantitative evaluation by focusing on a single metric only, and frequently analyze the predictive power of text signals exclusively.We propose to evaluate the predictive power of mixed social media signals, e.g., images and text in combination with recently emerged deep learning models across multiple forecasting tasks; domains, e.g., Twitter and Flickr; and geolocations.

B3 — Towards Anticipatory Analytics: Forecasting Instability Across Countries from Dynamic Knowledge Graphs — Suraj Maharjan, Prasha Shrestha, Katherine Porterfield, Dustin Arendt and Svitlana Volkova.

Protests, civil unrest, and instability around the world not only cause physical damages but also human casualties. It would be possible to mitigate these losses if there were a way to forecast such events happening world-wide. With the plethora of information coming from a variety of news sources, a system can be designed to alert about such events in advance. The ability to forecast protests, fights, and assaults can help governments to make plans to safeguard citizens from potential casualties, and monitor essential supplies.

B4 — Forecasting Influenza-like Illness Dynamics for Military Populations Using Neural Networks and Social Media — Ellyn Ayton, Katherine Poterfield, Svitlana Volkova and Court Corley.

Every year there are 500,000 deaths worldwide attributed to influenza. The Center for Disease Control and Prevention (CDC) reports weekly on the level of influenza-like illness (ILI) seen year round in hospitals and doctor visits. These values are used to monitor the spread and impact of influenza, however by the time the ILI data is released, the information is already 1-2 weeks old and is frequently inaccurate until revisions are made. To overcome this, we propose making use of large amounts of social media data, such as Twitter, to be a secondary source of information in order to predict current and future ILI proportions -- the total number of people seeking medical attention with ILI symptoms.

B5 — Serendipity in Book Passage Recommendation with Neural and LDA Topic Modeling — Joshua Mathias.

This paper presents and compares methods of recommending multiple passages of a specific book, given a paragraph as input (to aid authors). This research focuses on recommending verses from the King James Bible, but is intended to be generally useful for passage and citation recommendation.

B6 — Multiple Document Representations from News Alerts for Automated Bio-surveillance Event Detection — Aaron Tuor, Lauren Charles and Fnu Anubhav.

Due to globalization, geographic boundaries no longer serve as effective shields for the spread of infectious diseases. In order to aid bio-surveillance analysts in disease tracking, recent research has been devoted to developing information retrieval and analysis methods utilizing the vast corpora of publicly available documents on the internet. In this work, we present methods for the automated retrieval and classification of documents related to active public health events. We demonstrate classification performance on an auto-generated corpus, using recurrent neural network, TF-IDF, and Naive Bayes log count ratio document representations. By jointly modeling the title and description of a document, we achieve 97% recall and 93.3% accuracy with our best performing bio-surveillance event classification model: logistic regression on the combined output from a pair of bidirectional recurrent neural networks.

B7 — Continuous Learning in a Hierarchical Multiscale Neural Network — Thomas Wolf, Julien Chaumond and Clement Delangue.

We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework. We propose a hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network while longer time-scale dependencies are encoded in the dynamic of the lower-level network by having a meta-learner update the weights of the lower-level neural network in an online meta-learning fashion. We use elastic weights consolidation as a higher-level to prevent catastrophic forgetting in our continuous learning framework.

B8 — E-MATCH – Event Matching by Analysis of Text Characteristics — Anne Kao, Stephen Poteet, Lesley Quach, Rodney Tjoelker and David Augustine.

With the exponential growth in volume and diversity in data on the Internet, it is common for the same event to be reported or mentioned by different sources using different descriptions and depicting different aspects of the event. It is rarely easy to decipher if multiple reports or mentions refer to the same event, because the exact time and location may not be mentioned. Furthermore, in cases of anonymous reporting, key actors may not be included either. We offer a method E-MATCH to identify reports from multiple data sources that potentially refer to the same event. It is especially useful in identifying diverse descriptions of the same event from multiple data sources when key location, time, and actor information is missing.

B9 — Building Knowledge Bases for Precision Cancer Medicine — Jake Lever, Eric Y. Zhao, Jasleen Grewal, Luka Culibrk, Melika Bonakdar, Kilannin Krysiak, Arpad Danos, Obi Griffith, Malachi Griffith, Martin R. Jones and Steven J.M. Jones.

Precision cancer medicine efforts rely heavily on existing research into the role of different genes in cancer, drug resistance markers and other information about common somatic mutations. Text mining methods should be used to create knowledge bases to automate precision medicine pipelines that can predict drugs to use for patient treatment. To this end, we present our work on the CancerMine and CIViCmine resources. We have built a relation extraction pipeline that processes abstracts and full-text articles to identify the drivers, oncogenes and tumor suppressors in different cancer types as well as biomarkers of diagnosis, prognosis, predisposition, and drug resistance.

Afternoon Talks 2

17:15 — Adverbial Clausal Modifiers in the LinGO Grammar Matrix — (Kristen Howell & Olga Zamareva)

We extend the coverage of grammars produced by the LinGO Grammar Matrix to adverbial clausal modifiers. We present an analysis, taking a typologically-driven approach to account for this phenomenon across the world's languages, which we implement in the Grammar Matrix Customization System. We test our analysis on 5 typologically diverse languages that were not considered in development, achieving 88% coverage and 0.02% over-generation.

17:35 — SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach — Michael Petrochuk and Luke Zettlemoyer.

The SimpleQuestions dataset is one of the most commonly used benchmarks for studying single-relation factoid questions. In this paper, we present new evidence that this benchmark can be nearly solved by standard methods. First we show that ambiguity in the data bounds performance on this benchmark at 83.4%; there are often multiple answers that cannot be disambiguated from the linguistic signal alone. Second we introduce a baseline that sets a new state-of-the-art performance level at 78.1% accuracy, despite using standard methods. Finally, we report an empirical analysis showing that the upper bound is loose; roughly a third of the remaining errors are also not resolvable from the linguistic signal. Together, these results suggest that the SimpleQuestions dataset is nearly solved.

17:55 — Neural Relation Extraction Model with Selectively Incorporated Concept Embeddings — Yi Luan, Mari Ostendorf and Hannaneh Hajishirzi.

This paper describes our submission for the SemEval 2018 Task 7 shared task on semantic relation extraction and classification in scientific papers. We extend the end-to-end relation extraction model of \cite{miwa2016end} with enhancements such as a character-level encoding attention mechanism on selecting pre-trained concept candidate embeddings. Our official submission ranked the second in relation classification task (Subtask 1.1 and Subtask 2 Scenerio 2), and the first in the relation extraction task (Subtask 2 Scenario 1).