Nate Chambers is a professor at the United States Naval Academy, where he teaches undergraduate computer science. When not enjoying the classroom, he focuses his research on understanding events in text, the semantic relations between them, and how they characterize the knowledge expressed in narratives. He has worked on event schemas and script-like reasoning since his days as a graduate student at Stanford University (PhD 2011), publishing some of the first papers on statistical learning of event schemas. He currently works on a DARPA project pursuing large-scale event schema learning for deeper understanding. In a new area of research, he recently won a best paper award for his work with undergraduates on information extraction from adversarial text in the human sex trafficking domain.
Abstract: This talk describes our efforts to better understand the role of commonsense knowledge in narrative text, with a particular focus on unstated facts that connect the discourse. The logical connections between events (preconditions, causation, etc.) are sometimes explicitly stated in the text, but often they are not. When the discourse does not connect them, how is coherence maintained? These unstated facts come from a variety of inferred sources, such as preconditions, causation, and various states of entities. I will describe two avenues of research we are pursuing to better understand the commonsense knowledge behind these unstated connections. Both build on the ROCStories corpus. The first is a new annotation of ROCStories that identifies (unstated) entity states between the story events and then modifies the stories to maintain coherence when faced with counterfactuals. The second is a larger effort to build TellMeWhy, an evaluation dataset of 31k why-questions asking why specific events in the stories occurred. Our explorations reveal how well (or how poorly) today's large language models can identify answers, as well as whether they can recognize when commonsense knowledge is needed to answer them.
Bonnie Webber received her PhD from Harvard University and then taught at the University of Pennsylvania in Philadelphia for 20 years before joining the School of Informatics at the University of Edinburgh, where she is now professor emeritus.
Known for early research on "cooperative question-answering" and extended research on discourse anaphora and discourse relations, she has served as President of the Association for Computational Linguistics (ACL) and Deputy Chair of the European COST action IS1312, "TextLink: Structuring Discourse in Multilingual Europe". Along with Aravind Joshi, Rashmi Prasad, Alan Lee and Eleni Miltsakaki, she co-developed the Penn Discourse TreeBank -- most recently, the PDTB-3.0 (LDC2019T05).
She is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), the Association for Computational Linguistics (ACL) and the Royal Society of Edinburgh (RSE). In July 2020, she was awarded the ACL Lifetime Achievement Award. Her current interest is in automating the recognition and correction of inconsistencies in annotated corpora.
Abstract: When we started work on the Penn Discourse TreeBank (PDTB-2), we considered only two possibilities: the sense(s) holding between a pair of sentences and/or clauses arose either (a) from a combination of the senses associated with one (or more) explicit discourse connectives linking them and the senses of their arguments, or (b) in the case of adjacent arguments with no explicit connective, from a combination of the senses of the arguments alone. In subsequent papers, we showed that this meant that more than one sense relation could hold simultaneously between a pair of arguments.
We then noticed that expressions other than discourse connectives could signal the sense(s) holding between adjacent arguments. These expressions we called "Alternative Lexicalizations" or AltLexs, noting that the set appeared open-ended and was drawn from a wide variety of syntactic types [Prasad et al., 2010]. On the other hand, we found that AltLex expressions could be partitioned into three groups, depending on (a) whether an expression belonged to a syntactic type admitted as an explicit connective (i.e., an ADVP or a PP), and (b) whether an expression was "frozen" (i.e., blocking free substitution, modification or deletion of any of its parts) or not. Although most AltLex expressions were both lexically and syntactically free (over 75%), they were constrained semantically: one part had to co-refer with Arg1 of the relation, while another part conveyed the sense of the relation itself.
When funding became available to extend the annotation of implicit discourse relations to ones within a single sentence, it became clear that certain local lexico-syntactic constructions also unambiguously signalled particular discourse relations. We called them "Alternative Lexicalized Constructions" or AltLexC expressions. They included auxiliary inversion (signalling that the inverted clause was the condition under which the main clause held true), as in
... but would have climbed 0.6%, had it not been for the storm [wsj_0573]
and predicate inversion (signalling that the main clause held, despite the truth of the inverted predicate), as in
Crude as they were, these early PCs triggered explosive product development in desktop models for the home and office. [wsj_0022]
At the same time, Das and Taboada [2018] argued for the existence of many more ways of "signalling" discourse relations, including through referential, lexical, semantic, syntactic, graphical and genre features, though without explicitly indicating what part(s) of a relation served as signals. Zeldes and Liu [2020] then took up the challenge of trying to automatically detect and identify such signals in a text.
Given this background, what I want to present today are what I take to be particularly interesting syntactic signals -- either because they demonstrate where greater syntactic analysis is needed to identify the signal (though once identified, the signal is unambiguous as to the sense of the relation), or because they show where an independently identified component of the syntactic-semantic interface -- namely, marked information structure -- can signal discourse relational senses as well as support referential coherence.