Spring 2026
Title: What's Important in Discourse?
Abstract: Discourse is not a shopping list of sentences: some utterances, participants and sections in text and conversation are more important, or salient, than others, though measuring this can be challenging. In this talk I explore differences in the importance of content using a newly developed methodology based on multiple summarization, in which information captured in more summaries of the same text is considered more salient than the less ‘summary-worthy’ information that does not make the cut. Analyses of the linguistic means that signal salience at the discourse level reveal considerable variation across text types: how we express pertinent versus supporting information differs widely across fiction, academic writing, spontaneous conversation and YouTube videos. To investigate these effects, I propose an adversarial genre analysis using models trained to fit one genre and tested on perturbed inputs. This analysis shows, for example, that properties flagging a character as important in a biography could instead mark a tangential character in a Reddit forum discussion, and vice versa. I will also present some recent results on the sensitivity of both humans and LLMs to the memorability of salient information, and on how human-written and model-generated summaries compare and diverge.
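The multiple-summarization idea reduces to a simple proportion: a unit's salience is the share of summaries of the same text that capture it. The Python below is a minimal, hypothetical sketch of that proportion, not the speaker's implementation. It assumes each summary has already been aligned to the source units it covers (how that alignment is done is not specified here), and the function name `salience_scores` and the toy data are illustrative only.

```python
from collections import Counter
from typing import Hashable, Iterable


def salience_scores(
    units: list[Hashable],
    summary_alignments: Iterable[set[Hashable]],
) -> dict[Hashable, float]:
    """Score each source unit by the fraction of summaries that capture it.

    Hypothetical interface: `summary_alignments` holds one set per summary,
    containing the IDs of the source units that summary was judged to include.
    The alignment step itself is assumed to have been done elsewhere.
    """
    alignments = list(summary_alignments)
    if not alignments:
        raise ValueError("need at least one summary")
    known = set(units)
    counts: Counter = Counter()
    for covered in alignments:
        # Count each unit at most once per summary; ignore unknown IDs.
        counts.update(set(covered) & known)
    n = len(alignments)
    return {u: counts[u] / n for u in units}


# Hypothetical example: five source sentences, four summaries of the same text.
sentences = ["s1", "s2", "s3", "s4", "s5"]
summaries = [
    {"s1", "s3"},
    {"s1", "s2", "s3"},
    {"s1"},
    {"s1", "s4"},
]
scores = salience_scores(sentences, summaries)
# Rank units from most to least 'summary-worthy'.
for sid, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sid, score)
```

Here a sentence kept by every summary scores 1.0 and one that never makes the cut scores 0.0; graded scores in between are what allow salience to be compared across units rather than treated as a binary label.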