Seminar series

Probabilistic Reference and Grounding with PRAGR

Vivien Mast, Bremen/Potsdam University

5th March 2015 (staff host: Verena Rieser)

Keywords: Situated Human-Robot Interaction, Natural Language Generation

Standard algorithms for generating referring expressions assume that every entity has a true or false value for each property: a book is either yellow or not. Generating a referring expression is equated with finding a (minimal or cognitively motivated) set of properties which are all true for the target entity, but not all true for any distractor. However, human language relies on highly flexible, context-dependent qualitative concepts such as graded properties or colour terms which cannot be uniquely mapped to quantitative measurements. Moreover, humans are capable of grounding reference in dialogue by flexibly adapting conceptualizations in order to reach mutual understanding. I will present the Probabilistic Reference And GRounding mechanism PRAGR which generates and resolves referring expressions based on perceptual data. PRAGR is geared towards maximizing referential success by flexibly assigning linguistic concepts to objects, depending on context. I will present recent studies evaluating PRAGR in robot-robot and human-robot communication and demonstrate the potential of PRAGR for referential grounding dialogues.
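
To make the idea of graded, context-dependent concept assignment concrete, here is a minimal sketch in Python. It is not PRAGR's actual formulation: the colour prototypes, the triangular membership function, the uniform prior over objects, and all names (CONCEPTS, membership, resolve, best_concept) are assumptions made purely for illustration. The speaker picks the concept under which a listener would most probably select the target.

```python
# Hypothetical colour concepts: name -> prototypical hue in degrees (assumed values).
CONCEPTS = {"orange": 30.0, "yellow": 60.0, "green": 120.0}


def membership(concept, hue, width=45.0):
    """Graded degree (0..1) to which `hue` fits `concept` (triangular falloff, assumed)."""
    return max(0.0, 1.0 - abs(hue - CONCEPTS[concept]) / width)


def resolve(concept, scene):
    """P(object | concept), assuming a uniform prior over the objects in `scene`."""
    scores = {obj: membership(concept, hue) for obj, hue in scene.items()}
    total = sum(scores.values())
    if total == 0.0:
        # The concept fits no object at all, so it cannot pick anything out.
        return {obj: 0.0 for obj in scene}
    return {obj: score / total for obj, score in scores.items()}


def best_concept(target, scene):
    """Choose the concept that maximises the probability of resolving to `target`."""
    return max(CONCEPTS, key=lambda concept: resolve(concept, scene)[target])


# The same book (hue 45, halfway between the two prototypes) is best called
# "yellow" next to an orange-ish distractor and "orange" next to a yellow-ish one.
print(best_concept("book", {"book": 45.0, "other": 25.0}))  # yellow
print(best_concept("book", {"book": 45.0, "other": 70.0}))  # orange
```

In this sketch no property is simply true or false of the book; the preferred term depends on which distractors are present, which is the kind of behaviour the abstract describes.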

A Generative Probabilistic Model of Users in a Spatial Navigation Domain

Aciel Eshky, University of Edinburgh

7th May 2014 (staff host: Verena Rieser)

Much of the previous work in statistical dialogue has focused on domains in which user goals are represented and grounded as categorical entities. In this talk, I will discuss recent work on modelling user behaviour in a novel task-oriented domain (the Map Task) where user goals are spatial routes across artificial landscapes. We approach the problem of modelling users as an instance of generative probabilistic modelling, which allows us to capture a range of plausible behaviour given the same underlying goal. I show how to derive an efficient feature-based representation of the spatial goals, which admits efficient exact inference and generalises to new routes. To evaluate the model, we first exploit its probabilistic nature and evaluate it intrinsically using held-out probability and perplexity. We find a substantial reduction in uncertainty brought by our spatial representation. We then evaluate extrinsically in a human judgement task, where Mechanical Turk workers score the appropriateness of a route description produced by our model or by real human Givers. We find that our model's behaviour does not differ significantly from the behaviour of real users.
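
For readers unfamiliar with the intrinsic metric mentioned above, the following sketch shows how perplexity is conventionally computed from the probabilities a model assigns to held-out events. The function name and the toy numbers are illustrative assumptions, not the speaker's code or results.

```python
import math


def perplexity(probabilities):
    """Perplexity over held-out events, given the model probability of each event.

    Defined as exp(-(1/N) * sum(log p_i)); lower values mean less uncertainty.
    """
    if not probabilities:
        raise ValueError("need at least one held-out event")
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(-log_sum / len(probabilities))


# Toy comparison (made-up numbers): a model that is as uncertain as a uniform
# choice among four user actions versus one that concentrates probability mass.
uniform_model = perplexity([0.25, 0.25, 0.25, 0.25])
sharper_model = perplexity([0.5, 0.5, 0.5, 0.5])
print(uniform_model > sharper_model)  # True
```

A perplexity of k can be read as the model being as uncertain as a uniform choice among k alternatives, which is why a drop in perplexity is reported as a reduction in uncertainty.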

Clarifying reference and plans in dialogue

Gregory Mills, University of Edinburgh

30th January 2013 (staff host: Verena Rieser)

One of the most contentious debates in studies of dialogue concerns the explanatory role assigned to interlocutors' intentions. In (post) Gricean cognitive/pragmatic models of meaning (e.g. Sperber and Wilson 1987, Searle 1979, Levinson 2006), intentions play the central role: intentions are a priori mental states determining a speaker's utterance formulation. Similarly, for hearers, comprehension involves recognizing or inferring the speaker's intentions. Successful communication is idealized as involving intentional transparency between speaker and hearer, and a key problem for these models is accounting for which structures (beliefs, codes, inferences) are shared or known to be shared by interlocutors (Kecskes and Mey 2008). By contrast, empirical approaches which focus in the first instance on how language is used in dialogue present a more nuanced view of the role of intentions: for example, in a series of maze game experiments, Garrod et al (2004) found that explicit articulation of speaker intentions is much less effective than more tacit forms of communication via collaborative feedback (e.g. hesitations, disfluencies, partial repeats, clarifications and repair). Moreover, the basic findings of dialogue research show how this collaborative feedback often leads to speakers adapting their own utterance mid-stream, resulting in incrementally and jointly produced utterances which necessarily do not correspond to the original speaker's own intention or goal (Goodwin, 1979). Under this view, intentions, plans, and beliefs are treated as joint construals (Clark, 1996) that are emergent from the interaction.

To address these issues we report a variant of the "maze task" (Garrod et al 1987, 2004), in which participants are required to collaboratively develop sequences of steps for solving the mazes. Participants communicate with each other using an experimental chat tool (Healey and Mills 2006), which interferes with the unfolding dialogue by inserting artificial clarification requests that appear to participants as if they originate from each other. Two kinds of clarification request were introduced: (1) Artificial "Why?" questions to query the participants' plan, (2) Fragment clarification requests (Healey et al 2003) that repeat a single word from the prior turn, querying the content of participants' referential descriptions.

We show how over the course of the interaction, interlocutors change how they treat these two kinds of clarification request: "Why?" clarification requests querying higher level plans become easier to respond to as co-ordination develops, while for fragment clarification requests the converse is the case: they become harder to respond to as the task progresses. Further, we show how this differential pattern is not arrived at via explicit negotiation, but through the tacit turn-by-turn feedback mechanisms of dialogue.

All your essay are belong to us: Automatically scoring essays with e-rater

Joel Tetreault, Educational Testing Service in Princeton, NJ

10th October 2012 (staff host: Verena Rieser)

One of the largest advances in large-scale educational assessment in the last twenty years has been the use of computer programs to automatically score student essays. The advantages of automatic scoring methods include faster and consistent score reporting as well as paving the way for online writing instructional tools for individual and classroom use. In this talk, I will first present "e-rater", the automatic scoring tool developed by ETS which uses a host of linguistic and statistical features. e-rater is used to score millions of essays every year for high-stakes exams such as the TOEFL and GRE (in conjunction with human raters) as well as other assessments. Second, I will discuss in detail one component of e-rater: grammatical error detection. In particular, usage errors involving prepositions are among the most common types seen in the writing of non-native English speakers. Prior corpus analyses report error rates for English prepositions that were as high as 10%. Since prepositions are such a nettlesome problem for English as a Second Language (ESL) writers, developing an NLP application that can reliably detect these types of errors will provide an invaluable learning resource. To address this problem, we describe a system which detects preposition errors with a precision of 84% in TOEFL essays. In this talk, I will discuss the system as well as issues in developing and evaluating NLP grammatical error detection applications.

Data-Driven Approach to Concept-to-Text Generation

Ioannis Konstas, School of Informatics, University of Edinburgh

5th September 2012 (staff host: Verena Rieser)

Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input. We present a joint model that captures content selection (“what to say”) and surface realization (“how to say”) in a domain-independent fashion. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). We represent our grammar compactly as a weighted hypergraph and recast generation as the task of finding the best derivation tree for a given input. Experimental evaluation on several domains achieves competitive results with state-of-the-art systems that use domain-specific constraints, explicit feature engineering or labeled data.

Joint Decision Making for Situated Language Generation (slides attached)

Nina Dethlefs (Interaction Lab), Heriot-Watt University

22nd February, 2012

Natural Language Generation (NLG) systems in situated domains face a number of decisions concerning what to communicate to a human user, how to structure their content and how to express it. Traditionally, these decisions have been addressed sequentially and in isolation from each other. Recent studies, however, have shown that they are strictly interdependent and that an isolated treatment can degrade the overall performance of systems.

In this talk, I will argue for a joint learning framework for situated NLG that is based on Hierarchical Reinforcement Learning and can be augmented with graphical models. I will discuss a human evaluation study that compares two systems that guide users through a virtual 3D environment: one using a jointly optimised policy and one using a policy optimised in isolation. Results show that the jointly optimised policy outperforms its isolated counterpart in terms of task success, user satisfaction and similarity with human authors.

Paul Crook

August 23rd, 3.15pm

1/ "Lossless Value Directed Compression of Complex User Goal States for Statistical Spoken Dialogue Systems" (Interspeech 2011)

This paper presents initial results in the application of Value Directed Compression (VDC) to spoken dialogue management belief states for reasoning about complex user goals. On a small but realistic SDS problem VDC generates a lossless compression which achieves a 6-fold reduction in the number of dialogue states required by a Partially Observable Markov Decision Process (POMDP) dialogue manager (DM). Reducing the number of dialogue states reduces the computational power, memory, and storage requirements of the hardware used to deploy such POMDP SDSs, thus increasing the complexity of the systems which could theoretically be deployed. In addition, in the case when on-line reinforcement learning is used to learn the DM policy, it should lead to, in this case, a 6-fold reduction in policy learning time. These are the first automatic compression results that have been presented for POMDP SDS states which represent user goals as sets over possible domain objects.

2/ "Parallel Computing and Practical Constraints when applying the Standard POMDP Belief Update Formalism to Spoken Dialogue Management" (IWSDS 2011)

We explore the commonly stated assumption that the standard POMDP formalism for belief updates cannot be directly applied to Dialogue Management for Spoken Dialogue Systems (SDSs) due to the computational intractability of maintaining a large belief state space. Focusing on SDSs, as this application has particular bounds in terms of “real-time” belief updates and potentially massive numbers of observations, we quantify computational constraints both in terms of compute time and memory. We establish a level of complexity of SDS task below which a direct implementation of the standard POMDP formalism is possible and beyond which some form of compressed representation is required. We find that computation time of POMDP belief updates is rarely an issue. Memory size and latency tend to be the dominant constraints. Low-latency, shared-memory architectures are more suitable than General Purpose Graphics Processing Units (GPGPUs) or large-scale cluster/cloud infrastructure. One assumption, that users do not change their goal during a dialogue, has significant beneficial impacts on memory requirements, allowing for practical POMDP SDSs which have millions of states.
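
As background for the abstract above, the standard POMDP belief update for a fixed system action a and observation o is b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The sketch below implements it with plain Python lists. The function names and toy numbers are illustrative assumptions, not the paper's implementation. The second function shows one plausible reading of the static-goal assumption: if the hidden state cannot change, the transition model is the identity, so no transition table needs to be stored.

```python
def belief_update(belief, transition, obs_likelihood):
    """Standard POMDP belief update for one fixed action and observation.

    belief[s]         : current probability of state s
    transition[s][s2] : T(s2 | s, a), a dense n-by-n table
    obs_likelihood[s2]: O(o | s2, a)
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [sum(transition[s][s2] * belief[s] for s in range(n)) for s2 in range(n)]
    # Correction step: weight by the observation likelihood, then normalise.
    unnormalised = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalised)
    if z == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / z for u in unnormalised]


def belief_update_static_goal(belief, obs_likelihood):
    """Same update when states never change (identity transition model).

    Only vectors of length n are needed instead of an n-by-n transition table.
    """
    unnormalised = [o * b for o, b in zip(obs_likelihood, belief)]
    z = sum(unnormalised)
    if z == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / z for u in unnormalised]


# Toy example with two user goals and an observation that favours goal 0.
prior = [0.5, 0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]
likelihood = [0.8, 0.2]
print(belief_update(prior, identity, likelihood))
print(belief_update_static_goal(prior, likelihood))
```

A dense transition table for n states has n² entries, whereas the static-goal variant keeps only vectors of length n, which illustrates why such an assumption can matter for memory at the scale of millions of states.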

Title: Learning Dialogue Systems with Complex Interactions: Nonparametric Bayes, Partial Observability and Semi-Markov Dynamics (ongoing research)

Who: Dr. Zhuoran Wang

When: Wednesday June 22nd, 3.15pm

Location: G45

The partially observable Markov decision process (POMDP) has recently proven successful in uncertainty management problems for speech-based interactive systems of various kinds. However, the specification of its parameters (usually either handcrafted or learned based on carefully factorized models) becomes challenging when the interaction complexity grows, e.g. for systems with multiple modalities and/or complex user goals. We propose a new nonparametric Bayesian framework for such systems, with the hypothesis that semi-Markov POMDP models can be directly inferred from observations, without any domain-specific knowledge. The semi-Markov pattern allows optimisation of the timing of system responses to yield more natural interactions, for example in incremental processing of spoken dialogue. An extension based on Dirichlet process (DP) emissions is also presented, which achieves further nonparametric hierarchical clustering of the generative process. This talk reports ongoing work: it presents and motivates the framework, and describes practical algorithms for both inference and planning with such models. The resulting framework is a novel combination of automatic state inference, observation clustering, and response timing optimisation for spoken and multimodal interactive systems.

Judy Robertson

Monday 16th of May, 2011.

Slides can be found.


How to put children off studying computing: learners' attitudes to computer science after school-based game making projects


There is a pressing need to interest young people in computer science. A recent popular approach has been to harness learners' enthusiasm for computer games to motivate them to learn computer science concepts through game authoring. This talk will describe a study in which 992 learners across 13 schools took part in a game making project. It will provide evidence from 225 pre-test post-test questionnaires on how learners' attitudes to computing changed during the project, as well as qualitative reflections from the class teachers on how the project affected their learners. Results indicate that girls did not enjoy the experience as much as boys, and that in fact the project may make pupils less inclined to study computing in the future. This has important implications for future efforts to engage young people in computing.