This project is concerned with developing mathematical and computational models of the uncertainty that arises from vagueness, and with testing them on large-scale data.
This project will investigate whether the differences between various sources of disagreement (e.g., noise, ambiguity, subjective bias) can be detected using statistical models, and how such insights can guide the development of approaches for training and evaluating NLP models on datasets that contain disagreement.
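To make the kind of statistical analysis involved more concrete, the sketch below is a deliberately simplified, hypothetical heuristic (not the project's actual method): it contrasts item-level disagreement with annotator-level agreement patterns, on the assumption that a coherent subgroup of annotators who diverge systematically points towards subjective bias or ambiguity, whereas disagreement spread thinly across all annotators looks more like noise.

```python
import numpy as np
from collections import Counter

def label_entropy(labels):
    """Entropy (in bits) of the empirical label distribution for one item."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def annotator_majority_agreement(annotations):
    """
    annotations: dict item_id -> dict annotator_id -> label.
    Returns each annotator's rate of agreement with the item-level majority.
    A coherent subgroup that diverges consistently hints at subjective bias;
    disagreement spread evenly over all annotators looks more like noise.
    (A crude illustrative heuristic, not a committed modelling choice.)
    """
    agree, total = Counter(), Counter()
    for votes in annotations.values():
        majority = Counter(votes.values()).most_common(1)[0][0]
        for annotator, label in votes.items():
            total[annotator] += 1
            agree[annotator] += int(label == majority)
    return {a: agree[a] / total[a] for a in total}
```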
Computational models of referring expression interpretation that can learn from datasets with disagreement do not yet exist. The objective of this project is to develop such models, as well as metrics that do justice to interpretative variation.
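One possible shape for such a metric, shown purely as an illustrative sketch (the function name and the choice of cross-entropy are assumptions rather than the project's committed design), is to score a model's predicted distribution over interpretations against the full empirical distribution of human judgements, instead of against a single adjudicated gold label.

```python
import numpy as np

def soft_label_cross_entropy(pred_probs, human_probs, eps=1e-12):
    """
    Average cross-entropy between the empirical distribution of human
    interpretations and the model's predicted distribution; lower is better.
    Both arrays have shape (n_items, n_interpretations), rows summing to 1.
    """
    pred = np.clip(np.asarray(pred_probs, dtype=float), eps, 1.0)
    human = np.asarray(human_probs, dtype=float)
    return float(-(human * np.log(pred)).sum(axis=1).mean())
```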
The project will develop models for detecting offensive language that account for the fact that the offensiveness of some content can be controversial.
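A minimal sketch of how such controversiality might enter training, assuming soft labels derived from the distribution of annotator judgements (the function name and the use of a KL-divergence objective are illustrative assumptions, not the project's method):

```python
import torch
import torch.nn.functional as F

def soft_target_loss(logits, annotator_probs):
    """
    KL divergence between per-item annotator label distributions and the
    model's predicted distribution, so the model learns to reproduce graded,
    contested judgements of offensiveness rather than a single majority label.
    logits: (batch, n_classes) raw model outputs.
    annotator_probs: (batch, n_classes) empirical label distributions.
    """
    log_pred = F.log_softmax(logits, dim=-1)
    return F.kl_div(log_pred, annotator_probs, reduction="batchmean")
```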
The project will be concerned with the differences in interpretation that arise from misunderstandings in dialogue, focusing in particular on misunderstandings in coreference and reference.