Vagueness is the pervasive phenomenon in which words have imprecisely defined boundaries and are applied differently in different contexts and by different people. For example, when a patient report describes a baby's blood pressure as too high, or her condition as stable, different clinicians may interpret these terms differently. This project studies mathematical and computational models of uncertainty and vagueness.
In the context of natural language processing (NLP), vagueness remains an underexplored area of research. Vagueness highlights intricacies of natural language that are not captured by the standard measures of accuracy currently used in NLP benchmarks. To illustrate, consider the following factors that, among others, are at play when dealing with vague expressions.
Poorly defined boundaries. Consider the sorites paradox. One grain does not make a heap. Moreover, if one grain does not make a heap, then two grains do not make a heap either, since no single grain can be the difference between a heap and a non-heap. Following this reasoning, if two grains do not make a heap, then neither do three, nor a million. Repeating this step leads to a paradox: we reach the even stronger conclusion that no number of grains can make a heap, even though a sufficiently large pile clearly is one. Vague concepts in general give rise to the sorites paradox: when is someone tall, bald or old? We can give examples of people we judge to be clearly tall and of people who are clearly short, but what happens at the boundary?
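The sorites reasoning can be sketched in a few lines of Python. This is only an illustration, not a model used in this project: the function names and the cut-off of 10,000 grains are assumptions chosen for the example. The first function applies the paradox's two premises step by step; the second draws a sharp boundary instead, and the helper shows that any such boundary lets a single grain make the difference.

```python
from typing import Callable, List


def sorites_verdict(grains: int) -> bool:
    """Judge 'is this a heap?' using only the two premises of the paradox."""
    is_heap = False  # premise 1: a single grain is not a heap
    for _ in range(1, grains):
        # premise 2: adding one grain to a non-heap never produces a heap
        if not is_heap:
            is_heap = False
    return is_heap


def crisp_verdict(grains: int, threshold: int = 10_000) -> bool:
    """Judge 'is this a heap?' with a sharp cut-off (threshold is hypothetical)."""
    return grains >= threshold


def tolerance_violations(predicate: Callable[[int], bool], max_grains: int) -> List[int]:
    """Return every n below max_grains where adding one grain flips the verdict."""
    return [n for n in range(1, max_grains) if predicate(n) != predicate(n + 1)]


if __name__ == "__main__":
    print(sorites_verdict(1_000_000))                   # the induction never reaches 'heap'
    print(tolerance_violations(crisp_verdict, 20_000))  # where one grain makes the difference
```

Neither verdict matches human judgments: the inductive version never calls anything a heap, while the crisp version claims that 9,999 grains are not a heap but 10,000 are.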
Context-dependence. Human judgments and expressions of vagueness depend on the context in which they are produced. "There are a lot of people" might be an appropriate response to seeing twenty people in a house (especially for someone who is more of an introvert), but not to seeing twenty people in a football stadium. Context effects are not limited to linguistic cues and can also include visual information. In judging whether there are many striped fish in the image on the left, we can be influenced by factors such as the number of other fish in the image, the spacing between the fish and the way the two types of fish are grouped together (Coventry et al., 2005).
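As a rough illustration of context-dependence, the sketch below judges "a lot" relative to what is typical for a setting. It is a deliberately simple toy, not a model proposed by this project or by Coventry et al. (2005): the function name, the typical crowd sizes and the ratio of 1.5 are all hypothetical values chosen for the example.

```python
def is_a_lot(count: int, typical_for_context: float, ratio: float = 1.5) -> bool:
    """Return True if count clearly exceeds what is typical for the context.

    Both typical_for_context and ratio are illustrative assumptions.
    """
    return count >= ratio * typical_for_context


# Hypothetical typical crowd sizes for the two settings in the text.
TYPICAL_IN_HOUSE = 4
TYPICAL_IN_STADIUM = 30_000

if __name__ == "__main__":
    print(is_a_lot(20, TYPICAL_IN_HOUSE))    # twenty people in a house
    print(is_a_lot(20, TYPICAL_IN_STADIUM))  # twenty people in a stadium
```

The same count of twenty receives opposite verdicts in the two settings. A fixed ratio remains a sharp boundary, so this toy captures context-dependence but not the boundary problem described earlier.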
This only scratches the surface of why vagueness is a unique and interesting topic of NLP research. In this project, we go beyond the standard measures of accuracy used in NLP to study and develop models that reflect the inherent inconsistencies that arise when humans produce and interpret vague expressions.
For more information, please reach out to Hugh Mee Wong. This project is a collaboration between the department of Information and Computing Sciences and the department of Languages, Literature and Communication.
Coventry, K., Cangelosi, A., Newstead, S. N., Bacon, A., & Rajapakse, R. (2005). Grounding natural language quantifiers in visual attention. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates.