Detecting Marked and Unmarked Categories in Language

Alina Arseniev-Koehler, Devin Cornell, and Andrei Boutyline

Many of our categories are marked. For example, we might qualify someone as 'openly gay,' but we rarely describe someone as 'openly straight.' Similarly, we might qualify a nurse as male, but we would not qualify a nurse as female: 'female nurse,' like 'openly straight,' sounds redundant. These marked and unmarked categories highlight the meanings we take for granted (e.g., Zerubavel 2018) and reveal what we see as the default versus the 'other.' We use these markings pervasively, and often subconsciously. But in using marked and unmarked categories, our language (and, in turn, machine-learned language) can reinforce boundaries between the marked and the unmarked.

We're interested in how machine-learning models of language learn these marked and unmarked categories, and in training a model to detect them in written language (which, in our experience, is quite hard even for a human). While previous work on gender bias detection (e.g., Bolukbasi et al. 2016) suggests how we might train a method to detect a specific type of marking (such as gender), we're especially interested in thinking about this pattern in more generalizable ways. What is the process of learning and detecting the marked and unmarked, and is there a typology of markedness? What makes something more or less prone to being marked? How might markedness be represented as a form of knowledge, for us but also for a machine-learning model? How does markedness relate to ambiguity and to methods for resolving ambiguity in language (e.g., coreference resolution)? Which types of markedness are easier or harder for a machine-learning model to learn?
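As a toy illustration of one possible operationalization (a sketch, not the authors' method), markedness can surface as an asymmetry in how often a noun receives an explicit modifier: if 'male nurse' appears in a corpus but 'female nurse' essentially never does, the data treat female as the unmarked default for 'nurse.' The miniature corpus and the single-word modifier window below are assumptions for illustration only.

```python
from collections import Counter

# Toy corpus; in practice this would be a large text collection.
corpus = [
    "the nurse greeted the male nurse",
    "a nurse and a doctor spoke",
    "the male nurse took notes while the nurse listened",
    "the doctor met a female doctor",
    "a doctor examined the patient",
    "the doctor and the female doctor conferred",
]

# Assumed marker words for this sketch; a real study would need a
# broader, empirically derived modifier set.
MODIFIERS = {"male", "female"}

def marking_counts(corpus, noun):
    """Count how often `noun` appears with vs. without an explicit
    gender modifier immediately before it."""
    marked = Counter()
    unmarked = 0
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == noun:
                if i > 0 and tokens[i - 1] in MODIFIERS:
                    marked[tokens[i - 1]] += 1
                else:
                    unmarked += 1
    return marked, unmarked

marked, unmarked = marking_counts(corpus, "nurse")
# In this toy corpus, "nurse" is only ever explicitly marked as male,
# suggesting female is the taken-for-granted default.
print(marked, unmarked)
```

A lopsided distribution of modifiers around a noun is, of course, only a crude proxy; the questions above ask whether such asymmetries can be learned and detected in a more general way than hand-picked modifier lists allow.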

Zerubavel, Eviatar. Taken for Granted: The Remarkable Power of the Unremarkable. Princeton University Press, 2018.

Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to homemaker? Debiasing word embeddings." Advances in Neural Information Processing Systems. 2016.

Arseniev Cornell and Boutyline - Huge Dwarves - DISI August 2018.pdf