Detailed information

Title: What Is a Proxy and Why Is It a Problem?


Abstract: Warnings about so-called 'proxy variables' have become ubiquitous in recent policy debates about machine learning's potential to discriminate illegally. Yet it is far from clear what makes something a proxy and why it poses a problem. In most cases, commentators seem to worry that even when a legally proscribed feature such as race is not provided directly as an input into a machine learning model, discrimination on that basis may persist because non-proscribed features are correlated with (that is, serve as a proxy for) the proscribed feature. Analogizing to redlining, commentators point out that zip codes can easily serve as a stand-in for race. Yet, unlike lenders, a machine learning model will not seize on zip codes because the model intends to discriminate on the basis of race; it will only do so because zip codes also happen to be predictive of the outcome of interest. So how are we to decide whether a variable is serving as a proxy for race or as a legitimate predictor that just happens to be correlated with race?


This question cuts to the core of discrimination law, posing both practical and conceptual challenges for resolving whether any observed disparate impact is justified when a decision relies on variables that exhibit any correlation with class membership. This paper attempts to develop a more principled definition of proxy variables, aiming to bring improved clarity to statistical, legal, and normative reasoning on the issue. It describes the various conditions that might create a proxy problem and explores a range of possible responses. In so doing, it reveals that any rigorous discussion of proxy variables requires excavating the causal relationship that different commentators assume to exist between non-proscribed features, proscribed features, and the outcome of interest.


Bio: Margarita Boyarskaya is a PhD student with the Technology, Operations, and Statistics group at NYU Stern working on causal models for algorithmic fairness. She is a student fellow at the NYU Information Law Institute (Privacy Research Group). Margarita holds a B.Sc. and an M.Sc. in theoretical mathematics from Moscow State University. Previously, she was an intern with the FATE group at Microsoft Research.