Detailed information

Title: Social Fairness, Accountability and Transparency of the Data Economy: Using Machine Learning to Combat the Emptiness of Privacy Policies


Abstract:

Online consumers do not read privacy policies. These policies outline the ways in which consumer data will be collected and used. Certain common usages, such as personalization of ads, prices or content, have long been identified as potential sources of algorithmic unfairness and of a lack of accountability and transparency. In this paper we draw the reader’s attention to a related problem, which we call social unfairness and non-transparency. We demonstrate that privacy policies of online apps and services are written in an essentially "empty" language: full of qualifiers like "we might," abstract terms like "third parties," and open-ended phrases like "among others." As a result, even a consumer with expert knowledge is unable to predict what will be done with her data, and the imbalance of rights and duties strongly favors the data holders, to the detriment of consumers. We argue that machine learning tools can be used to combat the "emptiness" of the language of privacy policies by increasing the text-processing power of individual consumers and consumer organizations. Further, once the social practice changes and policies start containing more actual information, machine learning can be used to process these documents with high speed and accuracy. To substantiate this claim, we review the existing literature and conduct our own experiment. We have tagged 15 privacy policies of services in competitive markets (jogging, food delivery and dating apps) and demonstrate that information derived from topic modeling, in conjunction with a classification model, allows one to detect a rigidly defined type of information with accuracy as high as 90 percent. Given the small size of the initial sample, this finding offers good prospects for future research. We hope that more applications empowering individual consumers and civil society will soon follow.


Bio:

  • Przemysław Pałka is a Research Scholar at Yale Law School, a Fellow in Private Law at the YLS Center for Private Law, and a Resident Fellow at the Information Society Project at Yale. His research interests encompass the intersection of law and new technologies, especially contract law, consumer law, personal data protection law, machine learning, data analytics, and the automation of legal tasks. Before coming to Yale, Pałka was a researcher at the European University Institute in Florence, where he coordinated the CLAUDETTE project and obtained his Ph.D. More information is available at: https://przemysLAW.technology/

  • Roozbeh Yousefzadeh is a postdoctoral fellow at Yale University. His research interests are machine learning, scientific computing, and optimization. He received his Ph.D. in Computer Science from the University of Maryland for his work on deep learning interpretation and homotopy methods. More information is available at: http://campuspress.yale.edu/roozbeh/