Foundations of Data Science - Virtual Talk Series

... on the "Theory of Large ML Models."

Mikhail Belkin (UC San Diego)

Wednesday, May 15

2pm Pacific Time

Mikhail Belkin is a Professor at the Halicioglu Data Science Institute and the Department of Computer Science and Engineering at UCSD, and an Amazon Scholar. Prior to that he was a Professor in the Department of Computer Science and Engineering and the Department of Statistics at the Ohio State University. He received his Ph.D. from the Department of Mathematics at the University of Chicago (advised by Partha Niyogi). His research interests are broadly in the theory and applications of machine learning, deep learning, and data analysis. Some of his well-known work includes the widely used Laplacian Eigenmaps, Graph Regularization, and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral graph theory to data science. His more recent work has been concerned with understanding the remarkable mathematical and statistical phenomena observed in deep learning. This empirical evidence has necessitated revisiting some of the classical concepts in statistics and optimization, including the basic notion of overfitting. One of his key findings is the "double descent" risk curve, which extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation. His recent work focuses on understanding feature learning and over-parameterization in deep learning. Mikhail Belkin is an ACM Fellow and a recipient of an NSF CAREER Award and a number of best paper and other awards. He has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence and the Journal of Machine Learning Research. He is the editor-in-chief of the SIAM Journal on Mathematics of Data Science (SIMODS).

Title: The puzzle of dimensionality and feature learning in neural networks and kernel machines

Abstract: Remarkable progress in AI has far surpassed the expectations of just a few years ago. At their core, modern models, such as transformers, implement traditional statistical models -- high-order Markov chains. Nevertheless, it is not generally possible to estimate Markov models of that order given any possible amount of data. Therefore, these methods must implicitly exploit low-dimensional structures present in data. Furthermore, these structures must be reflected in the high-dimensional internal parameter spaces of the models. Thus, to build a fundamental understanding of modern AI, it is necessary to identify and analyze these latent low-dimensional structures. In this talk, I will discuss how deep neural networks of various architectures learn low-dimensional features and how the lessons of deep learning can be incorporated into non-backpropagation-based algorithms that we call Recursive Feature Machines. I will provide a number of experimental results on different types of data, as well as some connections to classical sparse learning methods, such as Iteratively Reweighted Least Squares.
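
For readers unfamiliar with the idea sketched in the abstract, the following is a minimal, illustrative sketch (not the speaker's implementation) of a Recursive Feature Machine: a kernel ridge regressor whose Mahalanobis-type metric M is re-estimated each round from the average gradient outer product (AGOP) of the fitted predictor, so that directions along which the target actually varies are amplified. The Laplace kernel, the bandwidth, and the regularization value are assumptions chosen for illustration, not details from the talk.

# Minimal Recursive Feature Machine sketch (assumed Laplace kernel + ridge regression).
import numpy as np

def laplace_kernel(X, Z, M, bandwidth=10.0):
    # K(x, z) = exp(-||x - z||_M / bandwidth), where ||v||_M^2 = v^T M v.
    dists_sq = ((X @ M) * X).sum(1)[:, None] + ((Z @ M) * Z).sum(1)[None, :] - 2 * (X @ M) @ Z.T
    dists = np.sqrt(np.clip(dists_sq, 0.0, None))
    return np.exp(-dists / bandwidth), dists

def rfm(X, y, n_iters=5, reg=1e-3, bandwidth=10.0):
    n, d = X.shape
    M = np.eye(d)  # start from an ordinary (isotropic) kernel machine
    for _ in range(n_iters):
        K, dists = laplace_kernel(X, X, M, bandwidth)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)  # kernel ridge regression
        # AGOP: average over training points of grad f(x_i) grad f(x_i)^T,
        # where f(x) = sum_j alpha_j K(x, x_j).
        grads = np.zeros((n, d))
        for i in range(n):
            diff = (X[i] - X) @ M  # M (x_i - x_j) for all j (M symmetric)
            w = alpha * K[i] / (bandwidth * np.clip(dists[i], 1e-12, None))
            grads[i] = -(w[:, None] * diff).sum(axis=0)
        M = grads.T @ grads / n
        M /= np.trace(M) / d + 1e-12  # keep the scale of M stable across rounds
    K, _ = laplace_kernel(X, X, M, bandwidth)
    alpha = np.linalg.solve(K + reg * np.eye(n), y)  # refit with the final metric
    return M, alpha

# Toy check: the target depends only on the first coordinate, so the learned
# metric should concentrate on that direction (a low-dimensional feature).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sin(2.0 * X[:, 0])
M, _ = rfm(X, y)
print(np.round(np.diag(M), 3))

In this toy run the diagonal of M becomes dominated by the first coordinate, which is the kind of learned low-dimensional feature structure the abstract refers to.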