ICLR-21 Workshop on Responsible AI
An interdisciplinary perspective on practical limitations and tradeoffs among fairness, explainability, safety, robustness, and beyond.

Joining the Workshop on May 7



Artificial Intelligence and Machine Learning are increasingly employed by industry and government alike to make or inform high-stakes decisions for people in areas such as employment, credit lending, policing, criminal justice, healthcare, and beyond. Over the past several years, we have witnessed growing concern regarding the risks and unintended consequences of inscrutable ML techniques (in particular, deep learning) in such socially consequential domains. This realization has motivated the community to look more closely at the societal impacts of automated decision making and to develop tools to ensure the responsible use of AI in society. Chief among the ideals that the ML community has set out to formalize and ensure are safety, interpretability, robustness, and fairness. In this workshop, we examine the community’s progress toward these values and aim to identify areas that call for additional research efforts. In particular, by bringing together researchers with diverse backgrounds, we will focus on the limitations of existing formulations of fairness, explainability, robustness, and safety, and discuss the tradeoffs among them. The following is a non-exhaustive list of questions we aim to address through our invited talks, panels, and accepted papers:

  • How should we capture and mitigate the risks and unintended consequences of AI in society? What are the limitations of existing notions for capturing unintended consequences?

  • What are the tradeoffs among different facets of responsible AI? In particular, under what conditions are interpretability, reproducibility, fairness, privacy, safety, and robustness competing constraints? When they are, what is the right framework for balancing them?

  • Where do we stand in evaluating and incorporating fairness, safety, interpretability, and robustness constraints into our machine learning algorithms?

  • What can we guarantee about fairness, safety, interpretability, transparency and robustness of deep neural networks?

  • How can we mitigate various forms of bias that may creep into the training data (including but not limited to biased target variables, proxy features, and sampling bias)? How can we make the data collection pipelines a more integrated part of the practice of Machine Learning?

  • How can we incorporate insights from social sciences and domain expertise into the design, deployment, and maintenance of ML decision-making systems?

  • What have we gotten right/wrong with respect to responsible AI? In particular, what are the main impediments to the uptake of responsible AI methods in practical settings?

The main goal of the workshop is to bring together researchers from academia, industry, and government with diverse expertise and points of view on responsible AI, to discuss how to effectively evaluate machine learning pipelines and enforce their compliance with fairness, safety, interpretability, and robustness constraints. Our workshop will feature a diverse set of speakers (ranging from researchers with social work backgrounds to researchers in the ML community) discussing transparency, bias, and inequity in various real-world problems, including but not limited to criminal justice, health care and medicine, poverty and homelessness, and education. In addition, our invited talks will cover the interpretability and safety of modern machine learning models, their conflicting constraints, ethical and legal issues, and unintended consequences in areas such as self-driving cars and robotics. The workshop aims to further develop these research directions for the machine learning community.

Speakers & Panelists May 7, 6:45-19:45 PDT

A Human-Centered Perspective on Responsible Autonomy

Dorsa Sadigh (Stanford) 8:00-8:30 PDT

Abstract: Developing responsible, safe, and robust autonomous and intelligent systems is a challenging problem affecting today’s robotics systems as they start entering our lives. Autonomous vehicles interacting with other drivers, assistive robots interacting with humans in homes, AI bots negotiating with people are only a few examples of AI agents that require safe, responsible, and robust interactions with humans. In this talk, I will first discuss how building computational human models and an interaction-aware perspective on developing AI agents is a key component for safety and robustness. I will describe two approaches and their applications for modeling human-AI interaction: One is a game theoretic approach that models the interaction as an underactuated dynamical system. However, this approach is computationally expensive, so I will introduce an orthogonal perspective on learning low dimensional representations of other agents’ policies as a way of modeling interaction, coordination, and collaboration. I will then define the notion of influencing interactions, which are interactions that can influence human responses and reactions to AI agents. Specifically, I will talk about approaches for building conventions and positively influencing humans in long-term repeated interactions with AI agents.

Bio: Dorsa Sadigh is an assistant professor in Computer Science and Electrical Engineering at Stanford University. Her research interests lie at the intersection of robotics, learning, and control theory. Specifically, she is interested in developing algorithms for safe and adaptive human-robot interaction. Dorsa received her doctoral degree in Electrical Engineering and Computer Sciences (EECS) from UC Berkeley in 2017 and her bachelor’s degree in EECS from UC Berkeley in 2012. She has received the NSF CAREER award, the AFOSR Young Investigator award, the IEEE TCCPS early career award, the Google Faculty Award, and the Amazon Faculty Research Award.

Towards interpreting responsibly. Good intentions are not enough.

Been Kim (Google Brain) 8:30-9:00 PDT

Abstract: Interpretable machine learning has been a popular topic of study in the era of machine learning. But are we making progress? Are we going toward interpreting responsibly? In this talk, I present a family of work that encourages us to be critical of ourselves, our methods and our goals.

Bio: Been Kim is a staff research scientist at Google Brain. Her research focuses on improving interpretability in machine learning, whether by building interpretability methods for already-trained models or by building inherently interpretable models. She gave a talk at the G20 meeting in Argentina in 2019. Her work TCAV received the UNESCO Netexplo award and was featured at Google I/O ’19 and in Brian Christian’s book “The Alignment Problem”. Been has given a keynote at ECML 2020 and tutorials on interpretability at ICML, the University of Toronto, CVPR, and Lawrence Berkeley National Laboratory. She was a workshop co-chair of ICLR 2019 and has been an area chair at conferences including NIPS, ICML, ICLR, and AISTATS. She received her PhD from MIT.

(Ir)responsible AI: Revisiting Transparency and Fairness in AI Systems

Motahhare Eslami (CMU) 9:00-9:30 PDT

Abstract: The power, opacity, and bias of AI systems have opened up new research areas for bringing transparency, fairness, and accountability into AI systems. In this talk, I will revisit these lines of work, and argue that while they are critical to making AI systems responsible, fresh perspectives are needed when these efforts fall short. First, I discuss how algorithmic transparency, when not designed carefully, can be more harmful than helpful, and that we need to find the right level of transparency to provide users with an informed interaction with an AI system. Second, I will talk about the current approaches tackling algorithmic bias in AI systems, including bias detection and bias mitigation, and their limitations. I particularly show that the current bias detection and algorithmic auditing techniques, which mainly rely on experts and are conducted outside of everyday use of AI systems, fall short in detecting biases that emerge in real-world contexts of use and in the presence of complex social dynamics over time. I propose “everyday algorithm auditing,” which involves everyday users in detecting, understanding, and/or interrogating biased and harmful algorithmic behaviors via their day-to-day interactions with algorithmic systems. I then take a new perspective on the bias mitigation efforts that try to bring fairness to AI systems, and argue that there are many cases in which mitigating algorithmic bias is quite challenging, if not impossible. I propose the concept of “bias transparency,” which centers bias awareness, rather than bias mitigation, in the design of AI systems, particularly high-stakes decision-making systems in which users’ awareness of potential biases is critical to making a final informed decision.

Bio: Motahhare Eslami is an assistant professor at the School of Computer Science, Human-Computer Interaction Institute (HCII), and Institute for Software Research (ISR), at Carnegie Mellon University. She earned her Ph.D. at the Computer Science Department at the University of Illinois at Urbana-Champaign. Motahhare’s research develops new communication techniques between users and opaque algorithmic socio-technical systems to provide users a more informed, satisfying, and engaging interaction. Her work has been recognized with a Google Ph.D. Fellowship, Best Paper Award at ACM CHI, and has been covered in mainstream media such as Time, The Washington Post, Huffington Post, the BBC, Fortune, and Quartz. Motahhare is also a Facebook and Adobe Ph.D. fellowship finalist, and a recipient of ICWSM Best Reviewer Award. Motahhare’s research is supported by NSF, Amazon, Facebook, and Cisco.

Synergies Between Algorithmic Fairness and Domain Generalization

Richard Zemel (University of Toronto) 10:00-10:30 PDT

Abstract: While the ultimate goals of domain generalization and algorithmic fairness are closely aligned, little research has focused on their similarities and how they can inform each other constructively. One of their main common goals involves developing learning algorithms robust to changes across domains or population groups. Domain generalization methods typically rely on knowledge of disjoint domains or environments, while information indicating which demographic groups are at risk of discrimination is often used in the fairness literature. Drawing inspiration from recent fairness approaches that improve worst-case performance without knowledge of sensitive groups, I'll present a novel domain generalization method that handles the more realistic scenario where environment partitions are not provided. Empirical and theoretical analyses show how different partitioning schemes can lead to increased or decreased generalization performance, suggesting when this approach can outperform existing invariant learning methods with hand-crafted environments. I will also discuss whether algorithms from robust ML can be used to improve the fairness of classifiers trained on biased data, in the context of fairly predicting the toxicity of internet comments. These studies suggest that the two areas may be mutually beneficial, but point out some difficulties that arise when applying the current algorithms in practice.

Bio: Richard Zemel is a Professor and Machine Learning Research Chair in the Department of Computer Science at the University of Toronto. He is a Co-Founder and the Research Director of the Vector Institute for Artificial Intelligence. His awards include an NVIDIA Pioneers of AI Award, an ONR Young Investigator Award, a CIFAR AI Chair, and two NSERC Discovery Accelerators. Rich is on the Advisory Board of the Neural Information Processing Systems (NeurIPS) Foundation. Rich's current research interests include learning useful representations of data without any supervision, algorithmic fairness, learning with little data, and graph-based machine learning.

Can we design deep learning models that are inherently interpretable?

Cynthia Rudin (Duke) 10:30-11:00 PDT

Abstract: Black box deep learning models are difficult to troubleshoot. In practice, it can be difficult to tell whether their reasoning process is correct, and “explanations” have repeatedly been shown to be ineffective. In this talk I will discuss two possible approaches to create deep learning methods that are inherently interpretable. The first is to use case-based reasoning, through a neural architecture called ProtoPNet, where an extra “prototype” layer in the network allows it to reason about an image based on how similar it looks to other images (the network says “this looks like that”). Second, I will describe “concept whitening,” a method for disentangling the latent space of a neural network by decorrelating concepts in the latent space and aligning them along the axes.
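As a rough illustration of the prototype idea, ProtoPNet's activation maps a small squared distance between a patch embedding and a learned prototype to a large similarity score via log((d+1)/(d+ε)). The sketch below is a toy version of that one computation only, with plain lists standing in for convolutional feature maps and all names hypothetical; it is not the actual architecture:

```python
import math

def protopnet_similarity(patches, prototype, eps=1e-4):
    """ProtoPNet-style activation: a small squared L2 distance between
    some patch embedding and the prototype yields a large score
    ("this looks like that"); plain lists stand in for feature maps."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    d = min(dist2(p, prototype) for p in patches)  # best-matching patch
    return math.log((d + 1) / (d + eps))

# A patch that matches the prototype exactly gives a high activation;
# distant patches give an activation near zero.
print(protopnet_similarity([[0.0, 0.0], [1.0, 1.0]], [1.0, 1.0]))  # large (~9.2)
print(protopnet_similarity([[5.0, 5.0]], [0.0, 0.0]))              # near 0
```

Taking the minimum distance over patches corresponds to the max-pooling over the similarity map that lets the network point at the image region it found most prototype-like.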

Bio: Cynthia Rudin is a professor of computer science, electrical and computer engineering, and statistical science at Duke University, and directs the Prediction Analysis Lab, whose main focus is in interpretable machine learning. She is also an associate director of the Statistical and Applied Mathematical Sciences Institute (SAMSI). Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. She holds an undergraduate degree from the University at Buffalo, and a PhD from Princeton University. She is a three-time winner of the INFORMS Innovative Applications in Analytics Award, was named as one of the "Top 40 Under 40" by Poets and Quants in 2015, and was named by Businessinsider.com as one of the 12 most impressive professors at MIT in 2015. She is a fellow of the American Statistical Association and a fellow of the Institute of Mathematical Statistics.

Some of her (collaborative) projects are: (1) she has developed practical code for optimal decision trees and sparse scoring systems, used for creating models for high stakes decisions. Some of these models are used to manage treatment and monitoring for patients in intensive care units of hospitals. (2) She led the first major effort to maintain a power distribution network with machine learning (in NYC). (3) She developed algorithms for crime series detection, which allow police detectives to find patterns of housebreaks. Her code was developed with detectives in Cambridge MA, and later adopted by the NYPD. (4) She solved several well-known previously open theoretical problems about the convergence of AdaBoost and related boosting methods. (5) She is a co-lead of the Almost-Matching-Exactly lab, which develops matching methods for use in interpretable causal inference.

Understanding the Limits of Model Explainability

Hima Lakkaraju (Harvard) 11:00-11:30 PDT

Abstract: As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this talk, I will demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, I will discuss a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous. Using results from real world datasets (including COMPAS), I will demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations which do not reflect the underlying biases. I will conclude the talk by discussing extensive user studies that we carried out with domain experts in law to understand the perils of such misleading explanations and how they can be used to manipulate user trust.
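The mechanism behind the attack can be caricatured in a few lines: perturbation-based explainers such as LIME and SHAP query the model mostly at off-manifold points, so a wrapper that detects such points can show explainers an innocuous rule while behaving in a biased way on real inputs. The toy sketch below (hypothetical names, and a crude set-membership test standing in for the paper's learned out-of-distribution classifier) illustrates the idea only:

```python
# Toy data: (sensitive_attr, benign_score) pairs on a discrete grid.
training_points = {(s, round(b, 1)) for s in (0, 1)
                   for b in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)}

def scaffolded_predict(x):
    """Biased rule on in-distribution inputs, innocuous rule elsewhere."""
    if tuple(x) in training_points:  # crude stand-in for an OOD detector
        return x[0]                  # decision driven by the sensitive attribute
    return int(x[1] > 0.5)           # what perturbation-based explainers see

# On real data the prediction is fully determined by the sensitive attribute...
assert all(scaffolded_predict(p) == p[0] for p in training_points)
# ...but an off-grid (perturbed) query only ever exercises the benign rule.
assert scaffolded_predict((1, 0.37)) == 0
```

Because the explainer's perturbed samples fall almost entirely in the second branch, any explanation it fits attributes the predictions to the benign feature.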

Bio: Hima Lakkaraju is an Assistant Professor at Harvard University focusing on explainability, fairness, and robustness of machine learning models. She has also been working with various domain experts in criminal justice and healthcare to understand the real world implications of explainable and fair ML. Hima has recently been named one of the 35 innovators under 35 by MIT Tech Review, and has received best paper awards at SIAM International Conference on Data Mining (SDM) and INFORMS. She has given invited workshop talks at ICML, NeurIPS, AAAI, and CVPR, and her research has also been covered by various popular media outlets including the New York Times, MIT Tech Review, TIME, and Forbes. For more information, please visit: https://himalakkaraju.github.io/

Challenges and opportunities of working with computer scientists as a social scientist

Eric Rice (USC) 14:00-14:30 PDT

Abstract: Responsible research in AI is complex and requires multiple lenses and multiple stakeholders. In this presentation Dr. Rice will discuss his insights as a social work researcher working on AI-driven intervention programs with computer scientists and community-based collaborators. He proposes a model for iterative, interdisciplinary research to ensure that “real world” problems are responsibly addressed. He presents three “lessons” learned from his work on HIV prevention interventions with youth experiencing homelessness.

Bio: Dr. Eric Rice is an associate professor at the University of Southern California in the Suzanne Dworak-Peck School of Social Work.  He is the founding co-director of the USC Center for Artificial Intelligence in Society, a center which is a joint venture with the Viterbi School of Engineering.  Rice received his PhD from Stanford University.

He is an expert in social network science and community-based research.  He is the author of more than 150 peer reviewed publications and the recipient of many federal, state, and foundation grants. Since 2002, Dr. Rice has worked closely with social service providers in Los Angeles and in many other communities across the country to develop novel solutions to end youth homelessness and support young people across the nation who experience homelessness.

AI Fairness in Practice

Joaquin Quiñonero Candela (Facebook AI) 14:30-15:00 PDT

Abstract: A key requirement for building AI responsibly is to ensure that AI-powered systems and products are fair and that they work for everyone. This talk builds on the lessons of my own personal journey into AI fairness, coming at it initially from an AI research and engineering lens and then gradually realizing that fairness is a process. Indeed, fairness work is never finished and fairness questions aren’t primarily AI questions because addressing them requires multidisciplinary teams and approaches. I’ll give concrete examples that illustrate contradictory fairness definitions and I’ll share an approach to breaking down the AI fairness problem in order to make progress in practice.

Bio: Joaquin Quiñonero Candela is the Distinguished Technical Lead for Responsible AI at Facebook, focusing on areas like fairness and inclusiveness, robustness, privacy, transparency and accountability. As part of this focus, he serves on the Board of Directors of the Partnership on AI, an organization interested in the societal consequences of artificial intelligence, and is a member of the Spanish Government’s Advisory Board on Artificial Intelligence.

Before this he built the AML (Applied Machine Learning) team at Facebook, driving product impact at scale through applied research in machine learning, language understanding, computer vision, computational photography, augmented reality and other AI disciplines. AML also built the unified AI platform that powers all production applications of AI across the family of Facebook products.

Prior to Facebook, Joaquin built and taught a new machine learning course at the University of Cambridge, worked at Microsoft Research, and conducted postdoctoral research at three institutions in Germany, including the Max Planck Institute for Biological Cybernetics. He received his PhD from the Technical University of Denmark.

Data, decisions, and dynamics

Moritz Hardt (UC Berkeley) 15:15-15:45 PDT

Abstract: Consequential decisions compel individuals to react in response to the specifics of the decision rule. This individual-level response in aggregate can disrupt the statistical patterns that motivated the decision rule, leading to unforeseen consequences.

In this talk, we will discuss two ways to formalize dynamic decision making problems. One, called performative prediction, directly makes macro-level assumptions about the aggregate population response to a decision rule. The other, called strategic classification, follows economic microfoundations in modeling individuals as utility-maximizing agents with perfect information. We will reflect on the advantages and limitations of either perspective, pointing out avenues for future work.
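To make the performative-prediction framing concrete, here is a minimal toy sketch (the linear response model and all names are assumptions for illustration, not from the talk): deploying a parameter shifts the population, retraining fits the shifted population, and repeated retraining settles at a performatively stable fixed point.

```python
def deployed_mean(theta, eps=0.3, base=1.0):
    """Aggregate population response: the feature mean shifts in
    reaction to the deployed parameter (a made-up linear model)."""
    return base - eps * theta

def retrain(theta):
    """One step of repeated risk minimization: fit the best predictor
    on data drawn under the current deployment; in this toy the
    optimum simply matches the induced mean."""
    return deployed_mean(theta)

theta = 0.0
for _ in range(50):  # repeated retraining
    theta = retrain(theta)

# With eps < 1 the update is a contraction, so retraining converges
# to the performatively stable point theta* = base / (1 + eps) = 1/1.3
print(round(theta, 4))  # → 0.7692
```

The stable point is a model that is optimal for the distribution it itself induces; it need not be the performatively optimal model, which is part of what makes the macro-level framing interesting.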

Based on collaborations with Anca Dragan, Meena Jagadeesan, Celestine Mendler-Dünner, John Miller, Smitha Milli, Juan Carlos Perdomo, and Tijana Zrnic.

Bio: Moritz Hardt is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Hardt investigates algorithms and machine learning with a focus on reliability, validity, and societal impact. After obtaining a PhD in Computer Science from Princeton University, he held positions at IBM Research Almaden, Google Research and Google Brain. Hardt is a co-founder of the Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) and a co-author of the forthcoming textbook "Fairness and Machine Learning". He has received an NSF CAREER award, a Sloan fellowship, and best paper awards at ICML 2018 and ICLR 2017.

Climate Change: A Key Consideration for Responsible AI

Priya Donti (CMU) 16:15-16:45 PDT

Abstract: Climate change is one of the most pressing issues of our time, with massive implications for virtually every aspect of society. In this talk, I argue that climate change should be a core consideration for the development and practice of responsible AI. I first provide an overview of the various ways in which AI affects climate change, through the immediate impacts of applications, the system-level shifts it induces, and the energy use of models themselves. Ultimately, most AI and ML applications — not just those that are traditionally viewed as "climate-relevant" — can have significant implications for both climate change and climate action. I then provide several principles for responsible AI that are especially pertinent to aligning the use of AI with climate change goals.

Bio: Priya Donti is a Ph.D. student in the Computer Science Department and the Department of Engineering & Public Policy at Carnegie Mellon University, co-advised by Zico Kolter and Inês Azevedo. She is also co-founder and chair of Climate Change AI, an initiative to catalyze impactful work at the intersection of climate change and machine learning.

Her work lies at the intersection of machine learning, electric power systems, and climate change mitigation. Specifically, she is interested in creating machine learning techniques that incorporate domain knowledge (such as power system physics) to reduce greenhouse gas emissions from the electricity sector.

Towards Creating Models People Can Use: Experiences from Health Applications

Finale Doshi-Velez (Harvard) 16:45-17:15 PDT

Abstract: In this talk, I'll discuss some of our human-subject experiments designing and testing interfaces for AIs to interact with clinicians in the context of antidepressant recommendations. I'll point out ways in which clinicians reacted to correct and incorrect recommendations, and the kinds of requests they had for interfaces that would let them better explore options with patients as part of the conversation. These studies surface many interesting and challenging open questions for machine learning in delivering on these requests and needs.

Bio: Finale Doshi-Velez is a John L. Loeb associate professor in Computer Science at the Harvard Paulson School of Engineering and Applied Sciences. Her research uses probabilistic methods to address many decision-making scenarios, with a focus on applications related to healthcare. She completed her MSc from the University of Cambridge as a Marshall Scholar, her PhD from MIT, and her postdoc at Harvard Medical School. Her interests lie at the intersection of machine learning, healthcare, and interpretability. She is an AFOSR YIP and NSF CAREER recipient, a Sloan Fellow, and one of IEEE's AI 10 to Watch.