Friday August 11th @ C4.7

When can we trust that a system that has performed well in the past will continue to do so in the future? Designing systems that are reliable in the wild is essential for high-stakes applications such as self-driving cars and automated surgical assistants. This workshop aims to bring together researchers in diverse areas such as reinforcement learning, human-robot interaction, game theory, cognitive science, and security to further the field of reliability in machine learning. We will focus on three aspects: robustness (to adversaries, distributional shift, model misspecification, corrupted data); awareness (of when a change has occurred, when the model might be miscalibrated, etc.); and adaptation (to new situations or objectives). We aim to consider each of these in the context of the complex human factors that impact the successful application or meaningful monitoring of any artificial intelligence technology. Together, these will aid us in designing and deploying reliable machine learning systems.


We are seeking submissions that deal with the challenges of reliably applying machine learning techniques in the real world. Some possible questions touching on each of these categories are given below, though we also welcome submissions that do not directly fit into these categories.


  • Robustness: How can we make a system robust to novel or potentially adversarial inputs? What are ways of handling model misspecification or corrupted training data? What can be done if the training data is potentially a function of system behavior or of other agents in the environment (e.g. when collecting data on users who respond to changes in the system and might also behave strategically)?
  • Awareness: How do we make a system aware of its environment and of its own limitations, so that it can recognize and signal when it is no longer able to make reliable predictions or decisions? Can it successfully identify "strange" inputs or situations and take appropriately conservative actions? How can it detect when changes in the environment have occurred that require re-training? How can it detect that its model might be misspecified or poorly calibrated?
  • Adaptation: How can machine learning systems detect and adapt to changes in their environment, especially large changes (e.g. low overlap between train and test distributions, poor initial model assumptions, or shifts in the underlying prediction function)? How should an autonomous agent act when confronting radically new contexts?
  • Monitoring: How can we monitor large-scale systems in order to judge if they are performing well? If things go wrong, what tools can help?
  • Value Alignment: For systems with complex desiderata, how can we learn a value function that captures and balances all relevant considerations? How should a system act given uncertainty about its value function? Can we make sure that a system reflects the values of the humans who use it?
  • Reward Hacking: How can we ensure that the objective of a system is immune to reward hacking, i.e. ways in which the system can attain high reward that were unintended by the system designer? For an example, see https://blog.openai.com/faulty-reward-functions/
  • Human Factors: Actual humans will be interacting and adapting to these systems when they are deployed. How do properties of humans affect the guarantees of performance that the system has? What if the humans are suboptimal or even adversarial?

This workshop is supported by the Open Philanthropy Project (http://www.openphilanthropy.org) and The Leverhulme Centre for the Future of Intelligence (http://lcfi.ac.uk/).

Organizers: Dylan Hadfield-Menell, Adrian Weller, Jacob Steinhardt, Smitha Milli