Overview

Saturday, December 9th, 2017

Schedule

9:30 - 9:45 Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning

        El Mahdi El Mhamdi, Rachid Guerraoui, Hadrien Hendrikx and Alexandre Maurer, EPFL

9:45 - 10:15 Minimax-Regret Querying on Side Effects in Factored Markov Decision Processes

        Satinder Singh, University of Michigan, Ann Arbor

10:15 - 10:30 Robust Covariate Shift with Exact Loss Functions

        Angie Liu and Brian Ziebart, University of Illinois, Chicago

10:30 Coffee Break

11:00 Adversarial Robustness for Aligned AI

        Ian Goodfellow, Google Brain 

11:30 Incomplete Contracting and AI Alignment

        Gillian Hadfield, USC Law (Center for Human-Compatible AI)

12:00 Lunch Break

1:15 Learning from Human Feedback

        Paul Christiano, OpenAI

1:45 Finite Supervision Reinforcement Learning

        William Saunders and Eric Langlois, University of Toronto

2:00 Safer Classification by Synthesis

        William Wang, Angelina Wang, Aviv Tamar, Xi Chen and Pieter Abbeel, University of California, Berkeley

2:15 Poster Spotlights

2:30 Poster Session

3:30 Machine Learning for Deliberative Human Judgment

        Owain Evans, University of Oxford (Future of Humanity Institute)

4:00 Learning Reward Functions

        Jan Leike, DeepMind

4:30 Technical Discussion: Open Problems in AI Alignment


In order to be helpful to users and to society at large, an autonomous agent needs to be aligned with the objectives of its stakeholders. Misaligned incentives are a common problem with human agents; we should expect similar challenges to arise with artificial agents. For example, it is not uncommon to see reinforcement learning agents ‘hack’ their specified reward function. How do we build learning systems that will reliably achieve a user's intended objective? How can we ensure that autonomous agents behave reliably in unforeseen situations? How can we design systems that can aggregate the preferences of multiple users with differing values and judgements? As AI capabilities develop, it is crucial for the AI community to arrive at satisfying and trustworthy answers to these questions.
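
As a purely illustrative sketch of that reward-hacking failure mode (the corridor environment, the ‘shiny’ tile, and all constants below are assumptions made for this example, not drawn from any talk or submission), the following snippet shows how an agent optimizing a misspecified proxy reward can prefer an unintended behavior over the designer's actual goal.

```python
# Hypothetical 1-D corridor: states 0..4, the designer's goal is state 4.
# The specified (proxy) reward also pays a small bonus for standing on a
# "shiny" tile at state 1 -- an unintended loophole in the specification.
GAMMA = 0.9
GOAL, SHINY = 4, 1

def proxy_reward(state: int) -> float:
    if state == GOAL:
        return 1.0   # what the designer intended to incentivize
    if state == SHINY:
        return 0.5   # unintended bonus the designer did not mean to matter
    return 0.0

# Discounted return of two policies, both starting on the shiny tile:
# (a) loiter on the shiny tile forever, (b) walk 3 steps to the (terminal) goal.
loiter_return = sum(GAMMA**t * proxy_reward(SHINY) for t in range(1000))  # ~ 0.5 / (1 - 0.9) = 5.0
goal_return = GAMMA**3 * proxy_reward(GOAL)                               # ~ 0.73

print(f"loiter on shiny tile: {loiter_return:.2f}   reach the goal: {goal_return:.2f}")
# A return-maximizing agent for this proxy 'hacks' the specification by
# loitering, even though the intended objective is to reach the goal.
```

The gap between the two returns is exactly the gap between the specified objective and the intended one.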


This workshop will focus on the following challenges in value alignment:

  1. Learning complex rewards that reflect and are aligned with human preferences (e.g. meaningful oversight, preference elicitation, inverse reinforcement learning, learning from demonstrations or feedback; a small illustrative sketch of preference-based reward learning follows this list).

  2. Engineering reliable AI systems (e.g. robustness to distributional shift, model misspecification, or adversarial data, via methods such as adversarial training, KWIK-style learning, or transparency to human inspection).

  3. Dealing with bounded rationality and incomplete information in both AI systems and their users (e.g. acting on incomplete task specifications or partially observable rewards, learning from users who sometimes make mistakes).

  4. Exploring problem framings from economics, game theory, or other areas (e.g. solving human-AI principal-agent problems; aligning incentive gradients used to train machine learning systems with the goals of their designers).
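
As a concrete, purely hypothetical illustration of topic 1 above, the sketch below fits a linear reward model to simulated pairwise preference comparisons with a Bradley-Terry (logistic) likelihood, in the spirit of learning reward functions from human feedback. The feature dimension, synthetic data, and learning rate are assumptions made only for the example, not part of any submission or talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each trajectory is summarized by a feature vector phi,
# and the (unknown) human preference is assumed linear: r(traj) = w_true . phi.
D, N_PAIRS = 6, 500
w_true = rng.normal(size=D)

phi_a = rng.normal(size=(N_PAIRS, D))   # features of trajectory A in each comparison
phi_b = rng.normal(size=(N_PAIRS, D))   # features of trajectory B

# Simulated (noisy) human labels: P(A preferred) follows a Bradley-Terry model.
p_prefer_a = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ w_true))
labels = (rng.random(N_PAIRS) < p_prefer_a).astype(float)   # 1.0 if A was preferred

# Fit w by gradient ascent on the Bradley-Terry log-likelihood.
w = np.zeros(D)
learning_rate = 0.1
for _ in range(2000):
    probs = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ w))
    grad = (phi_a - phi_b).T @ (labels - probs) / N_PAIRS
    w += learning_rate * grad

cosine = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print(f"cosine similarity between learned and true reward weights: {cosine:.3f}")
```

With enough comparisons the learned weights align closely with the underlying preference direction; preference-elicitation approaches to reward learning build a reward model in essentially this way before (or while) training a policy against it.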


We also welcome submissions that do not directly fit these categories but generally deal with problems relating to value alignment in artificial intelligence.


Organizing Committee:

Dylan Hadfield-Menell (Berkeley)

David Krueger (MILA / University of Montreal)

Jacob Steinhardt (Stanford)

David Duvenaud (Toronto)

Anca Dragan (Berkeley)
