ICML 2024 Workshop
Models of Human Feedback for AI Alignment
July 26th 2024, Schubert 4 - 6, Messe Wien Exhibition Congress Center, Vienna, Austria
Important Dates
Paper Submission Deadline (OpenReview): May 31, 2024
Acceptance Notification: June 17, 2024
Camera-Ready Deadline: June 25, 2024
Workshop: July 26, 2024
Workshop Overview
Aligning AI agents with human intentions and values is one of the key challenges for the safe and ethical deployment of AI systems in the real world, spanning domains such as robotics, recommender systems, autonomous driving, and large language models. Understanding human decision-making and interpreting human choices is therefore fundamental to building intelligent systems that interact with users effectively, align with their preferences, and support the development of ethical, user-centric AI applications.
Despite its vital importance for human-AI alignment, current approaches such as Reinforcement Learning from Human Feedback (RLHF) and Learning from Demonstrations (LfD) rest on strong, largely unexamined assumptions about the meaning of observed human feedback and interactions. These assumptions remain mostly unchallenged by the community, and simplistic human feedback models are often reused without re-evaluating their suitability. For example, we typically assume that humans act rationally, that human feedback is unbiased, or that all humans provide similar feedback and hold similar opinions. Many of these assumptions are violated in practice, yet the role of such modeling assumptions has been largely neglected in the literature on human-AI alignment. The goals of this workshop are:
to bring together different communities towards a better understanding of human feedback;
to discuss different types of human feedback, along with mathematical and computational models of human feedback and their shortcomings;
to discuss important and promising future directions towards better models of human feedback and better AI alignment.
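To make the rationality assumption concrete: RLHF pipelines commonly model pairwise comparisons with a Bradley-Terry (logistic) preference model, where a temperature-like coefficient controls how "rational" the labeler is assumed to be. A minimal illustrative sketch (the function name and the coefficient `beta` are our own notation, not from any particular library):

```python
import math

def bradley_terry_prob(r_a: float, r_b: float, beta: float = 1.0) -> float:
    """Probability that a labeler prefers option A over option B under a
    Bradley-Terry model with rationality coefficient beta.

    beta -> infinity: a perfectly rational labeler who always picks the
    higher-reward option; beta -> 0: feedback degrades to a coin flip.
    """
    return 1.0 / (1.0 + math.exp(-beta * (r_a - r_b)))

# A near-rational labeler (large beta) almost always prefers higher reward,
# while a noisy labeler (small beta) gives nearly uninformative feedback.
print(bradley_terry_prob(1.0, 0.0, beta=10.0))  # ~0.99995
print(bradley_terry_prob(1.0, 0.0, beta=0.1))   # ~0.525
```

Real human feedback is often biased, inconsistent across annotators, and context-dependent, which is precisely why re-examining such models is a focus of this workshop.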
Speakers
Ariel Procaccia
(Harvard University)
Dylan Hadfield-Menell
(MIT)
Tracy Liu
(Tsinghua University)
David Lindner
(Google DeepMind)
Panelists
Daniele Calandriello
(Google DeepMind)
Adam Gleave
(FAR AI)
Rin Metcalf Susa
(Apple ML Research)
David Krueger
(University of Cambridge)
Organizers
Thomas Kleine Buening
(The Alan Turing Institute)
Christos Dimitrakakis
(Université de Neuchâtel)
Scott Niekum
(UMass Amherst)
Constantin Rothkopf
(TU Darmstadt)
Aadirupa Saha
(Apple ML Research)
Harshit Sikchi
(UT Austin)
Lirong Xia
(Rensselaer Polytechnic Institute)
Venue
Messe Wien Exhibition Congress Center, Vienna, Austria
Room: Schubert 4-6
Contact: mhf.icml.2024@gmail.com
Twitter: x.com/mhf_icml2024