Traditional inverse reinforcement learning (IRL) allows us to automatically construct reward functions when they are difficult to specify by hand, such as when working with complicated observations like images. However, it requires full demonstrations of the task, which means we must already know how to perform it. One workaround is to gather examples of the desired outcome and train a classifier to detect the goal, but this raises its own questions, such as how to mine negatives and how to balance the dataset. Moreover, a clever RL agent might learn to maximize the classifier's reward without actually achieving our desired objective.
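To make this naive workaround concrete, here is a minimal sketch (not code from the paper) of using a pre-trained goal classifier's log-probability as a reward, assuming a PyTorch setup. The `GoalClassifier` network, the positive/negative datasets, and the helper functions are all hypothetical placeholders chosen for illustration.

```python
import torch
import torch.nn as nn

class GoalClassifier(nn.Module):
    """Binary classifier estimating p(goal reached | observation)."""
    def __init__(self, obs_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs)  # returns logits

def pretrain_classifier(classifier, positives, negatives, epochs=100, lr=1e-3):
    """Fit the classifier once, on hand-collected goal examples and mined negatives."""
    opt = torch.optim.Adam(classifier.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    obs = torch.cat([positives, negatives])
    labels = torch.cat([torch.ones(len(positives), 1),
                        torch.zeros(len(negatives), 1)])
    for _ in range(epochs):
        opt.zero_grad()
        loss = bce(classifier(obs), labels)
        loss.backward()
        opt.step()

def classifier_reward(classifier, obs):
    """Use log p(goal | obs) as the RL reward. Because the classifier is fixed,
    the policy can drift toward observations it mislabels as successes."""
    with torch.no_grad():
        return nn.functional.logsigmoid(classifier(obs))
```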
In the event-based framework, we formalize this problem as learning the event probability, and the only data we require are the states (and actions) at which the event occurs.

We see that VICE is able to learn policies that correspond to our true objective (pushing the block to the target), while the pre-trained classifier baseline drives its log-probability reward to the limit without actually achieving the desired goal, which indicates that naive classifiers can easily lead to task misspecification. The binary event indicator baseline (which observes the true event and is similar to RL from sparse rewards) is able to learn the desired behavior, but it is significantly less sample-efficient and requires heavy supervision: a label indicating whether the event happened for every state the agent visits. All videos shown here are from policies trained for 1000 iterations.
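For a more concrete picture of why VICE avoids the misspecification above, here is a deliberately simplified sketch of the underlying idea: rather than fixing the event classifier up front, it is continually retrained with the policy's own states serving as negatives, so the policy cannot exploit a stale decision boundary. This is an illustrative simplification, not the authors' implementation; it omits details such as the policy-density term in the actual discriminator, and `collect_rollout` and `rl_update` are hypothetical placeholders for environment interaction and the RL step.

```python
import torch
import torch.nn as nn

def vice_style_training(classifier, policy, env, success_examples,
                        num_iterations=1000, classifier_steps=10, lr=1e-3):
    """Alternate between classifier and policy updates (simplified sketch).

    `classifier` can be any logit-producing network, e.g. the GoalClassifier
    from the earlier sketch; `success_examples` are observations where the
    event occurred.
    """
    opt = torch.optim.Adam(classifier.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(num_iterations):
        # 1. Collect states with the current policy (hypothetical helper).
        policy_obs = collect_rollout(policy, env)

        # 2. Update the event classifier: success examples are positives,
        #    freshly collected policy states are negatives.
        obs = torch.cat([success_examples, policy_obs])
        labels = torch.cat([torch.ones(len(success_examples), 1),
                            torch.zeros(len(policy_obs), 1)])
        for _ in range(classifier_steps):
            opt.zero_grad()
            loss = bce(classifier(obs), labels)
            loss.backward()
            opt.step()

        # 3. Update the policy with log p(event | s) as the reward
        #    (hypothetical RL step, e.g. a policy-gradient update).
        rewards = nn.functional.logsigmoid(classifier(policy_obs)).detach()
        rl_update(policy, policy_obs, rewards)
```

The key design point the sketch tries to convey is the alternation in steps 2 and 3: because the classifier is always contrasted against the current policy's states, states that merely fool an earlier classifier stop being rewarded in later iterations.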