Week 5: Human-in-the-Loop RL

Preference-based learning

Preference-based RL for dialog / summarization

COACH

TAMER

Learning a Goal Classifier

Other