Few-shot Preference Learning for Human-in-the-Loop RL