Learning to summarize user information for personalized reinforcement learning
from human feedback