In reinforcement learning, reward design is often overlooked under the assumption that a well-defined reward is readily available. In practice, however, designing rewards is difficult, and even when a reward function is specified, evaluating its correctness is equally problematic. These challenges become more pronounced in real-world RL applications, where reward design is typically a collaborative process between an RL practitioner and a domain expert. The domain expert expresses preferences, constraints, or desired outcomes, leaving the RL practitioner responsible for designing a reward function that satisfies these preferences.
Therefore, in this work, we develop a reward alignment metric, the Trajectory Alignment Coefficient, to evaluate how well a (reward function, discount factor) pair encodes the preferences of a domain expert. The Trajectory Alignment Coefficient quantifies the similarity between a human stakeholder’s ranking of trajectory distributions and the ranking induced by a given (reward function, discount factor) pair. The figure below demonstrates how this metric can aid RL practitioners in reward design.
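The exact formulation of the coefficient is not given in this section, but the Python sketch below illustrates one way such an agreement score could be computed, assuming the stakeholder's ranking is available as pairwise preferences over individual trajectories and that agreement is measured in a Kendall-tau-style fashion over the pairs the reward function strictly orders. All function names and data structures here are hypothetical illustrations, not the method's actual implementation.

```python
def discounted_return(rewards, gamma):
    """Discounted return of a single trajectory's reward sequence."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


def trajectory_alignment_sketch(trajectories, human_prefs, reward_fn, gamma, tol=1e-9):
    """Illustrative alignment score (hypothetical implementation).

    trajectories: list of trajectories, each a list of (state, action) pairs
    human_prefs:  dict mapping an index pair (i, j) to the index the human prefers
    reward_fn:    function (state, action) -> float
    gamma:        discount factor
    Returns a score in [-1, 1]: +1 means the (reward_fn, gamma) pair orders every
    compared trajectory pair the same way as the human, -1 means it reverses them all.
    """
    # Score each trajectory with the candidate (reward function, discount factor) pair.
    returns = [
        discounted_return([reward_fn(s, a) for s, a in traj], gamma)
        for traj in trajectories
    ]

    agree, total = 0, 0
    for (i, j), preferred in human_prefs.items():
        # Skip pairs the induced returns leave effectively tied (no induced preference).
        if abs(returns[i] - returns[j]) < tol:
            continue
        induced = i if returns[i] > returns[j] else j
        agree += int(induced == preferred)
        total += 1

    # Map the fraction of agreements from [0, 1] to [-1, 1], analogous to Kendall's tau.
    return 2 * agree / total - 1 if total > 0 else 0.0
```

Under this sketch, a practitioner could iterate on candidate reward functions and discount factors and keep the pair whose score over the elicited preferences is highest; restricting the comparison to pairs the reward strictly orders is one possible way to handle ties, and other conventions are equally plausible.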