Imagined Potential Games: A Framework for Simulating, Learning, and Evaluating Interactive Behaviors
[Paper] [Appendix] [Code (coming soon)]
Paper Highlight
Hallway, 2 agents
U-Turn, 2 agents
T-Intersection, 2 agents
Intersection, 4 agents
Imagined Potential Game (IPG): a distributed interaction-generation framework that models collaborative interaction behaviors and serves as a planner for reactive agents in arbitrary environments.
These reactive agents can be used to
Simulate Interactive Behaviors in different scenarios
Collaboration-required scenarios such as hallways and intersections.
Randomly generated scenarios.
Serve as reactive agents in simulation, enabling an agent to Learn to interact with collaborative closed-loop reactive agents.
Gym Env with reactive agents (not rule-based, not replay)
How to learn? Safe RL? Hierarchical RL? Inverse RL?
Evaluate the effectiveness of planners in highly interactive scenarios.
Interaction with heterogeneous agents of different types.
Evaluation metrics for effective interactive navigation.
[Details on Interaction Generation] [Details on RL training]
Generate diverse, realistic interactions in highly interactive scenarios in a distributed way.
Examples of interactions in different scenarios are shown below. For more demos and details on resolving deadlocks (which cannot be fully avoided in a distributed setting), see Interaction Generation Details.
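Below is a minimal sketch of the distributed rollout loop, assuming each agent runs its own planner and imagines the other agents as participants in a joint game. The class name `IPGPlanner`, its parameters, and the placeholder `plan` body are illustrative assumptions, not the released implementation or the paper's potential-game MPC.

```python
import numpy as np

class IPGPlanner:
    """Hypothetical per-agent planner. In the actual IPG framework, each agent
    would estimate the others' goals and solve an imagined joint trajectory
    optimization; here `plan` is only a goal-attraction / repulsion placeholder."""

    def __init__(self, goal, safety_radius=0.5):
        self.goal = np.asarray(goal, dtype=float)
        self.safety_radius = safety_radius

    def plan(self, own_state, observed_states):
        # Move toward the goal ...
        direction = self.goal - own_state
        step = 0.1 * direction / (np.linalg.norm(direction) + 1e-6)
        # ... and push away from any agent inside twice the safety radius.
        for other in observed_states:
            offset = own_state - other
            dist = np.linalg.norm(offset)
            if dist < 2 * self.safety_radius:
                step += 0.1 * offset / (dist + 1e-6)
        return step

# Distributed rollout: every agent replans at each step from its own view only.
states = [np.array([0.0, 0.0]), np.array([5.0, 0.0])]
planners = [IPGPlanner(goal=[5.0, 0.0]), IPGPlanner(goal=[0.0, 0.0])]
for _ in range(100):
    actions = [p.plan(s, [o for j, o in enumerate(states) if j != i])
               for i, (p, s) in enumerate(zip(planners, states))]
    states = [s + a for s, a in zip(states, actions)]
```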
Hallway, 2 agents
Hallway, 3 agents
Hallway, 3 agents
Hallway, 4 agents
T-Intersection, 2 agents
T-Intersection, 3 agents
T-Intersection, 3 agents
T-Intersection, 4 agents
U-Turn, 2 agents
U-Turn, 2 agents
U-Turn, 2 agents
U-Turn, 2 agents
Intersection, 2 agents
Intersection, 3 agents
Intersection, 3 agents
Intersection, 4 agents
Random Obstacles, 2 agents
Random Obstacles, 2 agents
Random Obstacles, 3 agents
Random Obstacles, 3 agents
How should an agent interact with collaborative agents that react to its behavior? It must learn when to be conservative and when to be aggressive.
Interactive RL Environment
Given different scenarios where IPG agents interact effectively, we replace one IPG agent with a user-controlled (RL) agent; a minimal environment sketch follows the list below.
An RL agent that succeeds in this environment needs to:
Navigate safely and efficiently.
Interact successfully with IPG agents configured with different interaction parameters, representing agents with different personalities.
Interact with agents effectively in different scenarios (both interactive and non-interactive).
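As a concrete (hypothetical) illustration of this setup, here is a minimal Gymnasium-style environment sketch in which agent 0 is RL-controlled and the remaining agents would be driven by closed-loop IPG planners. The class name `IPGGymEnv`, the `interaction_params` argument, and the placeholder dynamics and reward are assumptions, not the released interface.

```python
import numpy as np
import gymnasium as gym

class IPGGymEnv(gym.Env):
    """Hypothetical interactive RL environment: agent 0 is RL-controlled,
    agents 1..n would be replanned by IPG at every step (closed-loop)."""

    def __init__(self, scenario="hallway", num_ipg_agents=1, interaction_params=None):
        self.scenario = scenario
        self.num_ipg_agents = num_ipg_agents
        # "Personality" knobs for the IPG agents (names are illustrative).
        self.interaction_params = interaction_params or {"safety_radius": 0.5}
        obs_dim = 4 * (1 + num_ipg_agents)  # (x, y, vx, vy) per agent
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, (obs_dim,), dtype=np.float64)
        self.action_space = gym.spaces.Box(-1.0, 1.0, (2,), dtype=np.float64)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._states = np.zeros((1 + self.num_ipg_agents, 4))
        return self._states.flatten(), {}

    def step(self, action):
        # RL agent applies `action` as a velocity command; a real implementation
        # would also replan every IPG agent here based on the new joint state.
        self._states[0, 2:] = np.clip(action, -1.0, 1.0)
        self._states[:, :2] += 0.1 * self._states[:, 2:]
        reward = -0.01 * float(np.linalg.norm(action))  # placeholder cost term
        terminated, truncated = False, False
        return self._states.flatten(), reward, terminated, truncated, {}

# Usage: different interaction parameters give the IPG agents different "personalities".
env = IPGGymEnv(scenario="hallway", interaction_params={"safety_radius": 0.8})
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```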
Hallway: IPG agent will choose to yield.
Hallway: RL agent learns to go straight since the IPG agent will yield.
IPG agent stops and interacts after observing the two agents.
T-Intersection: RL agent behaves similarly to the IPG agent.
IPG agents interact in the intersection scenario.
RL agent fails to learn an effective way of interacting with multiple agents.
How to evaluate interactive planner performance?
The problems with trajectory prediction and existing sim-agent metrics: (1) they are data-driven and require real data; (2) they do not cover highly interactive edge cases.
With IPG, we can test:
We show that IPG agents can interact with these two types of agents.
The IPG agent is able to resolve deadlocks by increasing its own safety radius (sketched below).
The IPG agent avoids collisions via MPC and continuously updates its estimates of other agents.
The robot can always choose the most conservative strategy and yield to other agents, but this is not efficient.
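A hedged sketch of the deadlock-resolution heuristic mentioned above: when an agent has made little progress over a recent window (a typical symptom of a distributed deadlock), it inflates its own safety radius so that its planner backs off and yields. The window, thresholds, growth factor, and the assumption that the planner exposes a `safety_radius` attribute are illustrative, not values from the paper.

```python
import numpy as np

def maybe_resolve_deadlock(planner, position_history,
                           window=20, progress_eps=0.05,
                           growth=1.5, max_radius=2.0):
    """Inflate `planner.safety_radius` when recent progress is negligible,
    nudging the agent to yield and break a symmetric standoff."""
    if len(position_history) < window:
        return
    recent = np.asarray(position_history[-window:])
    progress = np.linalg.norm(recent[-1] - recent[0])
    if progress < progress_eps and planner.safety_radius < max_radius:
        planner.safety_radius *= growth
```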
We propose using the completion time under a centralized planner (where there is no true "interaction", since all agents are jointly planned) as the baseline for measuring the extra time cost of interaction.
Completion Time: 6.6 sec (baseline)
Extra Time: +2.0 sec
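The metric reduces to a simple difference; the sketch below only illustrates the arithmetic with the numbers above (a 6.6 sec centralized baseline and an 8.6 sec distributed completion time imply 2.0 sec of extra interaction time; the function name is illustrative).

```python
def extra_interaction_time(distributed_completion_time: float,
                           centralized_completion_time: float) -> float:
    """Extra time spent on interaction, relative to the centralized baseline."""
    return distributed_completion_time - centralized_completion_time

print(round(extra_interaction_time(8.6, 6.6), 2))  # -> 2.0 seconds of extra time
```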