Training steps: 500k (speed: x2.5). The RL agent yields to other agents.
Training steps: 1000k (speed: x2.5). As training progresses, the RL agent becomes more aggressive.
Training steps: 1000k (speed: x2.5). State representation: [x, y, v * sinθ, v * cosθ].
Training steps: 1000k (speed: x2.5). State representation: [x, y, θ, sinθ, cosθ, v].
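For concreteness, here is a minimal sketch of the two state encodings compared above, assuming the agent exposes its planar pose (x, y, θ) and speed v; the function names are ours for illustration:

```python
import numpy as np

def state_v1(x, y, theta, v):
    # 4-D encoding: position plus the velocity decomposed into components.
    # Heading is only recoverable implicitly, from the velocity direction.
    return np.array([x, y, v * np.sin(theta), v * np.cos(theta)])

def state_v2(x, y, theta, v):
    # 6-D encoding: heading kept explicitly, both as the raw angle and as
    # its sine/cosine, which avoids the wrap-around discontinuity at ±π.
    return np.array([x, y, theta, np.sin(theta), np.cos(theta), v])

# Example: an agent at (1.0, 2.0), heading 90°, moving at 0.5 m/s.
s1 = state_v1(1.0, 2.0, np.pi / 2, 0.5)
s2 = state_v2(1.0, 2.0, np.pi / 2, 0.5)
```

Note that the 4-D encoding carries no heading information when the agent is stationary (v = 0), whereas the 6-D encoding preserves it.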
In our experiments in HallWay, we encountered significant challenges in training an RL agent to interact effectively with a blind agent using conventional RL algorithms, reward structures, and observation designs. We hypothesize that the primary difficulty stems from inadequate exploration strategies, and we propose refining RL algorithm designs to address this as a direction for future work.
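As one illustration of the kind of refinement meant here (a sketch only, not the method used in our experiments), a common way to strengthen exploration is to add an entropy bonus to a policy-gradient loss, which penalizes premature policy collapse; `entropy_coef` is an illustrative hyperparameter:

```python
import torch

def pg_loss_with_entropy(logits, actions, advantages, entropy_coef=0.01):
    # Standard policy-gradient loss with an entropy bonus: the bonus term
    # rewards keeping the action distribution spread out, which encourages
    # continued exploration early in training.
    dist = torch.distributions.Categorical(logits=logits)
    log_probs = dist.log_prob(actions)          # log π(a|s) for taken actions
    pg_loss = -(log_probs * advantages).mean()  # REINFORCE-style objective
    entropy_bonus = dist.entropy().mean()       # mean policy entropy
    return pg_loss - entropy_coef * entropy_bonus
```

Tuning `entropy_coef` trades off exploration against exploitation; this is one of several standard levers (alongside reward shaping and observation design) that such a refinement could adjust.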