We show additional videos from our evaluations. We conduct all experiments in natural environments at UC Berkeley.
With pedestrians
In these evaluations, the pedestrians try to interact with the robot in nearly the same way for every method so that the comparison is fair. The baseline is a SACSoN policy fine-tuned by maximizing the SACSoN objectives. Unlike our method, the baseline does not use the learned Q-value when training the actor.
With small obstacles
We place unseen small obstacles on the robot's original trajectories to visualize collision-avoidance performance. In these videos, we evaluate both our method and the baselines in the same environment, with the same obstacles at the same positions.