Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning