Towards Robust and Safe Reinforcement Learning with Benign Off-policy Data