Grounding Large Language Models with Online Reinforcement Learning