MLM Reinforcement Learning