Reinforcing VLAs in Task-Agnostic World Models