Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model