Generative Verifiers:

Reward Modeling as Next-Token Prediction