MONA Experiment Trajectories

Here we present transcripts sampled at random throughout training from the experiments in the paper "MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking".

Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, and Rohin Shah. "MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking". arXiv preprint arXiv:2501.13011, 2025.

BibTeX entry:

@misc{farquhar2025monamyopicoptimizationnonmyopic,
  title={MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking},
  author={Sebastian Farquhar and Vikrant Varma and David Lindner and David Elson and Caleb Biddulph and Ian Goodfellow and Rohin Shah},
  year={2025},
  eprint={2501.13011},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2501.13011},
}

Transcripts:

Test-driven development

Loan applications