Here I list the most recent paper from each of my research fields. For a full list of my papers, consult my Google Scholar.
To overcome the limitations of tuning LLMs on data, we introduce Language Self-Play (LSP), which learns without any data, entirely through self-play.
In our most recent work, we present a transformer architecture that solves optimization problems end-to-end. (AAAI 2026)
We meta-learn Discovered Policy Optimization, an RL algorithm with convergence guarantees. (NeurIPS 2022)
We introduce trust-region algorithms for multi-agent RL, HATRPO and HAPPO, and prove their monotonic improvement guarantees. (ICLR 2022)