Here I list the most recent paper from each of my research fields. For a full list of my papers, consult my Google Scholar.
To overcome the limitations of tuning LLMs on data, we introduce Language Self-Play (LSP), which learns without any data, entirely through self-play.
In our most recent work, we present a transformer architecture that solves optimization problems end-to-end. (AAAI 2026)
We meta-learn Discovered Policy Optimization, an RL algorithm with convergence guarantees. (NeurIPS 2022)
We introduce trust-region algorithms for multi-agent RL, HATRPO and HAPPO, and prove their monotonic improvement guarantees. (ICLR 2022)