Leaderboard
This is the Elo rating leaderboard generated from human evaluations (human-*) and LLM-as-a-Judge evaluations (auto-*).
The leaderboard does not include teams that did not submit a paper.