23 May, 2025: Leaderboard updated for new set of questions
30 May, 2025: Leaderboard updated for new set of questions
7 June, 2025: Deadline for submitting system for in-person tournament
14 June, 2025: Deadline for submitting system for online tournament (Last call for submitting systems!)
🥇 First place: $200 🥈Second place: $150 🥉Third place: $100 🎖️Fourth place: $50
We have two QuizBowl tasks for your AI system submissions:
Your AI systems will be paired with human players in this competition, and the two will contribute together toward every point.
Fastest fingers first!
Toss-up: A single question read aloud to both teams; anyone can buzz at any point, and the first correct buzz earns points, while an incorrect buzz costs points and locks that team out for the rest of the question.
Task: Play alongside human teammates; continually parse the question text, update the best guess, gauge confidence, and decide when to buzz.
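As a rough illustration, a toss-up agent boils down to a loop like the one below. This is only a minimal sketch under assumed names (`TossupStep`, `play_tossup`, and the `model` callable are ours, not the official API); see the documentation at qanta-challenge/QANTA25 for the actual interface.

```python
# Minimal, hypothetical toss-up agent loop. Names and the word-by-word
# feed are illustrative assumptions, not the official competition API.
from dataclasses import dataclass

@dataclass
class TossupStep:
    guess: str         # current best answer
    confidence: float  # estimated probability the guess is correct
    buzz: bool         # whether to interrupt the reader now

def play_tossup(question_words, model, threshold=0.8):
    """Feed the question word by word; buzz once confidence clears a threshold."""
    guess, confidence = "", 0.0
    seen = []
    for word in question_words:
        seen.append(word)
        guess, confidence = model(" ".join(seen))  # your system's core call
        if confidence >= threshold:
            return TossupStep(guess, confidence, buzz=True)
    # Never confident enough: answer at the end of the question anyway.
    return TossupStep(guess, confidence, buzz=False)
```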
Collaborate and win more points!
Bonus: A short, multi-part (typically three parts) question given only to the team that won the toss-up. The team can discuss briefly before answering each part within a set amount of time.
Task: Respond with a best guess, a numeric confidence score, and a brief explanation. Using this input and their own initial guess, the team captain decides on the final answer, combining human insight with AI analysis.
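For intuition, a bonus-part response might be packaged like the sketch below; the field names and the `model` callable are illustrative assumptions, not the official specification.

```python
# Hypothetical shape of a bonus-part response; field names are
# illustrative assumptions, not the official specification.
from dataclasses import dataclass

@dataclass
class BonusResponse:
    guess: str         # best answer for this bonus part
    confidence: float  # e.g. in [0, 1]; the captain weighs this
    explanation: str   # one or two sentences justifying the guess

def answer_bonus_part(leadin, part, model):
    prompt = (f"{leadin}\n{part}\n"
              "Give your best guess, a confidence in [0, 1], and a one-line rationale.")
    guess, confidence, explanation = model(prompt)  # your system's core call
    return BonusResponse(guess, confidence, explanation)
```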
We use 🤗 Hugging Face for this competition. If you don't have a Hugging Face account, please create one at huggingface.co
📝 Step I : Register for competition at qanta-challenge/register
💻 Step II : Play around with our submission interface:
Create an AI Quizbowl system, run it against our playground data, and check your scores and fun metrics.
📊 Step III: Submit your AI system and wait for your system to appear on our leaderboard at qanta-challenge/leaderboard
✌🏼😄 There are two ways to submit a system:
Using prompt-based interface at our Hugging Face space qanta-challenge/quizbowl-submission, or
Linking your own Hugging Face model pipeline to our competition submission (to be supported soon).
🤗 No matter how you submit, your system will be treated the same way in the competition (and will be eligible for the same prizes).
📜 Documentation on interface usage, walkthroughs and tutorials at qanta-challenge/QANTA25
🧑🏻💻 Need help, or have issues?
Please get in touch on the 👾 Discord server (#model-submission channel), or
File an issue at qanta-challenge/QANTA25/issues or,
Send us an e-mail at qanta@googlegroups.com.
🚀 Build and Test Your AI Agent—Directly in Your Browser
Our interactive, prompt-based interface at qanta-challenge/quizbowl-submission makes it simple to design, test, and submit your Quizbowl AI agents—no setup or installation required.
🧠 You can experiment with different strategies for generating explanations and confidence scores on a sample set of quiz questions. You can quickly start building with sample pipelines we provide, or import your previous submissions as a starting point.
ℹ️ A step-by-step tutorial guides you through creating a variety of systems for both Toss-up and Bonus question-answering tasks, including multi-stage pipelines.
🧩 As you test, you’ll receive instant feedback, including a dashboard for tossup questions and accuracy metrics for bonuses—so you can refine your ideas in real time. When you’re ready, submit your prompt-based model directly through the interface. It’s the fastest way to prototype and refine your ideas for our competition.
⚡️ No sign-up is needed to get started: explore, build, and experiment freely. When you’re ready to submit your agent for the competition, just sign up and submit—all within the same interface.
Currently, we support select models from OpenAI, Cohere, and Anthropic for this interface.
⚙️ 🛠️ Ready to Go Further? Submit Your Own Hugging Face Model
For participants looking to go beyond prompt-based solutions, we will also support direct submission of custom Hugging Face model pipelines.
⚡ Once you’ve familiarized yourself with the task and interface, you can fine-tune your own models and submit any pipeline, as long as it follows our simple input/output API specification.
🔩 This advanced option lets you experiment with your own architectures (e.g., RAG, a LoRA-tuned Llama-3), training data, and model tweaks, giving you complete control over your system's behavior.
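To make the idea concrete, here is a hedged sketch of what a conforming tossup pipeline wrapper could look like. The class name, `__call__` signature, and confidence heuristic are all assumptions for illustration; consult qanta-challenge/QANTA25 for the actual input/output specification.

```python
# Hedged sketch of a custom Hugging Face pipeline wrapper. The class name,
# __call__ signature, and confidence heuristic are illustrative assumptions;
# see qanta-challenge/QANTA25 for the actual input/output specification.
from transformers import pipeline

class MyTossupPipeline:
    def __init__(self, model_name="meta-llama/Meta-Llama-3-8B-Instruct"):
        self.generator = pipeline("text-generation", model=model_name)

    def __call__(self, partial_question: str) -> dict:
        prompt = f"Question so far: {partial_question}\nBest guess:"
        out = self.generator(prompt, max_new_tokens=16)[0]["generated_text"]
        guess = out.split("Best guess:")[-1].strip()
        # Toy confidence heuristic: grows with how much of the question has
        # been revealed. A real system would calibrate this properly.
        confidence = min(1.0, len(partial_question.split()) / 40)
        return {"answer": guess, "confidence": confidence, "buzz": confidence > 0.8}
```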
Submissions using the Hugging Face pipeline will be evaluated and ranked alongside prompt-based entries, with results posted to the leaderboard (updated May 23, May 30, and June 7). 📊
Check out our QANTA 2025 Leaderboard at 🏅qanta-challenge/leaderboard.
The leaderboard will not determine the final ranking of systems; how well a model does in our live human-computer matches will determine its final ranking. However, the leaderboard metrics should give a good measure of how well you're doing. Moreover, we will use the leaderboard metrics to narrow down the AI systems (finalists) that play in our live competition.
📊 How do we score your submitted systems?
🛎️ Tossups:
We measure how often, on average, your AI "buzzes in" correctly compared to human players. An agent is better if it more often buzzes correctly before humans' correct buzzes on a tossup while avoiding premature incorrect buzzes: an accurate system with a well-calibrated buzzer.
Your system earns points for buzzing before humans and being right (+1), loses points for incorrect early buzzes (–0.5), and gets zero for missed chances. These values mirror the Quizbowl scoring system (+10 / –5 / 0).
This metric is inspired by a calibration metric, CALSCORE (Sung et al., 2025, Section 4).
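For concreteness, here is a toy computation of that average under our reading of the +1 / –0.5 / 0 scheme (the official implementation may differ):

```python
# Toy illustration of the tossup metric described above: our reading of
# the +1 / -0.5 / 0 scheme, not the official scoring code.
def tossup_score(buzzes):
    """buzzes: one of 'early_correct', 'early_wrong', 'missed' per question."""
    points = {"early_correct": 1.0, "early_wrong": -0.5, "missed": 0.0}
    return sum(points[b] for b in buzzes) / len(buzzes)

# E.g., 3 correct early buzzes, 1 premature wrong buzz, 1 missed chance:
print(tossup_score(["early_correct"] * 3 + ["early_wrong", "missed"]))  # 0.5
```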
🧐 Bonuses:
Here, we focus on the effect your system has when paired with a human team, where a captain selects the team's final answer to each question.
The metric captures how much your AI system’s explanations and answers improve the human’s final guess, compared to what the human would do alone. This is measured as a delta in accuracy, ranging from –1.0 (AI harms performance) to +1.0 (AI maximally helps).
The bonus scoring mechanism for models and humans is given in this description of bonus scoring; we focus on the "Effect" metric, with a computer "captain" looking at a set of explanations and deciding which to trust.
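As a small illustration of the Effect delta (our reading of the metric, not the official code):

```python
# Sketch of the "Effect" delta described above: change in the captain's
# accuracy with vs. without the AI's input.
def effect(human_alone_correct, human_with_ai_correct):
    """Both args: lists of booleans, one per bonus part."""
    acc = lambda xs: sum(xs) / len(xs)
    return acc(human_with_ai_correct) - acc(human_alone_correct)  # in [-1, 1]

# E.g., humans alone get 6/10; with AI explanations they get 8/10:
print(effect([True] * 6 + [False] * 4, [True] * 8 + [False] * 2))  # 0.2
```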
Both tossup and bonus scores are displayed on separate leaderboards so you can see your strengths at a glance.
Overall Competition
⇒ To qualify for the competition, you must make at least one toss-up system and one bonus system submission!
We will then pair up your best-performing systems in each category and consider that entry for the final leaderboard.
While the leaderboard gives a strong indication of performance, our real goal is to measure:
If a human player teams up with your AI system, how many more points will they earn together?
Final rankings will be based on these head-to-head human–AI matches.
Note: We will add more questions on May 23, May 30, and June 7, reevaluate submitted systems, and update our leaderboard accordingly.
What if my system type is not specified here or not supported yet?
⇒ Please send a message to qanta@googlegroups.com so we can check how to adapt the leaderboard for your system.
How will the models "play" each other?
⇒ Upon collecting all submitted systems, we will run them on our secret test set and score them by our evaluation metrics. Whoever has the highest score will be the winner of the tournament!
If you have any questions, please contact us at qanta@googlegroups.com.