Hints / FAQ
Introduction
The goal of this page is to provide information to help you write better adversarial questions. In addition to examples of good and bad questions, we also have frequently asked questions (bottom of the page).
What makes a Question Adversarial?
Here are some useful videos about how to write good adversarial questions. In general, adversarial questions:
Are paraphrased
Require logical or multi-hop reasoning
Require commonsense or domain-expert knowledge
Contain math
Contain cross-lingual content
But this is not a complete list! The whole goal of this competition is to find more techniques that either make it difficult for computers to know when they're right or wrong or to lead the computers astray in their answering. We need your creativity!
What makes a Question Bad?
Sometimes, when people write questions that aim to "trick" a computer, they end up just writing bad questions that are confusing or ambiguous. We don't want those! To give examples of bad questions, I’m going to take some bad questions from the German board game “Spiel des Wissens” (translated into English of course).
Aside: Let me emphasize that not all the questions are bad. And it’s still a good game. I’d recommend it! The questions are up-to-date, and it’s fun to play with my kids because there are both “easy” and “difficult” questions so that kids can play alongside adults.
One of the biggest problems of Manchester paradigm questions is when the person asking the question thinks that there’s one correct answer, but the question admits many different answers. Let’s take
What is the name of the literary genre in which elements of the real and the magical world are interwoven?
The official answer was “fairy tales”, but “fantasy” or “magical realism” are arguably also correct.
This is particularly problematic for “location” and “temporal” questions. Where are the “Virunga Volcanoes”?
This could be Earth, Africa, or just the Democratic Republic of the Congo (where the national park is). There’s nothing in the question to let you know that you need to name all three countries that the Virunga range goes through.
While that was more specific than one might expect, here’s a question that’s less specific than I thought.
Where was the first Iron Man?
has the natural answer Oahu, one of the Hawaiian islands. The official answer of “Hawaii” sounds false if you think about Hawaii the island rather than Hawaii the state. But you could also imagine someone answering USA, since it’s connected to the US Navy. The question should say something like “on which island” or “in which American state” to make it more reasonable.
Sometimes questions are vague not in what they’re asking about per se but in terms of the definition. Let’s take this question:
What countries border France to the south?
The official answers are Spain and Andorra: that makes sense! You can stand in those countries, walk straight north, and end up in France. But if that’s the definition we’re using, are there other countries that would work? If you’ve vacationed in Baden-Baden, you’re in Germany and France is to the north. While not all of Germany borders France to the south, some of it does.
And if you just count a border where you can go due south on a road from France into another country, then you could answer: Germany (to Baden-Baden, as I mentioned before), Switzerland (Basel), Italy (Mont Blanc), Luxembourg (Saulnes), or Belgium (Fromelennes). But don’t forget that France isn’t just in Europe. Thanks to its overseas department of French Guiana, you can also answer Brazil and Suriname.
Maybe Germany is excluded because it also borders France to the West? But thanks to the Spanish exclave of Llivia, you can start in Spain and walk to France in every cardinal direction!
A stronger version of this is when there’s an implicit assumption in the question that’s not stated. In linguistics, this is called a “false presupposition”. One example of this is
What color are a flamingo’s feathers?
The official answer is “pink”. Now, that’s only true if the flamingos have been eating brine shrimp or other foods rich in beta carotene; when flamingos are born, they’re white.
For a game written for a European audience, it makes sense that
What voltage comes out of the socket?
has the answer 230 Volts. But someone in the US would reasonably say 120, and that would be right!
What connector do you use to charge a phone?
has the official answer Micro-USB. Of all the questions I’m complaining about, this one just seems completely inexplicable to me. The whole thing about phones is that there are a bazillion different chargers. Every company used to have their own charger, and even in the age of standardization, there’s Micro-USB, USB C, Mini-USB and whatever Apple is doing today. I would love to live in a world where it was the case that you only had to have a Micro-USB charger!
Sometimes, when a question is vague, people add a qualifier to try to make it more specific but end up failing because the qualifier itself is highly subjective.
What’s the name of Christopher Columbus’s most famous ship?
The official answer is Santa Maria, but there’s evidence that La Niña was Columbus’s favorite (and it also had the official name Santa Clara). To specifically distinguish between the ships, you could rephrase the question as “Which of Columbus’s ships was stripped of its timbers to build a fort called La Navidad in northern Haiti?”
Similarly,
What famous novel takes place in an Italian Benedictine Abbey?
has the official answer The Name of the Rose (and it’s what most people think of), but if you’re not ancient like I am, maybe you’d think of the Sarah Dunant book Sacred Hearts instead. So you could instead ask something like “What famous novel set in an Italian Benedictine Abbey has a main character whose name combines William of Ockham and Sir Arthur Conan Doyle’s The Hound of the Baskervilles?”
But it’s not just the questions that can be bad; sometimes the answers are problematic. Let’s take this question:
Who was the last Czar of Russia?
The given answer is “Nikolas Alexandrovitch Romanov”. Now, that’s an acceptable answer: it’s what he was called when he was born. But that’s not how most people know him; his regnal name was Nicholas II. Ideally you should have both as acceptable answers, but if you only have one, it should be Nicholas II. Better still, accept even more variants: Nikolai II Alexandrovich Romanov, Никола́й II Алекса́ндрович, and Николай II.
A final example of a bad question is the “swerve”. I didn’t see any examples of this in Spiel des Wissens, so I had to search for some examples on the Internet.
One man gained fame for starting the Kulturkampf as well as a war with France. Otto von Bismarck lends his name to the capital of which U.S. state?
As you can see, you start out thinking that the answer will be Bismarck, and then the question suddenly swerves to asking about North Dakota!
A less intentional kind of swerve is when the pronouns or references are unclear. A good question needs to establish early on what kind of thing it’s looking for. There are several ways a question can mark what it’s asking about. Sometimes with a pronoun:
He was born in Tampico, Illinois before being elected president.
Sometimes with a determiner:
This is the most populous state in the US.
Sometimes with an interrogative:
What is the capital of California?
Things get confusing when you use a bunch of them at the same time! What the heck is this question looking for?
After being born in Tampico, Illinois, he always lived in this most populous state until moving out of what state capital when he was elected president?
Now this goes way overboard—nobody would write a question this confusing—but you can avoid this confusion by following these rules:
Don’t use a pronoun until you have introduced the answer type (e.g., “what composer”) in some other way
Only use “this” or “what” when referring to the thing that you’re asking about
Use consistent references to the answer. It’s okay to use different references, but they shouldn’t contradict each other: “this author of 1984” and “this volunteer in the Spanish Civil War” are okay because they’re about different things, but you shouldn’t say “this British author” and “this Spanish resident” (even if they’re technically correct … and certainly don’t do this to create a swerve question).
To be clear, it’s impossible to always write a great question (mistakes happen, and this is why it’s always good to get second opinions on questions through playtesting or editing), but it gets easier over time, and hopefully just knowing about the ways questions can go wrong will help you avoid these common traps!
What makes a Question Good?
Well, last semester, I gave my students a new challenge: write questions that computers couldn’t answer. And they sure did! But while many of the questions were indeed hard for computers to answer, they were so vague that no human could really answer them either.
One way of knowing that a question is bad is that even if you have perfect knowledge, it’s hard to tell if you have the right answer. So, for example, take this question:
This online series stars a man who played an SS officer in one series and a woman who was a spy in yet another TV show.
If the official answer is “The Diplomat” based on the stars’ previous turns—Rufus Sewell in The Man in the High Castle and Keri Russell in The Americans—even if you’ve watched every single episode of all three series (and you should … they rock, kudos to the author of this question for their excellent taste in media), you’re not sure if you’ve got the answer right.
To verify the answer, you’d have to think about every actor who played an SS officer (from Hogan’s Heroes to Hunters), think about every actress who played a spy (from Barbara Feldon in Get Smart to Sarah Lancaster in Chuck), and then make sure that The Diplomat is the only series where two of them co-star.
From a trivia perspective, this punishes knowledge: the more possibilities you think of, the more candidates you have to check. That isn’t necessarily a bad thing: we want to test players’ ability, but there’s now no real way to check the answer.
So let’s edit the question to give it just a little more specificity:
What series could be named after either of its two co-stars? One co-star of this series is played by an actor who previously portrayed an SS officer who lured a doctor to the car after claiming to have killed his son and then injected the doctor with poison; the other co-star of this series is played by an actress who portrayed a deep-cover spy that killed Nicaraguans being trained in the US by the CIA.
This doesn’t make the question that much easier. You must have seen these episodes to realize what’s being talked about here. That takes deep knowledge. Then you have to figure out that those actors are now on this new show on Netflix. But if you do have that knowledge, you can very quickly verify that the answer is correct: Obergruppenführer Smith killed Dr. Gerry Adler, and Elizabeth Jennings killed three Contra agents in the episode “Martial Eagle”.
Now the question is clear: if you have all of the information at your disposal, you can be fairly certain that the answer is actually The Diplomat, starring Keri Russell and Rufus Sewell. So you know immediately that you “figured it out”. This feels sooo good when you’re answering the question … all the puzzle pieces fit together in a satisfying way.
And this is exactly like, for instance, taking a setting of the variables in a 3SAT formula and then seeing if the overall formula is true. Remember that the hard problem is deciding *whether* there is a solution: finding a satisfying assignment is hard. But once you have a candidate answer, you can check whether it is correct relatively quickly.
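To make the analogy concrete, here is a minimal sketch (the two-clause formula is made up for illustration): checking a candidate assignment takes one linear pass over the clauses, while finding one by brute force means enumerating every possible assignment.

```python
from itertools import product

# A 3SAT-style formula as a list of clauses; each literal is (variable, is_positive).
# Made-up example: (x1 OR NOT x2 OR x3) AND (NOT x1 OR x2 OR NOT x3)
formula = [
    [("x1", True), ("x2", False), ("x3", True)],
    [("x1", False), ("x2", True), ("x3", False)],
]

def check(assignment, formula):
    """Verifying a candidate is fast: one pass over the clauses."""
    return all(
        any(assignment[var] == positive for var, positive in clause)
        for clause in formula
    )

# Finding a satisfying assignment is the hard part: brute force tries 2**n settings.
variables = ["x1", "x2", "x3"]
solutions = [
    dict(zip(variables, values))
    for values in product([True, False], repeat=len(variables))
    if check(dict(zip(variables, values)), formula)
]
```

Verification scales with the number of clauses, while search scales exponentially with the number of variables; that asymmetry is exactly what a well-specified question gives a knowledgeable answerer.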
Vagueness makes a question difficult both for humans and for computers. So as you’re writing a question, think about this: could a human with perfect information produce a certificate for their answer? If not, or if there are dozens of possible answers, then the question is probably too vague to be usable.
Now, I said something like this to my class, and the students took a slightly different approach: make the answer specific by adding more and more vague constraints. To make this a little more mathematical, let’s say that there are L clues and each clue admits C possible options. Then the work you have to do to check all the candidate combinations is C raised to the L. This is doable, but it goes against our ethos of having questions that are easier for humans than computers, since searching over a large set of options advantages computers.
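To see the blow-up, here is a hedged sketch (the clue candidate lists are invented for illustration): reconciling L vague clues, each admitting C candidates, means searching a cross-product of roughly C ** L interpretations, even when the answer consistent with every clue is unique.

```python
from itertools import product

# Invented example: each vague clue admits several candidate answers.
clue_candidates = [
    ["The Diplomat", "The Americans", "Homeland"],   # clue 1
    ["The Diplomat", "The Man in the High Castle"],  # clue 2
    ["The Diplomat", "Chuck", "Get Smart"],          # clue 3
]

# The answers consistent with every clue: intersect the per-clue candidate sets.
consistent = set(clue_candidates[0]).intersection(*map(set, clue_candidates[1:]))

# But the brute-force work is the cross-product of interpretations:
# with C options per clue and L clues, that is C ** L combinations.
combinations = len(list(product(*clue_candidates)))  # 3 * 2 * 3 = 18
```

A human who spots the unique consistent answer is done immediately, but every extra candidate a vague clue admits multiplies the work a brute-force checker must do.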
So as you write questions, make sure that a knowledgeable person will know immediately whether their answer is correct.
Questions about Question Writing
Q. How difficult should the questions be?
For our summer 2023 competition, our questions should have a range of human difficulty (of course, for computers it should be as difficult as possible). Your easiest questions should be answerable by every "trivia player" at the end of the question. In other words, not insultingly easy but fairly easy. Your hardest questions should be a stretch for strong trivia players but still gettable by the best players.
Some of our previous competitions used pyramidal questions. These questions should be roughly on par with high school nationals questions: difficult clues on accessible topics that most people will get by the end of the question. In quiz bowl lingo, this is roughly "2-dot" difficulty.
Q: Which computers should not be able to answer the questions?
A: It's okay if your question only stumps some computers. But clearly it would be better if no computer could answer it, and better still if they all fail in different ways. We're not going to focus on particular systems and say that, for instance, it must absolutely stump ChatGPT.
Q: I can't find my favorite answer in the system. Why is this?
A: This is a design decision that we made. The allowed answers are those that have appeared at least three times in mainstream quiz bowl tournaments. This is a tradeoff that we made to keep things relatively fair. We want questions that are challenging for computers not because they lack data but because they cannot understand English. By excluding rare answers and only focusing on frequently asked answers, if a computer gets a question wrong, it's not because it lacks information to work off of ... it's because it didn't understand the question. We realize it's a little frustrating, so it's useful to check whether the answer is in bounds before writing the question. In many cases, you can tweak the question to ask about something more general (instead of asking about "William W. Belknap", ask about Grant, focusing on members of his cabinet).
Q: How do I make an account on the interface?
A: Just login with a new email and password. This will create a new account. There can only be one account per email.
Q: What if I forget my password?
A: Send us an email at qanta@googlegroups.com.
Q: What are the strange highlighted colors in the interface?
A: Words which are highlighted are "important" for our Quizbowl AI system to make its predictions. If you modify those words (e.g., rephrase that sentence), there is a high chance the system will get more confused.
Q: Do I have to use a Wikipedia page title as the answer?
A: Normally yes. It helps to standardize and will also let us link it to resources. However, some of the answers are not going to be in that set. E.g., if you want to have an answer like "missing a leg" or "ways Sean Bean has died in films" or "because they're all dead", that's not going to match a page. Please try to match if you can, but otherwise, an arbitrary string is fine.
Q: Can the answer be a year or a number?
A: Yes, but these questions are often difficult to write so that they're uniquely identifying. For events in ancient history, there's often debate about when something happened, and there are different calendar systems, so you'd need to specify something like "based on the Gregorian calendar". You also need to be specific about whether you want a year, a day, a month, etc. There's also sometimes confusion about when something happened: battles can last days, and elections are voted on in November but the winner isn't sworn in until the following January. All of this is to say, it's okay to ask this, but it sometimes requires more care.
Also, humans don't memorize a lot of dates. Some things are tightly tied to dates (coronation of Charlemagne, Pearl Harbor, September 11), but most things are not. So if we're looking for things that humans can answer but computers cannot, these sorts of things may be more difficult.
Q: Can I use ChatGPT to write the questions?
A: As long as you don't directly use ChatGPT's answers, you can get help from it. For example, since your goal is to write questions that stump ChatGPT, you can use it to check whether your questions are good enough to do so.
Q: Do questions have to be in English?
A: Yes. The questions will be used in English-based question answering competitions (with both human and computer answerers), so the language will have to be English. However, you can use foreign terms or phrases that a well-educated English speaker would know.
Q: What is the scope of the answer space? What if I think there could be more than one answer?
A: For the sake of simplicity, we are not accepting submissions with multiple answers to a question in the interface. (One way to avoid even thinking about this is to write a question specific enough that it admits only ONE answer!) This is a change from the previous pyramidal question writing competition, where you were required to select multiple answers before writing questions. Now you are restricted to one answer (the topic box, i.e., the Wikipedia page title, that you choose to write your question about).
We will try our best to hold human competitions to the same level of strictness as computer competitions. At the same time, we will also try our best to be lenient in the overall assessment of human and computer answers (e.g., acronyms will be accepted even when the correct answer was given in its original form).
On the other hand, if you insist that your question admits more than one answer (which we agree is perfectly normal), please submit your question via the Google spreadsheet. We will take into account both your team's questions in the interface AND what you've submitted through the spreadsheet.
Gameplay Questions
Q: How does a computer decide if it is confident enough to answer?
A: If the designated computer has the highest confidence of all the computers, it will answer. Even if it doesn't, it will still answer as long as its confidence is more than half of the highest confidence among its teammates. If its confidence is lower than half of the next highest confidence, it will defer to a teammate.
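As a rough sketch of that rule (the function names and the way the factor-of-two threshold is encoded are my paraphrase of the answer above, not the actual system's code):

```python
def should_answer(my_confidence, teammate_confidences):
    """Paraphrased deferral rule: answer if your confidence is at least
    half of the best teammate's confidence; otherwise defer."""
    if not teammate_confidences:
        return True  # no teammates to defer to
    return my_confidence >= max(teammate_confidences) / 2

def deferral_target(teammate_confidences):
    """On a pass, the question goes to the most confident teammate."""
    return max(range(len(teammate_confidences)),
               key=lambda i: teammate_confidences[i])
```

For example, a computer at 0.6 confidence would still answer over a 0.9-confidence teammate (0.6 >= 0.45), but one at 0.3 would defer to it.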
Q: If a computer passes, which teammate does it go to?
A: The computer with the highest confidence.
Q: So there's no computer captain?
A: Correct.
Categories and Diversity
Q: What happens if I don't follow the question category distribution requirement?
A: To create a full question set, we need questions about different things. If you don't follow the distribution, we can only use your best questions, which would hurt your "Quantity/Quality" score.
Q: What if I write all of my questions about a single country?
A: We encourage you to write as many diverse questions as you can, meaning the question set includes a variety of countries, locations, and named entities. Remember, you will get bonus points if you include diversity in your questions!
Q: How do I split questions between categories if I don't write a full set?
A: No set of questions is perfect. Do your best to sort each question into a category. It's fine if you write ten questions for one category; however, make sure to submit each question tagged with one of the eight categories. (Your score will be affected, though, if you don't follow the half, double, and full packet guidelines.)
Q: Can we blend subcategories/categories?
A: Feel free to blend subcategories and categories. Our distribution requirements aren't super strict. But don't use blending to avoid writing about topics (e.g., "real" science) that should be covered in the set somewhere.
Submission
Q: What if I already have some questions written that aren’t in the interface?
A: You can submit those questions another way: a spreadsheet (example here: Google Sheet ). Feel free to write the remaining questions in the interface, but please remember to use the same team name and gmail address as in the Google Form for registration.
Q: How do I submit questions? Why this craziness?
A: You submit your questions through a web form. We describe the system in this paper, and a tutorial video is below.
Q: If I write questions in the Spreadsheet, do I have to use the same answer set?
A: We encourage you to choose answers from the same answer set, because that makes your questions easier to evaluate against what other teams wrote: we will run different questions targeted at the same answer and see what the same QA model returns. It will be fun and interesting to see how models return different (correct or incorrect) answers to different questions that expect the same answer!
Q: I have one really good question. Can I just submit that? Do I have a chance of winning?
A: We encourage you to write a full set of questions so that your submission is comparable with other participants'. Since we evaluate questions according to the scoring rubric, it would be impossible to grade a single question on the same scale as a full set. This being said, you will have a very, very low chance of winning.
Editing
Q: Who will be editing the questions?
A: Professional trivia editors.
Q: What if they change my question to make it less adversarial?
A: That is a risk! To avoid it, make sure that your questions are usable as-is so that editors will not need to change them.
Q: What happens if the editors decide my question is not usable?
A: Then the question will not be a part of the competition and will not be a part of your final score.