Hints / FAQ
Introduction
The goal of this page is to provide information to help you write better adversarial questions. In addition to examples of good and bad questions, we also have frequently asked questions (bottom of the page).
What makes a Question Adversarial?
Here are some useful videos about how to write good adversarial questions. In general, adversarial questions:
Are paraphrased
Require logical or multi-hop reasoning
Require commonsense or domain-expert knowledge
Contain math
Contain cross-lingual content
But this is not a complete list! The whole goal of this competition is to find more techniques that either make it difficult for computers to know when they're right or wrong or to lead the computers astray in their answering. We need your creativity!
What makes a Question Bad?
Sometimes, when people write questions that aim to "trick" a computer, they end up just writing bad questions that are confusing or ambiguous. We don't want those! To give examples of bad questions, I’m going to take some bad questions from the German board game “Spiel des Wissens” (translated into English of course).
Aside: Let me emphasize that not all the questions are bad. And it’s still a good game. I’d recommend it! The questions are up-to-date, and it’s fun to play with my kids because there are both “easy” and “difficult” questions so that kids can play alongside adults.
One of the biggest problems of Manchester paradigm questions is when the person asking the question thinks that there’s one correct answer, but the question admits many different answers. Let’s take
What is the name of the literary genre in which elements of the real and the magical world are interwoven?
The official answer was “fairy tales”, but “fantasy” or “magical realism” are arguably also correct.
This is particularly problematic for “location” and “temporal” questions. Where are the “Virunga Volcanoes”?
This could be Earth, Africa, or just the Democratic Republic of the Congo (where the national park is). There’s nothing in the question to let you know that you need to name all three countries that the Virunga range goes through.
While that was more specific than one might expect, here’s a question that’s less specific than I thought.
Where was the first Iron Man?
has the natural answer Oahu, one of the Hawaiian islands. The official answer of “Hawaii” sounds false if you think about Hawaii the island rather than Hawaii the state. But you could also imagine someone answering USA, since it’s connected to the US Navy. The question should say something like “on which island” or “in which American state” to make it more reasonable.
Sometimes questions are vague not in what they’re asking about per se but in terms of the definition. Let’s take this question:
What countries border France to the south?
The official answers are Spain and Andorra: that makes sense! You can stand in those countries, walk straight north, and end up in France. But if that’s the definition we’re using, are there other countries that would work? If you’ve vacationed in Baden-Baden, you’re in Germany and France is to the north. While not all of Germany borders France to the south, some of it does.
And if you just count a border where you can go due south on a road from France into another country, then you could answer: Germany (to Baden-Baden, as I mentioned before), Switzerland (Basel), Italy (Mont Blanc), Luxembourg (Saulnes), or Belgium (Fromelennes). But don’t forget that France isn’t just in Europe. Thanks to its overseas department of French Guiana, you can also answer Brazil and Suriname.
Maybe Germany is excluded because it also borders France to the West? But thanks to the Spanish exclave of Llivia, you can start in Spain and walk to France in every cardinal direction!
A stronger version of this is when there’s an implicit assumption in the question that’s not stated. In linguistics, this is called a “false presupposition”. One example of this is
What color are a flamingo’s feathers?
The official answer is “pink”. Now, that’s only true if the flamingos have been eating brine shrimp or other foods rich in beta carotene; when flamingos are born, they’re white.
For a game written for a European audience, it makes sense that
What voltage comes out of the socket?
has the answer 230 Volts. But someone in the US would reasonably say 120, and that would be right!
What connector do you use to charge a phone?
has the official answer Micro-USB. Of all the questions I’m complaining about, this one just seems completely inexplicable to me. The whole thing about phones is that there are a bazillion different chargers. Every company used to have their own charger, and even in the age of standardization, there’s Micro-USB, USB C, Mini-USB and whatever Apple is doing today. I would love to live in a world where it was the case that you only had to have a Micro-USB charger!
Sometimes, when a question is vague, people add a qualifier to try to make it more specific but end up failing because the qualifier itself is highly subjective.
What’s the name of Christopher Columbus’s most famous ship?
The official answer is Santa Maria, but there’s evidence that La Niña was Columbus’s favorite (and it also had the official name Santa Clara). To specifically distinguish between the ships, you could rephrase the question as “Which of Columbus’s ships was stripped of its timbers to build a fort called La Navidad in northern Haiti?”
Similarly,
What famous novel takes place in an Italian Benedictine Abbey?
has the official answer The Name of the Rose (and it’s what most people think of), but if you’re not ancient like I am, maybe you’d think of the Sarah Dunant book Sacred Hearts instead. So you could instead ask something like “What famous novel set in an Italian Benedictine Abbey has a main character whose name combines William of Ockham and Sir Arthur Conan Doyle’s The Hound of the Baskervilles?”
But it’s not just the questions that can be bad; sometimes the answers are problematic. Let’s take this question:
Who was the last Czar of Russia?
The given answer is “Nikolas Alexandrovitch Romanov”. Now, that’s an acceptable answer: it’s what he was called when he was born. But that’s not how most people know him; his regnal name was Nicholas II. Ideally you should have both as acceptable answers, but if you only have one, it should be Nicholas II. Better still, accept even more variants: Nikolai II Alexandrovich Romanov, Никола́й II Алекса́ндрович, and Николай II.
A final example of a bad question is the “swerve”. I didn’t see any examples of this in Spiel des Wissens, so I had to search for some examples on the Internet.
One man gained fame for starting the Kulturkampf as well as a war with France. Otto von Bismarck lends his name to the capital of which U.S. state?
As you can see, you start out thinking that the answer will be Bismarck, and then the question suddenly swerves to asking about North Dakota!
A less intentional kind of swerve is when the pronouns or references are unclear. A good question needs to establish early on what kind of thing it’s looking for. There are several ways a question can mark what it’s asking about. Sometimes with a pronoun:
He was born in Tampico, Illinois before being elected president.
Sometimes with a determiner:
This is the most populous state in the US.
Sometimes with an interrogative:
What is the capital of California?
Things get confusing when you use a bunch of them at the same time! What the heck is this question looking for?
After being born in Tampico, Illinois, he always lived in this most populous state until moving out of what state capital when he was elected president?
Now this goes way overboard—nobody would write a question this confusing—but you can avoid this confusion by following these rules:
Don’t use a pronoun until you have introduced the answer type (e.g., “what composer”) in some other way
Only use “this” or “what” when referring to the thing that you’re asking about
Use consistent references to the answer. It’s okay to use different references, but they shouldn’t contradict each other: “this author of 1984” and “this volunteer in the Spanish Civil War” are okay because they’re about different things, but you shouldn’t say “this British author” and “this Spanish resident” (even if they’re technically correct … and certainly don’t do this to create a swerve question).
To be clear, it’s impossible to always write a great question (mistakes happen, and this is why it’s always good to get second opinions on questions through playtesting or editing), but it gets easier over time, and hopefully just knowing about the ways questions can go wrong will help you avoid these common traps!
What makes a Question Good?
Well, last semester, I gave my students a new challenge: write questions that computers couldn’t answer. And they sure did! But while many of the questions were indeed hard for computers to answer, they were so vague that no human could really answer them either.
One way of knowing that a question is bad is that even if you have perfect knowledge, it’s hard to tell if you have the right answer. So, for example, take this question:
This online series stars a man who played an SS officer in one series and a woman who was a spy in yet another TV show.
If the official answer is “The Diplomat” based on the stars’ previous turns—Rufus Sewell in The Man in the High Castle and Keri Russell in The Americans—even if you’ve watched every single episode of all three series (and you should … they rock, kudos to the author of this question for their excellent taste in media), you’re not sure if you’ve got the answer right.
To verify the answer, you’d have to think about every actor who played an SS officer (from Hogan’s Heroes to Hunters), think about every actress who played a spy (from Barbara Feldon in Get Smart to Sarah Lancaster in Chuck), and then make sure that The Diplomat is the only series where two of them co-star.
From a trivia perspective, this punishes knowledge: the more possibilities you think of, the more candidates you have to check. That isn’t necessarily a bad thing: we want to test players’ ability, but there’s now no real way to check the answer.
So let’s edit the question to give it just a little more specificity:
What series could be named after either of its two co-stars? One co-star of this series is played by an actor who previously portrayed an SS officer who lured a doctor to the car after claiming to have killed his son and then injected the doctor with poison; the other co-star of this series is played by an actress who portrayed a deep-cover spy that killed Nicaraguans being trained in the US by the CIA.
This doesn’t make the question that much easier. You must have seen these episodes to realize what’s being talked about here. That takes deep knowledge. Then you have to figure out that those actors are now on this new show on Netflix. But if you do have that knowledge, you can very quickly verify that the answer is correct: Obergruppenführer Smith killed Dr. Gerry Adler, and Elizabeth Jennings killed three Contra agents in the episode “Martial Eagle”.
Now the question is clear: if you have all of the information at your disposal, you can be fairly certain that the answer is actually The Diplomat, starring Keri Russell and Rufus Sewell. So you know immediately that you “figured it out”. This feels sooo good when you’re answering the question … all the puzzle pieces fit together in a satisfying way.
And this is exactly like, for instance, taking a setting of the variables in a 3SAT formula and then seeing if the overall formula is true. Remember that the hard problem is deciding *whether* there is a solution: finding a satisfying assignment is hard. But once you have a candidate answer, you can check whether it is correct relatively quickly.
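To make the analogy concrete, here is a minimal sketch (the two-clause formula is made up for illustration): checking a candidate assignment takes one linear pass over the clauses, while finding one by brute force means enumerating every possible assignment.

```python
from itertools import product

# A 3SAT-style formula as a list of clauses; each literal is (variable, is_positive).
# Made-up example: (x1 OR NOT x2 OR x3) AND (NOT x1 OR x2 OR NOT x3)
formula = [
    [("x1", True), ("x2", False), ("x3", True)],
    [("x1", False), ("x2", True), ("x3", False)],
]

def check(assignment, formula):
    """Verifying a candidate is fast: one pass over the clauses."""
    return all(
        any(assignment[var] == positive for var, positive in clause)
        for clause in formula
    )

# Finding a satisfying assignment is the hard part: brute force tries 2**n settings.
variables = ["x1", "x2", "x3"]
solutions = [
    dict(zip(variables, values))
    for values in product([True, False], repeat=len(variables))
    if check(dict(zip(variables, values)), formula)
]
```

Verification scales with the number of clauses, while search scales exponentially with the number of variables; that asymmetry is exactly what a well-specified question gives a knowledgeable answerer.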
Vagueness makes a question difficult both for humans and for computers. So as you’re writing a question, think about this: could a human with perfect information produce a certificate for their answer? If not, or if there are dozens of possible answers, then the question is probably too vague to be usable.
Now, I said something like this to my class, and the students took a slightly different approach: make the answer specific by adding more and more vague constraints. To make this a little more mathematical, let’s say that there are L clues and each clue admits C possible options. Then the work you have to do to check all the candidate combinations is C raised to the L. This is doable, but it goes against our ethos of having questions that are easier for humans than computers, since searching over a large set of options advantages computers.
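To see the blow-up, here is a hedged sketch (the clue candidate lists are invented for illustration): reconciling L vague clues, each admitting C candidates, means searching a cross-product of roughly C ** L interpretations, even when the answer consistent with every clue is unique.

```python
from itertools import product

# Invented example: each vague clue admits several candidate answers.
clue_candidates = [
    ["The Diplomat", "The Americans", "Homeland"],   # clue 1
    ["The Diplomat", "The Man in the High Castle"],  # clue 2
    ["The Diplomat", "Chuck", "Get Smart"],          # clue 3
]

# The answers consistent with every clue: intersect the per-clue candidate sets.
consistent = set(clue_candidates[0]).intersection(*map(set, clue_candidates[1:]))

# But the brute-force work is the cross-product of interpretations:
# with C options per clue and L clues, that is C ** L combinations.
combinations = len(list(product(*clue_candidates)))  # 3 * 2 * 3 = 18
```

A human who spots the unique consistent answer is done immediately, but every extra candidate a vague clue admits multiplies the work a brute-force checker must do.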
So as you write questions, make sure that a knowledgeable person will know immediately whether their answer is correct.
Questions about Question Writing
Q. How difficult should the questions be?
For our summer 2023 competition, our questions should have a range of human difficulty (of course, for computers it should be as difficult as possible). Your easiest questions should be answerable by every "trivia player" at the end of the question. In other words, not insultingly easy but fairly easy. Your hardest questions should be a stretch for strong trivia players but still gettable by the best players.
Some of our previous competitions used pyramidal questions. These questions should be roughly on par with high school nationals questions: difficult clues on accessible topics that most people will get by the end of the question. In quiz bowl lingo, this is roughly "2-dot" difficulty.
Q: Which computers should not be able to answer the questions?
A: It's okay if your question only stumps some computers. But clearly it would be better if no computer could answer it, and better still if they all fail in different ways. We're not going to focus on particular systems and say that, for instance, it must absolutely stump ChatGPT.
Q: I can't find my favorite answer in the system. Why is this?
A: This is a design decision that we made. The allowed answers are those that have appeared at least three times in mainstream quiz bowl tournaments. This is a tradeoff that we made to keep things relatively fair. We want questions that are challenging for computers not because they lack data but because they cannot understand English. By excluding rare answers and only focusing on frequently asked answers, if a computer gets a question wrong, it's not because it lacks information to work off of ... it's because it didn't understand the question. We realize it's a little frustrating, so it's useful to check whether the answer is in bounds before writing the question. In many cases, you can tweak the question to ask about something more general (instead of asking about "William W. Belknap", ask about Grant, focusing on members of his cabinet).
Q: How do I make an account on the interface?
A: Just login with a new email and password. This will create a new account. There can only be one account per email.
Q: What if I forget my password?
A: Send us an email at qanta@googlegroups.com.
Q: What are the strange highlighted colors in the interface?
A: Words which are highlighted are "important" for our Quizbowl AI system to make its predictions. If you modify those words (e.g., rephrase that sentence), there is a high chance the system will get more confused.
Q: Do I have to use a Wikipedia page title as the answer?
A: Normally yes. It helps to standardize and will also let us link it to resources. However, some of the answers are not going to be in that set. E.g., if you want to have an answer like "missing a leg" or "ways Sean Bean has died in films" or "because they're all dead", that's not going to match a page. Please try to match if you can, but otherwise, an arbitrary string is fine.
Q: Can the answer be a year or a number?
A: Yes, but these questions are often difficult to write so that they're uniquely identifying. For events in ancient history, there's often debate about when something happened, and there are different calendar systems, so you'd need to specify something like "based on the Gregorian calendar". You also need to be specific about whether you want a year, a day, a month, etc. There's also sometimes confusion about when something happened: battles can last days, and elections are voted on in November but the winner isn't sworn in until the following January. All of this is to say, it's okay to ask this, but it sometimes requires more care.
Also, humans don't memorize a lot of dates. Some things are tightly tied to dates (coronation of Charlemagne, Pearl Harbor, September 11), but most things are not. So if we're looking for things that humans can answer but computers cannot, these sorts of things may be more difficult.
Q: Can I use ChatGPT to write the questions?
A: As long as you don't directly use ChatGPT's answers, you can get help from it. For example, since your goal is to write questions that stump ChatGPT, you can use it to check whether your questions are good enough to do so.
Q: Do questions have to be in English?
A: Yes. The questions will be used in English-based question answering competitions (with both human and computer answerers), so the language will have to be English. However, you can use foreign terms or phrases that a well-educated English speaker would know.
Q: What is the scope of the answer space? What if I think there could be more than one answer?
A: For the sake of simplicity, we are not accepting submissions with multiple answers to a question in the interface. (One way to avoid even thinking about this is to write a question specific enough that it admits only ONE answer!) This is a change from the previous pyramidal question writing competition, where you were required to select multiple answers before writing questions. Now you are restricted to one answer (the topic box, i.e., the Wikipedia page title, that you choose to write your question about).
We will try our best to hold human competitions to the same level of strictness as computer competitions. At the same time, we will also try our best to be lenient in the overall assessment of human and computer answers (e.g., acronyms will be accepted even when the correct answer was given in its original form).
On the other hand, if you insist that your question admits more than one answer (which we agree is perfectly normal), please submit your question via the Google spreadsheet. We will take into account both your team's questions in the interface AND what you've submitted through the spreadsheet.
Gameplay Questions
Q: How does a computer decide if it is confident enough to answer?
A: If the designated computer has the highest confidence of all the computers, it will answer. Even if it doesn't, it will still answer as long as its confidence is more than half of the highest confidence among its teammates. If its confidence is lower than half of the next highest confidence, it will defer to a teammate.
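As a rough sketch of that rule (the function names and the way the factor-of-two threshold is encoded are my paraphrase of the answer above, not the actual system's code):

```python
def should_answer(my_confidence, teammate_confidences):
    """Paraphrased deferral rule: answer if your confidence is at least
    half of the best teammate's confidence; otherwise defer."""
    if not teammate_confidences:
        return True  # no teammates to defer to
    return my_confidence >= max(teammate_confidences) / 2

def deferral_target(teammate_confidences):
    """On a pass, the question goes to the most confident teammate."""
    return max(range(len(teammate_confidences)),
               key=lambda i: teammate_confidences[i])
```

For example, a computer at 0.6 confidence would still answer over a 0.9-confidence teammate (0.6 >= 0.45), but one at 0.3 would defer to it.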
Q: If a computer passes, which teammate does it go to?
A: The computer with the highest confidence.
Q: So there's no computer captain?
A: Correct.
Categories and Diversity
Q: What happens if I don't follow the question category distribution requirement?
A: To create a full question set, we need questions about different things. If you don't follow the distribution, we can only use your best questions, which would hurt your "Quantity/Quality" score.
Q: What if I write all of my questions about a single country?
A: We encourage you to write as many diverse questions as you can, meaning the question set includes a variety of countries, locations, and named entities. Remember, you will get bonus points if you include diversity in your questions!
Q: How do I split questions between categories if I don't write a full set?
A: No set of questions is perfect. Do your best to sort each question into a category. It's fine if you write ten questions for one category; however, make sure to submit each question tagged with one of the eight categories. (Your score will be affected, though, if you don't follow the half, double, and full packet guidelines.)
Q: Can we blend subcategories/categories?
A: Feel free to blend subcategories and categories. Our distribution requirements aren't super strict. But don't use blending to avoid writing about topics (e.g., "real" science) that should be covered in the set somewhere.
Submission
Q: What if I already have some questions written that aren’t in the interface?
A: You can submit those questions another way: a spreadsheet (example here: Google Sheet ). Feel free to write the remaining questions in the interface, but please remember to use the same team name and gmail address as in the Google Form for registration.
Q: How do I submit questions? Why this craziness?
A: You submit your questions through a web form. We describe the system in this paper, and a tutorial video is below.
Q: If I write questions in the Spreadsheet, do I have to use the same answer set?
A: We encourage you to choose answers from the same answer set, because that makes your questions easier to evaluate against what other teams wrote: we will run different questions targeted at the same answer and see what the same QA model returns. It will be fun and interesting to see how models return different (correct or incorrect) answers to different questions that expect the same answer!
Q: I have one really good question. Can I just submit that? Do I have a chance of winning?
A: We encourage you to write a full set of questions so that your submission is comparable with other participants'. Since we evaluate questions according to the scoring rubric, it would be impossible to grade a single question on the same scale as a full set. This being said, you will have a very, very low chance of winning.
Editing
Q: Who will be editing the questions?
A: Professional trivia editors.
Q: What if they change my question to make it less adversarial?
A: That is a risk! To avoid it, make sure that your questions are usable as-is so that editors will not need to change them.
Q: What happens if the editors decide my question is not usable?
A: Then the question will not be a part of the competition and will not be a part of your final score.