How to Write Good Questions
(that are also Adversarial)

This page will give an overview of how to write questions for our 2023 competition.

Diversity

Existing sources of questions are not diverse: they're mostly about American men.

If you write your question in the app, you will get scores for diversity in the leaderboard: questions that talk about entities from underrepresented groups will get a higher score.

Suggestions for Upgrading Diversity will recommend entities from underrepresented groups (e.g., if you're writing about Winston Churchill, using Bhambatha as a clue would improve the diversity of the question).

An example of diversity help

The entities in the question below are relevant to the United Kingdom (listed in Diversity of Your Question), but you can improve your question by adding information about one of the countries listed in Suggestions for Upgrading Diversity. The more entities you include that are associated with different countries, the higher your diversity score will be!

Below is an example of a good question that takes advantage of the diversity suggestions in the bottom left corner. Currently, the question has a low level of diversity (only one entity, "George Orwell", linked to "United Kingdom").

From the suggestion list, one piece of information was added: that Apple was recognized by several outlets, including Forbes and Boston Consulting Group, a hint taken from the Wikipedia page linked to the United States. Diversity of Your Question then picked up "United States", which made the question more diverse!

How to write Good Questions?

Here are some useful videos about how to write good adversarial questions:

Human-Computer Adversarial Question Answering: Introduction

The Manchester Paradigm: The Long History of Humans Crafting Adversarial Questions

The Adversarial Question Writing Process

Adversarial Question Answering: How Explanations for Humans can Trick Computers

Here are some helpful articles on writing good adversarial questions:

https://acf-quizbowl.com/subash-guide/

https://www.ocf.berkeley.edu/~quizbowl/qb-writing.html

https://aclanthology.org/2020.emnlp-main.466.pdf

https://aclanthology.org/2020.acl-main.662.pdf

https://arxiv.org/pdf/2211.17257.pdf

How to write Good Adversarial Questions?

There are multiple ways that you can stump the computer, such as questions that

Are paraphrased
Require logical reasoning or multi-hop reasoning
Require commonsense or domain expert knowledge
Contain math
Contain cross-lingual content

Watching this video will help you get the idea: https://www.youtube.com/watch?v=6oZCIOBiSaI

Avoiding Vagueness: You Should "Know" When Your Answer is Right

Well, last semester, I gave my students a new challenge: write questions that computers couldn’t answer. And they sure did! Many of the questions were indeed hard to answer, but they were so vague that no human could really answer them either.

One way of knowing that a question is bad is that even if you have perfect knowledge, it’s hard to tell if you have the right answer. So, for example, take this question:

This online series stars a man who played an SS officer in one series and a woman who was a spy in yet another TV show.

If the official answer is “The Diplomat” based on the stars’ previous turns (Rufus Sewell in The Man in the High Castle and Keri Russell in The Americans), then even if you’ve watched every single episode of all three series (and you should … they rock, kudos to the author of this question for their excellent taste in media), you’re not sure if you’ve got the answer right.

To verify the answer, you’d have to think about every male actor who played an SS officer in a series that had Nazis (from Hogan’s Heroes to Hunters) and every female actor who played a spy (from Barbara Feldon in Get Smart to Sarah Lancaster in Chuck) and make sure that The Diplomat is the only possible answer.

From a trivia perspective, this punishes knowledge: the more possibilities you think of, the more candidates you have to check. That isn’t necessarily a bad thing: we want to test players’ ability, but there’s now no real way to check the answer.

So let’s edit the question to give it just a little more specificity:

What series could be named after either of its two co-stars: one of this series’ co-stars is played by an actor who previously portrayed an SS officer who lured a doctor to the car after claiming to have killed his son and injecting the doctor with poison; the other co-star of this series is played by an actress who portrayed a deep-cover spy that killed Nicaraguans being trained in the US by the CIA.

This doesn’t make the question that much easier. You must have seen these episodes to realize what’s being talked about here. That takes deep knowledge. Then you have to figure out that those actors are now on this new show on Netflix. But if you do have that knowledge, you can very quickly verify that the answer is correct: Obergruppenführer Smith killed Dr. Gerry Adler and Elizabeth Jennings killed three Contra agents in the episode Martial Eagle.

Now the question is clear: if you have all of the information at your disposal you can be fairly certain that the answer is actually The Diplomat starring Keri Russell and Rufus Sewell. So you know immediately that you “figured it out”. This feels sooo good when you’re answering the question … all the puzzle pieces fit together in a satisfying way.

And this is exactly like—for instance—taking a setting of the variables in a 3SAT formula and then seeing if the overall formula is true. Remember that the problem is *whether* there is a solution … finding that solution is hard. But once you have an answer, you can check whether the answer is correct relatively quickly.

When you’re writing a question, vagueness can make it difficult for both humans and computers. So as you write, think about this: could a human with perfect information produce a certificate for their answer? If not, or if there are dozens of possible answers, then the question is probably too vague to be usable.

Now, I said something like this to my course, and the students took a slightly different approach: make the answer specific by adding more and more vague constraints. To make this a little more mathematical, let’s say that there are L clues and each clue has C possible options. Then, in the worst case, the number of candidate combinations you have to check is C raised to the L. This is doable, but it goes against our ethos of having questions that are easier for humans than computers, since searching over a large set of options advantages computers.
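As a rough illustration of how fast that search grows, the Python sketch below enumerates every way of picking one option per clue. The clue lists are made-up placeholders, and candidate_combinations is a hypothetical helper name. With L clues of C options each, the enumeration yields C multiplied by itself L times.

```python
from itertools import product

def candidate_combinations(options_per_clue):
    """Return every way to pick exactly one option for each vague clue."""
    return list(product(*options_per_clue))

# Placeholder options: 2 vague clues (L = 2), each matching 3 people (C = 3).
clues = [
    ["actor A", "actor B", "actor C"],
    ["actress D", "actress E", "actress F"],
]
combos = candidate_combinations(clues)
print(len(combos))  # 3 * 3 combinations to check

# Growth when every clue matches 10 options and more vague clues are added.
for num_clues in range(1, 6):
    print(num_clues, 10 ** num_clues)
```

A computer can churn through that list without complaint; a human player cannot, which is why piling on vague clues tilts the question toward the machine.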

So as you write a question, make sure that a knowledgeable person will know immediately if their answer is correct.

Examples of Bad Questions

To give examples of bad questions, I’m going to take some bad questions from the German board game “Spiel des Wissens” (translated into English of course).

Aside: Let me emphasize that not all the questions are bad. And it’s still a good game. I’d recommend it! The questions are up-to-date, and it’s fun to play with my kids because there are both “easy” and “difficult” questions so that kids can play alongside adults.

One of the biggest problems of Manchester paradigm questions is when the person asking the question thinks that there’s one correct answer, but the question admits many different answers. Let’s take

What is the name of the literary genre in which elements of the real and the magical world are interwoven?

The official answer was “fairy tales”, but “fantasy” or “magical realism” are arguably also correct.

This is particularly problematic for “location” and “temporal” questions.

Where are the “Virunga Volcanoes”?

This could be Earth, Africa, or just the Democratic Republic of the Congo (where the national park is). There’s nothing in the question to let you know that you need to name all three countries that the Virunga range goes through.

While that was more specific than one might expect, here’s a question that’s less specific than I thought.

Where was the first Iron Man?

has the natural answer Oahu, one of the Hawaiian islands. The official answer of “Hawaii” sounds false if you think about Hawaii the island rather than Hawaii the state. But you could also imagine someone answering USA, since it’s connected to the US Navy. The question should say something like “on which island” or “in which American state” to make it more reasonable.

Sometimes questions are vague not in what they’re asking about per se but in terms of the definition. Let’s take this question:

What countries border France to the south?

The official answers are Spain and Andorra: that makes sense! You can stand in those countries, walk straight north and end up in France. But if that’s the definition we’re using, are there other countries that would work? If you've vacationed in Baden-Baden, you're in Germany and France is to the north. While not all of Germany borders France to the south, some of it does.

And if you just count a border where you can go due south on a road from France to another country, then you could answer: Germany (to Baden-Baden as I mentioned before), Switzerland (Basel), Italy (Mont Blanc), Luxembourg (Saulnes), or Belgium (Fromelennes). But don’t forget that France isn’t just in Europe. Thanks to its overseas department of French Guiana, you can also answer Brazil and Suriname.

Maybe Germany is excluded because it also borders France to the West? But thanks to the Spanish exclave of Llivia, you can start in Spain and walk to France in every cardinal direction!

A stronger version of this is when there’s an implicit assumption in the question that’s not stated. In linguistics, this is called a “false presupposition”. One example of this is

What color are flamingos’ feathers?

The official answer said "pink". Now, this is true if the flamingos are eating brine shrimp or other foods rich in beta carotene. When flamingos are born, they’re white.

In a game written for a European audience, it makes sense that

What voltage comes out of the socket?

has the answer 230 Volts. But someone in the US would reasonably say 120, and that would be right!

What connector do you use to charge a phone?

has the official answer Micro-USB. Of all the questions I’m complaining about, this one just seems completely inexplicable to me. The whole thing about phones is that there are a bazillion different chargers. Every company used to have their own charger, and even in the age of standardization, there’s Micro-USB, USB C, Mini-USB and whatever Apple is doing today. I would love to live in a world where it was the case that you only had to have a Micro-USB charger!

Sometimes when a question is vague, people add a qualifier to try to make it more specific, but it fails because the qualifier itself is highly subjective.

What’s the name of Christopher Columbus’s most famous ship?

The official answer is Santa Maria, but there’s evidence that La Niña was Columbus’s favorite (and also had the official name Santa Clara). To specifically distinguish between the ships, you could rephrase the question as “Which of Columbus’s ships was stripped of its timbers to build a fort called La Navidad in northern Haiti?”

Similarly,

What famous novel takes place in an Italian Benedictine Abbey?

has the official answer The Name of the Rose (and it’s what most people think of), but if you’re not ancient like I am, maybe you’d think of the Sarah Dunant book Sacred Hearts instead. So you could instead ask something like “What famous novel set in an Italian Benedictine Abbey has a main character whose name combines William of Ockham and Sir Arthur Conan Doyle's The Hound of the Baskervilles?”

But it’s not just the questions that can be bad; sometimes the answers are problematic. Let’s take this question:

Who was the last Czar of Russia?

The given answer is “Nikolas Alexandrovitch Romanov”. Now that’s an acceptable answer, since that’s what he was called when he was born. But that’s not how most people know him. His regnal name was Nikolas II. You should have both as acceptable answers, but if you only have one, it should be Nikolas II. Ideally you would accept even more answers: Nicholas II, Nikolai II Alexandrovich Romanov, Никола́й II Алекса́ндрович, and Николай II.
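One way to handle multiple acceptable answers is to store every alias and compare after light normalization. The Python sketch below is only an illustration under assumptions: normalize and is_acceptable are hypothetical names, the normalization rules (casefolding, stripping combining marks such as stress accents, collapsing whitespace) are one reasonable choice among many, and this is not how the competition app grades answers.

```python
import unicodedata

def normalize(answer: str) -> str:
    """Casefold, drop combining marks (accents, stress marks), collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", answer)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())

# Aliases taken from the list of acceptable answers in the text above.
ACCEPTED = {
    normalize(alias)
    for alias in [
        "Nikolas II",
        "Nicholas II",
        "Nikolas Alexandrovitch Romanov",
        "Nikolai II Alexandrovich Romanov",
        "Никола́й II Алекса́ndрович".replace("nd", "нд"),
        "Николай II",
    ]
}

def is_acceptable(player_answer: str) -> bool:
    """True when the normalized answer matches any stored alias."""
    return normalize(player_answer) in ACCEPTED

print(is_acceptable("nicholas ii"))
print(is_acceptable("Николай II Александрович"))
print(is_acceptable("Peter the Great"))
```

Exact matching after normalization is deliberately strict. A partial answer such as the surname alone would be rejected unless you add it as an alias, which keeps the decision about what counts as correct in the question writer's hands.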

A final example of bad questions is the “swerve”. I didn’t see any examples of this from Spiel des Wissens, so I had to search for some examples on the Internet.

One man gained fame for starting the Kulturkampf as well as a war with France. Otto von Bismarck lends his name to the capital of which U.S. state?

As you can see, you start out thinking that the answer will be Bismarck and then it suddenly changes to being about North Dakota!

A less intentional kind of swerve is when the pronouns or references are unclear. A good question needs to establish early on what kind of thing it’s looking for. There are several ways a question can mark what it’s asking about. Sometimes with a pronoun:

He was born in Tampico, Illinois before being elected president.

Sometimes with a determiner:

This is the most populous state in the US.

Sometimes with an interrogative:

What is the capital of California?

Things get confusing when you use a bunch of them at the same time! What the heck is this question looking for?

After being born in Tampico, Illinois, he always lived in this most populous state until moving out of what state capital when he was elected president?

Now this goes way overboard—nobody would write a question this confusing—but you can avoid this confusion by following these rules:

Don’t use a pronoun until you have introduced the answer type (e.g., “what composer”) in some other way
Only use “this” or “what” when referring to the thing that you’re asking about
Use consistent references to the answer. It’s okay to use different references, but they shouldn’t contradict each other: “this author of 1984” and “this volunteer in the Spanish Civil War” are okay because they’re about different things, but you shouldn’t say “this British author” and “this Spanish resident” (even if they’re technically correct … and certainly don’t do this to create a swerve question).

To be clear, it’s impossible to always write a great question (mistakes happen, and this is why it’s always good to get second opinions on questions through playtesting or editing), but it gets easier over time, and hopefully just knowing about the ways questions can go wrong will help you avoid these common traps!
