The session proceeds in the order in which this document is arranged and follows this structure:
Readings and short exercises
Individual exercises
This module is dedicated to understanding and practicing the basics of good practice in forecasting and good judgment. We are not experts on this subject, so we rely on the expertise of others. Most of the structure and content curation for this special session was designed by Edo Arad, and most of the content used was written by Jacob Steinhardt. Let’s get some definitions clear before we get started:
Forecasting: making predictions based on past and present data and reasoning
Good judgment: “the ability to weigh complex information and reach calibrated conclusions” (Todd, 2020, para. 22)
Some of the initial core insights into the value of forecasting accuracy come from the research of Philip Tetlock and Dan Gardner (as well as intelligence agencies, as you may have noted in the reasoning transparency module). Tetlock famously published a study that evaluated the accuracy of expert predictions of future events over 16 years, concluding that the average expert did no better than random guessing. The famous quip is that the “average expert was roughly as accurate as a dart-throwing chimpanzee” (Tetlock and Gardner, 2016, p.4). In Superforecasting: The Art and Science of Prediction, Tetlock laments how the chimp line stuck while the more exciting finding did not: some people, whom Tetlock and Gardner call superforecasters, are incredibly good at predicting future events, and the good news is that we can learn from them to become better forecasters ourselves.
Forecasting, good judgment, and calibration matter for research because they underpin the soundness of our conclusions. When doing applied research to determine how things will pan out in the future, we are essentially informing bets. Having confidence in our ability and process for predicting future events is therefore very important.
Task 1
Read Notes on Good Judgement (Todd, 2020). Try to understand the main ideas, and note down your questions. We'll experience most of the tips and ideas first-hand during this session, so don't worry if there's too much content here to remember.
This section includes a few introductory readings, which – in some cases – include exercises. These will be done individually, although you can form quick reading groups or Pomodoro sessions.
Forecasting
Task 2
Read Forecasting: Zeroth and First Order (Steinhardt and Denain, 2021). This is the first in a sequence of lecture notes on forecasting. You can find many links for further reading and exercises in these lecture notes. We recommend spending about 30% of your time on the exercises. The goal is to learn intuitive ways to extrapolate quantities from past observations.
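To make the zeroth- and first-order approaches concrete, here is a minimal Python sketch (the data are made up purely for illustration): a zeroth-order forecast simply repeats the latest observation, while a first-order forecast extends the recent trend.

```python
# Made-up yearly observations of some quantity (e.g., a cost or a count).
years = [2018, 2019, 2020, 2021, 2022]
values = [100, 110, 125, 138, 150]

# Zeroth-order forecast: assume the quantity stays at its latest value.
zeroth_order_2023 = values[-1]

# First-order forecast: extend the average recent change (a linear trend).
recent_change = (values[-1] - values[0]) / (years[-1] - years[0])
first_order_2023 = values[-1] + recent_change * (2023 - years[-1])

print(zeroth_order_2023)  # 150
print(first_order_2023)   # 162.5
```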
Read Base rates and Reference Classes (Steinhardt and Denain, 2021). You don't need to understand the last section, on base rates for events that haven't happened; we recommend skimming it to get the main idea. Please work through the exercises as well.
☀️We can use Laplace's rule from this article to solve the sunrise problem!
P(the event happens on the next trial, given it has never happened in n trials) = 1/(n + 2)
n (sunrises so far) = 4.5 × 10^9 × 365.25 ≈ 1.64 × 10^12 (age of Earth in years × sunrises per year)
That leaves us with P ≈ 6.08 × 10^-13 (0.00000000000060841129) that the sun does not rise tomorrow☀️
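As a quick check of the arithmetic above, here is a minimal Python sketch of the same Laplace-rule calculation (using the rough age-of-Earth figure from the aside):

```python
# Laplace's rule: if an event has never happened in n trials,
# the probability it happens on the next trial is 1 / (n + 2).
age_of_earth_years = 4.5e9
sunrises_so_far = age_of_earth_years * 365.25  # ~1.64e12 trials

p_no_sunrise_tomorrow = 1 / (sunrises_so_far + 2)
print(p_no_sunrise_tomorrow)  # ~6.08e-13
```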
Bayes' theorem and rule
Bayesian reasoning gets thrown around as a catchphrase in some communities, but it rests on quite specific mathematical concepts. At a very rough level, it says that we can better estimate the probability of a hypothesis by starting from a prior probability based on existing evidence and adjusting that probability correctly as new evidence comes in. We are simplifying heavily, but the basic idea is this: your forecasts should reflect both the evidence you already have and how new evidence shifts it. The readings below do a much better job of clarifying this and helping with the application.
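To make the update rule concrete, here is a minimal Python sketch (the numbers are made up for illustration). It applies Bayes' rule, P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|not H)P(not H)], to a case where a low prior meets evidence pointing the other way.

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) via Bayes' rule."""
    numerator = p_e_given_h * prior
    denominator = numerator + p_e_given_not_h * (1 - prior)
    return numerator / denominator

# Illustrative (made-up) numbers: a 1% prior on the hypothesis, and evidence
# observed 90% of the time when H is true but only 5% of the time when it is false.
posterior = bayes_update(prior=0.01, p_e_given_h=0.90, p_e_given_not_h=0.05)
print(f"Posterior after one observation: {posterior:.3f}")  # ~0.154
```

Note how the posterior rises from 1% to only about 15%: even fairly strong evidence does not overwhelm a very low prior after a single update.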
Task 3
Either watch Bayes theorem (3Blue1Brown, 2019) or read High-speed intro to Bayes' rule (Arbital, n.d.) (or both, though that will take more time). The goals are to understand how to mathematically update a prior probability given quantitative evidence, and to become familiar with cases where the prior probability points in one direction while the evidence points the other way. These cases are confusing even to experts!
If you are unsure about probability and mathematics, we recommend reading the article and going over everything there slowly. That could take longer than 30 minutes, but that's okay! If you see that 30 minutes are up, find a good spot to pause and get back to it later today if you have the time.
If you are well-versed in the mathematics of probability, we’d like you to focus more on the demonstrated applications, to make sure you pick up the intuitions correctly and can recreate all of the deductions.
Task 4
Read Prioritizing Information (Steinhardt, 2021) and From Considerations to Probabilities (Steinhardt, 2021). These articles go through the full process of making a forecast. Go over them slowly, notice where your assessments may differ, and try to understand why he does what he does. It can get pretty technical, so feel free to ask questions and skip parts, returning to them if you have time. You can also spend more time here to understand it thoroughly and skip the next calibration training session if you think that’s preferable for you. Also, there's at least one math typo for you to discover :)
🏪A nice addition to this toolkit is the rule of 70, for converting easily between growth rates and doubling times:
doubling time ≈ 70 / (growth rate in % per period)
e.g., an infection growing at 5% per week means the infected population will double in about 14 weeks.
https://populationeducation.org/what-doubling-time-and-how-it-calculated/🏪
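As a quick sanity check on the rule of 70, here is a minimal Python sketch comparing the approximation with the exact doubling time for compound growth, using the 5%-per-week example above:

```python
import math

def doubling_time_rule_of_70(growth_rate_percent: float) -> float:
    """Approximate doubling time (in periods) via the rule of 70."""
    return 70 / growth_rate_percent

def doubling_time_exact(growth_rate_percent: float) -> float:
    """Exact doubling time for compound growth: solve (1 + r)^t = 2 for t."""
    r = growth_rate_percent / 100
    return math.log(2) / math.log(1 + r)

print(doubling_time_rule_of_70(5))           # 14.0 weeks
print(round(doubling_time_exact(5), 1))      # ~14.2 weeks
```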
Most of the following advice comes from Tips on Doing Impactful Research: A Collection (Hultsch and Lutz, 2020).
Critical thinking
“Question your assumptions.
Frequently imagine what someone you respect would say if they thought your argument was wrong, or try to make the best argument against what you are currently thinking.
Write down your views and check against your old views to see when you were wrong and when you were right. This can give you a feeling for when you were too confident in your views and the other way around.
Listen to yourself if something seems troubling, and try articulating, exploring, and steel-manning that intuition in multiple ways until it makes sense in a way that can be integrated with other knowledge (with whatever updates/revisions follow) or goes away.
Be aware that anyone, including people within communities you have some affiliation with, may do sloppy reasoning/research sometimes. Is the argument supported at every point by evidence? Do all the pieces of evidence build on each other to produce a sound conclusion?
Pay attention to the use of contradictory epistemic standards and premises on different arguments/patterns. Reconcile them or adjust your confidence in them.
Look for implicit assumptions and make them explicit.
For all arguments you want to make, either develop each argument until it makes sense and fits into what you aim to achieve or leave it out for now. Vague connections will only distract the reader.” (Hultsch and Lutz 2020, p.10)
Failing fast
“Fail fast is a philosophy that values extensive testing and incremental development to determine whether an idea has value.
Think about how your idea/research may fail in order to detect weaknesses.
Failing not only fast but considering all failure modes might help to avoid pitfalls: Murphyjitsu is the practice of strengthening plans by repeatedly envisioning and defending against failure modes until you would be shocked to see it fail. This post in the LessWrong Forum gives some guidance on how to use it productively.
Get feedback early on, even if it would mean e.g. choosing a different research method. Discuss your ideas with friends, and write your ideas up in emails or blog posts to get feedback from people.
If you reach out to people directly, try to make it especially easy for them to give feedback by stating your question clearly and saying what they can do quickly that would be useful to you, for example by saying “X is probably wrong, what do you think?” or asking “have I explained X clearly?”
It is important to get feedback early because it will be more demotivating to receive negative feedback and much harder to incorporate feedback or change direction if you have spent a month or more working on something. If you have spent 1.5 weeks researching and writing something, it’s probably worth sharing with someone.” (Hultsch and Lutz 2020, p.11)
Keep reflecting on the research process
“From a workshop by Alex Lintz: Research processes develop in the dark; we rarely learn about how other people do research. Make space for that! Bring a group of people together and ask them about their methodologies and research processes, such as how they conduct literature reviews. In this post Alex also discusses practicing research methods with others to refine techniques and learn from others.
It can also be useful to spend time at the end of each day thinking and possibly journaling about what went well that day, what you want to adjust and what you want to tackle tomorrow.” (Hultsch and Lutz 2020, p.12)
Always reach conclusions
“Remember that the main goal of research is to find answers and, to do that, we must reach conclusions we can work with further.
How to avoid drawing no, wrong, or misleading conclusions? Add some or a few of the following to your conclusion:
Distinguish between what is subjective and objective and be transparent about how you combine it in your final judgment. This way, people can understand how you got there.
State your confidence and epistemic status (How sure are you about your results/conclusion? How much time have you put in? What else should people know to not place too much/too little weight on your work?)
State in which direction you have updated.
Present a range of plausible conclusions and list crucial considerations.” (Hultsch and Lutz 2020, p.13)
Calibration
Now that we have looked into best practices in estimation, it is worth introducing the third core concept for this session: calibration. Calibration is a property of someone’s estimates, forecasts, and guesses: roughly, it measures how well their stated confidence matches how often they turn out to be right.
Calibration is the “consistency between the distributional forecasts and the observations and is a joint property of the predictions and the observed values” (Gneiting et al., 2007, p. 246)
We will now practice making estimates that match our intuitions about a given topic, with some exercises and games. Hopefully, you will keep coming back to this and continue with the further readings and self-guided learning; you never stop training this skill.
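To see what calibration means in practice, here is a minimal Python sketch (with made-up data) that checks how often a set of 90% confidence intervals actually contained the true value. A well-calibrated forecaster's 90% intervals should contain the truth about 90% of the time.

```python
# Each record is (lower bound, upper bound, true value) for one 90% CI estimate.
# The numbers are made up purely for illustration.
estimates = [
    (10, 50, 42),
    (100, 300, 350),
    (0.5, 2.0, 1.1),
    (1900, 1950, 1939),
    (5, 15, 4),
]

hits = sum(1 for low, high, truth in estimates if low <= truth <= high)
hit_rate = hits / len(estimates)
print(f"{hits}/{len(estimates)} intervals contained the truth ({hit_rate:.0%}).")
# Well below 90% -> intervals too narrow (overconfident); well above -> too wide.
```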
Task 5
Carry out this calibration training game (~45 minutes)
Go to the link.
Log in using a Google account.
Ensure that your deck selection contains Animal Welfare, Global Poverty, and The World Then and Now (for a total of 96 questions).
Choose the 90% confidence interval.
Start making estimates, aiming for your lower and upper bounds to be roughly your 5th and 95th percentiles. You should feel that you'd be as surprised to learn the result is above your upper bound as you'd be to roll a 1 on a 20-sided die.
Make these estimates without googling any information; the goal is not to have the narrowest interval but to be correct on each question with a probability of 90%. You can visit the charts page to see how you are doing.
You can try to make a simple model and use a calculator, but that's not needed.
Aim for about a minute per question, but feel free to go faster or slower as you think is beneficial. You are not supposed to complete all the questions.
Later, we will have another calibration session with a 50% confidence interval, trying to complete as much of the deck as possible.
Task 6
Carry out this calibration training game (~45 minutes)
Continue with the calibration training; choose the 50% CI this time.
Try to make the lower/upper estimates your 25th and 75th percentiles.
Make sure to learn from your previous attempts, making the intervals larger or smaller as needed. We are slowly getting used to “what 50% / 25% / 10% feels like”.
You can also choose to return to the 90% CI, but it’s good to have some variety when learning skills.
If you have somehow finished the deck, you can add additional decks or use Open Philanthropy's program.
There is some evidence that forecasting and good judgment are trainable skills, yet we do not expect one short day session to be the be-all and end-all. We are not superforecasters yet! The following list from Open Philanthropy (n.d.) shows some of the things they do to improve these skills:
Continue to play calibration training games (here, here, and here)
“Train probabilistic reasoning: In one especially compelling study (Chang et al. 2016), a single hour of training in probabilistic reasoning noticeably improved forecasting accuracy. Similar training has improved judgmental accuracy in some earlier studies, and is sometimes included in calibration training.
Incentivize accuracy: In many domains, incentives for accuracy are overwhelmed by stronger incentives for other things, such as incentives for appearing confident, being entertaining, or signaling group loyalty. Some studies suggest that accuracy can be improved merely by providing sufficiently strong incentives for accuracy such as money or the approval of peers.
Think of alternatives: Some studies suggest that judgmental accuracy can be improved by prompting subjects to consider alternate hypotheses.
Decompose the problem: Another common recommendation is to break each problem into easier-to-estimate sub-problems.
Combine multiple judgments: Often, a weighted (and sometimes “extremized”) combination of multiple subjects’ judgments outperforms the judgments of any one person.
Correlates of judgmental accuracy: According to some of the most compelling studies on forecasting accuracy I’ve seen, correlates of good forecasting ability include “thinking like a fox” (i.e. eschewing grand theories for attention to lots of messy details), strong domain knowledge, general cognitive ability, and high scores on “need for cognition,” “actively open-minded thinking,” and “cognitive reflection” scales.
Prediction markets: I’ve seen it argued, and I find it intuitive that an organization might improve forecasting accuracy by using prediction markets. However, I haven’t studied their performance yet.
Learn a lot about the phenomena you want to forecast: This one probably sounds obvious, but I think it’s important to flag, to avoid leaving the impression that forecasting ability is more cross-domain/generalizable than it is. Several studies suggest that accuracy can be boosted by having (or acquiring) domain expertise. A commonly-held hypothesis, which I find intuitively plausible, is that calibration training is especially helpful for improving calibration, and that domain expertise is helpful for improving resolution.” (para. 19)
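One common way to combine forecasts, mentioned in the list above, is to average them in log-odds space and then optionally push the pooled forecast away from 50% ("extremizing"). Here is a minimal Python sketch; the forecasts and the extremizing factor are illustrative choices, not recommended values.

```python
import math

def combine_forecasts(probs: list[float], extremize: float = 1.0) -> float:
    """Average forecasts in log-odds space, then apply an extremizing
    factor (>1 pushes the pooled forecast further from 50%)."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    pooled = extremize * sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-pooled))

forecasts = [0.6, 0.7, 0.65]  # three forecasters' probabilities (made up)
print(round(combine_forecasts(forecasts), 3))                 # ~0.651 (simple pool)
print(round(combine_forecasts(forecasts, extremize=1.5), 3))  # ~0.718 (pushed further from 0.5)
```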
The Question of Evidence (Clearer Thinking) – strong recommendation
Efforts to Improve the Accuracy of Our Judgments and Forecasts (Open Philanthropy, n.d.)
How much do you believe your results? (Neyman, 2023)
Common Probability Distributions (Steinhardt and Ding, 2021)
Sequence thinking vs. cluster thinking (Karnofsky, 2016) (or for a shortened version summarising the main points: My notes on: Sequence thinking vs. cluster thinking (Grilo, 2022))
Superforecasting: The Art and Science of Prediction (Tetlock and Gardner, 2016)
How a ragtag band of internet friends became the best at forecasting world events (Matthews, 2024)
In defence of epistemic modesty (Lewis, 2017)