Evidence reviews are an intrinsic part of virtually all research projects, whereby researchers seek to assess the relevant existing evidence pertaining to a research question. There is no catch-all answer to "good evidence," as this depends on the specific research question one is asking. However, there are conventions around the hierarchy of evidence for attributing cause-and-effect relationships.
An evidence review is a catch-all term for a somewhat structured process that reviews existing evidence in relation to a research question. As practitioners of impact-oriented research, we undertake evidence reviews as an intrinsic part of virtually all our research projects, as we need to know what existing research can tell us about the decisions we face.
Most of the time, we will be interested in understanding what the evidence for a specific causal relationship is. For example, when AIM researcher Vicky Cox looked into banning low-welfare animal product imports, she needed to assess whether a non-profit’s work was likely to cause a change in import policy (Cox, 2023). When AIM researcher James Che looked into participatory learning and action groups for maternal support, the main question he faced was whether putting these groups in place was likely to cause declines in maternal and neonatal mortality (Che, 2024).
Evidence reviews of intervention effectiveness are closely tied to the theory of change (ToC) for the intervention itself, as you use the ToC to understand the logic and assumptions sustaining the intervention and the evidence backing them.
Other times, we will be looking at evidence to try to gain a descriptive understanding of a topic. This may involve gauging the evidence on the burden of a specific problem, the barriers leading to an issue, or case studies of how a specific intervention works in practice. For instance, AIM researcher Morgan Fairless had to spend quite some time reviewing the evidence substantiating how big a problem snakebites were, given the lack of consensus (Fairless, 2023).
Evidence reviews are usually structured in steps:
Set a research question
Search for published evidence about the research question
Assess the studies collected
Draw conclusions based on the studies as evidence
What counts as good evidence is the nightmare question that keeps researchers up at night. The best answer is "it depends" – which can be a terribly useless (albeit often used) answer. However, we want you to take home the message that the quality of evidence depends, to a large extent, on the question you are asking. Do you want to learn how people feel about a certain disease plaguing a community? A participatory research exercise may give you the richest detail. Do you want to make a causal inference about the effectiveness of a vaccine? There is no question about it: a randomized trial, or a collection of them, will be your go-to. Quality of evidence is also relative and often practical: while randomized controlled trials may be the "gold standard" for causal inference, they are often impractical to run, or cannot be run because of ethical concerns. Sometimes it is straight-up impossible to randomize the treatment, and in those cases a quasi-experimental approximation may be the best possible evidence for a particular question.
Please keep this point about context and research questions in mind as you read this section. You will note that we delve mostly into the quality of evidence for causal attribution, which does have a relatively straightforward hierarchy of quality.
The hierarchy of evidence is a framework used in evidence-based medicine and social science research to assess the quality and reliability of different types of evidence for causal attribution.
It organizes various study designs based on their methodological rigor, potential for bias, and ability to provide reliable answers to causal attribution research questions. The hierarchy allows researchers and healthcare professionals to determine the strength of evidence supporting a particular intervention or treatment. The hierarchy typically consists of several levels, with higher levels representing stronger evidence.
Note that the hierarchy of evidence is a rough heuristic applied specifically to causal attribution. Different studies are better suited for specific research questions. You probably would not weigh a qualitative examination of focus group data very highly if you are trying to determine how a vaccine affects rates of disease, but you would probably value its insight in determining how people feel about getting their children vaccinated. Additionally, it is good practice to avoid crude applications of the hierarchy based on the sheer quantity of studies. One excellent RCT may be better than four badly designed ones, a quasi-experimental study may be more informative than a meta-analysis of observational studies, and so on.
While the specific levels may vary slightly depending on the source or field of study, a common hierarchy, from strongest to weakest evidence for causal attribution, includes the following:
Systematic reviews and meta-analyses of randomized controlled trials
Randomized controlled trials (RCTs)
Quasi-experimental studies (e.g., natural experiments, difference-in-differences)
Observational studies (cohort and case-control designs)
Cross-sectional studies, case series, and case reports
Expert opinion and anecdotal evidence
Core materials
Why is it so hard to know if you're helping? (Hoel, 2024) (~17 minutes)
How many people die from snakebites? (Dattani, 2023) (~12 minutes)
Here we discuss the four key steps involved in an evidence review. Sometimes you only have a couple of minutes or hours to conduct a review. Fret not: nobody expects you to run a fully systematic review in five minutes. Note that this is an ideal-type explanation; you can adapt and adjust the process to different research capacities as you see fit.
Evidence reviews follow a research question. Sometimes, these research questions are not explicitly stated, but there are good reasons to make research questions explicit:
Reasoning transparency (see section 1)
Helps focus the evidence-gathering process
Helps focus analysis onto responding to a concrete question, instead of exploring a topic
Good research questions are clear, focused, and concise. They are also researchable (i.e., they can be investigated and answered) (George Mason University, n.d.). Linking back to the previous week's work: a good way of defining research questions is to look at your ToC and the assumptions/uncertainties in it. Each uncertainty can typically be linked to (at least) one research question.
The next step is to gather the relevant studies pertinent to your question. This step can be done to different standards depending on how much time is available to the researcher. Adjusting the research process you use to the amount of time and capacity you have available is a key skill in applied research.
The ideal-type evidence-gathering process involves something close to a systematic literature review, where a search protocol is developed and executed with the aim of finding all pertinent literature on a subject, which is then combed through for applicable studies. Executing a search like this is highly transparent (because you can communicate your search protocol) and minimizes the risk of missing important studies.
Most of the time, you only have limited time and need to move on to analysis quickly. In those cases, searching first for studies that summarize the evidence on a question (such as systematic reviews) is helpful, and your search for relevant literature may be less structured.
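Whatever the level of structure, the search itself usually boils down to combining synonyms with OR and distinct concepts with AND. The helper below sketches this; the search terms are illustrative examples (drawn from the maternal-health case above), not a validated protocol.

```python
# Sketch: building a Boolean search string from a research question.
# The terms are illustrative, not a validated search protocol.

def build_query(concept_groups):
    """Join synonyms with OR, then join concept groups with AND."""
    groups = ["(" + " OR ".join(f'"{t}"' for t in terms) + ")"
              for terms in concept_groups]
    return " AND ".join(groups)

query = build_query([
    ["participatory learning and action", "women's groups"],  # intervention
    ["maternal mortality", "neonatal mortality"],             # outcome
])
print(query)
```

Writing the query out this way also makes the search reproducible: you can paste the exact string into your write-up so others can rerun it.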
The steps for gathering literature on a question can be summarized as follows:
Think of search terms pertinent to the research question
Identify where you will search for published literature (Elicit can help with this search process)
We also like to check grey literature published by a few organizations that do similar research to ours, such as GiveWell, Rethink Priorities, Open Philanthropy, Animal Charity Evaluators, Happier Lives Institute, and Founders Pledge.
Use a spreadsheet or literature review assistance software (https://www.rayyan.ai/; https://sr-accelerator.com/#/; CADIMA) to collect all relevant data from the studies (perhaps Author, Year, DOI/URL, Abstract, Method)
Use some form of decision-making process to decide which papers you will review (such as prioritizing the strongest methodology for answering the question, excluding studies published before a certain year, etc.)
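That last screening step can be made explicit in your spreadsheet or in a few lines of code. The sketch below is one possible decision rule, assuming hypothetical studies, a method ranking, and a year cutoff that you would choose for your own review.

```python
# Sketch: one way to prioritize collected studies for review.
# Method ranks and the year cutoff are illustrative choices, not fixed rules.

METHOD_RANK = {          # lower = stronger for causal attribution
    "meta-analysis": 0,
    "rct": 1,
    "quasi-experimental": 2,
    "observational": 3,
}

studies = [              # hypothetical entries from the gathering step
    {"title": "Study A", "method": "observational", "year": 2015},
    {"title": "Study B", "method": "rct", "year": 2021},
    {"title": "Study C", "method": "meta-analysis", "year": 2019},
    {"title": "Study D", "method": "rct", "year": 2008},
]

YEAR_CUTOFF = 2010       # drop studies older than this
shortlist = sorted(
    (s for s in studies if s["year"] >= YEAR_CUTOFF),
    key=lambda s: (METHOD_RANK[s["method"]], -s["year"]),
)
for s in shortlist:
    print(s["title"])
```

Making the rule explicit like this keeps the screening transparent: anyone can see why a study was included, excluded, or reviewed first.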
Once you have identified which studies you will review and in what relative order, you can proceed with the analysis. We suggest creating separate sheets in your spreadsheet for different types of evidence. The specific columns you use will vary slightly depending on the review you are conducting, but some usual criteria to collect information are:
Title of paper, authors, and year of publication
Method
Details on the intervention design
Study period
Location of study
Context notes
Outcome variable
Control or comparison group
Study size
Description of the population
How the effect size is measured
Follow-up period between end of intervention and main outcome
Effect size of treatment/change
Baseline
P-value
Confidence interval
Statistical power
Was the study pre-registered?
Do you perceive a risk of researcher or funder bias?
Comments on external validity to your research question/context
We discuss how to evaluate different study designs in section 3.4.2.
Once you have collected and analyzed your evidence, you can draw conclusions and write up your process and answers. After you have collected all the information in your spreadsheet, you can summarize your findings in your report and put an overall credence score on your conclusion and on the evidence base for the intervention you are looking at. Of course, drawing conclusions about the evidence base of an intervention is somewhat more complicated than counting the number of RCTs that exist for it. A more nuanced approach to interpreting and comparing the evidence bases of different interventions is discussed in the core material.
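One standard quantitative step when several comparable studies report the same outcome is inverse-variance (fixed-effect) pooling: each study's effect is weighted by the inverse of its squared standard error. The numbers below are made up for illustration, and note that fixed-effect pooling assumes the studies estimate a single common effect; when study contexts differ substantially, a random-effects model is usually preferred.

```python
# Sketch: fixed-effect (inverse-variance) pooling of effect sizes.
# Effect sizes and standard errors below are made up for illustration.
import math

effects = [(-0.20, 0.10), (-0.10, 0.08), (-0.15, 0.12)]  # (estimate, SE)

weights = [1 / se**2 for _, se in effects]               # w_i = 1 / SE_i^2
pooled = sum(w * est for (est, _), w in zip(effects, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

print(f"pooled effect = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Notice how the pooled standard error is smaller than any single study's: combining consistent studies narrows the confidence interval, which is part of why meta-analyses sit high in the hierarchy for causal attribution.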
Core materials
Comparing two bodies of evidence (Falk and Hausen, 2023) (video, ~13 minutes)
Does X cause Y? An in-depth evidence review (Karnofsky, 2021)
Database Search Tips: Overview (MIT, n.d.)
Further materials
Guidance on Conducting a Systematic Literature Review (Xiao and Watson, 2017 - ALT LINK) (~60 minutes)
Cochrane Handbook for Systematic Reviews of Interventions (Higgins and Thomas, 2023) (click through handbook, very long but you can pick and choose)
Click here for a breakdown of different methodologies and interpretation guidelines.
Practice project and samples in our full PDF version.