The aim of our evaluations is to rigorously evaluate a prospective non-profit intervention idea against our evaluative criteria.
Stage 5 deep dive reports aim to examine an idea's prospects rigorously. We are no longer trying to refine a list of ideas; we strive to equip decision-makers with as much relevant information as possible to decide whether to recommend an idea. The outputs of this stage are a deep dive report and a summary of key information input into a decision-making spreadsheet.
We have two main prioritized types of readers for our reports:
Decision-makers at AIM: Senior staff, researchers, and occasionally invited experts, who will gather information through the reports and vote/debate the merits of each idea.
CEIP incubatees: Individuals who will read recommended reports to understand the merits of a recommended non-profit intervention to prioritize it among others.
Our approach to these reports is flexible, responding to the specific research requirements of the research round and management needs. Each research tool/method and question we aim to address in our research aligns with the evaluative criteria discussed here.
The standard report template can be found here. Round leads can amend the template as needed. Our style guide is also an important reference document for this stage.
Before starting work across many reports, the round lead should ensure that the evaluation criteria and rubric are still relevant and useful.
Work based on a coherent rubric and shared understanding of our priorities. One way to facilitate this is to conduct some early-stage work on the rubrics used for the final decision-making spreadsheet. Researchers should understand the rubric and use its language across their reports to clarify meaning (similar to how we should use the same words for verbal estimates of probability).
Report leads will be assigned to each report. Depending on the research round's staffing, leads may work with other researchers or individually. The round lead will assign reports based on the prioritization exercise undertaken in stage 4.
As always, report leads should stop working on reports (in consultation with the round lead and research director) if they identify a critical reason to do so. If an idea no longer looks promising to the report lead, they should trigger a discussion process with the leads to debate the merits of continued work on the idea.
Have regular reviews and check-ins. Don’t work without a reviewer; rely on your advisor/manager/reviewer.
Each report should have a secondary reviewer. The author and reviewer should meet regularly (e.g., one short weekly meeting) to troubleshoot, check progress, and review.
After each report is finished at stage 5, we will hold a meeting so researchers can present their progress and ideas to the team and executive members. This aims to get some earlier-stage views from final decision-makers and identify any concerns with an idea early on.
We aim to get at least one reviewer external to AIM who can read over and give feedback.
Below is a step-by-step guide to the process we usually use for our evaluations.
Note that some alternative processes to the below route could be:
CEA first. Start with a 15h CEA and then build the report after that. This might be best where cost-effectiveness is the key reason an idea might fail.
Critical uncertainty by critical uncertainty. List out the key uncertainties and work through them one at a time. This might work best for complex ideas with many steps in the theory of change.
If the researchers have not researched this same idea at earlier stages, they should, at minimum, read through the previous research stages.
If one person did the Stage 4 research, they may want to write up a background reading list for others who want to do deep diving on the same topic.
Write up your priors and uncertainties and an initial informed consideration at this stage.
Validate some of the main key information from the previous stages and provide an overview of the problem. For example, we summarize how many DALYs are lost to it globally, what some of the countries with the highest burden are, etc.
“I personally spend often ~1 day in the beginning just reading stuff. It really helps me get my head around the topic and make sure that I know what kind of literature is out there and what people think, as well as identify additional important considerations I may have missed before. It also makes the subsequent steps (idea mapping and ToC) easier and more confident (i.e., I feel less like I'm making stuff up based on limited knowledge).” Filip Murár, Senior Research Manager
Mapping out an idea could be useful.
There are two or three main types of mapping that could be useful at this stage. The number of different types of mapping that you will need to do for your intervention will mostly depend on how specific your intervention currently is, where the more specific, the less mapping you will need to do. Alternatively, you might think that we’ve zoomed into something too specific too quickly so you might want to take a step back to make sure we’ve not missed anything.
Broad approach mapping: For example, we could do something similar to this policy advocacy vs. country advocacy mapping that we did for road traffic safety but for something like “package of interventions” vs. “single intervention” or “community-based” vs. “school-based.”
Process mapping: Identify a range of key options by taking a step back and thinking about the problem you are trying to solve as a whole process, for example, with a brainstorming exercise or a process map (Examples: road safety, pharmaceuticals).
The ToC section is structured in the report template. The minimal requirement here is to depict the ToC and state the assumptions underlying it. The most critical assumptions will likely require treatment in the evidence review section.
Use the ToC to understand what implementation considerations you should research in more depth. We should investigate implementation matters and use that research to form a perspective view on tractability. You are not, however, designing a full intervention, so it's okay to leave things unresolved.
Note: One standardised example we have used for policy behaviour change interventions is COM-B, which is helpful to look at how actors' capability, opportunity and motivation changes can lead to behaviour changes for top options.
For most interventions, we can break down the question into the following key parts to be evaluated separately:
Can a charity effectively deliver this intervention? For example, can a new charity successfully train health workers/TBAs on the benefits of KMC so that it is regularly delivered through healthcare facilities?
Will the intervention have the expected impact? E.g., does KMC actually reduce neonatal mortality? How does this compare to conventional neonatal care?
Another way of structuring ERs is by step in the ToC, which is also encouraged.
You may have identified other critical uncertainties besides the two main questions above. For example, if the intervention may cause harm, the risk should be fully understood and explored. Base these questions on your uncertainties from the ToC exercise: use the ToC to identify crucial considerations, assumptions, and knowledge gaps that need more research, then research them.
Evidence gathering: We try to gather evidence as systematically as possible while trading this off with efficiency. We strongly recommend using a spreadsheet, and reviewing all potential references, including and excluding sources as per predetermined criteria.
Evidence types:
“Traditional” evidence from published and grey literature and case studies
If there are only 1-2 key papers that we're strongly relying on, read those papers in detail and check their assumptions and weaknesses
Theory-based evidence: Consider the arguments and strategic case for this intervention, such as: What is the case for this within the scope of global health? Why do governments not do this and why are advocates not already working on this? What is the best case against this?
Timing issues: Would this happen anyway? Would some known org expand to fill the gaps over the next few years if a CE charity does not? What are the general trends? Does the expected positive effect last or does it wane with time?
Comparing a many-weak-arguments steelman of the case for the intervention with a many-weak-arguments steelman of the case against the intervention
Is the theory of change long and complicated or quick and simple? How does this affect the likelihood of success?
Reference class thinking: Sometimes, thinking about an applicable reference class can help with forecasting. You may want to spend some time exploring what could be appropriate reference classes for this idea and conducting some research on them to use as forecasting inputs.
E.g., Using ~3 reference classes, going from narrowest (matching the intervention idea) to widest. For example, (1) the effect of SMS reminders on ANC utilization, (2) the effect of reminders on any healthcare utilization in LMICs, and (3) the effect of reminders on any healthcare utilization anywhere in the world.
Expert commentary
Expert interviews are part of our evidence-gathering and analysis. Expert insight is useful in many aspects of our research. We consider insight as evidence in evidence review and other aspects of our process, and also as information regarding the level of support experts and other stakeholders would have for such an endeavor.
Contact experts early in this process; they take time to reply.
Start thinking about which experts you want to reach out to at the start of the report to get the ball rolling.
Prep questions before the interview. Ensure the questions you ask are tailored to the ToC and that you have plans for what to probe or dig deeper into during the interviews.
“I commit to sending experts the questions a day earlier. This external commitment makes me accountable to prepare accordingly for interviews. I find that a prepped for meeting is infinitely more valuable than an unguided one.” Morgan Fairless, Research Director
Make sure that you ask the expert whether they would prefer to be anonymous and what level of sharing permission they would be comfortable with. The template provides more guidance on this.
We are hoping to talk to ~5 (range of 3-7) experts for each stage 5 report, but in general, the more experts, the better, as long as you feel like you are not hitting diminishing returns.
Generally, we want to speak with people who work in similar organizations on similar interventions, especially in our top countries or similar countries. It would be great if you could find someone who has a good overview of the space, e.g., from a coordination group on a topic, like WHO’s coordination group on maternal syphilis. GiveWell may also have a good overview. It may also be promising to speak with the authors of the most promising evidence you have found, to get a better sense of the conditions under which their studies were conducted and how well those conditions might match the real work.
At this stage, we should ask experts who else we should talk to and follow their recommendations when relevant (it would be great if we could get them to introduce us to their suggested contacts!).
Check our knowledge management database for relevant interviews and interviewees we may have engaged with already (Interviews database)
Keep track of the experts you speak with in a spreadsheet and cross-reference who you contact.
Troubleshooting: not getting enough expert contacts/responses
Talk to the rest of the team/office. Maybe someone knows someone.
You can use LinkedIn to check this as well. Get other team members to look for keywords and relevant orgs on LinkedIn.
Check our knowledge management database for relevant interviews and interviewees we may have engaged with already (Interviews database)
Don’t be afraid to nudge experts. Be open to asking them questions over email or in a Google Doc rather than on a call. Ask experts to put you in touch with other experts, etc.
See the ARP module on WFMs to explore the basics of Geographic WFMs. These WFMs are particularly influential on incubatees, so care must be taken to balance prescriptiveness and openness to different locations.
A strong WFM should narrow down the list of locations to those where the intervention would be most cost-effective.
Best practice recommendations:
Think carefully about the addressable population. Often, it’s not total population that matters but the addressable population (e.g., mobile phone owners or radio listeners).
Take care not to over-complicate: a simpler model is always preferred to an overcomplicated one. Avoid using more than one measure of the same thing (e.g., using both DALYs and death rates).
It is important to consider reasoning transparency when presenting a Geo Assessment. Incubatees have said they find it difficult to understand our assessments and what went into decision-making. Consider filling out the first tab with indicator descriptions, as well as writing down your reasoning behind weight and indicator choices.
Consider how you are manipulating results - see for instance guidance on the type of average you can use, or whether you should be log-transforming data.
It may be helpful to establish a view on tractability before conducting any specific WFMs. For instance, you can use our tractability data spreadsheet to rule out a bottom list of intractable countries.
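As a rough illustration of the recommendations above (addressable population, log-transforming skewed indicators, and transparent weights), a geographic WFM could be sketched as follows. All country names, indicator values, and weights are hypothetical placeholders, and min-max scaling with a weighted arithmetic mean is only one assumed choice among several reasonable ones, not a prescribed method.

```python
import math

# Hypothetical inputs: every value below is an illustrative placeholder, not real data.
countries = {
    "Country A": {"addressable_pop": 12_000_000, "burden_rate": 40.0, "tractability": 0.7},
    "Country B": {"addressable_pop": 900_000, "burden_rate": 65.0, "tractability": 0.5},
    "Country C": {"addressable_pop": 150_000_000, "burden_rate": 22.0, "tractability": 0.3},
}

# Assumed weights (they sum to 1). Write down the reasoning behind each one.
weights = {"addressable_pop": 0.4, "burden_rate": 0.4, "tractability": 0.2}

# Indicators spanning orders of magnitude are log-transformed (an assumption)
# so that a single very large country does not dominate the scaled scores.
log_indicators = {"addressable_pop"}


def min_max(values):
    """Scale values to [0, 1]; return 0.5 for all if they are identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_countries(countries, weights, log_indicators):
    """Return a weighted sum of scaled indicators for each country."""
    names = list(countries)
    scores = {name: 0.0 for name in names}
    for indicator, weight in weights.items():
        raw = [countries[name][indicator] for name in names]
        if indicator in log_indicators:
            raw = [math.log10(v) for v in raw]
        for name, scaled in zip(names, min_max(raw)):
            scores[name] += weight * scaled
    return scores


scores = score_countries(countries, weights, log_indicators)
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.2f}")
```

Keeping the weights and the list of log-transformed indicators as named, visible inputs makes it easier to explain to incubatees what drove the ranking.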
The Geographic WFM section often includes a sub-section detailing the actors we found.
Look for actors working on this intervention and where they are working, so we can deprioritize countries where actors are already working and avoid duplicating efforts. Consider charities, funders, international bodies (WHO, etc.), and others.
Using generative AI tools for this could be helpful.
Asking experts may also prove helpful.
Sometimes, organizations get a membership to a working group or association of like-minded organizations. These associations/working groups are a good one-stop shop for identifying major players working on a topic.
Key tool: use the CE's crowdsourced notes on countries (for qualitative geo selection)
Try to answer specific questions that are relevant for this decision but are not easily findable in big cross-country databases, and so would have taken too long to answer in your initial high-level WFM of all countries. Examples include how many checks health workers already perform at ANCs and the typical duration of an ANC visit (to get a sense of how many interventions you can package here), as well as stock-outs, procurement processes, and whether products are registered for use or medications are listed as essential medicines.
(Optional) 0–4h: Identify cross-country evidence of impact. Look at how the problem has changed in relevant countries, identify trends, identify how much the charity sector played a role, etc. (maybe better to leave to later). Maybe this is something that is slowly getting better over time, e.g., as the government prioritizes it more and so we can expect the burden to decrease and, therefore, the counterfactual impact to be lower.
Outputs:
These outputs will primarily be used in other stages of the process, e.g., to inform which experts you want to speak with (targeting experts from your most promising countries), to decide which country to model in the CEA, and to get a sense of scalability. The charities can take this geographic assessment and the most promising countries into account when building their own geographic assessments. The top country they work in will ultimately be decided by scoping visits.
A quantitative mapping of the problem, opportunities, and potential impact by country.
An initial list of the most promising countries
A weakly held view on which countries are most promising
We try to create simple and elegant CEAs. Over-cooked CEAs are wrong more often than simple ones. We recognize that our CEAs have wide error bars and design them with this in mind.
Report rounded figures (2 significant figures). Reporting decimals or false precision gives a sense of narrowness that our estimates do not have.
Explain your choices, source your inputs.
If you’re modeling the burden of disease, consult this GiveWell guidance on which sources (beyond GBD) to use.
A reviewer should create a parallel, independent CEA/BOTEC to create a point of comparison in terms of approaches.
If multiple staff are doing CEAs, we will need to make sure the same methodology and assumptions are used (e.g., discount rates, overhead costs).
We strongly recommend modeling several countries (for instance, the top 10 countries in the GWFM) using the country data tab.
Follow this algorithm for converting prices across geographies and time (guidance adjusted from JPAL here).
Gather cost data
(If not in USD) Convert to USD using the annual exchange rate for the year of the cost data (e.g., if the cost data is from 2004, use the 2004 exchange rate) - https://data.imf.org/en/datasets/IMF.STA:ER
Use a US GDP Deflator to bring prices to 2023 USD (more guidance on this coming)
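The conversion steps above can be sketched as a small calculation. This is a minimal illustration, not official tooling: the function names are hypothetical, and the exchange rate and deflator values are made-up placeholders that must be replaced with figures looked up from the IMF exchange rate data and a US GDP deflator series.

```python
def to_usd(cost_local, lcu_per_usd):
    """Step 2: convert a local-currency cost to USD using the annual
    exchange rate (local currency units per USD) for the cost year."""
    return cost_local / lcu_per_usd


def to_target_year_usd(cost_usd, deflator_cost_year, deflator_target_year):
    """Step 3: rescale nominal USD from the cost year to target-year USD
    using the ratio of US GDP deflator index values."""
    return cost_usd * deflator_target_year / deflator_cost_year


# Placeholder inputs only: NOT real exchange rates or deflator values.
cost_local_2004 = 45_000.0   # step 1: cost as reported, in local currency, 2004
lcu_per_usd_2004 = 45.0      # hypothetical annual exchange rate for 2004
deflator_2004 = 80.0         # hypothetical US GDP deflator index, 2004
deflator_2023 = 120.0        # hypothetical US GDP deflator index, 2023

cost_usd_2004 = to_usd(cost_local_2004, lcu_per_usd_2004)
cost_usd_2023 = to_target_year_usd(cost_usd_2004, deflator_2004, deflator_2023)

print(f"2004 USD: {cost_usd_2004:,.0f}")
print(f"2023 USD: {cost_usd_2023:,.0f}")
```

Converting to USD first and deflating second keeps the order of the steps listed above; noting the source and year of each rate next to the number makes the CEA easier to review.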
Roughly speaking, we should spend similar amounts of time on impacts and on costs. If your costing section took 10 minutes but you spent 5 days on impacts, go back to your costing section and refine.
Pick up the phone and call manufacturers.
For wage data, please use https://ilostat.ilo.org/data/ for the closest occupation or sector.
Report costs and benefits over a 30-year horizon. If you think there is a strong reason not to do this, coordinate with your round lead.
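As a hedged sketch of two of the points above (a 30-year horizon and reporting to 2 significant figures), the calculation might look like the following. The discount rate, annual cost, annual DALYs averted, the two-year ramp-up, and the convention that year 0 is undiscounted are all hypothetical assumptions for illustration; agree the real values with your round lead so they are consistent across CEAs.

```python
import math


def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)


def present_value(annual_values, discount_rate):
    """Sum a stream of yearly values, discounting year t by (1 + rate) ** t.
    Year 0 is left undiscounted (an assumed convention)."""
    return sum(v / (1 + discount_rate) ** t for t, v in enumerate(annual_values))


HORIZON_YEARS = 30
DISCOUNT_RATE = 0.04  # hypothetical placeholder, not an official rate

# Hypothetical profile: costs start immediately, benefits after a 2-year ramp-up.
annual_costs = [250_000.0] * HORIZON_YEARS
annual_dalys = [0.0, 0.0] + [3_000.0] * (HORIZON_YEARS - 2)

pv_costs = present_value(annual_costs, DISCOUNT_RATE)
pv_dalys = present_value(annual_dalys, DISCOUNT_RATE)
cost_per_daly = pv_costs / pv_dalys

print(f"Cost per DALY averted: ${round_sig(cost_per_daly):,.0f}")
```

Rounding only at the reporting step, rather than on intermediate values, avoids compounding rounding error while still signalling the width of the estimate.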
Describe what you think working on this idea will look like. The point is to identify what the key inputs and outputs of the intervention look like in more detail than the ToC, which focuses on flow through effects. From there, you can do some secondary research on any critical factors that may be of concern.
Detail your level of concern on the following key factors:
Talent (ideal founders and key hires and how difficult we expect it to be to find these)
Access to information
Access to relevant stakeholders
Feedback loops
Funding
Complexity of scaling/ Scalability/ Scale of the problem
Neglectedness
Execution difficulty/ Tractability/ Paths to failure
Externalities
What are the biggest remaining uncertainties?
Also consider similarity to existing AIM charities (for internal use only) and answer the following questions:
Does an existing AIM charity already do this idea or a similar idea?
Would any current AIM charities (or adjacent) be interested in shifting to do this idea, either as part of their current plans or if this idea was recommended?
What are the cases for and against a new charity doing this idea vs. an incumbent charity taking it on?
Does the existing AIM charity give a clear “yes” to AIM recommending this for a new charity? If not, what are their concerns? (Note: the programs team thinks anything less than a clear “yes” should be taken quite seriously, and that researchers should bring in relevant directors to help manage the relationship as needed. If we’re going to recommend the idea without a clear “yes”, this needs to be communicated clearly and considerately to the incumbent org.)
How should we manage the relationship? Does the incumbent org agree to provide support to a new charity (i.e. engaging in the program, mentoring founders)?
Decision-making takes place across two meetings, called first and final. Before the first meeting, complete the decision-making spreadsheet. Participants should prepare for meetings by reading all papers and the decision-making spreadsheet. The first meeting is designed to prioritize discussion for final decision-making and spot any last-minute critical uncertainties. To that end, it is important to treat it as if it were a final instance of decision-making. During the first meeting, the team may decide to drop ideas, which will then not be considered at the final meeting. Between the first and the final decision meetings, you will do further research on the critical uncertainties raised in the first meeting. You will update the report and the decision-making spreadsheet in red text so that the updates are easy to identify. During the final decision-making meeting, the team will decide which ideas to recommend for incubation.
These reports capture information not included in the final report and give founders some direction towards actually getting a charity started.
They should not involve significantly more research. They are mostly a place to record extra thoughts that are not well formed enough for a public report, that come after the public report is written, or that are more about next steps than about deciding which idea to pursue.
They can be quite short, say 3 pages, although in practice they may often be quite long, as they might involve copying and pasting things that were written up but did not make it into the final report, or comments from experts after the reports were made public.
A template structure for the note might be as follows (0.5-2 days):
Documenting: How did this idea come about, any contextual information that isn’t included in reports.
Background reading list (30min-1h). If you wanted to get someone up to speed quickly with this intervention, what would be the most useful things for them to read?
List of experts and contact details (where we can share them) of experts to talk to and what you can/should reach out to them about. (30min)
How ambitious to be: sometimes we model CEAs very sceptically (e.g. takes 5 years for policy change). But we want incubatees to be ambitious (e.g. takes 1.5 years for policy change). So explain this.
Researchers' views on getting started on this idea: a list of (topic-relevant) decisions that founders might have to make and how to form views on them, including:
Any additional views on which countries to choose and how to go about making this decision (0-1h30min)
How to pilot this intervention (0-1h)
What the critical uncertainties are and how they could be solved if someone had a chunk of time to invest into them (1-3h)
How you might go about answering these uncertainties
What you would do during in-country scoping to help resolve these uncertainties
Who you might want to try and talk to to help resolve these uncertainties
etc.
Any critical implementation challenges and how they could be resolved or worked around if someone had a chunk of time to invest into them (0-1h30min)
How this intervention might be different from the other recommended ideas? (0-1h)
E.g., more focus on policy/working with the government, pain averting vs. under-5 lives saved, etc.
If this charity is not successful, what we think might have happened: most likely failure points (0-1h)
If you are struggling to think of this, ask yourself: “If this organization has shut down within 5 years, what do I think is the most likely reason why?”
Suggested first steps that the charity should take (15min-1.5h)
Any other views (0-30min)
E.g., further stressing something of particular importance/relevance from the report that could be missed
Any other information not captured in the final report: (0-1.5h)
Copying notes that did not make the final report, if they are useful (0-30min)
Anything that was controversial and maybe was toned down (0-30min)
Links to other CE internal research that might be relevant (0-30min)
Information that comes up after the report is written (ad hoc and ongoing)
The implementation notes are a dumping ground for all the information, feedback, comments, and connections we gather after the final report is written
Descriptions
What skills could be good and what they would be useful for?
What does it feel like / look like to do this?
What does M&E look like and when to stop?