Thursday, October 12, 2023 | 7:30 am - 4:30 pm
We are excited for another year of gathering to distribute knowledge in the research methods field!
All presenters will have approximately 15 minutes to present and 10 minutes for questions.
Panels and schedule are subject to change leading up to the conference. All presentations are in-person; there are no virtual sessions.
Agenda
7:30am - 8:30am | Breakfast
8:30am - 10:00am | 1A. Measuring Soft Skills: Socio-economic, Interpersonal, Personality, and Self-Awareness | Kellogg Global Hub Room 5301
Aletheia Donald, "Measuring Psychological Constructs in Low-Income Settings"
Abstract: Psychological constructs like goal-setting and self-efficacy are increasingly recognized as important for explaining human behavior. Yet reliable measures either don't exist or have only been validated in high-income countries. Our paper introduces 4 new scales, which we developed and validated across multiple parts of sub-Saharan Africa. The four scales are a new Goal-Setting Capacity scale, a Generalized Efficacy Livelihoods scale aimed at measuring efficacy applicable to general economic activities, an Agricultural Self-Efficacy scale aimed at capturing relevant constraints for farmers, especially women, and a new Locus of Control scale, which unlike prior scales, aims to distinguish who might be the powerful others within the external locus subscale.
Savanna Henderson, "The Predictive Validity of Soft Skills Measures: A Systematic Review"
Abstract: Evidence suggests that soft skills, such as interpersonal skills, personality traits, self-awareness, and personal initiative may impact future economic outcomes. Yet a lack of consensus on how to define skills and a dearth of longitudinal data present questions around which skills predict later outcomes and how to effectively measure them. In this systematic review, we attempt to address these questions by compiling and analyzing all the known evidence, including ongoing research and seven commissioned studies, on the predictive validity of soft skills measures on labor market outcomes. We summarize the existing evidence, identify gaps in the literature, provide advice for the field, and identify which skills and instruments predict labor market outcomes and should be prioritized for future use. After eliminating hundreds of candidate studies via title and abstract analysis, this review includes 41 studies published between 1990 and 2022, encompassing a number of soft skills and measurement types. While the goal is to apply lessons learned to low- and middle-income countries, this review is unconstrained geographically, in order to cast the widest net. In addition, this review includes lessons learned from seven parallel studies that were commissioned by Innovations for Poverty Action to better understand the predictive validity of various soft skill measures on longer-term economic empowerment outcomes. The studies were conducted in the United States by International Youth Foundation, Algeria by World Learning, Peru by the University of Toronto, Tanzania by the World Bank Gender Innovation Lab, Uganda by the University of California-Berkeley, and South Africa by Oxford University and the World Bank. Each study used different survey instruments and methods of data collection, but the research questions and statistical approaches were harmonized.
Smita Das and Vic Marsh, "Measuring Socio-emotional Skills: Results from 14 Skills in 5 Countries in Sub-Saharan Africa"
Abstract: Despite the importance of socio-emotional skills (SES) for men’s and women’s economic empowerment, the lack of measures validated in LMICs has inhibited policy research on which skills matter most. Most studies of SES training to date use a list of commonly used, non-representative measures. Here, the World Bank Africa Gender Innovation Lab and Innovations for Poverty Action have partnered to develop, adapt, and test measures of 14 skills that span the range of SES. After translation and cognitive interviews, measures were adapted and tested in baseline studies from 5 countries: Nigeria, Tanzania, Rwanda, Congo, and Cote d’Ivoire. Results demonstrate that the latest iteration of these measures meets key psychometric thresholds for reliability, exploratory factor analysis, and confirmatory factor analysis. We also examine the relationship of each skill measure with employment, income, and gender. We will also share the results of a second paper examining the gender gap in skills found for a randomized controlled trial in Tanzania. Here, both self-reported scales and behavioral measures were used for each of the same set of 14 skills. We examine the role played by education, cognitive ability and social desirability in explaining the observed male advantage in self-reported measures across skills. In addition, we show that this gender gap is not observed when behavioral measures are used for the same skills. Both papers appear to confirm the gender similarities hypothesis on levels of SES. The team will also share forthcoming concurrent and predictive validity results on gender differences in returns to SES, as well as key practical takeaways for others pursuing research on SES.
8:30am - 10:00am | 1B. Improvements in Measuring Household Welfare and Poverty | Kellogg Global Hub Room 4101
Jonathan Morduch, "Poverty at Higher Frequency"
Abstract: The impact on poverty is an important outcome for many RCTs conducted by development economists, where poverty is typically defined as insufficient yearly consumption. But recent experiments demonstrate that households often experience poverty as a condition marked by seasonality, economic instability, and illiquidity across months. We describe two aspects of this divergence between poverty as measured and poverty as experienced. The first concerns the measurement of national poverty; the second concerns the meaning and measurement of household poverty. We first show that in practice national poverty measures in low- and middle-income countries often reflect more than insufficient yearly consumption. Expert guidelines for food data collection recommend asking questions on consumption with weekly or monthly recall, together with randomized sampling through the year where each household is interviewed only once. The result is that what is measured as poverty by many countries is in practice an aggregate of month-wise poverty measures. Second, we introduce a poverty measure that captures the aggregation of month-wise poverty measures. We relate that measure to national measures and demonstrate its properties with five years of monthly data from rural South India. In the sample, annual poverty rates measured with annual consumption under-count roughly a quarter of the experiences of poverty. The extended poverty framework clarifies how improvements in within-year consumption smoothing reduce exposure to poverty. Similar patterns should apply in many national statistics, and, as a result, financial interventions with no impact on overall consumption can reduce national poverty by promoting consumption smoothing. We use a lasso to also show that the month-sensitive poverty measure is a stronger predictor of child anthropometric outcomes in rural India than conventional poverty measures based on yearly consumption.
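The contrast this abstract draws between poverty measured on yearly consumption and poverty experienced month by month can be shown with a toy calculation. The sketch below is only an illustration: the function names, the two made-up households, the equal weighting of household-months, and the choice of an annual line equal to twelve times the monthly line are all assumptions, and the measure proposed in the paper may differ.

```python
def annual_headcount(monthly_consumption, annual_line):
    """Share of households whose yearly total consumption is below the annual line."""
    poor = sum(1 for months in monthly_consumption if sum(months) < annual_line)
    return poor / len(monthly_consumption)


def monthwise_headcount(monthly_consumption, monthly_line):
    """Share of household-months with consumption below the monthly line."""
    total_months = sum(len(months) for months in monthly_consumption)
    poor_months = sum(
        1 for months in monthly_consumption for c in months if c < monthly_line
    )
    return poor_months / total_months


# Made-up data: one household with steady consumption, one with a lean season.
steady = [100.0] * 12
seasonal = [160.0] * 6 + [60.0] * 6
households = [steady, seasonal]

monthly_line = 90.0
annual_line = 12 * monthly_line  # assumption: annual line is 12x the monthly line

annual_rate = annual_headcount(households, annual_line)
monthwise_rate = monthwise_headcount(households, monthly_line)

print(f"annual headcount:     {annual_rate:.2f}")
print(f"month-wise headcount: {monthwise_rate:.2f}")
```

In this toy data neither household is poor on a yearly basis, yet the seasonal household spends half the year below the monthly line, so only the month-wise headcount registers any poverty.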
Nishant Yonzan, "A New Distribution Sensitive Index for Measuring Welfare, Poverty, and Inequality"
Abstract: Simple welfare indices such as mean income are ubiquitous but not distribution sensitive. In contrast, existing distribution sensitive welfare indices are rarely used, often because they are difficult to understand and lack intuitive units. We propose a simple new distribution sensitive welfare index with intuitive units: the average factor by which individual incomes must be multiplied to attain a given reference level of income. The new index is subgroup decomposable with population weights and satisfies the three main definitions of distribution sensitivity in the literature. Variants on this index can be used as distribution sensitive poverty measures and as inequality measures, with the same simple intuitive units. We illustrate the properties of the new indices using the global distribution of income across individuals between 1990 and 2019.
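The index described in this abstract, the average factor by which individual incomes must be multiplied to attain a reference income, can be sketched in a few lines. The function name, the sample incomes, and the reference level below are illustrative assumptions rather than values from the paper, and the sketch simply averages reference/income over all individuals, including those already above the reference.

```python
def average_shortfall_factor(incomes, reference):
    """Mean of reference / income across individuals (hypothetical helper name)."""
    if not incomes:
        raise ValueError("incomes must be non-empty")
    if any(y <= 0 for y in incomes):
        raise ValueError("incomes must be strictly positive")
    return sum(reference / y for y in incomes) / len(incomes)


# Made-up incomes for two subgroups and an assumed reference income of 8.
group_a = [1.0, 2.0]
group_b = [4.0, 8.0, 16.0, 8.0]
reference = 8.0

overall = average_shortfall_factor(group_a + group_b, reference)

# Subgroup decomposition: the overall index equals the population-weighted
# sum of the subgroup indices.
n_total = len(group_a) + len(group_b)
decomposed = (
    len(group_a) / n_total * average_shortfall_factor(group_a, reference)
    + len(group_b) / n_total * average_shortfall_factor(group_b, reference)
)

print(f"overall index:       {overall:.3f}")
print(f"from subgroup parts: {decomposed:.3f}")
```

Because the index is a simple mean of individual factors, the population-weighted subgroup values add up to the overall value, which is the decomposability property the abstract mentions.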
Hai-Anh Dang, "Using Survey-to-Survey Imputation to Fill Poverty Data Gaps at a Low Cost: Evidence from a Randomized Survey Experiment"
Abstract: In low- and middle-income countries, survey data on household consumption are often unavailable or not comparable over time. While survey-to-survey imputation is proposed as a solution to provide low-cost and reliable estimates of household consumption expenditures and poverty, the efforts to validate the resulting estimates have been scant. We implemented a randomized survey experiment in Tanzania that included (i) a benchmark treatment arm with a (target) survey questionnaire that is identical to the questionnaire for the (base) survey with which imputation models were estimated and that provided the true poverty rate that imputations can be compared to and (ii) additional treatment arms that administered lighter target survey questionnaires designed to collect only the data that competing imputation models can be applied to. The results demonstrate that if the predictors in the target survey are elicited through questions that are identical to their counterparts in the base survey, imputation accuracy is not impacted by the remaining differences between the base and target survey in terms of scope and complexity. Basic imputation models including utility expenditures as a predictor in a core set of predictors on demographics, employment, household assets and housing yield highly accurate predictions vis-à-vis the true poverty rate. In the case of a target survey with significantly modified (either shortened or aggregated) food and non-food consumption modules vis-à-vis the base survey, imputation models including food consumption or non-food consumption expenditures as predictors do well only if the distributions of the predictors are standardized vis-à-vis the base survey. For the best-performing models to reach acceptable levels of accuracy, the analysis shows that the minimum-required sample size should be 1,000 observations for both the base and target survey. The results are robust to using alternative base surveys or different poverty lines and deflators.
10:00am - 10:30am | Break
10:30am - 11:45am | 2A. Estimations with Imperfect Data | Kellogg Global Hub Room 5301
Eric Auerbach, "Identifying Socially Disruptive Policies"
Abstract: Social disruption occurs when a policy creates or destroys many network connections between agents. It is a costly side effect of many interventions and so a growing empirical literature recommends measuring and accounting for social disruption when evaluating the welfare impact of a policy. However, there is currently little work characterizing what can actually be learned about social disruption from an experiment in practice. In this paper, we consider the problem of identifying social disruption in an RCT design that is popular in the literature. We provide two sets of identification results. First, we show that social disruption is not generally point identified, but informative bounds can be constructed using the eigenvalues of the network adjacency matrices observed by the researcher. Second, we show that point identification follows from a theoretically motivated monotonicity condition, and we derive a closed form representation. We apply our methods in two empirical illustrations and find large policy effects that otherwise might be missed by alternatives in the literature.
Sylvain Chabe-Ferret, "How Much Should We Trust Observational Estimates? Accumulating Evidence Using Randomized Controlled Trials with Imperfect Compliance"
Abstract: Despite advances in our understanding of quasi-experimental methods, there will likely remain demand to evaluate programs using observational methods like regression and matching. To evaluate the observational bias in these methods we collected data from a large number of RCTs with imperfect compliance (ICRCTs) conducted over the last 20 years. We create comparable observational and experimental estimates of treatment effects, and use these to estimate bias in each study. We then use meta-analysis to quantify the average direction of bias and uncertainty about its size. We find little evidence of average bias but large uncertainty. We suggest adjusting standard confidence intervals to take this uncertainty into account. Our preferred estimates imply that a hypothetical infinite N observational study has an effective standard error of over 0.16 standard deviations and hence a minimal detectable effect of more than 0.3 standard deviations. We conclude that -- given current evidence -- observational studies cannot be used to provide information about the impact of many programs that in truth have important policy relevant effects, but that collecting data from more ICRCTs may help to reduce uncertainty and increase the effective power of observational program evaluation.
Jason Kerwin, "Estimating Agricultural Labor Supply Elasticities using Bunching Methods"
Abstract: We apply bunching methods and exploit a minimum wage scheme to estimate the wage elasticity of labor supply for agricultural workers in Malawi. Our data comprises daily output and fortnightly payroll data spanning nine years for 10,000 piece-rate workers, at a company that enforced a minimum wage. Workers face a kinked budget constraint in this setting: their earnings are flat as a function of daily effort until they hit a minimum output level. We use this kinked budget constraint and draw on insights from the public finance literature on bunching in response to tax rates to estimate the elasticity of worker effort with respect to wages. Consistent with the theoretical model behind this approach, our preliminary results (shown below) reveal that there is substantial missing mass in the output distribution, centered around the minimum wage cutoff. The results from our study will shed light on the impact of minimum wage policies on agricultural labor supply in developing countries, and our labor supply elasticity estimates will help inform macroeconomic models of developing countries.
10:30am - 11:45am | 2B. Multi-mode Survey Strategies: What We Learned Through the Pandemic | Kellogg Global Hub Room 4101
Raymond Duch, "Improving Sampling and Generalizability in Field Experiments using Targeted Multi-Mode Convenience Samples and MRP"
Abstract: We report the results of a methodological project that aims to improve the generalizability of treatment effects from field experiments. Over the past three years, we have conducted field experiments with approximately 10,000 participants in both rural and urban Ghana. The in-person data collection is based on the randomized controlled trials that the Oxford team has conducted in Ghana over the last 18 months. In addition, approximately 3,000 Ghanaian subjects have participated in similar online experiments. The focus of the trials and online experiments is the impact of financial incentives on compliance with health measures promoted by national and health authorities. In parallel with this experimental data collection effort, we have prepared a digital census of the Ghana population with geo-coded units that measure approximately 2.5 square meters. These geo-coded cells are populated with extensive socio-demographic and political information. We apply multi-level modelling with poststratification (MRP) to our multi-mode samples to obtain estimates of the population average and heterogeneous treatment effects. The small-area estimates of treatment effects not only capture within-setting geographical variation but also allow us to compare the ability of different modes to improve our estimates of heterogeneous effects. The final platform we are developing provides easy-to-implement solutions for improving the potential of easy-to-obtain convenience samples to capture heterogeneity in treatment effects as well as population average treatment effects. Applied researchers can select the sample mode depending on the analyst’s estimation targets for heterogeneity and apply MRP to obtain population average and heterogeneous treatment effects.
Philip Wollburg, "The Effect of Survey Mode on Data Quality: Evidence from a Survey Experiment in Nigeria"
Abstract: As COVID-19 disrupted in-person survey operations, phone surveys proved a viable, useful, and cost-effective data collection mode that has since become widespread in low- and middle-income countries (LMICs). Phone surveys also respond to the need, in light of recurring shocks such as natural disasters, health epidemics, and violent conflict, for more rapid, frequent, and flexible data collection modes that can become part of routine data collection systems. However, moving from traditional in-person data collection to mixed mode data collection including phone surveys creates new questions for methodological research. One such issue is survey mode effects, that is, differences in measured outcomes resulting from the data collection mode – in-person and over the phone. The existing evidence on mode effects proper is limited in the context of LMICs; but mode effects likely vary between different topics and question types, so there is a need for broad evidence on the matter. In this study, we designed a randomized survey experiment fielded as part of the nationally representative General Household Survey in Nigeria that comprehensively investigates survey mode effects across policy-relevant outcomes covering food security, health, labor, subjective welfare, and other domains. We distributed phones to 1,000 households that answer identical questions in-person and over the phone. To isolate the effect of survey mode, we randomize the order of the in-person and phone interview and target the same respondents across interviews. We discuss methodological challenges for the experimental design relating to survey timing, respondent selection, interviewer effects, and questionnaire design. Further, we highlight how our findings will provide new insights into the size and direction of, and variation in, survey mode effects and their implications for the design of mixed-mode data collection systems.
Shana Warren, "Survey Measurement of COVID-19 Vaccine Acceptance in LMICs"
Abstract: There are many reasons why different survey methods would yield different estimates of vaccine acceptance. We take advantage of instances in which multiple surveys of COVID-19 vaccine acceptance were conducted around the same time in the same country for several African countries to better understand how timing, sample composition, interview mode, and questionnaire design influence estimates of the vaccine acceptance rate. We bring together data from national surveys conducted between 2020 and 2022 by the World Bank, Facebook, UNICEF, Afrobarometer, IPA, and a study commissioned by the Gates Foundation in Burkina Faso, Ethiopia, Kenya and Nigeria. We find that COVID-19 vaccine acceptance estimates were largely consistent over time within a given survey project. There were, however, substantive differences in estimates between survey projects. We show that while sample composition is a significant driver of these differences, it cannot fully explain the large gaps between survey projects. We conclude that mode effects and question design both likely influenced the estimates and that survey measures typically exceed vaccination rates estimated from administrative data.
11:45am - 1:00pm | Lunch | White Auditorium
1:00pm - 2:00pm | Plenary Session: Progressing our Research in a Post-Pandemic World | Kellogg Global Hub: White Auditorium
Jenna Fahle, "A New Data Platform from CEGA and J-PAL: The Agricultural Technology Adoption Initiative's Data Portal (ATAI)"
The mission of the Agricultural Technology Adoption Initiative (ATAI), co-managed by CEGA and J-PAL, is to rigorously test programs that increase farmer welfare through the broader use of productive technologies in South Asia and sub-Saharan Africa. We aim to generate a rigorous evidence base that helps carefully identify whether particular approaches are successful in spurring agricultural transformation. The ATAI Data Portal (atai-data.org) maximizes the value of this evidence base by providing access to harmonized datasets from the initiative portfolio. Because ATAI funds different principal investigators at different points in time, they employ a number of formats, coding schemes, and documentation for their work. ATAI has undertaken the complicated work of harmonizing or making uniform this dataset portfolio to facilitate analysis across all ATAI-funded research projects. Additionally, ATAI links all datasets with a standard set of geographic variables to enable innovative geographical analysis. Read more about the ATAI Data Portal at atai-data.org.
Benoit Decerf, "Lives, Livelihoods, and Learning: A Global Perspective on the Well-being Impacts of the Covid-19 Pandemic"
Abstract: This study compares the magnitude of the global well-being losses that the COVID-19 pandemic inflicted on the 122 most-populous countries over three dimensions: loss of life, loss of income, and loss of learning. The well-being consequences of excess mortality are expressed in years of life lost while those of income losses and school closures are expressed in additional years spent in poverty. Our estimates of the well-being losses from each source are substantial: an average person lost 15 days of life, spent an additional 19 days in poverty due to income losses, and may spend an additional 29 days in poverty due to school closures – if this lost learning is not remediated. While the 2020-2021 period witnessed the largest one-year increase in global poverty in many decades, widespread school closures may lead to even more years of life spent below the poverty line. Most high-income countries suffered more years of life lost than additional years in poverty, while the opposite situation held for many low- and middle-income countries. Overall, high-income countries suffered lower total well-being losses than lower-income countries, unless one year of life lost is valued at least as much as 7 years in poverty.
Tara Slough, "Bureaucratic Incentives and the Production of Administrative Data"
Abstract: Production of administrative data represents a core task of many bureaucracies. Globally, much of the data collected and used by national (central) governments to target resources and make policies is collected and reported by local (decentralized) government bureaucrats. Such data collection is subject to two important agency problems: one between political principals and bureaucrats who collect data within local governments and one between national and local governments. This paper first documents the incentives of local bureaucrats who compile and submit data for three distinct decentralized data collection processes in Colombia: a means-testing system for social programs, a program for distributing natural resource royalties, and the national contracts databases. An original survey of three bureaucrats per local government (one per data collection process) documents substantial within-local government variation in bureaucratic incentives as well as between-program and between-locality variation in national-local government relations. Analysis of the administrative data produced by these bureaucrats provides a mapping between these incentives and bureaucrats' reporting behavior. These analyses show how bureaucratic incentives shape the accuracy of administrative data and offer new diagnostic tools to national governments and researchers who use these data.
2:00pm - 2:15pm | Break
2:15pm - 3:30pm | 3A. Questionnaire Design: Eliciting Willingness to Pay and Women's Agency | Kellogg Global Hub Room 5301
Javier Romero, "Privacy and Measurement Error in Phone Surveys: The Case of Women’s Agency"
Abstract: There is a growing literature focusing on measurement error in self-reported data. We study the case of women’s agency. Women might misreport their agency due to fear of retaliation or social desirability biases. We conducted a survey experiment in rural Guatemala where all groups received the same questions about women’s agency as part of a phone survey, but we vary the privacy of their responses. Women in the control group answered verbally. Those in the first treatment group answered using code words, allowing for enhanced privacy as the respondent does not risk disclosing potentially sensitive information to individuals in her surroundings. The second treatment group responded using the phone keypad. These women do not risk disclosing potentially sensitive information to individuals in their surroundings or to the enumerator because the enumerator only hears the tone made by the keypad. We find that women answering under higher levels of privacy reveal much lower rates of agency. We provide evidence that is consistent with social desirability bias.
Caitlin Herrington, "Does Bid Quantity Matter? Results from a Field Experiment in Zimbabwe"
Abstract: Researcher pre-specified experimental product quantity is the status-quo when conducting willingness-to-pay (WTP) estimates, resting on the assumption of economic rationality. Yet, recent consumer hypothetical studies have found experimental quantity can impact marginal bidding behavior. This study investigates if, and to what degree, varying bid quantity in WTP elicitation impacts marginal WTP via a non-hypothetical field experiment using 527 bean farmers in rural Zimbabwe. Farmers were randomly assigned to either a fixed quantity group (FQG) where farmers bid on a 2kg seed pack or a variable quantity group (VQG) where farmers’ experimental quantity was matched to their intended purchase quantity. Preliminary results find that, on average, marginal WTP is 55% higher in the FQG than the VQG and increases with highly differentiated bean seeds. We find evidence that this difference in WTP across treatment groups is due to a behavioral bias that can arise from mental budgeting, if intended purchase quantity is above the experimental bid quantity used. These results point to the need for researchers to think critically about the experimental quantity used when designing input-based producer WTP studies, setting the experimental quantity to intended purchase quantity, when possible, to avoid potential overestimation of WTP bids. Differing marginal WTP results from the experimental bid quantity can have major implications to the pricing strategies of agro-dealers, costs to international organizations and NGOs trying to launch a new potentially socially desirable agricultural input, or the decision for potential government price-level interventions for smallholder farmers.
Lelys Dinarte-Diaz, "Measuring Women’s and Young People’s Work: The Role of Screening Questions and Self-Responses"
Abstract: Measuring work is crucial for policy making, especially in low-income countries where informal work represents a high share of total employment. Despite the relevance of the informal work in the labor market, many surveys rely on questions designed to capture traditional types of employment, and do not use appropriate screening questions to encompass all types of activities, possibly leading to the undermeasurement of the activities performed by individuals engaging in atypical work. Moreover, data on youth and women may disproportionally suffer from ‘proxy response’ bias because the respondent may accurately report their own activities, but under‐ or overreport activities of other household members. This paper provides experimental evidence to overcome these limitations and to improve data collection on work using data from 1,008 households and 2,480 individuals aged 15 to 64 years living in rural and peri-urban areas in El Salvador. We designed a methodological experiment to evaluate how a screening list of activities and the acceptance of proxy responses affect the measurement of work. We randomly assigned the 1,008 enrolled households into two treatment arms and a control group. One-third received the standard labor module for which proxy responses were accepted, preceded by a set of screening questions including our list of work activities; another third participated in a traditional survey where self-responses were required; and the last group (control group) was surveyed using the standard labor module for which proxy responses were accepted. Our results indicate that female participants surveyed with the list of activities were 12 percent more likely to report formal work relative to male respondents surveyed with the same method. Moreover, young men who self-responded were between 15 and 19 percent more likely to report being employed or working in the formal sector, compared with older men who were interviewed using the same method.
2:15pm - 3:30pm | 3B. Agriculture: Plot Measurement with Satellites, Changing Behaviors by Sharing Information on Weather, and Implementer Effects | Kellogg Global Hub Room 4101
Ashish Shenoy, "Implementer Identity Effects in Program Evaluation"
Abstract: Implementer effectiveness can be as important as policy design in shaping impacts of development interventions. Prior research has documented systematic differences in impact based on the identity of the implementer (e.g. Vivalt, 2020). We examine how implementer identity affects program evaluation in the context of a package of agricultural subsidies and extension to promote pulse cropping in Bihar, India. This program was implemented as a two-year randomized controlled trial by local NGOs with a history of engagement in the area. Endline data includes a laboratory-style incentive-compatible elicitation of participants’ demand for unsubsidized seeds, in which we experimentally vary the salience of the implementer. In one variation we explicitly advertise our evaluation of the implementer’s efforts, while in the other we describe the exercise as a study on the viability of pulse farming. We find that increasing implementer salience lowers demand for pulse seeds, likely because the program was generally viewed as a failure due to adverse weather. However, the negative salience effect is 3–4 times greater in control relative to treatment villages. This disparity is driven largely by program beneficiaries whose demand increases by 20–25% when the implementer is made salient. Salience also differentially lowers demand among those who had participated in prior NGO initiatives that delivered benefits for free, consistent with price anchoring. Our results conform to a model where program beneficiaries reciprocate by delivering positive evaluations of implementers. We demonstrate they may take costly actions to do so in an incentive-compatible demand elicitation, and the effect is quantitatively large: the estimated negative treatment effect is attenuated by 66% when evaluation is made salient. These findings suggest that program evaluation with popular implementers may be subject to a type of Hawthorne effect that systematically biases in favor of success.
Jess Rudder, "Learning from Weather Forecasts and Adaptation Among Cotton Farmers in Pakistan"
Abstract: Although short-run weather services are widely used by farmers around the world, it is not clear how weather information is incorporated into farmer beliefs and how much they learn from weather information to update their farming practices. In 2022, we ran an RCT with 490,000 cotton farmers in Punjab, Pakistan to deliver short-range weather forecasts and test how farmers used weather information to make farming decisions. A year later, we implemented a follow-up survey to elicit beliefs about climate change and perceptions about risks to test whether farmers who received forecasts were more likely to update their climate beliefs and perceptions of risk in line with long-run climate trends. We will link climate beliefs with long-run trends using satellite data to overcome gaps in coverage of weather stations. There is no consensus in the economics literature about how to elicit climate beliefs and adaptation measures. We asked a series of questions designed to measure the direction of trends such as average temperature, number of days with extreme heat and cold, monsoon onset, and days with extreme rain or drought. Individual questions can be combined into a composite index at small geographic scales to assess whether beliefs are 'correct' and the extent of spatial heterogeneity. We implemented a similar methodology to measure perceptions of risk such as the probability of crop damage due to weather events and short and long-run adaptation such as changing seeds, crops, and on-farm investment. These measures will be combined with RCT data to measure treatment effects from forecasts on other important farming outcomes, such as timing of input application, input investment, yields, and spillovers. For this presentation, I propose to focus on measurement of climate beliefs because the literature is less developed and could benefit from feedback and discussion.
3:30pm - 3:45pm | Break
3:45pm - 4:30pm | 4A. How Trustworthy are Self-Report and Observational Measures? | Kellogg Global Hub Room 5301
Moustafa El-Kashlan, "Measuring Teachers’ Perceptions of Student Performance and Gender Bias"
Abstract: This research studies the extent to which teachers are able to accurately perceive the success of their students on standardized testing. As part of a larger RCT, we ask Ugandan secondary school teachers to estimate the share of the students at their school who received each mark from 1 (best) to 9 (worst) on two high-stakes national examinations, UCE (O-Level) and UACE (A-Level). Crucially, we collected these data after the examinations were sat by students, but before the results were released. We also collect the administrative data from the Uganda National Examinations Board (UNEB). We are thus able to link the estimated distribution of test scores with the actual distribution of student test scores. We observe first that teachers tend to vastly overestimate the performance of students in their school relative to how the students actually perform, with the average “estimated” score being 2-3 scores better than the empirical average for that school. We also find that teachers tend to overestimate the performance of girls on the history exam more than boys on the same exam. Conversely, the teachers overestimate the performance of boys on physics exams more than girls on the same exam. Although we are not powered to detect changes for just history and physics teachers, we do not observe evidence which suggests that knowledge of students in those specific subjects mitigates the gendered perception of student performance, as physics teachers further overestimate boys’ performance in physics exams, while history teachers appear to do the same. Finally, we compare this measure of gender bias with a self-reported measure where we ask teachers whether they believe boys are better in science subjects than girls and the reasons they believe so.
Alexander Fertig, "Bias and Precision in Measurement of Livestock Weight: Evidence from a Benchmarking Exercise in Namibia"
Abstract: Using data collected as part of a large randomized program evaluation in Namibia, we investigate how cattle weight measurements vary across two measurement methods (farmer-estimated and scale weights). We find that self-reported cattle weights are biased downwards and are less precise than objective scale measures. We then show that the program intervention itself led to an increase in the bias of self-reported measures, and that relying only on these measures would have resulted in an incorrectly estimated treatment effect. We describe certain conditions and respondent characteristics for which self-reported weights may be less biased, and suggest solutions for alternative measurement methods that are both more accurate than self-reports and more feasible than acquiring scale measures in the field.
3:45pm - 4:30pm | 4B. Surveying Firms and Health Facilities | Kellogg Global Hub Room 4101
Joshua Deutschmann, "Eliciting and Validating Markups: Evidence from Rural Retail Firms in Kenya"
Abstract: Understanding firm markups is a key topic of interest in economics, and estimating those markups has generated a long literature. Atkin et al. (2015) demonstrate the potential of an alternative approach: directly asking firms. We build on this approach and test a new method for directly eliciting and validating markups in a study with rural retail firms selling agricultural inputs in Kenya. We generate firm- and product-level markup estimates using firm surveys to ask directly about retail prices and wholesale prices of a range of agricultural inputs commonly sold by firms. We validate and bound our markup estimates using prices paid by mystery shoppers and information from manufacturers and distributors. Firms generally report earning modest markups on agricultural input sales, but there is substantial variation in markups both within and across markets. Firms steer customers asking for a recommendation towards higher-markup products. We additionally document the relationship between markups and product quality, using objective quality measures from mystery shopper purchases. We discuss implications for studying market power and bounding markup estimates using survey data.
Philip Wollburg, "Integrating High Frequency Household Surveys with Health Facility Surveys for Service Delivery Evaluation"
Abstract: The COVID-19 pandemic has highlighted the value of high frequency data collection on health service utilization and outcomes. At the same time, enormous variation in health system capacity and quality across and within low- and middle-income countries has demonstrated that a demand-side perspective is incomplete without giving due consideration to supply-side factors in the effective delivery of health care. For example, demand for vaccination services, in- and outpatient care in the case of acute illness, or routine check-ups is mediated by factors such as the opportunity costs of service access, service quality, capacity, need, and perceived benefits. These factors are difficult to assess in tandem as they require linking data on the users and, importantly, non-users of health services to data on the providers of these services within the same catchment area. We propose an integrated phone survey system of health facility and household surveys as an ambitious solution to this issue. Based on the examples of pilot projects in Ethiopia and Burkina Faso, we discuss opportunities and methodological challenges associated with such an integrated data collection system. On the methodological side, we discuss sampling-related issues regarding the selection of health facilities, households, and survey respondents, as well as questionnaire design considerations. Further, we highlight how these design choices depend on the population, service type, and research question of interest. Finally, we sketch out opportunities for substantive policy research regarding the quality of health service delivery, its determinants and outcomes at the individual level that the proposed data collection system facilitates.