A Cross-verified Database of Notable People (3500 BC - 2018 AD)
Nature Scientific Data, 2022
(with Morgane Laouenan, Jean-Benoît Eyméoud, Olivier Gergaud, Guillaume Plique and Etienne Wasmer)
Corresponding visualization using the data: Notable people map by Topi Tjukanov.
Coverage: DailyMail UK, NY Post, Metro UK, +175 others
Article Metrics available here
Code and dataset available here
Abstract: A new strand of literature aims at building the most comprehensive and accurate database of notable individuals. We collect a massive amount of data from various editions of Wikipedia and Wikidata. Using deduplication techniques over these partially overlapping sources, we cross-verify each retrieved piece of information. Our strategy results in a cross-verified database of 2.29 million individuals (an elite of 1/43,000 of human beings having ever lived), including a third who are not present in the English edition of Wikipedia. Building on seminal works based on Freebase and Wikipedia by Schich et al. (2014) and Yu et al. (2016), we use six additional language editions of Wikipedia as well as Wikidata as complementary sources of information about notable individuals to reduce the Anglo-Saxon bias present in current works. We obtain information about their demographic characteristics, place and date of birth and death, citizenship, and occupation. This is an attempt to create the most comprehensive database employable for understanding the role of culture, gender and creative classes in the fields of economic growth, urban economics, cultural development and network analysis.
Abstract: Social connections matter for educational, non-cognitive and long-run labour market outcomes. Using a sample of 12,842 students from India, I first show that relatively isolated students face a host of socio-emotional and academic disadvantages. I then implement a two-tier randomized deskmate matching intervention, aimed at improving the outcomes of these isolated students. The results reveal a notable trade-off. Within the classroom, matching isolated students with each other improves their social connections with peers, interactions with teachers and social / non-cognitive skills. However, this comes at the cost of broader classroom-level negative externalities. Specifically, deskmate plans which pair a majority of isolated students with the most popular deskmates improve the overall social integration and academic performance of isolated students, but have no impact on their social and non-cognitive skills. To explain these patterns, I build a model of network formation in which returns to social effort are shaped by both endogenously determined peer interactions and independent sociological mechanisms such as negative social comparisons and proximity effects. Consistent with the empirical findings, the model shows that outcomes for isolated students, in equilibrium, depend on both their immediate deskmate and the overall composition of matches, where the negative externalities of matching more isolated students with each other emerge after a particular threshold. Optimal matching strategies must therefore weigh direct versus group-level impacts, which may move in opposite directions, giving rise to equity-efficiency trade-offs.
Abstract: Paying for college is often a family affair, with both parents and students contributing. We study the effects of college on family finances using administrative data on the universe of federal aid applicants in California linked to credit records. We provide the first comprehensive analysis of how both students and their parents use debt when a child attends college and how prices affect those decisions. We start by using an event-study framework to explore how parents’ use of debt and credit outcomes change after their child first submits a federal aid application for college enrollment. While total debt does not change, higher-income parents shift balances from other debt to educational loans. We find that lower-income parents take out more education loans, experience less delinquency on non-educational debt, and see their credit scores rise. We then use discontinuities in eligibility for generous financial aid to test how an exogenous change in the price of college affects parental debt and financial health. We find that parents finance increases in the price of college through educational loans as well as home equity loans. Higher prices increase parental delinquency on debt. The findings highlight an important channel by which college and its rising cost may spill over into the broader financial health of families and the economy.
Abstract: How does the climatic experience of previous generations affect today’s attention to environmental questions? Using self-reported beliefs and environmental themes in folklore, we show empirically that the realized intensity of deviations from typical climate conditions in ancestral generations influences how much descendants care about the environment. The effect exhibits a U-shape where more stable and more unstable ancestral climates lead to higher attention today, with a dip for intermediate realizations. We propose a theoretical framework where the value of costly attention to environmental conditions depends on the perceived stability of the environment, prior beliefs about which are shaped through cultural transmission by the experience of ethnic ancestors. The U-shape is rationalized by a double purpose of learning about the environment: optimal utilization of typical conditions and protection against extreme events.
Homophily of Behavioral Traits is Strong in Social Networks, but Depends on Demographics and Increases Segregation
(Reject and Resubmit, Nature Communications)
with Daniel Chen, Matthias Sutter and Camille Terrier
Abstract: Social networks are a key factor for success in life, but they are also strongly segmented by gender, ethnicity, and other demographic characteristics. We present novel evidence on an understudied source of homophily: behavioral traits (such as prosociality, risk aversion, or cooperation). Using unique data from incentivized experiments with more than 3,000 French high-school students, we find high levels of homophily across all behavioral traits that we study. Notably, the extent of homophily depends on demographic similarities, particularly gender. As a result, the demographic-based segregation of networks is further amplified by a behavioral-based segregation, which exacerbates the differences related to gender or socio-economic status. We discuss policy implications of this exacerbation.
Learning from Peers, at Scale: Experimental Evidence from a Peer Tutoring Intervention in Bihar
(Draft on Phase 1 results available upon request)
with Dashleen Kaur, Nikhil Kumar, Madhavi Jha and Tarang Tripathi
Abstract: Altering classroom environments and leveraging peer networks show promise as some of the most cost-effective interventions targeting inputs in the education production function. Yet, the extent to which findings from prior small-scale studies generalize to typical education systems remains unclear. We evaluate this question through a peer tutoring intervention in government primary schools in Bhagalpur, Bihar, conducted with minimal external support and embedded within the noisy infrastructure characteristic of developing-country settings. The program involved 14,077 students in grades 3–5 across 176 schools, where high-performing students led daily small-group remedial math sessions. We find significant gains in math proficiency and reductions in math anxiety among learners. Classroom social networks became tighter and leaders became more central, suggesting broader effects on the learning environment. These results demonstrate that structured peer tutoring can be both effective and scalable, offering a viable pathway to improving foundational learning at scale in low-resource contexts.
Trajectories of Notable Individuals: A Cross-verified Database of Locations
(Final Data Verification in process, draft coming soon)
with Minda Belete, Morgane Laouenan, Olivier Gergaud and Etienne Wasmer
Abstract: Famous individuals contribute to the visibility of cities, and vice versa. The production of historical data on notable individuals has expanded in recent years, but information on their association with locations remains scarce. We extend our older work from the Brief History of Human Time project and improve information on geographical locations beyond the birth and death places of 2.29 million notable individuals spread over 3500 years of human history. We compile a consolidated database of places visited by these individuals in their lifetime. Using information from the text in Wikipedia in a structured way, we assign a reasonable range of years for each location associated with an individual to identify their locations of residence and work over their lifetime. We cross-verify this information against information contained in Wikidata. We use multiple Wikipedia editions simultaneously to further assign a confidence and intensity measure of association for each location to an individual. We create metrics useful for measuring the impact of the presence of notable individuals on city growth from a historical perspective (with various focuses such as the development of global cities, e.g. the 20th century in the Americas, the 21st century in the Global South, or the Middle Ages and the Industrial Revolution in Europe).
Percolation of Wildfire-related Credit Shocks through Family Networks
(Data Construction in process)
with Shreya Chandra
Abstract: How would a warming planet affect household finances and inequality through the increased incidence of wildfires around the globe? Current estimates in the literature paint an incomplete picture since they do not account for spillovers emanating from families supporting each other during financial shocks. To understand the mediating effect of risk sharing within families on the financial impacts of natural disasters, this project uses the universe of California residents with credit records over the last 20 years. Using ID homogenization and tracking methods, we first construct family units in credit data and then explore how individuals in wildfire-affected areas transfer the financial incidence to family members in unaffected areas. Further, we analyze how rich vs poor families share risk between parents and children. Since credit constraints can force poor parents to transfer risk to their children and the lack of credit constraints allows rich parents to create buffers for their children, we explore potential long-run inequality impacts of wildfires through divergent wealth and debt levels between households of different attributes.
TeachAIde - Improving Teacher Agency and Student Outcomes through Hypercontextualized Generative AI Chatbots
(Second round of Pilot and Scoping in process)
with Tushar Kundu, Chandraditya Raj and Tarang Tripathi
Abstract: Overcrowded classrooms in India often prevent teachers from meeting diverse student needs. To understand how AI at scale can enhance teacher productivity, we develop TeachAIde — a hyper-contextualized, AI-powered assistant that transforms classroom data on learning levels, peer dynamics, and engagement patterns into clear, curriculum-aligned recommendations. The tool reduces cognitive load and enables differentiated instruction without additional resources. By providing real-time insights, TeachAIde helps teachers adapt lessons, group students effectively, and track progress. We evaluate its impact through a randomized intervention with a saturated rollout across five Indian states, measuring changes in teacher agency, student engagement, and learning outcomes. By enhancing rather than replacing teachers, TeachAIde demonstrates how AI can serve as a scalable force for equity and human capital development in resource-constrained education systems.
Historical Elite Social Networks and the Escape from the Malthusian Trap
(Data Construction in process)
Abstract: Social connectedness is, and has always been, a key determinant of individual success and personal growth. But does it scale up and affect nations and institutions too? Mokyr (2016), in “A Culture of Growth”, highlights the key role that cross-occupational connectedness played in the development of economies in Medieval Europe. For ideas to be generated and then put into practice, academics needed (i) safeguards from the political elite and (ii) connections with like-minded entrepreneurs to transform ideas into reality. This project takes a big data approach to test the impact of cross-occupational connections on the economic growth of historical empires. Building on our older work from the Brief History of Human Time project, I generate personal and professional social networks of notable individuals by structurally parsing information from text in different editions of Wikipedia. I use the occurrence of cross-links on individual biographies and the incidence of sharing similar workplaces / institutions during one’s lifetime to construct historical social networks. I further cross-verify this information against specific property indicators present on Wikidata. Attributes of the constructed social networks are then benchmarked against historical growth estimates and key events to analyze the role of inter-occupational and inter-generational ties in the development of medieval European economies.
Improving Organ Donation rates in Saudi Arabia using Religious Messaging and Simplified Choice Architecture
(Pilot and Scoping Ongoing)
with Faisal Kattan, John List and Mike Price
Can AI reduce Obesity Rates in Saudi Arabia using Just-in-Time Personalized Nudges trained on High Frequency Calorie Diaries?
(Pilot and Scoping Ongoing)
with Faisal Kattan, John List, Vitor Melo and Mike Price
Using Family, Network and Individual nudges to reduce Social Media Use amongst Adolescents and Improve Mental Health Outcomes
(Pilot and Scoping Ongoing)
with Faisal Kattan, John List, Vitor Melo and Lena Song
Abstract: Social networks are a key factor for success in life, but they are also strongly segmented by gender, ethnicity, and other demographic characteristics (Jackson 2010). We present novel evidence on an understudied source of homophily: behavioral traits. Based on unique data collected using incentivized experiments with more than 2,500 French high-school students, we find high levels of homophily across all behavioral traits that we study. Notably, the extent of homophily depends on similarities in demographics, particularly gender. Using network econometrics, we show that the observed homophily is not only an outcome of endogenous network formation, but is also a result of friends influencing each other's behavioral traits. Importantly, the transmission of traits is larger when students share demographic characteristics such as gender.
Abstract: This paper analyses the impact of the fear of losing social status (la peur du déclassement) on the levels of democratisation and economic growth. In a formal model where education makes an individual’s vote count and generates positive externalities for all agents in the economy, I examine the incentives for the elites to initiate a democratic transition by educating the masses when they care about both their status (relative position in the income distribution) and their absolute incomes. In the context of imperfect capital markets, the paper analyses the equilibrium patterns of political outcomes, income distribution, and growth as a function of initial income, inequality, distortions from tax structures and externalities from education. The model illustrates that a higher weight on status preference leads to a lower level of democratisation and economic growth.