Research

Recent Publications

"Combining family history and machine learning to link historical records: The Census Tree data set." Explorations in Economic History 80 (2021), with Joseph Price, Kasey Buckles, and Isaac Riley 

Abstract: A key challenge for research on many questions in the social sciences is that it is difficult to link records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we contribute to recent efforts to create these links with a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. We use these “true” links both to inform the decisions one needs to make when using automated methods to link records and as a training data set for use in a supervised machine learning approach. We describe our procedure and illustrate its potential by linking individuals across the 100% samples of the US censuses from 1900, 1910, and 1920. When linking adjacent censuses, we obtain an overall match rate of 62-65 percent (for over 88.9 million matches), with a false positive rate that is around 6-7 percent and with links that are similar to the population along observable characteristics. Thus, our method allows us to link records with a combination of a high match rate, precision, and representativeness that is beyond the current frontier. Finally, we demonstrate the potential of the data by estimating the degree of intergenerational transmission of literacy between father-son and mother-daughter pairs. 


“Using Linked Census Records to Study Shrinking Cities in the United States from 1900 to 1940”, The Professional Geographer (2021), with Joseph Price and Samuel Otterstrom 


Abstract: We develop a data-driven method for linking people in cities over time that can be used in any country that has data tracking the locations of individuals across multiple periods. We apply this process to United States Census data from 1900 through 1940 and find that, of the 1,000 largest cities in 1900, 15 percent experienced a decline in population by 1940. We also use the large data set for this same time period, linking more than 45 million people across adjacent census records to examine which types of people exit a shrinking city and how their eventual socioeconomic outcomes differ from those who stay. Nationally, we find that those who left shrinking cities had longer life spans, greater income, better jobs, and higher education than those who stayed. We note that the regional analyses tend to follow the positive national pattern while indicating the geographic place-based differences of the cities that lost population. We also show the relation of race to the tendency to migrate from different types of cities. This method for linking millions of individuals across censuses has the potential to reveal other important characteristics of past populations, such as multidecade migration patterns and household changes in various regions over time. 


“Baseball and life expectancy: evidence from linked historical data”, Historical Perspectives on Sports Economics (2019), with Joseph Price and Sebastian Brown


Abstract: We construct a new dataset that links information on professional baseball players with genealogical information about their family members. Our sample includes 4,091 major league players born before 1940 along with 8,344 of their siblings. We find that MLB players live about 3 years longer than their siblings. To examine how much of this difference is due to a possible income effect, we also construct a sample of 6,134 minor league players (who never made it to the majors) and 9,245 of their siblings. We find an even larger gap for the minor league players, who live about 4.1 years longer than their siblings. The dataset that we’ve constructed provides a unique way to incorporate information from census and vital records to expand the types of measures that we have about professional athletes.

Working Papers

"The long-run and intergenerational effects of natural disaster exposure: Evidence from the Galveston Hurricane of 1900"


Abstract: I exploit a natural experiment to examine the long-run and intergenerational effects of a major negative shock, exploring how where we live can have long-lasting impacts. I examine outcomes of individuals impacted by the Galveston Hurricane of 1900, whose landfall was poorly forecasted in the United States. Using historical newspaper records from The Houston Post, I identify towns that sustained significant physical damage or were completely destroyed by the storm. Leveraging panel data of linked US Census records for individuals living in southeast Texas in 1900, I find that individuals living in towns damaged by the hurricane were significantly less likely to migrate to a different county following the storm, were less likely to be employed, and had significantly shorter lifespans than individuals living in nearby towns that were unaffected. These lifespan and employment effects persist into the second generation. I also find that individuals who were impacted by the storm were less likely to be literate following the storm and had lower-quality occupations. These results point to both persistent long-run and intergenerational effects of negative shocks, which may be driven by individuals’ migration behavior, as impacted individuals were less likely to move to better places.


"The socioeconomic effects of forced displacement: Evidence from the Tennessee Valley Authority" with Andre'nay Harris

Abstract: We examine the socioeconomic effects of forced migration using individuals who were displaced by the Tennessee Valley Authority (TVA) dam projects in the 1930s. To estimate the effect of forced migration, we use data from the relocation program associated with the TVA and link it to US Census data. We compare individuals living in counties that were impacted by the dam-induced flooding with those living in nearby counties that would have been affected had the dams been in a different location. We find evidence that individuals who were impacted by the dam projects are more likely to participate in the labor force, are more likely to rent their homes, and pay higher rents. Due to existing evidence of discrimination in government programs, we examine racial disparities in outcomes and find that while both White and Black individuals have higher labor force participation, Black individuals are more likely to be unemployed.


“Sex Ratios, Marriage, and Household Composition in Early Twentieth-Century Hawaii”, with Sumner La Croix, Timothy Halliday, and Joseph Price (revise and resubmit at Asia-Pacific Economic History Review)


Works in Progress

"The long-run effects of anti-immigrant discrimination in policing: Evidence from Philadelphia"

"The Orphan Train Experiment: The Impact of America's First Large-Scale Child Welfare Program" with Maxwell Bullard