Data retrieval from local heritage books - Is artificial intelligence the solution? (2025), with Rafael Biehler, Historical Methods: A Journal of Quantitative and Interdisciplinary History, pp. 1–19, https://doi.org/10.1080/01615440.2025.2512744
Abstract:
Artificial Intelligence (AI) is rapidly transforming all scientific disciplines. Among its many applications, AI can facilitate data retrieval from a wide range of sources. We evaluate the performance of large language models in extracting data from local heritage books, which are valuable sources for economic and demographic history. We compare the results of AI-driven, Python code-based, and manual data retrieval for random samples of observations from three heritage books. Our analysis shows that Python code-based retrieval consistently outperforms AI, particularly in minimizing issues such as omitted or hallucinated data. Furthermore, we show that, with minor modifications, our Python code-based methods can be adapted to other local heritage books, highlighting the robustness and scalability of this traditional approach.
Working papers:
WWZ Working paper, 2022 (04)
Older version: IRES - Discussion paper WP 2019-19, 2019
Abstract:
Europe during the Middle Ages, and contributed to bolstering universities at the dawn of the Humanistic and Scientific Revolutions. We build a unique database of thousands of scholars from university sources covering all of Europe, construct an index of their ability, and map the academic market in the medieval and early modern periods. We show that scholars tended to concentrate in the best universities (agglomeration), that better scholars were more sensitive to the quality of the university (positive sorting) and migrated over greater distances (positive selection). Agglomeration, selection and sorting patterns testify to an integrated academic market, made possible by the use of a common language (Latin).
Abstract:
Background: Family reconstitution and data from online genealogies, such as FamiLinx, are two potential sources for investigating mortality dynamics for the period before official lifetables became available. In this paper, we use one source of each type, the family reconstitution of Imhof and the FamiLinx dataset based on geni.com, to estimate dynamics in life expectancy and discuss the sex-specific differential mortality in the German Empire.
Method: Sex-specific lifetables are estimated for the territory of the German Empire from the individual data of the family reconstitution and the online genealogies. On the basis of these lifetables, we estimate the conditional life expectancy and derive the corresponding sex-specific differential mortality. Findings are compared with the official lifetable of the German Empire in 1871–1910. The contribution of each age group to the differential mortality is determined using the stepwise-replacement algorithm.
Results: The family reconstitution overestimates conditional life expectancy less than FamiLinx after 1871, when official lifetables are available in the German Empire. However, both sources fail to capture the sex-specific mortality differentials of the official lifetables at the end of the nineteenth century and show a higher life expectancy for males instead of females. The bias in sex-specific mortality rates is particularly pronounced in the age groups 15 to 45.
Discussion: Finally, we discuss possible explanations for the biased findings. Notability bias, the patriarchal approach to family trees, and maternal mortality are important mechanisms in the FamiLinx dataset. Censoring due to mobility serves as a potential reason for the bias in the family reconstitution.
The rural exodus and the rise of Europe (2022), with Thomas Baudin, Journal of Economic Growth 27, pp. 365–414, https://doi.org/10.1007/s10887-022-09206-4
Working papers:
MPIDR - Working paper WP 2019-15, 2019
Earlier version: Rural exodus and fertility at the time of industrialization, IRES Discussion Paper 2016-20, 2016
Abstract:
We build a unified model of growth and internal migration and identify its deep parameters using an original set of Swedish data. Our structural estimation and counterfactual experiments suggest that conditions of migration between the countryside and cities have strongly shaped the timing and the intensity of the transition to growth. Mobility cost had to be low enough to enable population movement. Furthermore, initial productivity in rural industries had to be moderate to sustain the first phase of industrialization appearing in the countryside without delaying too much the second phase of industrialization taking place in cities. More than the initial productivity of rural industries or migration costs alone, what truly mattered for the transition to modern economic growth was the interplay between these two elements. By contrast, we find only a minor role for mortality decline in the whole process. Finally, we discuss why our conclusions on Sweden are exemplary for the rest of Western Europe.
Representativeness is crucial for inferring demographic processes from online genealogies: Evidence from lifespan dynamics (2022), with Diego Alburez-Gutierrez, Proceedings of the National Academy of Sciences 119(10), https://doi.org/10.1073/pnas.2120455119
Abstract:
Crowdsourced online genealogies have an unprecedented potential to shed light on long-run population dynamics, if analyzed properly. We investigate whether the historical mortality dynamics of males in FamiLinx, a popular genealogical dataset, are representative of the general population, or whether they are closer to those of an elite subpopulation in two territories. The first territory is the German Empire, with a low level of genealogical coverage relative to the total population size, while the second territory is The Netherlands, with a higher level of genealogical coverage relative to the population. We find that, for the period around the turn of the 20th century (for which benchmark national life tables are available), mortality is consistently lower and more homogeneous in FamiLinx than in the general population. For that time period, the mortality levels in FamiLinx resemble those of elites in the German Empire, while they are closer to those in national life tables in The Netherlands. For the period before the 19th century, the mortality levels in FamiLinx mirror those of the elites in both territories. We identify the low coverage of the total population and the oversampling of elites in online genealogies as potential explanations for these findings. Emerging digital data may revolutionize our knowledge of historical demographic dynamics, but only if we understand their potential uses and limitations.
Abstract:
When did mortality first start to decline, and among whom? We build a large, new data set with more than 30,000 scholars covering the sixteenth to the early twentieth century to analyze the timing of the mortality decline and the heterogeneity in life expectancy gains among scholars in the Holy Roman Empire. The large sample size, well-defined entry into the risk group, and heterogeneity in social status are among the key advantages of the new database. After recovering from a severe mortality crisis in the seventeenth century, life expectancy among scholars started to increase as early as in the eighteenth century, well before the Industrial Revolution. Our finding that members of scientific academies—an elite group among scholars—were the first to experience mortality improvements suggests that 300 years ago, individuals with higher social status already enjoyed lower mortality. We also show, however, that the onset of mortality improvements among scholars in medicine was delayed, possibly because these scholars were exposed to pathogens and did not have germ theory knowledge that might have protected them. The disadvantage among medical professionals decreased toward the end of the nineteenth century. Our results provide a new perspective on the historical timing of mortality improvements, and the database accompanying our study facilitates replication and extensions.
Working papers:
IRES Discussion Paper 2015-9, 2015
Thünen-Series of Applied Economic Theory, No. 137, 2014
Abstract:
This paper investigates the problem of an “optimum population” concerning age structures in a 3-period OLG-model with endogenous fertility and longevity. The first-best solution for a number-dampened total social welfare function, including Millian and Benthamite utilitarianism as two extreme cases, identifies the optimal age structure, which generally fails in laissez-faire economies. As individuals do not internalize the effect of longevity on life-cycle income, they over-invest in health. Additionally, they choose a non-optimal number of offspring. A calibration exercise for 80 countries emphasizes that the over-aging of populations crucially depends on social preferences and on observed age structures. Interestingly, it appears that, unlike taxes on health expenditures, taxes or subsidies on children to decentralize the first-best solution are sensitive to social preferences. Still, with the introduction of sufficiently large positive externalities of health expenditures or of individuals who do not fully internalize the effect of health efforts on longevity, taxes might become subsidies on health efforts to avoid an under-investment in longevity.
Older version:
Abstract:
European fascist regimes have attached great importance to nationalistic families and designed policies to perpetuate them. Most offered policy packages with interest-free loans repayable through childbirth, along with allowances and tax deductions for large families. Using a difference-in-difference approach and Nazi Germany as a case study, we show that these policies may have counterproductive effects due to negative selection mechanisms in the marriage market. The excessive pressure to marry exerted on singles results in lower quality, ultimately less fertile, and more fragile unions. This finding is important as the main European far-right parties today propose reinstating these policy packages.
More recent papers (available upon request):
Abstract:
In this paper, I show how the design of the health insurance system affects the incentives to give birth. A stylized model illustrates the mechanisms through which a fully-funded private health insurance coverage can be associated with higher fertility. Free coverage of children in the public pay-as-you-go health insurance and income effects due to varying parental premiums of public and fully-funded private health insurance might operate in the opposite direction. The empirical investigation focuses on the German dual health insurance system and relies on data from the German Socio-Economic Panel. I apply endogenous treatment effects models for count data to control simultaneously for the observed and the unobserved heterogeneity that explain self-selection into the type of health insurance coverage and the number of births. The results indicate that having private health insurance coverage has a positive effect on fertility in Germany. The finding holds across several robustness checks.
Abstract:
The model of the isolated state forms the basis for Johann Heinrich von Thünen's investigation of the remuneration of labor as a factor of production. Within the model, the 'natural wage' is determined formally and by means of selected numerical examples. To highlight aspects of marginal productivity theory, his reasoning is retraced and the formal relationship is derived. To justify his wage equation A = (a*p)^0.5, marginal productivity within the functional distribution of income is demonstrated. Finally, the applicability of his formula outside the model is critically examined.
Zwischen Förderung und Unterdrückung - Die diskrepante Geburtenpolitik im Dritten Reich (2025), In: Körper-Teile(n): Interdisziplinäre Veranstaltungen der Aeneas-Silvius-Stiftung, Schwabe Verlag (Basel).
Ortsfamilienbücher - eine exzellente Forschungsgrundlage für Geschichts-, Wirtschafts- und Sozialwissenschaften (2022), with Georg Fertig and Christian Boose, Compgen-Blog.
A dynamic general equilibrium approach to migration in economic history (2021), with Thomas Baudin, High-Performance Computing and Data Science in the Max Planck Society, pp. 60–61.
Scholars and Literati at the University of Ingolstadt (1459–1800) (2024), with David de la Croix and Clara Kotala, Repertorium eruditorum totius Europae, 14, pp. 55–62.
Scholars and Literati at the University of Prague (1348–1800) (2023), with A.M. Gkopi, Repertorium eruditorum totius Europae, 11, pp. 49–60.
Are Scholars’ Wages Correlated with their Human Capital? (2023), with David de la Croix, Frédéric Docquier and Alice Fabre, Repertorium eruditorum totius Europae, 10, pp. 9–15.
Scholars and Literati at the University of Freiburg (1457–1800) (2023), with A.M. Gkopi, Repertorium eruditorum totius Europae, 9, pp. 59–68.
Scholars and Literati at the University of Leipzig (1409–1800) (2022), with David de la Croix, Repertorium eruditorum totius Europae, 8, pp. 33–42.
Scholars and Literati at the University of Tübingen (1477–1800) (2022), with David de la Croix, Repertorium eruditorum totius Europae, 7, pp. 21–30.
Scholars and Literati at the University of Heidelberg (1386–1800) (2022), with David de la Croix, Repertorium eruditorum totius Europae, 6, pp. 25–34.
Scholars and Literati at the University of Leiden (1575–1800) (2022), with David de la Croix, Repertorium eruditorum totius Europae, 5, pp. 9–16.
Scholars and Literati at the University of Göttingen (1734–1800) (2021), with David de la Croix, Repertorium eruditorum totius Europae, 4, pp. 1–8.
Scholars and Literati at the University of Gießen (1607–1800) (2021), with David de la Croix, Repertorium eruditorum totius Europae, 2, pp. 31–38.
Scholars and Literati at the University of Jena (1558–1800) (2021), with David de la Croix, Repertorium eruditorum totius Europae, 1, pp. 25–32.
The setting: demographic trends and economic development in Germany and two selected regions (2011), with Stephan Kühntopf and Thusnelda Tivig, In: T. Kronenberg, W. Kuckshinrichs (eds.), Demography and Infrastructures: National and Regional Aspects of Demographic Change, Springer, pp. 11–43.