This dataset measures the share of tasks in each occupation that can be made more productive using Generative AI tools, as assessed in March 2023 based on GPT-4-level functionality.
This data is used and described in "Generative AI and Firm Values" (Eisfeldt, Schubert, Taska, Zhang, 2025) - please see the paper for details and cite the paper if you end up using this data or this methodology!
Download occupation-level data: [Stata format] [CSV format]
Notes:
There are 4 columns: 1) SOC 2010 codes, 2) Total occupational GenAI exposure, 3) Contribution to occupational exposure coming just from core tasks, 4) Contribution to occupational exposure coming just from supplemental tasks
For instance, the supplemental exposure share (ShareSupp) we use in the paper is computed as: ShareSupp = genaiexp_estz_supplemental / genaiexp_estz_total
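As a minimal sketch of the ShareSupp formula above, the following Python snippet computes the share from rows shaped like the CSV file. The column names genaiexp_estz_total and genaiexp_estz_supplemental come from the formula; the SOC column name soc2010, the core-task column name genaiexp_estz_core, and all numeric values are made up for illustration, so check the header of the downloaded file before reusing this. Returning None for occupations with zero total exposure is also an assumption, not something the paper specifies.

```python
import csv
import io

# Hypothetical sample in the shape of the occupation-level CSV.
# The column names "soc2010" and "genaiexp_estz_core" and all values
# are illustrative assumptions, not taken from the real file.
SAMPLE_CSV = """soc2010,genaiexp_estz_total,genaiexp_estz_core,genaiexp_estz_supplemental
11-1011,0.40,0.30,0.10
13-2011,0.50,0.25,0.25
47-2061,0.00,0.00,0.00
"""


def share_supp(total, supplemental):
    """Return supplemental / total, or None when total exposure is zero."""
    if total == 0:
        return None
    return supplemental / total


def add_share_supp(rows, total_col, supp_col):
    """Return copies of the rows with a ShareSupp entry added."""
    out = []
    for row in rows:
        total = float(row[total_col])
        supp = float(row[supp_col])
        out.append({**row, "ShareSupp": share_supp(total, supp)})
    return out


occ_rows = add_share_supp(
    csv.DictReader(io.StringIO(SAMPLE_CSV)),
    "genaiexp_estz_total",
    "genaiexp_estz_supplemental",
)
for row in occ_rows:
    print(row["soc2010"], row["ShareSupp"])
```

The same function applies to the firm-level file by passing firmgenaiexp_estz_total and firmgenaiexp_estz_supplemental as the column names.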
This dataset measures the expected weighted share of tasks in each firm that can be made more productive using Generative AI tools, as assessed in March 2023 based on GPT-4-level functionality and the firm's occupational employment structure on LinkedIn.
This data is used and described in "Generative AI and Firm Values" (Eisfeldt, Schubert, Taska, Zhang, 2025) - please see the paper for details and cite the paper if you end up using this data or this methodology!
Download firm-level data: [Stata format] [CSV format]
Notes:
There are 4 columns: 1) Firm GVKEY codes, 2) Total firm GenAI exposure, 3) Contribution to firm exposure coming just from core tasks, 4) Contribution to firm exposure coming just from supplemental tasks
For instance, the firm-level supplemental exposure share (ShareSupp) we use in the paper is computed as: ShareSupp = firmgenaiexp_estz_supplemental / firmgenaiexp_estz_total
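To illustrate how an occupation-level score could be aggregated to the firm level using a firm's occupational employment structure, here is a rough sketch of an employment-weighted average. This is only one plausible reading of "expected weighted share"; the paper defines the actual construction. The function name firm_exposure, the identifiers, the headcounts, and the choice to skip occupations without a score are all assumptions made for illustration.

```python
def firm_exposure(employment, occ_exposure):
    """Employment-weighted mean of occupation exposures for one firm.

    employment: dict mapping occupation code -> headcount at the firm.
    occ_exposure: dict mapping occupation code -> exposure score.
    Occupations without an exposure score are skipped (an assumption).
    """
    matched = {o: n for o, n in employment.items() if o in occ_exposure and n > 0}
    total = sum(matched.values())
    if total == 0:
        return None
    return sum(n / total * occ_exposure[o] for o, n in matched.items())


# Hypothetical occupation scores and firm headcounts (made-up values).
occ_exposure = {"11-1011": 0.5, "13-2011": 0.25, "47-2061": 0.0}
firm_employment = {
    "001234": {"11-1011": 10, "13-2011": 30},
    "005678": {"47-2061": 50, "13-2011": 50, "99-9999": 5},
}

firm_scores = {
    gvkey: firm_exposure(emp, occ_exposure)
    for gvkey, emp in firm_employment.items()
}
print(firm_scores)
```

Weights are renormalized over matched occupations so that they sum to one for each firm; a different treatment of unmatched occupations would change the scores.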
This data can be used to create instruments for local housing markets and population growth based on network connections to other cities, or to predict spillovers between cities through migration channels.
This is the replication data for migration network measures in "House Price Contagion and U.S. City Migration Networks" (Schubert, 2024) - please cite the paper if you end up using this data or this methodology!
I provide several versions of the data (for both 2010 CBSA and 1990 commuting zone geographies) to make it easier to use these network measures in other projects. All of these are computed based on public IRS data on county-to-county migration flows and FHFA house price indices (see the paper for details).
Migration exposure weights psi_ij for each CBSA_i from each other CBSA_j (only continental US included). These have been normalized to sum to one from the perspective of each CBSA_i and are based on average 1990-1994 migration flows.
House price network exposure (variable name nw_f_g_hpi) - this is Sum_j psi_ij d ln P_jt for each year in each CBSA
Log changes in CZ house prices (balanced panel, variable g_hpi): d ln P_it
Unnormalized migration exposure weights psi_ij for each CZ_i to each other CZ_j (only continental US included), all based on average 1990-1994 migration flows
Normalized migration exposure weights psi_ij for each CZ_i to each other CZ_j. These have been normalized to sum to one from the perspective of each CZ_i
House price network exposure (variable name nw_f_g_hpi) - this is Sum_j psi_ij d ln P_jt for each year in each commuting zone (city_i) based on 1990 boundaries
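The relationship between the files listed above can be sketched in a few lines of Python: unnormalized weights are scaled to sum to one for each city i, and the network exposure is the weighted sum Sum_j psi_ij d ln P_jt over the other cities j. The variable names g_hpi and nw_f_g_hpi follow the files; the city labels, flows, and price changes are made-up values, the nested-dict layout is an assumption rather than the layout of the downloaded data, and the sketch relies on g_hpi being a balanced panel.

```python
def normalize_weights(raw):
    """Scale each city's weights so they sum to one across other cities."""
    out = {}
    for i, row in raw.items():
        total = sum(row.values())
        if total > 0:
            out[i] = {j: w / total for j, w in row.items()}
    return out


def network_exposure(psi, g_hpi):
    """Compute sum_j psi[i][j] * g_hpi[j][t] for every city i and year t.

    Assumes g_hpi is a balanced panel: every city has the same years.
    """
    years = sorted(next(iter(g_hpi.values())))
    return {
        i: {t: sum(w * g_hpi[j][t] for j, w in row.items()) for t in years}
        for i, row in psi.items()
    }


# Hypothetical unnormalized weights (e.g. average migration flows).
raw_psi = {
    "A": {"B": 30, "C": 10},
    "B": {"A": 20, "C": 20},
    "C": {"A": 5, "B": 15},
}

# Hypothetical log house price changes d ln P_jt by city and year.
g_hpi = {
    "A": {2001: 0.04, 2002: 0.02},
    "B": {2001: 0.08, 2002: -0.04},
    "C": {2001: 0.00, 2002: 0.04},
}

psi = normalize_weights(raw_psi)
nw_f_g_hpi = network_exposure(psi, g_hpi)
print(nw_f_g_hpi["A"])
```

Each city's own price change is excluded because the weights are only defined over other cities j.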