Microdata offers immense opportunities for research. Students are encouraged to explore datasets and apply them to their empirical projects. This document provides an overview of (mainly) microdata sources suitable for empirical research. Many datasets are cross-sectional, i.e. a snapshot of individuals or households at a given point in time. However, some have a panel dimension that allows you to follow individuals or households over time.
Access restrictions vary, but there is a special heading for fully open datasets that can be used for teaching. A few regional datasets are also included, as are links to data on specialized topics such as international trade and job vacancy ads. For convenience, links to some sources of country-level data are provided.
The resources are particularly relevant for students in economics, sociology, education, and politics. I appreciate suggestions for additional data sources and reports of dead links, but I am not able to provide any assistance on the data or access. For such assistance, consult the documentation from the relevant provider.
General Data Repositories
PolData Repository: Contains links to datasets across various categories like cabinets, citizens (includes socioeconomic panels tracking individuals/households over time), constitutions, political institutions, parties, and more. The list of sources is downloadable with detailed metadata. Access: PolData Repository
World Bank Microdata Library: A collection of micro datasets from the World Bank and other organizations, including the Living Standards Measurement Study (LSMS) which is a household survey program in low- and middle-income countries. Access: World Bank Microdata Library | LSMS
IPUMS: Provides census and survey data from around the world integrated across time and space. Includes tools for analyzing individuals within family and community contexts. The data is cleaned and well-documented. While much of IPUMS is focused on the US, it also contains IPUMS International, which offers harmonized census and survey data globally. Access: IPUMS | IPUMS International
Swedish National Data Service (SND): Supports the accessibility, preservation, and reuse of research data and related materials. Access restrictions vary between datasets. Access: SND
Income and Labor Data
Luxembourg Income Study (LIS): LIS acquires microdatasets with income, wealth, employment, and demographic data from many high- and middle-income countries, and harmonises them to enable cross-national comparisons. Includes the Luxembourg Wealth Study (LWS), which is the only cross-national wealth microdatabase in existence. Access: LIS | LWS
Survey of Health, Ageing and Retirement in Europe (SHARE): Microdata on health, social, economic and environmental policies over the life-course of European citizens. Access: SHARE
Household Finance and Consumption Survey (HFCS): Harmonized household-level financial and consumption data across Europe with a panel dimension. Requires registration and possibly an advisor's co-signature (probably only available to master's students). Access: HFCS
Panel Study of Income Dynamics (PSID): Longitudinal survey of US households collecting data on employment, income, wealth, expenditures, and more since 1968 (panel dimension). Access: PSID
US National Longitudinal Surveys (NLS): Tracks labor market activity, schooling, fertility, and health across birth cohorts (panel dimension). Some data are restricted, but much is publicly available. Access: NLS
American Community Survey (ACS): Microdata on US labor market outcomes and demographic characteristics. Data is cleaned, well-documented and the sample is large. A useful guide to this data is Matt Holian's book, which comes with replication files using the ACS. Access: ACS | Holian
Canadian Public Use Microdata Files (CPUMF): Microdata from the Canadian census covering a sample of the population. It is a comprehensive social, demographic and economic database about Canada and its people, with a wealth of population characteristics. These files enable the study of individuals, families and households. Access: CPUMF
German Socio-Economic Panel (GSOEP): Longitudinal survey of German households since 1984 (panel dimension). Variables include household composition, employment, occupation, earnings, health and satisfaction indicators. Restrictions apply and an advisor's co-signature is necessary. The process might take some time and is probably only relevant for master's students. However, smaller samples are more readily available for teaching purposes. Access: GSOEP | GSOEP teaching
The Bank of Italy Survey on Household Income and Wealth (SHIW): Tracks Italian household income, savings, wealth, and financial behavior. The sample used in the most recent surveys comprises about 7000 households (16000 individuals), distributed over about 300 Italian municipalities, and it has a panel dimension. Access: SHIW
LISS Panel: A Dutch panel with a representative sample of 5000 households, comprising approximately 7500 individuals. Panel members complete multiple online questionnaires every month. Access: LISS Panel
DNB Household Survey: Annual collection of economic microdata by the central bank of the Netherlands (De Nederlandsche Bank). The purpose of the survey is to study the economic and psychological determinants of the saving behavior of households. Each year more than 1500 Dutch households participate in the project. Access: DNB Household Survey
Survey of Consumer Finances (SCF): Cross-sectional survey of U.S. families. The survey data include information on families’ balance sheets, pensions, income, and demographic characteristics. Information is also included from related surveys of pension providers and the earlier such surveys conducted by the Federal Reserve Board. Access: SCF
Consumer Expenditure Survey (CEX): Cross-sectional household data on US consumption expenditures. Access: CEX
Education Data
International Large-Scale Assessments (ILSA): Microdata from studies such as PISA, TIMSS, PIRLS, and PIAAC (PISA for adults). Includes test scores, student background, survey responses and other useful information on education systems. Provides direct links to data and research. Access: ILSA Gateway | PIAAC Direct
US National Education Longitudinal Study (NELS:88): Tracks a cohort of 8th-grade students from 1988 through their academic and professional lives. Detailed survey, test scores, and school data. Access: NELS:88
Chilean Education Data: Micro-level data on students, teachers, and educational outcomes. Includes anonymized personal identifiers for linking data. The Portal Databases allow the study of admission to university education and contain disaggregated information at the level of the individuals participating in the admission process. There might be a variable called 'MRUN' that is an anonymized personal identifier allowing individuals to be linked across all the different databases. Documentation is in Spanish. Access: Chilean Education Data | Portal Databases
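Linking records through an anonymized identifier such as 'MRUN' amounts to joining two files on a shared key. The sketch below is a minimal illustration using only the Python standard library. The records and all column names other than MRUN (program, admitted, math_score) are hypothetical and are not taken from the Chilean files, so check the provider's documentation for the actual variable names and layouts.

```python
# Hypothetical records; the real files have their own columns and formats.
admissions = [
    {"MRUN": 101, "program": "Economics", "admitted": True},
    {"MRUN": 102, "program": "Sociology", "admitted": False},
    {"MRUN": 104, "program": "Law", "admitted": True},
]
test_scores = [
    {"MRUN": 101, "math_score": 640},
    {"MRUN": 102, "math_score": 580},
    {"MRUN": 103, "math_score": 710},
]


def link_on_id(left, right, key="MRUN"):
    """Inner join: keep only individuals who appear in both files."""
    # Index the right-hand file by identifier (assumes one row per id there).
    right_by_id = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_id.get(row[key])
        if match is not None:
            # Combine the columns of both records into one row.
            linked.append({**row, **match})
    return linked


linked = link_on_id(admissions, test_scores)
for row in linked:
    print(row)
```

Individuals who appear in only one file (here 103 and 104) are dropped by this inner join. In practice it is worth counting how many records fail to match before analyzing the linked data.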
Social and Values Surveys
World Values Survey (WVS): Cross-national investigation of human beliefs and values, covering almost 100 countries. Access: WVS
European Values Study (EVS): Surveys basic human values across Europe, focusing on family, work, politics, and more. Access: EVS
The Swedish Society, Opinion and Media Surveys (SOM): Data on values, attitudes and habits of the Swedish population since 1986. Access: SOM
Eurobarometer: Surveys public opinion in the European Union and makes microdata on a broad range of topics available. Access: Eurobarometer
Afrobarometer: Surveys democracy, governance, and public attitudes across 35 African countries. Access: Afrobarometer
European Social Survey (ESS): Microdata measuring public attitudes, beliefs, and behaviors across Europe. Access: ESS
US General Social Survey (US-GSS): Tracks US social characteristics and attitudes through personal-interview surveys. Access: US-GSS
German General Social Survey (ALLBUS): Tracks attitudes and behaviors in Germany since 1980. Access: ALLBUS
Generations and Gender Program (GGP): Provides open-access microdata from cross-nationally comparative surveys and contextual data. The purpose of the survey is to provide data about families and life course trajectories. Access: GGP
International Social Survey Program (ISSP): Annual cross-national surveys on diverse social science topics since 1984. Access: ISSP
Election Studies and Voter Behavior
European Election Studies (EES): Individual-level data about electoral behaviour in European Parliament elections. Access: EES
Comparative Study of Electoral Systems (CSES): A collaborative program of research among election study teams from around the world. Data on voting, socio-demographic, district and macro variables. Access: CSES or via CSES-GESIS
Health and Demographics
Demographic and Health Surveys (DHS): Microdata on health, HIV, and nutrition in 90+ low- and middle-income countries. Access: DHS
Health and Retirement Study (HRS): Longitudinal data on older Americans, focusing on health, retirement, and aging (panel dimension). Access: HRS
Microdata for Teaching and Educational Purposes
EU Public Use Microdata: The EU provides public use microdata from the Labor Force Survey (LFS) and the Survey of Income and Living Conditions (SILC). The data are derived from microdata made available for research and should only be used for education and training. Access: LFS | SILC
IAB-FDZ Microdata: The German Institute for Employment Research (IAB) makes public use datafiles (PUF) available for teaching purposes, which can be downloaded after registration. The IAB also provides scientific use datafiles (SUF) with more restrictions. Access: PUF | SUF
German Socio-Economic Panel (GSOEP) for teaching: Smaller version of the GSOEP (see above) longitudinal survey of German households available for teaching purposes. Access: GSOEP teaching
Statistics Finland Microdata: Sets of microdata available for students. Documentation is in Finnish so AI translation may be needed. Access: Statistics Finland
Specialized Datasets
Swedish Vacancies Data: Contains job ads from Platsbanken since 2006, detailing occupation, employer, location, and more. The dataset includes approximately 7 million job ads and it is continuously updated. Especially useful for machine learning projects. Access: Swedish Vacancies Data
UN Comtrade Database: The United Nations Comtrade database aggregates detailed global annual and monthly trade statistics by product and trading partner. Essential for research on international trade. Access: UN Comtrade
Swedish Monetary Policy Event Studies Database: High-frequency financial market reactions to Riksbank’s monetary policy announcements. Access: Swedish Monetary Policy Database
Social Explorer: A (paid) service that collects a large variety of US data under one roof. Access: Social Explorer
Kommun- och landstingsfullmäktigeundersökningen (KOLFU): Surveys of attitudes among Swedish local politicians. Access by application through SND (this is usually smooth). Access: KOLFU 2008 | KOLFU 2012
Norwegian Local Politics Data: Includes datasets on Norwegian politics compiled by Jon Fiva. Access: Norwegian Local Politics Data
Regional Data
Statistics Sweden (SCB) has data at the municipal and regional level on a range of issues. Access: Statistikdatabasen
Valresultat: Swedish election results at a highly detailed regional level (valdistrikt). Access: Valresultat
Opportunity Insights: American datasets analyzing social mobility, life expectancy, education, and more, at detailed geographical levels. Access: Opportunity Insights
Education Opportunity Project: Provides US education data at detailed geographical levels. Access: Education Opportunity Project
Country Level Data
Global Macro Database (GMD): Panel dataset of 46 macroeconomic variables across 243 countries, from historical records beginning in the year 1086 to projections through the year 2030. Access: GMD
The OECD provides country level data on a range of issues. Access: OECD Data
Demographic data: The Human Fertility Database (HFD) and the Human Mortality Database (HMD) contain detailed, high-quality historical and recent demographic data on period and cohort fertility and mortality, e.g. age of mother, birth order, births by month, and cause of death. Variable coverage varies somewhat between countries and time periods. Access: HFD | HMD
Varieties of Democracy (V-Dem): Comprehensive and detailed democracy ratings and data on related topics. Access: V-Dem
Varieties of Indoctrination (V-Indoc): A global dataset on the politicization of education and the media, covering 160 countries from 1945 to 2021. Access: V-Indoc
World Inequality Database (WID): Tracks income and wealth inequality within and between countries over the 20th and 21st centuries. Access: WID
Social Policy Indicators Database (SPID): Annual country-level indicators of social insurance entitlements (SIED), child benefits (CID), child care (CCD), social assistance (SAMIP) and the like. Coverage varies somewhat between indicators. Access: SPID
Rockwool-Duke Global Child Welfare Database (R-DGCWD): Cross-country information from the Global North about children who experienced contact with child welfare systems (CWS) or child protective services (CPS) between 2000 and 2020. Access: R-DGCWD
Access and Use
Registration Requirements: Terms and conditions vary. Many datasets require users to register, and some require approval from an advisor. Note that formal approvals can take some time.
Ethical Use: Ensure compliance with data use agreements and maintain participant confidentiality.
Software Tools: Recommended tools for data analysis include Stata, R, and Python. The webpage for Scott Cunningham's book Causal Inference: The Mixtape contains useful examples with code in Stata, R and Python.
Consult the documentation. Do not expect advisors or study counselors to have any knowledge of the datasets. Learning to use them is a personal responsibility.
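As a small illustration of the difference between a cross-section and a panel, the sketch below stores hypothetical survey records in long format (one row per person per year) using only the Python standard library. The variable names (person_id, year, income) and all values are invented for the example and do not come from any of the datasets listed here.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per person per survey year.
records = [
    {"person_id": 1, "year": 2018, "income": 30000},
    {"person_id": 1, "year": 2020, "income": 34000},
    {"person_id": 2, "year": 2018, "income": 41000},
    {"person_id": 2, "year": 2020, "income": 40000},
    {"person_id": 3, "year": 2020, "income": 27000},
]


def cross_section(rows, year):
    """A snapshot: everyone observed in a single year."""
    return [r for r in rows if r["year"] == year]


def income_changes(rows, start, end):
    """Within-person income change, which requires a panel dimension."""
    by_person = defaultdict(dict)
    for r in rows:
        by_person[r["person_id"]][r["year"]] = r["income"]
    # Only people observed in both years contribute a change.
    return {
        pid: years[end] - years[start]
        for pid, years in by_person.items()
        if start in years and end in years
    }


snapshot = cross_section(records, 2020)
changes = income_changes(records, 2018, 2020)
print(len(snapshot))  # number of people in the 2020 snapshot
print(changes)        # income change per person observed in both years
```

The snapshot can be computed from any cross-sectional dataset, while the within-person changes require that the same identifier appears in more than one wave. Person 3 is observed only once and therefore drops out of the second calculation.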