Micro data

Last changes made: 29th September 2014

Return to my main data website. If you're thinking about running some firm-level (or perhaps plot-level) production functions you should have a look at my detailed overview of the methods in this empirical literature (with Christian Helmers).

Data collections and micro-data repositories

$$ Many UK universities and colleges have subscriptions to the databases maintained by ESDS International (Economic and Social Data Service, based at Essex University). They have some micro-data, most notably the Young Lives data (see last entry in the micro-household section).

The single most useful website for those searching for datasets for development is maintained by Gunilla Petterson, an economics PhD student who is based at the University of Sussex. Her developmentdata.org website has links to a vast number of datasets for development and is constantly updated.

Another extremely useful link is the DEVECONDATA blog maintained by Masayuki Kudamatsu, an economics lecturer based at the Institute for International Economic Studies (IIES) at the University of Stockholm. This not only provides links and regular updates to existing and new datasets for development but also provides crucial information on some of the nitty-gritty data issues. Note that some of the datasets listed on this website require subscription. Masa also has other very useful links on his personal webpages, such as lecture notes, valuable resources for STATA and a list of regular conferences on Development Economics.

An excellent new resource for household or firm-level data from LDCs is OpenMicroData. I do like their approach: 'OpenMicroData is run by a network of empirical researchers who believe that microdata should be freely available.' Good thinking, guys. So far I can see some of the CSAE African firm and hh datasets linked, as well as some data from randomised experiments in education from Burkina Faso. The site has only been up for a few months. [Gunilla Petterson featured the new site on her excellent devdata website]

The World Bank has created a new Central Microdata Catalog for all the micro-level datasets "in catalogs maintained by the World Bank and a number of contributing external repositories." At the moment of writing this repository includes 378 datasets. Slowly, slowly this Open Data malarky is getting serious...

The fortunes of a group of emerging economies, usually referred to as the BRICS (Brazil, Russia, India, China, South Africa), are of particular interest to many development economists. As part of the Pathfinder project the UK ESRC (research council for economics and other social sciences) has published Data Discovery - A rough guide to microdata in Brazil, China, India and South Africa. This details datasets from the four countries and discusses some of the issues involved in public access to data. The focus is on micro-data for health, education, firms, labour markets, housing and crime.

The IQSS Dataverse Network claims to be the world's largest collection of social science research data. As far as I can see this represents primarily the data used in existing papers, although there are also some very interesting 'raw' data links. The project is based at the Institute for Quantitative Social Science at Harvard, so the recent interest in randomized experiments in development is represented quite strongly in this archive. When I accessed it there were over 35,000 studies linking to 640,000 files.

The National Bureau of Economic Research (NBER) has a number of datasets available at their website. These are divided into Macro, Industry, International Trade, Individual, Hospital, Demographics & Vital Statistics, Patent data and other. Most of these datasets are for developed countries.

The US Inter-University Consortium for Political and Social Research (ICPSR) is a huge depository for data relevant for development economics. What I really like about ICPSR is their motto: "Please note that ICPSR does not provide publications, reports, or ready-made statistics. What we do supply are the numeric raw data used to create publications, reports, and figures." I wish some of the international organisations would subscribe to this approach... Your university/institution may need to be a member of ICPSR (check here) for you to get access to the data, but this is not necessarily true. Many of the datasets are in STATA or SAS format already.

The Minnesota Population Center provides the Integrated Public Use Microdata Series (IPUMS International). "IPUMS-International is composed of microdata, which means that it provides information about individual persons and households. This makes it possible for researchers to create tabulations tailored to their particular questions [...] The data series includes information on a broad range of population characteristics, including fertility, nuptiality, life-course transitions, migration, labor-force participation, occupational structure, education, ethnicity, and household composition [...] The database currently describes approximately 325 million persons recorded in 158 censuses taken from 1960 to the present. The database includes censuses from 55 countries" (including LDCs such as Uganda, Rwanda, Cambodia, Kenya and many LAC countries). A large amount of documentation is provided, as well as supplemental data including GIS boundary files. Registration required (provide research project summary).

The Institute for Social & Economic Research at the University of Essex hosts Keeping Track - A guide to longitudinal resources. The site "aims to provide an up-to-date guide to major longitudinal sources of data. The central purpose of this site is to allow users to see what kinds of longitudinal data are available and to locate information about studies which may provide data useful to their research interests. The site covers data sets collected by governmental, academic, private social research, medical and private industrial sources. This site includes household panel surveys, studies following the health of individuals, birth cohort studies, studies following the quality of a product design, and administrative records. Users of this site can find out basic details of the purpose, methodology, timing, coverage, and availability of the longitudinal data sets covered here. The site also offers links to the web pages of individual studies, and provides contact details for people wishing to get more information about any particular study." [via Sebastian Bauhoff @Harvard]

The Institute for Health Metrics and Evaluation (IHME) in Seattle has created GHDx, the Global Health Data Exchange. This is an excellent data resource, a "catalog of the world's health and demographic data. Use the GHDx to research population census data, surveys, registries, indicators and estimates, administrative health data, and financial data related to health." Follow IHME on Twitter: @IHME_UW - they've already got 1,200 followers so their tweets are obviously very useful.

Innovations for Poverty Action (IPA) is a research group comprising many of the most prominent academics of what I'd call the 'new empirical micro'. The outfit was founded by Dean Karlan and brings together the usual suspects at the frontier of development micro (Banerjee, Duflo, Fischer, Kremer, Miguel, etc). Their data website links to some of the data used in published work, e.g. for the de Mel, McKenzie and Woodruff RCT with firms in Sri Lanka, among many other RCTs. A second interesting resource (primarily in order to get to see where the field is going) is the database of ongoing and completed IPA projects, which can be searched by sector, researcher or country.

The Center for International Data at UC Davis has some productivity datasets for South Korea and Taiwan.

The Data & Information Services Center (DISC) Archive at University of Wisconsin-Madison provides access to population censuses and other demographic data from North and South American countries. [Thanks to Gunilla Petterson, who featured these data on her developmentdata.org site]

The William Davidson Institute provides macro and micro data on emerging and transition economies through the Davidson Data Center and Network. When I checked out the website none of the browsing tools worked, but the keyword search delivered a lot of interesting leads. The database also contains links to other databases, such as the China Data Center at U Michigan.

DataFirst is a Survey Data Archive and training facility at the University of Cape Town, South Africa. The Archive’s holdings include the datasets from all major South African surveys, as well as survey data from other African countries. But note: due to copyright restrictions, most of the datasets themselves are not downloadable from the site. Data from surveys conducted by the University of Cape Town, however, are available from DataFirst's website via its Public Access Catalogue.

The Office of Population Research (OPR) at Princeton University is a rich source of data for demography and especially migration research (among other topics). Projects include the ongoing Mexican Migration Project and Latin American Migration Project as well as the Addis Ababa Mortality Surveillance Project. The World Fertility Survey (for 41 LDCs) should also be of interest. Access to some of the data requires registration. [Thanks to Gunilla Petterson, who featured these data on her developmentdata.org site]

Plamen Nikolov, an economics PhD candidate at Harvard, has put together a handout covering a number of micro and macro datasets for development economics. The micro datasets have a distinct focus on household and health-related datasets. [Thanks to Plamen for the pointer]

National data archives

The Tanzania National Bureau of Statistics ('Statistics for Development') has a number of surveys on its website 'Tanzania National Data Archive'. You need to be registered to request data (the link to the registration is in the top-right corner of the screen). Examples include the Integrated Labour Force Survey 2006 and the Agriculture Sample Census Survey 2002-2003. Data aside, the website also has a citations tab, which features articles by Stefan Dercon and Gabriel Demombynes (both with co-authors) among others.

Project-level information/data

The Mapping for Results Platform (beta version) of the World Bank provides detailed information about "our work to reduce poverty and promote sustainable development around the world. This pilot website aims to visualize the location of our projects and to provide access to information about indicators, sectors, funding and results."

A database of a different sort is provided by people at the Chronic Poverty Research Centre at the University of Manchester: in its 5th update/version the Social Assistance in Developing Countries Database "provide[s] a summary of the evidence available on the effectiveness of social assistance interventions in developing countries". If, for instance, you want to find out what the actual cash transfers of Progresa/Oportunidades amounted to, this document gives you a concise overview of the program.

Microfinance Information Exchange (MIX) provides MIX Market, "the premier source for microfinance data and analysis. Our mission is to promote microfinance transparency through integrated performance information on microfinance institutions (MFIs), investors, networks and service providers associated with the industry. MIX provides objective data and analysis with the goal of strengthening the microfinance sector." You can go down to the level of an individual MFI project (of which there are currently over 1,800 'registered' with MIX) and download the data on performance, borrowers, etc., or you can pick an indicator and view data for all MFIs over the past 5 years. [via DEVECONDATA by Masa Kudamatsu]

The Learning and Educational Achievement in Punjab Schools Survey (LEAPS) project is run by "the World Bank, Pomona College and Harvard University in collaboration with the Government of Punjab and highly trained local counterparts". "The LEAPS Survey consists of data from 823 schools in 112 villages in 3 districts of Punjab. [...] To measure learning outcomes, the LEAPS project administered detailed exams on English, Math, and Urdu to students in Grade III, then followed those same children and tested them again in Grade IV, Grade V, and Grade VI. Teachers were also tested and given extensive surveys so that child-learning outcomes could be linked to teacher qualifications, and parents were surveyed to provide information on educational contributions made at home."

AidData, a partnership between Brigham Young University, the College of William and Mary, and a non-profit development organization, Development Gateway, has released a new database that captures China's development finance activities in Africa. "This database will provide a foundation for researchers, policymakers, journalists, and civil society organizations to analyze the distribution and impact of Chinese development finance to the region. The database contains nearly 1,700 official finance projects in 50 African countries, totaling over $70 billion in reported financial commitments [...] The dataset uses a media-based data collection methodology developed by AidData, which helps synthesize and standardize vast amount of project-specific information contained in thousands of English and Chinese language media reports." Data can be downloaded in full (excel) or visually analyzed. Right now the database runs from 2000 to 2011.

Plot-level data for Agriculture

The Washington-based International Food Policy Research Institute (IFPRI) has an interesting data set for Ethiopia which combines a household survey with a plot-level survey. The title of the project was "Policies for sustainable land management in the Ethiopian Highlands dataset 1998-2000" and the data is in SPSS format.

Fellow CSAE member Andy Zeitlin provides data and background material on a survey of Ghanaian Cocoa Farmers in which he has been involved for a considerable number of years now. The data is now available in 5 waves from 2002 to 2010. Please note: "[t]he data are available in Stata format for public use, and the CSAE is very happy for these to be used. I only ask that you contact me to let me know if you are planning to make use of these data." On Andy's research page (link above) you can find a couple of his papers using this unique dataset.

Firm-level data

The World Bank Enterprise Surveys are a great resource for firm-level data from developing and emerging economies. The data are in Stata format and users need to register to download either the standardized or original country survey datasets. The earliest available data is from 2002 and the latest is from 2007. These datasets usually have a panel element (based on recall). There is also a mega-file containing all the surveys and the Business Environment and Enterprise Performance Surveys (BEEPS). This aggregate dataset contains 27 European and Central Asian countries which were surveyed with the same questionnaire across countries in 2002 and again in 2005 with a similar questionnaire. The website is regularly updated when new data comes on-stream and you can sign up for an email newsletter, too.

The World Bank Investment Climate surveys are currently off the web. Check this link later. The website promises an 'interactive statistical and econometric tool. Links IC indicators and firm performance. Over 60 comparable surveys available.'

The Business Environment and Enterprise Performance Survey (BEEPS) is a joint initiative of the European Bank for Reconstruction and Development (EBRD) and the World Bank. The survey was first undertaken on behalf of the EBRD and World Bank in 1999–2000, when it was administered to approximately 4000 enterprises in 26 countries of Eastern Europe and Central Asia (including Turkey) to assess the environment for private enterprise and business development. There now exist four rounds of this data, which is available in STATA format for the 2002-2009 panel and for individual years. The objective of the survey is to obtain feedback from enterprises in EBRD countries of operation on the state of the private sector as well as to help in building a panel of enterprise data that will make it possible to track changes in the business environment over time. The survey examines the quality of the business environment as determined by a wide range of interactions between firms and the state, and as such facilitates research and serves as an input into policy dialogue with countries in Central and Eastern Europe.

The World Bank enterprise surveys division provides updated raw survey data (free registration) for Belarus, Bulgaria, Germany, India, Kazakhstan, Lithuania, Poland, Romania, Russia, Serbia, Ukraine, and Uzbekistan. These countries are part of the new Management, Organization and Innovation survey work. In total, 1,777 firms were surveyed. 'The purpose of the survey is to measure and compare management practices across countries; to assess the constraints to private sector growth and enterprise performance resulting from management practices; and to stimulate policy dialogue about management practices and innovation.' Interesting work in the area of cross-country analysis of management practices (and their impact on productivity) has also been carried out by John Van Reenen (LSE) with Nick Bloom (Stanford) and various co-authors. On the latter's webpage there are links to a number of the large datasets they have created.

The Centre for the Study of African Economies (CSAE) at Oxford has a number of panel datasets for manufacturing firms in Africa, including a comparative firm-level dataset for Ghana, Kenya, Tanzania, South Africa and Nigeria. These are in STATA format. There is also cross-section data for the manufacturing sector of five African countries: Cameroon, Ghana, Kenya, Zambia and Zimbabwe. This dataset is unusual in having measures of both physical and human capital. Data and programs are in SAS and documentation is provided.

The data for one of the seminal firm-level papers in the FDI spillover literature, Aitken & Harrison (1999, AER), is available on Ann Harrison's website at UC Berkeley. This covers over 10,000 Venezuelan firms in the period 1976-1989 with an average of 4 waves of data per firm (41,000 observations). Variables include KLEM with two types of labour, plus a number of expenditures.

A working paper by Grid Thoma and co-authors entitled 'Methods and software for the harmonization and combination of datasets: A test based on IP-related data and accounting databases with a large panel of companies at the worldwide level' should be a great resource for anybody wanting to merge firm-level data for productivity analysis. [Thanks to Christian Helmers for pointing out this paper]

The Development Economics Research Group (DERG) at the Department of Economics, University of Copenhagen has (since 2001) been involved in several enterprise surveys in Vietnam (SMEs) and Mozambique. The former has 3 waves since 2002, with the fourth (2009) coming on-stream soon; the latter has 2002/2006 data.

CSAE has also recently started posting data for working papers it publishes (so far as this is possible vis-a-vis copyright protection). These include data for the papers of my buddies Courtney Monk (apprenticeship in Ghana), Simon Baptist (compares panel data for Ghanaian and Korean firms) and Simon Quinn (Tanzanian labour market; panel).

Chris Woodruff (Warwick/UCSD) has links to firm-level manufacturing data from Hanoi and Ho-Chi-Minh-City from the mid-1990s. Detailed documentation is provided. There is also data for manufacturing firms surveyed in five Eastern European countries, Poland, Slovakia, Romania, Russia and Ukraine.

The NBER Patent Data Project has US patent data for 1976-2006 and there are also some firm-matches available in this database. Related to this, the World Intellectual Property Organisation (WIPO) offers WIPO Lex, a "one-stop search facility for national laws and treaties on intellectual property (IP) of WIPO, WTO and UN Members".

Conducted by the World Bank in January/February 2006 (covering 2005 but with some recall data for 2002) the Indonesian Rural Investment Climate Survey (RICS) is an in-depth, quantitative survey of 2549 non-farm enterprises, 2782 households and 149 communities in 6 rural Kabupaten. The RIC Survey data provides the first representative snapshot of the investment climate in six different types of rural Kabupaten, allowing policymakers to identify and address the key constraints to investment and growth. Data is provided in SPSS and Stata format, together with full documentation. [Via Masa Kudamatsu at DEVECONDATA]

Eric Bartelsman (VU Amsterdam), John Haltiwanger (U Maryland) and Stefano Scarpetta (OECD) have created a unique dataset for sectoral productivity and job flow analysis in a number of developing and emerging economies. "The job flow measures are available at a country, sector, size, and year level of observation and the productivity measures are available at a country, sector, year level of observation. As described in detail in the documentation, available measures include not just first moments but higher moments including measures of dispersion and covariances. For example, the job flow measures permit decomposing net employment growth at a disaggregated level into job creation, job destruction as well as the contribution of entry and exit to job creation and job destruction. The data were produced from a series of projects funded by the OECD, the World Bank and other sources." The datasets and code (Stata, SAS) and detailed documentation are all downloadable in one zipped folder. Note: This is NOT firm-level data, but I felt it still fits under this headline since it is likely micro/labour economists will find this a useful resource.
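To make the decomposition mentioned in the quote concrete, here is a minimal sketch of how job creation, job destruction and the entry/exit contributions add up to net employment growth. It assumes the common convention of dividing by average employment over the two years; the names `Unit` and `job_flows` and the toy numbers are made up for illustration and are not taken from the Bartelsman, Haltiwanger and Scarpetta files, whose exact definitions are in their documentation.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One firm or establishment (hypothetical record layout)."""
    emp_prev: float  # employment in year t-1 (0 for an entrant)
    emp_curr: float  # employment in year t (0 for an exiter)


def job_flows(units):
    """Return job flow rates for a cell (e.g. one country-sector-year).

    Assumption: rates are expressed relative to the average of total
    employment in t-1 and t.
    """
    base = 0.5 * sum(u.emp_prev + u.emp_curr for u in units)
    changes = [u.emp_curr - u.emp_prev for u in units]

    # Gross flows: sum of gains at expanding units, sum of losses at shrinking ones
    creation = sum(c for c in changes if c > 0) / base
    destruction = sum(-c for c in changes if c < 0) / base

    # Part of creation due to entrants, part of destruction due to exiters
    entry = sum(u.emp_curr for u in units if u.emp_prev == 0 and u.emp_curr > 0) / base
    exit_ = sum(u.emp_prev for u in units if u.emp_curr == 0 and u.emp_prev > 0) / base

    return {
        "creation": creation,
        "destruction": destruction,
        "net_growth": creation - destruction,
        "reallocation": creation + destruction,
        "creation_from_entry": entry,
        "destruction_from_exit": exit_,
    }


# Toy cell: one expanding firm, one shrinking firm, one entrant, one exiter
toy_cell = [Unit(10, 14), Unit(20, 15), Unit(0, 6), Unit(5, 0)]
flows = job_flows(toy_cell)
```

In the toy cell total employment is unchanged, so net growth is zero even though gross creation and destruction are both sizeable. That gap between net and gross flows is exactly what this kind of data lets you look at.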

Community-level data

Mexico's Universidad de Guadalajara and Princeton University host the Latin American Migration Project (LAMP), "which was created in 1982 by an interdisciplinary team of researchers to advance our understanding of the complex processes of international migration and immigration to the United States." The researchers have conducted surveys in Colombia, Costa Rica, Dominican Republic, El Salvador, Guatemala, Haiti, Nicaragua, Paraguay, Perú and Puerto Rico, each time in various communities. There's a wealth of information on the website, including survey design, questionnaires, etc. The data is available in SAS, SPSS and Stata format for all country studies.

Household- and Individual-level/cohort data

Bob Baulch at the Chronic Poverty Research Centre at the University of Manchester has compiled an annotated listing of Household Panel Data Sets in Developing and Transition Countries, featuring among many others the data used for his own work in Pakistan, Vietnam and Bangladesh. The listing is by country and includes information on the waves/years, sample size and major references. [via DEVECONDATA by Masa Kudamatsu]

The International Household Survey Network provides access to over 3,400 household-level datasets. This includes data on everything from agriculture to child labour to LSMS, income, expenditure... In most cases the link takes you not straight to the data, but to the website of the project or organisation, so you may have to search around for a while.

The Bureau for Research and Economic Analysis of Development (BREAD) provides links to a large number of household-level datasets, including among others Family Life Surveys, University of North Carolina Surveys, University of Washington CSDE Vietnam Research Projects, Rural Economic and Demographic Survey (REDS), India Agriculture and Climate Data Set, Indian National Sample Survey Organization, Learning and Education Achievement in Punjab Schools, Colombian Familias en Acción, World Bank Living Standards Measurement Study.

The LSE's development department STICERD (The Suntory and Toyota International Centres for Economics and Related Disciplines) has a "virtual center" for fieldwork in Development Economics. This not only includes datasets and related materials (questionnaires etc.) but also resources related to methodology, including 'The Basics of Developing Questionnaires'.

The Rural Income Generating Activities (RIGA) project has created an internationally comparable database of household income sources from existing household living standards surveys for low and middle-income countries. Most of the surveys used by the RIGA project were developed by national statistical offices in conjunction with the World Bank as part of its Living Standards Measurement Study. The database is maintained by the FAO. At present the database incorporates 27 surveys covering 16 countries in Africa, Asia, Eastern Europe and Latin America. In addition RIGA provides a link to research papers that have used the data [thanks to Alberto Zezza at the FAO for letting me know].

Since 1984, the MEASURE DHS (Demographic and Health Surveys) project has provided technical assistance to more than 200 surveys in 75 countries, advancing global understanding of health and population trends in developing countries. DHS are funded by USAID with contributions from other donors. Data are currently collected under the umbrella of the Measure project which is administered by Macro International. Data have been collected in four waves: DHS-I (1986-90), DHS-II (1991-1992), DHS-III (1993-1997), Measure (1998-present).

As part of a project analyzing poverty and social assistance in the transition economies a team at the World Bank under the guidance of Branko Milanovic has created HEIDE (Household Expenditure and Income Data for Transitional Economies), a very large integrated household- and individual-level dataset for nine Eastern European economies in 1993. The (Stata) data covers expenditure, income, assets, household descriptives, individual characteristics and amounts to a total of around 3 million observations. There are files describing variables, data cleaning etc. and a link to a working paper about the project. [This link features on Stefania Lovo's website].

Britain's ippr in partnership with the Global Development Network (GDN) provides data from a major project on migration and development, which aimed to assess migration’s impacts, collect evidence on those impacts, help to build research capacity on migration and development issues in developing countries and examine fresh policy options for improving migration’s contribution to development. Apart from rich qualitative data the researchers collected new nationally-representative household surveys in Colombia, Fiji, Georgia, Ghana, Jamaica, Macedonia and Vietnam. The final implemented survey questionnaires are also provided alongside the datasets, which are provided in Stata format. [This project was featured in a recent tweet by CGD's Michael Clemens @m_clem]

The World Bank's Living Standards Measurement Study (LSMS) offers publications, tools and most importantly access to household-level surveys it has been collecting since 1985.

The World Bank also has a dedicated African Household Survey Databank.

The Mexican Family Life Survey (MxFLS) is a multi-thematic and longitudinal database which collects, with a single scientific tool, a wide range of information on socioeconomic indicators, demographics and health indicators on the Mexican population. MxFLS is the first Mexican survey with national representation departing from a longitudinal design, tracking the Mexican population for long periods of time regardless of migration decisions with the objective of studying the dynamics of economy, demographics, epidemiology, and population migration throughout this panel study of at least, a 10-year span. The data can be downloaded in Stata format.

The Washington-based Education Policy and Data Center (EPDC) "provides global education data, tools for data visualization, and policy-oriented analysis aimed at improving schools and learning in developing countries." They say they have "the world’s largest international education database with over 3.8 million data points from 200 countries. The data comes from national and international websites including household survey datasets as well as studies and reports." This is not just macro data, but also household surveys and census data; another very useful thing they do is to provide Stata do-files to construct indicators from the hh data.

The National Statistical Office of Bolivia provides access to a number of demographic and health surveys, as well as income expenditure surveys for the 1989-2009 period. The website is in Spanish and registration (free) is required. [Thanks to Gustavo Canavire-Bacarreza, graduate student at Georgia State in Atlanta, for the link]

UNICEF assists countries in collecting and analyzing data in order to fill data gaps for monitoring the situation of children and women through its international household survey initiative, the Multiple Indicator Cluster Surveys (MICS). The first round of MICS was conducted around 1995 in more than 60 countries; the second round was conducted in 2000 (around 65 surveys); the third round (50 countries) in 2005-06; the fourth round is scheduled for 2009-2011 and survey results are expected to be available from 2010 on. Data coverage: in MICS3, as in the previous rounds, three model questionnaires were developed: a household questionnaire, a questionnaire for women aged 15-49, and a questionnaire for children under the age of 5 (addressed to the mother or primary caretaker of the child). [via Sebastian Bauhoff @Harvard]

The International Food Policy Research Institute (IFPRI) offers a wide range of household and community-level surveys on its data website. Chief among these is the set of Ethiopian Rural Household Surveys (ERHS), collected in 6 waves between 1989 and 2004, which is provided with all additional information, questionnaires etc. Note that despite the Amazon-style lingo ('Basket', 'Proceed to checkout') all you need to do is register on the site: then you can access/download all of the datasets featured. The datasets can also be accessed from the IFPRI Dataverse entry.

Chris Udry at Yale's Economic Growth Center (EGC) provides access to household survey data. The introduction to the surveys states that "The surveys would begin with a (clustered) random sample of approximately 5,000 households in 200 communities in rural and urban areas of each country. Every three years following the initial survey, a (stratified) random sample of each individual in the original 5,000 households would be followed for re-interviews." Other than the above document there is not much obvious documentation, but there is data for Ghana and Nigeria, some of it in Stata format (with do-files).

The Russia Longitudinal Monitoring Survey (RLMS) is a series of nationally representative surveys designed to monitor the effects of Russian reforms on the health and economic welfare of households and individuals in the Russian Federation. These effects are measured by a variety of means: detailed monitoring of individuals' health status and dietary intake, precise measurement of household-level expenditures and service utilization, and collection of relevant community-level data, including region-specific prices and community infrastructure data. Data have been collected 19 times since 1992. Of these, 15 represent the RLMS Phase II, which has been run jointly by the Carolina Population Center at the University of North Carolina at Chapel Hill, headed by Barry M. Popkin, and the Demoscope team in Russia, headed by Polina Kozyreva and Mikhail Kosolapov. You need to register and describe your research project to get access to the data. In return, the website is probably one of the best I've come across for information about the data and what has been done with it [This link features on Stefania Lovo's website].

The Townsend Thai Project (initiated and headed by Robert Townsend at MIT) data include both annual and monthly panels, in addition to the collection of environmental data. Originally the Townsend Thai survey focused on villages in four provinces, two in the Northeast and two in the Central region. The baseline survey was conducted in 1997. To date, the Townsend Thai project continues to resurvey the annual and monthly panels. In 2003, an annual survey of villages in the South was added, and in 2004 two provinces in the North were included in the annual survey. In 2006, the annual surveys were extended to include urban areas in the original four provinces. The project emerged as a means to understand the broader economic and social context in which policies are enacted and research is conducted. Its goal is to build a bridge between policy and research by providing rich data from which academics and policy-makers alike can better understand household activities and behavior, as well as their relationship to the broader regional and national economy.

Sebastian Bauhoff at Harvard offers some links to household-level datasets for China at various US universities, including primarily data on health and population.

If you are interested in calorie consumption, you need to convert the quantities of food consumed (as collected in household surveys) into calories. Annex 1 of the FAO's (2001) Food Balance Sheets: A Handbook provides the conversion factors (how many kilocalories 100 grams of food contain) for a wide variety of foods for international use - note that these data are contained in a pdf, not in Excel or Stata. For India consult Gopalan, Sastri, and Balasubramanian's book entitled Nutritive Value of Indian Foods (Hyderabad: National Institute of Nutrition, 1971) [thanks to Masa at DEVECONDATA from which both of these links are lifted].
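For what it's worth, the conversion itself is simple arithmetic once you have typed the factors in by hand. Below is a minimal Python sketch of how it might look, assuming food quantities are recorded in grams per household over a recall period. The function names and the two conversion factors are hypothetical placeholders for illustration only; they are not copied from the FAO handbook, so substitute the real values from Annex 1.

```python
# Placeholder conversion factors in kcal per 100 g of food.
# ASSUMPTION: these numbers are illustrative only, NOT taken from the
# FAO handbook; replace them with the values from Annex 1.
KCAL_PER_100G = {"rice": 360.0, "beans": 340.0}


def household_kcal(consumption_grams, factors=KCAL_PER_100G):
    """Total kilocalories for a dict mapping food name -> grams consumed."""
    total = 0.0
    for food, grams in consumption_grams.items():
        if food not in factors:
            # Fail loudly rather than silently dropping an unmatched food item.
            raise KeyError(f"no conversion factor for {food!r}")
        total += grams / 100.0 * factors[food]
    return total


def kcal_per_person_per_day(consumption_grams, household_size, recall_days,
                            factors=KCAL_PER_100G):
    """Average daily kilocalories per household member over the recall period."""
    if household_size <= 0 or recall_days <= 0:
        raise ValueError("household_size and recall_days must be positive")
    total = household_kcal(consumption_grams, factors)
    return total / (household_size * recall_days)


# Hypothetical household: 4 members, 7-day recall, quantities in grams.
example = {"rice": 7000.0, "beans": 1400.0}
print(kcal_per_person_per_day(example, household_size=4, recall_days=7))
```

Raising an error for foods without a factor is a deliberate choice here: survey food lists rarely match the handbook's categories one-to-one, and it is better to notice the unmatched items than to undercount calories.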

Nancy Qian at Yale has links to a number of Chinese household surveys on her website, including the China Health and Nutrition Survey (CHNS) at University of North Carolina Population Center as well as the familiar CHIP data (China Household Income Project) available through ICPSR.

Stefan Dercon at Oxford University provides links to a number of datasets he has helped collect, including a Rural Household Survey for Ethiopia (panel), the Kagera Health and Development Survey (Tanzania, panel) and ICRISAT data, as well as Young Lives (see separate entry below). Entirely unrelated, Stefan also provides this gem.

RAND has a number of Family Life Surveys on their website, including surveys for Malaysia, Indonesia, Guatemala and a region in Bangladesh called Matlab. The website gives a lot of information about the data available.

The Office of Population Research at Princeton University provides access to data from the Mexican Migration Project, the Latin American Migration Project and the World Fertility Surveys (WFS) which were conducted in 41 countries during the 1970s and early 1980s. This is a very good site to find out about data on fertility including the Chinese In-Depth Fertility Surveys.

The Young Lives project at the University of Oxford combines quantitative and qualitative data for childhood poverty in four developing countries. The study is being conducted in Ethiopia, India (in the Andhra Pradesh state), Peru and Vietnam. The study aims to follow 2,000 children (aged approximately 1 year in 2002) and their households, from both urban and rural communities, in each of the four countries (8,000 children in total) for a period of 15 years. Quant waves are in 2002, 2006, 2009, 2012 and 2015, qual waves in 2007, 2009, 2012 and 2015. They've also created the 'Virtual Village', which is quite an effort to visualise data in a new format.

The Malawi Diffusion and Ideational Change Project (MDICP) is a collaboration by people at UPenn and two medical colleges in Malawi. The focus of the study is on the roles of social interactions in (1) the acceptance (or rejection) of modern contraceptive methods and of smaller ideal family size; and (2) the diffusion of knowledge of AIDS symptoms and transmission mechanisms and the evaluation of acceptable strategies of protection against AIDS. The website provides a great deal of information about this and a sister project in Kenya, including papers, qualitative surveys and the quants data. [featured by Masa on Devecondata]

Plamen Nikolov, an economics PhD candidate at Harvard, has put together a handout covering a number of micro and macro datasets for development economics. The micro datasets have a distinct focus on household and health-related datasets. [Thanks to Plamen for the pointer]

Disaggregated Conflict Data

ACLED (Armed Conflict Location and Events Dataset), compiled by the Centre for the Study of Civil War (CSCW) at the Peace Research Institute Oslo (PRIO), "is designed for disaggregated conflict analysis and crisis mapping. This dataset codes the location of all reported conflict events in 50 countries in the developing world. Data are currently being coded from 1997 to early 2010 and the project continues to backdate conflict information for African states to the year of independence. These data contain information on the date and location of conflict events, the type of event, the rebel and other groups involved, and changes in territorial control. Specifics on battles, killings, riots, and recruitment activities by rebels, governments, militias, armed groups, protesters and civilians are collected. Events are derived from a variety of sources, mainly concentrating on reports from war zones, humanitarian agencies, and research publications. These data can be used in any GIS, any mapping program, or statistical package." The website also provides links to existing research using this data. [Thanks to Anke Hoeffler at CSAE]

The Households in Conflict Network, funded by The Leverhulme Trust and supported by the Institute of Development Studies at Sussex, the German Institute for Economic Research (DIW) in Berlin and the University of Antwerp, has a Resource & Data website where they provide Philip Verwimp's dataset on victims of genocide in Kibuye, Rwanda (Stata file). This aside, the site contains a lot of information on this research topic.

Other surveys

The International Labour Organisation's (ILO) International Programme on the Elimination of Child Labour (IPEC) collects data on the extent, characteristics and determinants of child labour. The micro datasets (mostly cross-sections) are predominantly for African and Latin American countries (data for a total of 30 countries). Their website also contains additional documentation such as the questionnaires, publications and reports compiled from the data.

The Learning and Educational Achievement in Punjab Schools Survey (LEAPS) project is run by "the World Bank, Pomona College and Harvard University in collaboration with the Government of Punjab and highly trained local counterparts". "The LEAPS Survey consists of data from 823 schools in 112 villages in 3 districts of Punjab. [...] To measure learning outcomes, the LEAPS project administered detailed exams on English, Math, and Urdu to students in Grade III, then followed those same children and tested them again in Grade IV, Grade V, and Grade VI. Teachers were also tested and given extensive surveys so that child-learning outcomes could be linked to teacher qualifications, and parents were surveyed to provide information on educational contributions made at home."

Fellow CSAE member Andy Zeitlin provides data and background material on a project which investigates the impact of strengthening information flows on learning outcomes in rural, government primary schools in Uganda. "The baseline survey includes data collected in 100 schools, in 4 districts. This field exercise included collection of a school-level survey instrument, standardized testing of pupils in P3 and P6, and individual questionnaires administered to a sample of head teachers, teachers, School Management Committee members, and parents. Data from the baseline survey are available in Stata format, together with supporting documentation." You should also check out the papers Andy has written with my colleague Abigail Barr using the data, which are available in the 'Research' section of his website.

Randomized and other Experiments

The Abdul Latif Jameel Poverty Action Lab (J-PAL) at MIT, home to the powerful new social science tool of randomized experiments, has links to the data used in some of the Randomistas' work. There are the datasets for the textbooks, remedial education, teacher absenteeism, women as policymakers, healthcare and microfinance experiments.

Data from the path-breaking conditional cash transfer randomised experiment Progresa (now renamed Oportunidades) in Mexico can be found here.

John List at University of Chicago has created a website where he lists "publications and discussion papers in experimental economics that make use of the 'field' in some manner". The information includes a link to the paper, year of publication and sometimes JEL codes. Papers are classified into three categories: "1. Artefactual field experiments, which are the same as conventional lab experiments but with a non-standard subject pool (i.e., non-students). Running Peruvian borrowers through lab games (Karlan, 2005 AER) would be an example of an artefactual field experiment. 2. Framed field experiments, which are identical to artefactual field experiments but with field context in either the commodity, task, or information set that the subjects use. An example would be work that elicits valuations for public goods that occur naturally in the environment of the subjects (see some of Bohm's work). 3. Natural field experiments, which are identical to framed field experiments except that the subjects do not know that they are participants in an experiment. An example could be found among the recent surge in fundraising experiments (see, e.g., List and Lucking-Reiley, 2002, JPE)."

Historical Data

Please go to the relevant section on my Macro page here to view all the historical datasets I could find. I felt it was better to keep them all together, even if some do not represent macro data.

Miscellaneous

Joshua Angrist at MIT has made all of the datasets used in his papers available on his website.

Bob Allen's website at Nuffield has links to historical wage and price data for a number of countries, cities and occupations.

The Russia Longitudinal Monitoring Survey (RLMS) is a series of nationally representative surveys designed to monitor the effects of Russian reforms on the health and economic welfare of households and individuals in the Russian Federation. These effects are measured by a variety of means: detailed monitoring of individuals' health status and dietary intake; precise measurement of household-level expenditures and service utilization; and collection of relevant community-level data, including region-specific prices and community infrastructure data. Data have been collected sixteen times since 1992. The project is based at the University of North Carolina at Chapel Hill and directed by Barry Popkin.
