
Macro data

Last changes made: 17th February 2015

Return to my main data website. Follow me on Twitter @MEDevEcon to get updates. View Stata commands for Panel Time Series methods to analyse macro panel data.




Aggregate Economy Data

The Penn World Table (PWT) data compiled by the Center for International Comparison at UPenn is the standard dataset for cross-country analysis of aggregate growth and development. The latest version from August 2009 (PWT 6.3) covers 189 countries for some or all of the years 1950-2007. The base year is 2005. There is also a discussion of the changes made to previous versions, which addresses some of the problems with the data raised by Johnson, Larson, Papageorgiou and Subramanian (2009). Whether analysing the aggregate economy is the right thing to do is a different question...
March 2011: The last UPenn PWT has just been published (after 2012, PWT will be jointly maintained by Robert Feenstra at UC-Davis, and Marcel Timmer and Robert Inklaar at the University of Groningen): Penn World Table version 7. The data covers 189 countries and territories for 1950-2009, with 2005 as reference year. The official reference is "Alan Heston, Robert Summers and Bettina Aten, Penn World Table Version 7.0, Center for International Comparisons of Production, Income and Prices at the University of Pennsylvania, March 2011."

The World Bank International Comparison Program (ICP) collected data in 100 (developing and emerging) economies, divided into five regions, and then combined these with a Eurostat-OECD PPP program, bringing the total to 146 economies. Like the PWT these are PPP data, and given the same base year (2005) they can now be combined, compared and contrasted. Coverage: gross domestic product (GDP), GDP per capita, household consumption, collective government consumption, and capital formation for all 146 economies. Estimates are based on national surveys that priced nearly 1,000 products and services. Comparative price levels are also included. Downside: this isn't a panel. 2005 is the first year this exercise was undertaken; 2011 will be the next wave.

The World Bank has recently published its annual World Development Report, which this year focuses on Conflict, Security and Development. A dedicated website makes the data underlying the analysis in the report easily accessible. The excel spreadsheet covers a total of 211 countries, with maximum coverage over the years 1960-2009. The data is not limited to conflict and political economy issues but also covers geography, colonial history and foreign aid among other topics. All of the data is publicly available (and many datasets are featured here on MEDevEcon), but the unique advantage here is bringing a vast amount of conflict-related data from dozens of sources (PRIO, UNHCR, Polity IV, etc.) together in a single spreadsheet (and doing a great job documenting the data and sources).

Fulvio Castellacci and Jose Miguel Natera have created a balanced panel dataset for cross-country analyses of national systems, growth and development (CANA) hosted by the Norwegian Institute of International Affairs. The originality of this dataset (which draws on a variety of sources) is that the gaps in the data have been filled, using a methodology of multiple (and repeated) imputations by two political scientists, Honaker and King (2010). I have not looked at the Castellacci & Natera paper describing the data construction and robustness checks in detail, but am a priori quite sceptical about imputations: these macro variables are likely to be integrated, so imputations could be rather misleading. On the other hand, missing data is a serious problem for a lot of the dimensions they consider: (1) Innovation and technological capabilities; (2) Education and human capital; (3) Infrastructures; (4) Economic competitiveness; (5) Social capital; (6) Political and institutional factors. There are a total of 41 indicators for 134 countries over the period 1980-2008. The data is in excel format and well-documented. I'd say keep an eye out for reviews and applications of this dataset.

The World Bank has recently reorganised access to the major cross-country panel datasets it produces, all of which are now available (for browsing or download) from a single website. [Gunilla Pettersson featured the new site on her excellent devdata website]
 
World Population, GDP and Per Capita GDP, 1-2003 AD compiled by Angus Maddison at the Groningen Growth & Development Centre (GGDC). 

Jerry Dwyer at the Federal Reserve Bank of Atlanta provides data from his 2006 Economic Inquiry article with Scott L. Baier and Robert Tamura. This covers output, physical and human capital for 145 countries over a long time horizon (1831-2000); the data provides between 2 and 17 time-series observations per country, with an average of around 7. Additional variables of particular interest include average age and experience of the workforce, which allow for Mincerian wage equation-type analysis at the macro level. The data is provided in a neat excel file with additional information on variable definition and construction also provided (along with the article). 

Total Economy Database (1950-2007) compiled by The Conference Board and the Groningen Growth & Development Centre. Contains annual numbers of GDP, population, employment, hours and productivity for about 125 countries.

The UNIDO World Productivity Database compiled by Anders Isaksson at the UN Industrial Development Organisation. Contrary to my initial assumption (oh, bliss), this data does not refer to manufacturing, but to the aggregate economy. Coverage is from 1960 to 2000 for 112 countries. The website is mainly a tool to compute TFP, so you don't really get access to the 'raw' data.

Michael Clemens (CGD) and Lant Pritchett (HKS) have produced an interesting alternative measure to per capita income/GDP: 'income per natural', the mean annual income of persons born in a given country, regardless of where that person now resides. The data is a cross-section for 2000 and the related paper is here. I copied that data into an excel file for ease of use.

The World Bank PovcalNet is an interactive tool to calculate poverty lines and compare them across countries.

Wikiprogress is the official platform for the OECD-hosted Global Project on "Measuring the Progress of Societies" and Wikiprogress.Stat allows users to upload their data and metadata, and to navigate through a robust database of progress indicators. Themes on the website include Ecosystems Condition, Human Well-Being, Economy, Social and Welfare Statistics and Peace. There's a wealth of indicators here (sometimes cross-sectional or limited to a few time-series observations) and the data sources are clearly identified. Available for download to Excel. [Thanks to Angela Costrini Hariche, OECD Development Centre and Statistics Directorate and Project Manager of Wikiprogress]

The World Bank Doing Business project 'provides objective measures of business regulations and their enforcement across 181 economies and selected cities at the subnational and regional level.' The raw data for these surveys (run from 2004 onwards but with varying coverage for individual countries) is available via summary reports, which can then be accessed in excel.

Many of the above are featured on the resource website Macro Data 4 Stata which homogenises several commonly used macroeconomic datasets and imports them into Stata. The project is run by Giulia Catini, Ugo Panizza and Carol Saade and started uploading .dta files fairly recently. The library at present includes data from the Penn World Table and the Groningen Growth and Development Data Centre. The AAA Codes dataset looks particularly handy for anybody doing cross-country analysis  [thanks to Aid-man Nic Van de Sijpe for pointing me to this resource].

Without wanting to sound patronising, I applaud anybody's attempts to make data more widely available, so congratulations to a new upstart called Google, offering access to some World Bank, Eurostat and US data on their website. Don't try and google "Google data" as you won't find it that way ;-) This resource is useful primarily for its data visualisation tool: for individual variables, country series can be graphed as lines over time, as bars or on maps [thanks to Paddy Carter at Bristol for the pointer].

Funded by the IADB, the Oxford Latin American Economic History Database (OxLAD) contains statistical series for a wide range of economic and social indicators covering twenty countries in the region for the period 1900-2000. Its purpose is to provide economic and social historians worldwide with a systematic recompilation of available statistical information in a single on-line source. The website also provides other resources including a long list of references, many of them in Spanish, and detailed discussion of the methodology of data construction. Downloads are in csv format.

A useful resource to learn about how macro data is collected (among other things): the UK Economic and Social Data Service (ESDS) has produced 'Countries and Citizens: Linking international macro and micro data'. This is "an interactive training resource with online tutorials, activities, study guides and videos, designed to show how to combine socio-economic data from country-level aggregate databanks (macro data) with individual-level survey datasets (micro data). It comprises five units, each of which was written by a subject specialist and has been designed as a self guided learning resource. Though specifically for postgraduates and researchers, it may also be of interest to undergraduates." Unit 2 seems quite useful.


Country-Specific Macro data

John Muellbauer and Janine Aron at CSAE run a research project on 'Structural Macro-Modelling of the South African Economy' (SMMSAE). They provide a number of indicators and indices they constructed, including FLIB (financial liberalisation) and trade openness indicators in excel/CSV format. The SMMSAE website also links to the papers they have written on macro-modelling for SA.

The data aggregation website Quandl provides access to a vast number (they say 5m) of data series for countries around the world. This resource picks up data from various well-known sources (e.g. IMF, World Bank) and links to them from their own easy-to-use website. Perhaps the best feature is the set of dedicated programs for R, Python, Matlab, Excel, Maple, Julia, Clojure [they're making up these names, or is there really a stats program called Julia?] and also Stata. The latter can be installed by typing "ssc install quandl" in Stata (see helpfile for syntax). The only downside so far seems to be that you cannot download panel data as you can, for instance, with the World Bank WDI Stata command wbopendata. Maybe Felix Leung is already busy coding that feature...

Marc Muendler at UCSD has brought together a number of useful tools for the analysis of Brazilian data (and some data, too). This includes various price indices, sectoral FDI (1980-2000), tariffs and exchange rates.

The Institute for Applied Economic Research (Ipea) in Brazil provides a range of macro data for the country and its regions. The link is for the Portuguese site, there's also an English version. [Thanks to Manoel Bittencourt, Senior Lecturer at the University of Pretoria/South Africa, for the link]

Chinese macro and micro data: when researching provincial FDI I frequently made use of the China Data Center at U Michigan. Much of the more recent data (primarily statistical yearbooks for various topics as well as provincial statistical yearbooks) is downloadable as Excel worksheets, whereas the earlier data is available in pdf format. There are also the China Survey Data Network and various census datasets. Researchers at Universities may find that their institution has forked out for the annual subscription fee and that they can access these data without additional cost. Statistical Yearbooks for 1996-2001 were freely accessible at the time of writing.

The Socio-Economic Database for Latin America and the Caribbean (SEDLAC) provides statistics on poverty and other distributional and social variables from 25 Latin American and Caribbean countries, based on microdata from household surveys. [Masa featured the new site on his excellent Devecondata website]



Cultural and Social Norms and Values, Faith and Religion

The World Values Survey comprises 5 waves of survey data on social norms and values from 87 nations, running from the early 1980s to the late 2000s. The data is provided in SPSS, Stata and SAS formats. Variables relate to individuals' happiness, how they feel, what is important in their lives, qualities their children should learn, etc.


The Fractionalisation dataset, compiled by Alberto Alesina and associates, measures the degree of ethnic, linguistic and religious heterogeneity in various countries. Covering 215 countries (past and present) the dataset contains only one observation for each country. The language and religion indices are based on data from 2001. Most of the data used to compute the ethnic fractionalisation index are from the 1990s, but for some countries older data are used (as far back as 1979). [Via the Norwegian Social Science Data Services MacroDataGuide]
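For anybody unfamiliar with how indices of this kind are built: fractionalisation is conventionally computed as one minus the Herfindahl index of group shares, i.e. the probability that two randomly drawn individuals belong to different groups. Below is a minimal Python sketch assuming that standard formula; the function name and the group shares are made up for illustration and are not taken from the dataset.

```python
def fractionalisation(shares):
    """Return 1 - sum(s_i^2) for a list of group shares.

    Shares are rescaled to sum to one, so raw counts or
    percentages can be passed as well as proportions.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    normalised = [s / total for s in shares]
    return 1.0 - sum(p * p for p in normalised)


if __name__ == "__main__":
    # Hypothetical country with three groups (60%, 30%, 10%).
    print(fractionalisation([60, 30, 10]))
    # A perfectly homogeneous country scores zero.
    print(fractionalisation([1.0]))
```

The index rises with the number of groups and with how evenly the population is spread across them, which is worth keeping in mind when comparing the ethnic, linguistic and religious measures.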

Robert Barro and Rachel McCleary have compiled a cross-country dataset on the share of religious people in the population. "Adherence fractions of population are shown for 10 religion groups and non-religion (incl. atheists) in 1970, 2000, and 1900 (from Barrett)." Data is available for download in excel format from Barro's Harvard data page. His working paper page offers a considerable number of papers on the topic of religion and growth. [via Masa Kudamatsu's DevEconData blog]


Human Capital (i): Formal Education/Schooling

The seminal dataset on educational attainment, compiled by Robert Barro and Jong-Wha Lee (the 'Barro-Lee data'), is available from a new dedicated website. The data is available for download in full for 146 countries, by 5-year age group or for the population aged 15 and over or 25 and over, at 5-year intervals for the period 1950-2010 (in xls, csv or dta format). The site also links to some previous versions of the dataset and other resources, including Soto and Cohen (2006) and a few select academic papers. [Thanks to Adrian Wood for the pointer to the new site.]

The Washington-based Education Policy and Data Center (EPDC) "provides global education data, tools for data visualization, and policy-oriented analysis aimed at improving schools and learning in developing countries." They say they have "the world’s largest international education database with over 3.8 million data points from 200 countries. The data comes from national and international websites including household survey datasets as well as studies and reports." This is not just macro data, but also household surveys and census data; another very useful thing they do is to provide Stata do-files to construct indicators from the household data.

Mauro Caselli, Jörg Mayer and Adrian Wood have compiled a unique extension to the Barro-Lee (2001) and Cohen-Soto (2001) data on average adult years of schooling (attainment) using UNESCO data on literacy rates. Missing values are imputed based on a regression model investigating the link between average adult education and literacy rates in the available data and applied to countries where the attainment variable is missing but literacy rates are available.  Of the 133 countries covered, no imputations were needed for 95, imputations for some but not all years for 19, and imputations for all years for 19. The link is for a zipped folder containing Excel and Stata files as well as detailed documentation. The data is applied in a paper by Jörg and Adrian investigating the global impact of China's industrialisation on other LDCs' structural change. [Thanks to Adrian Wood for making the data available.]
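To make the idea of regression-based imputation concrete, here is a minimal Python sketch. It assumes a simple linear relationship between attainment and the literacy rate, fitted by OLS on the countries where both are observed; the authors' actual specification is described in their documentation and may well differ. The function names and the numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_schooling(literacy, schooling):
    """Fill missing schooling values (None) from literacy rates.

    The line is estimated on observations where both variables are
    available, then applied where schooling is missing but literacy
    is not. Observed values are returned unchanged.
    """
    pairs = [(l, s) for l, s in zip(literacy, schooling)
             if l is not None and s is not None]
    intercept, slope = fit_line([p[0] for p in pairs],
                                [p[1] for p in pairs])
    filled = []
    for l, s in zip(literacy, schooling):
        if s is None and l is not None:
            filled.append(intercept + slope * l)
        else:
            filled.append(s)
    return filled


if __name__ == "__main__":
    # Made-up literacy rates (%) and average years of schooling.
    literacy = [20.0, 40.0, 60.0, 80.0, 50.0]
    schooling = [2.0, 4.0, 6.0, 8.0, None]
    print(impute_schooling(literacy, schooling))
```

Note that imputed values lie exactly on the fitted line and so understate the true variation, which is one reason to check whether results are robust to dropping the imputed observations.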

Marcelo Soto and Daniel Cohen have constructed a rival to the Barro & Lee gold standard of data on average years of schooling across 95 countries. From the abstract of their Journal of Economic Growth paper (Vol.12(1), 2007): "We present a new dataset for years of schooling across countries for the 1960–2000 period. The series are constructed from the OECD database on educational attainment and from surveys published by UNESCO. Two features that improve the quality of our data with respect to other series, particularly for series in first-differences, are the use of surveys based on uniform classification systems of education over time, and an intensified use of information by age groups."  [thanks to my man Fabio Manca for pointing me to this resource].

Christian Morrisson and Fabrice Murtin from the OECD have constructed a historical database (entry under 'A century of education') on educational attainment in 74 countries for the period 1870-2010 (decadal estimates), using the perpetual inventory methods before 1960 and then the above Cohen and Soto (2007) database. This data should be particularly interesting in combination with for instance the Maddison data.



A collaborative effort by the IIASA World Population Program and the Vienna Institute of Demography (VID) has reconstructed population data by Age, Gender and Level of Educational Attainment for 120 Countries over the 1970-2000 period. The authors use a method which 'backprojects' the past levels from 2000 data. The files are in excel format and there are a number of working papers with technical details, comparison with observed data, etc. [Thanks to my buddy and human capital wizard Fabio Manca for the link]

Rural and Urban Education data (1960-1985) by C Peter Timmer is available in Chapter 29, 'Agriculture and economic development', of the Handbook of Agricultural Economics, Volume 2, Part 1, 2002, Pages 1487-1546. The link above is for the IDEAS/RePEc entry of this article: this is a copyrighted publication, but if you have access to the Handbook through your library you can easily copy the data. The coverage is exclusively for developing countries (N=65), and the data offers average years of schooling per person over the age of 25 for the rural and non-rural areas. OECD data on the same topic should allow for the inclusion of developed countries in the analysis.

The World Bank EdStats (hit Data Query link) provide access to UNESCO Institute for Statistics (UIS) data on education. It presents a large number of indicators for more than 200 countries since 1970. Indicators are organized by category, including Pre-primary, Primary, Secondary, Tertiary, Expenditure, Labor, Population, Teachers and Other. In February 2011 UIS launched historical time series data for key indicators of school enrolment and completion (gross enrolment ratios, repetition and completion rates) covering pre-primary to tertiary education. They are reported on a roughly five-year basis since 1970 (some countries more frequently). As far as I could see most of this data ends in the late 1990s... but the other data provided by UNESCO (UIS Data Center) begins at the same period - not sure why they didn't bring these together.

The Lynch School of Education at Boston College provides two unique resources for comparative analysis of educational achievements: (i) the Trends in International Mathematics and Science Study (TIMSS), which "is the largest and most ambitious international study of student achievement ever conducted" and has data from 40 countries in 1995 and a partially overlapping sample for three more recent waves (next wave is 2011); (ii) the Progress in International Reading Literacy Study (PIRLS), which has waves in 2001, 2006 and 2011 (forthcoming), evaluating 150,000 fourth graders (9- and 10-year-olds) in thirty-five (2001) and forty-odd (2006) countries. Some of these are middle-income countries (e.g. TTO, MAR, IND, IRN).

The same database provides a module of IMF data on public spending on education from 1985-2000 for 147 developing and transition economies (excel sheets; via Gunilla Pettersson). There are two indicators in the module: (1) total public spending on education as a percent of GDP; and (2) total public spending on education as a percent of total government spending. The underlying data, in millions of local currency, are provided. The breakdown of total education spending into current and capital spending is provided when available.

Quite a number of years ago Aaron Benavot, now at SUNY Albany, and Phyllis Riddle, now at St Vincent College, PA, wrote an article entitled The Expansion of Primary Education, 1870-1940: Trends and Issues, which provides new estimates of primary school enrollment rates for 126 nations and colonies from 1870 to 1940. The data is printed in the Appendix and can easily be imported into Excel. The article was published in the journal Sociology of Education, Vol.61(3), July 1988, pp.191-210. [via Masa Kudamatsu's DevEconData blog]

Emma Smith at the School of Education, University of Birmingham provides a number of resources and data links for educational and social research, including Afrobarometer, Asiabarometer, PISA and the World Values Survey. Her website acts as a portal for all the sources of secondary data that are listed in her book ('Using Secondary Data in Educational and Social Research', OUP 2008), as well as providing links to new sources and current developments in the field of secondary data analysis.

Human Capital Inequality, basically adjustments to the above Barro-Lee data, is provided on Rafael Domenech's website, covering 134 countries from 1960-1999, based on his work with Amparo Castello. This dataset was mentioned on the brilliant DEVECONDATA blog.

The World Bank provides GenderStats, which basically pulls out the relevant variables from the WDI database. Hit "Create your own query" to access the database. Education/schooling-related variables are often taken as a proxy for gender equality.

UNESCO has data on literacy in their data centre, with data series beginning in the mid-70s or early 80s. There are also lots of variables on schooling, and public funding for schooling. Data on the number of illiterates per cohort is available here for developing countries from 1970 in 5-year intervals.

The OECD provides access to PISA data (Programme for International Student Assessment) for 2000 to 2009 (4 waves). The most recent data wave will be made available on 7 December 2010. The data is in SAS, SPSS or Text format and contains student, school and parent information/questions. This is for 30 OECD/high- and middle-income countries. There is a vast number of variables so you had better see for yourself. [via Gunilla Pettersson's developmentdata.org]


Human Capital (ii): Health & Subjective Well-being

The MARA/ARMA (Mapping Malaria Risk in Africa/Atlas du Risque de la Malaria en Afrique) project has published extensive data related to malaria, including the MARA LITe malaria prevalence data, malaria distribution maps and estimated populations at risk (as 'raw data' and maps); also available are entomological inoculation rates and reported presence/absence of six species of the anopheles gambiae group (to you and me: mosquitoes) in Africa and islands. The website also features a wealth of resources on malaria in Africa.

The Global Health Observatory (GHO) database is the World Health Organization's main health statistics repository. You can find a range of health topics like mortality, the burden of disease, infectious diseases, risk factors and health expenditures. I had a quick look at the figures for 'Number of people (all ages) living with HIV' which provides full coverage of mortality rate estimates (i.e. extrapolation/interpolation, etc., distinguished by reporting confidence intervals) for 1990-2009 across a very large number of countries. [referred to in a paper by Paul Calu, World Bank, and Falilou Fall, Sorbonne]


The Complex Emergency Database (CE-DAT) is an international initiative that monitors and evaluates the health status of populations affected by complex emergencies. CE-DAT is managed by the Centre for Research on the Epidemiology of Disasters (CRED), based at the School of Public Health of the Université catholique de Louvain in Brussels, Belgium. The data is at subnational level (building on over 2,000 surveys) and covers 1998-2010 (with gaps). It can be viewed in table format or as a map.

The Malaria Atlas Project (MAP) was founded in 2005 and is now led by a team based at the University of Oxford, consisting of Professor Simon Hay, Dr Peter Gething, Catherine Moyes, Professor Dave Smith, Dr Kevin Baird at the Eijkman-Oxford Clinical Research Unit in Indonesia, and Dr Andy Tatem at the University of Southampton. The project website brings together various data to provide detailed information on malaria risk. Data are often available in the form of maps (for individual years) or spreadsheets for panels. [This data source was highlighted in a recent paper by Tracy Jones which featured at the annual CSAE conference 2014]

The WHO maintains WHOSIS (Statistical Information System) which has data on mortality, health services coverage, inequities in health care access among other rubrics. Time series begin in 1990 but are not annual.

The World Health Organisation (WHO) offers the Global Health Atlas. "In a single electronic platform, the WHO’s Communicable Disease Global Atlas is bringing together for analysis and comparison standardized data and statistics for infectious diseases at country, regional, and global levels... [The database covers] the major diseases of poverty including malaria, HIV/AIDS, tuberculosis, the diseases on their way towards eradication and elimination (such as guinea worm, leprosy, lymphatic filariasis) and epidemic prone and emerging infections for example meningitis, cholera, yellow fever and anti-infective drug resistance."

The World Health Organisation (WHO) offers the Global Atlas of the Health Workforce, which features two datasets: the first, aggregated dataset "includes estimates of the stock (absolute numbers) and density (per 1000 population) of health workers for up to 9 occupational categories." In the second, disaggregated dataset "estimates of the stock of health workers are available for some countries for up to 18 occupational categories, reflecting greater distinction of some categories of workers according to assumed differences in skill level and skill specialization".

The visualisation folk at Gapminder (including multiple Roslings) provide very convenient access to a lot of demographic and health data (HIV/AIDS, birthrates, cancer, ...) alongside other useful development data (aid, trade, employment). "Gapminder is a non-profit venture – a modern 'museum' on the Internet – promoting sustainable global development and achievement of the United Nations Millennium Development Goals... The initial activity was to pursue the development of the Trendalyzer software. Trendalyzer sought to unveil the beauty of statistical time series by converting boring numbers into enjoyable, animated and interactive graphics... In March 2007, Google acquired Trendalyzer from the Gapminder Foundation and the team of developers who formerly worked for Gapminder joined Google in California in April 2007." Poor chaps: New salary = googol*previous salary? The data commonly span several decades and are available for download in excel format (wide). [Thanks to Christoph Lakner at CSAE for the pointer.]
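Data in wide format (one row per country, one column per year) usually needs reshaping to long format before it can be used as a panel. The following is a minimal Python sketch, assuming the spreadsheet has been exported to rows keyed by a "country" column plus one column per year; the column names and values are hypothetical and do not describe any actual Gapminder file.

```python
def wide_to_long(rows, id_key="country"):
    """Reshape wide rows (one column per year) into long records.

    Each input row is a dict such as {"country": "Ghana", "1990": "1.5"}.
    Empty cells are skipped rather than turned into zeros.
    """
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key or value in ("", None):
                continue
            long_rows.append({
                id_key: row[id_key],
                "year": int(key),
                "value": float(value),
            })
    return long_rows


if __name__ == "__main__":
    # Made-up example: two countries, two years, one missing cell.
    wide = [
        {"country": "Ghana", "1990": "1.5", "2000": ""},
        {"country": "Peru", "1990": "2", "2000": "3"},
    ]
    for record in wide_to_long(wide):
        print(record)
```

In Stata the equivalent step would be the reshape long command; the sketch above simply spells out what that operation does.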

A bunch of data from the UN DESA - Population Division, including World Contraceptive Use 2010, International Migrant Stock, World Population Prospects, World Urbanization Prospects (very 'open data', these last three: you can pick a max of 5 countries... Muppets)... [Thanks to Jackie Carter for the tweet]

The World Bank’s comprehensive database of Health, Nutrition and Population (HNP) statistics also covers aspects of education and literacy. It offers data on the number of individuals per age cohort. The series nominally begin in 1960 and go to 2006/7 for up to 220 countries, although coverage varies wildly across indicators.

The UN Inter-agency Group for Child Mortality Estimation (IGME) provides various estimates on their website. These can be downloaded in Excel format and cover almost 200 economies, with estimates reaching as far back as 1950. For the under-five mortality rate (U5MR) series I looked at there were not just median estimates but also upper and lower bounds.

Betsey Stevenson at the Wharton School of UPenn has a bunch of data on subjective well-being, both US and cross-country, which resulted in a couple of papers with her colleague Justin Wolfers. Zipped data is in Stata 9 or 10 format (huge files!).

The Washington-based Center for Global Development (Roodman, Radelet, Subramanian, Birdsall, Clemens and many others) have a link to datasets on their publications website. Highlights include data on African Health Professionals Abroad (Gunilla Petterson worked on this dataset).

Human Capital (iii): Labour & Demography

The Minnesota Population Center is the "world’s leading developer" of historical and international census demographic data, most of which are focused on the US and Western Europe, although the IPUMS (Integrated Public Use Microdata Series) International data covers 44 countries using 130 censuses.

The International Labor Organization (ILO) maintains the LABORSTA database. This provides data for up to 200 countries under the rubrics of (un-)employment, wages, strikes and lockouts, as well as international labour migration (among others).

The PBL Netherlands Environmental Assessment Agency provides the History Database of the Global Environment (interestingly, the acronym is HYDE). HYDE presents (gridded) time series of population and land use for the last 12,000 years! It also presents various other indicators such as GDP, value added, livestock, agricultural areas and yields, private consumption, greenhouse gas emissions and industrial production data, but only for the last century.

A bunch of data from the UN DESA - Population Division, including World Contraceptive Use 2010, International Migrant Stock, World Population Prospects, World Urbanization Prospects (very 'open data', these last three: you can pick a max of 5 countries... Muppets)... [Thanks to Jackie Carter for the tweet]

The visualisation folk at Gapminder (including multiple Roslings) provide very convenient access to a lot of demographic and health data (HIV/AIDS, birthrates, cancer, ...) alongside other useful development data (aid, trade, employment). "Gapminder is a non-profit venture – a modern 'museum' on the Internet – promoting sustainable global development and achievement of the United Nations Millennium Development Goals... The initial activity was to pursue the development of the Trendalyzer software. Trendalyzer sought to unveil the beauty of statistical time series by converting boring numbers into enjoyable, animated and interactive graphics... In March 2007, Google acquired Trendalyzer from the Gapminder Foundation and the team of developers who formerly worked for Gapminder joined Google in California in April 2007." Poor chaps: New salary = googol*previous salary? The data commonly span several decades and are available for download in excel format (wide). [Thanks to Christoph Lakner at CSAE for the pointer.]

The Complex Emergency Database (CE-DAT) is an international initiative that monitors and evaluates the health status of populations affected by complex emergencies. CE-DAT is managed by the Centre for Research on the Epidemiology of Disasters (CRED), based at the School of Public Health of the Université catholique de Louvain in Brussels, Belgium. The data is at subnational level (building on over 2,000 surveys) and covers 1998-2010 (with gaps). It can be viewed in table format or as a map.

Jerry Dwyer at the Federal Reserve Bank of Atlanta provides data from his 2006 Economic Inquiry article with Scott L. Baier and Robert Tamura. This covers output, physical and human capital for 145 countries over a long time horizon (1831-2000); the data provides between 2 and 17 time-series observations per country, with an average of around 7. Additional variables of particular interest include average age and experience of the workforce, which allow for Mincerian wage equation-type analysis at the macro level. The data is provided in a neat excel file with additional information on variable definition and construction also provided (along with the article). 

The UN body which covers trade and investment, UNCTAD, has created a snazzy website that combines all of its statistical databases: UNCTADstat has lots of data on trade (merchandise, services), FDI flows and stocks (inward FDI from 1970!), external finance (incl. remittances), labour force/employment, global commodity price indices (from 1960!) as well as some more recent rubrics such as the creative and information economies and maritime transport (from around 2000).

Poverty

The people at OPHI (Oxford Poverty & Human Development Initiative) have developed a new poverty index, which is 'multi-dimensional' (MPI). Sabina Alkire and Maria Emma Santos designed the MPI using a technique for multidimensional measurement created by Sabina Alkire and James Foster. OPHI analysed poverty across 78% of the world’s people in 104 developing countries using the MPI and released the results in advance of the 2010 HDR. For now this is sort of a cross-section, available for download in Excel format.


Migration (incl Tourism and Forced Displacement/Trafficking) and Remittances

Panel Data on International Migration (1975-2000) compiled by Maurice Schiff and Mirja Channa Sjoblom at the World Bank. This includes data on sending and receiving countries, split by 'level of education'. 

The Global Bilateral Migration Database compiled by the World Bank provides "global matrices of bilateral migrant stocks spanning the period 1960-2000, disaggregated by gender and based primarily on the foreign-born concept... Over one thousand census and population register records are combined to construct decennial matrices corresponding to the last five completed census rounds". Data for up to 226 countries can be downloaded into an excel file.

Giovanni Peri of UC Davis has now published the bilateral migration data used in some of his recent work (aka the Ortega-Peri Database). These can be downloaded in Stata format from Giovanni's personal website where the papers are also available. The data cover 1980-2008, 15 migration destinations in the developed world and 221 migration source countries. [Thanks to Chris Parsons at Oxford's International Migration Institute for the pointer; Chris' own efforts have helped to build a decadal bilateral migration matrix which includes developing economies as recipient countries --- this is the dataset in the entry immediately above]

Other World Bank datasets on migration (no long panels, though) are available here.

The World Bank publishes the Migration and Remittances Factbook (2011) as part of the OpenData initiative. This covers inflows and outflows of remittances from 1970 to 2009 (+2010 estimated) for basically all countries in the world (naturally: lots of missing observations, but from the mid-1970s onwards the data coverage is pretty impressive).

The Global Trade Policy Analysis group at the AgEcon Department of Purdue University provides a number of datasets related to trade but also climate change and migration. "The GTAP Data Base is a fully documented, publicly available global data base which contains complete bilateral trade information, transport and protection linkages among 113 regions for all 57 GTAP commodities for a single year (2004 in the case of the GTAP 7 Data Base)." Single academic user licenses for GTAP 7 are $520, but a large number of free datasets (including summaries of GTAP, Social Accounting Matrix [SAM] extraction, the Global [bilateral] FDI Dataset, Project on Bilateral Labor Migration, CO2 emissions) can be found here.

The International Labor Organization (ILO) maintains the LABORSTA database. This provides data for up to 200 countries under the rubrics of (un-)employment, wages, strikes and lockouts, as well as international labour migration (among others).

The UN body which covers trade and investment, UNCTAD, has created a snazzy website that combines all of its statistical databases: UNCTADstat has lots of data on trade (merchandise, services), FDI flows and stocks (inward FDI from 1970!), external finance (incl. remittances), labour force/employment, global commodity price indices (from 1960!) as well as some more recent rubrics such as the creative and information economies and maritime transport (from around 2000).

The UN High Commissioner for Refugees (UNHCR) publishes a statistical yearbook, covering "Trends in Displacement, Protection and Solutions". It contains statistics on refugees, asylum-seekers, internally displaced persons (IDPs), returnees (refugees and IDPs) and stateless persons, among others. From 2000 to 2009 these reports include Excel files for download; for 1994-1999 the data tables are contained in pdf files.

A bunch of data from the UN DESA - Population Division, including World Contraceptive Use 2010, International Migrant Stock, World Population Prospects, World Urbanization Prospects (very 'open data', these last three: you can pick a max of 5 countries... Muppets)... [Thanks to Jackie Carter for the tweet]

Seo-Young Cho (Goettingen), Axel Dreher (Heidelberg) and Eric Neumayer (LSE) have created the 3P Anti-Trafficking Policy Index and a dedicated website. Sub-indices cover three policy dimensions (Prosecution, Prevention, Protection), each scored from 1 (worst) to 5 (best). Annual data are available for up to 177 countries over the 2000-2009 period.
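If you want a single number per country-year from the three sub-indices, one simple option is to add them up. The sketch below assumes, purely for illustration, an unweighted sum (so the total runs from 3 to 15); the scores are hypothetical and the aggregation rule should be checked against the index documentation before use.

```python
# Hypothetical sub-index scores on the 1 (worst) to 5 (best) scale.
scores = {"prosecution": 4, "prevention": 3, "protection": 2}


def overall_score(sub_indices):
    """Unweighted sum of the sub-indices (an assumed aggregation rule)."""
    for name, value in sub_indices.items():
        # Reject anything outside the 1-5 scale described in the entry above.
        if not 1 <= value <= 5:
            raise ValueError(f"{name} score {value} is outside the 1-5 range")
    return sum(sub_indices.values())


total = overall_score(scores)
print(total)
```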

Louis Putterman at Brown University provides another historical dataset, the World Migration Matrix (1950-2000), detailing for each of 165 countries "the proportion of the ancestors in 1500 of that country's population today that were living within what are now the borders of that and each of the other countries." There's a lot of documentation provided to reference all these estimates.

It sits a little awkwardly alongside economic migration, but the UN World Tourism Organisation (UNWTO) provides data on headcount and spending of tourists from 1995-2008 for around 90 countries on the UNdata website.

Entrepreneurship, SMEs, Privatisation and Business Environment

On his website Thorsten Beck at Tilburg University provides access to data on the small and medium enterprise (SME) share of employment, covering firms with fewer than 250 employees in manufacturing.

The Global Entrepreneurship Monitor (GEM) research program is an annual assessment of the national level of entrepreneurial activity. Data is collected for 'activity', 'aspirations', and 'attitudes and perceptions' (multiple variables under each rubric). Started as a partnership between London Business School and Babson College, it was initiated in 1999 with 10 countries, expanded to 21 in the year 2000, with 29 countries in 2001 and 37 countries in 2002. GEM 2009 is set to conduct research in 56 countries. GEM data for 1999 - 2006 is currently in the public domain. Full GEM datasets are made available to the public three years after the end of an annual data collection cycle. As such, GEM 2007 data will be made available to the public in January 2011. The data is in SPSS format.

The World Bank World Business Environment Survey (WBES 2000) was administered to enterprises in 80 countries in late 1999 and early 2000, using a standard core enterprise questionnaire methodology. This comprehensive survey of over 10,000 firms covers enterprise responses to multiple questions on the investment climate and business environment as shaped by domestic economic policy; governance; regulatory, infrastructural and financial impediments, as well as assessments of public service quality. There is no access to the raw data from this website, so you will have to go through the variable and sample selection process and then ask for the data in spreadsheet format.

The World Bank Doing Business project 'provides objective measures of business regulations and their enforcement across 181 economies and selected cities at the subnational and regional level.' The raw data for these surveys (run from 2004 onwards but with varying coverage for individual countries) is available via summary reports, which can then be accessed in excel.

The World Bank offers some statistics and details on privatisation transactions in excess of $1million within developing countries. The searchable database is for 2000-2007, but there is also a link to an Excel spreadsheet for the period 1988-1999. Apart from summary statistics on deal value etc., this resource provides transaction-level information (name of the company, sector, year and value of the deal) for developing and emerging economies. There is also a link for this and other World Bank data (e.g. infrastructure, Doing Business) to be mapped on a global scale - unfortunately these google-map based charts do not use standard colouring-in of countries but labels instead, which means they're not that helpful (plus: they cannot be exported). However, I imagine the new World Bank data mapper will supersede this tool very soon.

Labour Regulation data: Andrei Shleifer's website provides links to a number of datasets he has compiled and used with various co-authors. This includes 'Private Credit in 129 Countries' (JFE 2007, with S. Djankov and C. McLiesh), with data from 1978-2002 and data on the 'unofficial economy' (primarily cross-section data).

Macro Stability, Business Cycles, Banking, Finance and Financial Crises

For Public Finance see section on Social Security, Taxes and the State.

The IADB website hosts the data used in the work on trade intensity and business cycles by César Calderón, Alberto Chong and Ernesto Stein (2006, JIE). From the abstract: "Using annual information for 147 countries for the period 1960-99 we find that the impact of trade intensity on business cycle correlation among developing countries is positive and significant, but substantially smaller than that among industrial countries. Our findings suggest that differences in the responsiveness of cycle synchronization to trade integration between industrial and developing countries are explained by differences in the patterns of specialization and bilateral trade." 

A New Database on Financial Development and Structure (1960-2007), produced by Thorsten Beck, Asli Demirguc-Kunt and Ross Levine, is available on the World Bank website. This is the updated version (April 2010) and provides indicators of financial development and structure (in total 22 variables) across countries (211 countries listed, but there are of course missing observations) and over time.

Systemic Banking Crises: A New Database (1970-2007) is presented by Luc Laeven and Fabian Valencia in their IMF working paper No. 08/224. This paper presents a new database on the timing of systemic banking crises and policy responses to resolve them. The database covers the universe of systemic banking crises for the period 1970-2007, with detailed data on crisis containment and resolution policies for 42 crisis episodes, and also includes data on the timing of currency crises and sovereign debt crises. The database extends and builds on the Caprio, Klingebiel, Laeven, and Noguera (2005) banking crisis database, and is the most complete and detailed database on banking crises to date.

The personal website of Luc Laeven (Deputy Division Chief in the Research Department of the International Monetary Fund and Full Professor of Finance at CentER, Tilburg University) carries a number of interesting datasets for cross-country analysis, including the 'Banking Crisis Database (2010)', Crisis resolution database and Deposit Insurance Database, together with some papers he's written describing and analysing the data. [Thanks to my buddy Andrea Presbitero at Università Politecnica delle Marche for the pointer]

On his website Thorsten Beck at Tilburg University provides cross-section data (2003) on access and use of banking services across 99 developing and developed countries: number of branches, ATMs, loans, deposits. This is from joint work with A.Demirgüç-Kunt and M. Martinez Peria. Thorsten also has panel data on financial development (private credit) for up to 72 countries, from work with A. Demirgüç-Kunt and R. Levine.

The World Bank (Cihák, Demirgüç-Kunt, Feyen & Levine) provides the Global Financial Development Database (GFDD) which covers 1960-2010 for 203 countries. "The Global Financial Development Database is based on this 4x2 framework. It builds on, updates, and extends previous efforts, in particular the data collected for the “Database on Financial Development and Structure”, the Financial Access Survey, the Global Findex and Financial Soundness Indicators. The database includes measures of (a) size of financial institutions and markets (financial depth), (b) degree to which individuals can and do use financial services (access), (c) efficiency of financial intermediaries and markets in intermediating resources and facilitating financial transactions (efficiency), and (d) stability of financial institutions and markets (stability). The dataset can be used to document cross-country differences and time series trends." Data can be downloaded in an Excel file and there is additional documentation.

Scott Baker and Nick Bloom, both at Stanford University, provide all the quarterly stock market returns and disaster data (in Stata) for 60 countries (including a considerable number of emerging economies) over the 1970-2010 time period used in their uncertainty and growth paper. They also provide the do-files to replicate all empirical results in the paper.

The companion website to Carmen Reinhart and Ken Rogoff's This time is different: Eight centuries of financial follies provides access to all the great data they compiled for their research. Topics covered include very long time series for debt/GDP ratio, inflation, exchange rate regimes and many more.

The World Bank provides (in collaboration with the IMF) the Quarterly External Debt Statistics - these come in two variants, the general (GDDS) and special (SDDS) dissemination. Currently, sixty countries have agreed to participate in the SDDS/QEDS database and forty-two Low-Income Countries (LICs) to provide data to the GDDS/QEDS database. Data begins in 1998.

The Joint External Debt Hub (JEDH - pronounced Jedi?) — jointly developed by the Bank for International Settlements (BIS), the International Monetary Fund (IMF), the Organization for Economic Cooperation and Development (OECD) and the World Bank (WB) — brings together external debt data and selected foreign assets from international creditor/market and national debtor sources. The JEDH replaces the (above) Joint BIS-IMF-OECD-WB Statistics on External Debt and brings together 34 data series from the above institutions. Coverage starts in 1990 and can be up to quarterly, for all countries in the world, although this depends on the variable (e.g. 'International Reserves' seemed to have a very good [from 1990] and quarterly coverage but other variables start later, are less frequent and less broad in country-terms). [This dataset was featured in an article by Sarah Bracking in the Google magazine thinkquarterly]

Christoph Trebesch at Munich University (LMU) provides data on debt restructuring episodes from 1950-2010 from a research project with Udaibir Das and Michael Papaioannou (IMF). The data can be downloaded in Excel format and provides information on the timing (month/year) of the restructuring, amount, etc. Over 600 episodes are recorded. An accompanying IMF working paper provides details on concepts and reviews the existing literature.

The World Bank's new International Debt Statistics are now available as part of the institution's Open Data Initiative: "high frequency, quarterly, external and public debt data for both high-income and developing countries collected and compiled by the World Bank in partnership with the International Monetary Fund. Now users can not only examine trends in debt flows within the developing world, but also take a closer look at the external debt of high-income countries, and develop a more complete understanding of global financial flows". Picking the standard measure of external debt burden (in % of GNI) I found data from 1970 to 2011 for around 140 countries (unbalanced). A large number of more differentiated data are available, with varying time series and cross-section coverage.

The Inter-American Development Bank provides data on Bank Ownership and Bank Performance covering 119 countries over the 1995-2002 period. The methodology used to generate the data is described in Micco, Panizza and Yanez (2004) "Bank Ownership and Performance," IDB-RES working paper No. 518.

The Chinn-Ito index (KAOPEN) is an index measuring a country's degree of capital account openness. The index was initially introduced in Chinn and Ito (Journal of Development Economics, 2006). KAOPEN is based on the binary dummy variables that codify the tabulation of restrictions on cross-border financial transactions reported in the IMF's Annual Report on Exchange Arrangements and Exchange Restrictions (AREAER). The dataset is available in the Excel or STATA format. The data file contains the Chinn and Ito index series for the time period of 1970-2007 for 182 countries.  [Thanks to Malgorzata Sulimierska at Sussex University for the link]

Huang Yongfu at Cambridge's Land Economy department has some links to datasets on Financial Developments as well as other resources on the topic (researchers in the area, papers).

The IMF has a new database reporting the access to basic consumer financial services worldwide. At present this data covers 138 economies, nominally for the period 1998-2009, although most countries only have data from 2004 onwards. Annual information covers the reported use of banking services and access to banks' physical outlets. The data for all countries and time periods can be downloaded as an Excel file. [via economicslinks]

James R. Barth, Gerard Caprio, Jr. and Ross Levine (Auburn, Williams, UC Berkeley) have compiled data on Bank Regulation and Supervision in 180 Countries from 1999 to 2011. "[T]he measures are based upon responses to hundreds of questions, including information on permissible bank activities, capital requirements, the powers of official supervisory agencies, information disclosure requirements, external governance mechanisms, deposit insurance, barriers to entry, and loan provisioning. The dataset also provides information on the organization of regulatory agencies and the size, structure, and performance of banking systems. Since the underlying surveys are large and complex, we construct summary indices of key bank regulatory and supervisory policies to facilitate cross-country comparisons and analyses of changes in banking policies over time." Methinks: Was there no banking regulation before 1999?

A team of researchers at the IMF and World Bank (Asli Demirgüç-Kunt, Edward Kane and Luc Laeven) have updated an earlier database on deposit insurance at private banks: they asked "national officials for information on capital requirements, ownership and governance, activity restrictions, bank supervision, as well as on the specifics of their deposit insurance arrangements." In addition to the data in Excel format the link provides access to a working paper which discusses construction and provides some descriptives. Coverage is for over 100 countries in the years 2003, 2010 and 2013. [h/t Andrea Presbitero]

Stijn Claessens and Neeltje van Horen from De Nederlandsche Bank have compiled a database with ownership information for 5,324 banks active in 137 countries over the period 1995-2009 (year of establishment can be earlier and is recorded; Banca Monte dei Paschi di Siena is the oldest in this database, established 1472).  "It includes for each bank its year of establishment, its year of inactivity, its ownership (foreign or domestic) and if foreign owned the home country of the majority shareholder." Downloadable as an Excel file. Sadly no information about state-share of ownership (e.g. recent nationalisation following the global financial crisis). For detailed description of the database, see Claessens and Van Horen, 2013, "Foreign banks: Trends and impact", Journal of Money, Credit and Banking, forthcoming. [Thanks to my buddy Andrea Presbitero at Università Politecnica delle Marche for the pointer]

Sofronis Clerides, Manthos D. Delis and Sotirios Kokas from the Department of Economics, University of Cyprus (Delis is at the University of Surrey) have created a dataset for the estimated degree of competition in the banking sectors of 148 countries over the period 1997-2010. The dataset is contained in tables at the end of their working paper, so a bit of copy and pasting will do the job. You'll find some relevant work on banking regulation and competition (including a 2012 JDE paper) on Delis' personal website.

Robert Barro at Harvard University provides a number of datasets on macroeconomic disasters related to his own research work on his website. This includes GDP and consumption time series for developing and developed economies from the late 19th century onward. The data can be downloaded in Excel format and related working papers are provided in a separate section of the website.

The Global Financial Inclusion (Global Findex) Database is a project funded by the Bill & Melinda Gates Foundation to measure how people in 148 countries --- including the poor, women, and rural residents --- save, borrow, make payments and manage risk. The dataset has been compiled by Leora Klapper and Asli Demirguc-Kunt of the World Bank and can be downloaded from the World Bank Open Data website (there are a total of 517 indicators for a max of 164 countries --- at the moment this is for 2011 only).

The Bank for International Settlements (BIS) "has constructed long series on credit to the private non-financial sector for 40 economies, both advanced and emerging. Credit is provided by domestic banks, all other sectors of the economy and non-residents. The 'private non-financial sector' includes non-financial corporations (both private-owned and public-owned), households and non-profit institutions serving households as defined in the System of National Accounts 2008. In terms of financial instruments, credit covers loans and debt securities." The data is quarterly from 1940 to 2012 (unbalanced panel) and can be downloaded in excel format alongside detailed documentation. [Thanks to my buddy Andrea Presbitero at Università Politecnica delle Marche for the pointer]

$$ The new IMF Financial Soundness Indicators provide information on the health of the entire sector of financial institutions, but also the counterpart corporate and household sectors, and of relevant markets. So far 64 countries have committed to participate, with the frequency of data left to the discretion of the countries. The database contains 12 core indicators, including variables like "Nonperforming loans to total gross loans" or "Return on Assets", with the data (at present) provided in excel spreadsheets. Further new databases from the IMF include the Coordinated Direct Investment Survey (CDIS, from December 2010, dyadic data on inward and outward investment) and the Coordinated Portfolio Investment Survey (CPIS, covers equity securities, long- and short-term debt securities broken down by economy of residence of the issuer of the securities).

$$ Another IMF database is the High Frequency Government Statistics, which has annual, quarterly and (for some variables and countries) monthly data on the balance sheet. This is included in the International Financial Statistics.

Social security, Public Expenditure, Taxes and the State

David Canning (Harvard) provides social security data for 57 countries over the period 1961-2002 (not annual) on his personal homepage. The data is available in MS Excel 2003 format. As is explained in the documentation of the dataset, social security systems are highly complex and vary greatly across countries, so the authors (this data was used for an article by Bloom, Canning, Mansfield and Moore, Journal of Monetary Economics, 2007) explain in great detail how they constructed the four measures they provide.

The Statistics of Public Expenditure for Economic Development (SPEED) database, compiled by Washington-based IFPRI (International Food Policy Research Institute - their name somewhat undersells the wide-ranging research carried out in the institution), provides the most comprehensive and publicly available public expenditure information for 147 countries in the following sectors: agriculture, education, health, defense, social protection, mining, transport and communication (as well as these two sectors separately), and total expenditure for the period of 1980-2010. Data is downloadable as a vast Excel spreadsheet with lots of additional documentation. [Thanks to my friend and recent IMF recruit La-Bhus Fah Jirasavetakul for the pointer]

The World Tax Indicators (WTI) at the International Center for Public Policy, Georgia State University, offers extensive coverage of the Personal Income Tax (PIT), Corporate Income Tax (CIT), and Value Added Tax (VAT)/ Retail Sales Tax (RST) with greater year, country, and tax category coverage than is currently offered via existing data portals. The WTI uses the raw data to develop several tax indicators such as time varying measures of PIT structural progressivity and tax complexity and offers a large representative dataset with variables that are consistent within countries over time. PIT data is already available for download as Excel or Stata files (175 countries or more, depending on measure; up to 25 years of data), including substantial documentation (brief registration required). 

The World Tax Database is provided by the Ross School of Business, University of Michigan. Variables include Tax Revenue and Tax Rates. Data is from 1974 to 1999.

The Quality of Government Institute at the University of Gothenburg publishes the QoG Dataset in Stata, SPSS and csv format. "The aim of the QoG Social Policy Dataset is to promote cross-national comparative research on social policy output and its correlates, with a special focus on the connection between social policy and quality of government (QoG). To accomplish this we have compiled a number of freely available data sources, including aggregated public opinion data." There are three versions: (1) a cross-section with global coverage (2002); (2) an annual panel for 40 countries (1946-2009); and (3) a 5-yearly panel for 40 countries (1970-2005). The topics covered are Social policy, Tax system, Structural conditions for social policy, Public opinion, Political indicators and Quality of government. This is now also provided in Stata format via two user-written commands.

A recent article by Simeon Djankov and co-authors in the AEJ: Macro comes with cross-section data on effective corporate income tax rates in 85 countries (2004). "The data come from a survey, conducted jointly with PricewaterhouseCoopers, of all taxes imposed on "the same" standardized mid-size domestic firm." The authors provide the data in Stata format, together with a do-file.

Mariya Aleksynska and Martin Schindler at the IMF have created a new database of labor market regulations covering 1980-2005 in 91 countries, including low-, middle- and high-income countries. The database contains information on unemployment insurance systems, minimum wage regulations, and employment protection legislation. [Thanks to my former PhD colleague Bob Rijkers, now at the World Bank, for the link].

The Inter-American Development Bank provides the Public Debt around the World database, which includes complete time-series of central government debt for 89 countries over the 1991-2005 period and for seven other countries for the 1993-2005 period. The data (both in STATA and EXCEL format) is described in Dany Jaimovich and Ugo Panizza (2006) "Public Debt around the World: A New Dataset of Central Government Debt" which is included in the zipped folder.

Infrastructure and logistics

A Research Database on Infrastructure Economic Performance (1980-2002), compiled by Antonio Estache and Ana Goicoechea at the World Bank. The time-series nominally begins in 1980, but for many variables data is only available from 1990 onwards.

David Canning at Harvard has been working on infrastructure datasets for the past decade(s). His most recent (2007) offering is the World Infrastructure Stock Data for 1950-2005, which covers 185(!) countries and is supplied in excel files. The main variables contain information on rail and road networks, telephone lines and electricity generating capacity - of these the telephone data has broadest coverage.

The World Road Statistics (1963-1989) were compiled by the International Road Federation for up to 196 countries and are available on the World Bank website.

Fulvio Castellacci and Jose Miguel Natera have created a balanced panel dataset for cross-country analyses of national systems, growth and development (CANA) hosted by the Norwegian Institute of International Affairs. The originality of this dataset (which draws on a variety of sources) is that the gaps in the data have been filled, using the method of multiple (and repeated) imputations developed by two political scientists, Honaker and King (2010). I have not looked at the Castellacci & Natera paper describing the data construction and robustness checks in detail, but am a priori quite sceptical about imputations: these macro variables are likely to be integrated, so imputations could be rather misleading. On the other hand, missing data is a serious problem for a lot of the dimensions they consider: (1) Innovation and technological capabilities; (2) Education and human capital; (3) Infrastructures; (4) Economic competitiveness; (5) Social capital; (6) Political and institutional factors. There are a total of 41 indicators for 134 countries over the period 1980-2008. The data is in excel format and well-documented. I'd say keep an eye out for reviews and applications of this dataset.

The World Bank has a dedicated website for the Africa Infrastructure Country Diagnostic (AICD) which combines data collection and analysis on the status of the main network infrastructures. "The AICD database provides cross-country data on network infrastructure for nine major sectors: air transport, information and communication technologies, irrigation, ports, power, railways, roads, water and sanitation." This is a relatively young data collection effort, with only a few years of data available at the time of writing. Download is via the WB's excellent open data system (view data, download as excel or CSV).

Columbia University hosts NASA's Socioeconomic Data and Applications Center (SEDAC) which provides gROADS, the Global Roads Open Access Data Set (1980 – 2010): "an open access, well documented global data set of roads between settlements using a consistent data model (UNSDI-T v.2) which is, to the extent possible, topologically integrated." [Via the Norwegian Social Science Data Services MacroDataGuide]

The African Development Bank provides the Africa Interactive Infrastructure Atlas in the form of PDFs and GIS maps, covering ICT (International gateways, Backbone networks, and GSM coverage), Power (Power plants and Transmission network) and Transport (Airports and air traffic, Ports and sea traffic, Roads (condition and traffic), and Railways). For the interactive PDFs you need to download and open them in Acrobat Reader - the interactive element won't work in your web-browser. Raw data and models employed to create the maps are also available for download.

The UN body which covers trade and investment, UNCTAD, has created a snazzy website that combines all of its statistical databases: UNCTADstat has lots of data on trade (merchandise, services), FDI flows and stocks (inward FDI from 1970!), external finance (incl. remittances), labour force/employment, global commodity price indices (from 1960!) as well as some more recent rubrics such as the creative and information economies and maritime transport (from around 2000).

The World Bank has a Logistics Performance Index (LPI), covering 150 countries. This is cross-section data only. 



Geography, GIS, Climate & Environment

Geography Datasets, compiled by John Gallup, Andrew Mellinger, and Jeff Sachs, available at Harvard CID. These contain cross-country data on Infectious Diseases, Physical geography and population, Köppen-Geiger Climate zones (The H zone is highland, E is for polar regions), and Agricultural Measures (including soil quality and the share of arable land in each Köppen-Geiger zone). Naturally, this data is a single cross-section.

The G-Econ research project at Yale University is devoted to developing a geophysically based dataset on economic activity for the world. The current data set (GEcon 3.3) is now publicly available and covers "gross cell product" for all regions for 1990, 1995, 2000, and 2005 and includes 27,500 terrestrial observations. The basic metric is the regional equivalent of gross domestic product. Gross cell product (GCP) is measured at a 1-degree longitude by 1-degree latitude resolution at a global scale. Updates will be posted as they become available. The project director is Professor William Nordhaus, Yale University. The GEcon 3.4 (Aug 2010) spreadsheet has over 27,000 entries (cells) and there are at least two time-series data points for GCP of various types (non-mineral, mineral) and more for cell population. [via Masa Kudamatsu's DEVECONDATA blog]

Data on Geodesic Distances are provided by the Centre D'Etudes Prospectives et D'Information Internationales (CEPII). A first dataset incorporates geographical variables for 225 countries in the world, including the geographical coordinates of their capital cities, the languages spoken in the country under different definitions, a variable indicating whether the country is landlocked, etc. A second dataset is dyadic, i.e. includes variables valid for pairs of countries (bilateral distance, among others).

The Center for International Earth Science Information Network (CIESIN) at Columbia's Earth Institute provides dozens of datasets under the headlines of Agriculture, Biodiversity & Ecosystems, Climate Change, Economic Activity, Environmental Assessment & Modeling, Environmental Health, Environmental Treaties, Indicators, Land Use (LU)/ Land Cover (LC) and LU/LC Change, Natural Hazards, Population, Poverty, Remote Sensing for Human Dimensions Research. The overarching theme for all datasets is environment and climate (change). Since not all data are accessible from the website there's a separate page for downloadable data.

A team headed by Cort J. Willmott at the University of Delaware has put together a website with four to five decades of data on Monthly Air Temperature, Monthly Total Precipitation, Monthly Terrestrial Water Budgets and Monthly Moisture Indices. You'll need some help from a GIS person to get the data transformed.

The Minimum Distance Data provided by Kristian Skrede Gleditsch at Essex University are based on the list of states outlined in Gleditsch, Kristian S. & Michael D. Ward. 1999. "A Revised List of Independent States since 1816," International Interactions 25:393-413. This list differs in certain ways from the list of system members maintained by the COW project and labels used by the COW project. This data source requires that you have a copy of Perl on your system (Kristian provides links and help). You may also be interested in data on distance between capital cities, available on the same website.

The US National Oceanic and Atmospheric Administration’s (NOAA) National Geophysical Data Center (NGDC) provides data on nighttime light from 5 different satellites covering 1992 to 2009 (Version 4). "Each satellite observes every location on the planet (between 65 degrees S latitude and 65 degrees N latitude) every night at some time between 8:30 and 10:00pm. Using night lights during the dark half of the lunar cycle in seasons when the sun sets early removes intense sources of natural light, leaving mostly man-made light. Readings affected by auroral activity (the northern and southern lights) and forest fires are also removed both manually and using frequency filters." There are a total of 30 files, each a zipped folder of 300MB. [The above quote is taken from Henderson, Storeygard and Weil (2008) Measuring Economic Growth from Outer Space, who use a previous version of the data; see also next entry]

The Food and Agriculture Organisation (FAO) of the UN publishes AquaStat which represents a "global information system on water and agriculture, developed by the Land and Water Division. The main mandate of the programme is to collect, analyze and disseminate information on water resources, water uses, and agricultural water management with an emphasis on countries in Africa, Asia, Latin America and the Caribbean." A bit more specifically, the main Aquastat database reports 70 variables under the headings 'Land use and population', 'Climate and water resources', 'Water use (by sector and by source)', 'Irrigation and drainage development' and 'Environment and health' for 5-year intervals from 1958-1962 onwards for a large number of countries. Other databases include the excellent 'Geo-referenced database on dams' and data on 'River sediment yields'. The data can be exported in CSV format. 

Adam Storeygard, formerly an Economics PhD at Brown but soon at Tufts (update 2012!), has a selection of GIS Global Spatial Datasets on a dedicated website. Categories include administrative boundaries, population and other demographic indicators, economic indicators, data related to agriculture, infrastructure, climate and terrain. He's also put together some Miscellaneous notes and resources on learning GIS for beginners, building on his own experience of working with GIS data.

The World Bank has recently published its annual World Development Report, which this year focuses on Conflict, Security and Development. A dedicated website makes the data underlying the analysis in the report easily accessible. The excel spreadsheet covers a total of 211 countries, with maximum coverage over the years 1960-2009. The data is not limited to conflict and political economy issues but also covers geography, colonial history and foreign aid among other topics. All of the data is publicly available (and many datasets are featured here on MEDevEcon), but the unique advantage here is bringing a vast number of conflict-related data from dozens of sources (PRIO, UNHCR, Polity IV, etc.) together in a single spreadsheet (and doing a great job documenting the data and sources).

"In a world of secrets and closed access to data, it comes as a pleasant surprise to discover that there is a huge quantity of data available to anyone, free of charge. This data has complete world coverage, and an astonishing range of data types all gathered together in one package": Vector Map (VMap) Level 0, provided by mapAbility.com. The VMap Level 0 database provides worldwide coverage of vector-based geospatial data which can be viewed at 1:1,000,000 scale, i.e. 1cm=10km. "Need the national coastlines, elevation contours, roads and railways for any country you can think of? They are there, of course. Populated places, administrative boundaries, inland waterways? There too. But how about the more obscure data types - Lighthouse, Fish Farm, Cease-Fire Line, Oasis, Wharf, Communication Tower? All there as well." [via DEVECONDATA by Masa Kudamatsu]

As part of AQUASTAT the Food and Agriculture Organisation (FAO) of the UN provides databases for over 1,300 dams in Sub-Saharan Africa and over 1,100 dams in the Middle East/Central Asia (excel files with substantial documentation). Each dam is geo-referenced and additional information includes dam height, capacity, reservoir area, river, nearest city, among others.

A vast number of geo-spatial datasets including the Gridded Population of the World and Global Earthquake Hazard Distribution are linked by the Socio-Economic Data and Applications Center (SEDAC) at Columbia University. [thanks to Yanos Zylberberg at PSE for the link]

The U.S. National Oceanic and Atmospheric Administration's National Geophysical Data Center (NGDC) provides "geophysical data from the Sun to the Earth and Earth's sea floor and solid earth environment, including Earth observations from space". This includes data on natural hazards, such as the 'Global Significant Earthquake Database, 2150 B.C. to present' and 'The Significant Volcanic Eruption Database' among others. Other intriguing categories for data are 'Space Weather' and 'Bathymetry' (the study of underwater depth of lake or ocean floors). Download as ArcIMS interactive maps, tab-delimited data files or just plain-old html.

The Global Trade Policy Analysis group at the AgEcon Department of Purdue University provides a number of datasets related to trade but also climate change and geography. "The GTAP Data Base is a fully documented, publicly available global data base which contains complete bilateral trade information, transport and protection linkages among 113 regions for all 57 GTAP commodities for a single year (2004 in the case of the GTAP 7 Data Base)." Single academic user licenses for GTAP 7 are $520, but a large number of free datasets (including summaries of GTAP, Social Accounting Matrix [SAM] extraction, the Global [bilateral] FDI Dataset, Project on Bilateral Labor Migration, CO2 emissions) can be found here.

The Global Environment Monitoring Unit (GEM), one of seven scientific units that make up the Institute for Environment and Sustainability (IES) at the European Commission's Joint Research Centre (JRC) [so I could have just said: The EC] provides a large number of geo-spatial datasets. Topics include land cover, biodiversity and fire (Global AVHRR fire probability map 1982-1999). One of the gems of this collection (no pun intended) is the global map of accessibility which charts travel time to major cities. [via DEVECONDATA by Masa Kudamatsu]

The Yale/Columbia Environmental Sustainability Index (ESI) is a measure of overall progress towards environmental sustainability. The index provides a composite profile of national environmental stewardship based on a compilation of indicators derived from underlying datasets. They provide access to the ESI data and associated maps for the Pilot 2000, 2001, 2002 and 2005 versions of the ESI. "Note that because of data and methodological improvements to each subsequent version of the ESI, the country scores cannot be utilized in time-series analysis."
[Via the Norwegian Social Science Data Services MacroDataGuide]

Gridded Population of the World (version 3), constructed by the Center for International Earth Science Information Network (CIESIN) at Columbia University, provides spatial data on population around the world in 1990, 1995 and 2000 at 2.5 arc-minute grid resolution [thanks to Masa at DEVECONDATA for reporting this link].

The Center for Geographic Analysis at Harvard University in collaboration with Shanghai's Fudan University provides a large number of historical GIS 'maps' for China: once mastered (no simple task) this type of Geographical Information Systems (GIS) data allows for spatial analysis of Chinese development. You need to register but access is free, data is in shapefiles or xls or Access (depending on the dataset). There are a large number of datasets from the days of the Legalists and Qin Shihuang (221 BC) to the 1990s (AD).

The World Shipping Register provides free access to their World Sea Ports database. For each country its ports' longitude, latitude and time zone are provided; for some ports the maximum draft is also provided. Given the geospatial information this data could be used to calculate distance to closest port.
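As a rough illustration of the distance-to-closest-port idea, here is a minimal Python sketch using the haversine (great-circle) formula. It assumes a spherical Earth with a mean radius of 6371 km, and the port records and function names are hypothetical placeholders, not values or fields taken from the World Sea Ports database.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_port(lat, lon, ports):
    """Return (name, distance_km) for the closest port.

    ports is a list of (name, latitude, longitude) tuples.
    """
    return min(
        ((name, haversine_km(lat, lon, plat, plon)) for name, plat, plon in ports),
        key=lambda pair: pair[1],
    )


# Hypothetical illustrative records, not real entries from the database
ports = [("Port A", 0.0, 0.0), ("Port B", 0.0, 10.0)]
name, dist = nearest_port(0.0, 8.0, ports)
print(name, round(dist, 1))
```

For a whole country you would typically apply this to a capital city, a population centroid, or each grid cell, depending on the question at hand.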

Nils Weidmann, Jan Ketil Rød and Lars-Erik Cederman from ETH in Zurich have created GREG: the 'Geo-referencing of ethnic groups' dataset employs geographic information systems (GIS) to represent group territories as polygons. It covers a total of 8969 polygons and is provided in ESRI shapefile format.

The Fractionalisation dataset, compiled by Alberto Alesina and associates, measures the degree of ethnic, linguistic and religious heterogeneity in various countries. Covering 215 countries (past and present) the dataset contains only one observation for each country. The language and religion indices are based on data from 2001. Most of the data used to compute the ethnic fractionalisation index are from the 1990s, but for some countries older data are used (as far back as 1979). [Via the Norwegian Social Science Data Services MacroDataGuide]
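For readers unfamiliar with this type of measure: fractionalisation indices of this kind are commonly computed as one minus the Herfindahl index of group shares, i.e. the probability that two randomly drawn individuals belong to different groups. The Python sketch below assumes that definition; the function name and the example shares are made up for illustration and are not values from the dataset.

```python
def fractionalisation(shares):
    """One minus the sum of squared group shares.

    shares: population shares (or counts) of each group in one country.
    Inputs are normalised to sum to one before the index is computed.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    normalised = [s / total for s in shares]
    return 1.0 - sum(s * s for s in normalised)


# Hypothetical shares: a homogeneous country and one with four equal groups
print(fractionalisation([1.0]))
print(fractionalisation([0.25, 0.25, 0.25, 0.25]))
```

The index is zero for a perfectly homogeneous country and approaches one as the population is split into many small groups.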

The PBL Netherlands Environmental Assessment Agency provides the History Database of the Global Environment (interestingly, the acronym is HYDE). HYDE presents (gridded) time series of population and land use for the last 12,000 years! It also presents various other indicators such as GDP, value added, livestock, agricultural areas and yields, private consumption, greenhouse gas emissions and industrial production data, but only for the last century.

The Global Runoff Data Centre at BfG (sadly not the Big Friendly Giant but the German Bundesanstalt für Gewässerkunde) provides access to global hydrological data. "The initial dataset of monthly river discharge data over a period of several years around 1980 was supplemented with the 'UNESCO monthly river discharge data collection 1965-85'. Today the database comprises discharge data of more than 7.000 gauging stations from all over the world. Since 1993 the total number of station-years has increased by a factor of around 10." 'Standard services' include Freshwater Fluxes into the World Oceans, Major River Basins of the World and Long-Term Mean Monthly Discharges. [this data features in the work by Abhishek Chakravarty at UCL/Essex.]

The World Wildlife Fund (WWF) hosts the global lakes and wetlands database (GLWD) which has been developed in partnership with the Center for Environmental Systems Research, University of Kassel, Germany. It is available for download as three separate ArcView layers (two polygon shapefiles and one grid). [via DEVECONDATA by Masa Kudamatsu]

Matthew Ciolek at Australian National University edits the site for the Old World Trade Routes (OWTRAD) Project: "This site supports online research in the field of dromography and provides a public-access electronic archive of geo/chrono-referenced data on land, river and maritime trade routes of Eurasia and Africa during the period 10,000 BCE - circa 1820 CE." The files are published in CSV, MapInfo and Google Earth (KML) formats, downloadable by region. There's also a link to the Trade Routes Resources blog [via Masa Kudamatsu's DevEconData blog]

More gravity data from Jon Haveman: great circle distance between capital cities for 176 countries, provided on his website. He also offers contiguity data, i.e. information on which countries share a common border or a small body of water border, for 176 countries (including the GDR and other Soviet Bloc countries). Finally: language data for 176 countries. All of these files are text files.

The Climatic Research Unit at the University of East Anglia (much in the news recently) has longitudinal data on Temperature and Precipitation (both gridded geospatial) - if you know how to whittle these massive datasets down to national data, they might be quite useful.
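One simple way to whittle gridded data down to a national figure is an area-weighted average over the grid cells that fall inside a country. The Python sketch below assumes a regular latitude-longitude grid, where cell area is roughly proportional to the cosine of the cell's latitude, and assumes you have already worked out which cells belong to the country (e.g. with GIS software). The function name and the example values are hypothetical.

```python
import math


def area_weighted_mean(cells):
    """Area-weighted mean of gridded values for one country.

    cells: iterable of (latitude_in_degrees, value) pairs, one per grid cell
    assigned to the country. Weights are cos(latitude), which approximates
    relative cell area on a regular latitude-longitude grid.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, value in cells:
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * value
        weight_total += weight
    if weight_total <= 0:
        raise ValueError("no cells with positive weight")
    return weighted_sum / weight_total


# Hypothetical cells: (latitude, temperature in degrees C)
print(area_weighted_mean([(0.0, 10.0), (60.0, 20.0)]))
```

Depending on the application you may prefer population weights rather than area weights, since national averages dominated by uninhabited regions can be misleading for economic analysis.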

Michael E. Mann, Raymond S. Bradley, and Malcolm K. Hughes provide the data to go with their 1998 Nature article entitled 'Global-Scale Temperature Patterns and Climate Forcing over the past Six Centuries'. There are annual gridded temperature data for 1730-1980 and even longer time series going back to the 1400s. [Thanks to James Fenske at Oxford for pointing out this database]

Diego Puga at the Madrid Institute for Advanced Studies (IMDEA) provides data on 'terrain ruggedness' (the Terrain Ruggedness Index was originally devised by Riley, DeGloria, and Elliot (1999) to quantify topographic heterogeneity in wildlife habitats providing concealment for prey and lookout posts) which is used in a paper of Diego's with Nathan Nunn. The data (which also includes some other geographical variables) is in Stata format.

The NASA Goddard Space Flight Center provides various data for the Global Precipitation Climatology Project (GPCP). The most interesting of these is probably the Global Monthly Merged Precipitation Analyses of GPCP, available from 1979 to the present day.

UNEP has a convenient database offering access to a large number of datasets from the World Bank and UN organisations - the vast majority of these seem to be freely accessible. They have extensive data on Geography & Environment, including some geospatial datasets. 

The Veterinärmedizinische Universität Wien provides detailed information on the Köppen-Geiger climate classification for the world: "Based on recent data sets from the Climatic Research Unit (CRU) of the University of East Anglia and the Global Precipitation Climatology Centre (GPCC) at the German Weather Service, we present here a new digital Köppen-Geiger world map on climate classification for the second half of the 20th century." Data in ASCII, as well as shape format and as grid file for GIS software.

The International Center for Tropical Agriculture (CIAT) has produced a Crop Atlas of the World: "The map shows derived estimates of the spatial distribution and productivity of crops for 10-km grids using a novel allocation approach involving the fusion of sub-national crop production statistics. The values in this digital [map] are the number of harvested hectares within each 10 km grid cell. This data includes area harvested in multiple season (therefore this is NOT the physical harvested area, but rather the total area harvested) [...] The sub-national crop production data comes from agricultural censuses and surveys and has scaled values, so as to obtain national production estimates that were compatible with the annual average FAO national crop statistics for 1999-2001. The prototype crop distribution database used in this study is available from the authors upon request but is currently being regenerated using newer and additional data sources (including revisions based on expert validation) and an enhanced allocation algorithm." If you have Google Earth you can look at these data maps.

The Norwegian Centre for the Study of Civil War within the International Peace Research Institute, Oslo (PRIO) has a number of datasets relating to geography and resource endowment. These include data on diamond resources, petroleum resources, length of international boundaries, minimum distance between countries, and data on rivers/river basins shared between countries.

Cross-country data on the environment is covered in one of the datasets provided by Andrew Rose of Haas Business School, UCB. This is the dataset associated with the "Is Trade Good or Bad for the Environment? Sorting out the Causality" paper and has information on emissions, bird species, threatened species, water quality, and many more variables. Andrew also provides a large amount of geographical data and other standard and non-standard cross-country regression-type variables (religion, institutions, diversity,...) from 1960 to 2000 for up to 208 countries (decadal data). This can be found in 'data12.dta' in the files associated with the "Size Really Doesn’t Matter: In Search of a National Scale Effect" paper.

The WHO Collaborating Centre for Research on the Epidemiology of Disasters (CRED) maintains an International Emergency Events Database called EM-DAT which covers all sorts of disasters, including natural ones (drought, floods, insect infestation), from the early 20th century to the present day. The data can be presented on an annual basis for 'people affected', 'injured', 'homeless' or 'deaths' from the disaster type specified and downloaded in excel format.

Sea-level rise and storm-surge intensification data compiled by Susmita Dasgupta at the World Bank. The former assesses consequences of continued SLR for 84 coastal developing countries, providing data on the impacted land area, population, GDP, agricultural area, urban area and wetlands if sea-levels rise by 1 to 5 meters (excel worksheets). The latter considers the potential impact of a large (1-in-100-year) storm surge by contemporary standards, and then compares it with its 10% intensification which is expected to occur in this century. Again the impact on land area etc. is provided.


Energy and Natural Resources

The World Bank Wealth of Nations dataset provides country-level data on comprehensive wealth, adjusted net saving and non-renewable resource rents indicators. It presents a set of “wealth accounts” for over 150 countries for 1995, 2000, and 2005 which allows a longer-term assessment of global, regional and country performance in building wealth. Adjusted Net Saving (takes into account CO2 damages, natural resource depletion etc.) and non-renewable resource rent (oil, gas, tin, copper, etc.) indicators are calculated annually from 1970 to 2008.

The International Energy Agency has compiled the Joint Oil Data Initiative (JODI), which provides data for up to 90 countries, although it centres its attention on the 30 largest oil producers and consumers. The time-series goes back to January 2002 (monthly data) and covers stocks, imports/exports, refinery output and other relevant variables for crude oil, liquefied gas, diesel, and other oil products. They are planning a similar database (InterEnerStat) for all fuels and flows.

The Norwegian Centre for the Study of Civil War within the International Peace Research Institute, Oslo (PRIO) has a number of datasets relating to geography and resource endowment. These include data on diamond resources, petroleum resources, length of international boundaries, minimum distance between countries, and data on rivers/river basins shared between countries.

The Food and Agriculture Organisation (FAO) of the UN publishes AquaStat which represents a "global information system on water and agriculture, developed by the Land and Water Division. The main mandate of the programme is to collect, analyze and disseminate information on water resources, water uses, and agricultural water management with an emphasis on countries in Africa, Asia, Latin America and the Caribbean." A bit more specifically, the main Aquastat database reports 70 variables under the headings 'Land use and population', 'Climate and water resources', 'Water use (by sector and by source)', 'Irrigation and drainage development' and 'Environment and health' for 5-year intervals from 1958-1962 onwards for a large number of countries. Other databases include the excellent 'Geo-referenced database on dams' and data on 'River sediment yields'. The data can be exported in CSV format. 

The Global Runoff Data Centre at BfG (sadly not the Big Friendly Giant but the German Bundesanstalt für Gewässerkunde) provides access to global hydrological data. "The initial dataset of monthly river discharge data over a period of several years around 1980 was supplemented with the 'UNESCO monthly river discharge data collection 1965-85'. Today the database comprises discharge data of more than 7.000 gauging stations from all over the world. Since 1993 the total number of station-years has increased by a factor of around 10." 'Standard services' include Freshwater Fluxes into the World Oceans, Major River Basins of the World and Long-Term Mean Monthly Discharges. [this data features in the work by Abhishek Chakravarty at UCL/Essex.]


Innovation, Patents, R&D and Intangible Capital

Innovation and Development Around the World (1960-2000) by Daniel Lederman and Laura Saenz at the World Bank offers data on 'innovative activities' from the 1960s onwards.

The INTAN-Invest project, run by folk at Imperial College, The Conference Board and LUISS including Carol Corrado and Jonathan Haskel, has published broad sector-level data for knowledge-based capital in a number of OECD countries. The data cover 8 sectors (agri, services, manufacturing, construction, etc.) in 14 countries from 1995 to 2010 and can be downloaded in excel format alongside detailed documentation from the project website.

$$ The OECD maintains OECD.stat which has statistics on R&D, patents and other science & technology topics. Provision is limited to the OECD member states, the BRICS and a small number of other countries.

UNESCO provides statistics on R&D expenditure and personnel in their Data Centre, although it seems this data does not stretch further back than 1996.

The World Intellectual Property Organisation (WIPO) publishes the World Intellectual Property Indicators which includes for instance data for "Patent applications by patent office (1883-2008)" (read: country) which can be downloaded as excel or CSV file. Similarly of great interest should be "Patent grants by patent office (1883-2008)" and other statistics on 'Patents in Force' and 'Patent Intensity'. WIPO also has further resources on trademarks and plant varieties(!) among others. A second resource for patent data is the European Patent Office (EPO) which has a number of free databases on its website. [Thanks to Christian Helmers for these links]

The World Intellectual Property Organisation (WIPO) also offers WIPO Lex, a "one-stop search facility for national laws and treaties on intellectual property (IP) of WIPO, WTO and UN Members".

Diego Comin and Bart Hobijn constructed the Historical Cross-Country Technology Adoption (HCCTA) dataset, available at NBER. This data allows for the analysis of the adoption patterns of some of the major technologies introduced in the past 250 years across the World's leading industrialized economies. This comes as an excel file with macros included, but if you prefer to play around with full data you can download the ASCII version.

Fulvio Castellacci and Jose Miguel Natera have created a balanced panel dataset for cross-country analyses of national systems, growth and development (CANA) hosted by the Norwegian Institute of International Affairs. The originality of this dataset (which draws on a variety of sources) is that the gaps in the data have been filled, using the method of multiple (and repeated) imputations developed by two political scientists, Honaker and King (2010). I have not looked at the Castellacci & Natera paper describing the data construction and robustness checks in detail, but am a priori quite sceptical about imputations: these macro variables are likely to be integrated, so imputations could be rather misleading. On the other hand, missing data is a serious problem for a lot of the dimensions they consider: (1) Innovation and technological capabilities; (2) Education and human capital; (3) Infrastructures; (4) Economic competitiveness; (5) Social capital; (6) Political and institutional factors. There are a total of 41 indicators for 134 countries over the period 1980-2008. The data is in excel format and well-documented. I'd say keep an eye out for reviews and applications of this dataset.

The Agricultural Science and Technology Indicators (ASTI) are provided by IFPRI. These are agricultural R&D indicators for developing countries only, with varying time-series coverage (earliest time around 1970, most recent up to around 2002). The data is split by institutional category (Higher Education, Private, Public Sector, NFP, government agencies) and provides numbers on researchers and R&D expenditure on agriculture.

For data covering agriculture R&D in developed countries check out the 'Science & Technology' section of the UNESCO database.


Political Economy (i): law, institutions and governance

The Norwegian Social Science Data Services (NSD) have compiled The Macro Data Guide, "An International Social Science Resource" covering many sources with data arranged by country or topic. It seems that coverage is particularly strong on topics of political science, including elections, parties, etc (but that's just my perception). For each dataset there is very useful background information on coverage, time span, topics, documentation and when the dataset was last accessed. Definitely a good starting point for any macro data search.

New tools in comparative political economy: The Database of Political Institutions (1975-2006), based on the 2001 dataset created by Thorsten Beck, George Clarke, Alberto Groff, Philip Keefer, and Patrick Walsh for the World Bank. This is now available in an updated version to 2010 (Stata 10 file). [thanks to Sarah Brierley who tweets from Accra @sabrierley]

Polity IV Project: Political Regime Characteristics and Transitions (1800-2006). The Polity IV Project is run by Monty G. Marshall and Keith Jaggers (Principal Investigators), Ted Robert Gurr (Founder). Currently covers 162 countries, although not all for the entire period, obviously. Related to this, Kristian Skrede Gleditsch at Essex University provides links to the POLITY IV project, the modified P4 and P4D data, and older versions of the Polity data.

The Economic Freedom of the World database (2007) is compiled by the Fraser Institute. This contains data (from 1970 onwards) on government size, legal structures, property rights, freedom to trade, etc. for around 120 countries.

The African Elections Database (AED) created by Albert C. Nunley provides election data on 48 sub-Saharan countries, from 1990 to 2011. Each country's election page starts with a political profile, which gives an overview of the political leadership and a brief history of the political situation in the country since its independence. A list of political parties (sorted alphabetically by acronym) and coalitions is found at the end of the political profile.

Freedom House is one of the most commonly used data providers for all measures related to political economy. Their current version ranges from 1973 to 2014 and covers 195 countries; data can be downloaded in Excel format. [Thanks to Corcaigh (UCC) PhD student Sean O'Conner for pointing out this oversight]

The International Institute for Democracy and Electoral Assistance (International IDEA) provides the Unified Database, which covers topics including Direct Democracy, Electoral Justice, Electoral System Design, Gender quotas, Voter Turnout, Voting from Abroad, Electoral Management Design, Political Finance. The Unified Database has global coverage, including data from provinces and previously existing, now dissolved, states. [Via the Norwegian Social Science Data Services MacroDataGuide]

Hard to comprehend, really, but I seem to have so far missed out on linking to two of the most frequently used resources when it comes to 'freedom in the world'. Freedom in the World Comparative and Historical Data by Freedom House provides country-level scores for political rights and civil liberties from 1973 onwards, plus a dataset on electoral democracy which they started collecting in the late 80s. All are available free for download, unlike the Political Risk Services Group's International Country Risk Guide (ICRG), which is $425. You should also have a look at the links provided by Freedom House in the 'Resources' tab. [Thanks to Nalan Basturk at the Erasmus School of Economics in Rotterdam for pointing this out] 

The Centripetal Democratic Governance dataset was compiled by John Gerring, Strom Thacker and Carola Moreno for a study that examined various political institutions’ impact on the quality of governance. The dataset consists of 42 variables measuring the degree to which government institutions centralise power. In addition, the dataset contains social and economic variables and measures of bureaucratic quality from other sources. The dataset covers 225 countries and territories over the 1960-2011 time horizon. Available in Stata, ASCII and Excel format. [Via the Norwegian Social Science Data Services MacroDataGuide]

A dataset on electoral systems developed by Ugo Panizza and others at the Inter-American Development Bank provides indicators for the degree to which individual politicians can further their careers by appealing to narrow geographic constituencies on the one hand, or party constituencies on the other. The data covers 183 countries and runs from 1978 to 2001 (unbalanced panel). Excel and Stata files, as well as a working paper describing the data and highlighting its potential use in research exploring the connections between electoral systems and economic outcomes, are available for download. 


The Geneva-based Inter-Parliamentary Union (IPU) provides data on Women in National Parliaments from 1997 (archive link) to the present day, covering 188 countries. [Via the Norwegian Social Science Data Services MacroDataGuide]

Fulvio Castellacci and Jose Miguel Natera have created a balanced panel dataset for cross-country analyses of national systems, growth and development (CANA) hosted by the Norwegian Institute of International Affairs. The originality of this dataset (which draws on a variety of sources) is that the gaps in the data have been filled, using a multiple (and repeated) imputation methodology developed by two political scientists, Honaker and King (2010). I have not looked at the Castellacci & Natera paper describing the data construction and robustness checks in detail, but am a priori quite sceptical about imputations: these macro variables are likely to be integrated, so imputations could be rather misleading. On the other hand, missing data is a serious problem for a lot of the dimensions they consider: (1) Innovation and technological capabilities; (2) Education and human capital; (3) Infrastructures; (4) Economic competitiveness; (5) Social capital; (6) Political and institutional factors. There are a total of 41 indicators for 134 countries over the period 1980-2008. The data is in Excel format and well-documented. I'd say keep an eye out for reviews and applications of this dataset.

The Quality of Government Institute at the University of Gothenburg publishes the QoG Dataset in Stata, SPSS and csv format. "The aim of the QoG Social Policy Dataset is to promote cross-national comparative research on social policy output and its correlates, with a special focus on the connection between social policy and quality of government (QoG). To accomplish this we have compiled a number of freely available data sources, including aggregated public opinion data." There are three versions: (1) a cross-section with global coverage (2002); and two panels for 40 countries either annual (1946-2009) or 5-yearly (1970-2005). The topics covered are Social policy, Tax system, structural conditions for social policy, Public opinion, Political indicators and Quality of government. This is now also provided in Stata format via two user-written commands.

Benjamin Graham at the University of Southern California has created the International Political Economy Data Resource. "In some respects, the dataset is akin to the Quality of Government Institute's "standard" panel dataset in that it includes many of the same regime-type measures and such.  But this dataset has much more of an IPE focus than the QoG data and includes various measures of exchange-rate classifications, financial openness, tariffs/trade policy, membership in international organizations, and such that are not currently in the QoG data.  So it doesn't cast as wide a net as the QoG data do, but it delves into IPE-related measures to a greater degree than does any of the QoG datasets." 
A working paper associated with the dataset can be found here [Thanks to Rob O'Reilly, a political scientist at Emory University, for pointing me to this dataset - the above blurb was written by Rob as well!]. 

Geert Bekaert and Campbell R. Harvey at Duke have compiled a country risk database which provides 'A Chronology of Important Financial, Economic and Political Events in Emerging Markets' for 55 countries. The data is presented on country-specific websites so you'll have a little copying and pasting to do before you can analyse the data. [Thanks to my former PhD colleague Bob Rijkers, now at the World Bank, for the link].

Tatu Vanhanen and the International Peace Research Institute, Oslo (PRIO) offer the Polyarchy dataset, which covers 187 countries over the period 1810 to 2000. This contains the Vanhanen Index of Democracy and the data on which this index is based.

The Cingranelli-Richards (CIRI) Human Rights Dataset, hosted by SUNY Binghamton, contains standards-based quantitative information on government respect for 15 internationally recognized human rights for 195 countries, annually from 1981-2009. The data describe a wide variety of government human rights practices (15) including torture, workers' rights, and women’s rights over a 29-year period. This dataset is featured in the World Bank WDR 2011 (and is conveniently included in its dedicated Excel file).

The Comparative Study of Electoral Systems (CSES) is a collaborative program of research among election study teams from around the world. "The CSES is composed of three parts: first, a common module of public opinion survey questions is included in each participant country's post-election study. These "micro" level data include vote choice, candidate and party evaluations, current and retrospective economic evaluations, evaluation of the electoral system itself, in addition to standardized sociodemographic measures. Second, district level data are reported for each respondent, including electoral returns, turnout, and the number of candidates. Finally, system or "macro" level data report aggregate electoral returns, electoral rules and formulas, and regime characteristics." Covers >50 countries from 1996-2011. 
[Via the Norwegian Social Science Data Services MacroDataGuide]

The Democratic Electoral Systems Around the World (DES) dataset, compiled by Nils-Christian Bormann and Matt Golder, describes some of the more important electoral institutions used in legislative and presidential elections around the world in a consistent and comparative manner. In total, the data contain information on 1,197 legislative and 433 presidential elections that occurred in democracies from 1946 (or independence) through 2011. Available in Stata and Excel format with detailed codebook.

The World Bank has recently published its annual World Development Report, which this year focuses on Conflict, Security and Development. A dedicated website makes the data underlying the analysis in the report easily accessible. The Excel spreadsheet covers a total of 211 countries, with maximum coverage over the years 1960-2009. The data is not limited to conflict and political economy issues but also covers geography, colonial history and foreign aid among other topics. All of the data is publicly available (and many datasets are featured here on MEDevEcon), but the unique advantage here is bringing a vast number of conflict-related data from dozens of sources (PRIO, UNHCR, Polity IV, etc.) together in a single spreadsheet (and doing a great job documenting the data and sources).
 
The World Bank has a new dataset on Worldwide governance indicators, with data available 1996-2008. They define governance as having six dimensions, Voice and Accountability, Political Stability and Absence of Violence, Government Effectiveness, Regulatory Quality, Rule of Law and Control of Corruption, and provide indicators for each of these in up to 212 countries. This data is described in a paper by Kaufmann, Kraay and Mastruzzi (2009).

The World Bank has now consolidated the data on 'actionable' governance indicators in a single web portal, the AGI data portal. Actionable governance indicators are narrowly defined and disaggregated indicators that focus on relatively specific aspects of governance and could provide guidance on the design of reforms and monitoring of impacts. This means it provides links to over 1,000 indicators taken from sources such as AfroBarometer, the Doing Business surveys or the Press Freedom Index by Reporters without Borders. [via Gunilla Pettersson's developmentdata.org]

Staffan I. Lindberg at University of Florida provides the Elections and Democracy in Africa (1989-2003) dataset. This includes variables providing information about voter turnout, whether the incumbent accepted the election and other interesting pol-econ data, described in detail here. I got this link off Masa Kudamatsu's blog.

Law, Debt, Informal Economy and Labour Regulation data: Andrei Shleifer's website provides links to a number of datasets he has compiled and used with various co-authors. This includes 'Private Credit in 129 Countries' (JFE 2007, with S. Djankov and C. McLiesh), with data from 1978-2002 and data on the 'unofficial economy' (primarily cross-section data).

The Multi-Dimensional Representation of Political Systems (MIRPS) dataset is also made available at the Norwegian Centre for the Study of Civil War within the International Peace Research Institute, Oslo (PRIO).

Data on Ethnic Composition of the population from the 1940s is available from the Norwegian Centre for the Study of Civil War within the International Peace Research Institute, Oslo (PRIO).

Nils Weidmann, Jan Ketil Rød and Lars-Erik Cederman from ETH in Zurich have created GREG, the 'Geo-referencing of ethnic groups' dataset, which employs geographic information systems (GIS) to represent group territories as polygons. It covers a total of 8969 polygons and is provided in ESRI shapefile format.

Hein Goemans at University of Rochester provides access to the Archigos database on state leaders and their political 'fate' (1875 - 2004). This database is a collaborative effort with Giacomo Chiozza (Vanderbilt) and Kristian Skrede Gleditsch (Essex University) and contains information on leaders' gender, birth- and death-date, previous times in office and their post-exit fate.

The List of Independent States in ascii format and tentative list of microstates that fall short of the 250,000 threshold are also made available by Kristian Skrede Gleditsch. The former is from 1816 to 2006.

Graziella Bertocchi and Chiara Strozzi at the Università degli studi di Modena e Reggio Emilia have constructed the Citizenship Laws dataset, which contains information on citizenship laws in 162 countries of the world with reference to the years 1948, 1975, and 2001. "The available information concerns the way in which countries regulate citizenship acquisition at birth, with a distinction among jus soli (i.e., by birthplace), jus sanguinis (i.e., by descent), and mixed regimes. We also collect information about naturalization requirements... The dataset also contains information for the main border changes which have affected the countries in our sample."

The Washington-based Center for Global Development (Roodman, Radelet, Subramanian, Birdsall, Clemens and many others) have a link to datasets on their publications website. Highlights include data on 'the fate of young democracies' (since 1960), which offers "underlying reasons for backsliding and reversal in the world’s fledgling democracies".

The African Research Program at Harvard University has data on institutions, violence, and economic variables for Africa. They also provide data which they refer to as 'controls': geographic, climatic, demographic, sociological and international information on African countries from a variety of sources. Most of this data is for 1960-2000 for 47 SSA countries.

The Center for International Development and Conflict Management at University of Maryland provides the Minorities at Risk (MAR) dataset. The MAR project currently maintains data on 284 politically active ethnic groups. The centerpiece of the project is a dataset that tracks groups on political, economic, and cultural dimensions. The project also maintains analytic summaries of group histories, risk assessments, and group chronologies for each group in the dataset. From the same institution comes the Minorities at Risk Organizational Behavior (MAROB) dataset, covering 118 ethnopolitical organizations representing 22 MAR groups in 26 countries of the Middle East and North Africa from 1980 to 2004 (in csv or Stata format).

Andrew Rose, an economist at the Haas School of Business, UC Berkeley, provides decadal data on standard political economy variables from 1960-2000 for up to 208 countries in the files associated with the "Size Really Doesn’t Matter: In Search of a National Scale Effect" paper.

Axel Dreher at the University of Goettingen has a couple of datasets on World Bank and IMF projects and funding facilities from 1970 onwards for up to 160 countries.

Data on Ethnic Power Relations and Ethnic Conflicts is available from Brian Min's website (a PhD student in PolSci at UCLA). This includes 'Waves of War': Location and 'purpose' of war around the world 1816-2001 (464 wars); 'From Empire to Nation State', which takes fixed geographical territories instead of countries as units of analysis, allowing for the tracing of a territory’s political and economic development before and after independence from 1816 to 2001. This database now has a dedicated website.

The Stockholm International Peace Research Institute (SIPRI) has a number of extremely detailed databases related to military expenditure, arms transfers, arms embargos as well as multilateral peace operations. The arms transfers database, for instance, includes trade registers with information on each deal including, inter alia, the suppliers and recipients, the type and number of weapon systems ordered and delivered, the years of deliveries, and the financial value of the deal. Some of the data can be downloaded as excel files, others as Word rich text format.

The Inter-American Development Bank (IADB) has created DataGov, providing governance indicators from key public databases consolidated for all countries in the world. This site has changed quite a bit since I last had a look at it - everything is now in graphs using Flash (I imagine), but there's still the opportunity to download the data to excel [thanks to Paul Clist for reminding me].


Political Economy (ii): conflict, legacy of conflict, terrorism and weapons

The World Bank has recently published its annual World Development Report, which this year focuses on Conflict, Security and Development. A dedicated website makes the data underlying the analysis in the report easily accessible. The Excel spreadsheet covers a total of 211 countries, with maximum coverage over the years 1960-2009. The data is not limited to conflict and political economy issues but also covers geography, colonial history and foreign aid among other topics. All of the data is publicly available (and many datasets are featured here on MEDevEcon), but the unique advantage here is bringing a vast number of conflict-related data from dozens of sources (PRIO, UNHCR, Polity IV, etc.) together in a single spreadsheet (and doing a great job documenting the data and sources).

The Integrated Network for Societal Conflict Research (INSCR) was established to coordinate and integrate information resources produced and used by the Center for Systemic Peace, based in Vienna, Virginia. They provide a wealth of datasets: Forcibly Displaced Populations (1964-2008), Major Episodes of Political Violence (MEPV, 1946-2008), PITF State Failure Problem Set (1955-2009), High Casualty Terrorist Bombings (1992-2010), Memberships in Conventional Intergovernmental Organizations (1952-1997), Polity IV (1800-2009), Coups d'Etat (1946-2009), State Fragility Index and Matrix Time-Series Data (1995-2009), Crime in India: Riots, Murders, and Dacoity (1954-2006), India Sub-National Problem Set (1960-2004). The INSCR data resources cover all independent countries with a total population of 500,000 people in 2008 (163 countries in 2009). Most of the data are regularly updated and can be downloaded in SPSS and Excel format. [I found out about this resource through a paper by Olaf de Groot (DIW) and Anja Shortland (Brunel)]

The Department of Economics at Royal Holloway, University of London hosts the Conflict Analysis Resources website. This not only comprises a large number of datasets related to the topic (Correlates of War, Termination of Civil War etc.) but also additional resources such as surveys of the literature and active researchers.

The Robert S. Strauss Center for International Security and Law, at the University of Texas at Austin hosts the Social Conflict in Africa Database (SCAD), "a resource for conducting research and analysis on various forms of social and political unrest in Africa. It includes over 6,000 social conflict events across Africa from 1990 to 2009, including riots, strikes, protests, coups, and communal violence." The entire database can be downloaded as Excel CSV file and contains very detailed information on location, actors, duration etc. of the conflict. [I found out about this website via Masa Kudamatsu's DEVECONDATA]

Page Fortna at Columbia University has a couple of interesting datasets for the analysis of civil war and interstate conflict. 'Peacekeeping and the Peacekept: Data on Peacekeeping in Civil Wars 1989-2004' and 'The Cease-Fires Data Set: The Duration of Peace after Interstate Wars 1946-1994' are provided in Stata format together with some more information on the data. Page's own research papers (on the same site) should also be insightful. [Thanks to Martha Ross at Nottingham for the pointer - Martha is now a PhD student at Wageningen University]

James Fearon at Stanford University provides a number of datasets he created/compiled to analyse civil conflict. His personal website provides the Stata datasets as well as access to the academic papers he has written with various co-authors. [Thanks to my buddy Eoin McGuirk @eoinmcguirk at Berkeley/Trinity Dublin for the pointer]

The Armed Conflict 1946–2001: A New Dataset was compiled by Nils Petter Gleditsch, Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg & Håvard Strand and is available at the World Bank website. The dataset is available in Stata format. The authors are from the Norwegian Centre for the Study of Civil War and Uppsala Conflict Data Program (UCDP) at the Department of Peace and Conflict Research, Uppsala University, where this dataset can be located in an extended version (2008) among a number of other datasets on conflict.

The Heidelberg Institute for International Conflict Studies (HIIK) constructed the COSIMO database (project leader Frank R. Pfetsch), which records information on political conflicts between 1945 and today. At present, COSIMO 2.0 includes information on far more than 500 conflicts with over 2,500 phases. By the systematic recording of single conflict measures, the new conception enables the detailed description of the conflict development in violent and non-violent phases. In addition, the databank includes extensive information on the structure of state and non-state actors, that are recorded per year. At the moment the 2.0 version is not available online, but you can email the project team. Version 1.3 is available in Excel format for 1945 to 1998.

The Political Instability Task Force (PITF) has compiled annual information on each of four types of political instability events for all countries with a total population of 500,000 or greater, covering the period 1955 to the most current year; these events include ethnic wars, revolutionary wars, genocides and politicides, and adverse regime changes (all of these are contained in separate 'problem sets' for download as excel files). The PITF website is hosted by the Center for Global Policy at George Mason University and the funding comes from the CIA. [Thanks to Masa Kudamatsu's DEVECONDATA blog for listing the link]

The World Bank provides the Landmine Contamination, Casualties and Clearance database, which contains country level data on a broad range of issues related to landmines and cluster munitions, including contamination, casualties and clearance, and their associated cost. The data was compiled from two sources: Landmine and Cluster Munition Monitor and annual surveys by the United Nations Mine Action Service (UNMAS). Coverage is 1999-2009 with annual updates scheduled for October.

Data on Urban Social Disturbances covering 55 major cities, 23 in Sub-Saharan Africa and 32 in Central- and East Asia, in 49 different countries for the 1960-2006 period are available from the Norwegian Centre for the Study of Civil War within the International Peace Research Institute, Oslo (PRIO).

Data on Ethnic Power Relations and Ethnic Conflicts is available from Brian Min's website (a PhD student in PolSci at UCLA). This includes 'Waves of War': Location and 'purpose' of war around the world 1816-2001 (464 wars); 'From Empire to Nation State', which takes fixed geographical territories instead of countries as units of analysis, allowing for the tracing of a territory’s political and economic development before and after independence from 1816 to 2001.

The Expanded War Database is provided by Kristian Skrede Gleditsch at Essex University. These data contain a revised and expanded list of wars to conform with the list of independent states outlined in Gleditsch & Ward (1999) 'Interstate System Membership: A Revised List of the Independent States since 1816'.

The International Crisis Behavior (ICB) database is made available by the Center for International Development and Conflict Management at University of Maryland, covering all international and foreign policy crises for the period 1918-2005. This version includes data on 447 international crises (icb1v8) and 983 crisis actors (icb2v8). The data are stored in SPSS data files (or tab-delimited text files). The same source also provides the Dyadic-Level Crisis Data. This dataset contains information about 882 non-directed crisis dyads identified from the main data collections offered by the ICB Project. The dataset spans the years 1918-2001. A 'crisis dyad' is a pair of states satisfying each of the following three conditions: (1) both are members of the interstate system, (2) at least one of the states satisfies all three of the ICB necessary conditions for crisis involvement, and (3) at least one of the states has directed a hostile action against the other.
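The three-condition definition of a crisis dyad can be written down as a simple check. This is only a minimal sketch in Python to make the logic concrete: the `State` fields and the `hostile_actions` set are hypothetical names chosen for illustration and do not correspond to variables in the ICB files.

```python
from dataclasses import dataclass


@dataclass
class State:
    name: str
    system_member: bool         # hypothetical flag: member of the interstate system
    meets_icb_conditions: bool  # hypothetical flag: satisfies all ICB necessary conditions


def is_crisis_dyad(a, b, hostile_actions):
    """Return True if the pair (a, b) meets all three conditions quoted above.

    hostile_actions is a set of (actor, target) name pairs (hypothetical input).
    """
    both_members = a.system_member and b.system_member                # condition (1)
    one_in_crisis = a.meets_icb_conditions or b.meets_icb_conditions  # condition (2)
    hostile = ((a.name, b.name) in hostile_actions
               or (b.name, a.name) in hostile_actions)                # condition (3)
    return both_members and one_in_crisis and hostile
```

Because the dyads are non-directed, the check treats (A, B) and (B, A) as the same pair.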

The International Maritime Bureau's (IMB) Piracy Reporting Centre (err, PRC) logs incidents of piracy. The data is used by Olaf de Groot (DIW) and Anja Shortland (Brunel) in an aptly entitled paper on 'Gov-arrrgh-nance - Jolly Rogers and Dodgy Rulers' (to be presented at the RES 2011 conference at Royal Holloway next month; link to paper here). They write "The IMB provides narratives on all incidents of piracy reported (voluntarily) by captains and ship-owners as well as annual counts of incidents of piracy for each country" and make a number of suggestions/changes as to the way piracy incidents are coded. Data is from 1997 to 2009.

Some folk at the University of Maryland have created the Global Terrorism Database (GTD), "an open-source database including information on terrorist events around the world from 1970 through 2008 (with annual updates planned for the future). Unlike many other event databases, the GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 87,000 cases." This includes more than 38,000 bombings, 13,000 assassinations, and 4,000 kidnappings since 1970. Registration required. [This website was first featured on the ever-evolving DEVECONDATA by Masa Kudamatsu]

Haverford College in the United States hosts the Global Terrorism Resource Database, compiled by Nicholas Lotito (class of 2010), and updated by Katie Drooyan (class of 2011), under the direction of Professor Barak Mendelsohn. Although the bulk of terrorism research findings are presented via traditional literature (e.g. articles, journals, reports, and press releases), this database focuses on other sources. In particular, this database lists sources for raw datasets and databases that combine a significant number of resources (e.g. the US government's Worldwide Incidents Tracking System; the Global Terrorism Database compiled by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland; Al-Qa’ida Attacks: 1994-2007 – RAND Corporation; International Terrorism: Attributes of Terrorist Events (ITERATE); University of Oklahoma and the University of Arkansas American Terrorism Database; and many more). The site also hosts the Al-Qaeda Statements Index, a student-created Haverford resource.

The Global Terrorism Database (GTD) compiled by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) is "an open-source database including information on terrorist events around the world from 1970 through 2010 (with additional annual updates planned for the future) ... [The] GTD includes systematic data on domestic as well as transnational and international terrorist incidents that have occurred during this time period and now includes more than 98,000 cases. For each GTD incident, information is available on the date and location of the incident, the weapons used and nature of the target, the number of casualties, and --when identifiable-- the group or individual responsible." You need to register to gain access to the data. [This is used in research by Walter Enders (the time-series man) and Gary Hoover, both of the University of Alabama, presented at the Chicago AEA and discussed in the AER P&P 102(3), pp.267-272. Hoover, incidentally, has an interesting survey of journal editors about plagiarism/research ethics.]

The US State Department publishes reports on an annual basis which provide rich information (plenty an RA could be employed to do grounded research on this) on global terrorism: "U.S. law requires the Secretary of State to provide Congress, by April 30 of each year, a full and complete report on terrorism with regard to those countries and groups meeting criteria set forth in the legislation. This annual report is entitled Country Reports on Terrorism. Beginning with the report for 2004, it replaced the previously published Patterns of Global Terrorism." The latter go back to 1995, so in total around 15 years of terrorism data are available here.

The African Research Program at Harvard University has data on institutions, violence, and economic variables for Africa. They also provide data which they refer to as 'controls': geographic, climatic, demographic, sociological and international information on African countries from a variety of sources. Most of this data is for 1960-2000 for 47 SSA countries. 

The Stockholm International Peace Research Institute (SIPRI) has a number of extremely detailed databases related to military expenditure, arms transfers, arms embargos as well as multilateral peace operations. The arms transfers database, for instance, includes trade registers with information on each deal including, inter alia, the suppliers and recipients, the type and number of weapon systems ordered and delivered, the years of deliveries, and the financial value of the deal. Some of the data can be downloaded as excel files, others as Word rich text format.


Perhaps also check the sources listed under Political Economy (i) above.

Drugs and Money Laundering

The U.S. State Department International Narcotics Control Strategy Report (INCSR) covers both 'Drug and Chemical Control' and 'Money laundering and financial crime'. Archived reports cover 1996 to 2010. Note that these are reports, not data: if one were willing to pay an RA to go through the material, there's an incredible amount of information on these two topics, given that there are reports for each country and the more recent ones are written in a questionnaire style ("Ability to freeze terrorist assets without delay: YES") and so could be easily coded. Note that financial crime does not seem to cover anything that went on in Iceland, on Wall Street or in the City over the past few years...

Inequality

Branko L. Milanovic, of the World Bank and CUNY, has amalgamated all the different sources for measures of income inequality (Gini Coefficients) in one Stata dataset. Includes data from "Luxembourg Income Study (LIS), Socio-Economic Database for Latin America (SEDLAC), Survey of Living Conditions (SILC) by Eurostat, World Income Distribution (WYD; the full data set is available here), World Bank Europe and Central Asia dataset, World Institute for Development Research (WIDER), World Bank Povcal, and Ginis from individual long-term inequality studies" where the last item is novel to the latest update in 2013. There are also Excel files and documentation at the same link. [via @UrbDemogrphics]

The World Income Inequality Database (WIID2) (~1960-2006) compiled by UNU WIDER has Gini coefficients for 167 countries, although the time-series element of the dataset varies considerably. The website states: "WIID2 consists of a checked and corrected WIID1, a new update of the Deininger & Squire database from the World Bank, new estimates from the Luxembourg Income Study and Transmonee, and other new sources as they have became available." Quintile and decile income share data is also contained in the dataset. Latest update June 2014 now out!

The other standard source for inequality data is the Estimated Household Income Inequality Data Set (EHII) prepared by the University of Texas Inequality Project, where you can also find other data on inequality. Particularly noteworthy here is the UTIP-UNIDO dataset which calculates the industrial pay-inequality measures for 156 countries from 1963-2003. It has a total of 3,554 observations based on the UNIDO Industrial Statistics, thus representing a very large cross-section dimension and containing annual data.

The Luxembourg Income Study (LIS) is a cross-national Data Archive and a Research Institute located in Luxembourg. The LIS archive contains two primary databases.  The LIS Database includes income microdata from a large number of countries at multiple points in time, starting from the early 1980s. The newer LWS Database includes wealth microdata from a smaller selection of countries. Both databases include labour market and demographic data as well.

Facundo Alvaredo, Tony Atkinson, Thomas Piketty and Emmanuel Saez at Oxford, PSE and Berkeley have created the World Top Incomes Database. "The world top incomes database aims to providing convenient on line access to all the existent series. This is an ongoing endeavour, and we will progressively update the base with new observations, as authors extend the series forwards and backwards. Despite the database's name, we will also add information on the distribution of earnings and the distribution of wealth. Around forty-five further countries are presently under study." This is very much work in progress.

A 2005 IMF working paper by Garbis Iradian (Deputy Director, Africa/Middle East at the Institute of International Finance, Washington) provides inequality data for 82 countries over the period 1965–2003 (the data is averaged over periods of three to seven years). The data is constructed from household surveys.

The dataset on income inequality compiled by Klaus Deininger and Lyn Squire for the World Bank is one of the most commonly used datasets for investigating links between inequality and growth at the macro level. Coverage is uneven across 138 countries over the period 1890-1996 (the series are much shorter and more sporadic for the vast majority of countries). For some countries the data include not merely the Gini but also cumulative quintile shares. Available for download in Excel format.
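Where a source reports cumulative quintile (or decile) shares but no Gini, a rough Gini can be backed out from the Lorenz curve. The snippet below is only a minimal sketch in Python, not the method used by any of the datasets listed here: it assumes equal-sized population groups and joins the Lorenz points with straight lines (trapezoid rule), which ignores inequality within each group and so tends to understate the true Gini. The function name and the example shares are made up for illustration.

```python
def gini_from_cumulative_shares(cum_shares):
    """Approximate Gini from cumulative income shares of equal-sized groups.

    cum_shares: cumulative shares ordered from poorest to richest group,
    ending at 1.0 (five values for quintiles, ten for deciles).
    Assumption: straight lines between Lorenz points (trapezoid rule).
    """
    n = len(cum_shares)
    if n == 0 or abs(cum_shares[-1] - 1.0) > 1e-9:
        raise ValueError("cumulative shares must be non-empty and end at 1.0")
    points = [0.0] + list(cum_shares)
    # Area under the piecewise-linear Lorenz curve; each group has width 1/n.
    area = sum((points[k] + points[k + 1]) / (2.0 * n) for k in range(n))
    return 1.0 - 2.0 * area


# Hypothetical quintile data: the poorest 20% hold 5% of income, etc.
example_gini = gini_from_cumulative_shares([0.05, 0.15, 0.30, 0.52, 1.0])
```

With these invented shares the approximation comes out at roughly 0.39; a perfectly equal distribution (0.2, 0.4, 0.6, 0.8, 1.0) gives zero.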

The World Bank Development Economics Department has developed the Global Income Distribution Dynamics (GIDD), the first global CGE-microsimulation model. "The GIDD takes into account the macro nature of growth and of economic policies and adds a microeconomic—that is, household and individual—dimension to it. The GIDD includes distributional data for 121 countries and covers 90 percent of the world population." The data cover the period 1992 to 2005 (survey year), although most observations are in 2000-2002 and 2005. "The GIDD database is not a mere compilation of secondary cross-country inequality indices. Instead, it is an actual presentation of a truly global income distribution based entirely on household survey data. Additionally, the GIDD global income distribution data includes information on the conditional distribution of important household income determinants like education, age, household size, among others." There is extensive documentation for the data on the website, together with a research agenda and recent work by the group. Download is in excel spreadsheet or as a Stata file [This is used in a recent paper on trade in agriculture and global poverty by Bussolo, De Hoyos and Medvedev (all World Bank) in World Economy Vol.34(12), December 2011.] 

The Society for the Study of Economic Inequality (ECINEQ) has links to a number of datasets for the analysis of inequality. These include the Cross-National Equivalent File (CNEF) which contains equivalently defined variables for the British Household Panel Study (BHPS), the Household Income and Labour Dynamics in Australia (HILDA), the Korea Labor and Income Panel Study (KLIPS) (new!), the Panel Study of Income Dynamics (PSID), the Swiss Household Panel (SHP), the Canadian Survey of Labour and Income Dynamics (SLID), and the German Socio-Economic Panel (SOEP).

Follow me on Twitter @MEDevEcon to get updates

Sectoral Data (i): Agriculture

The UN Food and Agriculture Organisation (FAO) provides FAOSTAT, which is divided into ProdSTAT, TradeSTAT, ResourceSTAT, etc. This is the primary source for cross-country data on agricultural production and trade. Data usually starts in 1961, although this varies greatly by variable and dataset.

The Earth Trends database at the World Resources Institute also has (among other links) the complete FAO data for agricultural production, land, inputs, etc. from 1961 onwards. UPDATE August 2012: Sadly, the WRI no longer seem to host this dataset.

A Cross-Country Database for Sector Investment and Capital (1967-1992) is a unique World Bank dataset which provides investment in fixed capital stock for agriculture, as well as a capital stock variable created by the authors - Al Crego, Rita Butzer, Yair Mundlak, and Don Larson. This version provides data from 1967 to 1992 for (up to) 63 developing and developed countries. Manufacturing is also covered separately. New-ish: The World Bank team has also created an expanded version of this, which goes up to the year 2000, albeit only for 30 countries. As far as I know the latter dataset is not in the public domain, although Rita Butzer told me they are keen to get people to use it (so just email one of the aforementioned authors). A link to a STATA version of the 'old' dataset is here.

The World Bank recently completed a big data compilation exercise for Distortions to Agricultural Incentives, with a team of researchers headed by Kym Anderson providing various Estimates of Distortions to Agricultural Incentives (1955-2007). A core database provides data for Nominal Rates of Assistance to producers (NRAs), together with a set of Consumer Tax Equivalents (CTEs), for farm products and a set of Relative Rates of Assistance to farmers in 75 focus countries. Note that the variable 'border price' (bp) does not, however, represent the... how can I say this... 'border price', but a hypothetical producer price in the absence of distortions (domestic producer price divided by (1+NRA) and expressed in USD). The border price (fob) is not contained in the main datafile but can be found in the individual country spreadsheet (rows 37-39 for primary products, or 44-46 for lightly processed products). I am grateful to Kym Anderson and Ernesto Valenzuela for clarification; they also point to an alternative data reporter at Adelaide University where they are both based. See also the OECD entry on Producer and Consumer Support Estimates below.
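To make the 'bp' caveat concrete, here is a minimal Python sketch of the arithmetic described above: the hypothetical undistorted producer price is the domestic producer price divided by (1+NRA). The function names and the numbers are my own illustration, not variables from the Anderson dataset, and the inverse function assumes the textbook definition of the NRA as the proportional gap between the distorted and undistorted price.

```python
def undistorted_producer_price(producer_price_usd, nra):
    """Hypothetical producer price without distortions: price / (1 + NRA).

    nra is a proportion (0.20 means a 20% nominal rate of assistance).
    """
    if nra <= -1.0:
        raise ValueError("NRA must be greater than -1")
    return producer_price_usd / (1.0 + nra)


def implied_nra(producer_price_usd, undistorted_price_usd):
    """Inverse calculation, assuming NRA = (distorted - undistorted) / undistorted."""
    if undistorted_price_usd <= 0.0:
        raise ValueError("undistorted price must be positive")
    return (producer_price_usd - undistorted_price_usd) / undistorted_price_usd


# Invented example: a domestic price of 120 USD/tonne with an NRA of 20%
# implies an undistorted price of about 100 USD/tonne.
bp = undistorted_producer_price(120.0, 0.20)
```

A negative NRA (producers being taxed) pushes the computed price above the observed domestic price, which is a quick sanity check when working with the country spreadsheets.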

I recently had the opportunity to find out more about the India part of the new World KLEMS project, which like its EU namesake involves the University of Groningen with a number of international partners. Asia, India and Latin America KLEMS are not live yet, but China KLEMS, involving the China Industrial Productivity Database 2011, is live. If you're interested in sectoral data it is also worth noting that Margaret McMillan (IFPRI/Tufts) together with Dani Rodrik (Harvard), Jon Temple (Bristol) and Marcel Timmer (Groningen) started an ESRC/DfID-funded project last year with the aim to construct "a harmonised long-term sectoral dataset for several countries in Sub-Saharan Africa... This dataset will consist of time series information on value added in international prices and employment for ten broad economic sectors for the period from 1960 to 2010." Guess it doesn't harm pointing to my recent work on the analysis of sectoral data vs aggregate data forthcoming in the World Bank Economic Review.

The Global Trade Policy Analysis group at the AgEcon Department of Purdue University provides a number of datasets related to trade but also climate change and geography. "The GTAP Data Base is a fully documented, publicly available global data base which contains complete bilateral trade information, transport and protection linkages among 113 regions for all 57 GTAP commodities for a single year (2004 in the case of the GTAP 7 Data Base)." Single academic user licenses for GTAP 7 are $520, but a large number of free datasets (including summaries of GTAP, Social Accounting Matrix [SAM] extraction, the Global [bilateral] FDI Dataset, Project on Bilateral Labor Migration, CO2 emissions and utilities related to the Distortions of Agri Incentives project) can be found here.

HarvestChoice, a collaboration between IFPRI and researchers at the University of Minnesota, "generates knowledge products to help guide strategic investments to improve the well-being of poor people in sub-Saharan Africa through more productive and profitable farming." A vast number of datasets related to agricultural production, markets, demography, climate, etc. is available for download from their website. A lot of emphasis is placed on spatial/GIS data, with further tools for map-making etc. available on the website. There is also a wealth of publications and policy briefs on all topics related to production, R&D and innovation in agriculture (includes analysis of US data).

The OECD has a dedicated database for Producer and Consumer Support Estimates (PSE/CSE) which covers OECD member states as well as a small number of Eastern European 'Emerging' Economies and the BRICS countries for 1986 to 2008. A recent paper by Kym Anderson compares and contrasts the methodology applied in his own work (see the Distortions to Agricultural Incentives entry above) constructing measures of agricultural production and trade distortions with that of the OECD.

The International Center for Tropical Agriculture (CIAT) has produced a Crop Atlas of the World: "The map shows derived estimates of the spatial distribution and productivity of crops for 10-km grids using a novel allocation approach involving the fusion of sub-national crop production statistics. The values in this digital [map] are the number of harvested hectares within each 10 km grid cell. This data includes area harvested in multiple season (therefore this is NOT the physical harvested area, but rather the total area harvested) [...] The sub-national crop production data comes from agricultural censuses and surveys and has scaled values, so as to obtain national production estimates that were compatible with the annual average FAO national crop statistics for 1999-2001. The prototype crop distribution database used in this study is available from the authors upon request but is currently being regenerated using newer and additional data sources (including revisions based on expert validation) and an enhanced allocation algorithm." If you have Google Earth you can look at these data maps.

The Agricultural Market Access Database (AMAD) is a collection of available public data on WTO market access in agriculture. It contains data for over 50 countries. After registration, all files can be downloaded for free (self-extracting zip files) and there is documentation on how to do this.

A World Bank research team including Kee Hiau Looi, Alessandro Nicita and Marcelo Olarreaga has devised Overall Trade Restrictiveness Indices (OTRI) for aggregate trade, as well as manufacturing and agricultural trade separately. For now this measure is only available for 2009, but the team suggests that this will be updated once new data become available. "The Overall Trade Restrictiveness Index (OTRI) summarizes the trade policy stance of a country by calculating the uniform tariff that will keep its overall imports at the current level when the country in fact has different tariffs for different goods. In a nutshell, the OTRI is a more sophisticated way to calculate the weighted average tariff of a given country, with the weights reflect the composition of import volume and import demand elasticities of each imported product." These data as well as demand elasticities are available in Excel/CSV format; a number of papers by the authors in the EJ and REStat are also referenced/linked.
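The 'nutshell' description in the quote can be turned into a few lines of Python. This is a stylised sketch under my own assumptions, not the team's code: each product's tariff is weighted by its import value times the absolute value of its import demand elasticity, which is how I read "weights reflect the composition of import volume and import demand elasticities". The function name and the two-product example are hypothetical.

```python
def elasticity_weighted_tariff(imports, elasticities, tariffs):
    """Weighted average tariff with weights = import value * |elasticity|.

    imports: import values per product
    elasticities: import demand elasticities per product (usually negative)
    tariffs: ad valorem tariff rates per product (0.10 means 10%)
    """
    if not (len(imports) == len(elasticities) == len(tariffs)):
        raise ValueError("all inputs must have the same length")
    weights = [m * abs(e) for m, e in zip(imports, elasticities)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * t for w, t in zip(weights, tariffs)) / total


# Invented two-product example: the high-tariff product also has the more
# elastic demand, so it receives more weight than under import weights alone.
otri_like = elasticity_weighted_tariff(
    imports=[100.0, 50.0],
    elasticities=[-1.0, -3.0],
    tariffs=[0.10, 0.20],
)
```

In this made-up case the elasticity-weighted figure is 16%, against roughly 13.3% for the plain import-weighted average tariff, which illustrates why the two measures can rank countries differently.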

The UN body which covers trade and investment, UNCTAD, has created a snazzy website that combines all of its statistical databases: UNCTADstat has lots of data on trade (merchandise, services), FDI flows and stocks (inward FDI from 1970!), external finance (incl. remittances), labour force/employment, global commodity price indices (from 1960!) as well as some more recent rubrics such as the creative and information economies and maritime transport (from around 2000).

A Cross-Country Database For Sectoral Employment And Productivity In Asia And Latin America (1950-2005) by Marcel P. Timmer and Gaaitzen J. de Vries at the Groningen Growth and Development Centre. This is a balanced panel with data on agriculture, mining, manufacturing, construction, public utilities, retail and wholesale trade, transport and communication, finance and business services, other market services and government services. The sample comprises 10 East Asian countries (nope, not China) and 9 Latin American ones. Variables covered in the data set are value added, output deflators, and persons employed... but not investment (or capital stock). Now part of the so-called 10-sector database.

The Food and Agricultural Policy Research Institute (FAPRI), with research centers at the Center for Agricultural and Rural Development (CARD) at Iowa State University and the Center for National Food and Agricultural Policy (CNFAP) at the University of Missouri-Columbia, provides data on commodities and agricultural policy (at the product level) for a large number of developing and developed countries (time-series dimension differs widely across variables, products and countries).

The Food and Agriculture Organisation (FAO) of the UN publishes AquaStat which represents a "global information system on water and agriculture, developed by the Land and Water Division. The main mandate of the programme is to collect, analyze and disseminate information on water resources, water uses, and agricultural water management with an emphasis on countries in Africa, Asia, Latin America and the Caribbean." A bit more specifically, the main Aquastat database reports 70 variables under the headings 'Land use and population', 'Climate and water resources', 'Water use (by sector and by source)', 'Irrigation and drainage development' and 'Environment and health' for 5-year intervals from 1958-1962 onwards for a large number of countries. Other databases include the excellent 'Geo-referenced database on dams' and data on 'River sediment yields'. The data can be exported in CSV format. 

CIMMYT, which stands for International Maize and Wheat Improvement Center (didn't you know?), currently has three separate datasets on its website. First, some statistical price series for wheat, maize, sorghum, barley, rice, oil, fertilizers, and the freight rate for wheat. This is a good dataset to act as a reference for global market prices, since there are monthly observations for e.g. the CIF Rotterdam price, but coverage varies a lot. Another interesting dataset is for agricultural production in Mexico: agricultural information (1980-2008) related to planted area, harvested area, production, and production value of 657 permanent and seasonal crops, per cycle and regime (entries are in Spanish, but it's not too difficult to guess what 'valor ($)' means...). Finally, they report the FAO data, but you can pick alternative regional aggregations. [Thanks to Doug Gollin for these links]

The Center for International Earth Science Information Network (CIESIN) at Columbia's Earth Institute provides dozens of datasets under the headlines of Agriculture, Biodiversity & Ecosystems, Climate Change, Economic Activity, Environmental Assessment & Modeling, Environmental Health, Environmental Treaties, Indicators, Land Use (LU)/ Land Cover (LC) and LU/LC Change, Natural Hazards, Population, Poverty, Remote Sensing for Human Dimensions Research. The overarching theme for all datasets is environment and climate (change). Since not all data are accessible from the website there's a separate page for downloadable data.

The PBL Netherlands Environmental Assessment Agency provides the History Database of the Global Environment (interestingly, the acronym is HYDE). HYDE presents (gridded) time series of population and land use for the last 12,000 years! It also presents various other indicators such as GDP, value added, livestock, agricultural areas and yields, private consumption, greenhouse gas emissions and industrial production data, but only for the last century.

Rural and Urban Education (i.e. rural = proxy for agricultural population) data (1960-1985) by C Peter Timmer is available in Chapter 29, 'Agriculture and economic development', of the Handbook of Agricultural Economics, Volume 2, Part 1, 2002, Pages 1487-1546. The link above is for the IDEAS RePec entry of this article: this is a copyrighted publication, but if you have access to the Handbook through your library you can easily copy the data. The coverage is exclusively for developing countries (N=65), and the data offers average years of schooling per person over the age of 25 for the rural and non-rural areas. OECD data on the same topic should allow for the inclusion of developed countries in the analysis.

International Land Quality Indexes (1987) by Willis Peterson covers relative land and cropland quality for 126 countries. The data is a single cross-section. The link is for the University of Minnesota, Department of Agriculture and Applied Economics paper (Staff Paper P87-10, 1987), which can be downloaded from the excellent AgEcon website at the same institution.

Still with the topic of arable land, the FAO Statistics Division also provides Gini data for land holdings as well as the Number and Area of Holdings by Tenure of Holdings. These data are decadal (1970/1980/1990).

The Köppen-Geiger Climate zones, documented in the Geography Datasets by John Gallup, Andrew Mellinger, and Jeff Sachs mentioned above are obviously a good resource for investigations of global agricultural production.

IFPRI offers access to a number of geospatial datasets on agricultural production systems and agroecosystems.

Louis Putterman at Brown University has compiled an Agricultural Transition Year Data Set which provides estimates for "the year when the first significant region within each of 165 present-day countries underwent a transition from reliance mainly on gathered wild and hunted food sources to reliance mainly on cultivated crops (and livestock)." This data is very much in line with the long-run growth theory work coming out of Brown.

AgroMaps at the FAO also has extensive geospatial data with relevance to agriculture.

The Agricultural Science and Technology Indicators (ASTI) are provided by IFPRI. These are agricultural R&D indicators for developing countries only, with varying time-series coverage (earliest time around 1970, most recent up to around 2002). The data is split by institutional category (Higher Education, Private, Public Sector, NFP, government agencies) and provides numbers on researchers and R&D expenditure on agriculture.

For data covering agriculture R&D in developed countries check out the Science & Technology section of the UNESCO database.

Sectoral Data (ii): Manufacturing

Trade, Production and Protection (TPT) was compiled by Alessandro Nicita and Marcelo Olarreaga at the World Bank. This provides data on 28 manufacturing sub-sectors for 1976-2004, compiled from UNIDO IndStat and COMTRADE. Since neither of these two is freely available, the Nicita-Olarreaga dataset is quite a find. Although the files are quite hefty in size, the bilateral trade data should be interesting. Note also the (static) Input-Output matrices. Apart from the WB paper describing the dataset (see link on the TPT website), it helps to read this paper by Tetsuo Yamada of UNIDO.

$$ The Centre D'Etudes Prospectives et D'Information Internationales (CEPII) provides TradeProd, the Trade, Production and Bilateral Protection Database used in Mayer & Zignago (2005). This data is linked to the aforementioned Nicita & Olarreaga database and covers bilateral trade data for 1980-2004, manufacturing production data to 2004, and protection data to 2001.

$$ Said UNIDO IndStat4 database is now available in its 2008 version, providing number of establishments, employment, wages and salaries, output, value added, gross fixed capital formation (so that capital stock can be constructed via the perpetual inventory method) and number of female employees at the 4-digit-level of ISIC (Rev. 3!). This means data from 1990 onwards for 151 manufacturing categories. There is a second dataset contained at the 4-digit-level of ISIC (Rev. 2!), which covers 81 manufacturing sectors from 1980 onwards.
The UNIDO IndStat3 database from 2006 covers 29 manufacturing sectors in 181 countries from 1963-2004 at the 3-digit-level of ISIC (Rev. 2). The data covers the same variables as the previous dataset. This is the dataset of choice if you're running production functions since it has reasonable investment coverage from around 1970.
UNIDO IndStat2 2009 is out for 23 subsectors across 161 countries (1963-2007). Naturally, they do not collect data for the sectoral deflator, so it is necessary to go to the UN Common Database or individual country statistical offices to collect this information.
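Since the IndStat files report gross fixed capital formation rather than capital stock, the perpetual inventory method mentioned above is the usual workaround. Below is a minimal Python sketch under assumptions of my own choosing: a constant depreciation rate of 5%, an initial stock set to first-year investment divided by (growth + depreciation) with an assumed 3% steady-state growth rate, and investment already deflated to constant prices (which is exactly where the missing sectoral deflators bite). None of these parameter values come from UNIDO.

```python
def perpetual_inventory(investment, depreciation=0.05, initial_growth=0.03):
    """Build a capital stock series via K_t = (1 - d) * K_{t-1} + I_t.

    investment: real (constant-price) gross fixed capital formation by year
    depreciation: assumed constant annual depreciation rate d
    initial_growth: assumed steady-state growth rate g for the starting value
    The first stock is set to I_0 / (g + d), a common but arbitrary choice.
    """
    if not investment:
        raise ValueError("investment series must not be empty")
    if initial_growth + depreciation <= 0.0:
        raise ValueError("growth plus depreciation must be positive")
    stocks = [investment[0] / (initial_growth + depreciation)]
    for inv in investment[1:]:
        stocks.append((1.0 - depreciation) * stocks[-1] + inv)
    return stocks


# Invented three-year investment series in constant prices.
capital = perpetual_inventory([8.0, 10.0, 10.0])
```

The starting value matters a lot in short series, so it is worth checking how sensitive any production function estimates are to the assumed growth and depreciation rates.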

A Cross-Country Database For Sectoral Employment And Productivity In Asia And Latin America (1950-2005) by Marcel P. Timmer and Gaaitzen J. de Vries at the Groningen Growth and Development Centre. Described in the agriculture section above. This is now part of the 10-sector database.

The Overall Trade Restrictiveness Indices (OTRI) devised by the World Bank research team including Kee Hiau Looi, Alessandro Nicita and Marcelo Olarreaga are described in the agriculture section above. The index is available for aggregate trade as well as for manufacturing and agricultural trade separately.

The World KLEMS project (with China KLEMS already live) and the ESRC/DfID-funded sectoral dataset for Sub-Saharan Africa by Margaret McMillan, Dani Rodrik, Jon Temple and Marcel Timmer are described in the agriculture section above.

The Rural and Urban Education data (1960-1985) by C Peter Timmer described in the agriculture section above could be applied to manufacturing as well (non-rural education to proxy for education in manufacturing).

Occupational Wages around the World for 161 occupations in over 150 countries from 1983 to 2003, compiled by Richard Freeman and Remco Oostendorp. Available at the NBER website (in Stata or ASCII format).

Sectoral Data (iii): Services

The World Bank’s Services Trade Restrictions Database provides comparable information on services trade policy measures for 103 countries, five sectors (telecommunications, finance, transportation, retail and professional services) and key modes of delivery. "Compared to the vast empirical literature on policies affecting trade in goods, the empirical analysis of services trade policy is still in its infancy. One major constraint has been inadequate data on policies affecting services trade. Our limited knowledge of the pattern of services policy contrasts with the importance of services. Today, some 80 percent of GDP in the United States and the European Union originates from services, and the proportion is well over 50 percent in most countries, industrial and developing alike."

The World KLEMS project (with China KLEMS already live) and the ESRC/DfID-funded sectoral dataset for Sub-Saharan Africa by Margaret McMillan, Dani Rodrik, Jon Temple and Marcel Timmer, both described in the agriculture section above, cover service sectors as well.

Trade Flows, Trade Protection/Policy and Globalisation

The World Integrated Trade Solution (WITS) by the World Bank is not a dataset but a piece of software which enables the use of COMTRADE, TRAINS (UNCTAD), IDB & CTS (WTO). The link is for the software, which only works if the user subscribes to the above dataset(s).

Mitch Abdon of the Stata Daily blog recently suggested a way of combining the UN ComtradeTools and Stata. Comtrade is the International Merchandise Trade Statistics (IMTS) of the UN, which records item-level trade for all countries in the world and contains around 1.8 bn observations from 1962 onwards. Access to this data is free, but for technical reasons there is a maximum of 50,000 observations per query (even more reason to use the Stata Daily application). Having installed the software which allows one to download Comtrade data (registration/subscription required for access), there are a number of simple steps to pull this data directly into Stata and save it. In fact, the entire process is run from within Stata once everything is installed. Since I had some minor trouble setting up and getting this tool to work I've written a simple Stata 10 do-file with additional information.

The European Commission's Eurostat COMEXT database covers trade data from 1988 to 2009 (monthly or annual) for trade with the EU or its member countries. There are some restrictions on the maximum number of cells that can be downloaded, though. You may be better off going to COMTRADE and using the approach by Mitch Abdon of the Stata Daily blog, described in the previous entry, to combine the UN ComtradeTools and Stata. [The COMEXT data is used in a recent ECB paper by Gabor Pula and Daniel Santabárbara.]

UNCTAD Statistical Databases compiled by the UN Conference on Trade and Development cover items such as the World Investment Report (WIR) which has data on Foreign Direct Investment and Transnational Corporations. It also has Commodity Price Statistics, data on world trade in 'creative products', ICT statistics and TRAINS (mainly tariff data). Access is free as far as I know, but you need to register. It's a bit of a struggle to get through the menus, so basically look out for 'Interactive Database' on the side-menu, since this will offer access to the Beyond 20/20 database (at least in the case of FDI). The UNCTAD Handbook of Statistics now has been updated as well (is this subscription-based?).

Update September 2010: UNCTAD has now created a snazzy website that combines all of its statistical databases: UNCTADstat has lots of data on trade (merchandise, services), FDI flows and stocks (inward FDI from 1970!), external finance (incl. remittances), labour force/employment, global commodity price indices (from 1960!) as well as some more recent rubrics such as the creative and information economies and maritime transport (from around 2000).

The Global Trade Policy Analysis group at the AgEcon Department of Purdue University provides a number of datasets related to trade but also climate change and geography. "The GTAP Data Base is a fully documented, publicly available global data base which contains complete bilateral trade information, transport and protection linkages among 113 regions for all 57 GTAP commodities for a single year (2004 in the case of the GTAP 7 Data Base)." Single academic user licenses for GTAP 7 are $520, but a large number of free datasets (including summaries of GTAP, Social Accounting Matrix [SAM] extraction, the Global [bilateral] FDI Dataset, Project on Bilateral Labor Migration, CO2 emissions) can be found here.

The World Bank's Temporary Trade Barriers Database (TTBD) website hosts newly collected, freely available, and detailed data on more than thirty different national governments’ use of policies such as antidumping (AD, 1980s-2010), global safeguards (SG, 1995-2010), China-specific transitional safeguard (CSG, 2002-2010) measures, and countervailing duties (CVD, 1980s-2010). The information provided in this database will cover over 95% of the global use of these particular import-restricting trade remedy instruments. Information is provided in excel files on a country-by-country basis, given the amount of detail provided for each country. The website also features research reports and meta-information. Chad P. Bown seems to be the person in charge.

The World Bank provides the Trade Costs Dataset which contains estimates of bilateral trade costs in agriculture and manufactured goods for the 1995-2010 period. It is built on trade and production data collected in 178 countries. Symmetric bilateral trade costs are computed using the Inverse Gravity Framework (Novy 2009), which estimates trade costs for each country pair using bilateral trade and gross national output.
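To get a feel for what the inverse gravity calculation does, here is a minimal Python sketch of a Novy-style tariff-equivalent trade cost for one country pair. The function names, the default elasticity of substitution (sigma = 8) and the proxy for intranational trade (gross output minus total exports) are my assumptions for illustration, so check the dataset documentation for the exact choices made by the World Bank team.

```python
def intranational_trade(gross_output, total_exports):
    """Proxy for domestic (intranational) trade: output that is not exported.

    This proxy is an assumption of the sketch, not taken from the dataset docs.
    """
    return gross_output - total_exports


def novy_trade_cost(x_ij, x_ji, x_ii, x_jj, sigma=8.0):
    """Symmetric bilateral trade cost as a tariff equivalent.

    x_ij, x_ji: bilateral trade flows in each direction
    x_ii, x_jj: intranational trade of each country
    sigma:      elasticity of substitution (assumed value, must exceed 1)
    """
    if min(x_ij, x_ji, x_ii, x_jj) <= 0:
        raise ValueError("all flows must be strictly positive")
    if sigma <= 1:
        raise ValueError("sigma must exceed 1")
    # Trade costs are high when countries trade a lot with themselves
    # relative to how much they trade with each other.
    ratio = (x_ii * x_jj) / (x_ij * x_ji)
    return ratio ** (1.0 / (2.0 * (sigma - 1.0))) - 1.0


if __name__ == "__main__":
    # Made-up numbers, purely illustrative.
    home_i = intranational_trade(gross_output=500.0, total_exports=100.0)
    home_j = intranational_trade(gross_output=300.0, total_exports=60.0)
    print(novy_trade_cost(x_ij=20.0, x_ji=15.0, x_ii=home_i, x_jj=home_j))
```

A value of 0.5 would read as a 50% ad valorem tariff equivalent; the measure is symmetric by construction, so it cannot tell you which direction of trade is the costly one.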

Kristian Skrede Gleditsch at Essex University provides data estimates of trade flows between independent states (1948-2000) and GDP per capita of independent states (1950-2004) for around 150 countries. These data files are too large to be opened in excel, but Kristian provides some tips on how to proceed.

The World Shipping Register provides free access to their World Sea Ports database. For each country, its ports' longitude, latitude and time zone are provided; for some ports the maximum draft is also given. Given the geospatial information, this data could be used to calculate the distance to the closest port.
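As a rough illustration of the distance-to-closest-port idea, the following Python sketch computes great-circle (haversine) distances from a location to a list of ports and picks the nearest. The port names and coordinates are hypothetical placeholders, and the spherical-Earth radius is an approximation; for serious work you would want proper geodesic distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding error before taking asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_port(lat, lon, ports):
    """Return (name, distance_km) of the closest port.

    ports: iterable of (name, latitude, longitude) tuples.
    """
    distances = ((name, haversine_km(lat, lon, plat, plon)) for name, plat, plon in ports)
    return min(distances, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical ports, not taken from the World Sea Ports database.
    ports = [("Port A", 0.0, 10.0), ("Port B", 5.0, 1.0), ("Port C", -20.0, 30.0)]
    print(nearest_port(4.0, 2.0, ports))
```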

$$ The Centre D'Etudes Prospectives et D'Information Internationales (CEPII) BACI (Italian for kiss, if I'm not mistaken; wonder whether the French knew that) dataset aims to provide the most disaggregated international trade database (more than 5,000 products) for the largest number of countries (over 200) and years (from 1995 to 2005, with updates to follow). It took me a while to realise that there is a disclaimer: "Files by year of BACI data for the period 1995-2005 are available for researchers already subscribing to the United Nations COMTRADE database. Users of BACI have to testify that their organisation is fully licensed COMTRADE to download BACI." So no baci after all.

The TPT database by Nicita & Olarreaga mentioned above has data on manufacturing trade, including bi-lateral trade flows (1976-2004).

A World Bank research team including Kee Hiau Looi, Alessandro Nicita and Marcelo Olarreaga has devised Overall Trade Restrictiveness Indices (OTRI) for aggregate trade, as well as manufacturing and agricultural trade separately. For now this measure is only available for 2009, but the team suggests that this will be updated once new data become available. "The Overall Trade Restrictiveness Index (OTRI) summarizes the trade policy stance of a country by calculating the uniform tariff that will keep its overall imports at the current level when the country in fact has different tariffs for different goods. In a nutshell, the OTRI is a more sophisticated way to calculate the weighted average tariff of a given country, with the weights reflect the composition of import volume and import demand elasticities of each imported product." These data as well as demand elasticities are available in excel/CSV format, a number of papers by the authors in the EJ and REStat are also referenced/linked.
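To make the "more sophisticated weighted average" concrete, here is a small Python sketch of an elasticity-weighted average tariff in the spirit of the description above. It is a simplification under my own assumptions (partial equilibrium, one tariff per product, weights equal to import value times import demand elasticity) and not the authors' actual code; the function names and numbers are invented for illustration.

```python
def simple_weighted_tariff(products):
    """Import-weighted average tariff, ignoring elasticities.

    products: list of (import_value, elasticity, tariff) tuples.
    """
    total_imports = sum(m for m, _, _ in products)
    return sum(m * t for m, _, t in products) / total_imports


def otri_sketch(products):
    """Uniform tariff equivalent with weights import_value * elasticity.

    Elasticities are usually negative; their sign cancels in the ratio.
    """
    denominator = sum(m * e for m, e, _ in products)
    if denominator == 0:
        raise ValueError("weights sum to zero")
    return sum(m * e * t for m, e, t in products) / denominator


if __name__ == "__main__":
    # Two made-up products with equal import values: the tariff sits
    # on the product whose import demand is more elastic.
    products = [(100.0, -1.0, 0.0), (100.0, -3.0, 0.2)]
    print(simple_weighted_tariff(products))
    print(otri_sketch(products))
```

In this toy case the elasticity-weighted figure exceeds the simple import-weighted average, because the tariff falls on the good whose imports respond most strongly to price.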

The FAO provides trade data for agricultural goods in its TradeSTAT database. This can also be accessed via the World Resources Institute link - both of these can be found in the Agriculture section of this webpage.

The World Bank’s Services Trade Restrictions Database provides comparable information on services trade policy measures for 103 countries, five sectors (telecommunications, finance, transportation, retail and professional services) and key modes of delivery. "Compared to the vast empirical literature on policies affecting trade in goods, the empirical analysis of services trade policy is still in its infancy. One major constraint has been inadequate data on policies affecting services trade. Our limited knowledge of the pattern of services policy contrasts with the importance of services. Today, some 80 percent of GDP in the United States and the European Union originates from services, and the proportion is well over 50 percent in most countries, industrial and developing alike." 

A team at the World Bank comprising Tolga Cebeci, Ana M. Fernandes, Caroline Freund, and Martha Denisse Pierola has come up with the Exporter Dynamics Database. This presently covers around 45 developed and developing countries, covering mainly 2003-2009 but also the 1990s for some countries. "It allows for cross-country comparisons of exporters based on factors such as size, survival, growth, and concentration. More countries will be added as the database expands. Until now, most databases focus not on exporting firms, but on the aggregate flow of goods across borders based on countries or products." Melitz will be happy!

Matthew Ciolek at Australian National University edits the site for the Old World Trade Routes (OWTRAD) Project: "This site supports online research in the field of dromography and provides a public-access electronic archive of geo/chrono-referenced data on land, river and maritime trade routes of Eurasia and Africa during the period 10,000 BCE - circa 1820 CE." The files are published in CSV, MapInfo and Google Earth (KML) formats, downloadable by region. There's also a link to the Trade Routes Resources blog [via Masa Kudamatsu's DevEconData blog]

The IADB website hosts the data used in the work on trade intensity and business cycles by César Calderón, Alberto Chong and Ernesto Stein (2006, JIE). From the abstract: "Using annual information for 147 countries for the period 1960-99 we find that the impact of trade intensity on business cycle correlation among developing countries is positive and significant, but substantially smaller than that among industrial countries. Our findings suggest that differences in the responsiveness of cycle synchronization to trade integration between industrial and developing countries are explained by differences in the patterns of specialization and bilateral trade." 

The World Bank recently completed a big data compilation exercise for Distortions to Agricultural Incentives, with a team of researchers headed by Kym Anderson providing various Estimates of Distortions to Agricultural Incentives (1955-2007). A core database provides data for Nominal Rates of Assistance to producers (NRAs), together with a set of Consumer Tax Equivalents (CTEs), for farm products and a set of Relative Rates of Assistance to farmers in 75 focus countries. Note that the variable 'border price' (bp) does however not represent the... how can I say this... 'border price', but a hypothetical producer price in the absence of distortions (domestic producer price divided by (1+NRA) and expressed in USD). The border price (fob) is not contained in the main datafile but can be found in the individual country spreadsheet (rows 37-39 for primary products, or 44-46 for lightly processed products). I am grateful to Kym Anderson and Ernesto Valenzuela for clarification; they also point to an alternative data reporter at Adelaide University where they are both based.
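Since the 'bp' definition is easy to get wrong, here is the arithmetic from the paragraph above as a tiny Python sketch. The function names are mine and the numbers are invented; only the relationship bp = domestic producer price / (1 + NRA) comes from the description above.

```python
def undistorted_price(domestic_producer_price, nra):
    """Hypothetical producer price without distortions (the 'bp' variable).

    nra is the Nominal Rate of Assistance as a fraction, e.g. 0.2 for 20%.
    """
    if nra <= -1.0:
        raise ValueError("NRA must be greater than -1")
    return domestic_producer_price / (1.0 + nra)


def implied_nra(domestic_producer_price, undistorted):
    """Back out the NRA from the two prices (inverse of the function above)."""
    return domestic_producer_price / undistorted - 1.0


if __name__ == "__main__":
    # A producer price of 120 USD with an NRA of 20% implies bp of about 100 USD.
    print(undistorted_price(120.0, 0.2))
    print(implied_nra(120.0, 100.0))
```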

The OECD has a dedicated database for PSE (Producer and Consumer Support Estimates) which covers OECD member states as well as a small number of Eastern European 'Emerging' Economies and the BRICS countries for 1986 to 2008. A recent paper by Kym Anderson compares and contrasts the methodology applied in his own work (see previous entry) constructing measures of agricultural production and trade distortions with that of the OECD.

The Agricultural Market Access Database (AMAD) is a collection of available public data on WTO market access in agriculture. It contains data for over 50 countries. After registration, all files can be downloaded for free (self-extracting zip files) and there is documentation on how to do this.

Proximity in Product Space and Diversification Strategies, compiled by Valentino Piana, is not a standard trade dataset but contains data emerging from research by C. A. Hidalgo, B. Klinger, A.-L. Barabasi, and R. Hausmann, published in Science under the title The Product Space Conditions the Development of Nations (subscription required?). The basic idea in the paper is that the product space is like a forest, and each product is a tree, with the fanciest (highest value-added) products concentrated in the centre. Countries/producers in this product space are then like monkeys that jump from tree to tree... I'd prefer if they'd taken squirrels. Using 4-digit level trade data for a large number of countries (not sure what the time-horizon is) the authors establish the probability of a country producing and exporting good x, given that it produces and exports good y. Data at the above link includes a 775-by-775 matrix of revealed proximities between products among other things.
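The conditional-probability idea can be sketched in a few lines of Python. I assume here that we already know which products each country exports (in the paper this is based on revealed comparative advantage) and that proximity between two products is the smaller of the two conditional probabilities, as I understand the paper's definition. The country and product names are toy placeholders.

```python
def proximity_matrix(exports):
    """Pairwise product proximities from export baskets.

    exports: dict mapping country -> set of products it exports.
    Returns a dict mapping (x, y) -> min(P(x | y), P(y | x)).
    """
    products = sorted(set().union(*exports.values()))
    # Number of countries exporting each product.
    count = {p: sum(1 for basket in exports.values() if p in basket) for p in products}
    prox = {}
    for x in products:
        for y in products:
            both = sum(1 for basket in exports.values() if x in basket and y in basket)
            # Dividing by the larger count yields the smaller conditional probability.
            prox[(x, y)] = both / max(count[x], count[y])
    return prox


if __name__ == "__main__":
    toy = {
        "Country A": {"shirts", "shoes"},
        "Country B": {"shirts"},
        "Country C": {"shoes", "cars"},
    }
    prox = proximity_matrix(toy)
    print(prox[("shirts", "shoes")], prox[("shirts", "cars")])
```

Taking the minimum makes the matrix symmetric and avoids a product exported by a single country looking artificially close to everything that country happens to export.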

World Bank data on Trade and Import Barriers has been collated by Francis Ng. Most significant time-series dimension among these is for Trends in average applied tariff rates in developing and industrial countries, 1981-2007.

The NBER-UN Trade Data (1962-2000) is available at the Center for International Data, UC Davis website. This data, constructed by the Bobs Feenstra and Lipsey with co-authors (see NBER paper #11040 for details), is organised by 4-digit SITC (Rev. 2). Primacy is given to the mirror data (i.e. trade flows as reported by the importing country). Wide country coverage. Users must agree not to resell or distribute the data for 1984-2000.

The homepage of Andrew Rose, an economist at Haas Business School, UCB, has tonnes of data on trade, including bi-lateral trade data from 1950-1999, which can be found in the file associated with the paper “Do We Really Know that the WTO increases Trade?”. This dataset also contains a small number of geographical variables and lots of information on regional trade agreements, GATT/WTO, etc. Note that the country codes are the IMF International Financial Statistics codes, a list of which can be found here.
The dataset associated with "Does the WTO Make Trade More Stable?" seems to have slightly better coverage, but with somewhat fewer variables.
Various measures for 'Remoteness' of a country (linked to GDP and distance measures) are contained in the panel data link associated with the "Currency Unions and International Integration" paper - coverage is 1960-1996 for up to 210 countries. He has a more up-to-date version of the same variable to 2000 somewhere on his website but I haven't been able to find it yet.

The World Bank compiles the World Trade Indicators, covering 299 indicators for 210 countries from 1995 to 2007.

Axel Dreher, a professor at the University of Goettingen in Germany, has compiled an Index of Globalisation, providing data on the economic, social and political dimensions of globalisation for 122 countries (1970-2005). As Masa Kudamatsu points out in his blog, Dani Rodrik seems to like the look of this approach. This index is now the KOF Index of Globalization, provided by the KOF Swiss Economic Institute at the Eidgenoessische Technische Hochschule (ETH) in Zurich. It offers data on three main dimensions of globalization (economic, social, political) in addition to variables measuring actual economic flows, economic restrictions, data on information flows, data on personal contact and data on cultural proximity. Data are available on an annual basis for 208 countries over the period 1970-2007. This index is still based on work by Axel Dreher (now at Heidelberg, affiliated to KOF Swiss Economic Institute at ETH) and co-authors.

$$ The IMF has the Direction of Trade Statistics which provides total bilateral and multilateral exports and imports (from COMTRADE), aggregated at national or regional group level. The database contains over 100,000 quarterly and annual time series data for over 200 countries and territories. The period for which data are available varies from country to country, but most countries’ data extend from the 1980’s to the present. A 'historical' dataset is available for a smaller subset of countries (1948-1980). ESDS covers this database and also provides additional information.

At Jon Haveman's website we can access James Rauch's categorization of SITC Rev. 2 industries according to three possible types: differentiated products, reference priced, or homogeneous goods.

$$ This is a subscription-based data source, but we were recently searching for some terms of trade data and found that the World Bank World Development Indicators actually go back to 1980, whereas anything from UNCTAD only goes back to the mid-1990s. Search for "Net barter terms of trade index".

Trade Unit Value indices (import, export) are provided in the TradePrices database by the Centre D'Etudes Prospectives et D'Information Internationales (CEPII) at the aggregate manufacturing sector level as well as 3-digit-level (ISIC).

The US Government Office of the Trade Representative National Trade Estimate Report on Foreign Trade Barriers (link for 2008 report) surveys significant foreign barriers to U.S. exports. The report provides, where feasible, quantitative estimates of the impact of these foreign practices on the value of U.S. exports. These reports (compiled since the 1980s but only available online from 2001) are only published in pdf format. The data is perhaps most useful when reading up on the trade regime in individual countries.

The Center for International Business (CIB) at the Tuck School of Business at Dartmouth has established the CIB Trade Agreements Database and Archive. The database contains the text-searchable versions of all bilateral and regional free trade agreements and customs union agreements that have been notified to the WTO, and are in force, plus many that have not been notified to the WTO. The Archive contains the full texts of these agreements.

Investment Flows, including Foreign Aid and Foreign Direct Investment

With regard to macro FDI data the UNCTAD World Investment Directory on-line provides a wealth of information on FDI inflows and outflows, stocks, etc. Note that you CAN download data for more than just one country by going to 'Interactive Database' (links on the left of the page), which will take you to 'FDIStat' which is of the standard 20/20 format.

Update September 2010: UNCTAD has now created a snazzy website that combines all of its statistical databases: UNCTADstat has lots of data on trade (merchandise, services), FDI flows and stocks (inward FDI from 1970!), external finance (incl. remittances), labour force/employment, global commodity price indices (from 1960!) as well as some more recent rubrics such as the creative and information economies and maritime transport (from around 2000).

The Global Trade Policy Analysis group at the AgEcon Department of Purdue University provides a number of datasets related to trade and investment but also climate change and geography. "The GTAP Data Base is a fully documented, publicly available global data base which contains complete bilateral trade information, transport and protection linkages among 113 regions for all 57 GTAP commodities for a single year (2004 in the case of the GTAP 7 Data Base)." Single academic user licenses for GTAP 7 are $520, but a large number of free datasets (including summaries of GTAP, Social Accounting Matrix [SAM] extraction, the Global [bilateral] FDI Dataset, Project on Bilateral Labor Migration, CO2 emissions) can be found here.

A fantastic resource for fans of aid empirics is provided by AidData (see separate entry below): replication data for a vast number of empirical papers related to aid and development (all those Tarp et al, Rajan and Subramanian, Burnside and Dollar, Roodman papers) are linked or provided for download. [Thanks to Paddy Carter at Bristol for the link]

AidData, a partnership between Brigham Young University, the College of William and Mary, and a non-profit development organization, Development Gateway, has released a new database that captures China's development finance activities in Africa. "This database will provide a foundation for researchers, policymakers, journalists, and civil society organizations to analyze the distribution and impact of Chinese development finance to the region. The database contains nearly 1,700 official finance projects in 50 African countries, totaling over $70 billion in reported financial commitments [...] The dataset uses a media-based data collection methodology developed by AidData, which helps synthesize and standardize vast amount of project-specific information contained in thousands of English and Chinese language media reports." Data can be downloaded in full (excel) or visually analyzed. Right now the database runs from 2000 to 2011.


A new database for all metrics related to foreign aid was launched with a conference in Oxford in March 2010: AidData has compiled figures "from a range of official sources, including the OECD Creditor Reporting System (CRS) database, donor annual reports, project documents from both bilateral and multilateral aid agencies, and data gathered directly from donor agencies". Crucially, the database covers both commitments and disbursements (which, as in the FDI case, differ considerably) and refers to grants, mixed loans and grants, loans at discretionary rates from multilateral agencies, loans/loan guarantees at market rates, technical assistance, and sector program aid transfers in cash or in kind. There's a blog and lots of dedicated tools and information about aid data. All of this is the follow-up to the PLAID Project (a partnership of the College of William and Mary and Brigham Young University) which has now merged with Development Gateway's Accessible Information on Development Activities (AiDA) [thanks to Nic van de Sijpe for the pointer].

The Kiel Institute for the World Economy provides very detailed foreign-investment data for three OECD economies, namely Germany, Japan and the United States. The data are annual for 1980 to 2010 and give you the share of each of these three countries' sectoral investment in geographic regions (and a small group of named countries within each region outside the OECD) as a percentage of total sectoral FDI.

The World Bank has recently published its annual World Development Report, which this year focuses on Conflict, Security and Development. A dedicated website makes the data underlying the analysis in the report easily accessible. The excel spreadsheet covers a total of 211 countries, with maximum coverage over the years 1960-2009. The data is not limited to conflict and political economy issues but also covers geography, colonial history and foreign aid among other topics. All of the data is publicly available (and many datasets are featured here on MEDevEcon), but the unique advantage here is bringing a vast number of conflict-related data from dozens of sources (PRIO, UNHCR, Polity IV, etc.) together in a single spreadsheet (and doing a great job documenting the data and sources).

$$ The IMF recently started the Coordinated Direct Investment Survey (CDIS), which will provide a measure of the stock of FDI by source country. 130 receiving or investing economies have signed up for this project, which will provide the first data in around mid-2010. Unfortunately, they will only publish the stock data, not the flows on which these are based.

The OECD has detailed data on aid flows and ODA, as well as international direct investment, contained in its brilliant new OECD iLibrary. Data coverage varies, e.g. for FDI flows by industry we have data for 1980-2007, whereas for ODA by recipient country the data is for 1960-2007. The OECD also maintains the QWIDS Query Wizard for International Development Statistics, which helps when you are selecting and downloading aid-related statistics.

The World Bank and the Organisation for Economic Co-operation and Development (OECD) "have partnered to make global data on aid funding more easily accessible. Aidflows offers new transparency about the flow of development funds from countries providing aid resources (donors) to countries receiving these funds (beneficiaries).  This initiative is part of ongoing efforts to enhance the open access to data and information on development aid." For the moment it seems (conditional on my not being too inept to find the option) that display of data is limited to the last decade - it might be useful to change this given that lots more data is available. There are a lot of graphs and tables, bringing together WB and OECD indicators/data - a useful feature is the link to the WB and OECD data sources, i.e. you get taken to OECD DAC dataset if you want more details on ODA. [This was mentioned in a blog entry by Neil Fathom of the World Bank]

The UN Office for Coordination of Humanitarian Affairs (OCHA) maintains the Financial Tracking Service (FTS). "FTS is a global, real-time database which records all reported international humanitarian aid (including that for NGOs and the Red Cross/Red Crescent Movement, bilateral aid, in-kind aid, and private donations). FTS features a special focus on consolidated and flash appeals, because they cover the major humanitarian crises and because their funding requirements are well defined - which allows FTS to indicate to what extent populations in crisis receive humanitarian aid in proportion to needs... All FTS data are provided by donors or recipient organisations." [this data was featured on the UK Guardian newspaper's Global Development Data website]

The Washington-based Center for Global Development (Roodman, Radelet, Subramanian, Birdsall, Clemens and many others) have a link to datasets on their publications website. Highlights include data on net-aid transfers (1960-2007): see the next two entries.

From the same source is David Roodman's Commitment to Development Index (CDI), which "rates 22 rich countries on how much they help poor countries build prosperity, good government, and security. Each rich country gets scores in seven policy areas, which are averaged for an overall score." The CDI was first compiled in 2003.

CGD's David Roodman updated his Net Aid Transfer database at the beginning of this year. "NAT is built from the same underlying DAC data as ODA. The NAT data set includes totals by donor (for 1960–2009), by recipient (1965–2009), and by donor and recipient (1965–2009), all in current and constant dollars. Figures by donor are also available in national currencies. The data tables by donor and recipient are too large to fit in a Microsoft Excel 2003 file, and so are provided as comma-delimited text files in a zip archive." A paper Roodman wrote in 2005 is also relevant.
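If a comma-delimited file in a zip archive is too big for a spreadsheet, one option is to stream it row by row without ever unpacking or loading it in full. The Python sketch below sums a numeric column by group using only the standard library. The archive name, member name and column names in the usage example are hypothetical, so substitute whatever the actual NAT files contain.

```python
import csv
import io
import zipfile
from collections import defaultdict


def sum_by_key(zip_path, member, key_col, value_col):
    """Stream a CSV inside a zip archive and total value_col for each key_col.

    Rows with an empty value are skipped; only one row is held in memory at a time.
    """
    totals = defaultdict(float)
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                value = row[value_col].strip()
                if value:
                    totals[row[key_col]] += float(value)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical file and column names, for illustration only.
    totals = sum_by_key("nat_by_donor_recipient.zip", "nat.csv", "donor", "nat_constant_usd")
    for donor, amount in sorted(totals.items()):
        print(donor, amount)
```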

The World Bank publishes the Migration and Remittances Factbook (2011) as part of the OpenData initiative. This covers inflows and outflows of remittances from 1970 to 2009 (+2010 estimated) for basically all countries in the world (naturally: lots of missing observations, but from the mid-1970s onwards the data coverage is pretty impressive).

Chris Adam at Oxford University provides the files for the social accounting matrix and GAMS/Matlab program code for his work on aid in African economies for use in CGE and DSGE modelling.

Historical Data (pre-1900)

Please note that the data below may also contain micro-datasets - I felt it more advantageous to bring all the historical datasets together in one spot, rather than divide them between macro and micro.

Diego Comin and Bart Hobijn constructed the Historical Cross-Country Technology Adoption (HCCTA) dataset, available at NBER. This data allows for the analysis of the adoption patterns of some of the major technologies introduced in the past 250 years across the World's leading industrialized economies. This comes as an excel file with macros included, but if you prefer to play around with full data you can download the ASCII version.

The Economic History Association has links to a number of databases for economic historians. In order to use these you just need to register with EH (free). Just looking at the data titles, this is a great resource: Italy - Florentine Domains and the City of Verona: 1427, French Slave and Long Distance Trading Profits During the 18th Century, Ottoman Economic/Social History: 1600-1900, to name just a few. Naturally, these data are primarily for (now) developed economies, but there are some links to colonial data, e.g. Developing Country Export Statistics: 1840, 1860, 1880 and 1900.

The International Institute of Social History based in Amsterdam provides access to a wide variety of historical datasets: wages, prices, and exchange rates for many countries around the world.

The European State Finance Database is an open repository for economic historians co-managed by Dr D’Maris Coffman (Centre for Financial History, Newnham College, University of Cambridge) and Dr Anne Murphy (University of Hertfordshire). It "represents the outcome of an international collaborative research project for the collection, archiving and dissemination of data on European fiscal history across the medieval, early modern and modern periods." At the moment there are links to around 60 datasets, covering Spanish crown finance, Restoration Excise Receipts (1660-1708) and many other interesting-sounding datasets. Definitely a treasure trove for empirically-minded economic historians. [Thanks to my buddy Mark Koyama for pointing out this database]

The Center for International Price Research at Vanderbilt provides links to a number of micro-level price datasets in various regions of the world, including historical data. This includes data from six US cities covering January 1700 to December 1861 for a variety of goods, as well as modern-day price data for 680 goods from Japan (84 months from January 2000).

Another astonishing resource for historical data is provided by the Global Price and Income Group at UC Davis. Looking at their datamap, SSA is blank, but there are quite a few sources for Latin America, South Asia and East Asia.

A vast repository of digitised maps is now available on the web. "The David Rumsey Map Collection was started over 25 years ago and contains more than 150,000 maps. The collection focuses on rare 18th and 19th century maps of North and South America, although it also has maps of the World, Asia, Africa, Europe, and Oceania." This covers maps from periods of the 1700s to the 1950s.

The Jordà-Schularick-Taylor Macrohistory Database covers 17 advanced economies since 1870 on an annual basis. It comprises 25 real and nominal variables. Among these, there are time series that had been hitherto unavailable to researchers, among them financial variables such as bank credit to the non-financial private sector, mortgage lending and long-term house prices. The database captures the near-universe of advanced-country macroeconomic and asset price dynamics, covering on average over 90 percent of advanced-economy output and over 50 percent of world output.

Louis Putterman at Brown University has compiled an Agricultural Transition Year Data Set which provides estimates for "the year when the first significant region within each of 165 present-day countries underwent a transition from reliance mainly on gathered wild and hunted food sources to reliance mainly on cultivated crops (and livestock)." This data is very much in line with the long-run growth theory work coming out of Brown.

Data on Anglo-African Trade (1699-1808), originally compiled by Marion Johnson, is available on the Dutch Data Archiving and Networked Services webpages, crediting J. Th. Lindblad at Leiden. "This dataset contains figures on the trade between England and Africa during the period 1699-1808: imports, exports, re-exports and indirect imports. A distinction is made between different trade flows (Londen, outports, re-exports in time and out of time, etc.). Quantities and values are given for 1100 different commodities in the eighteenth century, units (also decimalized) and pounds. Aggregates are given for each year and for each type of trade. The dataset also includes the total trade figures for England between 1700 until 1800. The dataset has been created for research purposes, in order to analyse the trade between England and Africa in the eighteenth century." Documentation is limited and you have to register and log in to get access to these data (in txt format).

A team of researchers headed by David Eltis and Martin Halbert (both at Emory University in Atlanta) provide a fantastic resource for the empirical analysis of the slave trade: The Trans-Atlantic Slave Trade Database "comprises nearly 35,000 individual slaving expeditions between 1514 and 1866. Records of the voyages have been found in archives and libraries throughout the Atlantic world. They provide information about vessels, enslaved peoples, slave traders and owners, and trading routes. [...] The website provides full interactive capability to analyze the data and report results in the form of statistical tables, graphs, maps, or on a timeline." The dataset contains 99 variables and is made available in three formats: SPSS (.sav), comma delimited (.csv), and dBase (.dbf). [Thanks to James Fenske at Oxford for pointing out this database]

Bob Allen's website at Nuffield has links to historical wage and price data for a number of countries, cities and occupations.

The ifo Prussian Economic History Database (iPEHD) is a county-level database covering a rich collection of variables for all counties of Prussia during the 19th century. The Royal Prussian Statistical Office collected these data in a number of censuses over the period 1816-1901 (over 600,000 observations), with much county-level information surviving in the archives. These data provide a unique treasure for unprecedented micro-regional empirical research in economic history, analyzing the importance of such factors as education, religion, fertility, and many others for the economic development of Prussia in the 19th century. Excellent documentation is provided. [Thanks to Branko Milanovic (@BrankoMilan) who tweeted about this data source]

David Ormrod, James M. Gibson and Owen Lyne (University of Kent) provide decadal time series data on rent movements in London and the South-East of England for 1580-1914. Their paper lends support to the notion that ‘the city drove the countryside, not the reverse’ in terms of development, which is (said to be) expressed in research by Bob Allen amongst others. The debate seems relevant to development economics today, where some people talk about anti-agriculture bias and suggest that because the largest share of the work force is engaged in agriculture this sector must be the focus of development/policy efforts.

Patrick Manning at the World History Center, University of Pittsburgh, is governor to the World-Historical Dataverse Project, which is "intended to the contribute to creation of a comprehensive set of data on social-scientific, health, and environmental data for the world as a whole and for its constituent regions and localities, for the past four or five centuries." At present a total of nineteen datasets are linked but I imagine this is going to increase soon. [Also check out Manning's article in the Journal of Comparative Economics "Historical datasets on Africa and the African Atlantic" (subscription required) from which the entry on Anglo-African trade above was taken]

Joerg Baten, a professor of economic history at Tuebingen (or as we folk from nearby Metzingen would say: Gogenhausen) University, provides a wealth of historical data on the website for his chair. One data hub provides height measures ("Data on heights and the biological standard of living are among the most important sources of information in social- and economic-historical research, especially for the pre-statistical period") for Germany, the US, Austria, and a number of other countries. The second data hub is entitled 'Firms and Capital Markets' and offers stock exchange data from Germany, Russia, the US, England and China starting from the early 19th century. Users need to register to access the data and are also encouraged to deposit their own historical datasets (not all data posted is from Professor Baten).

Louis Putterman at Brown University provides another historical dataset, the World Migration Matrix (1500-2000), detailing for each of 165 countries "the proportion of the ancestors in 1500 of that country's population today that were living within what are now the borders of that and each of the other countries." There's a lot of documentation provided to reference all these estimates.

The PBL Netherlands Environmental Assessment Agency provides the History Database of the Global Environment (interestingly, the acronym is HYDE). HYDE presents (gridded) time series of population and land use for the last 12,000 years! It also presents various other indicators such as GDP, value added, livestock, agricultural areas and yields, private consumption, greenhouse gas emissions and industrial production data, but only for the last century.

The Yale School of Management has a dedicated website for Historical Financial Research Data which includes the Shanghai Stock Exchange project (covering the nineteenth and early twentieth centuries) and data for the famous South Sea Bubble: "The South Seas Bubble 1720 Project is a collection of stock prices for a large number of the traded companies in 1720. These include Dutch firms quoted in markets in the Netherlands, British firms quoted in the Netherlands, and some previously unstudied British firms quoted in London."

The Center for Financial Stability (CFS) hosts the Historical Financial Statistics, which aims "to be a source of comprehensive, authoritative, easy-to-use macroeconomic data stretching back several centuries. Our target range of coverage is from 1492 to the present, with special emphasis on the years before 1950, which few databases cover in detail." (hm, why start with 1492 if most of the data are for countries outside North America?). The archive, edited by Kurt Schuler, was only started in late 2010, so for now there are a lot of empty spreadsheets in the 'Country' section of the website (which splits statistics into 'Country tables' and 'International tables'). [I found a link to HFS on GMU's David Youngberg's website]

Nathan Nunn at Harvard University provides the data for his papers on his personal website, which includes (among others) US state-level data on slavery (1790-1860) and slavery data for The Americas in 1750. The data is in Stata format.

Matthew Ciolek at Australian National University edits the site for the Old World Trade Routes (OWTRAD) Project: "This site supports online research in the field of dromography and provides a public-access electronic archive of geo/chrono-referenced data on land, river and maritime trade routes of Eurasia and Africa during the period 10,000 BCE - circa 1820 CE." The files are published in CSV, MapInfo and Google Earth (KML) formats, downloadable by region. There's also a link to the Trade Routes Resources blog [via Masa Kudamatsu's DevEconData blog]

The Department of Economic History at Lund University provides a number of very long time series of prices for agricultural products as well as prices and wages for other industrial sectors. The earliest data are for 1776, and the coverage is typically at the regional level (22 regions - whatever happened to region 9?). Data can be downloaded in Excel spreadsheet format.

Historic Commodity Price data (1835-1950) are available for 35 countries, including some developing and colonised countries such as China, Cuba and Ceylon, among others. The data is provided by Chris Blattman and the link gives a number of papers and references with detailed information on the data. [I got this link off Masa Kudamatsu's blog.]

The Electronic Repository for Russian Historical Statistics, compiled by Andrei Markevich at Stanford's Hoover Institution, contains a selection of basic indicators of social and economic development within seven broad topics for five historical cross-sections (1795, 1858, 1897, 1959, 2002). Subdivided into twenty-six subtopics, these data can be downloaded in Excel format. Data are provided for individual regions according to the administrative-territorial division of the Russian state for each cross-section. Note: the data labels/variable names and (seemingly) very detailed descriptions are all in Russian.

The Center for Geographic Analysis at Harvard University in collaboration with Shanghai's Fudan University provides a large number of historical GIS 'maps' for China: once mastered (no simple task) this type of Geographical Information Systems (GIS) data allows for spatial analysis of Chinese development. You need to register but access is free; data come as shapefiles, xls or Access files (depending on the dataset). There are a large number of datasets from the days of the Legalists and Qin Shihuang (221 BC) to the 1990s (AD).

Michael E. Mann, Raymond S. Bradley, and Malcolm K. Hughes provide the data to go with their 1998 Nature article entitled 'Global-Scale Temperature Patterns and Climate Forcing over the Past Six Centuries'. There are annual gridded temperature data for 1730-1980 and even longer time series going back to the 1400s. [Thanks to James Fenske at Oxford for pointing out this database]

The Data & Information Services Center (DISC) Archive at University of Wisconsin-Madison provides access to raw data and documentation on the following slave trade topics from the eighteenth and nineteenth centuries: records of slave ship movement between Africa and the Americas, slave ships of eighteenth century France, slave trade to Rio de Janeiro, Virginia slave trade in the eighteenth century, English slave trade (House of Lords Survey), Angola slave trade in the eighteenth century, internal slave trade to Rio de Janeiro, slave trade to Havana, Cuba, Nantes slave trade in the eighteenth century, and slave trade to Jamaica. This aside, DISC hosts a number of datasets with relevance for economic historians. [Thanks to Gunilla Petterson, who featured the DISC site on developmentdata.org]

Funded by the IADB, the Oxford Latin American Economic History Database (OxLAD) contains statistical series for a wide range of economic and social indicators covering twenty countries in the region for the period 1900-2000. Its purpose is to provide economic and social historians worldwide with a systematic recompilation of available statistical information in a single on-line source. The website also provides other resources including a long list of references, many of them in Spanish, and detailed discussion of the methodology of data construction. Downloads are in csv format.



Markus Eberhardt,
31 Mar 2011, 08:46