Data sources for gravity

    • Full gravity datasets

      • CEPII's gravity dataset: CEPII Gravity Data, based on Head et al. (2010). This new version has data updated through 2015. Here is the codebook that documents the dataset. There is also a lighter version of the data that includes trade flows for replicating the results in Head et al. (2010).

      • Rose's datasets: Andrew Rose has generously posted many gravity datasets used in his well-known papers that investigate currency union or GATT/WTO trade effects in a gravity framework.

      • Trade and Production datasets: Two datasets are available that combine bilateral trade and production for about 30 ISIC industries and several (recent) years. The original one was developed by Marcelo Olarreaga and Alessandro Nicita at the World Bank. The data was expanded by Soledad Zignago, José de Sousa and Thierry Mayer and is available at CEPII's tradeprod page. Documentation is available on those web pages. An essential advantage of these datasets is that they allow fairly straightforward calculation of internal trade flows, which can be compared with international ones. Both also contain measures of bilateral protection.

    • Bilateral trade flows

      • DoTS: Direction of Trade Statistics consists of bilateral aggregate trade data provided by the IMF. The most recent decade (currently 2002–2011) is available at this Query Builder after registering for a free trial. DoTS includes 213 reporting countries and 249 partner countries.

      • Comtrade: United Nations compilation of disaggregated bilateral trade data. Available at the 6-digit Harmonized System level and under the Classification by Broad Economic Categories (BEC), which permits categorization of the data into intermediate, consumption and capital goods.

      • CoW: The Correlates of War project includes the Barbieri bilateral trade data. It is based mainly on DoTS but extended to include Taiwan and historical data back to 1870.

      • GTAP 7: GTAP 7 provides 2004 bilateral trade data for 57 commodities and 113 regions and would therefore appear to be a potentially useful database for cross-sectional gravity estimation.

      • Others: Eurostat; the World Bank trade and production dataset; OECD for sectoral trade data (including services) and FDI stocks and flows.

      • Commodity Flow Survey: State-to-state flows of goods in the United States. See downloadable files below.

    • National data

      • Production and Expenditure: World Bank WDI is the principal source of economy size measures. The full WDI/GDF data set includes over 1200 variables. Data goes back to 1960 but is missing some major countries in the 1960s (such as India). Another source of GDP is the Penn World Tables, whose 2012 release (version 7.1) contains data from 1950 to 2010. PWT covers between 55 (1950) and 190 (2010) countries/territories with non-missing GDP data. Unlike the WDI, the PWT includes complete GDP data for Taiwan. Katherine Barbieri provides additional GDP data. While GDP is a good size measure, theory suggests we should use the total value of production for Y_i and total expenditure on all sources, including self, for X_n. Total production value excludes services that enter GDP, but it includes intermediate input purchases that are netted out of GDP. Armed with production value, one can obtain X_{nn} by subtracting the sum of exports to all foreign markets from it. One can then obtain X_n by summing all imports and adding X_{nn}.
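The two accounting steps just described (internal trade as production minus total exports, expenditure as total imports plus internal trade) can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout, and the two-country numbers are hypothetical and do not come from any of the datasets listed here.

```python
def internal_trade_and_expenditure(production, flows):
    """Return (X_nn, X_n) per country from production values and bilateral flows.

    production: dict mapping country -> total value of production (Y)
    flows: dict mapping (exporter, importer) -> value of international trade
    Both structures are assumptions made for this sketch.
    """
    exports = {country: 0.0 for country in production}
    imports = {country: 0.0 for country in production}
    for (exporter, importer), value in flows.items():
        if exporter == importer:
            continue  # internal flows are derived below, not read from trade data
        if exporter in exports:
            exports[exporter] += value
        if importer in imports:
            imports[importer] += value

    # X_nn: production value minus the sum of exports to all foreign markets
    internal = {c: production[c] - exports[c] for c in production}
    # X_n: sum of all imports plus internal trade
    expenditure = {c: imports[c] + internal[c] for c in production}
    return internal, expenditure


# Made-up two-country example (values are illustrative only)
production = {"A": 100.0, "B": 80.0}
flows = {("A", "B"): 30.0, ("B", "A"): 20.0}
internal, expenditure = internal_trade_and_expenditure(production, flows)
print(internal)
print(expenditure)
```

In a closed system like this toy example, total expenditure equals total production, which is a useful sanity check. A negative X_{nn} in real data would typically signal a mismatch between the production and trade sources.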

    • Trade costs data

      • TRAINS: The main source of tariffs is the WITS interface put together by the World Bank to access TRAINS data compiled by UNCTAD since 1989 and also WTO-sourced protection data (http://wits.worldbank.org/wits/).

      • MACMAP: CEPII and ITC have also put together a database called MacMap (http://www.macmap.org/), which takes into account a richer set of bilateral preferences and also proposes tariff equivalents for a large set of non-ad valorem measures. It is, however, available for only 3 years (2001, 2004, and 2007).

      • GTAP: GTAP also contains trade protection measures.

      • David Hummels provides some tariff and freight data online, mostly for US trade.

      • CEPII has an integrated bilateral dataset with variables like distances, common language, etc. that do not vary over time.