Notes on Vietnamese data

I have been using the Vietnam Household Living Standards Surveys (VHLSSs), population census data, and enterprise data for a number of research projects. In the process I have done a significant amount of verification and cleaning of the data. On this page I have posted some details. If you find this information useful, I'd love to hear about your project and how my analysis helped.

Getting access to Vietnamese data

Here are some contacts for officially accessing datasets:

Last I heard (April 2022), the VHLSS datasets have become difficult to access. Mr. Nguyễn Thế Quân <ntquan at gso dot gov dot vn> has told researchers that the General Statistics Office (GSO) does not have a dissemination policy for the VHLSSs and thus cannot make them available.

Samples of the population censuses are available through IPUMS International for 1989, 1999, and 2009. However, some researchers have had access to larger samples with more detailed geographic information than is available through IPUMS International. Ms. Nguyen Thi Thanh Mai nttmai[at]gso[dot]gov[dot]vn has previously helped provide access to the 2014 intercensal population survey as well as samples of the 2009 and 2019 censuses with more detailed geographic identifiers than are available via IPUMS International.

The enterprise data is best accessed through Mr. Nguyen Viet Phong phonggsovn[at]yahoo[dot]com. I've heard of very different prices being paid by different researchers. I'm not 100% clear whether this is because the GSO is actually charging different prices or whether it is because some researchers are only buying access to some of the datasets from each year.

I have purchased access to the 2007 Establishment Census by contacting Ms. Hai Nguyen, redseanthhai[at]yahoo[dot]com. She seems able to help with accessing other datasets as well.

Income estimates

"Comparison of income between the 'short' and 'long' samples of the 2002 VHLSS"

"Comparison of income between the 'short' and 'long' samples of the 2004 VHLSS"

My paper with Dwayne Benjamin and Loren Brandt provides household income estimates for 2002 through 2014. 


VHLSS household and individual panels

There is a household panel component between the 2002 and 2004, 2004 and 2006, and 2006 and 2008 VHLSSs. Each panel contains some mistakes in the matches originally suggested by the GSO. In most cases these mistakes are easy to detect by comparing the gender and year of birth of suggested matches within the household. The comparison shows that some households are incorrectly matched and that some individuals within households are incorrectly matched. Using additional confidential information, I have created my own revised versions. I have made the most changes to the 2002-04 VHLSS panel. Please see here for a discussion of the likely improvement in matching for the 2002-04 panel. If you use these revised panels, please cite McCaig and Pavcnik (2015) "Informal employment in a growing and globalizing low-income country," American Economic Review, 105(5), pp. 545-50, which was my first published paper using them.
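The gender and year-of-birth comparison described above can be sketched in Stata as follows. This is only an illustration under stated assumptions, not the procedure used to build the revised panels: the variable names (memid02, male02, yob02, male04, yob04), the toy values, and the one-year tolerance on year of birth are all hypothetical.

```stata
* Toy data: one row per suggested individual match (names and values are made up)
clear
input double hhid02 memid02 male02 yob02 male04 yob04
1234500001 1 1 1960 1 1960
1234500001 2 0 1963 0 1964
1234500002 1 1 1975 0 1981
1234500002 2 0 1978 0 1950
end

* Flag suggested matches whose gender or year of birth disagree across waves.
* The one-year tolerance is an arbitrary allowance for reporting noise.
gen byte bad_gender = male02 != male04
gen byte bad_yob    = abs(yob02 - yob04) > 1
gen byte suspect    = bad_gender | bad_yob

* Share of suspect matches within each household. A household in which
* every suggested match fails is a candidate for a wrong household match.
bysort hhid02: egen share_suspect = mean(suspect)
gen byte hh_suspect = share_suspect == 1

list hhid02 memid02 suspect hh_suspect, sepby(hhid02)
```

A household where only some members fail the check points instead to individuals being mismatched within an otherwise correct household match.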

2002-2004 VHLSS household and individual panels (revised 03 July 2013)

The 2002 household identifier was created according to:

gen double hhid02 = xa*10^5 + hoso

The 2004 household identifier was created according to:

gen double hhid04 = tinh*10^9 + huyen*10^7 + xa*10^5 + diaban*10^2 + hoso


Household panel 2004-2006 VHLSS (revised 9 August 2010)

Individual panel 2004-2006 VHLSS (revised 2 July 2013)


Household panel 2006-2008 VHLSS

Individual panel 2006-2008 VHLSS


The 2006 household identifier was created according to:

gen double hhid06 = tinh*10^9 + huyen*10^7 + xa*10^5 + diaban*10^2 + hoso

And the 2008 household identifier was created according to:

gen double hhid08 = tinh*10^10 + huyen*10^8 + xa*10^6 + diaban*10^3 + hoso
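As a minimal sketch of why these identifiers are stored as double, the example below builds the 2008 identifier on made-up geographic codes and checks that it identifies households uniquely. The input values are hypothetical; only the formula comes from the line above. Stata's float type holds roughly seven significant digits, so an 11- or 12-digit identifier stored as float is rounded and distinct households can collide.

```stata
* Made-up geographic codes: province, district, commune, enumeration area, household
clear
input tinh huyen xa diaban hoso
1 1 1 1 13
1 1 1 1 14
64 11 7 25 15
end

* Identifier as defined above, stored with double precision
gen double hhid08 = tinh*10^10 + huyen*10^8 + xa*10^6 + diaban*10^3 + hoso
format hhid08 %15.0f

* isid exits with an error if hhid08 does not uniquely identify observations
isid hhid08

* The same value stored as float is rounded: the first two households,
* which differ only in the last digit, can no longer be told apart
gen float hhid08_f = hhid08
count if hhid08_f != hhid08
duplicates report hhid08_f
```

Running isid after creating each year's identifier is a cheap check before merging waves on it.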


2010-12 VHLSS household and individual panels

This zip file contains the cleaned household and individual panels and a short document describing how the panels were constructed and cleaned. Please cite McCaig and Pavcnik (2020) when using this data.


Consistent province codes over time

As many users of Vietnamese data know, the number of provinces has changed significantly since the late 1980s. In most cases a change in provincial boundaries either split an existing province or merged existing provinces, rather than reallocating districts between provinces. The province codes also change within surveys and across data sources. Here I post files for creating consistent definitions of province codes across various datasets. Please email me if you notice any corrections I should make.

Consistent province codes across the 1989, 1999, and 2009 Population and Housing Censuses

Consistent province codes across the 2002-2010 VHLSSs and the 2000-2010 enterprise data
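Because most boundary changes are splits or merges, a crosswalk from each original province code to a consistent code can be applied with a many-to-one merge. The sketch below shows the idea on made-up data. The codes, the variable names tinh_consistent and hhsize, and the crosswalk itself are hypothetical and do not reproduce the posted files.

```stata
* Hypothetical crosswalk: original province code -> consistent province code.
* Here province 2 is treated as having been split from province 1.
clear
input tinh tinh_consistent
1 1
2 1
3 3
end
tempfile xwalk
save `xwalk'

* Hypothetical survey extract with the original province codes
clear
input tinh hhsize
1 4
2 5
2 3
3 6
end

* Attach the consistent code; every survey code should appear in the crosswalk
merge m:1 tinh using `xwalk', keep(master match)
assert _merge == 3
drop _merge

* Aggregate to the consistent province definition
collapse (sum) hhsize, by(tinh_consistent)
list
```

The assert after the merge stops the do-file if a survey contains a province code that the crosswalk does not cover.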


Consistent district codes

In ongoing work, we are creating consistent district codes from 1999 through 2019. I will post updates here.


Consistent industry codes over time

The 1992/93 Vietnam Living Standard Survey (VLSS) used a different set of industry codes than the subsequent household surveys, the 1997/98 VLSS and the 2002, 2004, and 2006 Vietnam Household Living Standards Surveys (VHLSSs). The industry codes in the 1992/93 VLSS were based on an adaptation of revision 2 of the International Standard Industrial Classification (ISIC), whereas the four later household surveys used industry codes based on an adaptation of ISIC revision 3. Below I have uploaded two documents. The first describes where the industry codes listed in the surveys deviate from the ISIC nomenclature. The second provides a mapping between the two sets of codes. Please email me if you notice any corrections I should make.

Inconsistencies in industry codes

Concordance between 1992/93 VLSS industry codes and 1997/98 VLSS through 2006 VHLSS

Below, I have also uploaded copies of the industry codes as updated in 1993, 2007, and 2018. To my knowledge, VSIC1993 is the basis of industry codes used in the 2002 through 2006 VHLSSs, the 2000 through 2007 enterprise data, and the 1999 population census. VSIC2007 is used in the 2008 through 2018 VHLSSs, the 2008 through 2017 enterprise data, and the 2009 population census. VSIC2018 is used in the 2019 population census.

VSIC1993 (based on ISIC rev3)

VSIC2007 (based on ISIC rev4)

VSIC2018 (based on ISIC rev4)

I have made some attempts at concordances between the various VSICs:

3-digit VSIC2007 to 3-digit VSIC1993

4-digit VSIC2018 to 3-digit VSIC1993

5-digit VSIC2007 to 2-digit VSIC1993

2-digit VSIC2007 to 2-digit VSIC1993

VSIC 2007 - VSIC 1993 conversion

VSIC 2018 - VSIC 2007 conversion
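Before applying any concordance, it can help to check whether each source code maps to a single target code, since one-to-many codes need an explicit allocation rule. The sketch below runs that check on a made-up concordance. The codes and variable names are hypothetical and are not taken from the files above.

```stata
* Hypothetical concordance rows: VSIC2007 code -> VSIC1993 code (made-up codes)
clear
input vsic2007 vsic1993
901 801
902 801
903 802
903 803
end

* Remove exact duplicate rows so the counts below reflect distinct targets
duplicates drop

* Number of VSIC1993 targets for each VSIC2007 code
bysort vsic2007: gen byte n_targets = _N
gen byte one_to_many = n_targets > 1

list, sepby(vsic2007)
```

Codes with one_to_many equal to 0 can be merged directly with merge m:1. The flagged codes require a decision about how to assign observations to the older classification.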

Household survey documentation

As it can sometimes be hard to find this information online, here is some documentation for various household surveys in Vietnam.

2002 VHLSS household questionnaire
2002 VHLSS general introduction

2002 and 2004 VHLSS basic information

2004 VHLSS household questionnaire (EN)
2004 VHLSS household questionnaire (VN)
2004 VHLSS handbook

2006 VHLSS household questionnaire (EN)
2006 VHLSS commune questionnaire (EN)

2008 VHLSS household questionnaire (EN)
2008 VHLSS handbook

2010 VHLSS household questionnaire
2010 VHLSS handbook

2012 VHLSS household & commune questionnaires (EN)
2012 VHLSS handbook

2014 VHLSS household & commune questionnaires (EN)

2018 VHLSS household questionnaire (EN)


Vietnam Enterprise Census

Here is some documentation associated with the annual Vietnam Enterprise Census.

Questionnaires 2000 to 2017


Country Codes

In the enterprise census data, there is a module about the source country of investment. Here is, to my knowledge, the correct set of country names and codes, as provided to me by the GSO.

Country names and codes


U.S. tariffs before and after the Bilateral Trade Agreement (BTA)

In a series of papers with various coauthors, I have used the change in U.S. tariffs on imports from Vietnam as an export shock. Here I have provided these tariffs at the 2-, 3-, and 4-digit industry level based on International Standard Industrial Classification revision 3. Please see "Exporting out of poverty: Provincial poverty in Vietnam and U.S. market access" for details on how the industry-level tariffs were created.

2-digit tariffs (Please cite "Export markets and labor allocation in a low-income country")

3-digit tariffs (Please cite "Exporting out of poverty: Provincial poverty in Vietnam and U.S. market access")

4-digit tariffs (Please cite "FDI inflows and domestic firms: Adjustments to new export opportunities")


Matching businesses over surveys in the VHLSSs

In McCaig and Pavcnik (2021), we construct a panel of businesses, mostly informal, run by households as reported in the 2004 through 2018 VHLSSs. You can use our business panel by following these instructions: Business panel replication. Note that the VHLSS data must be purchased from the General Statistics Office. We ask that you please cite our paper if you use our dataset.
