UHS

Availability of the UHS data (as of Jan 2021)

The only publicly accessible way to get the UHS data is through the data center of the Chinese University of Hong Kong (http://ww2.usc.cuhk.edu.hk/DCS/Catalog.aspx).

It holds two datasets: [31-1-86-92] Urban Household Survey, from 1986 to 1992, and [31-1-93-97] Urban Household Survey, from 1993 to 1997.

Together they cover 12 years of UHS data (1986 to 1997) from 10 provinces.

To get them, you need to download an application form and apply. They are not free: the 1986-1992 dataset costs 360 USD and the 1993-1997 dataset costs 70 USD (as of Jan 2021).

Despite its shortcomings, this is the only publicly available UHS data. I call it the "official data" or the "10 provinces data".

Alternatively, you can find some privately circulated data. They cover 1986 to 2009 for 16 provinces, plus another 5 years (2010 to 2014) for only 4 provinces. I call them the "unofficial data" or the "16 provinces data".

In the 16 provinces data, each province has a smaller sample than in the 10 provinces data. For example, in 1995 there are 500 Beijing households in the 10 provinces data but only 393 in the 16 provinces data. The omissions seem random to me: among the first 30 Beijing households, households 2, 4, 11, 25, 26, 27, and 30 are missing from the 16 provinces data, and none of them has an extremely high or low income (except household 2). For households that appear in both datasets, I randomly checked a few and their records are exactly the same.
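The comparison above can be sketched in a few lines of Python. This is only an illustration: the set of 30 IDs mirrors the Beijing example in the text, while the function name compare_samples and the way the IDs are stored are my own assumptions, not the actual layout of either dataset.

```python
# Household IDs for the first 30 Beijing households in 1995 (IDs from the text).
official_ids = set(range(1, 31))                       # 10 provinces data
missing_in_unofficial = {2, 4, 11, 25, 26, 27, 30}     # absent from the 16 provinces data
unofficial_ids = official_ids - missing_in_unofficial  # 16 provinces data


def compare_samples(official, unofficial):
    """Split household IDs into official-only, unofficial-only, and shared."""
    official, unofficial = set(official), set(unofficial)
    return {
        "only_official": sorted(official - unofficial),
        "only_unofficial": sorted(unofficial - official),
        "both": sorted(official & unofficial),
    }


result = compare_samples(official_ids, unofficial_ids)
print("missing from the 16 provinces data:", result["only_official"])
print("households in both:", len(result["both"]))
```

With the real files, the same idea applies after loading the household identifiers for a given province and year from each source.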

Income and Expenditure data from 1992 to 2001

I began my research with the "unofficial data", and one thing that confused me a lot is a huge dip in income and expenditure between 1992 and 2001. My guess is that in these years income and expenditure are reported at the per capita level rather than the household level. So for these years I multiply these variables by the corresponding number of household members, and everything looks fine afterwards.

However, is it a glitch in the "unofficial data"? 

Eventually I got the official data and found that it has the same problem. So you can rest assured when using the "unofficial data".
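A minimal sketch of the per capita adjustment described above. The field names (year, hh_size, income, expenditure) and the numbers are hypothetical, and I assume the affected years are 1992 through 2001 inclusive.

```python
# Years assumed to be reported per capita (1992..2001 inclusive; an assumption).
PER_CAPITA_YEARS = range(1992, 2002)


def to_household_level(record):
    """Return a copy of the record with income and expenditure at household level."""
    out = dict(record)
    if record["year"] in PER_CAPITA_YEARS:
        out["income"] = record["income"] * record["hh_size"]
        out["expenditure"] = record["expenditure"] * record["hh_size"]
    return out


# Hypothetical records, not actual UHS values.
rows = [
    {"year": 1991, "hh_size": 3, "income": 6000.0, "expenditure": 5400.0},
    {"year": 1992, "hh_size": 3, "income": 2200.0, "expenditure": 1900.0},
]
adjusted = [to_household_level(r) for r in rows]
for r in adjusted:
    print(r["year"], r["income"], r["expenditure"])
```

The 1991 record is left untouched, while the 1992 record is scaled by the household size, which removes the artificial dip.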


The saving rate in the UHS

It is well known that the household saving rate in China is high. The grey line represents the household saving rate from the official report. However, the saving rate is very low if you simply calculate it as 1-(total expenditure/total income) with the UHS data: it stays at merely around 5% for a long time (the orange line).

Long story short, to get a saving rate close to the official value, you need to use the disposable income as income and the consumption expenditure as the expenditure:

saving rate=1-(consumption expenditure/disposable income)

which gives the blue line.
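To make the two definitions concrete, here is a small sketch. The household figures are made up for illustration (they are not UHS values), and the dictionary keys are my own naming.

```python
def saving_rate(expenditure, income):
    """Saving rate defined as 1 - expenditure / income."""
    if income <= 0:
        raise ValueError("income must be positive")
    return 1.0 - expenditure / income


# Hypothetical household, yuan per year (not UHS figures).
hh = {
    "total_income": 10000.0,
    "disposable_income": 9900.0,
    "total_expenditure": 9500.0,        # includes non-consumption items
    "consumption_expenditure": 8000.0,
}

# Naive definition: total expenditure over total income.
naive = saving_rate(hh["total_expenditure"], hh["total_income"])
# Adjusted definition: consumption expenditure over disposable income.
adjusted = saving_rate(hh["consumption_expenditure"], hh["disposable_income"])
print(f"naive: {naive:.3f}, adjusted: {adjusted:.3f}")
```

In this made-up case the naive rate is about 5% while the adjusted rate is about 19%, which is the kind of gap between the two definitions described above.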

In the UHS, the household balance sheet is constructed in the following way:

total income + cash at hand at the beginning of the year + loan income (借贷收入, basically money withdrawn from the bank)
= total expenditure + cash at hand at the end of the year + loan expenditure (借贷支出, money saved in the bank).
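The identity can be checked record by record. Below is a minimal sketch; the field names and numbers are hypothetical, chosen only so that one record balances and one does not.

```python
def balance_gap(row):
    """Inflows minus outflows; zero when the balance sheet identity holds."""
    inflow = row["total_income"] + row["cash_begin"] + row["loan_income"]
    outflow = row["total_expenditure"] + row["cash_end"] + row["loan_expenditure"]
    return inflow - outflow


# Hypothetical records (not UHS values).
balanced = {
    "total_income": 10000.0, "cash_begin": 500.0, "loan_income": 1000.0,
    "total_expenditure": 9500.0, "cash_end": 600.0, "loan_expenditure": 1400.0,
}
unbalanced = dict(balanced, cash_end=700.0)  # same record with a 100 yuan discrepancy

gap_ok = balance_gap(balanced)
gap_bad = balance_gap(unbalanced)
print(gap_ok, gap_bad)
```

A nonzero gap flags a record where the reported items do not add up.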

Technically, the difference between total income and total expenditure should yield the net change in wealth. The problem is that both the income side and the expenditure side include gifting and family support, where measurement errors can be huge. Therefore, on the expenditure side, we should use only the consumption expenditure and exclude any non-consumption expenditure.

Things are a bit messy on the income side. The item "disposable income" is only available from 1992. It is total (pre-tax) income minus income tax and income from home production. Before 1992, neither of these two sub-items was surveyed, so we have to assume that disposable income equals total income. Moreover, between 1992 and 2002 these two sub-items were negligible, so disposable income is essentially total income. In other words, saving rate=1-(consumption expenditure/total income). Because consumption expenditure is smaller than total expenditure, the numerator shrinks and the saving rate increases (the blue line).

There is one more complication. Before 2002, there is an item called "living cost income" (生活费收入), which is total income minus gifting and family support income. It seems that this living cost income is the natural counterpart of the consumption expenditure. However, the living cost income is roughly 90% of the total income (remember, the disposable income is nearly 99% of the total income before 2002). If we calculate the saving rate as 1-(consumption expenditure/living cost income), we again end up with very low saving rates.

Using the disposable income is not fully consistent logically, but it does match the official report. If we believe the money really does sit in the bank, we need to use this adjustment.
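A quick piece of arithmetic shows how much the choice of denominator matters. The 99% and 90% shares come from the text; the consumption figure of 80 (per 100 of total income) is an assumed value for illustration only.

```python
# Hypothetical pre-2002 household, scaled so that total income is 100.
total_income = 100.0
incomes = {
    "total income": total_income,
    "disposable income": 0.99 * total_income,   # about 99% of total, per the text
    "living cost income": 0.90 * total_income,  # about 90% of total, per the text
}
consumption_expenditure = 80.0  # assumed value, not a UHS figure

# Saving rate under each candidate denominator.
rates = {
    name: 1.0 - consumption_expenditure / income
    for name, income in incomes.items()
}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

With these assumed numbers, moving from disposable income to living cost income cuts the saving rate from roughly 19% to roughly 11%.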

Two working papers adopt similar adjustments:

NBER (Why Are Saving Rates so High in China?), p16: "Household saving is computed as the difference between disposable income and consumption expenditures on food, clothing, housing services, transportation, communication, entertainment, education, medical care, and other miscellaneous items."

IMF (Why are Saving Rates of Urban Households in China Rising?),  p7: "We measure savings as the difference between disposable income and consumption expenditures."

The drop in the female labor participation rate in 1988? Measurement error!  

Between 1987 and 1988, the labor participation rate of women aged 16-54 seems to have dropped significantly (as indicated by the blue line in the top figure). However, if we limit the sample to women aged 25-54, this pattern is no longer evident (as indicated by the orange line in the top figure).

That is because in 1986 and 1987, the labor force information of many young women aged 16-19 was missing. With only the recorded cases counted, the labor participation rate of this age group was nearly 100% in these two years. By 1988, most teenagers in this age group had their labor market information available (as indicated by the blue line in the bottom figure). Consequently, the decline observed in the female labor participation rate in 1988 is solely a result of measurement error.
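The mechanism can be reproduced with a toy example. The records below are entirely made up (they are not UHS data), and I assume that a missing labor force status is stored as None and dropped from the calculation.

```python
def participation_rate(records, min_age, max_age):
    """Share in the labor force among records with a known status in the age range."""
    known = [
        r["in_labor_force"]
        for r in records
        if min_age <= r["age"] <= max_age and r["in_labor_force"] is not None
    ]
    if not known:
        return None
    return sum(known) / len(known)


def make(age, status, n):
    """Build n hypothetical records with the same age and status."""
    return [{"age": age, "in_labor_force": status} for _ in range(n)]


adults = make(35, True, 9) + make(35, False, 1)
# 1987: status of non-working teenagers is missing (None), so only workers are counted.
sample_1987 = make(17, True, 2) + make(17, None, 6) + adults
# 1988: the same teenagers now have a recorded status.
sample_1988 = make(17, True, 2) + make(17, False, 6) + adults

rate_87_all = participation_rate(sample_1987, 16, 54)
rate_88_all = participation_rate(sample_1988, 16, 54)
rate_87_prime = participation_rate(sample_1987, 25, 54)
rate_88_prime = participation_rate(sample_1988, 25, 54)
print(rate_87_all, rate_88_all, rate_87_prime, rate_88_prime)
```

The rate for ages 16-54 falls between the two toy years even though nobody changed behavior, while the rate for ages 25-54 stays the same.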