Working Papers

Abstract

We describe differences between the commonly used version of the U.S. Census of Manufactures and what establishments themselves report. The originally reported data has substantially more dispersion in measured establishment inputs, output, and productivity. Even after trimming, measured allocative efficiency is substantially higher in the cleaned data than in the raw data: around 5x higher in 2002 and 2007, and 50x in 2012. Without trimming, the changes are substantially larger. We describe a Bayesian approach for editing and imputation that can be used across contexts, discussing how to incorporate analysts’ manual edits and tax records, as the Census currently does.

Abstract

We describe differences between the commonly used version of the U.S. Census of Manufactures available at the FSRDCs and what establishments themselves report. The originally reported data has substantially more dispersion in measured establishment productivity. Measured allocative efficiency is substantially higher in the cleaned data than in the raw data: 4x higher in 2002, 20x in 2007, and 80x in 2012. Many of the important editing strategies at the Census, including industry analysts’ manual edits and edits using tax records, are infeasible in non-U.S. datasets. We describe a new Bayesian approach for editing and imputation that can be used across contexts.


In this paper we describe the U.S. Census Bureau's redesign and production implementation of the Longitudinal Business Database (LBD) first introduced by Jarmin and Miranda (2002). The LBD is used to create the Business Dynamics Statistics (BDS), tabulations describing the entry, exit, expansion, and contraction of businesses. The new LBD and BDS also incorporate information formerly provided by the Statistics of U.S. Businesses program, which produced similar year-to-year measures of employment and establishment flows. We describe in detail how the LBD is created from curation of the input administrative data, longitudinal matching, retiming of economic census-year births and deaths, creation of vintage consistent industry codes and noise factors, and the creation and cleaning of each year of LBD data. This documentation is intended to facilitate the proper use and understanding of the data by both researchers with approved projects accessing the LBD microdata and those using the BDS tabulations.

In May 2017, the Associate Director of Economic Programs (ADEP) and the Associate Director of Research and Methodology (ADRM) established a cross-directorate team to investigate the feasibility of developing synthetic establishment-level micro-data with sufficiently high utility and privacy protection features for public dissemination from a subset of Economic Census industries defined by six-digit 2012 North American Industry Classification System (NAICS) codes. The investigation presented in this report is more comprehensive, covering 42 industries in eighteen economic sectors covered by the Economic Census. These industries are not a random sample. This research project was designed as a “proof of concept,” with the understanding from upper management that post-research activities, such as implementation of the recommended procedures in a production setting and development of a validation server, were out of scope. This report presents the results of this research.

We describe four new lines of inquiry added to the 2017 Economic Census regarding (i) retail health clinics, (ii) management practices in health care services, (iii) self-service in retail and service industries, and (iv) water use in manufacturing and mining industries. These were proposed by economists from the U.S. Census Bureau's Center for Economic Studies in order to fill data gaps in current Census Bureau products concerning the U.S. economy. The new content addresses such issues as the rise in importance of health care and its complexity, the adoption of automation technologies, and the importance of measuring water, a critical input to many manufacturing and mining industries.

There has been a strong surge in aggregate productivity growth in India since 1990, following significant economic reforms. Three recent studies have used two distinct methodologies to decompose the sources of growth, and all conclude that it has been driven by within-plant increases in technical efficiency and not between-plant reallocation of inputs. Given the nature of the reforms, where many barriers to input reallocation were removed, this finding has surprised researchers and been dubbed “India’s Mysterious Manufacturing Miracle.” In this paper, we show that the methodologies used may artificially understate the extent of reallocation. One approach, using growth in value added, counts all reallocation growth arising from the movement of intermediate inputs as technical efficiency growth. The second approach, using the Olley-Pakes decomposition, uses estimates of plant-level total factor productivity (TFP) as a proxy for the marginal product of inputs. However, in equilibrium, TFP and the marginal product of inputs are unrelated. Using microdata on manufacturing from five countries – India, the U.S., Chile, Colombia, and Slovenia – we show that both approaches significantly understate the true role of reallocation in economic growth. In particular, reallocation of materials is responsible for over half of aggregate Indian manufacturing productivity growth since 2000, substantially larger than either the contribution of primary inputs or the change in the covariance of productivity and size.
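
For reference, the Olley-Pakes decomposition mentioned above splits share-weighted aggregate productivity into the unweighted mean of plant productivity plus a covariance-type term between plant share and plant productivity. The sketch below is a minimal illustration of that identity only; the function name and the three-plant numbers are hypothetical and are not taken from any of the datasets used in the paper.

```python
def olley_pakes_decomposition(shares, productivity):
    """Split share-weighted aggregate productivity into an unweighted
    mean and a cross term sum((s_i - s_bar) * (p_i - p_bar)).

    shares       -- plant activity shares, assumed to sum to 1
    productivity -- plant-level productivity measures (e.g. log TFP)
    """
    if len(shares) != len(productivity) or not shares:
        raise ValueError("shares and productivity must be non-empty and equal length")

    n = len(shares)
    mean_share = sum(shares) / n
    mean_productivity = sum(productivity) / n

    # Share-weighted aggregate productivity.
    aggregate = sum(s * p for s, p in zip(shares, productivity))

    # Cross term: positive when larger plants are also more productive.
    cross_term = sum(
        (s - mean_share) * (p - mean_productivity)
        for s, p in zip(shares, productivity)
    )
    return aggregate, mean_productivity, cross_term


# Hypothetical three-plant example (illustrative values only).
example_shares = [0.5, 0.3, 0.2]
example_productivity = [2.0, 1.0, 1.5]
aggregate, mean_productivity, cross_term = olley_pakes_decomposition(
    example_shares, example_productivity
)
```

When the shares sum to one, `aggregate` equals `mean_productivity + cross_term`. A rise in the cross term over time is what this approach reads as reallocation toward more productive plants, and that reading is the one the abstract calls into question.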

As part of processing the Census of Manufactures, the Census Bureau edits some data items and imputes values for missing data and for some data that are deemed erroneous. Until recently it was difficult for researchers using the plant-level data to determine which data items were changed or imputed during the editing and imputation process, because the edit/imputation processing flags were not available to researchers. This paper describes the process of reconstructing the edit/imputation flags for variables in the 1977, 1982, 1987, 1992 and 1997 Censuses of Manufactures using recently recovered Census Bureau files. The paper also reports summary statistics for the percentage of cases that are imputed for key variables. Excluding plants with fewer than 5 employees, imputation rates for several key variables range from 8% to 54% for the manufacturing sector as a whole, and from 1% to 72% at the 2-digit SIC industry level.