Empirics

This post covers the empirical techniques used to confront your theory (the model) with reality (the data). In everyday life, people often try to persuade each other with special cases (stories or personal experience), but economists need to make systematic use of all the available evidence. This is a technical and difficult job, and it is what distinguishes you from non-economists. We mentioned that "the method determines the technique", so let's recap the different types of models ("the method") to which different empirical techniques should be applied.

To build a model, we connect endogenous variables, exogenous variables and parameters together in an equation system based on theory (economic modelling), data (statistical modelling) or a bit of both (econometric modelling). This taxonomy focuses on the process by which the model is constructed. Once a model is built, it belongs to one of two types in terms of its mathematical nature: a reduced-form model or a structural model. A reduced-form model is usually built by statistical/econometric modelling, so little theory, if any, is involved in developing it. Such a model usually puts the endogenous variables on the left-hand side of the equations and the exogenous variables on the right-hand side, as in linear regression, probit, tobit, VARMA, VECM and fixed/random effects models. A structural model, in contrast, builds its equations on an explicit theoretical basis, which can be either ad hoc (as in SEM) or derived (as in DSGE models).

In estimating a reduced-form model (e.g. linear regression models, VARMA, VECM), distance-based estimators (e.g. OLS, GLS, 2SLS, 3SLS, GMM, SMM, II) are usually used as a simpler alternative to distribution-based estimators (e.g. maximum likelihood, Bayesian)[1]. A distance-based estimator finds the parameter values that minimise the distance between the observed and predicted dependent variables (rather than maximising the model’s likelihood of observing the data). There are two types of distance-based estimator, depending on how the distance is measured:

  • VALUES of the dependent variables: OLS, GLS, least absolute deviation, etc.

  • FEATURES of the dependent variables: 2SLS, 3SLS, GMM, SMM, II, etc. The features can be any statistical properties[2] but identification[3] must be satisfied.
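
To make the distinction concrete, here is a minimal sketch that estimates the same parameter both ways. Everything in it is an assumption chosen for illustration: the model is a zero-mean AR(1) process, the "feature" is the first-order autocorrelation, the function names are hypothetical, and a crude grid search stands in for a proper optimiser. The value-based estimator (OLS) minimises squared one-step prediction errors; the feature-based estimator (a bare-bones SMM) simulates the model at each candidate parameter and keeps the one whose simulated autocorrelation is closest to the data's.

```python
import numpy as np

def simulate_ar1(rho, shocks):
    """Simulate y_t = rho * y_{t-1} + e_t from given shocks, starting at 0."""
    y = np.zeros(len(shocks))
    for t in range(1, len(shocks)):
        y[t] = rho * y[t - 1] + shocks[t]
    return y

def autocorr1(y):
    """First-order sample autocorrelation (the feature to be matched)."""
    d = y - y.mean()
    return float(d[1:] @ d[:-1] / (d @ d))

def ols_rho(y):
    """VALUES: minimise the sum of squared (y_t - rho * y_{t-1}), no intercept."""
    return float(y[1:] @ y[:-1] / (y[:-1] @ y[:-1]))

def smm_rho(y, sim_shocks, grid):
    """FEATURES: pick the rho whose simulated autocorrelation is closest to the data's."""
    target = autocorr1(y)
    dist = [(autocorr1(simulate_ar1(r, sim_shocks)) - target) ** 2 for r in grid]
    return float(grid[int(np.argmin(dist))])

rng = np.random.default_rng(0)
true_rho = 0.6                                # hypothetical "true" value
data = simulate_ar1(true_rho, rng.standard_normal(2000))

sim_shocks = rng.standard_normal(3000)        # held fixed across candidate values
grid = np.linspace(0.0, 0.99, 100)            # candidate values, step 0.01

rho_ols = ols_rho(data)
rho_smm = smm_rho(data, sim_shocks, grid)
print("OLS:", rho_ols, "SMM:", rho_smm)
```

Holding the simulation shocks fixed across candidate values keeps the distance a smooth function of the parameter, so the search is not chasing simulation noise.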

In estimating a structural model (e.g. SEM, DSGE)[4], one additional step is needed if OLS/GLS/SMM/II is used: the model must first be solved from its structural form into its reduced form (or “final form”). This is because an equation in a structural model can contain more than one endogenous variable. In contrast, if 2SLS/3SLS/GMM is used, estimation can be conducted on the structural form directly.
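
The two routes can be sketched on a toy two-equation system. The system, its parameter values and the variable names below are all hypothetical, and intercepts are omitted because every variable has mean zero. The structural equation of interest is y1 = beta * y2 + u1, with y2 = gamma * y1 + delta * z + u2 and z exogenous. Route A solves the system into its reduced form, estimates that by OLS, and backs out beta; Route B applies 2SLS to the structural equation directly. With one instrument and one endogenous regressor (the just-identified case) the two routes give the same number, while OLS applied naively to the structural equation is biased.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
beta, gamma, delta = 0.5, 0.4, 1.0     # hypothetical "true" parameters

z = rng.standard_normal(n)             # exogenous variable (the instrument)
u1 = rng.standard_normal(n)
u2 = rng.standard_normal(n)

# Generate data by solving the two structural equations jointly
det = 1.0 - beta * gamma
y2 = (delta * z + gamma * u1 + u2) / det
y1 = beta * y2 + u1

# Naive OLS on the structural equation: biased, because y2 depends on u1
beta_ols = float(y2 @ y1 / (y2 @ y2))

# Route A: estimate the reduced form y1 = pi1*z + v1, y2 = pi2*z + v2 by OLS,
# then recover beta = pi1 / pi2 (indirect least squares)
pi1 = float(z @ y1 / (z @ z))
pi2 = float(z @ y2 / (z @ z))
beta_reduced = pi1 / pi2

# Route B: 2SLS on the structural equation itself
y2_hat = pi2 * z                                       # first stage fitted values
beta_2sls = float(y2_hat @ y1 / (y2_hat @ y2_hat))     # second stage

print("OLS:", beta_ols, "via reduced form:", beta_reduced, "2SLS:", beta_2sls)
```

In an over-identified system the two routes would generally differ, and the reduced-form parameters would no longer map one-to-one into the structural ones.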

Technically, distance-based estimation is applied in much the same way to reduced-form and structural models, apart from having to rewrite the structural model in reduced form and having to satisfy the identification condition. However, because a structural model contains richer information, it does not require all endogenous variables to be observable in estimation, whereas a reduced-form model does. Any unobservables can be implied/estimated from the structural model, as long as the stochastic nonsingularity[5] condition is satisfied. Moreover, SMM/II is also more flexible in utilising fragmentary data: any data feature, such as moments and qualitative restrictions, can be incorporated into the estimation procedure, so the data do not have to be continuous or balanced.
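
As a rough illustration of the last point, the sketch below runs a bare-bones SMM on a series with gaps. The setup is assumed purely for illustration: an AR(1) process with two unknown parameters (persistence rho and shock scale sigma), observations missing at random and marked as NaN, two matched features (variance and first-order autocovariance), an identity weighting matrix, and a grid search instead of a proper optimiser. The features are computed from whatever observations happen to be available, so the series never has to be complete.

```python
import numpy as np

def simulate_ar1(rho, shocks):
    """Simulate y_t = rho * y_{t-1} + e_t with unit-variance shocks, starting at 0."""
    y = np.zeros(len(shocks))
    for t in range(1, len(shocks)):
        y[t] = rho * y[t - 1] + shocks[t]
    return y

def features(y):
    """Variance and first-order autocovariance from the observed points only.

    NaN marks a missing observation; a lag pair contributes only when
    both neighbours are observed.
    """
    d = y - np.nanmean(y)
    return np.array([np.nanmean(d * d), np.nanmean(d[1:] * d[:-1])])

rng = np.random.default_rng(2)
true_rho, true_sigma = 0.7, 2.0          # hypothetical "true" values

# Fragmentary data: roughly 30% of the observations are missing at random
data = true_sigma * simulate_ar1(true_rho, rng.standard_normal(4000))
data[rng.random(4000) < 0.3] = np.nan

target = features(data)
n_params = 2
assert target.size >= n_params           # order condition: features >= parameters

rho_grid = np.linspace(0.0, 0.95, 20)    # step 0.05
sigma_grid = np.linspace(0.5, 4.0, 36)   # step 0.1
sim_shocks = rng.standard_normal(4000)   # held fixed across candidates

best, best_dist = None, np.inf
for rho in rho_grid:
    base = features(simulate_ar1(rho, sim_shocks))   # features when sigma = 1
    for sigma in sigma_grid:
        # Scaling the shocks by sigma scales both features by sigma**2
        gap = sigma ** 2 * base - target
        dist = float(gap @ gap)
        if dist < best_dist:
            best, best_dist = (float(rho), float(sigma)), dist

rho_hat, sigma_hat = best
print("rho:", rho_hat, "sigma:", sigma_hat)
```

The simulated series is fully observed while the data are not; comparing their features is only reasonable here because the gaps are assumed to be unrelated to the values themselves.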

Abbreviations: VARMA: vector autoregressive moving average; VECM: vector error correction model; SEM: structural equation model; DSGE: dynamic stochastic general equilibrium; OLS: ordinary least squares; GLS: generalised least squares; 2SLS: two-stage least squares; 3SLS: three-stage least squares; GMM: generalised method of moments; SMM: simulated method of moments; II: indirect inference.

[1] Distribution-based estimators are said to be more efficient because they use the full information in the data.

[2] For 2SLS and 3SLS, the moment condition is the orthogonality (zero correlation) between the error term and the instruments. For GMM/SMM/II, the moment conditions can be any statistical moments, correlations, autocorrelations, impulse response functions or any “auxiliary regression” based on the dependent variables. In a broad sense, 2SLS, 3SLS, GMM and SMM can be treated as special cases of II.

[3] The number of moment conditions must be no less than the number of parameters to be estimated. In the case of 2SLS, this identification condition is equivalent to requiring that the number of excluded instruments be no less than the number of endogenous regressors: the included exogenous regressors act as their own instruments, so they add equally to the number of moment conditions and to the number of parameters and cancel out of the comparison.

[4] The equations in an SEM are assumed (ad hoc), while those in a DSGE model are derived from microeconomic principles such as optimisation and equilibrium (microfounded). Also, an SEM is a simultaneous equation system, while a DSGE model is dynamic.

[5] The number of errors (shocks) in the model must be no less than the number of observables.