Is your data free from the outlier problem?
Stata Command: testout
Stata Command: testout.ado
Diagnostic testing of outliers by statistical tests of the bounded first- and second-moment conditions for consistency and root-n asymptotic normality, respectively. A rejection of the bounded first-moment condition for consistency implies that the point estimates reported by regress or ivregress are not credible. A rejection of the bounded second-moment condition for root-n asymptotic normality implies that the standard errors reported by regress or ivregress are not credible. The standard empirical procedures based on point estimates, standard errors, t ratios, confidence intervals, and p-values are justified only if both tests fail to reject.
When geographical units (e.g., states) are used as clusters, the cluster sum of the score is likely to fail to have finite moments. Hence, it is strongly recommended to run testout with the cluster() option whenever geographical units are considered as cluster units. See my slides presented at the 2023 Stata Symposium. See these slides for remedies in case the testout command rejects the null hypotheses.
The Stata output below indicates that, for the main regression of an empirical paper published in the Quarterly Journal of Economics in 2008, the point estimates are credible but the standard errors may not be.
Installation:
. ssc install testout
Usage:
. testout y x
. testout y x, iv(z)
. testout y x, cluster(state)
. testout y x, iv(z) cluster(state)
Help:
. help testout
Reference: Sasaki, Y. & Y. Wang (2021) Diagnostic Testing of Finite Moment Conditions for the Consistency and Root-N Asymptotic Normality of the GMM and M Estimators. Journal of Business & Economic Statistics (forthcoming).
Download the manuscript and package in preparation for The Stata Journal (not submitted yet)
testout -- Executes diagnostic testing of outliers by statistical tests of the bounded first- and second-moment conditions for credible point estimates and credible standard errors, respectively.
Syntax
testout y x1 x2 ... [if] [in] [, iv(varlist) cluster(varname) aweight(varname) pweight(varname) k(real) alpha(real) maxw(real) prec(real)]
Description
testout executes diagnostic testing of outliers by statistical tests of the bounded first- and second-moment conditions for consistency and root-n asymptotic normality, based on Sasaki & Wang (2021). The command takes as input a scalar dependent variable y and a list of independent variables x1, x2, ... Instruments are optional: if the iv option is invoked, x1 is treated as an endogenous variable. The output includes the p-values of the two tests and indicators of whether each test rejects its null hypothesis at the significance level alpha. If the test rejects the null hypothesis of the bounded first-moment condition for consistency, then the point estimates of reg (or ivreg when the iv option is invoked) are unreliable. If the test rejects the null hypothesis of the bounded second-moment condition for root-n asymptotic normality, then the standard errors of reg (or ivreg when the iv option is invoked) are unreliable.
Options
iv(varlist) sets instrumental variable(s). If this option is not invoked, then the command executes the outlier tests for the OLS (reg). If this option is invoked, then the command executes the outlier tests for the 2SLS (ivreg) where the first independent variable x1 is treated as an endogenous variable instrumented by the list of variables included in iv.
cluster(varname) sets a clustering variable. If this option is not invoked, then the command does not cluster observations.
aweight(varname) sets analytic weights. If this option is not invoked, then the command does not impose analytic weights.
pweight(varname) sets probability weights. If this option is not invoked, then the command does not impose probability weights.
k(real) sets the number of extreme order statistics to be used for the tests. This number has to be an integer no smaller than 3. If this option is not invoked, k is automatically set to five percent of the sample size.
alpha(real) sets the significance level of the statistical tests. The default value is alpha(0.05).
maxw(real) sets the maximum value of the set of the tail index in the composite alternative over which the likelihood function in the numerator of the likelihood ratio test statistic is integrated with respect to the Lebesgue measure. The default value is maxw(2).
prec(real) sets the precision of approximating the null distribution of the test statistic by simulations. Setting this number to a smaller value results in more accurate tests and p-values at the expense of a longer execution time of the command. The default value is prec(0.00025).
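As a sketch of how the options above combine, the following lines run testout on the nlsw88 example data that ships with Stata. The dataset, the variables (wage, ttl_exp, age), and the option values are illustrative assumptions and do not come from the original help file; only the option names are taken from the syntax above. The sketch also assumes testout has already been installed.

```stata
* Illustrative sketch: nlsw88 and its variables are assumptions chosen for convenience
sysuse nlsw88, clear

* Default settings: k is five percent of the sample size and alpha is 0.05
testout wage ttl_exp age

* Use 50 extreme order statistics and test at the 1% level instead
testout wage ttl_exp age, k(50) alpha(0.01)
```

Comparing the two runs gives a rough sense of how sensitive the conclusions are to the choice of k.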
Examples
Consider the following regression:
. use "tip_chicago.dta"
. regress wh_pop_change indicat min* income vacant rent single public
The outlier tests for this regression can be implemented by:
. testout wh_pop_change indicat min* income vacant rent single public
We can repeat these lines for another city:
. use "tip_washington.dta"
. regress wh_pop_change indicat min* income vacant rent single public
. testout wh_pop_change indicat min* income vacant rent single public
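Because the regression and the outlier tests must use exactly the same specification, one way to keep them in sync is to store the variable list in a local macro. The sketch below assumes the nlsw88 example data that ships with Stata and an arbitrary specification (wage on ttl_exp and age); neither appears in the original examples, and the macro name spec is hypothetical.

```stata
* Illustrative sketch: dataset and specification are assumptions, not the paper's
sysuse nlsw88, clear

* Store the specification once so regress and testout cannot drift apart
local spec wage ttl_exp age

* Fit the regression of interest
regress `spec'

* Run the outlier tests on the identical specification
testout `spec'
```

Run these lines together (for example, from a do-file) so that the local macro is still defined when testout is called.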
Stored results
testout stores the following in r():
Scalars
    r(N)        observations
    r(G)        cluster size
    r(k)        number of extreme order statistics
    r(alpha)    level of statistical significance
    r(pval1)    p-value for the test of consistency
    r(pval2)    p-value for the test of root-n normality
Macros
    r(cmd)      testout
    r(mtd)      reg or ivreg
    r(test1)    result of the test of consistency
    r(test2)    result of the test of root-n normality
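The stored results can be used to automate the decision rule described earlier. The following is a minimal sketch, assuming the nlsw88 example data that ships with Stata and assuming that a test rejects when its p-value falls below r(alpha); the scalar names p_consistency, p_normality, and level are hypothetical.

```stata
* Illustrative sketch: dataset and variables are assumptions chosen for convenience
sysuse nlsw88, clear
testout wage ttl_exp age

* Copy the results before another r-class command overwrites r()
scalar p_consistency = r(pval1)
scalar p_normality   = r(pval2)
scalar level         = r(alpha)

* Show everything testout left in r()
return list

display "p-value, test of consistency:      " p_consistency
display "p-value, test of root-n normality: " p_normality

* Assumed decision rule: reject when the p-value is below the significance level
if (p_consistency < level) {
    display "First-moment condition rejected: point estimates may be unreliable."
}
else if (p_normality < level) {
    display "Second-moment condition rejected: standard errors may be unreliable."
}
else {
    display "Neither test rejects at the chosen level."
}
```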
Reference
Y. Sasaki & Y. Wang. 2021. Diagnostic Testing of Finite Moment Conditions for the Consistency and Root-N Asymptotic Normality of the GMM and M Estimators, Journal of Business & Economic Statistics, forthcoming.
Authors
Yuya Sasaki, Vanderbilt University, Nashville, TN.
Yulong Wang, Syracuse University, Syracuse, NY.