Is your data free from the outlier problem?

Stata Command: testout

Stata Command: testout.ado

Diagnostic testing of outliers by statistical tests of the bounded first- and second-moment conditions for consistency and root-n asymptotic normality, respectively. A rejection of the test of the bounded first-moment condition for consistency implies that the point estimates reported by regress or ivregress are not credible. A rejection of the test of the bounded second-moment condition for root-n asymptotic normality implies that the standard errors reported by regress or ivregress are not credible. To justify the standard empirical procedures based on point estimates, standard errors, t ratios, confidence intervals, and p-values, it is imperative that researchers fail to reject both tests.

When geographical units (e.g., states) are used as clusters, it is likely that the cluster sum of the score fails to have finite moments. Hence, it is strongly recommended to run the testout command with the cluster() option whenever a researcher considers using geographical units as cluster units. My slides presented at the 2023 Stata Symposium discuss remedies for the case in which the testout command rejects the null hypotheses.

The Stata output below indicates that, for the main regression of an empirical paper published in the Quarterly Journal of Economics in 2008, the point estimates are credible but the standard errors may not be.

Installation:    

    . ssc install testout

Usage:

    . testout y x

    . testout y x, iv(z)

    . testout y x, cluster(state)

    . testout y x, iv(z) cluster(state)

Help:    

    . help testout

Reference: Sasaki, Y. & Y. Wang (2021). Diagnostic Testing of Finite Moment Conditions for the Consistency and Root-N Asymptotic Normality of the GMM and M Estimators. Journal of Business & Economic Statistics (forthcoming). Paper.

Download the manuscript and package in preparation for The Stata Journal (not submitted yet)

Title
    testout -- Executes diagnostic testing of outliers by statistical tests
        of the bounded first- and second-moment conditions for credible point
        estimates and credible standard errors, respectively.

Syntax
    testout y x1 x2 ...  [if] [in] [, iv(varlist) cluster(varname)
                 aweight(varname) pweight(varname) k(real) alpha(real)
                 maxw(real) prec(real)]

Description
    testout executes diagnostic testing of outliers by statistical tests of
        the bounded first- and second-moment conditions for consistency and
        root-n asymptotic normality based on Sasaki & Wang (2021).  The
        command takes as input a scalar dependent variable y and a list of
        independent variables x1, x2, ...  The iv option is optional; when
        it is invoked, x1 is treated as an endogenous variable.  The
        output of the command includes the p-values of the tests and
        indicators of whether the tests reject the null hypotheses at the
        significance level alpha.  If the test rejects the null hypothesis of
        the bounded first-moment condition for consistency, then the point
        estimates of reg (or ivreg when the option iv is invoked) are
        unreliable.  If the test rejects the null hypothesis of the bounded
        second-moment condition for root-n asymptotic normality, then the
        standard errors of reg (or ivreg when the option iv is invoked) are
        unreliable.

Options
    iv(varlist) sets instrumental variable(s).  If this option is not invoked,
        then the command executes the outlier tests for OLS (reg).  If
        this option is invoked, then the command executes the outlier tests
        for 2SLS (ivreg), where the first independent variable x1 is
        treated as an endogenous variable instrumented by the list of
        variables included in iv.
    cluster(varname) sets a clustering variable.  If this option is not
        invoked, then the command does not cluster observations.
    aweight(varname) sets analytic weights.  If this option is not invoked,
        then the command does not impose analytic weights.
    pweight(varname) sets probability weights.  If this option is not invoked,
        then the command does not impose probability weights.
    k(real) sets the number of extreme order statistics to be used for the
        tests.  This number has to be an integer no smaller than 3.  Not
        invoking this option will automatically set k to five percent of
        the sample size.
    alpha(real) sets the significance level of the statistical tests.  The
        default value is alpha(0.05).
    maxw(real) sets the maximum value of the set of the tail index in the
        composite alternative over which the likelihood function in the
        numerator of the likelihood ratio test statistic is integrated with
        respect to the Lebesgue measure.  The default value is maxw(2).
    prec(real) sets the precision of approximating the null distribution of
        the test statistic by simulations.  Setting this number to a smaller
        value results in more accurate tests and p-values at the expense of a
        longer execution time of the command.  The default value is
        prec(0.00025).
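
    The options above can be combined in one call.  The do-file fragment
    below is only a sketch: y, x, z, and state are the placeholder variable
    names from the Usage lines, and the values k(50) and alpha(0.01) are
    arbitrary illustrations, not recommendations.

```stata
* Placeholder variables: y (outcome), x (endogenous regressor),
* z (instrument), state (cluster identifier).
* k(50) and alpha(0.01) are arbitrary illustrative values; k must be
* an integer no smaller than 3.
testout y x, iv(z) cluster(state) k(50) alpha(0.01)
```

    Options that are left out, here maxw() and prec(), keep the defaults
    listed above.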

Examples
    Consider the following regression:
    . use "tip_chicago.dta"
    . regress wh_pop_change indicat min* income vacant rent single public
    The outlier tests for this regression can be implemented by:
    . testout wh_pop_change indicat min* income vacant rent single public
    We can repeat these lines for another city:
    . use "tip_washington.dta"
    . regress wh_pop_change indicat min* income vacant rent single public
    . testout wh_pop_change indicat min* income vacant rent single public

Stored results
    testout stores the following in r():
    Scalars
        r(N)           observations
        r(G)           cluster size
        r(k)           number of extreme order statistics
        r(alpha)       level of statistical significance
        r(pval1)       p-value for the test of consistency
        r(pval2)       p-value for the test of root-n normality
    Macros
        r(cmd)         testout
        r(mtd)         reg or ivreg
        r(test1)       result of the test of consistency
        r(test2)       result of the test of root-n normality
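
    As a minimal sketch of how these stored results might be used in a
    do-file, the fragment below copies the p-values into locals right after
    the call and compares them with the stored significance level.  The
    variables y, x, and state are placeholders, and the messages are
    illustrative wording rather than output of testout itself.

```stata
* Placeholder variables: y, x, state.
testout y x, cluster(state)

* Copy the results immediately; a later r-class command would
* overwrite r().
local p1 = r(pval1)
local p2 = r(pval2)
local a  = r(alpha)

display "p-value, test of consistency:      " `p1'
display "p-value, test of root-n normality: " `p2'

* Rejection of the first-moment test casts doubt on the point estimates;
* rejection of the second-moment test casts doubt on the standard errors.
if (`p1' < `a') display "Point estimates may not be credible."
if (`p2' < `a') display "Standard errors may not be credible."
```

    Running return list directly after testout shows everything currently
    held in r(), including the macros r(test1) and r(test2).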

Reference
    Y. Sasaki & Y. Wang. 2021. Diagnostic Testing of Finite Moment Conditions
        for the Consistency and Root-N Asymptotic Normality of the GMM and M
        Estimators, Journal of Business & Economic Statistics, forthcoming.
        Link to Paper.

Authors
    Yuya Sasaki, Vanderbilt University, Nashville, TN.
    Yulong Wang, Syracuse University, Syracuse, NY.