Research

In the presence of normal error terms, I show that the second-order bootstrap (SOB) performs comparably to the wild bootstrap in terms of size and considerably better in terms of power. However, this superior performance may be due to making use of information about the data-generating process (DGP) that practitioners seldom have. Indeed, this study finds that the more the distribution of the error terms deviates from normality, the worse the SOB performs relative to the wild bootstrap. This study also shows that the choice of two-point distribution used in the wild bootstrap DGP has an enormous effect on both size and power. Quite unexpectedly, tests based on the Mammen distribution---which takes explicit account of skewness---actually perform substantially worse than those based on the symmetric Rademacher distribution, even in the presence of severe skewness and kurtosis. These results corroborate earlier findings that the Mammen distribution has little to recommend it, and that the Rademacher-based variant of the wild bootstrap is to be preferred in practice.
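For readers unfamiliar with the two auxiliary distributions compared above, the sketch below shows how each enters a wild bootstrap DGP of the form y* = Xb + v*u, where u are the restricted residuals and v* is an i.i.d. two-point draw. This is a generic illustration under a toy regression, not the paper's simulation design, and all variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def rademacher(n):
    # Symmetric two-point distribution: +/-1 with equal probability
    # (zero mean, unit variance, zero third moment).
    return rng.choice([-1.0, 1.0], size=n)

def mammen(n):
    # Mammen's asymmetric two-point distribution
    # (zero mean, unit variance, third moment equal to one).
    a = (np.sqrt(5.0) - 1.0) / 2.0                      # ~0.618
    b = (np.sqrt(5.0) + 1.0) / 2.0                      # ~1.618
    p = (np.sqrt(5.0) + 1.0) / (2.0 * np.sqrt(5.0))     # P(v = -a) ~0.724
    return rng.choice([-a, b], size=n, p=[p, 1.0 - p])

# Toy regression and one wild bootstrap draw of the dependent variable:
# y* = X beta_hat + v* . u_hat, with v* i.i.d. from a two-point distribution.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat

y_star_rademacher = X @ beta_hat + rademacher(n) * u_hat
y_star_mammen = X @ beta_hat + mammen(n) * u_hat
```

Both distributions preserve the mean and variance of the residuals; they differ only in whether the third moment of the residuals is preserved (Mammen) or symmetrized away (Rademacher), which is precisely the margin the size and power comparisons above turn on.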

Draft

Out-of-sample predictive performance tests were originally intended for forecast comparison and remain valid for that purpose. However, the last two decades have seen a boom in research on the uses of out-of-sample tests for purposes besides forecast comparison. The wisdom of using such out-of-sample tests when full-sample alternatives are available has come under scrutiny in recent years. Using Monte Carlo simulations, this paper demonstrates that a recently proposed out-of-sample procedure for model selection performs markedly worse than its full-sample counterpart in terms of both statistical size and power. This paper also reproduces an applied example in which the conclusions are sensitive to the choice between the out-of-sample and the full-sample procedure.
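The size and power comparisons above rest on the standard Monte Carlo logic of counting rejections across simulated samples. The sketch below shows that harness with a generic one-sample t-test as the stand-in procedure; it is not the paper's out-of-sample or full-sample test, and the function names are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def rejection_rate(test_fn, dgp_fn, n_reps=2000, alpha=0.05):
    # Empirical rejection frequency across Monte Carlo replications:
    # the size of the test when dgp_fn satisfies the null hypothesis,
    # and the power when it does not.
    rejections = 0
    for _ in range(n_reps):
        sample = dgp_fn(rng)
        rejections += test_fn(sample) < alpha
    return rejections / n_reps

# Illustration with a one-sample t-test of H0: mean = 0.
size = rejection_rate(
    lambda x: stats.ttest_1samp(x, 0.0).pvalue,
    lambda g: g.normal(0.0, 1.0, size=100),   # null true  -> empirical size
)
power = rejection_rate(
    lambda x: stats.ttest_1samp(x, 0.0).pvalue,
    lambda g: g.normal(0.3, 1.0, size=100),   # null false -> empirical power
)
print(size, power)
```

Swapping the t-test for an out-of-sample or full-sample model selection procedure, and the normal DGP for the designs studied in the paper, gives the kind of comparison reported there.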

Draft

(Job market paper)

The racial integration of the U.S. Army during the Korean War (1950-1953) is one of the largest and swiftest desegregation episodes in American history. Integration began in an effort to reinforce badly depleted all-white units, and went on to become Army-wide policy for reasons of military efficiency. The first part of this paper evaluates whether the Army achieved its goal of improving efficiency as measured by the survival rates of wounded soldiers. Using casualty data, I develop a novel wartime integration measure to quantify exogenous changes in racial integration over time and across regiments. With a modified difference-in-differences strategy, I find that a one standard deviation increase in regimental integration improved overall casualty survival rates by 3%. The second part of the paper explores the effects of wartime racial integration on the prejudicial attitudes of veterans after the war. To do so, I link individual soldiers to post-war Social Security and cemetery data using an unsupervised learning algorithm. With these linked samples, I show that a one standard deviation increase in wartime racial integration caused white veterans to live in more racially diverse neighborhoods and marry spouses with less distinctively white names. These results provide suggestive evidence that large-scale interracial contact reduces prejudice over the long run.
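To fix ideas on the difference-in-differences design described above, the sketch below estimates a generic two-way fixed-effects regression of a survival outcome on an integration measure with regiment and time fixed effects, clustering by regiment. The data are simulated and the variable names hypothetical; this is not the paper's exact specification or sample.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Toy panel: regiments observed over months, with a simulated integration
# share and casualty survival rate. All variable names are hypothetical.
n_reg, n_month = 20, 12
df = pd.DataFrame(
    [(r, m) for r in range(n_reg) for m in range(n_month)],
    columns=["regiment", "month"],
)
df["integration"] = rng.uniform(0.0, 0.3, size=len(df))
df["survival_rate"] = (
    0.7 + 0.1 * df["integration"] + rng.normal(0.0, 0.05, size=len(df))
)

# Two-way fixed-effects regression in the spirit of a difference-in-differences
# design: regiment and month fixed effects absorb level differences, so the
# coefficient on `integration` is identified from within-regiment changes.
model = smf.ols("survival_rate ~ integration + C(regiment) + C(month)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["regiment"]})
print(result.params["integration"])
```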

Draft