Job market paper:
The number of parameters in a standard multinomial choice model increases linearly with both the number of choice alternatives and the number of explanatory variables. Since many applications involve large choice sets with categorical explanatory variables, which enter the model as large sets of binary dummies, the number of parameters easily approaches the sample size. This paper proposes methods for data-driven parameter clustering over outcome categories and explanatory dummy categories in a multinomial probit setting. A Dirichlet process mixture encourages parameters to cluster over the categories, which favours a parsimonious model specification without imposing model restrictions. Simulation studies and an application to a choice dataset of holiday destinations show lower parameter uncertainty and improved parameter interpretability relative to a standard multinomial choice model.
Keywords: large choice sets, Dirichlet process prior, multinomial probit model, high-dimensional models
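To illustrate why a Dirichlet process prior tends to group categories, the following minimal sketch draws cluster labels from a Chinese restaurant process, the standard partition representation of a Dirichlet process. It is not the sampler used in the paper. The function name `sample_crp_labels`, the concentration parameter `alpha`, and the choice of 30 categories are assumptions made only for illustration.

```python
import random

def sample_crp_labels(n_items, alpha, rng):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    alpha > 0 is the (assumed) concentration parameter: smaller values
    favour fewer, larger clusters.
    """
    labels = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels.append(k)
    return labels

rng = random.Random(0)
labels = sample_crp_labels(n_items=30, alpha=1.0, rng=rng)
n_clusters = len(set(labels))
print(f"{n_clusters} clusters for 30 categories")
```

Categories that share a label would share a parameter value, so the number of distinct parameters is typically much smaller than the number of categories, while no particular grouping is imposed in advance.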
We introduce an asymptotically unbiased estimator for the full high-dimensional parameter vector in linear regression models where the number of variables exceeds the number of available observations. The estimator is accompanied by a closed-form expression for the covariance matrix of the estimates that is free of tuning parameters. This enables the construction of confidence intervals that are valid uniformly over the parameter vector. Estimates are obtained by using a scaled Moore-Penrose pseudoinverse as an approximate inverse of the singular empirical covariance matrix of the regressors. The approximation induces a bias, which is then corrected for using the lasso. Regularization of the pseudoinverse is shown to yield narrower confidence intervals under a suitable choice of the regularization parameter. The methods are illustrated in Monte Carlo experiments and in an empirical example where gross domestic product is explained by a large number of macroeconomic and financial indicators.
Random subspace methods are a novel approach to obtaining accurate forecasts in high-dimensional regression settings. Forecasts are constructed from random subsets of predictors or randomly weighted predictors. We provide a theoretical justification for these strategies by deriving bounds on their asymptotic mean squared forecast error, which are informative about the scenarios in which the methods work well. Monte Carlo simulations confirm the theoretical findings and show improvements in predictive accuracy relative to widely used benchmarks. The predictive accuracy on monthly macroeconomic FRED-MD data increases substantially, with random subspace methods outperforming all competing methods for at least 66% of the series.
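The random subset idea can be sketched in a few lines. The example below covers only the variant that uses random subsets of predictors, not the randomly weighted variant. It fits least squares on each random subset and averages the resulting forecasts with equal weights. The function names, the subset size `k`, the number of draws, and the simulated data are all assumptions for illustration and are not taken from the paper.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(x_rows, y):
    """Least squares coefficients via the normal equations (no intercept)."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)

def random_subset_forecast(x_rows, y, x_new, k, n_draws, rng):
    """Average the forecasts of n_draws regressions on k random predictors."""
    p = len(x_new)
    forecasts = []
    for _ in range(n_draws):
        subset = rng.sample(range(p), k)  # k distinct predictor indices
        beta = ols([[r[j] for j in subset] for r in x_rows], y)
        forecasts.append(sum(b * x_new[j] for b, j in zip(beta, subset)))
    return sum(forecasts) / n_draws

# Simulated example with more predictors (60) than observations (40).
rng = random.Random(1)
n_obs, n_pred = 40, 60
x_rows = [[rng.gauss(0, 1) for _ in range(n_pred)] for _ in range(n_obs)]
y = [0.5 * sum(r[:10]) + rng.gauss(0, 1) for r in x_rows]
x_new = [rng.gauss(0, 1) for _ in range(n_pred)]
forecast = random_subset_forecast(x_rows, y, x_new, k=5, n_draws=200, rng=rng)
print(forecast)
```

Each individual regression uses only `k` predictors, so it remains well defined even though the full regression with 60 predictors and 40 observations is not. Averaging over many draws is what lets every predictor contribute to the final forecast.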
- A Bayesian Infinite Hidden Markov Vector Autoregressive Model, with Richard Paap and Michel van der Wel
We propose a Bayesian infinite hidden Markov model to estimate time-varying parameters in a vector autoregressive model. The Markov structure allows for heterogeneity over time while accounting for state persistence. By modelling the transition distribution as a Dirichlet process mixture model, parameters can vary over a potentially infinite number of regimes. The Dirichlet process, however, favours a parsimonious model without imposing restrictions on the parameter space. An empirical application demonstrates the ability of the model to capture both smooth and abrupt parameter changes over time, and a real-time forecasting exercise shows excellent predictive performance even in large-dimensional VARs.
In this paper we study what professional forecasters predict. We use spectral analysis and state space modeling to decompose economic time series into a trend, a business-cycle, and an irregular component. To examine which components are captured by professional forecasters, we regress their forecasts on the estimated components extracted from both the spectral analysis and the state space model. For both decomposition methods we find that, in the short run, the Survey of Professional Forecasters can predict almost all variation in the time series due to the trend and the business cycle, but the forecasts contain little or no significant information about the variation in the irregular component.