Figure: U.S. economic recession forecasts by our method
“Predicting Discrete Outcomes Using Many Highly Correlated Predictors”
with Tae-Hwy Lee
[Presented at The 33rd Annual Meeting of the Midwest Econometrics Group at the Federal Reserve Bank of Cleveland, Ohio, Nov 2023]
[Accepted to present at The International Association for Applied Econometrics Annual Conference, University of Turin, Italy, Jun 2025]
[Accepted to present at The 13th World Congress of the Econometric Society, Seoul, Korea]
This paper studies binary outcome prediction using Linear Discriminant Analysis (LDA), a widely used classification method with a strong record across many applications. However, LDA runs into difficulties in high-dimensional classification, particularly when the sample covariance matrix is singular. Existing remedies such as Naive Bayes (NB) and Sparse LDA have limitations, especially in economic and financial contexts where dense covariance matrices are prevalent. To address this, we propose the Factor Adjusted Naive Bayes (FANB) method, which disentangles highly correlated predictors into a new set comprising latent common factors and idiosyncratic components and then applies the NB rule. Recognizing that only some factors and idiosyncratic components hold predictive power for a particular outcome, we introduce FANB AdaBoost for variable selection. In addition, we introduce Markov Chain FANB (MCFANB) and MCFANB AdaBoost for time series data, which capture the serial dependence of discrete outcomes by conditioning on the current state and the transition probabilities. Theoretical analysis establishes the consistency of the method, and simulations show lower out-of-sample classification error than competing methods. Applying these methods to forecast U.S. economic recessions using FRED-MD data yields high forecast accuracy across various horizons. These results confirm the benefit of using both factors and idiosyncratic components in classification, rather than the original variables or the common factors alone.
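The factor-adjust-then-NB step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: factors are taken from PCA on the pooled sample covariance with a known number of factors `k`, the NB rule is a Gaussian NB with a pooled diagonal variance, and all function names are hypothetical. FANB AdaBoost and the Markov chain extension are omitted.

```python
import numpy as np

def fanb_fit_transform(X, k):
    """Hypothetical helper: estimate k PCA factors and the idiosyncratic remainder.

    Assumes factors come from PCA on the pooled sample covariance (one common
    choice; the paper's estimator may differ) and that k is known.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / len(X)
    _, eigvec = np.linalg.eigh(cov)          # eigenvalues come in ascending order
    B = eigvec[:, ::-1][:, :k]               # p x k loadings: top-k eigenvectors
    return fanb_transform(X, mu, B), mu, B

def fanb_transform(X, mu, B):
    """Map predictors to [factors, idiosyncratic components] using fitted mu, B."""
    Xc = X - mu
    F = Xc @ B                               # n x k factor estimates
    U = Xc - F @ B.T                         # n x p idiosyncratic components
    return np.hstack([F, U])

def nb_fit(Z, y):
    """Gaussian Naive Bayes with a pooled diagonal variance (the independence rule)."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    resid = Z - means[np.searchsorted(classes, y)]
    var = resid.var(axis=0) + 1e-9           # guard against zero variance
    log_prior = np.log([np.mean(y == c) for c in classes])
    return classes, means, var, log_prior

def nb_predict(Z, model):
    classes, means, var, log_prior = model
    # Class log-likelihood up to a constant that is common to all classes
    ll = -0.5 * (((Z[:, None, :] - means[None, :, :]) ** 2) / var).sum(axis=2)
    return classes[np.argmax(ll + log_prior, axis=1)]
```

The loadings `B` and mean `mu` estimated on the training sample are reused through `fanb_transform` for out-of-sample predictors, so test data are decomposed in the same factor space.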
Figure: Five-state mortgage loan performance forecasts by our method vs. the LASSO
“Multi-class Classification with Application to Forecasting Mortgage Loan Delinquencies” with Tae-Hwy Lee
We develop a multi-class classification method, the Multi-Class Markov Chain Factor Adjusted Naive Bayes AdaBoost, to handle correlated predictors in multi-class classification. This method is equivalent to fitting a forward stagewise additive model using a multi-class exponential loss function with Multi-Class Markov Chain Factor Adjusted Naive Bayes as the base learner. Extensive Monte Carlo simulations show that the proposed method classifies better than other classification methods. We apply it to classifying mortgage loan delinquencies into five states: current, 30+ days delinquent, 60+ days delinquent, 90+ days delinquent, and other later stages (such as foreclosure, real estate-owned properties, and loans that are paid off). Our five-state mortgage loan forecasting study spans multiple economic cycles (both pre- and post-COVID-19) and uses a unique dataset, provided by Fannie Mae, containing origination and monthly performance records for over 23 million mortgages from October 2013 to May 2023. The dataset is further enriched with extensive state-level and national-level economic data. Our method handles high-dimensional, highly correlated predictors and incorporates the transition matrix of the Markov chain, which represents the month-by-month progression of loans between delinquency states. Empirical results show that our approach achieves high accuracy in forecasting both majority and minority classes, outperforming the LASSO.
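The transition matrix can be estimated by counting month-to-month moves between states. The sketch below assumes the five states are integer-coded (0 = current, 1 = 30+, 2 = 60+, 3 = 90+, 4 = other) and uses the first-order maximum-likelihood estimator. Treating the transition row of the current state as the class prior in a Naive Bayes posterior is an illustrative assumption about how the pieces combine, and the helper names are hypothetical.

```python
import numpy as np

def estimate_transition_matrix(paths, n_states=5):
    """MLE of a first-order Markov transition matrix from integer-coded state paths.

    P[i, j] = (# transitions i -> j) / (# transitions out of i).
    Rows with no observed exits are left as zeros (an assumption; smoothing is an option).
    """
    counts = np.zeros((n_states, n_states))
    for path in paths:
        for a, b in zip(path[:-1], path[1:]):
            counts[a, b] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def markov_posterior(prev_state, class_loglik, P):
    """Hypothetical combination rule: transition row P[prev_state] acts as the prior."""
    log_post = np.log(P[prev_state] + 1e-12) + class_loglik
    log_post -= log_post.max()               # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

In a forecasting setting, `class_loglik` would come from the factor-adjusted NB base learner evaluated on the current predictors.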
“Quadratic Discriminant Analysis Class-dependent Factor Adjusted”
with Tae-Hwy Lee
This paper studies discrete outcome prediction using Quadratic Discriminant Analysis (QDA), in which the predictors are normally distributed and may have class-specific covariance matrices, unlike Linear Discriminant Analysis (LDA), which assumes a common covariance matrix for the two classes. When the predictors are highly correlated and admit a factor model structure, we handle this correlation with a factor-adjusted procedure, which transforms the highly correlated predictors into a new set of predictors comprising latent common factors and idiosyncratic components and then applies the independence rule to these weakly correlated predictors. However, the conventional framework for factor estimation applies PCA to the common (unconditional) covariance matrix and typically assumes that factor loadings remain constant over an extended period. We therefore introduce a class-dependent factor-adjusted procedure that applies PCA to the class-specific covariance matrices and allows the loadings to vary with the class. We propose the class-dependent factor-adjusted Naive Bayes QDA (FANB-QDA) method for classification and demonstrate its performance in extensive simulations and in an application forecasting the direction (up/down) of the S&P 500 index using individual stock returns.
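One way to read the class-dependent factor adjustment: if, within class c, x = μ_c + B_c f + u with independent factors and idiosyncratic components that each have diagonal covariances, then Σ_c = B_c Λ_c B_c' + D_c, which can be plugged into the QDA discriminant. The sketch below follows that reading with PCA on each class's sample covariance. It is an illustrative assumption rather than the exact FANB-QDA estimator, and the function names are hypothetical.

```python
import numpy as np

def class_factor_cov(Xc, k):
    """Low-rank-plus-diagonal estimate B Λ B' + D of one class's covariance (PCA-based)."""
    S = np.cov(Xc, rowvar=False)
    eigval, eigvec = np.linalg.eigh(S)
    idx = np.argsort(eigval)[::-1][:k]        # top-k eigenpairs of the class covariance
    B, lam = eigvec[:, idx], eigval[idx]      # class-specific loadings and factor variances
    low_rank = (B * lam) @ B.T
    D = np.clip(np.diag(S - low_rank), 1e-6, None)  # idiosyncratic variances
    return low_rank + np.diag(D)

def fit_cd_qda(X, y, k):
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        Sigma = class_factor_cov(Xc, k)
        _, logdet = np.linalg.slogdet(Sigma)
        model[c] = (Xc.mean(axis=0), np.linalg.inv(Sigma), logdet, np.log(len(Xc) / len(X)))
    return model

def predict_cd_qda(X, model):
    classes = list(model)
    scores = []
    for c in classes:
        mu, Sinv, logdet, logprior = model[c]
        d = X - mu
        # Quadratic discriminant: -0.5 d' Σ_c^{-1} d - 0.5 log|Σ_c| + log prior
        scores.append(-0.5 * np.einsum("ij,jk,ik->i", d, Sinv, d) - 0.5 * logdet + logprior)
    return np.array(classes)[np.argmax(np.column_stack(scores), axis=1)]
```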
"Predictability of Real Estate Private Equity Returns via Machine Learning" with William Hughes
Assessed the ability of Machine Learning (ML) methods to improve forecasts of private Commercial Real Estate returns, with interpretations (causal inference) for market upturns and downturns.
Used ML methods (Random Forest, XGBoost, Principal Component Regression, Facebook Prophet, and Neural Networks) to draw insights from a large dataset of over 5,000 predictors spanning 1978 through 2024 from the National Council of Real Estate Investment Fiduciaries (NCREIF) and the Federal Reserve Bank.
XGBoost reduced the forecasting error by 68% relative to simple regression and by 26% relative to multivariate regression.
"Forecasting Real Estate Returns with Quantile Factor Model" with Pedro Isaac Chavez-Lopez
Employed a novel factor model for quantile regression to forecast the conditional quantiles of Real Estate Investment Trust (REIT) returns; the method also selects the relevant factors from a large dataset of predictors.
Conducted out-of-sample analysis on a large panel of monthly CRSP returns (more than 32,000 stocks, 1963 to December 2024) for all REITs and stocks listed on the NYSE, AMEX, and NASDAQ.
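As a simpler stand-in for the quantile factor model, a two-step factor-augmented quantile regression can be sketched: extract PCA factors from standardized predictors, then fit a linear quantile regression of the target on the factors by solving the check-loss minimization as a linear program with SciPy's `linprog`. This is not the project's estimator: it omits factor selection and the quantile-specific factors that a quantile factor model allows, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def pca_factors(X, k):
    """First k principal-component factors of the standardized predictors."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :k] * s[:k]

def quantile_regression(Z, y, tau):
    """Linear quantile regression via its LP form.

    min  tau * 1'u + (1 - tau) * 1'v   s.t.  Z b + u - v = y,  u, v >= 0,  b free,
    so u and v are the positive and negative parts of the residual (check loss).
    """
    n, d = Z.shape
    c = np.concatenate([np.zeros(d), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * d + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:d]
```

For forecasting, the rows of the design matrix would hold an intercept and the factors dated t, and the target would be the return at t+1.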
"Binary Forecast Combination with Boosting" with Tae-Hwy Lee
Developed a forecast combination method for binary outcomes using a “super” AdaBoost that nests the competing forecasting models by treating them as its base learners.
Extended the method to combine probability forecasts using the Real AdaBoost algorithm.
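The “super” AdaBoost idea can be sketched with discrete AdaBoost over a fixed pool of binary forecasts. The sketch assumes each competing model supplies a {-1, +1} forecast vector on the training sample, so at each round AdaBoost simply selects the model with the lowest weighted error and increases the weight of the periods it missed. Function names are hypothetical, and the Real AdaBoost extension for probability forecasts is not shown.

```python
import numpy as np

def super_adaboost(forecasts, y, n_rounds=10):
    """forecasts: (n_models, n) array in {-1, +1}; y: (n,) array in {-1, +1}.

    Returns the accumulated AdaBoost weight of each forecasting model.
    """
    n_models, n = forecasts.shape
    w = np.full(n, 1.0 / n)                  # observation weights
    alphas = np.zeros(n_models)
    for _ in range(n_rounds):
        errs = ((forecasts != y) * w).sum(axis=1)   # weighted error of each model
        m = int(np.argmin(errs))
        err = np.clip(errs[m], 1e-10, 1 - 1e-10)
        if err >= 0.5:                       # no model beats random guessing: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)
        alphas[m] += alpha
        w *= np.exp(-alpha * y * forecasts[m])      # up-weight misses, down-weight hits
        w /= w.sum()
    return alphas

def combine(forecasts, alphas):
    """Weighted-vote combined forecast in {-1, +1}."""
    return np.where(alphas @ forecasts >= 0, 1, -1)
```

In the usage below, three models are each wrong on a different third of the sample; after three rounds the weighted vote is correct everywhere even though no single model is.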
"Neyman-Pearson Paradigms for Asymmetric and Imbalanced Classification with Many Highly Correlated Predictors"
Addressed three issues jointly: (i) asymmetric binary classification, (ii) imbalanced classification, and (iii) high-dimensional, highly correlated predictors.
Proposed a Factor-Adjusted LDA-based Neyman-Pearson classification method that addresses these challenges by minimizing the type II error while keeping the prioritized type I error below a user-specified threshold.
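The type I error control step can be illustrated with the NP umbrella algorithm of Tong, Feng and Li, which picks a threshold from held-out class-0 scores so that P(type I error > α) ≤ δ. The sketch assumes higher scores indicate class 1 and treats the scores as given (for example, from a factor-adjusted LDA discriminant). Whether the project uses exactly this thresholding rule is an assumption, and the function names are hypothetical.

```python
import numpy as np
from math import lgamma

def np_umbrella_threshold(scores0, alpha=0.05, delta=0.05):
    """Smallest order statistic s_(k) of held-out class-0 scores such that
    P(type I error of the rule 'score > s_(k)' exceeds alpha) <= delta.

    That violation probability is bounded by P(Bin(n, 1 - alpha) >= k).
    """
    s = np.sort(np.asarray(scores0, dtype=float))
    n = len(s)
    j = np.arange(n + 1)
    log_comb = np.array([lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) for i in j])
    # pmf of Bin(n, 1 - alpha) computed in log space to avoid overflow
    pmf = np.exp(log_comb + j * np.log(1 - alpha) + (n - j) * np.log(alpha))
    tail = np.cumsum(pmf[::-1])[::-1]        # tail[k] = P(Bin(n, 1 - alpha) >= k)
    ok = np.nonzero(tail[1:] <= delta)[0]
    if ok.size == 0:
        raise ValueError("too few class-0 scores for this (alpha, delta)")
    k = ok[0] + 1
    return s[k - 1]

def np_classify(scores, threshold):
    # Predict class 1 only when the score exceeds the class-0-calibrated threshold
    return (scores > threshold).astype(int)
```

At least log δ / log(1 − α) class-0 scores are needed (59 when α = δ = 0.05); with fewer, no order statistic qualifies and the function raises an error.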