9. Masini, Ricardo P. and Marcelo C. Medeiros (2024). Balancing Flexibility and Interpretability: A Conditional Linear Model Estimation via Random Forest. PDF file.
Traditional parametric econometric models often rely on rigid functional forms, while nonparametric techniques, despite their flexibility, frequently lack interpretability. This paper proposes a parsimonious alternative by modeling the outcome Y as a linear function of a vector of variables of interest X, conditional on additional covariates Z. Specifically, the conditional expectation is expressed as 𝔼[Y|X,Z]=Xβ(Z), where β(⋅) is an unknown Lipschitz-continuous function. We introduce an adaptation of the Random Forest (RF) algorithm to estimate this model, balancing the flexibility of machine learning methods with the interpretability of traditional linear models. This approach addresses a key challenge in applied econometrics by accommodating heterogeneity in the relationship between covariates and outcomes. Furthermore, the heterogeneous partial effects of X on Y are represented by β(⋅) and can be directly estimated using our proposed method. Our framework effectively unifies established parametric and nonparametric models, including varying-coefficient, switching regression, and additive models. We provide theoretical guarantees, such as pointwise and Lp-norm rates of convergence for the estimator, and establish a pointwise central limit theorem through subsampling, aiding inference on the function β(⋅). We present Monte Carlo simulation results to assess the finite-sample performance of the method.
Keywords: random forests, heterogeneous partial effects, machine learning
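To make the model 𝔼[Y|X,Z]=Xβ(Z) concrete, the following is a minimal sketch on simulated data. It does not implement the paper's Random Forest adaptation; as a stand-in, it estimates β(z) at a single point by kernel-weighted least squares. The coefficient functions, the bandwidth, and the names `simulate` and `local_beta` are illustrative assumptions.

```python
import numpy as np

def simulate(n, rng):
    """Draw (Y, X, Z) from Y = X @ beta(Z) + noise with a hypothetical beta."""
    Z = rng.uniform(-1.0, 1.0, size=n)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    # Assumed smooth coefficient functions, one column per regressor.
    beta = np.column_stack([np.sin(np.pi * Z), 1.0 + Z**2])
    Y = np.sum(X * beta, axis=1) + 0.1 * rng.normal(size=n)
    return Y, X, Z

def local_beta(z0, Y, X, Z, bandwidth=0.1):
    """Kernel-weighted least squares estimate of beta(z0).

    This is a stand-in for the forest-based weights, not the paper's method.
    """
    w = np.exp(-0.5 * ((Z - z0) / bandwidth) ** 2)  # Gaussian kernel weights
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ Y)

rng = np.random.default_rng(0)
Y, X, Z = simulate(5000, rng)
beta_hat = local_beta(0.5, Y, X, Z)
beta_true = np.array([np.sin(np.pi * 0.5), 1.0 + 0.5**2])
print(beta_hat, beta_true)
```

The second entry of the estimate is the partial effect of the second regressor on Y at Z = 0.5, which is the kind of heterogeneous effect the abstract describes.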
8. Fan, Qingliang, Marcelo C. Medeiros, Hanming Yang, and Songshan Yang (2024). Cost-aware Portfolios in a Large Universe of Assets. PDF file.
This paper considers the finite-horizon portfolio rebalancing problem in terms of mean-variance optimization, where decisions are made based on current information on asset returns and transaction costs. The study's novelty is that the transaction costs are integrated within the optimization problem in a high-dimensional portfolio setting where the number of assets is larger than the sample size. We propose portfolio construction and rebalancing models with a nonconvex penalty that account for two types of transaction cost: proportional and quadratic. We establish the desired theoretical properties under mild regularity conditions. Monte Carlo simulations and empirical studies using S&P 500 and Russell 2000 stocks show the satisfactory performance of the proposed portfolio and highlight the importance of accounting for transaction costs when rebalancing a portfolio.
Keywords: High-dimensional Portfolio Optimization, Optimal Rebalancing, SCAD, Transaction Costs.
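The ingredients named in the abstract and keywords can be sketched as follows. The SCAD penalty uses the standard Fan and Li (2001) formula with the customary a = 3.7. The way the pieces are combined in `cost_aware_objective` is only an assumed illustration, since the abstract does not state the paper's exact objective; all function and parameter names are hypothetical.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise to |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear,
                    np.where(t <= a * lam, quadratic, constant))

def transaction_cost(w_new, w_old, c_prop, c_quad):
    """Proportional plus quadratic cost of moving from w_old to w_new."""
    trade = np.asarray(w_new, dtype=float) - np.asarray(w_old, dtype=float)
    return c_prop * np.sum(np.abs(trade)) + c_quad * np.sum(trade**2)

def cost_aware_objective(w_new, w_old, mu, sigma, risk_aversion,
                         c_prop, c_quad, lam):
    """Assumed penalized mean-variance objective, to be minimized."""
    w_new = np.asarray(w_new, dtype=float)
    risk = 0.5 * risk_aversion * w_new @ sigma @ w_new
    return (risk - mu @ w_new
            + transaction_cost(w_new, w_old, c_prop, c_quad)
            + np.sum(scad_penalty(w_new, lam)))
```

Unlike the L1 penalty, SCAD flattens out for large weights, so large positions are not shrunk further once they exceed a·λ in absolute value.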
7. Medeiros, Marcelo C. and Chuanping Sun (2024). A Sorted Penalty Estimator: Inference for a Correlation-Robust Shrinkage Method. PDF file.
Variable correlations present significant challenges for a wide range of LASSO-type shrinkage methods in big data modeling. This paper introduces a correlation-robust shrinkage estimator, advancing both theoretical and practical aspects of high-dimensional estimation. We establish the (non-)asymptotic properties of this estimator under relaxed assumptions, including a mixing condition and allowance for heavier tails beyond the typical sub-Gaussian setting. Additionally, we demonstrate model selection consistency under mild conditions. We further propose a de-biased version of the estimator, proving its asymptotic normality. Simulated data reveal that the de-biased estimator outperforms traditional benchmarks. In an empirical application, we employ this de-biased estimator to identify key Economic Policy Uncertainty (EPU) factors that explain inflation levels. Our findings suggest that news-based EPU factors play a crucial role in explaining CPI dynamics.
Keywords: Correlated Variables, De-biased estimator, Shrinkage, LASSO, sorted-penalty, correlation robust, variable selection consistency, alpha-mixing
6. Medeiros, Marcelo C., Erik Christian Montes Schütte and Tobias Skipper Soussi (2022). Global Inflation Forecasting: Benefits from Machine Learning Methods. PDF file.
This paper considers inflation forecasting for a vast panel of countries. We combine the information from common factors driving global inflation as well as country-specific inflation in order to build a set of different models. We also rely on new advances in the Machine Learning literature. We show that random forests and neural networks are very competitive models, and their superiority, although stable across most of the time period considered, increases during recessions. We also show that it is easier to forecast countries with more developed economies. The forecasting gains seem to be partially explained by the degree of trade openness.
Keywords: global inflation, inflation forecasting, machine learning, random forests, neural networks, shrinkage
5. Medeiros, Marcelo C. and Henrique F. Pires (2021). The Proper Use of Google Trends in Forecasting Models. PDF file.
It is widely known that Google Trends has become one of the most popular free tools used by forecasters in academia and in the private and public sectors. Many papers, from several different fields, conclude that Google Trends improves forecast accuracy. However, what seems to be widely unknown is that each sample of Google search data differs from the others, even when the search term, date, and location are the same. This means that it is possible to reach arbitrary conclusions merely by chance. This paper aims to show why and when this can become a problem and how to overcome it.
Keywords: Google trends, forecasting, nowcasting, big data
4. Carneiro, Carlos B., Iúri H. Ferreira, Marcelo C. Medeiros, Henrique F. Pires, and Eduardo Zilberman (2020). The Effects of Mobility Restrictions on the Early Spread of Infectious Diseases: The Covid-19 Case. PDF file.
We adopt state-of-the-art statistical tools to assess the impact of lockdowns on the short-run evolution of the number of cases and deaths in the early stages of the Covid-19 pandemic in the United States. To do so, we exploit the different timing with which US states adopted lockdown policies, and divide them into treated and control groups. For each treated state, we construct an artificial counterfactual based on data from the untreated states. On average, and in the very short run, the counterfactual accumulated number of cases would have been twice as large had lockdown policies not been implemented. This sheds light on the potential benefits of mobility restrictions on the early spread of infectious diseases and helps policymakers evaluate the trade-off between health benefits and socio-economic impacts.
Keywords: Infectious diseases, Covid-19, lockdown effects, mobility restrictions, ArCo, synthetic control
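The counterfactual idea in the abstract (fit a model for a treated unit on untreated units before the intervention, then project it forward) can be sketched as below. This is a simplified illustration on synthetic data using ordinary least squares; it is not the paper's estimator, and the data, the effect size, and the name `counterfactual_effect` are assumptions.

```python
import numpy as np

def counterfactual_effect(y_treated, donors, t0):
    """Average gap between observed and predicted outcomes from period t0 on."""
    design = np.column_stack([np.ones(len(donors)), donors])
    # Fit on pre-intervention periods only.
    coef, *_ = np.linalg.lstsq(design[:t0], y_treated[:t0], rcond=None)
    # Project the fitted relation onto post-intervention periods.
    counterfactual = design[t0:] @ coef
    return np.mean(y_treated[t0:] - counterfactual), counterfactual

rng = np.random.default_rng(1)
T, t0 = 120, 80
donors = rng.normal(size=(T, 3)).cumsum(axis=0)  # synthetic untreated units
y = 1.0 + donors @ np.array([0.5, 0.3, 0.2]) + 0.05 * rng.normal(size=T)
y[t0:] -= 2.0                                    # synthetic intervention effect
effect, _ = counterfactual_effect(y, donors, t0)
print(effect)
```

The estimated effect is the mean difference between the observed post-intervention path and the projected path that assumes no intervention.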
3. Martins, Leonardo L. and Marcelo C. Medeiros (2022). The Impacts of Mobility on Covid-19 Dynamics: Using Soft and Hard Data. PDF file.
This paper evaluates how changes in mobility affected the spread of Covid-19 infections during 2020-2021. However, identifying a "clean" causal relation is not an easy task due to a high number of non-observable (behavioral) effects. We suggest using Google Trends and news-based indexes as controls for some of these behavioral effects, and we find that a 1% increase in residential mobility (i.e., a reduction in overall mobility) has a significant impact on reducing both Covid-19 cases (at least 3.02% at the one-month horizon) and deaths (at least 2.43% at the two-week horizon) over the 2020-2021 sample. We also evaluate the effects of mobility on Covid-19 spread in the restricted sample (2020 only), when vaccines were not available. The effects of reduced mobility on cases and deaths in the restricted sample are still observable (with similar magnitudes in terms of residential mobility) and cumulatively higher, as the effects of restricting workplace mobility also turn out to be significant: a 1% decrease in workplace mobility reduces cases by around 1% and deaths by around 2%.
Keywords: Covid-19, Mobility, Causality, Dynamic Panel
2. Ferreira, Iúri H. and Marcelo C. Medeiros (2022). Modeling and Forecasting Intraday Market Returns: a Machine Learning Approach. PDF file.
In this paper we examine the relation between market returns and volatility measures through machine learning methods in a high-frequency environment. We implement a minute-by-minute rolling window intraday estimation method using two nonlinear models: Long Short-Term Memory (LSTM) neural networks and Random Forests (RF). Our estimations show that the CBOE Volatility Index (VIX) is the strongest candidate predictor for intraday market returns in our analysis, especially when implemented through the LSTM model. This model also significantly improves the performance of the lagged market return as a predictive variable. Finally, intraday RF estimation outputs indicate that there is no performance improvement with this method, and it may even worsen the results in some cases.
Keywords: Return predictability, high frequency data, machine learning, nonlinear models, neural networks, LSTM, random forests
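The rolling window scheme mentioned in the abstract can be sketched as follows. For brevity, the example refits a linear regression on each window rather than an LSTM or Random Forest, so it illustrates only the estimation scheme and not the paper's models. The data are synthetic and the name `rolling_forecasts` is hypothetical.

```python
import numpy as np

def rolling_forecasts(y, x, window):
    """One-step-ahead forecasts of y[t] from x[t-1], refit on each window."""
    preds = np.full(len(y), np.nan)
    for t in range(window + 1, len(y)):
        # Training pairs (x[s-1], y[s]) for s in [t-window, t-1]:
        # nothing dated t or later enters the fit.
        xs = x[t - window - 1 : t - 1]
        ys = y[t - window : t]
        design = np.column_stack([np.ones(window), xs])
        coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
        preds[t] = coef[0] + coef[1] * x[t - 1]
    return preds

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)                   # synthetic predictor
y = np.zeros(n)
y[1:] = 0.2 + 0.5 * x[:-1] + 0.01 * rng.normal(size=n - 1)
preds = rolling_forecasts(y, x, window=60)
```

Each forecast uses only the most recent `window` observations, so the fitted relation can drift over time without contaminating the forecast with future information.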
1. Fonseca, Yuri, Marcelo Medeiros, Gabriel Vasconcelos and Alvaro Veiga (2020). BooST: Boosting Smooth Trees for Partial Effect Estimation in Nonlinear Regressions. PDF file.
In this paper, we introduce a new machine learning (ML) model for nonlinear regression called the Boosted Smooth Transition Regression Trees (BooST), which is a combination of boosting algorithms with smooth transition regression trees. The main advantage of the BooST model is the estimation of the derivatives (partial effects) of very general nonlinear models. Therefore, the model can provide more insight into the mapping between the covariates and the dependent variable than other tree-based models, such as Random Forests. We present several examples with both simulated and real data.
Keywords: machine learning, boosting, regression trees, nonlinear regression, partial effects, smooth transition.
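The reason smooth transition trees yield partial effects can be seen in a single smooth split. The sketch below assumes a logistic transition function and one split node; it is not the BooST algorithm itself, and the names `logistic`, `smooth_stump`, and `smooth_stump_derivative` are illustrative.

```python
import numpy as np

def logistic(x, gamma, c):
    """Smooth transition function G(x) in (0, 1), centered at c."""
    return 1.0 / (1.0 + np.exp(-gamma * (x - c)))

def smooth_stump(x, b_left, b_right, gamma, c):
    """Single smooth split: blend two leaf values with logistic weights."""
    g = logistic(x, gamma, c)
    return b_left * (1.0 - g) + b_right * g

def smooth_stump_derivative(x, b_left, b_right, gamma, c):
    """Analytic partial effect d f / d x of the smooth stump."""
    g = logistic(x, gamma, c)
    return (b_right - b_left) * gamma * g * (1.0 - g)
```

A hard split (the limit of large `gamma`) has zero derivative almost everywhere, whereas the smooth version has a well-defined derivative at every point. Because differentiation is linear, a boosted sum of such trees also has an analytic derivative.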