Big Data & Machine Learning

"vOILatility: Forecasting Oil Prices under Uncertainty," mimeo, University of California Riverside, 2018. 

Abstract: "Historically, oil prices are subject to sudden jumps as well as smoother changes driven by shifts in supply and demand. Although linear models may capture some of the in-sample dynamics between jumps, they fail to represent and predict the nonlinearities underlying the market out-of-sample and in real time. Some of the abrupt changes in oil price dynamics were due to OPEC decisions in the 1970s-1980s. Recent developments, such as shifts to new technology or cooperation between Russia and OPEC, can potentially engender new structural breaks in oil market dynamics, with the possibility of markedly different results in out-of-sample, real-time forecasts. Models and methods that take instability and breaks into account might substantially improve forecasts. Nonlinear models reveal additional information compared to frameworks that take into account only the average linear effect of one series on another. This paper proposes a model specifically designed to forecast oil prices while taking into account potential nonlinearities and nonstationarities. The autoregressive multivariate mixed-frequency model has probabilities of structural breaks in the mean and volatility of the oil price as a function of several variables, including: indicators of potential sudden changes in oil supply/price (news on OPEC, Russia’s oil policy and changes in inventories), an indicator of economy-wide demand and oil consumption in the largest consumers and importers of oil, an indicator of recent technology shifts, and an indicator of changes in risk. Preliminary results indicate that the model provides accurate real-time forecasts of the oil price that are markedly superior to forecasts from alternative linear frameworks."
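
To make the idea of a break probability that depends on indicator variables concrete, here is a minimal sketch in Python. It is not the paper's specification: the logistic link, the function name break_probability, the indicator names and all coefficient values are assumptions chosen purely for illustration, since the abstract does not state the functional form.

```python
import math


def break_probability(indicators, coefficients, intercept=0.0):
    """Map indicator values to a probability of a structural break.

    Assumption: a logistic link. The abstract only says the probability
    is "a function of several variables"; the form here is hypothetical.
    """
    if len(indicators) != len(coefficients):
        raise ValueError("indicators and coefficients must have the same length")
    z = intercept + sum(b * x for b, x in zip(coefficients, indicators))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical standardized indicators, in this order:
# OPEC news, inventory change, demand, risk.
calm = [0.0, 0.0, 0.0, 0.0]
stressed = [2.0, -1.5, -1.0, 2.5]

# Made-up coefficients, not estimates from the paper.
coefs = [0.8, -0.5, -0.4, 0.6]

p_calm = break_probability(calm, coefs, intercept=-3.0)
p_stressed = break_probability(stressed, coefs, intercept=-3.0)
print(f"calm: {p_calm:.3f}, stressed: {p_stressed:.3f}")
```

With these made-up numbers the break probability is low when all indicators sit at their average and rises when news, inventories, demand and risk move together. That is the qualitative behavior a model with indicator-driven break probabilities is meant to capture.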

  

"Predicting Default Risk of Small Business Loans with Big Data," with Hien Nguyen.

Abstract: The paper applies big data sets on firm characteristics, bank balance sheets and loan information to study the default risk of loans to small businesses under the Small Business Administration (SBA) 7(a) loan guarantee program. I find that loan age is the most important predictor of loan default in all periods: before, during and after the 2008 financial crisis. Bank balance sheet variables (bank capital and bank assets) follow loan age in the ranking for the pre-crisis and crisis periods. After the crisis, however, firm characteristics (earnings-to-assets and debt-to-assets) surpass bank variables to become the most important predictors after loan age. The results show that, owing to major reforms in the banking industry after the most recent financial crisis, the quality of bank balance sheets has improved. Bank characteristics are therefore less crucial in determining the quality of loans after 2008.

"New Class of Volatility – SVR and LSTM Models," with Igor Morais.

Abstract: The use of neural networks and machine learning for solving complex nonlinear problems has become more promising with the greater availability of data and powerful algorithms. One such approach applies deep learning and support vector regression (SVR) to the analysis of financial market time series. This paper uses these two techniques to estimate the daily volatility of the S&P 500 and compares their predictions with those of traditional deterministic GARCH-family models and of stochastic volatility models. The major contributions of this study relate to the applied methods, with emphasis on the implementation of different kernel functions in SVR models and of different activation functions in long short-term memory (LSTM) deep learning models. The results indicate that, even without the parameter information that parametric models provide, these new techniques predict volatility more efficiently across different crisis scenarios.
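
For readers unfamiliar with what "different kernel functions" means in an SVR setting, the following Python sketch writes out three standard kernels (linear, polynomial and radial basis function) in plain form. It illustrates the general technique only. The abstract does not say which kernels or hyperparameters the paper uses, and the feature vectors (two lagged daily volatility values) and the parameter defaults below are hypothetical.

```python
import math


def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """(x . y + coef0) ** degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2); gamma is an illustrative default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical feature vectors: two lagged daily volatility values each.
quiet_day = [0.012, 0.015]
turbulent_day = [0.030, 0.028]

for name, kernel in [("linear", linear_kernel),
                     ("polynomial", polynomial_kernel),
                     ("rbf", rbf_kernel)]:
    print(name, kernel(quiet_day, turbulent_day))
```

Each kernel defines a different notion of similarity between two observations, and SVR fits its regression function in terms of those similarities. Swapping the kernel therefore changes which nonlinear patterns in the volatility series the model can pick up.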