The mcbroken project tracks the status of over 10,000 McDonald's ice cream machines around the world. This page presents historical data and live daily forecasts of revenue lost to broken machines, at a rate of $625 per machine-day. Each day's observation is scraped and added to the data at 12 PM PST. After the historical and updated data are combined, missing and outlier values are algorithmically marked, and the series is stabilized via a Box-Cox transformation. The models shown below are retrained daily using consistent methodology but flexible hyperparameters, and are used to produce 30-day forecasts.
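As a rough illustration of the revenue conversion and the Box-Cox step, here is a minimal sketch in plain Python. The broken-machine counts and the value of lambda are made up for the example; in practice lambda would be estimated from the data, and the function names are illustrative rather than taken from the project code.

```python
import math

REVENUE_PER_MACHINE_DAY = 625  # dollars per machine-day, as stated on this page


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# Hypothetical daily counts of broken machines (not real mcbroken data).
broken_counts = [1200, 1350, 1100]
lost_revenue = [n * REVENUE_PER_MACHINE_DAY for n in broken_counts]

lam = 0.5  # assumed value, chosen only for illustration
transformed = [box_cox(v, lam) for v in lost_revenue]
recovered = [inverse_box_cox(z, lam) for z in transformed]
```

Forecasts made on the transformed scale are mapped back to dollars with the inverse transform before being reported.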
For more details on the code, test set result notebooks, and AWS and Docker setup, see my GitHub repo.
ARIMA techniques model a time series as a linear combination of its own prior values and prior forecast errors, after applying appropriate differencing. The mcbroken series exhibits strong 7-day seasonality (Mondays are very sad days for McFlurries), so this project also makes use of seasonal lags and differences. In combination with one-hot encoded exogenous terms to handle filled-in missing values and outliers, this gives us a SARIMAX model. The order of each component is tuned using pmdarima's auto_arima, which uses the stepwise Hyndman-Khandakar algorithm to minimize AIC, an information criterion that balances goodness of fit against model complexity to limit the risk of overfitting.
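Two of the ingredients above, the 7-day seasonal difference and the one-hot exogenous flags, can be sketched in plain Python. This is an illustration only: the helper names are hypothetical, and using one indicator column per flagged day is an assumption about the encoding, not a detail confirmed by the project.

```python
def seasonal_difference(series, period=7):
    """Subtract the value from one season earlier: y[t] - y[t - period]."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def one_hot_flags(n, flagged_indices):
    """Build n rows with one 0/1 column per flagged observation (assumed encoding)."""
    columns = sorted(set(flagged_indices))
    return [[1 if t == idx else 0 for idx in columns] for t in range(n)]


# Synthetic series: a linear trend plus a repeating weekly pattern.
weekly_pattern = [5, 0, 0, 1, 1, 2, 3]
series = [100 + t + weekly_pattern[t % 7] for t in range(21)]

# The weekly pattern cancels, leaving only the trend's change over 7 days.
diffed = seasonal_difference(series, period=7)

# Suppose days 3 and 10 were flagged as missing or outlying.
exog = one_hot_flags(len(series), [3, 10])
```

A matrix like `exog` is the kind of input a SARIMAX fit accepts alongside the series, so that each flagged day gets its own coefficient instead of distorting the ARIMA terms.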
When tested on the period from January 25 to February 22, 2025, this ARIMA setup achieved a Mean Absolute Percentage Error (MAPE) of roughly 7.18% and a Mean Absolute Error (MAE) of about $54k. This is less than half the $122k MAE of a simple seasonal naive forecast that repeats the last 7 days of the data.
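For reference, the seasonal naive baseline and the two error metrics can be written in a few lines of plain Python. The numbers below are invented for the example and are unrelated to the test-set results quoted above.

```python
def seasonal_naive(history, horizon, period=7):
    """Forecast by repeating the last `period` observations."""
    last_cycle = history[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


def mae(actual, forecast):
    """Mean Absolute Error, in the units of the series."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actuals must be nonzero)."""
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up week of history and four days of "actual" values.
history = [100, 120, 110, 130, 125, 140, 150]
actual = [110, 120, 100, 130]

forecast = seasonal_naive(history, horizon=len(actual))
baseline_mae = mae(actual, forecast)
baseline_mape = mape(actual, forecast)
```

MAE is reported in dollars and is easy to interpret, while MAPE is scale-free, which is why both are quoted for each model.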
Exponential smoothing forms a forecast from weighted averages of past observations, with weights that decay for older data. The Holt-Winters method used here models the level, trend, and seasonality of the data; this implementation uses a damped trend, whose contribution to the forecast levels off as the horizon grows. Multiplicative terms are not really appropriate for the mcbroken series, so the model uses additive trend and seasonality (making it a special case of ARIMA under certain conditions). While the other forecasting methods covered have closed-form or built-in prediction intervals, the intervals for this method come from resampled (bootstrapped) residual errors, which are used to simulate possible future paths for the series.
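To show what damping does, here is a small sketch of the additive damped Holt-Winters point forecast, computed from a final level, trend, and set of seasonal states. The state values and the damping parameter are hypothetical; a real fit would estimate them from the data, and this sketch omits the smoothing updates and the bootstrapped intervals.

```python
def damped_hw_forecast(level, trend, seasonals, phi, horizon):
    """Additive damped Holt-Winters point forecasts.

    seasonals: the last m seasonal states, oldest first.
    phi: damping parameter in (0, 1); the trend term is scaled by
         phi + phi**2 + ... + phi**h at horizon h.
    """
    m = len(seasonals)
    forecasts = []
    damp_sum = 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h
        forecasts.append(level + damp_sum * trend + seasonals[(h - 1) % m])
    return forecasts


# Hypothetical final states with no seasonal effect, to isolate the trend.
flat_season = [0.0] * 7
short = damped_hw_forecast(100.0, 10.0, flat_season, phi=0.5, horizon=3)
long_run = damped_hw_forecast(100.0, 10.0, flat_season, phi=0.5, horizon=60)
```

With these inputs the trend adds 5, then 7.5, then 8.75, and converges toward `level + trend * phi / (1 - phi)`, so the forecast flattens out rather than extrapolating the slope indefinitely.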
When tested on January 25 to February 22, 2025, this model struggled relative to the others, with a mean absolute error of roughly $118k. As a relatively inflexible method, Holt-Winters may not have been able to handle the complexity of the mcbroken data. Exponential smoothing also cannot easily handle outliers or missing values. My implementation filled identified missing and outlier cases by repeating the value from seven days earlier, which may have somewhat distorted patterns in the underlying series.
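The seven-day lag fill can be sketched as follows. This is an illustrative version with a hypothetical function name; how the project treats flagged points in the first week, where no earlier value exists, is not stated, so this sketch simply leaves them untouched.

```python
def fill_with_weekly_lag(series, flagged, period=7):
    """Replace flagged points with the value from `period` steps earlier.

    Works on the already-filled copy, so a point flagged in consecutive
    weeks inherits the most recent clean value. Points in the first
    `period` observations are left as-is (assumption).
    """
    filled = list(series)
    for t in sorted(flagged):
        if t >= period:
            filled[t] = filled[t - period]
    return filled


# Made-up data: index 7 is missing (None) and index 14 is an outlier.
series = [10, 20, 30, 40, 50, 60, 70,
          None, 21, 31, 41, 51, 61, 71,
          999]
filled = fill_with_weekly_lag(series, flagged={7, 14})
```

Because each fill copies an older value verbatim, runs of flagged days repeat the same weekly shape, which is the distortion risk mentioned above.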
Facebook's Prophet algorithm combines a piecewise-linear trend that changes slope at changepoints with periodic seasonal Fourier terms to quickly create effective daily forecasts. Several key hyperparameters can significantly influence Prophet's performance; for this task, attention was paid to the seasonality and trend changepoint priors, which control the strength of those components, as well as the range of data considered for changepoints. Hyperparameters were tuned against an RMSE objective on a 30-day validation set using the Optuna Bayesian hyperparameter search package; optima are found in only a few dozen iterations, within several minutes.
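The piecewise-linear trend idea can be illustrated without the library. The sketch below follows the trend form described in the Prophet paper, where each changepoint adds a slope adjustment and an offset correction that keeps the line continuous. It is not Prophet's actual code, and the parameter values are invented.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous piecewise-linear trend.

    k: base slope, m: base offset.
    changepoints: times s_j at which the slope changes.
    deltas: slope adjustments delta_j applied from s_j onward.
    The offset correction -s_j * delta_j keeps the trend continuous at s_j.
    """
    slope = k
    offset = m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            slope += delta
            offset -= s * delta
    return slope * t + offset


# Hypothetical trend: slope 1.0 until t = 10, then slope 0.5.
before = piecewise_linear_trend(5, k=1.0, m=0.0, changepoints=[10], deltas=[-0.5])
at_cp = piecewise_linear_trend(10, k=1.0, m=0.0, changepoints=[10], deltas=[-0.5])
after = piecewise_linear_trend(20, k=1.0, m=0.0, changepoints=[10], deltas=[-0.5])
```

A tighter prior on the slope adjustments shrinks them toward zero and gives a stiffer trend, while a looser prior lets the trend bend more at each changepoint; that trade-off is what the hyperparameter search explores.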
Prophet achieved a Mean Absolute Percentage Error (MAPE) of roughly 4.59% from January 25 to February 22, 2025, with an MAE of only $39k, making it the best performing of the methods presented. It adapted easily to constantly shifting trends in the data and accommodated the most historical training data of any method (the others used roughly the past year only). Prophet can also handle outliers and missing values natively, in addition to those detected by programmatic logic and encoded as exogenous variables.