Application General Notes

Time series methods: the book by Shumway and Stoffer.

Time Series Analysis and Its Applications: With R Examples (Springer Texts in Statistics)

by Robert H. Shumway, David S. Stoffer

Great book! I have it in my own time series library (eight books or more, depending on how you categorize them; a dozen or better if you add financial and econometric methods).

Time series methods are very important, and they go far beyond what most people realize is available. The method I usually run into is the moving average or some variant of it (e.g., EWMA). However, many folks don't realize that this may be the least appropriate way to model the signal, particularly if the signal is primarily autoregressive. The autocorrelation function (ACF) and partial autocorrelation function (PACF) are the keys to characterizing the signal. For an autoregressive process of order p, the ACF tails off gradually while the PACF cuts off after lag p.
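To illustrate how the ACF and PACF reveal an autoregressive signal, here is a pure-Python sketch (no statistical libraries assumed). It simulates an AR(1) series and computes the sample ACF, then the sample PACF via the Durbin-Levinson recursion. The function names and the choice phi = 0.7 are illustrative only. In practice, R's `acf()` and `pacf()` or a statistics package would do this work.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, burn_in=100, seed=None):
    """Simulate x_t = phi * x_{t-1} + e_t with Gaussian white noise e_t."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sigma)
        if t >= burn_in:  # drop start-up values so the series is near stationarity
            out.append(x)
    return out


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (divisor n, the usual estimator)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1..k-1} of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the order-k coefficients from the order-(k-1) ones
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


if __name__ == "__main__":
    series = simulate_ar1(phi=0.7, n=2000, seed=1)
    print("ACF :", [round(v, 2) for v in acf(series, 5)])
    print("PACF:", [round(v, 2) for v in pacf(series, 5)])
```

For this AR(1) series, the ACF should decay roughly like 0.7^k. The PACF should be near 0.7 at lag 1 and close to zero beyond it. That pattern points toward an autoregressive model rather than a smoother.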

And when most folks do use time series methods, they usually omit any measure of model uncertainty, such as confidence bands around the model (prediction intervals, in the case of forecasts). In my (very humble) opinion, not showing these bands, particularly when forecasting, is problematic. It can lead the "customer" to believe that we have perfect knowledge about a stochastic (not deterministic) system.
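The following is a minimal sketch of forecast confidence bands. It assumes an AR(1) model with known parameters; the names mu, phi, and sigma are illustrative and do not come from the source. The h-step-ahead point forecast decays toward the mean. Meanwhile, the forecast error variance, sigma^2 (1 + phi^2 + ... + phi^(2(h-1))), grows with the horizon. Because the parameters are treated as known, these bands somewhat understate the uncertainty of a fitted model.

```python
from statistics import NormalDist


def ar1_forecast_bands(last_value, mu, phi, sigma, horizon, level=0.95):
    """h-step forecasts with prediction bands for a stationary AR(1) process.

    Assumes x_t - mu = phi * (x_{t-1} - mu) + e_t with e_t ~ N(0, sigma^2) and
    treats mu, phi, sigma as known (parameter-estimation error is ignored).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% band
    bands = []
    err_var = 0.0
    for h in range(1, horizon + 1):
        point = mu + phi ** h * (last_value - mu)
        # Accumulate sigma^2 * phi^(2(h-1)) to get the h-step error variance
        err_var += sigma ** 2 * phi ** (2 * (h - 1))
        half_width = z * err_var ** 0.5
        bands.append((h, point, point - half_width, point + half_width))
    return bands


if __name__ == "__main__":
    for h, point, lo, hi in ar1_forecast_bands(10.0, 0.0, 0.5, 1.0, horizon=5):
        print(f"h={h}: {point:6.3f}  [{lo:6.3f}, {hi:6.3f}]")
```

Printing the five-step forecast shows the bands widening with h. They level off near z * sigma / sqrt(1 - phi^2), which is the spread of the process itself. A point forecast alone hides all of this.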

===============================================================

This note is a bit premature, as I had planned to address time series methods more comprehensively, but I thought I would remark on the Shumway book.

I have started what is intended to be an overview of the field of time series applications (for my own audience) on the Statwiki site.

https://sites.google.com/a/crlstatistics.net/crlstatwiki/main_page/methods/time-series

As time permits, I intend to provide examples worked in R and Statistica.

I wish that I had more time to spend on applying time series methods. One project I did combined three pieces: linear regression, time series modeling of the regression residuals, and statistical process control (SPC) methods to look for early signs of a shift in the signal. The project also used simulation to see how sensitive the model was, and how small a shift it could detect over the course of a year. The application was modeling a shift in product sales after commercialization of a product upgrade.
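The source doesn't say which SPC method that project used. The sketch below assumes a tabular (upper, one-sided) CUSUM chart, which is one common choice. It applies the chart to standardized residuals, for example one-step-ahead prediction errors from the time series model, that are iid N(0, 1) until a mean shift occurs. The defaults k = 0.5 and h = 4 (in standard-deviation units) are common textbook settings, not values from the project. `detection_study` is a hypothetical helper that uses simulation to estimate how often, and how quickly, a shift of a given size is detected.

```python
import random


def cusum_first_alarm(z, k=0.5, h=4.0):
    """Index of the first upper CUSUM alarm on standardized values z, or None.

    S_t = max(0, S_{t-1} + z_t - k); an alarm is raised when S_t > h.
    """
    s = 0.0
    for i, v in enumerate(z):
        s = max(0.0, s + v - k)
        if s > h:
            return i
    return None


def detection_study(shift, start=20, n=200, reps=500, k=0.5, h=4.0, seed=0):
    """Simulate iid N(0,1) residuals whose mean shifts by `shift` at index `start`.

    Returns (fraction of runs with a post-shift alarm, mean delay in periods
    among those runs). Runs that alarm before the shift (false alarms) or
    never alarm are counted as not detected.
    """
    rng = random.Random(seed)
    delays = []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) + (shift if i >= start else 0.0) for i in range(n)]
        alarm = cusum_first_alarm(z, k, h)
        if alarm is not None and alarm >= start:
            delays.append(alarm - start)
    rate = len(delays) / reps
    mean_delay = sum(delays) / len(delays) if delays else float("nan")
    return rate, mean_delay


if __name__ == "__main__":
    for shift in (0.25, 0.5, 1.0, 2.0):
        rate, delay = detection_study(shift)
        print(f"shift={shift:4.2f} sd: detected {rate:.0%}, mean delay {delay:.1f}")
```

Running the study for several shift sizes shows the trade-off the simulation was meant to expose. Large shifts are caught within a few periods. Small shifts take much longer, or are missed within the window.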

Other areas of interest include GARCH (of all varieties), vector autoregression, multivariate time series, state space methods, frequency-domain modeling, and financial and econometric applications.

===============================================================

I had intended my first note to be about the importance of knowing and verifying model assumptions. I'll probably add more on this later, but I do have a collection of thoughts on the statref site if you are interested. Where appropriate (and as time has permitted), I have also added thoughts on particular applications to the relevant pages of the statref site.

https://sites.google.com/a/crlstatistics.net/crlstatwiki/main_page/fundamentals/model-assumptions

In an attempt to reinforce the credibility of the information that I provide to my group (or anyone) via the Statwiki site, I have also collected thoughts on the importance of model assumptions from various established experts.

https://sites.google.com/a/crlstatistics.net/crlstatwiki/main_page/fundamentals/model-assumptions/who-says-that-model-assumptions-need-to-be-checked

One of my instructors offered some thoughts on what statisticians do, and how this differs from other fields. Here is what comes to mind:

> Explain variation

> Separate signal from noise

> Quantify uncertainty **

> Check model validity **

** The last two (quantify uncertainty; check model validity) are especially noteworthy.

Explain variation

Statisticians explain variation by partitioning it, separating the variation that can be explained (by the model) from the variation that cannot.
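Here is a small worked example of partitioning variation for a straight-line least-squares fit. The total sum of squares (SST) splits into the part explained by the regression (SSR) and the residual part (SSE), and R^2 = SSR / SST. The data are made-up toy numbers, and `partition_variation` is a hypothetical helper.

```python
def partition_variation(x, y):
    """Split total variation in y for a least-squares line: SST = SSR + SSE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    sst = sum((b - my) ** 2 for b in y)                 # total variation
    ssr = sum((f - my) ** 2 for f in fitted)            # explained by the line
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # unexplained (residual)
    return sst, ssr, sse


if __name__ == "__main__":
    sst, ssr, sse = partition_variation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    print(f"SST={sst:.2f} SSR={ssr:.2f} SSE={sse:.2f} R^2={ssr / sst:.2f}")
    # prints SST=6.00 SSR=3.60 SSE=2.40 R^2=0.60
```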

Separate signal from noise

This is related to explaining variation. Because noise is random, telling a real signal apart from it is a question of probability, and that is something statisticians are uniquely qualified to answer.

"The fundamental difference between engineering with and without statistics boils down to the difference between the use of a scientific method based upon the concept of laws of nature that do not allow for chance or uncertainty and a scientific method based upon the concept of laws of probability as an attribute of nature."

~ W. A. Shewhart

Quantify uncertainty

Given a scatterplot of Y vs. X, any grade school kid can draw a subjective best-fit line through the data. Executives get paid a lot of money to do essentially the same thing using MS Excel. What a statistician does differently is to express the uncertainty in the model. Examples include confidence intervals on the slope and intercept, and confidence or prediction bands around the fitted line.

The purpose of that field of science known as "statistics" is to provide the means for measuring the amount of subjectivity that goes into the scientists' conclusions and thus to separate "science" from "opinion."

~ W. J. Conover, "Practical Nonparametric Statistics"
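As a sketch of what "expressing the uncertainty" can mean for a fitted line, the hypothetical pure-Python function below returns three things: the least-squares slope, its standard error, and a confidence interval. The Python standard library has no t distribution, so the caller supplies the critical value. The value 3.182 is the two-sided 95% value for 3 degrees of freedom. The data are toy numbers.

```python
import math


def slope_with_interval(x, y, t_crit):
    """Least-squares slope, its standard error, and a confidence interval.

    t_crit is the two-sided t critical value for n - 2 degrees of freedom,
    supplied by the caller (e.g. from a t table).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, se, (slope - t_crit * se, slope + t_crit * se)


if __name__ == "__main__":
    # 3.182: 97.5th percentile of t with 3 df (n = 5 points), for a 95% interval
    slope, se, (lo, hi) = slope_with_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.182)
    print(f"slope={slope:.2f}, se={se:.3f}, 95% CI=({lo:.2f}, {hi:.2f})")
    # prints slope=0.60, se=0.283, 95% CI=(-0.30, 1.50)
```

Here the best-fit slope is 0.6, but the 95% interval includes zero. With five points, the data cannot rule out the possibility of no relationship at all. An eyeballed line does not tell you that.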

Check model validity

Many people are experts at creating models, or at creating software systems to implement models.

Most models are based on a set of assumptions. (Linear regression: the errors are independent and identically distributed, usually assumed normal with constant variance. Black-Scholes: log returns are normally distributed, i.e., the asset price follows a geometric Brownian motion with constant volatility.) "Quants" in the financial field are experts at creating software that is based on these models and that can process enormous amounts of data at lightning speed. What is often overlooked, and what a statistician brings to the table, is the rigor of continually checking whether the model's assumptions hold. This is often done by analyzing the model residuals. It is especially important when the model is built to explain past events but is applied to current and future data. The model's assumptions may fail to hold, either with the data used to create the model or when the model is applied to new data. If so, the model's validity is in question, as are any conclusions or business/financial decisions based on it. What we are left with is a sophisticated system that is really good at generating wrong answers at lightning speed.

What are the limitations of the model? Under what conditions is it valid? Has this been verified? Recently?
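One simple residual check is the Durbin-Watson statistic, sketched here under the assumption that the residuals are stored in time order. It looks for lag-1 autocorrelation, a common way the independence assumption fails when a model fitted to past data is applied over time. The `simulated_residuals` helper is hypothetical and only generates test series. A formal Durbin-Watson test needs tabulated bounds, and tests such as Ljung-Box check more lags.

```python
import random


def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order.

    DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), roughly 2 * (1 - r1) where r1 is
    the lag-1 autocorrelation. Near 2 suggests no lag-1 autocorrelation, near 0
    positive autocorrelation, near 4 negative autocorrelation.
    """
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


def simulated_residuals(phi, n=500, seed=0):
    """Hypothetical residual series e_t = phi * e_{t-1} + noise (phi = 0 gives iid)."""
    rng = random.Random(seed)
    e, out = 0.0, []
    for _ in range(n):
        e = phi * e + rng.gauss(0.0, 1.0)
        out.append(e)
    return out


if __name__ == "__main__":
    print(f"iid residuals:        DW = {durbin_watson(simulated_residuals(0.0)):.2f}")
    print(f"autocorrelated (0.8): DW = {durbin_watson(simulated_residuals(0.8)):.2f}")
```

The iid series should give a value close to 2. The autocorrelated one should fall well below 2, a warning that the independence assumption does not hold.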

"All models are wrong, but some are useful."

~ George Box

Engineers and scientists are good at building models. Some models are based on physical principles (the behavior of electrons in a p-n junction; the Arrhenius model for chemical reactions) and can be tested. But most of the time, the data show some element of the stochastic nature of the universe. It is here, when the data are not perfectly deterministic, that engineers may become uncomfortable and that statisticians can lend a hand.
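As an illustration of how a physically based model can be tested, consider the Arrhenius equation. It relates a rate constant k to absolute temperature T through a pre-exponential factor A, the activation energy E_a, and the gas constant R. Taking logs makes it linear in 1/T. A plot of ln k against 1/T should therefore be a straight line with slope -E_a/R. Real measurements scatter around that line, and quantifying that scatter (and the uncertainty in the estimated slope) is where statistics comes in.

```latex
k = A \exp\!\left(-\frac{E_a}{R T}\right)
\quad\Longrightarrow\quad
\ln k = \ln A - \frac{E_a}{R}\cdot\frac{1}{T}
```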

A friend and mentor of mine (a seasoned biostatistician) once told me that, in his view, statisticians are second-rate mathematicians but are first-rate scientists.

> second-rate mathematicians

Someone with a Ph.D. in mathematics probably has a much broader set of mathematical knowledge and tools than someone with a Ph.D. in statistics.

> first-rate scientists

Statisticians are experts at using the scientific method. Stating null and alternative (research) hypotheses, designing studies (experiments) to test those hypotheses, and testing the hypotheses by analyzing the data from the study are all "bread and butter" activities for statisticians. Also of note are the skills of testing model validity and of recognizing when and how statistical methods can be abused (and taking measures to avoid this). In my opinion (based on observation), many non-statistician scientists get sloppy with the scientific method. They allow their preconceived biases to influence the results of their experiments (one example: discarding data that don't fit the model) and their conclusions.
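As a minimal illustration of stating hypotheses and testing them against data, here is an exact permutation test for a difference in group means, in pure Python. This is one test among many, and a t-test would be the more familiar choice. It was picked because it needs no distributional tables. The function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in means.

    H0: group labels are exchangeable (no difference between the groups).
    H1: the group means differ. Every relabeling of the pooled data into groups
    of the original sizes is enumerated, so this suits only small samples.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(b) - mean(a))
    count = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        ga = [pooled[i] for i in idx]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as observed (small float tolerance)
        if abs(mean(gb) - mean(ga)) >= observed - 1e-12:
            count += 1
    return count / total


if __name__ == "__main__":
    control = [1, 2, 3]  # hypothetical measurements
    treated = [4, 5, 6]
    print(permutation_test(control, treated))  # prints 0.1
```

With three observations per group, there are only 20 ways to relabel the pooled data. Just two of them produce a mean difference as large as the observed one, so p = 0.1.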