This is an open collaborative project maintained by Taha Kass-Hout, MD, MS and Zhiheng (Roy) Xu, PhD
Change point analysis (CPA) aims to detect any change in the mean of a process in historical data
Example questions to be answered by performing CPA:
- Did a change occur?
- Did more than one change occur?
- When did the changes occur?
- How confident are we that they are real changes?
CPA assumes that the observations of the process (time series) are IDENTICALLY DISTRIBUTED and INDEPENDENT (or, at least, that there is no strong autocorrelation)
- No specific distribution is assumed
- It can handle all types of time-ordered data, including data from non-normal distributions, ill-behaved data such as particle counts and complaint data, and data with outliers
- If CPA is applied to the ranks of the observations rather than to the raw values, it provides results that are robust to outliers
- CPA can detect subtle changes which may not be detected by control charts. Thus, CPA and control charts can be used in a complementary fashion [please see below]
- CPA characterizes the detected changes better by providing associated confidence levels and confidence intervals (CIs) for the times of the changes
- It is not a monitoring tool but a tool to analyze historical time-ordered data. Unlike aberration detection methods such as C2/W2 or control charts such as CUSUM, it is not efficient at detecting isolated abnormal points
- If there is too much autocorrelation in the data, some changes could be confused with autoregressive effects
- Because the bootstrapping approach used in CPA relies on random resampling, it will not produce identical results each time it is performed
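As a small illustration of the rank-based variant mentioned in the list above, the observations can be replaced by their ranks before any CPA calculation is run. The sketch below is only one way to do this: the helper name `average_ranks` and the choice to give tied values their average rank are assumptions made for the example, not something the project prescribes.

```python
def average_ranks(values):
    """Rank observations from 1..n; tied values share their average rank.

    Hypothetical helper for illustration only.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Average of the 1-based positions start+1 .. end+1.
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


# The outlier 500 becomes rank 9, only one step above its neighbour (rank 8).
series = [5, 6, 5, 7, 500, 12, 13, 12, 14]
print(average_ranks(series))
```

After the transformation the extreme value can no longer dominate the mean or the cumulative sums, which is the sense in which a rank-based analysis is robust to outliers.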
How to Calculate CPA
- Determine the series mean
- Accumulate the running sum of the differences between the mean and the individual values (the CUSUM series)
- Plot the CUSUM series
- The point farthest from 0 denotes a change point (CP)
- Break the series into two sections at the CP
- Analyze each subseries for additional significant CPs
- Bootstrapping provides us with a measure of each CP’s significance
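The steps above, for a single change point, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names `cusum` and `find_change_point`, the parameters `n_bootstrap` and `seed`, the use of the CUSUM range (maximum minus minimum) as the magnitude of change, and bootstrapping by randomly reordering the observations are all choices made for the example. The sign convention (value minus mean) does not affect which point is farthest from 0.

```python
import random
from itertools import accumulate


def cusum(values):
    """Cumulative sums of deviations from the series mean, starting at 0."""
    mean = sum(values) / len(values)
    return [0.0] + list(accumulate(v - mean for v in values))


def find_change_point(values, n_bootstrap=1000, seed=None):
    """Return (index, confidence) for the single most likely change point.

    index is the position of the first observation after the change.
    confidence is the share of reordered series whose CUSUM range is
    smaller than the range of the original ordering (an assumed statistic).
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    s = cusum(values)
    observed_range = max(s) - min(s)
    # The CUSUM point farthest from 0 marks the candidate change point.
    index = max(range(1, len(values)), key=lambda i: abs(s[i]))
    shuffled = list(values)
    below = 0
    for _ in range(n_bootstrap):
        rng.shuffle(shuffled)  # reorder: destroys any real change in level
        b = cusum(shuffled)
        if max(b) - min(b) < observed_range:
            below += 1
    return index, below / n_bootstrap


series = [10, 11, 9, 10, 10, 11, 9, 10, 20, 21, 19, 20, 20, 21, 19, 20]
index, confidence = find_change_point(series, seed=42)
# index is 8: series[8] is the first value after the shift in level.
print(index, confidence)
```

Passing a fixed `seed` makes a run repeatable. Without it, the confidence value can differ slightly from run to run, because the reorderings are random.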
Aberration detection algorithms
Aberration detection algorithms detect isolated or grouped abnormalities so that major changes are flagged quickly. These methods look for abnormalities by updating as new data are collected at each time step, and they control the change-wise errors when flagging abnormalities.
CPA, on the other hand, uses a recursive algorithm to detect multiple change points (orange vertical lines): it repeatedly splits a given time series into two sub-series and applies the CPA algorithm to each sub-series to find a change point based on the cumulative sums of that sub-series. A change point indicates that the series mean shifts from its previous value to another. The green piece-wise constant lines represent the mean shifts.
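The recursive splitting described above can be sketched as follows. This is an illustrative sketch only: the names `detect_change_points` and `segment_means`, the thresholds `min_confidence` and `min_size`, and the reordering-based confidence estimate are assumptions for the example rather than the project's actual code. The segment means it returns correspond to the piece-wise constant levels between change points.

```python
import random
from itertools import accumulate


def _cusum(values):
    """Cumulative sums of deviations from the segment mean, starting at 0."""
    mean = sum(values) / len(values)
    return [0.0] + list(accumulate(v - mean for v in values))


def _confidence(values, rng, n_bootstrap):
    """Share of random reorderings with a smaller CUSUM range (assumed test)."""
    s = _cusum(values)
    observed = max(s) - min(s)
    shuffled = list(values)
    below = 0
    for _ in range(n_bootstrap):
        rng.shuffle(shuffled)
        b = _cusum(shuffled)
        if max(b) - min(b) < observed:
            below += 1
    return below / n_bootstrap


def detect_change_points(values, min_confidence=0.95, min_size=5,
                         n_bootstrap=500, seed=0):
    """Recursively split the series; return sorted change-point indices."""
    rng = random.Random(seed)
    found = []

    def split(lo, hi):
        segment = values[lo:hi]
        if len(segment) < 2 * min_size:  # too short to split further
            return
        s = _cusum(segment)
        k = max(range(1, len(segment)), key=lambda i: abs(s[i]))
        if k < min_size or len(segment) - k < min_size:
            return  # candidate too close to an edge (assumed guard)
        if _confidence(segment, rng, n_bootstrap) < min_confidence:
            return  # change not significant: stop recursing here
        found.append(lo + k)
        split(lo, lo + k)   # look for earlier changes
        split(lo + k, hi)   # look for later changes

    split(0, len(values))
    return sorted(found)


def segment_means(values, change_points):
    """Mean of each stretch between consecutive change points."""
    bounds = [0] + list(change_points) + [len(values)]
    return [sum(values[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


noise = [0, 1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1]
series = [level + n for level in (10, 30, 20) for n in noise]
points = detect_change_points(series)
print(points, segment_means(series, points))
```

Each reported index is the first observation of a new level. Raising `min_confidence` or `min_size` makes the sketch more conservative about declaring additional change points.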
Aberration detection algorithms are generally better at detecting isolated or grouped abnormalities, while the CPA algorithm is better at detecting subtle changes which may not be detected by aberration methods. The two methods can be used in a complementary fashion to gain a better understanding of the data.