Examples
Using candy production data to demonstrate the different functionalities and use cases for cpVisualise and cpLabel.
The US Candy Production data used throughout this example is freely available and can be found on Kaggle.
The data contains the industrial production index per month for candy and demonstrates exaggerated seasonality. The industrial production (IP) index measures the real output of all relevant establishments located in the United States, regardless of their ownership, but not those located in U.S. territories. This dataset tracks industrial production every month from January 1972 to August 2017.
To load the data we will use the following code:
data = read.csv("candy_production.csv", header = TRUE)
Next we select the column we would like to work with. In this case we are interested in the industrial production column.
data = data$IPG3113N
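Before passing the series on, it can be worth checking that the selected column looks as described. The sketch below is a minimal check in base R, assuming the file holds exactly one row per month from January 1972 to August 2017 with no missing values. The names is_valid_series and stand_in are hypothetical, introduced only for illustration, and are not part of CpVis.

```r
# Months covered by the dataset according to the description above.
expected_months <- seq(as.Date("1972-01-01"), as.Date("2017-08-01"), by = "month")

# Hypothetical helper (not part of CpVis): TRUE when x is a complete numeric
# series with one value per expected month.
is_valid_series <- function(x, n_expected = length(expected_months)) {
  is.numeric(x) && !anyNA(x) && length(x) == n_expected
}

# Stand-in vector so the sketch runs without the CSV file.
# With the real data, call is_valid_series(data) instead.
stand_in <- rnorm(length(expected_months), mean = 100, sd = 10)
is_valid_series(stand_in)
```

If the check fails on the real data, the usual causes are a different column name or a file that covers a different date range.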
Now we are ready to start using cpVisualise and cpLabel!
The main advantage of using cpVisualise, over directly using the changepoint package, is the ability to visually look through a range of penalty values. This enables the user to segment data appropriately, so long as they have an idea of what the structure should look like. To start we will use cpVisualise to inspect the data, and explore the effect different penalty values have on the number and locations of the changepoints.
The only parameter that has to be set by the user is the range of penalty values to test. The correct range of values depends on what kind of segmentation you are looking for. In the case of this data we have set the range to be between 100 and 2000, giving us the ability to look at both macro and micro trends in the data.
library(CpVis)
cpVisualise(data, penalty_range = c(100,2000))
After running the command a new browser window should pop up with the following interface.
Using the interface, we can quickly inspect the effect that different penalty values have on the resulting changepoint locations. In this case we can see that for large penalty values (2000) the data is split into 6 segments. Smaller penalty values result in more, shorter segments, closely matching the annual spike in production for the holiday seasons.
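The link between penalty size and segment count can be illustrated with a small base R sketch. It assumes a squared-error cost around each segment mean plus a fixed charge per changepoint, which is a common formulation but not necessarily the cost function CpVis uses. The functions segment_cost and penalised_cost and the toy series x are hypothetical and exist only for this illustration.

```r
# Residual sum of squares when each segment is summarised by its mean.
# `cpts` holds the last index of every segment except the final one.
segment_cost <- function(x, cpts) {
  starts <- c(1, cpts + 1)
  ends <- c(cpts, length(x))
  sum(mapply(function(s, e) sum((x[s:e] - mean(x[s:e]))^2), starts, ends))
}

# Penalised cost: goodness of fit plus a fixed charge per changepoint.
penalised_cost <- function(x, cpts, penalty) {
  segment_cost(x, cpts) + penalty * length(cpts)
}

# Deterministic toy series: a large shift at 50 and a small shift at 100.
x <- c(rep(0, 50), rep(5, 50), rep(6, 50)) + rep(c(-0.5, 0.5), 75)

coarse <- c(50)       # keeps only the large shift
fine <- c(50, 100)    # also keeps the small shift

for (penalty in c(1, 100)) {
  fine_wins <- penalised_cost(x, fine, penalty) < penalised_cost(x, coarse, penalty)
  cat("penalty", penalty, "prefers the",
      if (fine_wins) "fine" else "coarse", "segmentation\n")
}
```

With the small penalty the extra changepoint pays for itself, while the large penalty makes it too expensive. The same trade-off applies when moving through the penalty range in the interface.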
Furthermore, the right-hand side of the interface has two panels: one provides useful information about the dataset as a whole, and the other shows a histogram of the segment means weighted by segment length.
Finally, you can save the currently selected changepoints by clicking the "Save Changepoints" button.
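For comparison, a range of penalties can also be explored without the interface by calling the changepoint package directly. The sketch below is an assumption about how one might do this with the CROPS penalty option and a change-in-mean model; it is not a description of what cpVisualise runs internally. The synthetic series x and the penalty range c(1, 500) are illustrative stand-ins, not the candy data or the range used above.

```r
library(changepoint)

set.seed(1)
# Synthetic stand-in series with three level shifts (not the candy data).
x <- c(rnorm(100, 0), rnorm(100, 10), rnorm(100, 3), rnorm(100, 12))

# CROPS finds the optimal segmentations for penalties in the given range.
# In the changepoint package it is only available with method = "PELT".
fit <- cpt.mean(x, penalty = "CROPS", pen.value = c(1, 500), method = "PELT")

pen.value.full(fit)  # penalty values at which the segmentation changes
cpts.full(fit)       # one row of changepoint locations per penalty value
```

Reading these two outputs side by side gives the same information as the interface, but without the visual feedback that makes it easy to judge which segmentation suits the data.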
cpLabel provides an easy-to-use interface for labeling any univariate dataset, so that the labels can be used to learn a penalty function with the penalty learning package. The main advantage of the interface, as opposed to using the penalty learning package directly, is the ability to iterate quickly on the annotations: after each change you can train the model and visually inspect the predicted changepoints.
Using cpLabel is very straightforward: having previously loaded the univariate data, all we have to do is pass it to the function.
library(CpVis)
cpLabel(data)
After running the command a new browser window should pop up with the following interface.
cpLabel allows us to mark regions where we think a changepoint might exist; any white area represents a region where no changepoint should be present. Labels cannot overlap, so if you use the no Label category to select a region that contains a changepoint region, you will overwrite the changepoint region label.
The number of segments roughly equates to how many models will be tested on the data. More models naturally take longer to train, so begin with a conservative number.
cpLabel also supports pre-labeling. This uses the changepoints found by the unsupervised method in cpVisualise to pre-label small regions surrounding the changepoints. Having saved the changepoints while using cpVisualise, we can then call the cpLabel function with the "unsupervised_changepoints" argument set to TRUE.
cpLabel(data, unsupervised_changepoints = TRUE)
Depending on which penalty value (and resulting changepoints) you decided to save, the interface should start with labeled changepoint regions. Once it has loaded, you are free to edit the changepoint regions as before.
Pre-labeled changepoint regions
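To make the idea of pre-labeling concrete, the sketch below turns a set of changepoint positions into small labeled regions in base R. It is only an illustration: changepoint_regions is a hypothetical helper, the half-width of 3 observations is an arbitrary assumption, and the changepoint positions are made up. None of this reflects how cpLabel builds its regions internally.

```r
# Hypothetical helper (not part of CpVis): build a labeled region of
# +/- half_width observations around each changepoint, clipped to 1..n.
changepoint_regions <- function(cpts, n, half_width = 3) {
  data.frame(
    start = pmax(cpts - half_width, 1),
    end = pmin(cpts + half_width, n),
    label = "changepoint"
  )
}

# Made-up changepoint positions on a series of 548 monthly observations
# (January 1972 to August 2017).
regions <- changepoint_regions(cpts = c(2, 120, 400), n = 548)
print(regions)
```

Note that this simple version would produce overlapping regions for changepoints closer together than twice the half-width, which the interface does not allow, so a fuller version would need to merge or trim such regions.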