Unsupervised change point interface.
Supervised penalty learning and data labeling interface.
This R package provides an interactive web interface for working with the changepoint and penaltyLearning packages. The package's primary focus is to allow researchers to visually explore and quickly find the best parameters for both unsupervised and supervised changepoint detection algorithms. The interactive visual interface is easy to use, which speeds up analysis for domain experts and analysts and lowers the barrier to entry for those who wish to work with specific functions from the changepoint and penaltyLearning packages.
There are two key interfaces, cpVisualise and cpLabel:
cpVisualise implements the "cpt.mean" function from the changepoint package, which identifies changes in mean for a given univariate dataset. Specifically, the PELT [reference pelt method] method is used to find the exact positions of changepoints in the data for a particular penalty value. For further information regarding the changepoint package, please refer to its documentation.
cpLabel implements a simple changepoint labeling tool together with the penalty learning algorithm, which learns a penalty function that can be used to accurately predict changepoints in labeled data. Please refer to the penaltyLearning package documentation for further information.
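As a rough illustration of the computation that cpVisualise wraps, the sketch below calls "cpt.mean" directly with the PELT method and a manually chosen penalty. The simulated series and the penalty value of 20 are assumptions made for this example only; they are not defaults of this package.

```r
library(changepoint)

set.seed(1)
# Simulated univariate series whose mean shifts after positions 100 and 200
# (illustrative data, not shipped with this package).
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 5), rnorm(100, mean = 0))

# PELT with a manual penalty; pen.value is the parameter a user would tune.
fit <- cpt.mean(y, method = "PELT", penalty = "Manual", pen.value = 20)

cp <- cpts(fit)                    # estimated changepoint positions
seg_means <- param.est(fit)$mean   # estimated mean of each segment

print(cp)
print(seg_means)
```

Larger penalty values generally yield fewer changepoints and smaller values yield more, which is why an interface for exploring the penalty is useful.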
This package was developed as part of Google Summer of Code.
A changepoint is typically defined as a point in time where the distribution of a data-stream changes in a distinct manner; for example, one may look for changepoints in mean and/or variance. Usually, detection is performed in an unsupervised setting where we have no labelled examples of true changepoints. In practice, however, we often have examples of periods of time where we know no changes should be present or, conversely, where changes are expected to exist. Where such information is available, we can use it to guide how complexity penalties are set in the changepoint estimation task, and thus to decide on an appropriate number of changepoints, a task which currently requires time-consuming parameter tuning by domain experts. This process is severely hampered by a lack of streamlined tools for the task, namely visualisation of changepoint solutions (across tuning parameters), interactive labelling of data-streams, and the use of this feedback when learning penalty functions.
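To make the role of labels concrete, the following base R sketch counts how many labelled regions a candidate set of changepoints contradicts. The function name count_label_errors and the label format (columns start, end, changes) are hypothetical and chosen for illustration; they are not the API or the label definitions of the penaltyLearning package.

```r
# Count the labelled regions that a set of changepoints contradicts.
# labels: data.frame with columns
#   start, end : region boundaries (start inclusive, end exclusive)
#   changes    : 0 = no change expected, 1 = at least one change expected
# (hypothetical format, assumed for this sketch)
count_label_errors <- function(changepoints, labels) {
  errors <- 0
  for (i in seq_len(nrow(labels))) {
    n_inside <- sum(changepoints >= labels$start[i] &
                    changepoints < labels$end[i])
    # A change inside a "no change" region is a false positive.
    if (labels$changes[i] == 0 && n_inside > 0) errors <- errors + 1
    # No change inside a "change expected" region is a false negative.
    if (labels$changes[i] == 1 && n_inside == 0) errors <- errors + 1
  }
  errors
}

labels <- data.frame(
  start   = c(20, 90, 150),
  end     = c(80, 110, 190),
  changes = c(0, 1, 0)
)

count_label_errors(c(100, 200), labels)      # 0: agrees with every label
count_label_errors(c(50, 100, 170), labels)  # 2: changes in both "no change" regions
count_label_errors(integer(0), labels)       # 1: misses the expected change
```

A penalty whose segmentation gives zero errors is consistent with the labels; penalty learning can be thought of as searching for penalties with this property.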
Taken together, these components fall within recent efforts to produce explainable AI systems, part of a growing community of research that involves 'the human in the loop' to monitor and control complicated algorithms. Such approaches aim to complement the system's capabilities with the contextual domain knowledge, creativity, and decision-making capabilities of humans. Interactive visualizations can help humans understand and control an algorithmic system by leveraging their capacity for parallel and simultaneous perception, pattern detection, and exploratory analysis. In this particular project, we seek a simple visualization interface to support i) human labeling of the data with the aid of several complementary measures on the data-stream, such as mean, trends, min, max, and variance, and ii) interactive exploration of the result space suggested by a changepoint detection algorithm. Eventually, any visualization will also help communicate the data and the respective decisions to peers and larger audiences in the form of reports, posters, slideshows, or open web documentation.
Two key packages related to this work are discussed below:
penaltyLearning - Provides a mechanism for learning a penalty level for a given univariate sequence and labelled changepoint regions. While the package provides a useful method for suggesting an optimal penalty level for changepoint segmentation, it is geared largely towards the genetics community. The penalty learning method has been integrated into this package as a more general labelling framework with an enhanced focus on visualisation and interaction. This allows for quick user comparison between unsupervised and supervised changepoint methods.
changepoint - Provides various methods for segmenting individual time-series based on mean and variance. The included methods (mainly PELT) are used to perform unsupervised segmentation, which this package extends by visualising the resulting solutions to enable better interpretation of changepoint output. Experience with end-users suggests that finding an appropriate penalty parameter with these methods is a time-consuming process, in large part due to the lack of coherent visualisation of solution paths.
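The solution path mentioned above can be approximated by refitting over a grid of penalties and recording how many changepoints each one produces. This is a minimal sketch under assumed data and an arbitrary penalty grid; it is not necessarily how this package computes the path.

```r
library(changepoint)

set.seed(1)
# Illustrative series with mean shifts after positions 100 and 200.
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 5), rnorm(100, mean = 0))

# Arbitrary grid of penalty values to explore (an assumption for this sketch).
penalties <- c(1, 5, 10, 20, 50, 100)

# Refit PELT at each penalty and count the detected changepoints.
n_changepoints <- sapply(penalties, function(pen) {
  fit <- cpt.mean(y, method = "PELT", penalty = "Manual", pen.value = pen)
  length(cpts(fit))
})

solution_path <- data.frame(penalty = penalties,
                            n_changepoints = n_changepoints)
print(solution_path)
```

Inspecting such a table by hand quickly becomes tedious for real data, which is the gap the visual interface is meant to fill.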
A range of recent work focuses on interactive visualization of AI systems [1] and can be summarized under the terms 'explainable' or 'interactive' AI. Examples include interactive playgrounds such as TensorFlow Playground (http://playground.tensorflow.org) and Momentum (https://distill.pub/2017/momentum/), tools for interactive machine learning (https://learningfromusersworkshop.github.io/), as well as more story-like descriptions of studies and analysis cases (http://formafluens.io/client/mix10.html). A great variety of further tools and research is summarized online: http://visxai.io/program.html. More specifically, tools such as Small MultiPiles [2] and BayesPiles [3] use simple segmentation methods (far less sophisticated than changepoint detection in R) as a proof-of-concept to demonstrate interactive visualization approaches for detecting states in temporal networks. In both cases, visualization gives the user a holistic view of the data (i.e., a time sequence of networks), including the more specific information required to support decisions about temporal states. Interaction complements automatic segmentation by allowing the user to explore a found segmentation solution (explore states in the network) and to quickly refine an automatic solution by splitting and combining states. Finally, time curves [4] are a far more generic way to visualize changes over time, e.g., for multiple time-series. To the best of our knowledge, no tool or visualization interface exists that allows analysts to explore the solution path of changepoint detection methods in simple time-series. This package hopes to lay the foundations for interfaces and methods that enable changepoint detection across a variety of domains.
This software is licensed under the GPLv3 license.
[1] Li, Tianyi, Gregorio Convertino, Wenbo Wang, Haley Most, Tristan Zajonc, and Yi-Hsun Tsai. "HyperTuner: Visual Analytics for Hyperparameter Tuning by Professionals."
[2] Bach, Benjamin, Nathalie Henry‐Riche, Tim Dwyer, Tara Madhyastha, J‐D. Fekete, and Thomas Grabowski. "Small MultiPiles: Piling time to explore temporal patterns in dynamic networks." In Computer Graphics Forum, vol. 34, no. 3, pp. 31-40. 2015.
[3] Vogogias, Athanasios, Jessie Kennedy, Daniel Archambault, Benjamin Bach, V. Anne Smith, and Hannah Currant. "BayesPiles: Visualisation Support for Bayesian Network Structure Learning." ACM Transactions on Intelligent Systems and Technology (TIST) 10, no. 1 (2018): 5.
[4] Bach, Benjamin, Conglei Shi, Nicolas Heulot, Tara Madhyastha, Tom Grabowski, and Pierre Dragicevic. "Time curves: Folding time to visualize patterns of temporal evolution in data." IEEE Transactions on Visualization and Computer Graphics 22, no. 1 (2016): 559-568.