We introduce a new classifier for time series classification (TSC), the Canonical Interval Forest (CIF). Leveraging catch22, a recently documented set of 22 informative time series summary features, and the efficient but less accurate Time Series Forest (TSF), we set a new state of the art for interval-based classification. By replacing TSF with CIF in the HIVE-COTE ensemble (HC-CIF), we advance the state of the art in overall TSC accuracy, significantly improving on the previous most accurate classifiers, HIVE-COTE, TS-CHIEF and InceptionTime, on 112 UCR datasets. We also show that CIF has an aptitude for multivariate classification, achieving a significant improvement in accuracy over other approaches on 26 equal-length datasets from the UEA multivariate archive.
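As a rough illustration of the idea CIF takes from TSF, the sketch below extracts summary features from randomly chosen intervals of a series, using just mean, standard deviation and slope in place of the full catch22 set. This is a hypothetical, simplified example for intuition only, not the tsml implementation; all class and method names below are illustrative.

import java.util.Arrays;
import java.util.Random;

//Simplified sketch of per-tree interval feature extraction. In CIF each tree draws
//k random intervals and a random subsample of the 25 candidate features (catch22 plus
//mean, standard deviation and slope); here only the three simple statistics are used.
public class IntervalFeatureSketch {

    //Builds a feature vector for one univariate series from k random intervals.
    static double[] intervalFeatures(double[] series, int k, Random rng) {
        double[] features = new double[k * 3];
        for (int i = 0; i < k; i++) {
            int start = rng.nextInt(series.length - 3);              //random position
            int length = 3 + rng.nextInt(series.length - start - 2); //random length >= 3
            features[i * 3] = mean(series, start, length);
            features[i * 3 + 1] = stdDev(series, start, length);
            features[i * 3 + 2] = slope(series, start, length);
        }
        return features;
    }

    static double mean(double[] s, int start, int len) {
        double sum = 0;
        for (int j = start; j < start + len; j++) sum += s[j];
        return sum / len;
    }

    static double stdDev(double[] s, int start, int len) {
        double m = mean(s, start, len), sq = 0;
        for (int j = start; j < start + len; j++) sq += (s[j] - m) * (s[j] - m);
        return Math.sqrt(sq / (len - 1));
    }

    //Least squares slope of the interval values against their time index.
    static double slope(double[] s, int start, int len) {
        double xMean = (len - 1) / 2.0, yMean = mean(s, start, len), num = 0, den = 0;
        for (int j = 0; j < len; j++) {
            num += (j - xMean) * (s[start + j] - yMean);
            den += (j - xMean) * (j - xMean);
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 5, 8, 13, 21, 34, 55, 89};
        System.out.println(Arrays.toString(intervalFeatures(series, 4, new Random(0))));
    }
}

In CIF proper, one such feature vector per training series is used to build each time series tree, with the intervals and feature subsample redrawn for every tree.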
In the interest of reproducibility we release our code, results files and datasets used in the paper on this page.
Univariate:
Formatted accuracy results, averaged over 30 folds and for fold 0 (the default train/test split)
Critical difference diagrams for the above results table
Accuracy
Balanced Accuracy
AUROC
F1 Score
Accuracy results for each classifier/dataset combination, by fold and averaged over all folds
Detailed results files for each classifier/dataset combination for all 30 folds; these can be processed using the MultipleClassifierEvaluation tsml class (see the sketch after the multivariate list below)
Multivariate:
Critical difference diagrams for the results below
Accuracy
Balanced Accuracy
AUROC
F1 Score
Accuracy results for each classifier/dataset combination
Detailed results files for each classifier/dataset combination; these can be processed using the MultipleClassifierEvaluation tsml class (see the sketch after this list)
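The detailed results files referenced above follow the tsml results format and can be processed with the MultipleClassifierEvaluation class to produce comparative summaries and critical difference diagram inputs. The sketch below shows roughly how this is done; the constructor arguments, method names and paths are assumptions based on the tsml codebase and should be checked against the version you are using.

import evaluation.MultipleClassifierEvaluation;

//Illustrative sketch: read per-fold results files written by our experiments and
//generate comparative statistics. Paths, dataset names and method signatures are
//assumptions; verify them against your tsml version.
public class EvaluateResults {
    public static void main(String[] args) throws Exception {
        MultipleClassifierEvaluation mce =
                new MultipleClassifierEvaluation("./analysis/", "CIFComparison", 30);
        mce.setTestResultsOnly(true);
        mce.setDatasets(new String[]{"ItalyPowerDemand", "GunPoint"});
        mce.readInClassifiers(new String[]{"CIF", "TSF"}, "./results/");
        mce.runComparison();
    }
}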
We implement CIF, and the version of catch22 used, in Java within the Weka-based tsml framework. For the other classifiers shown (except InceptionTime), we use the versions implemented in the Java tsml package.
Code used to produce our experiments
Our code in JAR form
In the following we provide some usage examples for running and configuring CIF, and for running experiments using our code and classifiers in the tsml package.
Our code uses the Weka classifier interface; as such, data loading and the methods for building and classification are consistent with other Weka/tsml classifiers.
//Data loading
Instances train = DatasetLoading.loadDataNullable("path/datasetName_TRAIN");
Instances test = DatasetLoading.loadDataNullable("path/datasetName_TEST");
//Classifier training
CIF cif = new CIF();
cif.buildClassifier(train);
//Predictions, single case at a time
double classPrediction = cif.classifyInstance(test.get(0));
double[] classProbabilities = cif.distributionForInstance(test.get(0));
//Visualisation
double[][] temporalImportanceCurves = cif.temporalImportanceCurve();
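The same Weka interface can be used to score the whole test set. The following is a minimal sketch, assuming the cif and test objects built in the snippet above, that loops over the test instances and computes accuracy using only the standard Weka Instances/Instance API.

//Evaluate test accuracy by predicting each test instance in turn
int correct = 0;
for (int i = 0; i < test.numInstances(); i++) {
    if (cif.classifyInstance(test.instance(i)) == test.instance(i).classValue())
        correct++;
}
double accuracy = correct / (double) test.numInstances();
System.out.println("Test accuracy: " + accuracy);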
CIF has a variety of parameters that can be configured. Examples of some of these, and how to change them, are provided below. By default, the parameters used for CIF in the paper are set.
CIF cif = new CIF();
cif.setNumTrees(500); //Number of trees to build for the forest
cif.setTrainTimeLimit(TimeUnit.HOURS, 1); //Train time contract, overrides number of trees
cif.setNumIntervalsFinder(seriesLength -> (int) Math.sqrt(seriesLength)); //Number of intervals per tree, takes a function of the series length (here sqrt(m) as an example)
cif.setAttSubsampleSize(8); //Number of attributes to subsample per tree
cif.setEstimator(CIF.EstimatorMethod.OOB); //Method for obtaining train set performance
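When a train set estimate is required, for example to feed CIF into HIVE-COTE, the estimator set above is used to generate it internally. The sketch below assumes the setEstimateOwnPerformance and getTrainResults methods inherited from tsml's EnhancedAbstractClassifier, together with the cif object configured above and the train data loaded earlier; check these names against the tsml version you are using.

//Illustrative sketch: request an internal train performance estimate, then build
//(setEstimateOwnPerformance/getTrainResults are assumed from tsml's EnhancedAbstractClassifier)
cif.setSeed(0);
cif.setEstimateOwnPerformance(true);
cif.buildClassifier(train);
System.out.println("Estimated train accuracy: " + cif.getTrainResults().getAcc());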
We provide two methods of running an experiment for a single classifier, dataset and fold combination.
In the provided CIFICDM2020.java file, the main method can be run to achieve this. The parameters below, found at the top of the method, must be configured.
String datasetPath = "./datasets/"; //Path where dataset files are stored
String resultsPath = "./results/"; //Path to write results file to
String datasetName = "ItalyPowerDemand"; //Name of the dataset used for this experiment
String classifierName = "CIF"; //Name of the classifier to be run
int fold = 0; //Experiment fold, used for dataset resampling and random seed
boolean generateTrainFold = true; //Generate a results file for the train data, used in HIVE-COTE
Alternatively, the JAR file can be run from the command line, requiring the same parameters as above in argument form.
java -jar CIFICDM2020.jar -dp={datasetPath} -rp={resultsPath} -dn={datasetName} -cn={classifierName} -f={fold} -gtf={generateTrainFold}
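For example, substituting the values from the CIFICDM2020.java listing above:

java -jar CIFICDM2020.jar -dp=./datasets/ -rp=./results/ -dn=ItalyPowerDemand -cn=CIF -f=0 -gtf=true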
We ran InceptionTime using sktime-dl, the deep learning extension package for sktime (https://github.com/alan-turing-institute/sktime). At the time of use, this was only available on the development branch.
Below we include the datasets used in our study in ARFF format. These can be downloaded in other formats from http://www.timeseriesclassification.com/.
112 UCR archive datasets of equal length and with no missing values
3 Asphalt datasets used in our case study
26 UEA archive multivariate datasets of equal length and with no missing values
Classifier parameters used in experimentation, with m being the series length and d being the number of dimensions.
Univariate
Multivariate