We introduce a new classifier for time series classification (TSC), the Canonical Interval Forest (CIF). Leveraging catch22, a recently documented set of 22 informative time series summary features, and the efficient but less accurate Time Series Forest (TSF), we set a new state of the art for interval-based classification. By replacing TSF with CIF in the HIVE-COTE ensemble (HC-CIF), we advance the state of the art in overall TSC accuracy, significantly improving on the previous most accurate classifiers, HIVE-COTE, TS-CHIEF and InceptionTime, on 112 UCR datasets. We also show that CIF has an aptitude for multivariate classification, achieving a significant improvement in accuracy over other approaches on 26 equal-length datasets from the UEA multivariate archive.
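As a rough illustration of the idea CIF takes from TSF, the sketch below extracts summary features from randomly chosen intervals of a series, using just mean, standard deviation and slope in place of the full catch22 set. This is a hypothetical, simplified example for intuition only, not the tsml implementation; all class and method names below are illustrative.

import java.util.Arrays;
import java.util.Random;

//Simplified sketch of per-tree interval feature extraction. In CIF each tree draws
//k random intervals and a random subsample of the 25 candidate features (catch22 plus
//mean, standard deviation and slope); here only the three simple statistics are used.
public class IntervalFeatureSketch {

    //Builds a feature vector for one univariate series from k random intervals.
    static double[] intervalFeatures(double[] series, int k, Random rng) {
        double[] features = new double[k * 3];
        for (int i = 0; i < k; i++) {
            int start = rng.nextInt(series.length - 3);              //random position
            int length = 3 + rng.nextInt(series.length - start - 2); //random length >= 3
            features[i * 3] = mean(series, start, length);
            features[i * 3 + 1] = stdDev(series, start, length);
            features[i * 3 + 2] = slope(series, start, length);
        }
        return features;
    }

    static double mean(double[] s, int start, int len) {
        double sum = 0;
        for (int j = start; j < start + len; j++) sum += s[j];
        return sum / len;
    }

    static double stdDev(double[] s, int start, int len) {
        double m = mean(s, start, len), sq = 0;
        for (int j = start; j < start + len; j++) sq += (s[j] - m) * (s[j] - m);
        return Math.sqrt(sq / (len - 1));
    }

    //Least squares slope of the interval values against their time index.
    static double slope(double[] s, int start, int len) {
        double xMean = (len - 1) / 2.0, yMean = mean(s, start, len), num = 0, den = 0;
        for (int j = 0; j < len; j++) {
            num += (j - xMean) * (s[start + j] - yMean);
            den += (j - xMean) * (j - xMean);
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 5, 8, 13, 21, 34, 55, 89};
        System.out.println(Arrays.toString(intervalFeatures(series, 4, new Random(0))));
    }
}

In CIF proper, one such feature vector per training series is used to build each time series tree, with the intervals and feature subsample redrawn for every tree.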
In the interest of reproducibility we release our code, results files and datasets used in the paper on this page.
Univariate:
Formatted accuracy results, averaged over 30 folds and for fold 0 (the default train/test split)
Critical difference diagrams for the above results table
Accuracy
Balanced Accuracy
AUROC
F1 Score
Accuracy results for each classifier/dataset combination, by fold and averaged over all folds
Detailed results files for each classifier/dataset combination for all 30 folds; these can be processed using the MultipleClassifierEvaluation tsml class (see the sketch after the multivariate list below)
Multivariate:
Critical difference diagrams for the results below
Accuracy
Balanced Accuracy
AUROC
F1 Score
Accuracy results for each classifier/dataset combination
Detailed results files for each classifier/dataset combination; these can be processed using the MultipleClassifierEvaluation tsml class (see the sketch after this list)
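The detailed results files referenced above follow the tsml results format and can be processed with the MultipleClassifierEvaluation class to produce comparative summaries and critical difference diagram inputs. The sketch below shows roughly how this is done; the constructor arguments, method names and paths are assumptions based on the tsml codebase and should be checked against the version you are using.

import evaluation.MultipleClassifierEvaluation;

//Illustrative sketch: read per-fold results files written by our experiments and
//generate comparative statistics. Paths, dataset names and method signatures are
//assumptions; verify them against your tsml version.
public class EvaluateResults {
    public static void main(String[] args) throws Exception {
        MultipleClassifierEvaluation mce =
                new MultipleClassifierEvaluation("./analysis/", "CIFComparison", 30);
        mce.setTestResultsOnly(true);
        mce.setDatasets(new String[]{"ItalyPowerDemand", "GunPoint"});
        mce.readInClassifiers(new String[]{"CIF", "TSF"}, "./results/");
        mce.runComparison();
    }
}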
We implement CIF, and the version of catch22 used, in Java within the Weka-based tsml framework. For the other classifiers shown (except InceptionTime), we use the versions implemented in the Java tsml package.
Code used to produce our experiments
Our code in JAR form
In the following we provide some usage examples for running and configuring CIF, and for running experiments using our code and classifiers in the tsml package.
Our code uses the Weka classifier interface; as such, data loading and the methods for building and classification are consistent with other Weka/tsml classifiers.
//Data loading
Instances train = DatasetLoading.loadDataNullable("path/datasetName_TRAIN");
Instances test = DatasetLoading.loadDataNullable("path/datasetName_TEST");
//Classifier training
CIF cif = new CIF();
cif.buildClassifier(train);
//Predictions, single case at a time
double classPrediction = cif.classifyInstance(test.get(0));
double[] classProbabilities = cif.distributionForInstance(test.get(0));
//Visualisation
double[][] temporalImportanceCurves = cif.temporalImportanceCurve();
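The same Weka interface can be used to score the whole test set. The following is a minimal sketch, assuming the cif and test objects built in the snippet above, that loops over the test instances and computes accuracy using only the standard Weka Instances/Instance API.

//Evaluate test accuracy by predicting each test instance in turn
int correct = 0;
for (int i = 0; i < test.numInstances(); i++) {
    if (cif.classifyInstance(test.instance(i)) == test.instance(i).classValue())
        correct++;
}
double accuracy = correct / (double) test.numInstances();
System.out.println("Test accuracy: " + accuracy);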
CIF has a variety of parameters that can be configured. Examples of some of these, and how to change them, are provided below. By default, the parameters used for CIF in the paper are set.
CIF cif = new CIF();
cif.setNumTrees(500); //Number of trees to build for the forest
cif.setTrainTimeLimit(TimeUnit.HOURS, 1); //Train time contract, overrides number of trees
cif.setNumIntervalsFinder(seriesLength -> (int) Math.sqrt(seriesLength)); //Number of intervals per tree, takes a function of the series length (here sqrt(m) as an example)
cif.setAttSubsampleSize(8); //Number of attributes to subsample per tree
cif.setEstimator(CIF.EstimatorMethod.OOB); //Method for obtaining train set performance
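When a train set estimate is required, for example to feed CIF into HIVE-COTE, the estimator set above is used to generate it internally. The sketch below assumes the setEstimateOwnPerformance and getTrainResults methods inherited from tsml's EnhancedAbstractClassifier, together with the cif object configured above and the train data loaded earlier; check these names against the tsml version you are using.

//Illustrative sketch: request an internal train performance estimate, then build
//(setEstimateOwnPerformance/getTrainResults are assumed from tsml's EnhancedAbstractClassifier)
cif.setSeed(0);
cif.setEstimateOwnPerformance(true);
cif.buildClassifier(train);
System.out.println("Estimated train accuracy: " + cif.getTrainResults().getAcc());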
We provide two methods of running an experiment for a single classifier, dataset and fold combination.
In the provided CIFICDM2020.java file, the main method can be run to achieve this. The parameters below, found at the top of the method, must be configured.
String datasetPath = "./datasets/"; //Path where dataset files are stored
String resultsPath = "./results/"; //Path to write results file to
String datasetName = "ItalyPowerDemand"; //Name of the dataset used for this experiment
String classifierName = "CIF"; //Name of the classifier to be run
int fold = 0; //Experiment fold, used for dataset resampling and random seed
boolean generateTrainFold = true; //Generate a results file for the train data, used in HIVE-COTE
Alternatively, the JAR file can be run from the command line, requiring the same parameters as above in argument form.
java -jar CIFICDM2020.jar -dp={datasetPath} -rp={resultsPath} -dn={datasetName} -cn={classifierName} -f={fold} -gtf={generateTrainFold}
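For example, substituting the values from the CIFICDM2020.java listing above:

java -jar CIFICDM2020.jar -dp=./datasets/ -rp=./results/ -dn=ItalyPowerDemand -cn=CIF -f=0 -gtf=true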
We ran InceptionTime using sktime-dl, the deep learning extension package for sktime (https://github.com/alan-turing-institute/sktime). At the time of use, this was only available on the development branch.
Below we include the datasets used in our study in ARFF format. These can be downloaded in other formats from http://www.timeseriesclassification.com/.
112 UCR archive datasets of equal length and with no missing values
3 Asphalt datasets used in our case study
26 UEA archive multivariate datasets of equal length and with no missing values
Classifier parameters used in experimentation, with m being the series length and d being the number of dimensions.
Univariate
Multivariate