The Temporal Dictionary Ensemble (TDE) Classifier for Time Series Classification

Supporting information and data for the paper "The Temporal Dictionary Ensemble (TDE) Classifier for Time Series Classification"

We introduce a new classifier for time series classification (TSC), the Temporal Dictionary Ensemble (TDE). By pooling features from a range of dictionary-based classifier advancements and combining them with a novel Gaussian process (GP) based ensemble selection method, we set a new state of the art for dictionary-based classification. By replacing the Bag of Symbolic-Fourier-Approximation Symbols (BOSS) classifier in the HIVE-COTE ensemble with TDE (HC-TDE), we advance the state of the art in general TSC accuracy, significantly improving on the previous most accurate classifiers, HIVE-COTE, TS-CHIEF, InceptionTime and ROCKET, on 112 UCR datasets.

In the interest of reproducibility we release our code, results files and datasets used in the paper on this page.

Results Files:

WebsiteResults.xlsx

Critical difference diagrams for the above results table:

Accuracy: SummaryCDAcc.pdf

Balanced Accuracy: SummaryCDBalAcc.pdf

AUROC: SummaryCDAUROC.pdf

F1 Score: SummaryCDF1.pdf

Accuracy results for each classifier/dataset combination, by fold and averaged over all folds: ResultsByClassifier.zip

Detailed results files for each classifier/dataset combination over 30 folds, which can be processed using the tsml MultipleClassifierEvaluation class: ResultsFiles.zip

Java code:

We implement TDE in Java using the WEKA/tsml framework. For other classifiers shown (except InceptionTime and ROCKET), we use the versions implemented in the Java tsml package.

uea-machine-learning/tsml: Java time series machine learning tools in a Weka compatible toolkit

Code used to produce our experiments: TDE_ECMLPKDD2020.zip

Our code in JAR form: TDE_ECMLPKDD2020.jar

Usage examples:

Below we provide some usage examples for running and configuring TDE, and for running experiments using our code and the classifiers in the tsml package.

Our code uses the Weka classifier interface, so data loading and the methods for building and classification are the same as for other Weka/tsml classifiers.

//Data loading

Instances train = DatasetLoading.loadDataNullable("path/datasetName_TRAIN");

Instances test = DatasetLoading.loadDataNullable("path/datasetName_TEST");

//Classifier training

TDE tde = new TDE();

tde.buildClassifier(train);

//Predictions, single case at a time (classifyInstance returns the predicted class index as a double)

double classPrediction = tde.classifyInstance(test.get(0));

double[] classProbabilities = tde.distributionForInstance(test.get(0));
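The two prediction calls above return a class index and a per-class probability distribution for one case. As a minimal sketch of how such outputs could be summarised into a test accuracy, the standard-library-only helper below works on plain arrays standing in for the classifier outputs. The class name PredictionSummary and its methods are hypothetical and are not part of tsml or Weka.

```java
public class PredictionSummary {

    // Index of the largest probability; ties go to the lowest index.
    public static int argMax(double[] distribution) {
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    // Fraction of cases whose arg-max class matches the true class index.
    public static double accuracy(double[][] distributions, int[] trueClasses) {
        if (distributions.length != trueClasses.length || trueClasses.length == 0) {
            throw new IllegalArgumentException(
                    "Need one distribution per case and at least one case");
        }
        int correct = 0;
        for (int i = 0; i < trueClasses.length; i++) {
            if (argMax(distributions[i]) == trueClasses[i]) {
                correct++;
            }
        }
        return (double) correct / trueClasses.length;
    }

    public static void main(String[] args) {
        // Stand-in values (assumption). In practice each row would come from
        // tde.distributionForInstance(test.get(i)) and each label from
        // (int) test.get(i).classValue().
        double[][] distributions = {{0.9, 0.1}, {0.2, 0.8}, {0.6, 0.4}, {0.3, 0.7}};
        int[] trueClasses = {0, 1, 1, 1};
        System.out.println("Accuracy: " + accuracy(distributions, trueClasses));
    }
}
```

Working from the distributions rather than from classifyInstance alone keeps the probabilities available for other metrics reported on this page, such as AUROC.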

TDE has a variety of configurable parameters. Some of these, and how to change them, are shown below. By default, the parameters used for TDE in the paper are set.

TDE tde = new TDE();

tde.setParametersConsidered(250); //Number of parameter sets to be considered for the ensemble

tde.setTrainTimeLimit(TimeUnit.HOURS, 1); //Train time contract, overrides parameters considered

tde.setMaxEnsembleSize(100); //Maximum number of classifiers in the ensemble

tde.setTrainProportion(0.7); //Proportion of the train set randomly sub-sampled per classifier

We provide two methods of running an experiment for a single classifier, dataset and fold combination.

The first is to run the main method in the provided TDE_ECMLPKDD2020.java file. The parameters below, found at the top of the method, must be configured first.

String datasetPath = "./datasets/"; //Path where dataset files are stored

String resultsPath = "./results/"; //Path to write results file to

String datasetName = "ItalyPowerDemand"; //Name of the dataset used for this experiment

String classifierName = "TDE"; //Name of the classifier to be run

int fold = 0; //Experiment fold, used for dataset resampling and random seed

boolean generateTrainFold = true; //Generate a results file for the train data, used in HIVE-COTE

Alternatively, the JAR file can be run from the command line, taking the same parameters as above in argument form.

java -jar TDE_ECMLPKDD2020.jar -dp={datasetPath} -rp={resultsPath} -dn={datasetName} -cn={classifierName} -f={fold} -gtf={generateTrainFold}
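Assuming the braces in the command above are placeholders to be replaced by values, a filled-in call using the settings from the main method example would read java -jar TDE_ECMLPKDD2020.jar -dp=./datasets/ -rp=./results/ -dn=ItalyPowerDemand -cn=TDE -f=0 -gtf=true. The sketch below shows one way arguments of this -key=value form could be split and typed. It is an illustration only: the class JarArgsSketch is hypothetical and is not the parser used inside the JAR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JarArgsSketch {

    // Splits arguments of the form -key=value into a map keyed without the dash.
    public static Map<String, String> parse(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require a leading dash and a non-empty key before '='.
            if (!arg.startsWith("-") || eq < 2) {
                throw new IllegalArgumentException("Expected -key=value but got: " + arg);
            }
            parsed.put(arg.substring(1, eq), arg.substring(eq + 1));
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Example values taken from the main method settings on this page.
        String[] example = {
            "-dp=./datasets/", "-rp=./results/", "-dn=ItalyPowerDemand",
            "-cn=TDE", "-f=0", "-gtf=true"
        };
        Map<String, String> parsed = parse(example);
        int fold = Integer.parseInt(parsed.get("f"));
        boolean generateTrainFold = Boolean.parseBoolean(parsed.get("gtf"));
        System.out.println(parsed.get("cn") + " on " + parsed.get("dn")
                + ", fold " + fold + ", train fold file: " + generateTrainFold);
    }
}
```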

sktime:

The ROCKET classifier was run using the sktime Python time series package.

We ran InceptionTime using the sktime deep learning extension package, sktime-dl. At the time of use, this was only available in the development branch.

alan-turing-institute/sktime: A scikit-learn compatible Python toolbox for machine learning with time series

sktime/sktime-dl: Deep learning extension package for sktime based on Keras

Datasets:

Below we include the datasets used in our study in ARFF format. These can be downloaded in other formats from http://www.timeseriesclassification.com/. Data folds are deterministic and can be generated using the tsml DatasetLoading.sampleDataset(directory,datasetName,foldID) method.
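To illustrate why a fold ID is enough to reproduce a resample, the sketch below seeds a shuffle with the fold ID, so the same ID always selects the same train cases on a given Java runtime. This is an assumption-laden illustration and not the tsml resampling routine, which may differ (for example by preserving class proportions). The class SeededFoldSketch and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededFoldSketch {

    // Returns the case indices chosen for the train split of the given fold.
    public static List<Integer> trainIndices(int totalCases, int trainSize, int foldID) {
        if (trainSize < 0 || trainSize > totalCases) {
            throw new IllegalArgumentException("trainSize must be between 0 and totalCases");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < totalCases; i++) {
            indices.add(i);
        }
        // The fold ID is the only source of randomness, so the split is repeatable.
        Collections.shuffle(indices, new Random(foldID));
        return new ArrayList<>(indices.subList(0, trainSize));
    }

    public static void main(String[] args) {
        List<Integer> first = trainIndices(10, 4, 1);
        List<Integer> second = trainIndices(10, 4, 1);
        System.out.println("Same fold ID gives same split: " + first.equals(second));
    }
}
```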

112 UCR archive datasets of equal length and with no missing values, as used in our experiments: UCR112.zip

Classifier Parameters:

Classifier parameters used in experimentation. Non-WEASEL window length ranges contain m/4 values, with m being the series length.
