Info-CELS: Informative Saliency Map Guided Counterfactual Explanation for Time Series Classification

Project Summary

As the demand for interpretable machine learning grows, there is an increasing need for informative, human-understandable explanations of model decisions. Such explanations are necessary for building trust and transparency in AI-based systems, and this need has given rise to the field of Explainable Artificial Intelligence (XAI). Recently, a novel counterfactual explanation model, CELS, was introduced. CELS learns a saliency map for the instance of interest and generates a counterfactual explanation guided by that map. CELS is the first attempt to use learned saliency maps both to explain intuitively why a time series classifier made its decision and to produce post hoc counterfactual explanations. However, it sacrifices validity in order to achieve high proximity and sparsity. In this paper, we present Info-CELS, an enhanced approach that builds upon CELS and addresses this limitation by removing mask normalization, which yields more informative and valid counterfactual explanations. Through extensive experiments on datasets from various domains, we demonstrate that our approach outperforms CELS, achieving higher validity and producing more informative explanations.
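To make the idea of a saliency-guided counterfactual concrete, here is a minimal sketch; it is not the authors' implementation. It assumes the counterfactual is formed by blending the original instance with a reference series from the target class, weighted at each time step by a saliency value in [0, 1]. The function name `blend_counterfactual`, the reference series, and the saliency values are all hypothetical and chosen only for illustration.

```python
def blend_counterfactual(x, reference, saliency):
    """Blend x toward reference at each time step, weighted by saliency.

    Assumption (not taken from the source): a saliency value of 0 keeps the
    original value, 1 replaces it with the reference value, and values in
    between interpolate linearly.
    """
    if not (len(x) == len(reference) == len(saliency)):
        raise ValueError("all series must have the same length")
    if any(m < 0.0 or m > 1.0 for m in saliency):
        raise ValueError("saliency values must lie in [0, 1]")
    return [(1.0 - m) * xi + m * ri for xi, ri, m in zip(x, reference, saliency)]


original = [0.0, 1.0, 2.0, 3.0]
reference = [4.0, 4.0, 4.0, 4.0]  # hypothetical series from the target class
saliency = [0.0, 0.5, 1.0, 0.0]   # hypothetical learned saliency map

counterfactual = blend_counterfactual(original, reference, saliency)
print(counterfactual)  # [0.0, 2.5, 4.0, 3.0]
```

In this toy case only the two time steps with non-zero saliency change, which is the sense in which a saliency map can keep a counterfactual sparse and close to the original instance.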

Dataset description

Coffee dataset

Food spectrographs are used in chemometrics to classify food types, a task that has obvious applications in food safety and quality assurance. The Coffee dataset is a two-class problem: distinguishing between Robusta and Arabica coffee beans. Further information can be found in the original paper: Briandet et al., "Discrimination of Arabica and Robusta in Instant Coffee by Fourier Transform Infrared Spectroscopy and Chemometrics," J. Agricultural and Food Chemistry, 44 (1), 1996. The data was first used in the time series classification literature in Bagnall et al., "Transformation Based Ensembles for Time Series Classification," SDM 2012.

GunPoint dataset


ECG200 dataset

This dataset was formatted by R. Olszewski as part of his thesis, "Generalized feature extraction for structural pattern recognition in time-series data," at Carnegie Mellon University, 2001. Each series traces the electrical activity recorded during one heartbeat. The two classes are a normal heartbeat and a myocardial infarction.

TwoLeadECG dataset

TwoLeadECG is an ECG dataset taken from PhysioNet by Eamonn Keogh. Specifically, the data is from the MIT-BIH Long-Term ECG Database (ltdb), record ltdb/15814, beginning at time 420 and ending at time 1019. The task is to distinguish between signal 0 and signal 1.

CBF dataset

Cylinder-Bell-Funnel is a simulated dataset defined by Naoki Saito in his thesis, "Local Feature Extraction and Its Applications Using a Library of Bases". Data from each class is standard normal noise plus an offset term which differs for each class.
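The "noise plus a class-specific offset" structure can be sketched as follows. This is an illustrative generator, not the code used to build the archived dataset. The constants (series length 128, onset drawn from 16 to 32, duration drawn from 32 to 96, amplitude 6 plus a standard normal term) follow the commonly cited definition of the Cylinder-Bell-Funnel problem and are assumptions here; they should be checked against the thesis. The function name `cbf_series` is hypothetical.

```python
import random


def cbf_series(kind, length=128, rng=random):
    """Generate one Cylinder-Bell-Funnel series (illustrative sketch).

    kind is "cylinder", "bell" or "funnel". All constants are assumed from
    the commonly cited definition, not taken from this document.
    """
    if kind not in ("cylinder", "bell", "funnel"):
        raise ValueError(f"unknown class: {kind!r}")
    a = rng.randint(16, 32)        # onset of the event
    b = a + rng.randint(32, 96)    # end of the event (at most 128)
    eta = rng.gauss(0.0, 1.0)      # per-series amplitude jitter
    series = []
    for t in range(1, length + 1):
        noise = rng.gauss(0.0, 1.0)  # standard normal noise at every step
        if a <= t <= b:
            if kind == "cylinder":
                shape = 1.0                  # flat plateau
            elif kind == "bell":
                shape = (t - a) / (b - a)    # linear ramp up
            else:
                shape = (b - t) / (b - a)    # linear ramp down
            offset = (6.0 + eta) * shape
        else:
            offset = 0.0
        series.append(offset + noise)
    return series


sample = cbf_series("bell", rng=random.Random(0))
print(len(sample))  # 128
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.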

BirdChicken dataset

MPEG-7 CE Shape-1 Part B is a database of binary images developed for testing MPEG-7 shape descriptors, and is freely available online. It is used for testing contour/image and skeleton-based descriptors. Classes of images vary broadly, and include classes that are similar in shape to one another. There are 20 instances of each class and 60 classes in total. The outlines of these images have been extracted and mapped into 1-D series of distances to the centre. The BirdChicken dataset includes two classes (20 instances of the Bird class and 20 instances of the Chicken class) and is used for the problem of distinguishing between the outline of a bird and that of a chicken.
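The mapping from an outline to a 1-D series of distances to the centre can be sketched as below. This is a simplified illustration under stated assumptions: the centre is taken to be the mean of the contour points, and no resampling to a fixed length is performed. The actual preprocessing used for the archived dataset may differ, and the function name `outline_to_series` is hypothetical.

```python
import math


def outline_to_series(points):
    """Map an ordered list of (x, y) contour points to centre distances.

    Assumption: the "centre" is the mean of the contour points.
    """
    if not points:
        raise ValueError("outline must contain at least one point")
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [math.hypot(x - cx, y - cy) for x, y in points]


# Toy outline: the four corners of a square with side 2, centred at (1, 1).
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
series = outline_to_series(square)
print(series)  # every corner is sqrt(2) away from the centre
```

Walking the contour in order is what turns a 2-D shape into a sequence that a time series classifier can consume.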

Plane dataset

The fighter aeroplane shape database includes the Mirage, Eurofighter, F-14, Harrier, F-22 and F-15. Since the F-14 has two possible shapes, one with its wings closed and another with its wings open, the total number of shape classes is seven. Each class includes 30 shape samples. The shape database was created by taking digital pictures of diecast replica models of these aeroplanes from above. Pictures were captured at 640 by 480 resolution and were segmented using the Spedge and Medge [6] color image segmentation algorithm. Contours of the segmented planes were used for training and testing of the classifier. Further details are available in Thakoor, Ninad, and Jean Gao, "Shape classifier based on generalized probabilistic descent method with hidden Markov descriptor," Tenth IEEE International Conference on Computer Vision (ICCV'05), Vol. 1, IEEE, 2005.

Original and counterfactual instance comparison (left column) and saliency map visualization (right column)

Coffee

GunPoint

ECG200

TwoLeadECG

CBF

BirdChicken

Plane

Visual Analysis of Dataset Patterns and Class Distributions

Coffee

GunPoint

ECG200

TwoLeadECG

BirdChicken

CBF

Plane

The figures above display the training samples for each dataset, grouped by class. These plots give an overview of the characteristic patterns in each dataset and highlight the differences between classes. Notably, some datasets, such as Coffee, exhibit relatively uniform waveforms, so their samples cluster closely together. As a result, counterfactual explanations generated for these datasets tend to involve fewer or more localized perturbations. In contrast, datasets like BirdChicken display greater variability in their waveforms, which naturally leads to broader perturbations when constructing counterfactual instances.
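One rough way to put a number on how tightly a class clusters is to average the pointwise standard deviation across its training samples. The sketch below is an illustrative assumption, not a measure used in this project: it groups series by label and reports that average per class. The function name `within_class_spread` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def within_class_spread(series_list, labels):
    """Return {label: mean pointwise population std} for equal-length series."""
    groups = defaultdict(list)
    for series, label in zip(series_list, labels):
        groups[label].append(series)
    spread = {}
    for label, members in groups.items():
        # zip(*members) walks the class time step by time step.
        per_step = [pstdev(values) for values in zip(*members)]
        spread[label] = mean(per_step)
    return spread


# Toy data: class "a" is perfectly uniform, class "b" varies at every step.
toy_series = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
    [2.0, 2.0, 2.0],
]
toy_labels = ["a", "a", "b", "b"]

spread = within_class_spread(toy_series, toy_labels)
print(spread)  # {'a': 0.0, 'b': 1.0}
```

Under this measure, a dataset with uniform waveforms would score close to zero for each class, while one with highly variable waveforms would score higher.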

The OOD results for NG, ALIBI, SG, and TimeX