MotifAug

Project Summary

Data augmentation has proved extremely beneficial to the advancement of state-of-the-art machine learning classification models. Not only does it allow balancing datasets collected in domains with rare positive events, but it also helps increase the size of small datasets to feed data-hungry deep learning models. More often than not, this results in a significant performance increase. However, data augmentation in the time series domain still lags behind other domains such as image and text data. In this paper, we propose MotifAug, a parameter-free, pattern mixing-based time series data augmentation method that improves on previous approaches in the literature. MotifAug leverages the warping path constructed by MotifDTW, a novel alignment method that uses the Matrix Profile (MP) motif discovery mechanism and Dynamic Time Warping (DTW) to align two time series instances. To our knowledge, this is the first effort to perform time series data augmentation using MP and motif discovery. We perform a thorough experimental evaluation on the University of California Riverside (UCR) archive of time series datasets and compare the performance of MotifAug to state-of-the-art pattern mixing methods. The results show that our method produces realistic time series instances that lead to higher classification performance gains.
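To make the idea of pattern mixing along a warping path more concrete, the sketch below aligns two univariate series with plain DTW and blends the second into the first along that path. It is not MotifDTW and not the MotifAug implementation: the motif discovery step is left out entirely, and the names `dtw_path` and `mix_along_path` as well as the `weight` argument are assumptions made for illustration only.

```python
import numpy as np


def dtw_path(a, b):
    """Return the optimal DTW warping path between 1-D series a and b."""
    n, m = len(a), len(b)
    # cost[i, j] is the cumulative cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])

    # Backtrack from the last cell to the first, preferring the diagonal on ties.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: cost[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def mix_along_path(a, b, weight=0.5):
    """Blend b into a along the DTW path, keeping a's length and time axis.

    `weight` is a hypothetical mixing coefficient used only in this sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sums = np.zeros(len(a))
    counts = np.zeros(len(a))
    for i, j in dtw_path(a, b):
        sums[i] += b[j]
        counts[i] += 1
    # Every index of a appears on the path at least once, so counts > 0.
    aligned_b = sums / counts
    return weight * a + (1 - weight) * aligned_b


a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
augmented = mix_along_path(a, b)
```

The result has the same length as `a`, so it can be added to the training set with the label of `a`. In this sketch the pair `(a, b)` would typically be drawn from the same class; how MotifAug selects pairs and combines them is described in the paper rather than here.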

Performance Gain Detailed Tables

Average f1-scores using 10 ROCKET models

Average f1-scores using 3 ResNet models

Source Code

Download Source Code Here

Instructions:

Install requirements in requirements.txt.
Download desired UCR datasets from here.
Run python motifaug_ucr.py $data_path $dataset_name $results_path where:
- $data_path is set to the directory where the UCR datasets have been downloaded.
- $dataset_name is the name of the desired UCR dataset.
- $results_path is set to the directory where the resulting augmented data is to be saved.
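When several datasets need to be processed, the command above can be driven from a small wrapper. The sketch below only assembles and launches the documented command once per dataset. The helper names `build_command` and `run_all`, the example paths, and the dataset names are assumptions, and it presumes that `motifaug_ucr.py` is in the current working directory.

```python
import subprocess
import sys


def build_command(data_path, dataset_name, results_path, script="motifaug_ucr.py"):
    """Assemble the argument list for one run of the script."""
    # Uses the current interpreter in place of a bare `python`.
    return [sys.executable, script, str(data_path), dataset_name, str(results_path)]


def run_all(data_path, dataset_names, results_path):
    """Run the script once per dataset, stopping at the first failure."""
    for name in dataset_names:
        subprocess.run(build_command(data_path, name, results_path), check=True)


# Example call (paths and dataset names are placeholders):
# run_all("data/UCR", ["Coffee", "GunPoint"], "results")
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.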