PRCIS: Pattern Recognition Comparison in Series
We are happy to announce that PRCIS has been accepted to ICKG 2022.
Audrey Der, Chin-Chia Michael Yeh, Renjie Wu, Junpeng Wang, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh
emails: {ader003, rwu034}@ucr.edu, {miyeh, junpenwa, yazheng, zzhuang, liawang, wzhan}@visa.com, eamonn@cs.ucr.edu
Resources and Code
Paper (arxiv; PDF)
LINK TO REPOSITORY: Contains the codebase and subsets of data (when subsets were used).
Note: This is a supplementary website intended to be referenced in tandem with its corresponding paper, and is not meant to be used alone.
Note: "PRECIS" was the original spelling of the method's name. Any remaining instances of this spelling are leftovers from the rename to "PRCIS".
Quickstart
Notation
In the paper, we refer to the dictionary parameters S and L as the size of the dictionary and the length of the patterns within it. Because the codebase was written over an extended period of time, the variable names used for these parameters may vary.
S may be referred to as NUMPAT ("number of patterns") for short.
L may be referred to as WINLEN ("window length"), CYCLELEN ("cycle length"), or something along the lines of "pattern length".
A Short Tutorial
The codebase uses Experiment objects as defined below for generating Yeh Dictionaries and calculating distance matrices (regardless of dictionary creation method).
class Experiment:
    def __init__(self, distmet, dict_settings, algyield=True, multivariate=False, downsamplefactor=1):
        self.distmet = distmet  # distance metric: "DTW", "ED", or "PRECIS"
        self.numpatt = dict_settings[0]
        self.cyclelen = dict_settings[1]
        self.algyield = algyield  # yield to dict method or exclude any generated patterns not of this exact length
        self.multivariate = multivariate  # multivar PRECIS extension; only used during the development of this work, not presented in paper
        self.downsamplefactor = downsamplefactor  # typically untouched; only used during development of this work, not presented in paper
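As a rough illustration of what the algyield flag controls, the sketch below filters candidate patterns by length. It is not the repository's code: filter_patterns is a hypothetical helper, and it assumes patterns are described by (start, end) index pairs with end exclusive.

```python
def filter_patterns(idxs, cyclelen, algyield=True):
    """Hypothetical helper, not part of the PRCIS codebase.

    idxs: list of (start, end) index pairs; end is assumed to be exclusive.
    """
    if algyield:
        # Yield to the dictionary method: keep whatever it produced.
        return list(idxs)
    # Otherwise keep only patterns whose length is exactly cyclelen.
    return [(s, e) for (s, e) in idxs if e - s == cyclelen]


candidates = [(0, 150), (200, 340), (400, 550)]
print(filter_patterns(candidates, 150, algyield=True))
print(filter_patterns(candidates, 150, algyield=False))
```

With algyield=False, the 140-point pattern is dropped and the two 150-point patterns are kept.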
Here is a simple snippet that creates a Yeh Dictionary from each time series and computes a PRECIS distance matrix:
exp = Experiment("PRECIS",[4,150])
use_dicts = []
for ts in dataset:
    d, idxs = exp.make_exemplar(ts)  # idxs will be a list of tuples in the form of (start, end) indices of each pattern from ts
    use_dicts.append(d)
distmat = exp.distmat_from_dicts(use_dicts)
The Yeh Dictionary creation method is called directly within the class methods and is used automatically by make_exemplar. To use a different dictionary method, do not call Experiment.make_exemplar; instead, create the dictionaries with your own method and pass them to distmat_from_dicts.
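One way to swap in a different dictionary method is to build each dictionary yourself and collect the results into use_dicts. The sketch below is a stand-in, not the authors' method: random_dictionary is a hypothetical helper that samples S random subsequences of length L, and the assumption that a dictionary is a plain list of patterns may not match what distmat_from_dicts expects.

```python
import random


def random_dictionary(ts, numpat, winlen, seed=None):
    """Hypothetical stand-in for a dictionary method (not from the PRCIS codebase).

    Samples `numpat` subsequences of length `winlen` from `ts` at random
    start positions. Returns (patterns, idxs), where idxs holds
    (start, end) pairs with end exclusive.
    """
    if winlen > len(ts):
        raise ValueError("winlen must not exceed the series length")
    rng = random.Random(seed)
    starts = [rng.randrange(len(ts) - winlen + 1) for _ in range(numpat)]
    idxs = [(s, s + winlen) for s in starts]
    patterns = [list(ts[s:e]) for (s, e) in idxs]
    return patterns, idxs


# Toy data: three short series standing in for `dataset`.
dataset = [[float((i * k) % 7) for i in range(50)] for k in (1, 2, 3)]

use_dicts = []
for ts in dataset:
    d, idxs = random_dictionary(ts, numpat=4, winlen=10, seed=0)
    use_dicts.append(d)

print(len(use_dicts), len(use_dicts[0]), len(use_dicts[0][0]))
```

The resulting use_dicts would then be passed to exp.distmat_from_dicts(use_dicts) as in the snippet above, provided the dictionary format matches what the codebase expects.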
Clustering
Note: Figure placement indicators may not be accurate when viewed on a mobile device.
Note: Rival methods not pictured on this website can be viewed in their notebooks in the GitHub repository.
OPSD_CLU.ipynb: (left, top) OPSD: two-month snippets of electrical power demand data from four randomly selected European countries. Includes:
(Not Pictured) OPSD Random Day Strawman
WeAllWalk_CLU.ipynb: (left, bottom) We-All-Walk Dendrogram.
Includes:
Catch22:
All features
(Not Pictured) FS features (as determined during classification)
(Not Pictured) Random Non-Obvious Holiday
TaipeiMRT_CLU.ipynb: (left, middle) Taipei MRT Clustering
NASAMill_CLU.ipynb: (right, top) NASA Mill Dataset
(Not Pictured) k-shape
(Not Pictured) Folder of Results: Cluster by Period
(bottom right) Due to the sensitive nature of the data, we cannot share the dataset or code used to generate the Business Merchant figures at this time. We thank you for your understanding.
Links to Papers and/or Datasets
WeAllWalk (specific subset used is included in our code distribution)
Taipei MRT Data (scroll to the bottom of the linked webpage)
Classification
Please see the paper for the table of results.
Datasets:
WeAllWalk (the subset of data used in our experiments is included in our code distribution)
Anomaly Detection
BVD_AD.ipynb: PRECIS and Anomaly Detection
Links to Datasets Used
Note: The MATLAB implementation of Telemanom was used to produce the following runs.