An Efficient Content-based Time Series Retrieval System
Abstract
A Content-based Time Series Retrieval (CTSR) system is an information retrieval system that lets users interact with time series originating from multiple domains, such as finance, healthcare, and manufacturing. For example, a user seeking to learn more about the source of a time series can submit it as a query to the CTSR system and retrieve a list of relevant time series with associated metadata. By analyzing the retrieved metadata, the user can gather more information about the source of the query time series. Because a CTSR system must handle time series data from diverse domains, it needs a high-capacity model to effectively measure the similarity between different time series. At the same time, the model must compute similarity scores efficiently, as users interact with the system in real time. In this paper, we propose an effective and efficient CTSR model that outperforms alternative models while still providing reasonable inference runtimes. To demonstrate the capability of the proposed method in solving business problems, we compare it against alternative models using our in-house transaction data. Our findings reveal that the proposed model is the most suitable solution for our transaction data problem.
Source Code
You can download the source code here, which contains the hyper-parameter settings and other details. Please note that the code for transforming the UCR Archive into the content-based time series retrieval benchmark dataset is also included in the zip file. The included readme provides instructions on how to use the code. If you have any further questions about the implementation, please feel free to contact the authors.
The Residual Network 2D Method
The PDF containing detailed information about the Residual Network 2D method can be downloaded from here.
The Content-based Time Series Retrieval Benchmark Dataset
To convert the UCR Archive to a CTSR benchmark dataset, we followed these steps:
We extracted all time series data from each dataset and merged any identical time series that appeared across multiple datasets. This step is crucial because identical time series may exist in multiple datasets. For instance, the same time series appear in the DodgerLoopDay, DodgerLoopGame, and DodgerLoopWeekend datasets, as these datasets were created from the same set of time series [1].
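The deduplication step can be sketched as follows. This is an illustrative implementation of ours, not the code shipped with the paper; it detects exact duplicates by hashing each array's raw bytes, which assumes duplicates are byte-identical copies.

```python
import numpy as np

def deduplicate(series_list):
    """Merge exact-duplicate time series, keeping the first occurrence.

    series_list: list of 1-D NumPy arrays.
    Returns a list with byte-identical duplicates removed.
    """
    seen = set()
    unique = []
    for ts in series_list:
        key = ts.tobytes()  # byte-level fingerprint of the array
        if key not in seen:
            seen.add(key)
            unique.append(ts)
    return unique
```
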
To ensure consistent lengths, we normalized all time series to 512 time steps. Time series longer than 512 steps were downsampled using the resample function from the SciPy library [2, 3]; shorter time series were zero-padded to 512 steps.
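A minimal sketch of this length-normalization step is shown below. The function name is ours, and padding at the end of the series is our assumption; the paper does not specify where padding is applied.

```python
import numpy as np
from scipy.signal import resample

TARGET_LEN = 512

def normalize_length(ts, target_len=TARGET_LEN):
    """Downsample series longer than target_len; zero-pad shorter ones."""
    if len(ts) > target_len:
        return resample(ts, target_len)  # Fourier-domain resampling (SciPy)
    if len(ts) < target_len:
        # assumed: pad with zeros at the end of the series
        return np.pad(ts, (0, target_len - len(ts)))
    return ts
```
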
We applied z-normalization to all time series. Padded zeros were ignored during normalization. The z-normalization step is a standard procedure for preparing time series data [1, 4].
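A sketch of z-normalization that skips the padded region is given below. It assumes the original (pre-padding) length of each series is known; the function name is ours.

```python
import numpy as np

def z_normalize(ts, valid_len):
    """Z-normalize the first valid_len values; leave padded zeros untouched."""
    out = ts.astype(float).copy()
    valid = out[:valid_len]
    std = valid.std()
    if std > 0:
        out[:valid_len] = (valid - valid.mean()) / std
    else:
        out[:valid_len] = 0.0  # constant series: map to all zeros
    return out
```
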
We generated the ground truth labels for each pair of time series by determining whether the pair was relevant or not relevant. Two time series were considered relevant if they belonged to the same dataset and shared the same class label in the original UCR Archive [1]. Otherwise, they were considered not relevant.
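The relevance rule reduces to a simple predicate over per-series metadata. A sketch, assuming each series carries its source dataset name and original class label (field names are ours):

```python
def is_relevant(meta_a, meta_b):
    """Two series are relevant iff they come from the same UCR dataset
    and share the same class label."""
    return (meta_a["dataset"] == meta_b["dataset"]
            and meta_a["label"] == meta_b["label"])
```
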
We split the data into three mutually exclusive sets: training, test, and validation. Specifically, we randomly selected 10% of the time series as test queries, another 10% as validation queries, and used the remaining as training data. In order to guarantee that each test/validation query had a sufficient number of relevant time series in the training set, we transferred any query time series with fewer than two relevant time series to the training set. Following this data splitting procedure, we obtained 136,377 training time series, 17,005 test queries, and 17,005 validation queries.
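The splitting procedure, including the rule that transfers under-supported queries back to the training set, can be sketched as follows. This is our illustrative version, assuming relevance is given as a precomputed mapping from each series to the set of series relevant to it.

```python
import random

def split_with_min_relevance(items, relevance, frac=0.1, seed=0):
    """Split items into train/test/validation, moving any query with
    fewer than two relevant training series back into the training set.

    items: iterable of series ids.
    relevance: dict mapping id -> set of relevant ids.
    """
    rng = random.Random(seed)
    ids = list(items)
    rng.shuffle(ids)
    n = int(len(ids) * frac)
    test, val, train = ids[:n], ids[n:2 * n], set(ids[2 * n:])

    def keep_supported(queries):
        kept = []
        for q in queries:
            if len(relevance[q] & train) >= 2:
                kept.append(q)
            else:
                train.add(q)  # transfer under-supported query to training
        return kept

    return train, keep_supported(test), keep_supported(val)
```
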
To facilitate efficient evaluation, we sampled 1,000 time series from the training set for each test/validation query. With this sampling step, only 1,000 relevance scores need to be computed per query instead of 136,377. The sampling procedure worked as follows: if a query had at most 100 relevant time series, we included all of them; if it had more than 100, we randomly selected 100. As a result, each sampled set contained at most 100 relevant time series (i.e., 10% of 1,000). The remaining slots were filled by sampling randomly from the irrelevant time series in the training set.
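The per-query candidate-pool construction can be sketched as below; the function name and signature are ours.

```python
import random

def sample_candidates(relevant, irrelevant, n_total=1000, max_rel=100, seed=0):
    """Build a candidate pool of n_total series for one query:
    at most max_rel relevant series, the rest drawn from the
    irrelevant training series."""
    rng = random.Random(seed)
    rel = list(relevant)
    if len(rel) > max_rel:
        rel = rng.sample(rel, max_rel)  # cap relevant series at max_rel
    irr = rng.sample(list(irrelevant), n_total - len(rel))
    pool = rel + irr
    rng.shuffle(pool)
    return pool
```
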
Results of Significance Tests Between Each Pair of Methods
We conducted two-sample t-tests with a significance level of 0.05 to compare the performance of the different methods. Specifically, this table presents the test results for NDCG@10; the tables for PREC@10 and AP@10 are identical to the one shown here. Examining Table 1 in the paper together with this table, we observe that the RN2D method significantly outperforms all other methods. Comparing RN2D with the proposed RN2Dw/T method, both methods exhibit similar performance in terms of PREC, AP, and NDCG; however, the proposed RN2Dw/T method offers a much faster query time.
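For reference, the comparison above corresponds to a standard two-sample t-test over per-query scores. A minimal sketch using SciPy, with synthetic per-query NDCG@10 scores standing in for the real measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# hypothetical per-query NDCG@10 scores for two methods (synthetic data)
scores_a = rng.normal(0.80, 0.05, 200)
scores_b = rng.normal(0.75, 0.05, 200)

# two-sample t-test; reject the null hypothesis when p < 0.05
t_stat, p_value = stats.ttest_ind(scores_a, scores_b)
significant = p_value < 0.05
```
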
References
[1] Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh. 2019. The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305.
[2] The SciPy community. 2022. scipy.signal.resample — SciPy v1.9.1 Manual. https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.resample.html
[3] Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature methods 17, 3 (2020), 261–272.
[4] Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. 2012. Searching and mining trillions of time series subsequences under dynamic time warping. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 262–270.