An Efficient Content-based Time Series Retrieval System

Abstract

A Content-based Time Series Retrieval (CTSR) system is an information retrieval system that lets users interact with time series originating from multiple domains, such as finance, healthcare, and manufacturing. For example, users seeking to learn more about the source of a time series can submit it as a query to the CTSR system and retrieve a list of relevant time series with associated metadata. By analyzing the retrieved metadata, users can gather more information about the source of the time series. Because the CTSR system must work with time series data from diverse domains, it needs a high-capacity model to effectively measure the similarity between different time series. In addition, the model within the CTSR system has to compute similarity scores efficiently, because users interact with the system in real time. In this paper, we propose an effective and efficient CTSR model that outperforms alternative models while still providing reasonable inference runtimes. To demonstrate the capability of the proposed method in solving business problems, we compare it against alternative models using our in-house transaction data. Our findings reveal that the proposed model is the most suitable of the compared solutions for our transaction data problem.

Source Code

You can download the source code here, which contains the hyper-parameter settings and other details. Please note that the code for transforming the UCR Archive into the content-based time series retrieval benchmark dataset is also included in the zip file. The included readme provides instructions on how to use the code. If you have any further questions about the implementation, please feel free to contact the authors.

The Residual Network 2D Method

The PDF containing detailed information about the Residual Network 2D method can be downloaded from here.

The Content-based Time Series Retrieval Benchmark Dataset

To convert the UCR Archive to a CTSR benchmark dataset, we followed these steps:

Results of Significance Tests Between Each Pair of Methods

We conducted two-sample t-tests with a significance level of 0.05 to compare the performance of different methods. Specifically, this table presents the test results for NDCG@10. The tables for PREC@10 and AP@10 are identical to the one shown here. When examining Table 1 in the paper along with the table here, we observe that the RN2D method significantly outperforms all other methods. RN2D and the proposed RN2Dw/T method exhibit similar performance in terms of PREC, AP, and NDCG, but the proposed RN2Dw/T method offers a much faster query time.
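For readers who want to run a comparable pairwise test on their own results, the following is a minimal sketch of a two-sample t-test at the 0.05 level. It makes several assumptions that the text above does not confirm: it uses the pooled-variance (Student) form of the test, it treats each method's scores as an independent sample, and the score lists are made-up illustrative numbers, not results from the paper. The function names are hypothetical. The p-value is obtained by numerically integrating the t density so that only the Python standard library is needed; in practice SciPy's `scipy.stats.ttest_ind` provides the same test.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_mass)


def two_sample_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t statistic, p-value).

    Assumes both samples have at least two values and are not both constant.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


# Made-up NDCG@10 scores for two hypothetical methods (illustration only).
scores_method_a = [0.90, 0.91, 0.92, 0.93]
scores_method_b = [0.50, 0.51, 0.52, 0.53]

ALPHA = 0.05  # significance level stated in the text
t_stat, p_value = two_sample_t_test(scores_method_a, scores_method_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3g}, significant = {p_value < ALPHA}")
```

Running the test once per pair of methods and recording whether each p-value falls below 0.05 yields a pairwise significance table of the kind described above. If the two samples cannot be assumed to share a variance, Welch's variant of the test would be the more cautious choice.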

References

[1] Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh. 2019. The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305.

[2] The SciPy community. 2022. scipy.signal.resample — SciPy v1.9.1 Manual. https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.resample.html

[3] Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 3 (2020), 261–272.

[4] Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. 2012. Searching and mining trillions of time series subsequences under dynamic time warping. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 262–270.