Setting an asymptotically optimal threshold for detecting anomalies in a multivariate Gaussian sample with application to time series
Marie Turčičová
Department of Statistical Modelling
Institute of Computer Science of the Czech Academy of Sciences
Location: Institute of Computer Science, Pod Vodárenskou věží 2, 180 00 Prague, Czechia
Room 318
Date: Tuesday 18 March 2025
Time: 13:00 CET
ZOOM: Seminar ISCB Czechia
https://cesnet.zoom.us/j/98333618032?pwd=Isuin5lcoNgPm4PfN4iipbwdXWwhEx.1
Meeting ID: 983 3361 8032
Passcode: 203831
Abstract:
Anomalies, often referred to as outliers, are data points that deviate significantly from the rest of the dataset. They may represent errors or unusual observations, and their detection can reveal important events such as production faults, system defects, or health issues, which makes identifying them highly valuable. A wide variety of anomaly detection techniques exist, since no single method is universally effective. The basic approach to detecting anomalies relies on a manually set threshold, but choosing such a threshold is a non-trivial statistical task. In this talk, we propose a threshold for detecting anomalies in data following a multivariate normal distribution, specifically when anomalous observations are rare and differ from the rest of the data in their mean. Under certain conditions, the proposed threshold is shown to be asymptotically optimal in the sense that the expected number of misidentified outliers tends to zero as the sample size increases. The performance of the proposed threshold is compared with that of other popular thresholding methods through simulations in both the univariate and the multivariate setting. Additionally, the method is applied to real data collected within the DigiWell project.
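To make the evaluation criterion concrete, here is a minimal univariate simulation sketch. It is not the threshold proposed in the talk, which the abstract does not specify. It assumes N(0, 1) background data, a few anomalies shifted in mean, and two common baseline thresholds: the fixed three-sigma rule and the universal threshold sqrt(2 log n). It counts misclassified points (false alarms plus missed anomalies, i.e. a Hamming-type loss). All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def hamming_loss(flags, truth):
    """Number of misclassified points: false alarms plus missed anomalies."""
    return sum(f != t for f, t in zip(flags, truth))


def simulate(n, n_anom, shift, rng):
    """Hypothetical setup: n - n_anom draws from N(0, 1), n_anom from N(shift, 1)."""
    truth = [False] * n
    for i in rng.sample(range(n), n_anom):
        truth[i] = True
    x = [rng.gauss(shift if is_anom else 0.0, 1.0) for is_anom in truth]
    return x, truth


def flag(x, t):
    """Flag every observation whose absolute value exceeds the threshold t."""
    return [abs(v) > t for v in x]


def compare(n=10_000, n_anom=10, shift=6.0, reps=100, seed=1):
    """Average Hamming loss of two baseline thresholds over repeated samples."""
    rng = random.Random(seed)
    thresholds = {
        "three_sigma": 3.0,                       # fixed, does not depend on n
        "universal": math.sqrt(2 * math.log(n)),  # grows slowly with n
    }
    totals = {name: 0 for name in thresholds}
    for _ in range(reps):
        x, truth = simulate(n, n_anom, shift, rng)
        for name, t in thresholds.items():
            totals[name] += hamming_loss(flag(x, t), truth)
    return {name: total / reps for name, total in totals.items()}


if __name__ == "__main__":
    print(compare())
```

A fixed threshold such as 3 flags about 0.27% of clean N(0, 1) points, so its expected number of false alarms grows linearly with n; a threshold that grows with n is needed for the expected number of errors to vanish. In the multivariate case, a comparable baseline would threshold a Mahalanobis-type distance instead of |x|, again only as an assumption for illustration.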
Keywords: anomaly detection, threshold selection, multivariate normal distribution
References:
Butucea, C., Ndaoud, M., Stepanova, N., Tsybakov, A.B.: Variable Selection with Hamming Loss, The Annals of Statistics, Vol. 46, No. 5, pp. 1837-1875, 2018. https://www.doi.org/10.1214/17-aos1572
Cui, X.: Optimal Component Selection in High Dimension, Master's thesis, Carleton University, 2014. https://repository.library.carleton.ca/concern/etds/xg94hq191?locale=en