Eric Medvet, Alberto Bartoli,
Proc. Fourth International Conference on Detection of Intrusions & Malware, and
Vulnerability Assessment, pp. 60-78, Lecture Notes in Computer Science 4579, Springer, 2007
(http://www.dimva2007.org/).
Anomaly detection is a commonly used approach for constructing
intrusion detection systems. A key requirement is that the data used
for building the resource profile are indeed attack-free, but this
issue is often skipped or taken for granted. In this work we consider
the problem of corruption in the learning data, with respect to a
specific detection system, i.e., a web site integrity checker. We used
corrupted learning sets and observed their impact on performance (in
terms of false positives and false negatives). This analysis enabled us
to gain important insights into this rather unexplored issue. Based on
this analysis we also present a procedure for detecting whether a
learning set is corrupted. We evaluated the performance of our proposal
and obtained very good results up to a corruption rate close to 50%.
Our experiments are based on collections of real data and consider
three different flavors of anomaly detection.
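
To make the effect of a corrupted learning set concrete, the following is a minimal sketch and not the detection system studied in the paper. It assumes a toy one-dimensional detector that flags any reading outside mean ± k standard deviations of the learning set; the function names, the Gaussian distributions for genuine and attack readings, and all numeric values are hypothetical choices made only for illustration.

```python
import random
import statistics


def build_profile(learning_set, k=3.0):
    """Build a toy profile: the interval mean +/- k standard deviations."""
    mean = statistics.fmean(learning_set)
    std = statistics.pstdev(learning_set)
    return mean - k * std, mean + k * std


def is_anomalous(profile, reading):
    """A reading is flagged when it falls outside the profile interval."""
    low, high = profile
    return reading < low or reading > high


def evaluate(corruption_rate, n_learning=200, n_test=500, seed=0):
    """Return (false positive rate, false negative rate) for a learning set
    in which a fraction `corruption_rate` of the samples are attack readings."""
    rng = random.Random(seed)
    n_attack = int(round(corruption_rate * n_learning))

    # Hypothetical distributions: genuine readings near 100, attacks near 160.
    learning = [rng.gauss(100, 5) for _ in range(n_learning - n_attack)]
    learning += [rng.gauss(160, 5) for _ in range(n_attack)]
    profile = build_profile(learning)

    genuine_test = [rng.gauss(100, 5) for _ in range(n_test)]
    attack_test = [rng.gauss(160, 5) for _ in range(n_test)]

    # False positive: a genuine reading that is flagged.
    fpr = sum(is_anomalous(profile, x) for x in genuine_test) / n_test
    # False negative: an attack reading that is not flagged.
    fnr = sum(not is_anomalous(profile, x) for x in attack_test) / n_test
    return fpr, fnr


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.3):
        fpr, fnr = evaluate(rate)
        print(f"corruption={rate:.1f}  FPR={fpr:.3f}  FNR={fnr:.3f}")
```

In this toy setting, attack samples in the learning set widen the profile until attack readings fall inside it, so the false negative rate rises with the corruption rate. A real detector may degrade differently.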