A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques

Landauer, Max; Skopik, Florian; Wurzenberger, Markus

doi:10.1145/3660768


arXiv:2309.02854 (cs)

[Submitted on 6 Sep 2023]

Abstract: Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques; however, the appropriateness of these data sets has not been closely investigated in the past. In this paper, we therefore analyze six publicly available log data sets with a focus on the manifestations of anomalies and simple techniques for their detection. Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
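
To make the distinction in the abstract concrete, the following is a minimal sketch of what an order-agnostic "simple technique" could look like: it flags a sequence only when it contains an event type never seen in normal training data. This is an illustrative assumption, not the authors' method; the function names and the event IDs (`E1`, `E9`, and so on) are hypothetical.

```python
from typing import Iterable, List, Set


def fit_known_events(normal_sequences: Iterable[List[str]]) -> Set[str]:
    """Collect every event type that occurs in the normal training sequences."""
    known: Set[str] = set()
    for seq in normal_sequences:
        known.update(seq)
    return known


def is_anomalous(sequence: List[str], known: Set[str]) -> bool:
    """Flag a sequence if it contains any event type not seen during training.

    The order of events is deliberately ignored.
    """
    return any(event not in known for event in sequence)


# Hypothetical event IDs standing in for parsed log templates.
train = [["E1", "E2", "E3"], ["E1", "E3", "E2"]]
known = fit_known_events(train)

reordered = ["E3", "E2", "E1"]  # known events in a new order
unseen = ["E1", "E9", "E3"]     # contains an event type absent from training

print(is_anomalous(reordered, known))
print(is_anomalous(unseen, known))
```

Because the check ignores event order, it cannot detect a purely sequential anomaly such as `reordered`. If a detector of this kind already scores well on a data set, that would suggest the labeled anomalies there are not primarily sequential.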

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2309.02854 [cs.LG]
	(or arXiv:2309.02854v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2309.02854
Journal reference:	Proceedings of the ACM on Software Engineering (FSE 2024)
Related DOI:	https://doi.org/10.1145/3660768

Submission history

From: Max Landauer
[v1] Wed, 6 Sep 2023 09:31:17 UTC (1,342 KB)
