Linnaeus: A highly reusable and adaptable ML based log classification pipeline

Catovic, Armin; Cartwright, Carolyn; Gebreyesus, Yasmin Tesfaldet; Ferlin, Simone

Computer Science > Machine Learning

arXiv:2103.06927 (cs)

[Submitted on 11 Mar 2021]


Abstract: Logs are a common way to record detailed run-time information in software. As modern software systems evolve in scale and complexity, logs have become indispensable to understanding the internal states of the system. At the same time, however, manually inspecting logs has become impractical. Recently, there has been more emphasis on statistical and machine learning (ML) based methods for analyzing logs. While the results have shown promise, most of the literature focuses on algorithms and state-of-the-art (SOTA) results, while largely ignoring the practical aspects. In this paper we demonstrate our end-to-end log classification pipeline, Linnaeus. Besides showing the more traditional ML flow, we also demonstrate our solutions for adaptability and re-use, integration into large-scale software development processes, and how we cope with a lack of labelled data. We hope Linnaeus can serve as a blueprint for, and inspire the integration of, various ML-based solutions in other large-scale industrial settings.

Comments:	8 pages, 7 figures; to be included in ICSE/WAIN'21
Subjects:	Machine Learning (cs.LG)
ACM classes:	I.2.0; D.2.5; D.2.12; D.2.13
Cite as:	arXiv:2103.06927 [cs.LG]
	(or arXiv:2103.06927v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2103.06927

Submission history

From: Armin Catovic
[v1] Thu, 11 Mar 2021 19:58:53 UTC (541 KB)
