Reproducibility in Machine Learning-based Research: Overview, Barriers and Drivers

Semmelrock, Harald; Ross-Hellauer, Tony; Kopeinik, Simone; Theiler, Dieter; Haberl, Armin; Thalmann, Stefan; Kowald, Dominik

arXiv:2406.14325v2 (cs)

[Submitted on 20 Jun 2024 (v1), revised 2 Jul 2024 (this version, v2), latest version 26 Feb 2025 (v3)]

Abstract: Research in various fields is currently experiencing challenges regarding the reproducibility of results. This problem is also prevalent in machine learning (ML) research. The issue arises, for example, from unpublished data and/or source code and from the sensitivity of ML training conditions. Although different solutions have been proposed to address this issue, such as using ML platforms, the level of reproducibility in ML-driven research remains unsatisfactory. Therefore, in this article, we discuss the reproducibility of ML-driven research with three main aims: (i) identifying the barriers to reproducibility when applying ML in research and categorizing them by type of reproducibility (description, code, data, and experiment reproducibility), (ii) discussing potential drivers such as tools, practices, and interventions that support ML reproducibility, and distinguishing between technology-driven drivers, procedural drivers, and drivers related to awareness and education, and (iii) mapping the drivers to the barriers. With this work, we hope to provide insights and to contribute to the decision-making process regarding the adoption of different solutions to support ML reproducibility.

Comments:	Pre-print of submission for the AI Magazine - comments to this pre-print are very welcome
Subjects:	Software Engineering (cs.SE); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2406.14325 [cs.SE]
	(or arXiv:2406.14325v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2406.14325

Submission history

From: Dominik Kowald PhD
[v1] Thu, 20 Jun 2024 13:56:42 UTC (379 KB)
[v2] Tue, 2 Jul 2024 15:36:32 UTC (380 KB)
[v3] Wed, 26 Feb 2025 11:34:49 UTC (227 KB)
