Magpie: Automatically Tuning Static Parameters for Distributed File Systems using Deep Reinforcement Learning

Zhu, Houkun; Scheinert, Dominik; Thamsen, Lauritz; Gontarska, Kordian; Kao, Odej

arXiv:2207.09298 (cs)

[Submitted on 19 Jul 2022 (v1), last revised 22 Jul 2022 (this version, v2)]

Abstract: Distributed file systems are widely used, yet their default configurations are often not optimal. At the same time, tuning configuration parameters is typically challenging and time-consuming: it demands expertise, and tuning operations can be expensive. This is especially the case for static parameters, where changes take effect only after a restart of the system or workloads. We propose a novel approach, Magpie, which utilizes deep reinforcement learning to tune static parameters by strategically exploring and exploiting configuration parameter spaces. To boost the tuning of static parameters, our method employs both server and client metrics of distributed file systems to understand the relationship between static parameters and performance. Our empirical evaluation shows that Magpie can noticeably improve the performance of the distributed file system Lustre: after tuning towards the optimization of a single performance indicator, our approach achieves on average 91.8% throughput gains over the default configuration, while it reaches 39.7% more throughput gains than the baseline.

Comments:	Accepted at the IEEE International Conference on Cloud Engineering (IC2E) 2022
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2207.09298 [cs.DC]
	(or arXiv:2207.09298v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2207.09298

Submission history

From: Houkun Zhu
[v1] Tue, 19 Jul 2022 14:32:07 UTC (633 KB)
[v2] Fri, 22 Jul 2022 13:53:19 UTC (633 KB)
