Moirai: Towards Optimal Placement for Distributed Inference on Heterogeneous Devices

Zhang, Beibei; Zhu, Hongwei; Gao, Feng; Yang, Zhihui; Wang, Sean Xiaoyang

arXiv:2312.04025 (cs)

[Submitted on 7 Dec 2023 (v1), last revised 26 Dec 2023 (this version, v3)]

Abstract: The escalating size of Deep Neural Networks (DNNs) has spurred growing research interest in hosting and serving DNN models across multiple devices. A number of studies have proposed partitioning a DNN model across devices, providing device placement solutions. The methods that have appeared in the literature, however, either suffer from poor placement performance due to the exponential search space or miss an optimal placement as a consequence of a search space reduced by limited heuristics. Moreover, these methods ignore the runtime inter-operator optimization of a computation graph when coarsening the graph, which degrades end-to-end inference performance. This paper presents Moirai, which better exploits runtime inter-operator fusion in a model to render a coarsened computation graph, reducing the search space while maintaining the inter-operator optimization provided by inference backends. Moirai also generalizes the device placement algorithm from multiple perspectives by considering inference constraints and device heterogeneity. Our experimental evaluation with 11 large DNNs demonstrates that Moirai outperforms the state-of-the-art counterparts, i.e., Placeto, m-SCT, and GETF, reducing end-to-end inference latency by up to 4.28×. Moirai code is anonymously released at this https URL.
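
The two ideas in the abstract, fusion-aware graph coarsening followed by constraint-aware placement on heterogeneous devices, can be illustrated with a toy sketch. This is not Moirai's algorithm: the fusion rule, the operator names, the cost numbers, the memory limits, and the brute-force search are all assumptions made up for illustration, and latency is modeled as a simple sum of compute times plus a fixed penalty per cross-device edge.

```python
from itertools import product

# Toy operator chain (hypothetical; listed in topological order).
ops = ["conv1", "bn1", "relu1", "conv2", "bn2", "relu2"]
edges = [("conv1", "bn1"), ("bn1", "relu1"), ("relu1", "conv2"),
         ("conv2", "bn2"), ("bn2", "relu2")]

# Assumed fusion rule: these operator kinds are folded into their sole predecessor,
# mimicking what an inference backend might fuse at runtime.
FUSABLE_INTO_PREDECESSOR = {"bn", "relu"}


def kind(op):
    """Strip the numeric suffix, e.g. 'relu2' -> 'relu'."""
    return op.rstrip("0123456789")


def coarsen(ops, edges):
    """Group fusable operators with their predecessor; return the coarsened graph."""
    preds = {}
    for u, v in edges:
        preds.setdefault(v, []).append(u)
    group = {}
    for op in ops:
        p = preds.get(op, [])
        if kind(op) in FUSABLE_INTO_PREDECESSOR and len(p) == 1:
            group[op] = group[p[0]]   # join the predecessor's fused group
        else:
            group[op] = op            # start a new group
    nodes = list(dict.fromkeys(group.values()))
    coarse_edges = sorted({(group[u], group[v]) for u, v in edges
                           if group[u] != group[v]})
    return group, nodes, coarse_edges


def best_placement(nodes, edges, compute_ms, mem_gb, capacity_gb, transfer_ms):
    """Exhaustively search placements that respect per-device memory limits."""
    devices = list(capacity_gb)
    best = None
    for assignment in product(devices, repeat=len(nodes)):
        place = dict(zip(nodes, assignment))
        used = {d: 0 for d in devices}
        for n in nodes:
            used[place[n]] += mem_gb[n]
        if any(used[d] > capacity_gb[d] for d in devices):
            continue  # violates a memory constraint
        cost = sum(compute_ms[n][place[n]] for n in nodes)
        cost += sum(transfer_ms for u, v in edges if place[u] != place[v])
        if best is None or cost < best[0]:
            best = (cost, place)
    return best


group, coarse_nodes, coarse_edges = coarsen(ops, edges)

# Made-up per-device costs for the two fused groups (heterogeneous devices).
devices = ["gpu_fast", "gpu_slow"]
compute_ms = {"conv1": {"gpu_fast": 2.0, "gpu_slow": 5.0},
              "conv2": {"gpu_fast": 3.0, "gpu_slow": 7.0}}
mem_gb = {"conv1": 3, "conv2": 3}
capacity_gb = {"gpu_fast": 4, "gpu_slow": 8}

latency, placement = best_placement(coarse_nodes, coarse_edges, compute_ms,
                                    mem_gb, capacity_gb, transfer_ms=1.0)
print(len(devices) ** len(ops), "->", len(devices) ** len(coarse_nodes))
print(latency, placement)
```

In this toy setup, coarsening shrinks the placement search space from 2^6 = 64 candidate assignments to 2^2 = 4. The memory limit on the faster device rules out placing both fused groups there, so the search has to trade compute speed against the transfer penalty.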

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2312.04025 [cs.DC]
	(or arXiv:2312.04025v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2312.04025

Submission history

From: Beibei Zhang
[v1] Thu, 7 Dec 2023 03:46:14 UTC (9,616 KB)
[v2] Mon, 11 Dec 2023 09:31:43 UTC (1 KB) (withdrawn)
[v3] Tue, 26 Dec 2023 06:21:20 UTC (9,616 KB)
