Graft: Efficient Inference Serving for Hybrid Deep Learning with SLO Guarantees via DNN Re-alignment

Wu, Jing; Wang, Lin; Jin, Qirui; Liu, Fangming

Abstract: Deep neural networks (DNNs) have been widely adopted for various mobile inference tasks, yet their ever-increasing computational demands are hindering their deployment on resource-constrained mobile devices. Hybrid deep learning partitions a DNN into two parts and deploys them across the mobile device and a server, aiming to reduce inference latency or prolong the battery life of mobile devices. However, such partitioning produces non-uniform DNN fragments, which are hard to serve efficiently on the server. This paper presents Graft, an efficient inference serving system for hybrid deep learning with latency service-level objective (SLO) guarantees. Our main insight is to mitigate the non-uniformity through a core concept called DNN re-alignment, which allows multiple heterogeneous DNN fragments to be restructured to share layers. To fully exploit the potential of DNN re-alignment, Graft employs fine-grained GPU resource sharing. Based on that, we propose efficient algorithms for merging, grouping, and re-aligning DNN fragments to maximize request batching opportunities, minimizing resource consumption while guaranteeing the inference latency SLO. We implement a Graft prototype and perform extensive experiments with five types of widely used DNNs and real-world network traces. Our results show that Graft improves resource efficiency by up to 70% compared with state-of-the-art inference serving systems.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2312.10636 [cs.DC]
	(or arXiv:2312.10636v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2312.10636

