Practical Performance Guarantees for Pipelined DNN Inference

Archer, Aaron; Fahrbach, Matthew; Liu, Kuikui; Prabhu, Prakash

arXiv:2311.03703 (cs)

[Submitted on 7 Nov 2023 (v1), last revised 4 Jun 2024 (this version, v3)]

Abstract: We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into $k$ stages and minimizing the running time of the bottleneck stage, including communication. We give practical and effective algorithms for this NP-hard problem, but our emphasis is on tackling the practitioner's dilemma of deciding when a solution is good enough. To this end, we design novel mixed-integer programming (MIP) relaxations for proving lower bounds. Applying these methods to a diverse testbed of 369 production models, for $k \in \{2, 4, 8, 16, 32, 64\}$, we empirically show that these lower bounds are strong enough to be useful in practice. Our lower bounds are substantially stronger than standard combinatorial bounds. For example, evaluated via geometric means across a production testbed with $k = 16$ pipeline stages, our MIP formulations raise the lower bound from 0.4598 to 0.9452, expressed as a fraction of the best partition found. In other words, our improved lower bounds close the optimality gap by a factor of 9.855x.
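The gap-closing factor quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes the optimality gap is defined as one minus the lower bound, where the lower bound is expressed as a fraction of the best partition found. The function names `optimality_gap` and `gap_closure_factor` are hypothetical.

```python
def optimality_gap(lower_bound_fraction: float) -> float:
    """Gap between a lower bound and the best partition found.

    Assumption: costs are normalized so the best partition found costs 1.0,
    so the gap is simply 1 minus the lower-bound fraction.
    """
    return 1.0 - lower_bound_fraction


def gap_closure_factor(old_lb: float, new_lb: float) -> float:
    """Ratio of the gap under the old bound to the gap under the new bound."""
    return optimality_gap(old_lb) / optimality_gap(new_lb)


# Rounded geometric-mean figures quoted in the abstract for k = 16.
combinatorial_lb = 0.4598  # standard combinatorial lower bound
mip_lb = 0.9452            # MIP-relaxation lower bound

factor = gap_closure_factor(combinatorial_lb, mip_lb)
print(f"gap before: {optimality_gap(combinatorial_lb):.4f}")
print(f"gap after:  {optimality_gap(mip_lb):.4f}")
print(f"factor:     {factor:.2f}x")
```

With the rounded bounds from the abstract, this gives a factor of roughly 9.86, close to the reported 9.855x. The small difference is presumably due to rounding in the quoted bounds, though the abstract does not say so.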

Comments: 17 pages, 5 figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2311.03703 [cs.LG] (or arXiv:2311.03703v3 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2311.03703

Submission history

From: Matthew Fahrbach
[v1] Tue, 7 Nov 2023 03:55:39 UTC (1,998 KB)
[v2] Fri, 3 May 2024 14:05:17 UTC (4,722 KB)
[v3] Tue, 4 Jun 2024 13:58:30 UTC (1,950 KB)
