Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads

Jangda, Abhinav; Huang, Jun; Liu, Guodong; Sabet, Amir Hossein Nodehi; Maleki, Saeed; Miao, Youshan; Musuvathi, Madanlal; Mytkowicz, Todd; Sarikivi, Olli

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2105.05720 (cs)

[Submitted on 12 May 2021 (v1), last revised 26 Mar 2022 (this version, v5)]

Abstract: The recent trend towards increasingly large machine learning models requires both training and inference to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain the best performance. However, the current logical separation between computation and communication kernels in deep learning frameworks misses optimization opportunities across this barrier. Breaking this abstraction through a holistic view of both enables many optimizations that improve performance in distributed workloads. Manually applying these optimizations requires modifying the underlying computation and communication libraries for each scenario, which is time-consuming and error-prone.
Therefore, we present CoCoNeT, which provides a DSL for expressing programs that contain both computation and communication. CoCoNeT contains several machine-learning-aware transformations to optimize a program and a compiler to generate high-performance kernels. Providing both computation and communication as first-class constructs allows users to work at a high level of abstraction and apply powerful optimizations, such as fusion or overlapping of communication and computation. CoCoNeT enables us to optimize data-, model-, and pipeline-parallel workloads in large language models with only a few lines of code. Experiments show that CoCoNeT significantly outperforms state-of-the-art distributed machine learning implementations.
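
The following is a minimal, hypothetical Python sketch of one kind of optimization that becomes possible once computation and communication are considered together. It is not CoCoNeT's DSL or API. An AllReduce followed by an elementwise operation is split into a ReduceScatter and an AllGather, and the elementwise operation is moved between them so that each rank computes on only 1/n of the data. The collectives are simulated on in-process lists (no real network, NCCL, or MPI). All names (`all_reduce`, `reduce_scatter`, `all_gather`, `unfused`, `split_and_reordered`) are illustrative assumptions. The rewrite is only valid when the operation is elementwise and the vector length divides evenly across ranks.

```python
from typing import Callable, List

Vector = List[float]


def all_reduce(inputs: List[Vector]) -> List[Vector]:
    """Simulated sum-AllReduce: every rank receives the elementwise sum."""
    total = [sum(vals) for vals in zip(*inputs)]
    return [list(total) for _ in inputs]


def reduce_scatter(inputs: List[Vector]) -> List[Vector]:
    """Simulated sum-ReduceScatter: rank r receives shard r of the summed vector."""
    n = len(inputs)
    length = len(inputs[0])
    assert length % n == 0, "vector length must divide evenly across ranks"
    shard = length // n
    total = [sum(vals) for vals in zip(*inputs)]
    return [total[r * shard:(r + 1) * shard] for r in range(n)]


def all_gather(shards: List[Vector]) -> List[Vector]:
    """Simulated AllGather: every rank receives the concatenation of all shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def unfused(inputs: List[Vector], op: Callable[[float], float]) -> List[Vector]:
    # Baseline: communicate the full vector, then every rank applies op to every element.
    return [[op(x) for x in v] for v in all_reduce(inputs)]


def split_and_reordered(inputs: List[Vector], op: Callable[[float], float]) -> List[Vector]:
    # Transformed (hypothetical illustration): AllReduce = ReduceScatter + AllGather,
    # and the elementwise op runs on each rank's 1/n shard between the two collectives.
    shards = reduce_scatter(inputs)
    updated = [[op(x) for x in s] for s in shards]
    return all_gather(updated)


if __name__ == "__main__":
    ranks = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
    scale = lambda x: 0.5 * x
    assert unfused(ranks, scale) == split_and_reordered(ranks, scale)
    print(split_and_reordered(ranks, scale))
    # prints [[5.5, 11.0, 16.5, 22.0], [5.5, 11.0, 16.5, 22.0]]
```

Both versions deliver the same result to every rank, but in the transformed version each rank applies `op` to only its own shard. Rewrites of this kind are hard to express when communication is a black-box library call, because they require changing both the collective and the surrounding computation.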

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Programming Languages (cs.PL)
Cite as:	arXiv:2105.05720 [cs.DC]
	(or arXiv:2105.05720v5 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2105.05720

Submission history

From: Abhinav Jangda
[v1] Wed, 12 May 2021 15:13:43 UTC (1,150 KB)
[v2] Thu, 13 May 2021 01:04:11 UTC (1,150 KB)
[v3] Wed, 25 Aug 2021 15:36:04 UTC (1,865 KB)
[v4] Tue, 16 Nov 2021 01:48:27 UTC (2,151 KB)
[v5] Sat, 26 Mar 2022 16:25:41 UTC (4,170 KB)