Towards Demystifying Serverless Machine Learning Training

Jiang, Jiawei; Gan, Shaoduo; Liu, Yue; Wang, Fanlin; Alonso, Gustavo; Klimovic, Ana; Singla, Ankit; Wu, Wentao; Zhang, Ce

doi:10.1145/3448016.3459240

arXiv:2105.07806 (cs)

[Submitted on 17 May 2021]

Abstract: The appeal of serverless (FaaS) has triggered growing interest in how to use it for data-intensive applications such as ETL, query processing, and machine learning (ML). Several systems exist for training large-scale ML models on top of serverless infrastructures (e.g., AWS Lambda), but their results are inconclusive regarding performance and relative advantage over "serverful" infrastructures (IaaS). In this paper we present a systematic, comparative study of distributed ML training over FaaS and IaaS. We present a design space covering design choices such as optimization algorithms and synchronization protocols, and we implement a platform, LambdaML, that enables a fair comparison between FaaS and IaaS. We present experimental results using LambdaML and further develop an analytic model that captures the cost/performance tradeoffs that must be considered when opting for a serverless infrastructure. Our results indicate that ML training pays off in serverless only for models that communicate efficiently (i.e., with reduced communication) and converge quickly. In general, FaaS can be much faster, but it is never significantly cheaper than IaaS.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2105.07806 [cs.DC]
	(or arXiv:2105.07806v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2105.07806
Related DOI:	https://doi.org/10.1145/3448016.3459240

Submission history

From: Jiawei Jiang
[v1] Mon, 17 May 2021 13:19:23 UTC (4,411 KB)