Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems

Wilkins, Grant; Keshav, Srinivasan; Mortier, Richard


arXiv:2407.04014 (cs)

[Submitted on 4 Jul 2024]



Abstract: The rapid adoption of large language models (LLMs) has led to significant advances in natural language processing and text generation. However, the energy consumed by LLM inference remains a major challenge for sustainable AI deployment. To address this problem, we model the workload-dependent energy consumption and runtime of LLM inference tasks on heterogeneous GPU-CPU systems. By conducting an extensive characterization study of several state-of-the-art LLMs and analyzing their energy and runtime behavior across different magnitudes of input prompts and output text, we develop accurate (R^2 > 0.96) energy and runtime models for each LLM. We employ these models to explore an offline, energy-optimal LLM workload scheduling framework. Through a case study, we demonstrate the advantages of energy- and accuracy-aware scheduling compared to existing best practices.
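As a rough illustration of what a workload-based energy model could look like, the sketch below fits per-query energy as a function of input and output token counts and reports R^2. The functional form (linear terms plus an interaction term), the function name `fit_energy_model`, and all data are assumptions made for illustration; the data is synthetic, and none of it is taken from the paper.

```python
import numpy as np


def fit_energy_model(tokens_in, tokens_out, energy_j):
    """Least-squares fit of energy ~ a*in + b*out + c*in*out.

    The model form is an assumption for illustration, not the paper's.
    Returns the coefficients (a, b, c) and the R^2 of the fit.
    """
    tokens_in = np.asarray(tokens_in, dtype=float)
    tokens_out = np.asarray(tokens_out, dtype=float)
    energy_j = np.asarray(energy_j, dtype=float)

    # Design matrix: one column per model term.
    X = np.column_stack([tokens_in, tokens_out, tokens_in * tokens_out])
    coef, *_ = np.linalg.lstsq(X, energy_j, rcond=None)

    # Coefficient of determination on the fitted data.
    pred = X @ coef
    ss_res = np.sum((energy_j - pred) ** 2)
    ss_tot = np.sum((energy_j - energy_j.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


# Synthetic measurements (hypothetical values, NOT results from the paper).
rng = np.random.default_rng(0)
t_in = rng.integers(8, 2048, size=200)
t_out = rng.integers(8, 2048, size=200)
energy = 0.05 * t_in + 0.4 * t_out + 1e-4 * t_in * t_out
energy = energy * (1 + rng.normal(0, 0.02, size=200))  # 2% measurement noise

coef, r2 = fit_energy_model(t_in, t_out, energy)
print("coefficients:", coef)
print("R^2:", r2)
```

With one such fitted model per LLM, an offline scheduler could compare the predicted energy of each model for a given batch of queries before assigning work.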

Comments:	7 pages, appearing at HotCarbon 2024
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2407.04014 [cs.DC]
	(or arXiv:2407.04014v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2407.04014

Submission history

From: Grant Wilkins
[v1] Thu, 4 Jul 2024 15:45:15 UTC (10,642 KB)
