Resource Allocation for Multiuser Edge Inference with Batching and Early Exiting (Extended Version)

Liu, Zhiyan; Lan, Qiao; Huang, Kaibin

arXiv:2204.05223v1 (cs)

[Submitted on 11 Apr 2022 (this version), latest version 30 Dec 2022 (v2)]

Abstract: The deployment of inference services at the network edge, called edge inference, offloads computation-intensive inference tasks from mobile devices to edge servers, thereby enhancing the devices' capabilities and battery life. In a multiuser system, the joint allocation of communication-and-computation ($\text{C}^\text{2}$) resources (i.e., scheduling and bandwidth allocation) is made challenging by the adoption of two efficient inference techniques, batching and early exiting, and is further complicated by the heterogeneity of users' accuracy and latency requirements. Batching groups multiple tasks into one batch for parallel processing, which reduces time-consuming memory access and thereby boosts the throughput (i.e., completed tasks per second). Early exiting, on the other hand, allows a task to exit from a deep neural network without traversing the whole network, supporting a tradeoff between accuracy and latency. In this work, we study optimal $\text{C}^\text{2}$ resource allocation with batching and early exiting, which is an NP-complete integer program. To tackle this challenge, a set of efficient algorithms is designed under the criterion of maximum throughput. Experimental results demonstrate that both optimal and sub-optimal $\text{C}^\text{2}$ resource allocation algorithms can leverage integrated batching and early exiting to achieve a 200% throughput gain over conventional schemes.
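
As a rough illustration of how batching and early exiting interact, the Python sketch below uses a toy latency model that is an assumption made here and is not taken from the paper: each network block costs a fixed overhead shared by the whole batch (standing in for memory access) plus a per-task compute term, and a task that exits after block k occupies only the first k blocks. The names `batch_latency` and `throughput` and the constants `FIXED_MS`, `PER_TASK_MS`, and `NUM_BLOCKS` are hypothetical.

```python
# Toy model for illustration only; not the paper's system model.
FIXED_MS = 4.0      # hypothetical per-block overhead, paid once per batch
PER_TASK_MS = 1.0   # hypothetical per-block compute cost for each task
NUM_BLOCKS = 4      # hypothetical number of blocks (one exit point per block)


def batch_latency(exit_blocks):
    """Latency in ms to serve one batch.

    exit_blocks[i] is the last block that task i traverses before exiting.
    """
    total = 0.0
    for block in range(1, NUM_BLOCKS + 1):
        # Count the tasks that have not yet exited at this block.
        active = sum(1 for e in exit_blocks if e >= block)
        if active == 0:
            break  # every task has exited; later blocks are skipped
        total += FIXED_MS + PER_TASK_MS * active
    return total


def throughput(exit_blocks):
    """Completed tasks per second when the batch is served once."""
    return 1000.0 * len(exit_blocks) / batch_latency(exit_blocks)


# Four tasks served one at a time, each traversing the full network.
sequential = 1000.0 * 4 / (4 * batch_latency([4]))
# The same four tasks in one batch, without early exiting.
batched = throughput([4, 4, 4, 4])
# One batch in which three tasks exit before the last block.
batched_early = throughput([1, 2, 2, 4])

print(sequential, batched, batched_early)
```

With these made-up constants the three cases come out at 50, 125, and 160 tasks per second. The numbers only show the direction of the effect in this toy model; the gain quoted in the abstract comes from the paper's own model and experiments.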

Comments:	This is an extended version of a submission to an IEEE journal
Subjects:	Information Theory (cs.IT); Signal Processing (eess.SP)
Cite as:	arXiv:2204.05223 [cs.IT]
	(or arXiv:2204.05223v1 [cs.IT] for this version)
	https://doi.org/10.48550/arXiv.2204.05223

Submission history

From: Zhiyan Liu
[v1] Mon, 11 Apr 2022 16:13:41 UTC (2,042 KB)
[v2] Fri, 30 Dec 2022 05:42:39 UTC (1,238 KB)
