Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices



arXiv:2302.00282 (cs)

[Submitted on 1 Feb 2023]


Authors:Zhang Runhua, Jiang Hongxu, Tian Fangzheng, Geng Jinkun, Li Xiaobin, Ma Yuhang, Zhu Chenhui, Dong Dong, Li Xin, Wang Haojie


Abstract: Edge computing has been emerging as a popular scenario for model inference. However, inference performance on edge devices (e.g., multi-core DSP, FPGA, etc.) suffers from inefficiency due to the lack of highly optimized inference frameworks. Previous model inference frameworks are mainly developed in an operator-centric way, which provides insufficient acceleration for edge-based inference. Besides, the operator-centric framework incurs significant costs for continuous development and maintenance.
In this paper, we propose Xenos, which can automatically conduct dataflow-centric optimization of the computation graph and accelerate inference in two dimensions. Vertically, Xenos develops an operator linking technique to improve data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator split technique to enable higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of vertical and horizontal dataflow optimization, which reduce the inference time by 21.2%-84.9% and 17.9%-96.2%, respectively. Besides, Xenos also outperforms the widely used TVM by 3.22x-17.92x. Moreover, we extend Xenos to a distributed solution, which we call d-Xenos. d-Xenos employs multiple edge devices to jointly conduct the inference task and achieves a speedup of 3.68x-3.78x compared with a single device.
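
The two optimization directions named in the abstract can be illustrated with a toy example. The sketch below is not taken from Xenos and does not reproduce its algorithms: it assumes a single dense layer followed by ReLU, uses Python threads as a stand-in for DSP units, and all function names (`dense_relu_linked`, `split_rows`, `dense_relu_split`) are hypothetical. It only shows the general idea that linking adjacent operators avoids materializing an intermediate buffer, and that splitting an operator along its output rows lets independent units each compute a slice of the same result.

```python
from concurrent.futures import ThreadPoolExecutor


def dense(weights, x):
    """Unlinked dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(values):
    """Unlinked ReLU: reads the whole intermediate buffer again."""
    return [max(0.0, v) for v in values]


def dense_relu_linked(weights, x):
    """Vertical idea (assumed illustration): apply ReLU as each value is
    produced, so the dense output is never stored as a separate buffer."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def split_rows(weights, num_units):
    """Horizontal idea (assumed illustration): partition the output rows
    into contiguous chunks, at most one chunk per unit."""
    chunk = -(-len(weights) // num_units)  # ceiling division
    return [weights[i:i + chunk] for i in range(0, len(weights), chunk)]


def dense_relu_split(weights, x, num_units=2):
    """Run the linked operator on each chunk in a separate worker and
    concatenate the partial outputs in their original order."""
    parts = split_rows(weights, num_units)
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        results = pool.map(lambda part: dense_relu_linked(part, x), parts)
        return [value for part in results for value in part]


WEIGHTS = [[1.0, -2.0], [0.5, 0.5], [-1.0, 0.0], [2.0, 1.0]]
X = [3.0, 1.0]

baseline = relu(dense(WEIGHTS, X))
linked = dense_relu_linked(WEIGHTS, X)
split = dense_relu_split(WEIGHTS, X, num_units=2)
print(baseline, linked, split)
```

All three variants compute the same values; they differ only in how data moves between operators and how the work is divided, which is the property a dataflow-level optimization has to preserve.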

Comments:	The preliminary version is accepted by the 28th International Conference on Database Systems for Advanced Applications (DASFAA-2023)
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2302.00282 [cs.DC]
	(or arXiv:2302.00282v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2302.00282

Submission history

From: Jinkun Geng
[v1] Wed, 1 Feb 2023 07:25:08 UTC (3,632 KB)
