Model-Distributed Inference for Large Language Models at the Edge

Macario, Davide; Seferoglu, Hulya; Koyuncu, Erdem

arXiv:2505.18164 (cs)

[Submitted on 13 May 2025]

Abstract: We introduce Model-Distributed Inference for Large-Language Models (MDI-LLM), a novel framework designed to facilitate the deployment of state-of-the-art large-language models (LLMs) across low-power devices at the edge. This is accomplished by dividing the model into multiple partitions, which are then assigned to different devices/nodes within the network. These nodes exchange intermediate activation vectors via device-to-device links, enabling collaborative computation. To enhance the efficiency of this process, we propose the "recurrent pipeline parallelism" technique, which reduces idle time on each device and facilitates parallel inference during the generation of multiple text sequences. By leveraging the combined computational resources of multiple edge devices, MDI-LLM enables the deployment of LLMs that exceed the memory capacity of individual devices, making it possible to perform inference on low-cost hardware. Furthermore, as the number of participating devices increases, MDI-LLM boosts token generation throughput and reduces memory consumption per device.
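
The partitioning and pipelining idea in the abstract can be illustrated with a toy scheduling sketch. This is not the authors' implementation: the function names (`partition_layers`, `ring_schedule`, `utilization`), the even contiguous split of layers, and the round-robin ring model (each sequence advances one node per time step and re-enters the first node after leaving the last one) are all assumptions made here for illustration. The sketch only shows why keeping several sequences in flight can reduce idle time compared with generating a single sequence.

```python
from typing import List, Optional


def partition_layers(n_layers: int, n_nodes: int) -> List[range]:
    """Split layer indices 0..n_layers-1 into n_nodes contiguous chunks.

    Assumption: an even split; earlier nodes absorb any remainder.
    """
    if n_nodes < 1 or n_layers < n_nodes:
        raise ValueError("need at least one layer per node")
    base, extra = divmod(n_layers, n_nodes)
    chunks, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def ring_schedule(
    n_nodes: int, n_sequences: int, n_steps: int
) -> List[List[Optional[int]]]:
    """Return schedule[t][node]: the sequence a node works on at step t.

    Hypothetical model: sequence s enters node 0 at step s, moves one node
    per step, and is fed back to node 0 after the last node (one token per
    full lap). None marks an idle node.
    """
    if not 1 <= n_sequences <= n_nodes:
        raise ValueError("this toy model assumes 1 <= n_sequences <= n_nodes")
    schedule = []
    for t in range(n_steps):
        row: List[Optional[int]] = []
        for node in range(n_nodes):
            s = (t - node) % n_nodes  # only sequence that could be here
            row.append(s if s < n_sequences and s <= t else None)
        schedule.append(row)
    return schedule


def utilization(schedule: List[List[Optional[int]]]) -> float:
    """Fraction of (step, node) slots in which a node is busy."""
    slots = [cell for row in schedule for cell in row]
    return sum(cell is not None for cell in slots) / len(slots)


if __name__ == "__main__":
    print(partition_layers(12, 3))
    for label, n_seq in (("one sequence", 1), ("three sequences", 3)):
        sched = ring_schedule(n_nodes=3, n_sequences=n_seq, n_steps=6)
        print(label, sched, round(utilization(sched), 3))
```

In this toy model, a single sequence keeps only one of the three nodes busy at any step, while three interleaved sequences keep every node busy once the pipeline has filled. Real devices would also incur communication latency on the device-to-device links, which this sketch ignores.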

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2505.18164 [cs.LG]
	(or arXiv:2505.18164v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2505.18164

Submission history

From: Hulya Seferoglu
[v1] Tue, 13 May 2025 12:07:37 UTC (521 KB)
