DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving

Liu, Yuhan; Huang, Yuyang; Yao, Jiayi; Gu, Zhuohan; Du, Kuntai; Li, Hanchen; Cheng, Yihua; Jiang, Junchen; Lu, Shan; Musuvathi, Madan; Choukse, Esha


arXiv:2411.02820 (cs)

[Submitted on 5 Nov 2024 (v1), last revised 19 Dec 2024 (this version, v3)]



Abstract: Large Language Models (LLMs) are increasingly employed in complex workflows, where different LLMs and fine-tuned variants collaborate on a task. However, these systems are inefficient because the models redundantly process the context they share. We propose DroidSpeak, a framework that optimizes context sharing between fine-tuned LLMs derived from the same foundation model. DroidSpeak identifies critical layers in the KV cache and selectively recomputes them, enabling effective reuse of intermediate data while maintaining high accuracy. Our approach balances computational efficiency and task fidelity, significantly reducing inference latency and easing throughput bottlenecks. Experiments on diverse datasets and model pairs demonstrate that DroidSpeak achieves up to 3x higher throughput and 2.6x faster prefill times with negligible accuracy loss compared to full recomputation.
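
The idea of recomputing only critical layers and reusing the rest can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the per-layer `sensitivity` scores, and the `budget` parameter are all hypothetical, and the sketch ignores how a real system would obtain the inputs needed to recompute a given layer.

```python
from typing import Callable, List, Sequence


def select_critical_layers(sensitivity: Sequence[float], budget: int) -> List[int]:
    """Pick the `budget` layers with the highest sensitivity, in layer order.

    `sensitivity[i]` is a hypothetical, offline-profiled estimate of how much
    accuracy suffers when layer i reuses another model's KV cache.
    """
    ranked = sorted(range(len(sensitivity)), key=lambda i: sensitivity[i], reverse=True)
    return sorted(ranked[:budget])


def build_receiver_cache(
    sender_cache: Sequence[str],
    critical: Sequence[int],
    recompute: Callable[[int], str],
) -> List[str]:
    """Reuse the sender's per-layer KV entries, except for critical layers.

    Cache entries are plain strings here so the sketch stays self-contained;
    in practice they would be key/value tensors.
    """
    critical_set = set(critical)
    return [
        recompute(i) if i in critical_set else sender_cache[i]
        for i in range(len(sender_cache))
    ]


# Toy four-layer example with made-up sensitivity scores.
sensitivity = [0.9, 0.1, 0.05, 0.7]
sender_cache = [f"sender-L{i}" for i in range(4)]
critical = select_critical_layers(sensitivity, budget=2)
receiver_cache = build_receiver_cache(sender_cache, critical, lambda i: f"receiver-L{i}")
print(critical)        # [0, 3]
print(receiver_cache)  # ['receiver-L0', 'sender-L1', 'sender-L2', 'receiver-L3']
```

In this sketch, raising `budget` trades more recomputation for closer agreement with full recomputation, which mirrors the efficiency versus fidelity balance the abstract describes.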

Subjects:	Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2411.02820 [cs.MA]
	(or arXiv:2411.02820v3 [cs.MA] for this version)
	https://doi.org/10.48550/arXiv.2411.02820

Submission history

From: Yuhan Liu
[v1] Tue, 5 Nov 2024 05:41:41 UTC (1,761 KB)
[v2] Fri, 13 Dec 2024 17:53:25 UTC (7,038 KB)
[v3] Thu, 19 Dec 2024 23:52:16 UTC (7,041 KB)
