KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider

Wang, Jiahao; Han, Jinbo; Wei, Xingda; Shen, Sijie; Zhang, Dingyan; Fang, Chenguang; Chen, Rong; Yu, Wenyuan; Chen, Haibo

arXiv:2506.02634 (cs)

[Submitted on 3 Jun 2025 (v1), last revised 19 Jun 2025 (this version, v3)]

Abstract: Serving large language models (LLMs) is important for cloud providers, and caching intermediate results (KV$) after processing each request substantially improves serving throughput and latency. However, there is limited understanding of how LLM serving benefits from KV$ caching, and system design decisions such as cache eviction policies are highly workload-dependent. In this paper, we present the first systematic characterization of the KV$ workload patterns from one of the leading LLM service providers. We draw observations that were not covered by previous studies focusing on synthetic workloads, including: KV$ reuses are skewed across requests, and reuses between single-turn requests are as important as those between multi-turn requests; the reuse time and probability are diverse across all requests, but for a specific request category, the pattern tends to be predictable; and the overall cache size required for an ideal cache hit ratio is moderate. Based on the characterization, we further propose a workload-aware cache eviction policy that improves the serving performance under real-world traces, especially with limited cache capacity.
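The abstract reports that reuse probability is predictable within a request category and proposes a workload-aware eviction policy, but it does not describe that policy's details. As a hypothetical illustration only (not the authors' policy), the following sketch shows one way category information could steer eviction: each cached entry carries a request category, the cache tracks a per-category reuse rate (hits per insertion, with Laplace smoothing as an assumed choice), and on a miss with a full cache it evicts the entry whose category has the lowest estimated reuse rate, breaking ties by least-recent use. The class name `CategoryAwareCache`, the method names, and the category labels are all assumptions made for this example.

```python
from collections import defaultdict
import itertools


class CategoryAwareCache:
    """Hypothetical toy KV$ cache: evicts the entry whose request category has
    the lowest observed reuse rate, breaking ties by least-recent use."""

    def __init__(self, capacity: int) -> None:
        assert capacity >= 1, "capacity must be at least one entry"
        self.capacity = capacity
        self.entries: dict[str, tuple[str, int]] = {}  # key -> (category, last_use)
        self.inserts = defaultdict(int)  # category -> number of insertions
        self.hits = defaultdict(int)     # category -> number of cache hits
        self.clock = itertools.count()   # logical time for recency tie-breaking

    def reuse_rate(self, category: str) -> float:
        # Laplace-smoothed hits/insertions, so unseen categories start at 0.5.
        return (self.hits[category] + 1) / (self.inserts[category] + 2)

    def access(self, key: str, category: str) -> bool:
        """Look up `key`; return True on a hit, otherwise insert it and return False."""
        now = next(self.clock)
        if key in self.entries:
            cat, _ = self.entries[key]
            self.hits[cat] += 1
            self.entries[key] = (cat, now)
            return True
        if len(self.entries) >= self.capacity:
            # Victim: lowest category reuse rate first, then oldest last use.
            victim = min(
                self.entries,
                key=lambda k: (self.reuse_rate(self.entries[k][0]), self.entries[k][1]),
            )
            del self.entries[victim]
        self.entries[key] = (category, now)
        self.inserts[category] += 1
        return False
```

For example, with capacity 2, after a `"chat"` entry is reused once and two never-reused `"batch"` entries arrive, the policy evicts the older `"batch"` entry and keeps the `"chat"` entry, whereas plain LRU would have evicted the `"chat"` entry because it was used least recently.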

Comments:	Accepted by USENIX ATC'25
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2506.02634 [cs.DC]
	(or arXiv:2506.02634v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2506.02634

Submission history

From: Jiahao Wang
[v1] Tue, 3 Jun 2025 08:51:38 UTC (26,463 KB)
[v2] Sat, 14 Jun 2025 04:39:21 UTC (26,038 KB)
[v3] Thu, 19 Jun 2025 02:18:16 UTC (26,033 KB)