
Showing 1–1 of 1 results for author: Yankun, H

  1. arXiv:2502.15304  [pdf, other]

    cs.LG cs.AI cs.CL

    SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention

    Authors: Hong Yankun, Li Xing, Zhen Hui-Ling, Yu Xianzhi, Liu Wulong, Yuan Mingxuan

    Abstract: For the efficient inference of Large Language Models (LLMs), the effective compression of key-value (KV) cache is essential. Three main types of KV cache compression techniques, namely sparsity, channel compression, and quantization, have been identified. This study presents SVDq, a Singular Value Decomposition (SVD)-based mixed-precision quantization method for K cache. Initially, K cache is tr…

    Submitted 21 February, 2025; originally announced February 2025.

    MSC Class: 68T50