Showing 1–1 of 1 results for author: O'Halloran, T

  1. arXiv:2412.16187

    cs.LG cs.AI cs.CL cs.DS cs.PF

    HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing

    Authors: Minghui Liu, Tahseen Rabbani, Tony O'Halloran, Ananth Sankaralingam, Mary-Anne Hartley, Furong Huang, Cornelia Fermüller, Yiannis Aloimonos

    Abstract: Transformer-based large language models (LLMs) use the key-value (KV) cache to significantly accelerate inference by storing the key and value embeddings of past tokens. However, this cache consumes significant GPU memory. In this work, we introduce HashEvict, an algorithm that uses locality-sensitive hashing (LSH) to compress the KV cache. HashEvict quickly locates tokens in the cache that are co…

    Submitted 4 June, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: 10 pages, 6 figures, 2 tables
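
The abstract above is truncated in this listing, so the following is not HashEvict itself. It is only a minimal sketch of the general idea the abstract names: hashing vectors with random-projection LSH (SimHash) and comparing the short binary codes by Hamming distance. All function names, the parameter choices, and the eviction rule in `pick_eviction_index` are assumptions made for illustration.

```python
import random


def make_planes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(vec, planes):
    """Return an integer code: one bit per hyperplane, set when the dot product is >= 0."""
    code = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def pick_eviction_index(cached_codes, query_code):
    """Assumed policy for illustration only: choose the cached entry whose
    code is farthest (in Hamming distance) from the query's code."""
    return max(range(len(cached_codes)),
               key=lambda i: hamming(cached_codes[i], query_code))


if __name__ == "__main__":
    planes = make_planes(dim=8, num_bits=16, seed=0)
    key_a = [1.0, 0.5, -0.2, 0.3, 0.0, 0.9, -0.4, 0.1]
    key_b = [-x for x in key_a]  # points in the opposite direction
    codes = [simhash(key_a, planes), simhash(key_b, planes)]
    query_code = simhash(key_a, planes)
    print("evict index:", pick_eviction_index(codes, query_code))
```

Because each bit depends only on the sign of a projection, the code is unchanged when a vector is rescaled by a positive factor, and vectors separated by a small angle tend to agree on most bits. Only the compact codes need to be stored and compared, which is what makes this kind of hashing attractive for a memory-constrained cache.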