Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets

Bruch, Sebastian; Nardini, Franco Maria; Rulli, Cosimo; Venturini, Rossano

arXiv:2509.24815 (cs)

[Submitted on 29 Sep 2025]

Abstract: Sparse embeddings of data form an attractive class due to their inherent interpretability: Every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS), which finds, from a collection of vectors, the k vectors closest to a query. To encourage research on this underexplored topic, sparse ANNS featured prominently in a BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on large benchmark datasets by throughput and accuracy. In this work, we introduce a set of novel data structures and algorithmic methods, a combination of which leads to an elegant, effective, and highly efficient solution to sparse ANNS. Our contributions include a theoretically grounded sketching algorithm that reduces the effective dimensionality of sparse vectors while preserving inner product-induced ranks; a geometric organization of the inverted index; and the blending of local and global information to improve the efficiency and efficacy of ANNS. Empirically, our final algorithm, dubbed Seismic, reaches sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.
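
To make the problem statement concrete, here is a minimal sketch of the exact (non-approximate) version of the search the abstract describes: retrieving the k vectors with the largest inner product against a query, using a plain inverted index over sparse vectors. This is not Seismic and not the authors' sketching algorithm; it is only the brute-force baseline that an approximate method tries to match in accuracy while being faster. The dict-based vector representation, the function names, and the toy data are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def build_inverted_index(docs):
    """Map each dimension (term id) to its postings: a list of (doc_id, value)."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for dim, val in vec.items():
            index[dim].append((doc_id, val))
    return index


def top_k_inner_product(index, query, k):
    """Exact top-k by inner product.

    Only the postings of the query's nonzero dimensions are visited, so
    documents sharing no dimension with the query are never scored.
    """
    scores = defaultdict(float)
    for dim, q_val in query.items():
        for doc_id, d_val in index.get(dim, ()):
            scores[doc_id] += q_val * d_val
    # Highest score first; ties are broken by the smaller doc id.
    return heapq.nsmallest(k, scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy collection (hypothetical): each vector maps dimension -> nonzero value.
docs = [
    {0: 1.0, 3: 2.0},
    {1: 0.5, 3: 1.0},
    {0: 2.0, 2: 1.0},
    {4: 3.0},
]
index = build_inverted_index(docs)
query = {0: 1.0, 3: 1.0}
result = top_k_inner_product(index, query, k=2)
print(result)
```

The cost of this exact traversal grows with the total length of the postings lists touched by the query, which is the work an approximate method aims to cut while keeping the returned ranking close to the exact one.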

Subjects:	Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2509.24815 [cs.DS]
	(or arXiv:2509.24815v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.2509.24815

Submission history

From: Sebastian Bruch
[v1] Mon, 29 Sep 2025 14:02:45 UTC (782 KB)
