WARP: An Efficient Engine for Multi-Vector Retrieval
Authors:
Jan Luca Scheerer,
Matei Zaharia,
Christopher Potts,
Gustavo Alonso,
Omar Khattab
Abstract:
Multi-vector retrieval methods such as ColBERT and its recent variant, the ConteXtualized Token Retriever (XTR), offer high accuracy but face efficiency challenges at scale. To address this, we present WARP, a retrieval engine that substantially improves the efficiency of retrievers trained with the XTR objective through three key innovations: (1) WARP$_\text{SELECT}$ for dynamic similarity imputation; (2) implicit decompression, avoiding costly vector reconstruction during retrieval; and (3) a two-stage reduction process for efficient score aggregation. Combined with highly optimized C++ kernels, our system reduces end-to-end latency by 41x compared to XTR's reference implementation and achieves a 3x speedup over the ColBERTv2/PLAID engine, while preserving retrieval quality.
Submitted 30 April, 2025; v1 submitted 29 January, 2025;
originally announced January 2025.
QirK: Question Answering via Intermediate Representation on Knowledge Graphs
Authors:
Jan Luca Scheerer,
Anton Lykov,
Moe Kayali,
Ilias Fountalis,
Dan Olteanu,
Nikolaos Vasiloglou,
Dan Suciu
Abstract:
We demonstrate QirK, a system for answering natural language questions on Knowledge Graphs (KG). QirK can answer structurally complex questions that are still beyond the reach of emerging Large Language Models (LLMs). It does so using a unique combination of database technology, LLMs, and semantic search over vector embeddings. The glue for these components is an intermediate representation (IR). LLMs map the input question to the IR, which is then repaired into a valid relational database query with the aid of semantic search over vector embeddings. This allows a practical synthesis of LLM capabilities and KG reliability.
A short video demonstrating QirK is available at https://youtu.be/6c81BLmOZ0U.
Submitted 14 August, 2024;
originally announced August 2024.