Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement
Authors:
Felipe Aramburú,
William Malpica,
Kaouther Abrougui,
Amin Aramoon,
Romulo Auccapuclla,
Claude Brisson,
Matthijs Brobbel,
Colby Farrell,
Pradeep Garigipati,
Joost Hoozemans,
Supun Kamburugamuve,
Akhil Nair,
Alexander Ocsa,
Johan Peltenburg,
Rubén Quesada López,
Deepak Sihag,
Ahmet Uyar,
Dhruv Vats,
Michael Wendt,
Jignesh M. Patel,
Rodrigo Aramburú
Abstract:
Online analytical processing of queries on datasets in the many-terabyte range is only possible with costly distributed computing systems. To decrease the cost and increase the throughput, systems can leverage accelerators such as GPUs, which are now ubiquitous in the compute infrastructure. This introduces many challenges, the majority of which are related to when, where, and how to best move dat…
▽ More
Online analytical processing of queries on datasets in the many-terabyte range is only possible with costly distributed computing systems. To decrease the cost and increase the throughput, systems can leverage accelerators such as GPUs, which are now ubiquitous in the compute infrastructure. This introduces many challenges, the majority of which are related to when, where, and how to best move data around the system. We present Theseus -- a production-ready enterprise-scale distributed accelerator-native query engine designed to balance data movement, memory utilization, and computation in an accelerator-based system context. Specialized asynchronous control mechanisms are tightly coupled to the hardware resources for the purpose of network communication, data pre-loading, data spilling across memories and storage, and GPU compute tasks. The memory subsystem contains a mechanism for fixed-size page-locked host memory allocations to increase throughput and reduce memory fragmentation. For the TPC-H benchmarks at scale factors ranging from 1k to 30k on cloud infrastructure, Theseus outperforms Databricks Photon by up to $4\times$ at cost parity. Theseus is capable of processing all queries of the TPC-H and TPC-DS benchmarks at scale factor 100k (100 TB scale) with as few as 2 DGX A100 640GB nodes.
△ Less
Submitted 7 August, 2025;
originally announced August 2025.
Named Entity Recognition in Context
Authors:
Colin Brisson,
Ayoub Kahfy,
Marc Bui,
Frédéric Constant
Abstract:
We present the Named Entity Recognition system developed by the Edit Dunhuang team for the EvaHan2025 competition. Our approach integrates three core components: (1) Pindola, a modern transformer-based bidirectional encoder pretrained on a large corpus of Classical Chinese texts; (2) a retrieval module that fetches relevant external context for each target sequence; and (3) a generative reasoning…
▽ More
We present the Named Entity Recognition system developed by the Edit Dunhuang team for the EvaHan2025 competition. Our approach integrates three core components: (1) Pindola, a modern transformer-based bidirectional encoder pretrained on a large corpus of Classical Chinese texts; (2) a retrieval module that fetches relevant external context for each target sequence; and (3) a generative reasoning step that summarizes retrieved context in Classical Chinese for more robust entity disambiguation. Using this approach, we achieve an average F1 score of 85.58, improving upon the competition baseline by nearly 5 points.
△ Less
Submitted 26 March, 2025;
originally announced March 2025.