-
The Cambridge Report on Database Research
Authors:
Anastasia Ailamaki,
Samuel Madden,
Daniel Abadi,
Gustavo Alonso,
Sihem Amer-Yahia,
Magdalena Balazinska,
Philip A. Bernstein,
Peter Boncz,
Michael Cafarella,
Surajit Chaudhuri,
Susan Davidson,
David DeWitt,
Yanlei Diao,
Xin Luna Dong,
Michael Franklin,
Juliana Freire,
Johannes Gehrke,
Alon Halevy,
Joseph M. Hellerstein,
Mark D. Hill,
Stratos Idreos,
Yannis Ioannidis,
Christoph Koch,
Donald Kossmann,
Tim Kraska, et al. (21 additional authors not shown)
Abstract:
On October 19 and 20, 2023, the authors of this report convened in Cambridge, MA, to discuss the state of the database research field, its recent accomplishments and ongoing challenges, and future directions for research and community engagement. This gathering continues a long-standing tradition in the database community, dating back to the late 1980s, in which researchers meet roughly every five years to produce a forward-looking report.
This report summarizes the key takeaways from our discussions. We begin with a retrospective on the academic, open source, and commercial successes of the community over the past five years. We then turn to future opportunities, with a focus on core data systems, particularly in the context of cloud computing and emerging hardware, as well as on the growing impact of data science, data governance, and generative AI.
This document is not intended as an exhaustive survey of all technical challenges or industry innovations in the field. Rather, it reflects the perspectives of senior community members on the most pressing challenges and promising opportunities ahead.
Submitted 15 April, 2025;
originally announced April 2025.
-
Efficiently making (almost) any concurrency control mechanism serializable
Authors:
Tianzheng Wang,
Ryan Johnson,
Alan Fekete,
Ippokratis Pandis
Abstract:
Concurrency control (CC) algorithms must trade off strictness for performance. Serializable CC schemes generally pay a higher cost to prevent anomalies, both in runtime overhead and in work wasted on aborted transactions. We propose the serial safety net (SSN), a serializability-enforcing certifier that can be applied with minimal overhead on top of various CC schemes that offer higher performance but admit anomalies, such as snapshot isolation and read committed. The underlying CC retains control of scheduling and transactional accesses, while SSN tracks the resulting dependencies. At commit time, SSN performs an efficient validation test that examines only the direct dependencies of the committing transaction to determine whether it can commit safely or must abort to avoid a potential dependency cycle.
SSN performs robustly across a variety of workloads. It preserves the characteristics of the underlying CC without biasing toward certain types of transactions, though the underlying CC itself might. Besides traditional OLTP workloads, SSN also handles heterogeneous workloads with long, read-mostly transactions efficiently. SSN can avoid tracking the majority of reads (thus reducing the overhead of serializability certification) and still produce serializable executions with little overhead. The dependency tracking and validation tests can be performed efficiently, fully in parallel and latch-free, for multi-version systems on modern hardware with high core counts and large main memory.
We demonstrate the efficiency, accuracy and robustness of SSN using extensive simulations and an implementation that overlays snapshot isolation in ERMIA, a memory-optimized OLTP engine that is capable of running different CC schemes. Evaluation results confirm that SSN is a promising approach to serializability with robust performance and low overhead for various workloads.
Submitted 4 May, 2017; v1 submitted 13 May, 2016;
originally announced May 2016.
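The abstract states only that SSN validates a committing transaction against its direct dependencies. As a rough illustration of what such a commit-time check could look like, here is a minimal sketch assuming a simplified watermark-style test that is not claimed to be the paper's exact rule: each transaction gets a commit stamp, `eta` is the latest commit stamp among its already-committed direct predecessors, `pi` is the earliest commit stamp among its already-committed direct successors (or its own stamp if there are none), and the transaction must abort when `pi <= eta`. The `Txn` class, its field names, and the integer stamps are hypothetical; how the underlying CC records dependencies is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    # Hypothetical model: integer commit stamp assigned at pre-commit.
    cstamp: int
    # Commit stamps of already-committed direct predecessors of this transaction.
    pred_cstamps: list[int] = field(default_factory=list)
    # Commit stamps of already-committed direct successors, e.g. transactions
    # that overwrote a version this transaction read (anti-dependencies).
    succ_cstamps: list[int] = field(default_factory=list)


def can_commit(t: Txn) -> bool:
    """Simplified exclusion-window test over direct dependencies only."""
    # Latest point in serial order that this transaction must follow.
    eta = max(t.pred_cstamps, default=0)
    # Earliest point in serial order that this transaction must precede.
    pi = min([t.cstamp, *t.succ_cstamps])
    # Commit only if the window between the two is non-empty.
    return eta < pi


# Write skew under snapshot isolation: T1 and T2 both read x and y;
# T1 writes x and commits with stamp 10, then T2 (stamp 11) tries to write y.
# T1 read the y that T2 overwrites, so T1 precedes T2 (pred 10);
# T1 overwrote the x that T2 read, so T1 also follows T2 (succ 10).
t2 = Txn(cstamp=11, pred_cstamps=[10], succ_cstamps=[10])
print(can_commit(t2))  # False: T2 must abort
```

Because the test only inspects the committing transaction's own direct dependency stamps, it runs in time proportional to the number of those dependencies rather than requiring a search of the full dependency graph, which is the property the abstract emphasizes.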
-
OLTP on Hardware Islands
Authors:
Danica Porobic,
Ippokratis Pandis,
Miguel Branco,
Pınar Tözün,
Anastasia Ailamaki
Abstract:
Modern hardware is abundantly parallel and increasingly heterogeneous. The numerous processing cores have non-uniform access latencies to main memory and to the processor caches, which causes variability in communication costs. Unfortunately, database systems mostly assume that all processing cores are the same and that microarchitectural differences are not significant enough to appear in critical database execution paths. As we demonstrate in this paper, however, hardware heterogeneity does appear in the critical path, and conventional database architectures achieve suboptimal and, even worse, unpredictable performance. We perform a detailed performance analysis of OLTP deployments on servers with multiple cores per CPU (multicore) and multiple CPUs per server (multisocket). We compare different database deployment strategies in which we vary the number and size of independent database instances running on a single server, from a single shared-everything instance to fine-grained shared-nothing configurations. We quantify the impact of non-uniform hardware on various deployments by (a) examining how efficiently each deployment uses the available hardware resources and (b) measuring the impact of distributed transactions and skewed requests on different workloads. Finally, we argue in favor of shared-nothing deployments that are topology- and workload-aware and take advantage of fast on-chip communication between islands of cores on the same socket.
Submitted 1 August, 2012;
originally announced August 2012.
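The deployments compared in this paper range from one shared-everything instance to many small shared-nothing instances. As a rough illustration of the trade-off, here is a minimal sketch of two hypothetical helpers that are not taken from the paper: `place_instances` assigns cores to equally sized instances so that an instance never straddles a socket boundary unless it covers whole sockets, and `p_distributed` estimates how often a transaction spans more than one instance. Both assume a simplified model: core IDs are numbered socket by socket, and each transaction touches `m` records drawn independently and uniformly across `n` equally sized instances, so the probability that all records land in one instance is n·(1/n)^m = n^-(m-1).

```python
def place_instances(sockets: int, cores_per_socket: int, cores_per_instance: int) -> list[list[int]]:
    """Split cores into equally sized instances that respect socket boundaries.

    Assumes core IDs are numbered socket by socket, so socket s owns cores
    s*cores_per_socket through (s+1)*cores_per_socket - 1.
    """
    if sockets < 1 or cores_per_socket < 1 or cores_per_instance < 1:
        raise ValueError("sockets and core counts must be positive")
    total = sockets * cores_per_socket
    # Either the instance fits evenly inside one socket ("island") ...
    fits_in_socket = cores_per_instance <= cores_per_socket and cores_per_socket % cores_per_instance == 0
    # ... or it covers a whole number of sockets and the sockets divide evenly.
    whole_sockets = cores_per_instance % cores_per_socket == 0 and total % cores_per_instance == 0
    if not (fits_in_socket or whole_sockets):
        raise ValueError("instance size would straddle a socket boundary")
    # Under the checks above, contiguous chunks of core IDs never split a socket.
    return [list(range(start, start + cores_per_instance)) for start in range(0, total, cores_per_instance)]


def p_distributed(n_instances: int, records_per_txn: int) -> float:
    """Probability that a transaction touches more than one instance.

    Assumes each record is drawn independently and uniformly over instances.
    """
    if n_instances < 1 or records_per_txn < 1:
        raise ValueError("need at least one instance and one record")
    return 1.0 - n_instances ** -(records_per_txn - 1)


# Two sockets with four cores each, split into four 2-core islands.
print(place_instances(2, 4, 2))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
# With 4 instances, a 3-record transaction is distributed 93.75% of the time.
print(p_distributed(4, 3))  # 0.9375
```

Under this uniform-access assumption, finer-grained shared-nothing deployments quickly make most multi-record transactions distributed, which is one reason the choice of instance size has to be workload-aware as well as topology-aware.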