Extending Polaris to Support Transactions
Authors: Josep Aguilar-Saborit, Raghu Ramakrishnan, Kevin Bocksrocker, Alan Halverson, Konstantin Kosinsky, Ryan O'Connor, Nadejda Poliakova, Moe Shafiei, Taewoo Kim, Phil Kon-Kim, Haris Mahmud-Ansari, Blazej Matuszyk, Matt Miles, Sumin Mohanan, Cristian Petculescu, Ishan Rahesh-Madan, Emma Rose-Wirshing, Elias Yousefi
Abstract:
In Polaris, we introduced a cloud-native distributed query processor to perform analytics at scale. In this paper, we extend the underlying Polaris distributed computation framework, which can be thought of as a read-only transaction engine, to execute general transactions (including updates, deletes, inserts and bulk loads, in addition to queries) for Tier 1 warehousing workloads in a highly performant and predictable manner. We take advantage of the immutability of data files in log-structured data stores and build on SQL Server transaction management to deliver full transactional support with Snapshot Isolation semantics, including multi-table and multi-statement transactions. With the enhancements described in this paper, Polaris supports both query processing and transactions for T-SQL in Microsoft Fabric.
Submitted 20 January, 2024; originally announced January 2024.
Optimization of Imperative Programs in a Relational Database
Authors: Karthik Ramachandra, Kwanghyun Park, K. Venkatesh Emani, Alan Halverson, Cesar Galindo-Legaria, Conor Cunningham
Abstract:
For decades, RDBMSs have supported declarative SQL as well as imperative functions and procedures as ways for users to express data processing tasks. While the evaluation of declarative SQL has received a lot of attention resulting in highly sophisticated techniques, the evaluation of imperative programs has remained naive and highly inefficient. Imperative programs offer several benefits over SQL and hence are often preferred and widely used. But unfortunately, their abysmal performance discourages, and even prohibits their use in many situations. We address this important problem that has hitherto received little attention.
We present Froid, an extensible framework for optimizing imperative programs in relational databases. Froid's novel approach automatically transforms entire User Defined Functions (UDFs) into relational algebraic expressions, and embeds them into the calling SQL query. This form is now amenable to cost-based optimization and results in efficient, set-oriented, parallel plans as opposed to inefficient, iterative, serial execution of UDFs. Froid's approach additionally brings the benefits of many compiler optimizations to UDFs with no additional implementation effort. We describe the design of Froid and present our experimental evaluation that demonstrates performance improvements of up to multiple orders of magnitude on real workloads.
Submitted 20 August, 2019; v1 submitted 1 December, 2017; originally announced December 2017.