-
Selective Use of Yannakakis' Algorithm to Improve Query Performance: Machine Learning to the Rescue
Authors:
Daniela Böhm,
Georg Gottlob,
Matthias Lanzinger,
Davide Longo,
Cem Okulmus,
Reinhard Pichler,
Alexander Selzer
Abstract:
Query optimization has played a central role in database research for decades. However, more often than not, the proposed optimization techniques lead to a performance improvement in some, but not in all, situations. Therefore, we urgently need a methodology for designing a decision procedure that decides for a given query whether the optimization technique should be applied or not.
In this work, we propose such a methodology with a focus on Yannakakis-style query evaluation as our optimization technique of interest. More specifically, we formulate this decision problem as an algorithm selection problem and we present a Machine Learning based approach for its solution. Empirical results with several benchmarks on a variety of database systems show that our approach indeed leads to a statistically significant performance improvement.
Submitted 20 June, 2025; v1 submitted 27 February, 2025;
originally announced February 2025.
-
Soft and Constrained Hypertree Width
Authors:
Matthias Lanzinger,
Cem Okulmus,
Reinhard Pichler,
Alexander Selzer,
Georg Gottlob
Abstract:
Hypertree decompositions provide a way to evaluate Conjunctive Queries (CQs) in polynomial time, where the exponent of this polynomial is determined by the width of the decomposition. In theory, the goal of efficient CQ evaluation therefore has to be a minimisation of the width. However, in practical settings, it turns out that there are also other properties of a decomposition that influence the performance of query evaluation. It is therefore of interest to restrict the computation of decompositions by constraints and to guide this computation by preferences. To this end, we propose a novel framework based on candidate tree decompositions, which allows us to introduce soft hypertree width (shw). This width measure is a relaxation of hypertree width (hw); it is never greater than hw and, in some cases, shw may actually be lower than hw. Most importantly, shw preserves the tractability of deciding if a given CQ is below some fixed bound, while offering more algorithmic flexibility. In particular, it provides a natural way to incorporate preferences and constraints into the computation of decompositions. A prototype implementation and preliminary experiments confirm that this novel framework can indeed have a practical impact on query evaluation.
Submitted 20 April, 2025; v1 submitted 16 December, 2024;
originally announced December 2024.
-
On autoregressive deep learning models for day-ahead wind power forecasting with irregular shutdowns due to redispatching
Authors:
Stefan Meisenbacher,
Silas Aaron Selzer,
Mehdi Dado,
Maximilian Beichter,
Tim Martin,
Markus Zdrallek,
Peter Bretschneider,
Veit Hagenmeyer,
Ralf Mikut
Abstract:
Renewable energies and their operation are becoming increasingly vital for the stability of electrical power grids since conventional power plants are progressively being displaced, and their contribution to redispatch interventions is thereby diminishing. To consider renewable energies such as Wind Power (WP) as a substitute in such interventions, day-ahead forecasts are necessary to communicate their availability for redispatch planning. In this context, automated and scalable forecasting models are required for deployment to thousands of locally distributed onshore WP turbines. Furthermore, the irregular interventions into the WP generation capabilities due to redispatch shutdowns pose challenges in the design and operation of WP forecasting models. Since state-of-the-art forecasting methods consider past WP generation values alongside day-ahead weather forecasts, redispatch shutdowns may impact the forecast. Therefore, the present paper highlights these challenges and analyzes state-of-the-art forecasting methods on data sets with both regular and irregular shutdowns. Specifically, we compare the forecasting accuracy of three autoregressive Deep Learning (DL) methods to methods based on WP curve modeling. Interestingly, the latter achieve lower forecasting errors, require less data cleaning during modeling and operation, and are computationally more efficient, which suggests advantages in practical applications.
Submitted 30 November, 2024;
originally announced December 2024.
-
Avoiding Materialisation for Guarded Aggregate Queries
Authors:
Matthias Lanzinger,
Reinhard Pichler,
Alexander Selzer
Abstract:
Optimising queries with many joins is known to be a hard problem. The explosion of intermediate results as opposed to a much smaller final result poses a serious challenge to modern database management systems (DBMSs). This is particularly glaring in the case of analytical queries that join many tables, but ultimately only output comparatively small aggregate information. Analogous problems are faced by graph database systems when processing analytical queries with aggregates on top of complex path queries.
In this work, we propose novel optimisation techniques, on both the logical and the physical level, that allow us to avoid the materialisation of join results for certain types of aggregate queries. The key to these optimisations is the notion of guardedness, by which we impose restrictions on the occurrence of attributes in GROUP BY clauses and in aggregate expressions. The efficacy of our optimisations is validated through their implementation in Spark SQL and extensive empirical evaluation on various standard benchmarks.
Submitted 30 November, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
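The idea of answering an aggregate query without materialising the join can be illustrated with a small in-memory sketch. It assumes one simple reading of guardedness: every attribute used in GROUP BY and in the aggregate occurs in a single relation (here called the guard), and the aggregate is MIN, which is insensitive to duplicate rows. The relation names, attribute names, and helper functions are hypothetical and are not taken from the paper or from Spark SQL.

```python
from collections import defaultdict

def semijoin(r, s, on):
    """Keep the rows of r that have at least one join partner in s."""
    keys = {tuple(row[a] for a in on) for row in s}
    return [row for row in r if tuple(row[a] for a in on) in keys]

def min_price_semijoin(orders, addresses):
    """MIN(price) GROUP BY customer_id, computed on the reduced guard only."""
    reduced = semijoin(orders, addresses, ["customer_id"])
    result = {}
    for row in reduced:
        key = row["customer_id"]
        result[key] = min(result.get(key, row["price"]), row["price"])
    return result

def min_price_materialised(orders, addresses):
    """Reference version: build the full join first, then aggregate."""
    by_customer = defaultdict(list)
    for a in addresses:
        by_customer[a["customer_id"]].append(a)
    joined = [{**o, **a} for o in orders for a in by_customer[o["customer_id"]]]
    result = {}
    for row in joined:
        key = row["customer_id"]
        result[key] = min(result.get(key, row["price"]), row["price"])
    return result

# Hypothetical data: "orders" is the guard, since it holds both the grouping
# attribute (customer_id) and the aggregated attribute (price).
orders = [
    {"customer_id": 1, "price": 30},
    {"customer_id": 1, "price": 20},
    {"customer_id": 2, "price": 50},
    {"customer_id": 3, "price": 10},  # no address, so it drops out of the join
]
addresses = [
    {"customer_id": 1, "city": "Vienna"},
    {"customer_id": 1, "city": "Graz"},
    {"customer_id": 2, "city": "Linz"},
]

print(min_price_semijoin(orders, addresses))
```

In this sketch the semi-join version never builds the five-row join result, yet it returns the same answer as the materialising version. Aggregates that depend on multiplicities, such as COUNT or SUM, would need additional bookkeeping that is not shown here.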
-
Structure-Guided Query Evaluation: Towards Bridging the Gap from Theory to Practice
Authors:
Georg Gottlob,
Matthias Lanzinger,
Davide Mario Longo,
Cem Okulmus,
Reinhard Pichler,
Alexander Selzer
Abstract:
Join queries involving many relations pose a severe challenge to today's query optimisation techniques. To some extent, this is due to the fact that these techniques do not pay sufficient attention to structural properties of the query. In stark contrast, the Database Theory community has intensively studied structural properties of queries (such as acyclicity and various notions of width) and proposed efficient query evaluation techniques through variants of Yannakakis' algorithm. However, although most queries in practice actually are acyclic or have low width, structure-guided query evaluation techniques based on Yannakakis' algorithm have not found their way into mainstream database technology yet. The goal of this work is to address this gap between theory and practice and to demonstrate that the consideration of query structure can improve query evaluation performance on modern DBMSs significantly in cases that have been traditionally challenging. In particular, we study the performance of structure-guided query evaluation in three architecturally distinct DBMSs by rewriting SQL queries into a sequence of SQL statements that express an execution of Yannakakis' algorithm. Moreover, we identify a class of queries that is particularly well suited for our approach and allows query answering in a variety of common scenarios without materializing any join. Through empirical evaluation we show that structure-guided query evaluation can make the evaluation of many difficult join queries feasible whereas their evaluation requires a prohibitive amount of time and memory on current DBMSs.
Submitted 22 May, 2023; v1 submitted 5 March, 2023;
originally announced March 2023.
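For readers unfamiliar with Yannakakis' algorithm, the following is a minimal in-memory sketch of its three phases for an acyclic join query: a bottom-up semi-join pass, a top-down semi-join pass, and a final join over the reduced relations. The abstract describes a rewriting into SQL statements; this Python version only illustrates the principle. The relation names, the join tree, and all function names are assumptions made for the example.

```python
def shared_attrs(r, s):
    """Attributes common to two non-empty relations (lists of dicts)."""
    return sorted(set(r[0]) & set(s[0]))

def semijoin(r, s):
    """Keep the tuples of r that join with at least one tuple of s."""
    if not r or not s:
        return []
    on = shared_attrs(r, s)
    keys = {tuple(t[a] for a in on) for t in s}
    return [t for t in r if tuple(t[a] for a in on) in keys]

def join(r, s):
    """Natural join of two relations using a hash index on s."""
    if not r or not s:
        return []
    on = shared_attrs(r, s)
    index = {}
    for t in s:
        index.setdefault(tuple(t[a] for a in on), []).append(t)
    return [{**t, **u} for t in r for u in index.get(tuple(t[a] for a in on), [])]

def yannakakis(relations, children, root):
    """Evaluate an acyclic join given a join tree (node -> list of children)."""
    rels = dict(relations)
    order = []  # post-order: every child appears before its parent

    def visit(node):
        for child in children.get(node, []):
            visit(child)
        order.append(node)

    visit(root)
    # Phase 1 (bottom-up): reduce each parent by its children.
    for node in order:
        for child in children.get(node, []):
            rels[node] = semijoin(rels[node], rels[child])
    # Phase 2 (top-down): reduce each child by its parent.
    for node in reversed(order):
        for child in children.get(node, []):
            rels[child] = semijoin(rels[child], rels[node])
    # Phase 3: join the fully reduced relations along the tree.
    partial = {}
    for node in order:
        acc = rels[node]
        for child in children.get(node, []):
            acc = join(acc, partial[child])
        partial[node] = acc
    return partial[root]

# Hypothetical chain query R(a,b), S(b,c), T(c,d) with S as the root.
relations = {
    "R": [{"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 9}],
    "S": [{"b": 1, "c": 10}, {"b": 2, "c": 20}, {"b": 3, "c": 30}],
    "T": [{"c": 10, "d": "x"}, {"c": 40, "d": "y"}],
}
children = {"S": ["R", "T"]}
answer = yannakakis(relations, children, "S")
print(answer)
```

After the two semi-join passes, every remaining tuple contributes to the final answer, so the joins in the last phase produce no dangling intermediate results. When only attributes of the root are needed, the first phase alone already yields the relevant root tuples.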
-
Integration of Skyline Queries into Spark SQL
Authors:
Lukas Grasmann,
Reinhard Pichler,
Alexander Selzer
Abstract:
Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing big, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple and easy-to-use syntax for expressing skyline queries. Moreover, our empirical results show that this integrated solution of skyline queries by far outperforms a solution based on rewriting into standard SQL.
Submitted 7 October, 2022;
originally announced October 2022.
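As background on what a skyline query computes, here is a minimal sketch in plain Python. It assumes that smaller values are preferred in every dimension and uses a simple window-based scan; it is not the Spark SQL integration described in the abstract, and the data and function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q if p is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Return the points that are not dominated by any other point."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current candidate, discard it
        # p survives: drop the candidates that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window

# Hypothetical hotels as (price, distance to the beach); lower is better in both.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (55, 9)]
print(skyline(hotels))  # [(50, 8), (60, 2), (80, 1)]
```

Here (70, 5) is dominated by (60, 2) and (55, 9) is dominated by (50, 8), so neither appears in the result. Expressing the same filter in standard SQL typically needs a correlated NOT EXISTS subquery over all dimensions, which is the kind of rewriting the abstract refers to.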