-
Selective Use of Yannakakis' Algorithm to Improve Query Performance: Machine Learning to the Rescue
Authors:
Daniela Böhm,
Georg Gottlob,
Matthias Lanzinger,
Davide Longo,
Cem Okulmus,
Reinhard Pichler,
Alexander Selzer
Abstract:
Query optimization has played a central role in database research for decades. However, more often than not, the proposed optimization techniques lead to a performance improvement in some, but not in all, situations. Therefore, we urgently need a methodology for designing a decision procedure that decides for a given query whether the optimization technique should be applied or not.
In this work, we propose such a methodology with a focus on Yannakakis-style query evaluation as our optimization technique of interest. More specifically, we formulate this decision problem as an algorithm selection problem and we present a Machine Learning based approach for its solution. Empirical results with several benchmarks on a variety of database systems show that our approach indeed leads to a statistically significant performance improvement.
Submitted 27 February, 2025;
originally announced February 2025.
-
Common Foundations for SHACL, ShEx, and PG-Schema
Authors:
S. Ahmetaj,
I. Boneva,
J. Hidders,
K. Hose,
M. Jakubowski,
J. E. Labra-Gayo,
W. Martens,
F. Mogavero,
F. Murlak,
C. Okulmus,
A. Polleres,
O. Savkovic,
M. Simkus,
D. Tomaszuk
Abstract:
Graphs have emerged as an important foundation for a variety of applications, including capturing and reasoning over factual knowledge, semantic data integration, social networks, and providing factual knowledge for machine learning algorithms. To formalise certain properties of the data and to ensure data quality, there is a need to describe the schema of such graphs. Because of the breadth of applications and availability of different data models, such as RDF and property graphs, both the Semantic Web and the database community have independently developed graph schema languages: SHACL, ShEx, and PG-Schema. Each language has its unique approach to defining constraints and validating graph data, leaving potential users in the dark about their commonalities and differences. In this paper, we provide formal, concise definitions of the core components of each of these schema languages. We employ a uniform framework to facilitate a comprehensive comparison between the languages and identify a common set of functionalities, shedding light on both overlapping and distinctive features of the three languages.
Submitted 3 February, 2025;
originally announced February 2025.
-
Soft and Constrained Hypertree Width
Authors:
Matthias Lanzinger,
Cem Okulmus,
Reinhard Pichler,
Alexander Selzer,
Georg Gottlob
Abstract:
Hypertree decompositions provide a way to evaluate Conjunctive Queries (CQs) in polynomial time, where the exponent of this polynomial is determined by the width of the decomposition. In theory, the goal of efficient CQ evaluation therefore has to be a minimisation of the width. However, in practical settings, it turns out that there are also other properties of a decomposition that influence the performance of query evaluation. It is therefore of interest to restrict the computation of decompositions by constraints and to guide this computation by preferences. To this end, we propose a novel framework based on candidate tree decompositions, which allows us to introduce soft hypertree width (shw). This width measure is a relaxation of hypertree width (hw); it is never greater than hw and, in some cases, shw may actually be lower than hw. Most importantly, shw preserves the tractability of deciding if a given CQ is below some fixed bound, while offering more algorithmic flexibility. In particular, it provides a natural way to incorporate preferences and constraints into the computation of decompositions. A prototype implementation and preliminary experiments confirm that this novel framework can indeed have a practical impact on query evaluation.
Submitted 20 April, 2025; v1 submitted 16 December, 2024;
originally announced December 2024.
-
Towards Practicable Algorithms for Rewriting Graph Queries beyond DL-Lite
Authors:
Bianca Löhnert,
Nikolaus Augsten,
Cem Okulmus,
Magdalena Ortiz
Abstract:
Despite the many advantages that ontology-based data access (OBDA) has brought to a range of application domains, state-of-the-art OBDA systems still do not support popular graph database management systems such as Neo4j. Algorithms for query rewriting focus on languages like conjunctive queries and their unions, which are fragments of first-order logic and were developed for relational data. Such query languages are poorly suited for querying graph data. Moreover, they also limit the expressiveness of the ontology languages that admit rewritings, restricting them to those where the data complexity of reasoning is not higher than it is in first-order logic. In this paper, we propose a technique for rewriting a family of navigational queries for a suitably restricted fragment of ELHI that extends DL-Lite and that is NL-complete in data complexity. We implemented a proof-of-concept prototype that rewrites into Cypher queries, and tested it on a real-world cognitive neuroscience use case with promising results.
Submitted 23 April, 2025; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Expressive Power and Complexity Results for SIGNAL, an Industry-scale Process Query Language
Authors:
Timotheus Kampik,
Cem Okulmus
Abstract:
With the increased adoption of process mining, there is also a need for practical solutions that work at industry scales. In this context, process querying methods (PQMs) have emerged as an important tool for drawing inferences from event logs. Here, it can be expected that industry approaches differ from academic ones, due to practical engineering and business considerations. To understand what is at the core of industry-scale PQMs, a formal analysis of the underlying languages can provide a solid foundation. To this end, we formally analyse SIGNAL, an industry-scale language for querying business process event logs developed by a large enterprise software vendor. The formal analysis shows that the core capabilities of SIGNAL, which we refer to as the SIGNAL Conjunctive Core, are more expressive than relational algebra and thus not captured by standard relational databases. We provide an upper-bound on the expressiveness via a reduction to semi-positive Datalog, which also leads to an upper bound of P-hard for the data complexity of evaluating SIGNAL Conjunctive Core queries. The findings provide first insights into how (real-world) process query languages are fundamentally different from the more generally prevalent structured query languages for querying relational databases and provide a rigorous foundation for extending the existing capabilities of the industry-scale state-of-the-art of process data querying.
Submitted 24 July, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Structure-Guided Query Evaluation: Towards Bridging the Gap from Theory to Practice
Authors:
Georg Gottlob,
Matthias Lanzinger,
Davide Mario Longo,
Cem Okulmus,
Reinhard Pichler,
Alexander Selzer
Abstract:
Join queries involving many relations pose a severe challenge to today's query optimisation techniques. To some extent, this is due to the fact that these techniques do not pay sufficient attention to structural properties of the query. In stark contrast, the Database Theory community has intensively studied structural properties of queries (such as acyclicity and various notions of width) and proposed efficient query evaluation techniques through variants of Yannakakis' algorithm. However, although most queries in practice actually are acyclic or have low width, structure-guided query evaluation techniques based on Yannakakis' algorithm have not found their way into mainstream database technology yet. The goal of this work is to address this gap between theory and practice and to demonstrate that the consideration of query structure can improve query evaluation performance on modern DBMSs significantly in cases that have been traditionally challenging. In particular, we study the performance of structure-guided query evaluation in three architecturally distinct DBMSs by rewriting SQL queries into a sequence of SQL statements that express an execution of Yannakakis' algorithm. Moreover, we identify a class of queries that is particularly well suited for our approach and allows query answering in a variety of common scenarios without materializing any join. Through empirical evaluation we show that structure-guided query evaluation can make the evaluation of many difficult join queries feasible whereas their evaluation requires a prohibitive amount of time and memory on current DBMSs.
Submitted 22 May, 2023; v1 submitted 5 March, 2023;
originally announced March 2023.
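The abstract above describes expressing an execution of Yannakakis' algorithm as a sequence of statements. As a rough illustration of the underlying idea only, the following in-memory Python sketch runs the two semi-join passes (bottom-up, then top-down) over a join tree of an acyclic query. It is not the authors' SQL rewriting; the function names `semijoin` and `full_reduce`, the dictionary-based relation encoding, and the toy relations `R`, `S`, `T` are all assumptions made for this example.

```python
from typing import Dict, List

Row = Dict[str, int]
Relation = List[Row]


def semijoin(left: Relation, right: Relation) -> Relation:
    """Keep the rows of `left` that agree with some row of `right`
    on all attributes the two relations share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    keys = {tuple(row[a] for a in shared) for row in right}
    return [row for row in left if tuple(row[a] for a in shared) in keys]


def full_reduce(
    relations: Dict[str, Relation],
    parent: Dict[str, str],
    bottom_up: List[str],
) -> Dict[str, Relation]:
    """Semi-join reduction along a join tree (hypothetical encoding).

    `parent` maps each non-root node to its parent in the join tree;
    `bottom_up` lists all nodes with children before their parents.
    """
    rels = dict(relations)
    # Bottom-up pass: each parent keeps only rows matched by the child.
    for node in bottom_up:
        p = parent.get(node)
        if p is not None:
            rels[p] = semijoin(rels[p], rels[node])
    # Top-down pass: each child keeps only rows matched by its parent.
    for node in reversed(bottom_up):
        p = parent.get(node)
        if p is not None:
            rels[node] = semijoin(rels[node], rels[p])
    return rels


# Toy acyclic query R(a,b), S(b,c), T(c,d) with S as the root (assumed data).
R = [{"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 9}]
S = [{"b": 1, "c": 1}, {"b": 2, "c": 5}, {"b": 7, "c": 1}]
T = [{"c": 1, "d": 0}, {"c": 4, "d": 0}]

reduced = full_reduce(
    {"R": R, "S": S, "T": T},
    parent={"R": "S", "T": "S"},
    bottom_up=["R", "T", "S"],
)
```

After both passes, every remaining row in `reduced` takes part in at least one answer of the full join, so a subsequent join phase does not produce intermediate rows that are later discarded.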
-
Incremental Updates of Generalized Hypertree Decompositions
Authors:
Georg Gottlob,
Matthias Lanzinger,
Davide Mario Longo,
Cem Okulmus
Abstract:
Structural decomposition methods, such as generalized hypertree decompositions, have been successfully used for solving constraint satisfaction problems (CSPs). As decompositions can be reused to solve CSPs with the same constraint scopes, investing resources in computing good decompositions is beneficial, even though the computation itself is hard. Unfortunately, current methods need to compute a completely new decomposition even if the scopes change only slightly. In this paper, we make the first steps toward solving the problem of updating the decomposition of a CSP $P$ so that it becomes a valid decomposition of a new CSP $P'$ produced by some modification of $P$. Even though the problem is hard in theory, we propose and implement a framework for effectively updating GHDs. The experimental evaluation of our algorithm strongly suggests practical applicability.
Submitted 21 September, 2022;
originally announced September 2022.
-
Fast Parallel Hypertree Decompositions in Logarithmic Recursion Depth
Authors:
Georg Gottlob,
Matthias Lanzinger,
Cem Okulmus,
Reinhard Pichler
Abstract:
Modern trends in data collection are bringing current mainstream techniques for database query processing to their limits. Consequently, various novel approaches for efficient query processing are being actively studied. One such approach is based on hypertree decompositions (HDs), which have been shown to carry great potential to process complex queries more efficiently and with stronger theoretical guarantees. However, using HDs for query execution relies on the difficult task of computing decompositions of the query structure, which guides the efficient execution of the query. From theoretical results we know that the performance of purely sequential methods is inherently limited, yet the problem is susceptible to parallelisation.
In this paper we propose the first algorithm for computing hypertree decompositions that is well-suited for parallelisation. The proposed algorithm log-k-decomp requires only a logarithmic number of recursion levels and additionally allows for highly parallelised pruning of the search space by restriction to balanced separators. We provide detailed experimental evaluation over the HyperBench benchmark and demonstrate that our approach is highly effective especially for complex queries.
Submitted 12 April, 2022; v1 submitted 28 April, 2021;
originally announced April 2021.
-
The HyperTrac Project: Recent Progress and Future Research Directions on Hypergraph Decompositions
Authors:
Georg Gottlob,
Matthias Lanzinger,
Davide Mario Longo,
Cem Okulmus,
Reinhard Pichler
Abstract:
Constraint Satisfaction Problems (CSPs) play a central role in many applications in Artificial Intelligence and Operations Research. In general, solving CSPs is NP-complete. The structure of CSPs is best described by hypergraphs. Therefore, various forms of hypergraph decompositions have been proposed in the literature to identify tractable fragments of CSPs. However, also the computation of a concrete hypergraph decomposition is a challenging task in itself. In this paper, we report on recent progress in the study of hypergraph decompositions and we outline several directions for future research.
Submitted 29 December, 2020;
originally announced December 2020.