-
Selective Use of Yannakakis' Algorithm to Improve Query Performance: Machine Learning to the Rescue
Authors:
Daniela Böhm,
Georg Gottlob,
Matthias Lanzinger,
Davide Longo,
Cem Okulmus,
Reinhard Pichler,
Alexander Selzer
Abstract:
Query optimization has played a central role in database research for decades. However, more often than not, the proposed optimization techniques lead to a performance improvement in some, but not in all, situations. Therefore, we urgently need a methodology for designing a decision procedure that decides for a given query whether the optimization technique should be applied or not.
In this work, we propose such a methodology with a focus on Yannakakis-style query evaluation as our optimization technique of interest. More specifically, we formulate this decision problem as an algorithm selection problem and we present a Machine Learning based approach for its solution. Empirical results with several benchmarks on a variety of database systems show that our approach indeed leads to a statistically significant performance improvement.
Submitted 27 February, 2025;
originally announced February 2025.
-
Soft and Constrained Hypertree Width
Authors:
Matthias Lanzinger,
Cem Okulmus,
Reinhard Pichler,
Alexander Selzer,
Georg Gottlob
Abstract:
Hypertree decompositions provide a way to evaluate Conjunctive Queries (CQs) in polynomial time, where the exponent of this polynomial is determined by the width of the decomposition. In theory, the goal of efficient CQ evaluation therefore has to be a minimisation of the width. However, in practical settings, it turns out that there are also other properties of a decomposition that influence the performance of query evaluation. It is therefore of interest to restrict the computation of decompositions by constraints and to guide this computation by preferences. To this end, we propose a novel framework based on candidate tree decompositions, which allows us to introduce soft hypertree width (shw). This width measure is a relaxation of hypertree width (hw); it is never greater than hw and, in some cases, shw may actually be lower than hw. Most importantly, shw preserves the tractability of deciding if a given CQ is below some fixed bound, while offering more algorithmic flexibility. In particular, it provides a natural way to incorporate preferences and constraints into the computation of decompositions. A prototype implementation and preliminary experiments confirm that this novel framework can indeed have a practical impact on query evaluation.
Submitted 20 April, 2025; v1 submitted 16 December, 2024;
originally announced December 2024.
-
Fuzzy Datalog$^\exists$ over Arbitrary t-Norms
Authors:
Matthias Lanzinger,
Stefano Sferrazza,
Przemysław A. Wałęga,
Georg Gottlob
Abstract:
One of the main challenges in the area of Neuro-Symbolic AI is to perform logical reasoning in the presence of both neural and symbolic data. This requires combining heterogeneous data sources such as knowledge graphs, neural model predictions, structured databases, crowd-sourced data, and many more. To allow for such reasoning, we generalise the standard rule-based language Datalog with existential rules (commonly referred to as tuple-generating dependencies) to the fuzzy setting, by allowing for arbitrary t-norms in the place of classical conjunctions in rule bodies. The resulting formalism allows us to perform reasoning about data associated with degrees of uncertainty while preserving computational complexity results and the applicability of reasoning techniques established for the standard Datalog setting. In particular, we provide fuzzy extensions of Datalog chases which produce fuzzy universal models and we exploit them to show that in important fragments of the language, reasoning has the same complexity as in the classical setting.
Submitted 5 March, 2024;
originally announced March 2024.
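As a rough illustration of what replacing classical conjunction by a t-norm means, the sketch below combines the truth degrees of the atoms in a rule body under three standard t-norms (Gödel, product, Łukasiewicz). The function names and the helper `body_degree` are hypothetical and not taken from the paper; how a head atom's degree is derived from the body degree is deliberately left out, since the abstract does not specify it.

```python
from typing import Callable, Iterable

# A t-norm maps two truth degrees in [0, 1] to a truth degree in [0, 1].
TNorm = Callable[[float, float], float]


def godel(a: float, b: float) -> float:
    """Goedel (minimum) t-norm."""
    return min(a, b)


def product(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def lukasiewicz(a: float, b: float) -> float:
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)


def body_degree(tnorm: TNorm, degrees: Iterable[float]) -> float:
    """Combine the degrees of all body atoms; 1.0 is the neutral element of every t-norm."""
    result = 1.0
    for d in degrees:
        result = tnorm(result, d)
    return result


if __name__ == "__main__":
    # Hypothetical degrees of two body atoms.
    for name, t in [("godel", godel), ("product", product), ("lukasiewicz", lukasiewicz)]:
        print(name, body_degree(t, [0.9, 0.8]))
```

On the classical degrees 0 and 1, all three t-norms agree with ordinary conjunction, which is why such a generalisation can be read as an extension of the standard setting.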
-
Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
Authors:
Lingzhi Wang,
Xingshan Zeng,
Jinsong Guo,
Kam-Fai Wong,
Georg Gottlob
Abstract:
This paper explores Machine Unlearning (MU), an emerging field that is gaining increased attention due to concerns about neural models unintentionally remembering personal or sensitive information. We present SeUL, a novel method that enables selective and fine-grained unlearning for language models. Unlike previous work that employs a fully reversed training objective in unlearning, SeUL minimizes the negative impact on the capability of language models, particularly in terms of generation. Furthermore, we introduce two innovative evaluation metrics, sensitive extraction likelihood (S-EL) and sensitive memorization accuracy (S-MA), specifically designed to assess the effectiveness of forgetting sensitive information. In support of the unlearning framework, we propose efficient automatic online and offline sensitive span annotation methods. The online selection method, based on language probability scores, ensures computational efficiency, while the offline annotation involves a two-stage LLM-based process for robust verification. In summary, this paper contributes a novel selective unlearning method (SeUL), introduces specialized evaluation metrics (S-EL and S-MA) for assessing sensitive information forgetting, and proposes automatic online and offline sensitive span annotation methods to support the overall unlearning framework and evaluation process.
Submitted 16 December, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Dyadic Existential Rules
Authors:
Georg Gottlob,
Marco Manna,
Cinzia Marte
Abstract:
Existential rules form an expressive Datalog-based language to specify ontological knowledge. The presence of existential quantification in rule-heads, however, makes the main reasoning tasks undecidable. To overcome this limitation, in the last two decades, a number of classes of existential rules guaranteeing the decidability of query answering have been proposed. Unfortunately, only some of these classes fully encompass Datalog and, often, this comes at the price of higher computational complexity. Moreover, expressive classes are typically unable to exploit tools developed for classes exhibiting lower expressiveness. To mitigate these shortcomings, this paper introduces a novel general syntactic condition that allows us to define, systematically and in a uniform way, from any decidable class $\mathcal{C}$ of existential rules, a new class called Dyadic-$\mathcal{C}$ enjoying the following properties: $(i)$ it is decidable; $(ii)$ it generalises Datalog; $(iii)$ it generalises $\mathcal{C}$; $(iv)$ it can effectively exploit any reasoner for query answering over $\mathcal{C}$; and $(v)$ its computational complexity does not exceed the higher of the complexities of $\mathcal{C}$ and Datalog. Under consideration in Theory and Practice of Logic Programming (TPLP).
Submitted 22 July, 2023;
originally announced July 2023.
-
SparqLog: A System for Efficient Evaluation of SPARQL 1.1 Queries via Datalog [Experiment, Analysis and Benchmark]
Authors:
Renzo Angles,
Georg Gottlob,
Aleksandar Pavlovic,
Reinhard Pichler,
Emanuel Sallinger
Abstract:
Over the past decade, Knowledge Graphs have received enormous interest both from industry and from academia. Research in this area has been driven, above all, by the Database (DB) community and the Semantic Web (SW) community. However, there still remains a certain divide between approaches coming from these two communities. For instance, while languages such as SQL or Datalog are widely used in the DB area, a different set of languages such as SPARQL and OWL is used in the SW area. Interoperability between such technologies is still a challenge. The goal of this work is to present a uniform and consistent framework meeting important requirements from both the SW and DB fields.
Submitted 11 July, 2023;
originally announced July 2023.
-
Structure-Guided Query Evaluation: Towards Bridging the Gap from Theory to Practice
Authors:
Georg Gottlob,
Matthias Lanzinger,
Davide Mario Longo,
Cem Okulmus,
Reinhard Pichler,
Alexander Selzer
Abstract:
Join queries involving many relations pose a severe challenge to today's query optimisation techniques. To some extent, this is due to the fact that these techniques do not pay sufficient attention to structural properties of the query. In stark contrast, the Database Theory community has intensively studied structural properties of queries (such as acyclicity and various notions of width) and proposed efficient query evaluation techniques through variants of Yannakakis' algorithm. However, although most queries in practice actually are acyclic or have low width, structure-guided query evaluation techniques based on Yannakakis' algorithm have not found their way into mainstream database technology yet. The goal of this work is to address this gap between theory and practice and to demonstrate that the consideration of query structure can improve query evaluation performance on modern DBMSs significantly in cases that have been traditionally challenging. In particular, we study the performance of structure-guided query evaluation in three architecturally distinct DBMSs by rewriting SQL queries into a sequence of SQL statements that express an execution of Yannakakis' algorithm. Moreover, we identify a class of queries that is particularly well suited for our approach and allows query answering in a variety of common scenarios without materializing any join. Through empirical evaluation we show that structure-guided query evaluation can make the evaluation of many difficult join queries feasible whereas their evaluation requires a prohibitive amount of time and memory on current DBMSs.
Submitted 22 May, 2023; v1 submitted 5 March, 2023;
originally announced March 2023.
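For readers unfamiliar with Yannakakis' algorithm, the following is a minimal in-memory sketch of its semi-join reduction phase for an acyclic join query. It is not the paper's SQL rewriting; the representation of relations as lists of dictionaries, the function names `semijoin` and `full_reduce`, and the example data are all assumptions made for illustration. The join tree is assumed to be given as (parent, child) edges listed root first.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]
Relation = List[Row]


def semijoin(left: Relation, right: Relation) -> Relation:
    """Keep the rows of `left` that agree with some row of `right` on all shared attributes."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    keys = {tuple(row[a] for a in shared) for row in right}
    return [row for row in left if tuple(row[a] for a in shared) in keys]


def full_reduce(relations: Dict[str, Relation],
                tree_edges: List[Tuple[str, str]]) -> Dict[str, Relation]:
    """Two semi-join passes over a join tree.

    `tree_edges` must list (parent, child) pairs top-down, i.e. the edge
    leading into a node appears before the edges leaving it.
    """
    reduced = dict(relations)
    # Bottom-up pass: each parent is filtered by its already-filtered children.
    for parent, child in reversed(tree_edges):
        reduced[parent] = semijoin(reduced[parent], reduced[child])
    # Top-down pass: each child is filtered by its fully reduced parent.
    for parent, child in tree_edges:
        reduced[child] = semijoin(reduced[child], reduced[parent])
    return reduced


# Hypothetical path query R(a, b), S(b, c), T(c, d) with join tree R - S - T.
example = {
    "R": [{"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 3}],
    "S": [{"b": 1, "c": 1}, {"b": 2, "c": 5}, {"b": 4, "c": 1}],
    "T": [{"c": 1, "d": 7}],
}
reduced_example = full_reduce(example, [("R", "S"), ("S", "T")])
```

After both passes, every remaining row participates in at least one answer of the join, so the final joins never build intermediate results that are later discarded. This is the property that structure-guided evaluation relies on.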
-
Incremental Updates of Generalized Hypertree Decompositions
Authors:
Georg Gottlob,
Matthias Lanzinger,
Davide Mario Longo,
Cem Okulmus
Abstract:
Structural decomposition methods, such as generalized hypertree decompositions, have been successfully used for solving constraint satisfaction problems (CSPs). As decompositions can be reused to solve CSPs with the same constraint scopes, investing resources in computing good decompositions is beneficial, even though the computation itself is hard. Unfortunately, current methods need to compute a completely new decomposition even if the scopes change only slightly. In this paper, we make the first steps toward solving the problem of updating the decomposition of a CSP $P$ so that it becomes a valid decomposition of a new CSP $P'$ produced by some modification of $P$. Even though the problem is hard in theory, we propose and implement a framework for effectively updating GHDs. The experimental evaluation of our algorithm strongly suggests practical applicability.
Submitted 21 September, 2022;
originally announced September 2022.
-
Non-Uniformly Terminating Chase: Size and Complexity
Authors:
Marco Calautti,
Georg Gottlob,
Andreas Pieris
Abstract:
The chase procedure, originally introduced for checking implication of database constraints, and later on used for computing data exchange solutions, has recently become a central algorithmic tool in rule-based ontological reasoning. In this context, a key problem is non-uniform chase termination: does the chase of a database w.r.t. a rule-based ontology terminate? And if this is the case, what is the size of the result of the chase? We focus on guarded tuple-generating dependencies (TGDs), which form a robust rule-based ontology language, and study the above central questions for the semi-oblivious version of the chase. One of our main findings is that non-uniform semi-oblivious chase termination for guarded TGDs is feasible in polynomial time w.r.t. the database, and the size of the result of the chase (whenever it is finite) is linear w.r.t. the database. Towards our results concerning non-uniform chase termination, we show that basic techniques such as simplification and linearization, originally introduced in the context of ontological query answering, can be safely applied to the chase termination problem.
Submitted 26 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
MV-Datalog+-: Effective Rule-based Reasoning with Uncertain Observations
Authors:
Matthias Lanzinger,
Stefano Sferrazza,
Georg Gottlob
Abstract:
Modern applications combine information from a great variety of sources. Oftentimes, some of these sources, like Machine-Learning systems, are not strictly binary but associated with some degree of (lack of) confidence in the observation. We propose MV-Datalog and MV-Datalog+- as extensions of Datalog and Datalog+-, respectively, to the fuzzy semantics of infinite-valued Lukasiewicz logic L as languages for effectively reasoning in scenarios where such uncertain observations occur. We show that the semantics of MV-Datalog exhibits model-theoretic properties similar to those of Datalog. In particular, we show that (fuzzy) entailment can be decided via minimal fuzzy models. We show that, when they exist, such minimal fuzzy models are unique and can be characterised in terms of a linear optimisation problem over the output of a fixed-point procedure. On the basis of this characterisation, we propose similar many-valued semantics for rules with existential quantification in the head, extending Datalog+-. This paper is under consideration for acceptance in TPLP.
Submitted 13 May, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
On the Complexity of Inductively Learning Guarded Rules
Authors:
Andrei Draghici,
Georg Gottlob,
Matthias Lanzinger
Abstract:
We investigate the computational complexity of mining guarded clauses from clausal datasets through the framework of inductive logic programming (ILP). We show that learning guarded clauses is NP-complete and thus one step below the $\Sigma^P_2$-complete task of learning Horn clauses on the polynomial hierarchy. Motivated by practical applications on large datasets, we identify a natural tractable fragment of the problem. Finally, we also generalise all of our results to $k$-guarded clauses for constant $k$.
Submitted 7 October, 2021;
originally announced October 2021.
-
Fast Parallel Hypertree Decompositions in Logarithmic Recursion Depth
Authors:
Georg Gottlob,
Matthias Lanzinger,
Cem Okulmus,
Reinhard Pichler
Abstract:
Modern trends in data collection are bringing current mainstream techniques for database query processing to their limits. Consequently, various novel approaches for efficient query processing are being actively studied. One such approach is based on hypertree decompositions (HDs), which have been shown to carry great potential to process complex queries more efficiently and with stronger theoretical guarantees. However, using HDs for query execution relies on the difficult task of computing decompositions of the query structure, which guides the efficient execution of the query. From theoretical results we know that the performance of purely sequential methods is inherently limited, yet the problem is susceptible to parallelisation.
In this paper we propose the first algorithm for computing hypertree decompositions that is well-suited for parallelisation. The proposed algorithm log-k-decomp requires only a logarithmic number of recursion levels and additionally allows for highly parallelised pruning of the search space by restriction to balanced separators. We provide detailed experimental evaluation over the HyperBench benchmark and demonstrate that our approach is highly effective especially for complex queries.
Submitted 12 April, 2022; v1 submitted 28 April, 2021;
originally announced April 2021.
-
The HyperTrac Project: Recent Progress and Future Research Directions on Hypergraph Decompositions
Authors:
Georg Gottlob,
Matthias Lanzinger,
Davide Mario Longo,
Cem Okulmus,
Reinhard Pichler
Abstract:
Constraint Satisfaction Problems (CSPs) play a central role in many applications in Artificial Intelligence and Operations Research. In general, solving CSPs is NP-complete. The structure of CSPs is best described by hypergraphs. Therefore, various forms of hypergraph decompositions have been proposed in the literature to identify tractable fragments of CSPs. However, computing a concrete hypergraph decomposition is itself a challenging task. In this paper, we report on recent progress in the study of hypergraph decompositions and we outline several directions for future research.
Submitted 29 December, 2020;
originally announced December 2020.
-
HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings
Authors:
Wolfgang Fischl,
Georg Gottlob,
Davide Mario Longo,
Reinhard Pichler
Abstract:
To cope with the intractability of answering Conjunctive Queries (CQs) and solving Constraint Satisfaction Problems (CSPs), several notions of hypergraph decompositions have been proposed -- giving rise to different notions of width, notably plain, generalized, and fractional hypertree width (hw, ghw, and fhw). Given the increasing interest in using such decomposition methods in practice, a publicly accessible repository of decomposition software, as well as a large set of benchmarks, and a web-accessible workbench for inserting, analyzing, and retrieving hypergraphs are called for.
We address this need by providing (i) concrete implementations of hypergraph decompositions (including new practical algorithms), (ii) a new, comprehensive benchmark of hypergraphs stemming from disparate CQ and CSP collections, and (iii) HyperBench, our new web-interface for accessing the benchmark and the results of our analyses. In addition, we describe a number of actual experiments we carried out with this new infrastructure.
Submitted 2 September, 2020;
originally announced September 2020.
-
Semantic Width and the Fixed-Parameter Tractability of Constraint Satisfaction Problems
Authors:
Hubie Chen,
Georg Gottlob,
Matthias Lanzinger,
Reinhard Pichler
Abstract:
Constraint satisfaction problems (CSPs) are an important formal framework for the uniform treatment of various prominent AI tasks, e.g., coloring or scheduling problems. Solving CSPs is, in general, known to be NP-complete and fixed-parameter intractable when parameterized by their constraint scopes. We give a characterization of those classes of CSPs for which the problem becomes fixed-parameter tractable.
Our characterization significantly increases the utility of the CSP framework by making it possible to decide the fixed-parameter tractability of problems via their CSP formulations.
We further extend our characterization to the evaluation of unions of conjunctive queries, a fundamental problem in databases. Furthermore, we provide some new insight on the frontier of PTIME solvability of CSPs.
In particular, we observe that bounded fractional hypertree width is more general than bounded hypertree width only for classes that exhibit a certain type of exponential growth.
The presented work resolves a long-standing open problem and yields powerful new tools for complexity research in AI and database theory.
Submitted 28 July, 2020;
originally announced July 2020.
-
Fractional Covers of Hypergraphs with Bounded Multi-Intersection
Authors:
Georg Gottlob,
Matthias Lanzinger,
Reinhard Pichler,
Igor Razgon
Abstract:
Fractional (hyper-)graph theory is concerned with the specific problems that arise when fractional analogues of otherwise integer-valued (hyper-)graph invariants are considered. The focus of this paper is on fractional edge covers of hypergraphs. Our main technical result generalizes and unifies previous conditions under which the size of the support of fractional edge covers is bounded independently of the size of the hypergraph itself. This allows us to extend previous tractability results for checking if the fractional hypertree width of a given hypergraph is $\leq k$ for some constant $k$. We also show how our results translate to fractional vertex covers.
Submitted 22 September, 2023; v1 submitted 3 July, 2020;
originally announced July 2020.
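To make the notions of fractional edge cover and support concrete, the sketch below checks whether a weight assignment on hyperedges covers every vertex with total weight at least 1, and reports the cover's weight and support size. It only verifies a given cover; it does not compute an optimal one and does not reflect the paper's technical results. All names and the triangle example are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Set

Hypergraph = Dict[str, Set[str]]      # edge name -> set of vertices
Weights = Dict[str, Fraction]         # edge name -> non-negative weight


def is_fractional_edge_cover(edges: Hypergraph, weights: Weights) -> bool:
    """True if all weights are non-negative and every vertex receives total weight >= 1."""
    if any(w < 0 for w in weights.values()):
        return False
    vertices = set().union(*edges.values())
    return all(
        sum(weights.get(name, 0) for name, vs in edges.items() if v in vs) >= 1
        for v in vertices
    )


def cover_weight(weights: Weights) -> Fraction:
    """Total weight of the cover."""
    return sum(weights.values(), Fraction(0))


def support_size(weights: Weights) -> int:
    """Number of edges with strictly positive weight."""
    return sum(1 for w in weights.values() if w > 0)


# Hypothetical example: the triangle hypergraph.
triangle = {"e1": {"x", "y"}, "e2": {"y", "z"}, "e3": {"x", "z"}}
half = Fraction(1, 2)
fractional = {"e1": half, "e2": half, "e3": half}
integral = {"e1": Fraction(1), "e2": Fraction(1)}
```

On the triangle, the fractional cover has weight 3/2 while any integral cover needs weight 2, which is the kind of gap that makes fractional width measures attractive.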
-
Complexity Analysis of Generalized and Fractional Hypertree Decompositions
Authors:
Georg Gottlob,
Matthias Lanzinger,
Reinhard Pichler,
Igor Razgon
Abstract:
Hypertree decompositions (HDs), as well as the more powerful generalized hypertree decompositions (GHDs), and the yet more general fractional hypertree decompositions (FHDs) are hypergraph decomposition methods successfully used for answering conjunctive queries and for solving constraint satisfaction problems. Every hypergraph $H$ has a width relative to each of these methods: its hypertree width $hw(H)$, its generalized hypertree width $ghw(H)$, and its fractional hypertree width $fhw(H)$, respectively. It is known that $hw(H)\leq k$ can be checked in polynomial time for fixed $k$, while checking $ghw(H)\leq k$ is NP-complete for $k \geq 3$. The complexity of checking $fhw(H)\leq k$ for a fixed $k$ has been open for over a decade.
We settle this open problem by showing that checking $fhw(H)\leq k$ is NP-complete, even for $k=2$. The same construction allows us to prove also the NP-completeness of checking $ghw(H)\leq k$ for $k=2$. After that, we identify meaningful restrictions which make checking for bounded $ghw$ or $fhw$ tractable or allow for an efficient approximation of the $fhw$.
Submitted 5 March, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Semantic Width of Conjunctive Queries and Constraint Satisfaction Problems
Authors:
Georg Gottlob,
Matthias Lanzinger,
Reinhard Pichler
Abstract:
Answering Conjunctive Queries (CQs) and solving Constraint Satisfaction Problems (CSPs) are arguably among the most fundamental tasks in Computer Science. They are classical NP-complete problems. Consequently, the search for tractable fragments of these problems has received a lot of research interest over the decades. This research has traditionally progressed along three orthogonal threads: (a) reformulating queries into simpler, equivalent queries (semantic optimization); (b) bounding answer sizes based on structural properties of the query; and (c) decomposing the query in such a way that global consistency follows from local consistency. Much progress has been made by various works that connect two of these threads. Bounded answer sizes and decompositions have been shown to be tightly connected through the important notions of fractional hypertree width and, more recently, submodular width. Recent papers by Barceló et al. study decompositions up to generalized hypertree width under semantic optimization. In this work, we connect all three of these threads by introducing a general notion of semantic width and investigating semantic versions of fractional hypertree width, adaptive width, submodular width and the fractional cover number.
Submitted 13 December, 2018; v1 submitted 11 December, 2018;
originally announced December 2018.
-
HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings
Authors:
Wolfgang Fischl,
Georg Gottlob,
Davide M. Longo,
Reinhard Pichler
Abstract:
To cope with the intractability of answering Conjunctive Queries (CQs) and solving Constraint Satisfaction Problems (CSPs), several notions of hypergraph decompositions have been proposed -- giving rise to different notions of width, notably plain, generalized, and fractional hypertree width (hw, ghw, and fhw). Given the increasing interest in using such decomposition methods in practice, a publicly accessible repository of decomposition software, as well as a large set of benchmarks, and a web-accessible workbench for inserting, analysing, and retrieving hypergraphs are called for.
We address this need by providing (i) concrete implementations of hypergraph decompositions (including new practical algorithms), (ii) a new, comprehensive benchmark of hypergraphs stemming from disparate CQ and CSP collections, and (iii) HyperBench, our new web-interface for accessing the benchmark and the results of our analyses. In addition, we describe a number of actual experiments we carried out with this new infrastructure.
Submitted 20 November, 2018;
originally announced November 2018.
-
The Space-Efficient Core of Vadalog
Authors:
Gerald Berger,
Georg Gottlob,
Andreas Pieris,
Emanuel Sallinger
Abstract:
Vadalog is a system for performing complex reasoning tasks such as those required in advanced knowledge graphs. The logical core of the underlying Vadalog language is the warded fragment of tuple-generating dependencies (TGDs). This formalism ensures tractable reasoning in data complexity, while a recent analysis focusing on a practical implementation led to the reasoning algorithm around which the Vadalog system is built. A fundamental question that has emerged in the context of Vadalog is the following: can we limit the recursion allowed by wardedness in order to obtain a formalism that provides a convenient syntax for expressing useful recursive statements, and at the same time achieves space-efficiency? After analyzing several real-life examples of warded sets of TGDs provided by our industrial partners, as well as recent benchmarks, we observed that recursion is often used in a restricted way: the body of a TGD contains at most one atom whose predicate is mutually recursive with a predicate in the head. We show that this type of recursion, known as piece-wise linear in the Datalog literature, is the answer to our main question. We further show that piece-wise linear recursion alone, without the wardedness condition, is not enough as it leads to the undecidability of reasoning. We finally study the relative expressiveness of the query languages based on (piece-wise linear) warded sets of TGDs.
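The restriction described in this abstract (at most one body atom whose predicate is mutually recursive with a head predicate) can be illustrated with a minimal Python sketch. The rule encoding (lists of predicate names, with arguments and existential variables ignored), the function names, and the use of the usual predicate dependency graph to define mutual recursion are assumptions made for illustration; the sketch does not check wardedness and is not the Vadalog implementation.

```python
def reachable(graph, start):
    """Return all predicates reachable from `start` via one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_piecewise_linear(rules):
    """Check the piece-wise linear condition on a set of rules.

    `rules` is a list of (body_predicates, head_predicates) pairs, where
    each body predicate is listed once per body atom (hypothetical encoding).
    """
    # Predicate dependency graph: edge from each body predicate to each head predicate.
    graph = {}
    for body, head in rules:
        for b in body:
            graph.setdefault(b, set()).update(head)
    reach = {p: reachable(graph, p) for p in graph}

    def mutually_recursive(p, q):
        # p and q lie on a common cycle of the dependency graph.
        return q in reach.get(p, ()) and p in reach.get(q, ())

    for body, head in rules:
        recursive_atoms = sum(
            1 for b in body if any(mutually_recursive(b, h) for h in head)
        )
        if recursive_atoms > 1:
            return False
    return True


# Linear transitive closure: T(x,z) :- E(x,y), T(y,z).
linear_tc = [(["E"], ["T"]), (["E", "T"], ["T"])]
# Non-linear transitive closure: T(x,z) :- T(x,y), T(y,z).
nonlinear_tc = [(["E"], ["T"]), (["T", "T"], ["T"])]

print(is_piecewise_linear(linear_tc))
print(is_piecewise_linear(nonlinear_tc))
```

In this encoding the linear formulation of transitive closure satisfies the condition, while the formulation with two recursive body atoms does not.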
Submitted 16 September, 2018;
originally announced September 2018.
-
Data Science with Vadalog: Bridging Machine Learning and Reasoning
Authors:
Luigi Bellomarini,
Ruslan R. Fayzrakhmanov,
Georg Gottlob,
Andrey Kravchenko,
Eleonora Laurenza,
Yavor Nenov,
Stephane Reissfelder,
Emanuel Sallinger,
Evgeny Sherkhonov,
Lianlong Wu
Abstract:
Following the recent successful examples of large technology companies, many modern enterprises seek to build knowledge graphs to provide a unified view of corporate knowledge and to draw deep insights using machine learning and logical reasoning. There is currently a perceived disconnect between the traditional approaches for data science, typically based on machine learning and statistical modelling, and systems for reasoning with domain knowledge. In this paper we present a state-of-the-art Knowledge Graph Management System, Vadalog, which delivers highly expressive and efficient logical reasoning and provides seamless integration with modern data science toolkits, such as the Jupyter platform. We demonstrate how to use Vadalog to perform traditional data wrangling tasks, as well as complex logical and probabilistic reasoning. We argue that this is a significant step forward towards combining machine learning and reasoning in data science.
Submitted 23 July, 2018;
originally announced July 2018.
-
The Vadalog System: Datalog-based Reasoning for Knowledge Graphs
Authors:
Luigi Bellomarini,
Georg Gottlob,
Emanuel Sallinger
Abstract:
Over the past years, there has been a resurgence of Datalog-based systems in the database community as well as in industry. In this context, it has been recognized that to handle the complex knowledge-based scenarios encountered today, such as reasoning over large knowledge graphs, Datalog has to be extended with features such as existential quantification. Yet, Datalog-based reasoning in the presence of existential quantification is in general undecidable. Many efforts have been made to define decidable fragments. Warded Datalog+/- is a very promising one, as it captures PTIME complexity while allowing ontological reasoning. Yet so far, no implementation of Warded Datalog+/- was available. In this paper we present the Vadalog system, a Datalog-based system for performing complex logic reasoning tasks, such as those required in advanced knowledge graphs. The Vadalog system is Oxford's contribution to the VADA research programme, a joint effort of the universities of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the main contribution of this paper, we illustrate the first implementation of Warded Datalog+/-, a high-performance Datalog+/- system utilizing an aggressive termination control strategy. We also provide a comprehensive experimental evaluation.
Submitted 23 July, 2018;
originally announced July 2018.
-
Datalog: Bag Semantics via Set Semantics
Authors:
Leopoldo Bertossi,
Georg Gottlob,
Reinhard Pichler
Abstract:
Duplicates in data management are common and problematic. In this work, we present a translation of Datalog under bag semantics into a well-behaved extension of Datalog, the so-called warded Datalog$^\pm$, under set semantics. From a theoretical point of view, this allows us to reason on bag semantics by making use of the well-established theoretical foundations of set semantics. From a practical point of view, this allows us to handle the bag semantics of Datalog by powerful, existing query engines for the required extension of Datalog. This use of Datalog$^\pm$ is extended to give a set semantics to duplicates in Datalog$^\pm$ itself. We investigate the properties of the resulting Datalog$^\pm$ programs, the problem of deciding multiplicities, and expressibility of some bag operations. Moreover, the proposed translation has the potential for interesting applications such as to Multiset Relational Algebra and the semantic web query language SPARQL with bag semantics.
Submitted 12 February, 2019; v1 submitted 16 March, 2018;
originally announced March 2018.
-
Tree Projections and Constraint Optimization Problems: Fixed-Parameter Tractability and Parallel Algorithms
Authors:
Georg Gottlob,
Gianluigi Greco,
Francesco Scarcello
Abstract:
Tree projections provide a unifying framework to deal with most structural decomposition methods of constraint satisfaction problems (CSPs). Within this framework, a CSP instance is decomposed into a number of sub-problems, called views, whose solutions are either already available or can be computed efficiently. The goal is to arrange portions of these views in a tree-like structure, called tree projection, which determines an efficiently solvable CSP instance equivalent to the original one. Deciding whether a tree projection exists is NP-hard. Solution methods have therefore been proposed in the literature that do not require a tree projection to be given, and that either correctly decide whether the given CSP instance is satisfiable, or return that a tree projection actually does not exist. These approaches had not so far been generalized to CSP extensions for optimization problems, where the goal is to compute a solution of maximum value/minimum cost. The paper fills the gap by exhibiting a fixed-parameter polynomial-time algorithm that either disproves the existence of tree projections or computes an optimal solution, with the parameter being the size of the expression of the objective function to be optimized over all possible solutions (and not the size of the whole constraint formula, used in related works). Tractability results are also established for the problem of returning the best K solutions. Finally, parallel algorithms for such optimization problems are proposed and analyzed. Given that the classes of acyclic hypergraphs, hypergraphs of bounded treewidth, and hypergraphs of bounded generalized hypertree width are all covered as special cases of the tree projection framework, the results in this paper directly apply to these classes. These classes are extensively considered in the CSP setting, as well as in conjunctive database query evaluation and optimization.
Submitted 14 November, 2017;
originally announced November 2017.
-
General and Fractional Hypertree Decompositions: Hard and Easy Cases
Authors:
Wolfgang Fischl,
Georg Gottlob,
Reinhard Pichler
Abstract:
Hypertree decompositions, as well as the more powerful generalized hypertree decompositions (GHDs), and the yet more general fractional hypertree decompositions (FHD) are hypergraph decomposition methods successfully used for answering conjunctive queries and for the solution of constraint satisfaction problems. Every hypergraph H has a width relative to each of these decomposition methods: its hypertree width hw(H), its generalized hypertree width ghw(H), and its fractional hypertree width fhw(H), respectively.
It is known that hw(H) <= k can be checked in polynomial time for fixed k, while checking ghw(H) <= k is NP-complete for any k greater than or equal to 3. The complexity of checking fhw(H) <= k for a fixed k has been open for more than a decade.
We settle this open problem by showing that checking fhw(H) <= k is NP-complete, even for k=2. The same construction allows us to prove also the NP-completeness of checking ghw(H) <= k for k=2. After proving these hardness results, we identify meaningful restrictions, for which checking for bounded ghw or fhw becomes tractable.
Submitted 18 July, 2019; v1 submitted 3 November, 2016;
originally announced November 2016.
-
Semantic Acyclicity Under Constraints
Authors:
Pablo Barcelo,
Georg Gottlob,
Andreas Pieris
Abstract:
A conjunctive query (CQ) is semantically acyclic if it is equivalent to an acyclic one. Semantic acyclicity has been studied in the constraint-free case, and deciding whether a query enjoys this property is NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (tgds) that can express, e.g., inclusion dependencies, or equality-generating dependencies (egds) that capture, e.g., functional dependencies, a CQ may turn out to be semantically acyclic under the constraints while not semantically acyclic in general. This opens avenues to new query optimization techniques. In this paper we initiate and develop the theory of semantic acyclicity under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically acyclic under the constraints, or, in other words, is the query equivalent to an acyclic one over all those databases that satisfy the set of constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not sufficient condition for the decidability of semantic acyclicity. In particular, we show that semantic acyclicity is undecidable in the presence of full tgds (i.e., Datalog rules). In view of this fact, we focus on the main classes of tgds for which CQ containment is decidable, and do not capture the class of full tgds, namely guarded, non-recursive and sticky tgds. For these classes we show that semantic acyclicity is decidable, and its complexity coincides with the complexity of CQ containment. In the case of egds, we show that for keys over unary and binary predicates semantic acyclicity is decidable (NP-complete). We finally consider the problem of evaluating a semantically acyclic query over a database that satisfies a set of constraints; for guarded tgds and functional dependencies this problem is tractable.
Submitted 3 June, 2016; v1 submitted 3 February, 2016;
originally announced February 2016.
-
Achieving New Upper Bounds for the Hypergraph Duality Problem through Logic
Authors:
Georg Gottlob,
Enrico Malizia
Abstract:
The hypergraph duality problem DUAL is defined as follows: given two simple hypergraphs $\mathcal{G}$ and $\mathcal{H}$, decide whether $\mathcal{H}$ consists precisely of all minimal transversals of $\mathcal{G}$ (in which case we say that $\mathcal{G}$ is the dual of $\mathcal{H}$). This problem is equivalent to deciding whether two given non-redundant monotone DNFs are dual. It is known that non-DUAL, the complementary problem to DUAL, is in $\mathrm{GC}(\log^2 n,\mathrm{PTIME})$, where $\mathrm{GC}(f(n),\mathcal{C})$ denotes the complexity class of all problems that after a nondeterministic guess of $O(f(n))$ bits can be decided (checked) within complexity class $\mathcal{C}$. It was conjectured that non-DUAL is in $\mathrm{GC}(\log^2 n,\mathrm{LOGSPACE})$. In this paper we prove this conjecture and actually place the non-DUAL problem into the complexity class $\mathrm{GC}(\log^2 n,\mathrm{TC}^0)$, which is a subclass of $\mathrm{GC}(\log^2 n,\mathrm{LOGSPACE})$. We here refer to the logtime-uniform version of $\mathrm{TC}^0$, which corresponds to $\mathrm{FO(COUNT)}$, i.e., first-order logic augmented by counting quantifiers. We achieve the latter bound in two steps. First, based on existing problem decomposition methods, we develop a new nondeterministic algorithm for non-DUAL that requires guessing $O(\log^2 n)$ bits. We then proceed by a logical analysis of this algorithm, allowing us to formulate its deterministic part in $\mathrm{FO(COUNT)}$. From this result, by the well-known inclusion $\mathrm{TC}^0\subseteq\mathrm{LOGSPACE}$, it follows that DUAL also belongs to $\mathrm{DSPACE}[\log^2 n]$. Finally, by exploiting the principles on which the proposed nondeterministic algorithm is based, we devise a deterministic algorithm that, given two hypergraphs $\mathcal{G}$ and $\mathcal{H}$, computes in quadratic logspace a transversal of $\mathcal{G}$ missing in $\mathcal{H}$.
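To make the definition of DUAL concrete, the following Python sketch enumerates the minimal transversals of a small hypergraph by brute force and compares them with a second hypergraph. It takes exponential time and is unrelated to the guess-and-check algorithm of the paper; the function names and the encoding of hypergraphs as lists of vertex sets are assumptions chosen for illustration.

```python
from itertools import combinations


def minimal_transversals(edges):
    """Brute-force all minimal transversals (hitting sets) of a hypergraph."""
    edges = [frozenset(e) for e in edges]
    vertices = sorted(set().union(*edges)) if edges else []
    found = []
    # Candidates are enumerated by increasing size, so a transversal that
    # contains a previously found one cannot be minimal and is skipped.
    for size in range(len(vertices) + 1):
        for cand in combinations(vertices, size):
            s = frozenset(cand)
            if all(s & e for e in edges) and not any(t <= s for t in found):
                found.append(s)
    return set(found)


def is_dual(g, h):
    """Decide whether h consists precisely of all minimal transversals of g."""
    return minimal_transversals(g) == {frozenset(e) for e in h}


g = [{1, 2}, {2, 3}]
print(minimal_transversals(g))        # the sets {2} and {1, 3}
print(is_dual(g, [{2}, {1, 3}]))      # h lists every minimal transversal
print(is_dual(g, [{2}]))              # the transversal {1, 3} is missing
```

In the last call, `{1, 3}` is exactly the kind of witness the abstract refers to as a transversal of $\mathcal{G}$ missing in $\mathcal{H}$.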
Submitted 20 November, 2017; v1 submitted 10 July, 2014;
originally announced July 2014.
-
Query Rewriting and Optimization for Ontological Databases
Authors:
Georg Gottlob,
Giorgio Orsi,
Andreas Pieris
Abstract:
Ontological queries are evaluated against a knowledge base consisting of an extensional database and an ontology (i.e., a set of logical assertions and constraints which derive new intensional knowledge from the extensional database), rather than directly on the extensional database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this paper, we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent first-order query against the underlying extensional database. We present a novel query rewriting algorithm for rather general types of ontological constraints which is well-suited for practical implementations. In particular, we show how a conjunctive query against a knowledge base, expressed using linear and sticky existential rules, that is, members of the recently introduced Datalog+/- family of ontology languages, can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this rewriting process so as to produce possibly small and cost-effective UCQ rewritings for an input query.
Submitted 12 May, 2014;
originally announced May 2014.
-
Querying the Guarded Fragment
Authors:
Vince Bárány,
Georg Gottlob,
Martin Otto
Abstract:
Evaluating a Boolean conjunctive query Q against a guarded first-order theory F is equivalent to checking whether "F and not Q" is unsatisfiable. This problem is relevant to the areas of database theory and description logic. Since Q may not be guarded, well known results about the decidability, complexity, and finite-model property of the guarded fragment do not obviously carry over to conjunctive query answering over guarded theories, and had been left open in general. By investigating finite guarded bisimilar covers of hypergraphs and relational structures, and by substantially generalising Rosati's finite chase, we prove for guarded theories F and (unions of) conjunctive queries Q that (i) Q is true in each model of F iff Q is true in each finite model of F and (ii) determining whether F implies Q is 2EXPTIME-complete. We further show the following results: (iii) the existence of polynomial-size conformal covers of arbitrary hypergraphs; (iv) a new proof of the finite model property of the clique-guarded fragment; (v) the small model property of the guarded fragment with optimal bounds; (vi) a polynomial-time solution to the canonisation problem modulo guarded bisimulation, which yields (vii) a capturing result for guarded bisimulation invariant PTIME.
Submitted 20 May, 2014; v1 submitted 23 September, 2013;
originally announced September 2013.
-
Taming the Infinite Chase: Query Answering under Expressive Integrity Constraints
Authors:
Andrea Cali,
Georg Gottlob,
Michael Kifer
Abstract:
The chase algorithm is a fundamental tool for query evaluation and query containment under constraints, where the constraints are (sub-classes of) tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). So far, most of the research on this topic has focused on cases where the chase procedure terminates, with some notable exceptions. In this paper we take a general approach, and we propose large classes of TGDs under which the chase does not always terminate. Our languages, in particular, are inspired by guarded logic: we show that by enforcing syntactic properties on the form of the TGDs, we are able to ensure decidability of the problem of answering conjunctive queries despite the non-terminating chase. We provide tight complexity bounds for the problem of conjunctive query evaluation for several classes of TGDs. We then introduce EGDs, and provide a condition under which EGDs do not interact with TGDs, and therefore do not take part in query answering. We show applications of our classes of constraints to the problem of answering conjunctive queries under F-Logic Lite, a recently introduced ontology language, and under prominent tractable Description Logics languages. All the results in this paper immediately extend to the problem of conjunctive query containment.
Submitted 17 November, 2013; v1 submitted 13 December, 2012;
originally announced December 2012.
-
Deciding Monotone Duality and Identifying Frequent Itemsets in Quadratic Logspace
Authors:
Georg Gottlob
Abstract:
The monotone duality problem is defined as follows: Given two monotone formulas f and g in irredundant DNF, decide whether f and g are dual. This problem is the same as duality testing for hypergraphs, that is, checking whether a hypergraph H consists of precisely all minimal transversals of a simple hypergraph G. By exploiting a recent problem-decomposition method by Boros and Makino (ICALP 2009), we show that duality testing for hypergraphs, and thus for monotone DNFs, is feasible in DSPACE[log^2 n], i.e., in quadratic logspace. As the monotone duality problem is equivalent to a number of problems in the areas of databases, data mining, and knowledge discovery, the results presented here yield new complexity results for those problems, too. For example, it follows from our results that whenever for a Boolean-valued relation (whose attributes represent items), a number of maximal frequent itemsets and a number of minimal infrequent itemsets are known, then it can be decided in quadratic logspace whether there exist additional frequent or infrequent itemsets.
Submitted 22 August, 2013; v1 submitted 9 December, 2012;
originally announced December 2012.
-
AMBER: Automatic Supervision for Multi-Attribute Extraction
Authors:
Tim Furche,
Georg Gottlob,
Giovanni Grasso,
Giorgio Orsi,
Christian Schallhart,
Cheng Wang
Abstract:
The extraction of multi-attribute objects from the deep web is the bridge between the unstructured web and structured data. Existing approaches either induce wrappers from a set of human-annotated pages or leverage repeated structures on the page without supervision. What the former lack in automation, the latter lack in accuracy. Thus accurate, automatic multi-attribute object extraction has remained an open challenge.
AMBER overcomes both limitations through mutual supervision between the repeated structure and automatically produced annotations. Previous approaches based on automatic annotations have suffered from low quality due to the inherent noise in the annotations and have attempted to compensate by exploring multiple candidate wrappers. In contrast, AMBER compensates for this noise by integrating repeated structure analysis with annotation-based induction: the repeated structure limits the search space for wrapper induction, and conversely, annotations allow the repeated structure analysis to distinguish noise from relevant data. Both low recall and low precision in the annotations are mitigated to achieve almost human quality (more than 98 percent) multi-attribute object extraction.
To achieve this accuracy, AMBER needs to be trained once for an entire domain. AMBER bootstraps its training from a small, possibly noisy set of attribute instances and a few unannotated sites of the domain.
Submitted 22 October, 2012;
originally announced October 2012.
-
The Ontological Key: Automatically Understanding and Integrating Forms to Access the Deep Web
Authors:
Tim Furche,
Georg Gottlob,
Giovanni Grasso,
Xiaonan Guo,
Giorgio Orsi,
Christian Schallhart
Abstract:
Forms are our gates to the web. They enable us to access the deep content of web sites. Automatic form understanding provides applications, ranging from crawlers over meta-search engines to service integrators, with a key to this content. Yet, it has received little attention other than as a component in specific applications such as crawlers or meta-search engines. No comprehensive approach to form understanding exists, let alone one that produces rich models for semantic services or integration with linked open data.
In this paper, we present OPAL, the first comprehensive approach to form understanding and integration. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems OPAL pushes the state of the art: For form labeling, it combines features from the text, structure, and visual rendering of a web page. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern web forms OPAL outperforms previous approaches for form labeling by a significant margin. For form interpretation, OPAL uses a schema (or ontology) of forms in a given domain. Thanks to this domain schema, it is able to produce nearly perfect (more than 97 percent accuracy in the evaluation domains) form interpretations. Yet, the effort to produce a domain schema is very low, as we provide a Datalog-based template language that eases the specification of such schemata and a methodology for deriving a domain schema largely automatically from an existing domain ontology. We demonstrate the value of the form interpretations in OPAL through a light-weight form integration system that successfully translates and distributes master queries to hundreds of forms with no error, yet is implemented with only a handful of translation rules.
Submitted 22 October, 2012;
originally announced October 2012.
-
Tractable Optimization Problems through Hypergraph-Based Structural Restrictions
Authors:
Georg Gottlob,
Gianluigi Greco,
Francesco Scarcello
Abstract:
Several variants of the Constraint Satisfaction Problem have been proposed and investigated in the literature for modelling those scenarios where solutions are associated with some given costs. Within these frameworks computing an optimal solution is an NP-hard problem in general; yet, when restricted over classes of instances whose constraint interactions can be modelled via (nearly-)acyclic graphs, this problem is known to be solvable in polynomial time. In this paper, larger classes of tractable instances are singled out, by discussing solution approaches based on exploiting hypergraph acyclicity and, more generally, structural decomposition methods, such as (hyper)tree decompositions.
Submitted 15 September, 2012;
originally announced September 2012.
-
Ontological Queries: Rewriting and Optimization (Extended Version)
Authors:
Georg Gottlob,
Giorgio Orsi,
Andreas Pieris
Abstract:
Ontological queries are evaluated against an ontology rather than directly on a database. The evaluation and optimization of such queries is an intriguing new problem for database research.
In this paper we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent query against the underlying relational database. The focus here is on soundness and completeness. We review previous results and present a new rewriting algorithm for rather general types of ontological constraints.
In particular, we show how a conjunctive query against an ontology can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this process so as to produce possibly small and cost-effective UCQ rewritings for an input query. We review existing optimization methods, and propose an effective new method that works for linear Datalog+/-, a class of Datalog-based rules that encompasses well-known description logics of the DL-Lite family.
Submitted 1 December, 2011;
originally announced December 2011.
-
Pure Nash Equilibria: Hard and Easy Games
Authors:
G. Gottlob,
G. Greco,
F. Scarcello
Abstract:
We investigate complexity issues related to pure Nash equilibria of strategic games. We show that, even in very restrictive settings, determining whether a game has a pure Nash equilibrium is NP-hard, while deciding whether a game has a strong Nash equilibrium is $Σ^P_2$-complete. We then study practically relevant restrictions that lower the complexity. In particular, we are interested in quantitative and qualitative restrictions of the way each player's payoff depends on the moves of other players. We say that a game has small neighborhood if the utility function for each player depends only on (the actions of) a logarithmically small number of other players. The dependency structure of a game G can be expressed by a graph DG(G) or by a hypergraph H(G). By relating Nash equilibrium problems to constraint satisfaction problems (CSPs), we show that if G has small neighborhood and if H(G) has bounded hypertree width (or if DG(G) has bounded treewidth), then finding pure Nash and Pareto equilibria is feasible in polynomial time. If the game is graphical, then these problems are LOGCFL-complete and thus in the class NC2 of highly parallelizable problems.
Submitted 9 September, 2011;
originally announced September 2011.
-
Introducing LoCo, a Logic for Configuration Problems
Authors:
Markus Aschinger,
Conrad Drescher,
Georg Gottlob
Abstract:
In this paper we present the core of LoCo, a logic-based high-level representation language for expressing configuration problems. LoCo is intended to allow these problems to be modeled in an intuitive and declarative way, the dynamic aspects of configuration notwithstanding. Our logic enforces that configurations contain only finitely many components, and reasoning can be reduced to the task of model construction.
Submitted 1 September, 2011;
originally announced September 2011.
-
Rewriting Ontological Queries into Small Nonrecursive Datalog Programs
Authors:
Georg Gottlob,
Thomas Schwentick
Abstract:
We consider the setting of ontological database access, where an Abox is given in the form of a relational database D and where a Boolean conjunctive query q has to be evaluated against D modulo a Tbox T formulated in DL-Lite or Linear Datalog+/-. It is well-known that (T,q) can be rewritten into an equivalent nonrecursive Datalog program P that can be directly evaluated over D. However, for Linear Datalog+/- or for DL-Lite versions that allow for role inclusion, the rewriting methods described so far result in a nonrecursive Datalog program P of size exponential in the joint size of T and q. This gives rise to the interesting question of whether such a rewriting necessarily needs to be of exponential size. In this paper we show that it is actually possible to translate (T,q) into a polynomially sized equivalent nonrecursive Datalog program P.
Submitted 23 July, 2011; v1 submitted 19 June, 2011;
originally announced June 2011.
-
Determining Relevance of Accesses at Runtime (Extended Version)
Authors:
Michael Benedikt,
Georg Gottlob,
Pierre Senellart
Abstract:
Consider the situation where a query is to be answered using Web sources that restrict the accesses that can be made on backend relational data by requiring some attributes to be given as input of the service. The accesses provide lookups on the collection of attribute values that match the binding. They can differ in whether or not they require arguments to be generated from prior accesses. Prior work has focused on the question of whether a query can be answered using a set of data sources, and on developing static access plans (e.g., Datalog programs) that implement query answering. We are interested in dynamic aspects of the query answering problem: given partial information about the data, which accesses could provide relevant data for answering a given query? We consider immediate and long-term notions of "relevant accesses", and ascertain the complexity of query relevance, for both conjunctive queries and arbitrary positive queries. In the process, we relate dynamic relevance of an access to query containment under access limitations and characterize the complexity of this problem; we produce several complexity results about containment that are of interest in themselves.
Submitted 30 May, 2011; v1 submitted 4 April, 2011;
originally announced April 2011.
-
On Minimal Constraint Networks
Authors:
Georg Gottlob
Abstract:
In a minimal binary constraint network, every tuple of a constraint relation can be extended to a solution. The tractability or intractability of computing a solution to such a minimal network was a long-standing open question. Dechter conjectured this computation problem to be NP-hard. We prove this conjecture. We also prove a conjecture by Dechter and Pearl stating that for $k \geq 2$ it is NP-hard to decide whether a single constraint can be decomposed into an equivalent k-ary constraint network. We show that this holds even in the case of bi-valued constraints where $k \geq 3$, which proves another conjecture of Dechter and Pearl. Finally, we establish the tractability frontier for this problem with respect to the domain cardinality and the parameter k.
Submitted 25 July, 2012; v1 submitted 8 March, 2011;
originally announced March 2011.
-
Distributed XML Design
Authors:
S. Abiteboul,
G. Gottlob,
M. Manna
Abstract:
A distributed XML document is an XML document that spans several machines. We assume that a distribution design of the document tree is given, consisting of an XML kernel-document T[f1,...,fn] where some leaves are "docking points" for external resources providing XML subtrees (f1,...,fn, standing, e.g., for Web services or peers at remote locations). The top-down design problem consists in, given a type (a schema document that may vary from a DTD to a tree automaton) for the distributed document, locally "propagating" this type into a collection of types, which we call a typing, while preserving desirable properties. We also consider the bottom-up design problem, which consists in, given a type for each external resource, exhibiting a global type that is enforced by the local types, again with natural desirable properties. In the article, we lay out the fundamentals of a theory of distributed XML design, analyze problems concerning typing issues in this setting, and study their complexity.
Submitted 13 December, 2010;
originally announced December 2010.
-
Monadic Datalog over Finite Structures with Bounded Treewidth
Authors:
Georg Gottlob,
Reinhard Pichler,
Fang Wei
Abstract:
Bounded treewidth and Monadic Second Order (MSO) logic have proved to be key concepts in establishing fixed-parameter tractability results. Indeed, by Courcelle's Theorem we know: Any property of finite structures, which is expressible by an MSO sentence, can be decided in linear time (data complexity) if the structures have bounded treewidth.
In principle, Courcelle's Theorem can be applied directly to construct concrete algorithms by transforming the MSO evaluation problem into a tree language recognition problem. The latter can then be solved via a finite tree automaton (FTA). However, this approach has turned out to be problematic, since even relatively simple MSO formulae may lead to a "state explosion" of the FTA.
In this work we propose monadic datalog (i.e., datalog where all intensional predicate symbols are unary) as an alternative method to tackle this class of fixed-parameter tractable problems. We show that if some property of finite structures is expressible in MSO then this property can also be expressed by means of a monadic datalog program over the structure plus the tree decomposition.
Moreover, we show that the resulting fragment of datalog can be evaluated in linear time (both w.r.t. the program size and w.r.t. the data size). This new approach is put to work by devising new algorithms for the 3-Colorability problem of graphs and for the PRIMALITY problem of relational schemas (i.e., testing if some attribute in a relational schema is part of a key). We also report on experimental results with a prototype implementation.
Submitted 18 September, 2008;
originally announced September 2008.
-
A Backtracking-Based Algorithm for Computing Hypertree-Decompositions
Authors:
Georg Gottlob,
Marko Samer
Abstract:
Hypertree decompositions of hypergraphs are a generalization of tree decompositions of graphs. The corresponding hypertree-width is a measure for the cyclicity and therefore tractability of the encoded computation problem. Many NP-hard decision and computation problems are known to be tractable on instances whose structure corresponds to hypergraphs of bounded hypertree-width. Intuitively, the smaller the hypertree-width, the faster the computation problem can be solved. In this paper, we present the new backtracking-based algorithm det-k-decomp for computing hypertree decompositions of small width. Our benchmark evaluations have shown that det-k-decomp significantly outperforms opt-k-decomp, the only exact hypertree decomposition algorithm so far. Even compared to the best heuristic algorithm, we obtained competitive results as long as the hypergraphs are not too large.
Submitted 13 January, 2007;
originally announced January 2007.
-
Conjunctive Queries over Trees
Authors:
Georg Gottlob,
Christoph Koch,
Klaus U. Schulz
Abstract:
We study the complexity and expressive power of conjunctive queries over unranked labeled trees represented using a variety of structure relations such as "child", "descendant", and "following" as well as unary relations for node labels. We establish a framework for characterizing structures representing trees for which conjunctive queries can be evaluated efficiently. Then we completely chart the tractability frontier of the problem and establish a dichotomy theorem for our axis relations, i.e., we find all subset-maximal sets of axes for which query evaluation is in polynomial time and show that for all other cases, query evaluation is NP-complete. All polynomial-time results are obtained immediately using the proof techniques from our framework. Finally, we study the expressiveness of conjunctive queries over trees and show that for each conjunctive query, there is an equivalent acyclic positive query (i.e., a set of acyclic conjunctive queries), but that in general this query is not of polynomial size.
Submitted 2 February, 2006;
originally announced February 2006.
-
A Formal Comparison of Visual Web Wrapper Generators
Authors:
Georg Gottlob,
Christoph Koch
Abstract:
We study the core fragment of the Elog wrapping language used in the Lixto system (a visual wrapper generator) and formally compare Elog to other wrapping languages proposed in the literature.
Submitted 7 October, 2003;
originally announced October 2003.
-
Monadic Datalog and the Expressive Power of Languages for Web Information Extraction
Authors:
Georg Gottlob,
Christoph Koch
Abstract:
Research on information extraction from Web pages (wrapping) has seen much activity recently (particularly systems implementations), but little work has been done on formally studying the expressiveness of the formalisms proposed or on the theoretical foundations of wrapping. In this paper, we first study monadic datalog over trees as a wrapping language. We show that this simple language is equivalent to monadic second order logic (MSO) in its ability to specify wrappers. We believe that MSO has the right expressiveness required for Web information extraction and propose MSO as a yardstick for evaluating and comparing wrappers. Along the way, several other results on the complexity of query evaluation and query containment for monadic datalog over trees are established, and a simple normal form for this language is presented. Using the above results, we subsequently study the kernel fragment Elog$^-$ of the Elog wrapping language used in the Lixto system (a visual wrapper generator). Curiously, Elog$^-$ exactly captures MSO, yet is easier to use. Indeed, programs in this language can be entirely visually specified.
Submitted 7 October, 2003; v1 submitted 15 November, 2002;
originally announced November 2002.
-
The DLV System for Knowledge Representation and Reasoning
Authors:
Nicola Leone,
Gerald Pfeifer,
Wolfgang Faber,
Thomas Eiter,
Georg Gottlob,
Simona Perri,
Francesco Scarcello
Abstract:
This paper presents the DLV system, which is widely considered the state-of-the-art implementation of disjunctive logic programming, and addresses several aspects. As for problem solving, we provide a formal definition of its kernel language, function-free disjunctive logic programs (also known as disjunctive datalog), extended by weak constraints, which are a powerful tool to express optimization problems. We then illustrate the usage of DLV as a tool for knowledge representation and reasoning, describing a new declarative programming methodology which allows one to encode complex problems (up to $Δ^P_3$-complete problems) in a declarative fashion. On the foundational side, we provide a detailed analysis of the computational complexity of the language of DLV, and by deriving new complexity results we chart a complete picture of the complexity of this language and important fragments thereof.
Furthermore, we illustrate the general architecture of the DLV system, which has been influenced by these results. As for applications, we overview application front-ends which have been developed on top of DLV to solve specific knowledge representation tasks, and we briefly describe the main international projects investigating the potential of the system for industrial exploitation. Finally, we report on thorough experimentation and benchmarking, which have been carried out to assess the efficiency of the system. The experimental results confirm the solidity of DLV and highlight its potential for emerging application areas like knowledge management and information integration.
Submitted 10 September, 2003; v1 submitted 4 November, 2002;
originally announced November 2002.
-
Complexity of Nested Circumscription and Nested Abnormality Theories
Authors:
Marco Cadoli,
Thomas Eiter,
Georg Gottlob
Abstract:
The need for a circumscriptive formalism that allows for simple yet elegant modular problem representation has led Lifschitz (AIJ, 1995) to introduce nested abnormality theories (NATs) as a tool for modular knowledge representation, tailored for applying circumscription to minimize exceptional circumstances. Abstracting from this particular objective, we propose L_{CIRC}, which is an extension of generic propositional circumscription by allowing propositional combinations and nesting of circumscriptive theories. As shown, NATs are naturally embedded into this language, and are in fact of equal expressive capability. We then analyze the complexity of L_{CIRC} and NATs, and in particular the effect of nesting. The latter is found to be a source of complexity, which climbs the Polynomial Hierarchy as the nesting depth increases and reaches PSPACE-completeness in the general case. We also identify meaningful syntactic fragments of NATs which have lower complexity. In particular, we show that the generalization of Horn circumscription in the NAT framework remains coNP-complete, and that Horn NATs without fixed letters can be efficiently transformed into an equivalent Horn CNF, which implies polynomial solvability of principal reasoning tasks. Finally, we also study extensions of NATs and briefly address the complexity in the first-order case. Our results give insight into the "cost" of using L_{CIRC} (resp. NATs) as a host language for expressing other formalisms such as action theories, narratives, or spatial theories.
Submitted 20 July, 2002;
originally announced July 2002.
-
New Results on Monotone Dualization and Generating Hypergraph Transversals
Authors:
Thomas Eiter,
Georg Gottlob,
Kazuhisa Makino
Abstract:
We consider the problem of dualizing a monotone CNF (equivalently, computing all minimal transversals of a hypergraph), whose associated decision problem is a prominent open problem in NP-completeness. We present a number of new polynomial time resp. output-polynomial time results for significant cases, which largely advance the tractability frontier and improve on previous results. Furthermore, we show that duality of two monotone CNFs can be disproved with limited nondeterminism. More precisely, this is feasible in polynomial time with $O(\chi(n) \cdot \log n)$ suitably guessed bits, where $\chi(n)$ is given by $\chi(n)^{\chi(n)} = n$; note that $\chi(n) = o(\log n)$. This result sheds new light on the complexity of this important problem.
Submitted 26 April, 2002; v1 submitted 4 April, 2002;
originally announced April 2002.
-
Hypertree Decompositions and Tractable Queries
Authors:
G. Gottlob,
N. Leone,
F. Scarcello
Abstract:
Several important decision problems on conjunctive queries (CQs) are NP-complete in general but become tractable, and actually highly parallelizable, if restricted to acyclic or nearly acyclic queries. Examples are the evaluation of Boolean CQs and query containment. These problems were shown to be tractable for conjunctive queries of bounded treewidth and of bounded degree of cyclicity. The most general concept of nearly acyclic queries so far was the notion of queries of bounded query-width introduced by Chekuri and Rajaraman (1997). While CQs of bounded query-width are tractable, it remained unclear whether such queries are efficiently recognizable. Chekuri and Rajaraman stated as an open problem whether for each constant k it can be determined in polynomial time if a query has query-width less than or equal to k. We give a negative answer by proving this problem NP-complete (specifically, for k=4). In order to circumvent this difficulty, we introduce the new concept of hypertree decomposition of a query and the corresponding notion of hypertree-width. We prove: (a) for each k, the class of queries with query-width bounded by k is properly contained in the class of queries whose hypertree-width is bounded by k; (b) unlike query-width, constant hypertree-width is efficiently recognizable; (c) Boolean queries of constant hypertree-width can be efficiently evaluated.
Submitted 28 December, 1998;
originally announced December 1998.