-
Keyword Search in the Deep Web
Authors:
Andrea Calì,
Davide Martinenghi,
Riccardo Torlone
Abstract:
The Deep Web is constituted by data that are accessible through Web pages, but not readily indexable by search engines as they are returned in dynamic pages. In this paper we propose a conceptual framework for answering keyword queries on Deep Web sources represented as relational tables with so-called access limitations. We formalize the notion of optimal answer, characterize queries for which an answer can be found, and present a method for query processing based on the construction of a query plan that minimizes the accesses to the data sources.
Submitted 8 January, 2025;
originally announced January 2025.
-
On complexity of restricted fragments of Decision DNNF
Authors:
Andrea Calí,
Igor Razgon
Abstract:
Decision DNNF (a.k.a. $\wedge_d$-FBDD) is an important special case of Decomposable Negation Normal Form (DNNF). Decision DNNF admits FPT sized representation of CNFs of bounded \emph{primal} treewidth. However, the complexity of representation for CNFs of bounded \emph{incidence} treewidth is wide open.
In the main part of this paper we carry out an in-depth study of the $\wedge_d$-OBDD model. We formulate a generic methodology for proving lower bounds for the model. Using this methodology, we reestablish the XP lower bound provided in [arxiv:1708.07767]. We also provide exponential separations between FBDD and $\wedge_d$-OBDD and between $\wedge_d$-OBDD and an ordinary OBDD. The last separation is somewhat surprising since $\wedge_d$-FBDD can be quasipolynomially simulated by FBDD.
In the remaining part of the paper, we introduce a relaxed version of Structured Decision DNNF that we name Structured $\wedge_d$-FBDD. We demonstrate that this model is quite powerful for CNFs of bounded incidence treewidth: it has an FPT representation for CNFs that can be turned into ones of bounded primal treewidth by removal of a constant number of clauses (while for both $\wedge_d$-OBDD and Structured Decision DNNF an XP lower bound is triggered by just two long clauses).
Submitted 7 January, 2025;
originally announced January 2025.
-
Regular resolution for CNFs with almost bounded one-sided treewidth
Authors:
Andrea Cali,
Igor Razgon
Abstract:
We introduce a one-sided incidence tree decomposition of a CNF $\varphi$. This is a tree decomposition of the incidence graph of $\varphi$ where the underlying tree is rooted and the set of bags containing each clause induces a directed path in the tree. The one-sided treewidth is the smallest width of a one-sided incidence tree decomposition.
We consider a class of unsatisfiable CNFs $\varphi$ that can be turned into one of one-sided treewidth at most $k$ by removal of at most $p$ clauses. We show that the size of regular resolution for this class of CNFs is FPT parameterized by $k$ and $p$. The result contributes to understanding the complexity of resolution for CNFs of bounded incidence treewidth, an open problem well known in the areas of proof complexity and knowledge compilation. In particular, the result significantly generalizes all the restricted classes of CNFs of bounded incidence treewidth that are known to admit an FPT sized resolution.
The proof includes an auxiliary result and several new notions that may be of independent interest.
Submitted 31 August, 2022; v1 submitted 26 May, 2019;
originally announced May 2019.
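The abstract above defines one-sidedness through a condition on clauses: the bags containing each clause must induce a directed path in the rooted tree. A minimal sketch of a checker for that one condition follows. The function name, the `parent` map encoding of the rooted tree, and the `bags` dictionary are assumptions made for illustration, not notation from the paper, and the sketch does not verify the ordinary tree decomposition properties.

```python
from typing import Dict, Hashable, Optional, Set


def clause_bags_form_directed_path(
    parent: Dict[Hashable, Optional[Hashable]],
    bags: Dict[Hashable, Set[str]],
    clause: str,
) -> bool:
    """Check that the tree nodes whose bags contain `clause` form a
    directed (ancestor-to-descendant) path in the rooted tree.

    `parent` maps each tree node to its parent (None for the root).
    """
    holders = {t for t, bag in bags.items() if clause in bag}
    if not holders:
        return False
    # Exactly one node of the set may have its parent outside the set.
    tops = [t for t in holders if parent.get(t) not in holders]
    if len(tops) != 1:
        return False
    # No node of the set may have two children inside the set.
    children_inside: Dict[Hashable, int] = {}
    for t in holders:
        p = parent.get(t)
        if p in holders:
            children_inside[p] = children_inside.get(p, 0) + 1
    return all(n <= 1 for n in children_inside.values())


# Hypothetical rooted tree: r is the root, a and c are its children, b is a child of a.
example_parent = {"r": None, "a": "r", "b": "a", "c": "r"}
example_bags = {
    "r": {"C1", "C3", "x"},
    "a": {"C1", "C2", "C3", "y"},
    "b": {"C2"},
    "c": {"C3", "C4"},
}
```

In this hypothetical tree, `C1` sits in the bags of `r` and `a`, which is a directed path, whereas `C3` sits in `r`, `a` and `c`, which branches at `r` and therefore violates the condition.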
-
Non-FPT lower bounds for structural restrictions of decision DNNF
Authors:
Andrea Calì,
Florent Capelli,
Igor Razgon
Abstract:
We give a non-FPT lower bound on the size of structured decision DNNF and OBDD with decomposable AND-nodes representing CNF-formulas of bounded incidence treewidth. Both models are known to be of FPT size for CNFs of bounded primal treewidth. To the best of our knowledge this is the first parameterized separation of primal treewidth and incidence treewidth for knowledge compilation models.
Submitted 25 August, 2017;
originally announced August 2017.
-
A Hybrid Approach to Query Answering under Expressive Datalog+/-
Authors:
Mostafa Milani,
Andrea Cali,
Leopoldo Bertossi
Abstract:
Datalog+/- is a family of ontology languages that combine good computational properties with high expressive power. Datalog+/- languages are provably able to capture the most relevant Semantic Web languages. In this paper we consider the class of weakly-sticky (WS) Datalog+/- programs, which allow for certain useful forms of joins in rule bodies as well as extending the well-known class of weakly-acyclic TGDs. So far, only non-deterministic algorithms were known for answering queries on WS Datalog+/- programs. We present novel deterministic query answering algorithms under WS Datalog+/-. In particular, we propose: (1) a bottom-up grounding algorithm based on a query-driven chase, and (2) a hybrid approach based on transforming a WS program into a so-called sticky one, for which query rewriting techniques are known. We discuss how our algorithms can be optimized and effectively applied for query answering in real-world scenarios.
Submitted 25 July, 2016; v1 submitted 22 April, 2016;
originally announced April 2016.
-
Deep Separability of Ontological Constraints
Authors:
Andrea Calì,
Marco Console,
Riccardo Frosini
Abstract:
When data schemata are enriched with expressive constraints that aim at representing the domain of interest, in order to answer queries one needs to consider the logical theory consisting of both the data and the constraints. Query answering in such a context is called ontological query answering. Commonly adopted database constraints in this field are tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). It is well known that their interaction leads to intractability or undecidability of query answering even in the case of simple subclasses. Several conditions have been found to guarantee separability, that is, lack of interaction, between TGDs and EGDs. Separability makes EGDs (mostly) irrelevant for query answering and therefore often guarantees tractability, as long as the theory is satisfiable. In this paper we review the two notions of separability found in the literature, as well as several syntactic conditions that are sufficient to prove them. We then shed light on the issue of satisfiability checking, showing that under a sufficient condition called deep separability it can be done by considering the TGDs only.
We show that, fortunately, in the case of TGDs and EGDs, separability implies deep separability. This result generalizes several analogous ones, proved ad hoc for particular classes of constraints. Applications include the class of sticky TGDs and EGDs, for which we provide a syntactic separability condition which extends the analogous one for linear TGDs; preliminary experiments show the feasibility of query answering in this case.
Submitted 31 December, 2013; v1 submitted 20 December, 2013;
originally announced December 2013.
-
Containment of Schema Mappings for Data Exchange (Preliminary Report)
Authors:
Andrea Calì,
Riccardo Torlone
Abstract:
In data exchange, data are materialised from a source schema to a target schema, according to suitable source-to-target constraints. Constraints are also expressed on the target schema to represent the domain of interest. A schema mapping is the union of the source-to-target and of the target constraints.
In this paper, we address the problem of containment of schema mappings for data exchange, which has been recently proposed in this framework as a step towards the optimization of data exchange settings. We refer to a natural notion of containment that relies on the behaviour of schema mappings with respect to conjunctive query answering, in the presence of so-called LAV TGDs as target constraints. Our contribution is a practical technique for testing the containment based on the existence of a homomorphism between special "dummy" instances, which can be easily built from schema mappings.
We argue that containment of schema mappings is decidable for most practical cases, and we set the basis for further investigations in the topic. This paper extends our preliminary results.
Submitted 31 December, 2013; v1 submitted 20 December, 2013;
originally announced December 2013.
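The containment test in the abstract above rests on the existence of a homomorphism between two instances. As a minimal sketch of that single ingredient, the following backtracking search looks for a homomorphism from one set of facts to another. How the paper builds its "dummy" instances is not described in the abstract and is not reproduced here; the fact encoding and the convention that labelled nulls start with an underscore are assumptions of this sketch.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Fact = Tuple[str, Tuple[str, ...]]  # (relation name, arguments)


def is_null(term: str) -> bool:
    # Assumption of this sketch: labelled nulls start with an underscore,
    # everything else is a constant that must be mapped to itself.
    return term.startswith("_")


def find_homomorphism(
    source: Iterable[Fact], target: Iterable[Fact]
) -> Optional[Dict[str, str]]:
    """Return a mapping of the nulls of `source` that sends every source
    fact to some target fact, or None if no such mapping exists."""
    src: List[Fact] = list(source)
    tgt: List[Fact] = list(target)

    def extend(i: int, mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(src):
            return mapping
        rel, args = src[i]
        for t_rel, t_args in tgt:
            if t_rel != rel or len(t_args) != len(args):
                continue
            candidate = dict(mapping)
            consistent = True
            for a, b in zip(args, t_args):
                if is_null(a):
                    # Bind the null, or check an earlier binding.
                    if candidate.setdefault(a, b) != b:
                        consistent = False
                        break
                elif a != b:
                    consistent = False
                    break
            if consistent:
                result = extend(i + 1, candidate)
                if result is not None:
                    return result
        return None

    return extend(0, {})
```

For example, the source facts `R(a, _x)` and `S(_x, _y)` map into the target `R(a, b)`, `S(b, c)` by sending `_x` to `b` and `_y` to `c`. The search is exponential in the worst case, which is acceptable for an illustration but says nothing about the complexity of the technique in the paper.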
-
Taming the Infinite Chase: Query Answering under Expressive Integrity Constraints
Authors:
Andrea Cali,
Georg Gottlob,
Michael Kifer
Abstract:
The chase algorithm is a fundamental tool for query evaluation and query containment under constraints, where the constraints are (sub-classes of) tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). So far, most of the research on this topic has focused on cases where the chase procedure terminates, with some notable exceptions. In this paper we take a general approach, and we propose large classes of TGDs under which the chase does not always terminate. Our languages, in particular, are inspired by guarded logic: we show that by enforcing syntactic properties on the form of the TGDs, we are able to ensure decidability of the problem of answering conjunctive queries despite the non-terminating chase. We provide tight complexity bounds for the problem of conjunctive query evaluation for several classes of TGDs. We then introduce EGDs, and provide a condition under which EGDs do not interact with TGDs, and therefore do not take part in query answering. We show applications of our classes of constraints to the problem of answering conjunctive queries under F-Logic Lite, a recently introduced ontology language, and under prominent tractable Description Logics languages. All the results in this paper immediately extend to the problem of conjunctive query containment.
Submitted 17 November, 2013; v1 submitted 13 December, 2012;
originally announced December 2012.
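To illustrate why the chase may fail to terminate, here is a minimal sketch of a textbook-style restricted chase for TGDs with a bound on the number of rounds. It is not the decision procedure of the paper, which the abstract does not spell out. The encoding is an assumption of this sketch: atoms are `(relation, arguments)` pairs, variables start with an upper-case letter, constants with a lower-case letter, and fresh nulls are named `_n0`, `_n1`, and so on.

```python
from itertools import count


def is_var(term):
    # Assumption of this sketch: variables start with an upper-case letter.
    return term[0].isupper()


def matches(atoms, facts, subst=None):
    """Yield every substitution extending `subst` that maps all atoms into facts."""
    subst = subst or {}
    if not atoms:
        yield subst
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for f_rel, f_args in facts:
        if f_rel != rel or len(f_args) != len(args):
            continue
        s = dict(subst)
        if all((s.setdefault(a, v) == v) if is_var(a) else a == v
               for a, v in zip(args, f_args)):
            yield from matches(rest, facts, s)


def chase(facts, tgds, max_rounds):
    """Apply TGDs (body, head) for at most `max_rounds` rounds.

    Returns (facts, terminated), where `terminated` is True only if a
    round produced no new fact before the bound was reached.
    """
    facts = set(facts)
    fresh = count()
    for _ in range(max_rounds):
        new_facts = set()
        for body, head in tgds:
            for s in list(matches(body, facts)):
                # Restricted chase: skip the trigger if the head already holds.
                frontier = {v: s[v] for _, args in head for v in args
                            if is_var(v) and v in s}
                if any(True for _ in matches(head, facts | new_facts, frontier)):
                    continue
                ext = dict(s)
                for _, args in head:
                    for v in args:
                        if is_var(v) and v not in ext:
                            ext[v] = f"_n{next(fresh)}"  # fresh labelled null
                for rel, args in head:
                    new_facts.add((rel, tuple(ext[a] if is_var(a) else a for a in args)))
        if not new_facts:
            return facts, True
        facts |= new_facts
    return facts, False


# Hypothetical TGD: every person has a parent who is a person.
parent_tgd = ([("person", ("X",))], [("hasParent", ("X", "Y")), ("person", ("Y",))])
```

Starting from the single fact `person(alice)`, each round of `chase` with `parent_tgd` invents one more parent, so the procedure never reaches a fixpoint and only the round bound stops it. Query answering in such settings therefore cannot rely on materialising the full chase, which is the situation the abstract addresses.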
-
Querying Incomplete Data over Extended ER Schemata
Authors:
Andrea Cali,
Davide Martinenghi
Abstract:
Since Chen's Entity-Relationship (ER) model, conceptual modeling has been playing a fundamental role in relational data design. In this paper we consider an extended ER (EER) model enriched with cardinality constraints, disjointness assertions, and is-a relations among both entities and relationships. In this setting, we consider the case of incomplete data, which is likely to occur, for instance, when data from different sources are integrated. In such a context, we address the problem of providing correct answers to conjunctive queries by reasoning on the schema. Based on previous results about decidability of the problem, we provide a query answering algorithm that performs rewriting of the initial query into a recursive Datalog query encoding the information about the schema. We finally show extensions to more general settings. This paper will appear in the special issue of Theory and Practice of Logic Programming (TPLP) titled Logic Programming in Databases: From Datalog to Semantic-Web Rules.
Submitted 13 April, 2010; v1 submitted 16 March, 2010;
originally announced March 2010.