Querying Incomplete Numerical Data: Between Certain and Possible Answers
Authors:
Marco Console,
Leonid Libkin,
Liat Peterfreund
Abstract:
Queries with aggregation and arithmetic operations, as well as incomplete data, are common in real-world databases, but we lack a good understanding of how they should interact. On the one hand, systems based on SQL provide ad-hoc rules for numerical nulls; on the other, theoretical research largely concentrates on the standard notions of certain and possible answers. In the presence of numerical attributes and aggregates, however, these answers are often meaningless, returning either too little or too much. Our goal is to define a principled framework for databases with numerical nulls and for answering queries with arithmetic and aggregation over them.
Towards this goal, we assume that missing values in numerical attributes are given by probability distributions associated with marked nulls. This yields a model of probabilistic bag databases in which tuples are not necessarily independent, since nulls can repeat. We provide a general compositional framework for query answering, and then concentrate on queries that resemble standard SQL with arithmetic and aggregation. We show that these queries are measurable, and that their outputs have a finite representation. Moreover, since the classical forms of answers provide little information in the numerical setting, we look at the probability that numerical values in output tuples belong to specific intervals. Even though their exact computation is intractable, we present efficient algorithms for approximating such probabilities.
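To make the model concrete, the sketch below estimates the probability that a SUM over a numerical column with marked nulls falls in an interval. It is a generic Monte Carlo sampler with a Hoeffding error bound, not the approximation algorithm from the paper; the cell encoding, the uniform distributions, and the names `sample_world`, `sum_in_world`, and `estimate_interval_prob` are all hypothetical illustrations. The one point it does take from the abstract is that each marked null is sampled once per possible world, so repeated occurrences of a null share a value and tuples are not independent.

```python
import random
from typing import Callable, Dict, List, Union

# Hypothetical encoding: a cell holds either a known number or the
# name of a marked null (a string). The same name may appear many times.
Cell = Union[float, str]
Sampler = Callable[[random.Random], float]


def sample_world(nulls: Dict[str, Sampler], rng: random.Random) -> Dict[str, float]:
    """Draw one value per marked null, so every occurrence of a null
    receives the same value in a given possible world."""
    return {name: draw(rng) for name, draw in nulls.items()}


def sum_in_world(column: List[Cell], world: Dict[str, float]) -> float:
    """Evaluate SUM over the column after substituting sampled null values."""
    return sum(world[c] if isinstance(c, str) else c for c in column)


def estimate_interval_prob(column: List[Cell], nulls: Dict[str, Sampler],
                           lo: float, hi: float,
                           samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[lo <= SUM(column) <= hi].

    By Hoeffding's inequality, the estimate is within eps of the true
    probability with probability at least 1 - 2*exp(-2*samples*eps**2).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = sample_world(nulls, rng)
        if lo <= sum_in_world(column, world) <= hi:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    # Assumed distribution for illustration: each null is uniform on [0, 100].
    uniform_0_100: Sampler = lambda r: r.uniform(0.0, 100.0)
    # Null n1 occurs twice, so SUM = 1000 + 2*n1 + n2 (not three independent values).
    salaries: List[Cell] = [1000.0, "n1", "n1", "n2"]
    p = estimate_interval_prob(salaries, {"n1": uniform_0_100, "n2": uniform_0_100},
                               float("-inf"), 1100.0)
    print(f"Pr[SUM <= 1100] ~ {p:.3f}")  # exact value for this example is 1/4
```

In this example the exact answer is 1/4, the probability that 2*n1 + n2 <= 100. Treating the two occurrences of n1 as independent values would instead give 1/6, which illustrates why repeated nulls matter for the probabilities of aggregate outputs.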
Submitted 1 November, 2022; v1 submitted 27 October, 2022;
originally announced October 2022.
Deep Separability of Ontological Constraints
Authors:
Andrea Calì,
Marco Console,
Riccardo Frosini
Abstract:
When data schemata are enriched with expressive constraints that aim at representing the domain of interest, answering queries requires considering the logical theory consisting of both the data and the constraints. Query answering in such a context is called ontological query answering. Commonly adopted database constraints in this field are tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). It is well known that their interaction leads to intractability or undecidability of query answering even in the case of simple subclasses. Several conditions have been found to guarantee separability, that is, lack of interaction, between TGDs and EGDs. Separability makes EGDs (mostly) irrelevant for query answering and therefore often guarantees tractability, as long as the theory is satisfiable. In this paper we review the two notions of separability found in the literature, as well as several syntactic conditions that are sufficient to prove them. We then shed light on the issue of satisfiability checking, showing that under a sufficient condition called deep separability it can be done by considering the TGDs only.
We show that, fortunately, in the case of TGDs and EGDs, separability implies deep separability. This result generalizes several analogous ones, proved ad hoc for particular classes of constraints. Applications include the class of sticky TGDs and EGDs, for which we provide a syntactic separability condition which extends the analogous one for linear TGDs; preliminary experiments show the feasibility of query answering in this case.
Submitted 31 December, 2013; v1 submitted 20 December, 2013;
originally announced December 2013.