-
Attribution Score Alignment in Explainable Data Management
Abstract: Different attribution-scores have been proposed to quantify the relevance of database tuples for a query answer from a database. Among them, we find Causal Responsibility, the Shapley Value, the Banzhaf Power-Index, and the Causal Effect. They have been analyzed in isolation, mainly in terms of computational properties. In this work, we start an investigation into the alignment of these scores on…
Submitted 24 April, 2025; v1 submitted 18 March, 2025; originally announced March 2025.
Comments: Relevant references added in this version
-
arXiv:2502.02495
The Causal-Effect Score in Data Management
Abstract: The Causal Effect (CE) is a numerical measure of causal influence of variables on observed results. Despite being widely used in many areas, only preliminary attempts have been made to use CE as an attribution score in data management, to measure the causal strength of tuples for query answering in databases. In this work, we introduce, generalize and investigate the so-called Causal-Effect Score…
Submitted 27 February, 2025; v1 submitted 4 February, 2025; originally announced February 2025.
Comments: To appear in Proceedings of the 4th Conference on Causal Learning and Reasoning, 2025. This is the camera-ready version, and includes a couple of new references
-
The Distributional Uncertainty of the SHAP score in Explainable Machine Learning
Abstract: Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is g…
Submitted 13 August, 2024; v1 submitted 23 January, 2024; originally announced January 2024.
Comments: In ECAI 2024 proceedings
MSC Class: 68T37; 68T27
-
The Shapley Value in Database Management
Abstract: Attribution scores can be applied in data management to quantify the contribution of individual items to conclusions from the data, as part of the explanation of what led to these conclusions. In Artificial Intelligence, Machine Learning, and Data Management, some of the common scores are deployments of the Shapley value, a formula for profit sharing in cooperative game theory. Since its invention…
Submitted 11 January, 2024; originally announced January 2024.
Comments: 12 pages, including references. This is the authors' version of the corresponding SIGMOD Record article
Journal ref: SIGMOD Rec. 52(2): 6-17 (2023)
-
Attribution-Scores in Data Management and Explainable Machine Learning
Abstract: We describe recent research on the use of actual causality in the definition of responsibility scores as explanations for query answers in databases, and for outcomes from classification models in machine learning. In the case of databases, useful connections with database repairs are illustrated and exploited. Repairs are also used to give a quantitative measure of the consistency of a database.…
Submitted 31 July, 2023; originally announced August 2023.
Comments: Paper associated to ADBIS23 tutorial. To appear. arXiv admin note: substantial text overlap with arXiv:2303.02829, arXiv:2106.10562
-
From Database Repairs to Causality in Databases and Beyond
Abstract: We describe some recent approaches to score-based explanations for query answers in databases. The focus is on work done by the author and collaborators. Special emphasis is placed on the use of counterfactual reasoning for score specification and computation. Several examples that illustrate the flexibility of these methods are shown.
Submitted 15 June, 2023; originally announced June 2023.
Comments: Contributed paper associated to keynote presentation at BDA 2022. To appear in special issue of Springer TLDKS. arXiv admin note: substantial text overlap with arXiv:2106.10562
-
Efficient Computation of Shap Explanation Scores for Neural Network Classifiers via Knowledge Compilation
Abstract: The use of Shap scores has become widespread in Explainable AI. However, their computation is in general intractable, in particular when done with a black-box classifier, such as a neural network. Recent research has unveiled classes of open-box Boolean Circuit classifiers for which Shap can be computed efficiently. We show how to transform binary neural networks into those circuits for efficient Sh…
Submitted 22 July, 2023; v1 submitted 11 March, 2023; originally announced March 2023.
Comments: Substantial revision of previous version with the same title. To appear in conference proceedings. It replaces the previously uploaded paper "Opening Up the Neural Network Classifier for Shap Score Computation", by the same authors
-
Attribution-Scores and Causal Counterfactuals as Explanations in Artificial Intelligence
Abstract: In this expository article we highlight the relevance of explanations for artificial intelligence, in general, and for the newer developments in explainable AI, referring to origins and connections of and among different approaches. We describe, in simple terms, explanations in data management and machine learning that are based on attribution-scores, and counterfactuals as found in the area…
Submitted 22 March, 2023; v1 submitted 5 March, 2023; originally announced March 2023.
Comments: Submitted as chapter contribution. In this version some additional comments were added, and some wrong equation references corrected
-
arXiv:2209.12110
Answer-Set Programs for Repair Updates and Counterfactual Interventions
Abstract: We briefly describe -- mainly through very simple examples -- different kinds of answer-set programs with annotations that have been proposed for specifying: database repairs and consistent query answering; secrecy view and query evaluation with them; counterfactual interventions for causality in databases; and counterfactual-based explanations in machine learning.
Submitted 24 September, 2022; originally announced September 2022.
Comments: Submitted to Festschrift volume
-
arXiv:2108.11004
Reasoning about Counterfactuals and Explanations: Problems, Results and Directions
Abstract: There are some recent approaches and results about the use of answer-set programming for specifying counterfactual interventions on entities under classification, and reasoning about them. These approaches are flexible and modular in that they allow the seamless addition of domain knowledge. Reasoning is enabled by query answering from the answer-set program. The programs can be used to specify an…
Submitted 24 August, 2021; originally announced August 2021.
Comments: To appear in informal proceedings of 2nd Workshop on Explainable Logic-Based Knowledge Representation (XLoKR 2021), co-located with KR 2021. arXiv admin note: substantial text overlap with arXiv:2107.10159
-
arXiv:2108.08423
Second-Order Specifications and Quantifier Elimination for Consistent Query Answering in Databases
Abstract: Consistent answers to a query from a possibly inconsistent database are answers that are simultaneously retrieved from every possible repair of the database. Repairs are consistent instances that minimally differ from the original inconsistent instance. It has been shown before that database repairs can be specified as the stable models of a disjunctive logic program. In this paper we show how to…
Submitted 18 October, 2021; v1 submitted 18 August, 2021; originally announced August 2021.
Comments: A couple of minor mistakes corrected, and some explanations added
-
Extending Sticky-Datalog+/- via Finite-Position Selection Functions: Tractability, Algorithms, and Optimization
Abstract: Weakly-Sticky (WS) Datalog+/- is an expressive member of the family of Datalog+/- program classes that is defined on the basis of the conditions of stickiness and weak-acyclicity. Conjunctive query answering (QA) over the WS programs has been investigated, and its tractability in data complexity has been established. However, the design and implementation of practical QA algorithms and their optimi…
Submitted 2 August, 2021; v1 submitted 2 August, 2021; originally announced August 2021.
Comments: Journal submission
-
Answer-Set Programs for Reasoning about Counterfactual Interventions and Responsibility Scores for Classification
Abstract: We describe how answer-set programs can be used to declaratively specify counterfactual interventions on entities under classification, and reason about them. In particular, they can be used to define and compute responsibility scores as attribution-based explanations for outcomes from classification models. The approach allows for the inclusion of domain knowledge and supports query answering. A…
Submitted 1 September, 2021; v1 submitted 21 July, 2021; originally announced July 2021.
Comments: Revised for camera ready. Extended version with appendices of paper to appear in IJCLR'21. arXiv admin note: text overlap with arXiv:2106.10562
-
Score-Based Explanations in Data Management and Machine Learning: An Answer-Set Programming Approach to Counterfactual Analysis
Abstract: We describe some recent approaches to score-based explanations for query answers in databases and outcomes from classification models in machine learning. The focus is on work done by the author and collaborators. Special emphasis is placed on declarative approaches based on answer-set programming to the use of counterfactual reasoning for score specification and computation. Several examples that…
Submitted 19 September, 2021; v1 submitted 19 June, 2021; originally announced June 2021.
Comments: Revised version for camera ready. Typos corrected, new references, and a new section with background material added. Paper associated to forthcoming short course at Fall School. arXiv admin note: text overlap with arXiv:2007.12799
-
On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results
Abstract: In Machine Learning, the $\mathsf{SHAP}$-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is an intractable problem, we prove a strong positive result stating that the $\mathsf{SHAP}$-score can be computed in polynomial time over deterministic and decom…
Submitted 30 March, 2023; v1 submitted 16 April, 2021; originally announced April 2021.
Comments: Up to the formatting, this is the exact content of the paper in Journal of Machine Learning Research (JMLR)
-
Declarative Approaches to Counterfactual Explanations for Classification
Abstract: We propose answer-set programs that specify and compute counterfactual interventions on entities that are input on a classification model. In relation to the outcome of the model, the resulting counterfactual entities serve as a basis for the definition and computation of causality-based explanation scores for the feature values in the entity under classification, namely "responsibility scores". T…
Submitted 7 December, 2021; v1 submitted 14 November, 2020; originally announced November 2020.
Comments: Camera-ready of journal version, with some final additions and revisions. Revised and considerably extended version of a RuleML-RR'20 paper [arXiv:2004.13237]. Submitted by invitation
-
arXiv:2007.14045
The Tractability of SHAP-Score-Based Explanations over Deterministic and Decomposable Boolean Circuits
Abstract: Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractab…
Submitted 3 April, 2021; v1 submitted 28 July, 2020; originally announced July 2020.
Comments: 17 pages, including 8 pages of main text. arXiv version of the AAAI'21 conference paper. Except for the addition of the technical appendix, the content is the same as the AAAI one
-
Score-Based Explanations in Data Management and Machine Learning
Abstract: We describe some approaches to explanations for observed outcomes in data management and machine learning. They are based on the assignment of numerical scores to predefined and potentially relevant inputs. More specifically, we consider explanations for query answers in databases, and for results from classification models. The described approaches are mostly of a causal and counterfactual nature…
Submitted 18 August, 2020; v1 submitted 24 July, 2020; originally announced July 2020.
Comments: Companion paper for a tutorial at the Scalable Uncertainty Management Conference (SUM'20). To appear in Proc. SUM'20. Minor fixes made
-
arXiv:2004.13237
An ASP-Based Approach to Counterfactual Explanations for Classification
Abstract: We propose answer-set programs that specify and compute counterfactual interventions as a basis for causality-based explanations to decisions produced by classification models. They can be applied with black-box models and models that can be specified as logic programs, such as rule-based classifiers. The main focus is on the specification and computation of maximum responsibility causal explanati…
Submitted 15 June, 2020; v1 submitted 27 April, 2020; originally announced April 2020.
Comments: Revised and extended version. To appear in Proc. RuleML+RR, 2020
-
Causality-based Explanation of Classification Outcomes
Abstract: We propose a simple definition of an explanation for the outcome of a classifier based on concepts from causality. We compare it with previously proposed notions of explanation, and study their complexity. We conduct an experimental evaluation with two real datasets from the financial domain.
Submitted 25 May, 2020; v1 submitted 15 March, 2020; originally announced March 2020.
Comments: 16 pages, 6 figures, 1 table
-
The Shapley Value of Tuples in Query Answering
Abstract: We investigate the application of the Shapley value to quantifying the contribution of a tuple to a query answer. The Shapley value is a widely known numerical measure in cooperative game theory and in many applications of game theory for assessing the contribution of a player to a coalition game. It has been established already in the 1950s, and is theoretically justified by being the very single…
Submitted 1 September, 2021; v1 submitted 18 April, 2019; originally announced April 2019.
Journal ref: Logical Methods in Computer Science, Volume 17, Issue 3 (September 2, 2021) lmcs:6942
-
Repair-Based Degrees of Database Inconsistency: Computation and Complexity
Abstract: We propose a generic numerical measure of the inconsistency of a database with respect to a set of integrity constraints. It is based on an abstract repair semantics. In particular, an inconsistency measure associated to cardinality-repairs is investigated in detail. More specifically, it is shown that it can be computed via answer-set programs, but sometimes its computation can be intractable in…
Submitted 22 January, 2019; v1 submitted 26 September, 2018; originally announced September 2018.
Comments: Some editing made and some new paragraphs added
-
arXiv:1804.08834
Measuring and Computing Database Inconsistency via Repairs
Abstract: We propose a generic numerical measure of inconsistency of a database with respect to a set of integrity constraints. It is based on an abstract repair semantics. A particular inconsistency measure associated to cardinality-repairs is investigated; and we show that it can be computed via answer-set programs. Keywords: Integrity constraints in databases, inconsistent databases, database repairs,…
Submitted 12 July, 2018; v1 submitted 24 April, 2018; originally announced April 2018.
Comments: Submission as short paper; to appear in Proc. Scalable Uncertainty Management, SUM 2018. Abstract and keywords added
-
Datalog: Bag Semantics via Set Semantics
Abstract: Duplicates in data management are common and problematic. In this work, we present a translation of Datalog under bag semantics into a well-behaved extension of Datalog, the so-called warded Datalog$^\pm$, under set semantics. From a theoretical point of view, this allows us to reason on bag semantics by making use of the well-established theoretical foundations of set semantics. From a prac…
Submitted 12 February, 2019; v1 submitted 16 March, 2018; originally announced March 2018.
Comments: Extended version of paper appearing in Proc. ICDT 2019
-
Specifying and Computing Causes for Query Answers in Databases via Database Repairs and Repair Programs
Abstract: A correspondence between database tuples as causes for query answers in databases and tuple-based repairs of inconsistent databases with respect to denial constraints has already been established. In this work, answer-set programs that specify repairs of databases are used as a basis for solving computational and reasoning problems about causes. Here, causes are also introduced at the attribute le…
Submitted 28 September, 2020; v1 submitted 4 December, 2017; originally announced December 2017.
Comments: To appear in "Knowledge and Information Systems" journal. This is the final version, and a much revised, corrected and extended version of: Bertossi, L. "Characterizing and Computing Causes for Query Answers in Databases from Database Repairs and Repair Programs". Proc. FoIKs, 2018, Springer LNCS 10833, pp. 55-76
-
arXiv:1704.05136
The Causality/Repair Connection in Databases: Causality-Programs
Abstract: In this work, answer-set programs that specify repairs of databases are used as a basis for solving computational and reasoning problems about causes for query answers from databases.
Submitted 26 June, 2017; v1 submitted 17 April, 2017; originally announced April 2017.
Comments: To appear in Proc. SUM'17 as short paper, 7-pages
-
Ontological Multidimensional Data Models and Contextual Data Quality
Abstract: Data quality assessment and data cleaning are context-dependent activities. Motivated by this observation, we propose the Ontological Multidimensional Data Model (OMD model), which can be used to model and represent contexts as logic-based ontologies. The data under assessment is mapped into the context, for additional analysis, processing, and quality data extraction. The resulting contexts allow…
Submitted 13 August, 2017; v1 submitted 31 March, 2017; originally announced April 2017.
Comments: Journal submission (revised version addressing reviewers' observations) Extended version of RuleML'15 paper
-
The Ontological Multidimensional Data Model
Abstract: In this extended abstract we describe, mainly by examples, the main elements of the Ontological Multidimensional Data Model, which considerably extends a relational reconstruction of the multidimensional data model proposed by Hurtado and Mendelzon by means of tuple-generating dependencies, equality-generating dependencies, and negative constraints as found in Datalog+-. We briefly mention some go…
Submitted 3 May, 2017; v1 submitted 9 March, 2017; originally announced March 2017.
Comments: Extended abstract. This version with minor revisions and slightly extended. To appear in Proc. AMW'17
-
arXiv:1611.06951
Enforcing Relational Matching Dependencies with Datalog for Entity Resolution
Abstract: Entity resolution (ER) is about identifying and merging records in a database that represent the same real-world entity. Matching dependencies (MDs) have been introduced and investigated as declarative rules that specify ER policies. An ER process induced by MDs over a dirty instance leads to multiple clean instances, in general. General "answer set programs" have been proposed to specify the MD-…
Submitted 25 February, 2017; v1 submitted 21 November, 2016; originally announced November 2016.
Comments: New revisions applied. To appear in Proc. FLAIRS'17
-
Causes for Query Answers from Databases: Datalog Abduction, View-Updates, and Integrity Constraints
Abstract: Causality has been recently introduced in databases, to model, characterize, and possibly compute causes for query answers. Connections between QA-causality and consistency-based diagnosis and database repairs (wrt. integrity constraint violations) have already been established. In this work we establish precise connections between QA-causality and both abductive diagnosis and the view-update prob…
Submitted 31 July, 2017; v1 submitted 5 November, 2016; originally announced November 2016.
Comments: To appear in International Journal of Approximate Reasoning. Extended version of "Flairs'16" and "UAI'15 WS on Causality" papers
-
arXiv:1608.04142
Contexts and Data Quality Assessment
Abstract: The quality of data is context dependent. Starting from this intuition and experience, we propose and develop a conceptual framework that captures in formal terms the notion of "context-dependent data quality". We start by proposing a generic and abstract notion of context, and also of its uses, in general and in data management in particular. On this basis, we investigate "data quality assessment…
Submitted 14 August, 2016; originally announced August 2016.
-
arXiv:1607.02682
Extending Weakly-Sticky Datalog+/-: Query-Answering Tractability and Optimizations
Abstract: Weakly-sticky (WS) Datalog+/- is an expressive member of the family of Datalog+/- programs that is based on the syntactic notions of stickiness and weak-acyclicity. Query answering over the WS programs has been investigated, but there is still much work to do on the design and implementation of practical query answering (QA) algorithms and their optimizations. Here, we study sticky and WS programs…
Submitted 9 July, 2016; originally announced July 2016.
Comments: Extended version of RR'16 paper
-
Consistency and Trust in Peer Data Exchange Systems
Abstract: We propose and investigate a semantics for "peer data exchange systems" where different peers are related by data exchange constraints and trust relationships. These two elements plus the data at the peers' sites and their local integrity constraints are made compatible via a semantics that characterizes sets of "solution instances" for the peers. They are the intended -possibly virtual- instances…
Submitted 6 June, 2016; originally announced June 2016.
Comments: To appear in Theory and Practice of Logic Programming (TPLP). It includes appendix that will be published only in electronic format
-
Complexity of Consistent Query Answering in Databases under Cardinality-Based and Incremental Repair Semantics (extended version)
Abstract: A database D may be inconsistent wrt a given set IC of integrity constraints. Consistent Query Answering (CQA) is the problem of computing from D the answers to a query that are consistent wrt IC. Consistent answers are invariant under all the repairs of D, i.e. the consistent instances that minimally depart from D. Three classes of repair have been considered in the literature: those that minimi…
Submitted 23 May, 2016; originally announced May 2016.
Comments: This paper, without the proofs provided here, arXiv:cs/0604002, appeared in the Proc. of ICDT 2007. This version contains all the proofs in correlation with the results reported in the ICDT paper (as opposed to a previous arXiv CoRR posting related to the same paper). One proof was corrected, and a corollary was added
-
arXiv:1604.06770
A Hybrid Approach to Query Answering under Expressive Datalog+/-
Abstract: Datalog+/- is a family of ontology languages that combine good computational properties with high expressive power. Datalog+/- languages are provably able to capture the most relevant Semantic Web languages. In this paper we consider the class of weakly-sticky (WS) Datalog+/- programs, which allow for certain useful forms of joins in rule bodies as well as extending the well-known class of weakly-…
Submitted 25 July, 2016; v1 submitted 22 April, 2016; originally announced April 2016.
Comments: Extended version of RR'16 paper, to appear
-
arXiv:1603.02705
Quantifying Causal Effects on Query Answering in Databases
Abstract: The notion of actual causation, as formalized by Halpern and Pearl, has been recently applied to relational databases, to characterize and compute actual causes for possibly unexpected answers to monotone queries. Causes take the form of database tuples, and can be ranked according to their causal responsibility, a numerical measure of their relevance as a cause to the query answer. In this work w…
Submitted 24 April, 2016; v1 submitted 8 March, 2016; originally announced March 2016.
Comments: To appear in Proc. TAPP'16
ACM Class: H.2; I.2
-
Causes for Query Answers from Databases, Datalog Abduction and View-Updates: The Presence of Integrity Constraints
Abstract: Causality has been recently introduced in databases, to model, characterize and possibly compute causes for query results (answers). Connections between query-answer causality, consistency-based diagnosis, database repairs (wrt. integrity constraint violations), abductive diagnosis and the view-update problem have been established. In this work we further investigate connections between query-answe…
Submitted 20 February, 2016; originally announced February 2016.
Comments: To appear in Proceedings Flairs, 2016
-
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Abstract: Entity resolution (ER), an important and common data cleaning problem, is about detecting data duplicate representations for the same external entities, and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this…
Submitted 18 January, 2017; v1 submitted 6 February, 2016; originally announced February 2016.
Comments: Final journal version, with some minor technical corrections. Extended version of arXiv:1508.06013
-
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Abstract: Entity resolution (ER), an important and common data cleaning problem, is about detecting data duplicate representations for the same external entities, and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this wo…
Submitted 24 August, 2015; originally announced August 2015.
Comments: To appear in Proc. SUM, 2015
Journal ref: Proc. SUM'15, 2015, Springer LNAI 9310, pp. 399-414
-
arXiv:1507.00257
From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back
Abstract: In this work we establish and investigate connections between causes for query answers in databases, database repairs wrt. denial constraints, and consistency-based diagnosis. The first two are relatively new research areas in databases, and the third one is an established subject in knowledge representation. We show how to obtain database repairs from causes, and the other way around. Causality p…
Submitted 23 October, 2016; v1 submitted 1 July, 2015; originally announced July 2015.
Comments: To appear in Theory of Computing Systems. By invitation to special issue with extended papers from ICDT 2015 (paper arXiv:1412.4311)
-
arXiv:1506.04299
Query-Answer Causality in Databases: Abductive Diagnosis and View-Updates
Abstract: Causality has been recently introduced in databases, to model, characterize and possibly compute causes for query results (answers). Connections between query causality and consistency-based diagnosis and database repairs (wrt. integrity constraint violations) have been established in the literature. In this work we establish connections between query causality and abductive diagnosis and the view-…
Submitted 19 September, 2015; v1 submitted 13 June, 2015; originally announced June 2015.
Comments: To appear in Proc. UAI Causal Inference Workshop, 2015. One example was fixed
-
arXiv:1504.03386
Tractable Query Answering and Optimization for Extensions of Weakly-Sticky Datalog+-
Abstract: We consider a semantic class, weakly-chase-sticky (WChS), and a syntactic subclass, jointly-weakly-sticky (JWS), of Datalog+- programs. Both extend that of weakly-sticky (WS) programs, which appear in our applications to data quality. For WChS programs we propose a practical, polynomial-time query answering algorithm (QAA). We establish that the two classes are closed under magic-sets rewritings.…
Submitted 13 April, 2015; originally announced April 2015.
Comments: To appear in Proc. Alberto Mendelzon WS on Foundations of Data Management (AMW15)
-
From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back
Abstract: In this work we establish and investigate connections between causality for query answers in databases, database repairs wrt. denial constraints, and consistency-based diagnosis. The first two are relatively new problems in databases, and the third one is an established subject in knowledge representation. We show how to obtain database repairs from causes and the other way around. Causality probl…
Submitted 13 December, 2014; originally announced December 2014.
Comments: Extended version of paper to appear in Proceedings of ICDT 2015
-
arXiv:1405.4228
Unifying Causality, Diagnosis, Repairs and View-Updates in Databases
Abstract: In this work we establish and point out connections between the notion of query-answer causality in databases and database repairs, model-based diagnosis in its consistency-based and abductive versions, and database updates through views. The mutual relationships among these areas of data management and knowledge representation shed light on each of them and help to share notions and results they…
Submitted 28 June, 2014; v1 submitted 16 May, 2014; originally announced May 2014.
Comments: On-line Proc. First International Workshop on Big Uncertain Data (BUDA 2014). Co-located with ACM PODS 2014. arXiv admin note: text overlap with arXiv:1404.6857
-
arXiv:1404.6857
Causality in Databases: The Diagnosis and Repair Connections
Abstract: In this work we establish and investigate the connections between causality for query answers in databases, database repairs wrt. denial constraints, and consistency-based diagnosis. The first two are relatively new problems in databases, and the third one is an established subject in knowledge representation. We show how to obtain database repairs from causes and the other way around. The vast bo…
Submitted 28 June, 2014; v1 submitted 27 April, 2014; originally announced April 2014.
Comments: Proc. 15th International Workshop on Non-Monotonic Reasoning (NMR 2014)
-
arXiv:1312.7373
Extending Contexts with Ontologies for Multidimensional Data Quality Assessment
Abstract: Data quality and data cleaning are context dependent activities. Starting from this observation, in previous work a context model for the assessment of the quality of a database instance was proposed. In that framework, the context takes the form of a possibly virtual database or data integration system into which a database instance under quality assessment is mapped, for additional analysis and…
Submitted 20 January, 2014; v1 submitted 27 December, 2013; originally announced December 2013.
Comments: To appear in Proc. 5th International Workshop on Data Engineering meets the Semantic Web (DESWeb). In conjunction with ICDE 2014
-
arXiv:1309.1884
Tractable vs. Intractable Cases of Matching Dependencies for Query Answering under Entity Resolution
Abstract: Matching Dependencies (MDs) are a relatively recent proposal for declarative entity resolution. They are rules that specify, on the basis of similarities satisfied by values in a database, what values should be considered duplicates, and have to be matched. On the basis of a chase-like procedure for MD enforcement, we can obtain clean (duplicate-free) instances; actually possibly several of them.…
Submitted 6 April, 2014; v1 submitted 7 September, 2013; originally announced September 2013.
-
arXiv:1304.7854
On the Complexity of Query Answering under Matching Dependencies for Entity Resolution
Abstract: Matching Dependencies (MDs) are a relatively recent proposal for declarative entity resolution. They are rules that specify, given the similarities satisfied by values in a database, what values should be considered duplicates, and have to be matched. On the basis of a chase-like procedure for MD enforcement, we can obtain clean (duplicate-free) instances; actually possibly several of them. The re…
Submitted 26 May, 2013; v1 submitted 30 April, 2013; originally announced April 2013.
Comments: To appear in Proc. of the Alberto Mendelzon International Workshop on Foundations of Data Management (AMW 2013)
-
arXiv:1112.5908
Query Answering under Matching Dependencies for Data Cleaning: Complexity and Algorithms
Abstract: Matching dependencies (MDs) have been recently introduced as declarative rules for entity resolution (ER), i.e. for identifying and resolving duplicates in a relational instance $D$. A set of MDs can be used as the basis for a possibly non-deterministic mechanism that computes a duplicate-free instance from $D$. The possible results of this process are the clean, "minimally resolved instances" (MRIs…
Submitted 26 December, 2011; originally announced December 2011.
Comments: Conference submission, 2011
-
arXiv:1106.1478
Consistent Query Answering under Spatial Semantic Constraints
Abstract: Consistent query answering is an inconsistency tolerant approach to obtaining semantically correct answers from a database that may be inconsistent with respect to its integrity constraints. In this work we formalize the notion of consistent query answer for spatial databases and spatial semantic integrity constraints. In order to do this, we first characterize conflicting spatial data, and next,…
Submitted 7 June, 2011; originally announced June 2011.
Comments: Journal submission, 2010