-
Static Analysis of Graph Database Transformations
Authors:
Iovka Boneva,
Benoit Groz,
Jan Hidders,
Filip Murlak,
Slawomir Staworko
Abstract:
We investigate graph transformations, defined using Datalog-like rules based on acyclic conjunctive two-way regular path queries (acyclic C2RPQs), and we study two fundamental static analysis problems: type checking and equivalence of transformations in the presence of graph schemas. Additionally, we investigate the problem of target schema elicitation, which aims to construct a schema that closely captures all outputs of a transformation over graphs conforming to the input schema. We show all these problems are in EXPTIME by reducing them to C2RPQ containment modulo schema; we also provide matching lower bounds. We use cycle reversing to reduce query containment to the problem of unrestricted (finite or infinite) satisfiability of C2RPQs modulo a theory expressed in a description logic.
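C2RPQs are built from two-way regular path queries (2RPQs). The paper's reductions are not reproduced here. As a background illustration only, a single 2RPQ can be evaluated by breadth-first search over the product of the graph with an NFA for its regular expression. The NFA encoding is an assumption of this sketch, and so is the convention that a symbol `p-` traverses a `p`-edge backwards. The function name is hypothetical.

```python
from collections import deque


def eval_2rpq(edges, nfa, start, finals, source):
    """Return every node y such that (source, y) answers a 2RPQ.

    edges:  set of (subject, label, object) triples.
    nfa:    dict mapping a state to a list of (symbol, next_state); the
            symbol "p" follows a p-edge forwards, "p-" follows it backwards.
    start:  initial NFA state; finals: set of accepting states.
    """
    # Index the graph by (node, symbol) in both traversal directions.
    adj = {}
    for s, p, o in edges:
        adj.setdefault((s, p), set()).add(o)
        adj.setdefault((o, p + "-"), set()).add(s)
    # Breadth-first search over pairs (graph node, NFA state).
    seen = {(source, start)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in finals:
            answers.add(node)
        for symbol, nxt_state in nfa.get(state, ()):
            for nxt in adj.get((node, symbol), ()):
                if (nxt, nxt_state) not in seen:
                    seen.add((nxt, nxt_state))
                    queue.append((nxt, nxt_state))
    return answers


# Usage: worksAt . worksAt-  ("works at the same place as")
edges = {("alice", "worksAt", "acme"), ("bob", "worksAt", "acme"),
         ("carol", "worksAt", "initech")}
colleagues = {0: [("worksAt", 1)], 1: [("worksAt-", 2)]}
print(sorted(eval_2rpq(edges, colleagues, 0, {2}, "alice")))  # ['alice', 'bob']
```

The search visits each (node, state) pair at most once, so one 2RPQ is evaluated in time polynomial in the graph and the automaton. The hardness results above concern containment and other static analysis problems, not this evaluation step.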
Submitted 20 April, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
PG-Schema: Schemas for Property Graphs
Authors:
Renzo Angles,
Angela Bonifati,
Stefania Dumbrava,
George Fletcher,
Alastair Green,
Jan Hidders,
Bei Li,
Leonid Libkin,
Victor Marsault,
Wim Martens,
Filip Murlak,
Stefan Plantikow,
Ognjen Savković,
Michael Schmidt,
Juan Sequeda,
Sławek Staworko,
Dominik Tomaszuk,
Hannes Voigt,
Domagoj Vrgoč,
Mingxi Wu,
Dušan Živković
Abstract:
Property graphs have reached a high level of maturity, witnessed by multiple robust graph database systems as well as the ongoing ISO standardization effort aiming at creating a new standard Graph Query Language (GQL). Yet, despite documented demand, schema support is limited both in existing systems and in the first version of the GQL Standard. It is anticipated that the second version of the GQL Standard will include a rich DDL. Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. It features PG-Types with flexible type definitions supporting multi-inheritance, as well as expressive constraints based on the recently proposed PG-Keys formalism. We provide the formal syntax and semantics of PG-Schema, which meet principled design requirements grounded in contemporary property graph management scenarios, and offer a detailed comparison of its features with those of existing schema languages and graph database systems.
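To make "flexible type definitions supporting multi-inheritance" concrete, here is a minimal sketch in which a node type inherits the labels and properties of all its supertypes. It does not use PG-Schema syntax or its exact semantics. The `NodeType` class, the open-world conformance check, and the example types are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeType:
    labels: set        # labels a conforming node must carry
    properties: dict   # property name -> expected Python type
    supertypes: list = field(default_factory=list)  # names of parent types


def resolve(types, name):
    """Collect the labels and properties of a type and all its supertypes.

    Assumes the inheritance hierarchy is acyclic.
    """
    t = types[name]
    labels, props = set(), {}
    for sup in t.supertypes:
        sup_labels, sup_props = resolve(types, sup)
        labels |= sup_labels
        props.update(sup_props)
    labels |= t.labels
    props.update(t.properties)  # the type's own declarations win
    return labels, props


def conforms(node_labels, node_props, types, name):
    """Open-world check: extra labels and properties are allowed."""
    labels, props = resolve(types, name)
    if not labels <= set(node_labels):
        return False
    return all(k in node_props and isinstance(node_props[k], ty)
               for k, ty in props.items())


TYPES = {
    "Person": NodeType({"Person"}, {"name": str}),
    "Employee": NodeType({"Employee"}, {"salary": int}, ["Person"]),
    "Student": NodeType({"Student"}, {"school": str}, ["Person"]),
    # Multi-inheritance: a working student is both an employee and a student.
    "WorkingStudent": NodeType(set(), {}, ["Employee", "Student"]),
}
```

In this sketch a node conforms to `WorkingStudent` only if it carries the `Person`, `Employee`, and `Student` labels and has `name`, `salary`, and `school` properties of the declared types.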
Submitted 8 July, 2023; v1 submitted 20 November, 2022;
originally announced November 2022.
-
Inference of Shape Expression Schemas Typed RDF Graphs
Authors:
Benoît Groz,
Aurélien Lemay,
Sławek Staworko,
Piotr Wieczorek
Abstract:
We consider the problem of constructing a Shape Expression Schema (ShEx) that describes the structure of a given input RDF graph. We employ the framework of grammatical inference, where the objective is to find an inference algorithm that is both sound, i.e., always producing a schema that validates the input RDF graph, and complete, i.e., able to produce any schema, within a given class of schemas, provided that a sufficiently informative input graph is presented. We study the case where the input graph is typed, i.e., every node is given with its types. We limit our attention to a practical fragment ShEx0 of Shape Expressions Schemas that has an equivalent graphical representation in the form of shape graphs. We investigate the problem of constructing a canonical representative of a given shape graph. Finally, we present a sound and complete algorithm for shape graphs, thus showing that ShEx0 is learnable from typed graphs.
Submitted 10 July, 2021;
originally announced July 2021.
-
Threshold Queries in Theory and in the Wild
Authors:
Angela Bonifati,
Stefania Dumbrava,
George Fletcher,
Jan Hidders,
Matthias Hofer,
Wim Martens,
Filip Murlak,
Joshua Shinavier,
Sławek Staworko,
Dominik Tomaszuk
Abstract:
Threshold queries are an important class of queries that only require computing or counting answers up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. In this paper, we present a deep theoretical analysis of threshold query evaluation and show that thresholds can be used to significantly improve the asymptotic bounds of state-of-the-art query evaluation algorithms. We also empirically show that threshold queries are significant in practice. In surprising contrast to conventional wisdom, we found important scenarios in real-world data sets in which users are interested in computing the results of queries up to a certain threshold, independent of a ranking function that orders the query results by importance.
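The semantics of a threshold query is easy to state: instead of the full answer count n, report min(k, n) for a threshold k. The toy sketch below shows only why that permits early termination on a lazy answer stream. It does not reproduce the paper's improved evaluation algorithms. The function names are hypothetical.

```python
def count_up_to(answers, k):
    """Return min(k, number of distinct answers), stopping early.

    `answers` may be a lazy iterator; at most as many items are consumed
    as needed to either reach k distinct answers or exhaust the stream.
    """
    if k <= 0:
        return 0
    seen = set()
    for a in answers:
        seen.add(a)
        if len(seen) >= k:
            break
    return len(seen)


def naturals():
    """An unbounded answer stream, used to show early termination."""
    n = 0
    while True:
        yield n
        n += 1


print(count_up_to(naturals(), 5))  # 5, even though the stream never ends
```
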
Submitted 17 November, 2021; v1 submitted 29 June, 2021;
originally announced June 2021.
-
A note on the class of languages generated by F-systems over regular languages
Authors:
Jorge C. Lucero,
Sławek Staworko
Abstract:
An F-system is a computational model that performs a folding operation on words of a given language, following directions coded on words of another given language. This paper considers the case in which both given languages are regular, and it shows that the class of languages generated by such F-systems is a proper subset of the class of linear context-free languages.
Submitted 11 May, 2022; v1 submitted 30 July, 2020;
originally announced July 2020.
-
Consistency and Certain Answers in Relational to RDF Data Exchange with Shape Constraints
Authors:
Iovka Boneva,
Jose Lozano,
Sławek Staworko
Abstract:
We investigate data exchange from relational databases to RDF graphs, inspired by R2RML, with the addition of target shape schemas. We study the problems of consistency, i.e., checking that every source instance admits a solution, and certain query answering, i.e., finding answers present in every solution. We identify the class of constructive relational to RDF data exchange that uses IRI constructors and full tgds (with no existential variables) in its source-to-target dependencies. We show that the consistency problem is coNP-complete. We introduce the notion of a universal simulation solution, which allows one to compute certain query answers for any class of queries that is robust under simulation. One such class is that of forward nested regular expressions (NREs), i.e., NREs that do not use the inverse operation. Using the universal simulation solution renders the computation of certain answers to forward NREs tractable (in data complexity). Finally, we present a number of results showing that relaxing the restrictions of the proposed framework leads to an increase in complexity.
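The universal simulation solution relies on the notion of graph simulation. As background, here is the textbook fixpoint computation of the greatest simulation between two edge-labelled graphs. It is not the paper's construction of the universal simulation solution. For brevity it ignores node values (IRIs and literals), which a real setting would also have to match.

```python
def greatest_simulation(g_edges, h_edges, g_nodes, h_nodes):
    """Largest relation S such that (u, v) in S implies that every
    a-labelled edge u -> u2 of G is matched by some a-labelled edge
    v -> v2 of H with (u2, v2) in S. Edges are (source, label, target)."""
    succ_h = {}
    for s, a, o in h_edges:
        succ_h.setdefault((s, a), set()).add(o)
    # Start from all pairs and delete pairs violating the condition.
    sim = {(u, v) for u in g_nodes for v in h_nodes}
    changed = True
    while changed:
        changed = False
        for u, v in list(sim):
            for s, a, o in g_edges:
                if s != u:
                    continue
                if not any((o, v2) in sim for v2 in succ_h.get((v, a), ())):
                    sim.discard((u, v))
                    changed = True
                    break
    return sim
```

In this example, G has one edge x -p-> y and H has one edge 1 -p-> 2. Here x is simulated only by 1, while y, having no outgoing edges, is simulated by every node of H.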
Submitted 30 March, 2020;
originally announced March 2020.
-
Relational to RDF Data Exchange in Presence of a Shape Expression Schema
Authors:
Iovka Boneva,
Jose Lozano,
Sławek Staworko
Abstract:
We study the relational to RDF data exchange problem, where the target constraints are specified using a Shape Expression schema (ShEx). We investigate two fundamental problems: 1) consistency, i.e., checking for a given data exchange setting whether a solution always exists for any source instance, and 2) constructing a universal solution, i.e., a solution that represents the space of all solutions. We propose to use typed IRI constructors in source-to-target tuple-generating dependencies to create the IRIs of the RDF graph from the values in the relational instance, and we translate ShEx into a set of target dependencies. We also identify data exchange settings that are key covered, a property that is decidable and guarantees consistency. Furthermore, we show that this property is a necessary and sufficient condition for the existence of universal solutions for a practical subclass of weakly-recursive ShEx.
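Typed IRI constructors inside source-to-target dependencies can be illustrated with a single full tgd, which has no existential variables. Every target IRI is then built deterministically from source values, and distinct types never produce the same IRI. This is a minimal sketch under stated assumptions: the relation `Emp`, the vocabulary, the namespace, and the function names are hypothetical. ShEx target constraints are not checked here.

```python
BASE = "http://example.org/"  # hypothetical namespace


def iri(kind, value):
    """Typed IRI constructor: IRIs of different kinds never coincide."""
    return f"<{BASE}{kind}/{value}>"


def exchange(emp_rows):
    """Apply the hypothetical full s-t tgd
         Emp(i, n, d) -> Triple(emp(i), :name, n) AND Triple(emp(i), :worksIn, dept(d))
    to every source tuple and return the resulting set of RDF triples."""
    triples = set()
    for i, n, d in emp_rows:
        e = iri("emp", i)
        triples.add((e, f"<{BASE}name>", f'"{n}"'))
        triples.add((e, f"<{BASE}worksIn>", iri("dept", d)))
    return triples


rdf = exchange([(1, "Ann", "D1"), (2, "Bob", "D1")])
```

Because `dept(D1)` yields the same IRI for both employees, the two source tuples are linked to one department node in the target graph.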
Submitted 30 April, 2018;
originally announced April 2018.
-
Containment of Shape Expression Schemas for RDF
Authors:
Slawek Staworko,
Piotr Wieczorek
Abstract:
We study the problem of containment for shape expression schemas (ShEx) for RDF graphs. We identify a subclass of ShEx that has a natural graphical representation in the form of shape graphs, whose semantics is captured by a tractable notion of embedding of an RDF graph in a shape graph. When applied to pairs of shape graphs, an embedding is a sufficient condition for containment, and for a practical subclass of deterministic shape graphs, it is also a necessary one, thus yielding a subclass with tractable containment. While for general shape graphs a minimal counter-example, i.e., an instance proving non-containment, may be of exponential size, we show that containment is EXP-hard and in coNEXP. Finally, we show that containment for arbitrary ShEx is coNEXP-hard and in co2NEXP^NP.
Submitted 26 March, 2018; v1 submitted 20 March, 2018;
originally announced March 2018.
-
RDF Graph Alignment with Bisimulation
Authors:
Peter Buneman,
Sławek Staworko
Abstract:
We investigate the problem of aligning two RDF databases, an essential problem in understanding the evolution of ontologies. Our approaches address three fundamental challenges: 1) the use of "blank" (null) names, 2) ontology changes in which different names are used to identify the same entity, and 3) small changes in the data values as well as small changes in the graph structure of the RDF database. We propose approaches inspired by the classical notion of graph bisimulation and extend them to capture the natural metrics of edit distance on the data values and the graph structure. We evaluate our methods on three evolving curated data sets. Overall, our results show that the proposed methods perform well and are scalable.
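The classical notion of graph bisimulation that these approaches start from can be computed by naive partition refinement. The sketch below gives all blank nodes one initial colour, so that they are distinguished only by their outgoing structure. It does not include the paper's edit-distance extensions. The initial colouring and the function name are assumptions.

```python
def bisimulation_classes(nodes, edges, initial_color):
    """Naive partition refinement for (forward) bisimulation.

    Returns a dict node -> class id; two nodes share an id iff they are
    bisimilar with respect to initial_color and outgoing labelled edges.
    """
    out = {n: [] for n in nodes}
    for s, p, o in edges:
        out[s].append((p, o))
    color = {n: initial_color(n) for n in nodes}
    while True:
        # A node's signature: its colour plus the colours it can reach.
        signature = {
            n: (color[n], frozenset((p, color[o]) for p, o in out[n]))
            for n in nodes
        }
        ids = {}
        new_color = {n: ids.setdefault(signature[n], len(ids)) for n in nodes}
        # Refinement only splits classes; an unchanged count means stable.
        if len(ids) == len(set(color.values())):
            return new_color
        color = new_color


def blank_aware(n):
    """Blank nodes ("_:...") start indistinguishable; IRIs and literals keep their name."""
    return "blank" if n.startswith("_:") else n


nodes = ["_:b1", "_:b2", "_:b3", '"Ann"', '"Bob"']
edges = [("_:b1", "name", '"Ann"'), ("_:b2", "name", '"Ann"'), ("_:b3", "name", '"Bob"')]
classes = bisimulation_classes(nodes, edges, blank_aware)
```

Here `_:b1` and `_:b2` end up in the same class because both point to the literal `"Ann"`. `_:b3` is separated because it points to `"Bob"`.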
Submitted 28 June, 2016;
originally announced June 2016.
-
Shape Expressions Schemas
Authors:
Iovka Boneva,
Jose E. Labra Gayo,
Eric G. Prud'hommeaux,
Sławek Staworko
Abstract:
We present Shape Expressions (ShEx), an expressive schema language for RDF designed to provide a high-level, user-friendly syntax with intuitive semantics. ShEx allows one to describe the vocabulary and the structure of an RDF graph, and to constrain the allowed values for the properties of a node. It includes an algebraic grouping operator, a choice operator, cardinality constraints for the number of allowed occurrences of a property, and negation. We define the semantics of the language and illustrate it with examples. We then present a validation algorithm that, given a node in an RDF graph and a constraint defined by the ShEx schema, checks whether the node satisfies that constraint. The algorithm outputs a proof that contains trivially verifiable associations of nodes and the constraints that they satisfy. This structure can be used for complex post-processing tasks, such as transforming the RDF graph into other graph or tree structures, verifying more complex constraints, or debugging (w.r.t. the schema). We also show the inherent difficulty of error identification in ShEx.
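The following is a deliberately simplified sketch of validating a node against a shape with cardinality constraints. It returns a small "proof" that maps each constrained predicate to the objects that satisfied it. The shape format is flat and hypothetical: each predicate has a value check and a [min, max] interval. Predicates not mentioned in the shape are ignored. Grouping, choice, negation, and recursive shape references are not handled, so this is not the paper's validation algorithm.

```python
def validate(node, triples, shape):
    """Check `node` against a flat shape: dict predicate -> (check, lo, hi),
    where hi=None means unbounded. Returns a dict predicate -> list of
    matched objects (a simple proof) on success, or None on failure."""
    proof = {}
    for pred, (check, lo, hi) in shape.items():
        objs = [o for s, p, o in triples if s == node and p == pred]
        if not all(check(o) for o in objs):
            return None  # some value violates the value constraint
        if len(objs) < lo or (hi is not None and len(objs) > hi):
            return None  # cardinality constraint violated
        proof[pred] = objs
    return proof


PERSON = {  # hypothetical shape: exactly one name, any emails, optional age
    "name": (lambda o: isinstance(o, str), 1, 1),
    "email": (lambda o: isinstance(o, str), 0, None),
    "age": (lambda o: isinstance(o, int), 0, 1),
}
triples = [("alice", "name", "Alice"), ("alice", "email", "a@x.org"),
           ("alice", "email", "a@y.org"), ("bob", "email", "b@x.org")]
```

With these triples, `alice` validates and yields a proof listing her name and both emails. `bob` fails because he has no `name`.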
Submitted 16 November, 2015; v1 submitted 19 October, 2015;
originally announced October 2015.
-
Schemas for Unordered XML on a DIME
Authors:
Iovka Boneva,
Radu Ciucanu,
Sławek Staworko
Abstract:
We investigate schema languages for unordered XML having no relative order among siblings. First, we propose unordered regular expressions (UREs), essentially regular expressions with unordered concatenation instead of standard concatenation, that define languages of unordered words to model the allowed content of a node (i.e., collections of the labels of children). However, unrestricted UREs are computationally too expensive as we show the intractability of two fundamental decision problems for UREs: membership of an unordered word to the language of a URE and containment of two UREs. Consequently, we propose a practical and tractable restriction of UREs, disjunctive interval multiplicity expressions (DIMEs).
Next, we employ DIMEs to define languages of unordered trees and propose two schema languages: disjunctive interval multiplicity schema (DIMS), and its restriction, disjunction-free interval multiplicity schema (IMS). We study the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, as well as twig query satisfiability, implication, and containment in the presence of schema. Finally, we study the expressive power of the proposed schema languages and compare them with yardstick languages of unordered trees (FO, MSO, and Presburger constraints) and DTDs under commutative closure. Our results show that the proposed schema languages are capable of expressing many practical languages of unordered trees and enjoy desirable computational properties.
Submitted 28 October, 2014; v1 submitted 28 November, 2013;
originally announced November 2013.
-
Learning Schemas for Unordered XML
Authors:
Radu Ciucanu,
Slawek Staworko
Abstract:
We consider unordered XML, where the relative order among siblings is ignored, and we investigate the problem of learning schemas from examples given by the user. We focus on the schema formalisms proposed in [10]: disjunctive multiplicity schemas (DMS) and its restriction, disjunction-free multiplicity schemas (MS). A learning algorithm takes as input a set of XML documents which must satisfy the schema (i.e., positive examples) and a set of XML documents which must not satisfy the schema (i.e., negative examples), and returns a schema consistent with the examples. We investigate a learning framework inspired by Gold [18], where a learning algorithm should be sound, i.e., always return a schema consistent with the examples given by the user, and complete, i.e., able to produce every schema with a sufficiently rich set of examples. Additionally, the algorithm should be efficient, i.e., polynomial in the size of the input. We prove that the DMS are learnable from positive examples only, but they are not learnable when we also allow negative examples. Moreover, we show that the MS are learnable in the presence of positive examples only, and also in the presence of both positive and negative examples. Furthermore, for the learnable cases, the proposed learning algorithms return minimal schemas consistent with the examples.
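The idea of returning a minimal schema from positive examples can be illustrated for the content of a single element label. The sketch assumes the multiplicity alphabet {1, ?, +, *}. For each child label it picks the tightest multiplicity that covers every observed count. This is a simplified illustration, not the paper's full learning algorithm for DMS or MS, and the function name is hypothetical.

```python
from collections import Counter


def learn_multiplicities(examples):
    """Given positive examples (each a list of child labels of one element
    type), return the tightest multiplicity in {1, ?, +, *} per label."""
    counts = [Counter(e) for e in examples]
    labels = set().union(*counts)
    result = {}
    for a in sorted(labels):
        lo = min(c[a] for c in counts)  # Counter returns 0 for absent labels
        hi = max(c[a] for c in counts)
        if hi <= 1:
            result[a] = "1" if lo == 1 else "?"
        else:
            result[a] = "+" if lo >= 1 else "*"
    return result


books = [["title", "author", "author"], ["title", "author", "year"]]
```

For `books`, the sketch infers `title` as `1`, `author` as `+`, and `year` as `?`. Adding an example with no author relaxes `author` to `*`.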
Submitted 25 July, 2013; v1 submitted 24 July, 2013;
originally announced July 2013.
-
Simple Schemas for Unordered XML
Authors:
Iovka Boneva,
Radu Ciucanu,
Slawek Staworko
Abstract:
We consider unordered XML, where the relative order among siblings is ignored, and propose two simple yet practical schema formalisms: disjunctive multiplicity schemas (DMS), and its restriction, disjunction-free multiplicity schemas (MS). We investigate their computational properties and characterize the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, twig query satisfiability, implication, and containment in the presence of schema. Our research indicates that the proposed formalisms retain much of the expressiveness of DTDs without an increase in computational complexity.
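Membership of a tree in a disjunction-free multiplicity schema (MS) can be checked node by node. Each label has a rule that assigns every allowed child label a multiplicity, and the order of siblings is irrelevant. The sketch below assumes multiplicities in {1, ?, +, *}. It treats labels without a rule as leaves. The schema encoding and function names are hypothetical.

```python
from collections import Counter

INTERVALS = {"1": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def satisfies(children, expr):
    """children: list of child labels (order ignored);
    expr: dict child label -> multiplicity."""
    counts = Counter(children)
    if any(a not in expr for a in counts):
        return False  # a child label the rule does not allow
    for a, m in expr.items():
        lo, hi = INTERVALS[m]
        if counts[a] < lo or (hi is not None and counts[a] > hi):
            return False
    return True


def tree_in_schema(tree, schema, root_label):
    """tree is (label, [subtrees]); schema maps a label to its rule."""
    def check(node):
        label, kids = node
        expr = schema.get(label, {})  # no rule: the node must be a leaf
        return satisfies([k[0] for k in kids], expr) and all(check(k) for k in kids)
    return tree[0] == root_label and check(tree)


BOOK = {"title": "1", "author": "+", "year": "?"}
SCHEMA = {"library": {"book": "*"}, "book": BOOK}
```

Each node is checked once against a counter of its children, so this check runs in time linear in the size of the tree (for a fixed schema).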
Submitted 20 June, 2013; v1 submitted 18 March, 2013;
originally announced March 2013.
-
On Injective Embeddings of Tree Patterns
Authors:
Jakub Michaliszyn,
Anca Muscholl,
Sławek Staworko,
Piotr Wieczorek,
Zhilin Wu
Abstract:
We study three different kinds of embeddings of tree patterns: weakly-injective, ancestor-preserving, and lca-preserving. While each of them is often referred to as injective embedding, they form a proper hierarchy and their computational properties vary (from P to NP-complete). We present a thorough study of the complexity of the model checking problem, i.e., whether there is an embedding of a given tree pattern in a given tree, and we investigate the impact of various restrictions imposed on the tree pattern: bound on the degree of a node, bound on the height, and type of allowed labels and edges.
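As we understand the weakest of the three notions, a weakly-injective embedding is an injective mapping of pattern nodes to tree nodes. It must preserve labels (with `*` as a wildcard), send child edges (`/`) to child edges, and send descendant edges (`//`) to proper-descendant pairs. The brute-force backtracking sketch below checks for such an embedding. It is exponential in the worst case, which is consistent with the NP-completeness mentioned above. The ancestor-preserving and lca-preserving variants would need extra checks that are not shown. The encoding and function names are hypothetical, and the pattern root may map to any tree node.

```python
def is_ancestor(parent, a, b):
    """True if a is a proper ancestor of b in the tree given by `parent`."""
    b = parent.get(b)
    while b is not None:
        if b == a:
            return True
        b = parent.get(b)
    return False


def weakly_injective_embedding(pattern, tree_labels, tree_parent):
    """pattern: list of (node, label, parent, edge) in preorder, where edge is
    "/" or "//" and the root has parent None. Returns a mapping or None."""
    def search(i, mapping, used):
        if i == len(pattern):
            return dict(mapping)
        node, label, par, edge = pattern[i]
        for t, t_label in tree_labels.items():
            if t in used or (label != "*" and label != t_label):
                continue  # injectivity or label mismatch
            if par is not None:
                img = mapping[par]
                if edge == "/" and tree_parent.get(t) != img:
                    continue
                if edge == "//" and not is_ancestor(tree_parent, img, t):
                    continue
            mapping[node] = t
            used.add(t)
            found = search(i + 1, mapping, used)
            if found is not None:
                return found
            del mapping[node]  # backtrack
            used.discard(t)
        return None
    return search(0, {}, set())


TWO_B_CHILDREN = [("x", "a", None, None), ("y", "b", "x", "/"), ("z", "b", "x", "/")]
```

For example, the pattern `TWO_B_CHILDREN` has no weakly-injective embedding into a tree whose `a`-root has a single `b`-child. A non-injective embedding would still exist, because both pattern nodes could map to that one child.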
Submitted 28 April, 2012; v1 submitted 22 April, 2012;
originally announced April 2012.
-
Learning XML Twig Queries
Authors:
Sławomir Staworko,
Piotr Wieczorek
Abstract:
We investigate the problem of learning XML queries, namely path queries and tree pattern queries, from examples given by the user. A learning algorithm takes as input a set of XML documents with nodes annotated by the user and returns a query that selects the nodes in a manner consistent with the annotation. We study two learning settings that differ in the types of annotations. In the first setting, the user may only indicate required nodes that the query must return. In the second, more general, setting, the user may also indicate forbidden nodes that the query must not return. The query may or may not return any node with no annotation. We formalize what it means for a class of queries to be learnable. One requirement is the existence of a learning algorithm that is sound, i.e., always returns a query consistent with the examples given by the user. Furthermore, the learning algorithm should be complete, i.e., able to produce every query with a sufficiently rich example. Other requirements involve tractability of learning and its robustness to nonessential examples. We show that the classes of simple path queries and path-subsumption-free tree queries are learnable from positive examples. The learnability of the full class of tree pattern queries (and the full class of path queries) remains an open question. We also show that adding negative examples to the picture renders learning infeasible.
Published in ICDT 2012, Berlin.
Submitted 20 April, 2012; v1 submitted 19 June, 2011;
originally announced June 2011.
-
Prioritized Repairing and Consistent Query Answering in Relational Databases
Authors:
Slawomir Staworko,
Jan Chomicki,
Jerzy Marcinkowski
Abstract:
A consistent query answer in an inconsistent database is an answer obtained in every (minimal) repair. The repairs are obtained by resolving all conflicts in all possible ways. Often, however, the user is able to provide a preference on how conflicts should be resolved. We investigate here the framework of preferred consistent query answers, in which user preferences are used to narrow down the set of repairs to a set of preferred repairs. We axiomatize desirable properties of preferred repairs. We present three different families of preferred repairs and study their mutual relationships. Finally, we investigate the complexity of preferred repairing and computing preferred consistent query answers.
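The definition of a consistent query answer can be made concrete for the simplest constraint, a single key. The repairs of a relation are then exactly the ways of keeping one tuple per key value. A consistent answer is one returned in every repair. The sketch below enumerates repairs explicitly, which takes exponential time, and it does not implement any of the preferred-repair families. The relation and helper names are hypothetical.

```python
from itertools import product


def repairs(tuples, key):
    """All repairs w.r.t. a single key: keep exactly one tuple per key value."""
    groups = {}
    for t in tuples:
        groups.setdefault(key(t), []).append(t)
    for choice in product(*groups.values()):
        yield set(choice)


def consistent_answers(tuples, key, query):
    """Answers that `query` returns in every repair."""
    result = None
    for r in repairs(tuples, key):
        answers = set(query(r))
        result = answers if result is None else result & answers
    return result if result is not None else set()


# Emp(name, dept) with key `name`; "ann" violates the key.
emp = [("ann", "sales"), ("ann", "hr"), ("bob", "it")]
by_name = lambda t: t[0]
```

Every employee name is a consistent answer, because each repair keeps one tuple for "ann". Only the department "it" is consistent, because "sales" and "hr" each disappear in one repair. A preferred-repair variant would filter the generated repairs before intersecting.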
Submitted 7 October, 2011; v1 submitted 4 August, 2009;
originally announced August 2009.
-
Consistent Query Answers in the Presence of Universal Constraints
Authors:
Slawomir Staworko,
Jan Chomicki
Abstract:
The framework of consistent query answers and repairs has been introduced to alleviate the impact of inconsistent data on the answers to a query. A repair is a minimally different consistent instance and an answer is consistent if it is present in every repair. In this article we study the complexity of consistent query answers and repair checking in the presence of universal constraints.
We propose an extended version of the conflict hypergraph, which allows one to capture all repairs w.r.t. a set of universal constraints. We show that repair checking is in PTIME for the class of full tuple-generating dependencies and denial constraints, and we present a polynomial repair algorithm. This algorithm is sound, i.e., it always produces a repair, and also complete, i.e., every repair can be constructed. Next, we present a polynomial-time algorithm computing consistent answers to ground quantifier-free queries in the presence of denial constraints, join dependencies, and acyclic full tuple-generating dependencies. Finally, we show that extending the class of constraints leads to intractability. For arbitrary full tuple-generating dependencies, consistent query answering becomes coNP-complete. For arbitrary universal constraints, consistent query answering is Π_2^p-complete and repair checking is coNP-complete.
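For denial constraints alone, the standard (non-extended) conflict hypergraph suffices. Its vertices are tuples, and its hyperedges are minimal sets of tuples that jointly violate a constraint. The repairs are then exactly the maximal independent sets, i.e., the maximal sets containing no whole hyperedge. The sketch below shows the polynomial repair check for that case and a brute-force enumeration of repairs. It does not implement the paper's extended hypergraph for tuple-generating dependencies. The function names are hypothetical.

```python
from itertools import combinations


def is_independent(subset, hyperedges):
    """No conflict (hyperedge) lies entirely inside `subset`."""
    return not any(e <= subset for e in hyperedges)


def is_repair(candidate, tuples, hyperedges):
    """Polynomial check: independent, and adding any other tuple creates a conflict."""
    s = frozenset(candidate)
    return is_independent(s, hyperedges) and all(
        not is_independent(s | {t}, hyperedges) for t in set(tuples) - s)


def repairs_from_hypergraph(tuples, hyperedges):
    """Enumerate all maximal independent sets (exponential brute force)."""
    tuples = list(tuples)
    independent = [
        frozenset(c)
        for k in range(len(tuples) + 1)
        for c in combinations(tuples, k)
        if is_independent(frozenset(c), hyperedges)
    ]
    return {s for s in independent if not any(s < t for t in independent)}


# Tuples a and b conflict with each other; c is conflict-free.
conflicts = [frozenset({"a", "b"})]
```

Checking single-tuple extensions is enough for maximality here, because any subset of an independent set is again independent.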
Submitted 19 February, 2009; v1 submitted 9 September, 2008;
originally announced September 2008.
-
Priority-Based Conflict Resolution in Inconsistent Relational Databases
Authors:
Slawomir Staworko,
Jan Chomicki
Abstract:
We study here the impact of priorities on conflict resolution in inconsistent relational databases. We extend the framework of repairs and consistent query answers. We propose a set of postulates that an extended framework should satisfy and consider two instantiations of the framework: (locally preferred) l-repairs and (globally preferred) g-repairs. We study the relationships between them and the impact each notion of repair has on the computational complexity of repair checking and consistent query answers.
Submitted 14 June, 2005;
originally announced June 2005.