-
Jumping Evaluation of Nested Regular Path Queries
Authors:
Joachim Niehren,
Sylvain Salvati,
Rustam Azimov
Abstract:
Nested regular path queries are used for querying graph databases and RDF triple stores. We propose a new algorithm for evaluating nested regular path queries on a graph from a set of start nodes in combined linear time. We show that this complexity upper bound can be reduced by making it dependent on the size of the query's top-down needed subgraph, a notion that we introduce. For many queries in practice, the top-down needed subgraph is much smaller than the whole graph. Our algorithm is based on a novel compilation schema from nested regular path queries to monadic datalog queries. Its complexity upper bound follows from known properties of top-down datalog evaluation. As an application, we show that our algorithm makes it possible to reformulate in simple terms a variant of a very efficient automata-based algorithm proposed by Maneth and Nguyen that evaluates navigational path queries in datatrees based on indexes and jumping. Moreover, it overcomes some limitations of Maneth and Nguyen's algorithm: it is not bound to trees and applies to graphs; it is not limited to forward navigational XPath but can treat any nested regular path query; and it can be implemented efficiently without any dedicated techniques, by using any efficient datalog evaluator such as LogicBlox.
Submitted 5 August, 2022;
originally announced August 2022.
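To make the query class in the abstract above concrete, here is a minimal sketch of what a nested regular path query computes when started from a set of nodes. It is a naive evaluator of the semantics only: it is not the authors' compilation to monadic datalog, and it does not achieve the combined linear time bound. The tuple-based query encoding, the `evaluate` function, and the graph representation (a dict from node to a list of `(label, target)` pairs) are all assumptions made for illustration.

```python
def evaluate(query, graph, starts):
    """Return the set of nodes reachable from `starts` via `query`.

    Hypothetical query encoding (not from the paper):
      ("edge", label)     follow one edge with the given label
      ("concat", q1, q2)  q1 followed by q2
      ("union", q1, q2)   q1 or q2
      ("star", q)         zero or more repetitions of q
      ("filter", q)       nested test: stay in place if q has a match from here
    """
    kind = query[0]
    if kind == "edge":
        return {v for u in starts for (label, v) in graph.get(u, ()) if label == query[1]}
    if kind == "concat":
        return evaluate(query[2], graph, evaluate(query[1], graph, starts))
    if kind == "union":
        return evaluate(query[1], graph, starts) | evaluate(query[2], graph, starts)
    if kind == "star":
        reached = set(starts)
        frontier = set(starts)
        while frontier:  # breadth-first saturation over newly reached nodes
            frontier = evaluate(query[1], graph, frontier) - reached
            reached |= frontier
        return reached
    if kind == "filter":
        # Naive: re-evaluates the nested query once per candidate node.
        return {u for u in starts if evaluate(query[1], graph, {u})}
    raise ValueError(f"unknown query kind: {kind!r}")


# Example graph: 0 -a-> 1, 0 -a-> 2, 1 -b-> 3, 2 -c-> 4
example_graph = {0: [("a", 1), ("a", 2)], 1: [("b", 3)], 2: [("c", 4)]}
# Follow an a-edge, then keep only nodes that have an outgoing b-edge.
a_then_has_b = ("concat", ("edge", "a"), ("filter", ("edge", "b")))
answers = evaluate(a_then_has_b, example_graph, {0})
```

Only nodes reachable from the start set are ever visited, which loosely mirrors the motivation for a top-down needed subgraph, although the notion defined in the paper is more precise than this sketch suggests.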
-
One Algorithm to Evaluate Them All: Unified Linear Algebra Based Approach to Evaluate Both Regular and Context-Free Path Queries
Authors:
Ekaterina Shemetova,
Rustam Azimov,
Egor Orachev,
Ilya Epelbaum,
Semyon Grigorev
Abstract:
The Kronecker product-based algorithm for context-free path querying (CFPQ) was proposed by Orachev et al. (2020). We reduce this algorithm to operations over Boolean matrices and extend it with a mechanism to extract all paths of interest. We also prove an $O(n^3/\log{n})$ time complexity for the proposed algorithm, where $n$ is the number of vertices of the input graph. Thus, we provide an alternative way to construct a slightly subcubic algorithm for CFPQ, based on linear algebra and incremental transitive closure (a classic graph-theoretic problem), as opposed to the algorithm with the same complexity proposed by Chaudhuri (2008). Our evaluation shows that our algorithm is a good candidate to be the universal algorithm for both regular and context-free path querying.
Submitted 26 March, 2021;
originally announced March 2021.
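The core idea named in the abstract above, combining a query automaton with a graph through a Kronecker product of Boolean matrices followed by a transitive closure, can be sketched for the regular case as follows. This is a simplified illustration under stated assumptions: it covers only regular path queries given as a finite automaton, it omits the iteration needed for context-free queries as well as path extraction, and it uses dense Python lists rather than a sparse linear algebra library. The function names (`bool_matrix`, `kron`, `transitive_closure`, `rpq`) are hypothetical.

```python
def bool_matrix(size, pairs):
    """Build a size x size Boolean matrix with True at the given (row, col) pairs."""
    m = [[False] * size for _ in range(size)]
    for i, j in pairs:
        m[i][j] = True
    return m


def kron(a, b):
    """Boolean Kronecker product of two square Boolean matrices."""
    na, nb = len(a), len(b)
    size = na * nb
    return [[a[i // nb][j // nb] and b[i % nb][j % nb] for j in range(size)]
            for i in range(size)]


def transitive_closure(m):
    """Warshall's algorithm: paths of length >= 1."""
    n = len(m)
    c = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            if c[i][k]:
                for j in range(n):
                    if c[k][j]:
                        c[i][j] = True
    return c


def rpq(n_states, start, finals, transitions, n_nodes, edges):
    """Pairs (u, v) joined by a non-empty path whose labels the automaton accepts.

    transitions: (state, label, state) triples; edges: (node, label, node) triples.
    """
    size = n_states * n_nodes
    product = [[False] * size for _ in range(size)]
    labels = {l for _, l, _ in transitions} & {l for _, l, _ in edges}
    for label in labels:
        a = bool_matrix(n_states, [(p, q) for p, l, q in transitions if l == label])
        g = bool_matrix(n_nodes, [(u, v) for u, l, v in edges if l == label])
        k = kron(a, g)
        for i in range(size):
            for j in range(size):
                product[i][j] = product[i][j] or k[i][j]
    closure = transitive_closure(product)
    # Product index of (state q, node v) is q * n_nodes + v.
    return {(u, v)
            for u in range(n_nodes) for v in range(n_nodes) for f in finals
            if closure[start * n_nodes + u][f * n_nodes + v]}


# Automaton for the expression a b*: state 0 -a-> 1, state 1 -b-> 1, final state 1.
# Graph: 0 -a-> 1, 1 -b-> 2, 2 -b-> 2, 2 -a-> 3.
pairs = rpq(2, 0, {1}, [(0, "a", 1), (1, "b", 1)],
            4, [(0, "a", 1), (1, "b", 2), (2, "b", 2), (2, "a", 3)])
```

Because the closure here is not reflexive, empty paths are not reported even when the start state is final; handling that case would need an extra step.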
-
Context-Free Path Querying by Matrix Multiplication
Authors:
Rustam Azimov,
Semyon Grigorev
Abstract:
Graph data models are widely used in many areas, for example, bioinformatics and graph databases. In these areas, it is often required to process queries for large graphs. Some of the most common graph queries are navigational queries. The result of query evaluation is a set of implicit relations between nodes of the graph, i.e. paths in the graph. A natural way to specify these relations is by specifying paths using formal grammars over the alphabet of edge labels. An answer to a context-free path query in this approach is usually a set of triples (A, m, n) such that there is a path from the node m to the node n, whose labeling is derived from a non-terminal A of the given context-free grammar. This type of query is evaluated using the relational query semantics. Another example of path query semantics is the single-path query semantics, which requires presenting a single path from the node m to the node n, whose labeling is derived from a non-terminal A, for all triples (A, m, n) evaluated using the relational query semantics. There are a number of algorithms for query evaluation which use these semantics, but all of them perform poorly on large graphs. One of the most common techniques for efficient big data processing is the use of a graphics processing unit (GPU) to perform computations, but these algorithms do not allow this technique to be used efficiently. In this paper, we show how context-free path query evaluation using these query semantics can be reduced to the calculation of the matrix transitive closure. Also, we propose an algorithm for context-free path query evaluation which uses relational query semantics and is based on matrix operations that make it possible to speed up computations by using a GPU.
Submitted 19 December, 2017; v1 submitted 4 July, 2017;
originally announced July 2017.
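The relational semantics described in the abstract above, a set of triples (A, m, n), can be illustrated with a small fixpoint computation over Boolean matrices. The sketch below is one common way to present the idea, not necessarily the paper's exact formulation: it assumes the grammar is already in Chomsky normal form without epsilon rules, keeps one Boolean matrix per non-terminal, and runs on the CPU with plain Python lists. The function name `cfpq_relational` and the rule encodings are hypothetical.

```python
def cfpq_relational(n, edges, terminal_rules, binary_rules):
    """Return all triples (A, i, j) such that some path i -> j derives from A.

    n:              number of graph nodes, numbered 0..n-1
    edges:          (source, label, target) triples
    terminal_rules: dict label -> set of non-terminals A with a rule A -> label
    binary_rules:   (A, B, C) triples meaning A -> B C
    """
    nonterminals = {a for heads in terminal_rules.values() for a in heads}
    for rule in binary_rules:
        nonterminals.update(rule)

    # One Boolean adjacency-style matrix per non-terminal.
    t = {a: [[False] * n for _ in range(n)] for a in nonterminals}
    for u, label, v in edges:
        for a in terminal_rules.get(label, ()):
            t[a][u][v] = True

    # Repeat T[A] |= T[B] * T[C] (Boolean product) until nothing changes.
    changed = True
    while changed:
        changed = False
        for a, b, c in binary_rules:
            for i in range(n):
                for k in range(n):
                    if t[b][i][k]:
                        for j in range(n):
                            if t[c][k][j] and not t[a][i][j]:
                                t[a][i][j] = True
                                changed = True

    return {(a, i, j) for a in t for i in range(n) for j in range(n) if t[a][i][j]}


# Grammar for a^k b^k (k >= 1) in normal form:
#   S -> A B | A S1,  S1 -> S B,  A -> a,  B -> b
# Graph: an a-cycle 0 -> 1 -> 2 -> 0 and a b-cycle 0 -> 3 -> 0.
triples = cfpq_relational(
    4,
    [(0, "a", 1), (1, "a", 2), (2, "a", 0), (0, "b", 3), (3, "b", 0)],
    {"a": {"A"}, "b": {"B"}},
    [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")],
)
s_pairs = {(i, j) for a, i, j in triples if a == "S"}
```

The inner triple loop is an ordinary Boolean matrix product, which is the part that a GPU or an optimized linear algebra library would take over in a performance-oriented implementation.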