-
General-Purpose Join Algorithms for Listing Triangles in Large Graphs
Authors:
Daniel Zinn
Abstract:
We investigate applying general-purpose join algorithms to the triangle listing problem in an out-of-core context. In particular, we focus on Leapfrog Triejoin (LFTJ) by Veldhuizen (2014), a recently proposed, worst-case optimal algorithm. We present "boxing": a novel, yet conceptually simple, approach for feeding input data to LFTJ. Our extensive analysis shows that this approach is I/O efficient, being worst-case optimal (in a certain sense). Furthermore, if input data is only a constant factor larger than the available memory, then a boxed LFTJ essentially maintains the CPU data-complexity of the vanilla LFTJ. Next, focusing on LFTJ applied to the triangle query, we show that for many graphs boxed LFTJ matches the I/O complexity of the specialized algorithm MGT, recently proposed by Hu, Tao and Yufei for listing triangles in an out-of-core setting. We also strengthen the analysis of LFTJ's computational complexity for the triangle query by considering families of input graphs that are characterized not only by the number of edges but also by a measure of their density. For example, we show that LFTJ achieves a CPU complexity of O(|E|log|E|) for planar graphs, while on general graphs, no algorithm can be faster than O(|E|^{1.5}). Finally, we perform an experimental evaluation for the triangle listing problem, confirming our theoretical results and showing the overall effectiveness of our approach. On all our real-world and synthetic data sets (some of which contain more than 1.2 billion edges), LFTJ in single-threaded mode is within a factor of 3 of the specialized MGT; a penalty that, as we demonstrate, can be alleviated by parallelization.
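As a rough illustration of the triangle query discussed in this abstract, the following in-memory sketch lists triangles by intersecting sorted adjacency lists with a seek-based ("leapfrog"-style) merge. It is not the paper's LFTJ implementation and does not model boxing or out-of-core I/O; the function names `leapfrog_intersect` and `list_triangles` are hypothetical, and edges are assumed to be pairs of comparable vertex ids.

```python
from bisect import bisect_left


def leapfrog_intersect(a, b):
    """Intersect two sorted lists, seeking ahead with binary search."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Jump i to the first position whose value is >= b[j].
            i = bisect_left(a, b[j], i + 1)
        else:
            # Jump j to the first position whose value is >= a[i].
            j = bisect_left(b, a[i], j + 1)
    return out


def list_triangles(edges):
    """Return each triangle once as a tuple (a, b, c) with a < b < c."""
    # Orient every undirected edge from the smaller to the larger vertex id
    # (an assumption of this sketch) so that each triangle is found once.
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        lo, hi = (u, v) if u < v else (v, u)
        adj.setdefault(lo, set()).add(hi)
    sorted_adj = {u: sorted(vs) for u, vs in adj.items()}

    triangles = []
    for a in sorted(sorted_adj):
        for b in sorted_adj[a]:
            # Any c adjacent to both a and b (with c > b) closes a triangle.
            for c in leapfrog_intersect(sorted_adj[a], sorted_adj.get(b, [])):
                triangles.append((a, b, c))
    return triangles
```

For instance, `list_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)])` yields the two triangles on vertices {1, 2, 3} and {2, 3, 4}.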
Submitted 27 January, 2015;
originally announced January 2015.
-
Win-Move is Coordination-Free (Sometimes)
Authors:
Daniel Zinn,
Todd J Green,
Bertram Ludäscher
Abstract:
In a recent paper by Hellerstein [15], a tight relationship was conjectured between the number of strata of a Datalog${}^\neg$ program and the number of "coordination stages" required for its distributed computation. Indeed, Ameloot et al. [9] showed that a query can be computed by a coordination-free relational transducer network iff it is monotone, thus answering in the affirmative a variant of Hellerstein's CALM conjecture, based on a particular definition of coordination-free computation. In this paper, we present three additional models for declarative networking. In these variants, relational transducers have limited access to the way data is distributed. This variation allows transducer networks to compute more queries in a coordination-free manner: e.g., a transducer can check whether a ground atom $A$ over the input schema is in the "scope" of the local node, and then send either $A$ or $\neg A$ to other nodes.
We show the surprising result that the query given by the well-founded semantics of the unstratifiable win-move program is coordination-free in some of the models we consider. We also show that the original transducer network model [9] and our variants form a strict hierarchy of classes of coordination-free queries. Finally, we identify different syntactic fragments of Datalog${}^{\neg\neg}_{\forall}$, called semi-monotone programs, which can be used as declarative network programming languages, whose distributed computation is guaranteed to be eventually consistent and coordination-free.
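To make the win-move query concrete: the program has the single rule `win(X) :- move(X, Y), not win(Y)`, and under the well-founded semantics each position is won, lost, or drawn (undefined). The sketch below computes these labels on a single machine by a simple fixpoint iteration; it says nothing about the distributed transducer models of the paper, and the function name `solve_win_move` and the string labels are hypothetical choices.

```python
def solve_win_move(moves):
    """Label every position of a win-move game as 'won', 'lost', or 'drawn'.

    `moves` is an iterable of (x, y) pairs meaning "from x one can move to y".
    """
    succ = {}
    nodes = set()
    for x, y in moves:
        succ.setdefault(x, set()).add(y)
        nodes.update((x, y))

    label = {}
    changed = True
    while changed:
        changed = False
        for x in nodes:
            if x in label:
                continue
            targets = succ.get(x, ())
            if any(label.get(y) == "lost" for y in targets):
                # Some move leads to a position the opponent loses.
                label[x] = "won"
                changed = True
            elif all(label.get(y) == "won" for y in targets):
                # Every move (possibly none) leads to a won position.
                label[x] = "lost"
                changed = True

    # Positions never labeled stay undefined in the well-founded model.
    return {x: label.get(x, "drawn") for x in nodes}
```

For example, on the moves `a -> b -> c` plus a separate cycle `d -> e -> d`, position `c` is lost (no moves), `b` is won, `a` is lost, and `d` and `e` are drawn.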
Submitted 10 December, 2013;
originally announced December 2013.
-
First-Order Provenance Games
Authors:
Sven Köhler,
Bertram Ludäscher,
Daniel Zinn
Abstract:
We propose a new model of provenance, based on a game-theoretic approach to query evaluation. First, we study games G in their own right, and ask how to explain that a position x in G is won, lost, or drawn. The resulting notion of game provenance is closely related to winning strategies, and excludes from provenance all "bad moves", i.e., those which unnecessarily allow the opponent to improve the outcome of a play. In this way, the value of a position is determined by its game provenance. We then define provenance games by viewing the evaluation of a first-order query as a game between two players who argue whether a tuple is in the query answer. For RA+ queries, we show that game provenance is equivalent to the most general semiring of provenance polynomials N[X]. Variants of our game yield other known semirings. However, unlike semiring provenance, game provenance also provides a "built-in" way to handle negation and thus to answer why-not questions: in (provenance) games, the reason why x is not won is the same as why x is lost or drawn (the latter is possible for games with draws). Since first-order provenance games are draw-free, they yield a new provenance model that combines how- and why-not provenance.
Submitted 10 September, 2013;
originally announced September 2013.
-
Weak Forms of Monotonicity and Coordination-Freeness
Authors:
Daniel Zinn
Abstract:
Our earlier work, "Win-Move is Coordination-Free (Sometimes)", showed that the classes of queries that can be computed in a distributed, coordination-free manner form a strict hierarchy, depending on the assumptions of the model for distributed computation. In this paper, we further characterize these classes by revealing a tight relationship between them and novel weakened forms of monotonicity.
Submitted 1 February, 2012;
originally announced February 2012.