Comparing MapReduce and Pipeline Implementations for Counting Triangles

Pasarella, Edelmira; Vidal, Maria-Esther; Zoltan, Cristina

doi:10.4204/EPTCS.237.2

arXiv:1701.03318 (cs)

[Submitted on 12 Jan 2017]

Authors: Edelmira Pasarella (Universitat Politecnica de Catalunya), Maria-Esther Vidal (Fraunhofer IAIS), Cristina Zoltan (Universitat Politecnica de Catalunya)

Abstract: A common way to define a parallel solution for a computational problem is to apply the Divide and Conquer paradigm so that each processor acts on its own data and the processors are scheduled in parallel. MapReduce is a programming model that follows this paradigm and allows for the definition of efficient solutions by decomposing a problem into steps on subsets of the input data and combining the results of each step to produce the final results. Although it has been used to implement a wide variety of computational problems, MapReduce performance can be negatively affected whenever the replication factor grows or the size of the input is larger than the resources available at each processor. In this paper we show an alternative approach to implementing the Divide and Conquer paradigm, named dynamic pipeline. The main features of dynamic pipelines are illustrated on a parallel implementation of the well-known problem of counting triangles in a graph. This problem is especially interesting when the input graph either does not fit in memory or is dynamically generated. To evaluate the properties of the pipeline, a dynamic pipeline of processes and an ad-hoc version of MapReduce are implemented in the language Go, exploiting its ability to deal with channels and spawned processes. An empirical evaluation is conducted on graphs of different topologies, sizes, and densities. The observed results suggest that dynamic pipelines allow for an efficient implementation of the problem of counting triangles in a graph, particularly for dense and large graphs, drastically reducing the execution time with respect to the MapReduce implementation.
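
To make the idea of a pipeline that grows at run time more concrete, here is a minimal Go sketch of triangle counting with goroutines and channels. It is an illustration only and not the authors' implementation: the names Edge, msg, filter, and CountTriangles, the two-pass build/query scheme, and the assumption of a simple undirected graph with no duplicate edges are all choices made for this example. Each filter stage is spawned the first time its vertex appears, stores that vertex's higher-numbered neighbours, and then counts the edges whose two endpoints are both in its neighbour set, so every triangle is counted once, at its smallest vertex.

```go
package main

import "fmt"

// Edge is an undirected edge; endpoints may be given in either order.
type Edge struct{ U, V int }

type msgKind int

const (
	build   msgKind = iota // first pass: edges used to fill neighbour sets
	query                  // second pass: edges checked against neighbour sets
	partial                // a per-filter triangle count travelling to the tail
)

// msg is what flows through the pipeline. For build and query messages,
// u < v are the edge endpoints; for partial messages, n is a count.
type msg struct {
	kind msgKind
	u, v int
	n    int
}

// filter owns one vertex and the set of its higher-numbered neighbours.
func filter(owner, first int, in <-chan msg, out chan<- msg) {
	nbrs := map[int]bool{first: true}
	count := 0
	for m := range in {
		switch {
		case m.kind == build && m.u == owner:
			nbrs[m.v] = true // absorb the edge; it is not forwarded
		case m.kind == query && nbrs[m.u] && nbrs[m.v]:
			count++ // owner, m.u and m.v form a triangle
			out <- m
		default:
			out <- m // not ours: pass it downstream unchanged
		}
	}
	out <- msg{kind: partial, n: count}
	close(out)
}

// CountTriangles streams the edges through the pipeline twice (build, then
// query) and sums the partial counts that reach the tail of the pipeline.
func CountTriangles(edges []Edge) int {
	src := make(chan msg)
	go func() {
		defer close(src)
		for _, k := range []msgKind{build, query} {
			for _, e := range edges {
				u, v := e.U, e.V
				if u == v {
					continue // self-loops cannot be part of a triangle
				}
				if u > v {
					u, v = v, u
				}
				src <- msg{kind: k, u: u, v: v}
			}
		}
	}()

	var in <-chan msg = src
	total := 0
	for {
		m, ok := <-in
		if !ok {
			break
		}
		switch m.kind {
		case build:
			// No existing filter owns m.u: insert a new stage before the tail.
			out := make(chan msg)
			go filter(m.u, m.v, in, out)
			in = out
		case partial:
			total += m.n
		}
		// Query messages that reach the tail are simply dropped.
	}
	return total
}

func main() {
	// The complete graph on four vertices contains four triangles.
	k4 := []Edge{{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}}
	fmt.Println(CountTriangles(k4)) // prints 4
}
```

Because channels deliver messages in order, every build message passes a given stage before any query message does, so each neighbour set is complete by the time it is consulted. The sketch keeps the whole edge list in a slice for simplicity, which sidesteps the out-of-memory and dynamically generated inputs that motivate the paper.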

Comments:	In Proceedings PROLE 2016, arXiv:1701.03069
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
ACM classes:	D.1.3; F.1.2
Cite as:	arXiv:1701.03318 [cs.DC]
	(or arXiv:1701.03318v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1701.03318
Journal reference:	EPTCS 237, 2017, pp. 20-33
Related DOI:	https://doi.org/10.4204/EPTCS.237.2

Submission history

From: EPTCS [via EPTCS proxy]
[v1] Thu, 12 Jan 2017 12:04:15 UTC (461 KB)
