-
Distributed Community Detection with the WCC Metric
Authors:
Matthew Saltz,
Arnau Prat-Pérez,
David Dominguez-Sal
Abstract:
Community detection has become an extremely active area of research in recent years, with researchers proposing various new metrics and algorithms to address the problem. Recently, the Weighted Community Clustering (WCC) metric was proposed as a novel way to judge the quality of a community partitioning based on the distribution of triangles in the graph, and was demonstrated to yield superior results over other commonly used metrics like modularity. The same authors later presented a parallel algorithm for optimizing WCC on large graphs. In this paper, we propose a new distributed, vertex-centric algorithm for community detection using the WCC metric. Results are presented that demonstrate the algorithm's performance and scalability on up to 32 worker machines and real graphs of up to 1.8 billion vertices. The algorithm scales best with the largest graphs, and to our knowledge, it is the first distributed algorithm for optimizing the WCC metric.
Submitted 3 November, 2014;
originally announced November 2014.
-
Massive Query Expansion by Exploiting Graph Knowledge Bases
Authors:
Joan Guisado-Gámez,
David Dominguez-Sal,
Josep-Lluis Larriba-Pey
Abstract:
Keyword-based search engines have problems with term ambiguity and vocabulary mismatch. In this paper, we propose a query expansion technique that enriches queries expressed as keywords and short natural language descriptions. We present a new massive query expansion strategy that enriches queries using a knowledge base by identifying the query concepts and adding relevant synonyms and semantically related terms. We propose two approaches: (i) lexical expansion, which locates the relevant concepts in the knowledge base; and (ii) topological expansion, which analyzes the network of relations among the concepts and suggests semantically related terms by path and community analysis of the knowledge graph. We perform our expansions using two versions of Wikipedia as the knowledge base, concluding that the combination of lexical and topological expansion improves the system's precision by more than 27% in the best case.
Submitted 21 October, 2013;
originally announced October 2013.
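The two expansion steps named in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `expand_query` function, the `synonyms` table (standing in for lexical expansion) and the `related` table (standing in for topological expansion) are hypothetical toy structures, not the authors' system, and the sketch only follows direct relations rather than the path and community analysis the paper describes.

```python
def expand_query(query, synonyms, related):
    """Return the query terms followed by expansion terms, without duplicates.

    synonyms: hypothetical map from a term to its synonyms (lexical step).
    related:  hypothetical map from a term to directly linked concepts
              (a simplified stand-in for the topological step).
    """
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        # Synonyms first, then related concepts, preserving a stable order.
        for extra in synonyms.get(term, []) + related.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


# Toy knowledge base, invented for this example.
toy_synonyms = {"jaguar": ["panthera onca"]}
toy_related = {"jaguar": ["big cat"], "speed": ["velocity"]}

print(expand_query("Jaguar speed", toy_synonyms, toy_related))
```

A real system would first have to map query words to knowledge-base concepts and rank the candidate terms; the sketch skips both steps.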
-
On Demand Memory Specialization for Distributed Graph Databases
Authors:
Xavier Martinez-Palau,
David Dominguez-Sal,
Reza Akbarinia,
Patrick Valduriez,
Josep Lluís Larriba-Pey
Abstract:
In this paper, we propose the DN-tree, a data structure for building lossy summaries of the frequent data access patterns of queries in a distributed graph data management system. These compact representations allow the data structure to be communicated efficiently in distributed systems. We exploit this data structure with a new Dynamic Data Partitioning strategy (DYDAP) that assigns portions of the graph according to historical data access patterns, and guarantees low network communication and a balanced computational load in distributed graph queries. This method adapts dynamically to new workloads and evolves when the query distribution changes. Our experiments show that, in a variety of scenarios, DYDAP yields a throughput up to an order of magnitude higher than previous methods based on cache specialization, and halves the average response time of the system.
Submitted 16 October, 2013;
originally announced October 2013.
-
Shaping Communities out of Triangles
Authors:
Arnau Prat-Pérez,
David Dominguez-Sal,
Josep M. Brunat,
Josep-Lluis Larriba-Pey
Abstract:
Community detection has arisen as one of the most relevant topics in the field of graph data mining due to its importance in many fields such as biology, social networks or network traffic analysis. The metrics proposed to shape communities are generic and follow two approaches: maximizing the internal density of such communities or reducing the connectivity of the internal vertices with those outside the community. However, these metrics take the edges as a set and do not consider the internal layout of the edges in the community. We define a set of properties oriented to social networks that ensure that communities are cohesive, structured and well defined. Then, we propose the Weighted Community Clustering (WCC), a community metric based on triangles. We prove that analyzing communities by triangles gives communities that fulfill the listed set of properties, in contrast to previous metrics. Finally, we experimentally show that WCC correctly captures the concept of community in social networks using real and synthetic datasets, and statistically compare some of the most relevant state-of-the-art community detection algorithms.
Submitted 26 July, 2012;
originally announced July 2012.
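To make the idea of judging communities by triangles concrete, here is a small Python sketch. It is not the WCC formula, which this listing does not reproduce. It only computes a simpler, hypothetical quantity, the fraction of a vertex's triangles that lie entirely inside a given community. The function names (`build_adjacency`, `triangles_of`, `internal_triangle_fraction`) and the toy graph are assumptions made for this example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map (vertex -> set of neighbours)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_of(adj, x, allowed=None):
    """Count triangles containing x; if `allowed` is given, the other two
    vertices of each triangle must both belong to it."""
    neighbours = adj.get(x, set())
    if allowed is not None:
        neighbours = neighbours & allowed
    # Each pair of adjacent neighbours of x closes one triangle with x.
    return sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])


def internal_triangle_fraction(adj, x, community):
    """Illustrative score: share of x's triangles that stay inside `community`."""
    total = triangles_of(adj, x)
    if total == 0:
        return 0.0
    return triangles_of(adj, x, set(community)) / total


# Toy graph: two triangles (a, b, c) and (c, d, e) sharing vertex c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("c", "e")]
toy_adj = build_adjacency(toy_edges)

print(internal_triangle_fraction(toy_adj, "c", {"a", "b", "c"}))  # 0.5
print(internal_triangle_fraction(toy_adj, "a", {"a", "b", "c"}))  # 1.0
```

In the toy graph, vertex c sits in two triangles but only one of them lies within the community {a, b, c}, so its score is 0.5, while vertex a scores 1.0. The published metric combines triangle counts differently; consult the paper for its exact definition.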