-
Accelerating Local Search for the Maximum Independent Set Problem
Authors:
Jakob Dahlum,
Sebastian Lamm,
Peter Sanders,
Christian Schulz,
Darren Strash,
Renato F. Werneck
Abstract:
Computing high-quality independent sets quickly is an important problem in combinatorial optimization. Several recent algorithms have shown that kernelization techniques can be used to find exact maximum independent sets in medium-sized sparse graphs, as well as high-quality independent sets in huge sparse graphs that are intractable for exact (exponential-time) algorithms. However, a major drawback of these algorithms is that they require significant preprocessing overhead, and therefore cannot be used to find a high-quality independent set quickly.
In this paper, we show that performing simple kernelization techniques in an online fashion significantly boosts the performance of local search, and is much faster than pre-computing a kernel using advanced techniques. In addition, we show that cutting high-degree vertices can boost local search performance even further, especially on huge (sparse) complex networks. Our experiments show that we can drastically speed up the computation of large independent sets compared to other state-of-the-art algorithms, while also producing results that are very close to the best known solutions.
Submitted 4 February, 2016;
originally announced February 2016.
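To illustrate what a "simple kernelization technique" can look like, here is a minimal sketch of the degree-0 and degree-1 reductions for independent sets. This is only an assumption about the kind of rule meant; the abstract does not say which reductions the authors apply, and the function name `reduce_low_degree` and the adjacency-dict representation are hypothetical. The rule itself is standard: an isolated vertex, or a vertex with exactly one neighbour, can always be placed in some maximum independent set, after which it and its neighbour are removed.

```python
def reduce_low_degree(adj):
    """Apply degree-0 and degree-1 reductions until none applies.

    adj: dict mapping vertex -> set of neighbours (undirected, no self-loops).
    Returns (forced, kernel): the vertices placed in the independent set and
    the remaining reduced graph, on which any other search can continue.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    forced = set()
    queue = [v for v in adj if len(adj[v]) <= 1]
    while queue:
        v = queue.pop()
        if v not in adj:
            continue  # already removed by an earlier reduction
        forced.add(v)
        # Remove v together with its (at most one) neighbour.
        for r in [v] + list(adj[v]):
            for w in adj.pop(r):
                if w in adj:
                    adj[w].discard(r)
                    if len(adj[w]) <= 1:
                        queue.append(w)  # w became reducible
    return forced, adj


# A path 1-2-3-4 is reduced completely; a triangle is left untouched.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
forced, kernel = reduce_low_degree(path)
print(len(forced), kernel)
```

On the path, two vertices are forced and the kernel is empty, so no search is needed at all; on a triangle no rule applies and the whole graph remains as the kernel.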
-
Finding Near-Optimal Independent Sets at Scale
Authors:
Sebastian Lamm,
Peter Sanders,
Christian Schulz,
Darren Strash,
Renato F. Werneck
Abstract:
The independent set problem is NP-hard and particularly difficult to solve in large sparse graphs. In this work, we develop an advanced evolutionary algorithm, which incorporates kernelization techniques to compute large independent sets in huge sparse networks. A recent exact algorithm has shown that large networks can be solved exactly by employing a branch-and-reduce technique that recursively kernelizes the graph and performs branching. However, one major drawback of this algorithm is that, for huge graphs, branching can still take exponential time. To avoid this problem, we recursively choose vertices that are likely to be in a large independent set (using an evolutionary approach), then further kernelize the graph. We show that identifying and removing vertices likely to be in large independent sets opens up the reduction space, which not only speeds up the computation of large independent sets drastically, but also enables us to compute high-quality independent sets on much larger instances than previously reported in the literature.
Submitted 2 September, 2015;
originally announced September 2015.
-
Public Transit Labeling
Authors:
Daniel Delling,
Julian Dibbelt,
Thomas Pajor,
Renato F. Werneck
Abstract:
We study the journey planning problem in public transit networks. Developing efficient preprocessing-based speedup techniques for this problem has been challenging: current approaches either require massive preprocessing effort or provide limited speedups. Leveraging recent advances in Hub Labeling, the fastest algorithm for road networks, we revisit the well-known time-expanded model for public transit. Exploiting domain-specific properties, we provide simple and efficient algorithms for the earliest arrival, profile, and multicriteria problems, with queries that are orders of magnitude faster than the state of the art.
Submitted 6 May, 2015;
originally announced May 2015.
-
Route Planning in Transportation Networks
Authors:
Hannah Bast,
Daniel Delling,
Andrew Goldberg,
Matthias Müller-Hannemann,
Thomas Pajor,
Peter Sanders,
Dorothea Wagner,
Renato F. Werneck
Abstract:
We survey recent advances in algorithms for route planning in transportation networks. For road networks, we show that one can compute driving directions in milliseconds or less even at continental scale. A variety of techniques provide different trade-offs between preprocessing effort, space requirements, and query time. Some algorithms can answer queries in a fraction of a microsecond, while others can deal efficiently with real-time traffic. Journey planning on public transportation systems, although conceptually similar, is a significantly harder problem due to its inherent time-dependent and multicriteria nature. Although exact algorithms are fast enough for interactive queries on metropolitan transit systems, dealing with continent-sized instances requires simplifications or heavy preprocessing. The multimodal route planning problem, which seeks journeys combining schedule-based transportation (buses, trains) with unrestricted modes (walking, driving), is even harder, relying on approximate solutions even for metropolitan inputs.
Submitted 20 April, 2015;
originally announced April 2015.
-
A Robust and Scalable Algorithm for the Steiner Problem in Graphs
Authors:
Thomas Pajor,
Eduardo Uchoa,
Renato F. Werneck
Abstract:
We present an effective heuristic for the Steiner Problem in Graphs. Its main elements are a multistart algorithm coupled with aggressive combination of elite solutions, both leveraging recently-proposed fast local searches. We also propose a fast implementation of a well-known dual ascent algorithm that not only makes our heuristics more robust (by quickly dealing with easier cases), but can also be used as a building block of an exact (branch-and-bound) algorithm that is quite effective for some inputs. On all graph classes we consider, our heuristic is competitive with (and sometimes more effective than) any previous approach with similar running times. It is also scalable: with long runs, we could improve or match the best published results for most open instances in the literature.
Submitted 10 December, 2014; v1 submitted 8 December, 2014;
originally announced December 2014.
-
Distance-Based Influence in Networks: Computation and Maximization
Authors:
Edith Cohen,
Daniel Delling,
Thomas Pajor,
Renato F. Werneck
Abstract:
A premise at the heart of network analysis is that entities in a network derive utilities from their connections. The {\em influence} of a seed set $S$ of nodes is defined as the sum over nodes $u$ of the {\em utility} of $S$ to $u$. {\em Distance-based} utility, which is a decreasing function of the distance from $S$ to $u$, was explored in several successful research threads from social network analysis and economics: network formation games [Bloch and Jackson 2007], reachability-based influence [Richardson and Domingos 2002, Kempe et al. 2003], "threshold" influence [Gomez-Rodriguez et al. 2011], and {\em closeness centrality} [Bavelas 1948].
We formulate a model that unifies and extends this previous work and address the two fundamental computational problems in this domain: {\em Influence oracles} and {\em influence maximization} (IM). An oracle performs some preprocessing, after which influence queries for arbitrary seed sets can be efficiently computed. With IM, we seek a set of nodes of a given size with maximum influence. Since the IM problem is computationally hard, we instead seek a {\em greedy sequence} of nodes, with each prefix having influence that is at least $1-1/e$ of that of the optimal seed set of the same size. We present the first highly scalable algorithms for both problems, providing statistical guarantees on approximation quality and near-linear worst-case bounds on the computation. We perform an experimental evaluation which demonstrates the effectiveness of our designs on networks with hundreds of millions of edges.
Submitted 31 January, 2016; v1 submitted 25 October, 2014;
originally announced October 2014.
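The definition of distance-based influence in the abstract above can be evaluated directly on a small graph. The sketch below is an exact, brute-force computation for unweighted graphs, not the scalable oracle the paper proposes. The function name `distance_based_influence`, the adjacency-dict input, and the conventions that seed nodes count at distance 0 and unreachable nodes contribute nothing are all assumptions made for illustration.

```python
from collections import deque


def distance_based_influence(adj, seeds, utility):
    """Sum utility(d(S, u)) over every node u reachable from the seed set S.

    adj: dict mapping node -> iterable of out-neighbours (unweighted).
    utility: non-increasing function of the hop distance from S to u.
    """
    # Multi-source BFS: dist[u] is the distance from the nearest seed to u.
    dist = {s: 0 for s in seeds}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(utility(d) for d in dist.values())


path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
# Reachability-style utility: every reachable node counts fully.
print(distance_based_influence(path, {1}, lambda d: 1))
# Threshold utility: only nodes within one hop of the seed set count.
print(distance_based_influence(path, {2}, lambda d: 1 if d <= 1 else 0))
```

Swapping the `utility` argument switches between the special cases the abstract lists: a constant gives reachability-based influence, a step function gives threshold influence, and a smoothly decaying function such as `1 / (1 + d)` gives a closeness-style measure.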
-
Computing Classic Closeness Centrality, at Scale
Authors:
Edith Cohen,
Daniel Delling,
Thomas Pajor,
Renato F. Werneck
Abstract:
Closeness centrality, first considered by Bavelas (1948), is an importance measure of a node in a network which is based on the distances from the node to all other nodes. The classic definition, proposed by Bavelas (1950), Beauchamp (1965), and Sabidussi (1966), is (the inverse of) the average distance to all other nodes.
We propose the first highly scalable (near linear-time processing and linear space overhead) algorithm for estimating, within a small relative error, the classic closeness centralities of all nodes in the graph. Our algorithm applies to undirected graphs, as well as to centrality computed with respect to round-trip distances in directed graphs.
For directed graphs, we also propose an efficient algorithm that approximates generalizations of classic closeness centrality to outbound and inbound centralities. Although it does not provide worst-case theoretical approximation guarantees, it is designed to perform well on real networks.
We perform extensive experiments on large networks, demonstrating high scalability and accuracy.
Submitted 29 August, 2014;
originally announced September 2014.
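For reference, the classic definition quoted in the abstract above (the inverse of the average distance to all other nodes) can be computed exactly for one node with a single breadth-first search. The sketch below assumes an unweighted, undirected, connected graph given as an adjacency dict; the function name `closeness_centrality` is hypothetical. It is the exact baseline, not the estimation algorithm of the paper: repeating it for every node costs one BFS per node, which is what becomes impractical on large graphs.

```python
from collections import deque


def closeness_centrality(adj, source):
    """Classic closeness of `source`: (n - 1) / sum of distances to all others.

    adj: dict mapping node -> iterable of neighbours (undirected, unweighted).
    Raises ValueError if the graph has fewer than two nodes or is disconnected,
    since the average distance is then undefined or infinite.
    """
    if len(adj) < 2:
        raise ValueError("need at least two nodes")
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    if len(dist) != len(adj):
        raise ValueError("graph must be connected")
    return (len(adj) - 1) / sum(dist.values())


path = {1: [2], 2: [1, 3], 3: [2]}
print(closeness_centrality(path, 2), closeness_centrality(path, 1))
```

On the three-node path, the middle node is at distance 1 from both others and gets the maximum value 1.0, while an endpoint has average distance 1.5 and gets 2/3.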
-
Sketch-based Influence Maximization and Computation: Scaling up with Guarantees
Authors:
Edith Cohen,
Daniel Delling,
Thomas Pajor,
Renato F. Werneck
Abstract:
Propagation of contagion through networks is a fundamental process. It is used to model the spread of information, influence, or a viral infection. Diffusion patterns can be specified by a probabilistic model, such as Independent Cascade (IC), or captured by a set of representative traces.
Basic computational problems in the study of diffusion are influence queries (determining the potency of a specified seed set of nodes) and Influence Maximization (identifying the most influential seed set of a given size). Answering each influence query involves many edge traversals, and does not scale when there are many queries on very large graphs. The gold standard for Influence Maximization is the greedy algorithm, which iteratively adds to the seed set a node maximizing the marginal gain in influence. Greedy has a guaranteed approximation ratio of at least (1-1/e) and actually produces a sequence of nodes, with each prefix having an approximation guarantee with respect to the same-size optimum. Since Greedy does not scale well beyond a few million edges, for larger inputs one must currently use either heuristics or alternative algorithms designed for a pre-specified small seed set size.
We develop a novel sketch-based design for influence computation. Our greedy Sketch-based Influence Maximization (SKIM) algorithm scales to graphs with billions of edges, with one to two orders of magnitude speedup over the best greedy methods. It still has a guaranteed approximation ratio, and in practice its quality nearly matches that of exact greedy. We also present influence oracles, which use linear-time preprocessing to generate a small sketch for each node, allowing the influence of any seed set to be quickly answered from the sketches of its nodes.
Submitted 26 August, 2014;
originally announced August 2014.
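The greedy algorithm described in the abstract above can be sketched on a deliberately simplified model. This is plain exact greedy, not SKIM and not the Independent Cascade model: it assumes a deterministic setting in which each node comes with a precomputed set of nodes it influences, so the influence of a seed set is the size of the union of those sets. The function name `greedy_seed_sequence`, the `reach` input, and the tie-breaking rule (smallest node label first) are hypothetical choices for illustration.

```python
def greedy_seed_sequence(reach, k):
    """Pick up to k seeds, each maximizing the marginal gain in coverage.

    reach: dict mapping node -> set of nodes it influences (assumed to
           include the node itself); node labels must be sortable.
    Returns (sequence, influence): seeds in the order chosen, and the
    number of nodes covered by the whole sequence.
    """
    covered = set()
    sequence = []
    candidates = set(reach)
    for _ in range(min(k, len(reach))):
        # Marginal gain of v = nodes v influences that are not yet covered.
        # Sorting first makes ties resolve to the smallest label.
        best = max(sorted(candidates), key=lambda v: len(reach[v] - covered))
        sequence.append(best)
        covered |= reach[best]
        candidates.remove(best)
    return sequence, len(covered)


reach = {
    "a": {"a", "b", "c"},
    "b": {"b", "d"},
    "c": {"c"},
    "d": {"d", "e"},
}
print(greedy_seed_sequence(reach, 2))
```

Because seeds are appended one at a time, every prefix of the returned sequence is itself the greedy solution for that smaller size, which is the "sequence of nodes" property the abstract mentions. Each round rescans all candidates, which illustrates why exact greedy becomes expensive on large inputs.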
-
Data Structures for Mergeable Trees
Authors:
Loukas Georgiadis,
Haim Kaplan,
Nira Shafrir,
Robert E. Tarjan,
Renato F. Werneck
Abstract:
Motivated by an application in computational topology, we consider a novel variant of the problem of efficiently maintaining dynamic rooted trees. This variant requires merging two paths in a single operation. In contrast to the standard problem, in which only one tree arc changes at a time, a single merge operation can change many arcs. In spite of this, we develop a data structure that supports merges on an n-node forest in O(log^2 n) amortized time and all other standard tree operations in O(log n) time (amortized, worst-case, or randomized depending on the underlying data structure). For the special case that occurs in the motivating application, in which arbitrary arc deletions (cuts) are not allowed, we give a data structure with an O(log n) time bound per operation. This is asymptotically optimal under certain assumptions. For the even-more special case in which both cuts and parent queries are disallowed, we give an alternative O(log n)-time solution that uses standard dynamic trees as a black box. This solution also applies to the motivating application. Our methods use previous work on dynamic trees in various ways, but the analysis of each algorithm requires novel ideas. We also investigate lower bounds for the problem under various assumptions.
Submitted 11 November, 2007;
originally announced November 2007.