-
Sliding Window Algorithms for k-Clustering Problems
Authors:
Michele Borassi,
Alessandro Epasto,
Silvio Lattanzi,
Sergei Vassilvitskii,
Morteza Zadimoghaddam
Abstract:
The sliding window model of computation captures scenarios in which data is arriving continuously, but only the latest $w$ elements should be used for analysis. The goal is to design algorithms that update the solution efficiently with each arrival rather than recomputing it from scratch. In this work, we focus on $k$-clustering problems such as $k$-means and $k$-median. In this setting, we provide simple and practical algorithms that offer stronger performance guarantees than previous results. Empirically, we show that our methods store only a small fraction of the data, are orders of magnitude faster, and find solutions with costs only slightly higher than those returned by algorithms with access to the full dataset.
Submitted 23 October, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
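To make the model concrete, here is a minimal Python sketch of the naive baseline the abstract contrasts with: it keeps all of the latest $w$ points in a buffer and recomputes the clustering cost from scratch for a given set of centers. The class name `SlidingWindowCost` is hypothetical, points are assumed to be coordinate tuples, and the centers are assumed to be supplied externally. This is not the authors' algorithm, which is designed to store only a small fraction of the window.

```python
from collections import deque
import math


class SlidingWindowCost:
    """Naive baseline (hypothetical name): keep the latest w points and
    evaluate a clustering cost from scratch on every query."""

    def __init__(self, w):
        # deque(maxlen=w) silently drops the oldest point when full
        self.window = deque(maxlen=w)

    def add(self, point):
        self.window.append(point)

    def cost(self, centers, power=1):
        # power=1 gives the k-median cost, power=2 the k-means cost
        total = 0.0
        for p in self.window:
            nearest = min(math.dist(p, c) for c in centers)
            total += nearest ** power
        return total


sw = SlidingWindowCost(w=3)
for p in [(0, 0), (1, 0), (10, 0), (11, 0)]:
    sw.add(p)
# The window now holds (1, 0), (10, 0), (11, 0); (0, 0) has expired.
print(sw.cost([(0, 0), (12, 0)], power=1))  # 4.0
print(sw.cost([(0, 0), (12, 0)], power=2))  # 6.0
```

Every query here costs time proportional to $w \cdot k$, and memory grows with $w$; avoiding both is exactly what the sliding window algorithms above aim for.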
-
Computing top-k Closeness Centrality Faster in Unweighted Graphs
Authors:
Elisabetta Bergamini,
Michele Borassi,
Pierluigi Crescenzi,
Andrea Marino,
Henning Meyerhenke
Abstract:
Given a connected graph $G=(V,E)$, the closeness centrality of a vertex $v$ is defined as $\frac{n-1}{\sum_{w \in V} d(v,w)}$, where $n=|V|$. This measure is widely used in the analysis of real-world complex networks, and the problem of selecting the $k$ most central vertices has been deeply analysed in the last decade. However, this problem is computationally demanding, especially for large networks: in the first part of the paper, we prove that it is not solvable in time $O(|E|^{2-ε})$ on directed graphs, for any constant $ε>0$, under reasonable complexity assumptions. Furthermore, we propose a new algorithm for selecting the $k$ most central nodes in a graph: we experimentally show that this algorithm significantly improves on both the textbook algorithm, which is based on computing the distance between all pairs of vertices, and the state of the art. For example, we are able to compute the top $k$ nodes in a few dozen seconds in real-world networks with millions of nodes and edges. Finally, as a case study, we compute the $10$ most central actors in the IMDB collaboration network, where two actors are linked if they played together in a movie, and in the Wikipedia citation network, which contains a directed edge from a page $p$ to a page $q$ if $p$ contains a link to $q$.
Submitted 27 April, 2017; v1 submitted 4 April, 2017;
originally announced April 2017.
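The closeness definition above and the textbook top-$k$ approach (one BFS per vertex, i.e. distances between all pairs) can be sketched as follows. This assumes an unweighted, connected graph with at least two vertices, stored as a Python adjacency dict; the function names are hypothetical, and this is the baseline the paper improves on, not the new algorithm.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted single-source shortest-path distances via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    return dist


def closeness(adj, v):
    """(n - 1) / sum_w d(v, w); assumes the graph is connected."""
    n = len(adj)
    return (n - 1) / sum(bfs_distances(adj, v).values())


def top_k_closeness_textbook(adj, k):
    """Textbook approach: one BFS per vertex (all pairs), then sort."""
    scores = {v: closeness(adj, v) for v in adj}
    return sorted(scores, key=scores.get, reverse=True)[:k]


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(closeness(path, 2))  # 4 / (2 + 1 + 1 + 2) = 4 / 6
print(top_k_closeness_textbook(path, 1))  # [2]
```

The cost is one full BFS per vertex, i.e. $O(|V| \cdot |E|)$ on unweighted graphs, regardless of $k$.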
-
KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation
Authors:
Michele Borassi,
Emanuele Natale
Abstract:
We present KADABRA, a new algorithm to approximate betweenness centrality in directed and undirected graphs, which significantly outperforms all previous approaches on real-world complex networks. The efficiency of the new algorithm relies on two new theoretical contributions, of independent interest. The first contribution focuses on sampling shortest paths, a subroutine used by most algorithms that approximate betweenness centrality. We show that, on realistic random graph models, we can perform this task in time $|E|^{\frac{1}{2}+o(1)}$ with high probability, obtaining a significant speedup with respect to the $Θ(|E|)$ worst-case performance. We experimentally show that this new technique achieves similar speedups on real-world complex networks, as well. The second contribution is a new rigorous application of the adaptive sampling technique. This approach decreases the total number of shortest paths that need to be sampled to compute all betweenness centralities with a given absolute error, and it also handles more general problems, such as computing the $k$ most central nodes. Furthermore, our analysis is general, and it might be extended to other settings.
Submitted 12 August, 2016; v1 submitted 28 April, 2016;
originally announced April 2016.
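As an illustration of the path-sampling subroutine mentioned above, the sketch below estimates normalized betweenness (the average, over ordered pairs $s \neq t$, of the fraction of shortest $s$-$t$ paths passing through $v$) by drawing one uniformly random shortest path per random pair. It is a plain fixed-sample-size estimator under stated assumptions: it uses an ordinary BFS instead of the faster path sampling analyzed in the paper, and no adaptive stopping rule, so it should not be read as KADABRA itself. The graph is assumed to be an adjacency dict, and `sample_betweenness` is a hypothetical name.

```python
import random
from collections import deque


def sample_betweenness(adj, num_samples, seed=0):
    """Estimate normalized betweenness by sampling uniform shortest paths.

    Each sample picks a random ordered pair (s, t), s != t, draws one
    shortest s-t path uniformly at random, and credits its interior
    vertices with 1 / num_samples. Fixed sample size: no adaptive
    stopping and no bidirectional BFS (both are simplifications).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    est = {v: 0.0 for v in nodes}
    for _ in range(num_samples):
        s, t = rng.sample(nodes, 2)
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if u == t:
                break  # every vertex closer than t is final, so sigma[t] is too
            for x in adj[u]:
                if x not in dist:
                    dist[x], sigma[x], preds[x] = dist[u] + 1, 0, []
                    queue.append(x)
                if dist[x] == dist[u] + 1:
                    sigma[x] += sigma[u]
                    preds[x].append(u)
        if t not in dist:
            continue  # t unreachable: this sample contributes nothing
        # Walk back from t, picking predecessor p with prob. sigma[p] / sigma[v].
        v = t
        while True:
            ps = preds[v]
            v = rng.choices(ps, weights=[sigma[p] for p in ps])[0]
            if v == s:
                break
            est[v] += 1.0 / num_samples
    return est


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
est = sample_betweenness(star, 5000)
print(round(est[0], 2), est[1])
```

On this star, the exact normalized betweenness of the center is $12/20 = 0.6$ (the 12 ordered leaf-to-leaf pairs out of 20) and every leaf has betweenness 0, so with a few thousand samples the estimate for vertex 0 should land close to 0.6.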
-
An Axiomatic and an Average-Case Analysis of Algorithms and Heuristics for Metric Properties of Graphs
Authors:
Michele Borassi,
Pierluigi Crescenzi,
Luca Trevisan
Abstract:
In recent years, researchers proposed several algorithms that compute metric quantities of real-world complex networks, and that are very efficient in practice, although there is no worst-case guarantee.
In this work, we propose an axiomatic framework to analyze the performance of these algorithms, by proving that they are efficient on the class of graphs satisfying certain properties. Furthermore, we prove that these properties are verified asymptotically almost surely by several probabilistic models that generate power law random graphs, such as the Configuration Model, the Chung-Lu model, and the Norros-Reittu model. Thus, our results imply average-case analyses in these models.
For example, in our framework, existing algorithms can compute the diameter and the radius of a graph in subquadratic time, and sometimes even in time $n^{1+o(1)}$. Moreover, in some regimes, it is possible to compute the $k$ most central vertices according to closeness centrality in subquadratic time, and to design a distance oracle with sublinear query time and subquadratic space occupancy.
In the worst case, it is impossible to obtain comparable results for any of these problems, unless widely-believed conjectures are false.
Submitted 15 January, 2017; v1 submitted 5 April, 2016;
originally announced April 2016.
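For reference, the diameter and radius mentioned above can be computed naively with one BFS per vertex, which is the worst-case quadratic baseline. The sketch also includes a double-sweep lower bound on the diameter (two BFS visits), a well-known ingredient of practical diameter heuristics; whether it matches the specific algorithms analyzed in the paper is an assumption. The graph is assumed connected, unweighted and stored as an adjacency dict; all function names are hypothetical.

```python
from collections import deque


def eccentricity(adj, v):
    """Return (largest BFS distance from v, a vertex at that distance)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    far = max(dist, key=dist.get)
    return dist[far], far


def diameter_radius_naive(adj):
    """Worst-case quadratic baseline: one BFS per vertex."""
    eccs = [eccentricity(adj, v)[0] for v in adj]
    return max(eccs), min(eccs)


def double_sweep_lower_bound(adj, start):
    """Eccentricity of a vertex farthest from `start`; never exceeds the diameter."""
    _, a = eccentricity(adj, start)
    ecc_a, _ = eccentricity(adj, a)
    return ecc_a


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(diameter_radius_naive(path))  # (4, 2)
print(double_sweep_lower_bound(path, start=2))  # 4
```

The double sweep costs only two BFS visits; heuristics of this kind are efficient in practice but, as the abstract notes, carry no comparable worst-case guarantee.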
-
A Note on the Complexity of Computing the Number of Reachable Vertices in a Digraph
Authors:
Michele Borassi
Abstract:
In this work, we consider the following problem: given a digraph $G=(V,E)$, for each vertex $v$, we want to compute the number of vertices reachable from $v$. In other words, we want to compute the out-degree of each vertex in the transitive closure of $G$. We show that this problem is not solvable in time $\mathcal{O}\left(|E|^{2-ε}\right)$ for any $ε>0$, unless the Strong Exponential Time Hypothesis is false. This result still holds if $G$ is assumed to be acyclic.
Submitted 5 February, 2016;
originally announced February 2016.
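The straightforward solution runs one BFS from every vertex, taking $O(|V| \cdot |E|)$ time; the hardness result above says that, under SETH, no algorithm runs in time $O(|E|^{2-ε})$. A minimal Python sketch, assuming the digraph is an adjacency dict mapping each vertex to its out-neighbors and that the transitive closure is counted without self-loops (the function name is hypothetical):

```python
from collections import deque


def reachable_counts(adj):
    """For each v, the number of vertices w != v reachable from v.

    Equivalently, the out-degree of v in the transitive closure
    (self-loops excluded). One BFS per vertex: O(|V| * |E|) time.
    """
    counts = {}
    for v in adj:
        seen = {v}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for x in adj[u]:
                if x not in seen:
                    seen.add(x)
                    queue.append(x)
        counts[v] = len(seen) - 1  # exclude v itself
    return counts


dag = {0: [1, 2], 1: [2], 2: [], 3: []}
print(reachable_counts(dag))  # {0: 2, 1: 1, 2: 0, 3: 0}
```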
-
Fast and Simple Computation of Top-k Closeness Centralities
Authors:
Michele Borassi,
Pierluigi Crescenzi,
Andrea Marino
Abstract:
Closeness is an important centrality measure widely used in the analysis of real-world complex networks. In particular, the problem of selecting the k most central nodes with respect to this measure has been deeply analyzed in the last decade. However, even for moderately sized networks, this problem is computationally intractable in practice: indeed, Abboud et al. have recently shown that its complexity is strictly related to the complexity of the All-Pairs Shortest Path (APSP) problem, for which no subcubic "combinatorial" algorithm is known. In this paper, we propose a new algorithm for selecting the k nodes with the highest closeness centrality in a graph. In practice, this algorithm significantly improves over the APSP approach, even though its worst-case time complexity is the same. For example, the algorithm is able to compute the top k nodes in a few dozen seconds even when applied to real-world networks with millions of nodes and edges. We also show experimentally that our algorithm drastically outperforms the most recently designed algorithm, proposed by Olsen et al. Finally, we apply the new algorithm to the computation of the most central actors in the IMDB collaboration network, where two actors are linked if they played together in a movie.
Submitted 6 July, 2015;
originally announced July 2015.
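An algorithm can keep the APSP worst case yet be much faster in practice if it stops BFS visits early. The sketch below illustrates that general idea with a deliberately simple level-based bound: once all vertices within distance $d$ of $v$ are discovered, every unvisited vertex is at distance at least $d+1$, so the farness $\sum_{w} d(v,w)$ is bounded from below, and $v$ can be discarded as soon as this bound reaches the current $k$-th smallest farness. The bound and the name `top_k_closeness_pruned` are assumptions for illustration, not the authors' exact method; a connected, unweighted graph with at least two vertices, stored as an adjacency dict, is assumed.

```python
import heapq


def top_k_closeness_pruned(adj, k):
    """Top-k closeness with early-stopped BFS (connected graph assumed).

    While exploring from v: after discovering every vertex within
    distance d, each remaining unvisited vertex is at distance >= d + 1.
    """
    n = len(adj)
    heap = []  # entries (-farness, v); heap[0] holds the k-th best farness
    for v in adj:
        kth = -heap[0][0] if len(heap) == k else float("inf")
        visited, frontier, d, farness = {v}, [v], 0, 0
        pruned = False
        while frontier:
            # Lower bound on v's farness; prune if v cannot beat the k-th vertex.
            if farness + (d + 1) * (n - len(visited)) >= kth:
                pruned = True
                break
            nxt = []
            for u in frontier:
                for x in adj[u]:
                    if x not in visited:
                        visited.add(x)
                        nxt.append(x)
            d += 1
            farness += d * len(nxt)
            frontier = nxt
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-farness, v))
        else:
            heapq.heapreplace(heap, (-farness, v))  # evict current k-th vertex
    ranked = sorted(heap, reverse=True)  # smallest farness first
    return [(v, (n - 1) / -neg) for neg, v in ranked]


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(top_k_closeness_pruned(path, 3))  # vertex 2 comes first
```

Visits that end in a prune touch only part of the graph, which is where any practical speedup would come from; in the worst case every BFS still runs to completion.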
-
Hyperbolicity Measures "Democracy" in Real-World Networks
Authors:
Michele Borassi,
Alessandro Chessa,
Guido Caldarelli
Abstract:
We analyze the hyperbolicity of real-world networks, a geometric quantity that measures whether a space is negatively curved. In our interpretation, a network with small hyperbolicity is "aristocratic", because it contains a small set of vertices involved in many shortest paths, so that few elements "connect" the system, while a network with large hyperbolicity has a more "democratic" structure with a larger number of crucial elements.
We mathematically prove the soundness of this interpretation, and we derive its consequences by analyzing a large dataset of real-world networks. We confirm and improve previous results on hyperbolicity, and we analyze them in the light of our interpretation.
Moreover, we study (to our knowledge, for the first time) the hyperbolicity of the neighborhood of a given vertex. This allows us to define an "influence area" for the vertices in the graph. We show that the influence area of the highest-degree vertex is small in what we call "local" networks, like most social or peer-to-peer networks. On the other hand, if the network is built in order to reach a "global" goal, as in metabolic networks or autonomous system networks, the influence area is much larger, and it can contain up to half the vertices in the graph. In conclusion, our newly introduced approach allows us to distinguish the topology and the structure of various complex networks.
Submitted 10 March, 2015;
originally announced March 2015.
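One common way to make hyperbolicity concrete is Gromov's four-point condition: for vertices $x,y,u,v$, sort the three sums $d(x,y)+d(u,v)$, $d(x,u)+d(y,v)$, $d(x,v)+d(y,u)$ and take half the difference between the two largest; the hyperbolicity $\delta$ is the maximum over all quadruples. This convention is assumed here (some authors omit the factor $1/2$). The brute-force sketch below takes $O(n^4)$ time and assumes a connected graph stored as an adjacency dict; the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def all_pairs_bfs(adj):
    """All-pairs shortest-path distances via one BFS per vertex."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for x in adj[u]:
                if x not in d:
                    d[x] = d[u] + 1
                    queue.append(x)
        dist[s] = d
    return dist


def hyperbolicity(adj):
    """Gromov four-point hyperbolicity by brute force over all quadruples."""
    d = all_pairs_bfs(adj)
    delta = 0.0
    for x, y, u, v in combinations(adj, 4):
        sums = sorted([d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]])
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(hyperbolicity(path))  # 0.0 (trees have hyperbolicity 0)
print(hyperbolicity(c4))  # 1.0
```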
-
Into the Square - On the Complexity of Quadratic-Time Solvable Problems
Authors:
Michele Borassi,
Pierluigi Crescenzi,
Michel Habib
Abstract:
This paper will analyze several quadratic-time solvable problems, and will classify them into two classes: problems that are solvable in truly subquadratic time (that is, in time $O(n^{2-ε})$ for some $ε>0$) and problems that are not, unless the well-known Strong Exponential Time Hypothesis (SETH) is false. In particular, we will prove that some quadratic-time solvable problems are indeed easier than expected. We will provide an algorithm that computes the transitive closure of a directed graph in time $O(mn^{\frac{ω+1}{4}})$, where $m$ denotes the number of edges in the transitive closure and $ω$ is the exponent for matrix multiplication. As a side effect, we will prove that our algorithm runs in time $O(n^{\frac{5}{3}})$ if the transitive closure is sparse. The same time bounds hold if we want to check whether a graph is transitive, by replacing $m$ with the number of edges in the graph itself. As far as we know, this is the fastest algorithm for sparse transitive digraph recognition. Finally, we will apply our algorithm to the comparability graph recognition problem (dating back to 1941), obtaining the first truly subquadratic algorithm. The second part of the paper deals with hardness results. Starting from an artificial quadratic-time solvable variation of the k-SAT problem, we will construct a graph of Karp reductions, proving that a truly subquadratic-time algorithm for any of the problems in the graph falsifies SETH. The analyzed problems are the following: computing the subset graph, finding dominating sets, computing the betweenness centrality of a vertex, computing the minimum closeness centrality, and computing the hyperbolicity of a pair of vertices. We will also be able to include in our framework three proofs that have already appeared in the literature, concerning graph diameter computation, local alignment of strings, and orthogonality of vectors.
Submitted 18 July, 2014;
originally announced July 2014.
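Orthogonality of vectors, one of the problems listed above, is a standard example of a quadratic-time solvable problem: given two sets $A$ and $B$ of 0/1 vectors of dimension $d$, decide whether some $a \in A$ and $b \in B$ satisfy $a \cdot b = 0$. The brute-force check below takes $O(|A| \cdot |B| \cdot d)$ time; the function name and the tuple representation are assumptions for illustration.

```python
def has_orthogonal_pair(A, B):
    """Brute-force Orthogonal Vectors: is there a in A, b in B with a . b == 0?

    Vectors are 0/1 tuples of equal length d; time O(|A| * |B| * d).
    """
    for a in A:
        for b in B:
            # a . b == 0 iff no coordinate is 1 in both vectors
            if not any(x and y for x, y in zip(a, b)):
                return True
    return False


A = [(1, 1, 0), (0, 1, 1)]
print(has_orthogonal_pair(A, [(1, 0, 1), (0, 1, 0)]))  # False
print(has_orthogonal_pair(A, [(0, 0, 1)]))  # True: (1, 1, 0) . (0, 0, 1) == 0
```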