Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs
Authors:
Steinar Laenen,
Bogdan-Adrian Manghiuc,
He Sun
Abstract:
This paper presents two efficient hierarchical clustering (HC) algorithms with respect to Dasgupta's cost function. For any input graph $G$ with a clear cluster-structure, our designed algorithms run in nearly-linear time in the input size of $G$, and return an $O(1)$-approximate HC tree with respect to Dasgupta's cost function. We compare the performance of our algorithm against the previous state-of-the-art on synthetic and real-world datasets and show that our designed algorithm produces comparable or better HC trees with much lower running time.
Submitted 16 June, 2023;
originally announced June 2023.
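For readers unfamiliar with Dasgupta's cost function, the following is a minimal sketch of how it can be evaluated for a given HC tree. It uses the standard definition (each edge contributes its weight times the number of leaves under the lowest common ancestor of its endpoints) and is not the algorithm from the paper. The nested-tuple tree representation and the names `leaves` and `dasgupta_cost` are assumptions made for illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaves of a tree given as nested tuples (leaves are non-tuples)."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def dasgupta_cost(tree, weights):
    """Sum over edges {u, v} of w(u, v) * (number of leaves under lca(u, v)).

    weights: dict mapping a pair (u, v) to a non-negative edge weight
    (hypothetical input format; each edge is stored once, in either order).
    """
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(group) for group in child_leaves)
    cost = 0.0
    # Edges whose endpoints fall in different children have their LCA at this node.
    for a, b in combinations(child_leaves, 2):
        for u in a:
            for v in b:
                w = weights.get((u, v), weights.get((v, u), 0.0))
                cost += w * size
    # Edges inside a single child are charged further down the tree.
    return cost + sum(dasgupta_cost(child, weights) for child in tree)


# Path graph 0 - 1 - 2 - 3 with unit weights.
path = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
good_tree = ((0, 1), (2, 3))  # keeps adjacent vertices together
bad_tree = ((0, 2), (1, 3))   # separates every edge at the root
```

On this toy path graph, the tree that merges adjacent vertices first has the lower cost, which matches the intuition that heavy edges should be cut as low in the tree as possible.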
Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs
Authors:
Bogdan-Adrian Manghiuc,
He Sun
Abstract:
Hierarchical clustering studies a recursive partition of a data set into clusters of successively smaller size, and is a fundamental problem in data analysis. In this work we study the cost function for hierarchical clustering introduced by Dasgupta, and present two polynomial-time approximation algorithms: Our first result is an $O(1)$-approximation algorithm for graphs of high conductance. Our simple construction bypasses complicated recursive routines of finding sparse cuts known in the literature. Our second and main result is an $O(1)$-approximation algorithm for a wide family of graphs that exhibit a well-defined structure of clusters. This result generalises the previous state-of-the-art, which holds only for graphs generated from stochastic models. The significance of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets, on which our presented algorithm outperforms the previously proposed algorithm for graphs with a well-defined cluster structure.
Submitted 16 December, 2021;
originally announced December 2021.
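As an illustration of the "high conductance" condition mentioned in the abstract, the sketch below computes graph conductance by brute force. It assumes the common definition, the minimum over vertex sets $S$ of $w(S, V\setminus S)/\min(\mathrm{vol}(S), \mathrm{vol}(V\setminus S))$, which may differ in detail from the one used in the paper. The function name `conductance` and the edge-list input format are hypothetical, and the exponential running time makes it suitable for toy graphs only.

```python
from itertools import combinations


def conductance(n, edges):
    """Brute-force conductance of a weighted graph on vertices 0..n-1.

    edges: list of (u, v, w) triples with w > 0 (hypothetical input format).
    """
    deg = [0.0] * n
    for u, v, w in edges:
        deg[u] += w
        deg[v] += w
    total = sum(deg)
    best = float("inf")
    for k in range(1, n):
        for subset in combinations(range(n), k):
            s = set(subset)
            vol = sum(deg[u] for u in s)
            denom = min(vol, total - vol)
            if denom == 0:
                continue  # skip sets made only of isolated vertices
            # Total weight of edges with exactly one endpoint in s.
            cut = sum(w for u, v, w in edges if (u in s) != (v in s))
            best = min(best, cut / denom)
    return best


# Complete graph K4: no sparse cut, so the conductance is large.
k4 = [(u, v, 1.0) for u, v in combinations(range(4), 2)]

# Two triangles joined by a single edge: a clear two-cluster structure.
two_triangles = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
```

The complete graph has conductance 2/3, while the two joined triangles have conductance 1/7, attained by cutting the single bridging edge.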
Augmenting the Algebraic Connectivity of Graphs
Authors:
Bogdan-Adrian Manghiuc,
Pan Peng,
He Sun
Abstract:
For any undirected graph $G=(V,E)$ and a set $E_W$ of candidate edges with $E\cap E_W=\emptyset$, the $(k,γ)$-spectral augmentability problem is to find a set $F$ of $k$ edges from $E_W$ with appropriate weighting, such that the algebraic connectivity of the resulting graph $H=(V,E\cup F)$ is at least $γ$. Because of a tight connection between the algebraic connectivity and many other graph parameters, including the graph's conductance and the mixing time of random walks in a graph, maximising the resulting graph's algebraic connectivity by adding a small number of edges has been studied over the past 15 years.
In this work we present an approximate and efficient algorithm for the $(k,γ)$-spectral augmentability problem, and our algorithm runs in almost-linear time under a wide regime of parameters. Our main algorithm is based on the following two novel techniques developed in the paper, which might have applications beyond the $(k,γ)$-spectral augmentability problem.
(1) We present a fast algorithm for solving a feasibility version of an SDP for the algebraic connectivity maximisation problem from [GB06]. Our algorithm is based on the classic primal-dual framework for solving SDPs, which in turn uses the multiplicative weight update algorithm. We present a novel approach of unifying SDP constraints of different matrix and vector variables and give a good separation oracle accordingly.
(2) We present an efficient algorithm for the subgraph sparsification problem, and for a wide range of parameters our algorithm runs in almost-linear time, in contrast to the previously best known algorithm running in at least $Ω(n^2mk)$ time [KMST10]. Our analysis shows how the randomised BSS framework can be generalised in the setting of subgraph sparsification, and how the potential functions can be applied to approximately keep track of different subspaces.
Submitted 25 June, 2020;
originally announced June 2020.
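To make the quantity being optimised concrete, the sketch below estimates the algebraic connectivity (the second-smallest eigenvalue of the graph Laplacian) and shows how adding one edge can increase it. It is not the SDP-based or sparsification-based method from the paper: it swaps in plain power iteration on a shifted Laplacian with the all-ones direction projected out. The function name `algebraic_connectivity`, its parameters, and the edge-list format are assumptions for illustration.

```python
import random


def algebraic_connectivity(n, edges, iters=2000, seed=0):
    """Estimate lambda_2 of the Laplacian of a weighted graph on vertices 0..n-1.

    edges: list of (u, v, w) triples with w > 0 (hypothetical input format).
    Assumes n >= 2.
    """
    deg = [0.0] * n
    for u, v, w in edges:
        deg[u] += w
        deg[v] += w
    # All Laplacian eigenvalues are at most 2 * max degree, so c*I - L is
    # positive definite for this choice of c.
    c = 2.0 * max(deg) + 1.0

    def apply_shifted(x):
        # y = (c*I - L) x, where (L x)_u = deg[u] * x_u - sum_v w_uv * x_v.
        y = [(c - deg[i]) * x[i] for i in range(n)]
        for u, v, w in edges:
            y[u] += w * x[v]
            y[v] += w * x[u]
        return y

    def project_and_normalise(x):
        # Remove the all-ones component (eigenvalue 0 of L), then rescale.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        norm = sum(xi * xi for xi in x) ** 0.5
        return [xi / norm for xi in x]

    rng = random.Random(seed)
    x = project_and_normalise([rng.random() for _ in range(n)])
    for _ in range(iters):
        x = project_and_normalise(apply_shifted(x))
    # Rayleigh quotient of c*I - L at x, converted back to an eigenvalue of L.
    y = apply_shifted(x)
    return c - sum(a * b for a, b in zip(x, y))


# Path on three vertices, then the same graph augmented with the edge {0, 2}.
path3 = [(0, 1, 1.0), (1, 2, 1.0)]
triangle = path3 + [(0, 2, 1.0)]
```

For the three-vertex path the Laplacian eigenvalues are 0, 1 and 3, so the estimate is close to 1; after adding the edge {0, 2} the graph is a triangle with eigenvalues 0, 3 and 3, so the estimate is close to 3.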