-
Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers
Authors:
Yijia Zhou,
Kyle A. Gallivan,
Adrian Barbu
Abstract:
Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a provably robust clustering algorithm based on loss minimization that performs well on Gaussian mixture models with outliers. It provides theoretical guarantees t…
▽ More
Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a provably robust clustering algorithm based on loss minimization that performs well on Gaussian mixture models with outliers. It provides theoretical guarantees that the algorithm obtains high accuracy with high probability under certain assumptions. Moreover, it can also be used as an initialization strategy for $k$-means clustering. Experiments on real-world large-scale datasets demonstrate the effectiveness of the algorithm when clustering a large number of clusters, and a $k$-means algorithm initialized by the algorithm outperforms many of the classic clustering methods in both speed and accuracy, while scaling well to large datasets such as ImageNet.
△ Less
Submitted 28 February, 2023;
originally announced February 2023.
-
Analysis of the Neighborhood Pattern Similarity Measure for the Role Extraction Problem
Authors:
Melissa Marchand,
Kyle A. Gallivan,
Wen Huang,
Paul Van Dooren
Abstract:
In this paper we analyze an indirect approach, called the Neighborhood Pattern Similarity approach, to solve the so-called role extraction problem of a large-scale graph. The method is based on the preliminary construction of a node similarity matrix which allows in a second stage to group together, with an appropriate clustering technique, the nodes that are assigned to have the same role. The an…
▽ More
In this paper we analyze an indirect approach, called the Neighborhood Pattern Similarity approach, to solve the so-called role extraction problem of a large-scale graph. The method is based on the preliminary construction of a node similarity matrix which allows in a second stage to group together, with an appropriate clustering technique, the nodes that are assigned to have the same role. The analysis builds on the notion of ideal graphs where all nodes with the same role, are also structurally equivalent.
△ Less
Submitted 24 September, 2020;
originally announced September 2020.
-
Community Detection by a Riemannian Projected Proximal Gradient Method
Authors:
Meng Wei,
Wen Huang,
Kyle A. Gallivan,
Paul Van Dooren
Abstract:
Community detection plays an important role in understanding and exploiting the structure of complex systems. Many algorithms have been developed for community detection using modularity maximization or other techniques. In this paper, we formulate the community detection problem as a constrained nonsmooth optimization problem on the compact Stiefel manifold. A Riemannian projected proximal gradie…
▽ More
Community detection plays an important role in understanding and exploiting the structure of complex systems. Many algorithms have been developed for community detection using modularity maximization or other techniques. In this paper, we formulate the community detection problem as a constrained nonsmooth optimization problem on the compact Stiefel manifold. A Riemannian projected proximal gradient method is proposed and used to solve the problem. To the best of our knowledge, this is the first attempt to use Riemannian optimization for community detection problem. Numerical experimental results on synthetic benchmarks and real-world networks show that our algorithm is effective and outperforms several state-of-art algorithms.
△ Less
Submitted 24 September, 2020;
originally announced September 2020.