Search | arXiv e-print repository

Spectral Clustering on Large Datasets: When Does it Work? Theory from Continuous Clustering and Density Cheeger-Buser

Authors: Timothy Chu, Gary Miller, Noel Walkington

Abstract: Spectral clustering is one of the most popular clustering algorithms that has stood the test of time. It is simple to describe, can be implemented using standard linear algebra, and often finds better clusters than traditional clustering algorithms like $k$-means and $k$-centers. The foundational algorithm for two-way spectral clustering, by Shi and Malik, creates a geometric graph from data and finds a spectral cut of the graph. In modern machine learning, many data sets are modeled as a large number of points drawn from a probability density function. Little is known about when spectral clustering works in this setting -- and when it doesn't. Past researchers justified spectral clustering by appealing to the graph Cheeger inequality (which states that the spectral cut of a graph approximates the "Normalized Cut"), but this justification is known to break down on large data sets. We provide theoretically-informed intuition about spectral clustering on large data sets drawn from probability densities, by proving when a continuous form of spectral clustering considered by past researchers (the unweighted spectral cut of a probability density) finds good clusters of the underlying density itself. Our work suggests that Shi-Malik spectral clustering works well on data drawn from mixtures of Laplace distributions, and works poorly on data drawn from certain other densities, such as a density we call the 'square-root trough'. Our core theorem proves that weighted spectral cuts have low weighted isoperimetry for all probability densities. Our key tool is a new Cheeger-Buser inequality for all probability densities, including discontinuous ones.

Submitted 10 May, 2023; originally announced May 2023.
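
The two-way spectral clustering procedure summarized in the abstract above (build a geometric graph from the data, then take a spectral cut) can be illustrated with a minimal sketch. This is not the authors' code: the function name `spectral_bipartition`, the Gaussian edge weights, the bandwidth parameter `sigma`, and the sweep over the second eigenvector scored by normalized cut are all assumptions chosen for illustration, in the spirit of Shi and Malik.

```python
import numpy as np


def spectral_bipartition(points, sigma=1.0):
    """Split points into two groups with a spectral sweep cut (illustrative sketch)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)

    # Geometric graph: complete graph with Gaussian weights (an assumed choice).
    sq_dists = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)
    degree = weights.sum(axis=1)

    # Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}.
    inv_sqrt_deg = 1.0 / np.sqrt(degree)
    laplacian = np.eye(n) - inv_sqrt_deg[:, None] * weights * inv_sqrt_deg[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the second smallest.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = inv_sqrt_deg * eigvecs[:, 1]

    # Sweep over thresholds of the eigenvector, keeping the best normalized cut.
    order = np.argsort(fiedler)
    in_s = np.zeros(n, dtype=bool)
    total_vol = degree.sum()
    cut = 0.0
    vol = 0.0
    best_score, best_k = np.inf, 1
    for k, i in enumerate(order[:-1], start=1):
        # Moving vertex i into S: its edges into S stop being cut,
        # its edges to the remaining vertices start being cut.
        cut += weights[i, ~in_s].sum() - weights[i, in_s].sum()
        in_s[i] = True
        vol += degree[i]
        score = cut / vol + cut / (total_vol - vol)
        if score < best_score:
            best_score, best_k = score, k

    labels = np.zeros(n, dtype=bool)
    labels[order[:best_k]] = True
    return labels
```

Calling `spectral_bipartition(points, sigma=1.5)` on two well-separated groups of points should return a boolean array that is constant within each group; which group is labeled `True` depends on the sign of the computed eigenvector.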

arXiv:2004.09589

Weighted Cheeger and Buser Inequalities, with Applications to Clustering and Cutting Probability Densities

Authors: Timothy Chu, Gary L. Miller, Noel J. Walkington, Alex L. Wang

Abstract: In this paper, we show how sparse or isoperimetric cuts of a probability density function relate to Cheeger cuts of its principal eigenfunction, for appropriate definitions of 'sparse cut' and 'principal eigenfunction'. We construct these appropriate definitions of sparse cut and principal eigenfunction in the probability density setting. Then, we prove Cheeger and Buser type inequalities similar to those for the normalized graph Laplacian of Alon-Milman. We demonstrate that no such inequalities hold for most prior definitions of sparse cut and principal eigenfunction. We apply this result to generate novel algorithms for cutting probability densities and clustering data, including a principled variant of spectral clustering.

Submitted 6 May, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

arXiv:1812.02841

Hardy-Muckenhoupt Bounds for Laplacian Eigenvalues

Authors: Gary L. Miller, Noel J. Walkington, Alex L. Wang

Abstract: We present two graph quantities Psi(G,S) and Psi_2(G) which give constant factor estimates to the Dirichlet and Neumann eigenvalues, lambda(G,S) and lambda_2(G), respectively. Our techniques make use of a discrete Hardy-type inequality.

Submitted 6 December, 2018; originally announced December 2018.

Showing 1–3 of 3 results for author: Walkington, N