Search | arXiv e-print repository

doi 10.1109/JSTSP.2018.2837638

Hypergraph Spectral Clustering in the Weighted Stochastic Block Model

Authors: Kwangjun Ahn, Kangwook Lee, Changho Suh

Abstract: Spectral clustering is a celebrated algorithm that partitions objects based on pairwise similarity information. While this approach has been successfully applied to a variety of domains, it comes with limitations. The reason is that there are many other applications in which only \emph{multi}-way similarity measures are available. This motivates us to explore the multi-way measurement setting. In this work, we develop two algorithms intended for such a setting: Hypergraph Spectral Clustering (HSC) and Hypergraph Spectral Clustering with Local Refinement (HSCLR). Our main contribution lies in performance analysis of the poly-time algorithms under a random hypergraph model, which we name the weighted stochastic block model, in which objects and multi-way measures are modeled as nodes and weights of hyperedges, respectively. Denoting by $n$ the number of nodes, our analysis reveals the following: (1) HSC outputs a partition which is better than a random guess if the sum of edge weights (to be explained later) is $Ω(n)$; (2) HSC outputs a partition which coincides with the hidden partition except for a vanishing fraction of nodes if the sum of edge weights is $ω(n)$; and (3) HSCLR exactly recovers the hidden partition if the sum of edge weights is on the order of $n \log n$. Our results improve upon the state of the art recently established under the model, and they are the first to settle the order-wise optimal results for the binary edge weight case. Moreover, we show that our results lead to efficient sketching algorithms for subspace clustering, a computer vision application. Lastly, we show that HSCLR achieves the information-theoretic limits for a special yet practically relevant model, thereby showing no computational barrier for the case.

Submitted 23 May, 2018; originally announced May 2018.

Comments: 16 pages; 3 figures

Journal ref: October 2018 special issue on "Information-Theoretic Methods in Data Acquisition, Analysis, and Processing" of the IEEE Journal of Selected Topics in Signal Processing

arXiv:1602.03828 [pdf, other]

Community Recovery in Graphs with Locality

Authors: Yuxin Chen, Govinda Kamath, Changho Suh, David Tse

Abstract: Motivated by applications in domains such as social networks and computational biology, we study the problem of community recovery in graphs with locality. In this problem, pairwise noisy measurements of whether two nodes are in the same community or different communities come mainly or exclusively from nearby nodes rather than uniformly sampled between all node pairs, as in most existing models. We present an algorithm that runs nearly linearly in the number of measurements and which achieves the information-theoretic limit for exact recovery.

Submitted 1 June, 2016; v1 submitted 11 February, 2016; originally announced February 2016.

Comments: accepted in part to International Conference on Machine Learning (ICML), 2016

arXiv:1504.07218 [pdf, ps, other]

Spectral MLE: Top-$K$ Rank Aggregation from Pairwise Comparisons

Authors: Yuxin Chen, Changho Suh

Abstract: This paper explores the preference-based top-$K$ rank aggregation problem. Suppose that a collection of items is repeatedly compared in pairs, and one wishes to recover a consistent ordering that emphasizes the top-$K$ ranked items, based on partially revealed preferences. We focus on the Bradley-Terry-Luce (BTL) model that postulates a set of latent preference scores underlying all items, where the odds of paired comparisons depend only on the relative scores of the items involved. We characterize the minimax limits on identifiability of top-$K$ ranked items, in the presence of random and non-adaptive sampling. Our results highlight a separation measure that quantifies the gap of preference scores between the $K^{\text{th}}$ and $(K+1)^{\text{th}}$ ranked items. The minimum sample complexity required for reliable top-$K$ ranking scales inversely with the separation measure irrespective of other preference distribution metrics. To approach this minimax limit, we propose a nearly linear-time ranking scheme, called \emph{Spectral MLE}, that returns the indices of the top-$K$ items in accordance with a careful score estimate. In a nutshell, Spectral MLE starts with an initial score estimate with minimal squared loss (obtained via a spectral method), and then successively refines each component with the assistance of coordinate-wise MLEs. Encouragingly, Spectral MLE allows perfect top-$K$ item identification under minimal sample complexity. The practical applicability of Spectral MLE is further corroborated by numerical experiments.

Submitted 28 May, 2015; v1 submitted 27 April, 2015; originally announced April 2015.

Comments: accepted to International Conference on Machine Learning (ICML), 2015
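
The two-stage idea described in the abstract above (a spectral initial estimate followed by coordinate-wise MLE refinement under the BTL model) can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the authors' algorithm: the spectral step is a plain random-walk (stationary-distribution) estimate, the refinement solves each one-dimensional likelihood equation by bisection, and any further details of the published method are not reproduced. All function names, the `wins` matrix layout, and the helper `expected_wins` are hypothetical.

```python
import math


def spectral_scores(wins, iters=500):
    """Initial BTL score estimate: stationary distribution of a random walk.

    wins[i][j] is the number of times item i beat item j (assumed layout).
    """
    n = len(wins)
    frac = [[0.0] * n for _ in range(n)]  # frac[i][j]: share of i-vs-j games won by j
    degree = [0] * n
    for i in range(n):
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:
                frac[i][j] = wins[j][i] / total
                degree[i] += 1
    d_max = max(degree)  # assumes at least one compared pair
    P = [[frac[i][j] / d_max for j in range(n)] for i in range(n)]
    for i in range(n):
        P[i][i] = 1.0 - sum(P[i])  # self-loop keeps each row summing to 1
    pi = [1.0 / n] * n
    for _ in range(iters):  # power iteration: pi <- pi P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def coordinate_mle(i, scores, wins, lo=1e-9, hi=1e9, steps=200):
    """Maximize the BTL likelihood in coordinate i, holding other scores fixed.

    Solves won_i = sum_j n_ij * tau / (tau + scores[j]) by geometric bisection;
    the right-hand side is increasing in tau, so the root is unique.
    """
    n = len(wins)
    others = [j for j in range(n) if j != i]
    won = sum(wins[i][j] for j in others)

    def gap(tau):
        return won - sum(
            (wins[i][j] + wins[j][i]) * tau / (tau + scores[j]) for j in others
        )

    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def spectral_mle_top_k(wins, k, rounds=5):
    """Spectral initialization, a few coordinate-wise MLE sweeps, then top-k."""
    scores = spectral_scores(wins)
    for _ in range(rounds):
        scores = [coordinate_mle(i, scores, wins) for i in range(len(wins))]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def expected_wins(weights, games_per_pair):
    """Hypothetical helper: noise-free expected win counts under BTL."""
    n = len(weights)
    return [
        [
            0.0 if i == j
            else games_per_pair * weights[i] / (weights[i] + weights[j])
            for j in range(n)
        ]
        for i in range(n)
    ]
```

For instance, calling `spectral_mle_top_k(expected_wins([5.0, 4.0, 3.0, 2.0, 1.0], 10), 2)` ranks the items from noise-free comparison counts on a complete comparison graph; with real, noisy data the counts would come from observed outcomes instead.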

arXiv:1504.01369 [pdf, other]

Information Recovery from Pairwise Measurements

Authors: Yuxin Chen, Changho Suh, Andrea J. Goldsmith

Abstract: This paper is concerned with jointly recovering $n$ node-variables $\left\{ x_{i}\right\}_{1\leq i\leq n}$ from a collection of pairwise difference measurements. Imagine we acquire a few observations taking the form of $x_{i}-x_{j}$; the observation pattern is represented by a measurement graph $\mathcal{G}$ with an edge set $\mathcal{E}$ such that $x_{i}-x_{j}$ is observed if and only if $(i,j)\in\mathcal{E}$. To account for noisy measurements in a general manner, we model the data acquisition process by a set of channels with given input/output transition measures. Employing information-theoretic tools applied to channel decoding problems, we develop a \emph{unified} framework to characterize the fundamental recovery criterion, which accommodates general graph structures, alphabet sizes, and channel transition measures. In particular, our results isolate a family of \emph{minimum} \emph{channel divergence measures} to characterize the degree of measurement corruption, which together with the size of the minimum cut of $\mathcal{G}$ dictates the feasibility of exact information recovery. For various homogeneous graphs, the recovery condition depends almost only on the edge sparsity of the measurement graph irrespective of other graphical metrics; alternatively, the minimum sample complexity required for these graphs scales like \[ \text{minimum sample complexity }\asymp\frac{n\log n}{\mathsf{Hel}_{1/2}^{\min}} \] for a certain information metric $\mathsf{Hel}_{1/2}^{\min}$ defined in the main text, as long as the alphabet size is not super-polynomial in $n$. We apply our general theory to three concrete applications, including the stochastic block model, the outlier model, and the haplotype assembly problem. Our theory leads to order-wise tight recovery conditions for all these scenarios.

Submitted 5 May, 2016; v1 submitted 6 April, 2015; originally announced April 2015.

Comments: This work has been presented in part in ISIT 2014 (https://arxiv.boxedpaper.com/abs/1404.7105) and ISIT 2015

arXiv:1010.4101 [pdf, other]

Boundary-twisted normal form and the number of elementary moves to unknot

Authors: Chan-Ho Suh

Abstract: Suppose $K$ is an unknot lying in the 1-skeleton of a triangulated 3-manifold with $t$ tetrahedra. Hass and Lagarias showed there is an upper bound, depending only on $t$, for the minimal number of elementary moves to untangle $K$. We give a simpler proof, utilizing a normal form for surfaces whose boundary is contained in the 1-skeleton of a triangulated 3-manifold. We also obtain a significantly better upper bound of $2^{120t+14}$ and improve the Hass--Lagarias upper bound on the number of Reidemeister moves needed to unknot to $2^{10^5 n}$, where $n$ is the crossing number.

Submitted 20 October, 2010; originally announced October 2010.

Comments: 17 pages, many figures

MSC Class: 57M; 57N10

arXiv:1009.1500 [pdf, other]

The Unknotting Problem and Normal Surface Q-Theory

Authors: Chan-Ho Suh

Abstract: Tollefson described a variant of normal surface theory for 3-manifolds, called Q-theory, where only the quadrilateral coordinates are used. Suppose $M$ is a triangulated, compact, irreducible, boundary-irreducible 3-manifold. In Q-theory, if $M$ contains an essential surface, then the projective solution space has an essential surface at a vertex. One interesting situation not covered by this theorem is when $M$ is boundary reducible, e.g. $M$ is an unknot complement. We prove that in this case $M$ has an essential disc at a vertex of the Q-projective solution space.

Submitted 8 September, 2010; originally announced September 2010.

Comments: 13 pages, 4 figures

MSC Class: 57M

arXiv:0708.2162

Normal Surface Theory in Link Diagrams

Authors: Chan-Ho Suh

Abstract: This paper has been withdrawn by the author, due to a significant error in section 4.3.1.

Submitted 26 January, 2009; v1 submitted 16 August, 2007; originally announced August 2007.

Comments: This paper is withdrawn

MSC Class: 57M25; 68Q25

Showing 1–7 of 7 results for author: Suh, C