Spectral Clustering with Likelihood Refinement is Optimal for Latent Class Recovery

Lyu, Zhongyuan; Gu, Yuqi

Abstract: Latent class models are widely used for identifying unobserved subgroups from multivariate categorical data in social sciences, with binary data as a particularly popular example. However, accurately recovering individual latent class memberships and determining the number of classes remain challenging, especially for large-scale datasets with many items. This paper proposes a novel two-stage algorithm for latent class models with high-dimensional binary responses. Our method first initializes latent class assignments with an easy-to-implement spectral clustering algorithm, and then refines these assignments with a one-step likelihood-based update. This approach combines the computational efficiency of spectral clustering with the improved statistical accuracy of likelihood-based estimation. We establish theoretical guarantees showing that this method leads to optimal latent class recovery and exact clustering of subjects under mild conditions. Additionally, we propose a simple consistent estimator for the number of latent classes. Extensive experiments on both simulated and real data validate our theoretical results and demonstrate our method's superior performance over alternative methods.
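
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the spectral step or the exact form of the update, so the code below assumes (a) k-means on the top-K scaled left singular vectors of the binary response matrix as the spectral initialization and (b) a single reassignment of each subject to the class with the highest Bernoulli log-likelihood, using item parameters estimated from the initial labels. The function names `spectral_init` and `likelihood_refine`, the clipping constant `eps`, and the farthest-point seeding are all hypothetical choices made for illustration.

```python
import numpy as np


def spectral_init(R, K, n_iter=50, seed=0):
    """Stage 1 (assumed form): k-means on the top-K spectral embedding of R."""
    R = np.asarray(R, dtype=float)
    rng = np.random.default_rng(seed)
    U, s, _ = np.linalg.svd(R, full_matrices=False)
    X = U[:, :K] * s[:K]  # N x K embedding of the subjects

    # Farthest-point seeding: start from a random subject, then repeatedly
    # add the subject farthest from all centers chosen so far.
    centers = [X[rng.integers(X.shape[0])]]
    for _ in range(1, K):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)

    # Plain Lloyd iterations; an empty cluster keeps its previous center.
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def likelihood_refine(R, labels, K, eps=1e-3):
    """Stage 2 (assumed form): one-step Bernoulli likelihood reassignment."""
    R = np.asarray(R, dtype=float)
    overall = R.mean(axis=0)  # fallback item means for an empty class
    theta = np.vstack([
        R[labels == k].mean(axis=0) if np.any(labels == k) else overall
        for k in range(K)
    ])
    theta = np.clip(theta, eps, 1 - eps)  # keep the logarithms finite
    # N x K matrix of log-likelihoods of each subject under each class.
    loglik = R @ np.log(theta).T + (1 - R) @ np.log(1 - theta).T
    return loglik.argmax(axis=1)
```

Given an N x J binary response matrix `R` and a class count `K`, calling `likelihood_refine(R, spectral_init(R, K), K)` returns one refined label per subject. Labels are identified only up to a permutation of the classes, so any comparison with known memberships should account for relabeling.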

Subjects:	Methodology (stat.ME)
MSC classes:	62H30, 91C20
ACM classes:	G.3
Cite as:	arXiv:2506.07167 [stat.ME]
	(or arXiv:2506.07167v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2506.07167
