Handling missing data in model-based clustering

Serafini, Alessio; Murphy, Thomas Brendan; Scrucca, Luca


arXiv:2006.02954 (stat)

[Submitted on 4 Jun 2020]



Abstract: Gaussian mixture models (GMMs) are a powerful tool for clustering, classification and density estimation when clustering structure is embedded in the data. Missing values can strongly affect GMM estimation, so handling missing data is crucial in all three tasks. Several techniques have been developed to impute the missing values before model estimation; among these, multiple imputation is a simple and useful general approach. In this paper we propose two methods to fit Gaussian mixtures in the presence of missing data. Both methods use a variant of the Monte Carlo Expectation-Maximisation (MCEM) algorithm for data augmentation: multiple imputations are performed during the E-step, followed by the standard M-step for a given eigen-decomposed component-covariance matrix. We show that the proposed methods outperform the multiple imputation approach in terms of both cluster identification and density estimation.
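As a rough illustration of the imputation idea mentioned in the abstract, the sketch below draws the missing entries of one observation from a single Gaussian component, conditional on the observed entries. It is not the authors' implementation: the function name `draw_missing`, the NaN encoding of missing values and the use of NumPy are all assumptions made here, and a full MCEM fit would additionally need component responsibilities and an M-step that respects the eigen-decomposed covariance structure.

```python
import numpy as np


def draw_missing(x, mean, cov, n_draws, rng):
    """Return n_draws completed copies of x (hypothetical helper).

    Missing entries of x are encoded as NaN (an assumption of this sketch)
    and are sampled from N(mean, cov) conditional on the observed entries.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    miss = np.isnan(x)
    obs = ~miss

    completed = np.tile(x, (n_draws, 1))
    if not miss.any():
        return completed  # nothing to impute
    if not obs.any():
        # Nothing observed: draw from the unconditional component.
        return rng.multivariate_normal(mean, cov, size=n_draws)

    cov_oo = cov[np.ix_(obs, obs)]
    cov_mo = cov[np.ix_(miss, obs)]
    cov_mm = cov[np.ix_(miss, miss)]

    # gain = cov_mo @ inv(cov_oo), computed with a linear solve.
    gain = np.linalg.solve(cov_oo, cov_mo.T).T
    cond_mean = mean[miss] + gain @ (x[obs] - mean[obs])
    cond_cov = cov_mm - gain @ cov_mo.T
    cond_cov = (cond_cov + cond_cov.T) / 2  # remove round-off asymmetry

    completed[:, miss] = rng.multivariate_normal(cond_mean, cond_cov, size=n_draws)
    return completed


rng = np.random.default_rng(0)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
draws = draw_missing([1.0, np.nan], mean, cov, n_draws=5, rng=rng)
```

Each row of `draws` keeps the observed value and fills the missing coordinate with a fresh conditional draw; in a Monte Carlo E-step, several such completions per observation would stand in for the intractable expectation.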

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Cite as:	arXiv:2006.02954 [stat.ML]
	(or arXiv:2006.02954v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2006.02954

Submission history

From: Alessio Serafini
[v1] Thu, 4 Jun 2020 15:36:31 UTC (691 KB)
