K-bMOM: a robust Lloyd-type clustering algorithm based on bootstrap Median-of-Means

Brunet-Saumard, Camille; Genetay, Edouard; Saumard, Adrien

Abstract: We propose a new clustering algorithm that is robust to the presence of outliers in the dataset. We perform Lloyd-type iterations with robust estimates of the centroids. More precisely, we build on the idea of median-of-means statistics to estimate the centroids, but allow for replacement while constructing the blocks. We call this methodology the bootstrap median-of-means (bMOM) and prove that, if enough blocks are generated through the bootstrap sampling, it has a better breakdown point for mean estimation than the classical median-of-means (MOM), where the blocks form a partition of the dataset. From a clustering perspective, bMOM makes it possible to draw many blocks of a desired size, which avoids the possible disappearance of clusters from some blocks, a pitfall that can occur with the partition-based generation of blocks in the classical median-of-means. Experiments on simulated datasets show that the proposed approach, called K-bMOM, performs better than existing robust K-means based methods. Guidelines are provided for tuning the hyper-parameters of K-bMOM in practice. We also recommend that practitioners use such a robust approach to initialize their clustering algorithm. Finally, considering a simplified and theoretical version of our estimator, we prove its robustness to adversarial contamination by deriving robust rates of convergence for the K-means distortion. To our knowledge, this is the first result of this kind for the K-means distortion.
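The bootstrap median-of-means idea described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function name `bmom_mean`, its parameters `n_blocks` and `block_size`, and the toy data are all assumptions made for illustration, and the sketch covers only mean estimation, not the full K-bMOM clustering procedure.

```python
import numpy as np


def bmom_mean(x, n_blocks, block_size, rng):
    """Median of the means of blocks drawn with replacement from x.

    Hypothetical helper: a 1-D illustration of bootstrap median-of-means.
    """
    x = np.asarray(x, dtype=float)
    # Each row of idx holds the indices of one bootstrap block.
    # Sampling with replacement lets n_blocks and block_size be chosen
    # independently of the sample size, unlike a partition into blocks.
    idx = rng.integers(0, x.size, size=(n_blocks, block_size))
    block_means = x[idx].mean(axis=1)
    return float(np.median(block_means))


rng = np.random.default_rng(0)

# Toy data (assumed): 95 standard normal points plus 5 gross outliers.
clean = rng.normal(loc=0.0, scale=1.0, size=95)
outliers = np.full(5, 1000.0)
data = np.concatenate([clean, outliers])

estimate = bmom_mean(data, n_blocks=501, block_size=5, rng=rng)
naive = float(data.mean())

print(f"bMOM estimate:  {estimate:.3f}")
print(f"empirical mean: {naive:.3f}")
```

With small blocks, most bootstrap blocks contain no outlier, so the median of the block means stays close to the mean of the clean points, while the empirical mean is pulled far away by the contaminated values.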

Subjects:	Methodology (stat.ME); Machine Learning (stat.ML)
Cite as:	arXiv:2002.03899 [stat.ME]
	(or arXiv:2002.03899v2 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2002.03899

