Clustering performance analysis using new correlation based cluster validity indices

Wiroonsri, Nathakhun

arXiv:2109.11172v1 (stat)

[Submitted on 23 Sep 2021 (this version), latest version 25 Jul 2022 (v2)]

Authors:Nathakhun Wiroonsri

Abstract: There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to find the optimal, unknown number of clusters. Some measures work well for clusters with different densities, sizes and shapes. Yet one weakness these validity measures share is that they sometimes indicate only one clear optimal number of clusters. That number is actually unknown, and there might be more than one potential sub-optimal option that a user may wish to choose depending on the application. We develop two new cluster validity indices based on the correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters in which the two points are located. Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness stated above. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted in order to compare the proposed validity indices with several well-known ones.
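The correlation described in the abstract can be illustrated with a minimal sketch. It pairs, for every two data points, their actual distance with the distance between the centroids of the clusters they are assigned to, and then correlates the two lists. The function name `distance_centroid_correlation`, the choice of Euclidean distance, and the use of the Pearson coefficient are assumptions made for illustration; the abstract does not specify them, and this sketch does not reproduce the paper's two validity indices, only the underlying correlation idea.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


def distance_centroid_correlation(points, labels):
    """Hypothetical helper: correlate point-to-point distances with the
    distances between the centroids of the clusters the points belong to.

    Assumes Euclidean distance and Pearson correlation (not stated in the
    abstract). Pairs in the same cluster get a centroid distance of 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {lab: centroid(ps) for lab, ps in clusters.items()}

    point_d, centroid_d = [], []
    for i, j in combinations(range(len(points)), 2):
        point_d.append(math.dist(points[i], points[j]))
        centroid_d.append(math.dist(centroids[labels[i]], centroids[labels[j]]))
    return pearson(point_d, centroid_d)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = distance_centroid_correlation(points, good_labels)
bad_score = distance_centroid_correlation(points, bad_labels)
print(f"good partition: {good_score:.3f}, bad partition: {bad_score:.3f}")
```

On this toy data the partition that matches the visible grouping scores much higher than the mixed one, which is the sense in which such a correlation can be used to judge a selected clustering result. Evaluating the same quantity over a range of cluster counts would be the natural next step, but how the paper turns it into its two indices is not described in the abstract.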

Comments:	20 pages
Subjects:	Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
MSC classes:	62H30, 68T10
Cite as:	arXiv:2109.11172 [stat.ML]
	(or arXiv:2109.11172v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2109.11172

Submission history

From: Nathakhun Wiroonsri
[v1] Thu, 23 Sep 2021 06:59:41 UTC (2,176 KB)
[v2] Mon, 25 Jul 2022 06:41:09 UTC (3,343 KB)
