A Revenue Function for Comparison-Based Hierarchical Clustering

Mandal, Aishik; Perrot, Michaël; Ghoshdastidar, Debarghya

arXiv:2211.16459 (cs)

[Submitted on 29 Nov 2022 (v1), last revised 2 Apr 2023 (this version, v2)]

Abstract: Comparison-based learning addresses the problem of learning when, instead of explicit features or pairwise similarities, one only has access to comparisons of the form "Object A is more similar to B than to C." Recently, it has been shown that, in hierarchical clustering, single and complete linkage can be directly implemented using only such comparisons, while several algorithms have been proposed to emulate the behaviour of average linkage. Hence, finding hierarchies (or dendrograms) using only comparisons is a well-understood problem. However, evaluating their meaningfulness when neither ground truth nor explicit similarities are available remains an open question.
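To make the comparison setting concrete, the following is a minimal sketch of a triplet oracle, assuming the comparisons are generated from a hidden similarity matrix that the learner never observes directly. The names `triplet_oracle`, `collect_triplets`, and `sim` are hypothetical and chosen for illustration. This is not the authors' code, and it only shows how answers of the form "a is more similar to b than to c" could be collected.

```python
import random


def triplet_oracle(sim):
    """Wrap a hidden similarity matrix as a comparison oracle.

    `sim` is a hypothetical symmetric similarity matrix (list of lists) that
    the learner never sees; only boolean answers to queries are exposed.
    """
    def more_similar(a, b, c):
        # True iff object a is strictly more similar to b than to c.
        return sim[a][b] > sim[a][c]
    return more_similar


def collect_triplets(n, oracle, num_queries, seed=0):
    """Query `num_queries` random triplets of distinct objects.

    Each answer is stored as an ordered triple (a, b, c) meaning
    "a is more similar to b than to c". Ties (equal similarities) carry no
    ordering information under a strict oracle, so they are skipped.
    """
    rng = random.Random(seed)
    triplets = []
    for _ in range(num_queries):
        a, b, c = rng.sample(range(n), 3)
        if oracle(a, b, c):
            triplets.append((a, b, c))
        elif oracle(a, c, b):
            triplets.append((a, c, b))
        # Otherwise sim[a][b] == sim[a][c]: a tie, so nothing is recorded.
    return triplets
```

For example, with `sim[0][1] = 0.9` and `sim[0][2] = 0.1`, the query `oracle(0, 1, 2)` returns `True`. A comparison-based clustering method would see only such answers, never the values in `sim`.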
In this paper, we bridge this gap by proposing a new revenue function that allows one to measure the goodness of dendrograms using only comparisons. We show that this function is closely related to Dasgupta's cost for hierarchical clustering that uses pairwise similarities. On the theoretical side, we use the proposed revenue function to resolve the open problem of whether one can approximately recover a latent hierarchy using few triplet comparisons. On the practical side, we present principled algorithms for comparison-based hierarchical clustering based on the maximisation of the revenue and we empirically compare them with existing methods.
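The comparison-based revenue function itself is defined in the paper and is not reproduced here. For reference, the following is a minimal sketch of Dasgupta's cost, the similarity-based objective the abstract relates it to: for a dendrogram T and pairwise similarities w, it sums w[i][j] times the number of leaves under the lowest common ancestor of i and j, over all unordered pairs {i, j}. The sketch assumes a binary dendrogram encoded as nested 2-tuples of integer leaf indices and a symmetric similarity matrix. This encoding and the helper names are hypothetical choices for illustration.

```python
def leaves(tree):
    """Return the leaf indices under a node.

    A leaf is an int; an internal node is a 2-tuple (left, right).
    """
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def dasgupta_cost(tree, w):
    """Dasgupta's cost: sum over unordered pairs {i, j} of
    w[i][j] * |leaves(lowest common ancestor of i and j)|.

    `w` is assumed to be a symmetric similarity matrix; lower cost means
    similar pairs are merged lower in the tree.
    """
    if isinstance(tree, int):
        return 0.0
    left, right = tree
    lo, ro = leaves(left), leaves(right)
    size = len(lo) + len(ro)
    # Every pair split between the two children has its LCA at this node,
    # so each such pair contributes w[i][j] * size.
    split_weight = sum(w[i][j] for i in lo for j in ro)
    return size * split_weight + dasgupta_cost(left, w) + dasgupta_cost(right, w)
```

Consider toy similarities `w[0][1] = 0.9`, `w[0][2] = 0.1` and `w[1][2] = 0.2`. The tree `((0, 1), 2)` merges the most similar pair first and scores 2.7. The tree `((0, 2), 1)` scores 3.5. Minimising this cost therefore favours the intuitively better hierarchy.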

Comments:	26 pages, 6 figures, 5 tables. Transactions on Machine Learning Research (2023)
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2211.16459 [cs.LG]
	(or arXiv:2211.16459v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2211.16459

Submission history

From: Aishik Mandal
[v1] Tue, 29 Nov 2022 18:40:02 UTC (1,467 KB)
[v2] Sun, 2 Apr 2023 13:09:28 UTC (5,754 KB)