Fair-Capacitated Clustering

Quy, Tai Le; Roy, Arjun; Friege, Gunnar; Ntoutsi, Eirini

arXiv:2104.12116 (cs)

[Submitted on 25 Apr 2021 (v1), last revised 28 Apr 2021 (this version, v2)]

Abstract: Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where each cluster must fairly represent the groups defined by protected attributes such as gender or race. Moreover, in many applications the clusters are useful to the end user only if their cardinalities are balanced. Our motivation comes from the education domain, where studies indicate that students might learn better in diverse groups, and where groups of similar cardinality are more practical, e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem, which partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fairness constraint, and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements in the merging step, while the partitioning-based approach alters the assignment step, using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining good clustering quality.
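The two requirements named in the abstract, cluster fairness and balanced cardinality, can be illustrated with a small checker. This is a minimal sketch, not the authors' method: the abstract gives no formal definitions, so the code assumes the balance measure commonly used in the fairlet literature (the ratio of the least-represented to the most-represented protected group in a cluster) and treats capacity as a simple upper bound on cluster size. The names `cluster_balance`, `check_fair_capacitated`, `min_balance`, and `max_size` are hypothetical.

```python
from collections import Counter


def cluster_balance(groups, all_groups):
    """Ratio of the rarest to the most common protected group in one cluster.

    Assumed measure: 1.0 means perfectly balanced, 0.0 means some group
    from `all_groups` is missing from the cluster entirely.
    """
    counts = Counter(groups)
    sizes = [counts[g] for g in all_groups]  # Counter yields 0 for absent groups
    return min(sizes) / max(sizes)


def check_fair_capacitated(labels, protected, min_balance, max_size):
    """Check every cluster against an assumed balance floor and size cap.

    labels[i] is the cluster of instance i; protected[i] is its group.
    Returns (all_constraints_met, per_cluster_report).
    """
    all_groups = set(protected)
    clusters = {}
    for label, group in zip(labels, protected):
        clusters.setdefault(label, []).append(group)

    report = {
        label: {
            "size": len(groups),
            "balance": cluster_balance(groups, all_groups),
        }
        for label, groups in clusters.items()
    }
    ok = all(
        r["size"] <= max_size and r["balance"] >= min_balance
        for r in report.values()
    )
    return ok, report


# Toy data: two clusters of four students each, with a binary attribute.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
protected = ["F", "M", "F", "M", "F", "F", "F", "M"]
ok, report = check_fair_capacitated(labels, protected, min_balance=0.5, max_size=4)
```

In this toy input both clusters respect the size cap, but the second cluster holds three instances of one group and one of the other, so it falls below the assumed balance floor of 0.5 and the check fails.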

Comments:	10 pages, 5 figures, 14th International Conference on Educational Data Mining - EDM 2021 (short paper)
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2104.12116 [cs.LG]
	(or arXiv:2104.12116v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2104.12116
Journal reference:	14th International Conference on Educational Data Mining - EDM 2021

Submission history

From: Tai Le Quy
[v1] Sun, 25 Apr 2021 09:39:39 UTC (178 KB)
[v2] Wed, 28 Apr 2021 07:24:28 UTC (178 KB)
