A Taxonomy of Challenges to Curating Fair Datasets

Zhao, Dora; Scheuerman, Morgan Klaus; Chitre, Pooja; Andrews, Jerone T. A.; Panagiotidou, Georgia; Walker, Shawn; Pine, Kathleen H.; Xiang, Alice

arXiv:2406.06407 (cs)

[Submitted on 10 Jun 2024 (v1), last revised 31 Oct 2024 (this version, v2)]

Abstract: Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.

Comments:	NeurIPS Datasets & Benchmarks 2024 (Oral)
Subjects:	Machine Learning (cs.LG); Computers and Society (cs.CY)
Cite as:	arXiv:2406.06407 [cs.LG]
	(or arXiv:2406.06407v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.06407

Submission history

From: Dora Zhao
[v1] Mon, 10 Jun 2024 15:59:08 UTC (893 KB)
[v2] Thu, 31 Oct 2024 18:47:52 UTC (1,510 KB)
