A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness

Shah, Nihar B.; Balakrishnan, Sivaraman; Wainwright, Martin J.

Computer Science > Machine Learning

arXiv:1606.09632v2 (cs)

[Submitted on 30 Jun 2016 (v1), revised 7 Nov 2019 (this version, v2), latest version 10 Jan 2021 (v3)]

Abstract: The aggregation and denoising of crowd-labeled data is a task that has gained increased significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the common Dawid-Skene model, and introduce a new error metric by which to compare different estimators. Working in a high-dimensional non-asymptotic framework that allows both the number of workers and tasks to scale, we derive minimax rates of convergence for the permutation-based model that are optimal (up to logarithmic factors). We show that the permutation-based model offers significant robustness in estimation due to its richness, while surprisingly incurring only a small additional statistical penalty as compared to the Dawid-Skene model. We then design a computationally efficient method, called the OBI-WAN estimator, that is optimal over a class intermediate between the permutation-based and the Dawid-Skene models (up to logarithmic factors), and also simultaneously achieves non-trivial guarantees over the entire permutation-based model class. Finally, we conduct synthetic simulations and experiments on real-world crowdsourcing data, and these corroborate our theoretical findings.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
Cite as: arXiv:1606.09632 [cs.LG] (or arXiv:1606.09632v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.1606.09632

Submission history

From: Nihar Shah
[v1] Thu, 30 Jun 2016 19:40:56 UTC (110 KB)
[v2] Thu, 7 Nov 2019 04:58:30 UTC (150 KB)
[v3] Sun, 10 Jan 2021 18:18:41 UTC (126 KB)
