Random Sampling in an Age of Automation: Minimizing Expenditures through Balanced Collection and Annotation

Beijbom, Oscar

arXiv:1410.7074 (cs)

[Submitted on 26 Oct 2014 (v1), last revised 21 Aug 2015 (this version, v4)]

Abstract: Methods for automated collection and annotation are changing the cost structures of sampling surveys for a wide range of applications. Digital samples in the form of images or audio recordings can be collected rapidly and annotated by computer programs or crowd workers. We consider the problem of estimating a population mean under these new cost structures, and propose a Hybrid-Offset sampling design. This design utilizes two annotators: a primary, which is accurate but costly (e.g., a human expert), and an auxiliary, which is noisy but cheap (e.g., a computer program), in order to minimize total sampling expenditures. Our analysis gives necessary conditions for the Hybrid-Offset design and specifies optimal sample sizes for both annotators. Simulations on data from a coral reef survey program indicate that the Hybrid-Offset design outperforms several alternative sampling designs. In particular, sampling expenditures are reduced by 50% compared to the Conventional design currently deployed by the coral ecologists.
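
The abstract does not spell out the estimator, so the following is only a minimal sketch of the general two-annotator idea, assuming a textbook difference (offset) estimator: the cheap auxiliary annotates every collected sample, the costly primary annotates a small subsample, and the mean primary-minus-auxiliary offset on that subsample corrects the auxiliary mean. The function name `offset_estimate` and its arguments are hypothetical and are not taken from the paper; the paper's optimal sample sizes and cost model are not reproduced here.

```python
import statistics


def offset_estimate(aux_all, aux_sub, primary_sub):
    """Assumed difference-style estimate of a population mean.

    aux_all     -- auxiliary (cheap, noisy) annotations for every sample
    aux_sub     -- auxiliary annotations for the subsample also seen by the primary
    primary_sub -- primary (accurate, costly) annotations for that same subsample
    """
    if not aux_all:
        raise ValueError("aux_all must not be empty")
    if not primary_sub or len(aux_sub) != len(primary_sub):
        raise ValueError("aux_sub and primary_sub must be paired and non-empty")

    # Mean disagreement between the two annotators on the paired subsample.
    offsets = [p - a for p, a in zip(primary_sub, aux_sub)]
    mean_offset = statistics.fmean(offsets)

    # Cheap mean over all samples, shifted by the estimated offset.
    return statistics.fmean(aux_all) + mean_offset


if __name__ == "__main__":
    # Toy numbers for illustration only (not data from the paper).
    aux_all = [0.2, 0.4, 0.6, 0.8]
    aux_sub = [0.2, 0.4]
    primary_sub = [0.3, 0.5]
    print(offset_estimate(aux_all, aux_sub, primary_sub))
```

In this sketch the auxiliary annotator may be biased, as long as its offset from the primary is stable enough to be estimated from the small paired subsample; whether that holds in practice is exactly the kind of condition the paper's analysis addresses.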

Comments:	PDF contains 9 pages of manuscript followed by 7 pages of Supplementary Information
Subjects:	Computers and Society (cs.CY); Machine Learning (cs.LG); Methodology (stat.ME)
Cite as:	arXiv:1410.7074 [cs.CY]
	(or arXiv:1410.7074v4 [cs.CY] for this version)
	https://doi.org/10.48550/arXiv.1410.7074

Submission history

From: Oscar Beijbom Mr
[v1] Sun, 26 Oct 2014 20:12:32 UTC (670 KB)
[v2] Wed, 29 Oct 2014 15:59:30 UTC (670 KB)
[v3] Wed, 19 Nov 2014 01:19:14 UTC (670 KB)
[v4] Fri, 21 Aug 2015 16:18:15 UTC (86 KB)
