-
Understanding Workers, Developing Effective Tasks, and Enhancing Marketplace Dynamics: A Study of a Large Crowdsourcing Marketplace
Authors:
Ayush Jain,
Akash Das Sarma,
Aditya Parameswaran,
Jennifer Widom
Abstract:
We conduct an experimental analysis of a dataset comprising over 27 million microtasks performed by over 70,000 workers issued to a large crowdsourcing marketplace between 2012-2016. Using this data---never before analyzed in an academic context---we shed light on three crucial aspects of crowdsourcing: (1) Task design --- helping requesters understand what constitutes an effective task, and how t…
▽ More
We conduct an experimental analysis of a dataset comprising over 27 million microtasks performed by over 70,000 workers issued to a large crowdsourcing marketplace between 2012-2016. Using this data---never before analyzed in an academic context---we shed light on three crucial aspects of crowdsourcing: (1) Task design --- helping requesters understand what constitutes an effective task, and how to go about designing one; (2) Marketplace dynamics --- helping marketplace administrators and designers understand the interaction between tasks and workers, and the corresponding marketplace load; and (3) Worker behavior --- understanding worker attention spans, lifetimes, and general behavior, for the improvement of the crowdsourcing ecosystem as a whole.
△ Less
Submitted 22 January, 2017;
originally announced January 2017.
-
Globally Optimal Crowdsourcing Quality Management
Authors:
Akash Das Sarma,
Aditya Parameswaran,
Jennifer Widom
Abstract:
We study crowdsourcing quality management, that is, given worker responses to a set of tasks, our goal is to jointly estimate the true answers for the tasks, as well as the quality of the workers. Prior work on this problem relies primarily on applying Expectation-Maximization (EM) on the underlying maximum likelihood problem to estimate true answers as well as worker quality. Unfortunately, EM on…
▽ More
We study crowdsourcing quality management, that is, given worker responses to a set of tasks, our goal is to jointly estimate the true answers for the tasks, as well as the quality of the workers. Prior work on this problem relies primarily on applying Expectation-Maximization (EM) on the underlying maximum likelihood problem to estimate true answers as well as worker quality. Unfortunately, EM only provides a locally optimal solution rather than a globally optimal one. Other solutions to the problem (that do not leverage EM) fail to provide global optimality guarantees as well. In this paper, we focus on filtering, where tasks require the evaluation of a yes/no predicate, and rating, where tasks elicit integer scores from a finite domain. We design algorithms for finding the global optimal estimates of correct task answers and worker quality for the underlying maximum likelihood problem, and characterize the complexity of these algorithms. Our algorithms conceptually consider all mappings from tasks to true answers (typically a very large number), leveraging two key ideas to reduce, by several orders of magnitude, the number of mappings under consideration, while preserving optimality. We also demonstrate that these algorithms often find more accurate estimates than EM-based algorithms. This paper makes an important contribution towards understanding the inherent complexity of globally optimal crowdsourcing quality management.
△ Less
Submitted 1 March, 2015; v1 submitted 26 February, 2015;
originally announced February 2015.
-
Human-Assisted Graph Search: It's Okay to Ask Questions
Authors:
Aditya Parameswaran,
Anish Das Sarma,
Hector Garcia-Molina,
Neoklis Polyzotis,
Jennifer Widom
Abstract:
We consider the problem of human-assisted graph search: given a directed acyclic graph with some (unknown) target node(s), we consider the problem of finding the target node(s) by asking an omniscient human questions of the form "Is there a target node that is reachable from the current node?". This general problem has applications in many domains that can utilize human intelligence, including cur…
▽ More
We consider the problem of human-assisted graph search: given a directed acyclic graph with some (unknown) target node(s), we consider the problem of finding the target node(s) by asking an omniscient human questions of the form "Is there a target node that is reachable from the current node?". This general problem has applications in many domains that can utilize human intelligence, including curation of hierarchies, debugging workflows, image segmentation and categorization, interactive search and filter synthesis. To our knowledge, this work provides the first formal algorithmic study of the optimization of human computation for this problem. We study various dimensions of the problem space, providing algorithms and complexity results. Our framework and algorithms can be used in the design of an optimizer for crowd-sourcing platforms such as Mechanical Turk.
△ Less
Submitted 16 March, 2011;
originally announced March 2011.
-
The Lowell Database Research Self Assessment
Authors:
Serge Abiteboul,
Rakesh Agrawal,
Phil Bernstein,
Mike Carey,
Stefano Ceri,
Bruce Croft,
David DeWitt,
Mike Franklin,
Hector Garcia Molina,
Dieter Gawlick,
Jim Gray,
Laura Haas,
Alon Halevy,
Joe Hellerstein,
Yannis Ioannidis,
Martin Kersten,
Michael Pazzani,
Mike Lesk,
David Maier,
Jeff Naughton,
Hans Schek,
Timos Sellis,
Avi Silberschatz,
Mike Stonebraker,
Rick Snodgrass
, et al. (4 additional authors not shown)
Abstract:
A group of senior database researchers gathers every few years to assess the state of database research and to point out problem areas that deserve additional focus. This report summarizes the discussion and conclusions of the sixth ad-hoc meeting held May 4-6, 2003 in Lowell, Mass. It observes that information management continues to be a critical component of most complex software systems. It…
▽ More
A group of senior database researchers gathers every few years to assess the state of database research and to point out problem areas that deserve additional focus. This report summarizes the discussion and conclusions of the sixth ad-hoc meeting held May 4-6, 2003 in Lowell, Mass. It observes that information management continues to be a critical component of most complex software systems. It recommends that database researchers increase focus on: integration of text, data, code, and streams; fusion of information from heterogeneous data sources; reasoning about uncertain data; unsupervised data mining for interesting correlations; information privacy; and self-adaptation and repair.
△ Less
Submitted 6 October, 2003;
originally announced October 2003.