-
New Survey Questions and Estimators for Network Clustering with Respondent-Driven Sampling Data
Authors:
Ashton M. Verdery,
Jacob C. Fisher,
Nalyn Siripong,
Kahina Abdesselam,
Shawn Bauldry
Abstract:
Respondent-driven sampling (RDS) is a popular method for sampling hard-to-survey populations that leverages social network connections through peer recruitment. While RDS is most frequently applied to estimate the prevalence of infections and risk behaviors of interest to public health, like HIV/AIDS or condom use, it is rarely used to draw inferences about the structural properties of social networks among such populations because it does not typically collect the necessary data. Drawing on recent advances in computer science, we introduce a set of data collection instruments and RDS estimators for network clustering, an important topological property that has been linked to a network's potential for diffusion of information, disease, and health behaviors. We use simulations to explore how these estimators, originally developed for random walk samples of computer networks, perform when applied to RDS samples with characteristics encountered in realistic field settings that depart from random walks. In particular, we explore the effects of multiple seeds, sampling without vs. with replacement, branching chains, imperfect response rates, preferential recruitment, and misreporting of ties. We find that clustering coefficient estimators retain desirable properties in RDS samples. This paper takes an important step towards calculating network characteristics using non-traditional sampling methods, and it expands RDS's potential to tell researchers more about hidden populations and the social factors driving disease prevalence.
Submitted 21 October, 2016;
originally announced October 2016.
-
Enhancing Big Data in the Social Sciences with Crowdsourcing: Data Augmentation Practices, Techniques, and Opportunities
Authors:
Nathaniel D. Porter,
Ashton M. Verdery,
S. Michael Gaddis
Abstract:
The importance of big data is a contested topic among social scientists. Proponents claim it will fuel a research revolution, but skeptics challenge it as unreliably measured and decontextualized, with limited utility for accurately answering social science research questions. We argue that social scientists need effective tools to quantify big data's measurement error and expand the contextual information associated with it. Standard research efforts in many fields already pursue these goals through data augmentation, the systematic assessment of measurement against known quantities and expansion of extant data by adding new information. Traditionally, these tasks are accomplished using trained research assistants or specialized algorithms. However, such approaches may not be scalable to big data or appease its skeptics. We consider a third alternative that may increase the validity and value of big data: data augmentation with online crowdsourcing. We present three empirical cases to illustrate the strengths and limits of crowdsourcing for academic research, with a particular eye to how they can be applied to data augmentation tasks that will accelerate acceptance of big data among social scientists. The cases use Amazon Mechanical Turk to (1) verify automated coding of the academic discipline of dissertation committee members, (2) link online product pages to a book database, and (3) gather data on mental health resources at colleges. In light of these cases, we consider the costs and benefits of augmenting big data with crowdsourcing marketplaces and provide guidelines on best practices. We also offer a standardized reporting template that will enhance reproducibility.
Submitted 29 May, 2017; v1 submitted 27 September, 2016;
originally announced September 2016.
-
Network Structure and Biased Variance Estimation in Respondent Driven Sampling
Authors:
Ashton M. Verdery,
Ted Mouw,
Shawn Bauldry,
Peter J. Mucha
Abstract:
This paper explores bias in the estimation of sampling variance in Respondent Driven Sampling (RDS). Prior methodological work on RDS has focused on its problematic assumptions and the biases and inefficiencies of its estimators of the population mean. Nonetheless, researchers have given only slight attention to the topic of estimating sampling variance in RDS, despite the importance of variance estimation for the construction of confidence intervals and hypothesis tests. In this paper, we show that the estimators of RDS sampling variance rely on a critical assumption that the network is First Order Markov (FOM) with respect to the dependent variable of interest. We demonstrate, through intuitive examples, mathematical generalizations, and computational experiments, that current RDS variance estimators will always underestimate the population sampling variance of RDS in empirical networks that do not conform to the FOM assumption. Analysis of 215 observed university and school networks from Facebook and Add Health indicates that the FOM assumption is violated in every empirical network we analyze, and that these violations lead to substantially biased RDS estimators of sampling variance. We propose and test two alternative variance estimators that show some promise for reducing biases, but which also illustrate the limits of estimating sampling variance with only partial information on the underlying population social network.
Submitted 4 December, 2015; v1 submitted 19 September, 2013;
originally announced September 2013.