-
From "I have nothing to hide" to "It looks like stalking": Measuring Americans' Level of Comfort with Individual Mobility Features Extracted from Location Data
Authors:
Naman Awasthi,
Saad Mohammad Abrar,
Daniel Smolyak,
Vanessa Frias-Martinez
Abstract:
Location data collection has become widespread as smartphones have become ubiquitous. Smartphone apps often collect precise location data from users by offering "free" services and then monetize it for advertising and marketing purposes. While major tech companies only sell aggregate behaviors for marketing purposes, data aggregators and data brokers offer access to individual location data. Some data brokers and aggregators have certain rules in place to preserve privacy, and the FTC has also started to vigorously regulate consumer privacy for location data. In this paper, we present an in-depth exploration of U.S. privacy perceptions with respect to specific location features derivable from data made available by location data brokers and aggregators. These results can provide policy implications that could assist organizations like the FTC in defining clear access rules. Using a factorial vignette survey, we collected responses from 1,405 participants to evaluate their level of comfort with sharing different types of location features, including individual trajectory data and visits to points of interest, available for purchase from data brokers worldwide. Our results show that trajectory-related features are associated with higher privacy concerns, that some data-broker-based obfuscation practices increase levels of comfort, and that race, ethnicity, and education have an effect on data sharing privacy perceptions. We also model people's privacy perceptions as a predictive task, achieving an F1 score of 0.6.
Submitted 8 February, 2025;
originally announced February 2025.
-
Systematic analysis of the effectiveness of adding human mobility data to covid-19 case prediction linear models
Authors:
Saad Mohammad Abrar,
Naman Awasthi,
Daniel Smolyak,
Vanessa Frias-Martinez
Abstract:
Human mobility data has been extensively used in COVID-19 case prediction models. Nevertheless, related work has questioned whether mobility data really helps that much. We present a systematic analysis across mobility datasets and prediction lookaheads and reveal that adding mobility data to predictive models improves model performance only for about two months at the onset of the testing period, and that performance improvements (measured as the improvement in predicted vs. actual correlation over non-mobility baselines) are at most 0.3.
Submitted 4 May, 2024;
originally announced July 2024.
-
Auditing the Fairness of the US COVID-19 Forecast Hub's Case Prediction Models
Authors:
Saad Mohammad Abrar,
Naman Awasthi,
Daniel Smolyak,
Vanessa Frias-Martinez
Abstract:
The US COVID-19 Forecast Hub, a repository of COVID-19 forecasts from over 50 independent research groups, is used by the Centers for Disease Control and Prevention (CDC) for their official COVID-19 communications. As such, the Forecast Hub is a critical centralized resource to promote transparent decision making. While the Forecast Hub has provided valuable predictions focused on accuracy, there is an opportunity to evaluate model performance across social determinants such as race and urbanization level that have been known to play a role in the COVID-19 pandemic. In this paper, we carry out a comprehensive fairness analysis of the Forecast Hub model predictions and show statistically significant differences in predictive performance across social determinants, with minority racial and ethnic groups as well as less urbanized areas often associated with higher prediction errors. We hope this work will encourage COVID-19 modelers and the CDC to report fairness metrics together with accuracy, and to reflect on the potential harms of the models on specific social groups and contexts.
Submitted 18 February, 2025; v1 submitted 17 May, 2024;
originally announced May 2024.
-
COVID-19's Unequal Toll: An assessment of small business impact disparities with respect to ethnorace in metropolitan areas in the US using mobility data
Authors:
Saad Mohammad Abrar,
Kazi Tasnim Zinat,
Naman Awasthi,
Vanessa Frias-Martinez
Abstract:
Early in the pandemic, counties and states implemented a variety of non-pharmacological interventions (NPIs) focused on mobility, such as national lockdowns or work-from-home strategies, as it became clear that restricting movement was essential to containing the epidemic. Due to these restrictions, businesses were severely affected, in particular small urban restaurant businesses. In addition, COVID-19 has amplified many of the socioeconomic disparities and systemic racial inequities that exist in our society. The overarching objective of this study was to examine the changes in small urban restaurant visitation patterns following the COVID-19 pandemic and associated mobility restrictions, as well as to uncover potential disparities across different racial/ethnic groups in order to understand inequities in the impact and recovery. Specifically, the two key objectives were: 1) to analyze the overall changes in restaurant visitation patterns in US metropolitan areas during the pandemic compared to a pre-pandemic baseline, and 2) to investigate differences in visitation pattern changes across Census Block Groups with majority Asian, Black, Hispanic, White, and American Indian populations, identifying any disproportionate effects. Using aggregated geolocated cell phone data from SafeGraph, we document the overall changes in small urban restaurant businesses' visitation patterns with respect to racial composition at the granularity of Census Block Groups. Our results show clear indications of reduced visitation after the pandemic, with slow recoveries. Via visualizations and statistical analyses, we show that reductions in visitation were highest for small urban restaurant businesses in majority Asian neighborhoods.
Submitted 17 May, 2024;
originally announced May 2024.
-
Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets
Authors:
Rahul Yedida,
Saad Mohammad Abrar,
Cleber Melo-Filho,
Eugene Muratov,
Rada Chirkova,
Alexander Tropsha
Abstract:
Objective: We aim to learn potential novel cures for diseases from unstructured text sources. More specifically, we seek to extract drug-disease pairs of potential cures for diseases through simple reasoning over the structure of spoken text.
Materials and Methods: We use Google Cloud to transcribe podcast episodes of an NPR radio show. We then build a pipeline for systematically pre-processing the text to ensure quality input to the core classification model, which feeds to a series of post-processing steps for obtaining filtered results. Our classification model itself uses a language model pre-trained on PubMed text. The modular nature of our pipeline allows for ease of future developments in this area by substituting higher quality components at each stage of the pipeline. As a validation measure, we use ROBOKOP, an engine over a medical knowledge graph with only validated pathways, as a ground truth source for checking the existence of the proposed pairs. For the proposed pairs not found in ROBOKOP, we provide further verification using Chemotext.
Results: We found 30.4% of our proposed pairs in the ROBOKOP database. For example, our model successfully identified that Omeprazole can help treat heartburn. We discuss the significance of this result, showing some examples of the proposed pairs.
Discussion and Conclusion: The agreement of our results with the existing knowledge source indicates a step in the right direction. Given the plug-and-play nature of our framework, it is easy to add, remove, or modify parts to improve the model as necessary. We discuss the results, showing some examples, and note that this is a potentially new line of research with further scope to be explored. Although our approach was originally oriented toward radio podcast transcripts, it is input-agnostic and could be applied to any source of textual data and to any problem of interest.
Submitted 22 October, 2020;
originally announced November 2020.