Diversity of symptom phenotypes in SARS-CoV-2 community infections observed in multiple large datasets
Authors:
Martyn Fyles,
Karina-Doris Vihta,
Carole H Sudre,
Harry Long,
Rajenki Das,
Caroline Jay,
Tom Wingfield,
Fergus Cumming,
William Green,
Pantelis Hadjipantelis,
Joni Kirk,
Claire J Steves,
Sebastien Ourselin,
Graham F Medley,
Elizabeth Fearon,
Thomas House
Abstract:
Through the use of cutting-edge unsupervised classification techniques from statistics and machine learning, we characterise symptom phenotypes among symptomatic SARS-CoV-2 PCR-positive community cases. We first analyse each dataset in isolation and across age bands, before using methods that allow us to compare multiple datasets. While we observe separation due to the total number of symptoms experienced by cases, we also see a separation of symptoms into gastrointestinal, respiratory and other types, and different symptom co-occurrence patterns at the extremes of age. In this way, we are able to demonstrate the deep structure of symptoms of COVID-19 without the usual biases due to study design. This is expected to have implications for the identification and management of community SARS-CoV-2 cases and could be further applied to symptom-based management of other diseases and syndromes.
Submitted 20 November, 2023; v1 submitted 10 November, 2021;
originally announced November 2021.
Accessible Data Curation and Analytics for International-Scale Citizen Science Datasets
Authors:
Benjamin Murray,
Eric Kerfoot,
Mark S. Graham,
Carole H. Sudre,
Erika Molteni,
Liane S. Canas,
Michela Antonelli,
Kerstin Klaser,
Alessia Visconti,
Andrew T. Chan,
Paul W. Franks,
Richard Davies,
Jonathan Wolf,
Tim Spector,
Claire J. Steves,
Marc Modat,
Sebastien Ourselin
Abstract:
The Covid Symptom Study, a smartphone-based surveillance study on COVID-19 symptoms in the population, is an exemplar of big data citizen science. Over 4.7 million participants and 189 million unique assessments have been logged since its introduction in March 2020. The success of the Covid Symptom Study creates technical challenges around effective data curation for two reasons. Firstly, the scale of the dataset means that it can no longer be easily processed using standard software on commodity hardware. Secondly, the size of the research group means that the replicability and consistency of key analytics used across multiple publications become an issue. We present ExeTera, open-source data curation software designed to address scalability challenges and to enable reproducible research across an international research group for datasets such as the Covid Symptom Study dataset.
Submitted 17 February, 2021; v1 submitted 2 November, 2020;
originally announced November 2020.