Stable predictions for health related anticausal prediction tasks affected by selection biases: the need to deconfound the test set features
Authors:
Elias Chaibub Neto,
Phil Snyder,
Solveig K Sieberts,
Larsson Omberg
Abstract:
In health-related machine learning applications, the training data often constitute a non-representative sample of the target populations where the learners will be deployed. In anticausal prediction tasks, selection biases often make the associations between confounders and the outcome variable unstable across different target environments. As a consequence, the predictions from confounded learners are often unstable and might fail to generalize to shifted test environments. Stable prediction approaches aim to solve this problem by producing predictions that are stable across unknown test environments. These approaches, however, are sometimes applied to the training data alone, in the hope that training an unconfounded model will be enough to generate stable predictions in shifted test sets. Here, we show that this is insufficient, and that improved stability can be achieved by deconfounding the test set features as well. We illustrate these observations using both synthetic data and real-world data from a mobile health study.
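To make "deconfounding the test set features as well" concrete, here is a minimal sketch under stated assumptions, not the authors' method. It assumes linear residualization (regress each feature on the confounders by ordinary least squares and keep the residuals) as the deconfounding step, and it fits that regression separately within the training set and within the test set; the abstract does not say which deconfounding technique or estimation scheme is used. The helpers `residualize` and `simulate`, the toy data-generating process, and the parameter `rho` (standing in for an environment-specific selection mechanism) are all hypothetical.

```python
import numpy as np


def residualize(X, C):
    """Hypothetical helper: remove the linear effect of confounders C from features X.

    Fits X ~ 1 + C by ordinary least squares on the data passed in and
    returns the residuals, which are uncorrelated with every column of C.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), C])  # intercept + confounders
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ coef


def simulate(n, rho, rng):
    """Toy anticausal data: Y -> X <- C, with a C-Y association of strength rho.

    rho mimics an environment-specific selection mechanism; flipping its
    sign between training and test produces a shifted test set.
    """
    y = rng.integers(0, 2, size=n)
    c = rho * (2 * y - 1) + rng.normal(size=n)  # confounder associated with Y
    x = y + c + rng.normal(scale=0.5, size=n)   # feature caused by both Y and C
    return x.reshape(-1, 1), c.reshape(-1, 1), y


def corr(a, b):
    """Pearson correlation between two flattened arrays."""
    return np.corrcoef(np.ravel(a), np.ravel(b))[0, 1]


rng = np.random.default_rng(0)
X_train, C_train, y_train = simulate(2000, rho=1.0, rng=rng)
X_test, C_test, y_test = simulate(2000, rho=-1.0, rng=rng)

# Deconfound BOTH the training features and the test features.
X_train_res = residualize(X_train, C_train)
X_test_res = residualize(X_test, C_test)

print("test corr(X, C) before:", round(corr(X_test, C_test), 3))
print("test corr(X, C) after: ", round(corr(X_test_res, C_test), 3))
print("test corr(X, Y) after: ", round(corr(X_test_res, y_test), 3))
```

A classifier would then be trained on `X_train_res` and evaluated on `X_test_res`. After residualization the test features no longer carry the confounder's (shifted) signal, but they still retain the signal from the outcome. Residualizing only `X_train` would leave the test features' dependence on `C` in place, which is the failure mode the abstract describes.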
Submitted 8 November, 2020;
originally announced November 2020.
Indicators of retention in remote digital health studies: A cross-study evaluation of 100,000 participants
Authors:
Abhishek Pratap,
Elias Chaibub Neto,
Phil Snyder,
Carl Stepnowsky,
Noémie Elhadad,
Daniel Grant,
Matthew H. Mohebbi,
Sean Mooney,
Christine Suver,
John Wilbanks,
Lara Mangravite,
Patrick Heagerty,
Pat Arean,
Larsson Omberg
Abstract:
Digital technologies such as smartphones are transforming the way scientists conduct biomedical research using real-world data. Several remotely conducted studies have recruited thousands of participants within a few months. Unfortunately, these studies are hampered by substantial participant attrition, calling into question the representativeness of the collected data and the generalizability of their findings. We report the challenges in retention and recruitment in eight remote digital health studies comprising over 100,000 participants, who together contributed more than 850,000 days of participation and completed close to 3.5 million remote health evaluations. Survival modeling surfaced several factors significantly associated (P < 1e-16) with an increase in median retention time: i) clinician referral (increase of 40 days), ii) compensation (22 days), iii) clinical conditions of interest to the study (7 days), and iv) being an older adult (4 days). Additionally, we identified four distinct patterns of daily app usage behavior that were also associated (P < 1e-10) with participant demographics. Most studies were not able to recruit a representative sample, either demographically or regionally. Together, these findings can help inform recruitment and retention strategies that enable equitable participation of populations in future digital health research.
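The abstract reports factors in terms of changes in *median retention time* estimated by survival modeling, but it does not name the specific model. As an illustration only, here is a minimal sketch assuming the standard Kaplan-Meier estimator. In this setup, participants still enrolled when follow-up ends are treated as censored, and the median is the smallest time at which estimated retention drops to 50% or below. The helpers `kaplan_meier` and `median_retention` and the two toy groups are hypothetical, and the numbers are made up rather than taken from the study. Covariate-adjusted models (for example, Cox regression) are not shown.

```python
import numpy as np


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the retention (survival) curve.

    durations: days each participant was followed.
    observed:  True if the participant dropped out at that time,
               False if still retained when follow-up ended (censored).
    Returns the distinct dropout times and S(t) just after each of them.
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    times = np.unique(durations[observed])
    survival = np.empty(len(times))
    s = 1.0
    for i, t in enumerate(times):
        at_risk = np.sum(durations >= t)              # still enrolled just before t
        events = np.sum((durations == t) & observed)  # dropouts at t
        s *= 1.0 - events / at_risk
        survival[i] = s
    return times, survival


def median_retention(durations, observed):
    """Smallest time at which estimated retention falls to 50% or below."""
    times, survival = kaplan_meier(durations, observed)
    below = np.nonzero(survival <= 0.5)[0]
    return float(times[below[0]]) if below.size else float("inf")


# Toy, made-up data for two hypothetical groups (not from the study).
referred = median_retention([10, 25, 40, 60, 90, 120, 150], [1, 1, 0, 1, 1, 0, 1])
self_enrolled = median_retention([2, 5, 8, 15, 30, 45, 60], [1, 1, 1, 1, 0, 1, 1])
print("median retention, referred:", referred)
print("median retention, self-enrolled:", self_enrolled)
print("difference (days):", referred - self_enrolled)
```

If fewer than half of a group ever drop out during follow-up, the sketch returns infinity, meaning the median is not reached.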
Submitted 2 October, 2019;
originally announced October 2019.