Optimal Cut-Point Estimation for Functional Digital Biomarkers: Application to Diabetes Risk Stratification via Continuous Glucose Monitoring
Authors:
Oscar Lado-Baleato,
Carla Díaz-Louza,
Francisco Gude,
Marcos Matabuena
Abstract:
Establishing optimal cut-offs for clinical biomarkers is a fundamental statistical problem in epidemiology, clinical trials, and drug discovery. While there is extensive literature regarding the definition of optimal cut-offs for scalar biomarkers, methodologies for analyzing random statistical objects in the more complex spaces associated with random functions and graphs - something increasingly…
▽ More
Establishing optimal cut-offs for clinical biomarkers is a fundamental statistical problem in epidemiology, clinical trials, and drug discovery. While there is extensive literature regarding the definition of optimal cut-offs for scalar biomarkers, methodologies for analyzing random statistical objects in the more complex spaces associated with random functions and graphs - something increasingly required in the field of modern digital health applications - are lacking. This paper proposes a new, general, simple methodology for defining optimal cut-offs for random objects residing in separable Hilbert spaces. Its underlying motivation is the need to create new, digital health rules for the detection of diabetes mellitus, and thus better exploit the continuous high-dimensional functional information provided by continuous glucose monitors (CGM). A functional cut-off for identifying diabetes is offered, based on glucose distributional representations from CGM time series. This work may be a valuable resource for researchers interested in defining and validating new digital biomarkers for biosensor time series
△ Less
Submitted 9 March, 2025; v1 submitted 15 April, 2024;
originally announced April 2024.
Personalized Imputation in metric spaces via conformal prediction: Applications in Predicting Diabetes Development with Continuous Glucose Monitoring Information
Authors:
Marcos Matabuena,
Carla Díaz-Louzao,
Rahul Ghosal,
Francisco Gude
Abstract:
The challenge of handling missing data is widespread in modern data analysis, particularly during the preprocessing phase and in various inferential modeling tasks. Although numerous algorithms exist for imputing missing data, the assessment of imputation quality at the patient level often lacks personalized statistical approaches. Moreover, there is a scarcity of imputation methods for metric spa…
▽ More
The challenge of handling missing data is widespread in modern data analysis, particularly during the preprocessing phase and in various inferential modeling tasks. Although numerous algorithms exist for imputing missing data, the assessment of imputation quality at the patient level often lacks personalized statistical approaches. Moreover, there is a scarcity of imputation methods for metric space based statistical objects. The aim of this paper is to introduce a novel two-step framework that comprises: (i) a imputation methods for statistical objects taking values in metrics spaces, and (ii) a criterion for personalizing imputation using conformal inference techniques. This work is motivated by the need to impute distributional functional representations of continuous glucose monitoring (CGM) data within the context of a longitudinal study on diabetes, where a significant fraction of patients do not have available CGM profiles. The importance of these methods is illustrated by evaluating the effectiveness of CGM data as new digital biomarkers to predict the time to diabetes onset in healthy populations. To address these scientific challenges, we propose: (i) a new regression algorithm for missing responses; (ii) novel conformal prediction algorithms tailored for metric spaces with a focus on density responses within the 2-Wasserstein geometry; (iii) a broadly applicable personalized imputation method criterion, designed to enhance both of the aforementioned strategies, yet valid across any statistical model and data structure. Our findings reveal that incorporating CGM data into diabetes time-to-event analysis, augmented with a novel personalization phase of imputation, significantly enhances predictive accuracy by over ten percent compared to traditional predictive models for time to diabetes.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.