Longitudinal Missing Data Imputation for Predicting Disability Stage of Patients with Multiple Sclerosis

Authors: Mahin Vazifehdan, Pietro Bosoni, Daniele Pala, Eleonora Tavazzi, Roberto Bergamaschi, Riccardo Bellazzi, Arianna Dagliati

Abstract: Multiple Sclerosis (MS) is a chronic disease characterized by progressive or alternate impairment of neurological functions (motor, sensory, visual, and cognitive). Predicting disease progression with a probabilistic and time-dependent approach might help in suggesting interventions that can delay the progression of the disease. However, extracting informative knowledge from irregularly collected longitudinal data is difficult, and missing data pose significant challenges. MS progression is measured through the Expanded Disability Status Scale (EDSS), which quantifies and monitors disability in MS over time. EDSS assesses impairment in eight functional systems (FS). Frequently, only the EDSS score assigned by clinicians is reported, while FS sub-scores are missing. Imputing these scores might be useful, especially to stratify patients according to their phenotype assessed over the disease progression. This study aimed at i) exploring different methodologies for imputing missing FS sub-scores, and ii) predicting the EDSS score using complete clinical data. Results show that Exponential Weighted Moving Average achieved the lowest error rate in the missing data imputation task; furthermore, the combination of Classification and Regression Trees for the imputation and SVM for the prediction task obtained the best accuracy.

Submitted 22 January, 2025; originally announced January 2025.

Comments: 6 pages, 3 tables

arXiv:2408.17385 [pdf, other]

Comparing Propensity Score-Based Methods in Estimating the Treatment Effects: A Simulation Study

Authors: Sara Poletto, Enrico Longato, Erica Tavazzi, Martina Vettoretti

Abstract: In observational studies, the recorded treatment assignment is not purely random, but it is influenced by external factors such as patient characteristics, reimbursement policies, and existing guidelines. Therefore, the treatment effect can be estimated only after accounting for confounding factors. Propensity score (PS) methods are a family of methods that is widely used for this purpose. Although they are all based on the estimation of the a posteriori probability of treatment assignment given patient covariates, they estimate the treatment effect from different statistical points of view and are, thus, relatively hard to compare. In this work, we propose a simulation experiment in which a hypothetical cohort of subjects is simulated in seven scenarios of increasing complexity of the associations between covariates and treatment, but where the two main definitions of treatment effect (average treatment effect, ATE, and average effect of the treatment on the treated, ATT) coincide. Our purpose is to compare the performance of a wide array of PS-based methods (matching, stratification, and inverse probability weighting) in estimating the treatment effect and their robustness in different scenarios. We find that inverse probability weighting provides estimates of the treatment effect that are closer to the expected value by weighting all subjects of the starting population. Conversely, matching and stratification ensure that the subpopulation that generated the final estimate is made up of real instances drawn from the starting population, and, thus, provide a higher degree of control on the validity domain of the estimates.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 19th conference on Computational Intelligence methods for Bioinformatics and Biostatistics (CIBB), September 4-6th 2024, Benevento (Italy)

arXiv:2408.17376 [pdf, other]

Exploring the Impact of Environmental Pollutants on Multiple Sclerosis Progression

Authors: Elena Marinello, Erica Tavazzi, Enrico Longato, Pietro Bosoni, Arianna Dagliati, Mahin Vazifehdan, Riccardo Bellazzi, Isotta Trescato, Alessandro Guazzo, Martina Vettoretti, Eleonora Tavazzi, Lara Ahmad, Roberto Bergamaschi, Paola Cavalla, Umberto Manera, Adriano Chio, Barbara Di Camillo

Abstract: Multiple Sclerosis (MS) is a chronic autoimmune and inflammatory neurological disorder characterised by episodes of symptom exacerbation, known as relapses. In this study, we investigate the role of environmental factors in relapse occurrence among MS patients, using data from the H2020 BRAINTEASER project. We employed predictive models, including Random Forest (RF) and Logistic Regression (LR), with varying sets of input features to predict the occurrence of relapses based on clinical and pollutant data collected over a week. The RF yielded the best result, with an AUC-ROC score of 0.713. Environmental variables, such as precipitation, NO2, PM2.5, humidity, and temperature, were found to be relevant to the prediction.

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2402.17554 [pdf]

Evaluation of Predictive Reliability to Foster Trust in Artificial Intelligence. A case study in Multiple Sclerosis

Authors: Lorenzo Peracchio, Giovanna Nicora, Enea Parimbelli, Tommaso Mario Buonocore, Roberto Bergamaschi, Eleonora Tavazzi, Arianna Dagliati, Riccardo Bellazzi

Abstract: Applying Artificial Intelligence (AI) and Machine Learning (ML) in critical contexts, such as medicine, requires the implementation of safety measures to reduce risks of harm in case of prediction errors. Spotting ML failures is of paramount importance when ML predictions are used to drive clinical decisions. ML predictive reliability measures the degree of trust of an ML prediction on a new instance, thus allowing decision-makers to accept or reject it based on its reliability. To assess reliability, we propose a method that implements two principles. First, our approach evaluates whether an instance to be classified comes from the same distribution as the training set. To do this, we leverage the ability of Autoencoders (AEs) to reconstruct the training set with low error. An instance is considered Out-of-Distribution (OOD) if the AE reconstructs it with a high error. Second, it is evaluated whether the ML classifier performs well on samples similar to the newly classified instance by using a proxy model. We show that this approach is able to assess reliability both in a simulated scenario and on a model trained to predict disease progression of Multiple Sclerosis patients. We also developed a Python package, named relAI, to embed reliability measures into ML pipelines. We propose a simple approach that can be used in the deployment phase of any ML model to suggest whether to trust predictions or not. Our method holds the promise to provide effective support to clinicians by spotting potential ML failures during deployment.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 20 pages, 7 figures

Showing 1–4 of 4 results for author: Tavazzi, E