-
MRI-based and metabolomics-based age scores act synergetically for mortality prediction shown by multi-cohort federated learning
Authors:
Pedro Mateus,
Swier Garst,
Jing Yu,
Davy Cats,
Alexander G. J. Harms,
Mahlet Birhanu,
Marian Beekman,
P. Eline Slagboom,
Marcel Reinders,
Jeroen van der Grond,
Andre Dekker,
Jacobus F. A. Jansen,
Magdalena Beran,
Miranda T. Schram,
Pieter Jelle Visser,
Justine Moonen,
Mohsen Ghanbari,
Gennady Roshchupkin,
Dina Vojinovic,
Inigo Bermejo,
Hailiang Mei,
Esther E. Bron
Abstract:
Biological age scores are an emerging tool to characterize aging by estimating chronological age based on physiological biomarkers. Various scores have shown associations with aging-related outcomes. This study assessed the relation between an age score based on brain MRI images (BrainAge) and an age score based on metabolomic biomarkers (MetaboAge). We trained a federated deep learning model to estimate BrainAge in three cohorts. The federated BrainAge model yielded significantly lower error for age prediction across the cohorts than locally trained models. Harmonizing the age interval between cohorts further improved BrainAge accuracy. Subsequently, we compared BrainAge with MetaboAge using federated association and survival analyses. The results showed a small association between BrainAge and MetaboAge as well as a higher predictive value for the time to mortality of both scores combined than for the individual scores. Hence, our study suggests that both aging scores capture different aspects of the aging process.
Submitted 2 September, 2024;
originally announced September 2024.
-
PATE: Proximity-Aware Time series anomaly Evaluation
Authors:
Ramin Ghorbani,
Marcel J. T. Reinders,
David M. J. Tax
Abstract:
Evaluating anomaly detection algorithms in time series data is critical, as inaccuracies can lead to flawed decision-making in various domains where real-time analytics and data-driven strategies are essential. Traditional performance metrics assume i.i.d. data and fail to capture the complex temporal dynamics and specific characteristics of time series anomalies, such as early and delayed detections. We introduce Proximity-Aware Time series anomaly Evaluation (PATE), a novel evaluation metric that incorporates the temporal relationship between prediction and anomaly intervals. PATE uses proximity-based weighting that considers buffer zones around anomaly intervals, enabling a more detailed and informed assessment of a detection. Using these weights, PATE computes a weighted version of the area under the Precision and Recall curve. Our experiments with synthetic and real-world datasets show the superiority of PATE in providing more sensible and accurate evaluations than other evaluation metrics. We also tested several state-of-the-art anomaly detectors across various benchmark datasets using the PATE evaluation scheme. The results show that a common metric like the Point-Adjusted F1 Score fails to characterize detection performance well, and that PATE is able to provide a fairer model comparison. By introducing PATE, we redefine the understanding of model efficacy in a way that steers future studies toward developing more effective and accurate detection models.
Submitted 20 May, 2024;
originally announced May 2024.
-
RESTAD: REconstruction and Similarity based Transformer for time series Anomaly Detection
Authors:
Ramin Ghorbani,
Marcel J. T. Reinders,
David M. J. Tax
Abstract:
Anomaly detection in time series data is crucial across various domains. The scarcity of labeled data for such tasks has increased the attention towards unsupervised learning methods. These approaches, often relying solely on reconstruction error, typically fail to detect subtle anomalies in complex datasets. To address this, we introduce RESTAD, an adaptation of the Transformer model by incorporating a layer of Radial Basis Function (RBF) neurons within its architecture. This layer fits a non-parametric density in the latent representation, such that a high RBF output indicates similarity with predominantly normal training data. RESTAD integrates the RBF similarity scores with the reconstruction errors to increase sensitivity to anomalies. Our empirical evaluations demonstrate that RESTAD outperforms various established baselines across multiple benchmark datasets.
Submitted 13 May, 2024;
originally announced May 2024.
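The idea of combining an RBF similarity score with a reconstruction error can be sketched as follows. This is a minimal illustration and not the RESTAD implementation: the function names, the fixed `gamma`, averaging over centers, and the multiplicative combination rule are all assumptions, since the abstract does not specify how the two signals are integrated.

```python
import numpy as np

def rbf_similarity(latent, centers, gamma=1.0):
    """Mean RBF activation of each latent vector over all centers.

    latent:  (n_samples, d) latent representations
    centers: (n_centers, d) RBF centers (hypothetical, e.g. learned on normal data)
    """
    sq_dist = ((latent[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist).mean(axis=1)

def anomaly_score(x, x_hat, latent, centers, gamma=1.0):
    """Assumed combination: reconstruction error scaled by RBF dissimilarity."""
    recon_error = ((x - x_hat) ** 2).mean(axis=1)
    dissimilarity = 1.0 - rbf_similarity(latent, centers, gamma)
    return recon_error * dissimilarity
```

Under this assumed rule, a sample whose latent vector lies on an RBF center receives a low score even when its reconstruction error is non-zero, while a sample far from every center keeps its full reconstruction error.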
-
Learning solutions of parametric Navier-Stokes with physics-informed neural networks
Authors:
M. Naderibeni,
M. J. T. Reinders,
L. Wu,
D. M. J. Tax
Abstract:
We leverage Physics-Informed Neural Networks (PINNs) to learn solution functions of parametric Navier-Stokes Equations (NSE). Our proposed approach results in a feasible optimization problem setup that bypasses PINNs' limitations in converging to solutions of highly nonlinear parametric PDEs like NSE. We consider the parameter(s) of interest as inputs of PINNs along with spatio-temporal coordinates, and train PINNs on generated numerical solutions of parametric PDEs for instances of the parameters. We perform experiments on the classical 2D flow past a cylinder problem, aiming to learn velocity and pressure functions over a range of Reynolds numbers as the parameter of interest. Provision of training data from generated numerical simulations allows for interpolation of the solution functions over a range of parameters. Therefore, we compare PINNs with unconstrained conventional Neural Networks (NN) on this problem setup to investigate the effectiveness of including the PDE regularization in the loss function. We show that our proposed approach results in optimizing PINN models that learn the solution functions while making sure that flow predictions are in line with the conservation laws of mass and momentum. Our results show that PINNs predict gradients more accurately than the NN model; this is clearly visible in the predicted vorticity fields, given that none of these models were trained on vorticity labels.
Submitted 5 February, 2024;
originally announced February 2024.
-
Improving performance of heart rate time series classification by grouping subjects
Authors:
Michael Beekhuizen,
Arman Naseri,
David Tax,
Ivo van der Bilt,
Marcel Reinders
Abstract:
Unlike the more commonly analyzed ECG or PPG data for activity classification, heart rate time series data is less detailed, often noisier and can contain missing data points. Using the BigIdeasLab_STEP dataset, which includes heart rate time series annotated with specific tasks performed by individuals, we sought to determine if general classification was achievable. Our analyses showed that the accuracy is sensitive to the choice of window/stride size. Moreover, we found variable classification performances between subjects due to differences in the physical structure of their hearts. Various techniques were used to minimize this variability. First of all, normalization proved to be a crucial step and significantly improved the performance. Secondly, grouping subjects and performing classification inside a group helped to improve performance and decrease inter-subject variability. Finally, we show that including handcrafted features as input to a deep learning (DL) network improves the classification performance further. Together, these findings indicate that heart rate time series can be utilized for classification tasks like predicting activity. However, normalization or grouping techniques need to be chosen carefully to minimize the issue of subject variability.
Submitted 22 November, 2023;
originally announced November 2023.
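The window/stride segmentation and per-subject normalization mentioned in this abstract can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say which normalization was used, so z-scoring each subject's series is a guess, and the function names are hypothetical.

```python
import numpy as np

def normalize_per_subject(hr):
    """Z-score one subject's heart rate series, ignoring missing (NaN) points."""
    hr = np.asarray(hr, dtype=float)
    mean = np.nanmean(hr)
    std = np.nanstd(hr)
    # Guard against a constant series, where the standard deviation is zero.
    return (hr - mean) / std if std > 0 else hr - mean

def sliding_windows(series, window, stride):
    """Cut a 1D series into overlapping windows of length `window`."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - window + 1, stride)
    if len(starts) == 0:
        return np.empty((0, window))
    return np.stack([series[s:s + window] for s in starts])
```

Normalizing before windowing removes each subject's baseline heart rate, which is one plausible way to reduce the inter-subject variability the abstract describes.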
-
Federated K-means Clustering
Authors:
Swier Garst,
Marcel Reinders
Abstract:
Federated learning is a technique that enables the use of distributed datasets for machine learning purposes without requiring data to be pooled, thereby better preserving privacy and ownership of the data. While supervised FL research has grown substantially over the last years, unsupervised FL methods remain scarce. This work introduces an algorithm which implements K-means clustering in a federated manner, addressing the challenges of varying number of clusters between centers, as well as convergence on less separable datasets.
Submitted 16 February, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
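To illustrate what clustering "in a federated manner" means, the sketch below runs a plain federated variant of Lloyd's algorithm in which each center shares only per-cluster sums and counts, never raw data. It is not the algorithm proposed in the paper: it assumes the same number of clusters at every center and ignores the convergence issues the abstract mentions. All function names are hypothetical.

```python
import numpy as np

def local_update(data, centroids):
    """Client side: assign local points and return per-cluster sums and counts."""
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = dists.argmin(axis=1)
    k, d = centroids.shape
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    for j in range(k):
        members = data[labels == j]
        sums[j] = members.sum(axis=0)
        counts[j] = len(members)
    return sums, counts

def server_aggregate(updates, centroids):
    """Server side: pool the statistics and recompute the global centroids."""
    total_sums = sum(s for s, _ in updates)
    total_counts = sum(c for _, c in updates)
    new_centroids = centroids.copy()
    nonempty = total_counts > 0  # keep the old centroid for empty clusters
    new_centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return new_centroids

def federated_kmeans(client_data, init_centroids, rounds=10):
    centroids = np.asarray(init_centroids, dtype=float)
    for _ in range(rounds):
        updates = [local_update(d, centroids) for d in client_data]
        centroids = server_aggregate(updates, centroids)
    return centroids
```

Because the pooled sums and counts are exactly what a centralized Lloyd iteration would compute, this baseline matches centralized K-means for the same initialization; it is the handling of differing cluster counts between centers that needs the more careful treatment the abstract refers to.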
-
Personalized Anomaly Detection in PPG Data using Representation Learning and Biometric Identification
Authors:
Ramin Ghorbani,
Marcel J. T. Reinders,
David M. J. Tax
Abstract:
Photoplethysmography (PPG) signals, typically acquired from wearable devices, hold significant potential for continuous fitness-health monitoring. In particular, heart conditions that manifest in rare and subtle deviating heart patterns may be interesting. However, robust and reliable anomaly detection within these data remains a challenge due to the scarcity of labeled data and high inter-subject variability. This paper introduces a two-stage framework leveraging representation learning and personalization to improve anomaly detection performance in PPG data. The proposed framework first employs representation learning to transform the original PPG signals into a more discriminative and compact representation. We then apply three different unsupervised anomaly detection methods for movement detection and biometric identification. We validate our approach using two different datasets in both generalized and personalized scenarios. The results show that representation learning significantly improves anomaly detection performance while reducing the high inter-subject variability. Personalized models further enhance anomaly detection performance, underscoring the role of personalization in PPG-based fitness-health monitoring systems. The results from biometric identification show that it's easier to distinguish a new user from one intended authorized user than from a group of users. Overall, this study provides evidence of the effectiveness of representation learning and personalization for anomaly detection in PPG data.
Submitted 12 July, 2023;
originally announced July 2023.
-
Self-Supervised PPG Representation Learning Shows High Inter-Subject Variability
Authors:
Ramin Ghorbani,
Marcel J. T. Reinders,
David M. J. Tax
Abstract:
With the progress of sensor technology in wearables, the collection and analysis of PPG signals are gaining more interest. Using Machine Learning, the cardiac rhythm corresponding to PPG signals can be used for different prediction tasks such as activity recognition, sleep stage detection, or more general health status. However, supervised learning is often limited by the amount of available labeled data, which is typically expensive to obtain. To address this problem, we propose a Self-Supervised Learning (SSL) method with a pretext task of signal reconstruction to learn an informative generalized PPG representation. The performance of the proposed SSL framework is compared with two fully supervised baselines. The results show that in a very limited labeled-data setting (10 samples per class or fewer), using SSL is beneficial, and a simple classifier trained on SSL-learned representations outperforms fully supervised deep neural networks. However, the results reveal that the SSL-learned representations are too focused on encoding the subjects. Unfortunately, there is high inter-subject variability in the SSL-learned representations, which makes working with this data more challenging when labeled data is scarce. The high inter-subject variability suggests that there is still room for improvement in learning representations. In general, the results suggest that SSL may pave the way for the broader use of machine learning models on PPG data in label-scarce regimes.
Submitted 19 December, 2022; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Prior Biological Knowledge And Epigenetic Information Enhances Prediction Accuracy Of Bayesian Wnt Pathway
Authors:
Shriprakash Sinha,
Marcel J. T. Reinders,
Wim Verhaegh
Abstract:
Computational modeling of the Wnt signaling pathway has gained prominence for its use as a computer-aided diagnostic tool to develop therapeutic cancer target drugs and to classify test samples as cancerous or non-cancerous. This manuscript focuses on the development of simple static Bayesian network models of varying complexity that encompass prior, partially available biological knowledge about intra- and extracellular factors affecting the Wnt pathway and incorporate epigenetic information, such as methylation and histone modification, of a few genes known to have an inhibitory effect on the Wnt pathway. It might be expected that such models not only increase cancer prediction accuracy but also form a basis for understanding Wnt signaling activity in different states of tumorigenesis. Initial results in human colorectal cancer cases indicate that incorporating epigenetic information increases the accuracy of predicting test samples as tumorous or normal. Receiver operating characteristic (ROC) curves and their respective area under the curve (AUC) measurements, obtained from predictions of the state of the test sample and corresponding predictions of the state of activation of the transcription complex of the Wnt pathway for the test sample, indicate that there is a significant difference between the Wnt pathway being on (off) and its association with the sample being tumorous (normal). Two-sample Kolmogorov-Smirnov tests confirm the statistical deviation between the distributions of these predictions. At a preliminary stage, use of these models may help in understanding the as yet unknown effect of certain factors like DKK2, DKK3-1 and SFRP-2/3/5 on the β-catenin transcription complex.
Submitted 25 November, 2024; v1 submitted 15 July, 2013;
originally announced July 2013.
-
Personalised Travel Recommendation based on Location Co-occurrence
Authors:
Maarten Clements,
Pavel Serdyukov,
Arjen P. de Vries,
Marcel J. T. Reinders
Abstract:
We propose a new task of recommending touristic locations based on a user's visiting history in a geographically remote region. This can be used to plan a touristic visit to a new city or country, or by travel agencies to provide personalised travel deals.
A set of geotags is used to compute a location similarity model between two different regions. The similarity between two landmarks is derived from the number of users that have visited both places, using a Gaussian density estimation of the co-occurrence space of location visits to cluster related geotags. The standard deviation of the kernel can be used as a scale parameter that determines the size of the recommended landmarks.
A personalised recommendation based on the location similarity model is evaluated at city and country scale and is able to outperform a location ranking based on popularity. Especially when a tourist filter based on visit duration is enforced, the prediction can be accurately adapted to the preference of the user. An extensive evaluation based on manual annotations shows that stricter ranking methods like cosine similarity and a proposed RankDiff algorithm provide more serendipitous recommendations and are able to link similar locations on opposite sides of the world.
Submitted 26 June, 2011;
originally announced June 2011.
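The notion of deriving landmark similarity from the number of users who visited both places can be illustrated with a small sketch. It covers only raw co-occurrence counts and the cosine similarity named in the abstract; the Gaussian density estimation and the RankDiff algorithm are not reproduced here. The binary user-by-location matrix and the function names are assumptions made for illustration.

```python
import numpy as np

def cooccurrence(visits):
    """Count, for each pair of locations, the users who visited both.

    visits: (n_users, n_locations) binary matrix, 1 if the user visited the location.
    """
    v = np.asarray(visits, dtype=float)
    return v.T @ v

def cosine_similarity(visits):
    """Cosine similarity between the visitor sets of each pair of locations."""
    co = cooccurrence(visits)
    norms = np.sqrt(np.diag(co))  # diagonal holds each location's visitor count
    denom = np.outer(norms, norms)
    # Locations with no visitors get similarity 0 instead of a division error.
    return np.divide(co, denom, out=np.zeros_like(co), where=denom > 0)
```

Dividing by the visitor counts is what makes cosine similarity stricter than a raw co-occurrence count: a very popular landmark no longer looks similar to everything simply because many users passed through it.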