Search | arXiv e-print repository

uPVC-Net: A Universal Premature Ventricular Contraction Detection Deep Learning Algorithm

Authors: Hagai Hamami, Yosef Solewicz, Daniel Zur, Yonatan Kleerekoper, Joachim A. Behar

Abstract: Introduction: Premature Ventricular Contractions (PVCs) are common cardiac arrhythmias originating from the ventricles. Accurate detection remains challenging due to variability in electrocardiogram (ECG) waveforms caused by differences in lead placement, recording conditions, and population demographics. Methods: We developed uPVC-Net, a universal deep learning model to detect PVCs from any single-lead ECG recording. The model is developed on four independent ECG datasets comprising a total of 8.3 million beats collected from Holter monitors and a modern wearable ECG patch. uPVC-Net employs a custom architecture and a multi-source, multi-lead training strategy. For each experiment, one dataset is held out to evaluate out-of-distribution (OOD) generalization. Results: uPVC-Net achieved an AUC between 97.8% and 99.1% on the held-out datasets. Notably, performance on wearable single-lead ECG data reached an AUC of 99.1%. Conclusion: uPVC-Net exhibits strong generalization across diverse lead configurations and populations, highlighting its potential for robust, real-world clinical deployment.

Submitted 12 June, 2025; originally announced June 2025.

Comments: 8 pages

MSC Class: 92C55; 92B20 ACM Class: I.2.6; J.3
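The hold-one-dataset-out protocol described in the uPVC-Net abstract above can be sketched as follows. This is only an illustration of the fold structure, not the authors' code: the dataset names and the `leave_one_dataset_out` helper are hypothetical, and model training is left out entirely.

```python
from typing import Iterator, List, Tuple


def leave_one_dataset_out(datasets: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_sources, held_out) pairs, holding out each dataset once."""
    for held_out in datasets:
        training_sources = [name for name in datasets if name != held_out]
        yield training_sources, held_out


# Hypothetical names standing in for the four datasets mentioned in the abstract.
DATASETS = ["holter_a", "holter_b", "holter_c", "wearable_patch"]

folds = list(leave_one_dataset_out(DATASETS))
for training_sources, held_out in folds:
    # A real experiment would train on training_sources and report the
    # out-of-distribution AUC on held_out.
    print(f"train on {training_sources}, evaluate OOD on {held_out}")
```

Each dataset serves as the unseen target exactly once, so every reported score comes from data the model never saw during training.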

arXiv:2505.05291 [pdf, other]

Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection

Authors: Benjamin A. Cohen, Jonathan Fhima, Meishar Meisel, Meital Baskin, Luis Filipe Nakayama, Eran Berkowitz, Joachim A. Behar

Abstract: Self-supervised learning (SSL) has enabled Vision Transformers (ViTs) to learn robust representations from large-scale natural image datasets, enhancing their generalization across domains. In retinal imaging, foundation models pretrained on either natural or ophthalmic data have shown promise, but the benefits of in-domain pretraining remain uncertain. To investigate this, we benchmark six SSL-pretrained ViTs on seven digital fundus image (DFI) datasets totaling 70,000 expert-annotated images for the task of moderate-to-late age-related macular degeneration (AMD) identification. Our results show that iBOT pretrained on natural images achieves the highest out-of-distribution generalization, with AUROCs of 0.80-0.97, outperforming domain-specific models, which achieved AUROCs of 0.78-0.96, and a baseline ViT-L with no pretraining, which achieved AUROCs of 0.68-0.91. These findings highlight the value of foundation models in improving AMD identification and challenge the assumption that in-domain pretraining is necessary. Furthermore, we release BRAMD, an open-access dataset (n=587) of DFIs with AMD labels from Brazil.

Submitted 22 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

Comments: 10 pages, 3 figures

arXiv:2505.04230 [pdf]

HYAMD High-Resolution Fundus Image Dataset for age related macular degeneration (AMD) Diagnosis

Authors: Meishar Meisel, Benjamin A. Cohen, Meital Baskin, Beatrice Tiosano, Joachim A. Behar, Eran Berkowitz

Abstract: The Hillel Yaffe Age Related Macular Degeneration (HYAMD) dataset is a longitudinal collection of 1,560 Digital Fundus Images (DFIs) from 325 patients examined at the Hillel Yaffe Medical Center (Hadera, Israel) between 2021 and 2024. The dataset includes an AMD cohort of 147 patients (aged 54-94) with varying stages of AMD and a control group of 190 diabetic retinopathy (DR) patients (aged 24-92). AMD diagnoses were based on comprehensive clinical ophthalmic evaluations, supported by Optical Coherence Tomography (OCT) and OCT angiography. Non-AMD DFIs were sourced from DR patients without concurrent AMD, diagnosed using macular OCT, fluorescein angiography, and widefield imaging. HYAMD provides gold-standard annotations, ensuring AMD labels were assigned following a full clinical assessment. Images were captured with a DRI OCT Triton (Topcon) camera, offering a 45 deg field of view and 1960 x 1934 pixel resolution. To the best of our knowledge, HYAMD is the first open-access retinal dataset from an Israeli sample, designed to support AMD identification using machine learning models.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2503.13486 [pdf, other]

Machine learning for triage of strokes with large vessel occlusion using photoplethysmography biomarkers

Authors: Márton Á. Goda, Helen Badge, Jasmeen Khan, Yosef Solewicz, Moran Davoodi, Rumbidzai Teramayi, Dennis Cordato, Longting Lin, Lauren Christie, Christopher Blair, Gagan Sharma, Mark Parsons, Joachim A. Behar

Abstract: Objective. Large vessel occlusion (LVO) stroke presents a major challenge in clinical practice due to the potential for poor outcomes with delayed treatment. Treatment for LVO involves highly specialized care, in particular endovascular thrombectomy, and is available only at certain hospitals. Therefore, prehospital identification of LVO by emergency ambulance services can be critical for triaging LVO stroke patients directly to a hospital with access to endovascular therapy. Clinical scores exist to help distinguish LVO from less severe strokes, but they are based on a series of examinations that can take minutes and may be impractical for patients with dementia or those who cannot follow commands due to their stroke. There is a need for a fast and reliable method to aid in the early identification of LVO. In this study, our objective was to assess the feasibility of using a 30-second photoplethysmography (PPG) recording to assist in recognizing LVO stroke. Method. A total of 88 patients, including 25 with LVO, 27 with stroke mimic (SM), and 36 non-LVO stroke patients (NL), were recorded at the Liverpool Hospital emergency department in Sydney, Australia. Demographics (age, sex), as well as morphological features and beating rate variability measures, were extracted from the PPG. A binary classification approach was employed to differentiate between LVO stroke and NL+SM (NL.SM). A 2:1 train-test split was stratified and repeated randomly across 100 iterations. Results. The best model achieved a median test set area under the receiver operating characteristic curve (AUROC) of 0.77 (0.71-0.82). Conclusion. Our study demonstrates the potential of utilizing a 30-second PPG recording for identifying LVO stroke.

Submitted 9 March, 2025; originally announced March 2025.
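The evaluation scheme in the LVO abstract above (a stratified 2:1 train-test split repeated 100 times, summarised by the median test AUROC) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the study's pipeline: the data are synthetic, the single feature and the sign-only "model" are placeholders, and the helper names `auroc`, `stratified_split`, and `repeated_auroc` are hypothetical. Only the class sizes (25 LVO versus 63 NL+SM) are taken from the abstract.

```python
import random
from statistics import mean, median
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the chance a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(labels: Sequence[int], test_fraction: float,
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), sampling the test share separately within each class."""
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def repeated_auroc(features: Sequence[float], labels: Sequence[int],
                   n_iter: int = 100, test_fraction: float = 1 / 3,
                   seed: int = 0) -> List[float]:
    """Test AUROC over repeated random stratified splits."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        train_idx, test_idx = stratified_split(labels, test_fraction, rng)
        # Placeholder "training": learn only the direction of the class-mean difference.
        mean_pos = mean(features[i] for i in train_idx if labels[i] == 1)
        mean_neg = mean(features[i] for i in train_idx if labels[i] == 0)
        sign = 1.0 if mean_pos >= mean_neg else -1.0
        results.append(auroc([labels[i] for i in test_idx],
                             [sign * features[i] for i in test_idx]))
    return results


# Synthetic stand-in data: 25 positives and 63 negatives, one noisy feature.
data_rng = random.Random(42)
labels = [1] * 25 + [0] * 63
features = [data_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

aurocs = repeated_auroc(features, labels)
print(f"median test AUROC over {len(aurocs)} splits: {median(aurocs):.2f}")
```

Reporting the median over many random splits, rather than a single split, reduces the dependence of the estimate on one particular partition of a small cohort.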

arXiv:2502.19514 [pdf, other]

GONet: A Generalizable Deep Learning Model for Glaucoma Detection

Authors: Or Abramovich, Hadas Pizem, Jonathan Fhima, Eran Berkowitz, Ben Gofrit, Meishar Meisel, Meital Baskin, Jan Van Eijgen, Ingeborg Stalmans, Eytan Z. Blumenthal, Joachim A. Behar

Abstract: Glaucomatous optic neuropathy (GON) is a prevalent ocular disease that can lead to irreversible vision loss if not detected early and treated. The traditional diagnostic approach for GON involves a set of ophthalmic examinations, which are time-consuming and require a visit to an ophthalmologist. Recent deep learning models for automating GON detection from digital fundus images (DFI) have shown promise but often suffer from limited generalizability across different ethnicities, disease groups and examination settings. To address these limitations, we introduce GONet, a robust deep learning model developed using seven independent datasets, including over 119,000 DFIs with gold-standard annotations and from patients of diverse geographic backgrounds. GONet consists of a DINOv2 pre-trained self-supervised vision transformer fine-tuned using a multisource domain strategy. GONet demonstrated high out-of-distribution generalizability, with an AUC of 0.85-0.99 in target domains. GONet performance was similar or superior to state-of-the-art works and was significantly superior to the cup-to-disc ratio, by up to 21.6%. GONet is available at [URL provided on publication]. We also contribute a new dataset consisting of 768 DFIs with GON labels as open access.

Submitted 26 February, 2025; originally announced February 2025.

Comments: 9 pages, 4 figures, submitted to IEEE Transactions on Biomedical Engineering

ACM Class: I.2.10

arXiv:2404.06869 [pdf, other]

SleepPPG-Net2: Deep learning generalization for sleep staging from photoplethysmography

Authors: Shirel Attia, Revital Shani Hershkovich, Alissa Tabakhov, Angeleene Ang, Sharon Haimov, Riva Tauman, Joachim A. Behar

Abstract: Background: Sleep staging is a fundamental component in the diagnosis of sleep disorders and the management of sleep health. Traditionally, this analysis is conducted in clinical settings and involves a time-consuming scoring procedure. Recent data-driven algorithms for sleep staging, using the photoplethysmogram (PPG) time series, have shown high performance on local test sets but lower performance on external datasets due to data drift. Methods: This study aimed to develop a generalizable deep learning model for the task of four-class (wake, light, deep, and rapid eye movement (REM)) sleep staging from raw PPG physiological time-series. Six sleep datasets, totaling 2,574 patient recordings, were used. In order to create a more generalizable representation, we developed and evaluated a deep learning model called SleepPPG-Net2, which employs a multi-source domain training approach. SleepPPG-Net2 was benchmarked against two state-of-the-art models. Results: SleepPPG-Net2 showed consistently higher performance over benchmark approaches, with generalization performance (Cohen's kappa) improving by up to 19%. Performance disparities were observed in relation to age, sex, and sleep apnea severity. Conclusion: SleepPPG-Net2 sets a new standard for staging sleep from raw PPG time-series.

Submitted 10 April, 2024; originally announced April 2024.
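The SleepPPG-Net2 abstract above reports performance as Cohen's kappa, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of that standard formula for four-class sleep staging follows. The `cohens_kappa` helper and the eight example epochs are made up for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(reference: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Cohen's kappa between two label sequences of equal length."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of epochs where both labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both sequences contain a single class")
    return (observed - expected) / (1.0 - expected)


# Made-up per-epoch labels: expert scoring versus model output.
reference = ["wake", "light", "light", "deep", "rem", "light", "wake", "rem"]
predicted = ["wake", "light", "deep", "deep", "rem", "light", "light", "rem"]

kappa = cohens_kappa(reference, predicted)
print(f"Cohen's kappa: {kappa:.3f}")
```

Because light sleep usually dominates a night's recording, kappa is more informative than plain accuracy: a model that always predicts the majority stage scores a kappa of zero.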

arXiv:2401.05411 [pdf, other]

RawECGNet: Deep Learning Generalization for Atrial Fibrillation Detection from the Raw ECG

Authors: Noam Ben-Moshe, Kenta Tsutsui, Shany Biton, Leif Sörnmo, Joachim A. Behar

Abstract: Introduction: Deep learning models for detecting episodes of atrial fibrillation (AF) using rhythm information in long-term, ambulatory ECG recordings have shown high performance. However, the rhythm-based approach does not take advantage of the morphological information conveyed by the different ECG waveforms, particularly the f-waves. As a result, the performance of such models may be inherently limited. Methods: To address this limitation, we have developed a deep learning model, named RawECGNet, to detect episodes of AF and atrial flutter (AFl) using the raw, single-lead ECG. We compare the generalization performance of RawECGNet on two external data sets that account for distribution shifts in geography, ethnicity, and lead position. RawECGNet is further benchmarked against a state-of-the-art deep learning model, named ArNet2, which utilizes rhythm information as input. Results: Using RawECGNet, the results for the different leads in the external test sets in terms of the F1 score were 0.91-0.94 in RBDB and 0.93 in SHDB, compared to 0.89-0.91 in RBDB and 0.91 in SHDB for ArNet2. The results highlight RawECGNet as a high-performance, generalizable algorithm for detection of AF and AFl episodes, exploiting information on both rhythm and morphology.

Submitted 26 December, 2023; originally announced January 2024.

arXiv:2312.14891 [pdf, other]

DRStageNet: Deep Learning for Diabetic Retinopathy Staging from Fundus Images

Authors: Yevgeniy Men, Jonathan Fhima, Leo Anthony Celi, Lucas Zago Ribeiro, Luis Filipe Nakayama, Joachim A. Behar

Abstract: Diabetic retinopathy (DR) is a prevalent complication of diabetes associated with a significant risk of vision loss. Timely identification is critical to curb vision impairment. Algorithms for DR staging from digital fundus images (DFIs) have been recently proposed. However, models often fail to generalize due to distribution shifts between the source domain on which the model was trained and the target domain where it is deployed. A common and particularly challenging shift is often encountered when the source- and target-domain supports do not fully overlap. In this research, we introduce DRStageNet, a deep learning model designed to mitigate this challenge. We used seven publicly available datasets, comprising a total of 93,534 DFIs that cover a variety of patient demographics, ethnicities, geographic origins and comorbidities. We fine-tune DINOv2, a pretrained self-supervised vision transformer, and implement a multi-source domain fine-tuning strategy to enhance generalization performance. We benchmark our method against two state-of-the-art models, including a recently published foundation model, and demonstrate its superiority. We adapted the grad-rollout method to our regression task in order to provide high-resolution explainability heatmaps. The error analysis showed that 59% of the main errors had incorrect reference labels. DRStageNet is accessible at URL [upon acceptance of the manuscript].

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2309.05780 [pdf, other]

LUNet: Deep Learning for the Segmentation of Arterioles and Venules in High Resolution Fundus Images

Authors: Jonathan Fhima, Jan Van Eijgen, Hana Kulenovic, Valérie Debeuf, Marie Vangilbergen, Marie-Isaline Billen, Heloïse Brackenier, Moti Freiman, Ingeborg Stalmans, Joachim A. Behar

Abstract: The retina is the only part of the human body in which blood vessels can be accessed non-invasively using imaging techniques such as digital fundus images (DFI). The spatial distribution of the retinal microvasculature may change with cardiovascular diseases and thus the eyes may be regarded as a window to our hearts. Computerized segmentation of the retinal arterioles and venules (A/V) is essential for automated microvasculature analysis. Using active learning, we created a new DFI dataset containing 240 crowd-sourced manual A/V segmentations performed by fifteen medical students and reviewed by an ophthalmologist, and developed LUNet, a novel deep learning architecture for high resolution A/V segmentation. LUNet architecture includes a double dilated convolutional block that aims to enhance the receptive field of the model and reduce its parameter count. Furthermore, LUNet has a long tail that operates at high resolution to refine the segmentation. The custom loss function emphasizes the continuity of the blood vessels. LUNet is shown to significantly outperform two state-of-the-art segmentation algorithms on the local test set as well as on four external test sets simulating distribution shifts across ethnicity, comorbidities, and annotators. We make the newly created dataset open access (upon publication).

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2307.08331 [pdf, other]

Machine Learning for Ranking f-wave Extraction Methods in Single-Lead ECGs

Authors: Noam Ben-Moshe, Shany Biton, Kenta Tsutsui, Mahmoud Suleiman, Leif Sörnmo, Joachim A. Behar

Abstract: Introduction: The presence of fibrillatory waves (f-waves) is important in the diagnosis of atrial fibrillation (AF), which has motivated the development of methods for f-wave extraction. We propose a novel approach to benchmarking methods designed for single-lead ECG analysis, building on the hypothesis that better-performing AF classification using features computed from the extracted f-waves implies better-performing extraction. The approach is well-suited for processing large Holter data sets annotated with respect to the presence of AF. Methods: Three data sets with a total of 300 two- or three-lead Holter recordings, performed in the USA, Israel and Japan, were used as well as a simulated single-lead data set. Four existing extraction methods based on either average beat subtraction or principal component analysis (PCA) were evaluated. A random forest classifier was used for window-based AF classification. Performance was measured by the area under the receiver operating characteristic curve (AUROC). Results: The best performance was found for PCA-based extraction, resulting in AUROCs in the ranges 0.77-0.83, 0.62-0.78, and 0.87-0.89 for the data sets from USA, Israel, and Japan, respectively, when analyzed across leads; the AUROC of the simulated single-lead, noisy data set was 0.98. Conclusions: This study provides a novel approach to evaluating the performance of f-wave extraction methods, offering the advantage of not using ground truth f-waves for evaluation, thus being able to leverage real data sets for evaluation. The code is open source (following publication).

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2209.03762 [pdf, other]

Estimation of f-wave Dominant Frequency Using a Voting Scheme

Authors: Shany Biton, Mahmoud Suleiman, Noam Ben Moshe, Leif Sörnmo, Joachim A. Behar

Abstract: Introduction: Atrial fibrillation (AF) is the most common heart arrhythmia, characterized by the presence of fibrillatory waves (f-waves) in the ECG. We introduce a voting scheme to estimate the dominant atrial frequency (DAF) of f-waves. Methods: We analysed a subset of 100 Holter recordings with manually annotated AF events from the University of Virginia AF Database, resulting in a total of 363 AF events lasting more than 1 min. The f-waves were extracted using four different template subtraction (TS) algorithms and the DAF was estimated from the first 1-min window of each AF event. A random forest (RF) classifier was used. We hypothesized that better extraction of the f-wave meant better AF/non-AF classification using the DAF as the single input feature of the RF model. Results: Performance on the test set, expressed in terms of AF/non-AF classification, was the best when the DAF was computed using the three best-performing extraction methods. Using these three algorithms in a voting scheme, the classifier obtained AUC=0.60 and the DAFs were mostly spread around 6 Hz, 5.66 (4.83-7.47). Conclusions: This study has two novel contributions: (1) a method for assessing the performance of f-wave extraction algorithms, and (2) a voting scheme for improved DAF estimation.

Submitted 23 August, 2022; originally announced September 2022.

Comments: 4 pages, 3 figures

arXiv:2208.10550 [pdf, other]

Atrial Fibrillation Recurrence Risk Prediction from 12-lead ECG Recorded Pre- and Post-Ablation Procedure

Authors: Eran Zvuloni, Sheina Gendelman, Sanghamitra Mohanty, Jason Lewen, Andrea Natale, Joachim A. Behar

Abstract: Introduction: A 12-lead electrocardiogram (ECG) is recorded during the atrial fibrillation (AF) catheter ablation procedure (CAP). It is not easy to determine if CAP was successful without a long follow-up assessing for AF recurrence (AFR). Therefore, an AFR risk prediction algorithm could enable better management of CAP patients. In this research, we extracted features from 12-lead ECG recorded before and after CAP and trained an AFR risk prediction machine learning model. Methods: Pre- and post-CAP segments were extracted from 112 patients. The analysis included a signal quality criterion, heart rate variability and morphological biomarkers engineered from the 12-lead ECG (804 features overall). Of the 112 patients, 43 had an AFR clinical endpoint available. These were utilized to assess the feasibility of AFR risk prediction, using either pre- or post-CAP features. A random forest classifier was trained within a nested cross-validation framework. Results: 36 features were found statistically significant for distinguishing between the pre- and post-surgery states (n=112). For the classification, the area under the receiver operating characteristic curve (AUROC) was AUROC_pre=0.64 and AUROC_post=0.74 (n=43). Discussion and conclusions: This preliminary analysis showed the feasibility of AFR risk prediction. Such a model could be used to improve CAP management.

Submitted 22 August, 2022; originally announced August 2022.

arXiv:2208.10153 [pdf, other]

ArNet-ECG: Deep Learning for the Detection of Atrial Fibrillation from the Raw Electrocardiogram

Authors: Noam Ben-Moshe, Shany Biton, Joachim A. Behar

Abstract: Atrial fibrillation (AF) is the most prevalent heart arrhythmia. AF manifests on the electrocardiogram (ECG) through irregular beat-to-beat time interval variation, the absence of P-waves and the presence of fibrillatory waves (f-waves). We hypothesize that a deep learning (DL) approach trained on the raw ECG will enable robust detection of AF events and the estimation of the AF burden (AFB). We further hypothesize that the performance reached leveraging the raw ECG will be superior to previously developed methods using the beat-to-beat interval variation time series. Consequently, we develop a new DL algorithm, denoted ArNet-ECG, to robustly detect AF events and estimate the AFB from the raw ECG and benchmark this algorithm against previous work. Methods: A dataset including 2,247 adult patients and totaling over 53,753 hours of continuous ECG from the University of Virginia (UVAF) was used. Results: ArNet-ECG obtained an F1 of 0.96 and ArNet2 obtained an F1 of 0.94. Discussion and conclusion: ArNet-ECG outperformed ArNet2, thus demonstrating that using the raw ECG provides added performance over the beat-to-beat interval time series. The main reason found for explaining the higher performance of ArNet-ECG was its high performance on atrial flutter examples versus poor performance on these recordings for ArNet2.

Submitted 22 August, 2022; originally announced August 2022.

arXiv:2207.09667 [pdf]

Generalizable and Robust Deep Learning Algorithm for Atrial Fibrillation Diagnosis Across Ethnicities, Ages and Sexes

Authors: Shany Biton, Mohsin Aldhafeeri, Erez Marcusohn, Kenta Tsutsui, Tom Szwagier, Adi Elias, Julien Oster, Jean Marc Sellal, Mahmoud Suleiman, Joachim A. Behar

Abstract: To drive health innovation that meets the needs of all and democratize healthcare, there is a need to assess the generalization performance of deep learning (DL) algorithms across various distribution shifts to ensure that these algorithms are robust. This retrospective study is, to the best of our knowledge, the first to develop and assess the generalization performance of a DL model for AF event detection from long-term beat-to-beat intervals across ethnicities, ages and sexes. The new recurrent DL model, denoted ArNet2, was developed on a large retrospective dataset of 2,147 patients totaling 51,386 hours of continuous electrocardiogram (ECG). The model's generalization was evaluated on manually annotated test sets from four centers (USA, Israel, Japan and China) totaling 402 patients. The model was further validated on a retrospective dataset of 1,730 consecutive Holter recordings from the Rambam Hospital Holter clinic, Haifa, Israel. The model outperformed benchmark state-of-the-art models and generalized well across ethnicities, ages and sexes. Performance was higher for female than male and young adults (less than 60 years old) and showed some differences across ethnicities. The main finding explaining these variations was an impairment in performance in groups with a higher prevalence of atrial flutter (AFL). Our findings on the relative performance of ArNet2 across groups may have clinical implications for the choice of the preferred AF examination method to use relative to the group of interest.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2207.06096 [pdf, other]

On Merging Feature Engineering and Deep Learning for Diagnosis, Risk-Prediction and Age Estimation Based on the 12-Lead ECG

Authors: Eran Zvuloni, Jesse Read, Antônio H. Ribeiro, Antonio Luiz P. Ribeiro, Joachim A. Behar

Abstract: Objective: Machine learning techniques have been used extensively for 12-lead electrocardiogram (ECG) analysis. For physiological time series, deep learning (DL) superiority to feature engineering (FE) approaches based on domain knowledge is still an open question. Moreover, it remains unclear whether combining DL with FE may improve performance. Methods: We considered three tasks intending to address these research gaps: cardiac arrhythmia diagnosis (multiclass-multilabel classification), atrial fibrillation risk prediction (binary classification), and age estimation (regression). We used an overall dataset of 2.3M 12-lead ECG recordings to train the following models for each task: i) a random forest taking the FE as input, trained as a classical machine learning approach; ii) an end-to-end DL model; and iii) a merged model of FE+DL. Results: FE yielded comparable results to DL while necessitating significantly less data for the two classification tasks, and it was outperformed by DL for the regression task. For all tasks, merging FE with DL did not improve performance over DL alone. Conclusion: We found that for traditional 12-lead ECG based diagnosis tasks DL did not yield a meaningful improvement over FE, while it significantly improved the nontraditional regression task. We also found that combining FE with DL did not improve over DL alone, which suggests that the FE were redundant with the features learned by DL. Significance: Our findings provide important recommendations on what machine learning strategy and data regime to choose with respect to the task at hand for the development of new machine learning models based on the 12-lead ECG.

Submitted 16 July, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

ACM Class: I.2.0; J.3

arXiv:2205.05389 [pdf, other]

doi 10.1088/1361-6579/ac8ccd

Machine Learning to Support Triage of Children at Risk for Epileptic Seizures in the Pediatric Intensive Care Unit

Authors: Raphael Azriel, Cecil D. Hahn, Thomas De Cooman, Sabine Van Huffel, Eric T. Payne, Kristin L. McBain, Danny Eytan, Joachim A. Behar

Abstract: Objective: Epileptic seizures are relatively common in critically-ill children admitted to the pediatric intensive care unit (PICU) and thus serve as an important target for identification and treatment. Most of these seizures have no discernible clinical manifestation but still have a significant impact on morbidity and mortality. Children who are deemed at risk for seizures within the PICU are monitored using continuous-electroencephalogram (cEEG). The cost of cEEG monitoring is considerable and, as the number of available machines is always limited, clinicians need to resort to triaging patients according to perceived risk in order to allocate resources. This research aims to develop a computer-aided tool to improve seizure risk assessment in critically-ill children, using a ubiquitously recorded signal in the PICU, namely the electrocardiogram (ECG). Approach: A novel data-driven model was developed at the patient level, based on features extracted from the first hour of ECG recording and the clinical data of the patient. Main results: The most predictive features were the age of the patient, brain injury as coma etiology, and the QRS area. For patients without any prior clinical data, using one hour of ECG recording, the classification performance of the random forest classifier reached an area under the receiver operating characteristic curve (AUROC) score of 0.84. When combining ECG features with the patient's clinical history, the AUROC reached 0.87. Significance: Taking a real clinical scenario, we estimated that our clinical decision support triage tool can improve the positive predictive value by more than 59% over the clinical standard.

Submitted 11 May, 2022; originally announced May 2022.

Comments: 8 pages, 9 figures, submitted to Physiological Measurement

arXiv:2205.01676 [pdf, other]

doi 10.1016/j.cmpb.2023.107522

FundusQ-Net: a Regression Quality Assessment Deep Learning Algorithm for Fundus Images Quality Grading

Authors: Or Abramovich, Hadas Pizem, Jan Van Eijgen, Ilan Oren, Joshua Melamed, Ingeborg Stalmans, Eytan Z. Blumenthal, Joachim A. Behar

Abstract: Objective: Ophthalmological pathologies such as glaucoma, diabetic retinopathy and age-related macular degeneration are major causes of blindness and vision impairment. There is a need for novel decision support tools that can simplify and speed up the diagnosis of these pathologies. A key step in this process is to automatically estimate the quality of the fundus images to make sure these are interpretable by a human operator or a machine learning model. We present a novel fundus image quality scale and deep learning (DL) model that can estimate fundus image quality relative to this new scale. Methods: A total of 1,245 images were graded for quality by two ophthalmologists within the range 1-10, with a resolution of 0.5. A DL regression model was trained for fundus image quality assessment. The architecture used was Inception-V3. The model was developed using a total of 89,947 images from 6 databases, of which 1,245 were labeled by the specialists and the remaining 88,702 images were used for pre-training and semi-supervised learning. The final DL model was evaluated on an internal test set (n=209) as well as an external test set (n=194). Results: The final DL model, denoted FundusQ-Net, achieved a mean absolute error of 0.61 (0.54-0.68) on the internal test set. When evaluated as a binary classification model on the public DRIMDB database as an external test set, the model obtained an accuracy of 99%. Significance: The proposed algorithm provides a new robust tool for automated quality grading of fundus images.

Submitted 6 June, 2023; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: 12 pages, 9 figures, published in Computer Methods and Programs in Biomedicine

ACM Class: I.2.10

Journal ref: Computer Methods and Programs in Biomedicine, Volume 239, September 2023, 107522, ISSN 0169-2607

arXiv:2202.05735 [pdf, other]

SleepPPG-Net: a deep learning algorithm for robust sleep staging from continuous photoplethysmography

Authors: Kevin Kotzen, Peter H. Charlton, Sharon Salabi, Lea Amar, Amir Landesberg, Joachim A. Behar

Abstract: Introduction: Sleep staging is an essential component in the diagnosis of sleep disorders and management of sleep health. It is traditionally measured in a clinical setting and requires a labor-intensive labeling process. We hypothesize that it is possible to perform robust 4-class sleep staging using the raw photoplethysmography (PPG) time series and modern advances in deep learning (DL). Methods: We used two publicly available sleep databases that included raw PPG recordings, totalling 2,374 patients and 23,055 hours. We developed SleepPPG-Net, a DL model for 4-class sleep staging from the raw PPG time series. SleepPPG-Net was trained end-to-end and consists of a residual convolutional network for automatic feature extraction and a temporal convolutional network to capture long-range contextual information. We benchmarked the performance of SleepPPG-Net against models based on the best-reported state-of-the-art (SOTA) algorithms. Results: When benchmarked on a held-out test set, SleepPPG-Net obtained a median Cohen's Kappa ($κ$) score of 0.75 against 0.69 for the best SOTA approach. SleepPPG-Net showed good generalization performance to an external database, obtaining a $κ$ score of 0.74 after transfer learning. Perspective: Overall, SleepPPG-Net provides new SOTA performance. In addition, performance is high enough to open the path to the development of wearables that meet the requirements for usage in clinical applications such as the diagnosis and monitoring of obstructive sleep apnea.
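For readers unfamiliar with the agreement metric reported in this abstract, the following is a minimal sketch of how Cohen's kappa can be computed between reference and predicted sleep stages. It is not the authors' evaluation code; the function name `cohen_kappa` and the stage labels in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(reference: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(reference)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    # Observed agreement: fraction of epochs where both labelings match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: sum over classes of the product of marginal frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)
```

As a usage example, `cohen_kappa(["wake", "light", "deep", "rem"], ["wake", "light", "deep", "rem"])` gives perfect agreement, while two labelings that agree only as often as chance predicts give a value of zero. The abstract reports a median kappa, which would typically be taken over per-recording values; that aggregation is not shown here.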

Submitted 29 April, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

Comments: 11 pages, 10 figures

arXiv:2012.05492 [pdf]

Machine learning for nocturnal diagnosis of chronic obstructive pulmonary disease using digital oximetry biomarkers

Authors: Jeremy Levy, Daniel Alvarez, Felix del Campo, Joachim A. Behar

Abstract: Objective: Chronic obstructive pulmonary disease (COPD) is a highly prevalent chronic condition. COPD is a major source of morbidity, mortality and healthcare costs. Spirometry is the gold standard test for a definitive diagnosis and severity grading of COPD. However, a large proportion of individuals with COPD are undiagnosed and untreated. Given the high prevalence of COPD and its clinical importance, it is critical to develop new algorithms to identify undiagnosed COPD, especially in specific groups at risk, such as those with sleep-disordered breathing. To our knowledge, no research has looked at the feasibility of COPD diagnosis from the nocturnal oximetry time series. Approach: We hypothesize that patients with COPD will exhibit certain patterns and/or dynamics in their overnight oximetry time series that are unique to this condition. We introduce a novel approach to nocturnal COPD diagnosis using 44 oximetry digital biomarkers and 5 demographic features and assess its performance in a population sample at risk of sleep-disordered breathing. Polysomnography (PSG) recordings from a total of n=350 unique patients were used. A random forest (RF) classifier is trained using these features and evaluated using the nested cross-validation procedure. Significance: Our research makes a number of novel scientific contributions. First, we demonstrated, for the first time, the feasibility of COPD diagnosis from nocturnal oximetry time series in a population sample at risk of sleep-disordered breathing. We highlighted which digital oximetry biomarkers best reflect how COPD manifests overnight. The results suggest that overnight single-channel oximetry is a valuable pathway for COPD diagnosis.

Submitted 10 December, 2020; originally announced December 2020.

Comments: 34 pages, 9 figures

arXiv:2008.02228 [pdf]

Remote atrial fibrillation burden estimation using deep recurrent neural network

Authors: Armand Chocron, Julien Oster, Shany Biton, Mandel Franck, Meyer Elbaz, Yehoshua Y. Zeevi, Joachim Behar

Abstract: The atrial fibrillation burden (AFB) is defined as the percentage of time spent in atrial fibrillation (AF) over a long enough monitoring period. Recent research has demonstrated the added prognostic value that becomes available by using the AFB as compared with the binary diagnosis. We evaluate, for the first time, the ability to estimate the AFB over long-term continuous recordings, using a deep recurrent neural network (DRNN) approach. Methods: The models were developed and evaluated on a large database of p=2,891 patients, totaling t=68,800 hours of continuous electrocardiography (ECG) recordings acquired at the University of Virginia heart station. Specifically, 24h beat-to-beat time series were obtained from a single portable ECG channel. The network, denoted ArNet, was benchmarked against a gradient boosting (XGB) model, trained on 21 features including the coefficient of sample entropy (CosEn) and AFEvidence. Data were divided into training and test sets, while patients were stratified by the presence and severity of AF. The generalizations of ArNet and XGB were also evaluated on the independent test PhysioNet LTAF database. Results: The absolute AF burden estimation error |E_AF|, median and interquartile range, on the test set, was 1.2 (0.1-6.7) for ArNet and 3.1 (0.0-11.7) for XGB for AF individuals. Generalization results on LTAF were consistent, with |E_AF| of 2.6 (1.1-14.7) for ArNet and 3.6 (1.0-16.7) for XGB. Conclusion: This research demonstrates the feasibility of AFB estimation from 24h beat-to-beat interval time series utilizing recent advances in DRNN. Significance: The novel data-driven approach enables robust remote diagnosis and phenotyping of AF.
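The AFB definition given in this abstract (percentage of monitored time spent in AF) can be illustrated with a short sketch. This is not ArNet, which estimates the burden with a deep recurrent network; the sketch only applies the definition, assuming that a per-beat AF label is already available. The function name `af_burden` and its inputs are hypothetical.

```python
from typing import Sequence


def af_burden(rr_intervals_s: Sequence[float], is_af: Sequence[bool]) -> float:
    """Return the AF burden as a percentage of total monitored time.

    rr_intervals_s: beat-to-beat intervals in seconds.
    is_af: per-beat flag, True when the beat is labeled as AF (assumed given).
    """
    if len(rr_intervals_s) != len(is_af):
        raise ValueError("rr_intervals_s and is_af must have the same length")

    total_time = sum(rr_intervals_s)
    if total_time <= 0:
        raise ValueError("total monitored time must be positive")

    # Time in AF is the sum of the intervals whose beat is flagged as AF.
    af_time = sum(rr for rr, flag in zip(rr_intervals_s, is_af) if flag)
    return 100.0 * af_time / total_time
```

For example, four beats of one second each with a single beat flagged as AF yield a burden of 25%. An absolute estimation error like the |E_AF| reported above would then be the absolute difference between an estimated and a reference burden, both in percentage points.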

Submitted 5 August, 2020; originally announced August 2020.

arXiv:2007.14686 [pdf]

doi 10.1088/1361-6579/abb8bf

Digital biomarkers and artificial intelligence for mass diagnosis of atrial fibrillation in a population sample at risk of sleep disordered breathing

Authors: Armand Chocron, Roi Efraim, Franck Mandel, Michael Rueschman, Niclas Palmius, Thomas Penzel, Meyer Elbaz, Joachim A. Behar

Abstract: Atrial fibrillation (AF) is the most prevalent arrhythmia and is associated with a five-fold increase in stroke risk. Many individuals with AF go undetected. These individuals are often asymptomatic. There are ongoing debates on whether mass screening for AF is to be recommended. However, there is incentive in performing screening for specific at-risk groups such as individuals suspected of sleep-disordered breathing, where an important association between AF and obstructive sleep apnea (OSA) has been demonstrated. We introduce a new methodology leveraging digital biomarkers and recent advances in artificial intelligence (AI) for the purpose of mass AF diagnosis. We demonstrate the value of such methodology in a large population sample at risk of sleep-disordered breathing. Four databases, totaling n=3,088 patients and p=26,913 hours of ECG raw data, were used. Three of the databases (n=125, p=2,513) were used for training a machine learning model to recognize AF events from beat-to-beat interval time series. Visit 1 of the Sleep Heart Health Study database (SHHS1, n=2,963, p=24,400) consists of overnight polysomnographic (PSG) recordings, and was considered as the test set. In SHHS1, expert inspection identified a total of 70 patients with a prominent AF rhythm. Model prediction on the SHHS1 showed an overall Se=0.97, Sp=0.99, NPV=0.99, PPV=0.67 in classifying individuals with or without prominent AF. PPV was non-inferior (p=0.03) for individuals with an apnea-hypopnea index (AHI) > 15 versus AHI < 15. Over 22% of correctly identified prominent AF rhythm cases were not documented as AF in the SHHS1. Individuals with prominent AF can be automatically diagnosed from an overnight single-channel ECG recording, with an accuracy unaffected by the presence of OSA. AF detection from overnight ECG recording revealed a large proportion of undiagnosed AF and may enhance the phenotyping of OSA.
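The gap between the high sensitivity and specificity and the lower PPV reported in this abstract follows from the low prevalence of prominent AF in the test set (70 of 2,963 patients). A minimal sketch of that relationship is given below; the function name `predictive_values` is hypothetical, and because the abstract reports rounded values, the sketch is not expected to reproduce the published PPV exactly.

```python
from typing import Tuple


def predictive_values(se: float, sp: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    for name, value in (("se", se), ("sp", sp), ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")

    # Expected fractions of the population in each confusion-matrix cell.
    tp = se * prevalence
    fn = (1.0 - se) * prevalence
    tn = sp * (1.0 - prevalence)
    fp = (1.0 - sp) * (1.0 - prevalence)

    if tp + fp == 0.0 or tn + fn == 0.0:
        raise ValueError("predictive values are undefined for these inputs")
    return tp / (tp + fp), tn / (tn + fn)


# Rounded values taken from the abstract: Se=0.97, Sp=0.99, 70 cases in 2,963.
ppv, npv = predictive_values(0.97, 0.99, 70 / 2963)
```

Even with a specificity of 0.99, roughly one percent of the large AF-free group is flagged, which is comparable in size to the small group of true cases; this is why the PPV stays well below the sensitivity and specificity while the NPV remains close to 1.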

Submitted 29 July, 2020; originally announced July 2020.

Showing 1–21 of 21 results for author: Behar, J