-
FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization
Authors:
Abdelrhman Gaber,
Hassan Abd-Eltawab,
John Elgallab,
Youssif Abuzied,
Dineo Mpanya,
Turgay Celik,
Swarun Kumar,
Tamer ElBatt
Abstract:
Cardiovascular diseases (CVD) cause over 17 million deaths annually worldwide, highlighting the urgent need for privacy-preserving predictive systems. We introduce FedCVD++, an enhanced federated learning (FL) framework that integrates both parametric models (logistic regression, SVM, neural networks) and non-parametric models (Random Forest, XGBoost) for coronary heart disease risk prediction. To address key FL challenges, we propose: (1) tree-subset sampling that reduces Random Forest communication overhead by 70%, (2) XGBoost-based feature extraction enabling lightweight federated ensembles, and (3) federated SMOTE synchronization for resolving cross-institutional class imbalance.
Evaluated on the Framingham dataset (4,238 records), FedCVD++ achieves state-of-the-art results: federated XGBoost (F1 = 0.80) surpasses its centralized counterpart (F1 = 0.78), and federated Random Forest (F1 = 0.81) matches non-federated performance. Additionally, our communication-efficient strategies reduce bandwidth consumption by 3.2× while preserving 95% accuracy.
Compared to existing FL frameworks, FedCVD++ delivers up to 15% higher F1-scores and superior scalability for multi-institutional deployment. This work represents the first practical integration of non-parametric models into federated healthcare systems, providing a privacy-preserving solution validated under real-world clinical constraints.
Submitted 30 July, 2025;
originally announced July 2025.
-
Discovering the Unequal Importance of Coded Bits in the Decoding of Polar Codes
Authors:
Hossam Hassan,
Ali Gaber,
Mohammed Karmoose,
Noha Korany
Abstract:
Polar codes are widely used in modern communication systems due to their capacity-achieving properties. This paper investigates the importance of coded bits in the decoding process of polar codes and aims to determine which bits contribute most to successful decoding. We investigate the problem via a brute-force search approach and surrogate optimization techniques to identify the most critical coded bits. We also demonstrate how mapping these important bits to the most reliable channels improves system performance with minimal additional cost. We show the performance of our proposed bit mapping in OFDM-based systems, and demonstrate up to a 7× gain in BER performance.
Submitted 11 July, 2025;
originally announced July 2025.
-
Federated Learning Based Multilingual Emoji Prediction In Clean and Attack Scenarios
Authors:
Karim Gamal,
Ahmed Gaber,
Hossam Amer
Abstract:
Federated learning is a growing field in the machine learning community due to its decentralized and private design. Model training in federated learning is distributed across multiple clients, giving access to large amounts of client data while maintaining privacy. A server then aggregates the training done on these clients without access to their data, which could include emojis, widely used on social media services and instant messaging platforms to express users' sentiments. This paper proposes federated learning-based multilingual emoji prediction in both clean and attack scenarios. Emoji prediction data have been crawled from both Twitter and SemEval emoji datasets. These data are used to train and evaluate transformer models of different sizes, including a sparsely activated transformer, assuming either clean data on all clients or data poisoned by a label-flipping attack on some clients. Experimental results on these models show that federated learning, in both clean and attacked scenarios, performs similarly to centralized training in multilingual emoji prediction on seen and unseen languages under different data sources and distributions. Our trained transformers outperform other techniques on the SemEval emoji dataset, in addition to offering the privacy and distribution benefits of federated learning.
Submitted 6 July, 2023; v1 submitted 30 March, 2023;
originally announced April 2023.
-
WESSA at SemEval-2020 Task 9: Code-Mixed Sentiment Analysis using Transformers
Authors:
Ahmed Sultan,
Mahmoud Salim,
Amina Gaber,
Islam El Hosary
Abstract:
In this paper, we describe our system submitted for SemEval 2020 Task 9, Sentiment Analysis for Code-Mixed Social Media Text, alongside other experiments. Our best-performing system is a transfer learning-based model that fine-tunes "XLM-RoBERTa", a transformer-based multilingual masked language model, on monolingual English and Spanish data and Spanish-English code-mixed data. Our system outperforms the official task baseline, achieving a 70.1% average F1-score on the official leaderboard using the test set. In later submissions, our system achieves a 75.9% average F1-score on the test set under the CodaLab username "ahmed0sultan".
Submitted 21 September, 2020;
originally announced September 2020.