-
Large-Scale Processing and Validation of Grid Data for Assessing the Fair Spatial Distribution of PV Hosting Capacity
Authors:
Ali Mohamed Ali,
Yaser Raeisi,
Plouton Grammatikos,
Davide Pavanello,
Pierre Roduit,
Fabrizio Sossan
Abstract:
The integration of PV systems and increased electrification levels present significant challenges to the traditional design and operation of distribution grids. This paper presents a methodology for extracting, validating, and adapting grid data from a distribution system operator's (DSO) database to facilitate large-scale grid studies, including load flow and optimal power flow analyses. The validation process combines rule-based sanity checks and offline automated power flow analyses to ensure data consistency and detect potential errors in the grid database, allowing for their correction. As a practical application, the paper proposes a method to assess the PV hosting capacity of distribution grids, with a focus on ensuring fairness in their spatial distribution. By incorporating fairness criteria into the analyses, we quantify the costs (in terms of missed revenues from selling PV generation) associated with spatial fairness.
Submitted 11 July, 2025;
originally announced July 2025.
-
A Deep Convolutional Neural Network-Based Novel Class Balancing for Imbalance Data Segmentation
Authors:
Atifa Kalsoom,
M. A. Iftikhar,
Amjad Ali,
Zubair Shah,
Shidin Balakrishnan,
Hazrat Ali
Abstract:
Retinal fundus images provide valuable insights into the human eye's interior structure and crucial features, such as blood vessels, optic disk, macula, and fovea. However, accurate segmentation of retinal blood vessels can be challenging due to imbalanced data distribution and varying vessel thickness. In this paper, we propose BLCB-CNN, a novel pipeline based on deep learning and a bi-level class-balancing scheme to achieve vessel segmentation in retinal fundus images. The BLCB-CNN scheme uses a Convolutional Neural Network (CNN) architecture and an empirical approach to balance the distribution of pixels across vessel and non-vessel classes and within thin and thick vessels. Level-I is used for vessel/non-vessel balancing and Level-II is used for thick/thin vessel balancing. Additionally, the input retinal fundus image is pre-processed with Global Contrast Normalization (GCN), Contrast Limited Adaptive Histogram Equalization (CLAHE), and gamma correction to increase intensity uniformity and to enhance the contrast between vessels and background pixels. The resulting balanced dataset is used for classification-based segmentation of the retinal vascular tree. We evaluate the proposed scheme on standard retinal fundus images and achieve superior performance measures, including an area under the ROC curve of 98.23%, Accuracy of 96.22%, Sensitivity of 81.57%, and Specificity of 97.65%. We also demonstrate the method's efficacy through external cross-validation on STARE images, confirming its generalization ability.
Submitted 23 June, 2025;
originally announced June 2025.
-
Towards a Unified Benchmark for Arabic Pronunciation Assessment: Quranic Recitation as Case Study
Authors:
Yassine El Kheir,
Omnia Ibrahim,
Amit Meghanani,
Nada Almarwani,
Hawau Olamide Toyin,
Sadeen Alharbi,
Modar Alfadly,
Lamya Alkanhal,
Ibrahim Selim,
Shehab Elbatal,
Salima Mdhaffar,
Thomas Hain,
Yasser Hifny,
Mostafa Shahin,
Ahmed Ali
Abstract:
We present a unified benchmark for mispronunciation detection in Modern Standard Arabic (MSA) using Qur'anic recitation as a case study. Our approach lays the groundwork for advancing Arabic pronunciation assessment by providing a comprehensive pipeline that spans data processing, the development of a specialized phoneme set tailored to the nuances of MSA pronunciation, and the creation of the first publicly available test set for this task, which we call the Qur'anic Mispronunciation Benchmark (QuranMB.v1). Furthermore, we evaluate several baseline models to provide initial performance insights, thereby highlighting both the promise and the challenges inherent in assessing MSA pronunciation. By establishing this standardized framework, we aim to foster further research and development in pronunciation assessment in Arabic language technology and related applications.
Submitted 12 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
TinyML-Based Adaptive Pulse Shaping for Edge Intelligence in IoT/IIoT
Authors:
Afan Ali
Abstract:
Edge intelligence in IoT and IIoT demands lightweight algorithms for data processing on resource-constrained devices. This paper introduces a novel adaptive pulse shape filter based on TinyML for PAPR and SER optimization on edge devices used in uplink IoT communication. Implemented on IoT nodes such as sensors, our pruned neural network provides up to 2 dB PAPR saving over root-raised-cosine (RRC) filters. Mass simulations validate its efficacy in DFT-s-OFDM systems and offer an energy-efficient and scalable solution for IoT/IIoT use cases such as smart factories and rural connectivity.
Submitted 6 June, 2025;
originally announced June 2025.
-
AI-Driven Digital Twins: Optimizing 5G/6G Network Slicing with NTNs
Authors:
Afan Ali,
Huseyin Arslan
Abstract:
Network slicing in 5G/6G Non-Terrestrial Networks (NTNs) is challenged by mobility and traffic variability. An artificial intelligence (AI)-based digital twin (DT) architecture with deep reinforcement learning (DRL) using the deep deterministic policy gradient (DDPG) algorithm is proposed for dynamic optimization of resource allocation. The DT virtualizes network states to enable predictive analysis, while the DRL agent adjusts the bandwidth of the eMBB slice. Simulations show a 25% latency reduction compared to static methods, with enhanced resource utilization. This scalable solution supports 5G/6G NTN applications such as disaster recovery and urban blockage.
Submitted 13 May, 2025;
originally announced May 2025.
-
Spreading the Wave: Low-Complexity PAPR Reduction for AFDM and OCDM in 6G Networks
Authors:
Afan Ali,
Abdelali Arous,
Huseyin Arslan
Abstract:
High Peak-to-Average Power Ratio (PAPR) is still a common issue in multicarrier signal modulation systems such as Orthogonal Chirp Division Multiplexing (OCDM) and Affine Frequency Division Multiplexing (AFDM), which are envisioned to play a central role in 6G networks. To this end, this paper investigates a novel, low-complexity solution for minimizing the PAPR with the aid of a unified premodulation data spreading paradigm. It analyzes four spreading techniques, namely the Walsh-Hadamard transform (WHT), Discrete Cosine transform (DCT), Zadoff-Chu transform (ZC), and Interleaved Discrete Fourier transform (IDFT), which assist in preallocating energy prior to OCDM and AFDM modulation. The proposed method takes advantage of the inherent characteristics of chirp-based modulation to achieve a notable reduction in PAPR at minimal computational load and with no side information, in contrast to past solutions such as Partial Transmit Sequence (PTS) or Selected Mapping (SLM), which suffer from high computational complexity. The proposed method has the additional benefit of improving phase selectivity by increasing the chirp parameters of AFDM and the quadratic phase of OCDM, which improves robustness in doubly dispersive channels. It further reduces interference by smoothing the output spread signal. The analytical and simulation results demonstrate an improvement in the overall energy efficiency and scalability of large IoT sensor networks.
Submitted 3 May, 2025;
originally announced May 2025.
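For readers unfamiliar with the quantities named in the abstract above, the following is a minimal sketch of how PAPR is commonly computed for a block of complex samples, together with an orthonormal Walsh-Hadamard transform of the kind used for spreading. It is an illustration under stated assumptions, not the authors' pipeline: the function names `papr_db` and `walsh_hadamard` are hypothetical, and no AFDM or OCDM modulation is performed.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of a block of (complex) samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    if mean_power == 0:
        raise ValueError("PAPR is undefined for an all-zero block")
    return 10.0 * math.log10(max(powers) / mean_power)


def walsh_hadamard(symbols):
    """Orthonormal Walsh-Hadamard transform (natural/Hadamard ordering).

    Hypothetical helper for illustration; the block length must be a
    power of two. The 1/sqrt(n) scaling preserves the block energy.
    """
    n = len(symbols)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(symbols)
    h = 1
    while h < n:
        # Butterfly stage: combine pairs that are h samples apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]
```

As a sanity check, a single impulse of length 4 has a PAPR of 10·log10(4) dB, while its Walsh-Hadamard transform is a constant-magnitude block with a PAPR of 0 dB. Whether spreading lowers the PAPR of a modulated signal depends on the data and the modulation, which this sketch does not model.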
-
Stealth Signals: Multi-Discriminator GANs for Covert Communications Against Diverse Wardens
Authors:
Afan Ali,
Md. Jalil Piran,
Huseyin Arslan
Abstract:
Covert wireless communications are critical for concealing the existence of any transmission from adversarial wardens, particularly in complex environments with multiple heterogeneous detectors. This paper proposes a novel adversarial AI framework leveraging a multi-discriminator Generative Adversarial Network (GAN) to design signals that evade detection by diverse wardens, while ensuring reliable decoding by the intended receiver. The transmitter is modeled as a generator that produces noise-like signals, while every warden is modeled as an individual discriminator, suggesting varied channel conditions and detection techniques. Unlike traditional methods like spread spectrum or single-discriminator GANs, our approach addresses multi-warden scenarios with moving receiver and wardens, which enhances robustness in urban surveillance, military operations, and 6G networks. Performance evaluation shows encouraging results with improved detection probabilities and bit error rates (BERs), in up to five warden cases, compared to noise injection and single-discriminator baselines. The scalability and flexibility of the system make it a potential candidate for future wireless secure systems, and potential future directions include real-time optimization and synergy with 6G technologies such as intelligent reflecting surfaces.
Submitted 1 May, 2025;
originally announced May 2025.
-
Measurement-Based Line-Impedance Estimation in the Absence of Phasor Measurement Units
Authors:
Plouton Grammatikos,
Ali Mohamed Ali,
Fabrizio Sossan
Abstract:
This paper proposes and experimentally compares several methods to estimate the series resistance and reactance (i.e., the transversal components of the $\pi$-model of a line) of low-voltage lines in distribution grids. It first shows that if phasor measurements are available and the grid nodal voltages and power injections are known, the problem can be formulated and solved as a conventional load flow with properly adjusted unknowns. To solve this problem, we propose an analytical derivation of the Jacobian matrix. If only RMS values are available, such as from smart meters, integrating information from multiple intervals becomes necessary, which leads to the least-squares estimations widely adopted in the literature. In this context, applying the proposed Jacobian accelerates the solution of existing algorithms. The methods are compared in terms of estimation performance and convergence using measurements from a realistically sized experimental distribution grid that interfaces real-world components, implemented at the Gridlab at HES-SO Valais.
Submitted 7 May, 2025; v1 submitted 30 April, 2025;
originally announced April 2025.
-
Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment
Authors:
Abdelrahaman A. Hassan,
Abdelrahman A. Ali,
Aya E. Fouda,
Radwa J. Hanafy,
Mohammed E. Fouda
Abstract:
The increasing global prevalence of mental disorders, such as depression and PTSD, requires objective and scalable diagnostic tools. Traditional clinical assessments often face limitations in accessibility, objectivity, and consistency. This paper investigates the potential of multimodal machine learning to address these challenges, leveraging the complementary information available in text, audio, and video data. Our approach involves a comprehensive analysis of various data preprocessing techniques, including novel chunking and utterance-based formatting strategies. We systematically evaluate a range of state-of-the-art embedding models for each modality and employ Convolutional Neural Networks (CNNs) and Bidirectional LSTM Networks (BiLSTMs) for feature extraction. We explore data-level, feature-level, and decision-level fusion techniques, including a novel integration of Large Language Model (LLM) predictions. We also investigate the impact of replacing Multilayer Perceptron classifiers with Support Vector Machines. We extend our analysis to severity prediction using PHQ-8 and PCL-C scores and multi-class classification (considering co-occurring conditions). Our results demonstrate that utterance-based chunking significantly improves performance, particularly for text and audio modalities. Decision-level fusion, incorporating LLM predictions, achieves the highest accuracy, with a balanced accuracy of 94.8% for depression and 96.2% for PTSD detection. The combination of CNN-BiLSTM architectures with utterance-level chunking, coupled with the integration of an external LLM, provides a powerful and nuanced approach to the detection and assessment of mental health conditions. Our findings highlight the potential of multimodal machine learning for developing more accurate, accessible, and personalized mental healthcare tools.
Submitted 2 April, 2025;
originally announced April 2025.
-
RCC-PFL: Robust Client Clustering under Noisy Labels in Personalized Federated Learning
Authors:
Abdulmoneam Ali,
Ahmed Arafa
Abstract:
We address the problem of cluster identity estimation in a personalized federated learning (PFL) setting in which users aim to learn different personal models. The backbone of effective learning in such a setting is to cluster users into groups whose objectives are similar. A typical approach in the literature is to achieve this by training the different proposed personal models on users' data and assigning users to groups based on which model achieves the lowest value of their loss functions. This process is repeated iteratively until group identities converge. A key challenge in such a setting arises when users have noisily labeled data, which may produce misleading values of their loss functions and hence lead to ineffective clustering. To overcome this challenge, we propose a label-agnostic data similarity-based clustering algorithm, coined RCC-PFL, with three main advantages: the cluster identity estimation procedure is independent of the training labels; it is a one-shot clustering algorithm performed prior to the training; and it requires fewer communication rounds and less computation compared to iterative clustering methods. We validate our proposed algorithm using various models and datasets and show that it outperforms multiple baselines in terms of average accuracy and variance reduction.
Submitted 25 March, 2025;
originally announced March 2025.
-
Automating Hot-Rolling: Designing an Integrated Mechatronics System for Enhanced Efficiency in Sheet Metal Production
Authors:
Mostafa A. Mostafa,
Mohamed Khaled,
Abdelrahman Ali,
Amr Mostafa,
Mariam Mohamed,
Omar Ahmed,
Osama Khalil
Abstract:
The hot-rolling process is a critical stage in sheet metal production within the heavy steel industry. Traditionally, parameter adjustments such as sheet metal velocity and roll gap are performed manually, leading to inefficiencies and limited precision. This project introduces an integrated mechatronics system designed to automate the control of rolling speed and sheet metal thickness, enhancing efficiency, consistency, and quality. The proposed system consists of a pair of rolls applying compression loads, with a mechanism for gap control, suitable motors and sensors, and dynamic modeling to optimize performance. Through simulation and practical implementation strategies, we demonstrate the feasibility of automating the hot-rolling process. By integrating mechatronics, this solution aims to modernize sheet metal production, improve productivity, and enhance product quality in the steel industry.
Submitted 16 March, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
AI-Driven Diabetic Retinopathy Screening: Multicentric Validation of AIDRSS in India
Authors:
Amit Kr Dey,
Pradeep Walia,
Girish Somvanshi,
Abrar Ali,
Sagarnil Das,
Pallabi Paul,
Minakhi Ghosh
Abstract:
Purpose: Diabetic retinopathy (DR) is a major cause of vision loss, particularly in India, where access to retina specialists is limited in rural areas. This study aims to evaluate the Artificial Intelligence-based Diabetic Retinopathy Screening System (AIDRSS) for DR detection and prevalence assessment, addressing the growing need for scalable, automated screening solutions in resource-limited settings.
Approach: A multicentric, cross-sectional study was conducted in Kolkata, India, involving 5,029 participants and 10,058 macula-centric retinal fundus images. The AIDRSS employed a deep learning algorithm with 50 million trainable parameters, integrated with Contrast Limited Adaptive Histogram Equalization (CLAHE) preprocessing for enhanced image quality. DR was graded using the International Clinical Diabetic Retinopathy (ICDR) Scale, categorizing disease into five stages (DR0 to DR4). Statistical metrics including sensitivity, specificity, and prevalence rates were evaluated against expert retina specialist assessments.
Results: The prevalence of DR in the general population was 13.7%, rising to 38.2% among individuals with elevated random blood glucose levels. The AIDRSS achieved an overall sensitivity of 92%, specificity of 88%, and 100% sensitivity for detecting referable DR (DR3 and DR4). These results demonstrate the system's robust performance in accurately identifying and grading DR in a diverse population.
Conclusions: AIDRSS provides a reliable, scalable solution for early DR detection in resource-constrained environments. Its integration of advanced AI techniques ensures high diagnostic accuracy, with potential to significantly reduce the burden of diabetes-related vision loss in underserved regions.
Submitted 13 January, 2025; v1 submitted 10 January, 2025;
originally announced January 2025.
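The sensitivity and specificity figures quoted in the abstract above follow the standard confusion-matrix definitions. A minimal sketch of that arithmetic is given below; the function name `sensitivity_specificity` is hypothetical, and the counts in the usage note are made up for illustration and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of diseased cases flagged.
    specificity = TN / (TN + FP): fraction of healthy cases cleared.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

For example, with illustrative counts of 92 true positives, 8 false negatives, 88 true negatives, and 12 false positives, the function returns a sensitivity of 0.92 and a specificity of 0.88.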
-
Improving Numerical Stability and Accuracy in Partitioned Methods with Algebraic Prediction
Authors:
Ahmad Ali,
Haya Monawwar,
Hantao Cui
Abstract:
The partitioned approach for the numerical integration of power system differential algebraic equations faces inherent numerical stability challenges due to delays between the computation of state and algebraic variables. Such delays can compromise solution accuracy and computational efficiency, particularly in large-scale system simulations. We present an $O(h^2)$-accurate prediction scheme for algebraic variables based on forward and backward difference formulas, applied before the correction step of numerical integration. The scheme improves the numerical stability of the partitioned approach while maintaining computational efficiency. Through numerical simulations on a lightly damped single machine infinite bus system and a large-scale 140-bus network, we demonstrate that the proposed method, when combined with variable time-stepping, significantly enhances the numerical stability, solution accuracy, and computational performance of the simulation. Results show reduced step rejections, fewer nonlinear solver iterations, and improved accuracy compared to conventional approaches, making the method particularly valuable for large-scale power system dynamic simulations.
Submitted 14 December, 2024;
originally announced December 2024.
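The abstract above does not spell out its prediction formula. As a generic illustration only, one simple predictor that can be built from a backward difference (to estimate the slope) and a forward difference (to step ahead) is linear extrapolation, whose error on a uniform step $h$ is proportional to $h^2$. The sketch below assumes a uniform step size and uses the hypothetical name `predict_algebraic`; it should not be read as the authors' scheme.

```python
def predict_algebraic(y_prev, y_curr):
    """Linear-extrapolation predictor for an algebraic variable.

    Backward difference: slope ~ (y_curr - y_prev) / h.
    Forward step:        y_next ~ y_curr + h * slope = 2*y_curr - y_prev.
    Assumes a uniform step h; the error is h**2 * y''(t) to leading order.
    """
    return 2.0 * y_curr - y_prev
```

For y(t) = t² with t = 1 and h = 0.1, the predictor returns 1.19 against the exact value 1.21. The difference of 0.02 equals h² · y'' = 0.01 · 2, which is consistent with the stated error order.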
-
Leveraging Audio and Text Modalities in Mental Health: A Study of LLMs Performance
Authors:
Abdelrahman A. Ali,
Aya E. Fouda,
Radwa J. Hanafy,
Mohammed E. Fouda
Abstract:
Mental health disorders are increasingly prevalent worldwide, creating an urgent need for innovative tools to support early diagnosis and intervention. This study explores the potential of Large Language Models (LLMs) in multimodal mental health diagnostics, specifically for detecting depression and Post Traumatic Stress Disorder through text and audio modalities. Using the E-DAIC dataset, we compare text and audio modalities to investigate whether LLMs can perform equally well or better with audio inputs. We further examine whether integrating both modalities enhances diagnostic accuracy, and find that it generally improves performance metrics. Our analysis uses two custom-formulated metrics, the Modal Superiority Score and the Disagreement Resolvement Score, to evaluate how combined modalities influence model performance. The Gemini 1.5 Pro model achieves the highest scores in binary depression classification when using the combined modality, with an F1 score of 0.67 and a Balanced Accuracy (BA) of 77.4%, assessed across the full dataset. These results represent an increase of 3.1% over its performance with the text modality and 2.7% over the audio modality, highlighting the effectiveness of integrating modalities to enhance diagnostic accuracy. Notably, all results are obtained with zero-shot inference, highlighting the robustness of the models without requiring task-specific fine-tuning. To explore the impact of different configurations on model performance, we conduct binary, severity, and multiclass tasks using both zero-shot and few-shot prompts, examining the effects of prompt variations on performance. The results reveal that models such as Gemini 1.5 Pro in text and audio modalities, and GPT-4o mini in the text modality, often surpass other models in balanced accuracy and F1 scores across multiple tasks.
Submitted 9 December, 2024;
originally announced December 2024.
-
Robust multi-coil MRI reconstruction via self-supervised denoising
Authors:
Asad Aali,
Marius Arvinte,
Sidharth Kumar,
Yamin I. Arefeen,
Jonathan I. Tamir
Abstract:
We study the effect of incorporating self-supervised denoising as a pre-processing step for training deep learning (DL) based reconstruction methods on data corrupted by Gaussian noise. K-space data employed for training are typically multi-coil and inherently noisy. Although DL-based reconstruction methods trained on fully sampled data can enable high reconstruction quality, obtaining large, noise-free datasets is impractical. We leverage Generalized Stein's Unbiased Risk Estimate (GSURE) for denoising. We evaluate two DL-based reconstruction methods: Diffusion Probabilistic Models (DPMs) and Model-Based Deep Learning (MoDL). We evaluate the impact of denoising on the performance of these DL-based methods in solving accelerated multi-coil magnetic resonance imaging (MRI) reconstruction. The experiments were carried out on T2-weighted brain and fat-suppressed proton-density knee scans. We observed that self-supervised denoising enhances the quality and efficiency of MRI reconstructions across various scenarios. Specifically, employing denoised images rather than noisy counterparts when training DL networks results in lower normalized root mean squared error (NRMSE), higher structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) across different SNR levels, including 32dB, 22dB, and 12dB for T2-weighted brain data, and 24dB, 14dB, and 4dB for fat-suppressed knee data. Overall, we showed that denoising is an essential pre-processing technique capable of improving the efficacy of DL-based MRI reconstruction methods under diverse conditions. By refining the quality of input data, denoising enables training more effective DL networks, potentially bypassing the need for noise-free reference MRI scans.
Submitted 24 May, 2025; v1 submitted 19 November, 2024;
originally announced November 2024.
-
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
Authors:
Md Mubtasim Ahasan,
Md Fahim,
Tasnim Mohiuddin,
A K M Mahbubur Rahman,
Aman Chadha,
Tariq Iqbal,
M Ashraful Amin,
Md Mofijul Islam,
Amin Ahsan Ali
Abstract:
Recent advancements in speech-language models have yielded significant improvements in speech tokenization and synthesis. However, effectively mapping the complex, multidimensional attributes of speech into discrete tokens remains challenging. This process demands acoustic, semantic, and contextual information for precise speech representations. Existing speech representations generally fall into two categories: acoustic tokens from audio codecs and semantic tokens from speech self-supervised learning models. Although recent efforts have unified acoustic and semantic tokens for improved performance, they overlook the crucial role of contextual representation in comprehensive speech modeling. Our empirical investigations reveal that the absence of contextual representations results in elevated Word Error Rate (WER) and Word Information Lost (WIL) scores in speech transcriptions. To address these limitations, we propose two novel distillation approaches: (1) a language model (LM)-guided distillation method that incorporates contextual information, and (2) a combined LM and self-supervised speech model (SM)-guided distillation technique that effectively distills multimodal representations (acoustic, semantic, and contextual) into a comprehensive speech tokenizer, termed DM-Codec. The DM-Codec architecture adopts a streamlined encoder-decoder framework with a Residual Vector Quantizer (RVQ) and incorporates the LM and SM during the training process. Experiments show DM-Codec significantly outperforms state-of-the-art speech tokenization models, reducing WER by up to 13.46%, WIL by 9.82%, and improving speech quality by 5.84% and intelligibility by 1.85% on the LibriSpeech benchmark dataset. The code, samples, and model checkpoints are available at https://github.com/mubtasimahasan/DM-Codec.
Submitted 19 October, 2024;
originally announced October 2024.
-
Deep Learning Applications in Medical Image Analysis: Advancements, Challenges, and Future Directions
Authors:
Aimina Ali Eli,
Abida Ali
Abstract:
Medical image analysis has emerged as an essential element of contemporary healthcare, facilitating physicians in achieving expedited and precise diagnosis. Recent breakthroughs in deep learning, a subset of artificial intelligence, have markedly revolutionized the analysis of medical pictures, improving the accuracy and efficiency of clinical procedures. Deep learning algorithms, especially convolutional neural networks (CNNs), have demonstrated remarkable proficiency in autonomously learning features from multidimensional medical pictures, including MRI, CT, and X-ray scans, without the necessity for manual feature extraction. These models have been utilized across multiple medical disciplines, including pathology, radiology, ophthalmology, and cardiology, where they aid in illness detection, classification, and segmentation tasks.
Submitted 4 November, 2024; v1 submitted 17 October, 2024;
originally announced October 2024.
-
A Control Theoretic Study on Omnidirectional MAVs with Minimum Number of Actuators and No Internal Forces at Any Orientation
Authors:
Ahmed Ali,
Chiara Gabellieri,
Antonio Franchi
Abstract:
We propose a new class of multirotor aerial vehicle designs composed of a multi-body structure in which a main body is connected by passive joints to links equipped with propellers. We investigate some instances of this class, some of which are shown to achieve omnidirectionality while having a minimum number of inputs equal to the main body's degrees of freedom (DoFs), only uni-directional positive-thrust propellers, and no internal forces generated at steady state. After the dynamics are derived following the Euler-Lagrange approach, an input/output (I/O) dynamic feedback linearization strategy is used to show the controllability of any desired pose with stable zero dynamics. We finally verify the developed controller with closed-loop simulations.
Submitted 16 October, 2024;
originally announced October 2024.
-
Data Similarity-Based One-Shot Clustering for Multi-Task Hierarchical Federated Learning
Authors:
Abdulmoneam Ali,
Ahmed Arafa
Abstract:
We address the problem of cluster identity estimation in a hierarchical federated learning setting in which users work toward learning different tasks. To overcome the challenge of task heterogeneity, users need to be grouped in a way such that users with the same task are in the same group, conducting training together, while sharing the weights of feature extraction layers with the other groups. Toward that end, we propose a one-shot clustering algorithm that can effectively identify and group users based on their data similarity. This enables more efficient collaboration and sharing of a common layer representation within the federated learning system. Our proposed algorithm not only enhances the clustering process, but also overcomes challenges related to privacy concerns, communication overhead, and the need for prior knowledge about learning models or loss function behaviors. We validate our proposed algorithm using various datasets such as CIFAR-10 and Fashion MNIST, and show that it outperforms the baseline in terms of accuracy and variance reduction.
Submitted 3 October, 2024;
originally announced October 2024.
-
Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods
Authors:
Abdulhady Abas Abdullah,
Aram Mahmood Ahmed,
Tarik Rashid,
Hadi Veisi,
Yassin Hussein Rassul,
Bryar Hassan,
Polla Fattah,
Sabat Abdulhameed Ali,
Ahmed S. Shamsaldin
Abstract:
Speech signal processing is a cornerstone of modern communication technologies, tasked with improving the clarity and comprehensibility of audio data in noisy environments. The primary challenge in this field is the effective separation and recognition of speech from background noise, crucial for applications ranging from voice-activated assistants to automated transcription services. The quality of speech recognition directly impacts user experience and accessibility in technology-driven communication. This review paper explores advanced clustering techniques, particularly focusing on the Kernel Fuzzy C-Means (KFCM) method, to address these challenges. Our findings indicate that KFCM, compared to traditional methods like K-Means (KM) and Fuzzy C-Means (FCM), provides superior performance in handling non-linear and non-stationary noise conditions in speech signals. The most notable outcome of this review is the adaptability of KFCM to various noisy environments, making it a robust choice for speech enhancement applications. Additionally, the paper identifies gaps in current methodologies, such as the need for more dynamic clustering algorithms that can adapt in real time to changing noise conditions without compromising speech recognition quality. Key contributions include a detailed comparative analysis of current clustering algorithms and suggestions for further integrating hybrid models that combine KFCM with neural networks to enhance speech recognition accuracy. Through this review, we advocate for a shift towards more sophisticated, adaptive clustering techniques that can significantly improve speech enhancement and pave the way for more resilient speech processing systems.
Submitted 28 September, 2024;
originally announced September 2024.
-
BeyondCT: A deep learning model for predicting pulmonary function from chest CT scans
Authors:
Kaiwen Geng,
Zhiyi Shi,
Xiaoyan Zhao,
Alaa Ali,
Jing Wang,
Joseph Leader,
Jiantao Pu
Abstract:
Background: Pulmonary function tests (PFTs) and computed tomography (CT) imaging are vital in diagnosing, managing, and monitoring lung diseases. A common issue in practice is the lack of access to recorded pulmonary functions despite available chest CT scans.
Purpose: To develop and validate a deep learning algorithm for predicting pulmonary function directly from chest CT scans.
Methods: The development cohort came from the Pittsburgh Lung Screening Study (PLuSS) (n=3619). The validation cohort came from the Specialized Centers of Clinically Oriented Research (SCCOR) in COPD (n=662). A deep learning model called BeyondCT, combining a three-dimensional (3D) convolutional neural network (CNN) and Vision Transformer (ViT) architecture, was used to predict forced vital capacity (FVC) and forced expiratory volume in one second (FEV1) from non-contrasted inspiratory chest CT scans. A 3D CNN model without ViT was used for comparison. Subject demographics (age, gender, smoking status) were also incorporated into the model. Performance was compared to actual PFTs using mean absolute error (MAE, L), percentage error, and R square.
Results: The 3D-CNN model achieved MAEs of 0.395 L and 0.383 L, percentage errors of 13.84% and 18.85%, and R square of 0.665 and 0.679 for FVC and FEV1, respectively. The BeyondCT model without demographics had MAEs of 0.362 L and 0.371 L, percentage errors of 10.89% and 14.96%, and R square of 0.719 and 0.727, respectively. Including demographics improved performance (p<0.05), with MAEs of 0.356 L and 0.353 L, percentage errors of 10.79% and 14.82%, and R square of 0.77 and 0.739 for FVC and FEV1 in the test set.
Conclusion: The BeyondCT model showed robust performance in predicting lung function from non-contrast inspiratory chest CT scans.
Submitted 10 August, 2024;
originally announced August 2024.
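The BeyondCT abstract above compares predicted and measured lung function using mean absolute error, percentage error, and R square. The abstract does not define the latter two precisely, so the sketch below assumes mean absolute percentage error and the coefficient of determination. The function name and the sample values are hypothetical and are not taken from the study.

```python
def regression_metrics(actual, predicted):
    """Return (MAE, mean absolute percentage error in %, R^2).

    Assumes equal-length, non-empty sequences, non-zero actual values,
    and actual values that are not all identical.
    """
    n = len(actual)
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = sum(abs_errors) / n
    mape = 100.0 * sum(e / abs(a) for e, a in zip(abs_errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mape, r2


if __name__ == "__main__":
    # Hypothetical FVC values in litres: measured vs. predicted.
    measured = [3.0, 4.0, 5.0]
    estimated = [2.5, 4.0, 5.5]
    print(regression_metrics(measured, estimated))
```

MAE keeps the unit of the measurement (litres here), while the percentage error normalizes each error by the measured value, which is why the two can rank models differently.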
-
Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic
Authors:
Yassine El Kheir,
Hamdy Mubarak,
Ahmed Ali,
Shammur Absar Chowdhury
Abstract:
This paper presents a novel Dialectal Sound and Vowelization Recovery framework, designed to recognize borrowed and dialectal sounds within phonologically diverse and dialect-rich languages, that extends beyond the standard orthographic sound sets. The proposed framework utilizes a quantized sequence of input with(out) continuous pretrained self-supervised representation. We show the efficacy of the pipeline using limited data for Arabic, a dialect-rich language containing more than 22 major dialects. Phonetically correct transcribed speech resources for dialectal Arabic are scarce. Therefore, we introduce ArabVoice15, a first-of-its-kind, curated test set featuring 5 hours of dialectal speech across 15 Arab countries, with phonetically accurate transcriptions, including borrowed and dialect-specific sounds. We describe in detail the annotation guideline along with the analysis of the dialectal confusion pairs. Our extensive evaluation includes both subjective (human perception tests) and objective measures. Our empirical results, reported with three test sets, show that with only one and a half hours of training data, our model improves character error rate by ~7% in ArabVoice15 compared to the baseline.
Submitted 5 August, 2024;
originally announced August 2024.
-
Two-Phase Segmentation Approach for Accurate Left Ventricle Segmentation in Cardiac MRI using Machine Learning
Authors:
Maria Tamoor,
Abbas Raza Ali,
Philemon Philip,
Ruqqayia Adil,
Rabia Shahid,
Asma Naseer
Abstract:
Accurate segmentation of the Left Ventricle (LV) holds substantial importance due to its implications in disease detection, regional analysis, and the development of complex models for cardiac surgical planning. Cardiac MRI (CMR) is a gold standard for the diagnosis of several cardiac diseases. The LV in CMR comprises three distinct sections: Basal, Mid-Ventricle, and Apical. This research focuses on the precise segmentation of the LV from CMR scans, combined with the capabilities of Machine Learning (ML). The central challenge in this research revolves around the absence of a set of parameters applicable to all three types of LV slices. Parameters optimized for basal slices often fall short when applied to mid-ventricular and apical slices, and vice versa. To handle this issue, a new method is proposed to enhance LV segmentation. The proposed method involves using distinct sets of parameters for each type of slice, resulting in a two-phase segmentation approach. The initial phase categorizes images into three groups based on the type of LV slice, while the second phase aims to segment CMR images using parameters derived from the preceding phase. A publicly available dataset, the Automated Cardiac Diagnosis Challenge (ACDC), is used. Under 10-fold cross-validation, the approach achieved a mean score of 0.9228. Comprehensive testing indicates that the best parameter set for a particular type of slice does not perform adequately for the other slice types. All results show that the proposed approach fills a critical void in parameter standardization through a two-phase segmentation model for the LV, aiming not only to improve the accuracy of cardiac image analysis but also to contribute advancements to the field of LV segmentation.
Submitted 29 July, 2024;
originally announced July 2024.
-
Novel Hybrid Integrated Pix2Pix and WGAN Model with Gradient Penalty for Binary Images Denoising
Authors:
Luca Tirel,
Ali Mohamed Ali,
Hashim A. Hashim
Abstract:
This paper introduces a novel approach to image denoising that leverages the advantages of Generative Adversarial Networks (GANs). Specifically, we propose a model that combines elements of the Pix2Pix model and the Wasserstein GAN (WGAN) with Gradient Penalty (WGAN-GP). This hybrid framework seeks to capitalize on the denoising capabilities of conditional GANs, as demonstrated in the Pix2Pix model, while mitigating the need for an exhaustive search for optimal hyperparameters that could potentially ruin the stability of the learning process. In the proposed method, the GAN's generator is employed to produce denoised images, harnessing the power of a conditional GAN for noise reduction. Simultaneously, the implementation of the Lipschitz continuity constraint during updates, as featured in WGAN-GP, aids in reducing susceptibility to mode collapse. This innovative design allows the proposed model to benefit from the strong points of both Pix2Pix and WGAN-GP, generating superior denoising results while ensuring training stability. Drawing on previous work on image-to-image translation and GAN stabilization techniques, the proposed research highlights the potential of GANs as a general-purpose solution for denoising. The paper details the development and testing of this model, showcasing its effectiveness through numerical experiments. The dataset was created by adding synthetic noise to clean images. Numerical results based on real-world dataset validation underscore the efficacy of this approach in image-denoising tasks, exhibiting significant enhancements over traditional techniques. Notably, the proposed model demonstrates strong generalization capabilities, performing effectively even when trained with synthetic noise.
Submitted 31 July, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Effects of Small-Scale User Mobility on Highly Directional XR Communications
Authors:
Asad Ali,
Olga Galinina,
Jiri Hosek,
Sergey Andreev
Abstract:
The development of next-generation communication systems promises to enable extended reality (XR) applications, such as XR gaming with ultra-realistic content and human-grade sensory feedback. These demanding applications impose stringent performance requirements on the underlying wireless communication infrastructure. To meet the expected Quality of Experience (QoE) for XR applications, high-capacity connections are necessary, which can be achieved by using millimeter-wave (mmWave) frequency bands and employing highly directional beams. However, these narrow beams are susceptible to even minor misalignments caused by small-scale user mobility, such as changes in the orientation of the XR head-mounted device (HMD) or minor shifts in user body position. This article explores the impact of small-scale user mobility on mmWave connectivity for XR and reviews approaches to resolve the challenges arising due to small-scale mobility. To deepen our understanding of small-scale mobility during XR usage, we prepared a dataset of user mobility during XR gaming. We use this dataset to study the effects of user mobility on highly directional communication, identifying specific aspects of user mobility that significantly affect the performance of narrow-beam wireless communication systems. Our results confirm the substantial influence of small-scale mobility on beam misalignment, highlighting the need for enhanced mechanisms to effectively manage the consequences of small-scale mobility.
Submitted 8 July, 2024;
originally announced July 2024.
-
Koopman-LQR Controller for Quadrotor UAVs from Data
Authors:
Zeyad M. Manaa,
Ayman M. Abdallah,
Mohammad A. Abido,
Syed S. Azhar Ali
Abstract:
Quadrotor systems are common and beneficial for many fields, but their intricate behavior often makes it challenging to design effective and optimal control strategies. Traditional approaches to nonlinear control often rely on local linearizations or complex nonlinear models, which can be inaccurate or computationally expensive. We present a data-driven approach to identify the dynamics of a given quadrotor system using Koopman operator theory. Koopman theory offers a framework for representing nonlinear dynamics as linear operators acting on observable functions of the state space. This makes it possible to approximate nonlinear systems with globally linear models in a higher-dimensional space, which can be analyzed and controlled using standard linear optimal control techniques. We leverage the method of extended dynamic mode decomposition (EDMD) to identify the Koopman operator from data with total least squares. We demonstrate that the identified model is controllable and can be stabilized by designing a controller using a linear quadratic regulator (LQR).
Submitted 25 June, 2024;
originally announced June 2024.
-
Speech Representation Analysis based on Inter- and Intra-Model Similarities
Authors:
Yassine El Kheir,
Ahmed Ali,
Shammur Absar Chowdhury
Abstract:
Self-supervised models have revolutionized speech processing, achieving new levels of performance in a wide variety of tasks with limited resources. However, the inner workings of these models are still opaque. In this paper, we aim to analyze the encoded contextual representation of these foundation models based on their inter- and intra-model similarity, independent of any external annotation and task-specific constraint. We examine different SSL models varying their training paradigm, Contrastive (Wav2Vec2.0) and Predictive (HuBERT), and their model sizes (base and large). We explore these models on different levels of localization/distributivity of information, including (i) individual neurons; (ii) layer representation; (iii) attention weights; and (iv) a comparison of the representations with their finetuned counterparts. Our results highlight that these models converge to similar representation subspaces but not to similar neuron-localized concepts (a concept represents a coherent fragment of knowledge, such as "a class containing certain objects as elements, where the objects have certain properties"). To facilitate further research, we have publicly released our code.
Submitted 23 June, 2024;
originally announced June 2024.
-
Near or far: On determining the appropriate channel estimation strategy in cross-field communication
Authors:
Simon Tarboush,
Anum Ali,
Tareq Y. Al-Naffouri
Abstract:
The use of ultra-massive multiple-input multiple-output and high-frequency large bandwidth systems is likely in the next-generation wireless communication systems. In such systems, the user moves between near- and far-field regions, and consequently, the channel estimation will need to be carried out in the cross-field scenario. Channel estimation strategies have been proposed for both near- and far-fields, but in the cross-field problem, the first step is to determine whether the near- or far-field is applicable so that an appropriate channel estimation strategy can be employed. In this work, we propose using a hidden Markov model over an ensemble of region estimates to enhance the accuracy of selecting the actual region. The region indicators are calculated using the pair-wise power differences between received signals across the subarrays within an array-of-subarrays architecture. Numerical results show that the proposed method achieves a high success rate in determining the appropriate channel estimation strategy.
Submitted 9 June, 2024;
originally announced June 2024.
-
Concurrent Multiphysics and Multiscale Topology Optimization for Lightweight Laser-Driven Porous Actuator Systems
Authors:
Musaddiq Al Ali,
Masatoshi Shimoda
Abstract:
In this research, multi-physics topology optimization is employed to achieve the detailed design of a lightweight porous linear actuation mechanism that harnesses energy through laser activation. A multiscale topology optimization methodology is introduced for micro- and macroscale design, considering energy dissipation via heat convection and radiation. This investigation meticulously considers the impact of heat dissipation mechanisms, including thermal conduction, convection, and radiation. Through various numerical cases, we systematically explore the influence of micro-scale considerations on porous design and understand the effects on the topology optimization process by incorporating various microstructural systems. The results demonstrate that porous actuator designs exhibit superior performance compared to solid actuator designs. This study contributes to advancing the understanding of multiscale effects in topology optimization, paving the way for more efficient and lightweight designs in the field of laser-activated porous actuators.
Submitted 23 May, 2024;
originally announced May 2024.
-
M2ANET: Mobile Malaria Attention Network for efficient classification of plasmodium parasites in blood cells
Authors:
Salam Ahmed Ali,
Peshraw Salam Abdulqadir,
Shan Ali Abdullah,
Haruna Yunusa
Abstract:
Malaria is a life-threatening infectious disease caused by Plasmodium parasites, which poses a significant public health challenge worldwide, particularly in tropical and subtropical regions. Timely and accurate detection of malaria parasites in blood cells is crucial for effective treatment and control of the disease. In recent years, deep learning techniques have demonstrated remarkable success in medical image analysis tasks, offering promising avenues for improving diagnostic accuracy. However, studies on hybrid mobile models remain limited due to the complexity of combining two distinct models and the significant memory demand of the self-attention mechanism, especially on edge devices. In this study, we explore the potential of designing a hybrid mobile model for efficient classification of Plasmodium parasites in blood cell images and present M2ANET (Mobile Malaria Attention Network). The model integrates MBConv3 (MobileNetV3 blocks) for efficient extraction of local features within blood cell images and a modified global-MHSA (multi-head self-attention) mechanism in the latter stages of the network for capturing global context. Through extensive benchmark experimentation, we demonstrate that M2ANET outperforms some state-of-the-art lightweight and mobile networks in terms of both accuracy and efficiency. Moreover, we discuss the potential implications of M2ANET in advancing malaria diagnosis and treatment, highlighting its suitability for deployment in resource-constrained healthcare settings. The development of M2ANET represents a significant advancement in the pursuit of efficient and accurate malaria detection, with broader implications for medical image analysis and global healthcare initiatives.
Submitted 23 May, 2024;
originally announced May 2024.
-
Deep Representation Learning-Based Dynamic Trajectory Phenotyping for Acute Respiratory Failure in Medical Intensive Care Units
Authors:
Alan Wu,
Tilendra Choudhary,
Pulakesh Upadhyaya,
Ayman Ali,
Philip Yang,
Rishikesan Kamaleswaran
Abstract:
Sepsis-induced acute respiratory failure (ARF) is a serious complication with a poor prognosis. This paper presents a deep representation learning-based phenotyping method to identify distinct groups of clinical trajectories of septic patients with ARF. For this retrospective study, we created a dataset from electronic medical records (EMR) consisting of data from sepsis patients admitted to medical intensive care units who required at least 24 hours of invasive mechanical ventilation at a quaternary care academic hospital in southeast USA for the years 2016-2021. A total of N=3349 patient encounters were included in this study. The Clustering Representation Learning on Incomplete Time Series Data (CRLI) algorithm was applied to a parsimonious set of EMR variables in this dataset. To validate the optimal number of clusters, the K-means algorithm was used in conjunction with dynamic time warping. Our model yielded four distinct patient phenotypes that were characterized as liver dysfunction/heterogeneous, hypercapnia, hypoxemia, and multiple organ dysfunction syndrome by a critical care expert. A Kaplan-Meier analysis to compare the 28-day mortality trends exhibited significant differences (p < 0.005) between the four phenotypes. The study demonstrates the utility of our deep representation learning-based approach in unraveling phenotypes that reflect the heterogeneity in sepsis-induced ARF in terms of different mortality outcomes and severity. These phenotypes might reveal important clinical insights into an effective prognosis and tailored treatment strategies.
Submitted 4 May, 2024;
originally announced May 2024.
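The abstract above mentions using K-means in conjunction with dynamic time warping (DTW) to validate the number of clusters. As a rough illustration of the distance such a clustering step would rely on, here is a minimal sketch of classic DTW. It assumes univariate series and an absolute-difference local cost; the study itself works with multivariate EMR data and may use a different formulation. The function name is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])       # local cost of matching two samples
            cost[i][j] = step + min(
                cost[i - 1][j],                   # a advances, b repeats its sample
                cost[i][j - 1],                   # b advances, a repeats its sample
                cost[i - 1][j - 1],               # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second series is the first one with a repeated sample.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because DTW lets one series stretch in time to match the other, trajectories that follow the same pattern at different speeds end up close together, which a plain point-by-point Euclidean distance would not capture.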
-
A Linear MPC with Control Barrier Functions for Differential Drive Robots
Authors:
Ali Mohamed Ali,
Chao Shen,
Hashim A. Hashim
Abstract:
The need for fully autonomous mobile robots has surged over the past decade, with the imperative of ensuring safe navigation in a dynamic setting emerging as a primary challenge impeding advancements in this domain. In this paper, a Safety Critical Model Predictive Control based on Dynamic Feedback Linearization tailored to the application of differential drive robots with two wheels is proposed to generate control signals that result in obstacle-free paths. A barrier function introduces a safety constraint to the optimization problem of the Model Predictive Control (MPC) to prevent collisions. Due to the intrinsic nonlinearities of the differential drive robots, computational complexity while implementing a Nonlinear Model Predictive Control (NMPC) arises. To facilitate the real-time implementation of the optimization problem and to accommodate the underactuated nature of the robot, a combination of Linear Model Predictive Control (LMPC) and Dynamic Feedback Linearization (DFL) is proposed. The MPC problem is formulated on a linear equivalent model of the differential drive robot rendered by the DFL controller. The analysis of the closed-loop stability and recursive feasibility of the proposed control design is discussed. Numerical experiments illustrate the robustness and effectiveness of the proposed control synthesis in avoiding obstacles with respect to the benchmark of using Euclidean distance constraints. Keywords: Model Predictive Control, MPC, Autonomous Ground Vehicles, Nonlinearity, Dynamic Feedback Linearization, Optimal Control, Differential Robots.
Submitted 14 April, 2024;
originally announced April 2024.
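The abstract above contrasts a barrier-function safety constraint with plain Euclidean distance constraints but does not state the form of either. The sketch below assumes a form that is common in the discrete-time MPC-CBF literature: with h(p) >= 0 on the safe set, the next state must satisfy h(p_next) >= (1 - gamma) * h(p) for some 0 < gamma <= 1. The circular obstacle, the value of gamma, and all function names are hypothetical choices for illustration only.

```python
def barrier(p, center, radius):
    """h(p) = ||p - center||^2 - radius^2; non-negative outside the obstacle."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    return dx * dx + dy * dy - radius * radius


def satisfies_distance_constraint(p_next, center, radius):
    """Plain distance constraint: the next position only has to stay outside."""
    return barrier(p_next, center, radius) >= 0.0


def satisfies_cbf_constraint(p, p_next, center, radius, gamma):
    """Assumed discrete-time CBF condition: h may shrink by at most a factor
    of (1 - gamma) per step, which limits how fast the obstacle is approached."""
    h_now = barrier(p, center, radius)
    h_next = barrier(p_next, center, radius)
    return h_next >= (1.0 - gamma) * h_now


if __name__ == "__main__":
    obstacle, radius, gamma = (0.0, 0.0), 1.0, 0.5
    position = (3.0, 0.0)
    aggressive_step = (1.5, 0.0)   # still collision-free, but closes in quickly
    cautious_step = (2.5, 0.0)
    print(satisfies_distance_constraint(aggressive_step, obstacle, radius))
    print(satisfies_cbf_constraint(position, aggressive_step, obstacle, radius, gamma))
    print(satisfies_cbf_constraint(position, cautious_step, obstacle, radius, gamma))
```

Under these assumptions the CBF condition is the stricter of the two: it rejects steps that remain collision-free but approach the obstacle too quickly, so the constraint is already active while the robot is still far from the boundary.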
-
MPC Based Linear Equivalence with Control Barrier Functions for VTOL-UAVs
Authors:
Ali Mohamed Ali,
Hashim A. Hashim,
Chao Shen
Abstract:
In this work, we propose a cascaded scheme of linear Model Predictive Control (MPC) based on Control Barrier Functions (CBF) with Dynamic Feedback Linearization (DFL) for Vertical Take-off and Landing (VTOL) Unmanned Aerial Vehicles (UAVs). CBF is a tool that allows enforcement of forward invariance of a set using Lyapunov-like functions to ensure safety. The first control synthesis that employed CBF was based on a Quadratic Program (QP) that modifies the existing controller to satisfy the safety requirements. However, CBF-QP-based controllers lead to longer detours and undesirable transient performance. Recent contributions utilize the framework of MPC, benefiting from its prediction capabilities and the constraints imposed on the state and control inputs. Due to the intrinsic nonlinearities of the dynamics of robotic systems, all existing MPC-CBF solutions rely on nonlinear MPC formulations or operate on less accurate linear models. In contrast, our novel solution unlocks the benefits of linear MPC-CBF while considering the full underactuated dynamics without any linear approximations. The cascaded scheme converts the problem of safe VTOL-UAV navigation to a Quadratically Constrained Quadratic Programming (QCQP) problem solved efficiently by off-the-shelf solvers. Closed-loop stability and recursive feasibility are proved, along with numerical simulations showing effective and robust solutions. Keywords: Unmanned Aerial Vehicles, Vertical Take-off and Landing, Model Predictive Control, MPC, Nonlinearity, Dynamic Feedback Linearization, Optimal Control.
Submitted 14 April, 2024;
originally announced April 2024.
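For readers unfamiliar with the term, the forward-invariance condition that the abstract attributes to CBFs is commonly written as below. This is the generic textbook form, not the paper's own formulation: the symbols h (barrier function), C (safe set), alpha (a class-K function) and gamma (a decay rate) are assumptions introduced here for illustration, and the authors' constraint may differ.

```latex
\begin{align}
  \mathcal{C} &= \{\, x : h(x) \ge 0 \,\} && \text{(safe set)} \\
  \dot{h}(x, u) &\ge -\alpha\big(h(x)\big) && \text{(continuous-time CBF condition)} \\
  h(x_{k+1}) - h(x_k) &\ge -\gamma\, h(x_k), \quad 0 < \gamma \le 1 && \text{(discrete-time form, one constraint per prediction step)}
\end{align}
```

If the condition holds along the trajectory and the state starts inside C, then h stays non-negative, which is what keeps the vehicle out of the unsafe region.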
-
A Novel Approach to WaveNet Architecture for RF Signal Separation with Learnable Dilation and Data Augmentation
Authors:
Yu Tian,
Ahmed Alhammadi,
Abdullah Quran,
Abubakar Sani Ali
Abstract:
In this paper, we address the intricate issue of RF signal separation by presenting a novel adaptation of the WaveNet architecture that introduces learnable dilation parameters, significantly enhancing signal separation in dense RF spectra. Our focused architectural refinements and innovative data augmentation strategies have markedly improved the model's ability to discern complex signal sources. This paper details our comprehensive methodology, including the refined model architecture, data preparation techniques, and the training strategy that have been pivotal to our success. The efficacy of our approach is evidenced by the substantial improvements recorded: a 58.82% increase in SINR at a BER of $10^{-3}$ for OFDM-QPSK with EMI Signal 1, surpassing traditional benchmarks. Notably, our model achieved first place in the challenge \cite{datadrivenrf2024}, demonstrating its superior performance and establishing a new standard for machine learning applications within the RF communications domain.
Submitted 8 February, 2024;
originally announced February 2024.
-
Fault Diagnosis on Induction Motor using Machine Learning and Signal Processing
Authors:
Muhammad Samiullah,
Hasan Ali,
Shehryar Zahoor,
Anas Ali
Abstract:
The detection and identification of induction motor faults using machine learning and signal processing is a valuable approach to avoiding plant disturbances and shutdowns in the context of Industry 4.0. In this work, we present a study on the detection and identification of induction motor faults using machine learning and signal processing with MATLAB Simulink. We developed a model of a three-phase induction motor in MATLAB Simulink to generate healthy and faulty motor data. The data collected included stator currents, rotor currents, input power, slip, rotor speed, and efficiency. We generated four faults in the induction motor: open circuit fault, short circuit fault, overload, and broken rotor bars. We collected a total of 150,000 data points with a 60:40 ratio of healthy to faulty motor data. We applied the Fast Fourier Transform (FFT) to detect and identify healthy and unhealthy conditions and added a distinctive feature to our data. The generated dataset was used to train different machine learning models. On comparing the accuracy of the models on the test set, we concluded that the Decision Tree algorithm performed the best, with an accuracy of about 92%. Our study contributes to the literature by providing a valuable approach to fault detection and classification with machine learning models for industrial applications.
Submitted 27 January, 2024;
originally announced January 2024.
-
Rapid detection of rare events from in situ X-ray diffraction data using machine learning
Authors:
Weijian Zheng,
Jun-Sang Park,
Peter Kenesei,
Ahsan Ali,
Zhengchun Liu,
Ian T. Foster,
Nicholas Schwarz,
Rajkumar Kettimuthu,
Antonino Miceli,
Hemant Sharma
Abstract:
High-energy X-ray diffraction methods can non-destructively map the 3D microstructure and associated attributes of metallic polycrystalline engineering materials in their bulk form. These methods are often combined with external stimuli such as thermo-mechanical loading to take snapshots over time of the evolving microstructure and attributes. However, the extreme data volumes and the high costs of traditional data acquisition and reduction approaches pose a barrier to quickly extracting actionable insights and improving the temporal resolution of these snapshots. Here we present a fully automated technique capable of rapidly detecting the onset of plasticity in high-energy X-ray microscopy data. Our technique is computationally faster by at least 50 times than the traditional approaches and works for data sets that are up to 9 times sparser than a full data set. This new technique leverages self-supervised image representation learning and clustering to transform massive data into compact, semantic-rich representations of visually salient characteristics (e.g., peak shapes). These characteristics can be a rapid indicator of anomalous events such as changes in diffraction peak shapes. We anticipate that this technique will provide just-in-time actionable information to drive smarter experiments that effectively deploy multi-modal X-ray diffraction methods that span many decades of length scales.
Submitted 6 December, 2023;
originally announced December 2023.
-
Leveraging machine learning to enhance climate models: a review
Authors:
Ahmed Elsayed,
Shrouk Wally,
Islam Alkabbany,
Asem Ali,
Aly Farag
Abstract:
Recent achievements in machine learning (ML) have had a significant impact on various fields, including climate science. Climate modeling plays a crucial role in shaping the decisions of governments and individuals in mitigating the impact of climate change. Climate change poses a serious threat to humanity; however, current climate models are limited by computational costs, uncertainties, and biases, affecting their prediction accuracy. The vast amount of climate data generated by satellites, radars, and earth system models (ESMs) poses a significant challenge. ML techniques can be effectively employed to analyze this data and extract valuable insights that aid in our understanding of the Earth's climate. This review paper focuses on how ML has been utilized in the last 5 years to boost the current state-of-the-art climate models. We invite the ML community to join in the global effort to accurately model the Earth's climate by collaborating with other fields to leverage ML as a powerful tool in this endeavor.
Submitted 15 November, 2023;
originally announced November 2023.
-
Deep Learning Models for Classification of COVID-19 Cases by Medical Images
Authors:
Amir Ali
Abstract:
In recent times, the use of chest Computed Tomography (CT) images for detecting coronavirus infections has gained significant attention, owing to their ability to reveal bilateral changes in affected individuals. However, classifying patients from medical images presents a formidable challenge, particularly in identifying such bilateral changes. To tackle this challenge, our study harnesses the power of deep learning models for the precise classification of infected patients. Our research involves a comparative analysis of deep transfer learning-based classification models, including DenseNet201, GoogleNet, and AlexNet, against carefully chosen supervised learning models. Additionally, our work encompasses COVID-19 classification, which involves the identification and differentiation of medical images, such as X-rays and electrocardiograms, that exhibit telltale signs of COVID-19 infection. This comprehensive approach ensures that our models can handle a wide range of medical image types and effectively identify characteristic patterns indicative of COVID-19. By conducting meticulous research and employing advanced deep learning techniques, we have made significant strides in enhancing the accuracy and speed of COVID-19 diagnosis. Our results demonstrate the effectiveness of these models and their potential to make substantial contributions to the global effort to combat COVID-19.
Submitted 24 October, 2023;
originally announced October 2023.
-
Automatic Pronunciation Assessment -- A Review
Authors:
Yassine El Kheir,
Ahmed Ali,
Shammur Absar Chowdhury
Abstract:
Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic aspects. We categorize the main challenges observed in prominent research trends, and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work.
Submitted 21 October, 2023;
originally announced October 2023.
-
Speech collage: code-switched audio generation by collaging monolingual corpora
Authors:
Amir Hussein,
Dorsa Zeinali,
Ondřej Klejch,
Matthew Wiesner,
Brian Yan,
Shammur Chowdhury,
Ahmed Ali,
Shinji Watanabe,
Sanjeev Khudanpur
Abstract:
Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness of the generated audio using an overlap-add approach. We investigate the impact of generated data on speech recognition in two scenarios: using in-domain CS text and a zero-shot approach with synthesized CS text. Empirical results highlight up to 34.4% and 16.2% relative reductions in Mixed-Error Rate and Word-Error Rate for in-domain and zero-shot scenarios, respectively. Lastly, we demonstrate that CS augmentation bolsters the model's code-switching inclination and reduces its monolingual bias.
Submitted 27 September, 2023;
originally announced September 2023.
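The overlap-add smoothing mentioned in the Speech Collage abstract can be illustrated with a minimal sketch. It assumes two mono segments stored as plain lists of samples and a linear crossfade; the function name `overlap_add`, the `overlap` parameter, and the linear window are illustrative choices made here, not details from the paper.

```python
def overlap_add(a, b, overlap):
    """Splice segment b onto segment a, crossfading over `overlap` samples.

    Hypothetical helper: a linear fade is assumed; the paper may use a
    different window.
    """
    if not 0 <= overlap <= min(len(a), len(b)):
        raise ValueError("overlap must fit inside both segments")
    head = a[:len(a) - overlap]   # part of a that is left untouched
    tail = b[overlap:]            # part of b that is left untouched
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for b; (1 - w) fades a out
        blended.append((1.0 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + blended + tail


if __name__ == "__main__":
    print(overlap_add([0.0, 0.0], [2.0, 2.0], 1))
```

Because the two weights sum to one at every sample, a constant signal passes through the splice unchanged, which is the property that avoids audible level jumps at the boundary.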
-
Guided Frequency Loss for Image Restoration
Authors:
Bilel Benjdira,
Anas M. Ali,
Anis Koubaa
Abstract:
Image Restoration has seen remarkable progress in recent years. Many generative models have been adapted to tackle the known restoration cases of images. However, the benefit of working in the frequency domain is not well explored despite its major role in these particular cases of image synthesis. In this study, we propose the Guided Frequency Loss (GFL), which helps the model to learn in a balanced way the image's frequency content alongside the spatial content. It aggregates three major components that work in parallel to enhance learning efficiency: a Charbonnier component, a Laplacian Pyramid component, and a Gradual Frequency component. We tested GFL on the Super Resolution and the Denoising tasks. We used three different datasets and three different architectures for each of them. We found that the GFL loss improved the PSNR metric in most implemented experiments. Also, it improved the training of the Super Resolution models in both SwinIR and SRGAN. In addition, the utility of the GFL loss was greater on constrained data due to the lower stochasticity in the high-frequency components among samples.
Submitted 22 October, 2023; v1 submitted 27 September, 2023;
originally announced September 2023.
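Of the three GFL components named in the abstract, the Charbonnier term has a well-known closed form, sketched below on flat lists of pixel values. This shows only the generic Charbonnier penalty; how GFL weights or combines it with the Laplacian Pyramid and Gradual Frequency components is not stated in the abstract, and the function name and default `eps` are assumptions.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt((p - t)^2 + eps^2) over paired values.

    Generic form only; `eps` is an assumed default, not a value from the paper.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


if __name__ == "__main__":
    print(charbonnier_loss([0.2, 0.8], [0.25, 0.7]))
```

The penalty behaves like an L1 loss for large errors and like a smooth quadratic near zero, which is why it is often preferred over plain L1 in restoration training.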
-
The complementary roles of non-verbal cues for Robust Pronunciation Assessment
Authors:
Yassine El Kheir,
Shammur Absar Chowdhury,
Ahmed Ali
Abstract:
Research on pronunciation assessment systems focuses on utilizing phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information hidden within the non-verbal cues. In this study, we propose a novel pronunciation assessment framework, IntraVerbalPA. The framework innovatively incorporates both fine-grained frame- and abstract utterance-level non-verbal cues, alongside the conventional speech and phoneme representations. Additionally, we introduce the "Goodness of phonemic-duration" metric to effectively model duration distribution within the framework. Our results validate the effectiveness of the proposed IntraVerbalPA framework and its individual components, yielding performance that either matches or outperforms existing research works.
Submitted 14 September, 2023;
originally announced September 2023.
-
L1-aware Multilingual Mispronunciation Detection Framework
Authors:
Yassine El Kheir,
Shammur Absar Chowdhury,
Ahmed Ali
Abstract:
The phonological discrepancies between a speaker's native (L1) and non-native (L2) language serve as a major factor for mispronunciation. This paper introduces a novel multilingual MDD architecture, L1-MultiMDD, enriched with L1-aware speech representation. An end-to-end speech encoder is trained on the input signal and its corresponding reference phoneme sequence. First, an attention mechanism is deployed to align the input audio with the reference phoneme sequence. Afterwards, the L1-L2 speech embeddings are extracted from an auxiliary model, pretrained in a multi-task setup identifying L1 and L2 language, and are infused with the primary network. Finally, L1-MultiMDD is optimized for a unified multilingual phoneme recognition task using connectionist temporal classification (CTC) loss for the target languages: English, Arabic, and Mandarin. Our experiments demonstrate the effectiveness of the proposed L1-MultiMDD framework on both seen -- L2-ARCTIC, LATIC, and AraVoiceL2v2; and unseen -- EpaDB and Speechocean762 datasets. The consistent gains in PER and false rejection rate (FRR) across all target languages confirm our approach's robustness, efficacy, and generalizability.
Submitted 21 September, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
AMDNet23: A combined deep Contour-based Convolutional Neural Network and Long Short Term Memory system to diagnose Age-related Macular Degeneration
Authors:
Md. Aiyub Ali,
Md. Shakhawat Hossain,
Md. Kawar Hossain,
Subhadra Soumi Sikder,
Sharun Akter Khushbu,
Mirajul Islam
Abstract:
In light of the expanding population, an automated framework of disease detection can assist doctors in the diagnosis of ocular diseases, yield accurate, stable, rapid outcomes, and improve the success rate of early detection. The work first enhances the quality of fundus images by employing an adaptive contrast enhancement algorithm (CLAHE) and gamma correction. In the preprocessing stage, CLAHE elevates the local contrast of the fundus image and gamma correction increases the intensity of relevant features. This study operates on AMDNet23, a deep learning system that combines a convolutional neural network (CNN) and long short-term memory (LSTM) to automatically detect age-related macular degeneration (AMD) from fundus images. In this mechanism, the CNN is utilized for extracting features and the LSTM operates on the extracted features for detection. The dataset was collected from multiple sources; after quality assessment techniques were applied, 2000 experimental fundus images are distributed equally across four distinct classes. The proposed hybrid deep AMDNet23 model detects AMD ocular disease, and the experimental results show an accuracy of 96.50%, specificity of 99.32%, sensitivity of 96.5%, and F1-score of 96.49%. The system achieves state-of-the-art results on fundus imagery datasets for diagnosing AMD ocular disease, and the findings demonstrate the potential of our method.
Submitted 30 August, 2023;
originally announced August 2023.
-
Evaluation of a Low-Cost Single-Lead ECG Module for Vascular Ageing Prediction and Studying Smoking-induced Changes in ECG
Authors:
S. Anas Ali,
M. Saqib Niaz,
Mubashir Rehman,
Ahsan Mehmood,
M. Mahboob Ur Rahman,
Kashif Riaz,
Qammer H. Abbasi
Abstract:
Vascular age is traditionally measured using invasive methods or through 12-lead electrocardiogram (ECG). This paper utilizes a low-cost single-lead (lead-I) ECG module to predict the vascular age of an apparently healthy young person. In addition, we study the impact of smoking on the ECG traces of light-but-habitual smokers. We begin by collecting (lead-I) ECG data from 42 apparently healthy subjects (smokers and non-smokers) aged 18 to 30 years, using our custom-built low-cost single-lead ECG module, and anthropometric data, e.g., body mass index, smoking status, blood pressure, etc. Under our proposed method, we first pre-process our dataset by denoising the ECG traces, followed by baseline drift removal and z-score normalization. Next, we create another dataset by dividing the ECG traces into overlapping segments of five-second duration. We then feed both segmented and unsegmented datasets to a number of machine learning models, a 1D convolutional neural network, and a ResNet18 model, for vascular ageing prediction. We also do transfer learning whereby we pre-train our models on a public PPG dataset, and later fine-tune and evaluate them on our unsegmented ECG dataset. The random forest model outperforms all other models and previous works by achieving a mean squared error (MSE) of 0.07 and coefficient of determination R2 of 0.99 for the segmented ECG dataset, an MSE of 3.56 and R2 of 0.26 for the unsegmented ECG dataset, and an MSE of 0.99 and R2 of 0.87 for the transfer learning scenario. Finally, we utilize the explainable AI framework to identify those ECG features that are affected by smoking. This work is aligned with the sustainable development goals 3 and 10 of the United Nations, which aim to provide low-cost but quality healthcare solutions to the underprivileged. This work also finds applications in the broad domain of forensic science.
Submitted 25 November, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
MyVoice: Arabic Speech Resource Collaboration Platform
Authors:
Yousseif Elshahawy,
Yassine El Kheir,
Shammur Absar Chowdhury,
Ahmed Ali
Abstract:
We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. This platform offers an opportunity to design large dialectal speech datasets and make them publicly available. MyVoice allows contributors to select city/country-level fine-grained dialect and record the displayed utterances. Users can switch roles between contributors and annotators. The platform incorporates a quality assurance system that filters out low-quality and spurious recordings before sending them for validation. During the validation phase, contributors can assess the quality of recordings, annotate them, and provide feedback, which is then reviewed by administrators. Furthermore, the platform offers flexibility to admin roles to add new data or tasks beyond dialectal speech and word collection, which are displayed to contributors, thus enabling collaborative efforts in gathering diverse and large Arabic speech data.
Submitted 23 July, 2023;
originally announced August 2023.
-
Measuring Student Behavioral Engagement using Histogram of Actions
Authors:
Ahmed Abdelkawy,
Aly Farag,
Islam Alkabbany,
Asem Ali,
Chris Foreman,
Thomas Tretter,
Nicholas Hindy
Abstract:
In this paper, we propose a novel technique for measuring behavioral engagement through student action recognition. The proposed approach recognizes student actions and then predicts the student's behavioral engagement level. For student action recognition, we use human skeletons to model student postures and upper body movements. To learn the dynamics of the student's upper body, a 3D-CNN model is used. The trained 3D-CNN model is used to recognize actions within every 2-minute video segment; these actions are then used to build a histogram of actions, which encodes the student actions and their frequencies. This histogram is utilized as an input to an SVM classifier to classify whether the student is engaged or disengaged. To evaluate the proposed framework, we build a dataset consisting of 1414 2-minute video segments annotated with 13 actions and 112 video segments annotated with two engagement levels. Experimental results indicate that student actions can be recognized with a top-1 accuracy of 83.63% and that the proposed framework can capture the average engagement of the class.
Submitted 15 May, 2025; v1 submitted 18 July, 2023;
originally announced July 2023.
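The histogram-of-actions feature described in the abstract can be sketched as a simple count over the action labels predicted for one video segment. The sketch assumes actions are encoded as integer ids from 0 to 12 (the abstract mentions 13 actions) and that the histogram is normalized to frequencies; both the encoding and the normalization are assumptions, and the 3D-CNN and SVM stages are not shown.

```python
from collections import Counter


def histogram_of_actions(action_ids, num_actions=13, normalize=True):
    """Return a fixed-length vector of per-action counts (or frequencies).

    `action_ids` holds hypothetical integer labels, one per recognized action.
    """
    counts = Counter(action_ids)
    if any(not 0 <= a < num_actions for a in counts):
        raise ValueError("action id outside [0, num_actions)")
    hist = [float(counts.get(a, 0)) for a in range(num_actions)]
    total = sum(hist)
    if normalize and total > 0:
        hist = [c / total for c in hist]  # frequencies that sum to 1
    return hist


if __name__ == "__main__":
    print(histogram_of_actions([0, 0, 2, 5, 5, 5]))
```

The resulting vector has the same length for every segment regardless of how many actions were detected, which is what makes it usable as input to a fixed-dimension classifier such as an SVM.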
-
Defeating Proactive Jammers Using Deep Reinforcement Learning for Resource-Constrained IoT Networks
Authors:
Abubakar Sani Ali,
Shimaa Naser,
Sami Muhaidat
Abstract:
Traditional anti-jamming techniques, such as spread spectrum, adaptive power/rate control, and cognitive radio, have demonstrated effectiveness in mitigating jamming attacks. However, their robustness against the growing complexity of Internet-of-Things (IoT) networks and diverse jamming attacks is still limited. To address these challenges, machine learning (ML)-based techniques have emerged as promising solutions. By offering adaptive and intelligent anti-jamming capabilities, ML-based approaches can effectively adapt to dynamic attack scenarios and overcome the limitations of traditional methods. In this paper, we propose a deep reinforcement learning (DRL)-based approach that utilizes state input from realistic wireless network interface cards. We train five different variants of deep Q-network (DQN) agents to mitigate the effects of jamming, with the aim of identifying the most sample-efficient, lightweight, robust, and least complex agent that is tailored for power-constrained devices. The simulation results demonstrate the effectiveness of the proposed DRL-based anti-jamming approach against proactive jammers, regardless of their jamming strategy, which eliminates the need for a pattern recognition or jamming strategy detection step. Our findings present a promising solution for securing IoT networks against jamming attacks and highlight substantial opportunities for continued investigation and advancement within this field.
Submitted 13 July, 2023;
originally announced July 2023.
-
FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator
Authors:
Massa Baali,
Ahmed Ali
Abstract:
This paper presents FOOCTTS, an automatic pipeline for a football commentator that generates speech with background crowd noise. The application gets the text from the user and applies text pre-processing such as vowelization, followed by the commentator's speech synthesizer. Our pipeline includes Arabic automatic speech recognition for data labeling, CTC segmentation, transcription vowelization to match speech, and fine-tuning the TTS. Our system is capable of generating speech with its acoustic environment using only 15 minutes of football commentator recordings. Our prototype is generalizable and can be easily applied to different domains and languages.
Submitted 7 June, 2023;
originally announced June 2023.
-
Multi-View Multi-Task Representation Learning for Mispronunciation Detection
Authors:
Yassine El Kheir,
Shammur Absar Chowdhury,
Ahmed Ali
Abstract:
The disparity in phonology between a learner's native (L1) and target (L2) language poses a significant challenge for mispronunciation detection and diagnosis (MDD) systems. This challenge is further intensified by the lack of annotated L2 data. This paper proposes a novel MDD architecture that exploits multiple `views' of the same input data, assisted by auxiliary tasks, to learn more distinctive phonetic representations in a low-resource setting. Using the mono- and multilingual encoders, the model learns multiple views of the input and captures the sound properties across diverse languages and accents. These encoded representations are further enriched by learning articulatory features in a multi-task setup. Our reported results using the L2-ARCTIC data outperformed the SOTA models, with a phoneme error rate reduction of 11.13% and 8.60% and an absolute F1 score increase of 5.89% and 2.49% compared to the single-view mono- and multilingual systems, with a limited L2 dataset.
Submitted 7 August, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.