Search | arXiv e-print repository

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

Authors: Tony Alex, Sara Ahmed, Armin Mustafa, Muhammad Awais, Philip JB Jackson

Abstract: Self-supervised pre-trained audio networks have seen widespread adoption in real-world systems, particularly in multi-modal large language models. These networks are often employed in a frozen state, under the assumption that the SSL pre-training has sufficiently equipped them to handle real-world audio. However, a critical question remains: how well do these models actually perform in real-world conditions, where audio is typically polyphonic and complex, involving multiple overlapping sound sources? Current audio SSL methods are often benchmarked on datasets predominantly featuring monophonic audio, such as environmental sounds and speech. As a result, the ability of SSL models to generalize to polyphonic audio, a common characteristic in natural scenarios, remains underexplored. This limitation raises concerns about the practical robustness of SSL models in more realistic audio settings. To address this gap, we introduce Self-Supervised Learning from Audio Mixtures (SSLAM), a novel direction in audio SSL research, designed to improve the model's ability to learn from polyphonic data while maintaining strong performance on monophonic data. We thoroughly evaluate SSLAM on standard audio SSL benchmark datasets which are predominantly monophonic and conduct a comprehensive comparative analysis against SOTA methods using a range of high-quality, publicly available polyphonic datasets. SSLAM not only improves model performance on polyphonic audio, but also maintains or exceeds performance on standard audio SSL benchmarks. Notably, it achieves up to a 3.9% improvement on AudioSet-2M (AS-2M), reaching a mean average precision (mAP) of 50.2. For polyphonic datasets, SSLAM sets new SOTA in both linear evaluation and fine-tuning regimes with performance improvements of up to 9.1% (mAP).

Submitted 13 June, 2025; originally announced June 2025.

Comments: Accepted at ICLR 2025. Code and pre-trained models are available at https://github.com/ta012/SSLAM

arXiv:2506.09549 [pdf, ps, other]

A Study on Speech Assessment with Visual Cues

Authors: Shafique Ahmed, Ryandhimas E. Zezario, Nasir Saleem, Amir Hussain, Hsin-Min Wang, Yu Tsao

Abstract: Non-intrusive assessment of speech quality and intelligibility is essential when clean reference signals are unavailable. In this work, we propose a multimodal framework that integrates audio features and visual cues to predict PESQ and STOI scores. It employs a dual-branch architecture, where spectral features are extracted using STFT, and visual embeddings are obtained via a visual encoder. These features are then fused and processed by a CNN-BLSTM with attention, followed by multi-task learning to simultaneously predict PESQ and STOI. Evaluations on the LRS3-TED dataset, augmented with noise from the DEMAND corpus, show that our model outperforms the audio-only baseline. Under seen noise conditions, it improves LCC by 9.61% (0.8397->0.9205) for PESQ and 11.47% (0.7403->0.8253) for STOI. These results highlight the effectiveness of incorporating visual cues in enhancing the accuracy of non-intrusive speech assessment.
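The percentages quoted in this abstract appear to be relative gains over the audio-only baseline rather than absolute differences in LCC. A minimal sketch of that arithmetic follows, using only the rounded LCC values given above; the function name `relative_gain` is hypothetical. Because the inputs are rounded to four decimals, the computed gains can differ from the quoted figures in the last digit.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


# LCC values quoted in the abstract (audio-only baseline -> audio-visual model).
pesq_gain = relative_gain(0.8397, 0.9205)
stoi_gain = relative_gain(0.7403, 0.8253)
print(f"PESQ LCC gain: {pesq_gain:.2f}%")
print(f"STOI LCC gain: {stoi_gain:.2f}%")
```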

Submitted 11 June, 2025; originally announced June 2025.

Comments: Accepted to Interspeech 2025

arXiv:2505.23503 [pdf, ps, other]

Can Large Language Models Challenge CNNs in Medical Image Analysis?

Authors: Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das

Abstract: This study presents a multimodal AI framework designed for precisely classifying medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impacts. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated $CO_2$ emission. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability, efficiency, and scalability of medical diagnostics in clinical settings.

Submitted 3 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

arXiv:2504.18939 [pdf, other]

Federated Learning-based Semantic Segmentation for Lane and Object Detection in Autonomous Driving

Authors: Gharbi Khamis Alshammari, Ahmad Abubakar, Nada M. O. Sid Ahmed, Naif Khalaf Alshammari

Abstract: Autonomous Vehicles (AVs) require precise lane and object detection to ensure safe navigation. However, centralized deep learning (DL) approaches for semantic segmentation raise privacy and scalability challenges, particularly when handling sensitive data. This research presents a new federated learning (FL) framework that integrates secure deep Convolutional Neural Networks (CNNs) and Differential Privacy (DP) to address these issues. The core contributions of this work are: (1) developing a new hybrid UNet-ResNet34 architecture for centralized semantic segmentation to achieve high accuracy and tackle privacy concerns due to centralized training, and (2) implementing the privacy-preserving FL model, distributed across AVs to enhance performance through secure CNNs and DP mechanisms. The proposed FL framework distinguishes itself from the existing approach by: (a) ensuring data decentralization through FL to uphold user privacy by eliminating the need for centralized data aggregation, (b) integrating DP mechanisms to secure sensitive model updates against potential adversarial inference attacks, and (c) evaluating the framework's performance and generalizability using RGB and semantic segmentation datasets derived from the CARLA simulator. Experimental results show significant improvements in accuracy, from 81.5% to 88.7% for the RGB dataset and from 79.3% to 86.9% for the SEG dataset over 20 to 70 Communication Rounds (CRs). Global loss was reduced by over 60%, and minor accuracy trade-offs from DP were observed. This study contributes by offering a scalable, privacy-preserving FL framework tailored for AVs, optimizing communication efficiency while balancing performance and data security.

Submitted 26 April, 2025; originally announced April 2025.

Comments: This paper has been accepted for publication in Scientific Reports

Report number: ID ae432a6e-ca3b-4496-8ace-59ab5b0c278a

arXiv:2504.18582 [pdf]

Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning

Authors: Abdulhady Abas Abdullah, Sarkhel H. Taher Karim, Sara Azad Ahmed, Kanar R. Tariq, Tarik A. Rashid

Abstract: Speaker diarization is a fundamental task in speech processing that involves dividing an audio stream by speaker. Although state-of-the-art models have advanced performance in high-resource languages, low-resource languages such as Kurdish pose unique challenges due to limited annotated data, multiple dialects and frequent code-switching. In this study, we address these issues by training the Wav2Vec 2.0 self-supervised learning model on a dedicated Kurdish corpus. By leveraging transfer learning, we adapted multilingual representations learned from other languages to capture the phonetic and acoustic characteristics of Kurdish speech. Relative to a baseline method, our approach reduced the diarization error rate by 7.2% and improved cluster purity by 13%. These findings demonstrate that enhancements to existing models can significantly improve diarization performance for under-resourced languages. Our work has practical implications for developing transcription services for Kurdish-language media and for speaker segmentation in multilingual call centers, teleconferencing and video-conferencing systems. The results establish a foundation for building effective diarization systems in other understudied languages, contributing to greater equity in speech technology.

Submitted 23 April, 2025; originally announced April 2025.

arXiv:2503.16556 [pdf]

Reliable Radiologic Skeletal Muscle Area Assessment -- A Biomarker for Cancer Cachexia Diagnosis

Authors: Sabeen Ahmed, Nathan Parker, Margaret Park, Daniel Jeong, Lauren Peres, Evan W. Davis, Jennifer B. Permuth, Erin Siegel, Matthew B. Schabath, Yasin Yilmaz, Ghulam Rasool

Abstract: Cancer cachexia is a common metabolic disorder characterized by severe muscle atrophy which is associated with poor prognosis and quality of life. Monitoring skeletal muscle area (SMA) longitudinally through computed tomography (CT) scans, an imaging modality routinely acquired in cancer care, is an effective way to identify and track this condition. However, existing tools often lack full automation and exhibit inconsistent accuracy, limiting their potential for integration into clinical workflows. To address these challenges, we developed SMAART-AI (Skeletal Muscle Assessment-Automated and Reliable Tool-based on AI), an end-to-end automated pipeline powered by deep learning models (nnU-Net 2D) trained on mid-third lumbar level CT images with 5-fold cross-validation, ensuring generalizability and robustness. SMAART-AI incorporates an uncertainty-based mechanism to flag high-error SMA predictions for expert review, enhancing reliability. We combined the SMA, skeletal muscle index, BMI, and clinical data to train a multi-layer perceptron (MLP) model designed to predict cachexia at the time of cancer diagnosis. Tested on the gastroesophageal cancer dataset, SMAART-AI achieved a Dice score of 97.80% +/- 0.93%, with SMA estimated across all four datasets in this study at a median absolute error of 2.48% compared to manual annotations with SliceOmatic. Uncertainty metrics (variance, entropy, and coefficient of variation) strongly correlated with SMA prediction errors (0.83, 0.76, and 0.73 respectively). The MLP model predicts cachexia with 79% precision, providing clinicians with a reliable tool for early diagnosis and intervention. By combining automation, accuracy, and uncertainty awareness, SMAART-AI bridges the gap between research and clinical application, offering a transformative approach to managing cancer cachexia.

Submitted 19 March, 2025; originally announced March 2025.

Comments: 47 pages, 19 figures, 9 Tables

arXiv:2503.06797 [pdf]

Multimodal AI-driven Biomarker for Early Detection of Cancer Cachexia

Authors: Sabeen Ahmed, Nathan Parker, Margaret Park, Evan W. Davis, Jennifer B. Permuth, Matthew B. Schabath, Yasin Yilmaz, Ghulam Rasool

Abstract: Cancer cachexia is a multifactorial syndrome characterized by progressive muscle wasting, metabolic dysfunction, and systemic inflammation, leading to reduced quality of life and increased mortality. Despite extensive research, no single definitive biomarker exists, as cachexia-related indicators such as serum biomarkers, skeletal muscle measurements, and metabolic abnormalities often overlap with other conditions. Existing composite indices, including the Cancer Cachexia Index (CXI), Modified CXI (mCXI), and Cachexia Score (CASCO), integrate multiple biomarkers but lack standardized thresholds, limiting their clinical utility. This study proposes a multimodal AI-based biomarker for early cancer cachexia detection, leveraging open-source large language models (LLMs) and foundation models trained on medical data. The approach integrates heterogeneous patient data, including demographics, disease status, lab reports, radiological imaging (CT scans), and clinical notes, using a machine learning framework that can handle missing data. Unlike previous AI-based models trained on curated datasets, this method utilizes routinely collected clinical data, enhancing real-world applicability. Additionally, the model incorporates confidence estimation, allowing the identification of cases requiring expert review for precise clinical interpretation. Preliminary findings demonstrate that integrating multiple data modalities improves cachexia prediction accuracy at the time of cancer diagnosis. The AI-based biomarker dynamically adapts to patient-specific factors such as age, race, ethnicity, weight, cancer type, and stage, avoiding the limitations of fixed-threshold biomarkers. This multimodal AI biomarker provides a scalable and clinically viable solution for early cancer cachexia detection, facilitating personalized interventions and potentially improving treatment outcomes and patient survival.

Submitted 9 March, 2025; originally announced March 2025.

Comments: 17 pages, 6 figures, 3 Tables

arXiv:2502.19258 [pdf]

Deep learning and classical computer vision techniques in medical image analysis: Case studies on brain MRI tissue segmentation, lung CT COPD registration, and skin lesion classification

Authors: Anyimadu Daniel Tweneboah, Suleiman Taofik Ahmed, Hossain Mohammad Imran

Abstract: Medical imaging spans diverse tasks and modalities which play a pivotal role in disease diagnosis, treatment planning, and monitoring. This study presents a novel exploration, being the first to systematically evaluate segmentation, registration, and classification tasks across multiple imaging modalities. Integrating both classical and deep learning (DL) approaches in addressing brain MRI tissue segmentation, lung CT image registration, and skin lesion classification from dermoscopic images, we demonstrate the complementary strengths of these methodologies in diverse applications. For brain tissue segmentation, 3D DL models outperformed 2D and patch-based models, with nnU-Net achieving a Dice of 0.9397 and 3D U-Net models with a ResNet34 backbone offering competitive results at a Dice of 0.8946. Multi-Atlas methods provided robust alternatives for cases where DL methods are not feasible, achieving an average Dice of 0.7267. In lung CT registration, classical Elastix-based methods outperformed DL models, achieving a minimum Target Registration Error (TRE) of 6.68 mm, highlighting the effectiveness of parameter tuning. HighResNet performed best among DL models with a TRE of 7.40 mm. For skin lesion classification, ensembles of DL models like InceptionResNetV2 and ResNet50 excelled, achieving accuracies of up to 90.44% and 93.62% for binary and multiclass classification respectively. Also, adopting the One-vs-All method, DL attained accuracies of 94.64% (mel vs. others), 95.35% (bcc vs. others), and 96.93% (scc vs. others), while ML models, specifically a Multi-Layer Perceptron (MLP) on handcrafted features, offered interpretable alternatives with 85.04% accuracy using SMOTE for class imbalance correction on the multi-class task and 83.27% on the binary-class task. Links to source code are available on request.

Submitted 26 February, 2025; originally announced February 2025.

Comments: 27 pages, 18 figures

arXiv:2502.10822 [pdf, other]

NeuroAMP: A Novel End-to-end General Purpose Deep Neural Amplifier for Personalized Hearing Aids

Authors: Shafique Ahmed, Ryandhimas E. Zezario, Hui-Guan Yuan, Amir Hussain, Hsin-Min Wang, Wei-Ho Chung, Yu Tsao

Abstract: The prevalence of hearing aids is increasing. However, optimizing the amplification processes of hearing aids remains challenging due to the complexity of integrating multiple modular components in traditional methods. To address this challenge, we present NeuroAMP, a novel deep neural network designed for end-to-end, personalized amplification in hearing aids. NeuroAMP leverages both spectral features and the listener's audiogram as inputs, and we investigate four architectures: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Convolutional Recurrent Neural Network (CRNN), and Transformer. We also introduce Denoising NeuroAMP, an extension that integrates noise reduction along with amplification capabilities for improved performance in real-world scenarios. To enhance generalization, a comprehensive data augmentation strategy was employed during training on diverse speech (TIMIT and TMHINT) and music (Cadenza Challenge MUSIC) datasets. Evaluation using the Hearing Aid Speech Perception Index (HASPI), Hearing Aid Speech Quality Index (HASQI), and Hearing Aid Audio Quality Index (HAAQI) demonstrates that the Transformer architecture within NeuroAMP achieves the best performance, with SRCC scores of 0.9927 (HASQI) and 0.9905 (HASPI) on TIMIT, and 0.9738 (HAAQI) on the Cadenza Challenge MUSIC dataset. Notably, our data augmentation strategy maintains high performance on unseen datasets (e.g., VCTK, MUSDB18-HQ). Furthermore, Denoising NeuroAMP outperforms both the conventional NAL-R+WDRC approach and a two-stage baseline on the VoiceBank+DEMAND dataset, achieving a 10% improvement in both HASPI (0.90) and HASQI (0.59) scores. These results highlight the potential of NeuroAMP and Denoising NeuroAMP to deliver notable improvements in personalized hearing aid amplification.

Submitted 15 February, 2025; originally announced February 2025.

arXiv:2502.10652 [pdf, other]

Deep Learning for Wound Tissue Segmentation: A Comprehensive Evaluation using A Novel Dataset

Authors: Muhammad Ashad Kabir, Nidita Roy, Md. Ekramul Hossain, Jill Featherston, Sayed Ahmed

Abstract: Deep learning (DL) techniques have emerged as promising solutions for medical wound tissue segmentation. However, a notable limitation in this field is the lack of publicly available labelled datasets and a standardised performance evaluation of state-of-the-art DL models on such datasets. This study addresses this gap by comprehensively evaluating various DL models for wound tissue segmentation using a novel dataset. We have curated a dataset comprising 147 wound images exhibiting six tissue types: slough, granulation, maceration, necrosis, bone, and tendon. The dataset was meticulously labelled for semantic segmentation employing supervised machine learning techniques. Three distinct labelling formats were developed: full image, patch, and superpixel. Our investigation encompassed a wide array of DL segmentation and classification methodologies, ranging from conventional approaches like UNet, to generative adversarial networks such as cGAN, and modified techniques like FPN+VGG16. Also, we explored DL-based classification methods (e.g., ResNet50) and machine learning-based classification leveraging DL features (e.g., AlexNet+RF). In total, 82 wound tissue segmentation models were derived across the three labelling formats. Our analysis yielded several notable findings, including identifying optimal DL models for each labelling format based on weighted average Dice or F1 scores. Notably, FPN+VGG16 emerged as the top-performing DL model for wound tissue segmentation, achieving a Dice score of 82.25%. This study provides a valuable benchmark for evaluating wound image segmentation and classification models, offering insights to inform future research and clinical practice in wound care. The labelled dataset created in this study is available at https://github.com/akabircs/WoundTissue.
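For readers unfamiliar with the Dice metric used to rank the models above, the following is a minimal sketch of the standard per-class Dice coefficient on binary masks. It is not taken from the paper: the function name `dice_score` and the toy masks are hypothetical, and the class-weighted averaging mentioned in the abstract is not reproduced here because its weights are not stated.

```python
def dice_score(pred: list[int], target: list[int]) -> float:
    """Dice = 2 * |P and T| / (|P| + |T|) for binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count pixels that are foreground in both the prediction and the target.
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; treat that case as a score of 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 6-pixel masks for one tissue class.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
example_dice = dice_score(pred_mask, true_mask)
print(f"Dice: {example_dice:.4f}")
```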

Submitted 14 February, 2025; originally announced February 2025.

Comments: 35 pages

arXiv:2502.02669 [pdf, other]

Distributed Prescribed-Time Observer for Nonlinear Systems in Block-Triangular Form

Authors: Vincent de Heij, M. Umar B. Niazi, Karl H. Johansson, Saeed Ahmed

Abstract: This paper proposes a distributed prescribed-time observer for nonlinear systems representable in a block-triangular observable canonical form. Using a weighted average of neighbor estimates exchanged over a strongly connected digraph, each observer estimates the system state despite the limited observability of local sensor measurements. The proposed design guarantees that distributed state estimation errors converge to zero at a user-specified convergence time, irrespective of observers' initial conditions. To achieve this prescribed-time convergence, distributed observers implement time-varying local output injection gains that monotonically increase and approach infinity at the prescribed time. The theoretical convergence is rigorously proven and validated through numerical simulations, where some implementation issues due to increasing gains have also been clarified.
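The role of a gain that grows without bound at the prescribed time can be illustrated on a scalar toy problem. The sketch below is not the paper's distributed observer: it assumes a single error equation e'(t) = -k(t) e(t) with the hypothetical gain k(t) = c / (T - t), whose exact solution e(t) = e0 (1 - t/T)^c reaches zero at t = T for any initial error e0. All names and parameter values are illustrative.

```python
def simulate_error(e0: float, T: float, c: float, t_end: float, dt: float = 1e-4) -> float:
    """Forward-Euler integration of e'(t) = -k(t) * e(t) with k(t) = c / (T - t)."""
    if not 0.0 <= t_end < T:
        raise ValueError("t_end must lie in [0, T): the gain is unbounded at t = T")
    steps = int(round(t_end / dt))
    e = e0
    for n in range(steps):
        t = n * dt
        gain = c / (T - t)  # increases monotonically and diverges as t approaches T
        e += dt * (-gain * e)
    return e


def exact_error(e0: float, T: float, c: float, t: float) -> float:
    """Closed-form solution of the same scalar error equation."""
    return e0 * (1.0 - t / T) ** c


# Compare the numerical and closed-form error at a few times before T = 1.
for t_end in (0.5, 0.9, 0.99):
    print(t_end, simulate_error(1.0, 1.0, 2.0, t_end), exact_error(1.0, 1.0, 2.0, t_end))
```

Stopping the integration strictly before T reflects the kind of implementation issue the abstract alludes to: a fixed step size eventually becomes too coarse for the growing gain.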

Submitted 12 April, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

arXiv:2501.17597 [pdf, other]

Economic Nonlinear Model Predictive Control of Prosumer District Heating Networks: The Extended Version

Authors: Max Sibeijn, Saeed Ahmed, Mohammad Khosravi, Tamás Keviczky

Abstract: In this paper, we propose an economic nonlinear model predictive control (MPC) algorithm for district heating networks (DHNs). The proposed method features prosumers, multiple producers, and storage systems, which are essential components of 4th generation DHNs. These networks are characterized by their ability to optimize their operations, aiming to reduce supply temperatures, accommodate distributed heat sources, and leverage the flexibility provided by thermal inertia and storage, all crucial for achieving a fossil-fuel-free energy supply. Developing a smart energy management system to accomplish these goals requires detailed models of highly complex nonlinear systems and computational algorithms able to handle large-scale optimization problems. To address this, we introduce a graph-based optimization-oriented model that efficiently integrates distributed producers, prosumers, storage buffers, and bidirectional pipe flows, such that it can be implemented in a real-time MPC setting. Furthermore, we conduct several numerical experiments to evaluate the performance of the proposed algorithms in closed-loop. Our findings demonstrate that the MPC methods achieved up to 9% cost improvement over traditional rule-based controllers while better maintaining system constraints.

Submitted 29 January, 2025; originally announced January 2025.

arXiv:2501.16485 [pdf, other]

Enhanced Position Estimation in Tactile Internet-Enabled Remote Robotic Surgery Using MOESP-Based Kalman Filter

Authors: Muhammad Hanif Lashari, Wafa Batayneh, Ashfaq Khokhar, Shakil Ahmed

Abstract: Accurately estimating the position of a patient's side robotic arm in real time during remote surgery is a significant challenge, especially within Tactile Internet (TI) environments. This paper presents a new and efficient method for position estimation using a Kalman Filter (KF) combined with the Multivariable Output-Error State Space (MOESP) method for system identification. Unlike traditional approaches that require prior knowledge of the system's dynamics, this study uses the JIGSAWS dataset, a comprehensive collection of robotic surgical data, along with input from the Master Tool Manipulator (MTM) to derive the state-space model directly. The MOESP method allows accurate modeling of the Patient Side Manipulator (PSM) dynamics without prior system models, improving the KF's performance under simulated network conditions, including delays, jitter, and packet loss. These conditions mimic real-world challenges in Tactile Internet applications. The findings demonstrate the KF's improved resilience and accuracy in state estimation, achieving over 95 percent accuracy despite network-induced uncertainties.
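To make the interaction between a Kalman filter and packet loss concrete, here is a minimal scalar sketch. It is not the paper's method: instead of a MOESP-identified state-space model it assumes a random-walk state, and the noise variances `q` and `r`, the function name `kalman_1d`, and the sample data are all hypothetical. A lost packet is modelled as a missing measurement, in which case the filter runs only its prediction step.

```python
from typing import Optional, Sequence


def kalman_1d(measurements: Sequence[Optional[float]],
              q: float = 1e-3, r: float = 1e-1,
              x0: float = 0.0, p0: float = 1.0) -> list[float]:
    """Scalar Kalman filter for x[k+1] = x[k] + w, z[k] = x[k] + v.

    A measurement of None models a lost packet: the update step is skipped.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state estimate is carried over, its variance grows by q.
        p = p + q
        if z is not None:
            # Update: blend prediction and measurement using the Kalman gain.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical position readings with three dropped packets.
readings = [1.0, 1.1, None, 0.9, None, None, 1.05]
print(kalman_1d(readings))
```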

Submitted 27 January, 2025; originally announced January 2025.

Comments: arXiv admin note: substantial text overlap with arXiv:2406.04503

arXiv:2501.14664

Predictive Position Estimation for Remote Surgery under Packet Loss Using the Informer Framework

Authors: Muhammad Hanif Lashari, Shakil Ahmed, Wafa Batayneh, Ashfaq Khokhar

Abstract: Accurate and real-time position estimation of the robotic arm on the patient's side is crucial for the success of remote robotic surgery in Tactile Internet environments. This paper proposes a predictive approach using the computationally efficient Transformer-based Informer model for position estimation, combined with a Four-State Hidden Markov Model (4-State HMM) to simulate realistic packet loss scenarios. The method effectively addresses network-induced delays, jitter, and packet loss, ensuring reliable performance in remote robotic surgery. The study evaluates the Informer model on the JIGSAWS dataset, demonstrating its capability to handle sequential data challenges caused by network uncertainties. Key features, including ProbSparse attention and a generative-style decoder, enhance prediction accuracy, computational speed, and memory efficiency. Results indicate that the proposed method achieves over 90 percent accuracy across varying network conditions. Furthermore, the Informer framework outperforms traditional models such as TCN, RNN, and LSTM, highlighting its suitability for real-time remote surgery applications.

Submitted 15 May, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

Comments: The paper is being withdrawn due to a methodological issue identified during the review process. Specifically, further evaluation revealed inconsistencies in the packet loss modeling and prediction performance analysis. We plan to revise and correct these aspects before considering resubmission

arXiv:2501.13357 [pdf, other]

A light-weight model to generate NDWI from Sentinel-1

Authors: Saleh Sakib Ahmed, Saifur Rahman Jony, Md. Toufikuzzaman, Saifullah Sayed, Rashed Uz Zzaman, Sara Nowreen, M. Sohel Rahman

Abstract: The use of Sentinel-2 images to compute Normalized Difference Water Index (NDWI) has many applications, including water body area detection. However, cloud cover poses significant challenges in this regard, which hampers the effectiveness of Sentinel-2 images in this context. In this paper, we present a deep learning model that can generate NDWI given Sentinel-1 images, thereby overcoming this cloud barrier. We show the effectiveness of our model, where it demonstrates a high accuracy of 0.9134 and an AUC of 0.8656 to predict the NDWI. Additionally, we observe promising results with an R² score of 0.4984 (for regressing the NDWI values) and a Mean IoU of 0.4139 (for the underlying segmentation task). In conclusion, our model offers a first and robust solution for generating NDWI images directly from Sentinel-1 images and subsequent use for various applications even under challenging conditions such as cloud cover and nighttime.
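The abstract does not say which NDWI definition serves as the target. A minimal sketch under the assumption that it is the common green/near-infrared form, NDWI = (Green - NIR) / (Green + NIR), follows. The function name `ndwi`, the sample reflectances, and the water threshold of 0 are illustrative assumptions, not values from the paper.

```python
def ndwi(green: list[float], nir: list[float], eps: float = 1e-12) -> list[float]:
    """Per-pixel NDWI = (green - nir) / (green + nir) on flat reflectance lists."""
    if len(green) != len(nir):
        raise ValueError("bands must have the same number of pixels")
    # Guard against division by zero where both bands are (near) zero.
    return [(g - n) / (g + n) if abs(g + n) > eps else 0.0
            for g, n in zip(green, nir)]


# Hypothetical reflectances for three pixels: water-like, neutral, vegetation-like.
green_band = [0.10, 0.08, 0.05]
nir_band = [0.02, 0.08, 0.30]
ndwi_values = ndwi(green_band, nir_band)
# Assumed convention: positive NDWI is treated as water.
water_mask = [v > 0.0 for v in ndwi_values]
print(ndwi_values, water_mask)
```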

Submitted 22 January, 2025; originally announced January 2025.
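
For readers unfamiliar with the index mentioned in the abstract above, the following is a minimal sketch of how NDWI is commonly computed from optical imagery. The abstract does not say which NDWI definition the authors use; this sketch assumes the widely used green/near-infrared form, NDWI = (Green - NIR) / (Green + NIR). The function names, the `eps` guard, and the zero threshold are illustrative assumptions, not part of the paper.

```python
import numpy as np

def ndwi(green, nir, eps=1e-6):
    """Green/NIR NDWI per pixel (assumed definition, not taken from the paper)."""
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    # eps avoids division by zero where both bands are zero (hypothetical guard)
    return (green - nir) / (green + nir + eps)

def water_mask(ndwi_values, threshold=0.0):
    """Label pixels as water when NDWI exceeds a threshold (0.0 is an assumed default)."""
    return np.asarray(ndwi_values) > threshold

# Toy reflectance values: the first pixel is water-like, the second vegetation-like.
green_band = np.array([0.30, 0.10])
nir_band = np.array([0.10, 0.40])
index = ndwi(green_band, nir_band)
mask = water_mask(index)
```

A thresholded index like `mask` is the kind of target that an accuracy or Mean IoU figure would be measured against, while the raw `index` values correspond to the regression setting.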

arXiv:2412.18248 [pdf]

Detection and Forecasting of Parkinson Disease Progression from Speech Signal Features Using MultiLayer Perceptron and LSTM

Authors: Majid Ali, Hina Shakir, Asia Samreen, Sohaib Ahmed

Abstract: Accurate diagnosis of Parkinson disease, especially in its early stages, can be a challenging task. The application of machine learning techniques helps improve the diagnostic accuracy of Parkinson disease detection, but only a few studies have presented work towards the prediction of disease progression. In this research work, a Long Short-Term Memory (LSTM) network was trained using the diagnostic features of Parkinson patients' speech signals to predict the disease progression, while a Multilayer Perceptron (MLP) was trained on the same diagnostic features to detect the disease. Diagnostic features selected using two well-known feature selection methods, Relief-F and Sequential Forward Selection, and applied to the LSTM and MLP have been shown to accurately predict the disease progression as stage 2 and 3 and its existence, respectively.

Submitted 24 December, 2024; originally announced December 2024.

arXiv:2412.17035 [pdf, other]

Design of Frequency Index Modulated Waveforms for Integrated SAR and Communication on High-Altitude Platforms (HAPs)

Authors: Bang Huang, Sajid Ahmed, Mohamed-Slim Alouini

Abstract: This paper, addressing the integration requirements of radar imaging and communication for High-Altitude Platform Stations (HAPs), designs a waveform based on linear frequency modulated (LFM) frequency-hopping signals that combines synthetic aperture radar (SAR) and communication functionalities. Specifically, each pulse of an LFM signal is segmented into multiple parts, forming a sequence of sub-pulses. Each sub-pulse can adopt a different carrier frequency, leading to frequency hops between sub-pulses. This design is termed frequency index modulation (FIM), enabling the embedding of communication information into different carrier frequencies for transmission. To further enhance the data transmission rate at the communication end, this paper incorporates quadrature amplitude modulation (QAM) into waveform design. For the SAR portion, this approach reduces the ADC sampling requirements while maintaining range resolution. The paper derives the ambiguity function of the proposed waveform and analyzes its Doppler and range resolution, establishing upper and lower bounds for the range resolution. In processing SAR signals, the receiver first removes QAM symbols, and to address phase discontinuities between sub-pulses, a phase compensation algorithm is proposed to achieve coherent processing. For the communication receiver, the user first performs de-chirp processing and then demodulates QAM symbols and FIM index symbols using a two-step maximum likelihood (ML) algorithm. Numerical simulations further confirm the theoretical validity of the proposed approach.

Submitted 22 December, 2024; originally announced December 2024.

arXiv:2412.16176 [pdf, other]

Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services

Authors: Danush Venkateshperumal, Rahman Abdul Rafi, Shakil Ahmed, Ashfaq Khokhar

Abstract: Emergency communication systems face disruptions due to packet loss, bandwidth constraints, poor signal quality, delays, and jitter in VoIP systems, leading to degraded real-time service quality. Victims in distress often struggle to convey critical information due to panic, speech disorders, and background noise, further complicating dispatchers' ability to assess situations accurately. Staffing shortages in emergency centers exacerbate delays in coordination and assistance. This paper proposes leveraging Large Language Models (LLMs) to address these challenges by reconstructing incomplete speech, filling contextual gaps, and prioritizing calls based on severity. The system integrates real-time transcription with Retrieval-Augmented Generation (RAG) to generate contextual responses, using Twilio and AssemblyAI APIs for seamless implementation. Evaluation shows high precision, favorable BLEU and ROUGE scores, and alignment with real-world needs, demonstrating the model's potential to optimize emergency response workflows and prioritize critical cases effectively.

Submitted 9 December, 2024; originally announced December 2024.

Comments: 15 pages, 8 figures

arXiv:2412.02443 [pdf]

doi 10.1016/j.aej.2024.06.095

Multi-scale and Multi-path Cascaded Convolutional Network for Semantic Segmentation of Colorectal Polyps

Authors: Malik Abdul Manan, Feng Jinchao, Muhammad Yaqub, Shahzad Ahmed, Syed Muhammad Ali Imran, Imran Shabir Chuhan, Haroon Ahmed Khan

Abstract: Colorectal polyps are structural abnormalities of the gastrointestinal tract that can potentially become cancerous in some cases. The study introduces a novel framework for colorectal polyp segmentation named the Multi-Scale and Multi-Path Cascaded Convolution Network (MMCC-Net), aimed at addressing the limitations of existing models, such as inadequate spatial dependence representation and the absence of multi-level feature integration during the decoding stage. It integrates multi-scale and multi-path cascaded convolutional techniques and enhances feature aggregation through dual attention modules, skip connections, and a feature enhancer. MMCC-Net achieves superior performance in identifying polyp areas at the pixel level. The proposed MMCC-Net was tested across six public datasets and compared against eight SOTA models to demonstrate its efficiency in polyp segmentation. The MMCC-Net's performance shows Dice scores with confidence intervals ranging between (77.08, 77.56) and (94.19, 94.71) and Mean Intersection over Union (MIoU) scores with confidence intervals ranging from (72.20, 73.00) to (89.69, 90.53) on the six databases. These results highlight the model's potential as a powerful tool for accurate and efficient polyp segmentation, contributing to early detection and prevention strategies in colorectal cancer.

Submitted 3 December, 2024; originally announced December 2024.

Journal ref: Alexandria Engineering Journal Volume 105, October 2024, Pages 341-359

arXiv:2412.00888 [pdf]

doi 10.1109/ICSIP61881.2024.10671533

DPE-Net: Dual-Parallel Encoder Based Network for Semantic Segmentation of Polyps

Authors: Malik Abdul Manan, Feng Jinchao, Shahzad Ahmed, Abdul Raheem

Abstract: In medical imaging, efficient segmentation of colon polyps plays a pivotal role in minimally invasive solutions for colorectal cancer. This study introduces a novel approach employing two parallel encoder branches within a network for polyp segmentation. One branch of the encoder incorporates dual convolution blocks that can maintain feature information over increased depths, and the other branch uses a single convolution block with the addition of the previous layer's feature, offering diversity in feature extraction within the encoder; the two are combined before the transpose layers with a depth-wise concatenation operation. Our model demonstrated superior performance, surpassing several established deep-learning architectures on the Kvasir and CVC-ClinicDB datasets, achieving a Dice score of 0.919 and a mIoU of 0.866 for the Kvasir dataset, and a Dice score of 0.931 and a mIoU of 0.891 for CVC-ClinicDB. The visual and quantitative results highlight the efficacy of our model, potentially setting a new model in medical image segmentation.

Submitted 3 December, 2024; v1 submitted 1 December, 2024; originally announced December 2024.
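
The Dice and IoU figures quoted in the two polyp segmentation abstracts above follow standard overlap definitions. As a minimal sketch for a single pair of binary masks: Dice = 2|A ∩ B| / (|A| + |B|) and IoU = |A ∩ B| / |A ∪ B|. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the papers may average over images or classes differently.

```python
import numpy as np

def dice_and_iou(pred, target):
    """Dice and IoU for one pair of binary masks (illustrative helper, not from the papers)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2.0 * intersection / total if total else 1.0
    iou = intersection / union if union else 1.0
    return float(dice), float(iou)

# Toy example: the prediction covers two pixels, the ground truth covers one of them.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0])
```

For a single mask pair the two scores are related by Dice = 2 IoU / (1 + IoU), which is why Dice is never lower than IoU on the same prediction.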

arXiv:2411.06531 [pdf, other]

Decentralized Bus Voltage Restoration for DC Microgrids

Authors: Nabil Mohammed, Shehab Ahmed, Charalambos Konstantinou

Abstract: Regulating the voltage of the common DC bus, also referred to as the load bus, in DC microgrids is crucial for ensuring reliability and maintaining the nominal load voltage, which is essential for protecting sensitive loads from voltage variations. Stability and reliability are thereby enhanced, preventing malfunctions and extending the lifespan of sensitive loads (e.g., electronic devices). Voltage drops are caused by resistances of feeders connecting converters to the common DC bus, resulting in a reduced DC bus voltage compared to the nominal/desired value. Existing techniques to restore this voltage in DC microgrids are mainly centralized and rely on secondary control layers. These layers sense the common DC bus voltage, compare it to the nominal value, and utilize a PI controller to send corrections via communication links to each converter. In this paper, a local and straightforward approach to restoring the bus voltage in DC microgrids is presented, ensuring regulation in a decentralized manner. Voltage drops across resistances of feeders are compensated by an additional control loop feedback within each converter, based on the converter output current and feeder resistance. The proposed approach is verified through simulation and hardware-in-the-loop results, eliminating the need for communication links and hence increasing reliability and reducing cybersecurity threats.

Submitted 10 November, 2024; originally announced November 2024.

Comments: 6 pages
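
The compensation idea in the abstract above can be illustrated with an idealized steady-state calculation. The abstract only states that the extra loop uses the converter output current and the feeder resistance; the exact control law is not given, so the sketch below assumes the simplest form, raising each converter's voltage reference by the feeder drop I_out * R_feeder. All names and numbers are hypothetical, and converter dynamics, droop control, and measurement error are ignored.

```python
def compensated_reference(v_nominal, i_out, r_feeder):
    """Converter voltage reference raised by the estimated feeder drop (assumed law)."""
    return v_nominal + i_out * r_feeder

def bus_voltage(v_converter, i_out, r_feeder):
    """Steady-state bus voltage seen behind a purely resistive feeder."""
    return v_converter - i_out * r_feeder

# Hypothetical operating point: 48 V nominal bus, 10 A output, 50 mOhm feeder.
V_NOMINAL, I_OUT, R_FEEDER = 48.0, 10.0, 0.05

uncompensated = bus_voltage(V_NOMINAL, I_OUT, R_FEEDER)
v_ref = compensated_reference(V_NOMINAL, I_OUT, R_FEEDER)
restored = bus_voltage(v_ref, I_OUT, R_FEEDER)
```

In this toy case the uncompensated bus sits 0.5 V below nominal, and the locally computed reference restores it using only quantities available at the converter, which is the property that removes the need for a communication link.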

arXiv:2410.22521 [pdf, other]

Lyapunov Characterization for ISS of Impulsive Switched Systems

Authors: Saeed Ahmed, Patrick Bachmann, Stephan Trenn

Abstract: In this study, we investigate the ISS of impulsive switched systems that have modes with both stable and unstable flows. We assume that the switching signal satisfies mode-dependent average dwell and leave time conditions. To establish ISS conditions, we propose two types of time-varying ISS-Lyapunov functions: one that is non-decreasing and another one that is decreasing. Our research proves that the existence of either of these ISS-Lyapunov functions is a necessary and sufficient condition for ISS. We also present a technique for constructing a decreasing ISS-Lyapunov function from a non-decreasing one, which is useful for its own sake. Our findings also have added value to previous research that only studied sufficient conditions for ISS, as our results apply to a broader class of systems. This is because we impose less restrictive dwell and leave time constraints on the switching signal and our ISS-Lyapunov functions are time-varying with general nonlinear conditions imposed on them. Moreover, we provide a method to guarantee the ISS of a particular class of impulsive switched systems when the switching signal is unknown.

Submitted 29 October, 2024; originally announced October 2024.

arXiv:2410.21570 [pdf, other]

A novel switched systems approach to nonconvex optimisation

Authors: Joel Ferguson, Saeed Ahmed, Juan E. Machado, Michele Cucuzzella, Jacquelien M. A. Scherpen

Abstract: We develop a novel switching dynamics that converges to the Karush-Kuhn-Tucker (KKT) point of a nonlinear optimisation problem. This new approach is particularly notable for its lower dimensionality compared to conventional primal-dual dynamics, as it focuses exclusively on estimating the primal variable. Our method is successfully illustrated on general quadratic optimisation problems, the minimisation of the classical Rosenbrock function, and a nonconvex optimisation problem stemming from the control of energy-efficient buildings.

Submitted 28 October, 2024; originally announced October 2024.

arXiv:2409.20219 [pdf, other]

Advanced Resilience Planning for Distribution Systems

Authors: Ahmad Bin Afzal, Nabil Mohammed, Shehab Ahmed, Charalambos Konstantinou

Abstract: Climate change has led to an increase in the frequency and severity of extreme weather events, posing significant challenges for power distribution systems. In response, this work presents a planning approach in order to enhance the resilience of distribution systems against climatic hazards. The framework systematically addresses uncertainties during extreme events, including weather variability and line damage. Key strategies include line hardening, backup diesel generators, and sectionalizers to strengthen resilience. We model spatio-temporal dynamics and costs through a hybrid model integrating stochastic processes with deterministic elements. A two-stage stochastic mixed-integer linear approach is developed to optimize resilience investments against load loss, generator operations, and repairs. Case studies on the IEEE 15-bus benchmark system and a realistic distribution grid model in Riyadh, Saudi Arabia demonstrate enhanced system robustness as well as cost efficiency of 10% and 15%, respectively.

Submitted 30 September, 2024; originally announced September 2024.

Comments: CIRED Chicago Workshop 2024: Resilience of Electric Distribution Systems

arXiv:2409.07322 [pdf, other]

Three-Dimensional, Multimodal Synchrotron Data for Machine Learning Applications

Authors: Calum Green, Sharif Ahmed, Shashidhara Marathe, Liam Perera, Alberto Leonardi, Killian Gmyrek, Daniele Dini, James Le Houx

Abstract: Machine learning techniques are being increasingly applied in medical and physical sciences across a variety of imaging modalities; however, an important issue when developing these tools is the availability of good quality training data. Here we present a unique, multimodal synchrotron dataset of a bespoke zinc-doped Zeolite 13X sample that can be used to develop advanced deep learning and data fusion pipelines. Multi-resolution micro X-ray computed tomography was performed on a zinc-doped Zeolite 13X fragment to characterise its pores and features, before spatially resolved X-ray diffraction computed tomography was carried out to characterise the homogeneous distribution of sodium and zinc phases. Zinc absorption was controlled to create a simple, spatially isolated, two-phase material. Both raw and processed data are available as a series of Zenodo entries. Altogether we present a spatially resolved, three-dimensional, multimodal, multi-resolution dataset that can be used for the development of machine learning techniques. Such techniques include super-resolution, multimodal data fusion, and 3D reconstruction algorithms.

Submitted 11 September, 2024; originally announced September 2024.

Comments: 9 pages, 4 figures. Image Processing and Artificial Intelligence Conference, 2024

arXiv:2409.02453 [pdf, other]

FrameCorr: Adaptive, Autoencoder-based Neural Compression for Video Reconstruction in Resource and Timing Constrained Network Settings

Authors: John Li, Shehab Sarar Ahmed, Deepak Nair

Abstract: Despite the growing adoption of video processing via Internet of Things (IoT) devices due to their cost-effectiveness, transmitting captured data to nearby servers poses challenges due to varying timing constraints and scarcity of network bandwidth. Existing video compression methods face difficulties in recovering compressed data when incomplete data is provided. Here, we introduce FrameCorr, a deep-learning based solution that utilizes previously received data to predict the missing segments of a frame, enabling the reconstruction of a frame from partially received data.

Submitted 10 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

arXiv:2407.16167 [pdf]

doi 10.1016/j.ifacol.2025.01.086

Consideration of Vehicle Characteristics on the Motion Planner Algorithm

Authors: Syed Adil Ahmed, Taehyun Shim

Abstract: Autonomous vehicle control is generally divided into two main areas: trajectory planning and tracking. Currently, trajectory planning is mostly done by particle or kinematic model-based optimization controllers. The output of these planners, since they do not consider CG height and its effects, is not unique for different vehicle types, especially for high CG vehicles. As a result, the tracking controller may have to work hard to avoid violating vehicle handling and comfort constraints while trying to realize these sub-optimal trajectories. This paper tries to address this problem by considering a planner with a simplified double track model, with estimation of lateral and roll based load transfer using steady state equations, and a simplified tire model to reduce solver workload. The developed planner is compared with the widely used particle and kinematic model planners in collision avoidance scenarios in both high and low acceleration conditions and with different vehicle heights.

Submitted 23 July, 2024; originally announced July 2024.

Comments: This paper has been accepted for conference proceedings in MECC 2024, Chicago under a Creative Commons License CC-BY-NC-ND

Journal ref: IFAC-PapersOnLine, Vol 58, Num 28, 2024, pgs 444-449

arXiv:2404.16397 [pdf, other]

Deep Learning-based Prediction of Breast Cancer Tumor and Immune Phenotypes from Histopathology

Authors: Tiago Gonçalves, Dagoberto Pulido-Arias, Julian Willett, Katharina V. Hoebel, Mason Cleveland, Syed Rakin Ahmed, Elizabeth Gerstner, Jayashree Kalpathy-Cramer, Jaime S. Cardoso, Christopher P. Bridge, Albert E. Kim

Abstract: The interactions between tumor cells and the tumor microenvironment (TME) dictate therapeutic efficacy of radiation and many systemic therapies in breast cancer. However, to date, there is not a widely available method to reproducibly measure tumor and immune phenotypes for each patient's tumor. Given this unmet clinical need, we applied multiple instance learning (MIL) algorithms to assess activity of ten biologically relevant pathways from the hematoxylin and eosin (H&E) slide of primary breast tumors. We employed different feature extraction approaches and state-of-the-art model architectures. Using binary classification, our models attained area under the receiver operating characteristic (AUROC) scores above 0.70 for nearly all gene expression pathways and, in some cases, exceeded 0.80. Attention maps suggest that our trained models recognize biologically relevant spatial patterns of cell sub-populations from H&E. These efforts represent a first step towards developing computational H&E biomarkers that reflect facets of the TME and hold promise for augmenting precision oncology.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Paper accepted at the First Workshop on Imageomics (Imageomics-AAAI-24) - Discovering Biological Knowledge from Images using AI (https://sites.google.com/vt.edu/imageomics-aaai-24/home), held as part of the 38th Annual AAAI Conference on Artificial Intelligence (https://aaai.org/aaai-conference/)

MSC Class: 92C55 ACM Class: I.5.1; I.5.4; I.2.10; J.3

arXiv:2402.16757 [pdf, other]

Towards Environmental Preference Based Speech Enhancement For Individualised Multi-Modal Hearing Aids

Authors: Jasper Kirton-Wingate, Shafique Ahmed, Adeel Hussain, Mandar Gogate, Kia Dashtipour, Jen-Cheng Hou, Tassadaq Hussain, Yu Tsao, Amir Hussain

Abstract: Since the advent of Deep Learning (DL), Speech Enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict the ability for a user to hear ambient sound which may be of importance. Hearing Aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-day lifestyle. In this paper, we introduce a preference learning based SE (PLSE) model for future multi-modal HAs that can contextually exploit audio information to improve listening comfort, based upon the preferences of the user. The proposed system estimates the Signal-to-noise ratio (SNR) as a basic objective speech quality measure which quantifies the relative amount of background noise present in speech, and directly correlates to the intelligibility of the signal. Additionally, to provide contextual information we predict the acoustic scene in which the user is situated. These tasks are achieved via a multi-task DL model, which surpasses the performance of inferring the acoustic scene or SNR separately, by jointly leveraging a shared encoded feature space. These environmental inferences are exploited in a preference elicitation framework, which linearly learns a set of predictive functions to determine the target SNR of an AV (Audio-Visual) SE system. By greatly reducing noise in challenging listening conditions, and by novelly scaling the output of the SE model, we are able to provide HA users with contextually individualised SE. Preliminary results suggest an improvement over the non-individualised baseline model in some participants.

Submitted 26 February, 2024; originally announced February 2024.

Comments: This has been submitted to the Trends in Hearing journal

arXiv:2401.00469 [pdf, other]

Exploring the Synergy: A Review of Dual-Functional Radar Communication Systems

Authors: Ali Hanif, Sajid Ahmed, Tareq Y. Al-Naffouri, Mohamed-Slim Alouini

Abstract: This review paper examines the concept and advancements in the evolving landscape of Dual-functional Radar Communication (DFRC) systems. Traditionally, radar and communication systems have functioned independently, but current research is actively investigating the integration of these functionalities into a unified platform. This paper discusses the motivations behind the development of DFRC systems, the challenges involved, and the potential benefits they offer. A discussion on the performance bounds for DFRC systems is also presented. The paper encompasses a comprehensive analysis of various techniques, architectures, and technologies used in the design and optimization of DFRC systems, along with their performance and trade-offs. Additionally, we explore potential application scenarios for these joint communication and sensing systems, offering a comprehensive perspective on the multifaceted landscape of DFRC technology.

Submitted 31 December, 2023; originally announced January 2024.

Comments: 17 pages, 7 figures

arXiv:2311.00154 [pdf, other]

Medi-CAT: Contrastive Adversarial Training for Medical Image Classification

Authors: Pervaiz Iqbal Khan, Andreas Dengel, Sheraz Ahmed

Abstract: There are not many large medical image datasets available. For these datasets, too small deep learning models can't learn useful features, so they don't work well due to underfitting, and too big models tend to overfit the limited data. As a result, there is a compromise between the two issues. This paper proposes a training strategy Medi-CAT to overcome the underfitting and overfitting phenomena in medical imaging datasets. Specifically, the proposed training methodology employs large pre-trained vision transformers to overcome underfitting and adversarial and contrastive learning techniques to prevent overfitting. The proposed method is trained and evaluated on four medical image classification datasets from the MedMNIST collection. Our experimental results indicate that the proposed approach improves the accuracy up to 2% on three benchmark datasets compared to well-known approaches, whereas it increases the performance up to 4.1% over the baseline methods.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.18064 [pdf]

New Fast Transform for Orthogonal Frequency Division Multiplexing

Authors: Said Boussakta, Mounir T. Hamood, Mohammed Sh. Ahmed

Abstract: In this paper, a new fast and low complexity transform is introduced for orthogonal frequency division multiplexing (OFDM) wireless systems. The new transform combines the effects of fast complex-Walsh-Hadamard transform (CHT) and the fast Fourier transform (FFT) into a single unitary transform named in this paper as the complex transition transform (CTT). A new algorithm for fast calculation of the CT transform, called FCT, is developed and found to have all the desirable properties such as in-place computation, simple indexing scheme and considerably lower arithmetic complexity than existing algorithms. Furthermore, a new OFDM system using the FCT algorithm is introduced and its performance has been evaluated. The proposed CT-OFDM achieves a noticeable reduction in peak-to-average-power-ratio (PAPR) and a significant improvement in the bit-error-rate (BER) performance compared with the conventional OFDM.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 12 pages, 8 figures, 1 table

arXiv:2309.13033 [pdf, ps, other]

Robust Stability Analysis of a Class of LTV Systems

Authors: Shahzad Ahmed, Hafiz Zeeshan Iqbal Khan, Jamshed Riaz

Abstract: Many physical systems are inherently time-varying in nature. When these systems are linearized around a trajectory, generally, the resulting system is Linear Time-Varying (LTV). LTV systems describe an important class of linear systems and can be thought of as a natural extension of LTI systems. However, it is well known that, unlike LTI systems, the eigenvalues of an LTV system do not determine its stability. In this paper, the stability conditions for a class of LTV systems are derived. This class is composed of piecewise LTV systems, i.e. LTV systems that are piecewise linear in time. Sufficient conditions of stability are derived in the form of linear matrix inequalities (LMIs) by using the Lyapunov stability criterion. The feasibility of LMIs guarantees the stability of a given piecewise LTV system. Furthermore, uncertain piecewise LTV systems with scalar parametric uncertainty are also considered. Sufficient conditions for robust stability of this case are also presented, which come out to be quasi-LMIs, which can be optimized using a bisection algorithm to find the bounds of uncertainty for which the system is stable. The proposed method is applied to the problem of pitch angle control of a space launch vehicle, and the results are presented.

Submitted 22 September, 2023; originally announced September 2023.

Comments: Presented at 20th International Bhurban Conference on Applied Sciences and Technology (IBCAST), 2023

arXiv:2309.11059 [pdf, other]

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement

Authors: Shafique Ahmed, Chia-Wei Chen, Wenze Ren, Chin-Jou Li, Ernie Chu, Jun-Cheng Chen, Amir Hussain, Hsin-Min Wang, Yu Tsao, Jen-Cheng Hou

Abstract: Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems. In this paper, we introduce a novel audio-visual SE approach, termed DCUC-Net (deep complex U-Net with conformer network). The proposed DCUC-Net leverages complex domain features and a stack of conformer blocks. The encoder and decoder of DCUC-Net are designed using a complex U-Net-based framework. The audio and visual signals are processed using a complex encoder and a ResNet-18 model, respectively. These processed signals are then fused using the conformer blocks and transformed into enhanced speech waveforms via a complex decoder. The conformer blocks consist of a combination of self-attention mechanisms and convolutional operations, enabling DCUC-Net to effectively capture both global and local audio-visual dependencies. Our experimental results demonstrate the effectiveness of DCUC-Net, as it outperforms the baseline model from the COG-MHEAR AVSE Challenge 2023 by a notable margin of 0.14 in terms of PESQ. Additionally, the proposed DCUC-Net performs comparably to a state-of-the-art model and outperforms all other compared models on the Taiwan Mandarin speech with video (TMSV) dataset.

Submitted 8 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.10542 [pdf, ps, other]

A Multi Constrained Transformer-BiLSTM Guided Network for Automated Sleep Stage Classification from Single-Channel EEG

Authors: Farhan Sadik, Md Tanvir Raihan, Rifat Bin Rashid, Minhjaur Rahman, Sabit Md Abdal, Shahed Ahmed, Talha Ibn Mahmud

Abstract: Sleep stage classification from electroencephalogram (EEG) is significant for the rapid evaluation of sleeping patterns and quality. A novel deep learning architecture, "DenseRTSleep-II", is proposed for automatic sleep scoring from single-channel EEG signals. The architecture utilizes the advantages of Convolutional Neural Network (CNN), transformer network, and Bidirectional Long Short Term Memory (BiLSTM) for effective sleep scoring. Moreover, with the addition of a weighted multi-loss scheme, this model is trained more implicitly for vigorous decision-making tasks. Thus, the model generates the most efficient result in the SleepEDFx dataset and outperforms different state-of-the-art (IIT-Net, DeepSleepNet) techniques by a large margin in terms of accuracy, precision, and F1-score.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2308.15806 [pdf, ps, other]

Communication Reduction for Power Systems: An Observer-Based Event-Triggered Approach

Authors: Gabriel E. Mejia-Ruiz, Yazdan Batmani, Subhash Lakshminarayana, Shehab Ahmed, Charalambos Konstantinou

Abstract: The management of distributed and heterogeneous modern power networks necessitates the deployment of communication links, often characterized by limited bandwidth. This paper presents an event detection mechanism that significantly reduces the volume of data transmission to perform necessary control actions, using a scalable scheme that enhances the stability and reliability of power grids. The approach relies on implementing a linear quadratic regulator and the execution of a pair of Luenberger observers. The linear quadratic regulator minimizes the amount of energy required to achieve the control actions. Meanwhile, the Luenberger observers estimate the unmeasured states from the sensed states, providing the necessary information to trigger the event detection mechanism. The effectiveness of the method is tested via time-domain simulations on the IEEE 13-node test feeder interfaced with inverter-based distributed generation systems and the proposed observer-based event-triggered controller. The results demonstrate that the presented control scheme guarantees the bounding of the system states to a pre-specified limit while reducing the number of data packet transmissions by 39.8%.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm)

arXiv:2308.15797 [pdf, other]

Volt/VAR Optimization in the Presence of Attacks: A Real-Time Co-Simulation Study

Authors: Mohd Asim Aftab, Astha Chawla, Pedro P. Vergara, Shehab Ahmed, Charalambos Konstantinou

Abstract: Traditionally, Volt/VAR optimization (VVO) is performed in distribution networks through legacy devices such as on-load tap changers (OLTCs), voltage regulators (VRs), and capacitor banks. With the amendment in IEEE 1547 standard, distributed energy resources (DERs) can now provide reactive power support to the grid. For this, renewable energy-based DERs, such as PV, are interfaced with the distribution networks through smart inverters (SIs). Due to the intermittent nature of such resources, VVO transforms into a dynamic problem that requires extensive communication between the VVO controller and devices performing the VVO scheme. This communication, however, can be potentially tampered with by an adversary rendering the VVO ineffective. In this regard, it is important to assess the impact of cyberattacks on the VVO scheme. This paper develops a real-time co-simulation setup to assess the effect of cyberattacks on VVO. The setup consists of a real-time power system simulator, a communication network emulator, and a master controller in a system-in-the-loop (SITL) setup. The DNP3 communication protocol is adopted for the underlying communication infrastructure. The results show that corrupted communication messages can lead to violation of voltage limits, increased number of setpoint updates of VRs, and economic loss.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm)

arXiv:2308.13833 [pdf, other]

A Cognitive Network Architecture for Vehicle-to-Network (V2N) Communications over Smart Meters for URLLC

Authors: Shoaib Ahmed, Sayonto Khan, Kumudu S. Munasinghe, Md. Farhad Hossain

Abstract: With the rapid advancement of smart city infrastructure, vehicle-to-network (V2N) communication has emerged as a crucial technology to enable intelligent transportation systems (ITS). The investigation of new methods to improve V2N communications is sparked by the growing need for high-speed and dependable communications in vehicular networks. To achieve ultra-reliable low latency communication (URLLC) for V2N scenarios, we propose a smart meter (SM)-based cognitive network (CN) architecture for V2N communications. Our scheme makes use of SMs' available underutilized time resources to let them serve as distributed access points (APs) for V2N communications to increase reliability and decrease latency. We propose and investigate two algorithms for efficiently associating vehicles with the appropriate SMs. Extensive simulations are carried out for comprehensive performance evaluation of our proposed architecture and algorithms under diverse system scenarios. Performance is investigated with particular emphasis on communication latency and reliability, which are also compared with the conventional base station (BS)-based V2N architecture for further validation. The results highlight the value of incorporating SMs into the current infrastructure and open the door for future ITSs to utilize more effective and dependable V2N communications.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 12 pages, 19 figures, IEEE format

arXiv:2307.05375 [pdf, other]

Emotion Analysis on EEG Signal Using Machine Learning and Neural Network

Authors: S. M. Masrur Ahmed, Eshaan Tanzim Sabur

Abstract: Emotion has a significant influence on how one thinks and interacts with others. It serves as a link between how a person feels and the actions one takes, or it could be said that it influences one's life decisions on occasion. Since the patterns of emotions and their reflections vary from person to person, their inquiry must be based on approaches that are effective over a wide range of population regions. To extract features and enhance accuracy, emotion recognition using brain waves or EEG signals requires the implementation of efficient signal processing techniques. Various approaches to human-machine interaction technologies have been ongoing for a long time, and in recent years, researchers have had great success in automatically understanding emotion using brain signals. In our research, several emotional states were classified and tested on EEG signals collected from a well-known publicly available dataset, the DEAP Dataset, using SVM (Support Vector Machine), KNN (K-Nearest Neighbor), and an advanced neural network model, RNN (Recurrent Neural Network), trained with LSTM (Long Short Term Memory). The main purpose of this study is to improve emotion recognition performance using brain signals. Emotions, on the other hand, can change with time. As a result, the changes in emotion over time are also examined in our research.

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2307.04771 [pdf, other]

Invariant Scattering Transform for Medical Imaging

Authors: Nafisa Labiba Ishrat Huda, Angona Biswas, MD Abdullah Al Nasim, Md. Fahim Rahman, Shoaib Ahmed

Abstract: Invariant scattering transform introduces a new area of research that merges signal processing with deep learning for computer vision. Nowadays, deep learning algorithms are able to solve a variety of problems in the medical sector. Medical images are used to detect diseases such as brain cancer or tumor, Alzheimer's disease, breast cancer, Parkinson's disease and many others. During the pandemic in 2020, machine learning and deep learning played a critical role in detecting COVID-19, which included mutation analysis, prediction, diagnosis and decision making. Medical images such as X-ray, MRI (magnetic resonance imaging), and CT scans are used for detecting diseases. There is another method in deep learning for medical imaging, which is the scattering transform. It builds useful signal representation for image classification. It is a wavelet technique, which is impactful for medical image classification problems. This research article discusses the scattering transform as an efficient system for medical image analysis, where it's figured by scattering the signal information implemented in a deep convolutional network. A step-by-step case study is presented in this research work.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 11 pages, 8 figures and 1 table

arXiv:2305.18283 [pdf, other]

CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice

Authors: Juan Zuluaga-Gomez, Sara Ahmed, Danielius Visockas, Cem Subakan

Abstract: Despite the recent advancements in Automatic Speech Recognition (ASR), the recognition of accented speech still remains a dominant problem. In order to create more inclusive ASR systems, research has shown that the integration of accent information, as part of a larger ASR framework, can lead to the mitigation of accented speech errors. We address multilingual accent classification through the ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures which have been proven to perform well on a variety of speech-related downstream tasks. We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish). Furthermore, we establish new state-of-the-art for English accent classification with as high as 95% accuracy. We also study the internal categorization of the Wav2Vec 2.0 embeddings through t-SNE, noting that there is a level of clustering based on phonological similarity. (Our recipe is open-source in the SpeechBrain toolkit, see: https://github.com/speechbrain/speechbrain/tree/develop/recipes)

Submitted 29 May, 2023; originally announced May 2023.

Comments: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023

arXiv:2305.16055 [pdf, ps, other]

Machine Learning-Based Automatic Cardiovascular Disease Diagnosis Using Two ECG Leads

Authors: Cheng Guo, Sajid Ahmed, Mohamed-Slim Alouini

Abstract: The state-of-the-art cardiovascular disease diagnosis techniques use machine-learning algorithms based on feature extraction and classification. In this work, in contrast to a conventional single Electrocardiogram (ECG) lead, two leads are used, and autoregressive (AR) coefficients and statistical parameters are extracted to be used as features. Four machine-learning classifiers, support-vector-machine (SVM), K-nearest neighbors (KNN), multi-layer perceptron (MLP), and Naive Bayes, are applied on these features to test the accuracy of each classifier. For simulation, data is collected from the MIT-BIH and Shaoxing Peoples Hospital China (SPHC) database. To test the generalization ability of our proposed methodology, a machine-learning model is built on the SPHC database and tested on the MIT-BIH database and self-collected datasets. In the single-database simulation, the MLP performs better than the other three classifiers. In the cross-database simulation, the SVM-based model trained by the SPHC database shows superiority. For normal and LBBB heartbeats, the predicted recall respectively reaches 100% and 98.4%. Simulation results show that the performance of our proposed methodology is better than the state-of-the-art techniques for the same database. For cross-database simulation, the results are promising too. Finally, in the demonstration of our realized system, all heartbeats collected from healthy people are classified as normal beats.

Submitted 25 May, 2023; originally announced May 2023.

Comments: 15 pages, 11 figures

MSC Class: 53A45

arXiv:2305.00300 [pdf, other]

doi 10.1098/rspa.2022.0815

On the dual advantage of placing observations through forward sensitivity analysis

Authors: Shady E Ahmed, Omer San, Sivaramakrishnan Lakshmivarahan, John M Lewis

Abstract: The four-dimensional variational data assimilation methodology for assimilating noisy observations into a deterministic model has been the workhorse of forecasting centers for over three decades. While this method provides a computationally efficient framework for dynamic data assimilation, it is largely silent on the important question concerning the minimum number and placement of observations. To answer this question, we demonstrate the dual advantage of placing the observations where the square of the sensitivity of the model solution with respect to the unknown control variables, called forward sensitivities, attains its maximum. Therefore, we can force the observability Gramian to be of full rank, which in turn guarantees efficient recovery of the optimal values of the control variables, which is the first of the two advantages of this strategy. We further show that the proposed strategy of placing observations has another inherent optimality: the square of the sensitivity of the optimal estimates of the control with respect to the observations (used to obtain these estimates) attains its minimum value, a second advantage that is a direct consequence of the above strategy for placing observations. Our analytical framework and numerical experiments on linear and nonlinear systems confirm the effectiveness of our proposed strategy.

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2301.13712 [pdf, other]

A Bi-Level Stochastic Game Model for PMU Placement in Power Grid with Cybersecurity Risks

Authors: Saptarshi Ghosh, Murali Sankar Venkatraman, Shehab Ahmed, Charalambos Konstantinou

Abstract: Phasor measurement units (PMUs) provide accurate and high-fidelity measurements in order to monitor the state of the power grid and support various control and planning tasks. However, PMUs have a high installation cost prohibiting their massive deployment. Minimizing the number of installed PMUs needs to be achieved while also maintaining full observability of the network. At the same time, data integrity attacks on PMU measurements can mislead power system control and operation routines. In this paper, a bi-level stochastic non-cooperative game-based placement model is proposed for PMU allocation in the presence of cyber-attack risks. In the first level, the protection of individual PMUs placed in a network is addressed, while considering the interaction between the grid operator and the attacker with respective resource constraints. In the second level, the attacker observes the placement of the PMUs and compromises them, with the aim of maximizing the state estimation error and reducing the observability of the network. The proposed technique is deployed in the IEEE-9 bus test system. The results demonstrate a 9% reduction in the cost incurred by the power grid operator for deploying PMUs while considering cyber-risks.

Submitted 15 April, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: 2023 IEEE Belgrade PowerTech

arXiv:2211.09751 [pdf, other]

Heart Abnormality Detection from Heart Sound Signals using MFCC Feature and Dual Stream Attention Based Network

Authors: Nayeeb Rashid, Swapnil Saha, Mohseu Rashid Subah, Rizwan Ahmed Robin, Syed Mortuza Hasan Fahim, Shahed Ahmed, Talha Ibn Mahmud

Abstract: Cardiovascular diseases are one of the leading causes of death in today's world, and early screening of heart condition plays a crucial role in preventing them. The heart sound signal is one of the primary indicators of heart condition and can be used to detect abnormality in the heart. The acquisition of heart sound signal is non-invasive, cost effective and requires minimum equipment. But currently the detection of heart abnormality from heart sound signal depends largely on the expertise and experience of the physician. As such, an automatic detection system for heart abnormality detection from heart sound signal can be a great asset for the people living in underdeveloped areas. In this paper we propose a novel deep learning based dual stream network with attention mechanism that uses both the raw heart sound signal and the MFCC features to detect abnormality in heart condition of a patient. The deep neural network has a convolutional stream that uses the raw heart sound signal and a recurrent stream that uses the MFCC features of the signal. The features from these two streams are merged together using a novel attention network and passed through the classification network. The model is trained on the largest publicly available dataset of PCG signal and achieves an accuracy of 87.11, sensitivity of 82.41, specificity of 91.8 and a MACC of 87.12.

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2209.12693 [pdf, other]

doi 10.1109/ACCESS.2025.3560811

Leveraging the Potential of Novel Data in Power Line Communication of Electricity Grids

Authors: Christoph Balada, Max Bondorf, Sheraz Ahmed, Andreas Dengela, Markus Zdrallek

Abstract: Electricity grids have become an essential part of daily life, even if they are often not noticed in everyday life. We usually only become particularly aware of this dependence by the time the electricity grid is no longer available. However, significant changes, such as the transition to renewable energy (photovoltaic, wind turbines, etc.) and an increasing number of energy consumers with complex load profiles (electric vehicles, home battery systems, etc.), pose new challenges for the electricity grid. To address these challenges, we propose two first-of-its-kind datasets based on measurements in a broadband powerline communications (PLC) infrastructure. Both datasets, FiN-1 and FiN-2, were collected during real practical use in a part of the German low-voltage grid that supplies around 4.4 million people and show more than 13 billion datapoints collected by more than 5100 sensors. In addition, we present different use cases in asset management, grid state visualization, forecasting, predictive maintenance, and novelty detection to highlight the benefits of these types of data. For these applications, we particularly highlight the use of novel machine learning architectures to extract rich information from real-world data that cannot be captured using traditional approaches. By publishing the first large-scale real-world dataset, we aim to shed light on the previously largely unrecognized potential of PLC data and emphasize machine-learning-based research in low-voltage distribution networks by presenting a variety of different use cases.

Submitted 8 September, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

arXiv:2209.11088 [pdf, other]

Blockage Prediction for Mobile UE in RIS-assisted Wireless Networks: A Deep Learning Approach

Authors: Shakil Ahmed, Ibrahim Abdelmawla, Ahmed E. Kamal, Mohamed Y. Selim

Abstract: Due to significant blockage conditions in wireless networks, transmitted signals may considerably degrade before reaching the receiver. The reliability of the transmitted signals, therefore, may be critically problematic due to blockages between the communicating nodes. Thanks to the ability of Reconfigurable Intelligent Surfaces (RISs) to reflect the incident signals with different reflection angles, this may counter the blockage effect by optimally reflecting the transmit signals to receiving nodes, hence, improving the wireless network's performance. With this motivation, this paper formulates a RIS-aided wireless communication problem from a base station (BS) to a mobile user equipment (UE). The BS is equipped with an RGB camera. We use the RGB camera at the BS and the RIS panel to improve the system's performance while considering signal propagating through multiple paths and the Doppler spread for the mobile UE. First, the RGB camera is used to detect the presence of the UE with no blockage. When unsuccessful, the RIS-assisted gain takes over and is then used to detect if the UE is either "present but blocked" or "absent". The problem is determined as a ternary classification problem with the goal of maximizing the probability of UE communication blockage detection. We find the optimal solution for the probability of predicting the blockage status for a given RGB image and RIS-assisted data rate using a deep neural learning model. We employ the residual network 18-layer neural network model to find this optimal probability of blockage prediction. Extensive simulation results reveal that our proposed RIS panel-assisted model enhances the accuracy of maximization of the blockage prediction probability problem by over 38% compared to the baseline scheme.

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2208.10379 [pdf, other]

Enhanced IoT Batteryless D2D Communications Using Reconfigurable Intelligent Surfaces

Authors: Shakil Ahmed, Mohamed Y. Selim, Ahmed E. Kamal

Abstract: Recent research on reconfigurable intelligent surfaces (RIS) suggests that the RIS panel, containing passive elements, enhances channel performance for the internet of things (IoT) systems by reflecting transmitted signals to the receiving nodes. This paper investigates a RIS panel-assisted wireless network to instigate minimal base station (BS) transmit power in the form of energy harvesting for batteryless IoT sensors to maximize bits transmission in the significant multi-path environment, such as urban areas. Batteryless IoT sensors harvest energy through the RIS panel from external sources, such as from nearby BS radio frequency (RF) signal in the first optimal time frame, for a given time frame. The bits transmission among IoT sensors, followed by a device-to-device (D2D) communications protocol, is maximized using harvested energy in the final optimal time frame. The bits transmission is at least equal to the number of bits sampled by the IoT sensor. We formulate a non-convex mixed-integer non-linear problem to maximize the number of communicating bits subject to energy harvesting from BS RF signals, RIS panel energy consumption, and required time. We propose a robust solution by presenting an iterative algorithm. We perform extensive simulations based on the 3GPP Urban Micro channel model to validate our model.

Submitted 22 August, 2022; originally announced August 2022.

arXiv:2204.06336 [pdf, other]

FiN: A Smart Grid and Power Line Communication Dataset

Authors: Christoph Balada, Sheraz Ahmed, Andreas Dengel, Max Bondorf, Nikolai Hopfer, Markus Zdrallek

Abstract: The increasing complexity of low-voltage networks poses a growing challenge for the reliable and fail-safe operation of electricity grids. The reasons for this include an increasingly decentralized energy generation (photovoltaic systems, wind power, etc.) and the emergence of new types of consumers (e-mobility, domestic electricity storage, etc.). At the same time, the low-voltage grid is largely… ▽ More The increasing complexity of low-voltage networks poses a growing challenge for the reliable and fail-safe operation of electricity grids. The reasons for this include an increasingly decentralized energy generation (photovoltaic systems, wind power, etc.) and the emergence of new types of consumers (e-mobility, domestic electricity storage, etc.). At the same time, the low-voltage grid is largely unmonitored and local power failures are sometimes hard to detect. To overcome this, power line communication (PLC) has emerged as a potential solution for reliable monitoring of the low-voltage grid. In addition to establishing a communication infrastructure, PLC also offers the possibility of evaluating the cables themselves, as well as the connection quality between individual cable distributors based on their Signal-to-Noise Ratio (SNR). The roll-out of a large-scale PLC infrastructure therefore not only ensures communication, but also introduces a tool for monitoring the entire network. To evaluate the potential of this data, we installed 38 PLC modems in three different areas of a German city with a population of about 150,000 as part of the Fühler-im-Netz project. Over a period of 22 months, an SNR spectrum of each connection between adjacent PLC modems was generated every quarter of an hour. % and the voltage was measured every minute. The availability of this real-world PLC data opens up new possibilities to react to the increasingly complex challenges in future smart grids. 
This paper provides a detailed analysis of the data generation and describes how the data was collected during normal operation of the electricity grid. In addition, we present common anomalies, effects, and trends that could be observed in the PLC data at daily, weekly, or seasonal levels. Finally, we discuss potential use cases, highlighting the remote inspection of a cable section as an example.

Submitted 12 September, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2203.08651 [pdf, ps, other]

doi 10.1016/j.ifacol.2023.02.001

Construction of time-varying ISS-Lyapunov Functions for Impulsive Systems

Authors: Patrick Bachmann, Saeed Ahmed

Abstract: Time-varying ISS-Lyapunov functions for impulsive systems provide a necessary and sufficient condition for ISS. This property makes them a more powerful tool for stability analysis than classical candidate ISS-Lyapunov functions providing only a sufficient ISS condition. Moreover, time-varying ISS-Lyapunov functions cover systems with simultaneous instability in continuous and discrete dynamics for which candidate ISS-Lyapunov functions remain inconclusive. The present paper links these two concepts by suggesting a method of constructing time-varying ISS-Lyapunov functions from candidate ISS-Lyapunov functions, thereby effectively combining the ease of construction of candidate ISS-Lyapunov functions with the guaranteed existence of time-varying ISS-Lyapunov functions.

Submitted 27 June, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

Showing 1–50 of 93 results for author: Ahmed, S