Search | arXiv e-print repository

HOTA: Hierarchical Overlap-Tiling Aggregation for Large-Area 3D Flood Mapping

Authors: Wenfeng Jia, Bin Liang, Yuxi Lu, Attavit Wilaiwongsakul, Muhammad Arif Khan, Lihong Zheng

Abstract: Floods are among the most frequent natural hazards and cause significant social and economic damage. Timely, large-scale information on flood extent and depth is essential for disaster response; however, existing products often trade spatial detail for coverage or ignore flood depth altogether. To bridge this gap, this work presents HOTA: Hierarchical Overlap-Tiling Aggregation, a plug-and-play, m… ▽ More Floods are among the most frequent natural hazards and cause significant social and economic damage. Timely, large-scale information on flood extent and depth is essential for disaster response; however, existing products often trade spatial detail for coverage or ignore flood depth altogether. To bridge this gap, this work presents HOTA: Hierarchical Overlap-Tiling Aggregation, a plug-and-play, multi-scale inference strategy. When combined with SegFormer and a dual-constraint depth estimation module, this approach forms a complete 3D flood-mapping pipeline. HOTA applies overlapping tiles of different sizes to multispectral Sentinel-2 images only during inference, enabling the SegFormer model to capture both local features and kilometre-scale inundation without changing the network weights or retraining. The subsequent depth module is based on a digital elevation model (DEM) differencing method, which refines the 2D mask and estimates flood depth by enforcing (i) zero depth along the flood boundary and (ii) near-constant flood volume with respect to the DEM. A case study on the March 2021 Kempsey (Australia) flood shows that HOTA, when coupled with SegFormer, improves IoU from 73\% (U-Net baseline) to 84\%. The resulting 3D surface achieves a mean absolute boundary error of less than 0.5 m. These results demonstrate that HOTA can produce accurate, large-area 3D flood maps suitable for rapid disaster response. △ Less

Submitted 10 July, 2025; originally announced July 2025.

arXiv:2506.23368 [pdf]

Optimizing Solar Energy Production in the USA: Time-Series Analysis Using AI for Smart Energy Management

Authors: Istiaq Ahmed, Md Asif Ul Hoq Khan, MD Zahedul Islam, Md Sakibul Hasan, Tanaya Jakir, Arat Hossain, Joynal Abed, Muhammad Hasanuzzaman, Sadia Sharmeen Shatyi, Kazi Nehal Hasnain

Abstract: As the US rapidly moves towards cleaner energy sources, solar energy is fast becoming the pillar of its renewable energy mix. Even while solar energy is increasingly being used, its variability is a key hindrance to grid stability, storage efficiency, and system stability overall. Solar energy has emerged as one of the fastest-growing renewable energy sources in the United States, adding noticeabl… ▽ More As the US rapidly moves towards cleaner energy sources, solar energy is fast becoming the pillar of its renewable energy mix. Even while solar energy is increasingly being used, its variability is a key hindrance to grid stability, storage efficiency, and system stability overall. Solar energy has emerged as one of the fastest-growing renewable energy sources in the United States, adding noticeably to the country's energy mix. Retrospectively, the necessity of inserting the sun's energy into the grid without disrupting reliability and cost efficiencies highlights the necessity of good forecasting software and smart control systems. The dataset utilized for this research project comprised both hourly and daily solar energy production records collected from multiple utility-scale solar farms across diverse U.S. regions, including California, Texas, and Arizona. Training and evaluation of all models were performed with a time-based cross-validation scheme, namely, sliding window validation. Both the Random Forest and the XG-Boost models demonstrated noticeably greater and the same performance across each of the measures considered, with relatively high accuracy. The almost perfect and equal performance by the Random Forest and XG-Boost models also shows both models to have learned the patterns in the data very comprehensively, with high reliability in their predictions. By incorporating AI-powered time-series models like XG-Boost in grid management software, utility companies can dynamically modify storage cycles in real-time as well as dispatch and peak load planning, based on their predictions. AI-powered solar forecasting also has profound implications for renewable energy policy and planning, particularly as U.S. federal and state governments accelerate toward ambitious decarbonization goals. △ Less

Submitted 29 June, 2025; originally announced June 2025.

arXiv:2506.07685 [pdf]

CommSense: A Rapid and Accurate ISAC Paradigm

Authors: Sandip Jana, Amit Kumar Mishra, Mohammed Zafar Ali Khan

Abstract: Future 6G networks envisions to blur the line between communication and sensing, leveraging ubiquitous OFDM waveforms for both high throughput data and environmental awareness. In this work, we do a thorough analysis of Communication based Sensing (CommSense) framework that embeds lightweight, PCA based detectors into standard OFDM receivers; enabling real-time, device free detection of passive sc… ▽ More Future 6G networks envisions to blur the line between communication and sensing, leveraging ubiquitous OFDM waveforms for both high throughput data and environmental awareness. In this work, we do a thorough analysis of Communication based Sensing (CommSense) framework that embeds lightweight, PCA based detectors into standard OFDM receivers; enabling real-time, device free detection of passive scatterers (e.g.\ drones, vehicles etc.) without any extra transmitters. Starting from a realistic three link Rician channel model (direct Tx$\rightarrow$Rx, cascaded Tx$\rightarrow$Scatterer and Scatterer$\rightarrow$Rx), we compare four detectors: the full dimensional Likelihood Ratio Test (Full LRT), PCA based LRT, PCA+SVM with linear and RBF kernels. By projecting $N$-dimensional CSI onto a $P\ll N$ principal component subspace, inference time gets reduced by an order of magnitude compared to the full LRT, while achieving optimal error rates i.e. empirical errors align tightly with the Bhattacharyya error bound and Area Under ROC Curve (AUC)$\approx1$ for $P\approx10$. PCA+SVM classifiers further improve robustness in very high dimensions ($N=1024$), maintaining AUC$\gtrsim0.60$ at $-10$dB and exceeding 0.90 by 0dB even when full LRT fails due to numerical overflow. From the simulated result we have shown LRT based techniques are susceptible to the parameter estimation error, where as SVM is resilient to that. Our results demonstrate that PCA driven detection when paired with lightweight SVMs can deliver fast, accurate, and robust scatterer sensing, paving the way for integrated sensing and communication (ISAC) in 6G and beyond. △ Less

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2505.20810 [pdf, other]

The Role of AI in Early Detection of Life-Threatening Diseases: A Retinal Imaging Perspective

Authors: Tariq M Khan, Toufique Ahmed Soomro, Imran Razzak

Abstract: Retinal imaging has emerged as a powerful, non-invasive modality for detecting and quantifying biomarkers of systemic diseases-ranging from diabetes and hypertension to Alzheimer's disease and cardiovascular disorders but current insights remain dispersed across platforms and specialties. Recent technological advances in optical coherence tomography (OCT/OCTA) and adaptive optics (AO) now deliver… ▽ More Retinal imaging has emerged as a powerful, non-invasive modality for detecting and quantifying biomarkers of systemic diseases-ranging from diabetes and hypertension to Alzheimer's disease and cardiovascular disorders but current insights remain dispersed across platforms and specialties. Recent technological advances in optical coherence tomography (OCT/OCTA) and adaptive optics (AO) now deliver ultra-high-resolution scans (down to 5 μm ) with superior contrast and spatial integration, allowing early identification of microvascular abnormalities and neurodegenerative changes. At the same time, AI-driven and machine learning (ML) algorithms have revolutionized the analysis of large-scale retinal datasets, increasing sensitivity and specificity; for example, deep learning models achieve > 90 \% sensitivity for diabetic retinopathy and AUC = 0.89 for the prediction of cardiovascular risk from fundus photographs. The proliferation of mobile health technologies and telemedicine platforms further extends access, reduces costs, and facilitates community-based screening and longitudinal monitoring. Despite these breakthroughs, translation into routine practice is hindered by heterogeneous imaging protocols, limited external validation of AI models, and integration challenges within clinical workflows. In this review, we systematically synthesize the latest OCT/OCT and AO developments, AI/ML approaches, and mHealth/Tele-ophthalmology initiatives and quantify their diagnostic performance across disease domains. Finally, we propose a roadmap for multicenter protocol standardization, prospective validation trials, and seamless incorporation of retinal screening into primary and specialty care pathways-paving the way for precision prevention, early intervention, and ongoing treatment of life-threatening systemic diseases. △ Less

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.14723 [pdf, other]

QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding

Authors: Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam

Abstract: Spoken Language Understanding (SLU) systems must balance performance and efficiency, particularly in resource-constrained environments. Existing methods apply distillation and quantization separately, leading to suboptimal compression as distillation ignores quantization constraints. We propose QUADS, a unified framework that optimizes both through multi-stage training with a pre-tuned model, enha… ▽ More Spoken Language Understanding (SLU) systems must balance performance and efficiency, particularly in resource-constrained environments. Existing methods apply distillation and quantization separately, leading to suboptimal compression as distillation ignores quantization constraints. We propose QUADS, a unified framework that optimizes both through multi-stage training with a pre-tuned model, enhancing adaptability to low-bit regimes while maintaining accuracy. QUADS achieves 71.13\% accuracy on SLURP and 99.20\% on FSC, with only minor degradations of up to 5.56\% compared to state-of-the-art models. Additionally, it reduces computational complexity by 60--73$\times$ (GMACs) and model size by 83--700$\times$, demonstrating strong robustness under extreme quantization. These results establish QUADS as a highly efficient solution for real-world, resource-constrained SLU applications. △ Less

Submitted 19 May, 2025; originally announced May 2025.

Journal ref: INTERSPEECH, 2025

arXiv:2505.04318 [pdf, other]

Detecting Concept Drift in Neural Networks Using Chi-squared Goodness of Fit Testing

Authors: Jacob Glenn Ayers, Buvaneswari A. Ramanan, Manzoor A. Khan

Abstract: As the adoption of deep learning models has grown beyond human capacity for verification, meta-algorithms are needed to ensure reliable model inference. Concept drift detection is a field dedicated to identifying statistical shifts that is underutilized in monitoring neural networks that may encounter inference data with distributional characteristics diverging from their training data. Given the… ▽ More As the adoption of deep learning models has grown beyond human capacity for verification, meta-algorithms are needed to ensure reliable model inference. Concept drift detection is a field dedicated to identifying statistical shifts that is underutilized in monitoring neural networks that may encounter inference data with distributional characteristics diverging from their training data. Given the wide variety of model architectures, applications, and datasets, it is important that concept drift detection algorithms are adaptable to different inference scenarios. In this paper, we introduce an application of the $χ^2$ Goodness of Fit Hypothesis Test as a drift detection meta-algorithm applied to a multilayer perceptron, a convolutional neural network, and a transformer trained for machine vision as they are exposed to simulated drift during inference. To that end, we demonstrate how unexpected drops in accuracy due to concept drift can be detected without directly examining the inference outputs. Our approach enhances safety by ensuring models are continually evaluated for reliability across varying conditions. △ Less

Submitted 7 May, 2025; originally announced May 2025.

Comments: 8 pages, 6 figures, 1 table

arXiv:2505.01831 [pdf, other]

Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement

Authors: Haofan Wu, Yin Huang, Yuqing Wu, Qiuyu Yang, Bingfang Wang, Li Zhang, Muhammad Fahadullah Khan, Ali Zia, M. Saleh Memon, Syed Sohail Bukhari, Abdul Fattah Memon, Daizong Ji, Ya Zhang, Ghulam Mustafa, Yin Fang

Abstract: High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on r… ▽ More High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on restoring structural details or global characteristics of fundus images, lacking a unified image enhancement framework to recover comprehensive multi-scale information. Moreover, few methods pinpoint the target of image enhancement, e.g., lesions, which is crucial for medical image-based diagnosis. To address these challenges, we propose a multi-scale target-aware representation learning framework (MTRL-FIE) for efficient fundus image enhancement. Specifically, we propose a multi-scale feature encoder (MFE) that employs wavelet decomposition to embed both low-frequency structural information and high-frequency details. Next, we design a structure-preserving hierarchical decoder (SHD) to fuse multi-scale feature embeddings for real fundus image restoration. SHD integrates hierarchical fusion and group attention mechanisms to achieve adaptive feature fusion while retaining local structural smoothness. Meanwhile, a target-aware feature aggregation (TFA) module is used to enhance pathological regions and reduce artifacts. Experimental results on multiple fundus image datasets demonstrate the effectiveness and generalizability of MTRL-FIE for fundus image enhancement. Compared to state-of-the-art methods, MTRL-FIE achieves superior enhancement performance with a more lightweight architecture. Furthermore, our approach generalizes to other ophthalmic image processing tasks without supervised fine-tuning, highlighting its potential for clinical applications. △ Less

Submitted 3 May, 2025; originally announced May 2025.

Comments: Under review at Neural Networks

arXiv:2505.00839 [pdf, other]

SMSAT: A Multimodal Acoustic Dataset and Deep Contrastive Learning Framework for Affective and Physiological Modeling of Spiritual Meditation

Authors: Ahmad Suleman, Yazeed Alkhrijah, Misha Urooj Khan, Hareem Khan, Muhammad Abdullah Husnain Ali Faiz, Mohamad A. Alawad, Zeeshan Kaleem, Guan Gui

Abstract: Understanding how auditory stimuli influence emotional and physiological states is fundamental to advancing affective computing and mental health technologies. In this paper, we present a multimodal evaluation of the affective and physiological impacts of three auditory conditions, that is, spiritual meditation (SM), music (M), and natural silence (NS), using a comprehensive suite of biometric sig… ▽ More Understanding how auditory stimuli influence emotional and physiological states is fundamental to advancing affective computing and mental health technologies. In this paper, we present a multimodal evaluation of the affective and physiological impacts of three auditory conditions, that is, spiritual meditation (SM), music (M), and natural silence (NS), using a comprehensive suite of biometric signal measures. To facilitate this analysis, we introduce the Spiritual, Music, Silence Acoustic Time Series (SMSAT) dataset, a novel benchmark comprising acoustic time series (ATS) signals recorded under controlled exposure protocols, with careful attention to demographic diversity and experimental consistency. To model the auditory induced states, we develop a contrastive learning based SMSAT audio encoder that extracts highly discriminative embeddings from ATS data, achieving 99.99% classification accuracy in interclass and intraclass evaluations. Furthermore, we propose the Calmness Analysis Model (CAM), a deep learning framework integrating 25 handcrafted and learned features for affective state classification across auditory conditions, attaining robust 99.99% classification accuracy. In contrast, pairwise t tests reveal significant deviations in cardiac response characteristics (CRC) between SM analysis via ANOVA inducing more significant physiological fluctuations. Compared to existing state of the art methods reporting accuracies up to 90%, the proposed model demonstrates substantial performance gains (up to 99%). This work contributes a validated multimodal dataset and a scalable deep learning framework for affective computing applications in stress monitoring, mental well-being, and therapeutic audio-based interventions. △ Less

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2504.10695 [pdf, other]

Neyman-Pearson Detector for Ambient Backscatter Zero-Energy-Devices Beacons

Authors: Shanglin Yang, Jean-Marie Gorce, Muhammad Jehangir Khan, Dinh-Thuy Phan-Huy, Guillaume Villemaud

Abstract: Recently, a novel ultra-low power indoor wireless positioning system has been proposed. In this system, Zero-Energy-Devices (ZED) beacons are deployed in Indoor environments, and located on a map with unique broadcast identifiers. They harvest ambient energy to power themselves and backscatter ambient waves from cellular networks to send their identifiers. This paper presents a novel detection met… ▽ More Recently, a novel ultra-low power indoor wireless positioning system has been proposed. In this system, Zero-Energy-Devices (ZED) beacons are deployed in Indoor environments, and located on a map with unique broadcast identifiers. They harvest ambient energy to power themselves and backscatter ambient waves from cellular networks to send their identifiers. This paper presents a novel detection method for ZEDs in ambient backscatter systems, with an emphasis on performance evaluation through experimental setups and simulations. We introduce a Neyman-Pearson detection framework, which leverages a predefined false alarm probability to determine the optimal detection threshold. This method, applied to the analysis of backscatter signals in a controlled testbed environment, incorporates the use of BC sequences to enhance signal detection accuracy. The experimental setup, conducted on the FIT/CorteXlab testbed, employs a two-node configuration for signal transmission and reception. Key performance metrics, which is the peak-to-lobe ratio, is evaluated, confirming the effectiveness of the proposed detection model. The results demonstrate a detection system that effectively handles varying noise levels and identifies ZEDs with high reliability. The simulation results show the robustness of the model, highlighting its capacity to achieve desired detection performance even with stringent false alarm thresholds. This work paves the way for robust ZED detection in real-world scenarios, contributing to the advancement of wireless communication technologies. △ Less

Submitted 28 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

Comments: This paper is accepted by European Conference on Networks and Communications 2025

arXiv:2503.17275 [pdf, other]

Vision Transformer Based Semantic Communications for Next Generation Wireless Networks

Authors: Muhammad Ahmed Mohsin, Muhammad Jazib, Zeeshan Alam, Muhmmad Farhan Khan, Muhammad Saad, Muhammad Ali Jamshed

Abstract: In the evolving landscape of 6G networks, semantic communications are poised to revolutionize data transmission by prioritizing the transmission of semantic meaning over raw data accuracy. This paper presents a Vision Transformer (ViT)-based semantic communication framework that has been deliberately designed to achieve high semantic similarity during image transmission while simultaneously minimi… ▽ More In the evolving landscape of 6G networks, semantic communications are poised to revolutionize data transmission by prioritizing the transmission of semantic meaning over raw data accuracy. This paper presents a Vision Transformer (ViT)-based semantic communication framework that has been deliberately designed to achieve high semantic similarity during image transmission while simultaneously minimizing the demand for bandwidth. By equipping ViT as the encoder-decoder framework, the proposed architecture can proficiently encode images into a high semantic content at the transmitter and precisely reconstruct the images, considering real-world fading and noise consideration at the receiver. Building on the attention mechanisms inherent to ViTs, our model outperforms Convolution Neural Network (CNNs) and Generative Adversarial Networks (GANs) tailored for generating such images. The architecture based on the proposed ViT network achieves the Peak Signal-to-noise Ratio (PSNR) of 38 dB, which is higher than other Deep Learning (DL) approaches in maintaining semantic similarity across different communication environments. These findings establish our ViT-based approach as a significant breakthrough in semantic communications. △ Less

Submitted 21 March, 2025; originally announced March 2025.

Comments: Accepted @ ICC 2025

arXiv:2503.01916 [pdf, other]

QDCNN: Quantum Deep Learning for Enhancing Safety and Reliability in Autonomous Transportation Systems

Authors: Ashtakala Meghanath, Subham Das, Bikash K. Behera, Muhammad Attique Khan, Saif Al-Kuwari, Ahmed Farouk

Abstract: In transportation cyber-physical systems (CPS), ensuring safety and reliability in real-time decision-making is essential for successfully deploying autonomous vehicles and intelligent transportation networks. However, these systems face significant challenges, such as computational complexity and the ability to handle ambiguous inputs like shadows in complex environments. This paper introduces a… ▽ More In transportation cyber-physical systems (CPS), ensuring safety and reliability in real-time decision-making is essential for successfully deploying autonomous vehicles and intelligent transportation networks. However, these systems face significant challenges, such as computational complexity and the ability to handle ambiguous inputs like shadows in complex environments. This paper introduces a Quantum Deep Convolutional Neural Network (QDCNN) designed to enhance the safety and reliability of CPS in transportation by leveraging quantum algorithms. At the core of QDCNN is the UU† method, which is utilized to improve shadow detection through a propagation algorithm that trains the centroid value with preprocessing and postprocessing operations to classify shadow regions in images accurately. The proposed QDCNN is evaluated on three datasets on normal conditions and one road affected by rain to test its robustness. It outperforms existing methods in terms of computational efficiency, achieving a shadow detection time of just 0.0049352 seconds, faster than classical algorithms like intensity-based thresholding (0.03 seconds), chromaticity-based shadow detection (1.47 seconds), and local binary pattern techniques (2.05 seconds). This remarkable speed, superior accuracy, and noise resilience demonstrate the key factors for safe navigation in autonomous transportation in real-time. This research demonstrates the potential of quantum-enhanced models in addressing critical limitations of classical methods, contributing to more dependable and robust autonomous transportation systems within the CPS framework. △ Less

Submitted 1 March, 2025; originally announced March 2025.

Comments: 11 Pages, 7 Figures, 4 Tables

arXiv:2502.02782 [pdf, other]

A Comprehensive Survey on Feature Extraction Techniques Using I/Q Imbalance in RFFI

Authors: Muhammad Aqib Khan, Muhammad Usman Siddiqui

Abstract: The proliferation of Internet of Things (IoT) devices has increased the need for secure authentication. While traditional encryption-based solutions can be robust, they often impose high computational and energy overhead on resource-limited IoT nodes. As an alternative, radio frequency fingerprint identification (RFFI) leverages hardware-induced imperfections-such as Inphase/Quadrature (I/Q) imbal… ▽ More The proliferation of Internet of Things (IoT) devices has increased the need for secure authentication. While traditional encryption-based solutions can be robust, they often impose high computational and energy overhead on resource-limited IoT nodes. As an alternative, radio frequency fingerprint identification (RFFI) leverages hardware-induced imperfections-such as Inphase/Quadrature (I/Q) imbalance-in Radio Frequency (RF) front-end components as unique identifiers that are inherently difficult to clone or spoof. Despite recent advances, significant challenges remain in standardizing feature extraction methods, maintaining high accuracy across diverse environments, and efficiently handling large-scale IoT deployments. This paper addresses these gaps by providing a comprehensive review of feature extraction techniques that utilize I/Q imbalance for RFFI. We also discuss other hardware-based RF fingerprinting sources, including power amplifier nonlinearity and oscillator imperfections, and examine modern machine learning (ML) and deep learning (DL) approaches that enhance device identification performance. △ Less

Submitted 4 February, 2025; originally announced February 2025.

Comments: 5 pages, 5 figures

arXiv:2501.13690 [pdf]

Variational U-Net with Local Alignment for Joint Tumor Extraction and Registration (VALOR-Net) of Breast MRI Data Acquired at Two Different Field Strengths

Authors: Muhammad Shahkar Khan, Haider Ali, Laura Villazan Garcia, Noor Badshah, Siegfried Trattnig, Florian Schwarzhans, Ramona Woitek, Olgica Zaric

Abstract: Background: Multiparametric breast MRI data might improve tumor diagnostics, characterization, and treatment planning. Accurate alignment and delineation of images acquired at different field strengths such as 3T and 7T, remain challenging research tasks. Purpose: To address alignment challenges and enable consistent tumor segmentation across different MRI field strengths. Study type: Retrospectiv… ▽ More Background: Multiparametric breast MRI data might improve tumor diagnostics, characterization, and treatment planning. Accurate alignment and delineation of images acquired at different field strengths such as 3T and 7T, remain challenging research tasks. Purpose: To address alignment challenges and enable consistent tumor segmentation across different MRI field strengths. Study type: Retrospective. Subjects: Nine female subjects with breast tumors were involved: six histologically proven invasive ductal carcinomas (IDC) and three fibroadenomas. Field strength/sequence: Imaging was performed at 3T and 7T scanners using post-contrast T1-weighted three-dimensional time-resolved angiography with stochastic trajectories (TWIST) sequence. Assessments: The method's performance for joint image registration and tumor segmentation was evaluated using several quantitative metrics, including signal-to-noise ratio (PSNR), structural similarity index (SSIM), normalized cross-correlation (NCC), Dice coefficient, F1 score, and relative sum of squared differences (rel SSD). Statistical tests: The Pearson correlation coefficient was used to test the relationship between the registration and segmentation metrics. Results: When calculated for each subject individually, the PSNR was in a range from 27.5 to 34.5 dB, and the SSIM was from 82.6 to 92.8%. The model achieved an NCC from 96.4 to 99.3% and a Dice coefficient of 62.9 to 95.3%. The F1 score was between 55.4 and 93.2% and the rel SSD was in the range of 2.0 and 7.5%. The segmentation metrics Dice and F1 Score are highly correlated (0.995), while a moderate correlation between NCC and SSIM (0.681) was found for registration. Data conclusion: Initial results demonstrate that the proposed method may be feasible in providing joint tumor segmentation and registration of MRI data acquired at different field strengths. △ Less

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.06482 [pdf, other]

Deep Reinforcement Learning Optimized Intelligent Resource Allocation in Active RIS-Integrated TN-NTN Networks

Authors: Muhammad Ahmed Mohsin, Hassan Rizwan, Muhammad Jazib, Muhammad Iqbal, Muhammad Bilal, Tabinda Ashraf, Muhammad Farhan Khan, Jen-Yi Pan

Abstract: This work explores the deployment of active reconfigurable intelligent surfaces (A-RIS) in integrated terrestrial and non-terrestrial networks (TN-NTN) while utilizing coordinated multipoint non-orthogonal multiple access (CoMP-NOMA). Our system model incorporates a UAV-assisted RIS in coordination with a terrestrial RIS which aims for signal enhancement. We aim to maximize the sum rate for all us… ▽ More This work explores the deployment of active reconfigurable intelligent surfaces (A-RIS) in integrated terrestrial and non-terrestrial networks (TN-NTN) while utilizing coordinated multipoint non-orthogonal multiple access (CoMP-NOMA). Our system model incorporates a UAV-assisted RIS in coordination with a terrestrial RIS which aims for signal enhancement. We aim to maximize the sum rate for all users in the network using a custom hybrid proximal policy optimization (H-PPO) algorithm by optimizing the UAV trajectory, base station (BS) power allocation factors, active RIS amplification factor, and phase shift matrix. We integrate edge users into NOMA pairs to achieve diversity gain, further enhancing the overall experience for edge users. Exhaustive comparisons are made with passive RIS-assisted networks to demonstrate the superior efficacy of active RIS in terms of energy efficiency, outage probability, and network sum rate. △ Less

Submitted 11 January, 2025; originally announced January 2025.

Comments: Accepted to WCNC 2025

arXiv:2412.17844 [pdf]

doi 10.1109/RoboSoft48309.2020.9116035

Low-cost foil/paper based touch mode pressure sensing element as artificial skin module for prosthetic hand

Authors: Rishabh B. Mishra, Sherjeel M. Khan, Sohail F. Shaikh, Aftab M. Hussain, Muhammad M. Hussain

Abstract: Capacitive pressure sensors have several advantages in areas such as robotics, automation, aerospace, biomedical and consumer electronics. We present mathematical modelling, finite element analysis (FEA), fabrication and experimental characterization of ultra-low cost and paper-based, touch-mode, flexible capacitive pressure sensor element using Do-It-Yourself (DIY) technology. The pressure sensin… ▽ More Capacitive pressure sensors have several advantages in areas such as robotics, automation, aerospace, biomedical and consumer electronics. We present mathematical modelling, finite element analysis (FEA), fabrication and experimental characterization of ultra-low cost and paper-based, touch-mode, flexible capacitive pressure sensor element using Do-It-Yourself (DIY) technology. The pressure sensing element is utilized to design large-area electronics skin for low-cost prosthetic hands. The presented sensor is characterized in normal, transition, touch and saturation modes. The sensor has higher sensitivity and linearity in touch mode operation from 10 to 40 kPa of applied pressure compared to the normal (0 to 8 kPa), transition (8 to 10 kPa) and saturation mode (after 40 kPa) with response time of 15.85 ms. Advantages of the presented sensor are higher sensitivity, linear response, less diaphragm area, less von Mises stress at the clamped edges region, low temperature drift, robust structure and less separation gap for large pressure measurement compared to normal mode capacitive pressure sensors. The linear range of pressure change is utilized for controlling the position of a servo motor for precise movement in robotic arm using wireless communication, which can be utilized for designing skin-like structure for low-cost prosthetic hands. △ Less

Submitted 18 December, 2024; originally announced December 2024.

arXiv:2412.14538 [pdf, other]

doi 10.1007/s11432-024-4337-1

Overview of AI and Communication for 6G Network: Fundamentals, Challenges, and Future Research Opportunities

Authors: Qimei Cui, Xiaohu You, Ni Wei, Guoshun Nan, Xuefei Zhang, Jianhua Zhang, Xinchen Lyu, Ming Ai, Xiaofeng Tao, Zhiyong Feng, Ping Zhang, Qingqing Wu, Meixia Tao, Yongming Huang, Chongwen Huang, Guangyi Liu, Chenghui Peng, Zhiwen Pan, Tao Sun, Dusit Niyato, Tao Chen, Muhammad Khurram Khan, Abbas Jamalipour, Mohsen Guizani, Chau Yuen

Abstract: With the growing demand for seamless connectivity and intelligent communication, the integration of artificial intelligence (AI) and sixth-generation (6G) communication networks has emerged as a transformative paradigm. By embedding AI capabilities across various network layers, this integration enables optimized resource allocation, improved efficiency, and enhanced system robust performance, par… ▽ More With the growing demand for seamless connectivity and intelligent communication, the integration of artificial intelligence (AI) and sixth-generation (6G) communication networks has emerged as a transformative paradigm. By embedding AI capabilities across various network layers, this integration enables optimized resource allocation, improved efficiency, and enhanced system robust performance, particularly in intricate and dynamic environments. This paper presents a comprehensive overview of AI and communication for 6G networks, with a focus on emphasizing their foundational principles, inherent challenges, and future research opportunities. We first review the integration of AI and communications in the context of 6G, exploring the driving factors behind incorporating AI into wireless communications, as well as the vision for the convergence of AI and 6G. The discourse then transitions to a detailed exposition of the envisioned integration of AI within 6G networks, delineated across three progressive developmental stages. The first stage, AI for Network, focuses on employing AI to augment network performance, optimize efficiency, and enhance user service experiences. The second stage, Network for AI, highlights the role of the network in facilitating and buttressing AI operations and presents key enabling technologies, such as digital twins for AI and semantic communication. In the final stage, AI as a Service, it is anticipated that future 6G networks will innately provide AI functions as services, supporting application scenarios like immersive communication and intelligent industrial robots. In addition, we conduct an in-depth analysis of the critical challenges faced by the integration of AI and communications in 6G. Finally, we outline promising future research opportunities that are expected to drive the development and refinement of AI and 6G communications. △ Less

Submitted 13 February, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

Journal ref: Sci China Inf Sci, 2025, 68(7): 171301

arXiv:2412.07175 [pdf, other]

Robust Feature Engineering Techniques for Designing Efficient Motor Imagery-Based BCI-Systems

Authors: Syed Saim Gardezi, Soyiba Jawed, Mahnoor Khan, Muneeba Bukhari, Rizwan Ahmed Khan

Abstract: A multitude of individuals across the globe grapple with motor disabilities. Neural prosthetics utilizing Brain-Computer Interface (BCI) technology exhibit promise for improving motor rehabilitation outcomes. The intricate nature of EEG data poses a significant hurdle for current BCI systems. Recently, a qualitative repository of EEG signals tied to both upper and lower limb execution of motor and… ▽ More A multitude of individuals across the globe grapple with motor disabilities. Neural prosthetics utilizing Brain-Computer Interface (BCI) technology exhibit promise for improving motor rehabilitation outcomes. The intricate nature of EEG data poses a significant hurdle for current BCI systems. Recently, a qualitative repository of EEG signals tied to both upper and lower limb execution of motor and motor imagery tasks has been unveiled. Despite this, the productivity of the Machine Learning (ML) Models that were trained on this dataset was alarmingly deficient, and the evaluation framework seemed insufficient. To enhance outcomes, robust feature engineering (signal processing) methodologies are implemented. A collection of time domain, frequency domain, and wavelet-derived features was obtained from 16-channel EEG signals, and the Maximum Relevance Minimum Redundancy (MRMR) approach was employed to identify the four most significant features. For classification K Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), and Naïve Bayes (NB) models were implemented with these selected features, evaluating their effectiveness through metrics such as testing accuracy, precision, recall, and F1 Score. By leveraging SVM with a Gaussian Kernel, a remarkable maximum testing accuracy of 92.50% for motor activities and 95.48% for imagery activities is achieved. These results are notably more dependable and gratifying compared to the previous study, where the peak accuracy was recorded at 74.36%. This research work provides an in-depth analysis of the MI Limb EEG dataset and it will help in designing and developing simple, cost-effective and reliable BCI systems for neuro-rehabilitation. △ Less

Submitted 20 February, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

Comments: 26 pages

arXiv:2412.05968 [pdf, other]

LVS-Net: A Lightweight Vessels Segmentation Network for Retinal Image Analysis

Authors: Mehwish Mehmood, Shahzaib Iqbal, Tariq Mahmood Khan, Ivor Spence, Muhammad Fahim

Abstract: The analysis of retinal images for the diagnosis of various diseases is one of the emerging areas of research. Recently, the research direction has been inclined towards investigating several changes in retinal blood vessels in subjects with many neurological disorders, including dementia. This research focuses on detecting diseases early by improving the performance of models for segmentation of… ▽ More The analysis of retinal images for the diagnosis of various diseases is one of the emerging areas of research. Recently, the research direction has been inclined towards investigating several changes in retinal blood vessels in subjects with many neurological disorders, including dementia. This research focuses on detecting diseases early by improving the performance of models for segmentation of retinal vessels with fewer parameters, which reduces computational costs and supports faster processing. This paper presents a novel lightweight encoder-decoder model that segments retinal vessels to improve the efficiency of disease detection. It incorporates multi-scale convolutional blocks in the encoder to accurately identify vessels of various sizes and thicknesses. The bottleneck of the model integrates the Focal Modulation Attention and Spatial Feature Refinement Blocks to refine and enhance essential features for efficient segmentation. The decoder upsamples features and integrates them with the corresponding feature in the encoder using skip connections and the spatial feature refinement block at every upsampling stage to enhance feature representation at various scales. The estimated computation complexity of our proposed model is around 29.60 GFLOP with 0.71 million parameters and 2.74 MB of memory size, and it is evaluated using public datasets, that is, DRIVE, CHASE\_DB, and STARE. It outperforms existing models with dice scores of 86.44\%, 84.22\%, and 87.88\%, respectively. △ Less

Submitted 8 December, 2024; originally announced December 2024.

arXiv:2412.01456 [pdf, other]

Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond

Authors: MD Raqib Khan, Anshul Negi, Ashutosh Kulkarni, Shruti S. Phutke, Santosh Kumar Vipparthi, Subrahmanyam Murala

Abstract: Quality degradation is observed in underwater images due to the effects of light refraction and absorption by water, leading to issues like color cast, haziness, and limited visibility. This degradation negatively affects the performance of autonomous underwater vehicles used in marine applications. To address these challenges, we propose a lightweight phase-based transformer network with 1.77M pa… ▽ More Quality degradation is observed in underwater images due to the effects of light refraction and absorption by water, leading to issues like color cast, haziness, and limited visibility. This degradation negatively affects the performance of autonomous underwater vehicles used in marine applications. To address these challenges, we propose a lightweight phase-based transformer network with 1.77M parameters for underwater image restoration (UIR). Our approach focuses on effectively extracting non-contaminated features using a phase-based self-attention mechanism. We also introduce an optimized phase attention block to restore structural information by propagating prominent attentive features from the input. We evaluate our method on both synthetic (UIEB, UFO-120) and real-world (UIEB, U45, UCCS, SQUID) underwater image datasets. Additionally, we demonstrate its effectiveness for low-light image enhancement using the LOL dataset. Through extensive ablation studies and comparative analysis, it is clear that the proposed approach outperforms existing state-of-the-art (SOTA) methods. △ Less

Submitted 2 December, 2024; originally announced December 2024.

Comments: 8 pages, 8 figures, conference

arXiv:2411.17556 [pdf, other]

TAFM-Net: A Novel Approach to Skin Lesion Segmentation Using Transformer Attention and Focal Modulation

Authors: Tariq M Khan, Dawn Lin, Shahzaib Iqbal, Erik Meijering

Abstract: Incorporating modern computer vision techniques into clinical protocols shows promise in improving skin lesion segmentation. The U-Net architecture has been a key model in this area, iteratively improved to address challenges arising from the heterogeneity of dermatologic images due to varying clinical settings, lighting, patient attributes, and hair density. To further improve skin lesion segment… ▽ More Incorporating modern computer vision techniques into clinical protocols shows promise in improving skin lesion segmentation. The U-Net architecture has been a key model in this area, iteratively improved to address challenges arising from the heterogeneity of dermatologic images due to varying clinical settings, lighting, patient attributes, and hair density. To further improve skin lesion segmentation, we developed TAFM-Net, an innovative model leveraging self-adaptive transformer attention (TA) coupled with focal modulation (FM). Our model integrates an EfficientNetV2B1 encoder, which employs TA to enhance spatial and channel-related saliency, while a densely connected decoder integrates FM within skip connections, enhancing feature emphasis, segmentation performance, and interpretability crucial for medical image analysis. A novel dynamic loss function amalgamates region and boundary information, guiding effective model training. Our model achieves competitive performance, with Jaccard coefficients of 93.64\%, 86.88\% and 92.88\% in the ISIC2016, ISIC2017 and ISIC2018 datasets, respectively, demonstrating its potential in real-world scenarios. △ Less

Submitted 26 November, 2024; originally announced November 2024.

arXiv:2411.15656 [pdf]

doi 10.1016/j.eswa.2024.125862

Machine-agnostic Automated Lumbar MRI Segmentation using a Cascaded Model Based on Generative Neurons

Authors: Promit Basak, Rusab Sarmun, Saidul Kabir, Israa Al-Hashimi, Enamul Hoque Bhuiyan, Anwarul Hasan, Muhammad Salman Khan, Muhammad E. H. Chowdhury

Abstract: Automated lumbar spine segmentation is very crucial for modern diagnosis systems. In this study, we introduce a novel machine-agnostic approach for segmenting lumbar vertebrae and intervertebral discs from MRI images, employing a cascaded model that synergizes an ROI detection and a Self-organized Operational Neural Network (Self-ONN)-based encoder-decoder network for segmentation. Addressing the… ▽ More Automated lumbar spine segmentation is very crucial for modern diagnosis systems. In this study, we introduce a novel machine-agnostic approach for segmenting lumbar vertebrae and intervertebral discs from MRI images, employing a cascaded model that synergizes an ROI detection and a Self-organized Operational Neural Network (Self-ONN)-based encoder-decoder network for segmentation. Addressing the challenge of diverse MRI modalities, our methodology capitalizes on a unique dataset comprising images from 12 scanners and 34 subjects, enhanced through strategic preprocessing and data augmentation techniques. The YOLOv8 medium model excels in ROI extraction, achieving an excellent performance of 0.916 mAP score. Significantly, our Self-ONN-based model, combined with a DenseNet121 encoder, demonstrates excellent performance in lumbar vertebrae and IVD segmentation with a mean Intersection over Union (IoU) of 83.66%, a sensitivity of 91.44%, and Dice Similarity Coefficient (DSC) of 91.03%, as validated through rigorous 10-fold cross-validation. This study not only showcases an effective approach to MRI segmentation in spine-related disorders but also sets the stage for future advancements in automated diagnostic tools, emphasizing the need for further dataset expansion and model refinement for broader clinical applicability. △ Less

Submitted 23 November, 2024; originally announced November 2024.

Comments: 19 Pages, 11 Figures, Expert Systems with Applications, 2024

ACM Class: I.4.6

arXiv:2411.15596 [pdf, other]

Comparative Analysis of Resource-Efficient CNN Architectures for Brain Tumor Classification

Authors: Md Ashik Khan, Rafath Bin Zafar Auvee

Abstract: Accurate brain tumor classification in MRI images is critical for timely diagnosis and treatment planning. While deep learning models like ResNet-18, VGG-16 have shown high accuracy, they often come with increased complexity and computational demands. This study presents a comparative analysis of effective yet simple Convolutional Neural Network (CNN) architecture and pre-trained ResNet18, and VGG… ▽ More Accurate brain tumor classification in MRI images is critical for timely diagnosis and treatment planning. While deep learning models like ResNet-18, VGG-16 have shown high accuracy, they often come with increased complexity and computational demands. This study presents a comparative analysis of effective yet simple Convolutional Neural Network (CNN) architecture and pre-trained ResNet18, and VGG16 model for brain tumor classification using two publicly available datasets: Br35H:: Brain Tumor Detection 2020 and Brain Tumor MRI Dataset. The custom CNN architecture, despite its lower complexity, demonstrates competitive performance with the pre-trained ResNet18 and VGG16 models. In binary classification tasks, the custom CNN achieved an accuracy of 98.67% on the Br35H dataset and 99.62% on the Brain Tumor MRI Dataset. For multi-class classification, the custom CNN, with a slight architectural modification, achieved an accuracy of 98.09%, on the Brain Tumor MRI Dataset. Comparatively, ResNet18 and VGG16 maintained high performance levels, but the custom CNNs provided a more computationally efficient alternative. Additionally,the custom CNNs were evaluated using few-shot learning (0, 5, 10, 15, 20, 40, and 80 shots) to assess their robustness, achieving notable accuracy improvements with increased shots. This study highlights the potential of well-designed, less complex CNN architectures as effective and computationally efficient alternatives to deeper, pre-trained models for medical imaging tasks, including brain tumor classification. This study underscores the potential of custom CNNs in medical imaging tasks and encourages further exploration in this direction. △ Less

Submitted 23 December, 2024; v1 submitted 23 November, 2024; originally announced November 2024.

Comments: A revised and extended version of this paper has been accepted at the 27th International Conference on Computer and Information Technology (ICCIT 2024). It spans 8 pages and includes 6 figures

MSC Class: I.2.10 Vision and Scene Understanding; I.4.8 Scene Analysis; 92C55 Biomedical imaging and signal processing

arXiv:2411.02614 [pdf, other]

Divergent Domains, Convergent Grading: Enhancing Generalization in Diabetic Retinopathy Grading

Authors: Sharon Chokuwa, Muhammad Haris Khan

Abstract: Diabetic Retinopathy (DR) constitutes 5% of global blindness cases. While numerous deep learning approaches have sought to enhance traditional DR grading methods, they often falter when confronted with new out-of-distribution data thereby impeding their widespread application. In this study, we introduce a novel deep learning method for achieving domain generalization (DG) in DR grading and make t… ▽ More Diabetic Retinopathy (DR) constitutes 5% of global blindness cases. While numerous deep learning approaches have sought to enhance traditional DR grading methods, they often falter when confronted with new out-of-distribution data thereby impeding their widespread application. In this study, we introduce a novel deep learning method for achieving domain generalization (DG) in DR grading and make the following contributions. First, we propose a new way of generating image-to-image diagnostically relevant fundus augmentations conditioned on the grade of the original fundus image. These augmentations are tailored to emulate the types of shifts in DR datasets thus increase the model's robustness. Second, we address the limitations of the standard classification loss in DG for DR fundus datasets by proposing a new DG-specific loss, domain alignment loss; which ensures that the feature vectors from all domains corresponding to the same class converge onto the same manifold for better domain generalization. Third, we tackle the coupled problem of data imbalance across DR domains and classes by proposing to employ Focal loss which seamlessly integrates with our new alignment loss. Fourth, due to inevitable observer variability in DR diagnosis that induces label noise, we propose leveraging self-supervised pretraining. This approach ensures that our DG model remains robust against early susceptibility to label noise, even when only a limited dataset of non-DR fundus images is available for pretraining. Our method demonstrates significant improvements over the strong Empirical Risk Minimization baseline and other recently proposed state-of-the-art DG methods for DR grading. Code is available at https://github.com/sharonchokuwa/dg-adr. △ Less

Submitted 4 November, 2024; originally announced November 2024.

Comments: Accepted at WACV 2025

arXiv:2411.02449 [pdf]

Chronic Obstructive Pulmonary Disease Prediction Using Deep Convolutional Network

Authors: Shahran Rahman Alve, Muhammad Zawad Mahmud, Samiha Islam, Mohammad Monirujjaman Khan

Abstract: AI and deep learning are two recent innovations that have made a big difference in helping to solve problems in the clinical space. Using clinical imaging and sound examination, they also work on improving their vision so that they can spot diseases early and correctly. Because there aren't enough trained HR, clinical professionals are asking for help with innovation because it helps them adapt to… ▽ More AI and deep learning are two recent innovations that have made a big difference in helping to solve problems in the clinical space. Using clinical imaging and sound examination, they also work on improving their vision so that they can spot diseases early and correctly. Because there aren't enough trained HR, clinical professionals are asking for help with innovation because it helps them adapt to more patients. Aside from serious health problems like cancer and diabetes, the effects of respiratory infections are also slowly getting worse and becoming dangerous for society. Respiratory diseases need to be found early and treated quickly, so listening to the sounds of the lungs is proving to be a very helpful tool along with chest X-rays. The presented research hopes to use deep learning ideas based on Convolutional Brain Organization to help clinical specialists by giving a detailed and thorough analysis of clinical respiratory sound data for Ongoing Obstructive Pneumonic identification. We used MFCC, Mel-Spectrogram, Chroma, Chroma (Steady Q), and Chroma CENS from the Librosa AI library in the tests we ran. The new system could also figure out how serious the infection was, whether it was mild, moderate, or severe. The test results agree with the outcome of the deep learning approach that was proposed. The accuracy of the framework arrangement has been raised to a score of 96% on the ICBHI. Also, in the led tests, we used K-Crisp Cross-Approval with ten parts to make the presentation of the new deep learning approach easier to understand. With a 96 percent accuracy rate, the suggested network is better than the rest. If you don't use cross-validation, the model is 90% accurate. △ Less

Submitted 22 December, 2024; v1 submitted 3 November, 2024; originally announced November 2024.

Comments: 16 Pages, 11 Figures

arXiv:2410.21276 [pdf, other]

GPT-4o System Card

Authors: OpenAI, :, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis , et al. (395 additional authors not shown)

Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil… ▽ More GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50\% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities. △ Less

Submitted 25 October, 2024; originally announced October 2024.

arXiv:2410.20395 [pdf, other]

Depth Attention for Robust RGB Tracking

Authors: Yu Liu, Arif Mahmood, Muhammad Haris Khan

Abstract: RGB video object tracking is a fundamental task in computer vision. Its effectiveness can be improved using depth information, particularly for handling motion-blurred target. However, depth information is often missing in commonly used tracking benchmarks. In this work, we propose a new framework that leverages monocular depth estimation to counter the challenges of tracking targets that are out… ▽ More RGB video object tracking is a fundamental task in computer vision. Its effectiveness can be improved using depth information, particularly for handling motion-blurred target. However, depth information is often missing in commonly used tracking benchmarks. In this work, we propose a new framework that leverages monocular depth estimation to counter the challenges of tracking targets that are out of view or affected by motion blur in RGB video sequences. Specifically, our work introduces following contributions. To the best of our knowledge, we are the first to propose a depth attention mechanism and to formulate a simple framework that allows seamlessly integration of depth information with state of the art tracking algorithms, without RGB-D cameras, elevating accuracy and robustness. We provide extensive experiments on six challenging tracking benchmarks. Our results demonstrate that our approach provides consistent gains over several strong baselines and achieves new SOTA performance. We believe that our method will open up new possibilities for more sophisticated VOT solutions in real-world scenarios. Our code and models are publicly released: https://github.com/LiuYuML/Depth-Attention. △ Less

Submitted 27 October, 2024; originally announced October 2024.

Comments: Oral Acceptance at the Asian Conference on Computer Vision (ACCV) 2024, Hanoi, Vietnam

arXiv:2410.19772 [pdf, other]

A Novel Numerical Method for Relaxing the Minimal Configurations of TOA-Based Joint Sensors and Sources Localization

Authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang, Yingxiu Chang

Abstract: This work introduces a novel numerical method that relaxes the minimal configuration requirements for joint sensors and sources localization (JSSL) in 3D space using time of arrival (TOA) measurements. Traditionally, the principle requires that the number of valid equations (TOA measurements) must be equal to or greater than the number of unknown variables (sensor and source locations). State-of-t… ▽ More This work introduces a novel numerical method that relaxes the minimal configuration requirements for joint sensors and sources localization (JSSL) in 3D space using time of arrival (TOA) measurements. Traditionally, the principle requires that the number of valid equations (TOA measurements) must be equal to or greater than the number of unknown variables (sensor and source locations). State-of-the-art literature suggests that the minimum numbers of sensors and sources needed for localization are four to six and six to four, respectively. However, these stringent configurations limit the application of JSSL in scenarios with an insufficient number of sensors and sources. To overcome this limitation, we propose a numerical method that reduces the required number of sensors and sources, enabling more flexible JSSL configurations. First, we formulate the JSSL task as a series of triangles and apply the law of cosines to determine four unknown distances associated with one pair of sensors and three pairs of sources. Next, by utilizing triangle inequalities, we establish the lower and upper boundaries for these unknowns based on the known TOA measurements. The numerical method then searches within these boundaries to find the global optimal solutions, demonstrating that JSSL in 3D space is achievable with only four sensors and four sources, thus significantly relaxing the minimal configuration requirements. Theoretical proofs and simulation results confirm the feasibility and effectiveness of the proposed method. △ Less

Submitted 13 October, 2024; originally announced October 2024.

Comments: 13 pages, 6 figures

arXiv:2410.18366 [pdf]

Cochlear Implantation of Slim Pre-curved Arrays using Automatic Pre-operative Insertion Plans

Authors: Kareem O. Tawfik, Mohammad M. R. Khan, Ankita Patro, Miriam R. Smetak, David Haynes, Robert F. Labadie, René H. Gifford, Jack H. Noble

Abstract: Hypothesis: Pre-operative cochlear implant (CI) electrode array (EL) insertion plans created by automated image analysis methods can improve positioning of slim pre-curved EL. Background: This study represents the first evaluation of a system for patient-customized EL insertion planning for a slim pre-curved EL. Methods: Twenty-one temporal bone specimens were divided into experimental and con… ▽ More Hypothesis: Pre-operative cochlear implant (CI) electrode array (EL) insertion plans created by automated image analysis methods can improve positioning of slim pre-curved EL. Background: This study represents the first evaluation of a system for patient-customized EL insertion planning for a slim pre-curved EL. Methods: Twenty-one temporal bone specimens were divided into experimental and control groups and underwent cochlear implantation. For the control group, the surgeon performed a traditional insertion without an insertion plan. For the experimental group, customized insertion plans guided entry site, trajectory, curl direction, and base insertion depth. An additional 35 clinical insertions from the same surgeon were analyzed, 7 of which were conducted using the insertion plans. EL positioning was analyzed using post-operative imaging auto-segmentation techniques, allowing measurement of angular insertion depth (AID), mean modiolar distance (MMD), and scalar position. Results: In the cadaveric temporal bones, 3 scalar translocations, including 2 foldovers, occurred in 14 control group insertions. In the clinical insertions, translocations occurred in 2 of 28 control cases. No translocations or folds occurred in the 7 experimental temporal bone and the 7 experimental clinical insertions. Among the non-translocated cases, overall AID and MMD were 401(41) degrees and 0.34(0.13) mm for the control insertions. AID and MMD for the experimental insertions were 424(43) degrees and 0.34(0.09) mm overall and were 432(19) and 0.30(0.07) mm for cases where the planned insertion depth was achieved. Conclusions: Trends toward improved EL positioning within scala tympani were observed when EL insertion plans are used. Variability in MMD was significantly reduced (0.07mm vs 0.13 mm, p=0.039) when the planned depth was achieved. △ Less

Submitted 23 October, 2024; originally announced October 2024.

Comments: First two listed authors are co-first authors

arXiv:2409.18715 [pdf, other]

doi 10.1109/ICIP51287.2024.10648275

Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification

Authors: Salma Hassan, Hamad Al Hammadi, Ibrahim Mohammed, Muhammad Haris Khan

Abstract: The early detection and nuanced subtype classification of non-small cell lung cancer (NSCLC), a predominant cause of cancer mortality worldwide, is a critical and complex issue. In this paper, we introduce an innovative integration of multi-modal data, synthesizing fused medical imaging (CT and PET scans) with clinical health records and genomic data. This unique fusion methodology leverages advan… ▽ More The early detection and nuanced subtype classification of non-small cell lung cancer (NSCLC), a predominant cause of cancer mortality worldwide, is a critical and complex issue. In this paper, we introduce an innovative integration of multi-modal data, synthesizing fused medical imaging (CT and PET scans) with clinical health records and genomic data. This unique fusion methodology leverages advanced machine learning models, notably MedClip and BEiT, for sophisticated image feature extraction, setting a new standard in computational oncology. Our research surpasses existing approaches, as evidenced by a substantial enhancement in NSCLC detection and classification precision. The results showcase notable improvements across key performance metrics, including accuracy, precision, recall, and F1-score. Specifically, our leading multi-modal classifier model records an impressive accuracy of 94.04%. We believe that our approach has the potential to transform NSCLC diagnostics, facilitating earlier detection and more effective treatment planning and, ultimately, leading to superior patient outcomes in lung cancer care. △ Less

Submitted 27 September, 2024; originally announced September 2024.

arXiv:2409.06742 [pdf, other]

Stain Normalization of Hematology Slides using Neural Color Transfer

Authors: M. Muneeb Arshad, Hasan Sajid, M. Jawad Khan

Abstract: Deep learning is popularly used for analyzing pathology images, but variations in image properties can limit the effectiveness of the models. The study aims to develop a method that transfers the variability present in the training set to unseen images, improving the model's ability to make accurate inferences. YOLOv5 was trained on peripheral blood and bone marrow sample images and Neural Color T… ▽ More Deep learning is popularly used for analyzing pathology images, but variations in image properties can limit the effectiveness of the models. The study aims to develop a method that transfers the variability present in the training set to unseen images, improving the model's ability to make accurate inferences. YOLOv5 was trained on peripheral blood and bone marrow sample images and Neural Color Transfer techniques were used to incorporate invariance. The results showed significant improvement in detecting WBCs from untrained samples after normalization, highlighting the potential of deep learning-based normalization techniques for inference robustness. △ Less

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.06018 [pdf, other]

Pioneering Precision in Lumbar Spine MRI Segmentation with Advanced Deep Learning and Data Enhancement

Authors: Istiak Ahmed, Md. Tanzim Hossain, Md. Zahirul Islam Nahid, Kazi Shahriar Sanjid, Md. Shakib Shahariar Junayed, M. Monir Uddin, Mohammad Monirujjaman Khan

Abstract: This study presents an advanced approach to lumbar spine segmentation using deep learning techniques, focusing on addressing key challenges such as class imbalance and data preprocessing. Magnetic resonance imaging (MRI) scans of patients with low back pain are meticulously preprocessed to accurately represent three critical classes: vertebrae, spinal canal, and intervertebral discs (IVDs). By rec… ▽ More This study presents an advanced approach to lumbar spine segmentation using deep learning techniques, focusing on addressing key challenges such as class imbalance and data preprocessing. Magnetic resonance imaging (MRI) scans of patients with low back pain are meticulously preprocessed to accurately represent three critical classes: vertebrae, spinal canal, and intervertebral discs (IVDs). By rectifying class inconsistencies in the data preprocessing stage, the fidelity of the training data is ensured. The modified U-Net model incorporates innovative architectural enhancements, including an upsample block with leaky Rectified Linear Units (ReLU) and Glorot uniform initializer, to mitigate common issues such as the dying ReLU problem and improve stability during training. Introducing a custom combined loss function effectively tackles class imbalance, significantly improving segmentation accuracy. Evaluation using a comprehensive suite of metrics showcases the superior performance of this approach, outperforming existing methods and advancing the current techniques in lumbar spine segmentation. These findings hold significant advancements for enhanced lumbar spine MRI and segmentation diagnostic accuracy. △ Less

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.03367 [pdf, other]

TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation

Authors: Shahzaib Iqbal, Tariq M. Khan, Syed S. Naqvi, Asim Naveed, Erik Meijering

Abstract: Deep learning has shown great potential for automated medical image segmentation to improve the precision and speed of disease diagnostics. However, the task presents significant difficulties due to variations in the scale, shape, texture, and contrast of the pathologies. Traditional convolutional neural network (CNN) models have certain limitations when it comes to effectively modelling multiscal… ▽ More Deep learning has shown great potential for automated medical image segmentation to improve the precision and speed of disease diagnostics. However, the task presents significant difficulties due to variations in the scale, shape, texture, and contrast of the pathologies. Traditional convolutional neural network (CNN) models have certain limitations when it comes to effectively modelling multiscale context information and facilitating information interaction between skip connections across levels. To overcome these limitations, a novel deep learning architecture is introduced for medical image segmentation, taking advantage of CNNs and vision transformers. Our proposed model, named TBConvL-Net, involves a hybrid network that combines the local features of a CNN encoder-decoder architecture with long-range and temporal dependencies using biconvolutional long-short-term memory (LSTM) networks and vision transformers (ViT). This enables the model to capture contextual channel relationships in the data and account for the uncertainty of segmentation over time. Additionally, we introduce a novel composite loss function that considers both the segmentation robustness and the boundary agreement of the predicted output with the gold standard. Our proposed model shows consistent improvement over the state of the art on ten publicly available datasets of seven different medical imaging modalities. △ Less

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2408.12323 [pdf, other]

EUIS-Net: A Convolutional Neural Network for Efficient Ultrasound Image Segmentation

Authors: Shahzaib Iqbal, Hasnat Ahmed, Muhammad Sharif, Madiha Hena, Tariq M. Khan, Imran Razzak

Abstract: Segmenting ultrasound images is critical for various medical applications, but it offers significant challenges due to ultrasound images' inherent noise and unpredictability. To address these challenges, we proposed EUIS-Net, a CNN network designed to segment ultrasound images efficiently and precisely. The proposed EUIS-Net utilises four encoder-decoder blocks, resulting in a notable decrease in… ▽ More Segmenting ultrasound images is critical for various medical applications, but it offers significant challenges due to ultrasound images' inherent noise and unpredictability. To address these challenges, we proposed EUIS-Net, a CNN network designed to segment ultrasound images efficiently and precisely. The proposed EUIS-Net utilises four encoder-decoder blocks, resulting in a notable decrease in computational complexity while achieving excellent performance. The proposed EUIS-Net integrates both channel and spatial attention mechanisms into the bottleneck to improve feature representation and collect significant contextual information. In addition, EUIS-Net incorporates a region-aware attention module in skip connections, which enhances the ability to concentrate on the region of the injury. To enable thorough information exchange across various network blocks, skip connection aggregation is employed from the network's lowermost to the uppermost block. Comprehensive evaluations are conducted on two publicly available ultrasound image segmentation datasets. The proposed EUIS-Net achieved mean IoU and dice scores of 78. 12\%, 85. 42\% and 84. 73\%, 89. 01\% in the BUSI and DDTI datasets, respectively. The findings of our study showcase the substantial capabilities of EUIS-Net for immediate use in clinical settings and its versatility in various ultrasound imaging tasks. △ Less

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.09687 [pdf, other]

TESL-Net: A Transformer-Enhanced CNN for Accurate Skin Lesion Segmentation

Authors: Shahzaib Iqbal, Muhammad Zeeshan, Mehwish Mehmood, Tariq M. Khan, Imran Razzak

Abstract: Early detection of skin cancer relies on precise segmentation of dermoscopic images of skin lesions. However, this task is challenging due to the irregular shape of the lesion, the lack of sharp borders, and the presence of artefacts such as marker colours and hair follicles. Recent methods for melanoma segmentation are U-Nets and fully connected networks (FCNs). As the depth of these neural netwo… ▽ More Early detection of skin cancer relies on precise segmentation of dermoscopic images of skin lesions. However, this task is challenging due to the irregular shape of the lesion, the lack of sharp borders, and the presence of artefacts such as marker colours and hair follicles. Recent methods for melanoma segmentation are U-Nets and fully connected networks (FCNs). As the depth of these neural network models increases, they can face issues like the vanishing gradient problem and parameter redundancy, potentially leading to a decrease in the Jaccard index of the segmentation model. In this study, we introduced a novel network named TESL-Net for the segmentation of skin lesions. The proposed TESL-Net involves a hybrid network that combines the local features of a CNN encoder-decoder architecture with long-range and temporal dependencies using bi-convolutional long-short-term memory (Bi-ConvLSTM) networks and a Swin transformer. This enables the model to account for the uncertainty of segmentation over time and capture contextual channel relationships in the data. We evaluated the efficacy of TESL-Net in three commonly used datasets (ISIC 2016, ISIC 2017, and ISIC 2018) for the segmentation of skin lesions. The proposed TESL-Net achieves state-of-the-art performance, as evidenced by a significantly elevated Jaccard index demonstrated by empirical results. △ Less

Submitted 18 August, 2024; originally announced August 2024.

arXiv:2408.07925 [pdf]

A Single Channel-Based Neonatal Sleep-Wake Classification using Hjorth Parameters and Improved Gradient Boosting

Authors: Muhammad Arslan, Muhammad Mubeen, Saadullah Farooq Abbasi, Muhammad Shahbaz Khan, Wadii Boulila, Jawad Ahmad

Abstract: Sleep plays a crucial role in neonatal development. Monitoring the sleep patterns in neonates in a Neonatal Intensive Care Unit (NICU) is imperative for understanding the maturation process. While polysomnography (PSG) is considered the best practice for sleep classification, its expense and reliance on human annotation pose challenges. Existing research often relies on multichannel EEG signals; h… ▽ More Sleep plays a crucial role in neonatal development. Monitoring the sleep patterns in neonates in a Neonatal Intensive Care Unit (NICU) is imperative for understanding the maturation process. While polysomnography (PSG) is considered the best practice for sleep classification, its expense and reliance on human annotation pose challenges. Existing research often relies on multichannel EEG signals; however, concerns arise regarding the vulnerability of neonates and the potential impact on their sleep quality. This paper introduces a novel approach to neonatal sleep stage classification using a single-channel gradient boosting algorithm with Hjorth features. The gradient boosting parameters are fine-tuned using random search cross-validation (randomsearchCV), achieving an accuracy of 82.35% for neonatal sleep-wake classification. Validation is conducted through 5-fold cross-validation. The proposed algorithm not only enhances existing neonatal sleep algorithms but also opens avenues for broader applications. △ Less

Submitted 15 August, 2024; originally announced August 2024.

Comments: 8 pages, 5 figures, 3 tables, International Polydisciplinary Conference on Artificial Intelligence and New Technologies

arXiv:2408.04773 [pdf, other]

Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement

Authors: Muhammad Salman Khan, Moreno La Quatra, Kuo-Hsuan Hung, Szu-Wei Fu, Sabato Marco Siniscalchi, Yu Tsao

Abstract: Self-supervised representation learning (SSL) has attained SOTA results on several downstream speech tasks, but SSL-based speech enhancement (SE) solutions still lag behind. To address this issue, we exploit three main ideas: (i) Transformer-based masking generation, (ii) consistency-preserving loss, and (iii) perceptual contrast stretching (PCS). In detail, conformer layers, leveraging an attenti… ▽ More Self-supervised representation learning (SSL) has attained SOTA results on several downstream speech tasks, but SSL-based speech enhancement (SE) solutions still lag behind. To address this issue, we exploit three main ideas: (i) Transformer-based masking generation, (ii) consistency-preserving loss, and (iii) perceptual contrast stretching (PCS). In detail, conformer layers, leveraging an attention mechanism, are introduced to effectively model frame-level representations and obtain the Ideal Ratio Mask (IRM) for SE. Moreover, we incorporate consistency in the loss function, which processes the input to account for the inconsistency effects of signal reconstruction from the spectrogram. Finally, PCS is employed to improve the contrast of input and target features according to perceptual importance. Evaluated on the VoiceBank-DEMAND task, the proposed solution outperforms previously SSL-based SE solutions when tested on several objective metrics, attaining a SOTA PESQ score of 3.54. △ Less

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.02359 [pdf, other]

Blind User Activity Detection for Grant-Free Random Access in Cell-Free mMIMO Networks

Authors: Muhammad Usman Khan, Enrico Testi, Marco Chiani, Enrico Paolini

Abstract: Cell-free massive MIMO (CF-mMIMO) networks have recently emerged as a promising solution to tackle the challenges arising from next-generation massive machine-type communications. In this paper, a fully grant-free deep learning (DL)-based method for user activity detection in CF-mMIMO networks is proposed. Initially, the known non-orthogonal pilot sequences are used to estimate the channel coeffic… ▽ More Cell-free massive MIMO (CF-mMIMO) networks have recently emerged as a promising solution to tackle the challenges arising from next-generation massive machine-type communications. In this paper, a fully grant-free deep learning (DL)-based method for user activity detection in CF-mMIMO networks is proposed. Initially, the known non-orthogonal pilot sequences are used to estimate the channel coefficients between each user and the access points. Then, a deep convolutional neural network is used to estimate the activity status of the users. The proposed method is "blind", i.e., it is fully data-driven and does not require prior large-scale fading coefficients estimation. Numerical results show how the proposed DL-based algorithm is able to merge the information gathered by the distributed antennas to estimate the user activity status, yet outperforming a state-of-the-art covariance-based method. △ Less

Submitted 5 August, 2024; originally announced August 2024.

Comments: Accepted for publication at IEEE RTSI 2024, Lecco, Italy

arXiv:2408.01372 [pdf, other]

doi 10.1016/j.neucom.2025.129995

Spatial and Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification

Authors: Muhammad Ahmad, Muhammad Hassaan Farooq Butt, Adil Mehmood Khan, Manuel Mazzara, Salvatore Distefano, Muhammad Usama, Swalpa Kumar Roy, Jocelyn Chanussot, Danfeng Hong

Abstract: Recent advancements in transformers, specifically self-attention mechanisms, have significantly improved hyperspectral image (HSI) classification. However, these models often suffer from inefficiencies, as their computational complexity scales quadratically with sequence length. To address these challenges, we propose the morphological spatial mamba (SMM) and morphological spatial-spectral Mamba (… ▽ More Recent advancements in transformers, specifically self-attention mechanisms, have significantly improved hyperspectral image (HSI) classification. However, these models often suffer from inefficiencies, as their computational complexity scales quadratically with sequence length. To address these challenges, we propose the morphological spatial mamba (SMM) and morphological spatial-spectral Mamba (SSMM) model (MorpMamba), which combines the strengths of morphological operations and the state space model framework, offering a more computationally efficient alternative to transformers. In MorpMamba, a novel token generation module first converts HSI patches into spatial-spectral tokens. These tokens are then processed through morphological operations such as erosion and dilation, utilizing depthwise separable convolutions to capture structural and shape information. A token enhancement module refines these features by dynamically adjusting the spatial and spectral tokens based on central HSI regions, ensuring effective feature fusion within each block. Subsequently, multi-head self-attention is applied to further enrich the feature representations, allowing the model to capture complex relationships and dependencies within the data. Finally, the enhanced tokens are fed into a state space module, which efficiently models the temporal evolution of the features for classification. Experimental results on widely used HSI datasets demonstrate that MorpMamba achieves superior parametric efficiency compared to traditional CNN and transformer models while maintaining high accuracy. The code will be made publicly available at \url{https://github.com/mahmad000/MorpMamba}. △ Less

Submitted 30 November, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.19570 [pdf, other]

A Baseline Approach for Modeling and Characterization of Commercial Off-The-Shelf (COTS) Droop Controlled Converter

Authors: Muhammad Anees, Lisa Qi, Mehnaz Khan, Srdjan Lukic

Abstract: Due to advancements in power electronics, new converter topologies are introduced day by day. It's hard to get an equivalent model from any manufacturer of any Commercial Off-The-Shelf (COTS) power electronics converters because of intellectual property (IP) and safety concerns. Most COTS products don't reveal the exact topology of the converter as well as the control architecture and correspondin… ▽ More Due to advancements in power electronics, new converter topologies are introduced day by day. It's hard to get an equivalent model from any manufacturer of any Commercial Off-The-Shelf (COTS) power electronics converters because of intellectual property (IP) and safety concerns. Most COTS products don't reveal the exact topology of the converter as well as the control architecture and corresponding control gains. Because of these reasons, it's very hard to implement an exact equivalent model of COTS converters. Hierarchical control is widely used in microgrid applications, where different control layers are decoupled on the timescale. This article gives a baseline approach to build an equivalent model of any droop controlled COTS converter to mimic its control performance. Bandwidths of the different control loops are estimated along with the control logic from experiments on COTS converter and then hierarchical control is followed to come up with the equivalent model. The approach is verified on COTS CE+T Stabiliti 30C3 converter with its equivalent model build in MATLAB Simulink. △ Less

Submitted 17 September, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.08813 [pdf, other]

FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

Authors: Yu Tian, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang

Abstract: Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., di… ▽ More Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., different retinal imaging modalities) for patient diagnosis. This paper presents FairDomain, a pioneering systemic study into algorithmic fairness under domain shifts, employing state-of-the-art domain adaptation (DA) and generalization (DG) algorithms for both medical segmentation and classification tasks to understand how biases are transferred between different domains. We also introduce a novel plug-and-play fair identity attention (FIA) module that adapts to various DA and DG algorithms to improve fairness by using self-attention to adjust feature importance based on demographic attributes. Additionally, we curate the first fairness-focused dataset with two paired imaging modalities for the same patient cohort on medical segmentation and classification tasks, to rigorously assess fairness in domain-shift scenarios. Excluding the confounding impact of demographic distribution variation between source and target domains will allow clearer quantification of the performance of domain transfer models. Our extensive evaluations reveal that the proposed FIA significantly enhances both model performance accounted for fairness across all domain shift settings (i.e., DA and DG) with respect to different demographics, which outperforms existing methods on both segmentation and classification. The code and data can be accessed at https://ophai.hms.harvard.edu/datasets/harvard-fairdomain20k. △ Less

Submitted 18 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Codes and datasets are available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

arXiv:2407.02871 [pdf, other]

LMBF-Net: A Lightweight Multipath Bidirectional Focal Attention Network for Multifeatures Segmentation

Authors: Tariq M Khan, Shahzaib Iqbal, Syed S. Naqvi, Imran Razzak, Erik Meijering

Abstract: Retinal diseases can cause irreversible vision loss in both eyes if not diagnosed and treated early. Since retinal diseases are so complicated, retinal imaging is likely to show two or more abnormalities. Current deep learning techniques for segmenting retinal images with many labels and attributes have poor detection accuracy and generalisability. This paper presents a multipath convolutional neu… ▽ More Retinal diseases can cause irreversible vision loss in both eyes if not diagnosed and treated early. Since retinal diseases are so complicated, retinal imaging is likely to show two or more abnormalities. Current deep learning techniques for segmenting retinal images with many labels and attributes have poor detection accuracy and generalisability. This paper presents a multipath convolutional neural network for multifeature segmentation. The proposed network is lightweight and spatially sensitive to information. A patch-based implementation is used to extract local image features, and focal modulation attention blocks are incorporated between the encoder and the decoder for improved segmentation. Filter optimisation is used to prevent filter overlaps and speed up model convergence. A combination of convolution operations and group convolution operations is used to reduce computational costs. This is the first robust and generalisable network capable of segmenting multiple features of fundus images (including retinal vessels, microaneurysms, optic discs, haemorrhages, hard exudates, and soft exudates). The results of our experimental evaluation on more than ten publicly available datasets with multiple features show that the proposed network outperforms recent networks despite having a small number of learnable parameters. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.17190 [pdf, other]

Sound Tagging in Infant-centric Home Soundscapes

Authors: Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam

Abstract: Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or… ▽ More Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or young children in the environment or have data collected from only a single family where noise from the fixed sound source can be moderate at the infant's position or vice versa. Thus, despite the recent success of large pre-trained models for noise event detection, the performance of these models on infant-centric noise soundscapes in the home is yet to be explored. To bridge this gap, we have collected and labeled noises in home soundscapes from 22 families in an unobtrusive manner, where the data are collected through an infant-worn recording device. In this paper, we explore the performance of a large pre-trained model (Audio Spectrogram Transformer [AST]) on our noise-conditioned infant-centric environmental data as well as publicly available home environmental datasets. Utilizing different training strategies such as resampling, utilizing public datasets, mixing public and infant-centric training sets, and data augmentation using noise and masking, we evaluate the performance of a large pre-trained model on sparse and imbalanced infant-centric data. Our results show that fine-tuning the large pre-trained model by combining our collected dataset with public datasets increases the F1-score from 0.11 (public datasets) and 0.76 (collected datasets) to 0.84 (combined datasets) and Cohen's Kappa from 0.013 (public datasets) and 0.77 (collected datasets) to 0.83 (combined datasets) compared to only training with public or collected datasets, respectively. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted in IEEE/ACM CHASE 2024

arXiv:2405.17520 [pdf, other]

Advancing Medical Image Segmentation with Mini-Net: A Lightweight Solution Tailored for Efficient Segmentation of Medical Images

Authors: Syed Javed, Tariq M. Khan, Abdul Qayyum, Hamid Alinejad-Rokny, Arcot Sowmya, Imran Razzak

Abstract: Accurate segmentation of anatomical structures and abnormalities in medical images is crucial for computer-aided diagnosis and analysis. While deep learning techniques excel at this task, their computational demands pose challenges. Additionally, some cutting-edge segmentation methods, though effective for general object segmentation, may not be optimised for medical images. To address these issue… ▽ More Accurate segmentation of anatomical structures and abnormalities in medical images is crucial for computer-aided diagnosis and analysis. While deep learning techniques excel at this task, their computational demands pose challenges. Additionally, some cutting-edge segmentation methods, though effective for general object segmentation, may not be optimised for medical images. To address these issues, we propose Mini-Net, a lightweight segmentation network specifically designed for medical images. With fewer than 38,000 parameters, Mini-Net efficiently captures both high- and low-frequency features, enabling real-time applications in various medical imaging scenarios. We evaluate Mini-Net on various datasets, including DRIVE, STARE, ISIC-2016, ISIC-2018, and MoNuSeg, demonstrating its robustness and good performance compared to state-of-the-art methods. △ Less

Submitted 20 September, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2404.15337 [pdf, other]

RSSI Estimation for Constrained Indoor Wireless Networks using ANN

Authors: Samrah Arif, M. Arif Khan, Sabih Ur Rehman

Abstract: In the expanding field of the Internet of Things (IoT), wireless channel estimation is a significant challenge. This is specifically true for low-power IoT (LP-IoT) communication, where efficiency and accuracy are extremely important. This research establishes two distinct LP-IoT wireless channel estimation models using Artificial Neural Networks (ANN): a Feature-based ANN model and a Sequence-bas… ▽ More In the expanding field of the Internet of Things (IoT), wireless channel estimation is a significant challenge. This is specifically true for low-power IoT (LP-IoT) communication, where efficiency and accuracy are extremely important. This research establishes two distinct LP-IoT wireless channel estimation models using Artificial Neural Networks (ANN): a Feature-based ANN model and a Sequence-based ANN model. Both models have been constructed to enhance LP-IoT communication by lowering the estimation error in the LP-IoT wireless channel. The Feature-based model aims to capture complex patterns of measured Received Signal Strength Indicator (RSSI) data using environmental characteristics. The Sequence-based approach utilises predetermined categorisation techniques to estimate the RSSI sequence of specifically selected environment characteristics. The findings demonstrate that our suggested approaches attain remarkable precision in channel estimation, with an improvement in MSE of $88.29\%$ of the Feature-based model and $97.46\%$ of the Sequence-based model over existing research. Additionally, the comparative analysis of these techniques with traditional and other Deep Learning (DL)-based techniques also highlights the superior performance of our developed models and their potential in real-world IoT applications. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.11771 [pdf]

IoT-Driven Cloud-based Energy and Environment Monitoring System for Manufacturing Industry

Authors: Nitol Saha, Md Masruk Aulia, Md. Mostafizur Rahman, Mohammed Shafiul Alam Khan

Abstract: This research focused on the development of a cost-effective IoT solution for energy and environment monitoring geared towards manufacturing industries. The proposed system is developed using open-source software that can be easily deployed in any manufacturing environment. The system collects real-time temperature, humidity, and energy data from different devices running on different communicatio… ▽ More This research focused on the development of a cost-effective IoT solution for energy and environment monitoring geared towards manufacturing industries. The proposed system is developed using open-source software that can be easily deployed in any manufacturing environment. The system collects real-time temperature, humidity, and energy data from different devices running on different communication such as TCP/IP, Modbus, etc., and the data is transferred wirelessly using an MQTT client to a database working as a cloud storage solution. The collected data is then visualized and analyzed using a website running on a host machine working as a web client. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09342 [pdf, other]

Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

Abstract: The advancements of technology have led to the use of multimodal systems in various real-world applications. Among them, the audio-visual systems are one of the widely used multimodal systems. In the recent years, associating face and voice of a person has gained attention due to presence of unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2… ▽ More The advancements of technology have led to the use of multimodal systems in various real-world applications. Among them, the audio-visual systems are one of the widely used multimodal systems. In the recent years, associating face and voice of a person has gained attention due to presence of unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under a unique condition of multilingual scenario. This condition is inspired from the fact that half of the world's population is bilingual and most often people communicate under multilingual scenario. The challenge uses a dataset namely, Multilingual Audio-Visual (MAV-Celeb) for exploring face-voice association in multilingual environments. This report provides the details of the challenge, dataset, baselines and task details for the FAME Challenge. △ Less

Submitted 22 July, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: ACM Multimedia Conference - Grand Challenge

arXiv:2403.14120 [pdf, other]

Advancing IIoT with Over-the-Air Federated Learning: The Role of Iterative Magnitude Pruning

Authors: Fazal Muhammad Ali Khan, Hatem Abou-Zeid, Aryan Kaushik, Syed Ali Hassan

Abstract: The industrial Internet of Things (IIoT) under Industry 4.0 heralds an era of interconnected smart devices where data-driven insights and machine learning (ML) fuse to revolutionize manufacturing. A noteworthy development in IIoT is the integration of federated learning (FL), which addresses data privacy and security among devices. FL enables edge sensors, also known as peripheral intelligence uni… ▽ More The industrial Internet of Things (IIoT) under Industry 4.0 heralds an era of interconnected smart devices where data-driven insights and machine learning (ML) fuse to revolutionize manufacturing. A noteworthy development in IIoT is the integration of federated learning (FL), which addresses data privacy and security among devices. FL enables edge sensors, also known as peripheral intelligence units (PIUs) to learn and adapt using their data locally, without explicit sharing of confidential data, to facilitate a collaborative yet confidential learning process. However, the lower memory footprint and computational power of PIUs inherently require deep neural network (DNN) models that have a very compact size. Model compression techniques such as pruning can be used to reduce the size of DNN models by removing unnecessary connections that have little impact on the model's performance, thus making the models more suitable for the limited resources of PIUs. Targeting the notion of compact yet robust DNN models, we propose the integration of iterative magnitude pruning (IMP) of the DNN model being trained in an over-the-air FL (OTA-FL) environment for IIoT. We provide a tutorial overview and also present a case study of the effectiveness of IMP in OTA-FL for an IIoT environment. Finally, we present future directions for enhancing and optimizing these deep compression techniques further, aiming to push the boundaries of IIoT capabilities in acquiring compact yet robust and high-performing DNN models. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: 6 pages, 6 figures

arXiv:2403.08099 [pdf, other]

Application of Distributed Arithmetic to Adaptive Filtering Algorithms: Trends, Challenges and Future

Authors: Mohd. Tasleem Khan

Abstract: The utilization of distributed arithmetic (DA) in AF algorithms has gained significant attention in recent years due to its potential to enhance computational efficiency and reduce resource requirements. This paper presents an exploration of the application of DA to adaptive filtering (AF) algorithms, analyzing trends, discussing challenges, and outlining future prospects. It begins by providing a… ▽ More The utilization of distributed arithmetic (DA) in AF algorithms has gained significant attention in recent years due to its potential to enhance computational efficiency and reduce resource requirements. This paper presents an exploration of the application of DA to adaptive filtering (AF) algorithms, analyzing trends, discussing challenges, and outlining future prospects. It begins by providing an overview of both DA and AF algorithms, highlighting their individual merits and established applications. Subsequently, the integration of DA into AF algorithms is explored, showcasing its ability to optimize multiply-accumulate operations and mitigate the computational burden associated with AF algorithms. Throughout the paper, the critical trends observed in the field are discussed, including advancements in DA-based hardware architectures. Moreover, the challenges encountered in implementing DA-based AF is also discussed. The continued evolution of DA techniques to cater to the demands of modern AF applications, including real-time processing, resource-constrained environments, and high-dimensional data streams is anticipated. In conclusion, this paper consolidates the current state of applying DA to AF algorithms, offering insights into prevailing trends, discussing challenges, and presenting future research and development in the field. The fusion of these two domains holds promise for achieving improved computational efficiency, reduced hardware complexity, and enhanced performance in various signal processing applications. △ Less

Submitted 17 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05415 [pdf]

An Overview of Automated Vehicle Longitudinal Platoon Formation Strategies

Authors: M Sabbir Salek, Mugdha Basu Thakur, Pardha Sai Krishna Ala, Mashrur Chowdhury, Matthias Schmid, Pamela Murray-Tuite, Sakib Mahmud Khan, Venkat Krovi

Abstract: Automated vehicle (AV) platooning has the potential to improve the safety, operational, and energy efficiency of surface transportation systems by limiting or eliminating human involvement in the driving tasks. The theoretical validity of the AV platooning strategies has been established and practical applications are being tested under real-world conditions. The emergence of sensors, communicatio… ▽ More Automated vehicle (AV) platooning has the potential to improve the safety, operational, and energy efficiency of surface transportation systems by limiting or eliminating human involvement in the driving tasks. The theoretical validity of the AV platooning strategies has been established and practical applications are being tested under real-world conditions. The emergence of sensors, communication, and control strategies has resulted in rapid and constant evolution of AV platooning strategies. In this paper, we review the state-of-the-art knowledge in AV longitudinal platoon formation using a five-component platooning framework, which includes vehicle model, information-receiving process, information flow topology, spacing policy, and controller and discuss the advantages and limitations of the components. Based on the discussion about existing strategies and associated limitations, potential future research directions are presented. △ Less

Submitted 16 May, 2025; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.18859 [pdf]

doi 10.1016/j.xcrp.2024.101941

Taking Second-life Batteries from Exhausted to Empowered using Experiments, Data Analysis, and Health Estimation

Authors: Xiaofan Cui, Muhammad Aadil Khan, Gabriele Pozzato, Surinder Singh, Ratnesh Sharma, Simona Onori

Abstract: The reuse of retired electric vehicle batteries in grid energy storage offers environmental and economic benefits. This study concentrates on health monitoring algorithms for retired batteries deployed in grid storage. Over 15 months of testing, we collect, analyze, and publicize a dataset of second-life batteries, implementing a cycling protocol simulating grid energy storage load profiles within… ▽ More The reuse of retired electric vehicle batteries in grid energy storage offers environmental and economic benefits. This study concentrates on health monitoring algorithms for retired batteries deployed in grid storage. Over 15 months of testing, we collect, analyze, and publicize a dataset of second-life batteries, implementing a cycling protocol simulating grid energy storage load profiles within a 3-4 V voltage window. Four machine-learning-based health estimation models, relying on online-accessible features and initial capacity, are compared, with the selected model achieving a mean absolute percentage error below 2.3% on test data. Additionally, an adaptive online health estimation algorithm is proposed by integrating a clustering-based method, thus limiting estimation errors during online deployment. These results showcase the feasibility of repurposing retired batteries for second-life applications. Based on obtained data and power demand, these second-life batteries exhibit potential for over a decade of grid energy storage use. △ Less

Submitted 8 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 16 pages, 8 figures

Showing 1–50 of 160 results for author: Khan, M