-
Deep Perceptual Enhancement for Medical Image Analysis
Authors:
S M A Sharif,
Rizwan Ali Naqvi,
Mithun Biswas,
Woong-Kee Loh
Abstract:
Due to numerous hardware shortcomings, medical image acquisition devices are susceptible to producing low-quality (i.e., low contrast, inappropriate brightness, noisy, etc.) images. Regrettably, perceptually degraded images directly impact the diagnosis process and make the decision-making manoeuvre of medical practitioners notably complicated. This study proposes to enhance such low-quality image…
▽ More
Due to numerous hardware shortcomings, medical image acquisition devices are susceptible to producing low-quality (i.e., low contrast, inappropriate brightness, noisy, etc.) images. Regrettably, perceptually degraded images directly impact the diagnosis process and make the decision-making manoeuvre of medical practitioners notably complicated. This study proposes to enhance such low-quality images by incorporating end-to-end learning strategies for accelerating medical image analysis tasks. To the best concern, this is the first work in medical imaging which comprehensively tackles perceptual enhancement, including contrast correction, luminance correction, denoising, etc., with a fully convolutional deep network. The proposed network leverages residual blocks and a residual gating mechanism for diminishing visual artefacts and is guided by a multi-term objective function to perceive the perceptually plausible enhanced images. The practicability of the deep medical image enhancement method has been extensively investigated with sophisticated experiments. The experimental outcomes illustrate that the proposed method could outperform the existing enhancement methods for different medical image modalities by 5.00 to 7.00 dB in peak signal-to-noise ratio (PSNR) metrics and 4.00 to 6.00 in DeltaE metrics. Additionally, the proposed method can drastically improve the medical image analysis tasks' performance and reveal the potentiality of such an enhancement method in real-world applications. Code Available: https://github.com/sharif-apu/DPE_JBHI
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
Two-stage Deep Denoising with Self-guided Noise Attention for Multimodal Medical Images
Authors:
S M A Sharif,
Rizwan Ali Naqvi,
Woong-Kee Loh
Abstract:
Medical image denoising is considered among the most challenging vision tasks. Despite the real-world implications, existing denoising methods have notable drawbacks as they often generate visual artifacts when applied to heterogeneous medical images. This study addresses the limitation of the contemporary denoising methods with an artificial intelligence (AI)-driven two-stage learning strategy. T…
▽ More
Medical image denoising is considered among the most challenging vision tasks. Despite the real-world implications, existing denoising methods have notable drawbacks as they often generate visual artifacts when applied to heterogeneous medical images. This study addresses the limitation of the contemporary denoising methods with an artificial intelligence (AI)-driven two-stage learning strategy. The proposed method learns to estimate the residual noise from the noisy images. Later, it incorporates a novel noise attention mechanism to correlate estimated residual noise with noisy inputs to perform denoising in a course-to-refine manner. This study also proposes to leverage a multi-modal learning strategy to generalize the denoising among medical image modalities and multiple noise patterns for widespread applications. The practicability of the proposed method has been evaluated with dense experiments. The experimental results demonstrated that the proposed method achieved state-of-the-art performance by significantly outperforming the existing medical image denoising methods in quantitative and qualitative comparisons. Overall, it illustrates a performance gain of 7.64 in Peak Signal-to-Noise Ratio (PSNR), 0.1021 in Structural Similarity Index (SSIM), 0.80 in DeltaE ($ΔE$), 0.1855 in Visual Information Fidelity Pixel-wise (VIFP), and 18.54 in Mean Squared Error (MSE) metrics.
△ Less
Submitted 9 March, 2025;
originally announced March 2025.
-
DarkDeblur: Learning single-shot image deblurring in low-light condition
Authors:
S M A Sharif,
Rizwan Ali Naqvi,
Farman Alic,
Mithun Biswas
Abstract:
Single-shot image deblurring in a low-light condition is known to be a profoundly challenging image translation task. This study tackles the limitations of the low-light image deblurring with a learning-based approach and proposes a novel deep network named as DarkDeblurNet. The proposed DarkDeblur- Net comprises a dense-attention block and a contextual gating mechanism in a feature pyramid struct…
▽ More
Single-shot image deblurring in a low-light condition is known to be a profoundly challenging image translation task. This study tackles the limitations of the low-light image deblurring with a learning-based approach and proposes a novel deep network named as DarkDeblurNet. The proposed DarkDeblur- Net comprises a dense-attention block and a contextual gating mechanism in a feature pyramid structure to leverage content awareness. The model additionally incorporates a multi-term objective function to perceive a plausible perceptual image quality while performing image deblurring in the low-light settings. The practicability of the proposed model has been verified by fusing it in numerous computer vision applications. Apart from that, this study introduces a benchmark dataset collected with actual hardware to assess the low-light image deblurring methods in a real-world setup. The experimental results illustrate that the proposed method can outperform the state-of-the-art methods in both synthesized and real-world data for single-shot image deblurring, even in challenging lighting environment.
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
An Integrated Deep Learning Framework Leveraging NASNet and Vision Transformer with MixProcessing for Accurate and Precise Diagnosis of Lung Diseases
Authors:
Sajjad Saleem,
Muhammad Imran Sharif
Abstract:
The lungs are the essential organs of respiration, and this system is significant in the carbon dioxide and exchange between oxygen that occurs in human life. However, several lung diseases, which include pneumonia, tuberculosis, COVID-19, and lung cancer, are serious healthiness challenges and demand early and precise diagnostics. The methodological study has proposed a new deep learning framewor…
▽ More
The lungs are the essential organs of respiration, and this system is significant in the carbon dioxide and exchange between oxygen that occurs in human life. However, several lung diseases, which include pneumonia, tuberculosis, COVID-19, and lung cancer, are serious healthiness challenges and demand early and precise diagnostics. The methodological study has proposed a new deep learning framework called NASNet-ViT, which effectively incorporates the convolution capability of NASNet with the global attention mechanism capability of Vision Transformer ViT. The proposed model will classify the lung conditions into five classes: Lung cancer, COVID-19, pneumonia, TB, and normal. A sophisticated multi-faceted preprocessing strategy called MixProcessing has been used to improve diagnostic accuracy. This preprocessing combines wavelet transform, adaptive histogram equalization, and morphological filtering techniques. The NASNet-ViT model performs at state of the art, achieving an accuracy of 98.9%, sensitivity of 0.99, an F1-score of 0.989, and specificity of 0.987, outperforming other state of the art architectures such as MixNet-LD, D-ResNet, MobileNet, and ResNet50. The model's efficiency is further emphasized by its compact size, 25.6 MB, and a low computational time of 12.4 seconds, hence suitable for real-time, clinically constrained environments. These results reflect the high-quality capability of NASNet-ViT in extracting meaningful features and recognizing various types of lung diseases with very high accuracy. This work contributes to medical image analysis by providing a robust and scalable solution for diagnostics in lung diseases.
△ Less
Submitted 27 February, 2025;
originally announced February 2025.
-
From Statistical Methods to Pre-Trained Models; A Survey on Automatic Speech Recognition for Resource Scarce Urdu Language
Authors:
Muhammad Sharif,
Zeeshan Abbas,
Jiangyan Yi,
Chenglin Liu
Abstract:
Automatic Speech Recognition (ASR) technology has witnessed significant advancements in recent years, revolutionizing human-computer interactions. While major languages have benefited from these developments, lesser-resourced languages like Urdu face unique challenges. This paper provides an extensive exploration of the dynamic landscape of ASR research, focusing particularly on the resource-const…
▽ More
Automatic Speech Recognition (ASR) technology has witnessed significant advancements in recent years, revolutionizing human-computer interactions. While major languages have benefited from these developments, lesser-resourced languages like Urdu face unique challenges. This paper provides an extensive exploration of the dynamic landscape of ASR research, focusing particularly on the resource-constrained Urdu language, which is widely spoken across South Asian nations. It outlines current research trends, technological advancements, and potential directions for future studies in Urdu ASR, aiming to pave the way for forthcoming researchers interested in this domain. By leveraging contemporary technologies, analyzing existing datasets, and evaluating effective algorithms and tools, the paper seeks to shed light on the unique challenges and opportunities associated with Urdu language processing and its integration into the broader field of speech research.
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
TransResNet: Integrating the Strengths of ViTs and CNNs for High Resolution Medical Image Segmentation via Feature Grafting
Authors:
Muhammad Hamza Sharif,
Dmitry Demidov,
Asif Hanif,
Mohammad Yaqub,
Min Xu
Abstract:
High-resolution images are preferable in medical imaging domain as they significantly improve the diagnostic capability of the underlying method. In particular, high resolution helps substantially in improving automatic image segmentation. However, most of the existing deep learning-based techniques for medical image segmentation are optimized for input images having small spatial dimensions and p…
▽ More
High-resolution images are preferable in medical imaging domain as they significantly improve the diagnostic capability of the underlying method. In particular, high resolution helps substantially in improving automatic image segmentation. However, most of the existing deep learning-based techniques for medical image segmentation are optimized for input images having small spatial dimensions and perform poorly on high-resolution images. To address this shortcoming, we propose a parallel-in-branch architecture called TransResNet, which incorporates Transformer and CNN in a parallel manner to extract features from multi-resolution images independently. In TransResNet, we introduce Cross Grafting Module (CGM), which generates the grafted features, enriched in both global semantic and low-level spatial details, by combining the feature maps from Transformer and CNN branches through fusion and self-attention mechanism. Moreover, we use these grafted features in the decoding process, increasing the information flow for better prediction of the segmentation mask. Extensive experiments on ten datasets demonstrate that TransResNet achieves either state-of-the-art or competitive results on several segmentation tasks, including skin lesion, retinal vessel, and polyp segmentation. The source code and pre-trained models are available at https://github.com/Sharifmhamza/TransResNet.
△ Less
Submitted 1 October, 2024;
originally announced October 2024.
-
Ionospheric Scintillation Forecasting Using Machine Learning
Authors:
Sultan Halawa,
Maryam Alansaari,
Maryam Sharif,
Amel Alhammadi,
Ilias Fernini
Abstract:
This study explores the use of historical data from Global Navigation Satellite System (GNSS) scintillation monitoring receivers to predict the severity of amplitude scintillation, a phenomenon where electron density irregularities in the ionosphere cause fluctuations in GNSS signal power. These fluctuations can be measured using the S4 index, but real-time data is not always available. The resear…
▽ More
This study explores the use of historical data from Global Navigation Satellite System (GNSS) scintillation monitoring receivers to predict the severity of amplitude scintillation, a phenomenon where electron density irregularities in the ionosphere cause fluctuations in GNSS signal power. These fluctuations can be measured using the S4 index, but real-time data is not always available. The research focuses on developing a machine learning (ML) model that can forecast the intensity of amplitude scintillation, categorizing it into low, medium, or high severity levels based on various time and space-related factors. Among six different ML models tested, the XGBoost model emerged as the most effective, demonstrating a remarkable 77% prediction accuracy when trained with a balanced dataset. This work underscores the effectiveness of machine learning in enhancing the reliability and performance of GNSS signals and navigation systems by accurately predicting amplitude scintillation severity.
△ Less
Submitted 28 August, 2024;
originally announced September 2024.
-
EUIS-Net: A Convolutional Neural Network for Efficient Ultrasound Image Segmentation
Authors:
Shahzaib Iqbal,
Hasnat Ahmed,
Muhammad Sharif,
Madiha Hena,
Tariq M. Khan,
Imran Razzak
Abstract:
Segmenting ultrasound images is critical for various medical applications, but it offers significant challenges due to ultrasound images' inherent noise and unpredictability. To address these challenges, we proposed EUIS-Net, a CNN network designed to segment ultrasound images efficiently and precisely. The proposed EUIS-Net utilises four encoder-decoder blocks, resulting in a notable decrease in…
▽ More
Segmenting ultrasound images is critical for various medical applications, but it offers significant challenges due to ultrasound images' inherent noise and unpredictability. To address these challenges, we proposed EUIS-Net, a CNN network designed to segment ultrasound images efficiently and precisely. The proposed EUIS-Net utilises four encoder-decoder blocks, resulting in a notable decrease in computational complexity while achieving excellent performance. The proposed EUIS-Net integrates both channel and spatial attention mechanisms into the bottleneck to improve feature representation and collect significant contextual information. In addition, EUIS-Net incorporates a region-aware attention module in skip connections, which enhances the ability to concentrate on the region of the injury. To enable thorough information exchange across various network blocks, skip connection aggregation is employed from the network's lowermost to the uppermost block. Comprehensive evaluations are conducted on two publicly available ultrasound image segmentation datasets. The proposed EUIS-Net achieved mean IoU and dice scores of 78. 12\%, 85. 42\% and 84. 73\%, 89. 01\% in the BUSI and DDTI datasets, respectively. The findings of our study showcase the substantial capabilities of EUIS-Net for immediate use in clinical settings and its versatility in various ultrasound imaging tasks.
△ Less
Submitted 22 August, 2024;
originally announced August 2024.
-
Power Spectral Density-Based Resting-State EEG Classification of First-Episode Psychosis
Authors:
Sadi Md. Redwan,
Md Palash Uddin,
Anwaar Ulhaq,
Muhammad Imran Sharif
Abstract:
Historically, the analysis of stimulus-dependent time-frequency patterns has been the cornerstone of most electroencephalography (EEG) studies. The abnormal oscillations in high-frequency waves associated with psychotic disorders during sensory and cognitive tasks have been studied many times. However, any significant dissimilarity in the resting-state low-frequency bands is yet to be established.…
▽ More
Historically, the analysis of stimulus-dependent time-frequency patterns has been the cornerstone of most electroencephalography (EEG) studies. The abnormal oscillations in high-frequency waves associated with psychotic disorders during sensory and cognitive tasks have been studied many times. However, any significant dissimilarity in the resting-state low-frequency bands is yet to be established. Spectral analysis of the alpha and delta band waves shows the effectiveness of stimulus-independent EEG in identifying the abnormal activity patterns of pathological brains. A generalized model incorporating multiple frequency bands should be more efficient in associating potential EEG biomarkers with First-Episode Psychosis (FEP), leading to an accurate diagnosis. We explore multiple machine-learning methods, including random-forest, support vector machine, and Gaussian Process Classifier (GPC), to demonstrate the practicality of resting-state Power Spectral Density (PSD) to distinguish patients of FEP from healthy controls. A comprehensive discussion of our preprocessing methods for PSD analysis and a detailed comparison of different models are included in this paper. The GPC model outperforms the other models with a specificity of 95.78% to show that PSD can be used as an effective feature extraction technique for analyzing and classifying resting-state EEG signals of psychiatric disorders.
△ Less
Submitted 22 November, 2022;
originally announced January 2023.
-
A Network Theory Investigation into the Altered Resting State Functional Connectivity in Attention-Deficit Hyperactivity Disorder
Authors:
Sadi Md. Redwan,
Md Palash Uddin,
Muhammad Imran Sharif,
Anwaar Ulhaq
Abstract:
In the last two decades, functional magnetic resonance imaging (fMRI) has emerged as one of the most effective technologies in clinical research of the human brain. fMRI allows researchers to study healthy and pathological brains while they perform various neuropsychological functions. Beyond task-related activations, the human brain has some intrinsic activity at a task-negative (resting) state t…
▽ More
In the last two decades, functional magnetic resonance imaging (fMRI) has emerged as one of the most effective technologies in clinical research of the human brain. fMRI allows researchers to study healthy and pathological brains while they perform various neuropsychological functions. Beyond task-related activations, the human brain has some intrinsic activity at a task-negative (resting) state that surprisingly consumes a lot of energy to support communication among neurons. Recent neuroimaging research has also seen an increase in modeling and analyzing brain activity in terms of a graph or network. Since graph models facilitate a systems-theoretic explanation of the brain, they have become increasingly relevant with advances in network science and the popularization of complex systems theory. The purpose of this study is to look into the abnormalities in resting brain functions in adults with Attention Deficit Hyperactivity Disorder (ADHD). The primary goal is to investigate resting-state functional connectivity (FC), which can be construed as a significant temporal coincidence in blood-oxygen-level dependent (BOLD) signals between functionally related brain regions in the absence of any stimulus or task. When compared to healthy controls, ADHD patients have lower average connectivity in the Supramarginal Gyrus and Superior Parietal Lobule, but higher connectivity in the Lateral Occipital Cortex and Inferior Temporal Gyrus. We also hypothesize that the network organization of default mode and dorsal attention regions is abnormal in ADHD patients.
△ Less
Submitted 22 November, 2022;
originally announced December 2022.
-
AIM 2022 Challenge on Instagram Filter Removal: Methods and Results
Authors:
Furkan Kınlı,
Sami Menteş,
Barış Özcan,
Furkan Kıraç,
Radu Timofte,
Yi Zuo,
Zitao Wang,
Xiaowen Zhang,
Yu Zhu,
Chenghua Li,
Cong Leng,
Jian Cheng,
Shuai Liu,
Chaoyu Feng,
Furui Bai,
Xiaotao Wang,
Lei Lei,
Tianzhi Ma,
Zihan Gao,
Wenxin He,
Woon-Ha Yeo,
Wang-Taek Oh,
Young-Il Kim,
Han-Cheol Ryu,
Gang He
, et al. (8 additional authors not shown)
Abstract:
This paper introduces the methods and the results of AIM 2022 challenge on Instagram Filter Removal. Social media filters transform the images by consecutive non-linear operations, and the feature maps of the original content may be interpolated into a different domain. This reduces the overall performance of the recent deep learning strategies. The main goal of this challenge is to produce realis…
▽ More
This paper introduces the methods and the results of AIM 2022 challenge on Instagram Filter Removal. Social media filters transform the images by consecutive non-linear operations, and the feature maps of the original content may be interpolated into a different domain. This reduces the overall performance of the recent deep learning strategies. The main goal of this challenge is to produce realistic and visually plausible images where the impact of the filters applied is mitigated while preserving the content. The proposed solutions are ranked in terms of the PSNR value with respect to the original images. There are two prior studies on this task as the baseline, and a total of 9 teams have competed in the final phase of the challenge. The comparison of qualitative results of the proposed solutions and the benchmark for the challenge are presented in this report.
△ Less
Submitted 17 October, 2022;
originally announced October 2022.
-
SAGAN: Adversarial Spatial-asymmetric Attention for Noisy Nona-Bayer Reconstruction
Authors:
S M A Sharif,
Rizwan Ali Naqvi,
Mithun Biswas
Abstract:
Nona-Bayer colour filter array (CFA) pattern is considered one of the most viable alternatives to traditional Bayer patterns. Despite the substantial advantages, such non-Bayer CFA patterns are susceptible to produce visual artefacts while reconstructing RGB images from noisy sensor data. This study addresses the challenges of learning RGB image reconstruction from noisy Nona-Bayer CFA comprehensi…
▽ More
Nona-Bayer colour filter array (CFA) pattern is considered one of the most viable alternatives to traditional Bayer patterns. Despite the substantial advantages, such non-Bayer CFA patterns are susceptible to produce visual artefacts while reconstructing RGB images from noisy sensor data. This study addresses the challenges of learning RGB image reconstruction from noisy Nona-Bayer CFA comprehensively. We propose a novel spatial-asymmetric attention module to jointly learn bi-direction transformation and large-kernel global attention to reduce the visual artefacts. We combine our proposed module with adversarial learning to produce plausible images from Nona-Bayer CFA. The feasibility of the proposed method has been verified and compared with the state-of-the-art image reconstruction method. The experiments reveal that the proposed method can reconstruct RGB images from noisy Nona-Bayer CFA without producing any visually disturbing artefacts. Also, it can outperform the state-of-the-art image reconstruction method in both qualitative and quantitative comparison. Code available: https://github.com/sharif-apu/SAGAN_BMVC21.
△ Less
Submitted 18 October, 2021; v1 submitted 16 October, 2021;
originally announced October 2021.
-
Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control
Authors:
Mehrshad Zandigohar,
Mo Han,
Mohammadreza Sharif,
Sezen Yagmur Gunay,
Mariusz P. Furmanek,
Mathew Yarossi,
Paolo Bonato,
Cagdas Onal,
Taskin Padir,
Deniz Erdogmus,
Gunar Schirner
Abstract:
Objective: For transradial amputees, robotic prosthetic hands promise to regain the capability to perform daily living activities. Current control methods based on physiological signals such as electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, muscle fatigue, and many more. Vision sensors are a major source of information about the environment state and…
▽ More
Objective: For transradial amputees, robotic prosthetic hands promise to regain the capability to perform daily living activities. Current control methods based on physiological signals such as electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, muscle fatigue, and many more. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often due to object occlusion, lighting changes, etc. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities. Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye-gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train neural network components. Results: Our results indicate that, on average, fusion improves the instantaneous upcoming grasp type classification accuracy while in the reaching phase by 13.66% and 14.8%, relative to EMG (81.64% non-fused) and visual evidence (80.5% non-fused) individually, resulting in an overall fusion accuracy of 95.3%. Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.
△ Less
Submitted 27 February, 2024; v1 submitted 8 April, 2021;
originally announced April 2021.
-
Accurate and Rapid Diagnosis of COVID-19 Pneumonia with Batch Effect Removal of Chest CT-Scans and Interpretable Artificial Intelligence
Authors:
Rassa Ghavami Modegh,
Mehrab Hamidi,
Saeed Masoudian,
Amir Mohseni,
Hamzeh Lotfalinezhad,
Mohammad Ali Kazemi,
Behnaz Moradi,
Mahyar Ghafoori,
Omid Motamedi,
Omid Pournik,
Kiara Rezaei-Kalantari,
Amirreza Manteghinezhad,
Shaghayegh Haghjooy Javanmard,
Fateme Abdoli Nezhad,
Ahmad Enhesari,
Mohammad Saeed Kheyrkhah,
Razieh Eghtesadi,
Javid Azadbakht,
Akbar Aliasgharzadeh,
Mohammad Reza Sharif,
Ali Khaleghi,
Abbas Foroutan,
Hossein Ghanaati,
Hamed Dashti,
Hamid R. Rabiee
Abstract:
COVID-19 is a virus with high transmission rate that demands rapid identification of the infected patients to reduce the spread of the disease. The current gold-standard test, Reverse-Transcription Polymerase Chain Reaction (RT-PCR), has a high rate of false negatives. Diagnosing from CT-scan images as a more accurate alternative has the challenge of distinguishing COVID-19 from other pneumonia di…
▽ More
COVID-19 is a virus with high transmission rate that demands rapid identification of the infected patients to reduce the spread of the disease. The current gold-standard test, Reverse-Transcription Polymerase Chain Reaction (RT-PCR), has a high rate of false negatives. Diagnosing from CT-scan images as a more accurate alternative has the challenge of distinguishing COVID-19 from other pneumonia diseases. Artificial intelligence can help radiologists and physicians to accelerate the process of diagnosis, increase its accuracy, and measure the severity of the disease. We designed a new interpretable deep neural network to distinguish healthy people, patients with COVID-19, and patients with other pneumonia diseases from axial lung CT-scan images. Our model also detects the infected areas and calculates the percentage of the infected lung volume. We first preprocessed the images to eliminate the batch effects of different devices, and then adopted a weakly supervised method to train the model without having any tags for the infected parts. We trained and evaluated the model on a large dataset of 3359 samples from 6 different medical centers. The model reached sensitivities of 97.75% and 98.15%, and specificities of 87% and 81.03% in separating healthy people from the diseased and COVID-19 from other diseases, respectively. It also demonstrated similar performance for 1435 samples from 6 different medical centers which proves its generalizability. The performance of the model on a large diverse dataset, its generalizability, and interpretability makes it suitable to be used as a reliable diagnostic system.
△ Less
Submitted 8 January, 2021; v1 submitted 23 November, 2020;
originally announced November 2020.