Search | arXiv e-print repository

arXiv:2504.20093 [pdf]

Self-Healing Software Systems: Lessons from Nature, Powered by AI

Authors: Mohammad Baqar, Rajat Khanda, Saba Naqvi

Abstract: As modern software systems grow in complexity and scale, their ability to autonomously detect, diagnose, and recover from failures becomes increasingly vital. Drawing inspiration from biological healing - where the human body detects damage, signals the brain, and activates targeted recovery - this paper explores the concept of self-healing software driven by artificial intelligence. We propose a novel framework that mimics this biological model: system observability tools serve as sensory inputs, AI models function as the cognitive core for diagnosis and repair, and healing agents apply targeted code and test modifications. By combining log analysis, static code inspection, and AI-driven generation of patches or test updates, our approach aims to reduce downtime, accelerate debugging, and enhance software resilience. We evaluate the effectiveness of this model through case studies and simulations, comparing it against traditional manual debugging and recovery workflows. This work paves the way toward intelligent, adaptive and self-reliant software systems capable of continuous healing, akin to living organisms.

Submitted 25 April, 2025; originally announced April 2025.

arXiv:2501.15293 [pdf]

Deep Learning in Early Alzheimer's disease's Detection: A Comprehensive Survey of Classification, Segmentation, and Feature Extraction Methods

Authors: Rubab Hafeez, Sadia Waheed, Syeda Aleena Naqvi, Fahad Maqbool, Amna Sarwar, Sajjad Saleem, Muhammad Imran Sharif, Kamran Siddique, Zahid Akhtar

Abstract: Alzheimer's disease is a deadly neurological condition, impairing important memory and brain functions. Alzheimer's disease promotes brain shrinkage, ultimately leading to dementia. Dementia diagnosis typically takes 2.8 to 4.4 years after the first clinical indication. Advancements in computing and information technology have led to many techniques for studying Alzheimer's disease. Early identification and therapy are crucial for preventing Alzheimer's disease, as early-onset dementia hits people before the age of 65, while late-onset dementia occurs after this age. According to the 2015 World Alzheimer Report, there are 46.8 million individuals worldwide suffering from dementia, with an anticipated 74.7 million by 2030 and 131.5 million by 2050. Deep Learning has outperformed conventional Machine Learning techniques by identifying intricate structures in high-dimensional data. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have achieved an accuracy of up to 96.0% for Alzheimer's disease classification, and 84.2% for mild cognitive impairment (MCI) conversion prediction. There have been few literature surveys available on applying ML to predict dementia, lacking in congenital observations. However, this survey has focused on a specific data channel for dementia detection. This study evaluated Deep Learning algorithms for early Alzheimer's disease detection, using openly accessible datasets, feature segmentation, and classification methods. This article also identifies research gaps and limitations in detecting Alzheimer's disease, which can inform future research.

Submitted 20 April, 2025; v1 submitted 25 January, 2025; originally announced January 2025.

Comments: 22 pages

arXiv:2501.02516

A Frequency-aware Augmentation Network for Mental Disorders Assessment from Audio

Authors: Shuanglin Li, Siyang Song, Rajesh Nair, Syed Mohsen Naqvi

Abstract: Depression and Attention Deficit Hyperactivity Disorder (ADHD) stand out as common mental health challenges today. In affective computing, speech signals serve as effective biomarkers for mental disorder assessment. Current research, relying on labor-intensive hand-crafted features or simplistic time-frequency representations, often overlooks critical details by not accounting for the differential impacts of various frequency bands and temporal fluctuations. Therefore, we propose a frequency-aware augmentation network with dynamic convolution for depression and ADHD assessment. In the proposed method, the spectrogram is used as the input feature, and a multi-scale convolution is adopted to help the network focus on discriminative frequency bands related to mental disorders. A dynamic convolution is also designed to aggregate multiple convolution kernels dynamically based upon their attentions which are input-independent to capture dynamic information. Finally, a feature augmentation block is proposed to enhance the feature representation ability and make full use of the captured information. Experimental results on AVEC 2014 and a self-recorded ADHD dataset demonstrate the robustness of our method: an RMSE of 9.23 was attained for estimating depression severity, along with an accuracy of 89.8% in detecting ADHD.

Submitted 4 March, 2025; v1 submitted 5 January, 2025; originally announced January 2025.

Comments: We have found some technical problems that will take considerable time to address, and some parts still need to be completed

arXiv:2501.02512 [pdf, other]

Efficient Long Speech Sequence Modelling for Time-Domain Depression Level Estimation

Authors: Shuanglin Li, Zhijie Xie, Syed Mohsen Naqvi

Abstract: Depression significantly affects emotions, thoughts, and daily activities. Recent research indicates that speech signals contain vital cues about depression, sparking interest in audio-based deep-learning methods for estimating its severity. However, most methods rely on time-frequency representations of speech which have recently been criticized for their limitations due to the loss of information when performing time-frequency projections, e.g. Fourier transform, and Mel-scale transformation. Furthermore, segmenting real-world speech into brief intervals risks losing critical interconnections between recordings. Additionally, such an approach may not adequately reflect real-world scenarios, as individuals with depression often pause and slow down in their conversations and interactions. Building on these observations, we present an efficient method for depression level estimation using long speech signals in the time domain. The proposed method leverages a state space model coupled with the dual-path structure-based long sequence modelling module and temporal external attention module to reconstruct and enhance the detection of depression-related cues hidden in the raw audio waveforms. Experimental results on the AVEC2013 and AVEC2014 datasets show promising results in capturing consequential long-sequence depression cues and demonstrate outstanding performance over the state-of-the-art.

Submitted 5 January, 2025; originally announced January 2025.

arXiv:2409.05420 [pdf, other]

AD-Net: Attention-based dilated convolutional residual network with guided decoder for robust skin lesion segmentation

Authors: Asim Naveed, Syed S. Naqvi, Tariq M. Khan, Shahzaib Iqbal, M. Yaqoob Wani, Haroon Ahmed Khan

Abstract: In computer-aided diagnosis tools employed for skin cancer treatment and early diagnosis, skin lesion segmentation is important. However, achieving precise segmentation is challenging due to inherent variations in appearance, contrast, texture, and blurry lesion boundaries. This research presents a robust approach utilizing a dilated convolutional residual network, which incorporates an attention-based spatial feature enhancement block (ASFEB) and employs a guided decoder strategy. In each dilated convolutional residual block, dilated convolution is employed to broaden the receptive field with varying dilation rates. To improve the spatial feature information of the encoder, we employed an attention-based spatial feature enhancement block in the skip connections. The ASFEB in our proposed method combines feature maps obtained from average and maximum-pooling operations. These combined features are then weighted using the active outcome of global average pooling and convolution operations. Additionally, we have incorporated a guided decoder strategy, where each decoder block is optimized using an individual loss function to enhance the feature learning process in the proposed AD-Net. The proposed AD-Net presents a significant benefit by necessitating fewer model parameters compared to its peer methods. This reduction in parameters directly impacts the number of labeled data required for training, facilitating faster convergence during the training process. The effectiveness of the proposed AD-Net was evaluated using four public benchmark datasets. We conducted a Wilcoxon signed-rank test to verify the efficiency of the AD-Net. The outcomes suggest that our method surpasses other cutting-edge methods in performance, even without the implementation of data augmentation strategies.

Submitted 9 September, 2024; originally announced September 2024.
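
Note on the AD-Net abstract above: it states that dilated convolution broadens the receptive field with varying dilation rates. The sketch below illustrates that arithmetic for stacked stride-1 convolutions along one axis. The function names and the dilation rates in the example are hypothetical; the abstract does not list the rates AD-Net actually uses.

```python
from __future__ import annotations


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field of stacked stride-1 dilated convolutions along one axis."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by (effective kernel size - 1).
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


if __name__ == "__main__":
    # Hypothetical rates, chosen only to show the effect of dilation.
    print(receptive_field(3, [1, 1, 1]))  # 7
    print(receptive_field(3, [1, 2, 4]))  # 15
```

With the same three 3x3 layers and the same parameter count, the dilated stack sees 15 pixels per axis instead of 7.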

arXiv:2409.03367 [pdf, other]

TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation

Authors: Shahzaib Iqbal, Tariq M. Khan, Syed S. Naqvi, Asim Naveed, Erik Meijering

Abstract: Deep learning has shown great potential for automated medical image segmentation to improve the precision and speed of disease diagnostics. However, the task presents significant difficulties due to variations in the scale, shape, texture, and contrast of the pathologies. Traditional convolutional neural network (CNN) models have certain limitations when it comes to effectively modelling multiscale context information and facilitating information interaction between skip connections across levels. To overcome these limitations, a novel deep learning architecture is introduced for medical image segmentation, taking advantage of CNNs and vision transformers. Our proposed model, named TBConvL-Net, involves a hybrid network that combines the local features of a CNN encoder-decoder architecture with long-range and temporal dependencies using biconvolutional long-short-term memory (LSTM) networks and vision transformers (ViT). This enables the model to capture contextual channel relationships in the data and account for the uncertainty of segmentation over time. Additionally, we introduce a novel composite loss function that considers both the segmentation robustness and the boundary agreement of the predicted output with the gold standard. Our proposed model shows consistent improvement over the state of the art on ten publicly available datasets of seven different medical imaging modalities.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.02274 [pdf]

doi 10.1016/j.nsa.2023.102835

ADHD diagnosis based on action characteristics recorded in videos using machine learning

Authors: Yichun Li, Syed Mohsen Naqvi, Rajesh Nair

Abstract: Demand for ADHD diagnosis and treatment is increasing significantly and the existing services are unable to meet the demand in a timely manner. In this work, we introduce a novel action recognition method for ADHD diagnosis by identifying and analysing raw video recordings. Our main contributions include 1) designing and implementing a test focusing on the attention and hyperactivity/impulsivity of participants, recorded through three cameras; 2) implementing a novel machine learning ADHD diagnosis system based on action recognition neural networks for the first time; 3) proposing classification criteria to provide diagnosis results and analysis of ADHD action characteristics.

Submitted 3 September, 2024; originally announced September 2024.

Comments: Neuroscience Applied

arXiv:2409.02261 [pdf, ps, other]

doi 10.14428/esann/2023.ES2023-17

Action-Based ADHD Diagnosis in Video

Authors: Yichun Li, Yuxing Yang, Syed Mohsen Naqvi

Abstract: Attention Deficit Hyperactivity Disorder (ADHD) causes significant impairment in various domains. Early diagnosis of ADHD and treatment could significantly improve the quality of life and functioning. Recently, machine learning methods have improved the accuracy and efficiency of the ADHD diagnosis process. However, the cost of the equipment and trained staff required by the existing methods is generally high. Therefore, we introduce the video-based frame-level action recognition network to ADHD diagnosis for the first time. We also record a real multi-modal ADHD dataset and extract three action classes from the video modality for ADHD diagnosis. The data from the whole process have been reported to CNTW-NHS Foundation Trust, will be reviewed by medical consultants/professionals, and will be made public in due course.

Submitted 3 September, 2024; originally announced September 2024.

Comments: 31st European Symposium on Artificial Neural Networks

arXiv:2409.02243 [pdf, other]

A Novel Audio-Visual Information Fusion System for Mental Disorders Detection

Authors: Yichun Li, Shuanglin Li, Syed Mohsen Naqvi

Abstract: Mental disorders are among the foremost contributors to the global healthcare challenge. Research indicates that timely diagnosis and intervention are vital in treating various mental disorders. However, the early somatization symptoms of certain mental disorders may not be immediately evident, often resulting in their oversight and misdiagnosis. Additionally, traditional diagnosis methods are time-consuming and costly. Deep learning methods based on fMRI and EEG have improved the efficiency of the mental disorder detection process. However, the cost of the equipment and trained staff is generally high. Moreover, most systems are only trained for a specific mental disorder and are not general-purpose. Recently, physiological studies have shown that there are some speech and facial-related symptoms in a few mental disorders (e.g., depression and ADHD). In this paper, we focus on the emotional expression features of mental disorders and introduce a multimodal mental disorder diagnosis system based on audio-visual information input. Our proposed system is based on spatial-temporal attention networks and innovatively uses a less computationally intensive pre-trained audio recognition network to fine-tune the video recognition module for better results. We also apply the unified system to multiple mental disorders (ADHD and depression) for the first time. The proposed system achieves over 80% accuracy on the real multimodal ADHD dataset and achieves state-of-the-art results on the depression dataset AVEC 2014.

Submitted 3 September, 2024; originally announced September 2024.

Comments: 27th International Conference on Information Fusion (FUSION)

arXiv:2407.02871 [pdf, other]

LMBF-Net: A Lightweight Multipath Bidirectional Focal Attention Network for Multifeatures Segmentation

Authors: Tariq M Khan, Shahzaib Iqbal, Syed S. Naqvi, Imran Razzak, Erik Meijering

Abstract: Retinal diseases can cause irreversible vision loss in both eyes if not diagnosed and treated early. Since retinal diseases are so complicated, retinal imaging is likely to show two or more abnormalities. Current deep learning techniques for segmenting retinal images with many labels and attributes have poor detection accuracy and generalisability. This paper presents a multipath convolutional neural network for multifeature segmentation. The proposed network is lightweight and spatially sensitive to information. A patch-based implementation is used to extract local image features, and focal modulation attention blocks are incorporated between the encoder and the decoder for improved segmentation. Filter optimisation is used to prevent filter overlaps and speed up model convergence. A combination of convolution operations and group convolution operations is used to reduce computational costs. This is the first robust and generalisable network capable of segmenting multiple features of fundus images (including retinal vessels, microaneurysms, optic discs, haemorrhages, hard exudates, and soft exudates). The results of our experimental evaluation on more than ten publicly available datasets with multiple features show that the proposed network outperforms recent networks despite having a small number of learnable parameters.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2312.10585 [pdf, ps, other]

ESDMR-Net: A Lightweight Network With Expand-Squeeze and Dual Multiscale Residual Connections for Medical Image Segmentation

Authors: Tariq M Khan, Syed S. Naqvi, Erik Meijering

Abstract: Segmentation is an important task in a wide range of computer vision applications, including medical image analysis. Recent years have seen an increase in the complexity of medical image segmentation approaches based on sophisticated convolutional neural network architectures. This progress has led to incremental enhancements in performance on widely recognised benchmark datasets. However, most of the existing approaches are computationally demanding, which limits their practical applicability. This paper presents an expand-squeeze dual multiscale residual network (ESDMR-Net), which is a fully convolutional network that is particularly well-suited for resource-constrained computing hardware such as mobile devices. ESDMR-Net focuses on extracting multiscale features, enabling the learning of contextual dependencies among semantically distinct features. The ESDMR-Net architecture allows dual-stream information flow within encoder-decoder pairs. The expansion operation (depthwise separable convolution) makes all of the rich features with multiscale information available to the squeeze operation (bottleneck layer), which then extracts the necessary information for the segmentation task. The Expand-Squeeze (ES) block helps the network pay more attention to under-represented classes, which contributes to improved segmentation accuracy. To enhance the flow of information across multiple resolutions or scales, we integrated dual multiscale residual (DMR) blocks into the skip connection. This integration enables the decoder to access features from various levels of abstraction, ultimately resulting in more comprehensive feature representations. We present experiments on seven datasets from five distinct examples of applications. Our model achieved the best results despite having significantly fewer trainable parameters, with a reduction of two or even three orders of magnitude.

Submitted 16 December, 2023; originally announced December 2023.
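
Note on the ESDMR-Net abstract above: it names depthwise separable convolution as the expansion operation in a network aimed at resource-constrained hardware. The sketch below compares weight counts for a standard convolution and a depthwise separable one, ignoring biases. The layer sizes in the example are hypothetical and are not taken from the paper.

```python
def standard_conv_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a standard 2D convolution (bias terms ignored)."""
    return kernel_size * kernel_size * c_in * c_out


def depthwise_separable_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out                      # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
    print(standard_conv_params(3, 64, 128))        # 73728
    print(depthwise_separable_params(3, 64, 128))  # 8768
```

For this example layer the separable form needs roughly 8.4 times fewer weights, which is the general mechanism behind such lightweight designs.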

arXiv:2312.02699 [pdf, other]

Enhancing Vehicle Entrance and Parking Management: Deep Learning Solutions for Efficiency and Security

Authors: Muhammad Umer Ramzan, Usman Ali, Syed Haider Abbas Naqvi, Zeeshan Aslam, Tehseen, Husnain Ali, Muhammad Faheem

Abstract: The auto-management of vehicle entrance and parking in any organization is a complex challenge encompassing record-keeping, efficiency, and security concerns. Manual methods for tracking vehicles and finding parking spaces are slow and a waste of time. To solve the problem of auto management of vehicle entrance and parking, we have utilized state-of-the-art deep learning models and automated the process of vehicle entrance and parking into any organization. To ensure security, our system integrated vehicle detection, license number plate verification, and face detection and recognition models to ensure that the person and vehicle are registered with the organization. We have trained multiple deep-learning models for vehicle detection, license number plate detection, face detection, and recognition; however, the YOLOv8n model outperformed all the other models. Furthermore, license plate recognition is facilitated by Google's Tesseract-OCR Engine. By integrating these technologies, the system offers efficient vehicle detection, precise identification, streamlined record keeping, and optimized parking slot allocation in buildings, thereby enhancing convenience, accuracy, and security. Future research opportunities lie in fine-tuning system performance for a wide range of real-world applications.

Submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted for publication in the 25th International Multitopic Conference (INMIC) IEEE 2023, 6 Pages, 3 figures

arXiv:2310.06854 [pdf, other]

Learning with Noisy Labels for Human Fall Events Classification: Joint Cooperative Training with Trinity Networks

Authors: Leiyu Xie, Yang Sun, Syed Mohsen Naqvi

Abstract: With the increasing ageing population, fall events classification has drawn much research attention. In the development of deep learning, the quality of data labels is crucial. Most of the datasets are labelled automatically or semi-automatically, and the samples may be mislabeled, which constrains the performance of Deep Neural Networks (DNNs). Recent research on noisy label learning confirms that neural networks first focus on the clean and simple instances and then follow the noisy and hard instances in the training stage. To address the learning with noisy label problem and protect the human subjects' privacy, we propose a simple but effective approach named Joint Cooperative training with Trinity Networks (JoCoT). To mitigate the privacy issue, human skeleton data are used. The robustness and performance of the noisy label learning framework are improved by using the two teacher modules and one student module in the proposed JoCoT. To mitigate the incorrect selections, the predictions from the teacher modules are applied with the consensus-based method to guide the student module training. The performance evaluation on the widely used UP-Fall dataset and comparison with the state-of-the-art confirm the effectiveness of the proposed JoCoT in high noise rates. Precisely, JoCoT outperforms the state-of-the-art by 5.17% and 3.35% with the averaged pairflip and symmetric noises, respectively.

Submitted 27 September, 2023; originally announced October 2023.
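
Note on the JoCoT abstract above: it reports results under pairflip and symmetric label noise. The sketch below shows how these two noise types are commonly defined in the noisy-label literature, as class transition matrices. This is an assumption about the convention, not the paper's code; the function names, class count, and noise rate are hypothetical.

```python
import random


def symmetric_noise_matrix(num_classes: int, rate: float) -> list:
    """Row i gives P(noisy label | true label i); flips go uniformly to the other classes."""
    off_diagonal = rate / (num_classes - 1)
    return [[1.0 - rate if i == j else off_diagonal for j in range(num_classes)]
            for i in range(num_classes)]


def pairflip_noise_matrix(num_classes: int, rate: float) -> list:
    """Each class flips only to the next class (cyclically) with probability `rate`."""
    matrix = [[0.0] * num_classes for _ in range(num_classes)]
    for i in range(num_classes):
        matrix[i][i] = 1.0 - rate
        matrix[i][(i + 1) % num_classes] = rate
    return matrix


def corrupt_labels(labels: list, matrix: list, seed: int = 0) -> list:
    """Sample a noisy label for each clean label from its row of the transition matrix."""
    rng = random.Random(seed)
    classes = list(range(len(matrix)))
    return [rng.choices(classes, weights=matrix[y])[0] for y in labels]


if __name__ == "__main__":
    # Hypothetical setup: 3 classes, 40% noise rate.
    print(pairflip_noise_matrix(3, 0.4))
    print(corrupt_labels([0, 1, 2, 0, 1, 2], symmetric_noise_matrix(3, 0.4)))
```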

arXiv:2309.15635 [pdf, other]

Position and Orientation-Aware One-Shot Learning for Medical Action Recognition from Signal Data

Authors: Leiyu Xie, Yuxing Yang, Zeyu Fu, Syed Mohsen Naqvi

Abstract: In this work, we propose a position and orientation-aware one-shot learning framework for medical action recognition from signal data. The proposed framework comprises two stages and each stage includes signal-level image generation (SIG), cross-attention (CsA), dynamic time warping (DTW) modules and the information fusion between the proposed privacy-preserved position and orientation features. The proposed SIG method aims to transform the raw skeleton data into privacy-preserved features for training. The CsA module is developed to guide the network in reducing medical action recognition bias and focusing more on important human body parts for each specific action, aimed at addressing similar medical action related issues. Moreover, the DTW module is employed to minimize temporal mismatching between instances and further improve model performance. Furthermore, the proposed privacy-preserved orientation-level features are utilized to assist the position-level features in both of the two stages for enhancing medical action recognition performance. Extensive experimental results on the widely-used and well-known NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets all demonstrate the effectiveness of the proposed method, which outperforms the other state-of-the-art methods with general dataset partitioning by 2.7%, 6.2% and 4.1%, respectively.

Submitted 27 September, 2023; originally announced September 2023.
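
Note on the one-shot learning abstract above: it uses a dynamic time warping (DTW) module to reduce temporal mismatch between instances. The sketch below is the classic dynamic-programming DTW distance on two 1-D sequences, shown only to illustrate the idea. The paper's module operates inside a network on learned features and may differ; the function name and inputs here are hypothetical.

```python
def dtw_distance(a: list, b: list) -> float:
    """Classic DTW: minimum cumulative |a[i] - b[j]| over monotonic alignments."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence is the first one with a repeated sample ("slower" motion).
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
    print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

The first example shows why DTW suits actions performed at different speeds: a repeated sample adds no cost once the sequences are warped into alignment.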

arXiv:2309.04968 [pdf, other]

LMBiS-Net: A Lightweight Multipath Bidirectional Skip Connection based CNN for Retinal Blood Vessel Segmentation

Authors: Mufassir M. Abbasi, Shahzaib Iqbal, Asim Naveed, Tariq M. Khan, Syed S. Naqvi, Wajeeha Khalid

Abstract: Blinding eye diseases are often correlated with altered retinal morphology, which can be clinically identified by segmenting retinal structures in fundus images. However, current methodologies often fall short in accurately segmenting delicate vessels. Although deep learning has shown promise in medical image segmentation, its reliance on repeated convolution and pooling operations can hinder the representation of edge information, ultimately limiting overall segmentation accuracy. In this paper, we propose a lightweight pixel-level CNN named LMBiS-Net for the segmentation of retinal vessels with an exceptionally low number of learnable parameters (only 0.172 M). The network uses multipath feature extraction blocks and incorporates bidirectional skip connections for the information flow between the encoder and decoder. Additionally, we have optimized the efficiency of the model by carefully selecting the number of filters to avoid filter overlap. This optimization significantly reduces training time and enhances computational efficiency. To assess the robustness and generalizability of LMBiS-Net, we performed comprehensive evaluations on various aspects of retinal images. Specifically, the model was subjected to rigorous tests to accurately segment retinal vessels, which play a vital role in ophthalmological diagnosis and treatment. By focusing on the retinal blood vessels, we were able to thoroughly analyze the performance and effectiveness of the LMBiS-Net model. The results of our tests demonstrate that LMBiS-Net is not only robust and generalizable but also capable of maintaining high levels of segmentation accuracy. These characteristics highlight the potential of LMBiS-Net as an efficient tool for high-speed and accurate segmentation of retinal images in various clinical applications.

Submitted 10 September, 2023; originally announced September 2023.

arXiv:2308.10192 [pdf, ps, other]

EDDense-Net: Fully Dense Encoder Decoder Network for Joint Segmentation of Optic Cup and Disc

Authors: Mehwish Mehmood, Khuram Naveed, Khursheed Aurangzeb, Haroon Ahmed Khan, Musaed Alhussein, Syed Saud Naqvi

Abstract: Glaucoma is an eye disease that causes damage to the optic nerve, which can lead to visual loss and permanent blindness. Early glaucoma detection is therefore critical in order to avoid permanent blindness. The estimation of the cup-to-disc ratio (CDR) during an examination of the optic disc (OD) is used for the diagnosis of glaucoma. In this paper, we present the EDDense-Net segmentation network for the joint segmentation of the optic cup (OC) and OD. The encoder and decoder in this network are made up of dense blocks with a grouped convolutional layer in each block, allowing the network to acquire and convey spatial information from the image while simultaneously reducing the network's complexity. To reduce spatial information loss, the optimal number of filters was utilised in all convolution layers. In semantic segmentation, dice pixel classification is employed in the decoder to alleviate the problem of class imbalance. The proposed network was evaluated on two publicly available datasets where it outperformed existing state-of-the-art methods in terms of accuracy and efficiency. For the diagnosis and analysis of glaucoma, this method can be used as a second opinion system to assist medical ophthalmologists.

Submitted 23 November, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

arXiv:2306.07300 [pdf, other]

Progressive Class-Wise Attention (PCA) Approach for Diagnosing Skin Lesions

Authors: Asim Naveed, Syed S. Naqvi, Tariq M. Khan, Imran Razzak

Abstract: Skin cancer holds the highest incidence rate among all cancers globally. The importance of early detection cannot be overstated, as late-stage cases can be lethal. Classifying skin lesions, however, presents several challenges due to the many variations they can exhibit, such as differences in colour, shape, and size, significant variation within the same class, and notable similarities between different classes. This paper introduces a novel class-wise attention technique that equally regards each class while unearthing more specific details about skin lesions. This attention mechanism is progressively used to amalgamate discriminative feature details from multiple scales. The introduced technique demonstrated impressive performance, surpassing more than 15 cutting-edge methods including the winners of the HAM10000 and ISIC 2019 leaderboards. It achieved an accuracy of 97.40% on the HAM10000 dataset and 94.9% on the ISIC 2019 dataset.

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2306.06145 [pdf, other]

LDMRes-Net: Enabling Efficient Medical Image Segmentation on IoT and Edge Platforms

Authors: Shahzaib Iqbal, Tariq M. Khan, Syed S. Naqvi, Muhammad Usman, Imran Razzak

Abstract: In this study, we propose LDMRes-Net, a lightweight dual-multiscale residual block-based computational neural network tailored for medical image segmentation on IoT and edge platforms. Conventional U-Net-based models face challenges in meeting the speed and efficiency demands of real-time clinical applications, such as disease monitoring, radiation therapy, and image-guided surgery. LDMRes-Net overcomes these limitations with its remarkably low number of learnable parameters (0.072M), making it highly suitable for resource-constrained devices. The model's key innovation lies in its dual multi-residual block architecture, which enables the extraction of refined features on multiple scales, enhancing overall segmentation performance. To further optimize efficiency, the number of filters is carefully selected to prevent overlap, reduce training time, and improve computational efficiency. The study includes comprehensive evaluations, focusing on segmentation of the retinal image of vessels and hard exudates crucial for the diagnosis and treatment of ophthalmology. The results demonstrate the robustness, generalizability, and high segmentation accuracy of LDMRes-Net, positioning it as an efficient tool for accurate and rapid medical image segmentation in diverse clinical applications, particularly on IoT and edge platforms. Such advances hold significant promise for improving healthcare outcomes and enabling real-time medical image analysis in resource-limited settings.

Submitted 7 September, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2304.12856 [pdf, other]

Retinal Vessel Segmentation via a Multi-resolution Contextual Network and Adversarial Learning

Authors: Tariq M. Khan, Syed S. Naqvi, Antonio Robles-Kelly, Imran Razzak

Abstract: Timely and affordable computer-aided diagnosis of retinal diseases is pivotal in precluding blindness. Accurate retinal vessel segmentation plays an important role in disease progression and diagnosis of such vision-threatening diseases. To this end, we propose a Multi-resolution Contextual Network (MRC-Net) that addresses these issues by extracting multi-scale features to learn contextual dependencies between semantically different features and using bi-directional recurrent learning to model former-latter and latter-former dependencies. Another key idea is training in adversarial settings for foreground segmentation improvement through optimization of the region-based scores. This novel strategy boosts the performance of the segmentation network in terms of the Dice score (and correspondingly Jaccard index) while keeping the number of trainable parameters comparatively low. We have evaluated our method on three benchmark datasets, including DRIVE, STARE, and CHASE, demonstrating its superior performance as compared with competitive approaches elsewhere in the literature.

Submitted 25 April, 2023; originally announced April 2023.
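
The MRC-Net abstract above notes that improving the Dice score correspondingly improves the Jaccard index. A minimal sketch of why, assuming binary masks flattened to lists of 0/1 values: for sets A and B, Dice D = 2|A∩B| / (|A| + |B|) and Jaccard J = |A∩B| / |A∪B|, so J = D / (2 - D), which increases monotonically with D. The function name and the toy masks are illustrative only and are not taken from the paper.

```python
def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two equal-length binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    size_pred = sum(pred)
    size_target = sum(target)
    union = size_pred + size_target - intersection
    if union == 0:
        # Both masks are empty: treat as perfect agreement (a convention, not a rule).
        return 1.0, 1.0
    dice = 2 * intersection / (size_pred + size_target)
    jaccard = intersection / union
    return dice, jaccard


# Toy masks (hypothetical): 3 predicted pixels, 3 target pixels, 2 in common.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
dice, jaccard = dice_and_jaccard(pred, target)

# The identity J = D / (2 - D) links the two scores.
print(dice, jaccard, dice / (2 - dice))
```

Because the two scores are monotonically related, ranking models by one gives the same ranking as the other on a single image, although averages over a dataset can still differ.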

arXiv:2304.09751 [pdf, other]

Skeleton-based action analysis for ADHD diagnosis

Authors: Yichun Li, Yi Li, Rajesh Nair, Syed Mohsen Naqvi

Abstract: Attention Deficit Hyperactivity Disorder (ADHD) is a common neurobehavioral disorder worldwide. While extensive research has focused on machine learning methods for ADHD diagnosis, most research relies on high-cost equipment, e.g., MRI machine and EEG patch. Therefore, low-cost diagnostic methods based on the action characteristics of ADHD are desired. Skeleton-based action recognition has gained attention due to the action-focused nature and robustness. In this work, we propose a novel ADHD diagnosis system with a skeleton-based action recognition framework, utilizing a real multi-modal ADHD dataset and state-of-the-art detection algorithms. Compared to conventional methods, the proposed method shows cost-efficiency and significant performance improvement, making it more accessible for a broad range of initial ADHD diagnoses. Through the experiment results, the proposed method outperforms the conventional methods in accuracy and AUC. Meanwhile, our method is widely applicable for mass screening.

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2211.00733 [pdf]

State-of-the-art Models for Object Detection in Various Fields of Application

Authors: Syed Ali John Naqvi, Syed Bazil Ali

Abstract: We present a list of datasets and their best models with the goal of advancing the state-of-the-art in object detection by placing the question of object recognition in the context of the two types of state-of-the-art methods: one-stage methods and two-stage methods. We provided an in-depth statistical analysis of the five top datasets in the light of recent developments in granulated Deep Learning models - COCO minival, COCO test, Pascal VOC 2007, ADE20K, and ImageNet. The datasets are handpicked after closely comparing them with the rest in terms of diversity, quality of data, minimal bias, labeling quality, etc. More importantly, our work extends to provide the best combination of these datasets with the emerging models in the last two years. It lists the top models and their optimal use cases for each of the respective datasets. We have provided a comprehensive overview of a variety of both generic and specific object detection models, listing comparative results such as inference time and box average precision (AP) at different Intersection over Union (IoU) thresholds and for different sized objects. The qualitative and quantitative analysis will allow experts to achieve new performance records using the best combination of datasets and models.

Submitted 1 November, 2022; originally announced November 2022.

Comments: 4 pages, 5 tables

arXiv:2210.10545 [pdf, other]

Improved lung segmentation based on U-Net architecture and morphological operations

Authors: S Ali John Naqvi, Abdullah Tauqeer, Rohaib Bhatti, S Bazil Ali

Abstract: An essential stage in computer-aided diagnosis of chest X-rays is automated lung segmentation. Due to rib cages and the unique modalities of each person's lungs, it is essential to construct an effective automated lung segmentation model. This paper presents a reliable model for the segmentation of lungs in chest radiographs. Our model overcomes the challenges by learning to ignore unimportant areas in the source chest radiograph and emphasize important features for lung segmentation. We evaluate our model on public datasets, Montgomery and Shenzhen. The proposed model has a Dice coefficient of 98.1 percent, which demonstrates the reliability of our model.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 8 pages, 5 figures, conference

arXiv:2210.07451 [pdf, other]

Neural Network Compression by Joint Sparsity Promotion and Redundancy Reduction

Authors: Tariq M. Khan, Syed S. Naqvi, Antonio Robles-Kelly, Erik Meijering

Abstract: Compression of convolutional neural network models has recently been dominated by pruning approaches. A class of previous works focuses solely on pruning the unimportant filters to achieve network compression. Another important direction is the design of sparsity-inducing constraints which has also been explored in isolation. This paper presents a novel training scheme based on composite constraints that prune redundant filters and minimize their effect on overall network learning via sparsity promotion. Also, as opposed to prior works that employ pseudo-norm-based sparsity-inducing constraints, we propose a sparse scheme based on gradient counting in our framework. Our tests on several pixel-wise segmentation benchmarks show that the number of neurons and the memory footprint of networks in the test phase are significantly reduced without affecting performance. MobileNetV3 and UNet, two well-known architectures, are used to test the proposed scheme. Our network compression method not only results in reduced parameters but also achieves improved performance compared to MobileNetV3, which is an already optimized architecture.

Submitted 13 October, 2022; originally announced October 2022.

arXiv:2208.12027 [pdf, other]

Two-stage Fall Events Classification with Human Skeleton Data

Authors: Leiyu Xie, Yang Sun, Jonathon A. Chambers, Syed Mohsen Naqvi

Abstract: Fall detection and classification become an imperative problem for healthcare applications, particularly with the increasingly ageing population. Currently, most of the fall classification algorithms provide binary fall or no-fall classification. For better healthcare, it is thus not enough to do binary fall classification but to extend it to multiple fall events classification. In this work, we utilize the privacy mitigating human skeleton data for multiple fall events classification. The skeleton features are extracted from the original RGB images to not only mitigate the personal privacy, but also to reduce the impact of the dynamic illuminations. The proposed fall events classification method is divided into two stages. In the first stage, the model is trained to achieve the binary classification to filter out the no-fall events. Then, in the second stage, the deep neural network (DNN) model is trained to further classify the five types of fall events. In order to confirm the efficiency of the proposed method, the experiments on the UP-Fall dataset outperform the state-of-the-art.

Submitted 25 August, 2022; originally announced August 2022.

arXiv:2206.04962 [pdf, other]

Feature Learning and Ensemble Pre-Tasks Based Self-Supervised Speech Denoising and Dereverberation

Authors: Yi Li, ShuangLin Li, Yang Sun, Syed Mohsen Naqvi

Abstract: Self-supervised learning (SSL) achieves great success in monaural speech enhancement, while the accuracy of the target speech estimation, particularly for unseen speakers, remains inadequate with existing pre-tasks. As speech signal contains multi-faceted information including speaker identity, paralinguistics, and spoken content, the latent representation for speech enhancement becomes a tough task. In this paper, we study the effectiveness of each feature which is commonly used in speech enhancement and exploit the feature combination in the SSL case. Besides, we propose an ensemble training strategy. The latent representation of the clean speech signal is learned, meanwhile, the dereverberated mask and the estimated ratio mask are exploited to denoise and dereverberate the mixture. The latent representation learning and the masks estimation are considered as two pre-tasks in the training stage. In addition, to study the effectiveness between the pre-tasks, we compare different training routines to train the model and further refine the performance. The NOISEX and DAPS corpora are used to evaluate the efficacy of the proposed method, which also outperforms the state-of-the-art methods.

Submitted 10 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2112.11142

arXiv:2201.05963 [pdf, ps, other]

A Residual Encoder-Decoder Network for Segmentation of Retinal Image-Based Exudates in Diabetic Retinopathy Screening

Authors: Malik A. Manan, Tariq M. Khan, Ahsan Saadat, Muhammad Arsalan, Syed S. Naqvi

Abstract: Diabetic retinopathy refers to the pathology of the retina induced by diabetes and is one of the leading causes of preventable blindness in the world. Early detection of diabetic retinopathy is critical to avoid vision problems through continuous screening and treatment. In traditional clinical practice, the involved lesions are manually detected using photographs of the fundus. However, this task is cumbersome and time-consuming and requires intense effort due to the small size of the lesions and the low contrast of the images. Thus, computer-assisted diagnosis of diabetic retinopathy based on the detection of red lesions is being actively explored. In this paper, we present a convolutional neural network with residual skip connection for the segmentation of exudates in retinal images. To improve the performance of network architecture, a suitable image augmentation technique is used. The proposed network can robustly segment exudates with high accuracy, which makes it suitable for diabetic retinopathy screening. Comparative performance analysis on three benchmark databases, HEI-MED, E-ophtha, and DiaretDB1, is presented. It is shown that the proposed method achieves accuracy (0.98, 0.99, 0.98) and sensitivity (0.97, 0.92, and 0.95) on E-ophtha, HEI-MED, and DiaretDB1, respectively.

Submitted 15 January, 2022; originally announced January 2022.

arXiv:2112.11459 [pdf, other]

Self-Supervised Learning based Monaural Speech Enhancement with Multi-Task Pre-Training

Authors: Yi Li, Yang Sun, Syed Mohsen Naqvi

Abstract: In self-supervised learning, it is challenging to reduce the gap between the enhancement performance on the estimated and target speech signals with existing pre-tasks. In this paper, we propose a multi-task pre-training method to improve the speech enhancement performance with self-supervised learning. Within the pre-training autoencoder (PAE), only a limited set of clean speech signals are required to learn their latent representations. Meanwhile, to solve the limitation of single pre-task, the proposed masking module exploits the dereverberated mask and estimated ratio mask to denoise the mixture as the second pre-task. Different from the PAE, where the target speech signals are estimated, the downstream task autoencoder (DAE) utilizes a large number of unlabeled and unseen reverberant mixtures to generate the estimated mixtures. The trained DAE is shared by the learned representations and masks. Experimental results on a benchmark dataset demonstrate that the proposed method outperforms the state-of-the-art approaches.

Submitted 21 December, 2021; originally announced December 2021.

Comments: Submitted to ICASSP 2022. arXiv admin note: text overlap with arXiv:2112.11142

arXiv:2112.11142 [pdf, other]

Self-Supervised Learning based Monaural Speech Enhancement with Complex-Cycle-Consistent

Authors: Yi Li, Yang Sun, Syed Mohsen Naqvi

Abstract: Recently, self-supervised learning (SSL) techniques have been introduced to solve the monaural speech enhancement problem. Due to the lack of using clean phase information, the enhancement performance is limited in most SSL methods. Therefore, in this paper, we propose a phase-aware self-supervised learning based monaural speech enhancement method. The latent representations of both amplitude and phase are studied in two decoders of the foundation autoencoder (FAE) with only a limited set of clean speech signals independently. Then, the downstream autoencoder (DAE) learns a shared latent space between the clean speech and mixture representations with a large number of unseen mixtures. A complex-cycle-consistent (CCC) mechanism is proposed to minimize the reconstruction loss between the amplitude and phase domains. Besides, it is noticed that if the speech features are extracted as the multi-resolution spectra, the desired information distributed in spectra of different scales can be studied to further boost the performance. The NOISEX and DAPS corpora are used to generate mixtures with different interferences to evaluate the efficacy of the proposed method. It is highlighted that the clean speech and mixtures fed in FAE and DAE are not paired. Both ablation and comparison experimental results show that the proposed method clearly outperforms the state-of-the-art approaches.

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2112.11078 [pdf, other]

RC-Net: A Convolutional Neural Network for Retinal Vessel Segmentation

Authors: Tariq M Khan, Antonio Robles-Kelly, Syed S. Naqvi

Abstract: Over recent years, increasingly complex approaches based on sophisticated convolutional neural network architectures have been slowly pushing performance on well-established benchmark datasets. In this paper, we take a step back to examine the real need for such complexity. We present RC-Net, a fully convolutional network, where the number of filters per layer is optimized to reduce feature overlapping and complexity. We also used skip connections to keep spatial information loss to a minimum by keeping the number of pooling operations in the network to a minimum. Two publicly available retinal vessel segmentation datasets were used in our experiments. In our experiments, RC-Net is quite competitive, outperforming alternative vessel segmentation methods with two or even three orders of magnitude fewer trainable parameters.

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2112.11065 [pdf, other]

Leveraging Image Complexity in Macro-Level Neural Network Design for Medical Image Segmentation

Authors: Tariq M. Khan, Syed S. Naqvi, Erik Meijering

Abstract: Recent progress in encoder-decoder neural network architecture design has led to significant performance improvements in a wide range of medical image segmentation tasks. However, state-of-the-art networks for a given task may be too computationally demanding to run on affordable hardware, and thus users often resort to practical workarounds by modifying various macro-level design aspects. Two common examples are downsampling of the input images and reducing the network depth to meet computer memory constraints. In this paper we investigate the effects of these changes on segmentation performance and show that image complexity can be used as a guideline in choosing what is best for a given dataset. We consider four statistical measures to quantify image complexity and evaluate their suitability on ten different public datasets. For the purpose of our experiments we also propose two new encoder-decoder architectures representing shallow and deep networks that are more memory efficient than currently popular networks. Our results suggest that median frequency is the best complexity measure in deciding about an acceptable input downsampling factor and network depth. For high-complexity datasets, a shallow network running on the original images may yield better segmentation results than a deep network running on downsampled images, whereas the opposite may be the case for low-complexity images.

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2112.06052 [pdf, other]

doi 10.1109/TASLP.2023.3265839

U-shaped Transformer with Frequency-Band Aware Attention for Speech Enhancement

Authors: Yi Li, Yang Sun, Syed Mohsen Naqvi

Abstract: The state-of-the-art speech enhancement has limited performance in speech estimation accuracy. Recently, in deep learning, the Transformer shows the potential to exploit the long-range dependency in speech by self-attention. Therefore, it is introduced in speech enhancement to improve the speech estimation accuracy from a noise mixture. However, to address the computational cost issue in Transformer with self-attention, the axial attention is the option, i.e., to split a 2D attention into two 1D attentions. Inspired by the axial attention, in the proposed method we calculate the attention map along both time- and frequency-axis to generate time and frequency sub-attention maps. Moreover, different from the axial attention, the proposed method provides two parallel multi-head attentions for time- and frequency-axis. Furthermore, it is proven in the literature that the lower frequency-band in speech generally contains more desired information than the higher frequency-band in a noise mixture. Therefore, the frequency-band aware attention is proposed, i.e., high frequency-band attention (HFA) and low frequency-band attention (LFA). The U-shaped Transformer is also introduced for the first time in the proposed method to further improve the speech estimation accuracy. The extensive evaluations over four public datasets confirm the efficacy of the proposed method.

Submitted 11 December, 2021; originally announced December 2021.

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing ( Volume: 31), 2023

arXiv:2112.05036 [pdf, other]

doi 10.1109/TAI.2021.3119927

Domain Adaptation and Autoencoder Based Unsupervised Speech Enhancement

Authors: Yi Li, Yang Sun, Kirill Horoshenkov, Syed Mohsen Naqvi

Abstract: As a category of transfer learning, domain adaptation plays an important role in generalizing the model trained in one task and applying it to other similar tasks or settings. In speech enhancement, a well-trained acoustic model can be exploited to obtain the speech signal in the context of other languages, speakers, and environments. Recent domain adaptation research was developed more effectively with various neural networks and high-level abstract features. However, the related studies are more likely to transfer the well-trained model from a rich and more diverse domain to a limited and similar domain. Therefore, in this study, the domain adaptation method is proposed in unsupervised speech enhancement for the opposite circumstance of transferring to a larger and richer domain. On the one hand, the importance-weighting (IW) approach is exploited with a variance constrained autoencoder to reduce the shift of shared weights between the source and target domains. On the other hand, in order to train the classifier with the worst-case weights and minimize the risk, the minimax method is proposed. Both the proposed IW and minimax methods are evaluated from the VOICE BANK and IEEE datasets to the TIMIT dataset. The experiment results show that the proposed methods outperform the state-of-the-art approaches.

Submitted 9 December, 2021; originally announced December 2021.

Journal ref: IEEE Transactions on Artificial Intelligence. (2021)

arXiv:1903.11677 [pdf, ps, other]

Exact Byzantine Consensus on Undirected Graphs under Local Broadcast Model

Authors: Muhammad Samir Khan, Syed Shalan Naqvi, Nitin H. Vaidya

Abstract: This paper considers the Byzantine consensus problem for nodes with binary inputs. The nodes are interconnected by a network represented as an undirected graph, and the system is assumed to be synchronous. Under the classical point-to-point communication model, it is well-known [7] that the following two conditions are both necessary and sufficient to achieve Byzantine consensus among $n$ nodes in the presence of up to $f$ Byzantine faulty nodes: $n \ge 3f+1$ and vertex connectivity at least $2f+1$. In the classical point-to-point communication model, it is possible for a faulty node to equivocate, i.e., transmit conflicting information to different neighbors. Such equivocation is possible because messages sent by a node to one of its neighbors are not overheard by other neighbors. This paper considers the local broadcast model. In contrast to the point-to-point communication model, in the local broadcast model, messages sent by a node are received identically by all of its neighbors. Thus, under the local broadcast model, attempts by a node to send conflicting information can be detected by its neighbors. Under this model, we show that the following two conditions are both necessary and sufficient for Byzantine consensus: vertex connectivity at least $\lfloor 3f/2 \rfloor + 1$ and minimum node degree at least $2f$. Observe that the local broadcast model results in a lower requirement for connectivity and the number of nodes $n$, as compared to the point-to-point communication model.
We extend the above results to a hybrid model that allows some of the Byzantine faulty nodes to equivocate. The hybrid model bridges the gap between the point-to-point and local broadcast models, and helps to precisely characterize the trade-off between equivocation and network requirements.

Submitted 27 May, 2019; v1 submitted 27 March, 2019; originally announced March 2019.
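
The two sets of conditions quoted in the abstract above can be compared on a concrete graph. The following is a minimal sketch, not the authors' code: it assumes the graph's node count, vertex connectivity, and minimum degree are already known, and the names `GraphSummary`, `point_to_point_ok`, and `local_broadcast_ok` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSummary:
    """Hypothetical summary of an undirected graph (values supplied by the caller)."""
    n: int                    # number of nodes
    vertex_connectivity: int  # vertex connectivity of the graph
    min_degree: int           # minimum node degree


def point_to_point_ok(g: GraphSummary, f: int) -> bool:
    """Conditions quoted for the point-to-point model: n >= 3f+1 and connectivity >= 2f+1."""
    return g.n >= 3 * f + 1 and g.vertex_connectivity >= 2 * f + 1


def local_broadcast_ok(g: GraphSummary, f: int) -> bool:
    """Conditions quoted for the local broadcast model:
    connectivity >= floor(3f/2)+1 and minimum degree >= 2f."""
    return g.vertex_connectivity >= (3 * f) // 2 + 1 and g.min_degree >= 2 * f


# A 5-node cycle: every node has degree 2 and the vertex connectivity is 2.
cycle5 = GraphSummary(n=5, vertex_connectivity=2, min_degree=2)
p2p_result = point_to_point_ok(cycle5, f=1)   # fails the 2f+1 = 3 connectivity bound
lb_result = local_broadcast_ok(cycle5, f=1)   # meets both local broadcast bounds
```

For a 5-node cycle with one faulty node, the point-to-point check fails on connectivity while the local broadcast check passes, which illustrates the lower connectivity requirement the abstract describes.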

arXiv:1811.08535 [pdf, other]

Exact Byzantine Consensus Under Local-Broadcast Model

Authors: Syed Shalan Naqvi, Muhammad Samir Khan, Nitin H. Vaidya

Abstract: This paper considers the problem of achieving exact Byzantine consensus in a synchronous system under a local-broadcast communication model. The nodes communicate with each other via message-passing. The communication network is modeled as an undirected graph, with each vertex representing a node in the system. Under the local-broadcast communication model, when any node transmits a message, all its neighbors in the communication graph receive the message reliably. This communication model is motivated by wireless networks. In this work, we present necessary and sufficient conditions on the underlying communication graph to achieve exact Byzantine consensus under the local-broadcast communication model.

Submitted 20 November, 2018; originally announced November 2018.

arXiv:1810.12126 [pdf, other]

ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition

Authors: Federico Angelini, Zeyu Fu, Yang Long, Ling Shao, Syed Mohsen Naqvi

Abstract: We present ActionXPose, a novel 2D pose-based algorithm for posture-level Human Action Recognition (HAR). The proposed approach exploits 2D human poses extracted from RGB videos by the OpenPose detector. ActionXPose processes the pose data and feeds it to a Long Short-Term Memory Neural Network and to a 1D Convolutional Neural Network, which solve the classification problem. ActionXPose is one of the first algorithms that exploits 2D human poses for HAR. The algorithm runs in real time, is robust to camera movement, subject proximity changes, viewpoint changes, and subject appearance changes, and provides a high degree of generalization. In fact, extensive simulations show that ActionXPose can be successfully trained using different datasets at once. State-of-the-art performance on popular datasets for posture-related HAR problems (i3DPost, KTH) is reported, and results are compared with those obtained by other methods, including the selected ActionXPose baseline. Moreover, we also propose two novel datasets, MPOSE and ISLD, recorded in our Intelligent Sensing Lab, to show ActionXPose's generalization performance.

Submitted 29 October, 2018; originally announced October 2018.

arXiv:1809.10245 [pdf, other]

Cylindrical Transform: 3D Semantic Segmentation of Kidneys With Limited Annotated Images

Authors: Hojjat Salehinejad, Sumeya Naqvi, Errol Colak, Joseph Barfett, Shahrokh Valaee

Abstract: In this paper, we propose a novel technique for sampling sequential images using a cylindrical transform in a cylindrical coordinate system for kidney semantic segmentation in abdominal computed tomography (CT). The images generated from a cylindrical transform augment a limited annotated set of images in three dimensions. This approach enables us to train contemporary classification deep convolutional neural networks (DCNNs) instead of fully convolutional networks (FCNs) for semantic segmentation. Typical semantic segmentation models segment a sequential set of images (e.g. CT or video) by segmenting each image independently. However, the proposed method not only considers the spatial dependency in the x-y plane, but also the spatial sequential dependency along the z-axis. The results show that classification DCNNs, trained on cylindrical transformed images, can achieve a higher segmentation performance value than FCNs using a limited number of annotated images.

Submitted 24 September, 2018; originally announced September 2018.

Comments: This paper is accepted for presentation at IEEE Global Conference on Signal and Information Processing (IEEE GlobalSIP), California, USA, 2018

arXiv:1801.02430 [pdf]

A Novel Hybrid Biometric Electronic Voting System: Integrating Finger Print and Face Recognition

Authors: Shahram Najam Syed, Aamir Zeb Shaikh, Shabbar Naqvi

Abstract: A novel hybrid design based electronic voting system is proposed, implemented and analyzed. The proposed system uses two voter verification techniques to give better results in comparison to single identification based systems. Finger print and facial recognition based methods are used for voter identification. Cross verification of a voter during an election process provides better accuracy than single parameter identification method. The facial recognition system uses Viola-Jones algorithm along with rectangular Haar feature selection method for detection and extraction of features to develop a biometric template and for feature extraction during the voting process. Cascaded machine learning based classifiers are used for comparing the features for identity verification using GPCA (Generalized Principal Component Analysis) and K-NN (K-Nearest Neighbor). It is accomplished through comparing the Eigen-vectors of the extracted features with the biometric template pre-stored in the election regulatory body database. The results of the proposed system show that the proposed cascaded design based system performs better than the systems using other classifiers or separate schemes i.e. facial or finger print based schemes. The proposed system will be highly useful for real time applications because it has 91% accuracy under nominal light in terms of facial recognition. with bags of paper votes. The central station compiles and publishes the names of winners and losers through television and radio stations.
This method is useful only if the whole process is completed in a transparent way. However, there are some drawbacks to this system. These include higher expenses, longer time to complete the voting process, fraudulent practices by the authorities administering elections as well as malpractices by the voters [1]. These challenges result in manipulated election results.

Submitted 5 January, 2018; originally announced January 2018.

Journal ref: Mehran University Research Journal of Engineering and Technology, 2018, 37 (1), pp.59-68. http://publications.muet.edu.pk/index.php/muetrj/article/view/100/50

arXiv:1702.03911 [pdf, other]

doi 10.1109/WCNC.2017.7925638

On the Transport Capability of LAN Cables in All-Analog MIMO-RoC Fronthaul

Authors: Syed Hassan Raza Naqvi, Andrea Matera, Lorenzo Combi, Umberto Spagnolini

Abstract: Centralized Radio Access Network (C-RAN) architecture is the only viable solution to handle the complex interference scenario generated by massive antennas and small cells deployment as required by next generation (5G) mobile networks. In conventional C-RAN, the fronthaul links used to exchange the signal between Base Band Units (BBUs) and Remote Antenna Units (RAUs) are based on digital baseband (BB) signals over optical fibers due to the huge bandwidth required. In this paper we evaluate the transport capability of a copper-based all-analog fronthaul architecture called Radio over Copper (RoC) that leverages the pre-existing LAN cables already deployed in buildings and enterprises. In particular, the main contribution of the paper is to evaluate the number of independent BB signals for multiple-antenna systems that can be transported over multi-pair Cat-5/6/7 cables under a predefined fronthauling transparency condition in terms of maximum BB signal degradation. The MIMO-RoC proves to be a complementary solution to optical fiber for the last 200m toward the RAUs, mostly to reuse the existing LAN cables and to power-supply the RAUs over the same cable.

Submitted 16 May, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

Journal ref: 2017 IEEE Wireless Communications and Networking Conference (WCNC)

arXiv:1511.01726 [pdf, other]

Multi-Target Tracking and Occlusion Handling with Learned Variational Bayesian Clusters and a Social Force Model

Authors: Ata-ur-Rehman, Syed Mohsen Naqvi, Lyudmila Mihaylova, Jonathon Chambers

Abstract: This paper considers the problem of multiple human target tracking in a sequence of video data. A solution is proposed which is able to deal with the challenges of a varying number of targets, interactions and when every target gives rise to multiple measurements. The developed novel algorithm comprises variational Bayesian clustering combined with a social force model, integrated within a particle filter with an enhanced prediction step. It performs measurement-to-target association by automatically detecting the measurement relevance. The performance of the developed algorithm is evaluated over several sequences from publicly available data sets: AV16.3, CAVIAR and PETS2006, which demonstrates that the proposed algorithm successfully initializes and tracks a variable number of targets in the presence of complex occlusions. A comparison with state-of-the-art techniques due to Khan et al., Laet et al. and Czyz et al. shows improved tracking performance.

Submitted 5 November, 2015; originally announced November 2015.

Comments: 19 pages, 14 figures

arXiv:1507.07698 [pdf, other]

Interference-Cooperation in Multi-User/Multi-Operator Receivers

Authors: Syed Hassan Raza Naqvi, Umberto Spagnolini

Abstract: In a multi-user scenario where users belong to different operators, any interference mitigation method unavoidably needs some degree of cooperation among service providers. In this paper we propose a cooperation strategy based on the exchange of mutual interference among operators, rather than of decoded data, to let every operator recover an augmented degree of diversity for both channel estimation and multi-user detection. In an xDSL scenario where multiple operators share the same cable binder, the interference-cooperation (IC) approach outperforms data-exchange methods and preserves to a certain degree the privacy of the users, as signals can be tailored to prevent each operator from inferring parameters (channel and data) of the users of the other operators. The IC method is based on Expectation Maximization estimation shaped to account for the degree of information that each operator can exchange with the others during the two steps of multi-user channel estimation and multi-user detection. Convergence of IC is guaranteed within a few iterations and does not depend on the structure of the interference. IC performance attains that of centralized receivers (i.e., one fusion-center that collects all the received signals from all the users/operators), with some loss in heavily interfered multi-user channels such as twisted-pair communications allocated beyond the 50-100MHz spectrum.

Submitted 28 July, 2015; originally announced July 2015.

Showing 1–40 of 40 results for author: Naqvi, S