doi 10.1016/j.neucom.2025.129995

Spatial and Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification

Authors: Muhammad Ahmad, Muhammad Hassaan Farooq Butt, Adil Mehmood Khan, Manuel Mazzara, Salvatore Distefano, Muhammad Usama, Swalpa Kumar Roy, Jocelyn Chanussot, Danfeng Hong

Abstract: Recent advancements in transformers, specifically self-attention mechanisms, have significantly improved hyperspectral image (HSI) classification. However, these models often suffer from inefficiencies, as their computational complexity scales quadratically with sequence length. To address these challenges, we propose the morphological spatial Mamba (SMM) and morphological spatial-spectral Mamba (SSMM) model (MorpMamba), which combines the strengths of morphological operations and the state space model framework, offering a more computationally efficient alternative to transformers. In MorpMamba, a novel token generation module first converts HSI patches into spatial-spectral tokens. These tokens are then processed through morphological operations such as erosion and dilation, utilizing depthwise separable convolutions to capture structural and shape information. A token enhancement module refines these features by dynamically adjusting the spatial and spectral tokens based on central HSI regions, ensuring effective feature fusion within each block. Subsequently, multi-head self-attention is applied to further enrich the feature representations, allowing the model to capture complex relationships and dependencies within the data. Finally, the enhanced tokens are fed into a state space module, which efficiently models the temporal evolution of the features for classification. Experimental results on widely used HSI datasets demonstrate that MorpMamba achieves superior parametric efficiency compared to traditional CNN and transformer models while maintaining high accuracy. The code will be made publicly available at https://github.com/mahmad000/MorpMamba.

Submitted 30 November, 2024; v1 submitted 2 August, 2024; originally announced August 2024.
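
The erosion and dilation operations mentioned in the abstract above can be illustrated with a minimal sketch. It assumes plain grayscale morphology with a flat square structuring element (erosion takes the window minimum, dilation the window maximum); it is not the MorpMamba implementation, which the abstract says relies on depthwise separable convolutions. The names `erode`, `dilate`, and `_window_reduce` are hypothetical.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]


def _window_reduce(image: Sequence[Sequence[float]], k: int,
                   reduce_fn: Callable[[List[float]], float]) -> Grid:
    """Apply reduce_fn over a k x k window centred on each pixel.

    Windows are clipped at the image border (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    r = k // 2
    out: Grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [
                image[y][x]
                for y in range(max(0, i - r), min(rows, i + r + 1))
                for x in range(max(0, j - r), min(cols, j + r + 1))
            ]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def erode(image: Sequence[Sequence[float]], k: int = 3) -> Grid:
    """Grayscale erosion: each pixel becomes the minimum of its window."""
    return _window_reduce(image, k, min)


def dilate(image: Sequence[Sequence[float]], k: int = 3) -> Grid:
    """Grayscale dilation: each pixel becomes the maximum of its window."""
    return _window_reduce(image, k, max)


if __name__ == "__main__":
    patch = [[0, 0, 0],
             [0, 1, 0],
             [0, 0, 0]]
    print(dilate(patch))  # the single bright pixel spreads to its neighbours
    print(erode(patch))   # the single bright pixel is removed
```

Erosion shrinks bright structures and dilation grows them, which is why the pair is commonly used to extract shape information.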

arXiv:2408.01231 [pdf, other]

doi 10.1109/LGRS.2024.3506034

WaveMamba: Spatial-Spectral Wavelet Mamba for Hyperspectral Image Classification

Authors: Muhammad Ahmad, Muhammad Usama, Manuel Mazzara, Salvatore Distefano

Abstract: Hyperspectral Imaging (HSI) has proven to be a powerful tool for capturing detailed spectral and spatial information across diverse applications. Despite the advancements in Deep Learning (DL) and Transformer architectures for HSI classification, challenges such as computational efficiency and the need for extensive labeled data persist. This paper introduces WaveMamba, a novel approach that integrates wavelet transformation with the spatial-spectral Mamba architecture to enhance HSI classification. WaveMamba captures both local texture patterns and global contextual relationships in an end-to-end trainable model. The Wavelet-based enhanced features are then processed through the state-space architecture to model spatial-spectral relationships and temporal dependencies. The experimental results indicate that WaveMamba surpasses existing models, achieving an accuracy improvement of 4.5% on the University of Houston dataset and a 2.0% increase on the Pavia University dataset.

Submitted 22 November, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.05163 [pdf, other]

A Domain Adaptation Model for Carotid Ultrasound: Image Harmonization, Noise Reduction, and Impact on Cardiovascular Risk Markers

Authors: Mohd Usama, Emma Nyman, Ulf Naslund, Christer Gronlund

Abstract: Deep learning has been used extensively for medical image analysis applications, assuming the training and test data adhere to the same probability distributions. However, a common challenge arises when dealing with medical images generated by different systems or even the same system with varying parameter settings. Such images often contain diverse textures and noise patterns, violating the assumption. Consequently, models trained on data from one machine or setting usually struggle to perform effectively on data from another. To address this issue in ultrasound images, we propose a Generative Adversarial Network (GAN) based model in this paper. We formulate image harmonization and denoising as image-to-image translation tasks, in which we modify the texture pattern and reduce noise in carotid ultrasound images while keeping the image content (the anatomy) unchanged. The performance was evaluated using feature distribution and pixel-space similarity metrics. In addition, blood-to-tissue contrast and influence on computed risk markers (gray-scale median, GSM) were evaluated. The results showed that domain adaptation was achieved in both tasks (histogram correlation 0.920 and 0.844), as compared to no adaptation (0.890 and 0.707), and that the anatomy of the images was retained (structure similarity index measure of the arterial wall 0.71 and 0.80). In addition, the image noise level (contrast) did not change in the image harmonization task (-34.1 vs 35.2 dB) but was improved in the noise reduction task (-23.5 vs -46.7 dB). The model outperformed the CycleGAN in both tasks. Finally, the risk marker GSM increased by 7.6 (p<0.001) in task 1 but not in task 2. We conclude that domain translation models are powerful tools for ultrasound image improvement while retaining the underlying anatomy but that downstream calculations of risk markers may be affected.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 17 pages, 7 figures, 7 tables

arXiv:2405.08277 [pdf, other]

AI-driven, Model-Free Current Control: A Deep Symbolic Approach for Optimal Induction Machine Performance

Authors: Muhammad Usama, Yunkyung Hwang, Jaehong Kim

Abstract: This paper proposes a straightforward and efficient current control solution for induction machines employing deep symbolic regression (DSR). The proposed DSR-based control design offers a simple yet highly effective approach by creating an optimal control model through training and fitting, resulting in an analytical dynamic numerical expression that characterizes the data. Notably, this approach not only produces an understandable model but also demonstrates the capacity to extrapolate and estimate data points outside its training dataset, showcasing its adaptability and resilience. In contrast to conventional state-of-the-art proportional-integral (PI) current controllers, which heavily rely on specific system models, the proposed DSR-based approach stands out for its model independence. Simulation and experimental tests validate its effectiveness, highlighting its superior extrapolation capabilities compared to conventional methods. These findings pave the way for the integration of deep learning methods in power conversion applications, promising improved performance and adaptability in the control of induction machines. The simulation and experimental test results are provided with a 3.7 kW induction machine to verify the efficacy of the proposed control solution.

Submitted 13 May, 2024; originally announced May 2024.

Comments: This work has been accepted for potential publication at the IEEE ECCE Asia 2024 International Power Electronics and Motion Control Conference. Please note that copyright may be transferred without prior notice

arXiv:2308.12792 [pdf, other]

Sparks of Large Audio Models: A Survey and Outlook

Authors: Siddique Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, Heriberto Cuayáhuitl, Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller

Abstract: This survey paper provides a comprehensive overview of the recent advancements and challenges in applying large language models to the field of audio signal processing. Audio processing, with its diverse signal representations and a wide range of sources--from human voices to musical instruments and environmental sounds--poses challenges distinct from those found in traditional Natural Language Processing scenarios. Nevertheless, Large Audio Models, epitomized by transformer-based architectures, have shown marked efficacy in this sphere. By leveraging massive amounts of data, these models have demonstrated prowess in a variety of audio tasks, spanning from Automatic Speech Recognition and Text-To-Speech to Music Generation, among others. Notably, Foundational Audio Models such as SeamlessM4T have recently started showing abilities to act as universal translators, supporting multiple speech tasks for up to 100 languages without any reliance on separate task-specific systems. This paper presents an in-depth analysis of state-of-the-art methodologies regarding Foundational Large Audio Models, their performance benchmarks, and their applicability to real-world scenarios. We also highlight current limitations and provide insights into potential future research directions in the realm of Large Audio Models with the intent to spark further discussion, thereby fostering innovation in the next generation of audio-processing systems. Furthermore, to cope with the rapid development in this area, we will consistently update the repository with relevant recent articles and their open-source implementations at https://github.com/EmulationAI/awesome-large-audio-models.

Submitted 21 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: Under review, Repo URL: https://github.com/EmulationAI/awesome-large-audio-models

arXiv:2307.06090 [pdf, other]

Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers

Authors: Siddique Latif, Muhammad Usama, Mohammad Ibrahim Malik, Björn W. Schuller

Abstract: Despite recent advancements in speech emotion recognition (SER) models, state-of-the-art deep learning (DL) approaches face the challenge of the limited availability of annotated data. Large language models (LLMs) have revolutionised our understanding of natural language, introducing emergent properties that broaden comprehension in language, speech, and vision. This paper examines the potential of LLMs to annotate abundant speech data, aiming to enhance the state-of-the-art in SER. We evaluate this capability across various settings using publicly available speech emotion classification datasets. Leveraging ChatGPT, we experimentally demonstrate the promising role of LLMs in speech emotion data annotation. Our evaluation encompasses single-shot and few-shot scenarios, revealing performance variability in SER. Notably, we achieve improved results through data augmentation, incorporating ChatGPT-annotated samples into existing datasets. Our work uncovers new frontiers in speech emotion classification, highlighting the increasing significance of LLMs in this field moving forward.

Submitted 19 June, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: Accepted in IEEE Computational Intelligence Magazine

arXiv:2305.00725 [pdf, other]

Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing

Authors: Ibrahim Malik, Siddique Latif, Sanaullah Manzoor, Muhammad Usama, Junaid Qadir, Raja Jurdak

Abstract: Non-speech emotion recognition has a wide range of applications including healthcare, crime control and rescue, and entertainment, to name a few. Providing these applications using edge computing has great potential; however, recent studies have focused on speech-emotion recognition using complex architectures. In this paper, a non-speech-based emotion recognition system is proposed, which can rely on edge computing to analyse emotions conveyed through non-speech expressions like screaming and crying. In particular, we explore knowledge distillation to design a computationally efficient system that can be deployed on edge devices with limited resources without degrading the performance significantly. We comprehensively evaluate our proposed framework using two publicly available datasets and highlight its effectiveness by comparing the results with the well-known MobileNet model. Our results demonstrate the feasibility and effectiveness of using edge computing for non-speech emotion detection, which can potentially improve applications that rely on emotion detection in communication networks. To the best of our knowledge, this is the first work on an edge-computing-based framework for detecting emotions in non-speech audio, offering promising directions for future research.

Submitted 1 May, 2023; originally announced May 2023.

Comments: Under review

arXiv:2202.05631 [pdf, other]

Vehicle and License Plate Recognition with Novel Dataset for Toll Collection

Authors: Muhammad Usama, Hafeez Anwar, Abbas Anwar, Saeed Anwar

Abstract: We propose an automatic framework for toll collection, consisting of three steps: vehicle type recognition, license plate localization, and reading. However, each of the three steps becomes non-trivial due to image variations caused by several factors. The traditional vehicle decorations on the front cause variations among vehicles of the same type. These decorations make license plate localization and recognition difficult due to severe background clutter and partial occlusions. Likewise, on most vehicles, specifically trucks, the position of the license plate is not consistent. Lastly, for license plate reading, the variations are induced by non-uniform font styles, sizes, and partially occluded letters and numbers. Our proposed framework takes advantage of both data availability and performance evaluation of the backbone deep learning architectures. We gather a novel dataset, Diverse Vehicle and License Plates Dataset (DVLPD), consisting of 10k images belonging to six vehicle types. Each image is then manually annotated for vehicle type, license plate, and its characters and digits. For each of the three tasks, we evaluate You Only Look Once (YOLO)v2, YOLOv3, YOLOv4, and FasterRCNN. For real-time implementation on a Raspberry Pi, we evaluate the lighter versions of YOLO named Tiny YOLOv3 and Tiny YOLOv4. The best Mean Average Precision (mAP@0.5) of 98.8% for vehicle type recognition, 98.5% for license plate detection, and 98.3% for license plate reading is achieved by YOLOv4, while its lighter version, i.e., Tiny YOLOv4, obtained a mAP of 97.1%, 97.4%, and 93.7% on vehicle type recognition, license plate detection, and license plate reading, respectively. The dataset and the training codes are available at https://github.com/usama-x930/VT-LPR

Submitted 15 November, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

arXiv:2005.04651 [pdf]

doi 10.1051/e3sconf/202015203009

Vector Control Algorithm Based on Different Current Control Switching Techniques for Ac Motor Drives

Authors: Muhammad Usama, Jaehong Kim

Abstract: This paper presents a comparative analysis of vector control schemes based on different current control switching pulses (HC, SPWM, DPWM and SVPWM) for the speed response of a motor drive. The control systems using the different switching techniques are comparatively simulated and analysed. AC motor drives are progressively used in high-performance application industries due to their small size, efficient performance, robust torque response and high power-to-size ratio. A mathematical model of AC motor drives is presented in order to explain the numerical theory of motor drives. The vector control technique is utilized for efficient speed control of the AC motor drive based on independent torque and air gap flux control. The study compares the total harmonic distortion content of the phase currents of the AC motor drive and the speed response in each case. The simulation results show that the total harmonic distortion of the phase current with SVPWM is lower than with the other switching techniques, while the rise time in the speed response with SVPWM is faster than with the other switching methods. The simulation results for AC motor drive speed control are demonstrated in Matlab/Simulink 2018b.

Submitted 10 May, 2020; originally announced May 2020.
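
The total harmonic distortion (THD) figure compared in the abstract above is conventionally the ratio of the combined harmonic amplitude to the fundamental amplitude. The sketch below assumes that standard definition and a sampled waveform covering exactly one fundamental period; it is not taken from the paper's Simulink model, and the names `harmonic_amplitudes` and `total_harmonic_distortion` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def harmonic_amplitudes(samples: Sequence[float], n_harmonics: int) -> List[float]:
    """Peak amplitude of harmonics 1..n_harmonics via a direct DFT.

    Assumes `samples` spans exactly one period of the fundamental and that
    n_harmonics is below half the number of samples.
    """
    n = len(samples)
    amps = []
    for h in range(1, n_harmonics + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * h * k / n)
                    for k, x in enumerate(samples))
        amps.append(2.0 * abs(coeff) / n)
    return amps


def total_harmonic_distortion(amplitudes: Sequence[float]) -> float:
    """THD = sqrt(sum of squared harmonic amplitudes) / fundamental amplitude.

    amplitudes[0] is the fundamental; the remaining entries are harmonics.
    """
    fundamental = amplitudes[0]
    if fundamental == 0:
        raise ValueError("fundamental amplitude must be non-zero")
    return math.sqrt(sum(a * a for a in amplitudes[1:])) / fundamental


if __name__ == "__main__":
    n = 200
    # Synthetic phase current: unit fundamental plus a 10% fifth harmonic.
    current = [math.sin(2 * math.pi * k / n) + 0.1 * math.sin(2 * math.pi * 5 * k / n)
               for k in range(n)]
    thd = total_harmonic_distortion(harmonic_amplitudes(current, 10))
    print(f"THD = {100 * thd:.2f}%")
```

For the synthetic waveform in the usage example, the only harmonic is the fifth at one tenth of the fundamental, so the computed THD is approximately 10%.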

arXiv:1906.06969 [pdf, other]

Robotic Navigation using Entropy-Based Exploration

Authors: Muhammad Usama, Dong Eui Chang

Abstract: Robotic navigation concerns the task in which a robot should be able to find a safe and feasible path and traverse between two points in a complex environment. We approach the problem of robotic navigation using reinforcement learning and use deep Q-networks to train agents to solve the task of robotic navigation. We compare Entropy-Based Exploration (EBE) with the widely used ε-greedy exploration strategy by training agents using both of them in simulation. The trained agents are then tested on different versions of the environment to test the generalization ability of the learned policies. We also implement the learned policies on a real robot in a complex real environment without any fine-tuning and compare the effectiveness of the above-mentioned exploration strategies in the real-world setting. A video showing experiments on the TurtleBot3 platform is available at https://youtu.be/NHT-EiN_4n8.

Submitted 17 June, 2019; originally announced June 2019.

Comments: 5 pages
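
The two exploration strategies compared in the abstract above can be contrasted with a short sketch. The ε-greedy rule is standard. The entropy-based variant is an assumption made for illustration: it uses the normalized entropy of a softmax over the Q-values as a state-dependent exploration probability, which may differ from the exact EBE formulation in the paper. All function names are hypothetical.

```python
import math
import random
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action with the highest estimated Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def normalized_entropy(q_values: Sequence[float]) -> float:
    """Entropy of softmax(q_values), scaled to [0, 1] by the maximum entropy.

    Requires at least two actions. Subtracting the maximum Q-value before
    exponentiating keeps the softmax numerically stable.
    """
    m = max(q_values)
    exps = [math.exp(q - m) for q in q_values]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(q_values))


def entropy_based_action(q_values: Sequence[float], rng: random.Random) -> int:
    """Assumed EBE-style rule: explore more when the Q-values are less decisive."""
    return epsilon_greedy(q_values, normalized_entropy(q_values), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng))
    print(normalized_entropy([0.0, 0.0, 0.0]))  # equal Q-values: maximal uncertainty
    print(entropy_based_action([5.0, 0.0, 0.0], rng))
```

The design difference is that ε-greedy uses one exploration rate for every state, whereas the entropy-based rule in this sketch explores heavily only in states where the Q-values do not yet single out an action.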

Showing 1–10 of 10 results for author: Usama, M