Search | arXiv e-print repository

A Methodological and Structural Review of Parkinsons Disease Detection Across Diverse Data Modalities

Authors: Abu Saleh Musa Miah, Taro Suzuki, Jungpil Shin

Abstract: Parkinson's Disease (PD) is a progressive neurological disorder that primarily affects motor functions and can lead to mild cognitive impairment (MCI) and dementia in its advanced stages. With approximately 10 million people diagnosed globally (1 to 1.8 per 1,000 individuals, according to reports by the Japan Times and the Parkinson Foundation), early and accurate diagnosis of PD is crucial for improving patient outcomes. While numerous studies have utilized machine learning (ML) and deep learning (DL) techniques for PD recognition, existing surveys are limited in scope, often focusing on single data modalities and failing to capture the potential of multimodal approaches. To address these gaps, this study presents a comprehensive review of PD recognition systems across diverse data modalities, including Magnetic Resonance Imaging (MRI), gait-based pose analysis, gait sensory data, handwriting analysis, speech test data, Electroencephalography (EEG), and multimodal fusion techniques. Based on over 347 articles from leading scientific databases, this review examines key aspects such as data collection methods, settings, feature representations, and system performance, with a focus on recognition accuracy and robustness. This survey aims to serve as a comprehensive resource for researchers, providing actionable guidance for the development of next-generation PD recognition systems. By leveraging diverse data modalities and cutting-edge machine learning paradigms, this work contributes to advancing the state of PD diagnostics and improving patient care through innovative, multimodal approaches.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2504.04664 [pdf, other]

Classification of ADHD and Healthy Children Using EEG Based Multi-Band Spatial Features Enhancement

Authors: Md Bayazid Hossain, Md Anwarul Islam Himel, Md Abdur Rahim, Shabbir Mahmood, Abu Saleh Musa Miah, Jungpil Shin

Abstract: Attention Deficit Hyperactivity Disorder (ADHD) is a common neurodevelopmental disorder in children, characterized by difficulties in attention, hyperactivity, and impulsivity. Early and accurate diagnosis of ADHD is critical for effective intervention and management. Electroencephalogram (EEG) signals have emerged as a non-invasive and efficient tool for ADHD detection due to their high temporal resolution and ability to capture neural dynamics. In this study, we propose a method for classifying ADHD and healthy children using EEG data from a benchmark dataset. The dataset comprises 61 children with ADHD and 60 healthy children, both boys and girls, aged 7 to 12. The EEG signals, recorded from 19 channels, were processed to extract Power Spectral Density (PSD) and Spectral Entropy (SE) features across five frequency bands, resulting in a comprehensive 190-dimensional feature set. In the classification evaluation, a Support Vector Machine (SVM) with the RBF kernel demonstrated the best performance, with a mean cross-validation accuracy of 99.2% and a standard deviation of 0.0079, indicating high robustness and precision. These results highlight the potential of spatial features in conjunction with machine learning for accurately classifying ADHD using EEG data. This work contributes to developing non-invasive, data-driven tools for early diagnosis and assessment of ADHD in children.
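The 190-dimensional figure in the abstract above is consistent with 19 channels x 5 bands x 2 features (PSD and SE). The following is a minimal sketch of that arithmetic, not the authors' pipeline: the band edges, the naive periodogram, and the function names are all assumptions introduced for illustration.

```python
import cmath
import math

# Hypothetical band edges in Hz; the abstract only says "five frequency bands".
BANDS = {
    "delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
    "beta": (13, 30), "gamma": (30, 45),
}


def periodogram(signal, fs):
    """Naive DFT-based power spectrum (one-sided, unwindowed)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def band_features(signal, fs):
    """Return [psd, entropy] for each band, i.e. 10 values per channel."""
    freqs, power = periodogram(signal, fs)
    feats = []
    for lo, hi in BANDS.values():
        band = [p for f, p in zip(freqs, power) if lo <= f < hi]
        total = sum(band)
        # Mean power inside the band.
        psd = total / len(band) if band else 0.0
        # Shannon entropy of the normalised band spectrum, scaled to [0, 1].
        if total > 0 and len(band) > 1:
            probs = [p / total for p in band if p > 0]
            se = -sum(q * math.log(q) for q in probs) / math.log(len(band))
        else:
            se = 0.0
        feats.extend([psd, se])
    return feats


def recording_features(channels, fs):
    """Concatenate the per-channel band features into one flat vector."""
    out = []
    for ch in channels:
        out.extend(band_features(ch, fs))
    return out
```

With 19 channels this yields a vector of length 190, which could then be passed to any classifier, such as the RBF-kernel SVM the abstract mentions.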

Submitted 6 April, 2025; originally announced April 2025.

arXiv:2504.03221 [pdf, other]

Electromyography-Based Gesture Recognition: Hierarchical Feature Extraction for Enhanced Spatial-Temporal Dynamics

Authors: Jungpil Shin, Abu Saleh Musa Miah, Sota Konnai, Shu Hoshitaka, Pankoo Kim

Abstract: Hand gesture recognition using multichannel surface electromyography (sEMG) is challenging due to unstable predictions and inefficient time-varying feature enhancement. To address the lack of effective signal-based time-varying features, we propose a lightweight, squeeze-and-excitation-based, multi-stream deep learning approach that extracts spatial-temporal dynamics and time-varying features to build an effective sEMG-based hand gesture recognition system. Each branch of the proposed model was designed to extract hierarchical features, capturing both global and detailed spatial-temporal relationships to ensure feature effectiveness. The first branch, utilizing a Bidirectional-TCN (Bi-TCN), focuses on capturing long-term temporal dependencies by modelling past and future temporal contexts, providing a holistic view of gesture dynamics. The second branch, incorporating a 1D Convolutional layer, separable CNN, and Squeeze-and-Excitation (SE) block, efficiently extracts spatial-temporal features while emphasizing critical feature channels, enhancing feature relevance. The third branch, combining a Temporal Convolutional Network (TCN) and Bidirectional LSTM (BiLSTM), captures bidirectional temporal relationships and time-varying patterns. Outputs from all branches are fused using concatenation to capture subtle variations in the data and then refined with a channel attention module, selectively focusing on the most informative features while improving computational efficiency. The proposed model was tested on the Ninapro DB2, DB4, and DB5 datasets, achieving accuracy rates of 96.41%, 92.40%, and 93.34%, respectively. These results demonstrate the capability of the system to handle complex sEMG dynamics, offering advancements in prosthetic limb control and human-machine interface technologies with significant implications for assistive technologies.
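The abstract above relies on a Squeeze-and-Excitation (SE) block to emphasize critical feature channels. As a rough illustration of the general SE idea only, here is a minimal pure-Python sketch. The weight matrices `w1` and `w2` are hypothetical, biases are omitted, and nothing here reflects the paper's actual layer sizes.

```python
import math


def squeeze_excite(features, w1, w2):
    """Rescale each channel of `features` (a list of channels, each a list over time).

    w1: bottleneck weights, shape (hidden, channels)   -- hypothetical values
    w2: expansion weights, shape (channels, hidden)    -- hypothetical values
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling over time gives one value per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck dense layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: dense layer followed by a sigmoid gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every time step of a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates
```

A channel whose gate is close to 1 passes through almost unchanged, while a channel with a small gate is attenuated; in a trained network the weights determine which channels are kept.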

Submitted 4 April, 2025; originally announced April 2025.

arXiv:2503.16855 [pdf, other]

Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Multi-Culture Sign Language Recognition

Authors: Koki Hirooka, Abu Saleh Musa Miah, Tatsuya Murakami, Yuto Akiba, Yong Seok Hwang, Jungpil Shin

Abstract: Hand gesture-based Sign Language Recognition (SLR) serves as a crucial communication bridge between deaf and non-deaf individuals. Existing SLR systems perform well for their cultural SL but may struggle with multi-cultural sign languages (McSL). To address these challenges, this paper proposes a Stack Spatial-Temporal Transformer Network that leverages multi-head attention mechanisms to capture both spatial and temporal dependencies with hierarchical features using the Stack Transfer concept. In the procedure, we first apply a fully connected layer to produce an embedding vector with high expressive power from the original data, and then feed it to a stack of the newly proposed transformer blocks to obtain hierarchical features with short-range and long-range dependencies. The network architecture is composed of several stages that process spatial and temporal relationships sequentially, ensuring effective feature extraction. After the fully connected layer, the embedding vector is processed by the Spatial Multi-Head Attention Transformer, which captures spatial dependencies between joints. In the next stage, the Temporal Multi-Head Attention Transformer captures long-range temporal dependencies, and again, the features are concatenated with the output using another skip connection. The processed features are then passed to the Feed-Forward Network (FFN), which refines the feature representations further. After the FFN, additional skip connections are applied to combine the output with earlier layers, followed by a final normalization layer to produce the final output feature tensor. This process is repeated for 10 transformer blocks. Extensive experiments show that the model achieves good accuracy on the JSL, KSL, and ASL datasets. Our approach demonstrates improved performance in McSL and can be considered a novel work in this domain.
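The spatial-then-temporal ordering described in the abstract above can be sketched with plain scaled dot-product attention. This is a simplified illustration under stated assumptions, not the proposed network: it uses a single head with identity query/key/value projections, replaces the concatenation-style skip connection with an additive residual, and omits the FFN and normalization. All function names are hypothetical.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def attention_with_skip(tokens):
    """Attention output added back to its input (additive residual, an assumption)."""
    attended = self_attention(tokens)
    return [[x + a for x, a in zip(t, att)] for t, att in zip(tokens, attended)]


def spatial_then_temporal(sequence):
    """`sequence[frame][joint]` is a feature vector for one joint in one frame."""
    # Spatial stage: attend across joints within each frame.
    spatial = [attention_with_skip(frame) for frame in sequence]
    # Temporal stage: attend across frames separately for each joint.
    n_frames, n_joints = len(spatial), len(spatial[0])
    per_joint = [
        attention_with_skip([spatial[f][j] for f in range(n_frames)])
        for j in range(n_joints)
    ]
    # Restore the [frame][joint] layout.
    return [[per_joint[j][f] for j in range(n_joints)] for f in range(n_frames)]
```

The output keeps the same frames x joints x features layout as the input, so a block of this kind can be stacked repeatedly, which matches the repeated-block structure the abstract describes.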

Submitted 21 March, 2025; originally announced March 2025.

arXiv:2501.07039 [pdf, other]

IoT-Based Real-Time Medical-Related Human Activity Recognition Using Skeletons and Multi-Stage Deep Learning for Healthcare

Authors: Subrata Kumer Paul, Abu Saleh Musa Miah, Rakhi Rani Paul, Md. Ekramul Hamid, Jungpil Shin, Md Abdur Rahim

Abstract: The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing medical-related human activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions that are critical to patient well-being. However, challenges such as high computational demands, low accuracy, and limited adaptability persist in Human Motion Recognition (HMR). While some studies have integrated HMR with IoT for real-time healthcare applications, limited research has focused on recognizing MRHA as essential for effective patient monitoring. This study proposes a novel HMR method for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT. The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolutions (MBConv) blocks, followed by ConvLSTM to capture spatio-temporal patterns. A classification module with global average pooling, a fully connected layer, and a dropout layer generates the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking, and sitting. It achieves 94.85% accuracy for cross-subject evaluations and 96.45% for cross-view evaluations on NTU RGB+D 120, along with 89.00% accuracy on HMDB51. Additionally, the system integrates IoT capabilities using a Raspberry Pi and GSM module, delivering real-time alerts via Twilio's SMS service to caregivers and patients. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes, and reducing costs.

Submitted 12 January, 2025; originally announced January 2025.

arXiv:2501.02014 [pdf, other]

doi 10.1109/ACCESS.2025.3553528

Machine Learning-Based Differential Diagnosis of Parkinson's Disease Using Kinematic Feature Extraction and Selection

Authors: Masahiro Matsumoto, Abu Saleh Musa Miah, Nobuyoshi Asai, Jungpil Shin

Abstract: Parkinson's disease (PD), the second most common neurodegenerative disorder, is characterized by dopaminergic neuron loss and the accumulation of abnormal synuclein. PD presents both motor and non-motor symptoms that progressively impair daily functioning. The severity of these symptoms is typically assessed using the MDS-UPDRS rating scale, which is subjective and dependent on the physician's experience. Additionally, PD shares symptoms with other neurodegenerative diseases, such as progressive supranuclear palsy (PSP) and multiple system atrophy (MSA), complicating accurate diagnosis. To address these diagnostic challenges, we propose a machine learning-based system for differential diagnosis of PD, PSP, MSA, and healthy controls (HC). This system utilizes a kinematic feature-based hierarchical feature extraction and selection approach. Initially, 18 kinematic features are extracted, including two newly proposed features, thumb-to-index vector velocity and acceleration, which provide insights into motor control patterns. In addition, 41 statistical features are extracted from each kinematic feature, including some new ones such as Average Absolute Change, Rhythm, Amplitude, Frequency, Standard Deviation of Frequency, and Slope. Feature selection is performed using one-way ANOVA to rank features, followed by Sequential Forward Floating Selection (SFFS) to identify the most relevant ones, aiming to reduce the computational complexity. The final feature set is used for classification, achieving a classification accuracy of 66.67% for each dataset and 88.89% for each patient, with particularly high performance for the MSA and HC groups using the SVM algorithm. This system shows potential as a rapid and accurate diagnostic tool in clinical practice, though further data collection and refinement are needed to enhance its reliability.
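The ANOVA ranking step mentioned in the abstract above can be illustrated with the standard one-way F-statistic, computed per feature across the diagnostic groups. This is a minimal sketch with hypothetical function names; the subsequent SFFS search is not shown, and the paper's exact ranking criteria are not given in the abstract.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic; `groups` is a list of per-class value lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own class mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within if ms_within > 0 else float("inf")


def rank_features(samples, labels):
    """Return (feature indices sorted by decreasing F, list of F scores)."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(samples[0])):
        groups = [[s[j] for s, y in zip(samples, labels) if y == c] for c in classes]
        scores.append(anova_f(groups))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores
```

Features with a large F-statistic separate the class means well relative to the within-class spread, so they would be the first candidates handed to a wrapper method such as SFFS.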

Submitted 2 January, 2025; originally announced January 2025.

Journal ref: IEEE Access, vol. 13, pp. 54090-54104, 2025

arXiv:2412.09330 [pdf, other]

Computer-Aided Osteoporosis Diagnosis Using Transfer Learning with Enhanced Features from Stacked Deep Learning Modules

Authors: Ayesha Siddiqua, Rakibul Hasan, Anichur Rahman, Abu Saleh Musa Miah

Abstract: Knee osteoporosis weakens the bone tissue in the knee joint, increasing fracture risk. Early detection through X-ray images enables timely intervention and improved patient outcomes. While some researchers have focused on diagnosing knee osteoporosis through manual radiology evaluation and traditional machine learning using hand-crafted features, these methods often struggle with performance and efficiency due to reliance on manual feature extraction and subjective interpretation. In this study, we propose a computer-aided diagnosis (CAD) system for knee osteoporosis, combining transfer learning with stacked feature enhancement deep learning blocks. Initially, knee X-ray images are preprocessed, and features are extracted using a pre-trained Convolutional Neural Network (CNN). These features are then enhanced through five sequential Conv-ReLU-MaxPooling blocks. The Conv2D layers detect low-level features, while the ReLU activations introduce non-linearity, allowing the network to learn complex patterns. MaxPooling layers down-sample the features, retaining the most important spatial information. This sequential processing enables the model to capture complex, high-level features related to bone structure, joint deformation, and osteoporotic markers. The enhanced features are passed through a classification module to differentiate between healthy and osteoporotic knee conditions. Extensive experiments on three individual datasets and a combined dataset demonstrate that our model achieves 97.32%, 98.24%, 97.27%, and 98.00% accuracy for OKX Kaggle Binary, KXO-Mendeley Multi-Class, OKX Kaggle Multi-Class, and the combined dataset, respectively, showing an improvement of around 2% over existing methods.
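The Conv-ReLU-MaxPooling block named in the abstract above can be sketched on a single-channel feature map in pure Python. The kernel size (3x3, no padding), the 2x2 pooling window, and the function names are assumptions for illustration; the abstract does not state the paper's layer hyperparameters.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation without padding, as deep learning frameworks compute it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def relu(fmap):
    """Zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


def conv_relu_pool(fmap, kernel):
    """One enhancement block: convolution, then ReLU, then max pooling."""
    return max_pool2x2(relu(conv2d_valid(fmap, kernel)))


def stack_blocks(fmap, kernels):
    """Apply one block per kernel in sequence (five kernels gives five blocks)."""
    for kernel in kernels:
        fmap = conv_relu_pool(fmap, kernel)
    return fmap
```

Under these assumed settings each block shrinks a side of length s to floor((s - 2) / 2), so a 94x94 map passes through five blocks as 94, 46, 22, 10, 4, 1.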

Submitted 12 December, 2024; originally announced December 2024.

arXiv:2412.04792 [pdf, other]

Multi-class heart disease Detection, Classification, and Prediction using Machine Learning Models

Authors: Mahfuzul Haque, Abu Saleh Musa Miah, Debashish Gupta, Md. Maruf Al Hossain Prince, Tanzina Alam, Nusrat Sharmin, Mohammed Sowket Ali, Jungpil Shin

Abstract: Heart disease is a leading cause of premature death worldwide, particularly among middle-aged and older adults, with men experiencing a higher prevalence. According to the World Health Organization (WHO), non-communicable diseases, including heart disease, account for 25% (17.9 million) of global deaths, with over 43,204 annual fatalities in Bangladesh. However, the development of heart disease detection (HDD) systems tailored to the Bangladeshi population remains underexplored due to the lack of benchmark datasets and reliance on manual or limited-data approaches. This study addresses these challenges by introducing new, ethically sourced HDD datasets, the BIG-Dataset and the CD dataset, which incorporate comprehensive data on symptoms, examination techniques, and risk factors. Using advanced machine learning techniques, including Logistic Regression and Random Forest, we achieved a testing accuracy of up to 96.6% with Random Forest. The proposed AI-driven system integrates these models and datasets to provide real-time, accurate diagnostics and personalized healthcare recommendations. By leveraging structured datasets and state-of-the-art machine learning algorithms, this research offers an innovative solution for scalable and effective heart disease detection, with the potential to reduce mortality rates and improve clinical outcomes.

Submitted 6 December, 2024; originally announced December 2024.

arXiv:2411.10661 [pdf, other]

Enhancing PTSD Outcome Prediction with Ensemble Models in Disaster Contexts

Authors: Ayesha Siddiqua, Atib Mohammad Oni, Abu Saleh Musa Miah, Jungpil Shin

Abstract: Post-traumatic stress disorder (PTSD) is a significant mental health challenge that affects individuals exposed to traumatic events. Early detection and effective intervention for PTSD are crucial, as it can lead to long-term psychological distress if untreated. Accurate detection of PTSD is essential for timely and targeted mental health interventions, especially in disaster-affected populations. Existing research has explored machine learning approaches for classifying PTSD, but many face limitations in terms of model performance and generalizability. To address these issues, we implemented a comprehensive preprocessing pipeline. This included data cleaning, missing value treatment using the SimpleImputer, label encoding of categorical variables, data augmentation using SMOTE to balance the dataset, and feature scaling with StandardScaler. The dataset was split into 80% training and 20% testing. We developed an ensemble model using a majority voting technique among several classifiers, including Logistic Regression, Support Vector Machines (SVM), Random Forest, XGBoost, LightGBM, and a customized Artificial Neural Network (ANN). The ensemble model achieved an accuracy of 96.76% with a benchmark dataset, significantly outperforming individual models. The proposed method's advantages include improved robustness through the combination of multiple models, enhanced ability to generalize across diverse data points, and increased accuracy in detecting PTSD. Additionally, the use of SMOTE for data augmentation ensured better handling of imbalanced datasets, leading to more reliable predictions. The proposed approach offers valuable insights for policymakers and healthcare providers by leveraging predictive analytics to address mental health issues in vulnerable populations, particularly those affected by disasters.
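The majority voting step in the abstract above can be sketched independently of any particular classifier. The following is a minimal illustration of hard voting over per-model predictions. The function name and the tie-breaking rule are assumptions; the abstract does not say how ties among the six listed classifiers are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine hard predictions; `predictions[m][i]` is model m's label for sample i."""
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        counts = Counter(model_preds[i] for model_preds in predictions)
        # most_common() orders equal counts by first appearance, so a tie is
        # resolved in favour of the label from the earliest-listed model.
        voted.append(counts.most_common(1)[0][0])
    return voted


# Three hypothetical models voting on four samples (1 = PTSD, 0 = no PTSD).
example = majority_vote([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
])
```

With an even number of voters, ties are possible, so an explicit tie-breaking rule (as in the comment above) or probability-weighted voting would need to be chosen in practice.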

Submitted 15 November, 2024; originally announced November 2024.

arXiv:2411.02816 [pdf, other]

ChatGPT in Research and Education: Exploring Benefits and Threats

Authors: Abu Saleh Musa Miah, Md Mahbubur Rahman Tusher, Md. Moazzem Hossain, Md Mamun Hossain, Md Abdur Rahim, Md Ekramul Hamid, Md. Saiful Islam, Jungpil Shin

Abstract: In recent years, advanced artificial intelligence technologies, such as ChatGPT, have significantly impacted various fields, including education and research. Developed by OpenAI, ChatGPT is a powerful language model that presents numerous opportunities for students and educators. It offers personalized feedback, enhances accessibility, enables interactive conversations, assists with lesson preparation and evaluation, and introduces new methods for teaching complex subjects. However, ChatGPT also poses challenges to traditional education and research systems. These challenges include the risk of cheating on online exams, the generation of human-like text that may compromise academic integrity, a potential decline in critical thinking skills, and difficulties in assessing the reliability of information generated by AI. This study examines both the opportunities and challenges ChatGPT brings to education from the perspectives of students and educators. Specifically, it explores the role of ChatGPT in helping students develop their subjective skills. To demonstrate its effectiveness, we conducted several subjective experiments using ChatGPT, such as generating solutions from subjective problem descriptions. Additionally, surveys were conducted with students and teachers to gather insights into how ChatGPT supports subjective learning and teaching. The results and analysis of these surveys are presented to highlight the impact of ChatGPT in this context.

Submitted 5 November, 2024; originally announced November 2024.

arXiv:2409.20384 [pdf, other]

FireLite: Leveraging Transfer Learning for Efficient Fire Detection in Resource-Constrained Environments

Authors: Mahamudul Hasan, Md Maruf Al Hossain Prince, Mohammad Samar Ansari, Sabrina Jahan, Abu Saleh Musa Miah, Jungpil Shin

Abstract: Fire hazards are extremely dangerous, particularly in sectors such as the transportation industry, where political unrest increases the likelihood of their occurrence. By employing IP cameras to facilitate the setup of fire detection systems on transport vehicles, losses from fire events may be prevented proactively. However, the development of lightweight fire detection models is required due to the computational constraints of the embedded systems within these cameras. In response to this difficulty, we introduce FireLite, a low-parameter convolutional neural network (CNN) designed for quick fire detection in contexts with limited resources. With just 34,978 trainable parameters, our model achieves an accuracy of 98.77%. It also shows a validation loss of 8.74 and peaks at 98.77 for precision, recall, and F1-score measures. Because of its precision and efficiency, FireLite is a promising solution for fire detection in resource-constrained environments.

Submitted 30 September, 2024; originally announced September 2024.

arXiv:2409.11223 [pdf, other]

doi 10.1109/OJCS.2024.3517154

Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection

Authors: Yuta Kaneko, Abu Saleh Musa Miah, Najmul Hassan, Hyoun-Sup Lee, Si-Woong Jang, Jungpil Shin

Abstract: Weakly supervised video anomaly detection (WS-VAD) is a crucial area in computer vision for developing intelligent surveillance systems. This system uses three feature streams: RGB video, optical flow, and audio signals, where each stream extracts complementary spatial and temporal features using an enhanced attention module to improve detection accuracy and robustness. In the first stream, we employed an attention-based, multi-stage feature enhancement approach to improve spatial and temporal features from the RGB video where the first stage consists of a ViT-based CLIP module, with top-k features concatenated in parallel with I3D and Temporal Contextual Aggregation (TCA) based rich spatiotemporal features. The second stage effectively captures temporal dependencies using the Uncertainty-Regulated Dual Memory Units (UR-DMU) model, which learns representations of normal and abnormal data simultaneously, and the third stage is employed to select the most relevant spatiotemporal features. The second stream extracted enhanced attention-based spatiotemporal features from the flow data modality-based feature by taking advantage of the integration of the deep learning and attention module. The audio stream captures auditory cues using an attention module integrated with the VGGish model, aiming to detect anomalies based on sound patterns. These streams enrich the model by incorporating motion and audio signals often indicative of abnormal events undetectable through visual analysis alone. The concatenation of the multimodal fusion leverages the strengths of each modality, resulting in a comprehensive feature set that significantly improves anomaly detection accuracy and robustness across three datasets. The extensive experiment and high performance with the three benchmark datasets proved the effectiveness of the proposed system over the existing state-of-the-art system.

Submitted 17 September, 2024; originally announced September 2024.

Journal ref: IEEE Open Journal of the Computer Society, vol. 6, pp. 129-140, 2025

arXiv:2408.14111 [pdf, other]

Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model

Authors: Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Md Hadiuzzaman, Muhammad Nazrul Islam, Jungpil Shin

Abstract: Hand gesture-based sign language recognition (SLR) is one of the most advanced applications of machine learning and computer vision that uses hand gestures. Although many researchers have widely explored and studied how to address BSL problems in the past few years, specific unaddressed issues remain, such as skeleton and transformer-based BSL recognition. In addition, existing BSL models have not been evaluated under varied, unseen environmental conditions, so their ability to generalize to daily-life signs remains unproven. As a consequence, existing BSL recognition systems provide a limited perspective of their generalisation ability, as they are tested on datasets containing few BSL alphabets that have a wide disparity in gestures and are easy to differentiate. To overcome these limitations, we propose a spatial-temporal attention-based BSL recognition model considering hand joint skeletons extracted from the sequence of images. The main aim of utilising hand skeleton-based BSL data is to preserve privacy and to allow low-resolution image sequences, which need minimal computational cost and low hardware configurations. Our model captures discriminative structural displacements and short-range dependency based on unified joint features projected onto high-dimensional feature space. Specifically, the use of Separable TCN combined with a powerful multi-head spatial-temporal attention architecture generated high-performance accuracy. The extensive experiments with a proposed dataset and two benchmark BSL datasets with a wide range of evaluations, such as intra- and inter-dataset evaluation settings, demonstrated that our proposed models achieve competitive performance with extremely low computational complexity and run faster than existing models.

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.13723 [pdf, other]

doi 10.1038/s41598-024-72996-7

EMG-Based Hand Gesture Recognition through Diverse Domain Feature Enhancement and Machine Learning-Based Approach

Authors: Abu Saleh Musa Miah, Najmul Hassan, Md. Maniruzzaman, Nobuyoshi Asai, Jungpil Shin

Abstract: Surface electromyography (EMG) serves as a pivotal tool in hand gesture recognition and human-computer interaction, offering a non-invasive means of signal acquisition. This study presents a novel methodology for classifying hand gestures using EMG signals. To address the challenges associated with feature extraction, we explored 23 distinct morphological, time-domain, and frequency-domain feature extraction techniques. However, the substantial size of the feature set may increase computational complexity, which can hinder machine learning algorithm performance. To mitigate this, we employ an efficient feature selection approach, specifically an extra tree classifier. The selected features are fed into various machine learning-based classification algorithms, where our model achieved 97.43% accuracy with the KNN algorithm. By leveraging a comprehensive feature extraction and selection strategy, our methodology enhances the accuracy and usability of EMG-based hand gesture recognition systems. The higher performance accuracy proves the effectiveness of the proposed model over the existing system. Keywords: EMG signal, machine learning approach, hand gesture recognition.
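The final classification stage in the abstract above, KNN applied to a selected subset of features, can be sketched as follows. This is an illustration only: the `selected` column indices stand in for whatever the extra-tree selection step would return (that step is not implemented here), and the function name, gesture labels, and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3, selected=None):
    """Classify `query` by majority label among its k nearest training samples.

    `selected` lists the feature columns to use; None means use every column.
    """
    cols = selected if selected is not None else range(len(query))

    def dist(a, b):
        # Euclidean distance restricted to the chosen feature columns.
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in cols))

    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data: column 0 separates the classes, column 1 is a noisy feature.
train_x = [[0.0, 9.0], [0.1, 8.0], [1.0, -9.0], [1.1, -9.5]]
train_y = ["rest", "rest", "fist", "fist"]
query = [0.05, -9.5]

with_selection = knn_predict(train_x, train_y, query, k=3, selected=[0])
without_selection = knn_predict(train_x, train_y, query, k=1)
```

In this toy case the noisy column dominates the distance when all features are used, which is one reason a selection step before a distance-based classifier such as KNN can matter.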

Submitted 25 August, 2024; originally announced August 2024.

Journal ref: Sci Rep 14, 22061 (2024)

arXiv:2408.12211 [pdf, other]

Computer-Aided Fall Recognition Using a Three-Stream Spatial-Temporal GCN Model with Adaptive Feature Aggregation

Authors: Jungpil Shin, Abu Saleh Musa Miah, Rei Egawa, Koki Hirooka, Md. Al Mehedi Hasan, Yoichi Tomioka, Yong Seok Hwang

Abstract: The prevention of falls is paramount in modern healthcare, particularly for the elderly, as falls can lead to severe injuries or even fatalities. Additionally, the growing incidence of falls among the elderly, coupled with the urgent need to prevent suicide attempts resulting from medication overdose, underscores the critical importance of accurate and efficient fall detection methods. In this scenario, a computer-aided fall detection system is essential to save elderly people's lives worldwide. Many researchers have been working to develop fall detection systems. However, existing fall detection systems often struggle with issues such as unsatisfactory accuracy, limited robustness, high computational complexity, and sensitivity to environmental factors due to a lack of effective features. In response to these challenges, this paper proposes a novel three-stream spatial-temporal feature-based fall detection system. Our system incorporates joint skeleton-based spatial and temporal Graph Convolutional Network (GCN) features, joint motion-based spatial and temporal GCN features, and residual connections-based features. Each stream employs adaptive graph-based feature aggregation and consecutive separable convolutional neural networks (Sep-TCN), significantly reducing computational complexity and model parameters compared to prior systems.
Experimental results across multiple datasets demonstrate the effectiveness and efficiency of our proposed system, with accuracies of 99.51%, 99.15%, 99.79%, and 99.85% achieved on the ImViA, UR-Fall, Fall-UP, and FU-Kinect datasets, respectively. The remarkable performance of our system highlights its superiority, efficiency, and generalizability in real-world fall detection scenarios, offering significant advancements in healthcare and societal well-being.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.10955 [pdf, other]

Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Character

Authors: Farhanul Haque, Md. Al-Hasan, Sumaiya Tabssum Mou, Abu Saleh Musa Miah, Jungpil Shin, Md Abdur Rahim

Abstract: The Bengali language is the 5th most spoken native and 7th most spoken language in the world, and Bengali handwritten character recognition has attracted researchers for decades. Character recognition for other languages, such as English, Arabic, Turkish, and Chinese, has contributed significantly to developing handwriting recognition systems. Still, little research has been done on Bengali character recognition because of the similarity among characters, their curvature, and other complexities. However, many researchers have used traditional machine learning and deep learning models to conduct Bengali handwritten recognition. This study employed a convolutional neural network (CNN) with ensemble transfer learning and a multichannel attention network. We generated features from the two branches of the CNN, Inception Net and ResNet, and then produced an ensemble feature fusion by concatenating them. After that, we applied the attention module to produce contextual information from the ensemble features. Finally, we applied a classification module to refine the features and perform classification. We evaluated the proposed model using the CMATERdb 3.1.2 data set and achieved 92% accuracy for the raw dataset and 98.00% for the preprocessed dataset. We believe that our contribution to the Bengali handwritten character recognition domain will be considered a great development.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.10518 [pdf, other]

BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition

Authors: Md Hadiuzzaman, Mohammed Sowket Ali, Tamanna Sultana, Abdur Raj Shafi, Abu Saleh Musa Miah, Jungpil Shin

Abstract: People commonly communicate in English, Arabic, and Bengali spoken languages through various mediums. However, deaf and hard-of-hearing individuals primarily use body language and sign language to express their needs and achieve independence. Sign language research is burgeoning to enhance communication with the deaf community. While many researchers have made strides in recognizing sign languages such as French, British, Arabic, Turkish, and American, there has been limited research on Bangla sign language (BdSL) with less-than-satisfactory results. One significant barrier has been the lack of a comprehensive Bangla sign language dataset. In our work, we introduced a new BdSL dataset comprising alphabets totaling 18,000 images, with each image being 224x224 pixels in size. Our dataset encompasses 36 Bengali symbols, of which 30 are consonants and the remaining six are vowels. Despite our dataset contribution, many existing systems continue to grapple with achieving high-performance accuracy for BdSL. To address this, we devised a hybrid Convolutional Neural Network (CNN) model, integrating multiple convolutional layers, activation functions, dropout techniques, and LSTM layers. Upon evaluating our hybrid-CNN model with the newly created BdSL dataset, we achieved an accuracy rate of 97.92%. We are confident that both our BdSL dataset and hybrid CNN model will be recognized as significant milestones in BdSL research.

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.10498 [pdf, other]

Cervical Cancer Detection Using Multi-Branch Deep Learning Model

Authors: Tatsuhiro Baba, Abu Saleh Musa Miah, Jungpil Shin, Md. Al Mehedi Hasan

Abstract: Cervical cancer, mainly triggered by persistent infection with high-risk HPV, remains a crucial global health challenge for women, with diagnosis rates among young women soaring from 10% to 40% over three decades. While Pap smear screening is a prevalent diagnostic method, visual image analysis can be lengthy and often leads to mistakes. Early detection of the disease can contribute significantly to improving patient outcomes. In recent decades, many researchers have employed machine learning techniques that showed promise in cervical cancer detection based on medical images. In recent years, many researchers have employed various deep learning techniques to achieve high accuracy in detecting cervical cancer but are still facing various challenges. This research proposes a novel approach to automate cervical cancer image classification using Multi-Head Self-Attention (MHSA) and convolutional neural networks (CNNs). The proposed method leverages the strengths of both MHSA mechanisms and CNNs to effectively capture both local and global features within cervical images in two streams. MHSA facilitates the model's ability to focus on relevant regions of interest, while the CNN extracts hierarchical features that contribute to accurate classification. Finally, we combined the features of the two streams and fed them into the classification module to refine the features and perform classification.
To evaluate the performance of the proposed approach, we used the SIPaKMeD dataset, which classifies cervical cells into five categories. Our model achieved a remarkable accuracy of 98.522%. This high recognition accuracy in medical image classification holds promise for the approach's applicability in other medical image recognition tasks.

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.08035 [pdf, other]

An Advanced Deep Learning Based Three-Stream Hybrid Model for Dynamic Hand Gesture Recognition

Authors: Md Abdur Rahim, Abu Saleh Musa Miah, Hemel Sharker Akash, Jungpil Shin, Md. Imran Hossain, Md. Najmul Hossain

Abstract: In the modern context, hand gesture recognition has emerged as a focal point. This is due to its wide range of applications, which include comprehending sign language, factories, hands-free devices, and guiding robots. Many researchers have attempted to develop more effective techniques for recognizing these hand gestures. However, there are challenges like dataset limitations, variations in hand forms, external environments, and inconsistent lighting conditions. To address these challenges, we proposed a novel three-stream hybrid model that combines RGB pixel and skeleton-based features to recognize hand gestures. In the procedure, we preprocessed the dataset, including augmentation, to make the system independent of rotation, translation, and scaling. We employed a three-stream hybrid model to extract the multi-feature fusion using the power of the deep learning module. In the first stream, we extracted the initial feature using the pre-trained ImageNet module and then enhanced this feature by using multiple layers of the GRU and LSTM modules. In the second stream, we extracted the initial feature with the pre-trained ResNet module and enhanced it with various combinations of the GRU and LSTM modules. In the third stream, we extracted the hand pose key points using MediaPipe and then enhanced them using the stacked LSTM to produce the hierarchical feature. After that, we concatenated the three features to produce the final feature. Finally, we employed a classification module to produce the probabilistic map to generate the predicted output.
We mainly produced a powerful feature vector by taking advantage of the pixel-based deep learning feature and the pose-estimation-based stacked deep learning feature, combining a pre-trained model with a deep learning model trained from scratch for unequalled gesture detection capabilities.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.05436 [pdf, other]

doi 10.1109/ACCESS.2024.3456436

A Methodological and Structural Review of Hand Gesture Recognition Across Diverse Data Modalities

Authors: Jungpil Shin, Abu Saleh Musa Miah, Md. Humaun Kabir, Md. Abdur Rahim, Abdullah Al Shiam

Abstract: Researchers have been developing Hand Gesture Recognition (HGR) systems to enhance natural, efficient, and authentic human-computer interaction, especially benefiting those who rely solely on hand gestures for communication. Despite significant progress, the automatic and precise identification of hand gestures remains a considerable challenge in computer vision. Recent studies have focused on specific modalities like RGB images, skeleton data, and spatiotemporal interest points. This paper provides a comprehensive review of HGR techniques and data modalities from 2014 to 2024, exploring advancements in sensor technology and computer vision. We highlight accomplishments using various modalities, including RGB, Skeleton, Depth, Audio, EMG, EEG, and Multimodal approaches and identify areas needing further research. We reviewed over 200 articles from prominent databases, focusing on data collection, data settings, and gesture representation. Our review assesses the efficacy of HGR systems through their recognition accuracy and identifies a gap in research on continuous gesture recognition, indicating the need for improved vision-based gesture systems. The field has experienced steady research progress, including advancements in hand-crafted features and deep learning (DL) techniques. Additionally, we report on the promising developments in HGR methods and the area of multimodal approaches. We hope this survey will serve as a potential guideline for diverse data modality-based HGR research.

Submitted 10 August, 2024; originally announced August 2024.

Journal ref: IEEE Access, 09 September 2024

Showing 1–20 of 20 results for author: Miah, A S M