-
Performance Analysis of Few-Shot Learning Approaches for Bangla Handwritten Character and Digit Recognition
Authors:
Mehedi Ahamed,
Radib Bin Kabir,
Tawsif Tashwar Dipto,
Mueeze Al Mushabbir,
Sabbir Ahmed,
Md. Hasanul Kabir
Abstract:
This study investigates the performance of few-shot learning (FSL) approaches in recognizing Bangla handwritten characters and numerals using limited labeled data. It demonstrates the applicability of these methods to scripts with intricate and complex structures, where dataset scarcity is a common challenge. Given the complexity of Bangla script, we hypothesize that models performing well on these characters can generalize effectively to languages of similar or lower structural complexity. To this end, we introduce SynergiProtoNet, a hybrid network designed to improve the recognition accuracy of handwritten characters and digits. The model integrates advanced clustering techniques with a robust embedding framework to capture fine-grained details and contextual nuances. It leverages multi-level (both high- and low-level) feature extraction within a prototypical learning framework. We rigorously benchmark SynergiProtoNet against several state-of-the-art few-shot learning models: BD-CSPN, Prototypical Network, Relation Network, Matching Network, and SimpleShot, across diverse evaluation settings including Monolingual Intra-Dataset Evaluation, Monolingual Inter-Dataset Evaluation, Cross-Lingual Transfer, and Split Digit Testing. Experimental results show that SynergiProtoNet consistently outperforms existing methods, establishing a new benchmark in few-shot learning for handwritten character and digit recognition. The code is available on GitHub: https://github.com/MehediAhamed/SynergiProtoNet.
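The prototypical learning framework mentioned in the abstract can be illustrated with a minimal sketch of the generic Prototypical Network decision rule: each class prototype is the mean of its support embeddings, and a query takes the label of the nearest prototype. This is not SynergiProtoNet itself; the 2-D embeddings, the class names, and the helper functions are hypothetical stand-ins for the output of a learned encoder.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_prototypes(support):
    """support maps a class label to its list of support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}

def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Toy 2-way 3-shot episode with hypothetical 2-D embeddings.
support = {
    "ka": [[0.9, 0.1], [1.1, 0.0], [1.0, -0.1]],
    "kha": [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]],
}
prototypes = build_prototypes(support)
prediction = classify([0.8, 0.2], prototypes)
```

In a real few-shot pipeline the embeddings would come from the trained feature extractor, and the same rule would be applied per episode.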
Submitted 31 May, 2025;
originally announced June 2025.
-
Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning
Authors:
H M Dipu Kabir,
Subrota Kumar Mondal,
Mohammad Ali Moni
Abstract:
This paper proposes batch augmentation with unimodal fine-tuning to detect the fetus's organs from ultrasound images and associated clinical textual information. We also prescribe pre-training initial layers with investigated medical data before the multimodal training. At first, we apply a transferred initialization with the unimodal image portion of the dataset with batch augmentation. This step adjusts the initial layer weights for medical data. Then, we apply neural networks (NNs) with fine-tuned initial layers to images in batches with batch augmentation to obtain features. We also extract information from descriptions of images. We combine this information with features obtained from images to train the head layer. We write a dataloader script to load the multimodal data and use existing unimodal image augmentation techniques with batch augmentation for the multimodal data. The dataloader brings a new random augmentation for each batch to get a good generalization. We investigate the FPU23 ultrasound and UPMC Food-101 multimodal datasets. The multimodal large language model (LLM) with the proposed training provides the best results among the investigated methods. We receive near state-of-the-art (SOTA) performance on the UPMC Food-101 dataset. We share the scripts of the proposed method with traditional counterparts at the following repository: github.com/dipuk0506/multimodal
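One possible reading of "a new random augmentation for each batch" is sketched below: the loader draws a single augmentation per batch and applies it to every image in that batch. The augmentation set, the nested-list image format, and the function names are assumptions for illustration and are not taken from the authors' repository.

```python
import random

def identity(img):
    return img

def hflip(img):
    """Mirror each row of a 2-D image stored as nested lists."""
    return [row[::-1] for row in img]

def vflip(img):
    """Reverse the row order of a 2-D image."""
    return img[::-1]

AUGMENTATIONS = [identity, hflip, vflip]

def augmented_batches(images, batch_size, rng):
    """Yield (augmentation name, batch), drawing one augmentation per batch."""
    for start in range(0, len(images), batch_size):
        aug = rng.choice(AUGMENTATIONS)
        batch = [aug(img) for img in images[start:start + batch_size]]
        yield aug.__name__, batch

# Six hypothetical 2x2 images split into batches of two.
images = [[[i, i + 1], [i + 2, i + 3]] for i in range(6)]
batches = list(augmented_batches(images, batch_size=2, rng=random.Random(0)))
```

Because the augmentation is redrawn every time a batch is produced, repeated epochs see differently transformed versions of the same images.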
Submitted 10 May, 2025;
originally announced May 2025.
-
Recognition of Frequencies of Short-Time SSVEP Signals Utilizing an SSCCA-Based Spatio-Spectral Feature Fusion Framework
Authors:
Saif Bashar,
Samia Nasir Nira,
Shabbir Mahmood,
Md. Humaun Kabir,
Sujit Roy,
Iffat Farhana
Abstract:
A brain-computer interface (BCI) facilitates direct communication between the brain and external equipment through EEG, which is preferred for its superior temporal resolution. Among EEG techniques, the steady-state visual evoked potential (SSVEP) is favored due to its robust signal-to-noise ratio, minimal training demands, and elevated information transmission rate. Frequency detection in SSVEP-based brain-computer interfaces commonly employs canonical correlation analysis (CCA). SSCCA (spatio-spectral canonical correlation analysis) augments CCA by refining spatial filtering. This paper presents a multistage feature fusion methodology for short-duration SSVEP frequency identification, employing SSCCA with template signals derived via leave-one-out cross-validation (LOOCV). A filterbank generates bandpass filters for stimulus frequencies and their harmonics, whereas SSCCA calculates correlation coefficients between subbands and templates. Two phases of non-linear weighting amalgamate these coefficients to discern the target stimulus. This multistage methodology surpasses traditional techniques, attaining an accuracy of 94.5%.
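The abstract does not state the two weighting phases, so the sketch below substitutes the well-known filter-bank CCA (FBCCA) fusion rule as an illustration of how sub-band correlation coefficients can be combined non-linearly: each sub-band n receives the weight n^(-a) + b, and the squared correlations are summed per candidate frequency. The constants a and b and the correlation values are assumptions, not figures from this paper.

```python
def subband_weight(n, a=1.25, b=0.25):
    """FBCCA-style weight for the n-th sub-band (n starts at 1)."""
    return n ** (-a) + b

def fused_score(correlations, a=1.25, b=0.25):
    """Weighted sum of squared sub-band correlations for one stimulus frequency."""
    return sum(
        subband_weight(n, a, b) * r ** 2
        for n, r in enumerate(correlations, start=1)
    )

def detect_frequency(correlations_by_freq):
    """Pick the candidate frequency with the largest fused score."""
    return max(correlations_by_freq,
               key=lambda f: fused_score(correlations_by_freq[f]))

# Hypothetical correlation coefficients for three sub-bands per candidate (Hz).
correlations_by_freq = {
    8.0: [0.30, 0.25, 0.20],
    10.0: [0.70, 0.55, 0.40],
    12.0: [0.35, 0.30, 0.45],
}
target = detect_frequency(correlations_by_freq)
```

Lower sub-bands receive larger weights because the fundamental and early harmonics usually carry most of the SSVEP energy.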
Submitted 19 April, 2025;
originally announced April 2025.
-
PC-DeepNet: A GNSS Positioning Error Minimization Framework Using Permutation-Invariant Deep Neural Network
Authors:
M. Humayun Kabir,
Md. Ali Hasan,
Md. Shafiqul Islam,
Kyeongjun Ko,
Wonjae Shin
Abstract:
Global navigation satellite systems (GNSS) face significant challenges in urban and sub-urban areas due to non-line-of-sight (NLOS) propagation, multipath effects, and low received power levels, resulting in highly non-linear and non-Gaussian measurement error distributions. In light of this, conventional model-based positioning approaches, which rely on Gaussian error approximations, struggle to achieve precise localization under these conditions. To overcome these challenges, we put forth a novel learning-based framework, PC-DeepNet, that employs a permutation-invariant (PI) deep neural network (DNN) to estimate position corrections (PC). This approach is designed to ensure robustness against changes in the number and/or order of visible satellite measurements, a common issue in GNSS systems, while leveraging NLOS and multipath indicators as features to enhance positioning accuracy in challenging urban and sub-urban environments. To validate the performance of the proposed framework, we compare the positioning error with state-of-the-art model-based and learning-based positioning methods using two publicly available datasets. The results confirm that the proposed PC-DeepNet achieves higher accuracy than existing model-based and learning-based methods while exhibiting lower computational complexity than previous learning-based approaches.
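Permutation invariance can be illustrated with a minimal Deep Sets style sketch: a per-satellite feature map followed by mean pooling yields the same output for any ordering of the measurements and accepts any number of them. The feature map, the weights, and the (residual, elevation) measurement tuples are hypothetical stand-ins for learned layers and real GNSS features, not the PC-DeepNet architecture.

```python
def phi(measurement):
    """Per-satellite feature map (hand-written stand-in for a learned layer)."""
    residual, elevation = measurement
    return [residual, residual * elevation, elevation ** 2]

def pooled_features(measurements):
    """Mean-pool per-satellite features so that order does not matter."""
    feats = [phi(m) for m in measurements]
    return [sum(col) / len(feats) for col in zip(*feats)]

def position_correction(measurements, weights=(0.5, -0.2, 0.1)):
    """Map the pooled set representation to a scalar correction."""
    pooled = pooled_features(measurements)
    return sum(w * p for w, p in zip(weights, pooled))

# Hypothetical (pseudorange residual, normalized elevation) pairs.
measurements = [(1.2, 0.8), (-0.4, 0.3), (2.5, 0.6), (0.1, 0.9)]
forward = position_correction(measurements)
shuffled = position_correction(list(reversed(measurements)))
fewer = position_correction(measurements[:2])
```

Here `forward` and `shuffled` agree up to floating-point rounding, and `fewer` shows that the same function handles a smaller set of visible satellites.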
Submitted 18 April, 2025;
originally announced April 2025.
-
SCReedSolo: A Secure and Robust LSB Image Steganography Framework with Randomized Symmetric Encryption and Reed-Solomon Coding
Authors:
Syed Rifat Raiyan,
Md. Hasanul Kabir
Abstract:
Image steganography is an information-hiding technique that involves the surreptitious concealment of covert informational content within digital images. In this paper, we introduce ${\rm SCR{\small EED}S{\small OLO}}$, a novel framework for concealing arbitrary binary data within images. Our approach synergistically leverages Random Shuffling, Fernet Symmetric Encryption, and Reed-Solomon Error Correction Codes to encode the secret payload, which is then discretely embedded into the carrier image using LSB (Least Significant Bit) Steganography. The combination of these methods addresses the vulnerability vectors of both security and resilience against bit-level corruption in the resultant stego-images. We show that our framework achieves a data payload of 3 bits per pixel for an RGB image, and mathematically assess the probability of successful transmission for the amalgamated $n$ message bits and $k$ error correction bits. Additionally, we find that ${\rm SCR{\small EED}S{\small OLO}}$ yields good results upon being evaluated with multiple performance metrics, successfully eludes detection by various passive steganalysis tools, and is immune to simple active steganalysis attacks. Our code and data are available at https://github.com/Starscream-11813/SCReedSolo-Steganography.
Submitted 16 March, 2025;
originally announced March 2025.
-
Machine Learning Approaches on Crop Pattern Recognition a Comparative Analysis
Authors:
Kazi Hasibul Kabir,
Md. Zahiruddin Aqib,
Sharmin Sultana,
Shamim Akhter
Abstract:
Monitoring agricultural activities is important to ensure food security. Remote sensing plays a significant role in large-scale continuous monitoring of cultivation activities. Time series remote sensing data are used to generate the cropping pattern. Classification algorithms are used to classify crop patterns and map agricultural land use. Conventional classification methods, including support vector machine (SVM) and decision trees, have been applied for crop pattern recognition. In this paper, however, we propose Deep Neural Network (DNN) based classification to improve the performance of crop pattern recognition and present a comparative analysis with two other machine learning approaches, Naive Bayes and Random Forest.
Submitted 19 November, 2024;
originally announced November 2024.
-
FUSED-Net: Detecting Traffic Signs with Limited Data
Authors:
Md. Atiqur Rahman,
Nahian Ibn Asad,
Md. Mushfiqul Haque Omi,
Md. Bakhtiar Hasan,
Sabbir Ahmed,
Md. Hasanul Kabir
Abstract:
Automatic Traffic Sign Recognition is paramount in modern transportation systems, motivating several research endeavors to focus on performance improvement by utilizing large-scale datasets. As the appearance of traffic signs varies across countries, curating large-scale datasets is often impractical, which calls for efficient models that can produce satisfactory performance using limited data. In this connection, we present 'FUSED-Net', built upon Faster RCNN for traffic sign detection and enhanced by Unfrozen Parameters, Pseudo-Support Sets, Embedding Normalization, and Domain Adaptation while reducing data requirements. Unlike traditional approaches, we keep all parameters unfrozen during training, enabling FUSED-Net to learn from limited samples. The generation of a Pseudo-Support Set through data augmentation further enhances performance by compensating for the scarcity of target domain data. Additionally, Embedding Normalization is incorporated to reduce intra-class variance, standardizing feature representation. Domain Adaptation, achieved by pre-training on a diverse traffic sign dataset distinct from the target domain, improves model generalization. Evaluating FUSED-Net on the BDTSD dataset, we achieved 2.4x, 2.2x, 1.5x, and 1.3x improvements in mAP in the 1-shot, 3-shot, 5-shot, and 10-shot scenarios, respectively, compared to the state-of-the-art Few-Shot Object Detection (FSOD) models. Additionally, we outperform state-of-the-art works on the cross-domain FSOD benchmark under several scenarios.
Submitted 3 January, 2025; v1 submitted 23 September, 2024;
originally announced September 2024.
-
A Double-Difference Doppler Shift-Based Positioning Framework with Ephemeris Error Correction of LEO Satellites
Authors:
Md. Ali Hasan,
M. Humayun Kabir,
Md. Shafiqul Islam,
Sangmin Han,
Wonjae Shin
Abstract:
In signals of opportunity (SOPs)-based positioning utilizing low Earth orbit (LEO) satellites, ephemeris data derived from two-line element files can introduce increasing error over time. To handle the erroneous measurement, an additional base receiver with a known position is often used to compensate for the effect of ephemeris error when positioning the user terminal (UT). However, this approach is insufficient for the long baseline (the distance between the base receiver and UT) as it fails to adequately correct Doppler shift measurement errors caused by ephemeris inaccuracies, resulting in degraded positioning performance. Moreover, the lack of clock synchronization between the base receiver and UT exacerbates erroneous Doppler shift measurements. To address these challenges, we put forth a robust double-difference Doppler shift-based positioning framework, coined 3DPose, to handle the clock synchronization issue between the base receiver and UT, and positioning degradation due to the long baseline. The proposed 3DPose framework leverages double-difference Doppler shift measurements to eliminate the clock synchronization issue and incorporates a novel ephemeris error correction algorithm to enhance UT positioning accuracy in case of the long baseline. The algorithm specifically characterizes and corrects the Doppler shift measurement errors arising from erroneous ephemeris data, focusing on satellite position errors in the tangential direction. To validate the effectiveness of the proposed framework, we conduct comparative analyses across three different scenarios, contrasting its performance with the existing differential Doppler positioning method. The results demonstrate that the proposed 3DPose framework achieves an average reduction of 90% in 3-dimensional positioning errors compared to the existing differential Doppler approach.
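Why double differencing removes the clock synchronization issue can be checked with simple arithmetic. Under the assumed additive model in which each measured Doppler shift equals the geometric shift plus a receiver clock-drift bias plus a satellite clock-drift bias, differencing across receivers cancels the satellite term and differencing across satellites cancels the receiver term. All numeric values, labels, and the bias model are hypothetical and serve only to illustrate the cancellation, not the 3DPose algorithm or its ephemeris correction.

```python
def double_difference(doppler, sat_i, sat_j):
    """doppler[receiver][satellite] in Hz; receivers are 'ut' and 'base'."""
    single_i = doppler["ut"][sat_i] - doppler["base"][sat_i]
    single_j = doppler["ut"][sat_j] - doppler["base"][sat_j]
    return single_i - single_j

# Hypothetical geometric Doppler shifts (Hz) for two satellites, A and B.
true_doppler = {
    "ut": {"A": 1520.0, "B": -830.0},
    "base": {"A": 1495.0, "B": -790.0},
}
# Hypothetical clock-drift biases (Hz), unknown to the positioning solver.
receiver_bias = {"ut": 42.5, "base": -17.25}
satellite_bias = {"A": 3.0, "B": -1.5}

measured = {
    rx: {
        sat: true_doppler[rx][sat] + receiver_bias[rx] + satellite_bias[sat]
        for sat in ("A", "B")
    }
    for rx in ("ut", "base")
}

dd_measured = double_difference(measured, "A", "B")
dd_true = double_difference(true_doppler, "A", "B")
```

Both `dd_measured` and `dd_true` evaluate to the same value, so the biased measurements carry the same geometric information as the unbiased ones after double differencing.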
Submitted 8 September, 2024;
originally announced September 2024.
-
A Methodological and Structural Review of Hand Gesture Recognition Across Diverse Data Modalities
Authors:
Jungpil Shin,
Abu Saleh Musa Miah,
Md. Humaun Kabir,
Md. Abdur Rahim,
Abdullah Al Shiam
Abstract:
Researchers have been developing Hand Gesture Recognition (HGR) systems to enhance natural, efficient, and authentic human-computer interaction, especially benefiting those who rely solely on hand gestures for communication. Despite significant progress, the automatic and precise identification of hand gestures remains a considerable challenge in computer vision. Recent studies have focused on specific modalities like RGB images, skeleton data, and spatiotemporal interest points. This paper provides a comprehensive review of HGR techniques and data modalities from 2014 to 2024, exploring advancements in sensor technology and computer vision. We highlight accomplishments using various modalities, including RGB, Skeleton, Depth, Audio, EMG, EEG, and Multimodal approaches and identify areas needing further research. We reviewed over 200 articles from prominent databases, focusing on data collection, data settings, and gesture representation. Our review assesses the efficacy of HGR systems through their recognition accuracy and identifies a gap in research on continuous gesture recognition, indicating the need for improved vision-based gesture systems. The field has experienced steady research progress, including advancements in hand-crafted features and deep learning (DL) techniques. Additionally, we report on the promising developments in HGR methods and the area of multimodal approaches. We hope this survey will serve as a potential guideline for diverse data modality-based HGR research.
Submitted 10 August, 2024;
originally announced August 2024.
-
AttResDU-Net: Medical Image Segmentation Using Attention-based Residual Double U-Net
Authors:
Akib Mohammed Khan,
Alif Ashrafee,
Fahim Shahriar Khan,
Md. Bakhtiar Hasan,
Md. Hasanul Kabir
Abstract:
Manually inspecting polyps from a colonoscopy for colorectal cancer or performing a biopsy on skin lesions for skin cancer are time-consuming, laborious, and complex procedures. Automatic medical image segmentation aims to expedite this diagnosis process. However, numerous challenges exist due to significant variations in the appearance and sizes of objects with no distinct boundaries. This paper proposes an attention-based residual Double U-Net architecture (AttResDU-Net) that improves on the existing medical image segmentation networks. Inspired by the Double U-Net, this architecture incorporates attention gates on the skip connections and residual connections in the convolutional blocks. The attention gates allow the model to retain more relevant spatial information by suppressing irrelevant feature representations from the down-sampling path, so that the model learns to focus on target regions of varying shapes and sizes. Moreover, the residual connections help to train deeper models by ensuring better gradient flow. We conducted experiments on three datasets, CVC Clinic-DB, ISIC 2018, and the 2018 Data Science Bowl, and achieved Dice coefficient scores of 94.35%, 91.68%, and 92.45%, respectively. Our results suggest that AttResDU-Net can serve as a reliable method for automatic medical image segmentation in practice.
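For reference, the Dice coefficient reported above is twice the overlap between the predicted and ground-truth masks divided by the sum of their sizes. The sketch below computes it for flattened binary masks; the smoothing constant `eps` and the toy masks are assumptions added for illustration.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2 * intersection + eps) / (sum(pred) + sum(target) + eps)

# Hypothetical flattened binary masks (1 = foreground pixel).
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred_mask, true_mask)
```

The two masks share two foreground pixels and each contains three, so the score is close to 2 * 2 / (3 + 3), or about 0.667.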
Submitted 25 June, 2023;
originally announced June 2023.
-
Reduction of Class Activation Uncertainty with Background Information
Authors:
H M Dipu Kabir
Abstract:
Multitask learning is a popular approach to training high-performing neural networks with improved generalization. In this paper, we propose a background class to achieve improved generalization at a lower computational cost than multitask learning, to help researchers and organizations with limited computation power. We also present a methodology for selecting background images and discuss potential future improvements. We apply our approach to several datasets and achieve improved generalization with much lower computation. Through the class activation mappings (CAMs) of the trained models, we observed a tendency towards looking at a bigger picture with the proposed model training methodology. Applying the vision transformer with the proposed background class, we achieve state-of-the-art (SOTA) performance on the CIFAR-10C, Caltech-101, and CINIC-10 datasets. Example scripts are available in the 'CAM' folder of the following GitHub repository: github.com/dipuk0506/UQ
Submitted 10 January, 2025; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Uncertainty Aware Neural Network from Similarity and Sensitivity
Authors:
H M Dipu Kabir,
Subrota Kumar Mondal,
Sadia Khanam,
Abbas Khosravi,
Shafin Rahman,
Mohammad Reza Chalak Qazani,
Roohallah Alizadehsani,
Houshyar Asadi,
Shady Mohamed,
Saeid Nahavandi,
U Rajendra Acharya
Abstract:
Researchers have proposed several approaches for neural network (NN) based uncertainty quantification (UQ). However, most of the approaches are developed considering strong assumptions. Uncertainty quantification algorithms often perform poorly in an input domain and the reason for poor performance remains unknown. Therefore, in this paper we present a neural network training method that considers similar samples with sensitivity awareness. In the proposed NN training method for UQ, we first train a shallow NN for the point prediction. Then, we compute the absolute differences between predictions and targets and train another NN to predict those absolute differences, or absolute errors. Domains with high average absolute errors represent a high uncertainty. In the next step, we select each sample in the training set one by one and compute both prediction and error sensitivities. Then we select similar samples with sensitivity consideration and save the indexes of similar samples. The ranges of an input parameter become narrower when the output is highly sensitive to that parameter. After that, we construct initial uncertainty bounds (UB) by considering the distribution of sensitivity-aware similar samples. Prediction intervals (PIs) from initial uncertainty bounds are larger and cover more samples than required. Therefore, we train a bound correction NN. As following all the steps for finding the UB for each sample requires a lot of computation and memory access, we train a UB computation NN. The UB computation NN takes an input sample and provides an uncertainty bound. The UB computation NN is the final product of the proposed approach. Scripts of the proposed method are available in the following GitHub repository: github.com/dipuk0506/UQ
Submitted 26 April, 2023;
originally announced April 2023.
-
Interpretable Multi Labeled Bengali Toxic Comments Classification using Deep Learning
Authors:
Tanveer Ahmed Belal,
G. M. Shahariar,
Md. Hasanul Kabir
Abstract:
This paper presents a deep learning-based pipeline for categorizing Bengali toxic comments, in which a binary classification model first determines whether a comment is toxic, and a multi-label classifier then determines which toxicity types the comment belongs to. For this purpose, we have prepared a manually labeled dataset consisting of 16,073 instances, among which 8,488 are toxic; any toxic comment may correspond to one or more of the six toxic categories simultaneously: vulgar, hate, religious, threat, troll, and insult. Long Short Term Memory (LSTM) with BERT Embedding achieved 89.42% accuracy for the binary classification task, while as a multi-label classifier, a combination of Convolutional Neural Network and Bi-directional Long Short Term Memory (CNN-BiLSTM) with attention mechanism achieved 78.92% accuracy and a weighted F1-score of 0.86. To explain the predictions and interpret the word feature importance during classification by the proposed models, we utilized the Local Interpretable Model-Agnostic Explanations (LIME) framework. We have made our dataset public; it can be accessed at https://github.com/deepu099cse/Multi-Labeled-Bengali-Toxic-Comments-Classification
Submitted 8 April, 2023;
originally announced April 2023.
-
Development of a Voice Controlled Robotic Arm
Authors:
Akkas U. Haque,
Humayun Kabir,
S. C. Banik,
M. T. Islam
Abstract:
This paper describes a robotic arm with 5 degrees of freedom (DOF) that is controlled by human voice and was developed in the Mechatronics Laboratory, CUET. The robotic arm is interfaced with a PC through serial communication (RS-232). The user's voice command is captured by a microphone and processed by software built with Microsoft Visual Studio. The resulting signal (obtained by signal processing) is then sent to the control unit. The main control unit of the robotic arm is a PIC18f452 microcontroller. The control unit drives the actuators (Hitec HS-422 and HS-81) according to the signal or signals to produce the required motion of the robotic arm. At present, the robotic arm can perform set actions such as pick and pull, gripping, holding and releasing, and some extra functions such as dance-like movement, and it can turn according to voice commands.
Submitted 16 March, 2023;
originally announced March 2023.
-
Land Cover and Land Use Detection using Semi-Supervised Learning
Authors:
Fahmida Tasnim Lisa,
Md. Zarif Hossain,
Sharmin Naj Mou,
Shahriar Ivan,
Md. Hasanul Kabir
Abstract:
Semi-supervised learning (SSL) has made significant strides in the field of remote sensing. Finding a large number of labeled datasets for SSL methods is uncommon, and manually labeling datasets is expensive and time-consuming. Furthermore, accurately identifying remote sensing satellite images is more complicated than it is for conventional images. Class-imbalanced datasets are another prevalent phenomenon, and models trained on these become biased towards the majority classes. This becomes a critical issue with an SSL model's subpar performance. We aim to address the issue of labeling unlabeled data and also solve the model bias problem due to imbalanced datasets while achieving better accuracy. To accomplish this, we create "artificial" labels and train a model to have reasonable accuracy. We iteratively redistribute the classes through resampling using a distribution alignment technique. We use a variety of class imbalanced satellite image datasets: EuroSAT, UCM, and WHU-RS19. On UCM balanced dataset, our method outperforms previous methods MSMatch and FixMatch by 1.21% and 0.6%, respectively. For imbalanced EuroSAT, our method outperforms MSMatch and FixMatch by 1.08% and 1%, respectively. Our approach significantly lessens the requirement for labeled data, consistently outperforms alternative approaches, and resolves the issue of model bias caused by class imbalance in datasets.
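The abstract does not give the exact resampling procedure, so the sketch below shows one common form of distribution alignment, in the style of ReMixMatch, combined with confidence-thresholded pseudo-labeling: predicted class probabilities are rescaled by the ratio of a target class distribution to the model's running average prediction and then renormalized. The threshold, the uniform target distribution, and all probability values are assumptions for illustration only.

```python
def align(probs, running_mean, target_dist):
    """Rescale probabilities by target / running-mean ratio, then renormalize."""
    scaled = [p * t / m for p, t, m in zip(probs, target_dist, running_mean)]
    total = sum(scaled)
    return [s / total for s in scaled]

def pseudo_label(probs, threshold=0.4):
    """Return the predicted class index if confident enough, else None."""
    confidence = max(probs)
    return probs.index(confidence) if confidence >= threshold else None

# Hypothetical 3-class example in which the model over-predicts class 0.
probs = [0.6, 0.3, 0.1]          # prediction for one unlabeled image
running_mean = [0.7, 0.2, 0.1]   # average prediction over recent batches
target_dist = [1 / 3, 1 / 3, 1 / 3]

aligned = align(probs, running_mean, target_dist)
label = pseudo_label(aligned)
```

After alignment the most likely class shifts from the over-predicted majority class 0 to class 1, which is the bias-correcting effect the abstract attributes to distribution alignment.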
Submitted 21 December, 2022;
originally announced December 2022.
-
Performance Analysis of YOLO-based Architectures for Vehicle Detection from Traffic Images in Bangladesh
Authors:
Refaat Mohammad Alamgir,
Ali Abir Shuvro,
Mueeze Al Mushabbir,
Mohammed Ashfaq Raiyan,
Nusrat Jahan Rani,
Md. Mushfiqur Rahman,
Md. Hasanul Kabir,
Sabbir Ahmed
Abstract:
The task of locating and classifying different types of vehicles has become a vital element in numerous applications of automation and intelligent systems, ranging from traffic surveillance to vehicle identification and many more. In recent times, Deep Learning models have been dominating the field of vehicle detection. Yet, Bangladeshi vehicle detection has remained a relatively unexplored area. One of the main goals of vehicle detection is its real-time application, where 'You Only Look Once' (YOLO) models have proven to be the most effective architecture. In this work, intending to find the best-suited YOLO architecture for fast and accurate vehicle detection from traffic images in Bangladesh, we have conducted a performance analysis of different variants of the YOLO-based architectures such as YOLOv3, YOLOv5s, and YOLOv5x. The models were trained on a dataset containing 7390 images belonging to 21 types of vehicles, comprising samples from the DhakaAI dataset, the Poribohon-BD dataset, and our self-collected images. After thorough quantitative and qualitative analysis, we found the YOLOv5x variant to be the best-suited model, performing better than the YOLOv3 and YOLOv5s models by 7 and 4 percent in mAP and by 12 and 8.5 percent in accuracy, respectively.
Submitted 24 December, 2022; v1 submitted 18 December, 2022;
originally announced December 2022.
-
Huruf: An Application for Arabic Handwritten Character Recognition Using Deep Learning
Authors:
Minhaz Kamal,
Fairuz Shaiara,
Chowdhury Mohammad Abdullah,
Sabbir Ahmed,
Tasnim Ahmed,
Md. Hasanul Kabir
Abstract:
Handwriting Recognition has been a field of great interest in the Artificial Intelligence domain. Due to its broad use cases in real life, research has been conducted widely on it. Prominent work has been done in this field focusing mainly on Latin characters. However, the domain of Arabic handwritten character recognition is still relatively unexplored. The inherent cursive nature of the Arabic characters and variations in writing styles across individuals make the task even more challenging. We identified some probable reasons behind this and proposed a lightweight Convolutional Neural Network-based architecture for recognizing Arabic characters and digits. The proposed pipeline consists of a total of 18 layers: four layers each for convolution, pooling, batch normalization, and dropout, followed by one global average pooling layer and one dense layer. Furthermore, we thoroughly investigated the different choices of hyperparameters such as the choice of the optimizer, kernel initializer, activation function, etc. Evaluated on the publicly available 'Arabic Handwritten Character Dataset (AHCD)' and 'Modified Arabic handwritten digits Database (MadBase)' datasets, the proposed model achieved accuracies of 96.93% and 99.35%, respectively, which are comparable to the state-of-the-art and make it a suitable solution for real-life end-level applications.
Submitted 24 December, 2022; v1 submitted 16 December, 2022;
originally announced December 2022.
-
CoV-TI-Net: Transferred Initialization with Modified End Layer for COVID-19 Diagnosis
Authors:
Sadia Khanam,
Mohammad Reza Chalak Qazani,
Subrota Kumar Mondal,
H M Dipu Kabir,
Abadhan S. Sabyasachi,
Houshyar Asadi,
Keshav Kumar,
Farzin Tabarsinezhad,
Shady Mohamed,
Abbas Khorsavi,
Saeid Nahavandi
Abstract:
This paper proposes transferred initialization with modified fully connected layers for COVID-19 diagnosis. Convolutional neural networks (CNNs) have achieved remarkable results in image classification. However, training a high-performing model is a very complicated and time-consuming process because of the complexity of image recognition applications. On the other hand, transfer learning is a relatively new learning method that has been employed in many sectors to achieve good performance with fewer computations. In this research, the PyTorch pre-trained models (VGG19_bn and WideResNet-101) are applied to the MNIST dataset for the first time as initialization and with modified fully connected layers. The employed PyTorch pre-trained models were previously trained on ImageNet. The proposed model is developed and verified in a Kaggle notebook, and it reached an accuracy of 99.77% without requiring a large computational time during the training process of the network. We also applied the same methodology to the SIIM-FISABIO-RSNA COVID-19 Detection dataset and achieved 80.01% accuracy. In contrast, the previous methods need a large computational time during the training process to reach a high-performing model. Codes are available at the following link: github.com/dipuk0506/SpinalNet
Submitted 20 September, 2022;
originally announced September 2022.
-
Multiple Object Tracking in Recent Times: A Literature Review
Authors:
Mk Bashar,
Samia Islam,
Kashifa Kawaakib Hussain,
Md. Bakhtiar Hasan,
A. B. M. Ashikur Rahman,
Md. Hasanul Kabir
Abstract:
Multiple object tracking (MOT) has gained a lot of interest from researchers in recent years and has become one of the trending problems in computer vision, especially with the recent advancement of autonomous driving. MOT is a critical vision task that must cope with issues such as occlusion in crowded scenes, similar appearance, difficulty in detecting small objects, and ID switching. To tackle these challenges, researchers have utilized the attention mechanism of transformers, the interrelation of tracklets with graph convolutional neural networks, and the appearance similarity of objects in different frames with Siamese networks; they have also tried simple IoU-matching-based CNNs and motion prediction with LSTMs. To bring these scattered techniques under one umbrella, we have studied more than a hundred papers published over the last three years and have tried to extract the techniques that researchers have focused on most in recent times to solve the problems of MOT. We have listed numerous applications and possibilities and described how MOT relates to real life. Our review shows the different perspectives of techniques that researchers have used over time and gives some future directions for potential researchers. Moreover, we have included popular benchmark datasets and metrics in this review.
Submitted 11 September, 2022;
originally announced September 2022.
-
Two Decades of Bengali Handwritten Digit Recognition: A Survey
Authors:
A. B. M. Ashikur Rahman,
Md. Bakhtiar Hasan,
Sabbir Ahmed,
Tasnim Ahmed,
Md. Hamjajul Ashmafee,
Mohammad Ridwan Kabir,
Md. Hasanul Kabir
Abstract:
Handwritten Digit Recognition (HDR) is one of the most challenging tasks in the domain of Optical Character Recognition (OCR). Irrespective of language, there are some inherent challenges of HDR, which mostly arise due to the variations in writing styles across individuals, writing medium and environment, inability to maintain the same strokes while writing any digit repeatedly, etc. In addition to that, the structural complexities of the digits of a particular language may lead to ambiguous scenarios of HDR. Over the years, researchers have developed numerous offline and online HDR pipelines, where different image processing techniques are combined with traditional Machine Learning (ML)-based and/or Deep Learning (DL)-based architectures. Although evidence of extensive review studies on HDR exists in the literature for languages, such as English, Arabic, Indian, Farsi, Chinese, etc., few surveys on Bengali HDR (BHDR) can be found, which lack a comprehensive analysis of the challenges, the underlying recognition process, and possible future directions. In this paper, the characteristics and inherent ambiguities of Bengali handwritten digits along with a comprehensive insight of two decades of state-of-the-art datasets and approaches towards offline BHDR have been analyzed. Furthermore, several real-life application-specific studies, which involve BHDR, have also been discussed in detail. This paper will also serve as a compendium for researchers interested in the science behind offline BHDR, instigating the exploration of newer avenues of relevant research that may further lead to better offline recognition of Bengali handwritten digits in different application areas.
Submitted 25 September, 2022; v1 submitted 5 June, 2022;
originally announced June 2022.
-
HEATGait: Hop-Extracted Adjacency Technique in Graph Convolution based Gait Recognition
Authors:
Md. Bakhtiar Hasan,
Tasnim Ahmed,
Md. Hasanul Kabir
Abstract:
Biometric authentication using gait has become a promising field due to its unobtrusive nature. Recent approaches in model-based gait recognition techniques utilize spatio-temporal graphs for the elegant extraction of gait features. However, existing methods often rely on multi-scale operators for extracting long-range relationships among joints resulting in biased weighting. In this paper, we present HEATGait, a gait recognition system that improves the existing multi-scale graph convolution by efficient hop-extraction technique to alleviate the issue. Combined with preprocessing and augmentation techniques, we propose a powerful feature extractor that utilizes ResGCN to achieve state-of-the-art performance in model-based gait recognition on the CASIA-B gait dataset.
Submitted 21 April, 2022;
originally announced April 2022.
-
The state-of-the-art review on resource allocation problem using artificial intelligence methods on various computing paradigms
Authors:
Javad Hassannataj Joloudari,
Sanaz Mojrian,
Hamid Saadatfar,
Issa Nodehi,
Fatemeh Fazl,
Sahar Khanjani shirkharkolaie,
Roohallah Alizadehsani,
H M Dipu Kabir,
Ru-San Tan,
U Rajendra Acharya
Abstract:
With the increasing growth of information through smart devices, improving the quality of human life requires various computational paradigms, including the Internet of Things, fog, and cloud. Among these three paradigms, the cloud computing paradigm as an emerging technology adds cloud layer services to the edge of the network so that resource allocation operations occur close to the end-user to reduce resource processing time and network traffic overhead. Hence, the resource allocation problem for its providers, in terms of presenting a suitable platform by using computational paradigms, is considered a challenge. In general, resource allocation approaches are divided into two groups: auction-based methods (whose goals are to increase profits for service providers and to increase user satisfaction and usability) and optimization-based methods (targeting energy, cost, network exploitation, runtime, and reduction of time delay). In this paper, according to the latest scientific achievements, a comprehensive literature study (CLS) is provided on artificial intelligence methods based on resource allocation optimization, without considering auction-based methods, in various computing environments such as cloud computing, Vehicular Fog Computing, wireless, IoT, vehicular networks, 5G networks, vehicular cloud architecture, machine-to-machine (M2M) communication, Train-to-Train (T2T) communication networks, and Peer-to-Peer (P2P) networks. Since deep learning methods based on artificial intelligence are used as the most important methods in resource allocation problems, this paper also considers resource allocation approaches based on deep learning in the mentioned computational environments, such as deep reinforcement learning, the Q-learning technique, reinforcement learning, and online learning, as well as classical learning methods such as Bayesian learning, Cummins clustering, and the Markov decision process.
Submitted 4 November, 2022; v1 submitted 23 March, 2022;
originally announced March 2022.
-
Explainable Artificial Intelligence for Smart City Application: A Secure and Trusted Platform
Authors:
M. Humayn Kabir,
Khondokar Fida Hasan,
Mohammad Kamrul Hasan,
Keyvan Ansari
Abstract:
Artificial Intelligence (AI) is one of the disruptive technologies that is shaping the future. It has growing applications for data-driven decisions in major smart city solutions, including transportation, education, healthcare, public governance, and power systems. At the same time, it is gaining popularity in protecting critical cyber infrastructure from cyber threats, attacks, damages, or unauthorized access. However, one of the significant issues of those traditional AI technologies (e.g., deep learning) is that the rapid progress in complexity and sophistication propelled and turned out to be uninterpretable black boxes. On many occasions, it is very challenging to understand the decision and bias to control and trust systems' unexpected or seemingly unpredictable outputs. It is acknowledged that the loss of control over interpretability of decision-making becomes a critical issue for many data-driven automated applications. But how may it affect the system's security and trustworthiness? This chapter conducts a comprehensive study of machine learning applications in cybersecurity to indicate the need for explainability to address this question. While doing that, this chapter first discusses the black-box problems of AI technologies for Cybersecurity applications in smart city-based solutions. Later, considering the new technological paradigm, Explainable Artificial Intelligence (XAI), this chapter discusses the transition from black-box to white-box. This chapter also discusses the transition requirements concerning the interpretability, transparency, understandability, and Explainability of AI-based technologies in applying different autonomous systems in smart cities. Finally, it has presented some commercial XAI platforms that offer explainability over traditional AI technologies before presenting future challenges and opportunities.
Submitted 31 October, 2021;
originally announced November 2021.
-
A Comprehensive Study on Torchvision Pre-trained Models for Fine-grained Inter-species Classification
Authors:
Feras Albardi,
H M Dipu Kabir,
Md Mahbub Islam Bhuiyan,
Parham M. Kebria,
Abbas Khosravi,
Saeid Nahavandi
Abstract:
This study aims to explore different pre-trained models offered in the Torchvision package, which is available in the PyTorch library, and to investigate their effectiveness on fine-grained image classification. Transfer Learning is an effective method of achieving extremely good performance with insufficient training data. In many real-world situations, people cannot collect sufficient data required to train a deep neural network model efficiently. Transfer Learning models are pre-trained on a large data set and can bring good performance on smaller datasets with significantly lower training time. The Torchvision package offers many models for applying Transfer Learning to smaller datasets. Therefore, researchers may need a guideline for the selection of a good model. We investigate Torchvision pre-trained models on four different data sets: 10 Monkey Species, 225 Bird Species, Fruits 360, and Oxford 102 Flowers. These data sets have images of different resolutions, class numbers, and different achievable accuracies. We also apply their usual fully-connected layer and the Spinal fully-connected layer to investigate the effectiveness of SpinalNet. The Spinal fully-connected layer brings better performance in most situations. We apply the same augmentation for different models for the same data set for a fair comparison. This paper may help future Computer Vision researchers in choosing a proper Transfer Learning model.
Submitted 13 October, 2021;
originally announced October 2021.
-
Less is More: Lighter and Faster Deep Neural Architecture for Tomato Leaf Disease Classification
Authors:
Sabbir Ahmed,
Md. Bakhtiar Hasan,
Tasnim Ahmed,
Redwan Karim Sony,
Md. Hasanul Kabir
Abstract:
To ensure global food security and the overall profit of stakeholders, the importance of correctly detecting and classifying plant diseases is paramount. In this connection, the emergence of deep learning-based image classification has introduced a substantial number of solutions. However, the applicability of these solutions in low-end devices requires fast, accurate, and computationally inexpensive systems. This work proposes a lightweight transfer learning-based approach for detecting diseases from tomato leaves. It utilizes an effective preprocessing method to enhance the leaf images with illumination correction for improved classification. Our system extracts features using a combined model consisting of a pretrained MobileNetV2 architecture and a classifier network for effective prediction. Traditional augmentation approaches are replaced by runtime augmentation to avoid data leakage and address the class imbalance issue. Evaluation on tomato leaf images from the PlantVillage dataset shows that the proposed architecture achieves 99.30% accuracy with a model size of 9.60MB and 4.87M floating-point operations, making it a suitable choice for real-life applications in low-end devices. Our codes and models are available at https://github.com/redwankarimsony/project-tomato.
Submitted 4 July, 2022; v1 submitted 6 September, 2021;
originally announced September 2021.
-
Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM
Authors:
Zahidul Islam,
Mohammad Rukonuzzaman,
Raiyan Ahmed,
Md. Hasanul Kabir,
Moshiur Farazi
Abstract:
Automatically detecting violence from surveillance footage is a subset of activity recognition that deserves special attention because of its wide applicability in unmanned security monitoring systems, internet video filtration, etc. In this work, we propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet, where one stream takes background-suppressed frames as inputs and the other stream processes the difference of adjacent frames. We employed simple and fast input pre-processing techniques that highlight the moving objects in the frames by suppressing non-moving backgrounds and capture the motion between frames. As violent actions are mostly characterized by body movements, these inputs help produce discriminative features. SepConvLSTM is constructed by replacing the convolution operation at each gate of ConvLSTM with a depthwise separable convolution, which enables producing robust long-range spatio-temporal features while using substantially fewer parameters. We experimented with three fusion methods to combine the output feature maps of the two streams. Evaluation of the proposed methods was done on three standard public datasets. Our model improves on the state-of-the-art accuracy on the larger and more challenging RWF-2000 dataset by a margin of more than 2% while matching state-of-the-art results on the smaller datasets. Our experiments lead us to conclude that the proposed models are superior in terms of both computational efficiency and detection accuracy.
Submitted 20 April, 2021; v1 submitted 21 February, 2021;
originally announced February 2021.
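The parameter saving that the abstract above attributes to depthwise separable convolution can be illustrated with a small count. This is a minimal sketch rather than the authors' code: the function names and the example channel and kernel sizes are assumptions chosen for illustration, biases are ignored, and each LSTM gate is modelled as a single convolution over the concatenated input and hidden channels.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def lstm_gate_params(c_input: int, c_hidden: int, k: int, separable: bool) -> int:
    """Total weights for the four gates of a convolutional LSTM cell."""
    c_in = c_input + c_hidden     # gates see the input and the previous hidden state
    count = separable_conv_params if separable else standard_conv_params
    return 4 * count(c_in, c_hidden, k)


# Hypothetical sizes: 64 input channels, 64 hidden channels, 3 x 3 kernels.
conv_lstm = lstm_gate_params(64, 64, 3, separable=False)
sep_conv_lstm = lstm_gate_params(64, 64, 3, separable=True)
```

With these assumed sizes, the standard cell has 294,912 gate weights and the separable cell has 37,376, roughly an eightfold reduction.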
-
Improving Action Quality Assessment using Weighted Aggregation
Authors:
Shafkat Farabi,
Hasibul Himel,
Fakhruddin Gazzali,
Md. Bakhtiar Hasan,
Md. Hasanul Kabir,
Moshiur Farazi
Abstract:
Action quality assessment (AQA) aims at automatically judging human action based on a video of the said action and assigning a performance score to it. The majority of works in the existing literature on AQA divide RGB videos into short clips, transform these clips to higher-level representations using Convolutional 3D (C3D) networks, and aggregate them through averaging. These higher-level representations are used to perform AQA. We find that the current clip-level feature aggregation technique of averaging is insufficient to capture the relative importance of clip-level features. In this work, we propose a learning-based weighted-averaging technique. Using this technique, better performance can be obtained without sacrificing too many computational resources. We call this technique Weight-Decider (WD). We also experiment with ResNets for learning better representations for action quality assessment. We assess the effects of the depth and input clip size of the convolutional neural network on the quality of action score predictions. We achieve a new state-of-the-art Spearman's rank correlation of 0.9315 (an increase of 0.45%) on the MTL-AQA dataset using a 34-layer (2+1)D ResNet capable of processing 32-frame clips, with WD aggregation.
Submitted 11 March, 2022; v1 submitted 21 February, 2021;
originally announced February 2021.
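The difference between plain averaging and weighted averaging of clip-level features, as contrasted in the abstract above, can be sketched as follows. This is an illustrative sketch only and not the Weight-Decider module itself: the fixed `scorer` vector stands in for whatever learned component produces the weights, and the softmax normalization and all names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def average_aggregate(clips: Sequence[Sequence[float]]) -> List[float]:
    """Baseline: every clip feature vector contributes equally."""
    n = len(clips)
    return [sum(column) / n for column in zip(*clips)]


def weighted_aggregate(
    clips: Sequence[Sequence[float]], scorer: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weighted average of clip features.

    `scorer` is a hypothetical stand-in for a learned module: each clip gets a
    scalar score (dot product with `scorer`), and a softmax over the scores
    gives per-clip weights that sum to 1.
    """
    scores = [sum(w * x for w, x in zip(scorer, clip)) for clip in clips]
    top = max(scores)                         # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(clips[0])
    aggregated = [sum(w * clip[j] for w, clip in zip(weights, clips)) for j in range(dim)]
    return aggregated, weights


# Two hypothetical 2-dimensional clip features.
clips = [[1.0, 2.0], [3.0, 4.0]]
uniform, uniform_weights = weighted_aggregate(clips, scorer=[0.0, 0.0])
skewed, skewed_weights = weighted_aggregate(clips, scorer=[1.0, 0.0])
```

With an all-zero scorer the weights are uniform and the result matches plain averaging; a non-zero scorer shifts the aggregate toward the higher-scoring clip.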
-
Efficient, Decentralized, and Collaborative Multi-Robot Exploration using Optimal Transport Theory
Authors:
Rabiul Hasan Kabir,
Kooktae Lee
Abstract:
An Optimal Transport (OT)-based decentralized collaborative multi-robot exploration strategy is proposed in this paper. The method aims to achieve an efficient exploration with a predefined priority in the given domain. In this context, efficiency indicates how well a team of robots (agents) covers the domain while reflecting the corresponding priority map (or degrees of importance) in the domain. Decentralized exploration implies that each agent carries out its exploration task independently in the absence of any supervisory agent/computer. When an agent encounters another agent within communication range, each agent receives information about which areas are already covered by the other agents, yielding a collaborative exploration. OT theory is employed to quantify the difference between the distribution formed by the robot trajectories and the given reference spatial distribution indicating the priority. A computationally feasible way is developed to measure the performance of the proposed exploration scheme. Further, a formal algorithm is provided for the efficient, decentralized, and collaborative exploration plan. Simulation results are presented to validate the proposed methods.
Submitted 28 September, 2020;
originally announced September 2020.
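The idea of using optimal transport to quantify the gap between the distribution of visited points and a reference priority distribution, described in the abstract above, can be illustrated in one dimension. This is a deliberately simplified sketch and not the paper's algorithm: it assumes equally weighted samples of equal count on a line, where the 1-Wasserstein distance reduces to the mean absolute difference between sorted samples. The function and variable names are hypothetical.

```python
from typing import Sequence


def wasserstein_1d(samples_a: Sequence[float], samples_b: Sequence[float]) -> float:
    """1-Wasserstein distance between two equally weighted 1-D sample sets.

    For the same number of points on a line, the optimal transport plan pairs
    the i-th smallest point of one set with the i-th smallest of the other.
    """
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch assumes equally sized sample sets")
    if not samples_a:
        raise ValueError("sample sets must be non-empty")
    paired = zip(sorted(samples_a), sorted(samples_b))
    return sum(abs(a - b) for a, b in paired) / len(samples_a)


# Hypothetical data: positions visited by a robot vs. samples of the priority map.
visited = [0.0, 1.0, 2.0]
reference = [2.0, 3.0, 4.0]
gap = wasserstein_1d(visited, reference)
```

A smaller value means the visited positions follow the reference distribution more closely; identical sample sets give a distance of zero regardless of ordering.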
-
Ergodic Control Strategy for Multi-Agent Environment Exploration
Authors:
Rabiul Hasan Kabir,
Kooktae Lee,
Geronimo Macias
Abstract:
In this study, an ergodic environment exploration problem is introduced for a centralized multi-agent system. Given the reference distribution represented by the Mixture of Gaussian (MoG), the ergodicity is achieved when the time-averaged robot distribution is identical to the given reference distribution. The major challenge associated with this problem is to determine proper timing for a team of agents (robots) to visit each Gaussian component in the reference MoG for ergodicity. The ergodic function is defined as a measure of ergodicity and the condition for convergence is derived based on timing analysis. The proposed control strategy provides relatively reasonable performance to achieve the ergodicity. We provide the formal algorithm for centralized multi-agent control to achieve the ergodicity and simulation results are presented for the validation of the proposed algorithm.
Submitted 28 September, 2020;
originally announced September 2020.
-
Efficient Multi-Robot Exploration with Energy Constraint based on Optimal Transport Theory
Authors:
Rabiul Hasan Kabir,
Kooktae Lee
Abstract:
This paper addresses an Optimal Transport (OT)-based efficient multi-robot exploration problem, considering the energy constraints of a multi-robot system. The efficiency in this problem implies how a team of robots (agents) covers a given domain, reflecting a priority of areas of interest represented by a density distribution, rather than simply following preset uniform patterns. To achieve an efficient multi-robot exploration, the optimal transport theory that quantifies a distance between two density distributions is employed as a tool, which also serves as a means of performance measure. The energy constraints for the multi-robot system are then incorporated into the OT-based multi-robot exploration scheme.
The proposed scheme is decoupled from robot dynamics, broadening the applicability of the multi-robot exploration plan to heterogeneous robot platforms. Not only the centralized but also decentralized algorithms are provided to cope with more realistic scenarios such as communication range limits between agents. To measure the exploration efficiency, the upper bound of the performance is developed for both the centralized and decentralized cases based on the optimal transport theory, which is computationally tractable as well as efficient. The proposed multi-robot exploration scheme is also applicable to a time-varying distribution, where the spatio-temporal evolution of the given reference distribution is desired. To validate the proposed method, multiple simulation results are provided.
Submitted 2 September, 2020;
originally announced September 2020.
-
SpinalNet: Deep Neural Network with Gradual Input
Authors:
H M Dipu Kabir,
Moloud Abdar,
Seyed Mohammad Jafar Jalali,
Abbas Khosravi,
Amir F Atiya,
Saeid Nahavandi,
Dipti Srinivasan
Abstract:
Deep neural networks (DNNs) have achieved state-of-the-art performance in numerous fields. However, DNNs need high computation times, and people always expect better performance with less computation. Therefore, we study the human somatosensory system and design a neural network (SpinalNet) to achieve higher accuracy with fewer computations. Hidden layers in traditional NNs receive inputs from the previous layer, apply an activation function, and then transfer the outcomes to the next layer. In the proposed SpinalNet, each layer is split into three splits: 1) input split, 2) intermediate split, and 3) output split. The input split of each layer receives a part of the inputs. The intermediate split of each layer receives outputs of the intermediate split of the previous layer and outputs of the input split of the current layer. The number of incoming weights becomes significantly lower than in traditional DNNs. The SpinalNet can also be used as the fully connected or classification layer of a DNN and supports both traditional learning and transfer learning. We observe significant error reductions with lower computational costs in most of the DNNs. Traditional learning on the VGG-5 network with SpinalNet classification layers provided the state-of-the-art (SOTA) performance on QMNIST, Kuzushiji-MNIST, EMNIST (Letters, Digits, and Balanced) datasets. Traditional learning with ImageNet pre-trained initial weights and SpinalNet classification layers provided the SOTA performance on STL-10, Fruits 360, Bird225, and Caltech-101 datasets. The scripts of the proposed SpinalNet are available at the following link: https://github.com/dipuk0506/SpinalNet
Submitted 7 January, 2022; v1 submitted 7 July, 2020;
originally announced July 2020.
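The claim in the abstract above that gradual input lowers the number of incoming weights can be checked with a simple count. This is a rough sketch under stated assumptions, not the authors' implementation: it assumes each hidden layer sees half of the inputs, that all hidden layers have the same width, and that the output layer reads every hidden layer's outputs. Biases are ignored, and the function names and example sizes are hypothetical.

```python
def dense_weights(n_in: int, hidden: int, n_out: int) -> int:
    """Weights in a conventional head: one fully connected hidden layer plus output."""
    return n_in * hidden + hidden * n_out


def spinal_weights(
    n_in: int, n_layers: int, width: int, n_out: int, input_fraction: float = 0.5
) -> int:
    """Weights in a gradual-input head under the assumptions stated above."""
    part = int(n_in * input_fraction)                 # inputs seen by each hidden layer
    total = part * width                              # first layer: input part only
    total += (n_layers - 1) * (part + width) * width  # later layers: input part + previous layer
    total += n_layers * width * n_out                 # output reads all hidden layers
    return total


# Hypothetical sizes: 512 features, 10 classes, 512 hidden neurons in both designs.
dense = dense_weights(n_in=512, hidden=512, n_out=10)
spinal = spinal_weights(n_in=512, n_layers=4, width=128, n_out=10)
```

For these assumed sizes, the dense head has 267,264 weights and the gradual-input head has 185,344, even though both use 512 hidden neurons in total.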
-
Enhancing Software Development Process Using Automated Adaptation of Object Ensembles
Authors:
Md. Emran,
Humaun Kabir,
Ziaur Rahman,
Nazrul Islam
Abstract:
Software development has been changing rapidly, and developer-friendly approaches can influence the development process. Automatically guiding programmers during development can save time and accelerate the process. Some approaches recommend relevant code snippets and API items to the developer. Some apply general code-searching techniques, and others use online repository-mining strategies. However, it becomes quite difficult to help programmers with particular type-conversion problems, specifically when they want to adapt existing interfaces to their expectations. One familiar way to guide developers in such situations is to adapt collections and arrays through automated adaptation of object ensembles. But how does this help a novice developer in real-time software development when it is not explicitly specified? In this paper, we develop a system that works as a plugin tool integrated with a particular Data Mining Integrated Environment (DMIE) to recommend relevant interfaces when developers face a type-conversion situation. We have a mined repository of adapter classes and related APIs, which developers query to obtain results using the relevant transformer classes. The recommendation system is titled automated objective ensembles (AOE plugin). Our investigation so far indicates that our approach performs much better than some of the existing approaches.
Submitted 7 May, 2020;
originally announced May 2020.
-
Optimal Uncertainty-guided Neural Network Training
Authors:
H M Dipu Kabir,
Abbas Khosravi,
Abdollah Kavousi-Fard,
Saeid Nahavandi,
Dipti Srinivasan
Abstract:
Neural network (NN)-based direct uncertainty quantification (UQ) methods have achieved state-of-the-art performance since their first introduction, the lower-upper-bound estimation (LUBE) method. However, currently available cost functions for uncertainty-guided NN training do not always converge, and not all converged NNs generate optimized prediction intervals (PIs). Moreover, several groups have proposed different quality criteria for PIs, which raises a question about their relative effectiveness. Most of the existing cost functions for uncertainty-guided NN training are not customizable, and the convergence of training is uncertain. Therefore, in this paper, we propose a highly customizable smooth cost function for developing NNs to construct optimal PIs. The optimized average width of PIs, PI-failure distances, and the PI coverage probability (PICP) are computed for the test dataset. The performance of the proposed method is examined on wind power generation and electricity demand data. Results show that the proposed method reduces variation in the quality of PIs, accelerates the training, and improves convergence probability from 99.2% to 99.8%.
Submitted 29 December, 2019;
originally announced December 2019.
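Two of the quantities named in the abstract above, the PI coverage probability (PICP) and the average PI width, have standard definitions that a few lines can illustrate. This is a minimal sketch with made-up numbers; it does not reproduce the paper's cost function or its PI-failure distance, and the function names are hypothetical.

```python
def picp(targets, lower, upper):
    """Fraction of targets that fall inside their prediction interval."""
    covered = sum(1 for y, lo, hi in zip(targets, lower, upper) if lo <= y <= hi)
    return covered / len(targets)


def mean_pi_width(lower, upper):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Made-up test data: four targets with one interval each.
targets = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 2.5, 2.0, 3.0]
upper = [1.5, 3.5, 3.5, 5.0]

print(picp(targets, lower, upper))   # 0.75 (the second target is missed)
print(mean_pi_width(lower, upper))   # 1.375
```

The two metrics pull in opposite directions: widening every interval raises PICP but also raises the average width, which is why PI cost functions typically trade one against the other.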
-
Shared-memory Graph Truss Decomposition
Authors:
Humayun Kabir,
Kamesh Madduri
Abstract:
We present PKT, a new shared-memory parallel algorithm and OpenMP implementation for the truss decomposition of large sparse graphs. A k-truss is a dense subgraph definition that can be considered a relaxation of a clique. Truss decomposition refers to a partitioning of all the edges in the graph based on their k-truss membership. The truss decomposition of a graph has many applications. We show that our new approach PKT consistently outperforms other truss decomposition approaches for a collection of large sparse graphs and on a 24-core shared-memory server. PKT is based on a recently proposed algorithm for k-core decomposition.
Submitted 6 July, 2017;
originally announced July 2017.
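For readers unfamiliar with the truss decomposition described in the abstract above, a serial peeling sketch may help. It uses the common definition in which every edge of a k-truss lies in at least k-2 triangles of that subgraph. This is a plain single-threaded illustration, not the PKT algorithm or its OpenMP implementation; the function name and the example graph are hypothetical.

```python
def truss_decomposition(edges):
    """Return {edge: trussness} for a simple undirected graph.

    An edge's trussness is the largest k such that the edge belongs to a k-truss.
    """
    def key(a, b):
        return tuple(sorted((a, b)))

    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Support of an edge = number of triangles that contain it.
    support = {key(u, v): len(adj[u] & adj[v]) for u, v in edges}

    trussness = {}
    k = 2
    while support:
        queue = [e for e, s in support.items() if s <= k - 2]
        if not queue:
            k += 1  # no edge can be peeled at this level; move to the next one
            continue
        while queue:
            e = queue.pop()
            if e not in support:  # already peeled via a duplicate queue entry
                continue
            u, v = e
            # Removing (u, v) destroys one triangle for each common neighbour w.
            for w in adj[u] & adj[v]:
                for f in (key(u, w), key(v, w)):
                    support[f] -= 1
                    if support[f] <= k - 2:
                        queue.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
            del support[e]
            trussness[e] = k
    return trussness


# A 4-clique on nodes 0..3 plus a pendant edge (3, 4).
graph = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
result = truss_decomposition(graph)
print(result[(3, 4)], result[(0, 1)])  # 2 4
```

The pendant edge is in no triangle, so it only belongs to the 2-truss, while every clique edge sits in two triangles and therefore reaches trussness 4.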
-
Human Computer Interaction Using Marker Based Hand Gesture Recognition
Authors:
Sayem Mohammad Siam,
Jahidul Adnan Sakel,
Md. Hasanul Kabir
Abstract:
Human Computer Interaction (HCI) has been redefined in this era. People want to interact with their devices in a way that has physical significance in the real world; in other words, they want ergonomic input devices. In this paper, we propose a new method of interaction with computing devices that have a consumer-grade camera. It uses two colored markers (red and green) worn on the fingertips to generate the desired hand gestures; for marker detection and tracking, we use template matching with a Kalman filter. We have implemented all the usual system commands, i.e., cursor movement, right click, left click, double click, going forward and backward, and zooming in and out, through different hand gestures. Our system can easily recognize these gestures and issue the corresponding system commands. It is suitable both for desktop devices and for devices where a touch screen is not feasible, such as large or projected screens.
Submitted 23 June, 2016;
originally announced June 2016.
-
Enhanced Modulation Technique for Molecular Communication: OOMoSK
Authors:
Md. Humaun Kabir,
Kyung Sup Kwak
Abstract:
Molecular communication in nanonetworks is an emerging communication paradigm in which molecules are used as information carriers. Concentration Shift Keying (CSK) and Molecule Shift Keying (MoSK) are being studied extensively for short- and medium-range molecular nanonetworks. It is observed that MoSK outperforms CSK. However, MoSK requires different types of molecules for encoding, which increases transmitter and receiver complexity. We propose a modulation scheme called On-Off MoSK (OOMoSK), in which molecules are released for information bit 1 and no molecule is released for bit 0. The proposed scheme reduces the number of molecule types needed for encoding. Numerical results show that the proposed scheme improves channel capacity and Symbol Error Rate (SER).
Submitted 11 November, 2014;
originally announced November 2014.
-
Design and implementation of a digital clock showing digits in Bangla font using microcontroller AT89C4051
Authors:
Nasif Muslim,
Md. Tanvir Adnan,
Mohammad Zahidul Kabir,
Md. Humayun Kabir,
Sheikh Mominul Islam
Abstract:
In this paper, a digital clock is designed in which a microcontroller serves as the timing controller, and the font of the Bangla digits is designed and programmed within the microcontroller. The design is cost-effective, simple, and easy to maintain.
Submitted 5 August, 2012;
originally announced August 2012.
-
Effect of Interleaved FEC Code on Wavelet Based MC-CDMA System with Alamouti STBC in Different Modulation Schemes
Authors:
Rifat Ara Shams,
M. Hasnat Kabir,
Sheikh Enayet Ullah
Abstract:
In this paper, the impact of a Forward Error Correction (FEC) code, namely a Trellis code with an interleaver, on the performance of a wavelet-based MC-CDMA wireless communication system implementing the Alamouti antenna diversity scheme is investigated in terms of Bit Error Rate (BER) as a function of Signal-to-Noise Ratio (SNR) per bit. The system is simulated with M-ary modulation schemes (MPSK, MQAM and DPSK) over AWGN and Rayleigh fading channels, with Walsh-Hadamard codes as the orthogonal spreading codes that discriminate the message signals of individual users. Computer simulation shows that the interleaved coded system outperforms the uncoded system in all modulation schemes over the Rayleigh fading channel.
Submitted 17 July, 2012;
originally announced July 2012.
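The abstract above relies on Walsh-Hadamard codes to separate users. The following toy sketch shows why orthogonal codes make that possible, assuming a noiseless, chip-synchronous channel with bipolar (+1/-1) symbols. It leaves out the wavelet-based multicarrier stage, the Trellis code, the interleaver, and the Alamouti scheme, and the helper names are hypothetical.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def spread(symbol, code):
    """Multiply one bipolar symbol by every chip of the user's code."""
    return [symbol * chip for chip in code]


def despread(chips, code):
    """Correlate the received chips with one user's code."""
    return sum(x * c for x, c in zip(chips, code)) / len(code)


codes = hadamard(4)
code_a, code_b = codes[1], codes[2]

# User A sends +1 and user B sends -1; the channel simply adds the chip streams.
received = [a + b for a, b in zip(spread(+1, code_a), spread(-1, code_b))]

print(despread(received, code_a))  # 1.0
print(despread(received, code_b))  # -1.0
```

Because distinct rows of the matrix have zero inner product, each correlation cancels the other user's contribution exactly; with noise or timing offsets the cancellation is only approximate.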
-
WEP: An Energy Efficient Protocol for Cluster Based Heterogeneous Wireless Sensor Network
Authors:
Md. Golam Rashed,
M. Hasnat Kabir,
Shaikh Enayet Ullah
Abstract:
We develop an energy-efficient routing protocol to enhance the stability period of wireless sensor networks (WSNs). This protocol is called the weighted election protocol (WEP). It introduces a scheme that combines a clustering strategy with a chain routing algorithm to satisfy both energy and stability-period constraints in heterogeneous WSNs. Simulation results show that the new protocol performs better than LEACH, SEP and HEARP in terms of stability period and network lifetime. It is also found that a longer stability period depends strongly on higher values of extra energy in the heterogeneous settings.
Submitted 17 July, 2012;
originally announced July 2012.
-
Cluster Based Hierarchical Routing Protocol for Wireless Sensor Network
Authors:
Md. Golam Rashed,
M. Hasnat Kabir,
Muhammad Sajjadur Rahim,
Shaikh Enayet Ullah
Abstract:
The efficient use of the energy source in a sensor node is the most desirable criterion for prolonging the lifetime of a wireless sensor network. In this paper, we propose a two-layer hierarchical routing protocol called Cluster Based Hierarchical Routing Protocol (CBHRP). We introduce a new concept called the head-set, which consists of one active cluster head and some associate cluster heads within a cluster. The head-set members are responsible for control and management of the network. Results show that this protocol reduces energy consumption quite significantly and prolongs the lifetime of the sensor network compared to LEACH.
Submitted 17 July, 2012;
originally announced July 2012.
-
Transmission of Voice Signal: BER Performance Analysis of Different FEC Schemes Based OFDM System over Various Channels
Authors:
Md. Golam Rashed,
M. Hasnat Kabir,
Md. Selim Reza,
Md. Matiqul Islam,
Rifat Ara Shams,
Saleh Masum,
Sheikh Enayet Ullah
Abstract:
In this paper, we investigate the impact of Forward Error Correction (FEC) codes, namely Cyclic Redundancy Code and Convolution Code, on the performance of an OFDM wireless communication system for speech signal transmission over both AWGN and fading (Rayleigh and Rician) channels in terms of Bit Error Probability. The simulation has been done in conjunction with QPSK digital modulation and compared with uncoded results. In the fading channels, computer simulation shows that the Convolution interleaved OFDM system outperforms both the CRC interleaved OFDM system and the uncoded OFDM system.
Submitted 17 July, 2012;
originally announced July 2012.
-
Performance Analysis of Wavelet Based MC-CDMA System with Implementation of Various Antenna Diversity Schemes
Authors:
Md. Matiqul Islam,
M. Hasnat Kabir,
Sk. Enayet Ullah
Abstract:
The impact of using a wavelet-based technique on the performance of an MC-CDMA wireless communication system has been investigated. The system under study incorporates Walsh-Hadamard codes to discriminate the message signals of individual users. A computer program written in MATLAB is developed, and the simulation study is carried out with various antenna diversity schemes and fading (Rayleigh and Rician) channels. Computer simulation results demonstrate that the proposed wavelet-based MC-CDMA system performs best with the Alamouti scheme (two transmit antennas and one receive antenna) under AWGN and Rician channels.
Submitted 16 July, 2012;
originally announced July 2012.
-
Impact of Different Spreading Codes Using FEC on DWT Based MC-CDMA System
Authors:
Saleh Masum,
M. Hasnat Kabir,
Md. Matiqul Islam,
Rifat Ara Shams,
Shaikh Enayet Ullah
Abstract:
The effect of different spreading codes in a DWT-based MC-CDMA wireless communication system is investigated. In this paper, we present the Bit Error Rate (BER) performance of the proposed system for different spreading codes (Walsh-Hadamard code, orthogonal Gold code and Golay complementary sequences) using Forward Error Correction (FEC). The data are analyzed and compared among the spreading codes in both coded and uncoded cases. Computer simulation shows that the coded system performs much better than the uncoded system irrespective of the spreading code, and that all the spreading codes behave approximately alike in both coded and uncoded cases in all modulation schemes.
Submitted 16 July, 2012;
originally announced July 2012.
-
CBHRP: A Cluster Based Routing Protocol for Wireless Sensor Network
Authors:
M. G. Rashed,
M. Hasnat Kabir,
M. Sajjadur Rahim,
Sk. Enayet Ullah
Abstract:
A new two-layer hierarchical routing protocol called Cluster Based Hierarchical Routing Protocol (CBHRP) is proposed in this paper. It is an extension of the LEACH routing protocol. We introduce the cluster head-set idea for cluster-based routing, in which several clusters are formed with the deployed sensors to collect information from the target field. On a rotation basis, a head-set member receives data from the neighbor nodes and transmits the aggregated results to the distant base station. This protocol reduces energy consumption quite significantly and prolongs the lifetime of the sensor network. It is found that CBHRP performs better than other well-accepted hierarchical routing protocols such as LEACH in terms of energy consumption and time requirement.
Submitted 16 July, 2012;
originally announced July 2012.
-
BER Analysis of Iterative Turbo Encoded MISO Wireless Communication System under Implementation of Q-OSTBC Scheme
Authors:
M. Hasnat Kabir,
Shaikh Enayet Ullah,
Mustari Zaman,
Md. Golam Rashed
Abstract:
In this paper, a comprehensive study has been made to evaluate the performance of a MISO wireless communication system. The 4-by-1 spatially multiplexed Turbo encoded system under investigation incorporates Quasi-orthogonal space-time block coding (Q-STBC) and ML signal detection schemes under QPSK, QAM, 16PSK and 16QAM digital modulations. The simulation results elucidate that a significant improvement of system performance is achieved in QAM modulation. The results are also indicative of noticeable system performance enhancement with increasing number of iterations in Turbo encoding/decoding scheme.
Submitted 8 March, 2012;
originally announced May 2012.
-
A Comprehensive Study and Performance Comparison of M-ary Modulation Schemes for an Efficient Wireless Mobile Communication System
Authors:
Md. Emdadul Haque,
Md. Golam Rashed,
M. Hasnat Kabir
Abstract:
Wireless communication has become one of the fastest-growing areas of modern life and has an enormous impact on nearly every aspect of daily life. In this paper, the performance of a wireless communication system based on M-ary modulation schemes (MPSK, MQAM, MFSK) for audio signal transmission over an Additive White Gaussian Noise (AWGN) channel is analyzed in terms of bit error probability as a function of SNR. Based on the results obtained in the present study, MPSK and MQAM show better performance at lower modulation orders, whereas they are inferior at higher M. The BER value is smaller in MFSK for higher M, but it is worse due to the distortion in the reproduced signal at the receiver end. Lossless reproduction of the recorded voice signal can be achieved at the receiver end with a lower modulation order.
Submitted 8 March, 2012;
originally announced March 2012.
-
Performance Analysis of Two-Hop Cooperative MIMO transmission with Relay Selection in Rayleigh Fading Channel
Authors:
Ahasanun Nessa,
Qinghai Yang,
Sana Ullah,
Humaun Kabir,
Kyung Sup Kwak
Abstract:
Wireless relaying is one of the promising solutions for overcoming channel impairments and providing the high-data-rate coverage required for beyond-3G mobile communications. In this paper, we present an end-to-end BER performance analysis of dual-hop wireless communication systems equipped with multiple decode-and-forward relays over the Rayleigh fading channel with relay selection. We select the best relay based on end-to-end channel conditions. We apply orthogonal space-time block coding (OSTBC) at the source and also show how multiple antennas at the source terminal affect the end-to-end BER performance. This intermediate relay technique can cover long distances where the destination is out of reach of the source.
Submitted 8 November, 2009;
originally announced November 2009.