-
Hybrid Approach for Electricity Price Forecasting using AlexNet and LSTM
Authors:
Bosubabu Sambana,
Kotamsetty Geethika Devi,
Bandi Rajeswara Reddy,
Galeti Mohammad Hussain,
Gownivalla Siddartha
Abstract:
Recent advances in hybrid machine learning models have helped address the need for accurate electricity price prediction. This work combines AlexNet and LSTM into a new model with higher accuracy in price forecasting. Although RNNs and ANNs are effective, they often fail to deal with forex time sequence data. Traditional methods do not forecast prices accurately because they focus only on demand and price, which leads to insufficient analysis of the data. To address this issue, the hybrid approach also considers external variables that affect the predicted prices. Owing to AlexNet's feature extraction and LSTM's ability to learn sequential patterns, prediction accuracy increases. The model is built on historical data that includes the most significant factors, such as demand, temperature, sunlight, and rain. It applies methods such as min-max scaling and a time window to predict future electricity prices. The results show that the hybrid model is more accurate than the standalone models: it reaches an accuracy of 97.08, compared with 96.64 for the RNN and 96.63 for the ANN.
Submitted 30 June, 2025;
originally announced June 2025.
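The min-max scaling and time-window preprocessing mentioned in the abstract above can be illustrated with a minimal sketch. The function names, the sample prices, and the window length of 3 are assumptions made for illustration; the paper's actual preprocessing pipeline is not described in this listing.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no range information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs


# Hypothetical hourly prices, used only to show the shapes involved.
prices = [10.0, 20.0, 15.0, 30.0, 25.0]
scaled = min_max_scale(prices)
windows = make_windows(scaled, 3)

for inputs, target in windows:
    print(inputs, "->", target)
```

Each pair holds the inputs a sequence model would see and the next value it would be trained to predict.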
-
A Keyword-Based Technique to Evaluate Broad Question Answer Script
Authors:
Tamim Al Mahmud,
Md Gulzar Hussain,
Sumaiya Kabir,
Hasnain Ahmad,
Mahmudus Sobhan
Abstract:
Evaluation is the method of assessing and determining the educational system through various techniques, such as verbal (viva voce) tests and subjective or objective written tests. This paper presents an efficient solution for evaluating subjective answer scripts electronically. We propose and implement an integrated system that examines and evaluates written answer scripts. The system finds the keywords in an answer script and compares them with keywords parsed from both open and closed domains. It also checks the answer script for grammatical and spelling errors. The proposed system was tested on the answer scripts of 100 students and achieved a precision score of 0.91.
Submitted 26 June, 2025;
originally announced June 2025.
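As a rough illustration of keyword comparison, the sketch below computes a precision-style score: the fraction of keywords found in an answer that also appear in a reference keyword set. The function name, the sample keywords, and this particular definition of precision are assumptions; the listing does not say how the paper computes its 0.91 score.

```python
def keyword_precision(answer_keywords, reference_keywords):
    """Fraction of answer keywords that also occur in the reference set."""
    answer = {k.lower() for k in answer_keywords}
    reference = {k.lower() for k in reference_keywords}
    if not answer:
        # No keywords were extracted from the answer, so nothing can match.
        return 0.0
    return len(answer & reference) / len(answer)


# Hypothetical keyword lists, for illustration only.
score = keyword_precision(
    ["Stack", "queue", "tree", "heap"],
    ["stack", "queue", "graph"],
)
print(score)
```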
-
AttentionGuard: Transformer-based Misbehavior Detection for Secure Vehicular Platoons
Authors:
Hexu Li,
Konstantinos Kalogiannis,
Ahmed Mohamed Hussain,
Panos Papadimitratos
Abstract:
Vehicle platooning, with vehicles traveling in close formation coordinated through Vehicle-to-Everything (V2X) communications, offers significant benefits in fuel efficiency and road utilization. However, it is vulnerable to sophisticated falsification attacks by authenticated insiders that can destabilize the formation and potentially cause catastrophic collisions. This paper addresses this challenge: misbehavior detection in vehicle platooning systems. We present AttentionGuard, a transformer-based framework for misbehavior detection that leverages the self-attention mechanism to identify anomalous patterns in mobility data. Our proposal employs a multi-head transformer-encoder to process sequential kinematic information, enabling effective differentiation between normal mobility patterns and falsification attacks across diverse platooning scenarios, including steady-state (no-maneuver) operation, join, and exit maneuvers. Our evaluation uses an extensive simulation dataset featuring various attack vectors (constant, gradual, and combined falsifications) and operational parameters (controller types, vehicle speeds, and attacker positions). Experimental results demonstrate that AttentionGuard achieves up to 0.95 F1-score in attack detection, with robust performance maintained during complex maneuvers. Notably, our system performs effectively with minimal latency (100ms decision intervals), making it suitable for real-time transportation safety applications. Comparative analysis reveals superior detection capabilities and establishes the transformer-encoder as a promising approach for securing Cooperative Intelligent Transport Systems (C-ITS) against sophisticated insider threats.
Submitted 15 May, 2025;
originally announced May 2025.
-
A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions
Authors:
Rahima Khanam,
Muhammad Hussain
Abstract:
The YOLO (You Only Look Once) series has been a leading framework in real-time object detection, consistently improving the balance between speed and accuracy. However, integrating attention mechanisms into YOLO has been challenging due to their high computational overhead. YOLOv12 introduces a novel approach that successfully incorporates attention-based enhancements while preserving real-time performance. This paper provides a comprehensive review of YOLOv12's architectural innovations, including Area Attention for computationally efficient self-attention, Residual Efficient Layer Aggregation Networks for improved feature aggregation, and FlashAttention for optimized memory access. Additionally, we benchmark YOLOv12 against prior YOLO versions and competing object detectors, analyzing its improvements in accuracy, inference speed, and computational efficiency. Through this analysis, we demonstrate how YOLOv12 advances real-time object detection by refining the latency-accuracy trade-off and optimizing computational resources.
Submitted 16 April, 2025;
originally announced April 2025.
-
Privacy-Preserving Secure Neighbor Discovery for Wireless Networks
Authors:
Ahmed Mohamed Hussain,
Panos Papadimitratos
Abstract:
Traditional Neighbor Discovery (ND) and Secure Neighbor Discovery (SND) are key elements for network functionality. SND is a hard problem, satisfying not only typical security properties (authentication, integrity) but also verification of direct communication, which involves distance estimation based on time measurements and device coordinates. Defeating relay attacks, also known as "wormholes", leading to stealthy Byzantine links and significant degradation of communication and adversarial control, is key in many wireless networked systems. However, SND is not concerned with privacy; it necessitates revealing the identity and location of the device(s) participating in the protocol execution. This can be a deterrent for deployment, especially involving user-held devices in the emerging Internet of Things (IoT) enabled smart environments. To address this challenge, we present a novel Privacy-Preserving Secure Neighbor Discovery (PP-SND) protocol, enabling devices to perform SND without revealing their actual identities and locations, effectively decoupling discovery from the exposure of sensitive information. We use Homomorphic Encryption (HE) for computing device distances without revealing their actual coordinates, as well as employing a pseudonymous device authentication to hide identities while preserving communication integrity. PP-SND provides SND [1] along with pseudonymity, confidentiality, and unlinkability. Our presentation here is not specific to one wireless technology, and we assess the performance of the protocols (cryptographic overhead) on a Raspberry Pi 4 and provide a security and privacy analysis.
Submitted 31 March, 2025; v1 submitted 28 March, 2025;
originally announced March 2025.
-
LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis
Authors:
Saeif Alhazbi,
Ahmed Mohamed Hussain,
Gabriele Oligeri,
Panos Papadimitratos
Abstract:
As Large Language Models (LLMs) become increasingly integrated into many technological ecosystems across various domains and industries, identifying which model is deployed or being interacted with is critical for the security and trustworthiness of the systems. Current verification methods typically rely on analyzing the generated output to determine the source model. However, these techniques are susceptible to adversarial attacks, operate in a post-hoc manner, and may require access to the model weights to inject a verifiable fingerprint. In this paper, we propose a novel passive and non-invasive fingerprinting technique that operates in real-time and remains effective even under encrypted network traffic conditions. Our method leverages the intrinsic autoregressive generation nature of language models, which generate text one token at a time based on all previously generated tokens, creating a unique temporal pattern like a rhythm or heartbeat that persists even when the output is streamed over a network. We find that measuring the Inter-Token Times (ITTs), the time intervals between consecutive tokens, can identify different language models with high accuracy. We develop a Deep Learning (DL) pipeline to capture these timing patterns using network traffic analysis and evaluate it on 16 Small Language Models (SLMs) and 10 proprietary LLMs across different deployment scenarios, including local host machine (GPU/CPU), Local Area Network (LAN), Remote Network, and Virtual Private Network (VPN). The experimental results confirm that our proposed technique is effective and maintains high accuracy even when tested in different network conditions. This work opens a new avenue for model identification in real-world scenarios and contributes to more secure and trustworthy language model deployment.
Submitted 27 February, 2025;
originally announced February 2025.
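The Inter-Token Times described in the abstract above are the differences between consecutive token arrival times. A minimal sketch follows, assuming arrival timestamps in integer milliseconds; the function name and the sample timestamps are hypothetical, and the paper's traffic capture and Deep Learning pipeline are not reproduced here.

```python
def inter_token_times(arrival_times_ms):
    """Return the gaps between consecutive token arrival timestamps."""
    return [
        later - earlier
        for earlier, later in zip(arrival_times_ms, arrival_times_ms[1:])
    ]


# Hypothetical arrival times (ms) of four streamed tokens.
arrivals = [0, 40, 95, 130]
itts = inter_token_times(arrivals)
print(itts)
```

A sequence of such gaps is the kind of timing signal a classifier could take as input to tell models apart.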
-
YOLOv12: A Breakdown of the Key Architectural Features
Authors:
Mujadded Al Rabbani Alif,
Muhammad Hussain
Abstract:
This paper presents an architectural analysis of YOLOv12, a significant advancement in single-stage, real-time object detection building upon the strengths of its predecessors while introducing key improvements. The model incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention, which improve feature extraction, efficiency, and detection robustness. With multiple model variants, similar to its predecessors, YOLOv12 offers scalable solutions for both latency-sensitive and high-accuracy applications. Experimental results show consistent gains in mean average precision (mAP) and inference speed, making YOLOv12 a compelling choice for applications in autonomous systems, security, and real-time analytics. By achieving an optimal balance between computational efficiency and performance, YOLOv12 sets a new benchmark for real-time computer vision, facilitating deployment across diverse hardware platforms, from edge devices to high-performance clusters.
Submitted 20 February, 2025;
originally announced February 2025.
-
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models
Authors:
Johan Wahréus,
Ahmed Mohamed Hussain,
Panos Papadimitratos
Abstract:
Numerous studies have investigated methods for jailbreaking Large Language Models (LLMs) to generate harmful content. Typically, these methods are evaluated using datasets of malicious prompts designed to bypass security policies established by LLM providers. However, the generally broad scope and open-ended nature of existing datasets can complicate the assessment of jailbreaking effectiveness, particularly in specific domains, notably cybersecurity. To address this issue, we present and publicly release CySecBench, a comprehensive dataset containing 12662 prompts specifically designed to evaluate jailbreaking techniques in the cybersecurity domain. The dataset is organized into 10 distinct attack-type categories, featuring close-ended prompts to enable a more consistent and accurate assessment of jailbreaking attempts. Furthermore, we detail our methodology for dataset generation and filtration, which can be adapted to create similar datasets in other domains. To demonstrate the utility of CySecBench, we propose and evaluate a jailbreaking approach based on prompt obfuscation. Our experimental results show that this method successfully elicits harmful content from commercial black-box LLMs, achieving Success Rates (SRs) of 65% with ChatGPT and 88% with Gemini; in contrast, Claude demonstrated greater resilience with a jailbreaking SR of 17%. Compared to existing benchmark approaches, our method shows superior performance, highlighting the value of domain-specific evaluation datasets for assessing LLM security measures. Moreover, when evaluated using prompts from a widely used dataset (i.e., AdvBench), it achieved an SR of 78.5%, higher than the state-of-the-art methods.
Submitted 2 January, 2025;
originally announced January 2025.
-
Polymer/paper-based double touch mode capacitive pressure sensing element for wireless control of robotic arm
Authors:
Rishabh B. Mishra,
Wedyan Babatain,
Nazek El-Atab,
Aftab M. Hussain,
Muhammad M. Hussain
Abstract:
In this work, a large area, low cost and flexible polymer/paper-based double touch mode capacitive pressure sensor is demonstrated. Garage fabrication processes are used which only require cutting, taping and assembly of aluminum (Al) coated polyimide (PI) foil, PI tape and double-sided scotch tape. The presented pressure sensor operates in different pressure regions i.e. normal (0 to 7.5 kPa), transition (7.5 to 14.24 kPa), linear (14.24 to 54.9 kPa) and saturation (above 54.9 kPa). The advantages of the demonstrated double touch mode capacitive pressure sensors are low temperature drift, long linear range, high pressure sensitivity, precise pressure measurement and large die area. The linear output along with a high sensitivity range (14.24 to 54.9 kPa pressure range) of the sensor are utilized to wirelessly control the movement of a robotic arm with precise rotation and tilt movement capabilities.
Submitted 18 December, 2024;
originally announced December 2024.
-
Low-cost foil/paper based touch mode pressure sensing element as artificial skin module for prosthetic hand
Authors:
Rishabh B. Mishra,
Sherjeel M. Khan,
Sohail F. Shaikh,
Aftab M. Hussain,
Muhammad M. Hussain
Abstract:
Capacitive pressure sensors have several advantages in areas such as robotics, automation, aerospace, biomedical and consumer electronics. We present mathematical modelling, finite element analysis (FEA), fabrication and experimental characterization of ultra-low cost and paper-based, touch-mode, flexible capacitive pressure sensor element using Do-It-Yourself (DIY) technology. The pressure sensing element is utilized to design large-area electronics skin for low-cost prosthetic hands. The presented sensor is characterized in normal, transition, touch and saturation modes. The sensor has higher sensitivity and linearity in touch mode operation from 10 to 40 kPa of applied pressure compared to the normal (0 to 8 kPa), transition (8 to 10 kPa) and saturation mode (after 40 kPa) with response time of 15.85 ms. Advantages of the presented sensor are higher sensitivity, linear response, less diaphragm area, less von Mises stress at the clamped edges region, low temperature drift, robust structure and less separation gap for large pressure measurement compared to normal mode capacitive pressure sensors. The linear range of pressure change is utilized for controlling the position of a servo motor for precise movement in robotic arm using wireless communication, which can be utilized for designing skin-like structure for low-cost prosthetic hands.
Submitted 18 December, 2024;
originally announced December 2024.
-
Edge AI-based Radio Frequency Fingerprinting for IoT Networks
Authors:
Ahmed Mohamed Hussain,
Nada Abughanam,
Panos Papadimitratos
Abstract:
The deployment of the Internet of Things (IoT) in smart cities and critical infrastructure has enhanced connectivity and real-time data exchange but introduced significant security challenges. While effective, cryptography can often be resource-intensive for small-footprint resource-constrained (i.e., IoT) devices. Radio Frequency Fingerprinting (RFF) offers a promising authentication alternative by using unique RF signal characteristics for device identification at the Physical (PHY)-layer, without resorting to cryptographic solutions. The challenge is two-fold: how to deploy such RFF at a large scale and in resource-constrained environments. Edge computing, processing data closer to its source, i.e., the wireless device, enables faster decision-making, reducing reliance on centralized cloud servers. Considering a modest edge device, we introduce two truly lightweight Edge AI-based RFF schemes tailored for resource-constrained devices. We implement two Deep Learning models, namely a Convolutional Neural Network (CNN) and a Transformer-Encoder, to extract complex features from the IQ samples, forming device-specific RF fingerprints. We convert the models to TensorFlow Lite and evaluate them on a Raspberry Pi, demonstrating the practicality of Edge deployment. Evaluations demonstrate that the Transformer-Encoder outperforms the CNN in identifying unique transmitter features, achieving high accuracy (> 0.95) and ROC-AUC scores (> 0.90) while maintaining a compact model size of 73KB, appropriate for resource-constrained devices.
Submitted 13 December, 2024;
originally announced December 2024.
-
Ultrasound-Based AI for COVID-19 Detection: A Comprehensive Review of Public and Private Lung Ultrasound Datasets and Studies
Authors:
Abrar Morshed,
Abdulla Al Shihab,
Md Abrar Jahin,
Md Jaber Al Nahian,
Md Murad Hossain Sarker,
Md Sharjis Ibne Wadud,
Mohammad Istiaq Uddin,
Muntequa Imtiaz Siraji,
Nafisa Anjum,
Sumiya Rajjab Shristy,
Tanvin Rahman,
Mahmuda Khatun,
Md Rubel Dewan,
Mosaddeq Hossain,
Razia Sultana,
Ripel Chakma,
Sonet Barua Emon,
Towhidul Islam,
Mohammad Arafat Hussain
Abstract:
The COVID-19 pandemic has affected millions of people globally, with respiratory organs being strongly affected in individuals with comorbidities. Medical imaging-based diagnosis and prognosis have become increasingly popular in clinical settings for detecting COVID-19 lung infections. Among various medical imaging modalities, ultrasound stands out as a low-cost, mobile, and radiation-safe imaging technology. In this comprehensive review, we focus on AI-driven studies utilizing lung ultrasound (LUS) for COVID-19 detection and analysis. We provide a detailed overview of both publicly available and private LUS datasets and categorize the AI studies according to the dataset they used. Additionally, we systematically analyzed and tabulated the studies across various dimensions, including data preprocessing methods, AI models, cross-validation techniques, and evaluation metrics. In total, we reviewed 60 articles, 41 of which utilized public datasets, while the remaining employed private data. Our findings suggest that ultrasound-based AI studies for COVID-19 detection have great potential for clinical use, especially for children and pregnant women. Our review also provides a useful summary for future researchers and clinicians who may be interested in the field.
Submitted 6 November, 2024;
originally announced November 2024.
-
Foundation AI Model for Medical Image Segmentation
Authors:
Rina Bao,
Erfan Darzi,
Sheng He,
Chuan-Heng Hsiao,
Mohammad Arafat Hussain,
Jingpeng Li,
Atle Bjornerud,
Ellen Grant,
Yangming Ou
Abstract:
Foundation models refer to artificial intelligence (AI) models that are trained on massive amounts of data and demonstrate broad generalizability across various tasks with high accuracy. These models offer versatile, one-for-many or one-for-all solutions, eliminating the need for developing task-specific AI models. Examples of such foundation models include the Chat Generative Pre-trained Transformer (ChatGPT) and the Segment Anything Model (SAM). These models have been trained on millions to billions of samples and have shown wide-ranging and accurate applications in numerous tasks such as text processing (using ChatGPT) and natural image segmentation (using SAM). In medical image segmentation - finding target regions in medical images - there is a growing need for these one-for-many or one-for-all foundation models. Such models could obviate the need to develop thousands of task-specific AI models, which is currently standard practice in the field. They can also be adapted to tasks with datasets too small for effective training. We discuss two paths to achieve foundation models for medical image segmentation and comment on progress, challenges, and opportunities. One path is to adapt or fine-tune existing models, originally developed for natural images, for use with medical images. The second path entails building models from scratch, exclusively training on medical images.
Submitted 4 November, 2024;
originally announced November 2024.
-
YOLOv11: An Overview of the Key Architectural Enhancements
Authors:
Rahima Khanam,
Muhammad Hussain
Abstract:
This study presents an architectural analysis of YOLOv11, the latest iteration in the YOLO (You Only Look Once) series of object detection models. We examine the model's architectural innovations, including the introduction of the C3k2 (Cross Stage Partial with kernel size 2) block, SPPF (Spatial Pyramid Pooling - Fast), and C2PSA (Convolutional block with Parallel Spatial Attention) components, which contribute to improving the model's performance in several ways, such as enhanced feature extraction. The paper explores YOLOv11's expanded capabilities across various computer vision tasks, including object detection, instance segmentation, pose estimation, and oriented object detection (OBB). We review the model's performance improvements in terms of mean Average Precision (mAP) and computational efficiency compared to its predecessors, with a focus on the trade-off between parameter count and accuracy. Additionally, the study discusses YOLOv11's versatility across different model sizes, from nano to extra-large, catering to diverse application needs from edge devices to high-performance computing environments. Our research provides insights into YOLOv11's position within the broader landscape of object detection and its potential impact on real-time computer vision applications.
Submitted 23 October, 2024;
originally announced October 2024.
-
3D Multi-Object Tracking Employing MS-GLMB Filter for Autonomous Driving
Authors:
Linh Van Ma,
Muhammad Ishfaq Hussain,
Kin-Choong Yow,
Moongu Jeon
Abstract:
The MS-GLMB filter offers a robust framework for tracking multiple objects through the use of multi-sensor data. Building on this, the MV-GLMB and MV-GLMB-AB filters enhance the MS-GLMB capabilities by employing cameras for 3D multi-sensor multi-object tracking, effectively addressing occlusions. However, both filters depend on overlapping fields of view from the cameras to combine complementary information. In this paper, we introduce an improved approach that integrates an additional sensor, such as LiDAR, into the MS-GLMB framework for 3D multi-object tracking. Specifically, we present a new LiDAR measurement model, along with a multi-camera and LiDAR multi-object measurement model. Our experimental results demonstrate a significant improvement in tracking performance compared to existing MS-GLMB-based methods. Importantly, our method eliminates the need for overlapping fields of view, broadening the applicability of the MS-GLMB filter. Our source code for nuScenes dataset is available at https://github.com/linh-gist/ms-glmb-nuScenes.
Submitted 19 October, 2024;
originally announced October 2024.
-
Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions
Authors:
Per Niklas Waaler,
Musarrat Hussain,
Igor Molchanov,
Lars Ailo Bongo,
Brita Elvevåg
Abstract:
Patients with schizophrenia often present with cognitive impairments that may hinder their ability to learn about their condition. These individuals could benefit greatly from education platforms that leverage the adaptability of Large Language Models (LLMs) such as GPT-4. While LLMs have the potential to make topical mental health information more accessible and engaging, their black-box nature raises concerns about ethics and safety. Prompting offers a way to produce semi-scripted chatbots with responses anchored in instructions and validated information, but prompt-engineered chatbots may drift from their intended identity as the conversation progresses. We propose a Critical Analysis Filter for achieving better control over chatbot behavior. In this system, a team of prompted LLM agents are prompt-engineered to critically analyze and refine the chatbot's response and deliver real-time feedback to the chatbot. To test this approach, we develop an informational schizophrenia chatbot and converse with it (with the filter deactivated) until it oversteps its scope. Once drift has been observed, AI-agents are used to automatically generate sample conversations in which the chatbot is being enticed to talk about out-of-bounds topics. We manually assign to each response a compliance score that quantifies the chatbot's compliance to its instructions; specifically the rules about accurately conveying sources and being transparent about limitations. Activating the Critical Analysis Filter resulted in an acceptable compliance score (>=2) in 67.0% of responses, compared to only 8.7% when the filter was deactivated. These results suggest that a self-reflection layer could enable LLMs to be used effectively and safely in mental health platforms, maintaining adaptability while reliably limiting their scope to appropriate use cases.
Submitted 10 October, 2024;
originally announced October 2024.
-
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships
Authors:
Gracile Astlin Pereira,
Muhammad Hussain
Abstract:
Transformer-based models have transformed the landscape of natural language processing (NLP) and are increasingly applied to computer vision tasks with remarkable success. These models, renowned for their ability to capture long-range dependencies and contextual information, offer a promising alternative to traditional convolutional neural networks (CNNs) in computer vision. In this review paper, we provide an extensive overview of various transformer architectures adapted for computer vision tasks. We delve into how these models capture global context and spatial relationships in images, empowering them to excel in tasks such as image classification, object detection, and segmentation. Analyzing the key components, training methodologies, and performance metrics of transformer-based models, we highlight their strengths, limitations, and recent advancements. Additionally, we discuss potential research directions and applications of transformer-based models in computer vision, offering insights into their implications for future advancements in the field.
Submitted 27 August, 2024;
originally announced August 2024.
-
Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
Authors:
Isuru Ranawaka,
Md Taufique Hussain,
Charles Block,
Gerasimos Gerogiannis,
Josep Torrellas,
Ariful Azad
Abstract:
We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, called TS-SpGEMM, has important applications in multi-source breadth-first search, influence maximization, sparse graph embedding, and algebraic multigrid solvers. Unfortunately, popular distributed algorithms like sparse SUMMA deliver suboptimal performance for TS-SpGEMM. To address this limitation, we develop a novel distributed-memory algorithm tailored for TS-SpGEMM. Our approach employs customized 1D partitioning for all matrices involved and leverages sparsity-aware tiling for efficient data transfers. In addition, it minimizes communication overhead by incorporating both local and remote computations. On average, our TS-SpGEMM algorithm attains 5x performance gains over 2D and 3D SUMMA. Furthermore, we use our algorithm to implement multi-source breadth-first search and sparse graph embedding algorithms and demonstrate their scalability up to 512 nodes (or 65,536 cores) on NERSC Perlmutter.
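For orientation, the product at the heart of TS-SpGEMM can be sketched on a single process as a row-wise sparse multiply. This is only an illustration under an assumed dict-of-dicts storage format; it is not the authors' distributed algorithm and leaves out the 1D partitioning, sparsity-aware tiling, and communication steps the abstract describes. The function name `spgemm_rowwise` and the toy matrices are hypothetical.

```python
def spgemm_rowwise(a, b):
    """Multiply sparse A (n x n) by sparse tall-and-skinny B (n x d).

    Assumed storage: each matrix is a dict mapping
    row index -> {column index: nonzero value}.
    """
    c = {}
    for i, a_row in a.items():
        acc = {}  # sparse accumulator for output row i
        for k, a_ik in a_row.items():
            # Row k of B contributes a_ik * B[k, :] to row i of C.
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            c[i] = acc
    return c


# Toy example: A is 3 x 3, B is 3 x 2.
a = {0: {0: 1, 2: 2}, 1: {1: 3}, 2: {0: 4}}
b = {0: {0: 1}, 1: {1: 1}, 2: {0: 5, 1: 6}}
c = spgemm_rowwise(a, b)
print(c)
```

Only the nonzeros of A and the matching rows of B are touched, which is why the sparsity pattern of both operands drives the cost.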
Submitted 21 August, 2024;
originally announced August 2024.
-
Parallel Algorithms for Median Consensus Clustering in Complex Networks
Authors:
Md Taufique Hussain,
Mahantesh Halappanavar,
Samrat Chatterjee,
Filippo Radicchi,
Santo Fortunato,
Ariful Azad
Abstract:
We develop an algorithm that finds the consensus of many different clustering solutions of a graph. We formulate the problem as a median set partitioning problem and propose a greedy optimization technique. Unlike other approaches that find median set partitions, our algorithm takes graph structure into account and finds a comparable quality solution much faster than the other approaches. For graphs with known communities, our consensus partition captures the actual community structure more accurately than alternative approaches. To make it applicable to large graphs, we remove sequential dependencies from our algorithm and design a parallel algorithm. Our parallel algorithm achieves 35x speedup when utilizing 64 processing cores for large real-world graphs from single-cell experiments.
Submitted 21 August, 2024;
originally announced August 2024.
-
From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model
Authors:
Athulya Sundaresan Geetha,
Muhammad Hussain
Abstract:
The Segment Anything Model (SAM), introduced to the computer vision community by Meta in April 2023, is a groundbreaking tool that allows automated segmentation of objects in images based on prompts such as text, clicks, or bounding boxes. SAM excels in zero-shot performance, segmenting unseen objects without additional training, supported by a large dataset of over one billion image masks. SAM 2 expands this functionality to video, leveraging memory from preceding and subsequent frames to generate accurate segmentation across entire videos, enabling near real-time performance. This comparison shows how SAM has evolved to meet the growing need for precise and efficient segmentation in various applications. The study suggests that future advancements in models like SAM will be crucial for improving computer vision technology.
Submitted 12 August, 2024;
originally announced August 2024.
-
What is YOLOv5: A deep look into the internal features of the popular object detector
Authors:
Rahima Khanam,
Muhammad Hussain
Abstract:
This study presents a comprehensive analysis of the YOLOv5 object detection model, examining its architecture, training methodologies, and performance. Key components, including the Cross Stage Partial backbone and Path Aggregation Network, are explored in detail. The paper reviews the model's performance across various metrics and hardware platforms. Additionally, the study discusses the transition from Darknet to PyTorch and its impact on model development. Overall, this research provides insights into YOLOv5's capabilities, its position within the broader landscape of object detection, and why it is a popular choice for constrained edge deployment scenarios.
Submitted 30 July, 2024;
originally announced July 2024.
-
A Comparative Analysis of YOLOv5, YOLOv8, and YOLOv10 in Kitchen Safety
Authors:
Athulya Sundaresan Geetha,
Muhammad Hussain
Abstract:
Knife safety in the kitchen is essential for preventing accidents or injuries, with an emphasis on proper handling, maintenance, and storage methods. This research presents a comparative analysis of three YOLO models, YOLOv5, YOLOv8, and YOLOv10, to detect the hazards involved in handling knives, concentrating mainly on ensuring that fingers are curled while holding items to be cut and that hands are only in contact with the knife handle, avoiding the blade. Precision, recall, F-score, and normalized confusion matrix are used to evaluate the performance of the models. The results indicate that YOLOv5 performed better than the other two models in identifying the hazard class concerning hand contact with the blade, while YOLOv8 excelled in detecting the hazard class concerning curled fingers while holding items. YOLOv5 and YOLOv8 performed almost identically in recognizing classes such as hand, knife, and vegetable, whereas YOLOv5, YOLOv8, and YOLOv10 accurately identified the cutting board. This paper provides insights into the advantages and shortcomings of these models in real-world settings. Moreover, by detailing the optimization of YOLO architectures for safe knife handling, this study promotes the development of increased accuracy and efficiency in safety surveillance systems.
Submitted 30 July, 2024;
originally announced July 2024.
-
YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision
Authors:
Muhammad Hussain
Abstract:
This paper presents a comprehensive review of the evolution of the YOLO (You Only Look Once) object detection algorithm, focusing on YOLOv5, YOLOv8, and YOLOv10. We analyze the architectural advancements, performance improvements, and suitability for edge deployment across these versions. YOLOv5 introduced significant innovations such as the CSPDarknet backbone and Mosaic Augmentation, balancing speed and accuracy. YOLOv8 built upon this foundation with enhanced feature extraction and anchor-free detection, improving versatility and performance. YOLOv10 represents a leap forward with NMS-free training, spatial-channel decoupled downsampling, and large-kernel convolutions, achieving state-of-the-art performance with reduced computational overhead. Our findings highlight the progressive enhancements in accuracy, efficiency, and real-time performance, particularly emphasizing their applicability in resource-constrained environments. This review provides insights into the trade-offs between model complexity and detection accuracy, offering guidance for selecting the most appropriate YOLO version for specific edge computing applications.
Submitted 3 July, 2024;
originally announced July 2024.
-
YOLOv1 to YOLOv10: A comprehensive review of YOLO variants and their application in the agricultural domain
Authors:
Mujadded Al Rabbani Alif,
Muhammad Hussain
Abstract:
This survey investigates the transformative potential of various YOLO variants, from YOLOv1 to the state-of-the-art YOLOv10, in the context of agricultural advancements. The primary objective is to elucidate how these cutting-edge object detection models can re-energise and optimize diverse aspects of agriculture, ranging from crop monitoring to livestock management. It aims to achieve key objectives, including the identification of contemporary challenges in agriculture, a detailed assessment of YOLO's incremental advancements, and an exploration of its specific applications in agriculture. This is one of the first surveys to include the latest YOLOv10, offering a fresh perspective on its implications for precision farming and sustainable agricultural practices in the era of Artificial Intelligence and automation. Further, the survey undertakes a critical analysis of YOLO's performance, synthesizes existing research, and projects future trends. By scrutinizing the unique capabilities packed in YOLO variants and their real-world applications, this survey provides valuable insights into the evolving relationship between YOLO variants and agriculture. The findings contribute towards a nuanced understanding of the potential for precision farming and sustainable agricultural practices, marking a significant step forward in the integration of advanced object detection technologies within the agricultural sector.
Submitted 14 June, 2024;
originally announced June 2024.
-
Enhancing Data Integrity and Traceability in Industry Cyber Physical Systems (ICPS) through Blockchain Technology: A Comprehensive Approach
Authors:
Mohammad Ikbal Hossain,
Tanja Steigner,
Muhammad Imam Hussain,
Afroja Akther
Abstract:
Blockchain technology, heralded as a transformative innovation, has far-reaching implications beyond its initial application in cryptocurrencies. This study explores the potential of blockchain in enhancing data integrity and traceability within Industry Cyber-Physical Systems (ICPS), a crucial aspect in the era of Industry 4.0. ICPS, integrating computational and physical components, is pivotal in managing critical infrastructure like manufacturing, power grids, and transportation networks. However, they face challenges in security, privacy, and reliability. With its inherent immutability, transparency, and distributed consensus, blockchain presents a groundbreaking approach to address these challenges. It ensures robust data reliability and traceability across ICPS, enhancing transaction transparency and facilitating secure data sharing. This research unearths various blockchain applications in ICPS, including supply chain management, quality control, contract management, and data sharing. Each application demonstrates blockchain's capacity to streamline processes, reduce fraud, and enhance system efficiency. In supply chain management, blockchain provides real-time auditing and compliance. For quality control, it establishes tamper-proof records, boosting consumer confidence. In contract management, smart contracts automate execution, enhancing efficiency. Blockchain also fosters secure collaboration in ICPS, which is crucial for system stability and safety. This study emphasizes the need for further research on blockchain's practical implementation in ICPS, focusing on challenges like scalability, system integration, and security vulnerabilities. It also suggests examining blockchain's economic and organizational impacts in ICPS to understand its feasibility and long-term advantages.
Submitted 8 May, 2024;
originally announced May 2024.
-
S-box Security Analysis of NIST Lightweight Cryptography Candidates: A Critical Empirical Study
Authors:
Mahnoor Naseer,
Sundas Tariq,
Naveed Riaz,
Naveed Ahmed,
Mureed Hussain
Abstract:
In the resource-constrained world of the digital landscape, lightweight cryptography plays a critical role in safeguarding information and ensuring the security of various systems, devices, and communication channels. Its efficient and resource-friendly nature makes it the ideal solution for applications where computational power is limited. In response to the growing need for platform-specific implementations, NIST issued a call for standardization of Lightweight cryptography algorithms in 2018. Ascon emerged as the winner of this competition. NIST initially established general evaluation criteria for a standard lightweight scheme including security strength, mitigation against side-channel and fault-injection attacks, and implementation efficiency. To verify the security claims, evaluating the individual components used in any cryptographic algorithm is a crucial step. The quality of a substitution box (S-box) significantly impacts the overall security of a cryptographic primitive. This paper analyzes the S-boxes of six finalists in the NIST Lightweight Cryptography (LWC) standardization process. We evaluate them based on well-established cryptographic properties. Our analysis explores how these properties influence the S-boxes' resistance against known cryptanalytic attacks and potential implementation-specific vulnerabilities, thus reflecting on their compliance with NIST's security requirements.
Submitted 9 April, 2024;
originally announced April 2024.
-
Towards Automated Generation of Smart Grid Cyber Range for Cybersecurity Experiments and Training
Authors:
Daisuke Mashima,
Muhammad M. Roomi,
Bennet Ng,
Zbigniew Kalbarczyk,
S. M. Suhail Hussain,
Ee-chien Chang
Abstract:
Assurance of cybersecurity is crucial to ensure dependability and resilience of smart power grid systems. In order to evaluate the impact of potential cyber attacks, to assess deployability and effectiveness of cybersecurity measures, and to enable hands-on exercise and training of personnel, an interactive, virtual environment that emulates the behaviour of a smart grid system, namely a smart grid cyber range, has been demanded by industry players as well as academia. A smart grid cyber range is typically implemented as a combination of cyber system emulation, which allows interactivity, and physical system (i.e., power grid) simulation that are tightly coupled for consistent cyber and physical behaviours. However, its design and implementation require intensive expertise and efforts in cyber and physical aspects of smart power systems as well as software/system engineering. While many industry players, including power grid operators, device vendors, and the research and education sectors, are interested, availability of the smart grid cyber range is limited to a small number of research labs. To address this challenge, we have developed a framework for modelling a smart grid cyber range using an XML-based language, called SG-ML, and for "compiling" the model into an operational cyber range with minimal engineering efforts. The modelling language includes standardized schema from IEC 61850 and IEC 61131, which allows industry players to utilize their existing configurations. The SG-ML framework aims at making a smart grid cyber range available to broader user bases to facilitate cybersecurity R&D and hands-on exercises.
Submitted 31 March, 2024;
originally announced April 2024.
-
A Novel Technique to Parameterize Congestion Control in 6TiSCH IIoT Networks
Authors:
Kushal Chakraborty,
Aritra Kumar Dutta,
Mohammad Avesh Hussain,
Syed Raafay Mohiuddin,
Nikumani Choudhury,
Rakesh Matam,
Mithun Mukherjee
Abstract:
The Industrial Internet of Things (IIoT) refers to the use of interconnected smart devices, sensors, and other technologies to create a network of intelligent systems that can monitor and manage industrial processes. 6TiSCH (IPv6 over the Time Slotted Channel Hopping mode of IEEE 802.15.4e), as an enabling technology, facilitates low-power and low-latency communication between IoT devices in industrial environments. The Routing Protocol for Low power and lossy networks (RPL), which is used as the de facto routing protocol for 6TiSCH networks, is observed to suffer from several limitations, especially during congestion in the network. Therefore, there is an immediate need for modifications to RPL to deal with this problem. Under continuously changing traffic load, the proposed mechanism aims at finding the appropriate parent for a node that can forward the packet to the destination through the least congested path with minimal packet loss. This facilitates congestion management under dynamic traffic loads. For this, a new routing metric using the concept of exponential weighting has been proposed, which takes the number of packets present in the queue of the node into account when choosing the parent at a particular instant of time. Additionally, the paper proposes a parent selection and swapping mechanism for congested networks. Performance evaluations are carried out in order to validate the proposed work. The results show an improvement in the performance of RPL under heavy and dynamic traffic loads.
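To make the idea of an exponentially weighted queue metric concrete, here is a minimal sketch. The abstract does not give the paper's formula, so the update rule below is a generic exponentially weighted moving average, and the smoothing factor `alpha`, the function names, and the sample queue lengths are all assumptions for illustration only.

```python
def update_congestion(prev_estimate, queue_len, alpha=0.5):
    """Generic exponentially weighted moving average of queue occupancy.

    alpha is an assumed smoothing factor: higher values weight the most
    recent queue length more heavily.
    """
    return alpha * queue_len + (1 - alpha) * prev_estimate


def pick_parent(estimates):
    """Choose the candidate parent with the lowest congestion estimate."""
    return min(estimates, key=estimates.get)


# Hypothetical queue lengths reported by two candidate parents over time.
estimates = {"A": 0.0, "B": 0.0}
samples = [{"A": 8, "B": 2}, {"A": 2, "B": 6}, {"A": 1, "B": 7}]
for sample in samples:
    for parent, queue_len in sample.items():
        estimates[parent] = update_congestion(estimates[parent], queue_len)

best = pick_parent(estimates)
print(estimates, best)
```

Smoothing lets a parent whose queue drains recover its standing after a short burst, rather than being penalised by a single bad sample.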
Submitted 11 February, 2024;
originally announced February 2024.
-
Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers
Authors:
Md Shamim Hussain,
Mohammed J. Zaki,
Dharmashankar Subramanian
Abstract:
Graph transformers typically lack third-order interactions, limiting their geometric understanding which is crucial for tasks like molecular geometry prediction. We propose the Triplet Graph Transformer (TGT) that enables direct communication between pairs within a 3-tuple of nodes via novel triplet attention and aggregation mechanisms. TGT is applied to molecular property prediction by first predicting interatomic distances from 2D graphs and then using these distances for downstream tasks. A novel three-stage training procedure and stochastic inference further improve training efficiency and model performance. Our model achieves new state-of-the-art (SOTA) results on open challenge benchmarks PCQM4Mv2 and OC20 IS2RE. We also obtain SOTA results on QM9, MOLPCBA, and LIT-PCBA molecular property prediction benchmarks via transfer learning. We also demonstrate the generality of TGT with SOTA results on the traveling salesman problem (TSP).
Submitted 9 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
BANSpEmo: A Bangla Emotional Speech Recognition Dataset
Authors:
Md Gulzar Hussain,
Mahmuda Rahman,
Babe Sultana,
Ye Shiren
Abstract:
In the field of audio and speech analysis, the ability to identify emotions from acoustic signals is essential. Human-computer interaction (HCI) and behavioural analysis are only a few of the many areas where the capacity to distinguish emotions from speech signals has an extensive range of applications. Here, we introduce BanSpEmo, a corpus of emotional speech that consists only of audio recordings and has been created specifically for the Bangla language. This corpus contains 792 audio recordings with a total duration of more than 1 hour and 23 minutes. 22 native speakers took part in the recording of two sets of sentences that represent the six desired emotions. The data set consists of 12 Bangla sentences, each uttered in six emotions: Disgust, Happy, Sad, Surprised, Anger, and Fear. This corpus is not also gender balanced. Ten individuals who either have experience in a related field or have acting experience took part in the assessment of this corpus. It has a balanced number of audio recordings in each emotion class. BanSpEmo can be considered a useful resource to promote emotion and speech recognition research and related applications in the Bangla language. The dataset can be found here: https://data.mendeley.com/datasets/rdwn4bs5ky and might be employed for academic research.
Submitted 21 December, 2023;
originally announced December 2023.
-
Adaptive Confidence Threshold for ByteTrack in Multi-Object Tracking
Authors:
Linh Van Ma,
Muhammad Ishfaq Hussain,
JongHyun Park,
Jeongbae Kim,
Moongu Jeon
Abstract:
We investigate the application of ByteTrack in the realm of multiple object tracking. ByteTrack, a simple tracking algorithm, enables the simultaneous tracking of multiple objects by strategically incorporating detections with a low confidence threshold. Conventionally, objects are initially associated with high confidence threshold detections. When the association between objects and detections becomes ambiguous, ByteTrack extends the association to lower confidence threshold detections. One notable drawback of the existing ByteTrack approach is its reliance on a fixed threshold to differentiate between high and low-confidence detections. In response to this limitation, we introduce a novel and adaptive approach. Our proposed method entails a dynamic adjustment of the confidence threshold, leveraging insights derived from overall detections. Through experimentation, we demonstrate the effectiveness of our adaptive confidence threshold technique while maintaining a running time comparable to that of ByteTrack.
Submitted 5 December, 2023; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Identifying Alzheimer Disease Dementia Levels Using Machine Learning Methods
Authors:
Md Gulzar Hussain,
Ye Shiren
Abstract:
Dementia, a prevalent neurodegenerative condition, is a major manifestation of Alzheimer's disease (AD). As the condition progresses from mild to severe, it significantly impairs the individual's ability to perform daily tasks independently, necessitating timely and accurate AD classification. Machine learning or deep learning models have emerged as effective tools for this purpose. In this study, we propose an approach for classifying the four stages of dementia using RF, SVM, and CNN algorithms, augmented with watershed segmentation for feature extraction from MRI images. Our results reveal that SVM with watershed features achieves an impressive accuracy of 96.25%, surpassing other classification methods. The ADNI dataset is utilized to evaluate the effectiveness of our method, and we observed that the inclusion of watershed segmentation contributes to the enhanced performance of the models.
Submitted 2 November, 2023;
originally announced November 2023.
-
Advancements in Upper Body Exoskeleton: Implementing Active Gravity Compensation with a Feedforward Controller
Authors:
Muhammad Ayaz Hussain,
Ioannis Iossifidis
Abstract:
In this study, we present a feedforward control system designed for active gravity compensation on an upper body exoskeleton. The system utilizes only positional data from internal motor sensors to calculate torque, employing analytical control equations based on Newton-Euler Inverse Dynamics. Compared to feedback control systems, the feedforward approach offers several advantages. It eliminates the need for external torque sensors, resulting in reduced hardware complexity and weight. Moreover, the feedforward control exhibits a more proactive response, leading to enhanced performance. The exoskeleton used in the experiments is lightweight and comprises 4 Degrees of Freedom, closely mimicking human upper body kinematics and three-dimensional range of motion. We conducted tests on both hardware and simulations of the exoskeleton, demonstrating stable performance. The system maintained its position over an extended period, exhibiting minimal friction and avoiding undesired slewing.
Submitted 9 September, 2023;
originally announced September 2023.
-
Switched auxiliary loss for robust training of transformer models for histopathological image segmentation
Authors:
Mustaffa Hussain,
Saharsh Barve
Abstract:
Functional Tissue Units (FTUs) are cell population neighborhoods local to a particular organ performing its main function. The FTUs provide crucial information to the pathologist in understanding the disease affecting a particular organ by providing information at the cellular level. In our research, we have developed a model to segment multi-organ FTUs across 5 organs, namely the kidney, large intestine, lung, prostate, and spleen, by utilizing the 'HuBMAP + HPA - Hacking the Human Body' competition dataset. We propose adding a switched auxiliary loss for training models like transformers to overcome the diminishing gradient problem, which poses a challenge to optimal training of deep models. Overall, our model achieved a dice score of 0.793 on the public dataset and 0.778 on the private dataset. The results support the robustness of the proposed training methodology. The findings also bolster the use of transformer models for dense prediction tasks in the field of medical image analysis. The study assists in understanding the relationships between cell and tissue organization, thereby providing a useful medium to look at the impact of cellular functions on human health.
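For readers unfamiliar with the dice score quoted above, the sketch below computes the standard Dice coefficient, 2|P ∩ T| / (|P| + |T|), for two binary masks. It is a generic illustration, not the competition's scoring code: the function name `dice_score`, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def dice_score(pred, target):
    """Dice coefficient for two flat binary masks (lists of 0/1).

    Assumed convention: two empty masks count as a perfect match (1.0).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks: 3 predicted pixels, 3 true pixels, 2 of them overlapping.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
score = dice_score(pred, target)
print(round(score, 3))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no pixels.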
Submitted 14 August, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
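The Dice score reported in the abstract above is the standard overlap metric for segmentation masks. The following is a minimal sketch of how such a score can be computed for a single pair of binary masks. The function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration; the listing does not describe the competition's exact evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    # Flatten both masks into 1-D lists of booleans.
    flat_pred = [bool(v) for row in pred for v in row]
    flat_target = [bool(v) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Count pixels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_score(predicted, reference))
```

A score of 1.0 means the predicted and reference masks coincide exactly, while 0.0 means they share no foreground pixels.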
-
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Authors:
Md Shamim Hussain,
Mohammed J. Zaki,
Dharmashankar Subramanian
Abstract:
Transformers use the dense self-attention mechanism which gives a lot of flexibility for long-range connectivity. Over multiple layers of a deep transformer, the number of possible connectivity patterns increases exponentially. However, very few of these contribute to the performance of the network, and even fewer are essential. We hypothesize that there are sparsely connected sub-networks within a transformer, called information pathways which can be trained independently. However, the dynamic (i.e., input-dependent) nature of these pathways makes it difficult to prune dense self-attention during training. But the overall distribution of these pathways is often predictable. We take advantage of this fact to propose Stochastically Subsampled self-Attention (SSA) - a general-purpose training strategy for transformers that can reduce both the memory and computational cost of self-attention by 4 to 8 times during training while also serving as a regularization method - improving generalization over dense training. We show that an ensemble of sub-models can be formed from the subsampled pathways within a network, which can achieve better performance than its densely attended counterpart. We perform experiments on a variety of NLP, computer vision and graph learning tasks in both generative and discriminative settings to provide empirical evidence for our claims and show the effectiveness of the proposed method.
Submitted 2 June, 2023;
originally announced June 2023.
-
Securing Safety in Collaborative Cyber-Physical Systems through Fault Criticality Analysis
Authors:
Manzoor Hussain,
Nazakat Ali,
Jang-Eui Hong
Abstract:
Collaborative Cyber-Physical Systems (CCPS) are systems that contain tightly coupled physical and cyber components and massively interconnected subsystems, and that collaborate to achieve a common goal. The safety of a single Cyber-Physical System (CPS) can be achieved by following safety standards such as ISO 26262 and IEC 61508 or by applying hazard analysis techniques. However, due to the complex, highly interconnected, heterogeneous, and collaborative nature of CCPS, a fault in one CPS's components can trigger many other faults in other collaborating CPSs. Therefore, a safety assurance technique based on fault criticality analysis is required to ensure safety in CCPS. This paper presents a Fault Criticality Matrix (FCM) implemented in our tool called CPSTracer, which contains data such as the identified fault, fault criticality, safety guard, etc. The proposed FCM is based on composite hazard analysis and content-based relationships among the hazard analysis artifacts, and ensures that the safety guard controls the identified faults at design time; thus, we can effectively manage and control faults at the design phase to ensure the safe development of CPSs. To validate our approach, we introduce a case study on the Platooning system (a collaborative CPS). We perform the criticality analysis of the Platooning system using FCM in our developed tool. After the detailed fault criticality analysis, we investigate the results to check their appropriateness and effectiveness with two research questions. Also, by simulating the Platooning system, we show that the collision rate of the system without FCM was considerably higher than the collision rate after analyzing the fault criticality using FCM.
Submitted 10 March, 2023;
originally announced March 2023.
-
Neuro-Symbolic World Models for Adapting to Open World Novelty
Authors:
Jonathan Balloch,
Zhiyu Lin,
Robert Wright,
Xiangyu Peng,
Mustafa Hussain,
Aarun Srinivas,
Julia Kim,
Mark O. Riedl
Abstract:
Open-world novelty--a sudden change in the mechanics or properties of an environment--is a common occurrence in the real world. Novelty adaptation is an agent's ability to improve its policy performance post-novelty. Most reinforcement learning (RL) methods assume that the world is a closed, fixed process. Consequently, RL policies adapt inefficiently to novelties. To address this, we introduce WorldCloner, an end-to-end trainable neuro-symbolic world model for rapid novelty adaptation. WorldCloner learns an efficient symbolic representation of the pre-novelty environment transitions, and uses this transition model to detect novelty and efficiently adapt to novelty in a single-shot fashion. Additionally, WorldCloner augments the policy learning process using imagination-based adaptation, where the world model simulates transitions of the post-novelty environment to help the policy adapt. By blending "imagined" transitions with interactions in the post-novelty environment, performance can be recovered with fewer total environment interactions. Using environments designed for studying novelty in sequential decision-making problems, we show that the symbolic world model helps its neural policy adapt more efficiently than model-based and model-based neural-only reinforcement learning methods.
Submitted 16 January, 2023;
originally announced January 2023.
-
Biomedical image analysis competitions: The state of current participation practice
Authors:
Matthias Eisenmann,
Annika Reinke,
Vivienn Weru,
Minu Dietlinde Tizabi,
Fabian Isensee,
Tim J. Adler,
Patrick Godau,
Veronika Cheplygina,
Michal Kozubek,
Sharib Ali,
Anubha Gupta,
Jan Kybic,
Alison Noble,
Carlos Ortiz de Solórzano,
Samiksha Pachade,
Caroline Petitjean,
Daniel Sage,
Donglai Wei,
Elizabeth Wilden,
Deepak Alapatt,
Vincent Andrearczyk,
Ujjwal Baid,
Spyridon Bakas,
Niranjan Balu,
Sophia Bano
, et al. (331 additional authors not shown)
Abstract:
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Submitted 12 September, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
NovGrid: A Flexible Grid World for Evaluating Agent Response to Novelty
Authors:
Jonathan Balloch,
Zhiyu Lin,
Mustafa Hussain,
Aarun Srinivas,
Robert Wright,
Xiangyu Peng,
Julia Kim,
Mark Riedl
Abstract:
A robust body of reinforcement learning techniques has been developed to solve complex sequential decision-making problems. However, these methods assume that train and evaluation tasks come from similarly or identically distributed environments. This assumption does not hold in real life, where small novel changes to the environment can make a previously learned policy fail or introduce simpler solutions that might never be found. To that end we explore the concept of novelty, defined in this work as a sudden change to the mechanics or properties of the environment. We provide an ontology of the novelties most relevant to sequential decision making, which distinguishes between novelties that affect objects versus actions, unary properties versus non-unary relations, and the distribution of solutions to a task. We introduce NovGrid, a novelty generation framework built on MiniGrid, acting as a toolkit for rapidly developing and evaluating novelty-adaptation-enabled reinforcement learning techniques. Along with the core NovGrid we provide exemplar novelties aligned with our ontology and instantiate them as novelty templates that can be applied to many MiniGrid-compliant environments. Finally, we present a set of metrics built into our framework for the evaluation of novelty-adaptation-enabled machine-learning techniques, and show characteristics of a baseline RL model using these metrics.
Submitted 22 March, 2022;
originally announced March 2022.
-
Parallel Algorithms for Adding a Collection of Sparse Matrices
Authors:
Md Taufique Hussain,
Guttu Sai Abhishek,
Aydin Buluç,
Ariful Azad
Abstract:
We develop a family of parallel algorithms for the SpKAdd operation that adds a collection of k sparse matrices. SpKAdd is a much-needed operation in many applications including distributed-memory sparse matrix-matrix multiplication (SpGEMM), streaming accumulations of graphs, and algorithmic sparsification of the gradient updates in deep learning. While adding two sparse matrices is a common operation in Matlab, Python, Intel MKL, and various GraphBLAS libraries, these implementations do not perform well when adding a large collection of sparse matrices. We develop a series of algorithms using tree merging, heap, sparse accumulator, hash table, and sliding hash table data structures. Among them, hash-based algorithms attain the theoretical lower bounds on both the computational and I/O complexities and perform the best in practice. The newly developed hash SpKAdd makes the computation of a distributed-memory SpGEMM algorithm at least 2x faster than the previous state-of-the-art algorithms.
Submitted 19 December, 2021;
originally announced December 2021.
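To make the hash-table idea from the abstract above concrete, here is a deliberately simplified, sequential sketch: every nonzero of every input matrix is accumulated into one hash table keyed by its coordinates, so each nonzero is touched once regardless of k. The coordinate-triple input format and the name `spkadd_hash` are assumptions chosen for illustration. The paper's parallel and sliding-hash variants are not reproduced here.

```python
def spkadd_hash(matrices):
    """Add k sparse matrices, each given as a list of (row, col, value) triples.

    Simplified, single-threaded illustration of hash-based accumulation.
    """
    accumulator = {}
    for triples in matrices:
        for row, col, value in triples:
            key = (row, col)
            # Entries with the same coordinates are summed in place.
            accumulator[key] = accumulator.get(key, 0.0) + value

    # Drop entries that cancelled to zero and return in sorted order.
    return sorted((r, c, v) for (r, c), v in accumulator.items() if v != 0.0)


if __name__ == "__main__":
    a = [(0, 0, 1.0), (1, 2, 2.0)]
    b = [(0, 0, 3.0), (2, 1, 4.0)]
    c = [(1, 2, -2.0)]
    print(spkadd_hash([a, b, c]))
```

By contrast, adding the matrices pairwise one after another would re-read the growing partial sum at every step, which is the kind of repeated work a single accumulation pass avoids.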
-
DeepGuard: A Framework for Safeguarding Autonomous Driving Systems from Inconsistent Behavior
Authors:
Manzoor Hussain,
Nazakat Ali,
Jang-Eui Hong
Abstract:
Deep neural network (DNN)-based autonomous driving systems (ADSs) are expected to reduce road accidents and improve safety in the transportation domain, as they remove the factor of human error from driving tasks. A DNN-based ADS may sometimes exhibit erroneous or unexpected behaviors due to unexpected driving conditions, which may cause accidents. It is not possible to generalize the DNN model performance for all driving conditions. Therefore, the driving conditions that were not considered during the training of the ADS may lead to unpredictable consequences for the safety of autonomous vehicles. This study proposes an autoencoder and time series analysis based anomaly detection system to prevent the safety-critical inconsistent behavior of autonomous vehicles at runtime. Our approach, called DeepGuard, consists of two components. The first component, the inconsistent behavior predictor, is based on an autoencoder and time series analysis to reconstruct the driving scenarios. Based on the reconstruction error and a threshold, it distinguishes normal from unexpected driving scenarios and predicts potential inconsistent behavior. The second component provides on-the-fly safety guards, that is, it automatically activates healing strategies to prevent inconsistencies in the behavior. We evaluated the performance of DeepGuard in predicting the injected anomalous driving scenarios using already available open-source DNN-based ADSs in the Udacity simulator. Our simulation results show that the best variant of DeepGuard can predict up to 93 percent of inconsistent behavior on the CHAUFFEUR ADS, 83 percent on the DAVE2 ADS, and 80 percent on the EPOCH ADS model, outperforming SELFORACLE and DeepRoad. Overall, DeepGuard can prevent up to 89 percent of all predicted inconsistent behaviors of ADS by executing predefined safety guards.
Submitted 5 April, 2022; v1 submitted 18 November, 2021;
originally announced November 2021.
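The abstract above says that driving scenarios are classified by comparing a reconstruction error against a threshold. The sketch below illustrates that general pattern only. The mean squared error as the error measure, the "mean plus k standard deviations" rule for choosing the threshold, and all function names are assumptions made for illustration; the listing does not state how DeepGuard itself computes either quantity.

```python
from statistics import mean, pstdev


def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction."""
    return mean((a - b) ** 2 for a, b in zip(original, reconstructed))


def fit_threshold(nominal_errors, k=3.0):
    """Assumed rule: threshold = mean + k * std of errors seen on normal data."""
    return mean(nominal_errors) + k * pstdev(nominal_errors)


def flag_anomalies(errors, threshold):
    """Mark each error as anomalous (True) when it exceeds the threshold."""
    return [error > threshold for error in errors]


if __name__ == "__main__":
    # Hypothetical reconstruction errors collected under nominal conditions.
    nominal = [0.10, 0.12, 0.11, 0.09, 0.08]
    threshold = fit_threshold(nominal)
    print(flag_anomalies([0.11, 0.30], threshold))
```

In a runtime monitor, the errors would come from an autoencoder applied to incoming frames, and a flagged frame would trigger whatever safety guard is configured.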
-
Designing the Architecture of a Convolutional Neural Network Automatically for Diabetic Retinopathy Diagnosis
Authors:
Fahman Saeed,
Muhammad Hussain,
Hatim A Aboalsamh,
Fadwa Al Adel,
Adi Mohammed Al Owaifeer
Abstract:
The prevalence of diabetic retinopathy (DR) has reached 34.6% worldwide and is a major cause of blindness among middle-aged diabetic patients. Regular DR screening using fundus photography helps detect its complications and prevent its progression to advanced levels. As manual screening is time-consuming and subjective, machine learning (ML) and deep learning (DL) have been employed to aid graders. However, the existing CNN-based methods use either pre-trained CNN models or a brute force approach to design new CNN models, which are not customized to the complexity of fundus images. To overcome this issue, we introduce an approach for the custom design of CNN models, whose architectures are adapted to the structural patterns of fundus images and better represent the DR-relevant features. It leverages k-medoid clustering, principal component analysis (PCA), and inter-class and intra-class variations to automatically determine the depth and width of a CNN model. The designed models are lightweight, adapted to the internal structures of fundus images, and encode the discriminative patterns of DR lesions. The technique is validated on a local dataset from King Saud University Medical City, Saudi Arabia, and two challenging benchmark datasets from Kaggle: EyePACS and APTOS2019. The custom-designed models outperform well-known pre-trained CNN models such as ResNet152, DenseNet121, and ResNeSt50 with a significant decrease in the number of parameters, and compete well with the state-of-the-art CNN-based DR screening methods. The proposed approach is helpful for DR screening under diverse clinical settings and for referring patients who may need further assessment and treatment to expert ophthalmologists.
Submitted 7 November, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Robust Multi-Domain Mitosis Detection
Authors:
Mustaffa Hussain,
Ritesh Gangnani,
Sasidhar Kadiyala
Abstract:
Domain variability is a common bottleneck in developing generalisable algorithms for various medical applications. Motivated by the observation that the domain variability of medical images is to some extent compact, we propose to learn a target representative feature space through unpaired image-to-image translation (CycleGAN). We comprehensively evaluate the performance and usefulness by applying the transformation to mitosis detection with candidate proposal and classification. This work presents a simple yet effective multi-step mitotic figure detection algorithm developed as a baseline for the MIDOG challenge. On the preliminary test set, the algorithm achieves an F1 score of 0.52.
Submitted 13 September, 2021;
originally announced September 2021.
-
Home Energy Management Systems: Operation and Resilience of Heuristics against Cyberattacks
Authors:
Hafiz Majid Hussain,
Arun Narayanan,
Subham Sahoo,
Yongheng Yang,
Pedro H. J. Nardelli,
Frede Blaabjerg
Abstract:
Internet of Things (IoT) and advanced communication technologies have demonstrated great potential to manage residential energy resources by enabling demand-side management (DSM). Home energy management systems (HEMSs) can automatically control electricity production and usage inside homes using DSM techniques. These HEMSs will wirelessly collect information from hardware installed in the power system and in homes with the objective to intelligently and efficiently optimize electricity usage and minimize costs. However, HEMSs can be vulnerable to cyberattacks that target the electricity pricing model. The cyberattacker manipulates the pricing information collected by a customer's HEMS to misguide its algorithms toward non-optimal solutions. The customer's electricity bill increases, and additional peaks are created without being detected by the system operator. This article introduces demand-response (DR)-based DSM in HEMSs and discusses DR optimization using heuristic algorithms. Moreover, it discusses the possibilities and impacts of cyberattacks, their effectiveness, and the degree of resilience of heuristic algorithms against cyberattacks. This article also opens research questions and shows prospective directions.
Submitted 21 September, 2021;
originally announced September 2021.
-
RVMDE: Radar Validated Monocular Depth Estimation for Robotics
Authors:
Muhamamd Ishfaq Hussain,
Muhammad Aasim Rafique,
Moongu Jeon
Abstract:
Stereoscopy exposits a natural perception of distance in a scene, and its manifestation in 3D world understanding is an intuitive phenomenon. However, an innate rigid calibration of binocular vision sensors is crucial for accurate depth estimation. Alternatively, a monocular camera alleviates the limitation at the expense of accuracy in estimating depth, and the challenge is exacerbated in harsh environmental conditions. Moreover, an optical sensor often fails to acquire vital signals in harsh environments, and radar is used instead, which gives coarse but more accurate signals. This work explores the utility of coarse signals from radar when fused with fine-grained data from a monocular camera for depth estimation in harsh environmental conditions. A variant of the feature pyramid network (FPN) extensively operates on fine-grained image features at multiple scales with fewer parameters. FPN feature maps are fused with sparse radar features extracted with a convolutional neural network. The concatenated hierarchical features are used to predict the depth with ordinal regression. We performed experiments on the nuScenes dataset, and the proposed architecture stays on top in quantitative evaluations with reduced parameters and faster inference. The depth estimation results suggest that the proposed techniques can be used as an alternative to stereo depth estimation in critical applications in robotics and self-driving cars. The source code will be available at https://github.com/MI-Hussain/RVMDE.
Submitted 18 April, 2022; v1 submitted 11 September, 2021;
originally announced September 2021.
-
A Deep Learning-Based Unified Framework for Red Lesions Detection on Retinal Fundus Images
Authors:
Norah Asiri,
Muhammad Hussain,
Fadwa Al Adel
Abstract:
Red lesions, namely microaneurysms (MAs) and hemorrhages (HMs), are the early signs of diabetic retinopathy (DR). The automatic detection of MAs and HMs on retinal fundus images is a challenging task. Most of the existing methods detect either only MAs or only HMs because of the difference in their texture, sizes, and morphology. Though some methods detect both MAs and HMs, they suffer from the curse of dimensionality of shape and color features and fail to detect all shape variations of HMs, such as flame-shaped ones. Leveraging the progress in deep learning, we proposed a two-stream red lesion detection system dealing simultaneously with small and large red lesions. For this system, we introduced a new ROI candidate generation method for large red lesions on fundus images; it is based on blood vessel segmentation and morphological operations, reduces the computational complexity, and enhances the detection accuracy by generating a small number of potential candidates. For detection, we proposed a framework with two streams. We used a pretrained VGGNet as the backbone model and carried out extensive experiments to tune it for vessel segmentation and candidate generation, and finally for learning the appropriate mapping, which yields better detection of the red lesions compared with the state-of-the-art methods. The experimental results validated the effectiveness of the system in detecting both MAs and HMs, and it yields higher per-lesion detection performance than the state-of-the-art methods: on DiaretDB1-MA, sensitivity (SN) is 0.8589 with a good FROC score under 8 FPIs (FROC=0.7518); on DiaretDB1-HM, SN=0.7552 with good FROC scores under 2, 4, and 8 FPIs; on e-ophtha, SN=0.8157 with an overall FROC=0.4537; and on the ROCh dataset, FROC=0.3461. For DR screening, the system performs well with good AUC on the DiaretDB1-MA, DiaretDB1-HM, and e-ophtha datasets.
Submitted 1 May, 2025; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Recognition of COVID-19 Disease Utilizing X-Ray Imaging of the Chest Using CNN
Authors:
Md Gulzar Hussain,
Ye Shiren
Abstract:
As the COVID-19 pandemic continues, chest X-ray (CXR) images are increasingly used in clinical practice as a complementary screening technique to RT-PCR testing for respiratory complaints. Many new deep learning approaches have been developed as a consequence. The goal of this research is to assess convolutional neural networks (CNNs) for diagnosing COVID-19 using chest X-ray images. The performance of CNNs with one, three, and four convolution layers has been evaluated in this research. A dataset of 13,808 CXR images is used in this research. When evaluated on X-ray images with three splits of the dataset, our preliminary experimental results show that the CNN model with three convolution layers can reliably detect COVID-19 with 96 percent accuracy (precision being 96 percent). This indicates the promise of our suggested model for reliable screening of COVID-19.
Submitted 5 September, 2021;
originally announced September 2021.
-
Global Self-Attention as a Replacement for Graph Convolution
Authors:
Md Shamim Hussain,
Mohammed J. Zaki,
Dharmashankar Subramanian
Abstract:
We propose an extension to the transformer neural network architecture for general-purpose graph learning by adding a dedicated pathway for pairwise structural information, called edge channels. The resultant framework - which we call Edge-augmented Graph Transformer (EGT) - can directly accept, process and output structural information of arbitrary form, which is important for effective learning on graph-structured data. Our model exclusively uses global self-attention as an aggregation mechanism rather than static localized convolutional aggregation. This allows for unconstrained long-range dynamic interactions between nodes. Moreover, the edge channels allow the structural information to evolve from layer to layer, and prediction tasks on edges/links can be performed directly from the output embeddings of these channels. We verify the performance of EGT in a wide range of graph-learning experiments on benchmark datasets, in which it outperforms Convolutional/Message-Passing Graph Neural Networks. EGT sets a new state-of-the-art for the quantum-chemical regression task on the OGB-LSC PCQM4Mv2 dataset containing 3.8 million molecular graphs. Our findings indicate that global self-attention based aggregation can serve as a flexible, adaptive and effective replacement of graph convolution for general-purpose graph learning. Therefore, convolutional local neighborhood aggregation is not an essential inductive bias.
Submitted 3 June, 2022; v1 submitted 6 August, 2021;
originally announced August 2021.
-
Learning and Adaptation for Millimeter-Wave Beam Tracking and Training: a Dual Timescale Variational Framework
Authors:
Muddassar Hussain,
Nicolo Michelusi
Abstract:
Millimeter-wave vehicular networks incur enormous beam-training overhead to enable narrow-beam communications. This paper proposes a learning and adaptation framework in which the dynamics of the communication beams are learned and then exploited to design adaptive beam-tracking and training with low overhead: on a long-timescale, a deep recurrent variational autoencoder (DR-VAE) uses noisy beam-training feedback to learn a probabilistic model of beam dynamics and enable predictive beam-tracking; on a short-timescale, an adaptive beam-training procedure is formulated as a partially observable (PO-) Markov decision process (MDP) and optimized via point-based value iteration (PBVI) by leveraging beam-training feedback and a probabilistic prediction of the strongest beam pair provided by the DR-VAE. In turn, beam-training feedback is used to refine the DR-VAE via stochastic gradient ascent in a continuous process of learning and adaptation. The proposed DR-VAE learning framework learns accurate beam dynamics: it reduces the Kullback-Leibler divergence between the ground truth and the learned model of beam dynamics by 95% over the Baum-Welch algorithm and a naive learning approach that neglects feedback errors. Numerical results on a line-of-sight (LOS) scenario with multipath reveal that the proposed dual timescale approach yields near-optimal spectral efficiency, and improves it by 130% over a policy that scans exhaustively over the dominant beam pairs, and by 20% over a state-of-the-art POMDP policy. Finally, a low-complexity policy is proposed by reducing the POMDP to an error-robust MDP, and is shown to perform well in regimes with infrequent feedback errors.
Submitted 26 October, 2021; v1 submitted 27 June, 2021;
originally announced July 2021.
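The abstract above measures model quality by the Kullback-Leibler divergence between the ground-truth beam dynamics and the learned model. As a minimal sketch of that metric only, the following computes the KL divergence between two discrete distributions, for example one row of a beam-transition matrix. The function name `kl_divergence` and the example probabilities are illustrative assumptions, not taken from the paper.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length.

    Hypothetical helper for illustration: p plays the role of the
    ground-truth distribution, q the learned one. Terms with p_i == 0
    contribute zero; q_i must be positive wherever p_i is positive.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf  # q assigns zero mass where p does not
            total += pi * math.log(pi / qi)
    return total

# Illustrative (made-up) next-beam distributions over three beam pairs.
ground_truth = [0.7, 0.2, 0.1]
learned = [0.6, 0.3, 0.1]
divergence = kl_divergence(ground_truth, learned)
```

A divergence of zero means the learned distribution matches the ground truth exactly; larger values indicate a worse fit.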
-
Biologically Inspired Model for Timed Motion in Robotic Systems
Authors:
Sebastian Doliwa,
Muhammad Ayaz Hussain,
Tim Sziburis,
Ioannis Iossifidis
Abstract:
The goal of this work is to develop a motion model for sequentially timed movement actions in robotic systems, with specific consideration of temporal stabilization, that is, maintaining an approximately constant overall movement time (isochronous behavior). This is demonstrated both in simulation and on a physical robotic system for the task of intercepting a moving target in three-dimensional space. As in humanoid motion, timing plays a vital role in generating naturalistic behavior in interaction with a dynamic environment, as well as in adaptively planning and executing action sequences online. In biological systems, many physiological and anatomical functions follow a particular level of periodicity and stabilization and exhibit a certain resilience against external disturbances. A main aspect of this is stabilizing movement timing against limited perturbations. Human arm movement in particular shows stabilizing behavior when tasked with reaching a certain goal point, pose, or configuration. This work uses an extended Kalman filter (EKF) to predict the target position while coping with nonlinear system dynamics. The periodicity and temporal stabilization found in biological systems were generated artificially by a Hopf oscillator, yielding a sinusoidal velocity profile for smooth and repeatable motion.
Submitted 29 July, 2021; v1 submitted 30 June, 2021;
originally announced June 2021.
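The abstract above relies on a Hopf oscillator to produce a stable, sinusoidal velocity profile. As a rough sketch of why such an oscillator resists perturbations, the following integrates the standard Hopf normal form with explicit Euler steps. The parameter values, the function names `hopf_step` and `simulate_radius`, and the choice of integrator are assumptions for illustration; the abstract does not specify the paper's oscillator parameters or how it is coupled to the robot.

```python
import math

def hopf_step(x, y, mu, omega, dt):
    """One explicit-Euler step of the Hopf normal form:
        dx/dt = (mu - r^2) * x - omega * y
        dy/dt = (mu - r^2) * y + omega * x,   r^2 = x^2 + y^2
    For mu > 0 the system has a stable limit cycle of radius sqrt(mu).
    """
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

def simulate_radius(x0, y0, mu=1.0, omega=2.0 * math.pi, dt=1e-3, steps=20000):
    """Integrate from (x0, y0) and return the final distance from the origin."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = hopf_step(x, y, mu, omega, dt)
    return math.hypot(x, y)

# Starting inside or outside the limit cycle, the amplitude relaxes
# toward sqrt(mu); the Euler discretization leaves a small offset.
radius_from_inside = simulate_radius(0.1, 0.0)
radius_from_outside = simulate_radius(2.0, 0.0)
```

On the limit cycle the state follows a sinusoid with angular frequency `omega`, so a perturbed amplitude relaxes back to a fixed value while the period is unchanged. This is the property the temporal stabilization described above depends on.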