-
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Authors:
Divya Velayudhan,
Abdelfatah Ahmed,
Mohamad Alansari,
Neha Gour,
Abderaouf Behouch,
Taimur Hassan,
Syed Talal Wasim,
Nabil Maalej,
Muzammal Naseer,
Juergen Gall,
Mohammed Bennamoun,
Ernesto Damiani,
Naoufel Werghi
Abstract:
Advancements in Computer-Aided Screening (CAS) systems are essential for improving the detection of security threats in X-ray baggage scans. However, current datasets are limited in representing real-world, sophisticated threats and concealment tactics, and existing approaches are constrained by a closed-set paradigm with predefined labels. To address these challenges, we introduce STCray, the first multimodal X-ray baggage security dataset, comprising 46,642 image-caption paired scans across 21 threat categories, generated using an X-ray scanner for airport security. STCray is developed with our specialized protocol that ensures domain-aware, coherent captions, which in turn yield multi-modal instruction-following data for X-ray baggage security. This allows us to train a domain-aware visual AI assistant named STING-BEE that supports a range of vision-language tasks, including scene comprehension, referring threat localization, visual grounding, and visual question answering (VQA), establishing novel baselines for multi-modal learning in X-ray baggage security. Further, STING-BEE shows state-of-the-art generalization in cross-domain settings. Code, data, and models are available at https://divs1159.github.io/STING-BEE/.
Submitted 3 April, 2025;
originally announced April 2025.
-
DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection
Authors:
MD Sadik Hossain Shanto,
Mahir Labib Dihan,
Souvik Ghosh,
Riad Ahmed Anonto,
Hafijul Hoque Chowdhury,
Abir Muhtasim,
Rakib Ahsan,
MD Tanvir Hassan,
MD Roqunuzzaman Sojib,
Sheikh Azizul Hakim,
M. Saifur Rahman
Abstract:
This report presents our approach for the IEEE SP Cup 2025: Deepfake Face Detection in the Wild (DFWild-Cup), focusing on detecting deepfakes across diverse datasets. Our methodology employs advanced backbone models, including MaxViT, CoAtNet, and EVA-02, fine-tuned using supervised contrastive loss to enhance feature separation. These models were chosen for their complementary strengths. The integration of convolution layers and strided attention in MaxViT is well suited to detecting local features. In contrast, CoAtNet's hybrid use of convolution and attention mechanisms effectively captures multi-scale features. EVA-02's robust pretraining with masked image modeling excels at capturing global features. After training, we freeze the parameters of these models and train the classification heads. Finally, a majority-voting ensemble combines the predictions from these models, improving robustness and generalization to unseen scenarios. The proposed system addresses the challenges of detecting deepfakes in real-world conditions and achieves an accuracy of 95.83% on the validation dataset.
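The final majority-voting step can be illustrated with a minimal sketch. It assumes each of the three backbones has already produced a hard label per image; the 0 = real, 1 = fake encoding and the function name `majority_vote` are hypothetical, not taken from the report. The ensemble keeps, per image, the label most models agree on.

```python
from collections import Counter
from typing import Sequence


def majority_vote(per_model_labels: Sequence[Sequence[int]]) -> list[int]:
    """Combine hard predictions from several models by majority vote.

    per_model_labels[m][i] is model m's label for sample i
    (hypothetical encoding: 0 = real, 1 = fake).
    """
    n_samples = len(per_model_labels[0])
    if any(len(labels) != n_samples for labels in per_model_labels):
        raise ValueError("all models must label the same samples")
    fused = []
    for i in range(n_samples):
        votes = Counter(labels[i] for labels in per_model_labels)
        # most_common(1) returns [(label, count)]; with an odd number of
        # binary voters a strict majority always exists.
        fused.append(votes.most_common(1)[0][0])
    return fused


maxvit = [1, 0, 1, 0]
coatnet = [1, 1, 0, 0]
eva02 = [0, 1, 1, 0]
print(majority_vote([maxvit, coatnet, eva02]))  # [1, 1, 1, 0]
```

With three voters and two classes a strict majority always exists, which is one practical reason to ensemble an odd number of binary classifiers.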
Submitted 27 January, 2025;
originally announced January 2025.
-
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Authors:
Mahir Labib Dihan,
Md Tanvir Hassan,
Md Tanvir Parvez,
Md Hasebul Hasan,
Md Almash Alam,
Muhammad Aamir Cheema,
Mohammed Eunus Ali,
Md Rizwan Parvez
Abstract:
Recent advancements in foundation models have improved autonomous tool usage and reasoning, but their capabilities in map-based reasoning remain underexplored. To address this, we introduce MapEval, a benchmark designed to assess foundation models across three distinct task types (textual, API-based, and visual reasoning) through 700 multiple-choice questions spanning 180 cities and 54 countries, covering spatial relationships, navigation, travel planning, and real-world map interactions. Unlike prior benchmarks that focus on simple location queries, MapEval requires models to handle long-context reasoning, API interactions, and visual map analysis, making it the most comprehensive evaluation framework for geospatial AI. In an evaluation of 30 foundation models, including Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro, none surpasses 67% accuracy; open-source models perform significantly worse, and all models lag more than 20% behind human performance. These results expose critical gaps in spatial inference, as models struggle with distances, directions, route planning, and place-specific reasoning, highlighting the need for better geospatial AI to bridge the gap between foundation models and real-world navigation. All resources are available at: https://mapeval.github.io/.
Submitted 6 June, 2025; v1 submitted 31 December, 2024;
originally announced January 2025.
-
Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain
Authors:
Vidya Sudevan,
Fakhreddine Zayer,
Taimur Hassan,
Sajid Javed,
Hamad Karki,
Giulia De Masi,
Jorge Dias
Abstract:
This paper presents DU-VIO, a dehazing-aided hybrid multi-rate multi-modal Visual-Inertial Odometry (VIO) estimation framework designed for extreme underwater environments. DU-VIO incorporates a GAN-based pre-processing module and a hybrid CNN-LSTM module for precise pose estimation from visibility-enhanced underwater images and raw IMU data. Accurate pose estimation is paramount for many underwater robotics and exploration applications. However, underwater visibility is often compromised by suspended particles and attenuation effects, making visual-inertial pose estimation a formidable challenge. DU-VIO aims to overcome these limitations by removing visual disturbances from raw image data, enhancing the quality of the image features used for pose estimation. We demonstrate the effectiveness of DU-VIO by calculating RMSE scores for translation and rotation vectors relative to their reference values. These scores are then compared with those of a base model on a modified AQUALOC dataset. This study is significant for underwater robotics and exploration: DU-VIO offers a robust solution to the persistent challenge of underwater visibility, significantly improving the accuracy of pose estimation. This research contributes insights and tools for advancing underwater technology, with implications for scientific research, environmental monitoring, and industrial applications.
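The RMSE comparison can be sketched as follows, assuming estimated and reference poses are given as per-frame 3-vectors (translation in metres, rotation as, for example, a rotation vector). The abstract does not specify the rotation parameterization or whether the error is pooled per frame, so the Euclidean per-frame error used here and the function name `rmse` are assumptions.

```python
import math
from typing import Sequence

Vec3 = tuple[float, float, float]


def rmse(estimated: Sequence[Vec3], reference: Sequence[Vec3]) -> float:
    """Root-mean-square of the per-frame Euclidean error between two
    trajectories of 3-vectors (e.g. translations or rotation vectors)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [
        sum((e - r) ** 2 for e, r in zip(est, ref))
        for est, ref in zip(estimated, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical translation estimates versus ground truth over three frames.
est_t = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.2)]
ref_t = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(round(rmse(est_t, ref_t), 4))  # 0.1291
```

Running the same function once on the translation sequence and once on the rotation sequence yields the two scores that the abstract compares against a base model.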
Submitted 21 November, 2024;
originally announced November 2024.
-
Fair Distillation: Teaching Fairness from Biased Teachers in Medical Imaging
Authors:
Milad Masroor,
Tahir Hassan,
Yu Tian,
Kevin Wells,
David Rosewarne,
Thanh-Toan Do,
Gustavo Carneiro
Abstract:
Deep learning has achieved remarkable success in image classification and segmentation tasks. However, fairness concerns persist, as models often exhibit biases that disproportionately affect demographic groups defined by sensitive attributes such as race, gender, or age. Existing bias-mitigation techniques, including Subgroup Re-balancing, Adversarial Training, and Domain Generalization, aim to balance accuracy across demographic groups, but often fail to simultaneously improve overall accuracy, group-specific accuracy, and fairness due to conflicts among these interdependent objectives. We propose the Fair Distillation (FairDi) method, a novel fairness approach that decomposes these objectives by leveraging biased "teacher" models, each optimized for a specific demographic group. These teacher models then guide the training of a unified "student" model, which distills their knowledge to maximize overall and group-specific accuracies, while minimizing inter-group disparities. Experiments on medical imaging datasets show that FairDi achieves significant gains in both overall and group-specific accuracy, along with improved fairness, compared to existing methods. FairDi is adaptable to various medical tasks, such as classification and segmentation, and provides an effective solution for equitable model performance.
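One way to picture the teacher-to-student step is a per-sample distillation loss in which each sample's soft target comes from the teacher trained for that sample's demographic group. The sketch below assumes standard knowledge-distillation choices (temperature-scaled softmax, KL divergence, T² scaling); it is not the FairDi objective, and the group names and data layout are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def group_distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: dict[str, Sequence[Sequence[float]]],
    groups: Sequence[str],
    temperature: float = 2.0,
) -> float:
    """Mean KL(teacher || student), where sample i is distilled only from
    the teacher assigned to its demographic group groups[i].

    teacher_logits[g][i] holds teacher g's logits for sample i (hypothetical layout).
    """
    total = 0.0
    for i, g in enumerate(groups):
        p = softmax(teacher_logits[g][i], temperature)  # soft target from the group's teacher
        q = softmax(student_logits[i], temperature)     # student prediction
        total += sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)
    # Multiplying by T^2 is a common distillation convention that keeps
    # gradient magnitudes comparable across temperatures.
    return temperature ** 2 * total / len(groups)


student = [[2.0, 0.0], [0.0, 1.0]]
teachers = {
    "group_a": [[2.0, 0.0], [5.0, -5.0]],
    "group_b": [[-5.0, 5.0], [0.0, 1.0]],
}
print(group_distillation_loss(student, teachers, ["group_a", "group_b"]))  # 0.0
print(group_distillation_loss(student, teachers, ["group_b", "group_a"]) > 0)  # True
```

In practice such a term would typically be combined with a supervised loss on the ground-truth labels; how FairDi weights its objectives is not stated in the abstract.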
Submitted 18 November, 2024;
originally announced November 2024.
-
Advancing Histopathology with Deep Learning Under Data Scarcity: A Decade in Review
Authors:
Ahmad Obeid,
Said Boumaraf,
Anabia Sohail,
Taimur Hassan,
Sajid Javed,
Jorge Dias,
Mohammed Bennamoun,
Naoufel Werghi
Abstract:
Recent years have witnessed remarkable progress in computational histopathology, largely fueled by deep learning. This has brought the clinical adoption of deep learning-based tools within reach, promising significant benefits to healthcare: offering a valuable second opinion on diagnoses, streamlining complex tasks, and mitigating the risks of inconsistency and bias in clinical decisions. However, a well-known challenge is that deep learning models may contain up to billions of parameters; supervising their training effectively would require vast labeled datasets to achieve reliable generalization and noise resilience. In medical imaging, particularly histopathology, amassing such extensive labeled data collections places additional demands on clinicians and incurs higher costs, which hinders progress in the field. To address this challenge, researchers have devised various strategies for leveraging deep learning with limited data and annotation availability. In this paper, we present a comprehensive review of deep learning applications in histopathology, with a focus on the challenges posed by data scarcity over the past decade. We systematically categorize and compare various approaches, evaluate their distinct contributions using benchmarking tables, and highlight their respective advantages and limitations. Additionally, we address gaps in existing reviews and identify underexplored research opportunities, underscoring the potential for future advancements in this field.
Submitted 18 October, 2024;
originally announced October 2024.
-
Integrating Features for Recognizing Human Activities through Optimized Parameters in Graph Convolutional Networks and Transformer Architectures
Authors:
Mohammad Belal,
Taimur Hassan,
Abdelfatah Hassan,
Nael Alsheikh,
Noureldin Elhendawi,
Irfan Hussain
Abstract:
Human activity recognition is a major field of study that employs computer vision, machine vision, and deep learning techniques to categorize human actions. Deep learning has made significant progress, with architectures that are highly effective at capturing human dynamics. This study emphasizes the influence of feature fusion on the accuracy of activity recognition. This technique addresses a limitation of conventional models, which struggle to identify activities because of their limited capacity to capture spatial and temporal features. The study uses sensory data from four publicly available datasets: HuGaDB, PKU-MMD, LARa, and TUG. Two deep learning models, a Transformer and a Parameter-Optimized Graph Convolutional Network (PO-GCN), were evaluated on these datasets in terms of accuracy and F1-score. The feature fusion technique combined the final-layer features from both models and fed them into a classifier. Empirical evidence demonstrates that PO-GCN outperforms standard models in activity recognition. On HuGaDB, accuracy improved by 2.3% and F1-score by 2.2%. On TUG, accuracy increased by 5% and F1-score by 0.5%. On the other hand, LARa and PKU-MMD achieved lower accuracies of 64% and 69%, respectively. This indicates that feature integration enhanced the performance of both the Transformer model and PO-GCN.
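The fusion step, in which final-layer features from both models are fed to a classifier, can be sketched as late fusion by concatenation followed by a linear classifier. Concatenation, the embedding sizes, the weights, and the function names are illustrative assumptions; the abstract does not state how the features are integrated or which classifier is used.

```python
from typing import Sequence


def fuse_features(gcn_feat: Sequence[float], transformer_feat: Sequence[float]) -> list[float]:
    """Late fusion by concatenating the two models' final-layer embeddings."""
    return list(gcn_feat) + list(transformer_feat)


def linear_classify(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> int:
    """Return the index of the highest-scoring class of a linear classifier.

    weights[c] must have one entry per fused feature dimension.
    """
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("weight rows must match the fused feature size")
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


gcn = [0.2, 0.9]          # hypothetical PO-GCN embedding
trans = [0.1, 0.4, 0.7]   # hypothetical Transformer embedding
fused = fuse_features(gcn, trans)
W = [[1, 0, 0, 0, 0],     # class 0 uses only the first GCN feature
     [0, 1, 0, 0, 1]]     # class 1 mixes GCN and Transformer features
b = [0.0, 0.0]
print(len(fused), linear_classify(fused, W, b))  # 5 1
```

In a trained system the classifier weights would be learned on the fused vectors; the hand-set values here only show that the classifier can draw on both models' features at once.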
Submitted 29 August, 2024;
originally announced August 2024.
-
Simplify, Consolidate, Intervene: Facilitating Institutional Support with Mental Models of Learning Management System Use
Authors:
Taha Hassan,
Bob Edmison,
Daron Williams,
Larry Cox II,
Matthew Louvet,
Bart Knijnenburg,
D. Scott McCrickard
Abstract:
Measuring instructors' adoption of learning management system (LMS) tools is a critical first step in evaluating the efficacy of online teaching and learning at scale. Existing models for LMS adoption are often qualitative, learner-centered, and difficult to leverage towards institutional support. We propose depth-of-use (DOU): an intuitive measurement model for faculty's utilization of a university-wide LMS and their needs for institutional support. We test hypotheses about the relationship between DOU and course attributes such as modality, participation, logistics, and outcomes. In a large-scale analysis of metadata from more than 30,000 courses offered at Virginia Tech over two years, we find that a pervasive need for scale, interoperability, and ubiquitous access drives LMS adoption by university instructors. We then demonstrate how DOU can help faculty members identify the opportunity cost of transitioning from legacy apps to LMS tools. We also describe how DOU can help instructional designers and IT organizational leadership evaluate the impact of their support allocation, faculty development, and LMS evangelism initiatives.
Submitted 25 June, 2024;
originally announced July 2024.
-
Feature Fusion for Human Activity Recognition using Parameter-Optimized Multi-Stage Graph Convolutional Network and Transformer Models
Authors:
Mohammad Belal,
Taimur Hassan,
Abdelfatah Ahmed,
Ahmad Aljarah,
Nael Alsheikh,
Irfan Hussain
Abstract:
Human activity recognition (HAR) is a crucial area of research that involves understanding human movements using computer and machine vision technology. Deep learning has emerged as a powerful tool for this task, with models such as Convolutional Neural Networks (CNNs) and Transformers being employed to capture various aspects of human motion. A key contribution of this work is demonstrating that feature fusion improves HAR accuracy by capturing spatial and temporal features, which has important implications for developing more accurate and robust activity recognition systems. The study uses sensory data from the HuGaDB, PKU-MMD, LARa, and TUG datasets. Two models, PO-MS-GCN and a Transformer, were trained and evaluated, with PO-MS-GCN outperforming state-of-the-art models. HuGaDB and TUG achieved high accuracies and F1-scores, while LARa and PKU-MMD had lower scores. Feature fusion improved results across datasets.
Submitted 24 June, 2024;
originally announced June 2024.
-
FLEXIBLE: Forecasting Cellular Traffic by Leveraging Explicit Inductive Graph-Based Learning
Authors:
Duc Thinh Ngo,
Kandaraj Piamrat,
Ons Aouedi,
Thomas Hassan,
Philippe Raipin-Parvédy
Abstract:
From a telecommunication standpoint, the surge in users and services challenges next-generation networks with escalating traffic demands and limited resources. Accurate traffic prediction can offer network operators valuable insights into network conditions and suggest optimal allocation policies. Recently, spatio-temporal forecasting, employing Graph Neural Networks (GNNs), has emerged as a promising method for cellular traffic prediction. However, existing studies, inspired by road traffic forecasting formulations, overlook the dynamic deployment and removal of base stations, requiring the GNN-based forecaster to handle an evolving graph. This work introduces a novel inductive learning scheme and a generalizable GNN-based forecasting model that can process diverse graphs of cellular traffic with one-time training. We also demonstrate that this model can be easily leveraged by transfer learning with minimal effort, making it applicable to different areas. Experimental results show up to 9.8% performance improvement compared to the state-of-the-art, especially in rare-data settings with training data reduced to below 20%.
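The inductive property the abstract relies on (a model trained once and applied to graphs whose set of base stations changes) follows from GNN weights being shared across nodes rather than tied to a fixed node count. The sketch below shows this with a single mean-aggregation graph convolution in plain Python; it is a generic illustration, not the FLEXIBLE architecture, and the layer form, example graphs, and names are assumptions.

```python
from typing import Sequence

Matrix = list[list[float]]


def graph_conv(
    features: Matrix,
    edges: Sequence[tuple[int, int]],
    weight: Matrix,
) -> Matrix:
    """One mean-aggregation graph convolution with self-loops.

    features: N x F node features (N may differ between calls).
    edges:    undirected edges (u, v) between node indices.
    weight:   F x H matrix shared by every node, so its shape does not depend on N.
    """
    n = len(features)
    neighbours = [{i} for i in range(n)]  # start with self-loops
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    out = []
    for i in range(n):
        # Average the features of node i and its neighbours.
        agg = [
            sum(features[j][f] for j in neighbours[i]) / len(neighbours[i])
            for f in range(len(features[0]))
        ]
        # Apply the shared linear map, then ReLU.
        out.append([
            max(0.0, sum(agg[f] * weight[f][h] for f in range(len(agg))))
            for h in range(len(weight[0]))
        ])
    return out


W = [[1.0], [0.5]]  # F=2 input features -> H=1 output, trained once
small = graph_conv([[1.0, 0.0], [3.0, 2.0]], [(0, 1)], W)
# A hypothetical new base station (node 2) is deployed: same weights, larger graph.
large = graph_conv([[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]], [(0, 1), (1, 2)], W)
print(small, large)
```

The same `W` processes both the two-node and three-node graphs, which is the property that lets a single trained model handle base stations being deployed or removed.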
Submitted 14 May, 2024;
originally announced May 2024.
-
Enhancing SCADA Security: Developing a Host-Based Intrusion Detection System to Safeguard Against Cyberattacks
Authors:
Omer Sen,
Tarek Hassan,
Andreas Ulbig,
Martin Henze
Abstract:
With smart grids increasingly reliant on correctly functioning SCADA systems, and with those systems vulnerable to cyberattacks that pose risks to critical infrastructure, there is a pressing need for effective security measures. As there is a lack of host-based intrusion detection systems specifically designed for the stable nature of SCADA systems, the objective of this work is to propose a host-based intrusion detection system tailored for SCADA systems in smart grids. The proposed system uses USB device identification, flagging, and process memory scanning to monitor and detect anomalies in SCADA systems. Evaluation in three different scenarios demonstrates the tool's effectiveness in detecting and disabling malware. The proposed approach effectively identifies potential threats and enhances the security of SCADA systems in smart grids, providing a promising solution to protect against cyberattacks.
Submitted 22 February, 2024;
originally announced February 2024.
-
A Comprehensive Review of Artificial Intelligence Applications in Major Retinal Conditions
Authors:
Hina Raja,
Taimur Hassan,
Bilal Hassan,
Muhammad Usman Akram,
Hira Raja,
Alaa A Abd-alrazaq,
Siamak Yousefi,
Naoufel Werghi
Abstract:
This paper provides a systematic survey of retinal diseases that cause visual impairments or blindness, emphasizing the importance of early detection for effective treatment. It covers both clinical and automated approaches for detecting retinal disease, focusing on studies from the past decade. The survey evaluates various algorithms for identifying structural abnormalities and diagnosing retinal diseases, and it identifies future research directions based on a critical analysis of existing literature. This comprehensive study, which reviews both clinical and automated detection methods using different modalities, appears to be unique in its scope. Additionally, the survey serves as a helpful guide for researchers interested in digital retinopathy.
Submitted 22 November, 2023;
originally announced November 2023.
-
A Study on Knowledge Graph Embeddings and Graph Neural Networks for Web Of Things
Authors:
Rohith Teja Mittakola,
Thomas Hassan
Abstract:
Graph data structures are widely used to store relational information between several entities. With data being generated worldwide on a large scale, we see a significant growth in the generation of knowledge graphs. Thing in the future is Orange's take on a knowledge graph in the domain of the Web of Things (WoT), where the main objective of the platform is to provide a digital representation of the physical world and enable cross-domain applications to be built upon this massive and highly connected graph of things. In this context, as the knowledge graph grows in size, it is prone to noisy and messy data. In this paper, we explore state-of-the-art knowledge graph embedding (KGE) methods to learn numerical representations of the graph entities and, subsequently, explore downstream tasks such as link prediction, node classification, and triple classification. We also investigate graph neural networks (GNNs) alongside KGEs and compare their performance on the same downstream tasks. Our evaluation highlights the encouraging performance of both KGE- and GNN-based methods on node classification, and the superiority of GNN approaches in the link prediction task. Overall, we show that state-of-the-art approaches are relevant in a WoT context, and this preliminary work provides insights for implementing and evaluating them in this context.
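To make the link-prediction task concrete, the sketch below scores candidate triples with TransE, one well-known KGE model in which a relation acts as a translation so that head + relation ≈ tail. The abstract does not list which KGE methods were evaluated, so TransE, the toy entity names, and the hand-set 2-D embeddings are illustrative assumptions.

```python
import math

Vector = list[float]


def transe_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """TransE plausibility score: the negative L2 distance ||h + r - t||.
    Higher (closer to zero) means the triple is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head: Vector, relation: Vector, candidates: dict[str, Vector]) -> list[str]:
    """Link prediction: rank candidate tail entities for (head, relation, ?)."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


# Toy, hand-set embeddings (in practice they are learned from the graph).
entities = {
    "sensor_42": [0.0, 0.0],
    "room_7": [1.0, 1.0],
    "building_A": [3.0, 0.0],
}
located_in = [1.0, 1.0]
candidates = {k: v for k, v in entities.items() if k != "sensor_42"}
print(rank_tails(entities["sensor_42"], located_in, candidates))  # ['room_7', 'building_A']
```

Triple classification can reuse the same score by thresholding it, which is one common way KGE models are applied to that task.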
Submitted 23 October, 2023;
originally announced October 2023.
-
Spatio-temporal MLP-graph network for 3D human pose estimation
Authors:
Tanvir Hassan,
A. Ben Hamza
Abstract:
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation. Despite their success, most of these methods only consider spatial correlations between body joints and do not take into account temporal correlations, thereby limiting their ability to capture relationships in the presence of occlusions and inherent ambiguity. To address this potential weakness, we propose a spatio-temporal network architecture composed of a joint-mixing multi-layer perceptron block that facilitates communication among different joints and a graph weighted Jacobi network block that enables communication among various feature channels. The major novelty of our approach lies in a new weighted Jacobi feature propagation rule obtained through graph filtering with implicit fairing. We leverage temporal information from the 2D pose sequences, and integrate weight modulation into the model to enable untangling of the feature transformations of distinct nodes. We also employ adjacency modulation with the aim of learning meaningful correlations beyond defined linkages between body joints by altering the graph topology through a learnable modulation matrix. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our model, outperforming recent state-of-the-art methods for 3D human pose estimation.
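The weight- and adjacency-modulation ideas can be illustrated on a single graph-convolution step: a learnable matrix is added to the skeleton adjacency, and a per-joint modulation vector rescales each joint's shared transformed features. This plain-Python sketch omits normalization and the weighted Jacobi propagation rule, and the exact placement of each modulation is an assumption rather than the paper's formulation; the 3-joint chain is hypothetical.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an m x k and a k x n matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def modulated_graph_conv(
    x: Matrix, adj: Matrix, adj_mod: Matrix, weight: Matrix, joint_mod: Matrix
) -> Matrix:
    """One graph-convolution step with weight and adjacency modulation.

    x:         J x F joint features
    adj:       J x J skeleton adjacency with self-loops
    adj_mod:   J x J learnable matrix added to adj (non-zero entries act as extra edges)
    weight:    F x H weight matrix shared by all joints
    joint_mod: J x H per-joint modulation vectors (weight modulation)
    """
    xw = matmul(x, weight)  # shared linear transform for every joint
    # Rescale each joint's transformed features by its own modulation vector.
    modulated = [[v * s for v, s in zip(row, mod)] for row, mod in zip(xw, joint_mod)]
    # Modulated topology: skeleton edges plus learned corrections.
    a = [[adj[i][j] + adj_mod[i][j] for j in range(len(adj))] for i in range(len(adj))]
    return matmul(a, modulated)


# Hypothetical 3-joint chain 0-1-2 with self-loops.
A = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
no_mod = [[0.0] * 3 for _ in range(3)]
extra_edge = [[0.0, 0.0, 0.5], [0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]  # links joints 0 and 2
x = [[1.0], [2.0], [3.0]]
W = [[1.0]]
joint_scales = [[1.0], [1.0], [2.0]]
print(modulated_graph_conv(x, A, no_mod, W, joint_scales))      # [[3.0], [9.0], [8.0]]
print(modulated_graph_conv(x, A, extra_edge, W, joint_scales))  # [[6.0], [9.0], [8.5]]
```

The non-zero entries in `extra_edge` change the outputs of joints 0 and 2 even though they are not adjacent in the skeleton, which is the kind of learned long-range link that adjacency modulation is meant to provide.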
Submitted 29 August, 2023;
originally announced August 2023.
-
Vision-Based Autonomous Navigation for Unmanned Surface Vessel in Extreme Marine Conditions
Authors:
Muhayyuddin Ahmed,
Ahsan Baidar Bakht,
Taimur Hassan,
Waseem Akram,
Ahmed Humais,
Lakmal Seneviratne,
Shaoming He,
Defu Lin,
Irfan Hussain
Abstract:
Visual perception is an important component of autonomous navigation for unmanned surface vessels (USVs), particularly for tasks related to autonomous inspection and tracking. These tasks involve vision-based navigation techniques to identify the target for navigation. Reduced visibility under extreme weather conditions in marine environments makes it difficult for vision-based approaches to work properly. To overcome these issues, this paper presents an autonomous vision-based navigation framework for tracking target objects in extreme marine conditions. The proposed framework consists of an integrated perception pipeline that uses a generative adversarial network (GAN) to remove noise and highlight the object features before passing them to the object detector (i.e., YOLOv5). The detected visual features are then used by the USV to track the target. The proposed framework has been thoroughly tested in simulation under extremely reduced visibility due to sandstorms and fog. The results are compared with state-of-the-art de-hazing methods on the benchmark MBZIRC simulation dataset, on which the proposed scheme outperforms the existing methods across various metrics.
Submitted 8 August, 2023;
originally announced August 2023.
-
Tomato Maturity Recognition with Convolutional Transformers
Authors:
Asim Khan,
Taimur Hassan,
Muhammad Shafay,
Israa Fahmy,
Naoufel Werghi,
Lakmal Seneviratne,
Irfan Hussain
Abstract:
Tomatoes are a major crop worldwide, and accurately classifying their maturity is important for many agricultural applications, such as harvesting, grading, and quality control. In this paper, the authors propose a novel method for tomato maturity classification using a convolutional transformer. The convolutional transformer is a hybrid architecture that combines the strengths of convolutional neural networks (CNNs) and transformers. Additionally, this study introduces a new tomato dataset named KUTomaData, explicitly designed to train deep-learning models for tomato segmentation and classification. KUTomaData is a compilation of images sourced from a greenhouse in the UAE, with approximately 700 images available for training and testing. The dataset is prepared under various lighting conditions and viewing perspectives and employs different mobile camera sensors, distinguishing it from existing datasets. The contributions of this paper are threefold. First, the authors propose a novel method for tomato maturity classification using a modular convolutional transformer. Second, the authors introduce a new tomato image dataset that contains images of tomatoes at different maturity levels. Finally, the authors show that the convolutional transformer outperforms state-of-the-art methods for tomato maturity classification. The effectiveness of the proposed framework in handling cluttered and occluded tomato instances was evaluated using two additional public datasets, Laboro Tomato and Rob2Pheno Annotated Tomato, as benchmarks. The evaluation results across these three datasets demonstrate the strong performance of the proposed framework, surpassing the state of the art by 58.14%, 65.42%, and 66.39% in terms of mean average precision scores for KUTomaData, Laboro Tomato, and Rob2Pheno Annotated Tomato, respectively.
Submitted 2 January, 2024; v1 submitted 4 July, 2023;
originally announced July 2023.
-
Regular Splitting Graph Network for 3D Human Pose Estimation
Authors:
Tanvir Hassan,
A. Ben Hamza
Abstract:
In human pose estimation methods based on graph convolutional architectures, the human skeleton is usually modeled as an undirected graph whose nodes are body joints and whose edges are connections between neighboring joints. However, most of these methods focus on learning relationships between body joints using first-order neighbors, ignoring higher-order neighbors and hence limiting their ability to exploit relationships between distant joints. In this paper, we introduce a higher-order regular splitting graph network (RS-Net) for 2D-to-3D human pose estimation using matrix splitting in conjunction with weight and adjacency modulation. The core idea is to capture long-range dependencies between body joints using multi-hop neighborhoods, and to learn different modulation vectors for different body joints as well as a modulation matrix added to the adjacency matrix associated with the skeleton. This learnable modulation matrix helps adjust the graph structure by adding extra graph edges in an effort to learn additional connections between body joints. Instead of using a shared weight matrix for all neighboring body joints, the proposed RS-Net model applies weight unsharing before aggregating the feature vectors associated with the joints in order to capture the different relations between them. Experiments and ablation studies performed on two benchmark datasets demonstrate the effectiveness of our model, which achieves superior performance over recent state-of-the-art methods for 3D human pose estimation.
Submitted 9 May, 2023;
originally announced May 2023.
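A minimal NumPy sketch of the ideas in the RS-Net abstract, assuming a simplified formulation: the skeleton adjacency is split into exact k-hop adjacency matrices, a learnable matrix (here `adj_mod`) is added to each normalized hop matrix, per-joint modulation vectors (`modulation`) rescale the transformed features, and each hop (plus the self-connection) has its own unshared weight matrix. All names are hypothetical, training is omitted, and this is not the paper's matrix-splitting formulation.

```python
import numpy as np


def k_hop_adjacency(adj, k):
    """Binary matrix marking joint pairs whose shortest-path distance is exactly k (k >= 1)."""
    n = adj.shape[0]
    step = ((np.asarray(adj) != 0) | np.eye(n, dtype=bool)).astype(int)
    within = np.eye(n, dtype=bool)                 # pairs reachable in <= 0 hops
    for _ in range(k):
        prev = within
        within = (within.astype(int) @ step) > 0   # extend reach by one hop
    return (within & ~prev).astype(float)


def row_normalize(a):
    """Divide each row by its sum, leaving all-zero rows untouched."""
    deg = a.sum(axis=1, keepdims=True)
    return a / np.where(deg == 0, 1.0, deg)


def multi_hop_layer(x, adj, weights, modulation, adj_mod, hops=(1, 2)):
    """Hypothetical forward pass of one modulated multi-hop graph layer.

    x:          (J, C_in) joint features
    weights:    dict hop -> (C_in, C_out); key 0 is the self-connection
    modulation: (J, C_out) per-joint modulation vectors
    adj_mod:    (J, J) learnable matrix added to each normalized hop matrix
    """
    out = (x @ weights[0]) * modulation            # self term with its own weights
    for k in hops:
        a = row_normalize(k_hop_adjacency(adj, k)) + adj_mod
        out = out + a @ ((x @ weights[k]) * modulation)  # unshared weights per hop
    return out
```

Non-zero entries in `adj_mod` act as extra, learnable edges between joints that are not connected in the original skeleton.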
-
NFTrig
Authors:
Jordan Thompson,
Ryan Benac,
Kidus Olana,
Talha Hassan,
Andrew Sward,
Tauheed Khan Mohd
Abstract:
NFTrig is a web-based application created as an educational tool to teach trigonometry and blockchain technology. Creating the application involved front-end and back-end development as well as integration with outside services, including MetaMask and OpenSea. The primary development languages are HTML, CSS (Bootstrap 5), and JavaScript, along with Solidity for smart contract creation. The application itself is hosted on Moralis, utilizing their Web3 API. This technical report describes how the application was created, what the application requires, and how the smart contracts were designed with security considerations in mind. The NFTrig application underwent significant testing and validation before and after deployment. Suggestions and recommendations for further development, maintenance, and use in other fields of education are also described.
Submitted 21 December, 2022;
originally announced January 2023.
-
Artificial Image Tampering Distorts Spatial Distribution of Texture Landmarks and Quality Characteristics
Authors:
Tahir Hassan,
Aras Asaad,
Dashti Ali,
Sabah Jassim
Abstract:
Advances in AI-based computer vision have led to significant growth in synthetic image generation and artificial image tampering, with serious implications for unethical exploitation that undermines person identification and could render AI predictions less explainable. Morphing, Deepfake, and other artificial generation of face photographs undermine the reliability of face biometric authentication using different electronic ID documents. Morphed face photographs on e-passports can fool automated border control systems and human guards. This paper extends our previous work on using the persistent homology (PH) of texture landmarks to detect morphing attacks. We demonstrate that artificial image tampering distorts the spatial distribution of texture landmarks (i.e., their PH) as well as that of a set of image quality characteristics. We further demonstrate that the tamper-caused distortion of these two slim feature vectors offers significant potential for building explainable (handcrafted) tamper detectors with low error rates that are suitable for implementation on constrained devices.
Submitted 4 August, 2022;
originally announced August 2022.
-
An Incremental Learning Approach to Automatically Recognize Pulmonary Diseases from the Multi-vendor Chest Radiographs
Authors:
Mehreen Sirshar,
Taimur Hassan,
Muhammad Usman Akram,
Shoab Ahmed Khan
Abstract:
Pulmonary diseases can cause severe respiratory problems, leading to sudden death if not treated in time. Many researchers have used deep learning systems to diagnose pulmonary disorders from chest X-rays (CXRs). However, such systems require exhaustive training on large-scale data to diagnose chest abnormalities effectively. Furthermore, procuring such large-scale data is often infeasible and impractical, especially for rare diseases. With recent advances in incremental learning, researchers have periodically tuned deep neural networks to learn different classification tasks with few training examples. Although such systems can resist catastrophic forgetting, they treat the knowledge representations independently of each other, which limits their classification performance. Also, to the best of our knowledge, there is no incremental learning-driven image diagnostic framework that is specifically designed to screen pulmonary disorders from CXRs. To address this, we present a novel framework that can learn to screen different chest abnormalities incrementally. In addition, the proposed framework is penalized through an incremental learning loss function that draws on Bayesian theory to recognize structural and semantic inter-dependencies between incrementally learned knowledge representations, so that it diagnoses pulmonary diseases effectively regardless of the scanner specifications. We tested the proposed framework on five public CXR datasets containing different chest abnormalities, where it outperformed various state-of-the-art systems across several metrics.
Submitted 14 January, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
A Novel Incremental Learning Driven Instance Segmentation Framework to Recognize Highly Cluttered Instances of the Contraband Items
Authors:
Taimur Hassan,
Samet Akcay,
Mohammed Bennamoun,
Salman Khan,
Naoufel Werghi
Abstract:
Screening cluttered and occluded contraband items from baggage X-ray scans is a cumbersome task even for expert security staff. This paper presents a novel strategy that extends a conventional encoder-decoder architecture to perform instance-aware segmentation and extract merged instances of contraband items without using any additional sub-network or object detector. The encoder-decoder network first performs conventional semantic segmentation and retrieves cluttered baggage items. The model then incrementally evolves during training to recognize individual instances using significantly reduced training batches. To avoid catastrophic forgetting, a novel objective function minimizes the network loss in each iteration by retaining the previously acquired knowledge while learning new class representations and resolving their complex structural inter-dependencies through Bayesian inference. A thorough evaluation of our framework on two publicly available X-ray datasets shows that it outperforms state-of-the-art methods, especially within the challenging cluttered scenarios, while achieving an optimal trade-off between detection accuracy and efficiency.
Submitted 10 January, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
Temporal Fusion Based Multi-scale Semantic Segmentation for Detecting Concealed Baggage Threats
Authors:
Muhammed Shafay,
Taimur Hassan,
Ernesto Damiani,
Naoufel Werghi
Abstract:
Detecting illegal and threatening items in baggage is one of the foremost security concerns today. Even for experienced security personnel, manual detection is a time-consuming and stressful task. Many researchers have created automated frameworks for detecting suspicious and contraband data in X-ray scans of luggage. However, to our knowledge, no framework exists that utilizes temporal baggage X-ray imagery to effectively screen highly concealed and occluded objects that are barely visible even to the naked eye. To address this, we present a novel temporal fusion driven, multi-scale, residual-fashioned encoder-decoder that takes a series of consecutive scans as input and fuses them to generate distinct feature representations of the suspicious and non-suspicious baggage content, leading to a more accurate extraction of the contraband data. The proposed methodology has been thoroughly tested using the publicly accessible GDXray dataset, which is the only dataset containing temporally linked grayscale X-ray scans showcasing extremely concealed contraband data. The proposed framework outperforms its competitors on the GDXray dataset on various metrics.
Submitted 7 November, 2021; v1 submitted 4 November, 2021;
originally announced November 2021.
-
Incremental Cross-Domain Adaptation for Robust Retinopathy Screening via Bayesian Deep Learning
Authors:
Taimur Hassan,
Bilal Hassan,
Muhammad Usman Akram,
Shahrukh Hashmi,
Abdel Hakim Taguri,
Naoufel Werghi
Abstract:
Retinopathy represents a group of retinal diseases that, if not treated in time, can cause severe visual impairment or even blindness. Many researchers have developed autonomous systems to recognize retinopathy via fundus and optical coherence tomography (OCT) imagery. However, most of these frameworks employ conventional transfer learning and fine-tuning approaches, requiring a considerable amount of well-annotated training data to produce accurate diagnostic performance. This paper presents a novel incremental cross-domain adaptation instrument that allows any deep classification model to progressively learn abnormal retinal pathologies in OCT and fundus imagery via few-shot training. Furthermore, unlike its competitors, the proposed instrument is driven by a Bayesian multi-objective function that not only forces the candidate classification network to retain its previously learned knowledge during incremental training but also ensures that the network understands the structural and semantic relationships between previously learned pathologies and newly added disease categories, so that it can recognize them effectively at the inference stage. The proposed framework, evaluated on six public datasets acquired with three different scanners to screen thirteen retinal pathologies, outperforms state-of-the-art competitors, achieving an overall accuracy and F1 score of 0.9826 and 0.9846, respectively.
Submitted 4 November, 2021; v1 submitted 18 October, 2021;
originally announced October 2021.
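The abstract describes an objective that makes the network retain previously learned knowledge while it learns new disease categories. The paper's Bayesian multi-objective function is not reproduced here; as a swapped-in illustration of the general idea, the sketch below uses a standard knowledge-distillation penalty in the style of Learning without Forgetting: cross-entropy on all classes plus a temperature-scaled KL term that keeps the new model's old-class outputs close to those of the frozen previous model. The function names and the `temperature` and `lam` parameters are hypothetical.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def incremental_loss(new_logits, labels, old_logits, temperature=2.0, lam=1.0):
    """Cross-entropy on all classes plus a distillation penalty on the old classes.

    new_logits: (N, C_old + C_new) logits of the model being trained
    labels:     (N,) integer class indices
    old_logits: (N, C_old) logits of the frozen model from the previous increment
    """
    new_logits = np.asarray(new_logits, dtype=float)
    old_logits = np.asarray(old_logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n, c_old = old_logits.shape

    # Standard cross-entropy for learning the current task.
    ce = -log_softmax(new_logits)[np.arange(n), labels].mean()

    # KL(p_old || p_new) on softened old-class outputs, to limit forgetting.
    log_p_old = log_softmax(old_logits / temperature)
    log_p_new = log_softmax(new_logits[:, :c_old] / temperature)
    kl = (np.exp(log_p_old) * (log_p_old - log_p_new)).sum(axis=1).mean()

    return float(ce + lam * temperature ** 2 * kl)
```

When the new model reproduces the old model's old-class logits exactly, the KL term vanishes and the loss reduces to plain cross-entropy; any drift on the old classes adds a positive penalty.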
-
Automated segmentation and extraction of posterior eye segment using OCT scans
Authors:
Bilal Hassan,
Taimur Hassan,
Ramsha Ahmed,
Shiyin Qin,
Naoufel Werghi
Abstract:
This paper proposes an automated method for the segmentation and extraction of the posterior segment of the human eye, including the vitreous, retina, choroid, and sclera compartments, using multi-vendor optical coherence tomography (OCT) scans. The proposed method works in two phases. First, it extracts the retinal pigment epithelium (RPE) layer by applying an adaptive thresholding technique to identify the retina-choroid junction. Then, it exploits a structure tensor guided approach to extract the inner limiting membrane (ILM) and the choroidal stroma (CS) layers, locating the vitreous-retina and choroid-sclera junctions in the candidate OCT scan. These three junction boundaries are then used to compartmentalize the posterior eye effectively in both healthy and diseased OCT scans. The proposed framework is evaluated over 1000 OCT scans, where it obtained mean intersection over union (IoU) and mean Dice similarity coefficient (DSC) scores of 0.874 and 0.930, respectively.
Submitted 18 October, 2021; v1 submitted 21 September, 2021;
originally announced September 2021.
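The reported mean IoU and mean DSC can be computed from binary masks as sketched below. This is a generic illustration, assuming one predicted and one reference mask per compartment or scan and simple averaging over pairs; the paper's exact averaging protocol is not stated in the abstract, and the function names are hypothetical.

```python
import numpy as np


def iou_and_dice(pred, target):
    """IoU and Dice similarity coefficient for one pair of binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    if union == 0:                      # both masks empty: count as perfect agreement
        return 1.0, 1.0
    iou = inter / union
    dice = 2 * inter / (pred.sum() + target.sum())
    return float(iou), float(dice)


def mean_iou_and_dice(pairs):
    """Average IoU and Dice over an iterable of (pred, target) mask pairs."""
    scores = np.array([iou_and_dice(p, t) for p, t in pairs])
    return float(scores[:, 0].mean()), float(scores[:, 1].mean())
```

For a single pair, Dice equals 2·IoU/(1 + IoU), so the two scores rank masks identically; their means over many pairs need not satisfy that identity exactly.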
-
Tensor Pooling Driven Instance Segmentation Framework for Baggage Threat Recognition
Authors:
Taimur Hassan,
Samet Akcay,
Mohammed Bennamoun,
Salman Khan,
Naoufel Werghi
Abstract:
Automated systems designed for screening contraband items in X-ray imagery still struggle with high clutter, concealment, and extreme occlusion. In this paper, we address this challenge using a novel multi-scale contour instance segmentation framework that effectively identifies cluttered contraband data within baggage X-ray scans. Unlike standard models that employ region-based or keypoint-based techniques to generate multiple boxes around objects, we propose to derive proposals according to the hierarchy of the regions defined by the contours. The proposed framework is rigorously validated on three public datasets, GDXray, SIXray, and OPIXray, where it outperforms state-of-the-art methods by achieving mean average precision scores of 0.9779, 0.9614, and 0.8396, respectively. Furthermore, to the best of our knowledge, this is the first contour instance segmentation framework that leverages multi-scale information to recognize cluttered and concealed contraband data in colored and grayscale security X-ray imagery.
Submitted 21 September, 2021; v1 submitted 21 August, 2021;
originally announced August 2021.
-
Unsupervised Anomaly Instance Segmentation for Baggage Threat Recognition
Authors:
Taimur Hassan,
Samet Akcay,
Mohammed Bennamoun,
Salman Khan,
Naoufel Werghi
Abstract:
Identifying potential threats concealed within baggage is of prime concern for security staff. Many researchers have developed frameworks that can detect baggage threats from X-ray scans. However, to the best of our knowledge, all of these frameworks require extensive training on large-scale, well-annotated datasets, which are hard to procure in the real world. This paper presents a novel unsupervised anomaly instance segmentation framework that recognizes baggage threats in X-ray scans as anomalies without requiring any ground-truth labels. Furthermore, thanks to its stylization capacity, the framework is trained only once, and at the inference stage it detects and extracts contraband items regardless of their scanner specifications. Our one-stage approach first learns to reconstruct normal baggage content via an encoder-decoder network using a proposed stylization loss function. The model then identifies abnormal regions by analyzing the disparities between the original and the reconstructed scans. The anomalous regions are then clustered and post-processed to fit a bounding box for their localization. In addition, an optional classifier can be appended to the proposed framework to recognize the categories of these extracted anomalies. A thorough evaluation of the proposed system on four public baggage X-ray datasets, without any re-training, demonstrates that it achieves competitive performance compared to conventional fully supervised methods (i.e., mean average precision scores of 0.7941 on SIXray, 0.8591 on GDXray, 0.7483 on OPIXray, and 0.5439 on COMPASS-XP) while outperforming state-of-the-art semi-supervised and unsupervised baggage threat detection frameworks by 67.37%, 32.32%, 47.19%, and 45.81% in terms of F1 score on SIXray, GDXray, OPIXray, and COMPASS-XP, respectively.
Submitted 16 July, 2021; v1 submitted 15 July, 2021;
originally announced July 2021.
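The localization stage described above (compare the original and reconstructed scans, then cluster anomalous pixels and fit boxes) can be sketched as follows. The encoder-decoder and stylization loss are not shown; the reconstruction is taken as given, and the fixed `threshold`, the 4-connected flood-fill clustering, and the `min_area` filter are illustrative assumptions rather than the paper's post-processing. The function name is hypothetical.

```python
from collections import deque

import numpy as np


def anomaly_boxes(original, reconstructed, threshold, min_area=1):
    """Return inclusive (row0, col0, row1, col1) boxes around anomalous regions."""
    diff = np.abs(np.asarray(original, float) - np.asarray(reconstructed, float))
    if diff.ndim == 3:                       # colour scans: average over channels
        diff = diff.mean(axis=2)
    mask = diff > threshold                  # pixels the reconstruction failed to explain
    seen = np.zeros_like(mask)
    h, w = mask.shape
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill over 4-connected anomalous pixels.
            queue = deque([(r, c)])
            seen[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:      # drop tiny, likely spurious regions
                ys, xs = zip(*pixels)
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes
```

Raising `min_area` trades recall of very small items for fewer false alarms from reconstruction noise.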
-
Learning to Trust: Understanding Editorial Authority and Trust in Recommender Systems for Education
Authors:
Taha Hassan,
Bob Edmison,
Timothy Stelter,
D. Scott McCrickard
Abstract:
Trust in a recommendation system (RS) is often algorithmically incorporated using implicit or explicit feedback from user-perceived trustworthy social neighbors, and evaluated using user-reported trustworthiness of recommended items. However, real-life recommendation settings can feature group disparities in trust, power, and prerogatives. Our study examines a complementary view of trust that relies on the editorial power relationships and attitudes of all stakeholders in the RS application domain. We devise a simple, first-principles metric of editorial authority, i.e., user preferences for recommendation sourcing, veto power, and incorporating user feedback, such that one RS user group confers trust upon another by ceding or assigning editorial authority. In a mixed-methods study at Virginia Tech, we surveyed faculty, teaching assistants, and students about their preferences for editorial authority, and hypothesis-tested its relationship with trust in algorithms for a hypothetical "Suggested Readings" RS. We find that higher RS editorial authority assigned to students is linked to the relative trust the course staff allocates to the RS algorithm and to students. We also observe that course staff favor greater control for the RS algorithm in sourcing and updating the recommendations long-term. Using content analysis, we discuss the student editorial roles most frequently recommended by staff and highlight their common rationales, such as perceived expertise, scaling the learning environment, professional curriculum needs, and learner disengagement. We argue that our analyses highlight critical user preferences that can help detect editorial power asymmetry and identify RS use cases for supporting teaching and research.
Submitted 17 September, 2021; v1 submitted 10 March, 2021;
originally announced March 2021.
-
A Dilated Residual Hierarchically Fashioned Segmentation Framework for Extracting Gleason Tissues and Grading Prostate Cancer from Whole Slide Images
Authors:
Taimur Hassan,
Bilal Hassan,
Ayman El-Baz,
Naoufel Werghi
Abstract:
Prostate cancer (PCa) is the second deadliest form of cancer in males, and it can be clinically graded by examining the structural representations of Gleason tissues. This paper proposes a new method for segmenting the Gleason tissues (patch-wise) in order to grade PCa from whole slide images (WSI). The proposed approach encompasses two main contributions: 1) a synergy of hybrid dilation factors and hierarchical decomposition of the latent space representation for effective Gleason tissue extraction, and 2) a three-tiered loss function that can penalize different semantic segmentation models so that they accurately extract the highly correlated patterns. In addition, the proposed framework has been extensively evaluated on a large-scale PCa dataset containing 10,516 whole slide scans (with around 71.7M patches), where it outperforms state-of-the-art schemes by 3.22% (in terms of mean intersection-over-union) for extracting the Gleason tissues and by 6.91% (in terms of F1 score) for grading the progression of PCa.
Submitted 25 July, 2021; v1 submitted 1 November, 2020;
originally announced November 2020.
-
Clinically Verified Hybrid Deep Learning System for Retinal Ganglion Cells Aware Grading of Glaucomatous Progression
Authors:
Hina Raja,
Taimur Hassan,
Muhammad Usman Akram,
Naoufel Werghi
Abstract:
Objective: Glaucoma is the second leading cause of blindness worldwide. Glaucomatous progression can be easily monitored by analyzing the degeneration of retinal ganglion cells (RGCs). Many researchers have screened glaucoma by measuring cup-to-disc ratios from fundus and optical coherence tomography scans. However, this paper presents a novel strategy that pays attention to RGC atrophy for screening glaucomatous pathologies and grading their severity. Methods: The proposed framework encompasses a hybrid convolutional network that extracts the retinal nerve fiber layer, the ganglion cell with inner plexiform layer, and the ganglion cell complex regions, thus allowing a quantitative screening of glaucomatous subjects. Furthermore, the severity of glaucoma in screened cases is objectively graded by analyzing the thickness of these regions. Results: The proposed framework is rigorously tested on the publicly available Armed Forces Institute of Ophthalmology (AFIO) dataset, where it achieved an F1 score of 0.9577 for diagnosing glaucoma, a mean Dice coefficient of 0.8697 for extracting the RGC regions, and an accuracy of 0.9117 for grading glaucomatous progression. Furthermore, the performance of the proposed framework is clinically verified against the markings of four expert ophthalmologists, achieving a statistically significant Pearson correlation coefficient of 0.9236. Conclusion: An automated assessment of RGC degeneration yields better glaucomatous screening and grading than state-of-the-art solutions. Significance: An RGC-aware system not only screens glaucoma but can also grade its severity; here we present an end-to-end solution that is thoroughly evaluated on a standardized dataset and clinically validated for analyzing glaucomatous pathologies.
Submitted 8 October, 2020;
originally announced October 2020.
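The abstract grades severity from the thickness of the segmented RGC regions and validates the system against clinicians with a Pearson correlation. A hedged sketch of both quantities follows, assuming a binary layer mask with depth along rows and A-scans along columns, and an assumed axial pixel spacing `axial_res_um`. The function names are hypothetical, and the grading thresholds used in the paper are not given in the abstract and are not invented here.

```python
import numpy as np


def thickness_profile(layer_mask, axial_res_um=1.0):
    """Per-A-scan thickness of a binary layer mask (rows = depth, columns = A-scans).

    axial_res_um is an assumed axial pixel spacing in micrometres.
    """
    return np.asarray(layer_mask, dtype=bool).sum(axis=0) * axial_res_um


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long 1-D sequences."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))
```

For instance, mean thickness values computed per subject could be correlated with clinician-derived measurements using `pearson_r`.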
-
Trainable Structure Tensors for Autonomous Baggage Threat Detection Under Extreme Occlusion
Authors:
Taimur Hassan,
Samet Akcay,
Mohammed Bennamoun,
Salman Khan,
Naoufel Werghi
Abstract:
Detecting baggage threats is one of the most difficult tasks, even for expert officers. Many researchers have developed computer-aided screening systems to recognize these threats from the baggage X-ray scans. However, all of these frameworks are limited in identifying the contraband items under extreme occlusion. This paper presents a novel instance segmentation framework that utilizes trainable structure tensors to highlight the contours of the occluded and cluttered contraband items (by scanning multiple predominant orientations), while simultaneously suppressing the irrelevant baggage content. The proposed framework has been extensively tested on four publicly available X-ray datasets where it outperforms the state-of-the-art frameworks in terms of mean average precision scores. Furthermore, to the best of our knowledge, it is the only framework that has been validated on combined grayscale and colored scans obtained from four different types of X-ray scanners.
Submitted 5 October, 2020; v1 submitted 28 September, 2020;
originally announced September 2020.
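For readers unfamiliar with structure tensors, the sketch below computes the classical (fixed, non-trainable) 2-D structure tensor from image gradients, together with the dominant orientation and a coherence measure that is high along strong, consistently oriented edges such as object contours. The box window (instead of the more usual Gaussian) and all function names are assumptions; the paper's trainable tensors, which scan multiple predominant orientations, are not reproduced here.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter with edge padding (a stand-in for the usual Gaussian window)."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def structure_tensor(img, radius=1):
    """Locally averaged outer product of the image gradient: (Jxx, Jxy, Jyy)."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))   # axis 0 = rows (y), axis 1 = cols (x)
    return (box_blur(gx * gx, radius),
            box_blur(gx * gy, radius),
            box_blur(gy * gy, radius))


def orientation_and_coherence(jxx, jxy, jyy):
    """Dominant gradient orientation (radians) and coherence (l1 - l2) / (l1 + l2)."""
    theta = 0.5 * np.arctan2(2 * jxy, jxx - jyy)
    gap = np.sqrt((jxx - jyy) ** 2 + 4 * jxy ** 2)        # l1 - l2
    trace = jxx + jyy                                     # l1 + l2
    coherence = np.where(trace > 0, gap / np.maximum(trace, 1e-12), 0.0)
    return theta, coherence
```

On an image of vertical stripes, the gradient points along x everywhere, so the orientation is 0 and the coherence is 1; cluttered regions with mixed orientations give coherence closer to 0.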
-
Exploiting the Transferability of Deep Learning Systems Across Multi-modal Retinal Scans for Extracting Retinopathy Lesions
Authors:
Taimur Hassan,
Muhammad Usman Akram,
Naoufel Werghi
Abstract:
Retinal lesions play a vital role in the accurate classification of retinal abnormalities. Many researchers have proposed deep lesion-aware screening systems that analyze and grade the progression of retinopathy. However, to the best of our knowledge, no literature exploits the tendency of these systems to generalize across multiple scanner specifications and multi-modal imagery. Towards this end, this paper presents a detailed evaluation of semantic segmentation, scene parsing and hybrid deep learning systems for extracting the retinal lesions such as intra-retinal fluid, sub-retinal fluid, hard exudates, drusen, and other chorioretinal anomalies from fused fundus and optical coherence tomography (OCT) imagery. Furthermore, we present a novel strategy exploiting the transferability of these models across multiple retinal scanner specifications. A total of 363 fundus and 173,915 OCT scans from seven publicly available datasets were used in this research (from which 297 fundus and 59,593 OCT scans were used for testing purposes). Overall, a hybrid retinal analysis and grading network (RAGNet), backboned through ResNet-50, stood first for extracting the retinal lesions, achieving a mean dice coefficient score of 0.822. Moreover, the complete source code and its documentation are released at: http://biomisa.org/index.php/downloads/.
Submitted 14 August, 2020; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Cascaded Structure Tensor Framework for Robust Identification of Heavily Occluded Baggage Items from X-ray Scans
Authors:
Taimur Hassan,
Samet Akcay,
Mohammed Bennamoun,
Salman Khan,
Naoufel Werghi
Abstract:
In the last two decades, baggage scanning has become one of the prime aviation security concerns globally. Manual screening of baggage items is tedious, error-prone, and compromises privacy. Hence, many researchers have developed X-ray imagery-based autonomous systems to address these shortcomings. This paper presents a cascaded structure tensor framework that can automatically extract and recognize suspicious items in heavily occluded and cluttered baggage. The proposed framework is unique in that it intelligently extracts each object by iteratively picking contour-based transitional information from different orientations and uses only a single feed-forward convolutional neural network for recognition. The proposed framework has been rigorously evaluated using a total of 1,067,381 X-ray scans from the publicly available GDXray and SIXray datasets, where it outperformed state-of-the-art solutions by achieving mean average precision scores of 0.9343 on GDXray and 0.9595 on SIXray for recognizing highly cluttered and overlapping suspicious items. Furthermore, the proposed framework achieves 4.76% better run-time performance than existing solutions based on publicly available object detectors.
Submitted 14 April, 2020;
originally announced April 2020.
-
SIP-SegNet: A Deep Convolutional Encoder-Decoder Network for Joint Semantic Segmentation and Extraction of Sclera, Iris and Pupil based on Periocular Region Suppression
Authors:
Bilal Hassan,
Ramsha Ahmed,
Taimur Hassan,
Naoufel Werghi
Abstract:
The current developments in the field of machine vision have opened new vistas towards deploying multimodal biometric recognition systems in various real-world applications. These systems have the ability to deal with the limitations of unimodal biometric systems, which are vulnerable to spoofing, noise, non-universality and intra-class variations. In addition, the ocular traits among various biometric traits are preferably used in these recognition systems. Such systems possess high distinctiveness, permanence, and performance, while technologies based on other biometric traits (fingerprints, voice, etc.) can be easily compromised. This work presents a novel deep learning framework called SIP-SegNet, which performs the joint semantic segmentation of ocular traits (sclera, iris and pupil) in unconstrained scenarios with greater accuracy. The acquired images under these scenarios exhibit Purkinje reflexes, specular reflections, eye gaze, off-angle shots, low resolution, and various occlusions, particularly by eyelids and eyelashes. To address these issues, SIP-SegNet begins with denoising the pristine image using a denoising convolutional neural network (DnCNN), followed by reflection removal and image enhancement based on contrast limited adaptive histogram equalization (CLAHE). Our proposed framework then extracts the periocular information using adaptive thresholding and employs the fuzzy filtering technique to suppress this information. Finally, the semantic segmentation of sclera, iris and pupil is achieved using the densely connected fully convolutional encoder-decoder network. We used five CASIA datasets to evaluate the performance of SIP-SegNet based on various evaluation metrics. The simulation results validate the optimal segmentation of the proposed SIP-SegNet, with mean F1 scores of 93.35, 95.11 and 96.69 for the sclera, iris and pupil classes, respectively.
Submitted 15 February, 2020;
originally announced March 2020.
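The abstract reports per-class F1 scores for the sclera, iris and pupil classes. As a hedged illustration of how such a score is typically computed from segmentation masks (the paper's exact protocol, for example per-image versus dataset-level averaging, is not stated here), a pure-Python sketch over flattened label maps, using a hypothetical label encoding:

```python
def per_class_f1(pred, truth, classes):
    """Pixel-wise F1 per class from flattened label maps.

    F1 = 2TP / (2TP + FP + FN). A class absent from both maps is scored 1.0
    here by convention (an assumption; other conventions skip such classes).
    """
    scores = {}
    for k in classes:
        tp = sum(1 for p, t in zip(pred, truth) if p == k and t == k)
        fp = sum(1 for p, t in zip(pred, truth) if p == k and t != k)
        fn = sum(1 for p, t in zip(pred, truth) if p != k and t == k)
        denom = 2 * tp + fp + fn
        scores[k] = 2 * tp / denom if denom else 1.0
    return scores


# Hypothetical encoding: 0 background, 1 sclera, 2 iris, 3 pupil.
pred = [0, 1, 1, 2, 2, 3, 3, 0]
truth = [0, 1, 2, 2, 2, 3, 0, 0]
f1 = per_class_f1(pred, truth, classes=[1, 2, 3])
mean_f1 = sum(f1.values()) / len(f1)
```

Background is left out of `classes` here so that the mean reflects only the three ocular traits; reported scores such as 93.35 appear to be on a 0 to 100 scale, i.e. these values multiplied by 100.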
-
Cascaded Structure Tensor Framework for Robust Identification of Heavily Occluded Baggage Items from Multi-Vendor X-ray Scans
Authors:
Taimur Hassan,
Salman H. Khan,
Samet Akcay,
Mohammed Bennamoun,
Naoufel Werghi
Abstract:
In the last two decades, luggage scanning has globally become one of the prime aviation security concerns. Manual screening of the baggage items is a cumbersome, subjective and inefficient process. Hence, many researchers have developed X-ray imagery-based autonomous systems to address these shortcomings. However, to the best of our knowledge, there is no framework, up to now, that can recognize heavily occluded and cluttered baggage items from multi-vendor X-ray scans. This paper presents a cascaded structure tensor framework which can automatically extract and recognize suspicious items irrespective of their position and orientation in the multi-vendor X-ray scans. The proposed framework is unique, as it intelligently extracts each object by iteratively picking contour-based transitional information from different orientations and uses only a single feedforward convolutional neural network for the recognition. The proposed framework has been rigorously tested on the publicly available GDXray and SIXray datasets, containing a total of 1,067,381 X-ray scans, where it significantly outperformed the state-of-the-art solutions by achieving mean average precision scores of 0.9343 and 0.9595 for extracting and recognizing suspicious items from GDXray and SIXray scans, respectively. Furthermore, the proposed framework has achieved 15.78% better time
Submitted 21 January, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
-
A CNN-based approach to classify cricket bowlers based on their bowling actions
Authors:
Md Nafee Al Islam,
Tanzil Bin Hassan,
Siamul Karim Khan
Abstract:
With the advances in hardware technologies and deep learning techniques, it has become feasible to apply these techniques in diverse fields. Convolutional Neural Network (CNN), an architecture from the field of deep learning, has revolutionized Computer Vision. Sports is one of the avenues in which the use of computer vision is thriving. Cricket is a complex game consisting of different types of shots, bowling actions and many other activities. Every bowler, in a game of cricket, bowls with a different bowling action. We leverage this point to identify different bowlers. In this paper, we have proposed a CNN model to identify eighteen different cricket bowlers based on their bowling actions using transfer learning. Additionally, we have created a completely new dataset containing 8100 images of these eighteen bowlers to train the proposed framework and evaluate its performance. We have used the VGG16 model pre-trained with the ImageNet dataset and added a few layers on top of it to build our model. After trying out different strategies, we found that freezing the weights for the first 14 layers of the network and training the rest of the layers works best. Our approach achieves an overall average accuracy of 93.3% on the test set and converges to a very low cross-entropy loss.
Submitted 3 September, 2019;
originally announced September 2019.
-
Exploring the context of course rankings on online academic forums
Authors:
Taha Hassan,
Bob Edmison,
Larry Cox II,
Matthew Louvet,
Daron Williams
Abstract:
University students routinely use the tools provided by online course ranking forums to share and discuss their satisfaction with the quality of instruction and content in a wide variety of courses. Student perception of the efficacy of pedagogies employed in a course is a reflection of a multitude of decisions by professors, instructional designers and university administrators. This complexity has motivated a large body of research on the utility, reliability, and behavioral correlates of course rankings. There is, however, little investigation of the (potential) implicit student bias on these forums towards desirable course outcomes at the institution level. To that end, we examine the connection between course outcomes (student-reported GPA) and the overall ranking of the primary course instructor, as well as rating disparity by nature of course outcomes, based on data from two popular academic rating forums. Our experiments with ranking data about over ten thousand courses taught at Virginia Tech and its 25 SCHEV-approved peer institutions indicate that there is a discernible albeit complex bias towards course outcomes in the professor ratings registered by students.
Submitted 10 July, 2019;
originally announced July 2019.
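One simple, hedged way to quantify the "connection between course outcomes (student-reported GPA) and the overall ranking of the primary course instructor", and the "rating disparity by nature of course outcomes", is a correlation coefficient plus a rating gap between high- and low-GPA courses. The sketch below is not the authors' analysis: the GPA threshold is an arbitrary assumption, the function names are hypothetical, and the data are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rating_gap(gpas, ratings, threshold):
    """Mean rating of courses at or above a GPA threshold minus that of courses below it."""
    high = [r for g, r in zip(gpas, ratings) if g >= threshold]
    low = [r for g, r in zip(gpas, ratings) if g < threshold]
    return sum(high) / len(high) - sum(low) / len(low)


# Made-up per-course values: student-reported GPA and instructor rating (1 to 5).
gpas = [2.0, 2.5, 3.5, 3.8]
ratings = [3.0, 3.2, 4.1, 4.5]
r = pearson(gpas, ratings)
gap = rating_gap(gpas, ratings, threshold=3.0)
```

A positive `r` and `gap` on real data would be consistent with the bias the abstract describes, but neither says anything about causation, and the abstract itself calls the bias "complex".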
-
Find It: A Novel Way to Learn Through Play
Authors:
Md. Tashfiqul Bari,
Tanvir Hassan,
Raisa Tabassum,
Zubaida Ahmed,
Swakkhar Shatabda
Abstract:
Autism Spectrum Disorder (ASD) is an area of extensive ongoing research, such as the Magnetic Resonance Imaging (MRI) technique called diffusion tensor imaging and the Early Start Denver Model (ESDM), aimed at making life easier for people diagnosed with it. After years of combined public and private funding, this research has shown great promise in recent years. In this paper, we show a way in which children with Down Syndrome Autism can learn through game therapy. These game therapies have shown substantial improvements in those children's learning of the alphabet, along with the development of their motor skills and in addressing memory challenges.
Submitted 12 July, 2019;
originally announced July 2019.
-
On bias in social reviews of university courses
Authors:
Taha Hassan
Abstract:
University course ranking forums are a popular means of disseminating information about satisfaction with the quality of course content and instruction, especially with undergraduate students. A variety of policy decisions by university administrators, instructional designers and teaching staff affect how students perceive the efficacy of pedagogies employed in a given course, in class and online. While there is a large body of research on qualitative driving factors behind the use of academic rating sites, there is little investigation of the (potential) implicit student bias on said forums towards desirable course outcomes at the institution level. To that end, we examine the connection between course outcomes (student-reported GPA) and the overall ranking of the primary course instructor, as well as rating disparity by nature of course outcomes, for several hundred courses taught at Virginia Tech based on data collected from a popular academic rating forum. We also replicate our analysis for several public universities across the US. Our experiments indicate that there is a discernible albeit complex bias towards course outcomes in the professor ratings registered by students.
Submitted 6 May, 2019;
originally announced May 2019.
-
Trust and Trustworthiness in Social Recommender Systems
Authors:
Taha Hassan,
D. Scott McCrickard
Abstract:
The prevalence of misinformation on online social media has tangible empirical connections to increasing political polarization and partisan antipathy in the United States. Ranking algorithms for social recommendation often encode broad assumptions about network structure (like homophily) and group cognition (like, social action is largely imitative). Assumptions like these can be naïve and exclusionary in the era of fake news and ideological uniformity towards the political poles. We examine these assumptions with aid from the user-centric framework of trustworthiness in social recommendation. The constituent dimensions of trustworthiness (diversity, transparency, explainability, disruption) highlight new opportunities for discouraging dogmatization and building decision-aware, transparent news recommender systems.
Submitted 5 March, 2019;
originally announced March 2019.
-
Unsupervised Domain Adaptation using Generative Models and Self-ensembling
Authors:
Eman T. Hassan,
Xin Chen,
David Crandall
Abstract:
Transferring knowledge across different datasets is an important approach to successfully train deep models with a small-scale target dataset or when few labeled instances are available. In this paper, we aim at developing a model that can generalize across multiple domain shifts, so that this model can adapt from a single source to multiple targets. This can be achieved by randomizing the generation of data in various styles to mitigate the domain mismatch. First, we present a new adaptation of the CycleGAN model to produce stochastic style transfer between two image batches of different domains. Second, we enhance the classifier performance by using a self-ensembling technique with a teacher and student model to train on both original and generated data. Finally, we present experimental results on three datasets: Office-31, Office-Home, and Visual Domain Adaptation. The results suggest that self-ensembling is better than simple data augmentation with the newly generated data, and a single model trained this way can have the best performance across all the different transfer tasks.
Submitted 2 December, 2018;
originally announced December 2018.
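The abstract describes self-ensembling with a teacher and a student model. A common form of this technique (mean-teacher style) keeps the teacher's weights as an exponential moving average (EMA) of the student's and penalises disagreement between their predictions. The following pure-Python sketch shows only those two pieces on toy lists; the decay value 0.99, the squared-error consistency loss and the function names are assumptions, not details confirmed by this abstract.

```python
def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher style update: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher class probabilities."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Parameters as flat lists of floats; in practice these are network weights.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
for _ in range(1000):
    # The student is held fixed here only for illustration; in training it is
    # updated by gradient descent between EMA steps.
    teacher = ema_update(teacher, student)
```

Because the teacher averages many past student states, its predictions tend to be smoother targets for the consistency loss than the student's own.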
-
Toward Performance Optimization in IoT-based Next-Gen Wireless Sensor Networks
Authors:
Muzammil Behzad,
Manal Abdullah,
Muhammad Talal Hassan,
Yao Ge,
Mahmood Ashraf Khan
Abstract:
In this paper, we propose a novel framework for performance optimization in Internet of Things (IoT)-based next-generation wireless sensor networks. In particular, a computationally-convenient system is presented to combat two major research problems in sensor networks. First is the conventionally-tackled resource optimization problem which triggers the drainage of battery at a faster rate within a network. Such drainage promotes inefficient resource usage thereby causing sudden death of the network. The second main bottleneck for such networks is that of data degradation. This is because the nodes in such networks communicate via a wireless channel, where the inevitable presence of noise corrupts the data making it unsuitable for practical applications. Therefore, we present a layer-adaptive method via 3-tier communication mechanism to ensure the efficient use of resources. This is supported with a mathematical coverage model that deals with the formation of coverage holes. We also present a transform-domain based robust algorithm to effectively remove the unwanted components from the data. Our proposed framework offers a handy algorithm that enjoys desirable complexity for real-time applications as shown by the extensive simulation results.
Submitted 23 June, 2018;
originally announced June 2018.
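The abstract mentions a mathematical coverage model that deals with coverage holes but does not give its form. Purely as an illustrative assumption, the sketch below uses the common binary disk sensing model and samples the field on a grid to list uncovered points; the function names, grid resolution and field size are hypothetical.

```python
def coverage_holes(sensors, radius, width, height, step=1.0):
    """Grid points in [0, width] x [0, height] not within `radius` of any sensor.

    Assumes the binary disk sensing model: a point is covered iff some sensor
    lies within distance `radius` of it.
    """
    holes = []
    nx, ny = int(width / step) + 1, int(height / step) + 1
    for i in range(nx):
        for j in range(ny):
            x, y = i * step, j * step
            covered = any((x - sx) ** 2 + (y - sy) ** 2 <= radius ** 2
                          for sx, sy in sensors)
            if not covered:
                holes.append((x, y))
    return holes


def coverage_ratio(sensors, radius, width, height, step=1.0):
    """Fraction of sampled grid points that are covered by at least one sensor."""
    nx, ny = int(width / step) + 1, int(height / step) + 1
    holes = coverage_holes(sensors, radius, width, height, step)
    return 1.0 - len(holes) / (nx * ny)


# One sensor at the origin on a 1-D strip of length 2: the far end is a hole.
print(coverage_holes([(0, 0)], radius=1.0, width=2, height=0))
```

A finer `step` gives a better estimate of hole area at quadratic cost in the number of grid points.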
-
Layer-Adaptive Communication and Collaborative Transformed-Domain Representations for Performance Optimization in WSNs
Authors:
Muzammil Behzad,
Manal Abdullah,
Muhammad Talal Hassan,
Yao Ge,
Mahmood Ashraf Khan
Abstract:
In this paper, we combat the problem of performance optimization in wireless sensor networks. Specifically, a novel framework is proposed to handle two major research issues. Firstly, we optimize the utilization of resources available to various nodes at hand. This is achieved via proposed optimal network clustering enriched with layer-adaptive 3-tier communication mechanism to diminish energy holes. We also introduce a mathematical coverage model that helps us minimize the number of coverage holes. Secondly, we present a novel approach to recover the corrupted version of the data received over noisy wireless channels. A robust sparse-domain based recovery method equipped with specially developed averaging filter is used to take care of the unwanted noisy components added to the data samples. Our proposed framework provides a handy routing protocol that enjoys improved computation complexity and elongated network lifetime as demonstrated with the help of extensive simulation results.
Submitted 12 December, 2017;
originally announced December 2017.
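This abstract describes recovering data corrupted over noisy wireless channels with a sparse-domain method. The paper's specific recovery method and averaging filter are not detailed here, so the following is a generic stand-in, assuming an orthonormal discrete cosine transform (DCT) with hard thresholding: small transform coefficients are treated as noise and zeroed before inverting. The threshold and function names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[m] * math.cos(math.pi * k * (2 * m + 1) / (2 * n))
                               for m in range(n)))
    return out


def idct(c):
    """Inverse of `dct` (orthonormal DCT-III)."""
    n = len(c)
    return [sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) * c[k]
                * math.cos(math.pi * k * (2 * m + 1) / (2 * n)) for k in range(n))
            for m in range(n)]


def denoise(samples, threshold):
    """Zero small transform coefficients (hard thresholding), then invert."""
    coeffs = [c if abs(c) >= threshold else 0.0 for c in dct(samples)]
    return idct(coeffs)


# A constant sensor reading corrupted by a small alternating disturbance.
noisy = [1.0 + 0.05 * (-1) ** m for m in range(8)]
clean = denoise(noisy, threshold=0.5)  # values very close to 1.0
```

The approach works when the clean signal is sparse in the chosen transform (here, a constant maps to a single coefficient) while the noise spreads thinly over many coefficients.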
-
A Study of Cross-domain Generative Models applied to Cartoon Series
Authors:
Eman T. Hassan,
David J. Crandall
Abstract:
We investigate Generative Adversarial Networks (GANs) to model one particular kind of image: frames from TV cartoons. Cartoons are particularly interesting because their visual appearance emphasizes the important semantic information about a scene while abstracting out the less important details, but each cartoon series has a distinctive artistic style that performs this abstraction in different ways. We consider a dataset consisting of images from two popular television cartoon series, Family Guy and The Simpsons. We examine the ability of GANs to generate images from each of these two domains, when trained independently as well as on both domains jointly. We find that generative models may be capable of finding semantic-level correspondences between these two image domains despite the unsupervised setting, even when the training data does not give labeled alignments between them.
Submitted 29 September, 2017;
originally announced October 2017.
-
Performance Characterization of a Real-Time Massive MIMO System with LOS Mobile Channels
Authors:
Paul Harris,
Steffen Malkowsky,
Joao Vieira,
Fredrik Tufvesson,
Wael Boukley Hassan,
Liang Liu,
Mark Beach,
Simon Armour,
Ove Edfors
Abstract:
The first measured results for massive MIMO performance in a line-of-sight (LOS) scenario with moderate mobility are presented, with 8 users served in real time using a 100-antenna base station (BS) at 3.7 GHz. When such a large number of channels dynamically change, the inherent propagation and processing delay has a critical relationship with the rate of change, as the use of outdated channel information can result in severe detection and precoding inaccuracies. For the downlink (DL) in particular, a time division duplex (TDD) configuration synonymous with massive multiple-input, multiple-output (MIMO) deployments could mean only the uplink (UL) is usable in extreme cases. Therefore, it is of great interest to investigate the impact of mobility on massive MIMO performance and consider ways to combat the potential limitations. In a mobile scenario with moving cars and pedestrians, the massive MIMO channel is sampled across many points in space to build a picture of the overall user orthogonality, and the impact of both azimuth and elevation array configurations is considered. Temporal analysis is also conducted for vehicles moving at up to 29 km/h, and real-time bit error rates (BERs) for both the UL and DL without power control are presented. For a 100-antenna system, it is found that the channel state information (CSI) update rate requirement may increase by 7 times when compared to an 8-antenna system, whilst the power control update rate could be decreased by at least 5 times relative to a single-antenna system.
Submitted 19 May, 2017; v1 submitted 30 January, 2017;
originally announced January 2017.
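The user orthogonality discussed in this abstract can be illustrated with the normalized correlation between two users' channel vectors. The sketch below assumes an idealised free-space line-of-sight channel on a half-wavelength uniform linear array (an assumption; the measured channels in the paper include real propagation effects and azimuth/elevation array layouts), with users at hypothetical azimuth angles of 0° and 10°.

```python
import cmath
import math


def steering_vector(num_antennas, angle_deg):
    """Idealised LOS response of a half-wavelength uniform linear array."""
    phase = math.pi * math.sin(math.radians(angle_deg))
    return [cmath.exp(-1j * m * phase) for m in range(num_antennas)]


def normalized_correlation(h1, h2):
    """|h1^H h2| / (||h1|| ||h2||): 0 means orthogonal users, 1 fully aligned."""
    inner = sum(a.conjugate() * b for a, b in zip(h1, h2))
    n1 = math.sqrt(sum(abs(a) ** 2 for a in h1))
    n2 = math.sqrt(sum(abs(b) ** 2 for b in h2))
    return abs(inner) / (n1 * n2)


for m in (8, 100):
    rho = normalized_correlation(steering_vector(m, 0.0), steering_vector(m, 10.0))
    print(m, round(rho, 3))
```

In this idealised model the 100-antenna array gives a much lower correlation than the 8-antenna one for the same pair of users, which is the favourable-propagation effect that motivates massive MIMO.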
-
Performance Evaluation of QoS in WiMAX Network
Authors:
Ahmed Hassan M. Hassan,
Elrasheed Ismail M. Zayid,
Mohammed Altayeb Awad,
Ahmed Salah Mohammed,
Samreen Tarig Hassan
Abstract:
OPNET Modeler is used to simulate the architecture and to calculate the performance criteria (i.e., throughput, delay and data dropped) that are of concern in network evaluation. We conclude that our models considerably shorten the time needed to obtain end-to-end delay and throughput measures, and that they can be used as an effective tool for this purpose.
Submitted 16 June, 2015;
originally announced June 2015.
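The abstract lists throughput, delay and data dropped as its performance criteria but gives no formulas. As an assumption-labelled sketch, not OPNET's own statistics definitions, these metrics could be computed from a hypothetical per-packet log as follows; the `Packet` record and `qos_metrics` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    size_bits: int
    sent_s: float
    received_s: Optional[float]  # None if the packet was dropped


def qos_metrics(packets, duration_s):
    """Throughput (bit/s), mean end-to-end delay (s) and dropped-packet count."""
    delivered = [p for p in packets if p.received_s is not None]
    throughput = sum(p.size_bits for p in delivered) / duration_s
    delays = [p.received_s - p.sent_s for p in delivered]
    mean_delay = sum(delays) / len(delays) if delays else float("nan")
    dropped = len(packets) - len(delivered)
    return throughput, mean_delay, dropped


# Hypothetical one-second log: two delivered 1000-byte packets, one dropped.
log = [Packet(8000, 0.0, 0.02), Packet(8000, 0.1, 0.13), Packet(8000, 0.2, None)]
print(qos_metrics(log, duration_s=1.0))
```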
-
Semantic HMC for Big Data Analysis
Authors:
Thomas Hassan,
Rafael Peixoto,
Christophe Cruz,
Aurélie Bertaux,
Nuno Silva
Abstract:
Analyzing Big Data can help corporations to improve their efficiency. In this work we present a new vision to derive Value from Big Data using a Semantic Hierarchical Multi-label Classification, called Semantic HMC, based on a non-supervised Ontology learning process. We also propose a Semantic HMC process, using scalable Machine-Learning techniques and Rule-based reasoning.
Submitted 2 December, 2014;
originally announced December 2014.
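In hierarchical multi-label classification, a common consistency rule is that an item assigned a label also receives all of that label's ancestors in the hierarchy. The abstract does not detail how Semantic HMC enforces this, so the following is only an illustrative sketch with a hypothetical taxonomy stored as a child-to-parent map.

```python
def close_under_hierarchy(labels, parent):
    """Add every ancestor of each predicted label (parent maps child -> parent or None)."""
    closed = set()
    for label in labels:
        node = label
        # Stop early once we reach a node whose ancestors are already included.
        while node is not None and node not in closed:
            closed.add(node)
            node = parent.get(node)
    return closed


# Hypothetical taxonomy for illustration only.
parent = {"football": "sport", "tennis": "sport", "sport": "topic", "topic": None}
print(sorted(close_under_hierarchy({"football", "tennis"}, parent)))
# ['football', 'sport', 'tennis', 'topic']
```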
-
An Empirical Investigation of V-I Trajectory based Load Signatures for Non-Intrusive Load Monitoring
Authors:
Taha Hassan,
Fahad Javed,
Naveed Arshad
Abstract:
Choice of load signature or feature space is one of the most fundamental design choices for non-intrusive load monitoring or energy disaggregation problem. Electrical power quantities, harmonic load characteristics, canonical transient and steady-state waveforms are some of the typical choices of load signature or load signature basis for current research addressing appliance classification and prediction. This paper expands and evaluates appliance load signatures based on V-I trajectory - the mutual locus of instantaneous voltage and current waveforms - for precision and robustness of prediction in classification algorithms used to disaggregate residential overall energy use and predict constituent appliance profiles. We also demonstrate the use of variants of differential evolution as a novel strategy for selection of optimal load models in context of energy disaggregation. A publicly available benchmark dataset REDD is employed for evaluation purposes. Our experimental evaluations indicate that these load signatures, in conjunction with a number of popular classification algorithms, offer better or generally comparable overall precision of prediction, robustness and reliability against dynamic, noisy and highly similar load signatures with reference to electrical power quantities and harmonic content. Herein, wave-shape features are found to be an effective new basis of classification and prediction for semi-automated energy disaggregation and monitoring.
Submitted 2 May, 2013;
originally announced May 2013.
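A V-I trajectory, as defined in this abstract, is the mutual locus of instantaneous voltage and current over a mains cycle, and the area it encloses is one shape feature that can be derived from it. The sketch below is a hedged illustration on synthetic sinusoids (not REDD data), and the choice of enclosed area as the feature is an assumption rather than the paper's exact feature set: a purely resistive load traces a straight line with zero area, while a phase-shifted (reactive) load traces an ellipse of area π·V·I·sin(φ).

```python
import math


def vi_trajectory(v_amp, i_amp, phase, samples=200):
    """One cycle of (voltage, current) points with current lagging by `phase` radians."""
    points = []
    for k in range(samples):
        t = 2 * math.pi * k / samples
        points.append((v_amp * math.sin(t), i_amp * math.sin(t - phase)))
    return points


def enclosed_area(points):
    """Shoelace formula over the closed V-I loop (absolute value)."""
    s = 0.0
    n = len(points)
    for k in range(n):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


resistive = enclosed_area(vi_trajectory(1.0, 1.0, 0.0))          # zero area: a line
reactive = enclosed_area(vi_trajectory(1.0, 1.0, math.pi / 2))   # close to pi
```

In practice, measured trajectories would typically be normalised by their peak values before comparing features across appliances.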