-
Can Vision Transformers with ResNet's Global Features Fairly Authenticate Demographic Faces?
Authors:
Abu Sufian,
Marco Leo,
Cosimo Distante,
Anirudha Ghosh,
Debaditya Barman
Abstract:
Biometric face authentication is crucial in computer vision, but ensuring fairness and generalization across demographic groups remains a big challenge. Therefore, we investigated whether Vision Transformer (ViT) and ResNet, leveraging pre-trained global features, can fairly authenticate different demographic faces while relying minimally on local features. In this investigation, we used three pre…
▽ More
Biometric face authentication is crucial in computer vision, but ensuring fairness and generalization across demographic groups remains a big challenge. Therefore, we investigated whether Vision Transformer (ViT) and ResNet, leveraging pre-trained global features, can fairly authenticate different demographic faces while relying minimally on local features. In this investigation, we used three pre-trained state-of-the-art (SOTA) ViT foundation models from Facebook, Google, and Microsoft for global features as well as ResNet-18. We concatenated the features from ViT and ResNet, passed them through two fully connected layers, and trained on customized face image datasets to capture the local features. Then, we designed a novel few-shot prototype network with backbone features embedding. We also developed new demographic face image support and query datasets for this empirical study. The network's testing was conducted on this dataset in one-shot, three-shot, and five-shot scenarios to assess how performance improves as the size of the support set increases. We observed results across datasets with varying races/ethnicities, genders, and age groups. The Microsoft Swin Transformer backbone performed better among the three SOTA ViT for this task. The code and data are available at: https://github.com/Sufianlab/FairVitBio.
△ Less
Submitted 3 June, 2025;
originally announced June 2025.
-
Advancing Wheat Crop Analysis: A Survey of Deep Learning Approaches Using Hyperspectral Imaging
Authors:
Fadi Abdeladhim Zidi,
Abdelkrim Ouafi,
Fares Bougourzi,
Cosimo Distante,
Abdelmalik Taleb-Ahmed
Abstract:
As one of the most widely cultivated and consumed crops, wheat is essential to global food security. However, wheat production is increasingly challenged by pests, diseases, climate change, and water scarcity, threatening yields. Traditional crop monitoring methods are labor-intensive and often ineffective for early issue detection. Hyperspectral imaging (HSI) has emerged as a non-destructive and…
▽ More
As one of the most widely cultivated and consumed crops, wheat is essential to global food security. However, wheat production is increasingly challenged by pests, diseases, climate change, and water scarcity, threatening yields. Traditional crop monitoring methods are labor-intensive and often ineffective for early issue detection. Hyperspectral imaging (HSI) has emerged as a non-destructive and efficient technology for remote crop health assessment. However, the high dimensionality of HSI data and limited availability of labeled samples present notable challenges. In recent years, deep learning has shown great promise in addressing these challenges due to its ability to extract and analysis complex structures. Despite advancements in applying deep learning methods to HSI data for wheat crop analysis, no comprehensive survey currently exists in this field. This review addresses this gap by summarizing benchmark datasets, tracking advancements in deep learning methods, and analyzing key applications such as variety classification, disease detection, and yield estimation. It also highlights the strengths, limitations, and future opportunities in leveraging deep learning methods for HSI-based wheat crop analysis. We have listed the current state-of-the-art papers and will continue tracking updating them in the following https://github.com/fadi-07/Awesome-Wheat-HSI-DeepLearning.
△ Less
Submitted 1 May, 2025;
originally announced May 2025.
-
Mamba Adaptive Anomaly Transformer with association discrepancy for time series
Authors:
Abdellah Zakaria Sellam,
Ilyes Benaissa,
Abdelmalik Taleb-Ahmed,
Luigi Patrono,
Cosimo Distante
Abstract:
Anomaly detection in time series is essential for industrial monitoring and environmental sensing, yet distinguishing anomalies from complex patterns remains challenging. Existing methods like the Anomaly Transformer and DCdetector have progressed, but they face limitations such as sensitivity to short-term contexts and inefficiency in noisy, non-stationary environments.
To overcome these issues…
▽ More
Anomaly detection in time series is essential for industrial monitoring and environmental sensing, yet distinguishing anomalies from complex patterns remains challenging. Existing methods like the Anomaly Transformer and DCdetector have progressed, but they face limitations such as sensitivity to short-term contexts and inefficiency in noisy, non-stationary environments.
To overcome these issues, we introduce MAAT, an improved architecture that enhances association discrepancy modeling and reconstruction quality. MAAT features Sparse Attention, efficiently capturing long-range dependencies by focusing on relevant time steps, thereby reducing computational redundancy. Additionally, a Mamba-Selective State Space Model is incorporated into the reconstruction module, utilizing a skip connection and Gated Attention to improve anomaly localization and detection performance.
Extensive experiments show that MAAT significantly outperforms previous methods, achieving better anomaly distinguishability and generalization across various time series applications, setting a new standard for unsupervised time series anomaly detection in real-world scenarios.
△ Less
Submitted 17 May, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Boosting Hyperspectral Image Classification with Gate-Shift-Fuse Mechanisms in a Novel CNN-Transformer Approach
Authors:
Mohamed Fadhlallah Guerri,
Cosimo Distante,
Paolo Spagnolo,
Fares Bougourzi,
Abdelmalik Taleb-Ahmed
Abstract:
During the process of classifying Hyperspectral Image (HSI), every pixel sample is categorized under a land-cover type. CNN-based techniques for HSI classification have notably advanced the field by their adept feature representation capabilities. However, acquiring deep features remains a challenge for these CNN-based methods. In contrast, transformer models are adept at extracting high-level sem…
▽ More
During the process of classifying Hyperspectral Image (HSI), every pixel sample is categorized under a land-cover type. CNN-based techniques for HSI classification have notably advanced the field by their adept feature representation capabilities. However, acquiring deep features remains a challenge for these CNN-based methods. In contrast, transformer models are adept at extracting high-level semantic features, offering a complementary strength. This paper's main contribution is the introduction of an HSI classification model that includes two convolutional blocks, a Gate-Shift-Fuse (GSF) block and a transformer block. This model leverages the strengths of CNNs in local feature extraction and transformers in long-range context modelling. The GSF block is designed to strengthen the extraction of local and global spatial-spectral features. An effective attention mechanism module is also proposed to enhance the extraction of information from HSI cubes. The proposed method is evaluated on four well-known datasets (the Indian Pines, Pavia University, WHU-WHU-Hi-LongKou and WHU-Hi-HanChuan), demonstrating that the proposed framework achieves superior results compared to other models.
△ Less
Submitted 2 October, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Boosting House Price Estimations with Multi-Head Gated Attention
Authors:
Zakaria Abdellah Sellam,
Cosimo Distante,
Abdelmalik Taleb-Ahmed,
Pier Luigi Mazzeo
Abstract:
Evaluating house prices is crucial for various stakeholders, including homeowners, investors, and policymakers. However, traditional spatial interpolation methods have limitations in capturing the complex spatial relationships that affect property values. To address these challenges, we have developed a new method called Multi-Head Gated Attention for spatial interpolation. Our approach builds upo…
▽ More
Evaluating house prices is crucial for various stakeholders, including homeowners, investors, and policymakers. However, traditional spatial interpolation methods have limitations in capturing the complex spatial relationships that affect property values. To address these challenges, we have developed a new method called Multi-Head Gated Attention for spatial interpolation. Our approach builds upon attention-based interpolation models and incorporates multiple attention heads and gating mechanisms to capture spatial dependencies and contextual information better. Importantly, our model produces embeddings that reduce the dimensionality of the data, enabling simpler models like linear regression to outperform complex ensembling models. We conducted extensive experiments to compare our model with baseline methods and the original attention-based interpolation model. The results show a significant improvement in the accuracy of house price predictions, validating the effectiveness of our approach. This research advances the field of spatial interpolation and provides a robust tool for more precise house price evaluation. Our GitHub repository.contains the data and code for all datasets, which are available for researchers and practitioners interested in replicating or building upon our work.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
D-TrAttUnet: Toward Hybrid CNN-Transformer Architecture for Generic and Subtle Segmentation in Medical Images
Authors:
Fares Bougourzi,
Fadi Dornaika,
Cosimo Distante,
Abdelmalik Taleb-Ahmed
Abstract:
Over the past two decades, machine analysis of medical imaging has advanced rapidly, opening up significant potential for several important medical applications. As complicated diseases increase and the number of cases rises, the role of machine-based imaging analysis has become indispensable. It serves as both a tool and an assistant to medical experts, providing valuable insights and guidance. A…
▽ More
Over the past two decades, machine analysis of medical imaging has advanced rapidly, opening up significant potential for several important medical applications. As complicated diseases increase and the number of cases rises, the role of machine-based imaging analysis has become indispensable. It serves as both a tool and an assistant to medical experts, providing valuable insights and guidance. A particularly challenging task in this area is lesion segmentation, a task that is challenging even for experienced radiologists. The complexity of this task highlights the urgent need for robust machine learning approaches to support medical staff. In response, we present our novel solution: the D-TrAttUnet architecture. This framework is based on the observation that different diseases often target specific organs. Our architecture includes an encoder-decoder structure with a composite Transformer-CNN encoder and dual decoders. The encoder includes two paths: the Transformer path and the Encoders Fusion Module path. The Dual-Decoder configuration uses two identical decoders, each with attention gates. This allows the model to simultaneously segment lesions and organs and integrate their segmentation losses.
To validate our approach, we performed evaluations on the Covid-19 and Bone Metastasis segmentation tasks. We also investigated the adaptability of the model by testing it without the second decoder in the segmentation of glands and nuclei. The results confirmed the superiority of our approach, especially in Covid-19 infections and the segmentation of bone metastases. In addition, the hybrid encoder showed exceptional performance in the segmentation of glands and nuclei, solidifying its role in modern medical image analysis.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
Deep Learning Techniques for Hyperspectral Image Analysis in Agriculture: A Review
Authors:
Mohamed Fadhlallah Guerri,
Cosimo Distante,
Paolo Spagnolo,
Fares Bougourzi,
Abdelmalik Taleb-Ahmed
Abstract:
In the recent years, hyperspectral imaging (HSI) has gained considerably popularity among computer vision researchers for its potential in solving remote sensing problems, especially in agriculture field. However, HSI classification is a complex task due to the high redundancy of spectral bands, limited training samples, and non-linear relationship between spatial position and spectral bands. Fort…
▽ More
In the recent years, hyperspectral imaging (HSI) has gained considerably popularity among computer vision researchers for its potential in solving remote sensing problems, especially in agriculture field. However, HSI classification is a complex task due to the high redundancy of spectral bands, limited training samples, and non-linear relationship between spatial position and spectral bands. Fortunately, deep learning techniques have shown promising results in HSI analysis. This literature review explores recent applications of deep learning approaches such as Autoencoders, Convolutional Neural Networks (1D, 2D, and 3D), Recurrent Neural Networks, Deep Belief Networks, and Generative Adversarial Networks in agriculture. The performance of these approaches has been evaluated and discussed on well-known land cover datasets including Indian Pines, Salinas Valley, and Pavia University.
△ Less
Submitted 26 April, 2023;
originally announced April 2023.
-
D-TrAttUnet: Dual-Decoder Transformer-Based Attention Unet Architecture for Binary and Multi-classes Covid-19 Infection Segmentation
Authors:
Fares Bougourzi,
Cosimo Distante,
Fadi Dornaika,
Abdelmalik Taleb-Ahmed
Abstract:
In the last three years, the world has been facing a global crisis caused by Covid-19 pandemic. Medical imaging has been playing a crucial role in the fighting against this disease and saving the human lives. Indeed, CT-scans has proved their efficiency in diagnosing, detecting, and following-up the Covid-19 infection. In this paper, we propose a new Transformer-CNN based approach for Covid-19 inf…
▽ More
In the last three years, the world has been facing a global crisis caused by Covid-19 pandemic. Medical imaging has been playing a crucial role in the fighting against this disease and saving the human lives. Indeed, CT-scans has proved their efficiency in diagnosing, detecting, and following-up the Covid-19 infection. In this paper, we propose a new Transformer-CNN based approach for Covid-19 infection segmentation from the CT slices. The proposed D-TrAttUnet architecture has an Encoder-Decoder structure, where compound Transformer-CNN encoder and Dual-Decoders are proposed. The Transformer-CNN encoder is built using Transformer layers, UpResBlocks, ResBlocks and max-pooling layers. The Dual-Decoder consists of two identical CNN decoders with attention gates. The two decoders are used to segment the infection and the lung regions simultaneously and the losses of the two tasks are joined. The proposed D-TrAttUnet architecture is evaluated for both Binary and Multi-classes Covid-19 infection segmentation. The experimental results prove the efficiency of the proposed approach to deal with the complexity of Covid-19 segmentation task from limited data. Furthermore, D-TrAttUnet architecture outperforms three baseline CNN segmentation architectures (Unet, AttUnet and Unet++) and three state-of-the-art architectures (AnamNet, SCOATNet and CopleNet), in both Binary and Mutli-classes segmentation tasks.
△ Less
Submitted 27 March, 2023;
originally announced March 2023.
-
2D and 3D CNN-Based Fusion Approach for COVID-19 Severity Prediction from 3D CT-Scans
Authors:
Fares Bougourzi,
Fadi Dornaika,
Amir Nakib,
Cosimo Distante,
Abdelmalik Taleb-Ahmed
Abstract:
Since the appearance of Covid-19 in late 2019, Covid-19 has become an active research topic for the artificial intelligence (AI) community. One of the most interesting AI topics is Covid-19 analysis of medical imaging. CT-scan imaging is the most informative tool about this disease. This work is part of the 3nd COV19D competition for Covid-19 Severity Prediction. In order to deal with the big gap…
▽ More
Since the appearance of Covid-19 in late 2019, Covid-19 has become an active research topic for the artificial intelligence (AI) community. One of the most interesting AI topics is Covid-19 analysis of medical imaging. CT-scan imaging is the most informative tool about this disease. This work is part of the 3nd COV19D competition for Covid-19 Severity Prediction. In order to deal with the big gap between the validation and test results that were shown in the previous version of this competition, we proposed to combine the prediction of 2D and 3D CNN predictions. For the 2D CNN approach, we propose 2B-InceptResnet architecture which consists of two paths for segmented lungs and infection of all slices of the input CT-scan, respectively. Each path consists of ConvLayer and Inception-ResNet pretrained model on ImageNet. For the 3D CNN approach, we propose hybrid-DeCoVNet architecture which consists of four blocks: Stem, four 3D-ResNet layers, Classification Head and Decision layer. Our proposed approaches outperformed the baseline approach in the validation data of the 3nd COV19D competition for Covid-19 Severity Prediction by 36%.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
Ensemble CNN models for Covid-19 Recognition and Severity Perdition From 3D CT-scan
Authors:
Fares Bougourzi,
Cosimo Distante,
Fadi Dornaika,
Abdelmalik Taleb-Ahmed
Abstract:
Since the appearance of Covid-19 in late 2019, Covid-19 has become an active research topic for the artificial intelligence (AI) community. One of the most interesting AI topics is Covid-19 analysis of medical imaging. CT-scan imaging is the most informative tool about this disease. This work is part of the 2nd COV19D competition, where two challenges are set: Covid-19 Detection and Covid-19 Sever…
▽ More
Since the appearance of Covid-19 in late 2019, Covid-19 has become an active research topic for the artificial intelligence (AI) community. One of the most interesting AI topics is Covid-19 analysis of medical imaging. CT-scan imaging is the most informative tool about this disease. This work is part of the 2nd COV19D competition, where two challenges are set: Covid-19 Detection and Covid-19 Severity Detection from the CT-scans. For Covid-19 detection from CT-scans, we proposed an ensemble of 2D Convolution blocks with Densenet-161 models. Here, each 2D convolutional block with Densenet-161 architecture is trained separately and in testing phase, the ensemble model is based on the average of their probabilities. On the other hand, we proposed an ensemble of Convolutional Layers with Inception models for Covid-19 severity detection. In addition to the Convolutional Layers, three Inception variants were used, namely Inception-v3, Inception-v4 and Inception-Resnet. Our proposed approaches outperformed the baseline approach in the validation data of the 2nd COV19D competition by 11% and 16% for Covid-19 detection and Covid-19 severity detection, respectively.
△ Less
Submitted 29 June, 2022;
originally announced June 2022.
-
A Dense CNN approach for skin lesion classification
Authors:
Pierluigi Carcagnì,
Andrea Cuna,
Cosimo Distante
Abstract:
This article presents a Deep CNN, based on the DenseNet architecture jointly with a highly discriminating learning methodology, in order to classify seven kinds of skin lesions: Melanoma, Melanocytic nevus, Basal cell carcinoma, Actinic keratosis / Bowen's disease, Benign keratosis, Dermatofibroma, Vascular lesion. In particular a 61 layers DenseNet, pre-trained on IMAGENET dataset, has been fine-…
▽ More
This article presents a Deep CNN, based on the DenseNet architecture jointly with a highly discriminating learning methodology, in order to classify seven kinds of skin lesions: Melanoma, Melanocytic nevus, Basal cell carcinoma, Actinic keratosis / Bowen's disease, Benign keratosis, Dermatofibroma, Vascular lesion. In particular a 61 layers DenseNet, pre-trained on IMAGENET dataset, has been fine-tuned on ISIC 2018 Task 3 Challenge Dataset exploiting a Center Loss function.
△ Less
Submitted 26 July, 2018; v1 submitted 17 July, 2018;
originally announced July 2018.