Showing 1–50 of 140 results for author: Ayed, I

Searching in archive cs.
  1. arXiv:2506.17503

    cs.CV

    Trustworthy Few-Shot Transfer of Medical VLMs through Split Conformal Prediction

    Authors: Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz

    Abstract: Medical vision-language models (VLMs) have demonstrated unprecedented transfer capabilities and are being increasingly adopted for data-efficient image classification. Despite its growing popularity, its reliability aspect remains largely unexplored. This work explores the split conformal prediction (SCP) framework to provide trustworthiness guarantees when transferring such models based on a smal…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: MICCAI 2025. Code: https://github.com/jusiro/SCA-T

  2. arXiv:2506.17500

    cs.CV

    Few-Shot, Now for Real: Medical VLMs Adaptation without Balanced Sets or Validation

    Authors: Julio Silva-Rodríguez, Fereshteh Shakeri, Houda Bahig, Jose Dolz, Ismail Ben Ayed

    Abstract: Vision-language models (VLMs) are gaining attention in medical image analysis. These are pre-trained on large, heterogeneous data sources, yielding rich and transferable representations. Notably, the combination of modality-specialized VLMs with few-shot adaptation has provided fruitful results, enabling the efficient deployment of high-performing solutions. However, previous works on this topic m…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: MICCAI 2025. Code: https://github.com/jusiro/SS-Text

  3. arXiv:2506.06076

    cs.CV

    Full Conformal Adaptation of Medical Vision-Language Models

    Authors: Julio Silva-Rodríguez, Leo Fillioux, Paul-Henry Cournède, Maria Vakalopoulou, Stergios Christodoulidis, Ismail Ben Ayed, Jose Dolz

    Abstract: Vision-language models (VLMs) pre-trained at large scale have shown unprecedented transferability capabilities and are being progressively integrated into medical image analysis. Although its discriminative potential has been widely explored, its reliability aspect remains overlooked. This work investigates their behavior under the increasingly popular split conformal prediction (SCP) framework, w…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: IPMI 2025. Code: https://github.com/jusiro/FCA

  4. arXiv:2506.04005

    cs.CV

    Vocabulary-free few-shot learning for Vision-Language Models

    Authors: Maxime Zanella, Clément Fuchs, Ismail Ben Ayed, Christophe De Vleeschouwer

    Abstract: Recent advances in few-shot adaptation for Vision-Language Models (VLMs) have greatly expanded their ability to generalize across tasks using only a few labeled examples. However, existing approaches primarily build upon the strong zero-shot priors of these models by leveraging carefully designed, task-specific prompts. This dependence on predefined class names can restrict their applicability, es…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted at CVPR Workshops 2025

  5. arXiv:2505.24693

    cs.CV

    Conformal Prediction for Zero-Shot Models

    Authors: Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz

    Abstract: Vision-language models pre-trained at large scale have shown unprecedented adaptability and generalization to downstream tasks. Although its discriminative potential has been widely explored, its reliability and uncertainty are still overlooked. In this work, we investigate the capabilities of CLIP models under the split conformal prediction paradigm, which provides theoretical guarantees to black…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: CVPR 2025. Code: https://github.com/jusiro/CLIP-Conformal

  6. arXiv:2505.22684

    cs.SI cs.LG

    Recovering Fairness Directly from Modularity: a New Way for Fair Community Partitioning

    Authors: Yufeng Wang, Yiguang Bai, Tianqing Zhu, Ismail Ben Ayed, Jing Yuan

    Abstract: Community partitioning is crucial in network analysis, with modularity optimization being the prevailing technique. However, traditional modularity-based methods often overlook fairness, a critical aspect in real-world applications. To address this, we introduce protected group networks and propose a novel fairness-modularity metric. This metric extends traditional modularity by explicitly incorpo…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 17 pages, 5 figures

  7. arXiv:2505.21844

    cs.CV

    Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation

    Authors: Mehrdad Noori, David Osowiechi, Gustavo Adolfo Vargas Hakim, Ali Bahri, Moslem Yazdanpanah, Sahar Dastani, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers

    Abstract: Recently, test-time adaptation has attracted wide interest in the context of vision-language models for image classification. However, to the best of our knowledge, the problem is completely overlooked in dense prediction tasks such as Open-Vocabulary Semantic Segmentation (OVSS). In response, we propose a novel TTA method tailored to adapting VLMs for segmentation during test time. Unlike TTA met…

    Submitted 27 May, 2025; originally announced May 2025.

  8. arXiv:2505.19546

    cs.CV

    SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds

    Authors: Ali Bahri, Moslem Yazdanpanah, Sahar Dastani, Mehrdad Noori, Gustavo Adolfo Vargas Hakim, David Osowiechi, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers

    Abstract: Test-Time Training (TTT) has emerged as a promising solution to address distribution shifts in 3D point cloud classification. However, existing methods often rely on computationally expensive backpropagation during adaptation, limiting their applicability in real-world, time-sensitive scenarios. In this paper, we introduce SMART-PC, a skeleton-based framework that enhances resilience to corruption…

    Submitted 26 May, 2025; originally announced May 2025.

  9. arXiv:2504.12436

    cs.CV cs.AI

    Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation

    Authors: Nairouz Mrabah, Nicolas Richet, Ismail Ben Ayed, Éric Granger

    Abstract: Adapting Vision-Language Models (VLMs) to new domains with few labeled samples remains a significant challenge due to severe overfitting and computational constraints. State-of-the-art solutions, such as low-rank reparameterization, mitigate these issues but often struggle with generalization and require extensive hyperparameter tuning. In this paper, a novel Sparse Optimization (SO) framework is…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: Under review

    ACM Class: I.4.8; I.5.1; G.1.6

  10. arXiv:2504.06330

    cs.CV cs.AI

    Analyzing the Impact of Low-Rank Adaptation for Cross-Domain Few-Shot Object Detection in Aerial Images

    Authors: Hicham Talaoubrid, Anissa Mokraoui, Ismail Ben Ayed, Axel Prouvost, Sonimith Hang, Monit Korn, Rémi Harvey

    Abstract: This paper investigates the application of Low-Rank Adaptation (LoRA) to small models for cross-domain few-shot object detection in aerial images. Originally designed for large-scale models, LoRA helps mitigate overfitting, making it a promising approach for resource-constrained settings. We integrate LoRA into DiffusionDet, and evaluate its performance on the DOTA and DIOR datasets. Our results s…

    Submitted 8 April, 2025; originally announced April 2025.

  11. arXiv:2504.05227

    cs.CV

    A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?

    Authors: Julio Silva-Rodríguez, Jose Dolz, Ismail Ben Ayed

    Abstract: Vision-language pre-training has recently gained popularity as it allows learning rich feature representations using large-scale data sources. This paradigm has quickly made its way into the medical image analysis community. In particular, there is an impressive amount of recent literature developing vision-language models for radiology. However, the available medical datasets with image-text supe…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: IPMI 2025. Code and weights: https://github.com/jusiro/DLILP

  12. arXiv:2503.04953

    cs.CV

    Spectral Informed Mamba for Robust Point Cloud Processing

    Authors: Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sahar Dastani, Milad Cheraghalikhani, David Osowiechi, Gustavo Adolfo Vargas Hakim, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers

    Abstract: State space models have shown significant promise in Natural Language Processing (NLP) and, more recently, computer vision. This paper introduces a new methodology leveraging Mamba and Masked Autoencoder networks for point cloud data in both supervised and self-supervised learning. We propose three key contributions to enhance Mamba's capability in processing complex point cloud structures. First,…

    Submitted 25 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

  13. arXiv:2501.07306

    cs.LG math.OC

    Variable Bregman Majorization-Minimization Algorithm and its Application to Dirichlet Maximum Likelihood Estimation

    Authors: Ségolène Martin, Jean-Christophe Pesquet, Gabriele Steidl, Ismail Ben Ayed

    Abstract: We propose a novel Bregman descent algorithm for minimizing a convex function that is expressed as the sum of a differentiable part (defined over an open set) and a possibly nonsmooth term. The approach, referred to as the Variable Bregman Majorization-Minimization (VBMM) algorithm, extends the Bregman Proximal Gradient method by allowing the Bregman function used in the divergence to adaptively v…

    Submitted 5 February, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

  14. arXiv:2501.03729

    cs.CV

    Realistic Test-Time Adaptation of Vision-Language Models

    Authors: Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer, Ismail Ben Ayed

    Abstract: The zero-shot capabilities of Vision-Language Models (VLMs) have been widely leveraged to improve predictive performance. However, previous works on transductive or test-time adaptation (TTA) often make strong assumptions about the data distribution, such as the presence of all classes. Our work challenges these favorable deployment scenarios, and introduces a more realistic evaluation framework,…

    Submitted 7 January, 2025; originally announced January 2025.

  15. arXiv:2412.16739

    cs.CV

    UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning

    Authors: Long Zhou, Fereshteh Shakeri, Aymen Sadraoui, Mounir Kaaniche, Jean-Christophe Pesquet, Ismail Ben Ayed

    Abstract: Transductive few-shot learning has recently triggered wide attention in computer vision. Yet, current methods introduce key hyper-parameters, which control the prediction statistics of the test batches, such as the level of class balance, affecting performances significantly. Such hyper-parameters are empirically grid-searched over validation data, and their configurations may vary substantially w…

    Submitted 11 April, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

    Comments: Accepted by CVPR2025

  16. arXiv:2412.06082

    cs.CV

    Are foundation models for computer vision good conformal predictors?

    Authors: Leo Fillioux, Julio Silva-Rodríguez, Ismail Ben Ayed, Paul-Henry Cournède, Maria Vakalopoulou, Stergios Christodoulidis, Jose Dolz

    Abstract: Recent advances in self-supervision and contrastive learning have brought the performance of foundation models to unprecedented levels in a variety of tasks. Fueled by this progress, these models are becoming the prevailing approach for a wide array of real-world vision problems, including risk-sensitive and high-stakes applications. However, ensuring safe deployment in these scenarios requires a…

    Submitted 11 March, 2025; v1 submitted 8 December, 2024; originally announced December 2024.

  17. arXiv:2411.17002

    cs.CV

    Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation

    Authors: Shambhavi Mishra, Julio Silva-Rodríguez, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz

    Abstract: Vision-language foundation models, such as CLIP, have shown unprecedented zero-shot performance across a wide range of tasks. Nevertheless, these models may be unreliable under distributional shifts, as their performance is significantly degraded. In this work, we explore how to efficiently leverage class text information to mitigate these distribution drifts encountered by large pre-trained visio…

    Submitted 18 March, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: Added additional figures to communicate the algorithm

  18. arXiv:2411.01116

    cs.CV

    Test-Time Adaptation in Point Clouds: Leveraging Sampling Variation with Weight Averaging

    Authors: Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sahar Dastani, Milad Cheraghalikhani, David Osowiechi, Farzad Beizaee, Gustavo Adolfo Vargas Hakim, Ismail Ben Ayed, Christian Desrosiers

    Abstract: Test-Time Adaptation (TTA) addresses distribution shifts during testing by adapting a pretrained model without access to source data. In this work, we propose a novel TTA approach for 3D point cloud classification, combining sampling variation with weight averaging. Our method leverages Farthest Point Sampling (FPS) and K-Nearest Neighbors (KNN) to create multiple point cloud representations, adap…

    Submitted 29 December, 2024; v1 submitted 1 November, 2024; originally announced November 2024.

  19. arXiv:2409.03868

    cs.CV

    Few-shot Adaptation of Medical Vision-Language Models

    Authors: Fereshteh Shakeri, Yunshi Huang, Julio Silva-Rodríguez, Houda Bahig, An Tang, Jose Dolz, Ismail Ben Ayed

    Abstract: Integrating image and text data through multi-modal learning has emerged as a new approach in medical imaging research, following its successful deployment in computer vision. While considerable efforts have been dedicated to establishing medical foundation models and their zero-shot transfer to downstream tasks, the popular few-shot setting remains relatively unexplored. Following on from the cur…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: MICCAI 2024 (Spotlight) - Code is available at https://github.com/FereshteShakeri/few-shot-MedVLMs.git

  20. arXiv:2409.01883

    cs.CV

    Boosting Vision-Language Models for Histopathology Classification: Predict all at once

    Authors: Maxime Zanella, Fereshteh Shakeri, Yunshi Huang, Houda Bahig, Ismail Ben Ayed

    Abstract: The development of vision-language models (VLMs) for histopathology has shown promising new usages and zero-shot performances. However, current approaches, which decompose large slides into smaller patches, focus solely on inductive classification, i.e., prediction for each patch is made independently of the other patches in the target test data. We extend the capability of these large models by…

    Submitted 3 September, 2024; originally announced September 2024.

  21. arXiv:2409.00698

    cs.CV

    Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification

    Authors: Karim El Khoury, Maxime Zanella, Benoît Gérin, Tiffanie Godelaine, Benoît Macq, Saïd Mahmoudi, Christophe De Vleeschouwer, Ismail Ben Ayed

    Abstract: Vision-Language Models for remote sensing have shown promising uses thanks to their extensive pretraining. However, their conventional usage in zero-shot scene classification methods still involves dividing large images into patches and making independent predictions, i.e., inductive inference, thereby limiting their effectiveness by ignoring valuable contextual information. Our approach tackles t…

    Submitted 7 January, 2025; v1 submitted 1 September, 2024; originally announced September 2024.

    Comments: Accepted at ICASSP 2025

  22. arXiv:2407.13588

    cs.CV

    Robust Calibration of Large Vision-Language Adapters

    Authors: Balamurali Murugesan, Julio Silva-Rodriguez, Ismail Ben Ayed, Jose Dolz

    Abstract: This paper addresses the critical issue of miscalibration in CLIP-based model adaptation, particularly in the challenging scenario of out-of-distribution (OOD) samples, which has been overlooked in the existing literature on CLIP adaptation. We empirically demonstrate that popular CLIP adaptation approaches, such as Adapters, Prompt Learning, and Test-Time Adaptation, substantially degrade the cal…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  23. arXiv:2407.03588

    cs.CV

    FDS: Feedback-guided Domain Synthesis with Multi-Source Conditional Diffusion Models for Domain Generalization

    Authors: Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Gustavo Adolfo Vargas Hakim, David Osowiechi, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers

    Abstract: Domain Generalization techniques aim to enhance model robustness by simulating novel data distributions during training, typically through various augmentation or stylization strategies. However, these methods frequently suffer from limited control over the diversity of generated images and lack assurance that these images span distinct distributions. To address these challenges, we propose FDS, F…

    Submitted 18 December, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to WACV 2025

  24. arXiv:2406.13875

    cs.CV

    WATT: Weight Average Test-Time Adaptation of CLIP

    Authors: David Osowiechi, Mehrdad Noori, Gustavo Adolfo Vargas Hakim, Moslem Yazdanpanah, Ali Bahri, Milad Cheraghalikhani, Sahar Dastani, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers

    Abstract: Vision-Language Models (VLMs) such as CLIP have yielded unprecedented performance for zero-shot image classification, yet their generalization capability may still be seriously challenged when confronted with domain shifts. In response, we present Weight Average Test-Time Adaptation (WATT) of CLIP, a pioneering approach facilitating full test-time adaptation (TTA) of this VLM. Our method employs a d…

    Submitted 24 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  25. arXiv:2406.07640

    cs.LG cs.AI

    When is an Embedding Model More Promising than Another?

    Authors: Maxime Darrin, Philippe Formont, Ismail Ben Ayed, Jackie CK Cheung, Pablo Piantanida

    Abstract: Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately la…

    Submitted 16 November, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  26. arXiv:2406.01837

    cs.CV

    Boosting Vision-Language Models with Transduction

    Authors: Maxime Zanella, Benoît Gérin, Ismail Ben Ayed

    Abstract: Transduction is a powerful paradigm that leverages the structure of unlabeled data to boost predictive accuracy. We present TransCLIP, a novel and computationally efficient transductive approach designed for Vision-Language Models (VLMs). TransCLIP is applicable as a plug-and-play module on top of popular inductive zero- and few-shot models, consistently improving their performances. Our new objec…

    Submitted 3 June, 2024; originally announced June 2024.

  27. arXiv:2405.18541

    cs.CV

    Low-Rank Few-Shot Adaptation of Vision-Language Models

    Authors: Maxime Zanella, Ismail Ben Ayed

    Abstract: Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of just a few labeled samples within the target downstream task. However, this promising, already quite abundant few-shot literature has focused principally on prompt learning and, to a lesser extent, on adapters, overlooking the recent advances in Parame…

    Submitted 1 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  28. arXiv:2405.18437

    cs.CV cs.AI

    Transductive Zero-Shot and Few-Shot CLIP

    Authors: Ségolène Martin, Yunshi Huang, Fereshteh Shakeri, Jean-Christophe Pesquet, Ismail Ben Ayed

    Abstract: Transductive inference has been widely investigated in few-shot image classification, but completely overlooked in the recent, fast growing literature on adapting vision-language models like CLIP. This paper addresses the transductive zero-shot and few-shot CLIP classification challenge, in which inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating eac…

    Submitted 8 April, 2024; originally announced May 2024.

    Comments: 2024 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2024, Seattle (USA), Washington, United States

  29. arXiv:2405.12419

    cs.CV cs.LG

    GeoMask3D: Geometrically Informed Mask Selection for Self-Supervised Point Cloud Learning in 3D

    Authors: Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Milad Cheraghalikhani, Gustavo Adolfo Vargas Hakim, David Osowiechi, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers

    Abstract: We introduce a pioneering approach to self-supervised learning for point clouds, employing a geometrically informed mask selection strategy called GeoMask3D (GM3D) to boost the efficiency of Masked Auto Encoders (MAE). Unlike the conventional method of random masking, our technique utilizes a teacher-student model to focus on intricate areas within the data, guiding the model's focus toward region…

    Submitted 17 March, 2025; v1 submitted 20 May, 2024; originally announced May 2024.

  30. arXiv:2405.02266

    cs.CV

    On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?

    Authors: Maxime Zanella, Ismail Ben Ayed

    Abstract: The development of large vision-language models, notably CLIP, has catalyzed research into effective adaptation techniques, with a particular focus on soft prompt tuning. Conjointly, test-time augmentation, which utilizes multiple augmented views of a single image to enhance zero-shot generalization, is emerging as a significant area of interest. This has predominantly directed research efforts to…

    Submitted 3 May, 2024; originally announced May 2024.

  31. arXiv:2405.00754

    cs.CV cs.LG

    CLIPArTT: Adaptation of CLIP to New Domains at Test Time

    Authors: Gustavo Adolfo Vargas Hakim, David Osowiechi, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers

    Abstract: Pre-trained vision-language models (VLMs), exemplified by CLIP, demonstrate remarkable adaptability across zero-shot classification tasks without additional training. However, their performance diminishes in the presence of domain shifts. In this study, we introduce CLIP Adaptation duRing Test-Time (CLIPArTT), a fully test-time adaptation (TTA) approach for CLIP, which involves automatic text prom…

    Submitted 29 November, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  32. arXiv:2404.19460

    cs.LG cs.CR cs.CV

    AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples

    Authors: Antonio Emanuele Cinà, Jérôme Rony, Maura Pintor, Luca Demetrio, Ambra Demontis, Battista Biggio, Ismail Ben Ayed, Fabio Roli

    Abstract: Adversarial examples are typically optimized with gradient-based attacks. While novel attacks are continuously proposed, each is shown to outperform its predecessors using different experimental setups, hyperparameter settings, and number of forward and backward calls to the target models. This provides overly-optimistic and even biased evaluations that may unfairly favor one particular attack ove…

    Submitted 12 May, 2025; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: Paper accepted at AAAI2025. Project page and leaderboard: https://attackbench.github.io

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 2600-2608. 2025

  33. arXiv:2404.08392

    cs.CV cs.LG

    NC-TTT: A Noise Contrastive Approach for Test-Time Training

    Authors: David Osowiechi, Gustavo A. Vargas Hakim, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers

    Abstract: Despite their exceptional performance in vision tasks, deep learning models often struggle when faced with domain shifts during testing. Test-Time Training (TTT) methods have recently gained popularity for their ability to enhance the robustness of models through the addition of an auxiliary objective that is jointly optimized with the main task. Being strictly unsupervised, this auxiliary objectiv…

    Submitted 12 April, 2024; originally announced April 2024.

  34. arXiv:2404.08181

    cs.CV

    Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation

    Authors: Sina Hajimiri, Ismail Ben Ayed, Jose Dolz

    Abstract: Despite the significant progress in deep learning for dense visual recognition problems, such as semantic segmentation, traditional methods are constrained by fixed class sets. Meanwhile, vision-language foundation models, such as CLIP, have showcased remarkable effectiveness in numerous zero-shot image-level tasks, owing to their robust generalizability. Recently, a body of work has investigated…

    Submitted 16 September, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted to WACV 2025

  35. arXiv:2404.02314

    cs.LG cs.AI

    A Strong Baseline for Molecular Few-Shot Learning

    Authors: Philippe Formont, Hugo Jeannin, Pablo Piantanida, Ismail Ben Ayed

    Abstract: Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving convoluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which…

    Submitted 7 February, 2025; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Published in Transactions on Machine Learning Research (02/2025)

  36. arXiv:2404.02285

    cs.CV

    LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP

    Authors: Yunshi Huang, Fereshteh Shakeri, Jose Dolz, Malik Boudiaf, Houda Bahig, Ismail Ben Ayed

    Abstract: In a recent, strongly emergent literature on few-shot CLIP adaptation, Linear Probe (LP) has been often reported as a weak baseline. This has motivated intensive research building convoluted prompt learning or feature adaptation strategies. In this work, we propose and examine from convex-optimization perspectives a generalization of the standard LP baseline, in which the linear classifier weights…

    Submitted 2 April, 2024; originally announced April 2024.

  37. arXiv:2403.15567

    cs.LG cs.CV

    Do not trust what you trust: Miscalibration in Semi-supervised Learning

    Authors: Shambhavi Mishra, Balamurali Murugesan, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz

    Abstract: State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples. An inherent drawback of this strategy stems from the quality of the uncertainty estimates, as pseudo-labels are filtered only based on their degree of uncertainty, regardless of the correctness of their predictions. Thus, assessing…

    Submitted 22 March, 2024; originally announced March 2024.

  38. arXiv:2403.12364

    cs.CV

    Class and Region-Adaptive Constraints for Network Calibration

    Authors: Balamurali Murugesan, Julio Silva-Rodriguez, Ismail Ben Ayed, Jose Dolz

    Abstract: In this work, we present a novel approach to calibrate segmentation networks that considers the inherent challenges posed by different categories and object regions. In particular, we present a formulation that integrates class and region-wise constraints into the learning objective, with multiple penalty weights to account for class and region differences. Finding the optimal penalty weights manu…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Under review

  39. Exploring the Transferability of a Foundation Model for Fundus Images: Application to Hypertensive Retinopathy

    Authors: Julio Silva-Rodriguez, Jihed Chelbi, Waziha Kabir, Hadi Chakor, Jose Dolz, Ismail Ben Ayed, Riadh Kobbi

    Abstract: Using deep learning models pre-trained on ImageNet is the traditional solution for medical image classification to deal with data scarcity. Nevertheless, relevant literature supports that this strategy may offer limited gains due to the high dissimilarity between domains. Currently, the paradigm of adapting domain-specialized foundation models is proving to be a promising alternative. However, how…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: CGI 2023

  40. arXiv:2401.14487

    cs.CV

    Neighbor-Aware Calibration of Segmentation Networks with Penalty-Based Constraints

    Authors: Balamurali Murugesan, Sukesh Adiga Vasudeva, Bingyuan Liu, Hervé Lombaert, Ismail Ben Ayed, Jose Dolz

    Abstract: Ensuring reliable confidence scores from deep neural networks is of paramount significance in critical decision-making systems, particularly in real-world domains such as healthcare. Recent literature on calibrating deep segmentation networks has resulted in substantial progress. Nevertheless, these approaches are strongly inspired by the advancements in classification tasks, and thus their uncert…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Under review. arXiv admin note: text overlap with arXiv:2303.06268

  41. arXiv:2312.12730

    cs.CV

    A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models

    Authors: Julio Silva-Rodríguez, Sina Hajimiri, Ismail Ben Ayed, Jose Dolz

    Abstract: Efficient transfer learning (ETL) is receiving increasing attention to adapt large pre-trained language-vision models on downstream tasks with a few labeled samples. While significant progress has been made, we reveal that state-of-the-art ETL approaches exhibit strong performance only in narrowly-defined experimental setups, and with a careful adjustment of hyperparameters based on a large corpus…

    Submitted 25 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Code: https://github.com/jusiro/CLAP

  42. arXiv:2311.17740  [pdf, other]

    eess.IV cs.LG q-bio.TO

    A transductive few-shot learning approach for classification of digital histopathological slides from liver cancer

    Authors: Aymen Sadraoui, Ségolène Martin, Eliott Barbot, Astrid Laurent-Bellue, Jean-Christophe Pesquet, Catherine Guettier, Ismail Ben Ayed

    Abstract: This paper presents a new approach for classifying 2D histopathology patches using few-shot learning. The method is designed to tackle a significant challenge in histopathology, which is the limited availability of labeled data. By applying a sliding window technique to histopathology slides, we illustrate the practical benefits of transductive learning (i.e., making joint predictions on patches)…

    Submitted 11 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Journal ref: ISBI 2024 - 21st IEEE International Symposium on Biomedical Imaging, May 2024, Athens, Greece

  43. arXiv:2310.13998  [pdf, other]

    cs.CL

    Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models

    Authors: Pierre Colombo, Victor Pellegrain, Malik Boudiaf, Victor Storchan, Myriam Tami, Ismail Ben Ayed, Celine Hudelot, Pablo Piantanida

    Abstract: Proprietary and closed APIs are becoming increasingly common to process natural language, and are impacting the practical applications of natural language processing, including few-shot classification. Few-shot classification involves training a model to perform a new classification task with a handful of labeled data. This paper presents three contributions. First, we introduce a scenario where t…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  44. arXiv:2310.12345  [pdf, other]

    cs.CV cs.AI cs.LG

    ClusT3: Information Invariant Test-Time Training

    Authors: Gustavo A. Vargas Hakim, David Osowiechi, Mehrdad Noori, Milad Cheraghalikhani, Ismail Ben Ayed, Christian Desrosiers

    Abstract: Deep learning models have shown remarkable performance in a broad range of vision tasks. However, they are often vulnerable to domain shifts at test time. Test-time training (TTT) methods have been developed in an attempt to mitigate these vulnerabilities, where a secondary task is solved at training time simultaneously with the main task, to be later used as a self-supervised proxy task at…

    Submitted 18 October, 2023; originally announced October 2023.

  45. arXiv:2310.05566  [pdf, other]

    cs.LG cs.AI

    Aggregated f-average Neural Network applied to Few-Shot Class Incremental Learning

    Authors: Mathieu Vu, Emilie Chouzenoux, Ismail Ben Ayed, Jean-Christophe Pesquet

    Abstract: Ensemble learning leverages multiple models (i.e., weak learners) on a common machine learning task to enhance prediction performance. Basic ensembling approaches average the weak learners' outputs, while more sophisticated ones stack a machine learning model in between the weak learners' outputs and the final prediction. This work fuses both aforementioned frameworks. We introduce an aggregated f-a…

    Submitted 19 September, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: 27 pages, 3 figures, submitted to Signal Processing

  46. arXiv:2310.02416  [pdf, other]

    cs.LG cs.CV

    Bag of Tricks for Fully Test-Time Adaptation

    Authors: Saypraseuth Mounsaveng, Florent Chiaroni, Malik Boudiaf, Marco Pedersoli, Ismail Ben Ayed

    Abstract: Fully Test-Time Adaptation (TTA), which aims at adapting models to data drifts, has recently attracted wide interest. Numerous tricks and techniques have been proposed to ensure robust learning on arbitrary streams of unlabeled data. However, assessing the true impact of each individual technique and obtaining a fair comparison still constitutes a significant challenge. To help consolidate the com…

    Submitted 9 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at WACV 2024

  47. arXiv:2309.17357  [pdf, other]

    cs.LG

    Module-wise Training of Neural Networks via the Minimizing Movement Scheme

    Authors: Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari

    Abstract: Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introd…

    Submitted 5 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023. arXiv admin note: text overlap with arXiv:2210.00949

  48. A Foundation Language-Image Model of the Retina (FLAIR): Encoding Expert Knowledge in Text Supervision

    Authors: Julio Silva-Rodríguez, Hadi Chakor, Riadh Kobbi, Jose Dolz, Ismail Ben Ayed

    Abstract: Foundation vision-language models are currently transforming computer vision, and are on the rise in medical imaging fueled by their very promising generalization capabilities. However, the initial attempts to transfer this new paradigm to medical imaging have shown less impressive performances than those observed in other domains, due to the significant domain shift and the complex, expert domain…

    Submitted 15 January, 2025; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted in Medical Image Analysis. The pre-trained model is available at: https://github.com/jusiro/FLAIR

  49. arXiv:2307.11808  [pdf, other]

    cs.CV

    Automatic Data Augmentation Learning using Bilevel Optimization for Histopathological Images

    Authors: Saypraseuth Mounsaveng, Issam Laradji, David Vázquez, Marco Pedersoli, Ismail Ben Ayed

    Abstract: Training a deep learning model to classify histopathological images is challenging, because of the color and shape variability of the cells and tissues, and the reduced amount of available data, which does not allow proper learning of those variations. Variations can come from the image acquisition process, for example, due to different cell staining protocols or tissue deformation. To tackle this…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: text overlap with arXiv:2006.14699

  50. arXiv:2307.00097  [pdf, other]

    cs.CV

    Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation

    Authors: Balamurali Murugesan, Rukhshanda Hussain, Rajarshi Bhattacharya, Ismail Ben Ayed, Jose Dolz

    Abstract: Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot learning tasks, fueled by the power of contrastive language-vision pre-training. In particular, prompt tuning has emerged as an effective strategy to adapt the pre-trained language-vision models to downstream tasks by employing task-related textual tokens. Motivated by this progress, in this work w…

    Submitted 13 January, 2024; v1 submitted 30 June, 2023; originally announced July 2023.

    Comments: WACV 2024