
Showing 1–10 of 10 results for author: Odonnat, A

Searching in archive cs.
  1. arXiv:2501.02362

    cs.LG eess.SP stat.ML

    Easing Optimization Paths: a Circuit Perspective

    Authors: Ambroise Odonnat, Wassim Bouaziz, Vivien Cabannes

    Abstract: Gradient descent is the method of choice for training large artificial intelligence systems. As these systems become larger, a better understanding of the mechanisms behind gradient training would allow us to alleviate compute costs and help steer these systems away from harmful behaviors. To that end, we suggest utilizing the circuit perspective brought forward by mechanistic interpretability. Af…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: Accepted at ICASSP 2025

  2. arXiv:2410.24050

    cs.LG stat.ML

    Clustering Head: A Visual Case Study of the Training Dynamics in Transformers

    Authors: Ambroise Odonnat, Wassim Bouaziz, Vivien Cabannes

    Abstract: This paper introduces the sparse modular addition task and examines how transformers learn it. We focus on transformers with embeddings in $\mathbb{R}^2$ and introduce a visual sandbox that provides comprehensive visualizations of each layer throughout the training process. We reveal a type of circuit, called "clustering heads," which learns the problem's invariants. We analyze the training dynamics of th…

    Submitted 2 February, 2025; v1 submitted 31 October, 2024; originally announced October 2024.

  3. arXiv:2410.11711

    stat.ML cs.LG

    Zero-shot Model-based Reinforcement Learning using Large Language Models

    Authors: Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl

    Abstract: The emerging zero-shot capabilities of Large Language Models (LLMs) have led to their applications in areas extending well beyond natural language processing tasks. In reinforcement learning, while LLMs have been extensively used in text-based environments, their integration with continuous state spaces remains understudied. In this paper, we investigate how pre-trained LLMs can be leveraged to pr…

    Submitted 13 February, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

    Journal ref: The Thirteenth International Conference on Learning Representations (ICLR 2025)

  4. arXiv:2410.02724

    stat.ML cs.AI cs.CL cs.LG

    Large Language Models as Markov Chains

    Authors: Oussama Zekri, Ambroise Odonnat, Abdelhakim Benechehab, Linus Bleistein, Nicolas Boullé, Ievgen Redko

    Abstract: Large language models (LLMs) are remarkably efficient across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of the LLMs' generalization capabilities remains elusive. In our paper, we approach this task by drawing an equivalence between autoregressive transformer-based language models and Markov chains defined on a finite state…

    Submitted 2 February, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

  5. arXiv:2407.11676

    cs.LG cs.AI stat.ME stat.ML

    SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities

    Authors: Yanis Lalou, Théo Gnassounou, Antoine Collas, Antoine de Mathelin, Oleksii Kachaiev, Ambroise Odonnat, Alexandre Gramfort, Thomas Moreau, Rémi Flamary

    Abstract: Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift. While many methods have been proposed in the literature, fair and realistic evaluation remains an open question, particularly due to methodological difficulties in selecting hyperparameters in the unsupervised setting.…

    Submitted 11 February, 2025; v1 submitted 16 July, 2024; originally announced July 2024.

  6. arXiv:2406.10327

    stat.ML cs.LG

    Analysing Multi-Task Regression via Random Matrix Theory with Application to Time Series Forecasting

    Authors: Romain Ilbert, Malik Tiomoko, Cosme Louart, Ambroise Odonnat, Vasilii Feofanov, Themis Palpanas, Ievgen Redko

    Abstract: In this paper, we introduce a novel theoretical framework for multi-task regression, applying random matrix theory to provide precise performance estimations, under high-dimensional, non-Gaussian data distributions. We formulate a multi-task optimization problem as a regularization technique to enable single-task models to leverage multi-task learning information. We derive a closed-form solution…

    Submitted 14 June, 2024; originally announced June 2024.

  7. arXiv:2405.18979

    cs.LG stat.ML

    MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts

    Authors: Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, Bo An

    Abstract: Leveraging the models' outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without requiring access to the corresponding ground truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence issues, leading to predict…

    Submitted 25 November, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: The three first authors contributed equally

  8. arXiv:2402.10198

    cs.LG stat.ML

    SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

    Authors: Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko

    Abstract: Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite the…

    Submitted 3 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted as an Oral at ICML 2024, Vienna. The first two authors contributed equally

  9. arXiv:2401.08909

    cs.LG

    Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift

    Authors: Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko, Jianfeng Zhang, Bo An

    Abstract: Estimating the test performance of a model, possibly under distribution shift, without having access to the ground-truth labels is a challenging, yet very important problem for the safe deployment of machine learning algorithms in the wild. Existing works mostly rely on information from either the outputs or the extracted features of neural networks to estimate a score that correlates with the gro…

    Submitted 12 May, 2025; v1 submitted 16 January, 2024; originally announced January 2024.

  10. arXiv:2310.14814

    cs.LG cs.AI stat.ML

    Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias

    Authors: Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko

    Abstract: Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples. For neural networks, softmax prediction probabilities are often used as a confidence measure, although they are known to be overconfident, even for wrong predictions. This phenomenon is part…

    Submitted 3 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted at AISTATS 2024, Valencia, Spain