Showing 1–5 of 5 results for author: Bouaziz, W

  1. arXiv:2506.14913  [pdf, ps, other]

    cs.CR cs.LG stat.ML

    Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning

    Authors: Wassim Bouaziz, Mathurin Videau, Nicolas Usunier, El-Mahdi El-Mhamdi

    Abstract: The pre-training of large language models (LLMs) relies on massive text datasets sourced from diverse and difficult-to-curate origins. Although membership inference attacks and hidden canaries have been explored to trace data usage, such methods rely on memorization of training data, which LM providers try to limit. In this work, we demonstrate that indirect data poisoning (where the targeted beha…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 18 pages, 12 figures

  2. arXiv:2506.08998  [pdf, ps, other]

    math.ST cs.LG stat.ML

    On Monotonicity in AI Alignment

    Authors: Gilles Bareilles, Julien Fageot, Lê-Nguyên Hoang, Peva Blanchard, Wassim Bouaziz, Sébastien Rouault, El-Mahdi El-Mhamdi

    Abstract: Comparison-based preference learning has become central to the alignment of AI models with human preferences. However, these methods may behave counterintuitively. After empirically observing that, when accounting for a preference for response $y$ over $z$, the model may actually decrease the probability (and reward) of generating $y$ (an observation also made by others), this paper investigates t…

    Submitted 10 June, 2025; originally announced June 2025.

  3. arXiv:2501.02362  [pdf, other]

    cs.LG eess.SP stat.ML

    Easing Optimization Paths: a Circuit Perspective

    Authors: Ambroise Odonnat, Wassim Bouaziz, Vivien Cabannes

    Abstract: Gradient descent is the method of choice for training large artificial intelligence systems. As these systems become larger, a better understanding of the mechanisms behind gradient training would allow us to alleviate compute costs and help steer these systems away from harmful behaviors. To that end, we suggest utilizing the circuit perspective brought forward by mechanistic interpretability. Af…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: Accepted at ICASSP 2025

  4. arXiv:2410.24050  [pdf, other]

    cs.LG stat.ML

    Clustering Head: A Visual Case Study of the Training Dynamics in Transformers

    Authors: Ambroise Odonnat, Wassim Bouaziz, Vivien Cabannes

    Abstract: This paper introduces the sparse modular addition task and examines how transformers learn it. We focus on transformers with embeddings in $\mathbb{R}^2$ and introduce a visual sandbox that provides comprehensive visualizations of each layer throughout the training process. We reveal a type of circuit, called "clustering heads," which learns the problem's invariants. We analyze the training dynamics of th…

    Submitted 2 February, 2025; v1 submitted 31 October, 2024; originally announced October 2024.

  5. arXiv:2410.09101  [pdf, other]

    cs.CR cs.LG stat.ML

    Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning

    Authors: Wassim Bouaziz, El-Mahdi El-Mhamdi, Nicolas Usunier

    Abstract: Dataset ownership verification, the process of determining if a dataset is used in a model's training data, is necessary for detecting unauthorized data usage and data contamination. Existing approaches, such as backdoor watermarking, rely on inducing a detectable behavior into the trained model on a part of the data distribution. However, these approaches have limitations, as they can be harmful…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: 16 pages, 7 figures