
Showing 1–23 of 23 results for author: Scetbon, M

Searching in archive cs.
  1. arXiv:2502.07752 [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension

    Authors: Wenbo Gong, Meyer Scetbon, Chao Ma, Edward Meeds

    Abstract: Designing efficient optimizers for large language models (LLMs) with low-memory requirements and fast convergence is an important and challenging problem. This paper makes a step towards the systematic design of such optimizers through the lens of structured Fisher information matrix (FIM) approximation. We show that many state-of-the-art efficient optimizers can be viewed as solutions to FIM appr…

    Submitted 20 February, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  2. arXiv:2502.06742 [pdf, other]

    cs.LG cs.AI

    Gradient Multi-Normalization for Stateless and Scalable LLM Training

    Authors: Meyer Scetbon, Chao Ma, Wenbo Gong, Edward Meeds

    Abstract: Training large language models (LLMs) typically relies on adaptive optimizers like Adam (Kingma & Ba, 2015), which store additional state information to accelerate convergence but incur significant memory overhead. Recent efforts, such as SWAN (Ma et al., 2024), address this by eliminating the need for optimizer states while achieving performance comparable to Adam via a multi-step preprocessing pro…

    Submitted 10 February, 2025; originally announced February 2025.

  3. arXiv:2412.13148 [pdf, other]

    cs.LG cs.AI

    SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training

    Authors: Chao Ma, Wenbo Gong, Meyer Scetbon, Edward Meeds

    Abstract: Adaptive optimizers such as Adam (Kingma & Ba, 2015) have been central to the success of large language models. However, they often require maintaining optimizer states throughout training, which can result in memory requirements several times greater than the model footprint. This overhead imposes constraints on scalability and computational efficiency. Stochastic Gradient Descent (SGD), in contr…

    Submitted 21 February, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: In v2 we have revised the related work, added more comprehensive citations, and clarified our key contributions

  4. arXiv:2412.07902 [pdf, other]

    stat.ML cs.LG

    Low-Rank Correction for Quantized LLMs

    Authors: Meyer Scetbon, James Hensman

    Abstract: We consider the problem of model compression for Large Language Models (LLMs) at post-training time, where the task is to compress a well-trained model using only a small set of calibration input data. In this work, we introduce a new low-rank approach to correct for quantization errors of activations in LLMs: we propose to add low-rank weight matrices in full precision that act on the …

    Submitted 10 December, 2024; originally announced December 2024.

  5. arXiv:2410.06128 [pdf, ps, other]

    cs.LG stat.ML

    Amortized Inference of Causal Models via Conditional Fixed-Point Iterations

    Authors: Divyat Mahajan, Jannes Gladrow, Agrin Hilmkil, Cheng Zhang, Meyer Scetbon

    Abstract: Structural Causal Models (SCMs) offer a principled framework to reason about interventions and support out-of-distribution generalization, which are key goals in scientific discovery. However, the task of learning SCMs from observed data poses formidable challenges, and often requires training a separate model for each dataset. In this work, we propose amortized inference of SCMs by training a sin…

    Submitted 10 June, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: Preprint. Under Review

  6. arXiv:2404.06969 [pdf, other]

    cs.LG stat.ML

    A Fixed-Point Approach for Causal Generative Modeling

    Authors: Meyer Scetbon, Joel Jennings, Agrin Hilmkil, Cheng Zhang, Chao Ma

    Abstract: We propose a novel formalism for describing Structural Causal Models (SCMs) as fixed-point problems on causally ordered variables, eliminating the need for Directed Acyclic Graphs (DAGs), and establish the weakest known conditions for their unique recovery given the topological ordering (TO). Based on this, we design a two-stage causal generative model that first infers in a zero-shot manner a val…

    Submitted 13 December, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  7. arXiv:2402.06665 [pdf, other]

    cs.AI cs.CL cs.LG cs.RO

    The Essential Role of Causality in Foundation World Models for Embodied AI

    Authors: Tarun Gupta, Wenbo Gong, Chao Ma, Nick Pawlowski, Agrin Hilmkil, Meyer Scetbon, Marc Rigter, Ade Famoti, Ashley Juan Llorens, Jianfeng Gao, Stefan Bauer, Danica Kragic, Bernhard Schölkopf, Cheng Zhang

    Abstract: Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for E…

    Submitted 29 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  8. arXiv:2308.00556 [pdf, other]

    stat.ML cs.LG

    Robust Linear Regression: Phase-Transitions and Precise Tradeoffs for General Norms

    Authors: Elvis Dohmatob, Meyer Scetbon

    Abstract: In this paper, we investigate the impact of test-time adversarial attacks on linear regression models and determine the optimal level of robustness that any model can reach while maintaining a given level of standard predictive performance (accuracy). Through quantitative estimates, we uncover fundamental tradeoffs between adversarial robustness and accuracy in different regimes. We obtain a preci…

    Submitted 1 August, 2023; originally announced August 2023.

  9. arXiv:2305.19727 [pdf, other]

    cs.LG math.OC

    Unbalanced Low-rank Optimal Transport Solvers

    Authors: Meyer Scetbon, Michal Klein, Giovanni Palla, Marco Cuturi

    Abstract: The relevance of optimal transport methods to machine learning has long been hindered by two salient limitations. First, the $O(n^3)$ computational cost of standard sample-based solvers (when used on batches of $n$ samples) is prohibitive. Second, the mass conservation constraint makes OT solvers too rigid in practice: because they must match all points from both measures, their output ca…

    Submitted 31 May, 2023; originally announced May 2023.

  10. arXiv:2304.13467 [pdf, other]

    math.OC cs.LG

    Polynomial-Time Solvers for the Discrete $\infty$-Optimal Transport Problems

    Authors: Meyer Scetbon

    Abstract: In this note, we propose polynomial-time algorithms solving the Monge and Kantorovich formulations of the $\infty$-optimal transport problem in the discrete and finite setting. It is the first time, to the best of our knowledge, that efficient numerical methods for these problems have been proposed.

    Submitted 26 April, 2023; originally announced April 2023.

  11. arXiv:2301.13486 [pdf, other]

    stat.ML cs.LG

    Robust Linear Regression: Gradient-descent, Early-stopping, and Beyond

    Authors: Meyer Scetbon, Elvis Dohmatob

    Abstract: In this work we study the robustness to adversarial attacks of early-stopping strategies on gradient-descent (GD) methods for linear regression. More precisely, we show that early-stopped GD is optimally robust (up to an absolute constant) against Euclidean-norm adversarial attacks. However, we show that this strategy can be arbitrarily sub-optimal in the case of general Mahalanobis attacks. This…

    Submitted 31 January, 2023; originally announced January 2023.

  12. arXiv:2205.12365 [pdf, other]

    stat.ML cs.LG

    Low-rank Optimal Transport: Approximation, Statistics and Debiasing

    Authors: Meyer Scetbon, Marco Cuturi

    Abstract: The matching principles behind optimal transport (OT) play an increasingly important role in machine learning, a trend which can be observed when OT is used to disambiguate datasets in applications (e.g. single-cell genomics) or used to improve more complex methods (e.g. balanced attention in transformers or self-supervised learning). To scale to more challenging problems, there is a growing conse…

    Submitted 15 September, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

  13. arXiv:2112.15595 [pdf, other]

    stat.ML cs.LG math.PR

    Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates

    Authors: Nicholas J. Irons, Meyer Scetbon, Soumik Pal, Zaid Harchaoui

    Abstract: Triangular flows, also known as Knöthe-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flow models such as real-valued non-volume preserving transformation models (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical mod…

    Submitted 31 December, 2021; originally announced December 2021.

  14. arXiv:2110.14868 [pdf, other]

    stat.ML cs.LG

    An Asymptotic Test for Conditional Independence using Analytic Kernel Embeddings

    Authors: Meyer Scetbon, Laurent Meunier, Yaniv Romano

    Abstract: We propose a new conditional dependence measure and a statistical test for conditional independence. The measure is based on the difference between analytic kernel embeddings of two well-suited distributions evaluated at a finite set of locations. We obtain its asymptotic distribution under the null hypothesis of conditional independence and design a consistent statistical test from it. We conduct…

    Submitted 16 June, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

  15. arXiv:2106.01128 [pdf, other]

    cs.LG stat.ML

    Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs

    Authors: Meyer Scetbon, Gabriel Peyré, Marco Cuturi

    Abstract: The ability to align points across two related yet incomparable point clouds (e.g. living in different spaces) plays an important role in machine learning. The Gromov-Wasserstein (GW) framework provides an increasingly popular answer to such problems, by seeking a low-distortion, geometry-preserving assignment between these points. As a non-convex, quadratic generalization of optimal transport (OT…

    Submitted 6 February, 2023; v1 submitted 2 June, 2021; originally announced June 2021.

  16. arXiv:2103.04737 [pdf, other]

    stat.ML cs.LG

    Low-Rank Sinkhorn Factorization

    Authors: Meyer Scetbon, Marco Cuturi, Gabriel Peyré

    Abstract: Several recent applications of optimal transport (OT) theory to machine learning have relied on regularization, notably entropy and the Sinkhorn algorithm. Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to approximate kernel matrices appearing in its iterations using low-rank factors. Another route lies instead in imposing low-rank cons…

    Submitted 8 March, 2021; originally announced March 2021.

  17. arXiv:2102.06905 [pdf, other]

    cs.GT cs.CR cs.LG

    Mixed Nash Equilibria in the Adversarial Examples Game

    Authors: Laurent Meunier, Meyer Scetbon, Rafael Pinot, Jamal Atif, Yann Chevaleyre

    Abstract: This paper tackles the problem of adversarial examples from a game theoretic point of view. We study the open question of the existence of mixed Nash equilibria in the zero-sum game formed by the attacker and the classifier. While previous works usually allow only one player to use randomized strategies, we show the necessity of considering randomization for both the classifier and the attacker. W…

    Submitted 13 February, 2021; originally announced February 2021.

  18. arXiv:2006.07260 [pdf, other]

    stat.ML cs.LG math.OC

    Equitable and Optimal Transport with Multiple Agents

    Authors: Meyer Scetbon, Laurent Meunier, Jamal Atif, Marco Cuturi

    Abstract: We introduce an extension of the Optimal Transport problem when multiple costs are involved. Considering each cost as an agent, we aim to share equally between agents the work of transporting one distribution to another. To do so, we minimize the transportation cost of the agent who works the most. Another point of view is when the goal is to partition equitably goods between agents according to t…

    Submitted 25 February, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

  19. arXiv:2006.07057 [pdf, other]

    stat.ML cs.LG

    Linear Time Sinkhorn Divergences using Positive Features

    Authors: Meyer Scetbon, Marco Cuturi

    Abstract: Although Sinkhorn divergences are now routinely used in data sciences to compare probability distributions, the computational effort required to compute them remains expensive, growing in general quadratically in the size $n$ of the support of these distributions. Indeed, solving optimal transport (OT) with an entropic regularization requires computing an $n\times n$ kernel matrix (the neg-exponent…

    Submitted 26 October, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

  20. arXiv:2003.12756 [pdf, other]

    stat.ML cs.LG

    Harmonic Decompositions of Convolutional Networks

    Authors: Meyer Scetbon, Zaid Harchaoui

    Abstract: We present a description of the function space and the smoothness class associated with a convolutional network using the machinery of reproducing kernel Hilbert spaces. We show that the mapping associated with a convolutional network expands into a sum involving elementary functions akin to spherical harmonics. This functional decomposition can be related to the functional ANOVA decomposition in…

    Submitted 16 November, 2020; v1 submitted 28 March, 2020; originally announced March 2020.

  21. arXiv:2002.12640 [pdf, other]

    stat.ML cs.LG

    A Spectral Analysis of Dot-product Kernels

    Authors: Meyer Scetbon, Zaid Harchaoui

    Abstract: We present eigenvalue decay estimates of integral operators associated with compositional dot-product kernels. The estimates improve on previous ones established for power series kernels on spheres. This allows us to obtain the volumes of balls in the corresponding reproducing kernel Hilbert spaces. We discuss the consequences on statistical estimation with compositional dot product kernels and hi…

    Submitted 26 February, 2021; v1 submitted 28 February, 2020; originally announced February 2020.

  22. arXiv:1909.13164 [pdf, other]

    cs.LG stat.ML

    Deep K-SVD Denoising

    Authors: Meyer Scetbon, Michael Elad, Peyman Milanfar

    Abstract: This work considers noise removal from images, focusing on the well known K-SVD denoising algorithm. This sparsity-based method was proposed in 2006, and for a short while it was considered as state-of-the-art. However, over the years it has been surpassed by other methods, including the recent deep-learning-based newcomers. The question we address in this paper is whether K-SVD was brought to its…

    Submitted 18 November, 2020; v1 submitted 28 September, 2019; originally announced September 2019.

  23. arXiv:1909.09264 [pdf, other]

    stat.ML cs.LG

    Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing

    Authors: M. Scetbon, G. Varoquaux

    Abstract: Are two sets of observations drawn from the same distribution? This problem is a two-sample test. Kernel methods lead to many appealing properties. Indeed state-of-the-art approaches use the $L^2$ distance between kernel-based distribution representatives to derive their test statistics. Here, we show that $L^p$ distances (with $p\geq 1$) between these distribution representatives give metrics on…

    Submitted 30 September, 2019; v1 submitted 19 September, 2019; originally announced September 2019.