Showing 1–14 of 14 results for author: Marion, P

  1. arXiv:2506.02336  [pdf, ps, other]

    stat.ML cs.LG

    Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression

    Authors: Jingfeng Wu, Pierre Marion, Peter Bartlett

    Abstract: We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective, achieving exponential convergence in $\widetilde{\mathcal{O}}(κ)$ steps with $κ$ being the condition number. Surprisingly, we show that this can be accelerated to…

    Submitted 2 June, 2025; originally announced June 2025.

  2. arXiv:2505.13112  [pdf, other]

    stat.ML cs.LG

    Attention-based clustering

    Authors: Rodrigo Maulen-Soto, Claire Boyer, Pierre Marion

    Abstract: Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we…

    Submitted 3 July, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  3. arXiv:2502.03435  [pdf, other]

    stat.ML cs.LG

    Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization

    Authors: Yu-Han Wu, Pierre Marion, Gérard Biau, Claire Boyer

    Abstract: Denoising score matching plays a pivotal role in the performance of diffusion-based generative models. However, the empirical optimal score--the exact solution to the denoising score matching--leads to memorization, where generated samples replicate the training data. Yet, in practice, only a moderate degree of memorization is observed, even without explicit regularization. In this paper, we inves…

    Submitted 6 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  4. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  5. arXiv:2410.01537  [pdf, other]

    stat.ML cs.LG

    Attention layers provably solve single-location regression

    Authors: Pierre Marion, Raphaël Berthier, Gérard Biau, Claire Boyer

    Abstract: Attention-based models, such as Transformer, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, where only one token in a sequence determines the output, and its position is a latent random variable, retrievable via a linea…

    Submitted 25 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: 42 pages, 10 figures. Accepted to ICLR 2025

  6. arXiv:2405.13456  [pdf, other]

    stat.ML cs.LG

    Deep linear networks for regression are implicitly regularized towards flat minima

    Authors: Pierre Marion, Lénaïc Chizat

    Abstract: The largest eigenvalue of the Hessian, or sharpness, of neural networks is a key quantity to understand their optimization dynamics. In this paper, we study the sharpness of deep linear networks for univariate regression. Minimizers can have arbitrarily large sharpness, but not an arbitrarily small one. Indeed, we show a lower bound on the sharpness of minimizers, which grows linearly with depth.…

    Submitted 28 October, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 47 pages, 7 figures. Accepted to NeurIPS 2024

  7. arXiv:2402.05468  [pdf, other]

    cs.LG

    Implicit Diffusion: Efficient Optimization through Stochastic Sampling

    Authors: Pierre Marion, Anna Korba, Peter Bartlett, Mathieu Blondel, Valentin De Bortoli, Arnaud Doucet, Felipe Llinares-López, Courtney Paquette, Quentin Berthet

    Abstract: We present a new algorithm to optimize distributions defined implicitly by parameterized stochastic diffusions. Doing so allows us to modify the outcome distribution of sampling processes by optimizing over their parameters. We introduce a general framework for first-order optimization of these processes, that performs jointly, in a single loop, optimization and sampling steps. This approach is in…

    Submitted 5 March, 2025; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 40 pages, 18 figures. Accepted to AISTATS 2025

  8. arXiv:2309.01213  [pdf, other]

    stat.ML cs.LG

    Implicit regularization of deep residual networks towards neural ODEs

    Authors: Pierre Marion, Yu-Han Wu, Michael E. Sander, Gérard Biau

    Abstract: Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analog, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. In this article, we take a step in this direction by establishing an implicit regularization of deep residual ne…

    Submitted 5 July, 2024; v1 submitted 3 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 (spotlight). 40 pages, 3 figures

  9. arXiv:2305.06648  [pdf, other]

    stat.ML cs.LG

    Generalization bounds for neural ordinary differential equations and deep residual networks

    Authors: Pierre Marion

    Abstract: Neural ordinary differential equations (neural ODEs) are a popular family of continuous-depth deep learning models. In this work, we consider a large family of parameterized ODEs with continuous-in-time parameters, which include time-dependent neural ODEs. We derive a generalization bound for this class by a Lipschitz-based argument. By leveraging the analogy between neural ODEs and deep residual…

    Submitted 11 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023, 21 pages, 2 figures

  10. arXiv:2304.09576  [pdf, other]

    math.OC cs.LG stat.ML

    Leveraging the two timescale regime to demonstrate convergence of neural networks

    Authors: Pierre Marion, Raphaël Berthier

    Abstract: We study the training dynamics of shallow neural networks, in a two-timescale regime in which the stepsizes for the inner layer are much smaller than those for the outer layer. In this regime, we prove convergence of the gradient flow to a global optimum of the non-convex optimization problem in a simple univariate setting. The number of neurons need not be asymptotically large for our result to h…

    Submitted 25 October, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023. 34 pages, 10 figures

  11. arXiv:2206.06929  [pdf, other]

    cs.LG stat.ML

    Scaling ResNets in the Large-depth Regime

    Authors: Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert

    Abstract: Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth $L$ increases. No consensus has been reached on how to mitigate this issue, although a widely discussed…

    Submitted 28 February, 2025; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: 48 pages, 15 figures. Accepted to JMLR

  12. arXiv:2109.00269  [pdf, other]

    cs.CL

    Structured Context and High-Coverage Grammar for Conversational Question Answering over Knowledge Graphs

    Authors: Pierre Marion, Paweł Krzysztof Nowak, Francesco Piccinno

    Abstract: We tackle the problem of weakly-supervised conversational Question Answering over large Knowledge Graphs using a neural semantic parsing approach. We introduce a new Logical Form (LF) grammar that can model a wide range of queries on the graph while remaining sufficiently simple to generate supervision data efficiently. Our Transformer-based model takes a JSON-like structure as input, allowing us…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: 16 pages, 1 figure. Accepted to EMNLP 2021

    ACM Class: I.2.7

  13. arXiv:2106.01202  [pdf, other]

    stat.ML cs.LG

    Framing RNN as a kernel method: A neural ODE approach

    Authors: Adeline Fermanian, Pierre Marion, Jean-Philippe Vert, Gérard Biau

    Abstract: Building on the interpretation of a recurrent neural network (RNN) as a continuous-time neural differential equation, we show, under appropriate conditions, that the solution of a RNN can be viewed as a linear function of a specific feature set of the input sequence, known as the signature. This connection allows us to frame a RNN as a kernel method in a suitable reproducing kernel Hilbert space.…

    Submitted 29 October, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: 33 pages, 7 figures, accepted for an oral presentation at NeurIPS 2021

  14. arXiv:1707.04796  [pdf, other]

    cs.CV cs.RO

    LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes

    Authors: Pat Marion, Peter R. Florence, Lucas Manuelli, Russ Tedrake

    Abstract: Deep neural network (DNN) architectures have been shown to outperform traditional pipelines for object segmentation and pose estimation using RGBD data, but the performance of these DNN pipelines is directly tied to how representative the training data is of the true data. Hence a key requirement for employing these methods in practice is to have a large set of labeled data for your specific robot…

    Submitted 26 September, 2017; v1 submitted 15 July, 2017; originally announced July 2017.