
Showing 1–4 of 4 results for author: Medapati, S

Searching in archive stat.
  1. arXiv:2505.13397

    cs.LG math.NA stat.ML

    Learning by solving differential equations

    Authors: Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Sourabh Medapati, Javier Gonzalvo

    Abstract: Modern deep learning algorithms use variations of gradient descent as their main learning methods. Gradient descent can be understood as the simplest Ordinary Differential Equation (ODE) solver; namely, the Euler method applied to the gradient flow differential equation. Since Euler, many ODE solvers have been devised that follow the gradient flow equation more precisely and more stably. Runge-Kut…

    Submitted 19 May, 2025; originally announced May 2025.

  2. arXiv:2502.15015

    cs.LG stat.ML

    Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition

    Authors: Priya Kasimbeg, Frank Schneider, Runa Eschenhagen, Juhan Bae, Chandramouli Shama Sastry, Mark Saroufim, Boyuan Feng, Less Wright, Edward Z. Yang, Zachary Nado, Sourabh Medapati, Philipp Hennig, Michael Rabbat, George E. Dahl

    Abstract: The goal of the AlgoPerf: Training Algorithms competition is to evaluate practical speed-ups in neural network training achieved solely by improving the underlying training algorithms. In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, while in the self-tuning ruleset they must be completely hyperparameter-free. In both rulesets, submissions ar…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: ICLR 2025; 23 pages, 5 figures, 8 tables

  3. arXiv:2402.04494

    cs.LG cs.AI stat.ML

    Amortized Planning with Large-Scale Transformers: A Case Study on Chess

    Authors: Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid, Cannada A. Lewis, Joel Veness, Tim Genewein

    Abstract: This paper uses chess, a landmark planning problem in AI, to assess transformers' performance on a planning task where memorization is futile, even at a large scale. To this end, we release ChessBench, a large-scale benchmark dataset of 10 million chess games with legal move and value annotations (15 billion data points) provided by Stockfish 16, the state-of-the-art chess engine.…

    Submitted 21 October, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  4. arXiv:2306.07179

    cs.LG stat.ML

    Benchmarking Neural Network Training Algorithms

    Authors: George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson

    Abstract: Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a communi…

    Submitted 18 June, 2025; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: 102 pages, 8 figures, 41 tables
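The first result's abstract (arXiv:2505.13397) describes gradient descent as the Euler method applied to the gradient flow equation d(theta)/dt = -grad L(theta). The following is a minimal Python sketch of that correspondence. It assumes a hypothetical one-dimensional quadratic loss L(theta) = 0.5 * A * theta^2 with A = 2, and all function names are illustrative, not taken from the paper. The sketch checks that a gradient descent step with learning rate h coincides with an explicit Euler step of size h. It then compares both against a classical fourth-order Runge-Kutta (RK4) step. RK4 is chosen here only as a familiar higher-order solver, since the truncated abstract does not say which Runge-Kutta variants the paper uses.

```python
import math

# Hypothetical toy loss L(theta) = 0.5 * A * theta**2 (illustration only).
A = 2.0


def grad_loss(theta: float) -> float:
    # Gradient of the toy loss: dL/dtheta = A * theta.
    return A * theta


def flow(theta: float) -> float:
    # Right-hand side of the gradient flow ODE: d(theta)/dt = -grad L(theta).
    return -grad_loss(theta)


def gradient_descent_step(theta: float, lr: float) -> float:
    # Plain gradient descent update.
    return theta - lr * grad_loss(theta)


def euler_step(theta: float, h: float) -> float:
    # Explicit Euler step on the gradient flow ODE; algebraically identical to GD with lr = h.
    return theta + h * flow(theta)


def rk4_step(theta: float, h: float) -> float:
    # Classical fourth-order Runge-Kutta step on the same ODE (chosen for illustration).
    k1 = flow(theta)
    k2 = flow(theta + 0.5 * h * k1)
    k3 = flow(theta + 0.5 * h * k2)
    k4 = flow(theta + h * k3)
    return theta + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(step, theta0: float, h: float, n: int) -> float:
    # Apply `step` n times starting from theta0.
    theta = theta0
    for _ in range(n):
        theta = step(theta, h)
    return theta


if __name__ == "__main__":
    theta0, h, n = 1.0, 0.1, 10
    exact = theta0 * math.exp(-A * h * n)  # closed-form gradient flow solution at t = n * h
    print("gradient descent:", integrate(gradient_descent_step, theta0, h, n))
    print("euler:           ", integrate(euler_step, theta0, h, n))
    print("rk4:             ", integrate(rk4_step, theta0, h, n))
    print("exact flow:      ", exact)
```

With h = 0.1 and 10 steps, the Euler step (and therefore gradient descent) multiplies theta by 0.8 each step, ending near 0.8^10 ≈ 0.107, while the exact flow gives e^(-2) ≈ 0.135. The RK4 result lands much closer to the exact value, which illustrates, for this toy case only, the sense in which higher-order solvers "follow the gradient flow equation more precisely."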