
Showing 1–3 of 3 results for author: Oncescu, C

Searching in archive cs.
  1. arXiv:2506.15535

    cs.LG math.OC stat.ML

    A Simplified Analysis of SGD for Linear Regression with Weight Averaging

    Authors: Alexandru Meterez, Depen Morwani, Costin-Andrei Oncescu, Jingfeng Wu, Cengiz Pehlevan, Sham Kakade

    Abstract: Theoretically understanding stochastic gradient descent (SGD) in overparameterized models has led to the development of several optimization algorithms that are widely used in practice today. Recent work by Zou et al. (2021) provides sharp rates for SGD optimization in linear regression using a constant learning rate, both with and without tail iterate averaging, based on a bias-variance decompo…

    Submitted 18 June, 2025; originally announced June 2025.

  2. arXiv:2410.12982

    cs.LG cs.AI

    Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond

    Authors: Costin-Andrei Oncescu, Sanket Purandare, Stratos Idreos, Sham Kakade

    Abstract: While transformers have been at the core of most recent advancements in sequence generative models, their computational cost remains quadratic in sequence length. Several subquadratic architectures have been proposed to address this computational issue. Some of them, including long convolution sequence models (LCSMs), such as Hyena, address this issue at training time but remain quadratic during i…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 15 pages, 9 figures, 5 algorithms

  3. arXiv:2311.07568

    cs.LG

    Feature emergence via margin maximization: case studies in algebraic tasks

    Authors: Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade

    Abstract: Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning. While there have been significant recent strides in some cases towards understanding how neural networks implement specific target functions, this paper explores a complementary question -- why do networks arrive at particular computational strategies? Our inquiry fo…

    Submitted 19 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted as Spotlight at ICLR 2024

    ACM Class: I.5.1; I.2.6
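The first result concerns SGD for linear regression with a constant learning rate, with and without tail iterate averaging. Below is a minimal Python sketch of that procedure. It rests on assumptions of our own: single-pass SGD on the squared loss, zero initialization, and averaging over all iterates after a chosen step. The names `sgd_tail_average`, `make_synthetic_data`, and `tail_start` are hypothetical. This is an illustration only, not the paper's analysis or code.

```python
import random


def make_synthetic_data(w_star, n, noise_std, seed=0):
    """Hypothetical helper: n samples with x ~ N(0, I) and y = <w_star, x> + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        y = sum(a * b for a, b in zip(w_star, x)) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data


def sgd_tail_average(data, dim, lr, tail_start):
    """Single-pass constant-step-size SGD on the loss 0.5 * (<w, x> - y)^2.

    Returns (last_iterate, tail_average). The tail average is the mean of the
    iterates produced after step `tail_start` (steps are counted from 1).
    """
    w = [0.0] * dim                      # assumed zero initialization
    tail_sum = [0.0] * dim
    tail_count = 0
    for t, (x, y) in enumerate(data, start=1):
        # Stochastic gradient of the squared loss at one sample: (<w, x> - y) * x
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
        if t > tail_start:
            # Accumulate only the tail iterates for averaging
            tail_sum = [s + wi for s, wi in zip(tail_sum, w)]
            tail_count += 1
    if tail_count == 0:
        raise ValueError("tail_start must be smaller than the number of samples")
    tail_avg = [s / tail_count for s in tail_sum]
    return w, tail_avg


if __name__ == "__main__":
    w_star = [1.0, -2.0, 0.5]
    data = make_synthetic_data(w_star, n=4000, noise_std=0.1, seed=1)
    last, avg = sgd_tail_average(data, dim=3, lr=0.05, tail_start=2000)
    print("last iterate:", last)
    print("tail average:", avg)
```

With a constant step size, the last iterate keeps fluctuating around the solution at a noise floor set by the step size. Averaging the tail iterates damps that fluctuation. This is the practical difference between the "with" and "without" averaging settings that the abstract mentions.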