Showing 1–7 of 7 results for author: Abedsoltan, A

  1. arXiv:2502.08991  [pdf, ps, other]

    cs.LG stat.ML

    Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks?

    Authors: Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen, Hongzhou Lin, Jingzhao Zhang, Mikhail Belkin

    Abstract: Large language models (LLMs) exhibit remarkable task generalization, solving tasks they were never explicitly trained on with only a few demonstrations. This raises a fundamental question: When can learning from a small set of tasks generalize to a large task family? In this paper, we investigate task generalization through the lens of autoregressive compositional structure, where each task is a c…

    Submitted 8 June, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

  2. arXiv:2411.16658  [pdf, other]

    stat.ML cs.LG

    Fast training of large kernel models with delayed projections

    Authors: Amirhesam Abedsoltan, Siyuan Ma, Parthe Pandit, Mikhail Belkin

    Abstract: Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes--a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradie…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: arXiv admin note: text overlap with arXiv:2302.02605

  3. arXiv:2410.12783  [pdf, other]

    cs.LG stat.ML

    Context-Scaling versus Task-Scaling in In-Context Learning

    Authors: Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, Jingfeng Wu, Mikhail Belkin

    Abstract: Transformers exhibit In-Context Learning (ICL), where these models solve new tasks by using examples in the prompt without additional training. In our work, we identify and analyze two key components of ICL: (1) context-scaling, where model performance improves as the number of in-context examples increases and (2) task-scaling, where model performance improves as the number of pre-training tasks…

    Submitted 16 October, 2024; originally announced October 2024.

  4. arXiv:2312.03311  [pdf, other]

    stat.ML cs.LG

    On the Nystrom Approximation for Preconditioning in Kernel Machines

    Authors: Amirhesam Abedsoltan, Parthe Pandit, Luis Rademacher, Mikhail Belkin

    Abstract: Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative in nature, but convergence can be slow due to poor conditioning. Spectral preconditioning is an important tool to speed up the convergence of such iterative algorithms for training kernel models. However, computing and storing a spectral precondi…

    Submitted 24 January, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  5. arXiv:2306.02533  [pdf, ps, other]

    cs.LG stat.ML

    On Emergence of Clean-Priority Learning in Early Stopped Neural Networks

    Authors: Chaoyue Liu, Amirhesam Abedsoltan, Mikhail Belkin

    Abstract: When random label noise is added to a training dataset, the prediction error of a neural network on a label-noise-free test dataset initially improves during early training but eventually deteriorates, following a U-shaped dependence on training time. This behaviour is believed to be a result of neural networks learning the pattern of clean data first and fitting the noise later in the training, a…

    Submitted 4 June, 2023; originally announced June 2023.

  6. arXiv:2302.02605  [pdf, other]

    cs.LG stat.ML

    Toward Large Kernel Models

    Authors: Amirhesam Abedsoltan, Mikhail Belkin, Parthe Pandit

    Abstract: Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas i…

    Submitted 19 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Code is available at github.com/EigenPro/EigenPro3

  7. arXiv:2207.06569  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

    Authors: Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran

    Abstract: The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body…

    Submitted 15 July, 2024; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: NM and JS co-first authors