Showing 1–5 of 5 results for author: Mousavi-Hosseini, A

  1. arXiv:2503.11272

    stat.ML cs.LG

    When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective

    Authors: Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu

    Abstract: Theoretical efforts to prove advantages of Transformers in comparison with classical architectures such as feedforward and recurrent neural networks have mostly focused on representational power. In this work, we take an alternative perspective and prove that even with infinite compute, feedforward and recurrent networks may suffer from larger sample complexity compared to Transformers, as the lat…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 43 pages, 2 figures

  2. arXiv:2410.16449

    stat.ML cs.LG

    Robust Feature Learning for Multi-Index Models in High Dimensions

    Authors: Alireza Mousavi-Hosseini, Adel Javanmard, Murat A. Erdogdu

    Abstract: Recently, there have been numerous studies on feature learning with neural networks, specifically on learning single- and multi-index models where the target is a function of a low-dimensional projection of the input. Prior works have shown that in high dimensions, the majority of the compute and data resources are spent on recovering the low-dimensional projection; once this subspace is recovered…

    Submitted 27 March, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

    Comments: 41 pages, 1 figure. To appear in the International Conference on Learning Representations (ICLR), 2025

  3. arXiv:2408.07254

    stat.ML cs.LG

    Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

    Authors: Alireza Mousavi-Hosseini, Denny Wu, Murat A. Erdogdu

    Abstract: We study the problem of learning multi-index models in high dimensions using a two-layer neural network trained with the mean-field Langevin algorithm. Under mild distributional assumptions on the data, we characterize the effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity by utilizing the adaptivity of neural networks to latent low-dimensional structures…

    Submitted 26 March, 2025; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 36 pages, 2 figures. To appear in the International Conference on Learning Representations (ICLR), 2025

  4. arXiv:2309.03843

    stat.ML cs.LG

    Gradient-Based Feature Learning under Structured Data

    Authors: Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu

    Abstract: Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In…

    Submitted 7 September, 2023; originally announced September 2023.

  5. arXiv:2209.14863

    stat.ML cs.LG

    Neural Networks Efficiently Learn Low-Dimensional Representations with SGD

    Authors: Alireza Mousavi-Hosseini, Sejun Park, Manuela Girotti, Ioannis Mitliagkas, Murat A. Erdogdu

    Abstract: We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y=g(\langle\boldsymbol{u_1},\boldsymbol{x}\rangle,...,\langle\boldsymbol{u_k},\boldsymbol{x}\rangle)$ with a noisy link function $g$. We prove…

    Submitted 15 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: 39 pages, 2 figures. To appear in the International Conference on Learning Representations (ICLR), 2023