Skip to main content

Showing 1–12 of 12 results for author: Carmon, Y

Searching in archive stat. Search in all archives.
.
  1. arXiv:2402.10898  [pdf, other

    math.OC cs.LG stat.ML

    The Price of Adaptivity in Stochastic Convex Optimization

    Authors: Yair Carmon, Oliver Hinder

    Abstract: We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty in these parameters. When the initial distance to the optimum is unknown but a gradient norm bound is known, we show… ▽ More

    Submitted 27 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2024; to appear in proceedings as an extended abstract

  2. arXiv:2305.13064  [pdf, other

    cs.LG math.OC stat.ML

    Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond

    Authors: Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Yair Carmon

    Abstract: Recent research shows that when Gradient Descent (GD) is applied to neural networks, the loss almost never decreases monotonically. Instead, the loss oscillates as gradient descent converges to its ''Edge of Stability'' (EoS). Here, we find a quantity that does decrease monotonically throughout GD training: the sharpness attained by the gradient flow solution (GFS)-the solution that would be obtai… ▽ More

    Submitted 22 May, 2023; originally announced May 2023.

  3. arXiv:2301.00457  [pdf, other

    math.OC cs.CR cs.DS cs.LG stat.ML

    ReSQueing Parallel and Private Stochastic Convex Optimization

    Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

    Abstract: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO obj… ▽ More

    Submitted 27 October, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

  4. arXiv:2211.15724  [pdf, other

    cs.LG stat.ML

    Malign Overfitting: Interpolation Can Provably Preclude Invariance

    Authors: Yoav Wald, Gal Yona, Uri Shalit, Yair Carmon

    Abstract: Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e. interpolate) the training data. This suggests that the phe… ▽ More

    Submitted 3 July, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

  5. arXiv:2205.02160  [pdf, other

    math.OC cs.LG stat.ML

    Making SGD Parameter-Free

    Authors: Yair Carmon, Oliver Hinder

    Abstract: We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to t… ▽ More

    Submitted 1 March, 2024; v1 submitted 4 May, 2022; originally announced May 2022.

  6. arXiv:2107.04649  [pdf, other

    cs.LG stat.ML

    Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization

    Authors: John Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, Ludwig Schmidt

    Abstract: For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribut… ▽ More

    Submitted 7 October, 2021; v1 submitted 9 July, 2021; originally announced July 2021.

  7. arXiv:2010.05893  [pdf, other

    math.OC cs.LG stat.ML

    Large-Scale Methods for Distributionally Robust Optimization

    Authors: Daniel Levy, Yair Carmon, John C. Duchi, Aaron Sidford

    Abstract: We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $χ^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $χ^2$ uncertainty sets these are the first such… ▽ More

    Submitted 10 December, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: 63 pages, NeurIPS 2020

  8. arXiv:2006.13476  [pdf, other

    cs.LG math.OC stat.ML

    Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

    Abstract: We design an algorithm which finds an $ε$-approximate stationary point (with $\|\nabla F(x)\|\le ε$) using $O(ε^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and---surprisingly---tha… ▽ More

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted to CONFERENCE ON LEARNING THEORY (COLT) 2020

  9. arXiv:1912.02365  [pdf, other

    math.OC cs.IT cs.LG stat.ML

    Lower Bounds for Non-Convex Stochastic Optimization

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

    Abstract: We lower bound the complexity of finding $ε$-stationary points (with gradient norm at most $ε$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $ε^{-4}$ queries to find an… ▽ More

    Submitted 27 February, 2022; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Correction to hard instance dimensions in Theorem 3

  10. arXiv:1905.13736  [pdf, other

    stat.ML cs.CV cs.LG

    Unlabeled Data Improves Adversarial Robustness

    Authors: Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

    Abstract: We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high… ▽ More

    Submitted 13 January, 2022; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: Corrected some math typos in the proof of Lemma 1

  11. arXiv:1903.02675  [pdf, other

    cs.LG cs.DS math.OC stat.ML

    A Rank-1 Sketch for Matrix Multiplicative Weights

    Authors: Yair Carmon, John C. Duchi, Aaron Sidford, Kevin Tian

    Abstract: We show that a simple randomized sketch of the matrix multiplicative weight (MMW) update enjoys (in expectation) the same regret bounds as MMW, up to a small constant factor. Unlike MMW, where every step requires full matrix exponentiation, our steps require only a single product of the form $e^A b$, which the Lanczos method approximates efficiently. Our key technique is to view the sketch as a… ▽ More

    Submitted 12 August, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

  12. arXiv:1605.08361  [pdf, ps, other

    stat.ML cs.LG cs.NE

    No bad local minima: Data independent training error guarantees for multilayer neural networks

    Authors: Daniel Soudry, Yair Carmon

    Abstract: We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for a MNN with one hidden layer, the training error is zero at every differentiable local minim… ▽ More

    Submitted 30 May, 2016; v1 submitted 26 May, 2016; originally announced May 2016.