
Showing 1–43 of 43 results for author: Kakade, S M

Searching in archive math.
  1. arXiv:2406.08466

    cs.LG cs.AI math.ST stat.ML

    Scaling Laws in Linear Regression: Compute, Parameters, and Data

    Authors: Licong Lin, Jingfeng Wu, Sham M. Kakade, Peter L. Bartlett, Jason D. Lee

    Abstract: Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, wh…

    Submitted 10 June, 2025; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: fixed typos

  2. arXiv:2404.12376

    cs.LG math.OC stat.ML

    Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent

    Authors: Yiwen Kou, Zixiang Chen, Quanquan Gu, Sham M. Kakade

    Abstract: The $k$-sparse parity problem is a classical problem in computational complexity and algorithmic theory, serving as a key benchmark for understanding computational classes. In this paper, we solve the $k$-sparse parity problem with sign stochastic gradient descent, a variant of stochastic gradient descent (SGD) on two-layer fully-connected neural networks. We demonstrate that this approach can eff…

    Submitted 5 December, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 37 pages, 7 figures, 3 tables. In NeurIPS 2024

  3. arXiv:2303.02255

    cs.LG math.OC stat.ML

    Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

    Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspec…

    Submitted 26 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: ICML 2023 camera ready

  4. arXiv:2210.04157

    cs.LG cs.AI math.OC stat.ML

    The Role of Coverage in Online Reinforcement Learning

    Authors: Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade

    Abstract: Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning. While such conditions might seem irrelevant to online reinforcement learning at first glance, we establish a new connection by showing -- somewhat surprisingly -- that the mere existence of a data…

    Submitted 8 October, 2022; originally announced October 2022.

  5. arXiv:2210.03137

    cs.LG math.OC

    Deep Inventory Management

    Authors: Dhruv Madeka, Kari Torkkola, Carson Eisenach, Anna Luo, Dean P. Foster, Sham M. Kakade

    Abstract: This work provides a Deep Reinforcement Learning approach to solving a periodic review inventory control system with stochastic vendor lead times, lost sales, correlated demand, and price matching. While this dynamic program has historically been considered intractable, our results show that several policy learning approaches are competitive with or outperform classical methods. In order to train…

    Submitted 28 November, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

  6. arXiv:2208.01857

    cs.LG math.OC stat.ML

    The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

    Authors: Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 32 pages, 1 figure, 1 table

  7. arXiv:2203.03159

    cs.LG math.OC stat.ML

    Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. Most existing generalization analyses are made for single-pass SGD, which is a less practical variant compared to the commonly-used multi-pass SGD. Besides, theoretical analyses for multi-pass SGD often concern a worst-case instance in a class of problems, which…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 28 pages, 2 figures

  8. arXiv:2112.13487

    cs.LG math.OC math.ST stat.ML

    The Statistical Complexity of Interactive Decision Making

    Authors: Dylan J. Foster, Sham M. Kakade, Jian Qian, Alexander Rakhlin

    Abstract: A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher…

    Submitted 11 July, 2023; v1 submitted 26 December, 2021; originally announced December 2021.

    Comments: Minor improvements to writing and organization

  9. arXiv:2110.06198

    cs.LG math.OC stat.ML

    Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

    Authors: Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) has been shown to generalize well in many deep learning applications. In practice, one often runs SGD with a geometrically decaying stepsize, i.e., a constant initial stepsize followed by multiple geometric stepsize decays, and uses the last iterate as the output. This kind of SGD is known to be nearly minimax optimal for classical finite-dimensional linear regress…

    Submitted 11 July, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: 35 pages, 2 figures, 1 table. In ICML 2022

  10. arXiv:2108.04552

    cs.LG math.OC stat.ML

    The Benefits of Implicit Regularization from SGD in Least Squares Problems

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make s…

    Submitted 10 July, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: 33 pages, 1 figure. In NeurIPS 2021

  11. arXiv:2107.02377

    cs.LG cs.AI math.OC stat.ML

    A Short Note on the Relationship of Information Gain and Eluder Dimension

    Authors: Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei

    Abstract: Eluder dimension and information gain are two widely used methods of complexity measures in bandit and reinforcement learning. Eluder dimension was originally proposed as a general complexity measure of function classes, but the common examples of where it is known to be small are function spaces (vector spaces). In these cases, the primary tool to upper bound the eluder dimension is the elliptic…

    Submitted 6 July, 2021; originally announced July 2021.

  12. arXiv:2103.12692

    cs.LG math.OC stat.ML

    Benign Overfitting of Constant-Stepsize SGD for Linear Regression

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: There is an increasing realization that algorithmic inductive biases are central in preventing overfitting; empirically, we often see a benign overfitting phenomenon in overparameterized settings for natural learning algorithms, such as stochastic gradient descent (SGD), where little to no explicit regularization has been employed. This work considers this issue in arguably the most basic setting:…

    Submitted 12 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: 56 pages, 2 figures. A short version is accepted at the 34th Annual Conference on Learning Theory (COLT 2021)

  13. arXiv:2103.10897

    cs.LG cs.AI math.OC stat.ML

    Bilinear Classes: A Structural Framework for Provable Generalization in RL

    Authors: Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

    Abstract: This work introduces Bilinear Classes, a new structural framework, which permits generalization in reinforcement learning in a wide variety of settings through the use of function approximation. The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the opti…

    Submitted 11 July, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

    Comments: Expanded extension section to include generalized linear Bellman complete and changed related work

  14. arXiv:2103.04947

    cs.LG cs.AI math.OC stat.ML

    Instabilities of Offline RL with Pre-Trained Neural Representation

    Authors: Ruosong Wang, Yifan Wu, Ruslan Salakhutdinov, Sham M. Kakade

    Abstract: In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated. Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold, else…

    Submitted 8 March, 2021; originally announced March 2021.

  15. arXiv:2010.11895

    cs.LG cs.AI math.OC stat.ML

    What are the Statistical Limits of Offline RL with Linear Function Approximation?

    Authors: Ruosong Wang, Dean P. Foster, Sham M. Kakade

    Abstract: Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation methods (to deal with the curse of dimensionality) can provide a means to help alleviate the excessive sample complexity burden in modern sequential decision making p…

    Submitted 22 October, 2020; originally announced October 2020.

  16. arXiv:2007.07461

    cs.LG cs.GT cs.MA math.OC stat.ML

    Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

    Authors: Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang

    Abstract: Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and the planning phases, and avoids the non-stationarity problem when all agents are improving their policies simultaneously using samples. Though intu…

    Submitted 8 August, 2023; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: Updated version accepted to Journal of Machine Learning Research (JMLR)

  17. arXiv:2006.12484

    cs.LG cs.AI math.OC stat.ML

    Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

    Authors: Chi Jin, Sham M. Kakade, Akshay Krishnamurthy, Qinghua Liu

    Abstract: Partial observability is a common challenge in many reinforcement learning applications, which requires an agent to maintain memory, infer latent states, and integrate this past information into exploration. This challenge leads to a number of computational and statistical hardness results for learning general Partially Observable Markov Decision Processes (POMDPs). This work shows that these hard…

    Submitted 24 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: To appear at NeurIPS 2020 as a spotlight

  18. arXiv:2005.00527

    cs.LG cs.AI math.OC stat.ML

    Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?

    Authors: Ruosong Wang, Simon S. Du, Lin F. Yang, Sham M. Kakade

    Abstract: Learning to plan for long horizons is a central challenge in episodic reinforcement learning problems. A fundamental question is to understand how the difficulty of the problem scales as the horizon increases. Here the natural measure of sample complexity is a normalized one: we are interested in the number of episodes it takes to provably discover a policy whose value is $\varepsilon$ near to tha…

    Submitted 9 July, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

  19. arXiv:2002.09434

    cs.LG math.OC stat.ML

    Few-Shot Learning via Learning the Representation, Provably

    Authors: Simon S. Du, Wei Hu, Sham M. Kakade, Jason D. Lee, Qi Lei

    Abstract: This paper studies few-shot learning via representation learning, where one uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task for which there is only $n_2 (\ll n_1)$ data. Specifically, we focus on the setting where there exists a good common representation between source and target, and our goal is to understa…

    Submitted 30 March, 2021; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: ICLR 2021

  20. arXiv:1911.12568

    cs.LG math.ST stat.ML

    Optimal Estimation of Change in a Population of Parameters

    Authors: Ramya Korlakai Vinayak, Weihao Kong, Sham M. Kakade

    Abstract: Paired estimation of change in parameters of interest over a population plays a central role in several application domains including those in the social sciences, epidemiology, medicine and biology. In these domains, the size of the population under study is often very large, however, the number of observations available per individual in the population is very small (sparse observations)…

    Submitted 28 November, 2019; originally announced November 2019.

  21. arXiv:1910.03016

    cs.LG cs.AI math.OC stat.ML

    Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

    Authors: Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

    Abstract: Modern deep learning methods provide effective means to learn good representations. However, is a good representation itself sufficient for sample efficient reinforcement learning? This question has largely been studied only with respect to (worst-case) approximation error, in the more classical approximate dynamic programming literature. With regard to the statistical viewpoint, this question is…

    Submitted 27 February, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: To appear in ICLR 2020

  22. arXiv:1904.12838

    cs.LG math.OC stat.ML

    The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares

    Authors: Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli

    Abstract: Minimax optimal convergence rates for classes of stochastic convex optimization problems are well characterized, where the majority of results utilize iterate averaged stochastic gradient descent (SGD) with polynomially decaying step sizes. In contrast, SGD's final iterate behavior has received much less attention despite their widespread use in practice. Motivated by this observation, this work p…

    Submitted 29 October, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

    Comments: Appears in the proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2019. 28 pages, 4 tables, 1 Algorithm, 7 figures

  23. arXiv:1902.08721

    cs.LG eess.SY math.OC stat.ML

    Online Control with Adversarial Disturbances

    Authors: Naman Agarwal, Brian Bullins, Elad Hazan, Sham M. Kakade, Karan Singh

    Abstract: We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is one of regret: we desire an online control procedure that can do nearly as well as that of a procedure that has full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this pro…

    Submitted 22 February, 2019; originally announced February 2019.

  24. arXiv:1902.04811

    cs.LG math.OC stat.ML

    On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points

    Authors: Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

    Abstract: Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a gap has arisen between theory and practice. Indeed, traditional analyses of GD and SGD…

    Submitted 3 September, 2019; v1 submitted 13 February, 2019; originally announced February 2019.

    Comments: A preliminary version of this paper, with a subset of the results that are presented here, was presented at ICML 2017 (also as arXiv:1703.00887)

  25. arXiv:1902.04553

    math.ST cs.LG stat.ML

    Maximum Likelihood Estimation for Learning Populations of Parameters

    Authors: Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade

    Abstract: Consider a setting with $N$ independent individuals, each with an unknown parameter, $p_i \in [0, 1]$ drawn from some unknown distribution $P^\star$. After observing the outcomes of $t$ independent Bernoulli trials, i.e., $X_i \sim \text{Binomial}(t, p_i)$ per individual, our objective is to accurately estimate $P^\star$. This problem arises in numerous domains, including the social sciences, psyc…

    Submitted 12 February, 2019; originally announced February 2019.

  26. arXiv:1902.03736

    math.PR cs.LG stat.ML

    A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm

    Authors: Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

    Abstract: In this note, we derive concentration inequalities for random vectors with subGaussian norm (a generalization of both subGaussian random vectors and norm bounded random vectors), which are tight up to logarithmic factors.

    Submitted 11 February, 2019; originally announced February 2019.

  27. arXiv:1902.03228

    stat.ML cs.LG math.OC

    A Smoother Way to Train Structured Prediction Models

    Authors: Krishna Pillutla, Vincent Roulet, Sham M. Kakade, Zaid Harchaoui

    Abstract: We present a framework to train a structured prediction model by performing smoothing on the inference algorithm it builds upon. Smoothing overcomes the non-smoothness inherent to the maximum margin structured prediction objective, and paves the way for the use of fast primal gradient-based optimization algorithms. We illustrate the proposed framework by developing a novel primal incremental optim…

    Submitted 8 February, 2019; originally announced February 2019.

    Comments: Short version appeared in Neural Information Processing Systems (NeurIPS) 2018

  28. arXiv:1803.05591

    cs.LG math.OC stat.ML

    On the insufficiency of existing momentum schemes for Stochastic Optimization

    Authors: Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade

    Abstract: Momentum based stochastic gradient methods such as heavy ball (HB) and Nesterov's accelerated gradient descent (NAG) method are widely used in practice for training deep networks and other supervised learning models, as they often provide significant improvements over stochastic gradient descent (SGD). Rigorously speaking, "fast gradient" methods have provable improvements over gradient descent on…

    Submitted 31 July, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

    Comments: 28 pages, 10 figures. Updated acknowledgements. Appeared as an oral presentation at International Conference on Learning Representations (ICLR), 2018. Code implementing the ASGD method can be found at https://github.com/rahulkidambi/AccSGD

  29. arXiv:1710.09430

    stat.ML cs.LG math.OC

    A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

    Authors: Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, Aaron Sidford

    Abstract: This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addre…

    Submitted 21 July, 2018; v1 submitted 25 October, 2017; originally announced October 2017.

    Comments: Lemma 1 has been updated in v2

  30. arXiv:1704.08227

    stat.ML cs.LG math.OC math.ST

    Accelerating Stochastic Gradient Descent For Least Squares Regression

    Authors: Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

    Abstract: There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stoc…

    Submitted 31 July, 2018; v1 submitted 26 April, 2017; originally announced April 2017.

    Comments: 54 pages, 3 figures, 1 table; updated acknowledgements, minor title change. Paper appeared in the proceedings of the Conference on Learning Theory (COLT), 2018

  31. arXiv:1703.00887

    cs.LG math.OC stat.ML

    How to Escape Saddle Points Efficiently

    Authors: Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan

    Abstract: This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are no…

    Submitted 2 March, 2017; originally announced March 2017.

  32. arXiv:1605.08754

    cs.DS cs.LG math.NA math.OC

    Faster Eigenvector Computation via Shift-and-Invert Preconditioning

    Authors: Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford

    Abstract: We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $\Sigma$ -- i.e. computing a unit vector $x$ such that $x^T \Sigma x \ge (1-\epsilon)\lambda_1(\Sigma)$: Offline Eigenvector Estimation: Given an explicit $A \in \mathbb{R}^{n \times d}$ with $\Sigma = A^T A$, we show how to compute an $\epsilon$ approximate top eigenvector in time…

    Submitted 25 May, 2016; originally announced May 2016.

    Comments: Appearing in ICML 2016. Combination of work in arXiv:1509.05647 and arXiv:1510.08896

  33. arXiv:1605.08370

    cs.LG math.OC stat.ML

    Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent

    Authors: Chi Jin, Sham M. Kakade, Praneeth Netrapalli

    Abstract: Matrix completion, where we wish to recover a low rank matrix by observing a few entries from it, is a widely studied problem in both theory and practice with wide applications. Most of the provable algorithms so far on this problem have been restricted to the offline setting where they provide an estimate of the unknown matrix using all observations simultaneously. However, in many applications,…

    Submitted 26 May, 2016; originally announced May 2016.

  34. arXiv:1604.03930

    cs.LG math.OC stat.ML

    Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis

    Authors: Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford

    Abstract: This paper considers the problem of canonical-correlation analysis (CCA) (Hotelling, 1936) and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. These are two fundamental problems in data analysis and scientific computing with numerous applications in machine learning and statistics (Shi and Malik, 2000; Hardoon et al., 2004; Witten et al., 2009). We provide si…

    Submitted 27 May, 2016; v1 submitted 13 April, 2016; originally announced April 2016.

    Comments: International Conference on Machine Learning (ICML) 2016

  35. arXiv:1510.08896

    cs.DS cs.LG math.NA math.OC

    Robust Shift-and-Invert Preconditioning: Faster and More Sample Efficient Algorithms for Eigenvector Computation

    Authors: Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford

    Abstract: We provide faster algorithms and improved sample complexities for approximating the top eigenvector of a matrix. Offline Setting: Given an $n \times d$ matrix $A$, we show how to compute an $\epsilon$ approximate top eigenvector in time $\tilde O ( [nnz(A) + \frac{d \cdot sr(A)}{gap^2}]\cdot \log 1/\epsilon)$ and $\tilde O([\frac{nnz(A)^{3/4} (d \cdot sr(A))^{1/4}}{\sqrt{gap}}]\cdot \log 1/\epsilon)$. Here $sr(A)$ is…

    Submitted 29 May, 2016; v1 submitted 29 October, 2015; originally announced October 2015.

    Comments: Manuscript outdated. Updated version at arXiv:1605.08754

  36. arXiv:1507.05854

    math.NA cs.DS math.OC

    Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot

    Authors: Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli

    Abstract: While there has been a significant amount of work studying gradient descent techniques for non-convex optimization problems over the last few years, all existing results establish either local convergence with good rates or global convergence with highly suboptimal rates, for many problems of interest. In this paper, we take the first step in getting the best of both worlds -- establishing global…

    Submitted 9 March, 2017; v1 submitted 21 July, 2015; originally announced July 2015.

    Comments: Appeared in AISTATS 2017

  37. arXiv:1211.5414

    cs.DS cs.LG math.NA stat.ML

    Analysis of a randomized approximation scheme for matrix multiplication

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: This note gives a simple analysis of a randomized approximation scheme for matrix multiplication proposed by Sarlos (2006) based on a random rotation followed by uniform column sampling. The result follows from a matrix version of Bernstein's inequality and a tail inequality for quadratic forms in subgaussian random vectors.

    Submitted 23 November, 2012; originally announced November 2012.

  38. arXiv:1210.7559

    cs.LG math.NA stat.ML

    Tensor decompositions for learning latent variable models

    Authors: Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

    Abstract: This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to…

    Submitted 13 November, 2014; v1 submitted 29 October, 2012; originally announced October 2012.

    Journal ref: Journal of Machine Learning Research, 15(Aug):2773-2832, 2014

  39. arXiv:1110.2842

    math.PR cs.LG

    A tail inequality for quadratic forms of subgaussian random vectors

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: We prove an exponential probability tail inequality for positive semidefinite quadratic forms in a subgaussian random vector. The bound is analogous to one that holds when the vector has independent Gaussian entries.

    Submitted 13 October, 2011; originally announced October 2011.

  40. arXiv:1107.1744

    math.OC cs.LG eess.SY

    Stochastic convex optimization with bandit feedback

    Authors: Alekh Agarwal, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Alexander Rakhlin

    Abstract: This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \mathcal{X}$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm'…

    Submitted 8 October, 2011; v1 submitted 8 July, 2011; originally announced July 2011.

  41. arXiv:1106.2363

    math.ST cs.AI cs.LG stat.ML

    Random design analysis of ridge regression

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the "out-of-sample" prediction error, as opposed to the "in-sample" (fixed design) error. The analysis also reveals the effect of…

    Submitted 24 March, 2014; v1 submitted 12 June, 2011; originally announced June 2011.

  42. arXiv:1104.1672

    math.PR cs.LG stat.ML

    Dimension-free tail inequalities for sums of random matrices

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: We derive exponential tail inequalities for sums of random matrices with no dependence on the explicit matrix dimensions. These are similar to the matrix versions of the Chernoff bound and Bernstein inequality except with the explicit matrix dimensions replaced by a trace quantity that can be small even when the dimension is large or infinite. Some applications to principal component analysis and…

    Submitted 16 April, 2011; v1 submitted 9 April, 2011; originally announced April 2011.

  43. arXiv:1011.1518

    stat.ML cs.LG math.NA

    Robust Matrix Decomposition with Outliers

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: Suppose a given observation matrix can be decomposed as the sum of a low-rank matrix and a sparse matrix (outliers), and the goal is to recover these individual components from the observed sum. Such additive decompositions have applications in a variety of numerical problems including system identification, latent variable graphical modeling, and principal components analysis. We study conditions…

    Submitted 3 December, 2010; v1 submitted 5 November, 2010; originally announced November 2010.

    Comments: Corrected comparisons to previous work of Candès et al. (2009)