Showing 1–50 of 151 results for author: Gu, Q

Searching in archive stat.
  1. arXiv:2503.12020  [pdf, other]

    cs.LG cs.AI stat.ML

    Variance-Dependent Regret Lower Bounds for Contextual Bandits

    Authors: Jiafan He, Quanquan Gu

    Abstract: Variance-dependent regret bounds for linear contextual bandits, which improve upon the classical $\tilde{O}(d\sqrt{K})$ regret bound to $\tilde{O}(d\sqrt{\sum_{k=1}^Kσ_k^2})$, where $d$ is the context dimension, $K$ is the number of rounds, and $σ^2_k$ is the noise variance in round $k$, have been widely studied in recent years. However, most existing works focus on the regret upper bounds instead…

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: 19 pages

  2. arXiv:2503.09565  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization

    Authors: Zixiang Chen, Greg Yang, Qingyue Zhao, Quanquan Gu

    Abstract: Despite deep neural networks' powerful representation learning capabilities, theoretical understanding of how networks can simultaneously achieve meaningful feature learning and global convergence remains elusive. Existing approaches like the neural tangent kernel (NTK) are limited because features stay close to their initialization in this parametrization, leaving open questions about feature pro…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 29 pages, 5 figures, 2 tables

  3. arXiv:2502.14123  [pdf, other]

    cs.LG math.OC stat.ML

    Understanding SGD with Exponential Moving Average: A Case Study in Linear Regression

    Authors: Xuheng Li, Quanquan Gu

    Abstract: Exponential moving average (EMA) has recently gained significant popularity in training modern deep learning models, especially diffusion-based generative models. However, there have been few theoretical results explaining the effectiveness of EMA. In this paper, to better understand EMA, we establish the risk bound of online SGD with EMA for high-dimensional linear regression, one of the simplest…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: 34 pages, 4 figures

  4. arXiv:2502.07460  [pdf, ps, other]

    cs.LG stat.ML

    Logarithmic Regret for Online KL-Regularized Reinforcement Learning

    Authors: Heyang Zhao, Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

    Abstract: Recent advances in Reinforcement Learning from Human Feedback (RLHF) have shown that KL-regularization plays a pivotal role in improving the efficiency of RL fine-tuning for large language models (LLMs). Despite its empirical advantage, the theoretical difference between KL-regularized RL and standard RL remains largely under-explored. While there is a recent line of work on the theoretical analys…

    Submitted 30 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  5. arXiv:2502.06051  [pdf, ps, other]

    cs.LG cs.AI math.ST stat.ML

    Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits

    Authors: Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Tong Zhang, Quanquan Gu

    Abstract: Although many popular reinforcement learning algorithms are underpinned by $f$-divergence regularization, their sample complexity with respect to the \emph{regularized objective} still lacks a tight characterization. In this paper, we analyze $f$-divergence-regularized offline policy learning. For reverse Kullback-Leibler (KL) divergence, arguably the most commonly used one, we give the first…

    Submitted 30 May, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

    Comments: 38 pages

  6. arXiv:2412.19444  [pdf, other]

    cs.LG math.OC stat.ML

    Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

    Authors: Yuanzhe Tao, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu

    Abstract: Optimization algorithms such as AdaGrad and Adam have significantly advanced the training of deep models by dynamically adjusting the learning rate during the optimization process. However, ad hoc tuning of learning rates poses a challenge, leading to inefficiencies in practice. To address this issue, recent research has focused on developing "learning-rate-free" or "parameter-free" algorithms that…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: 34 pages, 16 figures, 3 tables

  7. arXiv:2411.10438  [pdf, other]

    cs.LG math.OC stat.ML

    MARS: Unleashing the Power of Variance Reduction for Training Large Models

    Authors: Huizhuo Yuan, Yifeng Liu, Shuang Wu, Xun Zhou, Quanquan Gu

    Abstract: Training deep neural networks--and more recently, large models--demands efficient and scalable optimizers. Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to this task. Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not…

    Submitted 10 February, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

    Comments: 47 pages, 18 figures, 12 tables

  8. arXiv:2411.04625  [pdf, other]

    cs.LG stat.ML

    Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

    Authors: Heyang Zhao, Chenlu Ye, Quanquan Gu, Tong Zhang

    Abstract: Reverse-Kullback-Leibler (KL) regularization has emerged to be a predominant technique used to enhance policy optimization in reinforcement learning (RL) and reinforcement learning from human feedback (RLHF), which forces the learned policy to stay close to a reference policy. While the effectiveness and necessity of KL-regularization have been empirically demonstrated in various practical scenari…

    Submitted 11 February, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

  9. arXiv:2410.14237  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers

    Authors: Runjia Li, Qiwei Di, Quanquan Gu

    Abstract: Score-based diffusion models have emerged as powerful techniques for generating samples from high-dimensional data distributions. These models involve a two-phase process: first, injecting noise to transform the data distribution into a known prior distribution, and second, sampling to recover the original data distribution from noise. Among the various sampling methods, deterministic samplers sta…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 68 pages

  10. arXiv:2410.02321  [pdf, other]

    cs.LG stat.ML

    Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis

    Authors: Zikun Zhang, Zixiang Chen, Quanquan Gu

    Abstract: Diffusion models have achieved great success in generating high-dimensional samples across various applications. While the theoretical guarantees for continuous-state diffusion models have been extensively studied, the convergence analysis of the discrete-state counterparts remains under-explored. In this paper, we study the theoretical aspects of score-based discrete diffusion models under the Co…

    Submitted 12 April, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: 26 pages, 1 figure

    Journal ref: The Thirteenth International Conference on Learning Representations, 2025

  11. arXiv:2409.02416  [pdf, other]

    cs.LG stat.ML

    Relative-Translation Invariant Wasserstein Distance

    Authors: Binshuai Wang, Qiwei Di, Ming Yin, Mengdi Wang, Quanquan Gu, Peng Wei

    Abstract: We introduce a new family of distances, relative-translation invariant Wasserstein distances ($RW_p$), for measuring the similarity of two probability distributions under distribution shift. Generalizing it from the classical optimal transport model, we show that $RW_p$ distances are also real distance metrics defined on the quotient set $\mathcal{P}_p(\mathbb{R}^n)/\sim$ and invariant to distribu…

    Submitted 3 September, 2024; originally announced September 2024.

  12. arXiv:2405.00675  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Self-Play Preference Optimization for Language Model Alignment

    Authors: Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu

    Abstract: Standard reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model…

    Submitted 4 October, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 27 pages, 4 figures, 5 tables

  13. arXiv:2404.12376  [pdf, other]

    cs.LG math.OC stat.ML

    Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent

    Authors: Yiwen Kou, Zixiang Chen, Quanquan Gu, Sham M. Kakade

    Abstract: The $k$-sparse parity problem is a classical problem in computational complexity and algorithmic theory, serving as a key benchmark for understanding computational classes. In this paper, we solve the $k$-sparse parity problem with sign stochastic gradient descent, a variant of stochastic gradient descent (SGD) on two-layer fully-connected neural networks. We demonstrate that this approach can eff…

    Submitted 5 December, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 37 pages, 7 figures, 3 tables. In NeurIPS 2024

  14. arXiv:2404.06013  [pdf, other]

    cs.LG math.OC stat.ML

    Feel-Good Thompson Sampling for Contextual Dueling Bandits

    Authors: Xuheng Li, Heyang Zhao, Quanquan Gu

    Abstract: Contextual dueling bandits, where a learner compares two options based on context and receives feedback indicating which was preferred, extends classic dueling bandits by incorporating contextual information for decision-making and preference learning. Several algorithms based on the upper confidence bound (UCB) have been proposed for linear contextual dueling bandits. However, no algorithm based…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 30 pages, 6 figures

  15. arXiv:2402.10210  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

    Authors: Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu

    Abstract: Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Re…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 28 pages, 8 figures, 10 tables

  16. arXiv:2402.09401  [pdf, other]

    cs.LG cs.AI cs.CL math.OC stat.ML

    Reinforcement Learning from Human Feedback with Active Queries

    Authors: Kaixuan Ji, Jiafan He, Quanquan Gu

    Abstract: Aligning large language models (LLM) with human preference plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF). Despite their superior performance, current RLHF approaches often require a large amount of human-labelled preference data, which is expensive to collect. In this paper, inspired by the success of active learning,…

    Submitted 11 February, 2025; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 28 pages, 1 figure, 4 tables

  17. arXiv:2402.08998  [pdf, other]

    cs.LG stat.ML

    Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

    Authors: Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu

    Abstract: We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach a certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length for the optimal policy. In this paper, we p…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 28 pages, 1 figure, In ICML 2023

  18. arXiv:2402.08991  [pdf, ps, other]

    stat.ML cs.LG

    Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

    Authors: Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang

    Abstract: This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of model-free RL, where robust least-square regression is often employed for value function estimation. However, these techniques cannot be directly applied to mod…

    Submitted 20 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  19. arXiv:2401.01335  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

    Authors: Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu

    Abstract: Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned…

    Submitted 14 June, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: 22 pages, 6 figures, 7 tables. In ICML 2024

  20. arXiv:2312.16793  [pdf, other]

    cs.LG stat.ML

    Sparse PCA with Oracle Property

    Authors: Quanquan Gu, Zhaoran Wang, Han Liu

    Abstract: In this paper, we study the estimation of the $k$-dimensional sparse principal subspace of covariance matrix $Σ$ in the high-dimensional setting. We aim to recover the oracle principal subspace solution, i.e., the principal subspace estimator obtained assuming the true support is known a priori. To this end, we propose a family of estimators based on the semidefinite relaxation of sparse PCA with…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: 16 pages, 1 table. In NIPS 2014

  21. arXiv:2312.09193  [pdf, other]

    cs.LG cs.AI stat.ML

    Fast Sampling via Discrete Non-Markov Diffusion Models with Predetermined Transition Time

    Authors: Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu

    Abstract: Discrete diffusion models have emerged as powerful tools for high-quality data generation. Despite their success in discrete spaces, such as text generation tasks, the acceleration of discrete diffusion models remains under-explored. In this paper, we propose discrete non-Markov diffusion models (DNDM), which naturally induce the predetermined transition time set. This enables a training-free samp…

    Submitted 5 December, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 36 pages, 5 figures, 13 tables. In NeurIPS 2024

  22. arXiv:2311.15238  [pdf, other]

    cs.LG math.OC stat.ML

    A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

    Authors: Heyang Zhao, Jiafan He, Quanquan Gu

    Abstract: The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: 52 pages, 1 table

  23. arXiv:2311.14222  [pdf, other]

    cs.LG math.OC stat.ML

    Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

    Authors: Xuheng Li, Yihe Deng, Jingfeng Wu, Dongruo Zhou, Quanquan Gu

    Abstract: Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning and often achieves better generalization performance than SGD. However, existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression, which is possibly the simplest se…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: 85 pages, 5 figures

  24. arXiv:2310.18935  [pdf, other]

    cs.LG math.OC stat.ML

    Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data

    Authors: Yiwen Kou, Zixiang Chen, Quanquan Gu

    Abstract: The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well. While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural net…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: 55 pages, 7 figures. In NeurIPS 2023

  25. arXiv:2310.08391  [pdf, other]

    stat.ML cs.LG

    How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

    Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett

    Abstract: Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a stati…

    Submitted 14 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Camera Ready

  26. arXiv:2310.07269  [pdf, other]

    cs.LG math.OC stat.ML

    Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

    Authors: Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu

    Abstract: The challenge of overfitting, in which the model memorizes the training data and fails to generalize to test data, has become increasingly significant in the training of large neural networks. To tackle this challenge, Sharpness-Aware Minimization (SAM) has emerged as a promising training method, which can improve the generalization of neural networks even in the presence of label noise. However,…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 52 pages, 4 figures, 2 tables. In NeurIPS 2023

  27. arXiv:2310.01380  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

    Authors: Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu

    Abstract: Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation has been extensively studied with optimal results achieved under certain assumptions, many works shift their interest to offline RL with non-linear function app…

    Submitted 8 October, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 34 pages, 1 table

  28. arXiv:2310.00968  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

    Authors: Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu

    Abstract: Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems. While substantial efforts have been made to minimize the cumulative regret in dueling bandits, a notable gap in the current research is the absence of regret b…

    Submitted 14 October, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 24 pages, 2 figures. In ICLR 2024

  29. arXiv:2310.00927  [pdf, other]

    cs.LG cs.AI stat.ML

    Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP

    Authors: Zixiang Chen, Yihe Deng, Yuanzhi Li, Quanquan Gu

    Abstract: Multi-modal learning has become increasingly popular due to its ability to leverage information from different data sources (e.g., text and images) to improve the model performance. Recently, CLIP has emerged as an effective approach that employs vision-language contrastive pretraining to learn joint image and text representations and exhibits remarkable performance in zero-shot learning and text-…

    Submitted 10 July, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 31 pages, 7 tables, 6 figures. In ICLR 2024

  30. arXiv:2306.11680  [pdf, other]

    cs.LG math.OC stat.ML

    The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

    Authors: Yuan Cao, Difan Zou, Yuanzhi Li, Quanquan Gu

    Abstract: We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-Ω(\log^2 t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in t…

    Submitted 11 July, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 53 pages, 2 figures

  31. arXiv:2305.08359  [pdf, other]

    cs.LG math.OC stat.ML

    Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

    Authors: Kaixuan Ji, Qingyue Zhao, Jiafan He, Weitong Zhang, Quanquan Gu

    Abstract: Recent studies have shown that episodic reinforcement learning (RL) is no harder than bandits when the total reward is bounded by $1$, and proved regret bounds that have a polylogarithmic dependence on the planning horizon $H$. However, it remains an open question whether such results can be carried over to adversarial RL, where the reward is adversarially chosen at each episode. In this paper, we…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 34 pages

  32. arXiv:2305.08350  [pdf, other]

    cs.LG math.OC stat.ML

    Uniform-PAC Guarantees for Model-Based RL with Bounded Eluder Dimension

    Authors: Yue Wu, Jiafan He, Quanquan Gu

    Abstract: Recently, there has been remarkable progress in reinforcement learning (RL) with general function approximation. However, all these works only provide regret or sample complexity guarantees. It is still an open question if one can achieve stronger performance guarantees, i.e., the uniform probably approximate correctness (Uniform-PAC) guarantee that can imply both a sub-linear regret bound and a p…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 21 pages, 1 table. To appear in UAI 2023

  33. arXiv:2303.10165  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs

    Authors: Junkai Zhang, Weitong Zhang, Quanquan Gu

    Abstract: We study reward-free reinforcement learning (RL) with linear function approximation, where the agent works in two phases: (1) in the exploration phase, the agent interacts with the environment but cannot access the reward; and (2) in the planning phase, the agent is given a reward function and is expected to find a near-optimal policy based on samples collected in the exploration phase. The sample…

    Submitted 14 February, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: 37 pages, 1 figure, 2 tables. In ICML 2023

  34. arXiv:2303.09390  [pdf, other]

    cs.LG stat.ML

    On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

    Authors: Weitong Zhang, Jiafan He, Zhiyuan Fan, Quanquan Gu

    Abstract: We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $ζ>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $ζ$ is dom…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: 28 pages, 2 figures, 2 tables

  35. arXiv:2303.08816  [pdf, other]

    cs.LG stat.ML

    Borda Regret Minimization for Generalized Linear Dueling Bandits

    Authors: Yue Wu, Tao Jin, Hao Lou, Farzad Farnoud, Quanquan Gu

    Abstract: Dueling bandits are widely used to model preferential feedback prevalent in many applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a rich class of generalized linear dueling bandit models, which cov…

    Submitted 25 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 33 pages, 5 figures. This version includes new results for dueling bandits in the adversarial setting

  36. arXiv:2303.08433  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Benefits of Mixup for Feature Learning

    Authors: Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

    Abstract: Mixup, a simple data augmentation method that randomly mixes two data points via linear interpolation, has been extensively applied in various deep learning applications to gain better generalization. However, the theoretical underpinnings of its efficacy are not yet fully understood. In this paper, we aim to seek a fundamental understanding of the benefits of Mixup. We first show that Mixup using…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 72 pages, 4 figures

  37. arXiv:2303.04145  [pdf, other]

    cs.LG math.OC stat.ML

    Benign Overfitting for Two-layer ReLU Convolutional Neural Networks

    Authors: Yiwen Kou, Zixiang Chen, Yuanzhou Chen, Quanquan Gu

    Abstract: Modern deep learning models with great expressive power can be trained to overfit the training data but still generalize well. This phenomenon is referred to as \textit{benign overfitting}. Recently, a few studies have attempted to theoretically understand benign overfitting in neural networks. However, these works are either limited to neural networks with smooth activation functions or to the ne…

    Submitted 3 November, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: 45 pages, 3 figures, 2 tables. In ICML 2023

  38. arXiv:2303.02255  [pdf, other]

    cs.LG math.OC stat.ML

    Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

    Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspec…

    Submitted 26 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: ICML 2023 camera ready

  39. arXiv:2302.10371  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

    Authors: Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

    Abstract: Recently, several studies (Zhou et al., 2021a; Zhang et al., 2021b; Kim et al., 2021; Zhou and Gu, 2022) have provided variance-dependent regret bounds for linear contextual bandits, which interpolate the regret for the worst-case regime and the deterministic reward regime. However, these algorithms are either computationally intractable or unable to handle unknown variance of the noise. In this…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: 43 pages, 2 tables

  40. arXiv:2212.06132  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

    Authors: Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu

    Abstract: We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the d…

    Submitted 3 November, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: 33 pages, 1 table. In ICML 2023

  41. arXiv:2212.05949  [pdf, ps, other]

    stat.ML cs.LG

    Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

    Authors: Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

    Abstract: Despite the significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}ζ)$ regret bound, where $T$ is the number of rounds and $ζ$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and pr…

    Submitted 10 February, 2024; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: We study the corruption-robust MDPs and contextual bandits with general function approximation

    Journal ref: ICML 2023

  42. arXiv:2210.17550  [pdf, other]

    math.OC cs.GT cs.LG stat.ML

    Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization

    Authors: Chris Junchi Li, Angela Yuan, Gauthier Gidel, Quanquan Gu, Michael I. Jordan

    Abstract: We propose a new first-order optimization algorithm -- AcceleratedGradient-OptimisticGradient (AG-OG) Descent Ascent -- for separable convex-concave minimax optimization. The main idea of our algorithm is to carefully leverage the structure of the minimax problem, performing Nesterov acceleration on the individual component and optimistic gradient on the coupling component. Equipped with proper re…

    Submitted 14 August, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: 44 pages. This version matches the camera-ready that appeared at ICML 2023 under the same title

  43. arXiv:2209.15634  [pdf, other]

    cs.LG cs.AI stat.ML

    A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning

    Authors: Zixiang Chen, Chris Junchi Li, Angela Yuan, Quanquan Gu, Michael I. Jordan

    Abstract: With the increasing need for handling large state and action spaces, general function approximation has become a key technique in reinforcement learning (RL). In this paper, we propose a general framework that unifies model-based and model-free RL, and an Admissible Bellman Characterization (ABC) class that subsumes nearly all Markov Decision Process (MDP) models in the literature for tractable RL…

    Submitted 30 September, 2022; originally announced September 2022.

  44. arXiv:2208.05363  [pdf, ps, other]

    cs.LG cs.AI cs.GT math.OC stat.ML

    Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

    Authors: Chris Junchi Li, Dongruo Zhou, Quanquan Gu, Michael I. Jordan

    Abstract: We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation, where the action-value function is approximated by a function in a Reproducing Kernel Hilbert Space (RKHS). The key challenge is how to do exploration in the high-dimensional function space. We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: 42 pages

  45. arXiv:2208.02813  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Understanding Mixture of Experts in Deep Learning

    Authors: Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li

    Abstract: The Mixture-of-Experts (MoE) layer, a sparsely-activated model controlled by a router, has achieved great success in deep learning. However, the understanding of such architecture remains elusive. In this paper, we formally study how the MoE layer improves the performance of neural network learning and why the mixture model will not collapse into a single model. Our empirical results suggest that…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: 53 pages, 8 figures, 11 tables

  46. arXiv:2208.01857  [pdf, other]

    cs.LG math.OC stat.ML

    The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

    Authors: Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 32 pages, 1 figure, 1 table

  47. arXiv:2207.03106  [pdf, other]

    cs.LG stat.ML

    A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits

    Authors: Jiafan He, Tianhao Wang, Yifei Min, Quanquan Gu

    Abstract: We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server. We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication. We propose a simple algorithm named \texttt{FedLin…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: 25 pages, 1 figure, 2 tables

  48. arXiv:2205.11507  [pdf, other]

    cs.LG math.OC stat.ML

    Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs

    Authors: Dongruo Zhou, Quanquan Gu

    Abstract: Recent studies have shown that episodic reinforcement learning (RL) is not more difficult than contextual bandits, even with a long planning horizon and unknown state transitions. However, these results are limited to either tabular Markov decision processes (MDPs) or computationally inefficient algorithms for linear mixture MDPs. In this paper, we propose the first computationally efficient horiz…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: 33 pages, 1 table

  49. arXiv:2205.06811  [pdf, other]

    cs.LG stat.ML

    Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

    Authors: Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

    Abstract: We study the linear contextual bandit problem in the presence of adversarial corruption, where the reward at each round is corrupted by an adversary, and the corruption level (i.e., the sum of corruption magnitudes over the horizon) is $C\geq 0$. The best-known algorithms in this setting are limited in that they either are computationally inefficient or require a strong assumption on the corruptio…

    Submitted 9 July, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: 25 pages, 1 table. This version simplifies the proof of the regret upper bound in Version 1, and provides a stronger result for the lower bound

  50. arXiv:2203.03159  [pdf, other]

    cs.LG math.OC stat.ML

    Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. Most existing generalization analyses are made for single-pass SGD, which is a less practical variant compared to the commonly-used multi-pass SGD. Besides, theoretical analyses for multi-pass SGD often concern a worst-case instance in a class of problems, which…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 28 pages, 2 figures