-
Offline Reinforcement Learning via Linear-Programming with Error-Bound Induced Constraints
Authors:
Asuman Ozdaglar,
Sarath Pattathil,
Jiawei Zhang,
Kaiqing Zhang
Abstract:
Offline reinforcement learning (RL) aims to find an optimal policy for Markov decision processes (MDPs) using a pre-collected dataset. In this work, we revisit the linear programming (LP) reformulation of Markov decision processes for offline RL, with the goal of developing algorithms with optimal $O(1/\sqrt{n})$ sample complexity, where $n$ is the sample size, under partial data coverage and gene…
▽ More
Offline reinforcement learning (RL) aims to find an optimal policy for Markov decision processes (MDPs) using a pre-collected dataset. In this work, we revisit the linear programming (LP) reformulation of Markov decision processes for offline RL, with the goal of developing algorithms with optimal $O(1/\sqrt{n})$ sample complexity, where $n$ is the sample size, under partial data coverage and general function approximation, and with favorable computational tractability. To this end, we derive new \emph{error bounds} for both the dual and primal-dual formulations of the LP, and incorporate them properly as \emph{constraints} in the LP reformulation. We then show that under a completeness-type assumption, $O(1/\sqrt{n})$ sample complexity can be achieved under standard single-policy coverage assumption, when one properly \emph{relaxes} the occupancy validity constraint in the LP. This framework can readily handle both infinite-horizon discounted and average-reward MDPs, in both general function approximation and tabular cases. The instantiation to the tabular case achieves either state-of-the-art or the first sample complexities of offline RL in these settings. To further remove any completeness-type assumption, we then introduce a proper \emph{lower-bound constraint} in the LP, and a variant of the standard single-policy coverage assumption. Such an algorithm leads to a $O(1/\sqrt{n})$ sample complexity with dependence on the \emph{value-function gap}, with only realizability assumptions. Our properly constrained LP framework advances the existing results in several aspects, in relaxing certain assumptions and achieving the optimal $O(1/\sqrt{n})$ sample complexity, with simple analyses. We hope our results bring new insights into the use of LP formulations and the equivalent primal-dual minimax optimization for offline RL, through the error-bound induced constraints.
△ Less
Submitted 9 December, 2024; v1 submitted 28 December, 2022;
originally announced December 2022.
-
Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence
Authors:
Sarath Pattathil,
Kaiqing Zhang,
Asuman Ozdaglar
Abstract:
Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of natural policy gradient (NPG) algorithms in multi-agent learning. We first show that vanilla NPG may not have parameter convergence, i.e., the convergence of the ve…
▽ More
Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of natural policy gradient (NPG) algorithms in multi-agent learning. We first show that vanilla NPG may not have parameter convergence, i.e., the convergence of the vector that parameterizes the policy, even when the costs are regularized (which enabled strong convergence guarantees in the policy space in the literature). This non-convergence of parameters leads to stability issues in learning, which becomes especially relevant in the function approximation setting, where we can only operate on low-dimensional parameters, instead of the high-dimensional policy. We then propose variants of the NPG algorithm, for several standard multi-agent learning scenarios: two-player zero-sum matrix and Markov games, and multi-player monotone games, with global last-iterate parameter convergence guarantees. We also generalize the results to certain function approximation settings. Note that in our algorithms, the agents take symmetric roles. Our results might also be of independent interest for solving nonconvex-nonconcave minimax optimization problems with certain structures. Simulations are also provided to corroborate our theoretical findings.
△ Less
Submitted 20 March, 2023; v1 submitted 23 October, 2022;
originally announced October 2022.
-
What is a Good Metric to Study Generalization of Minimax Learners?
Authors:
Asuman Ozdaglar,
Sarath Pattathil,
Jiawei Zhang,
Kaiqing Zhang
Abstract:
Minimax optimization has served as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in the minimax settings, their generalization guarantees in stochastic minimax optimization problems, i.e., how the solution trained on empirical data performs on unseen testing data, have been relatively underexplored. A…
▽ More
Minimax optimization has served as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in the minimax settings, their generalization guarantees in stochastic minimax optimization problems, i.e., how the solution trained on empirical data performs on unseen testing data, have been relatively underexplored. A fundamental question remains elusive: What is a good metric to study generalization of minimax learners? In this paper, we aim to answer this question by first showing that primal risk, a universal metric to study generalization in minimization problems, which has also been adopted recently to study generalization in minimax ones, fails in simple examples. We thus propose a new metric to study generalization of minimax learners: the primal gap, defined as the difference between the primal risk and its minimum over all models, to circumvent the issues. Next, we derive generalization error bounds for the primal gap in nonconvex-concave settings. As byproducts of our analysis, we also solve two open questions: establishing generalization error bounds for primal risk and primal-dual risk, another existing metric that is only well-defined when the global saddle-point exists, in the strong sense, i.e., without strong concavity or assuming that the maximization and expectation can be interchanged, while either of these assumptions was needed in the literature. Finally, we leverage this new metric to compare the generalization behavior of two popular algorithms -- gradient descent-ascent (GDA) and gradient descent-max (GDMax) in stochastic minimax optimization.
△ Less
Submitted 20 June, 2022; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Optimal adaptive testing for epidemic control: combining molecular and serology tests
Authors:
D. Acemoglu,
A. Fallah,
A. Giometto,
D. Huttenlocher,
A. Ozdaglar,
F. Parise,
S. Pattathil
Abstract:
The COVID-19 crisis highlighted the importance of non-medical interventions, such as testing and isolation of infected individuals, in the control of epidemics. Here, we show how to minimize testing needs while maintaining the number of infected individuals below a desired threshold. We find that the optimal policy is adaptive, with testing rates that depend on the epidemic state. Additionally, we…
▽ More
The COVID-19 crisis highlighted the importance of non-medical interventions, such as testing and isolation of infected individuals, in the control of epidemics. Here, we show how to minimize testing needs while maintaining the number of infected individuals below a desired threshold. We find that the optimal policy is adaptive, with testing rates that depend on the epidemic state. Additionally, we show that such epidemic state is difficult to infer with molecular tests alone, which are highly sensitive but have a short detectability window. Instead, we propose the use of baseline serology testing, which is less sensitive but detects past infections, for the purpose of state estimation. Validation of such combined testing approach with a stochastic model of epidemics shows significant cost savings compared to non-adaptive testing strategies that are the current standard for COVID-19.
△ Less
Submitted 3 January, 2021;
originally announced January 2021.
-
Tight last-iterate convergence rates for no-regret learning in multi-player games
Authors:
Noah Golowich,
Sarath Pattathil,
Constantinos Daskalakis
Abstract:
We study the question of obtaining last-iterate convergence rates for no-regret learning algorithms in multi-player games. We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the gap function in smooth monotone games. This result addresses a question of Mertikopoulos & Zhou (2018), who as…
▽ More
We study the question of obtaining last-iterate convergence rates for no-regret learning algorithms in multi-player games. We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the gap function in smooth monotone games. This result addresses a question of Mertikopoulos & Zhou (2018), who asked whether extra-gradient approaches (such as OG) can be applied to achieve improved guarantees in the multi-agent learning setting. The proof of our upper bound uses a new technique centered around an adaptive choice of potential function at each iteration. We also show that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which includes OG as a special case. As a byproduct of our lower bound analysis we additionally present a proof of a conjecture of Arjevani et al. (2015) which is more direct than previous approaches.
△ Less
Submitted 26 October, 2020;
originally announced October 2020.
-
An Optimal Multistage Stochastic Gradient Method for Minimax Problems
Authors:
Alireza Fallah,
Asuman Ozdaglar,
Sarath Pattathil
Abstract:
In this paper, we study the minimax optimization problem in the smooth and strongly convex-strongly concave setting when we have access to noisy estimates of gradients. In particular, we first analyze the stochastic Gradient Descent Ascent (GDA) method with constant stepsize, and show that it converges to a neighborhood of the solution of the minimax problem. We further provide tight bounds on the…
▽ More
In this paper, we study the minimax optimization problem in the smooth and strongly convex-strongly concave setting when we have access to noisy estimates of gradients. In particular, we first analyze the stochastic Gradient Descent Ascent (GDA) method with constant stepsize, and show that it converges to a neighborhood of the solution of the minimax problem. We further provide tight bounds on the convergence rate and the size of this neighborhood. Next, we propose a multistage variant of stochastic GDA (M-GDA) that runs in multiple stages with a particular learning rate decay schedule and converges to the exact solution of the minimax problem. We show M-GDA achieves the lower bounds in terms of noise dependence without any assumptions on the knowledge of noise characteristics. We also show that M-GDA obtains a linear decay rate with respect to the error's dependence on the initial error, although the dependence on condition number is suboptimal. In order to improve this dependence, we apply the multistage machinery to the stochastic Optimistic Gradient Descent Ascent (OGDA) algorithm and propose the M-OGDA algorithm which also achieves the optimal linear decay rate with respect to the initial error. To the best of our knowledge, this method is the first to simultaneously achieve the best dependence on noise characteristic as well as the initial error and condition number.
△ Less
Submitted 13 February, 2020;
originally announced February 2020.
-
Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems
Authors:
Noah Golowich,
Sarath Pattathil,
Constantinos Daskalakis,
Asuman Ozdaglar
Abstract:
In this paper we study the smooth convex-concave saddle point problem. Specifically, we analyze the last iterate convergence properties of the Extragradient (EG) algorithm. It is well known that the ergodic (averaged) iterates of EG converge at a rate of $O(1/T)$ (Nemirovski, 2004). In this paper, we show that the last iterate of EG converges at a rate of $O(1/\sqrt{T})$. To the best of our knowle…
▽ More
In this paper we study the smooth convex-concave saddle point problem. Specifically, we analyze the last iterate convergence properties of the Extragradient (EG) algorithm. It is well known that the ergodic (averaged) iterates of EG converge at a rate of $O(1/T)$ (Nemirovski, 2004). In this paper, we show that the last iterate of EG converges at a rate of $O(1/\sqrt{T})$. To the best of our knowledge, this is the first paper to provide a convergence rate guarantee for the last iterate of EG for the smooth convex-concave saddle point problem. Moreover, we show that this rate is tight by proving a lower bound of $Ω(1/\sqrt{T})$ for the last iterate. This lower bound therefore shows a quadratic separation of the convergence rates of ergodic and last iterates in smooth convex-concave saddle point problems.
△ Less
Submitted 6 July, 2020; v1 submitted 31 January, 2020;
originally announced February 2020.
-
A Decentralized Proximal Point-type Method for Saddle Point Problems
Authors:
Weijie Liu,
Aryan Mokhtari,
Asuman Ozdaglar,
Sarath Pattathil,
Zebang Shen,
Nenggan Zheng
Abstract:
In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point metho…
▽ More
In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point method for solving this problem. We show that when the objective function is $ρ$-weakly convex-weakly concave the iterates converge to approximate stationarity with a rate of $\mathcal{O}(1/\sqrt{T})$ where the approximation error depends linearly on $\sqrtρ$. We further show that when the objective function satisfies the Minty VI condition (which generalizes the convex-concave case) we obtain convergence to stationarity with a rate of $\mathcal{O}(1/\sqrt{T})$. To the best of our knowledge, our proposed method is the first decentralized algorithm with theoretical guarantees for solving a non-convex non-concave decentralized saddle point problem. Our numerical results for training a general adversarial network (GAN) in a decentralized manner match our theoretical guarantees.
△ Less
Submitted 31 October, 2019;
originally announced October 2019.
-
Convergence Rate of $\mathcal{O}(1/k)$ for Optimistic Gradient and Extra-gradient Methods in Smooth Convex-Concave Saddle Point Problems
Authors:
Aryan Mokhtari,
Asuman Ozdaglar,
Sarath Pattathil
Abstract:
We study the iteration complexity of the optimistic gradient descent-ascent (OGDA) method and the extra-gradient (EG) method for finding a saddle point of a convex-concave unconstrained min-max problem. To do so, we first show that both OGDA and EG can be interpreted as approximate variants of the proximal point method. This is similar to the approach taken in [Nemirovski, 2004] which analyzes EG…
▽ More
We study the iteration complexity of the optimistic gradient descent-ascent (OGDA) method and the extra-gradient (EG) method for finding a saddle point of a convex-concave unconstrained min-max problem. To do so, we first show that both OGDA and EG can be interpreted as approximate variants of the proximal point method. This is similar to the approach taken in [Nemirovski, 2004] which analyzes EG as an approximation of the `conceptual mirror prox'. In this paper, we highlight how gradients used in OGDA and EG try to approximate the gradient of the Proximal Point method. We then exploit this interpretation to show that both algorithms produce iterates that remain within a bounded set. We further show that the primal dual gap of the averaged iterates generated by both of these algorithms converge with a rate of $\mathcal{O}(1/k)$. Our theoretical analysis is of interest as it provides a the first convergence rate estimate for OGDA in the general convex-concave setting. Moreover, it provides a simple convergence analysis for the EG algorithm in terms of function value without using compactness assumption.
△ Less
Submitted 29 September, 2020; v1 submitted 3 June, 2019;
originally announced June 2019.
-
A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach
Authors:
Aryan Mokhtari,
Asuman Ozdaglar,
Sarath Pattathil
Abstract:
In this paper we consider solving saddle point problems using two variants of Gradient Descent-Ascent algorithms, Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods. We show that both of these algorithms admit a unified analysis as approximations of the classical proximal point method for solving saddle point problems. This viewpoint enables us to develop a new framework for…
▽ More
In this paper we consider solving saddle point problems using two variants of Gradient Descent-Ascent algorithms, Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods. We show that both of these algorithms admit a unified analysis as approximations of the classical proximal point method for solving saddle point problems. This viewpoint enables us to develop a new framework for analyzing EG and OGDA for bilinear and strongly convex-strongly concave settings. Moreover, we use the proximal point approximation interpretation to generalize the results for OGDA for a wide range of parameters.
△ Less
Submitted 5 September, 2019; v1 submitted 24 January, 2019;
originally announced January 2019.
-
Concentration bounds for two time scale stochastic approximation
Authors:
Vivek S. Borkar,
Sarath Pattathil
Abstract:
Viewing a two time scale stochastic approximation scheme as a noisy discretization of a singularly perturbed differential equation, we obtain a concentration bound for its iterates that captures its behavior with quantifiable high probability. This uses Alekseev's nonlinear variation of constants formula and a martingale concentration inequality and extends the corresponding results for single tim…
▽ More
Viewing a two time scale stochastic approximation scheme as a noisy discretization of a singularly perturbed differential equation, we obtain a concentration bound for its iterates that captures its behavior with quantifiable high probability. This uses Alekseev's nonlinear variation of constants formula and a martingale concentration inequality and extends the corresponding results for single time scale stochastic approximation.
△ Less
Submitted 28 June, 2018;
originally announced June 2018.