-
Computing Optimal Transport Plans via Min-Max Gradient Flows
Authors:
Lauren Conger,
Franca Hoffmann,
Ricardo Baptista,
Eric Mazumdar
Abstract:
We pose the Kantorovich optimal transport problem as a min-max problem with a Nash equilibrium that can be obtained dynamically via a two-player game, providing a framework for approximating optimal couplings. We prove convergence of the timescale-separated gradient descent dynamics to the optimal transport plan, and implement the gradient descent algorithm with a particle method, where the marginal constraints are enforced weakly using the KL divergence, automatically selecting a dynamical adaptation of the regularizer. The numerical results highlight the different advantages of using the standard Kullback-Leibler (KL) divergence versus the reverse KL divergence with this approach, opening the door for new methodologies.
Submitted 27 May, 2025; v1 submitted 23 April, 2025;
originally announced April 2025.
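The abstract above contrasts the KL divergence with its reverse. As a minimal illustration of why the argument order matters, the sketch below evaluates both orders on two small discrete distributions. The names `kl_divergence`, `target`, and `candidate` are hypothetical and are not taken from the paper, and the sketch does not reproduce the paper's particle method or say which order the authors call standard.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two illustrative distributions on a two-point space (hypothetical values).
target = [0.5, 0.5]
candidate = [0.9, 0.1]

forward = kl_divergence(target, candidate)   # KL(target || candidate)
reverse = kl_divergence(candidate, target)   # KL(candidate || target)

print(f"KL(target || candidate) = {forward:.4f}")
print(f"KL(candidate || target) = {reverse:.4f}")
```

The two values differ because the divergence is not symmetric, so penalizing a marginal mismatch in one order or the other weights errors differently.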
-
Learning to Steer Learners in Games
Authors:
Yizhou Zhang,
Yi-An Ma,
Eric Mazumdar
Abstract:
We consider the problem of learning to exploit learning algorithms through repeated interactions in games. Specifically, we focus on the case of repeated two player, finite-action games, in which an optimizer aims to steer a no-regret learner to a Stackelberg equilibrium without knowledge of its payoffs. We first show that this is impossible if the optimizer only knows that the learner is using an algorithm from the general class of no-regret algorithms. This suggests that the optimizer requires more information about the learner's objectives or algorithm to successfully exploit them. Building on this intuition, we reduce the problem for the optimizer to that of recovering the learner's payoff structure. We demonstrate the effectiveness of this approach if the learner's algorithm is drawn from a smaller class by analyzing two examples: one where the learner uses an ascent algorithm, and another where the learner uses stochastic mirror ascent with known regularizer and step sizes.
Submitted 28 May, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning
Authors:
Shangding Gu,
Laixi Shi,
Muning Wen,
Ming Jin,
Eric Mazumdar,
Yuejie Chi,
Adam Wierman,
Costas Spanos
Abstract:
Driven by inherent uncertainty and the sim-to-real gap, robust reinforcement learning (RL) seeks to improve resilience against the complexity and variability in agent-environment sequential interactions. Despite the existence of a large number of RL benchmarks, there is a lack of standardized benchmarks for robust RL. Current robust RL policies often focus on a specific type of uncertainty and are evaluated in distinct, one-off environments. In this work, we introduce Robust-Gymnasium, a unified modular benchmark designed for robust RL that supports a wide variety of disruptions across all key RL components: agents' observed state and reward, agents' actions, and the environment. Offering over sixty diverse task environments spanning control and robotics, safe RL, and multi-agent RL, it provides an open-source and user-friendly tool for the community to assess current methods and foster the development of robust RL algorithms. In addition, we benchmark existing standard and robust RL algorithms within this framework, uncovering significant deficiencies in each and offering new insights.
Submitted 26 February, 2025;
originally announced February 2025.
-
Coupled Wasserstein Gradient Flows for Min-Max and Cooperative Games
Authors:
Lauren Conger,
Franca Hoffmann,
Eric Mazumdar,
Lillian J. Ratliff
Abstract:
We propose a framework for two-player infinite-dimensional games with cooperative or competitive structure. These games take the form of coupled partial differential equations in which players optimize over a space of measures, driven by either a gradient descent or gradient descent-ascent in Wasserstein-2 space. We characterize the properties of the Nash equilibrium of the system, and relate it to the steady state of the dynamics. In the min-max setting, we show, under sufficient convexity conditions, that solutions converge exponentially fast and with explicit rate to the unique Nash equilibrium. Similar results are obtained for the cooperative setting. We apply this framework to distribution shift induced by interactions among a strategic population of agents and an algorithm, proving additional convergence results in the timescale-separated setting. We illustrate the performance of our model on (i) real data from an economics study on Colombia census data, (ii) feature modification in loan applications, and (iii) performative prediction. The numerical experiments demonstrate the importance of distribution-level, rather than moment-level, modeling.
Submitted 10 February, 2025; v1 submitted 11 November, 2024;
originally announced November 2024.
-
Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning
Authors:
Laixi Shi,
Jingchu Gai,
Eric Mazumdar,
Yuejie Chi,
Adam Wierman
Abstract:
Standard multi-agent reinforcement learning (MARL) algorithms are vulnerable to sim-to-real gaps. To address this, distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL by optimizing the worst-case performance when game dynamics shift within a prescribed uncertainty set. RMGs remain under-explored, from reasonable problem formulation to the development of sample-efficient algorithms. Two notorious and open challenges are the formulation of the uncertainty set and whether the corresponding RMGs can overcome the curse of multiagency, where the sample complexity scales exponentially with the number of agents. In this work, we propose a natural class of RMGs inspired by behavioral economics, where each agent's uncertainty set is shaped by both the environment and the integrated behavior of other agents. We first establish the well-posedness of this class of RMGs by proving the existence of game-theoretic solutions such as robust Nash equilibria and coarse correlated equilibria (CCE). Assuming access to a generative model, we then introduce a sample-efficient algorithm for learning the CCE whose sample complexity scales polynomially with all relevant parameters. To the best of our knowledge, this is the first algorithm to break the curse of multiagency for RMGs, regardless of the uncertainty set formulation.
Submitted 31 January, 2025; v1 submitted 30 September, 2024;
originally announced September 2024.
-
Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
Authors:
Zaiwei Chen,
Kaiqing Zhang,
Eric Mazumdar,
Asuman Ozdaglar,
Adam Wierman
Abstract:
In this paper, we consider two-player zero-sum matrix and stochastic games and develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players. Specifically, the learning dynamics for matrix games are based on the smoothed best-response dynamics, while the learning dynamics for stochastic games build upon those for matrix games, with additional incorporation of the minimax value iteration. To our knowledge, our theoretical results present the first finite-sample analysis of such learning dynamics with last-iterate guarantees. In the matrix game setting, the results imply a sample complexity of $O(ε^{-1})$ to find the Nash distribution and a sample complexity of $O(ε^{-8})$ to find a Nash equilibrium. In the stochastic game setting, the results also imply a sample complexity of $O(ε^{-8})$ to find a Nash equilibrium. To establish these results, the main challenge is to handle stochastic approximation algorithms with multiple sets of coupled and stochastic iterates that evolve on (possibly) different time scales. To overcome this challenge, we developed a coupled Lyapunov-based approach, which may be of independent interest to the broader community studying the convergence behavior of stochastic approximation algorithms.
Submitted 4 September, 2024; v1 submitted 2 September, 2024;
originally announced September 2024.
-
Combinatorial invariants for certain classes of non-abelian groups
Authors:
Naveen K. Godara,
Renu Joshi,
Eshita Mazumdar
Abstract:
This article focuses on the study of zero-sum invariants of finite non-abelian groups. We address two main problems: the first centers on the ordered Davenport constant and the second on Gao's constant. We establish a connection between the ordered Davenport constant and the small Davenport constant for a finite non-abelian group of even order, which in turn gives a relation with the Noether number. Additionally, we confirm a conjecture of Gao and Li for a non-abelian group of order $2p^α$, where $p$ is a prime. Furthermore, we prove a conjecture that connects the ordered Davenport constant to the Loewy length for certain classes of finite $2$-groups.
Submitted 24 August, 2024;
originally announced August 2024.
-
On a conjecture related to the Davenport constant
Authors:
Naveen K. Godara,
Renu Joshi,
Eshita Mazumdar
Abstract:
For a finite group $G$, $D(G)$ is defined as the least positive integer $k$ such that for every sequence $S=g_1 g_2\cdots g_k$ of length $k$ over $G$, there exist $1 \le i_1 < i_2 <\cdots < i_m \le k$ such that $\prod_{j=1}^{m} g_{i_{σ(j)}}=1$ holds for $σ=\mathrm{id}$, the identity element of $S_m$. For a finite abelian group, this group invariant, known as the Davenport constant, is crucial in the theory of non-unique factorization domains. The precise value of this invariant, even for a finite abelian group of rank greater than $2$, is not known yet. In 1977, Olson and White first worked with this invariant for finite non-abelian groups. Later, in 2004, Dimitrov studied it and proved that $D(G)\leq L(G)$ for a finite $p$-group $G$, where $p$ is a prime and $L(G)$ is the Loewy length of $\mathbb{F}_pG$. He conjectured that equality holds for all finite $p$-groups. In this article, we compute $D(G)$ for a certain subclass of $2$-generated finite $p$-groups of nilpotency class two and show that the conjecture is true by determining the precise value of the Loewy length of $\mathbb{F}_pG$. We also evaluate $D(G)$ for finite dicyclic, semi-dihedral and some other groups.
Submitted 1 July, 2024;
originally announced July 2024.
-
Tractable Equilibrium Computation in Markov Games through Risk Aversion
Authors:
Eric Mazumdar,
Kishan Panaganti,
Laixi Shi
Abstract:
A significant roadblock to the development of principled multi-agent reinforcement learning is the fact that desired solution concepts like Nash equilibria may be intractable to compute. To overcome this obstacle, we take inspiration from behavioral economics and show that -- by imbuing agents with important features of human decision-making like risk aversion and bounded rationality -- a class of risk-averse quantal response equilibria (RQE) becomes tractable to compute in all $n$-player matrix and finite-horizon Markov games. In particular, we show that they emerge as the endpoint of no-regret learning in suitably adjusted versions of the games. Crucially, the class of computationally tractable RQE is independent of the underlying game structure and only depends on agents' degree of risk-aversion and bounded rationality. To validate the richness of this class of solution concepts we show that it captures people's patterns of play in a number of 2-player matrix games previously studied in experimental economics. Furthermore, we give a first analysis of the sample complexity of computing these equilibria in finite-horizon Markov games when one has access to a generative model and validate our findings on a simple multi-agent reinforcement learning benchmark.
Submitted 26 August, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data
Authors:
Kishan Panaganti,
Adam Wierman,
Eric Mazumdar
Abstract:
The robust $φ$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $φ$-regularized fitted Q-iteration (RPQ) for learning an $ε$-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (with a robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of $φ$-divergences achieving robust optimal policies in high-dimensional systems with general function approximation. Second, we introduce the hybrid robust $φ$-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. Towards this framework, we propose a model-free algorithm called Hybrid robust Total-variation-regularized Q-iteration (HyTQ: pronounced height-Q). To the best of our knowledge, we provide the first improved out-of-data-distribution assumption in large-scale problems with general function approximation under the hybrid robust $φ$-regularized reinforcement learning framework. Finally, we provide theoretical guarantees on the performance of the learned policies of our algorithms on systems with arbitrarily large state spaces.
Submitted 8 May, 2024;
originally announced May 2024.
-
Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty
Authors:
Laixi Shi,
Eric Mazumdar,
Yuejie Chi,
Adam Wierman
Abstract:
To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments, the problem remains understudied -- despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DRNVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length.
Submitted 8 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Flow-Based Synthesis of Reactive Tests for Discrete Decision-Making Systems with Temporal Logic Specifications
Authors:
Josefine B. Graebener,
Apurva S. Badithela,
Denizalp Goktas,
Wyatt Ubellacker,
Eric V. Mazumdar,
Aaron D. Ames,
Richard M. Murray
Abstract:
Designing tests to evaluate if a given autonomous system satisfies complex specifications is challenging due to the complexity of these systems. This work proposes a flow-based approach for reactive test synthesis from temporal logic specifications, enabling the synthesis of test environments consisting of static and reactive obstacles and dynamic test agents. The temporal logic specifications describe desired test behavior, including system requirements as well as a test objective that is not revealed to the system. The synthesized test strategy places restrictions on system actions in reaction to the system state. The tests are minimally restrictive and accomplish the test objective while ensuring realizability of the system's objective without aiding it (semi-cooperative setting). Automata theory and flow networks are leveraged to formulate a mixed-integer linear program (MILP) to synthesize the test strategy. For a dynamic test agent, the agent strategy is synthesized for a GR(1) specification constructed from the solution of the MILP. If the specification is unrealizable by the dynamics of the test agent, a counterexample-guided approach is used to resolve the MILP until a strategy is found. This flow-based, reactive test synthesis is conducted offline and is agnostic to the system controller. Finally, the resulting test strategy is demonstrated in simulation and experimentally on a pair of quadrupedal robots for a variety of specifications.
Submitted 15 April, 2024;
originally announced April 2024.
-
Characterizing Controllability and Observability for Systems with Locality, Communication, and Actuation Constraints
Authors:
Lauren Conger,
Yiheng Lin,
Adam Wierman,
Eric Mazumdar
Abstract:
This paper presents a closed-form notion of controllability and observability for systems with communication delays, actuation delays, and locality constraints. The formulation reduces to classical notions of controllability and observability in the unconstrained setting. As a consequence of our formulation, we show that the addition of locality and communication constraints may not affect the controllability and observability of the system, and we provide an efficient sufficient condition under which this phenomenon occurs. This contrasts with actuation and sensing delays, which cause a gradual loss of controllability and observability as the delays increase. We illustrate our results using linearized swing equations for the power grid, showing how actuation delay and locality constraints affect controllability.
Submitted 4 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Davenport constant for finite abelian groups with higher rank
Authors:
Anamitro Biswas,
Eshita Mazumdar
Abstract:
For a finite abelian group $G$, the Davenport constant, denoted by $D(G)$, is defined to be the least positive integer $k$ such that every sequence of length at least $k$ has a non-trivial zero-sum subsequence. A long-standing conjecture is that the Davenport constant of a finite abelian group $G =C_{n_1}\times\cdots\times C_{n_d}$ of rank $d \in \mathbb{N}$ is $1+\displaystyle\sum_{i=1}^d (n_i-1)$. This conjecture is false in general, but it remains to determine for which groups it is true. In this paper, we consider groups of the form $G = (C_p)^{d-1} \times C_{pq}$, where $p$ is a prime and $q\in \mathbb{N}$, and provide a sufficient condition under which the conjecture holds.
Submitted 15 February, 2024;
originally announced February 2024.
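The definition of $D(G)$ in the abstract above can be checked by exhaustive search on very small groups. The sketch below is only an illustration and is not the paper's method: it finds the longest zero-sum-free sequence over $C_{n_1}\times\cdots\times C_{n_d}$ and adds one, then compares the result with the conjectured value $1+\sum_{i}(n_i-1)$. The function names are hypothetical, the orders are assumed to satisfy $n_1 \mid n_2 \mid \cdots \mid n_d$, and the search is exponential, so it is practical only for groups with a handful of elements.

```python
from itertools import product

def davenport_constant(orders):
    """Brute-force D(G) for G = C_{n_1} x ... x C_{n_d} (tiny groups only)."""
    zero = tuple(0 for _ in orders)
    # Candidate terms: all non-zero elements (a zero term is itself a zero-sum subsequence).
    elements = [g for g in product(*(range(n) for n in orders)) if any(g)]

    def add(a, b):
        return tuple((x + y) % n for x, y, n in zip(a, b, orders))

    best = 0  # length of the longest zero-sum-free sequence found so far

    def extend(start, sums, length):
        # `sums` holds the sums of all non-empty subsequences of the current sequence.
        nonlocal best
        best = max(best, length)
        for i in range(start, len(elements)):  # non-decreasing indices: each multiset once
            g = elements[i]
            new_sums = sums | {g} | {add(s, g) for s in sums}
            if zero not in new_sums:
                extend(i, new_sums, length + 1)

    extend(0, frozenset(), 0)
    return best + 1

def conjectured_value(orders):
    """The value 1 + sum(n_i - 1) from the conjecture quoted in the abstract."""
    return 1 + sum(n - 1 for n in orders)

for orders in [(4,), (2, 2), (2, 4), (3, 3)]:
    print(orders, davenport_constant(orders), conjectured_value(orders))
```

The four groups in the loop have rank at most two or are $p$-groups, cases in which the conjectured value is known to be correct, so the two printed numbers agree on each line.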
-
Understanding Model Selection For Learning In Strategic Environments
Authors:
Tinashe Handina,
Eric Mazumdar
Abstract:
The deployment of ever-larger machine learning models reflects a growing consensus that the more expressive the model class one optimizes over – and the more data one has access to – the more one can improve performance. As models get deployed in a variety of real-world scenarios, they inevitably face strategic environments. In this work, we consider the natural question of how the interplay of models and strategic interactions affects the relationship between performance at equilibrium and the expressivity of model classes. We find that strategic interactions can break the conventional view – meaning that performance does not necessarily monotonically improve as model classes get larger or more expressive (even with infinite data). We show the implications of this result in several contexts including strategic regression, strategic classification, and multi-agent reinforcement learning. In particular, we show that each of these settings admits a Braess' paradox-like phenomenon in which optimizing over less expressive model classes allows one to achieve strictly better equilibrium outcomes. Motivated by these examples, we then propose a new paradigm for model selection in games wherein an agent seeks to choose amongst different model classes to use as their action set in a game.
Submitted 22 November, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games
Authors:
Zaiwei Chen,
Kaiqing Zhang,
Eric Mazumdar,
Asuman Ozdaglar,
Adam Wierman
Abstract:
We consider two-player zero-sum stochastic games and propose a two-timescale $Q$-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players. In two-timescale $Q$-learning, the fast-timescale iterates are updated in the spirit of stochastic gradient descent and the slow-timescale iterates (which we use to compute the policies) are updated by taking a convex combination of the previous slow-timescale iterate and the latest fast-timescale iterate. Introducing the slow timescale and its update equation is our main algorithmic novelty. In the special case of linear function approximation, we establish, to the best of our knowledge, the first last-iterate finite-sample bound for payoff-based independent learning dynamics of these types. The result implies a polynomial sample complexity to find a Nash equilibrium in such stochastic games.
To establish the results, we model our proposed algorithm as a two-timescale stochastic approximation and derive the finite-sample bound through a Lyapunov-based approach. The key novelty lies in constructing a valid Lyapunov function to capture the evolution of the slow-timescale iterates. Specifically, through a change of variable, we show that the update equation of the slow-timescale iterates resembles the classical smoothed best-response dynamics, where the regularized Nash gap serves as a valid Lyapunov function. This insight enables us to construct a valid Lyapunov function via a generalized variant of the Moreau envelope of the regularized Nash gap. The construction of our Lyapunov function might be of broad independent interest in studying the behavior of stochastic approximation algorithms.
Submitted 8 December, 2023;
originally announced December 2023.
-
Optimal Bounds on the Growth of Iterated Sumsets in Abelian Semigroups
Authors:
Shalom Eliahou,
Eshita Mazumdar
Abstract:
We provide optimal upper bounds on the growth of iterated sumsets $hA=A+\dots+A$ for finite subsets $A$ of abelian semigroups. More precisely, we show that the new upper bounds recently derived from Macaulay's theorem in commutative algebra are best possible, i.e., are actually reached by suitable subsets of suitable abelian semigroups. Our constructions, in a multiplicative setting, are based on certain specific monomial ideals in polynomial algebras and on their deformation into appropriate binomial ideals via Gröbner bases.
Submitted 16 October, 2023;
originally announced October 2023.
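For readers unfamiliar with the notation $hA=A+\dots+A$ in the abstract above, the sketch below computes iterated sumsets for finite sets of integers under ordinary addition. It is an illustration only: the function names are hypothetical, and the bound used for comparison is the elementary multiset-counting bound $|hA| \le \binom{|A|+h-1}{h}$, not the Macaulay-based bounds the paper studies.

```python
import math

def iterated_sumset(A, h):
    """Return hA = A + ... + A (h summands) for a finite set of integers A, h >= 1."""
    result = set(A)
    for _ in range(h - 1):
        result = {x + a for x in result for a in A}
    return result

def multiset_bound(size, h):
    """Elementary bound: the number of multisets of h elements drawn from `size` elements."""
    return math.comb(size + h - 1, h)

spread_out = {0, 1, 3}    # all pairwise sums are distinct
progression = {0, 1, 2}   # many sums coincide

for A in (spread_out, progression):
    for h in (2, 3):
        print(sorted(A), h, len(iterated_sumset(A, h)), multiset_bound(len(A), h))
```

The arithmetic progression grows much more slowly than the counting bound allows, which is the kind of gap that sharper growth bounds are meant to quantify.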
-
Strategic Distribution Shift of Interacting Agents via Coupled Gradient Flows
Authors:
Lauren Conger,
Franca Hoffmann,
Eric Mazumdar,
Lillian Ratliff
Abstract:
We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed. Prior work largely models feedback-induced distribution shift as adversarial or via an overly simplistic distribution-shift structure. In contrast, we propose a coupled partial differential equation model that captures fine-grained changes in the distribution over time by accounting for complex dynamics that arise due to strategic responses to algorithmic decision-making, non-local endogenous population interactions, and other exogenous sources of distribution shift. We consider two common settings in machine learning: cooperative settings with information asymmetries, and competitive settings where a learner faces strategic users. For both of these settings, when the algorithm retrains via gradient descent, we prove asymptotic convergence of the retraining procedure to a steady-state, both in finite and in infinite dimensions, obtaining explicit rates in terms of the model parameters. To do so we derive new results on the convergence of coupled PDEs that extend what is known on multi-species systems. Empirically, we show that our approach captures well-documented forms of distribution shifts like polarization and disparate impacts that simpler models cannot capture.
Submitted 29 October, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
Authors:
Zaiwei Chen,
Kaiqing Zhang,
Eric Mazumdar,
Asuman Ozdaglar,
Adam Wierman
Abstract:
We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main results provide finite-sample guarantees. In particular, we prove the first-known $\tilde{\mathcal{O}}(1/ε^2)$ sample complexity bound for payoff-based independent learning dynamics, up to a smoothing bias. In the special case where the stochastic game has only one state (i.e., matrix games), we provide a sharper $\tilde{\mathcal{O}}(1/ε)$ sample complexity. Our analysis uses a novel coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.
Submitted 3 March, 2023;
originally announced March 2023.
-
Algorithmic Collective Action in Machine Learning
Authors:
Moritz Hardt,
Eric Mazumdar,
Celestine Mendler-Dünner,
Tijana Zrnic
Abstract:
We initiate a principled study of algorithmic collective action on digital platforms that deploy machine learning algorithms. We propose a simple theoretical model of a collective interacting with a firm's learning algorithm. The collective pools the data of participating individuals and executes an algorithmic strategy by instructing participants how to modify their own data to achieve a collective goal. We investigate the consequences of this model in three fundamental learning-theoretic settings: the case of a nonparametric optimal learning algorithm, a parametric risk minimizer, and gradient-based optimization. In each setting, we come up with coordinated algorithmic strategies and characterize natural success criteria as a function of the collective's size. Complementing our theory, we conduct systematic experiments on a skill classification task involving tens of thousands of resumes from a gig platform for freelancers. Through more than two thousand model training runs of a BERT-like language model, we see a striking correspondence emerge between our empirical observations and the predictions made by our theory. Taken together, our theory and experiments broadly support the conclusion that algorithmic collectives of exceedingly small fractional size can exert significant control over a platform's learning algorithm.
Submitted 7 August, 2024; v1 submitted 8 February, 2023;
originally announced February 2023.
-
Follower Agnostic Methods for Stackelberg Games
Authors:
Chinmay Maheshwari,
James Cheng,
S. Shankar Sastry,
Lillian Ratliff,
Eric Mazumdar
Abstract:
In this paper, we present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner. Unlike previous works, our approach works even when the leader has no knowledge about the followers' utility functions or strategy space. Our algorithm introduces a unique gradient estimator, leveraging specially designed strategies to probe followers. In a departure from traditional assumptions of optimal play, we model followers' responses using a convergent adaptation rule, allowing for realistic and dynamic interactions. The leader constructs the gradient estimator solely based on observations of followers' actions. We provide non-asymptotic convergence rates to stationary points of the leader's objective and demonstrate asymptotic convergence to a \emph{local Stackelberg equilibrium}. To validate the effectiveness of our algorithm, we use it to solve the problem of incentive design on a large-scale transportation network, showcasing its robustness even when the leader lacks access to followers' demand.
Submitted 26 March, 2024; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Designing System Level Synthesis Controllers for Nonlinear Systems with Stability Guarantees
Authors:
Lauren Conger,
Sydney Vernon,
Eric Mazumdar
Abstract:
We introduce a method for controlling systems with nonlinear dynamics and full actuation by approximating the dynamics with polynomials and applying a system level synthesis controller. We show how to optimize over this class of controllers using a neural network while maintaining stability guarantees, without requiring a Lyapunov function. We give bounds for the domain over which the use of the class of controllers preserves stability and give bounds on the control costs incurred by optimized controllers. We then numerically validate our approach and show improved performance compared with feedback linearization -- suggesting that the SLS controllers are able to take advantage of nonlinearities in the dynamics while guaranteeing stability.
Submitted 7 June, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Synthesizing Reactive Test Environments for Autonomous Systems: Testing Reach-Avoid Specifications with Multi-Commodity Flows
Authors:
Apurva Badithela,
Josefine B. Graebener,
Wyatt Ubellacker,
Eric V. Mazumdar,
Aaron D. Ames,
Richard M. Murray
Abstract:
We study automated test generation for verifying discrete decision-making modules in autonomous systems. We utilize linear temporal logic to encode the requirements on the system under test in the system specification and the behavior that we want to observe during the test is given as the test specification which is unknown to the system. First, we use the specifications and their corresponding non-deterministic Büchi automata to generate the specification product automaton. Second, a virtual product graph representing the high-level interaction between the system and the test environment is constructed modeling the product automaton encoding the system, the test environment, and specifications. The main result of this paper is an optimization problem, framed as a multi-commodity network flow problem, that solves for constraints on the virtual product graph which can then be projected to the test environment. Therefore, the result of the optimization problem is reactive test synthesis that ensures that the system meets the test specifications along with satisfying the system specifications. This framework is illustrated in simulation on grid world examples, and demonstrated on hardware with the Unitree A1 quadruped, wherein dynamic locomotion behaviors are verified in the context of reactive test environments.
Submitted 19 October, 2022;
originally announced October 2022.
-
A Note on Zeroth-Order Optimization on the Simplex
Authors:
Tijana Zrnic,
Eric Mazumdar
Abstract:
We construct a zeroth-order gradient estimator for a smooth function defined on the probability simplex. The proposed estimator queries the simplex only. We prove that projected gradient descent and the exponential weights algorithm, when run with this estimator instead of exact gradients, converge at a $\mathcal O(T^{-1/4})$ rate.
Submitted 1 August, 2022;
originally announced August 2022.
-
Langevin Monte Carlo for Contextual Bandits
Authors:
Pan Xu,
Hongkai Zheng,
Eric Mazumdar,
Kamyar Azizzadenesheli,
Anima Anandkumar
Abstract:
We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high dimensional applications for general covariance matrices. Moreover, the Gaussian approximation may not be a good surrogate for the posterior distribution for general reward generating functions. We propose an efficient posterior sampling algorithm, viz., Langevin Monte Carlo Thompson Sampling (LMC-TS), that uses Markov Chain Monte Carlo (MCMC) methods to directly sample from the posterior distribution in contextual bandits. Our method is computationally efficient since it only needs to perform noisy gradient descent updates without constructing the Laplace approximation of the posterior distribution. We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits, viz., linear contextual bandits. We conduct experiments on both synthetic data and real-world datasets on different contextual bandit models, which demonstrates that directly sampling from the posterior is both computationally efficient and competitive in performance.
Submitted 22 June, 2022;
originally announced June 2022.
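The "noisy gradient descent updates" mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' LMC-TS implementation: it assumes a linear reward model with unit noise variance and an isotropic Gaussian prior, and the names `langevin_sample`, `lmc_thompson_step`, `step`, `n_steps`, and `prior_prec` are hypothetical choices made for illustration.

```python
import numpy as np


def langevin_sample(theta0, grad_neg_log_post, step, n_steps, rng):
    """Unadjusted Langevin iteration: a gradient step on the negative
    log-posterior plus Gaussian noise scaled by sqrt(2 * step)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * grad_neg_log_post(theta) + np.sqrt(2.0 * step) * noise
    return theta


def lmc_thompson_step(contexts, X, y, theta0, rng,
                      step=0.01, n_steps=200, prior_prec=1.0):
    """One bandit round under an ASSUMED linear-Gaussian reward model
    (unit noise variance, isotropic Gaussian prior)."""
    def grad(theta):
        # Gradient of 0.5 * ||X theta - y||^2 + 0.5 * prior_prec * ||theta||^2.
        return X.T @ (X @ theta - y) + prior_prec * theta

    # Approximate posterior sample obtained without forming a Laplace approximation.
    theta = langevin_sample(theta0, grad, step, n_steps, rng)
    # Act greedily with respect to the sampled parameter.
    arm = int(np.argmax(contexts @ theta))
    return arm, theta
```

Here `X` and `y` hold the contexts and rewards observed so far, and `contexts` holds one row per arm for the current round. For this quadratic objective the iteration is only stable when `step` is small relative to the largest eigenvalue of `X.T @ X + prior_prec * I`, so the default step size is a placeholder rather than a recommendation.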
-
Decentralized, Communication- and Coordination-free Learning in Structured Matching Markets
Authors:
Chinmay Maheshwari,
Eric Mazumdar,
Shankar Sastry
Abstract:
We study the problem of online learning in competitive settings in the context of two-sided matching markets. In particular, one side of the market, the agents, must learn about their preferences over the other side, the firms, through repeated interaction while competing with other agents for successful matches. We propose a class of decentralized, communication- and coordination-free algorithms that agents can use to reach their stable match in structured matching markets. In contrast to prior works, the proposed algorithms make decisions based solely on an agent's own history of play and require no foreknowledge of the firms' preferences. Our algorithms are constructed by splitting up the statistical problem of learning one's preferences, from noisy observations, from the problem of competing for firms. We show that under realistic structural assumptions on the underlying preferences of the agents and firms, the proposed algorithms incur a regret which grows at most logarithmically in the time horizon. Our results show that, in the case of matching markets, competition need not drastically affect the performance of decentralized, communication- and coordination-free online learning algorithms.
Submitted 6 June, 2022;
originally announced June 2022.
-
Nonlinear System Level Synthesis for Polynomial Dynamical Systems
Authors:
Lauren Conger,
Jing Shuang Li,
Eric Mazumdar,
Steven L. Brunton
Abstract:
This work introduces a controller synthesis method via system level synthesis for nonlinear systems characterized by polynomial dynamics. The resulting framework yields finite impulse response, time-invariant, closed-loop transfer functions with guaranteed disturbance cancellation. Our method generalizes feedback linearization to enable partial feedback linearization, where the cancellation of the nonlinearity is spread across a finite-time horizon. This provides flexibility to use the system dynamics to attenuate disturbances before cancellation via control, reducing the cost of control compared with feedback linearization while maintaining guarantees about disturbance rejection. This approach is illustrated on a benchmark example and on a common model for fluid flow control.
Submitted 22 September, 2022; v1 submitted 4 May, 2022;
originally announced May 2022.
-
Who Leads and Who Follows in Strategic Classification?
Authors:
Tijana Zrnic,
Eric Mazumdar,
S. Shankar Sastry,
Michael I. Jordan
Abstract:
As predictive models are deployed into the real world, they must increasingly contend with strategic behavior. A growing body of work on strategic classification treats this problem as a Stackelberg game: the decision-maker "leads" in the game by deploying a model, and the strategic agents "follow" by playing their best response to the deployed model. Importantly, in this framing, the burden of learning is placed solely on the decision-maker, while the agents' best responses are implicitly treated as instantaneous. In this work, we argue that the order of play in strategic classification is fundamentally determined by the relative frequencies at which the decision-maker and the agents adapt to each other's actions. In particular, by generalizing the standard model to allow both players to learn over time, we show that a decision-maker that makes updates faster than the agents can reverse the order of play, meaning that the agents lead and the decision-maker follows. We observe in standard learning settings that such a role reversal can be desirable for both the decision-maker and the strategic agents. Finally, we show that a decision-maker with the freedom to choose their update frequency can induce learning dynamics that converge to Stackelberg equilibria with either order of play.
Submitted 29 January, 2022; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Zeroth-Order Methods for Convex-Concave Minmax Problems: Applications to Decision-Dependent Risk Minimization
Authors:
Chinmay Maheshwari,
Chih-Yuan Chiu,
Eric Mazumdar,
S. Shankar Sastry,
Lillian J. Ratliff
Abstract:
Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose a random reshuffling-based gradient free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite sum structure.
We prove that the algorithm enjoys the same convergence rate as that of zeroth-order algorithms for convex minimization problems. We further specialize the algorithm to solve distributionally robust, decision-dependent learning problems, where gradient information is not readily available. Through illustrative simulations, we observe that our proposed approach learns models that are simultaneously robust against adversarial distribution shifts and strategic decisions from the data sources, and outperforms existing methods from the strategic classification literature.
Submitted 19 February, 2022; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization
Authors:
Yaodong Yu,
Tianyi Lin,
Eric Mazumdar,
Michael I. Jordan
Abstract:
Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity. Existing algorithms for solving Wasserstein DRSL -- one of the most popular DRSL frameworks based around robustness to perturbations in the Wasserstein distance -- have serious limitations that limit their use in large-scale problems -- in particular they involve solving complex subproblems and they fail to make use of stochastic gradients. We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable stochastic extra-gradient algorithms which provably achieve faster convergence rates than existing approaches. We demonstrate their effectiveness on synthetic and real data when compared to existing DRSL approaches. Key to our results is the use of variance reduction and random reshuffling to accelerate stochastic min-max optimization, the analysis of which may be of independent interest.
Submitted 25 January, 2022; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Group-annihilator graphs realised by finite abelian groups and its properties
Authors:
Eshita Mazumdar,
Rameez Raja
Abstract:
Let $G$ be a finite abelian group viewed as a $\mathbb{Z}$-module and let $\mathcal{G} = (V, E)$ be a simple graph. In this paper, we consider a graph $Γ(G)$ called a \textit{group-annihilator} graph. The vertices of $Γ(G)$ are all elements of $G$ and two distinct vertices $x$ and $y$ are adjacent in $Γ(G)$ if and only if $[x : G][y : G]G = \{0\}$, where $x, y\in G$ and $[x : G] = \{r\in\mathbb{Z} : rG \subseteq \mathbb{Z}x\}$ is an ideal of the ring $\mathbb{Z}$. We discuss in detail the graph structure realised by the group $G$. Moreover, we study the creation sequence, hyperenergeticity and hypoenergeticity of group-annihilator graphs. Finally, we conclude the paper with a discussion on Laplacian eigenvalues of the group-annihilator graph. We show that the Laplacian eigenvalues are representatives of orbits of the group action: $Aut(Γ(G)) \times G \rightarrow G$.
Submitted 6 January, 2021;
originally announced January 2021.
-
Expert Selection in High-Dimensional Markov Decision Processes
Authors:
Vicenc Rubies-Royo,
Eric Mazumdar,
Roy Dong,
Claire Tomlin,
S. Shankar Sastry
Abstract:
In this work we present a multi-armed bandit framework for online expert selection in Markov decision processes and demonstrate its use in high-dimensional settings. Our method takes a set of candidate expert policies and switches between them to rapidly identify the best performing expert using a variant of the classical upper confidence bound algorithm, thus ensuring low regret in the overall performance of the system. This is useful in applications where several expert policies may be available, and one needs to be selected at run-time for the underlying environment.
Submitted 25 October, 2020;
originally announced October 2020.
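The expert-switching idea in the abstract above can be sketched with the classical UCB1 rule. The paper uses a variant of upper confidence bound that the abstract does not specify, so this is plain UCB1 swapped in for illustration. The names `ucb_select`, `run_expert_selection`, and `rollout_return` are hypothetical, and returns are assumed to lie in [0, 1].

```python
import math


def ucb_select(counts, means, t, c=2.0):
    """Return the index of the expert with the largest upper confidence bound."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # Try every expert once before comparing bounds.
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_expert_selection(rollout_return, n_experts, n_rounds):
    """Repeatedly pick an expert, observe its return, and update its running mean.

    rollout_return(i) is an ASSUMED callback that runs expert policy i in the
    environment for one episode and returns a scalar return in [0, 1].
    """
    counts = [0] * n_experts
    means = [0.0] * n_experts
    for t in range(1, n_rounds + 1):
        i = ucb_select(counts, means, t)
        r = rollout_return(i)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # Incremental mean update.
    return counts, means
```

With a fixed set of candidate policies, `counts` concentrates on the expert with the highest average return as the number of rounds grows, which is the low-regret behavior the abstract refers to.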
-
Iterated sumsets and Hilbert functions
Authors:
Shalom Eliahou,
Eshita Mazumdar
Abstract:
Let $A$ be a finite subset of an abelian group $(G, +)$. Let $h \ge 2$ be an integer. If $|A| \ge 2$ and the cardinality $|hA|$ of the $h$-fold iterated sumset $hA = A + \cdots + A$ is known, what can one say about $|(h-1)A|$ and $|(h+1)A|$? It is known that $|(h-1)A| \ge |hA|^{(h-1)/h}$, a consequence of Plünnecke's inequality. Here we improve this bound with a new approach. Namely, we model the sequence $(|hA|)_{h \ge 0}$ with the Hilbert function of a standard graded algebra. We then apply Macaulay's 1927 theorem on the growth of Hilbert functions, and more specifically a recent condensed version of it. Our bound implies $|(h-1)A| \ge θ(x, h)\,|hA|^{(h-1)/h}$ for some factor $θ(x, h) > 1$, where $x$ is a real number closely linked to $|hA|$. Moreover, we show that $θ(x, h)$ asymptotically tends to $e \approx 2.718$ as $|A|$ grows and $h$ lies in a suitable range varying with $|A|$.
Submitted 7 September, 2020; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning
Authors:
Tyler Westenbroek,
Eric Mazumdar,
David Fridovich-Keil,
Valmik Prabhu,
Claire J. Tomlin,
S. Shankar Sastry
Abstract:
This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system using discrete-time model-free policy-gradient parameter update rules. The primary advantage of the scheme over standard model-reference adaptive control techniques is that it does not require the learned inverse model to be invertible at all instances of time. This enables the use of general function approximators to approximate the linearizing controller for the system without having to worry about singularities. However, the discrete-time and stochastic nature of these algorithms precludes the direct application of standard machinery from the adaptive control literature to provide deterministic stability proofs for the system. Nevertheless, we leverage these techniques alongside tools from the stochastic approximation literature to demonstrate that with high probability the tracking and parameter errors concentrate near zero when a certain persistence of excitation condition is satisfied. A simulated example of a double pendulum demonstrates the utility of the proposed theory.
Submitted 6 April, 2020;
originally announced April 2020.
-
On Thompson Sampling with Langevin Algorithms
Authors:
Eric Mazumdar,
Aldo Pacchiano,
Yi-an Ma,
Peter L. Bartlett,
Michael I. Jordan
Abstract:
Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice. However, it suffers from a significant limitation computationally, arising from the need for samples from posterior distributions at every iteration. We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue. We construct quickly converging Langevin algorithms to generate approximate samples that have accuracy guarantees, and we leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the necessary hyperparameters for the MCMC procedure to guarantee optimal instance-dependent frequentist regret while having low computational complexity. In particular, our algorithms take advantage of both posterior concentration and a sample reuse mechanism to ensure that only a constant number of iterations and a constant amount of data is needed in each round. The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.
Submitted 17 June, 2020; v1 submitted 23 February, 2020;
originally announced February 2020.
-
Local Nash Equilibria are Isolated, Strict Local Nash Equilibria in `Almost All' Zero-Sum Continuous Games
Authors:
Eric Mazumdar,
Lillian Ratliff
Abstract:
We prove that differential Nash equilibria are generic amongst local Nash equilibria in continuous zero-sum games. That is, there exists an open-dense subset of zero-sum games for which local Nash equilibria are non-degenerate differential Nash equilibria. The result extends previous results to the zero-sum setting, where we obtain even stronger results; in particular, we show that local Nash equilibria are generically hyperbolic critical points. We further show that differential Nash equilibria of zero-sum games are structurally stable. The purpose for presenting these extensions is the recent renewed interest in zero-sum games within machine learning and optimization. Adversarial learning and generative adversarial network approaches are touted to be more robust than the alternative. Zero-sum games are at the heart of such approaches. Many works proceed under the assumption of hyperbolicity of critical points. Our results justify this assumption by showing `almost all' zero-sum games admit local Nash equilibria that are hyperbolic.
Submitted 3 February, 2020;
originally announced February 2020.
-
The Weighted Davenport constant of a group and a related extremal problem II
Authors:
Niranjan Balachandran,
Eshita Mazumdar
Abstract:
For a finite abelian group $G$ with $\exp(G)=n$ and an integer $k\ge 2$, Balachandran and Mazumdar \cite{BM} introduced the extremal function $\fD_G(k)$ which is defined to be $\min\{|A|: \emptyset \neq A\subseteq[1,n-1]\textrm{\ with\ }D_A(G)\le k\}$ (and $\infty$ if there is no such $A$), where $D_A(G)$ denotes the $A$-weighted Davenport constant of the group $G$. Denoting $\fD_G(k)$ by $\fD(p,k)$ when $G=\bF_p$ (for $p$ prime), it is known (\cite{BM}) that $p^{1/k}-1\le \fD(p,k)\le O_k(p\log p)^{1/k}$ holds for each $k\ge 2$ and $p$ sufficiently large, and that for $k=2,4$, we have the sharper bound $\fD(p,k)\le O(p^{1/k})$. It was furthermore conjectured that $\fD(p,k)=Θ(p^{1/k})$. In this short paper we prove that $\fD(p,k)\le 4^{k^2}p^{1/k}$ for sufficiently large primes $p$.
Submitted 16 December, 2019;
originally announced December 2019.
-
Feedback Linearization for Unknown Systems via Reinforcement Learning
Authors:
Tyler Westenbroek,
David Fridovich-Keil,
Eric Mazumdar,
Shreyas Arora,
Valmik Prabhu,
S. Shankar Sastry,
Claire J. Tomlin
Abstract:
We present a novel approach to control design for nonlinear systems which leverages model-free policy optimization techniques to learn a linearizing controller for a physical plant with unknown dynamics. Feedback linearization is a technique from nonlinear control which renders the input-output dynamics of a nonlinear plant \emph{linear} under application of an appropriate feedback controller. Once a linearizing controller has been constructed, desired output trajectories for the nonlinear plant can be tracked using a variety of linear control techniques. However, the calculation of a linearizing controller requires a precise dynamics model for the system. As a result, model-based approaches for learning exact linearizing controllers generally require a simple, highly structured model of the system with easily identifiable parameters. In contrast, the model-free approach presented in this paper is able to approximate the linearizing controller for the plant using general function approximation architectures. Specifically, we formulate a continuous-time optimization problem over the parameters of a learned linearizing controller whose optima are the set of parameters which best linearize the plant. We derive conditions under which the learning problem is (strongly) convex and provide guarantees which ensure the true linearizing controller for the plant is recovered. We then discuss how model-free policy optimization algorithms can be used to solve a discrete-time approximation to the problem using data collected from the real-world plant. The utility of the framework is demonstrated in simulation and on a real-world robotic platform.
Submitted 21 April, 2020; v1 submitted 29 October, 2019;
originally announced October 2019.
-
Policy-Gradient Algorithms Have No Guarantees of Convergence in Linear Quadratic Games
Authors:
Eric Mazumdar,
Lillian J. Ratliff,
Michael I. Jordan,
S. Shankar Sastry
Abstract:
We show by counterexample that policy-gradient algorithms have no guarantees of even local convergence to Nash equilibria in continuous action and state space multi-agent settings. To do so, we analyze gradient-play in N-player general-sum linear quadratic games, a classic game setting which has recently emerged as a benchmark in the field of multi-agent learning. In such games the state and action spaces are continuous and global Nash equilibria can be found by solving coupled Riccati equations. Further, gradient-play in LQ games is equivalent to multi-agent policy-gradient. We first show that these games are surprisingly not convex games. Despite this, we are still able to show that the only critical points of the gradient dynamics are global Nash equilibria. We then give sufficient conditions under which policy-gradient will avoid the Nash equilibria, and generate a large number of general-sum linear quadratic games that satisfy these conditions. In such games we empirically observe the players converging to limit cycles for which the time average does not coincide with a Nash equilibrium. The existence of such games indicates that one of the most popular approaches to solving reinforcement learning problems in the classic reinforcement learning setting has no local guarantee of convergence in multi-agent settings. Further, the ease with which we can generate these counterexamples suggests that such situations are not mere edge cases and are in fact quite common.
Submitted 16 December, 2019; v1 submitted 8 July, 2019;
originally announced July 2019.
-
Convergence Analysis of Gradient-Based Learning with Non-Uniform Learning Rates in Non-Cooperative Multi-Agent Settings
Authors:
Benjamin Chasnov,
Lillian J. Ratliff,
Eric Mazumdar,
Samuel A. Burden
Abstract:
Considering a class of gradient-based multi-agent learning algorithms in non-cooperative settings, we provide local convergence guarantees to a neighborhood of a stable local Nash equilibrium. In particular, we consider continuous games where agents learn in (i) deterministic settings with oracle access to their gradient and (ii) stochastic settings with an unbiased estimator of their gradient. Utilizing the minimum and maximum singular values of the game Jacobian, we provide finite-time convergence guarantees in the deterministic case. On the other hand, in the stochastic case, we provide concentration bounds guaranteeing that with high probability agents will converge to a neighborhood of a stable local Nash equilibrium in finite time. Unlike other works in this vein, we also study the effects of non-uniform learning rates on the learning dynamics and convergence rates. We find that much like preconditioning in optimization, non-uniform learning rates cause a distortion in the vector field which can, in turn, change the rate of convergence and the shape of the region of attraction. The analysis is supported by numerical examples that illustrate different aspects of the theory. We conclude with a discussion of the results and open questions.
Submitted 30 May, 2019;
originally announced June 2019.
-
On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games
Authors:
Eric V. Mazumdar,
Michael I. Jordan,
S. Shankar Sastry
Abstract:
We propose local symplectic surgery, a two-timescale procedure for finding local Nash equilibria in two-player zero-sum games. We first show that previous gradient-based algorithms cannot guarantee convergence to local Nash equilibria due to the existence of non-Nash stationary points. By taking advantage of the differential structure of the game, we construct an algorithm for which the local Nash equilibria are the only attracting fixed points. We also show that the algorithm exhibits no oscillatory behaviors in neighborhoods of equilibria and show that it has the same per-iteration complexity as other recently proposed algorithms. We conclude by validating the algorithm on two numerical examples: a toy example with multiple Nash equilibria and a non-Nash equilibrium, and the training of a small generative adversarial network (GAN).
Submitted 24 January, 2019; v1 submitted 3 January, 2019;
originally announced January 2019.
-
The Weighted Davenport Constant of a group and a related extremal problem
Authors:
Niranjan Balachandran,
Eshita Mazumdar
Abstract:
For a finite abelian group $G$ written additively, and a non-empty subset $A\subset [1,\exp(G)-1]$ the weighted Davenport Constant of $G$ with respect to the set $A$, denoted $D_A(G)$, is the least positive integer $k$ for which the following holds: Given an arbitrary $G$-sequence $(x_1,\ldots,x_k)$, there exists a non-empty subsequence $(x_{i_1},\ldots,x_{i_t})$ along with $a_{j}\in A$ such that $\sum_{j=1}^t a_jx_{i_j}=0$. In this paper, we pose and study a natural new extremal problem that arises from the study of $D_A(G)$: For an integer $k\ge 2$, determine $\fD_G(k):=\min\{|A|: D_A(G)\le k\}$ (if the problem posed makes sense). It turns out that for $k$ `not-too-small', this is a well-posed problem and one of the most interesting cases occurs for $G=\Z_p$, the cyclic group of prime order, for which we obtain near optimal bounds for all $k$ (for sufficiently large primes $p$), and asymptotically tight (up to constants) bounds for $k=2,4$.
Submitted 11 July, 2018;
originally announced July 2018.
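For very small cyclic groups, the definition of $D_A(G)$ in the abstract above can be checked by exhaustive search. The following is a minimal sketch, not code from the paper: the function names `has_weighted_zero_sum` and `weighted_davenport` are hypothetical, only $G=\Z_n$ is handled, and the search is exponential in the sequence length, so it is meant for toy cases only. A coefficient of 0 marks an element left out of the subsequence.

```python
from itertools import product

def has_weighted_zero_sum(seq, weights, n):
    """True if some nonzero vector in (weights ∪ {0})^k gives sum(a_i * x_i) ≡ 0 (mod n)."""
    choices = [0] + sorted(weights)
    for coeffs in product(choices, repeat=len(seq)):
        # any(coeffs) rules out the all-zero vector, i.e. the empty subsequence.
        if any(coeffs) and sum(a * x for a, x in zip(coeffs, seq)) % n == 0:
            return True
    return False

def weighted_davenport(n, weights):
    """Least k such that every length-k sequence over Z_n has an A-weighted zero-sum subsequence."""
    k = 1
    while True:
        if all(has_weighted_zero_sum(seq, weights, n)
               for seq in product(range(n), repeat=k)):
            return k
        k += 1

print(weighted_davenport(5, {1}))     # 5
print(weighted_davenport(5, {1, 4}))  # 3
```

With $A=\{1\}$ this recovers the classical Davenport constant of $\Z_5$; with $A=\{1,4\}$ (weights $\pm 1$) the sequence $(1,2)$ has no weighted zero-sum, while a pigeonhole argument on the $2^3$ subset sums shows that every length-3 sequence does.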
-
Zero sums in restricted sequences
Authors:
Niranjan Balachandran,
Eshita Mazumdar
Abstract:
A sequence $\bfx=(x_1,\ldots,x_m)$ of elements of $\Z_n$ is called an \textit{$A$-weighted Davenport Z-sequence} if there exists $\bfa:=(a_1,\ldots,a_m)\in (A\cup\{0\})^m\setminus\{\bfzero_m\}$ such that $\sum_i a_ix_i=0$. Here $\bfzero_m=(0,\ldots,0)\in\Z_n^m$. Similarly, the sequence $\bfx$ is called an \textit{$A$-weighted Erdős Z-sequence} if there exists $\bfa:=(a_1,\ldots,a_m)\in (A\cup\{0\})^m\setminus\{\bfzero_m\}$ with $|Supp(\bfa)|=n$, such that $\sum_i a_ix_i=0$, where $Supp(\bfa):=\{i: a_i\ne 0\}$. A $\Z_n$-sequence $\bfx$ is called $k$-restricted if no element of $\Z_n$ appears more than $k$ times in $\bfx$. In this paper, we study the problem of determining the least value of $m$ for which a $k$-restricted $\Z_n$-sequence of length $m$ is an $A$-weighted Davenport Z-sequence (resp. an $A$-weighted Erdős Z-sequence). We also consider the same problem for random $\Z_n$-sequences, for certain very natural choices for the set $A$.
Submitted 2 March, 2021; v1 submitted 2 July, 2018;
originally announced July 2018.
-
On Gradient-Based Learning in Continuous Games
Authors:
Eric Mazumdar,
Lillian J. Ratliff,
S. Shankar Sastry
Abstract:
We formulate a general framework for competitive gradient-based learning that encompasses a wide breadth of multi-agent learning algorithms, and analyze the limiting behavior of competitive gradient-based learning algorithms using dynamical systems theory. For both general-sum and potential games, we characterize a non-negligible subset of the local Nash equilibria that will be avoided if each agent employs a gradient-based learning algorithm. We also shed light on the issue of convergence to non-Nash strategies in general- and zero-sum games, which may have no relevance to the underlying game, and arise solely due to the choice of algorithm. The existence and frequency of such strategies may explain some of the difficulties encountered when using gradient descent in zero-sum games as, e.g., in the training of generative adversarial networks. To reinforce the theoretical contributions, we provide empirical results that highlight the frequency of linear quadratic dynamic games (a benchmark for multi-agent reinforcement learning) that admit global Nash equilibria that are almost surely avoided by policy gradient.
Submitted 20 February, 2020; v1 submitted 15 April, 2018;
originally announced April 2018.
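The phenomenon of convergence to non-Nash strategies in zero-sum games, mentioned in the abstract above, can be seen on a two-dimensional quadratic. The sketch below is an illustration under stated assumptions and is not taken from the paper: the cost $f(x,y) = -x^2/2 + 2xy - 3y^2/2$, the step size, and the names `gradient_play` and `f` are all hypothetical choices. Player 1 minimizes $f$ over $x$ and player 2 maximizes $f$ over $y$, each following its own gradient simultaneously.

```python
def f(x, y):
    """Hypothetical zero-sum cost: player 1 minimizes over x, player 2 maximizes over y."""
    return -0.5 * x**2 + 2.0 * x * y - 1.5 * y**2

def gradient_play(x, y, step=0.1, iters=500):
    """Simultaneous gradient play: descent in x for player 1, ascent in y for player 2."""
    for _ in range(iters):
        df_dx = -x + 2.0 * y
        df_dy = 2.0 * x - 3.0 * y
        x, y = x - step * df_dx, y + step * df_dy
    return x, y

x_final, y_final = gradient_play(1.0, -1.0)
print(x_final, y_final)          # both are numerically zero
print(f(0.1, 0.0) < f(0.0, 0.0))  # True: player 1 gains by deviating from x = 0
```

For this choice the update matrix has a repeated eigenvalue $1 - \text{step} = 0.9$, so the iterates contract to the origin. The origin is nevertheless not a Nash equilibrium, because $\partial^2 f/\partial x^2 = -1 < 0$ makes $x=0$ a local maximum of player 1's cost.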
-
The Harborth Constant of Dihedral Groups
Authors:
Niranjan Balachandran,
Eshita Mazumdar,
Kevin Zhao
Abstract:
The Harborth constant of a finite group $G$, denoted $\gs(G)$, is the smallest integer $k$ such that the following holds: For $A\subseteq G$ with $|A|=k$, there exists $B\subseteq A$ with $|B|=\exp(G)$ such that the elements of $B$ can be rearranged into a sequence whose product equals $1_G$, the identity element of $G$. The Harborth constant is a well studied combinatorial invariant in the case of abelian groups. In this paper, we consider a generalization $\gs(G)$ of this combinatorial invariant for nonabelian groups and prove that if $G$ is a dihedral group of order $2n$ with $n\ge 3$, then $\gs(G) = n + 2$ if $n$ is even and $\gs(G) = 2n + 1$ otherwise.
Submitted 16 January, 2019; v1 submitted 22 March, 2018;
originally announced March 2018.
-
A Multi-Armed Bandit Approach for Online Expert Selection in Markov Decision Processes
Authors:
Eric Mazumdar,
Roy Dong,
Vicenç Rúbies Royo,
Claire Tomlin,
S. Shankar Sastry
Abstract:
We formulate a multi-armed bandit (MAB) approach to choosing expert policies online in Markov decision processes (MDPs). Given a set of expert policies trained on a state and action space, the goal is to maximize the cumulative reward of our agent. The hope is to quickly find the best expert in our set. The MAB formulation allows us to quantify the performance of an algorithm in terms of the regret incurred from not choosing the best expert from the beginning. We first develop the theoretical framework for MABs in MDPs, and then present a basic regret decomposition identity. We then adapt the classical Upper Confidence Bounds algorithm to the problem of choosing experts in MDPs and prove that the expected regret grows at worst at a logarithmic rate. Lastly, we validate the theory on a small MDP.
Submitted 18 July, 2017;
originally announced July 2017.
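For orientation, the classical Upper Confidence Bounds rule that the abstract above adapts can be sketched on a plain Bernoulli bandit. This is the textbook UCB1 index (empirical mean plus $\sqrt{2\ln t / n_i}$), not the paper's MDP variant; the arm means, horizon, seed, and the function name `ucb1` are assumptions made for illustration, with each arm standing in for one expert.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run textbook UCB1 on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls per arm
    totals = [0.0] * k  # summed reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Pick the arm with the largest optimistic index.
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)  # the last (best) arm receives most of the pulls
```

The exploration bonus shrinks as an arm is pulled more often, which is what keeps the number of pulls of suboptimal arms, and hence the regret, growing only logarithmically in the horizon.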
-
Inverse Risk-Sensitive Reinforcement Learning
Authors:
Lillian J. Ratliff,
Eric Mazumdar
Abstract:
We address the problem of inverse reinforcement learning in Markov decision processes where the agent is risk-sensitive. In particular, we model risk-sensitivity in a reinforcement learning framework by making use of models of human decision-making having their origins in behavioral psychology, behavioral economics, and neuroscience. We propose a gradient-based inverse reinforcement learning algorithm that minimizes a loss function defined on the observed behavior. We demonstrate the performance of the proposed technique on two examples, the first of which is the canonical Grid World example and the second of which is a Markov decision process modeling passengers' decisions regarding ride-sharing. In the latter, we use pricing and travel time data from a ride-sharing company to construct the transition probabilities and rewards of the Markov decision process.
Submitted 21 November, 2017; v1 submitted 28 March, 2017;
originally announced March 2017.
-
Optimal Causal Imputation for Control
Authors:
Roy Dong,
Eric Mazumdar,
S. Shankar Sastry
Abstract:
The widespread applicability of analytics in cyber-physical systems has motivated research into causal inference methods. Predictive estimators are not sufficient when analytics are used for decision making; rather, the flow of causal effects must be determined. Generally speaking, these methods focus on estimation of a causal structure from experimental data. In this paper, we consider the dual problem: we fix the causal structure and optimize over causal imputations to achieve desirable system behaviors for a minimal imputation cost. First, we present the optimal causal imputation problem, and then we analyze the problem in two special cases: 1) when the causal imputations can only impute to a fixed value, 2) when the causal structure has linear dynamics with additive Gaussian noise. This optimal causal imputation framework serves to bridge the gap between causal structures and control.
Submitted 21 March, 2017;
originally announced March 2017.
-
Prime powers in sums of terms of binary recurrence sequences
Authors:
Eshita Mazumdar,
S. S. Rout
Abstract:
Let $\{u_{n}\}_{n \geq 0}$ be a non-degenerate binary recurrence sequence with positive, square-free discriminant and let $p$ be a fixed prime number. In this paper, we prove a finiteness result for the solutions of the Diophantine equation $u_{n_{1}} + u_{n_{2}} + \cdots + u_{n_{t}} = p^{z}$ under some conditions on $n_i$ for all $1\leq i \leq t$. Moreover, we explicitly find all the powers of three which are sums of three balancing numbers, using lower bounds for linear forms in logarithms. Further, we use a variant of the Baker-Davenport reduction method in Diophantine approximation due to Dujella and Pethő.
Submitted 3 July, 2017; v1 submitted 10 October, 2016;
originally announced October 2016.
-
To Observe or Not to Observe: Queuing Game Framework for Urban Parking
Authors:
Lillian J. Ratliff,
Chase Dowling,
Eric Mazumdar,
Baosen Zhang
Abstract:
We model parking in urban centers as a set of parallel queues and overlay a game theoretic structure that allows us to compare the user-selected (Nash) equilibrium to the socially optimal equilibrium. We model arriving drivers as utility maximizers and consider the game in which observing the queue length is free as well as the game in which drivers must pay to observe the queue length. In both games, drivers must decide between balking and joining. We compare the Nash induced welfare to the socially optimal welfare. We find that gains to welfare do not require full information penetration: for social welfare to increase, not everyone needs to pay to observe. Through simulation, we explore a more complex scenario where drivers decide, based on the queueing game, whether or not to enter a collection of queues over a network. We examine the occupancy-congestion relationship, an important relationship for determining the impact of parking resources on overall traffic congestion. Our simulated models use parameters informed by real-world data collected by the Seattle Department of Transportation.
Submitted 29 March, 2016;
originally announced March 2016.