Showing 1–50 of 50 results for author: Mazumdar, E

  1. arXiv:2504.16890  [pdf, other]

    math.OC math.AP

    Computing Optimal Transport Plans via Min-Max Gradient Flows

    Authors: Lauren Conger, Franca Hoffmann, Ricardo Baptista, Eric Mazumdar

    Abstract: We pose the Kantorovich optimal transport problem as a min-max problem with a Nash equilibrium that can be obtained dynamically via a two-player game, providing a framework for approximating optimal couplings. We prove convergence of the timescale-separated gradient descent dynamics to the optimal transport plan, and implement the gradient descent algorithm with a particle method, where the margin…

    Submitted 27 May, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  2. arXiv:2502.20770  [pdf, other]

    cs.GT cs.LG

    Learning to Steer Learners in Games

    Authors: Yizhou Zhang, Yi-An Ma, Eric Mazumdar

    Abstract: We consider the problem of learning to exploit learning algorithms through repeated interactions in games. Specifically, we focus on the case of repeated two player, finite-action games, in which an optimizer aims to steer a no-regret learner to a Stackelberg equilibrium without knowledge of its payoffs. We first show that this is impossible if the optimizer only knows that the learner is using an…

    Submitted 28 May, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

  3. arXiv:2502.19652  [pdf, other]

    cs.LG cs.AI cs.RO

    Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

    Authors: Shangding Gu, Laixi Shi, Muning Wen, Ming Jin, Eric Mazumdar, Yuejie Chi, Adam Wierman, Costas Spanos

    Abstract: Driven by inherent uncertainty and the sim-to-real gap, robust reinforcement learning (RL) seeks to improve resilience against the complexity and variability in agent-environment sequential interactions. Despite the existence of a large number of RL benchmarks, there is a lack of standardized benchmarks for robust RL. Current robust RL policies often focus on a specific type of uncertainty and are…

    Submitted 26 February, 2025; originally announced February 2025.

  4. arXiv:2411.07403  [pdf, other]

    math.AP math.OC

    Coupled Wasserstein Gradient Flows for Min-Max and Cooperative Games

    Authors: Lauren Conger, Franca Hoffmann, Eric Mazumdar, Lillian J. Ratliff

    Abstract: We propose a framework for two-player infinite-dimensional games with cooperative or competitive structure. These games take the form of coupled partial differential equations in which players optimize over a space of measures, driven by either a gradient descent or gradient descent-ascent in Wasserstein-2 space. We characterize the properties of the Nash equilibrium of the system, and relate it t…

    Submitted 10 February, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.01166

    MSC Class: 35G50; 91A25; 49J35

  5. arXiv:2409.20067  [pdf, ps, other]

    cs.LG cs.GT cs.MA stat.ML

    Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning

    Authors: Laixi Shi, Jingchu Gai, Eric Mazumdar, Yuejie Chi, Adam Wierman

    Abstract: Standard multi-agent reinforcement learning (MARL) algorithms are vulnerable to sim-to-real gaps. To address this, distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL by optimizing the worst-case performance when game dynamics shift within a prescribed uncertainty set. RMGs remain under-explored, from reasonable problem formulation to the development of sa…

    Submitted 31 January, 2025; v1 submitted 30 September, 2024; originally announced September 2024.

  6. arXiv:2409.01447  [pdf, other]

    cs.LG cs.GT

    Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

    Authors: Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

    Abstract: In this paper, we consider two-player zero-sum matrix and stochastic games and develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players. Specifically, the learning dynamics for matrix games are based on the smoothed best-response dynamics, while the learning dynamics for stochastic games build upon those for matrix games, with additional incorpor…

    Submitted 4 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: A preliminary version [arXiv:2303.03100] of this paper, with a subset of the results that are presented here, was presented at NeurIPS 2023

  7. arXiv:2408.13558  [pdf, ps, other]

    math.CO

    Combinatorial invariants for certain classes of non-abelian groups

    Authors: Naveen K. Godara, Renu Joshi, Eshita Mazumdar

    Abstract: This article focuses on the study of zero-sum invariants of finite non-abelian groups. We address two main problems: the first centers on the ordered Davenport constant and the second on Gao's constant. We establish a connection between the ordered Davenport constant and the small Davenport constant for a finite non-abelian group of even order, which in turn gives a relation with the Noether numbe…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 15 pages

    MSC Class: 11B75; 11P70

  8. arXiv:2407.01148  [pdf, ps, other]

    math.CO

    On a conjecture related to the Davenport constant

    Authors: Naveen K. Godara, Renu Joshi, Eshita Mazumdar

    Abstract: For a finite group $G,$ $D(G)$ is defined as the least positive integer $k$ such that for every sequence $S=g_1 g_2\cdots g_k$ of length $k$ over $G$, there exist $1 \le i_1 < i_2 <\cdots < i_m \le k $ such that $\prod_{j=1}^{m} g_{i_{σ(j)}}=1$ holds for $σ= id,$ identity element of $S_m.$ For a finite abelian group, this group invariant, known as the Davenport constant, is crucial in the theory o…

    Submitted 1 July, 2024; originally announced July 2024.

  9. arXiv:2406.14156  [pdf, other]

    cs.GT cs.LG cs.MA

    Tractable Equilibrium Computation in Markov Games through Risk Aversion

    Authors: Eric Mazumdar, Kishan Panaganti, Laixi Shi

    Abstract: A significant roadblock to the development of principled multi-agent reinforcement learning is the fact that desired solution concepts like Nash equilibria may be intractable to compute. To overcome this obstacle, we take inspiration from behavioral economics and show that -- by imbuing agents with important features of human decision-making like risk aversion and bounded rationality -- a class of…

    Submitted 26 August, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: preprint of multi-agent RL with risk-averse equilibria

  10. arXiv:2405.05468  [pdf, ps, other]

    cs.LG stat.ML

    Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data

    Authors: Kishan Panaganti, Adam Wierman, Eric Mazumdar

    Abstract: The robust $φ$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $φ$-regularized fitted Q-iteration (RPQ) for learning an $ε$-opt…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: To appear in the proceedings of the International Conference on Machine Learning (ICML) 2024

  11. arXiv:2404.18909  [pdf, other]

    cs.LG cs.MA stat.ML

    Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

    Authors: Laixi Shi, Eric Mazumdar, Yuejie Chi, Adam Wierman

    Abstract: To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments, the problem remains understudied -- despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This w…

    Submitted 8 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by International Conference on Machine Learning, 2024

  12. arXiv:2404.09888  [pdf, other]

    cs.FL cs.RO eess.SY

    Flow-Based Synthesis of Reactive Tests for Discrete Decision-Making Systems with Temporal Logic Specifications

    Authors: Josefine B. Graebener, Apurva S. Badithela, Denizalp Goktas, Wyatt Ubellacker, Eric V. Mazumdar, Aaron D. Ames, Richard M. Murray

    Abstract: Designing tests to evaluate if a given autonomous system satisfies complex specifications is challenging due to the complexity of these systems. This work proposes a flow-based approach for reactive test synthesis from temporal logic specifications, enabling the synthesis of test environments consisting of static and reactive obstacles and dynamic test agents. The temporal logic specifications des…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Manuscript

  13. arXiv:2403.18956  [pdf, other]

    math.OC

    Characterizing Controllability and Observability for Systems with Locality, Communication, and Actuation Constraints

    Authors: Lauren Conger, Yiheng Lin, Adam Wierman, Eric Mazumdar

    Abstract: This paper presents a closed-form notion of controllability and observability for systems with communication delays, actuation delays, and locality constraints. The formulation reduces to classical notions of controllability and observability in the unconstrained setting. As a consequence of our formulation, we show that the addition of locality and communication constraints may not affect the con…

    Submitted 4 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  14. arXiv:2402.09999  [pdf]

    math.NT

    Davenport constant for finite abelian groups with higher rank

    Authors: Anamitro Biswas, Eshita Mazumdar

    Abstract: For a finite abelian group $G,$ the Davenport Constant, denoted by $D(G)$, is defined to be the least positive integer $k$ such that every sequence of length at least $k$ has a non-trivial zero-sum subsequence. A long-standing conjecture is that the Davenport constant of a finite abelian group $G =C_{n_1}\times\cdots\times C_{n_d}$ of rank $d \in \mathbb{N}$ is…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 11 pages

    MSC Class: 11B75; 11P99; 20K01

  15. arXiv:2402.07588  [pdf, other]

    cs.GT cs.LG stat.ML

    Understanding Model Selection For Learning In Strategic Environments

    Authors: Tinashe Handina, Eric Mazumdar

    Abstract: The deployment of ever-larger machine learning models reflects a growing consensus that the more expressive the model class one optimizes over -- and the more data one has access to -- the more one can improve performance. As models get deployed in a variety of real-world scenarios, they inevitably face strategic environments. In this work, we consider the natural questio…

    Submitted 22 November, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2024

  16. arXiv:2312.04905  [pdf, ps, other]

    cs.LG cs.MA

    Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

    Authors: Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

    Abstract: We consider two-player zero-sum stochastic games and propose a two-timescale $Q$-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players. In two-timescale $Q$-learning, the fast-timescale iterates are updated in the spirit of stochastic gradient descent and the slow-timescale iterates (which we use to compute the policies) ar…

    Submitted 8 December, 2023; originally announced December 2023.

  17. arXiv:2310.10241  [pdf, ps, other]

    math.AC math.CO math.NT

    Optimal Bounds on the Growth of Iterated Sumsets in Abelian Semigroups

    Authors: Shalom Eliahou, Eshita Mazumdar

    Abstract: We provide optimal upper bounds on the growth of iterated sumsets $hA=A+\dots+A$ for finite subsets $A$ of abelian semigroups. More precisely, we show that the new upper bounds recently derived from Macaulay's theorem in commutative algebra are best possible, i.e., are actually reached by suitable subsets of suitable abelian semigroups. Our constructions, in a multiplicative setting, are based on…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: To appear in Annales de l'Institut Fourier

    MSC Class: 11P70; 05E40; 11B13; 13P25

  18. arXiv:2307.01166  [pdf, other]

    cs.LG math.AP

    Strategic Distribution Shift of Interacting Agents via Coupled Gradient Flows

    Authors: Lauren Conger, Franca Hoffmann, Eric Mazumdar, Lillian Ratliff

    Abstract: We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed. Prior work largely models feedback-induced distribution shift as adversarial or via an overly simplistic distribution-shift structure. In contrast, we propose a coupled partial differential…

    Submitted 29 October, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

  19. arXiv:2303.03100  [pdf, ps, other]

    cs.GT cs.LG

    A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

    Authors: Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

    Abstract: We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main…

    Submitted 3 March, 2023; originally announced March 2023.

  20. arXiv:2302.04262  [pdf, other]

    cs.LG cs.GT stat.ML

    Algorithmic Collective Action in Machine Learning

    Authors: Moritz Hardt, Eric Mazumdar, Celestine Mendler-Dünner, Tijana Zrnic

    Abstract: We initiate a principled study of algorithmic collective action on digital platforms that deploy machine learning algorithms. We propose a simple theoretical model of a collective interacting with a firm's learning algorithm. The collective pools the data of participating individuals and executes an algorithmic strategy by instructing participants how to modify their own data to achieve a collecti…

    Submitted 7 August, 2024; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: Published at ICML 2023; Revision corrects epsilon-dependence in the analysis

  21. arXiv:2302.01421  [pdf, other]

    math.OC cs.AI cs.GT math.DS

    Follower Agnostic Methods for Stackelberg Games

    Authors: Chinmay Maheshwari, James Cheng, S. Shankar Sastry, Lillian Ratliff, Eric Mazumdar

    Abstract: In this paper, we present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner. Unlike previous works, our approach works even when the leader has no knowledge about the followers' utility functions or strategy space. Our algorithm introduces a unique gradient estimator, leveraging specially designed strategies to probe followers. In a d…

    Submitted 26 March, 2024; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 31 pages

    MSC Class: 91A65

  22. arXiv:2212.03923  [pdf, other]

    math.OC

    Designing System Level Synthesis Controllers for Nonlinear Systems with Stability Guarantees

    Authors: Lauren Conger, Syndey Vernon, Eric Mazumdar

    Abstract: We introduce a method for controlling systems with nonlinear dynamics and full actuation by approximating the dynamics with polynomials and applying a system level synthesis controller. We show how to optimize over this class of controllers using a neural network while maintaining stability guarantees, without requiring a Lyapunov function. We give bounds for the domain over which the use of the c…

    Submitted 7 June, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

  23. arXiv:2210.10304  [pdf, other]

    cs.RO cs.FL eess.SY

    Synthesizing Reactive Test Environments for Autonomous Systems: Testing Reach-Avoid Specifications with Multi-Commodity Flows

    Authors: Apurva Badithela, Josefine B. Graebener, Wyatt Ubellacker, Eric V. Mazumdar, Aaron D. Ames, Richard M. Murray

    Abstract: We study automated test generation for verifying discrete decision-making modules in autonomous systems. We utilize linear temporal logic to encode the requirements on the system under test in the system specification and the behavior that we want to observe during the test is given as the test specification which is unknown to the system. First, we use the specifications and their corresponding n…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Submitted to ICRA 2023

  24. arXiv:2208.01185  [pdf, ps, other]

    cs.LG cs.GT math.OC

    A Note on Zeroth-Order Optimization on the Simplex

    Authors: Tijana Zrnic, Eric Mazumdar

    Abstract: We construct a zeroth-order gradient estimator for a smooth function defined on the probability simplex. The proposed estimator queries the simplex only. We prove that projected gradient descent and the exponential weights algorithm, when run with this estimator instead of exact gradients, converge at a $\mathcal O(T^{-1/4})$ rate.

    Submitted 1 August, 2022; originally announced August 2022.

  25. arXiv:2206.11254  [pdf, other]

    cs.LG stat.ML

    Langevin Monte Carlo for Contextual Bandits

    Authors: Pan Xu, Hongkai Zheng, Eric Mazumdar, Kamyar Azizzadenesheli, Anima Anandkumar

    Abstract: We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high dimensional applications for general covariance matrices. Moreover, the Gaussian approximation may not be a good surrogate for the posterior di…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: 21 pages, 3 figures, 2 tables. To appear in the proceedings of the 39th International Conference on Machine Learning (ICML2022)

  26. arXiv:2206.02344  [pdf, other]

    cs.AI cs.LG cs.MA econ.TH

    Decentralized, Communication- and Coordination-free Learning in Structured Matching Markets

    Authors: Chinmay Maheshwari, Eric Mazumdar, Shankar Sastry

    Abstract: We study the problem of online learning in competitive settings in the context of two-sided matching markets. In particular, one side of the market, the agents, must learn about their preferences over the other side, the firms, through repeated interaction while competing with other agents for successful matches. We propose a class of decentralized, communication- and coordination-free algorithms…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: 41 pages, 2 figures

  27. arXiv:2205.02187  [pdf, other]

    math.OC

    Nonlinear System Level Synthesis for Polynomial Dynamical Systems

    Authors: Lauren Conger, Jing Shuang Li, Eric Mazumdar, Steven L. Brunton

    Abstract: This work introduces a controller synthesis method via system level synthesis for nonlinear systems characterized by polynomial dynamics. The resulting framework yields finite impulse response, time-invariant, closed-loop transfer functions with guaranteed disturbance cancellation. Our method generalizes feedback linearization to enable partial feedback linearization, where the cancellation of the…

    Submitted 22 September, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: accepted to CDC 2022

  28. arXiv:2106.12529  [pdf, other]

    cs.LG cs.GT

    Who Leads and Who Follows in Strategic Classification?

    Authors: Tijana Zrnic, Eric Mazumdar, S. Shankar Sastry, Michael I. Jordan

    Abstract: As predictive models are deployed into the real world, they must increasingly contend with strategic behavior. A growing body of work on strategic classification treats this problem as a Stackelberg game: the decision-maker "leads" in the game by deploying a model, and the strategic agents "follow" by playing their best response to the deployed model. Importantly, in this framing, the burden of le…

    Submitted 29 January, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

  29. arXiv:2106.09082  [pdf, other]

    math.OC cs.LG

    Zeroth-Order Methods for Convex-Concave Minmax Problems: Applications to Decision-Dependent Risk Minimization

    Authors: Chinmay Maheshwari, Chih-Yuan Chiu, Eric Mazumdar, S. Shankar Sastry, Lillian J. Ratliff

    Abstract: Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose a random reshuffling-based gradient free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite sum structure. We prove that the algorithm enjoys the same convergence rate as that of zeroth-order algor…

    Submitted 19 February, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: 38 pages, 6 figures

  30. arXiv:2104.13326  [pdf, other]

    cs.LG math.OC stat.ML

    Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization

    Authors: Yaodong Yu, Tianyi Lin, Eric Mazumdar, Michael I. Jordan

    Abstract: Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity. Existing algorithms for solving Wasserstein DRSL -- one of the most pop…

    Submitted 25 January, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted by AISTATS 2022; The first three authors contributed equally to this work; 43 pages, 28 figures

  31. arXiv:2101.02059  [pdf, ps, other]

    math.CO

    Group-annihilator graphs realised by finite abelian groups and its properties

    Authors: Eshita Mazumdar, Rameez Raja

    Abstract: Let $G$ be a finite abelian group viewed as a $\mathbb{Z}$-module and let $\mathcal{G} = (V, E)$ be a simple graph. In this paper, we consider a graph $Γ(G)$ called a \textit{group-annihilator} graph. The vertices of $Γ(G)$ are all elements of $G$ and two distinct vertices $x$ and $y$ are adjacent in $Γ(G)$ if and only if $[x : G][y : G]G = \{0\}$, where $x, y\in G$ and…

    Submitted 6 January, 2021; originally announced January 2021.

  32. arXiv:2010.15599  [pdf, other]

    cs.LG cs.AI eess.SY

    Expert Selection in High-Dimensional Markov Decision Processes

    Authors: Vicenc Rubies-Royo, Eric Mazumdar, Roy Dong, Claire Tomlin, S. Shankar Sastry

    Abstract: In this work we present a multi-armed bandit framework for online expert selection in Markov decision processes and demonstrate its use in high-dimensional settings. Our method takes a set of candidate expert policies and switches between them to rapidly identify the best performing expert using a variant of the classical upper confidence bound algorithm, thus ensuring low regret in the overall pe…

    Submitted 25 October, 2020; originally announced October 2020.

    Comments: In proceedings of the 59th IEEE Conference on Decision and Control 2020. arXiv admin note: text overlap with arXiv:1707.05714

  33. arXiv:2006.08998  [pdf, other]

    math.AC math.CO math.NT

    Iterated sumsets and Hilbert functions

    Authors: Shalom Eliahou, Eshita Mazumdar

    Abstract: Let $A$ be a finite subset of an abelian group $(G, +)$. Let $h \ge 2$ be an integer. If $|A| \ge 2$ and the cardinality $|hA|$ of the $h$-fold iterated sumset $hA = A + \cdots + A$ is known, what can one say about $|(h-1)A|$ and $|(h+1)A|$? It is known that $|(h-1)A| \ge |hA|^{(h-1)/h}$, a consequence of Plünnecke's inequality. Here we improve this bound with a new approach. Namel…

    Submitted 7 September, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

  34. arXiv:2004.02766  [pdf, other]

    cs.LG math.DS math.OC stat.ML

    Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning

    Authors: Tyler Westenbroek, Eric Mazumdar, David Fridovich-Keil, Valmik Prabhu, Claire J. Tomlin, S. Shankar Sastry

    Abstract: This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system using discrete-time model-free policy-gradient parameter update rules. The primary advantage of the scheme over standard model-reference adaptive control techniques is that it does not require the learned inverse model to be invertible at all instances of time. This enab…

    Submitted 6 April, 2020; originally announced April 2020.

  35. arXiv:2002.10002  [pdf, other]

    cs.LG stat.ML

    On Thompson Sampling with Langevin Algorithms

    Authors: Eric Mazumdar, Aldo Pacchiano, Yi-an Ma, Peter L. Bartlett, Michael I. Jordan

    Abstract: Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice. However, it suffers from a significant limitation computationally, arising from the need for samples from posterior distributions at every iteration. We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue. We construct quickly co…

    Submitted 17 June, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

  36. arXiv:2002.01007  [pdf, other]

    cs.GT

    Local Nash Equilibria are Isolated, Strict Local Nash Equilibria in `Almost All' Zero-Sum Continuous Games

    Authors: Eric Mazumdar, Lillian Ratliff

    Abstract: We prove that differential Nash equilibria are generic amongst local Nash equilibria in continuous zero-sum games. That is, there exists an open-dense subset of zero-sum games for which local Nash equilibria are non-degenerate differential Nash equilibria. The result extends previous results to the zero-sum setting, where we obtain even stronger results; in particular, we show that local Nash equi…

    Submitted 3 February, 2020; originally announced February 2020.

    Comments: A shorter version of this paper was presented at the 2019 IEEE Conference on Decision and Control

  37. arXiv:1912.07509  [pdf, ps, other]

    math.CO

    The Weighted Davenport constant of a group and a related extremal problem II

    Authors: Niranjan Balachandran, Eshita Mazumdar

    Abstract: For a finite abelian group $G$ with $\exp(G)=n$ and an integer $k\ge 2$, Balachandran and Mazumdar \cite{BM} introduced the extremal function $\fD_G(k)$ which is defined to be $\min\{|A|: \emptyset \neq A\subseteq[1,n-1]\textrm{\ with\ }D_A(G)\le k\}$ (and $\infty$ if there is no such $A$), where $D_A(G)$ denotes the $A$-weighted Davenport constant of the group $G$. Denoting $\fD_G(k)$ by…

    Submitted 16 December, 2019; originally announced December 2019.

  38. arXiv:1910.13272  [pdf, other]

    math.OC cs.AI cs.LG eess.SY

    Feedback Linearization for Unknown Systems via Reinforcement Learning

    Authors: Tyler Westenbroek, David Fridovich-Keil, Eric Mazumdar, Shreyas Arora, Valmik Prabhu, S. Shankar Sastry, Claire J. Tomlin

    Abstract: We present a novel approach to control design for nonlinear systems which leverages model-free policy optimization techniques to learn a linearizing controller for a physical plant with unknown dynamics. Feedback linearization is a technique from nonlinear control which renders the input-output dynamics of a nonlinear plant \emph{linear} under application of an appropriate feedback controller. Onc…

    Submitted 21 April, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

  39. arXiv:1907.03712  [pdf, other]

    cs.LG stat.ML

    Policy-Gradient Algorithms Have No Guarantees of Convergence in Linear Quadratic Games

    Authors: Eric Mazumdar, Lillian J. Ratliff, Michael I. Jordan, S. Shankar Sastry

    Abstract: We show by counterexample that policy-gradient algorithms have no guarantees of even local convergence to Nash equilibria in continuous action and state space multi-agent settings. To do so, we analyze gradient-play in N-player general-sum linear quadratic games, a classic game setting which is recently emerging as a benchmark in the field of multi-agent learning. In such games the state and actio…

    Submitted 16 December, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

  40. arXiv:1906.00731  [pdf, other]

    math.OC cs.GT cs.LG eess.SY

    Convergence Analysis of Gradient-Based Learning with Non-Uniform Learning Rates in Non-Cooperative Multi-Agent Settings

    Authors: Benjamin Chasnov, Lillian J. Ratliff, Eric Mazumdar, Samuel A. Burden

    Abstract: Considering a class of gradient-based multi-agent learning algorithms in non-cooperative settings, we provide local convergence guarantees to a neighborhood of a stable local Nash equilibrium. In particular, we consider continuous games where agents learn in (i) deterministic settings with oracle access to their gradient and (ii) stochastic settings with an unbiased estimator of their gradient. Ut…

    Submitted 30 May, 2019; originally announced June 2019.

  41. arXiv:1901.00838  [pdf, other]

    cs.LG math.OC stat.ML

    On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games

    Authors: Eric V. Mazumdar, Michael I. Jordan, S. Shankar Sastry

    Abstract: We propose local symplectic surgery, a two-timescale procedure for finding local Nash equilibria in two-player zero-sum games. We first show that previous gradient-based algorithms cannot guarantee convergence to local Nash equilibria due to the existence of non-Nash stationary points. By taking advantage of the differential structure of the game, we construct an algorithm for which the local Nash…

    Submitted 24 January, 2019; v1 submitted 3 January, 2019; originally announced January 2019.

  42. arXiv:1807.04112  [pdf, ps, other]

    math.CO

    The Weighted Davenport Constant of a group and a related extremal problem

    Authors: Niranjan Balachandran, Eshita Mazumdar

    Abstract: For a finite abelian group $G$ written additively, and a non-empty subset $A\subset [1,\exp(G)-1]$ the weighted Davenport Constant of $G$ with respect to the set $A$, denoted $D_A(G)$, is the least positive integer $k$ for which the following holds: Given an arbitrary $G$-sequence $(x_1,\ldots,x_k)$, there exists a non-empty subsequence $(x_{i_1},\ldots,x_{i_t})$ along with $a_{j}\in A$ such that…

    Submitted 11 July, 2018; originally announced July 2018.

  43. arXiv:1807.00648  [pdf, ps, other]

    math.NT

    Zero sums in restricted sequences

    Authors: Niranjan Balachandran, Eshita Mazumdar

    Abstract: A sequence $\bfx=(x_1,\ldots,x_m)$ of elements of $\Z_n$ is called an \textit{$A$-weighted Davenport Z-sequence} if there exists $\bfa:=(a_1,\ldots,a_m)\in (A\cup\{0\})^m\setminus\bfzero_m$ such that $\sum_i a_ix_i=0$. Here $\bfzero_m=(0,\ldots,0)\in\Z_n^m$. Similarly, the sequence $\bfx$ is called an \textit{$A$-weighted Erdős Z-sequence} if there exists…

    Submitted 2 March, 2021; v1 submitted 2 July, 2018; originally announced July 2018.

    MSC Class: 11B50; 11B75; 11P70; 11K99

  44. arXiv:1804.05464  [pdf, other]

    cs.LG stat.ML

    On Gradient-Based Learning in Continuous Games

    Authors: Eric Mazumdar, Lillian J. Ratliff, S. Shankar Sastry

    Abstract: We formulate a general framework for competitive gradient-based learning that encompasses a wide breadth of multi-agent learning algorithms, and analyze the limiting behavior of competitive gradient-based learning algorithms using dynamical systems theory. For both general-sum and potential games, we characterize a non-negligible subset of the local Nash equilibria that will be avoided if each age…

    Submitted 20 February, 2020; v1 submitted 15 April, 2018; originally announced April 2018.

    Journal ref: SIAM Journal on Mathematics of Data Science 2020 2:1, 103-131

  45. arXiv:1803.08286  [pdf, ps, other]

    math.CO

    The Harborth Constant of Dihedral Groups

    Authors: Niranjan Balachandran, Eshita Mazumdar, Kevin Zhao

    Abstract: The Harborth constant of a finite group $G$, denoted $\gs(G)$, is the smallest integer $k$ such that the following holds: For $A\subseteq G$ with $|A|=k$, there exists $B\subseteq A$ with $|B|=\exp(G)$ such that the elements of $B$ can be rearranged into a sequence whose product equals $1_G$, the identity element of $G$. The Harborth constant is a well studied combinatorial invariant in the case o…

    Submitted 16 January, 2019; v1 submitted 22 March, 2018; originally announced March 2018.

  46. arXiv:1707.05714  [pdf, other]

    eess.SY

    A Multi-Armed Bandit Approach for Online Expert Selection in Markov Decision Processes

    Authors: Eric Mazumdar, Roy Dong, Vicenç Rúbies Royo, Claire Tomlin, S. Shankar Sastry

    Abstract: We formulate a multi-armed bandit (MAB) approach to choosing expert policies online in Markov decision processes (MDPs). Given a set of expert policies trained on a state and action space, the goal is to maximize the cumulative reward of our agent. The hope is to quickly find the best expert in our set. The MAB formulation allows us to quantify the performance of an algorithm in terms of the regre…

    Submitted 18 July, 2017; originally announced July 2017.

  47. arXiv:1703.09842  [pdf, other]

    cs.LG stat.ML

    Inverse Risk-Sensitive Reinforcement Learning

    Authors: Lillian J. Ratliff, Eric Mazumdar

    Abstract: We address the problem of inverse reinforcement learning in Markov decision processes where the agent is risk-sensitive. In particular, we model risk-sensitivity in a reinforcement learning framework by making use of models of human decision-making having their origins in behavioral psychology, behavioral economics, and neuroscience. We propose a gradient-based inverse reinforcement learning algor…

    Submitted 21 November, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: v3 (comments regarding updates): We significantly extended the theory (Theorems 2, 3, 5 and Proposition 3). We also corrected some minor typos throughout the document; v2 (comments regarding updates): We corrected some notational typos and made clarifications in the proof. We also added clarifying remarks regarding reference points and acceptance levels, which were previously conflated.

  48. arXiv:1703.07049  [pdf, ps, other]

    eess.SY

    Optimal Causal Imputation for Control

    Authors: Roy Dong, Eric Mazumdar, S. Shankar Sastry

    Abstract: The widespread applicability of analytics in cyber-physical systems has motivated research into causal inference methods. Predictive estimators are not sufficient when analytics are used for decision making; rather, the flow of causal effects must be determined. Generally speaking, these methods focus on estimation of a causal structure from experimental data. In this paper, we consider the dual p…

    Submitted 21 March, 2017; originally announced March 2017.

  49. arXiv:1610.02774  [pdf, ps, other]

    math.NT

    Prime powers in sums of terms of binary recurrence sequences

    Authors: Eshita Mazumdar, S. S. Rout

    Abstract: Let $\{u_{n}\}_{n \geq 0}$ be a non-degenerate binary recurrence sequence with positive, square-free discriminant and $p$ be a fixed prime number. In this paper, we have shown the finiteness result for the solutions of the Diophantine equation $u_{n_{1}} + u_{n_{2}} + \cdots + u_{n_{t}} = p^{z}$ with some conditions on $n_i $ for all $1\leq i \leq t$. Moreover, we explicitly find all the powers of…

    Submitted 3 July, 2017; v1 submitted 10 October, 2016; originally announced October 2016.

    Comments: 15 pages

    MSC Class: 11B39 (Primary); 11D45; 11J86 (Secondary)

  50. arXiv:1603.08995  [pdf, other]

    eess.SY cs.GT

    To Observe or Not to Observe: Queuing Game Framework for Urban Parking

    Authors: Lillian J. Ratliff, Chase Dowling, Eric Mazumdar, Baosen Zhang

    Abstract: We model parking in urban centers as a set of parallel queues and overlay a game theoretic structure that allows us to compare the user-selected (Nash) equilibrium to the socially optimal equilibrium. We model arriving drivers as utility maximizers and consider the game in which observing the queue length is free as well as the game in which drivers must pay to observe the queue length. In both ga…

    Submitted 29 March, 2016; originally announced March 2016.