-
Cooperative Bargaining Games Without Utilities: Mediated Solutions from Direction Oracles
Authors:
Kushagra Gupta,
Surya Murthy,
Mustafa O. Karabag,
Ufuk Topcu,
David Fridovich-Keil
Abstract:
Cooperative bargaining games are widely used to model resource allocation and conflict resolution. Traditional solutions assume the mediator can access agents utility function values and gradients. However, there is an increasing number of settings, such as human AI interactions, where utility values may be inaccessible or incomparable due to unknown, nonaffine transformations. To model such setti…
▽ More
Cooperative bargaining games are widely used to model resource allocation and conflict resolution. Traditional solutions assume the mediator can access agents utility function values and gradients. However, there is an increasing number of settings, such as human AI interactions, where utility values may be inaccessible or incomparable due to unknown, nonaffine transformations. To model such settings, we consider that the mediator has access only to agents most preferred directions, i.e., normalized utility gradients in the decision space. To this end, we propose a cooperative bargaining algorithm where a mediator has access to only the direction oracle of each agent. We prove that unlike popular approaches such as the Nash and Kalai Smorodinsky bargaining solutions, our approach is invariant to monotonic nonaffine transformations, and that under strong convexity and smoothness assumptions, this approach enjoys global asymptotic convergence to Pareto stationary solutions. Moreover, we show that the bargaining solutions found by our algorithm also satisfy the axioms of symmetry and (under slightly stronger conditions) independence of irrelevant alternatives, which are popular in the literature. Finally, we conduct experiments in two domains, multi agent formation assignment and mediated stock portfolio allocation, which validate these theoretic results. All code for our experiments can be found at https://github.com/suryakmurthy/dibs_bargaining.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics
Authors:
Lizhi Yang,
Blake Werner,
Ryan K. Cosner,
David Fridovich-Keil,
Preston Culbertson,
Aaron D. Ames
Abstract:
Robot learning has produced remarkably effective ``black-box'' controllers for complex tasks such as dynamic locomotion on humanoids. Yet ensuring dynamic safety, i.e., constraint satisfaction, remains challenging for such policies. Reinforcement learning (RL) embeds constraints heuristically through reward engineering, and adding or modifying constraints requires retraining. Model-based approache…
▽ More
Robot learning has produced remarkably effective ``black-box'' controllers for complex tasks such as dynamic locomotion on humanoids. Yet ensuring dynamic safety, i.e., constraint satisfaction, remains challenging for such policies. Reinforcement learning (RL) embeds constraints heuristically through reward engineering, and adding or modifying constraints requires retraining. Model-based approaches, like control barrier functions (CBFs), enable runtime constraint specification with formal guarantees but require accurate dynamics models. This paper presents SHIELD, a layered safety framework that bridges this gap by: (1) training a generative, stochastic dynamics residual model using real-world data from hardware rollouts of the nominal controller, capturing system behavior and uncertainties; and (2) adding a safety layer on top of the nominal (learned locomotion) controller that leverages this model via a stochastic discrete-time CBF formulation enforcing safety constraints in probability. The result is a minimally-invasive safety layer that can be added to the existing autonomy stack to give probabilistic guarantees of safety that balance risk and performance. In hardware experiments on an Unitree G1 humanoid, SHIELD enables safe navigation (obstacle avoidance) through varied indoor and outdoor environments using a nominal (unknown) RL controller and onboard perception.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Real-Time Privacy Preservation for Robot Visual Perception
Authors:
Minkyu Choi,
Yunhao Yang,
Neel P. Bhatt,
Kushagra Gupta,
Sahil Shah,
Aditya Rai,
David Fridovich-Keil,
Ufuk Topcu,
Sandeep P. Chinchali
Abstract:
Many robots (e.g., iRobot's Roomba) operate based on visual observations from live video streams, and such observations may inadvertently include privacy-sensitive objects, such as personal identifiers. Existing approaches for preserving privacy rely on deep learning models, differential privacy, or cryptography. They lack guarantees for the complete concealment of all sensitive objects. Guarantee…
▽ More
Many robots (e.g., iRobot's Roomba) operate based on visual observations from live video streams, and such observations may inadvertently include privacy-sensitive objects, such as personal identifiers. Existing approaches for preserving privacy rely on deep learning models, differential privacy, or cryptography. They lack guarantees for the complete concealment of all sensitive objects. Guaranteeing concealment requires post-processing techniques and thus is inadequate for real-time video streams. We develop a method for privacy-constrained video streaming, PCVS, that conceals sensitive objects within real-time video streams. PCVS takes a logical specification constraining the existence of privacy-sensitive objects, e.g., never show faces when a person exists. It uses a detection model to evaluate the existence of these objects in each incoming frame. Then, it blurs out a subset of objects such that the existence of the remaining objects satisfies the specification. We then propose a conformal prediction approach to (i) establish a theoretical lower bound on the probability of the existence of these objects in a sequence of frames satisfying the specification and (ii) update the bound with the arrival of each subsequent frame. Quantitative evaluations show that PCVS achieves over 95 percent specification satisfaction rate in multiple datasets, significantly outperforming other methods. The satisfaction rate is consistently above the theoretical bounds across all datasets, indicating that the established bounds hold. Additionally, we deploy PCVS on robots in real-time operation and show that the robots operate normally without being compromised when PCVS conceals objects.
△ Less
Submitted 7 May, 2025;
originally announced May 2025.
-
Act Natural! Extending Naturalistic Projection to Multimodal Behavior Scenarios
Authors:
Hamzah I. Khan,
David Fridovich-Keil
Abstract:
Autonomous agents operating in public spaces must consider how their behaviors might affect the humans around them, even when not directly interacting with them. To this end, it is often beneficial to be predictable and appear naturalistic. Existing methods for this purpose use human actor intent modeling or imitation learning techniques, but these approaches rarely capture all possible motivation…
▽ More
Autonomous agents operating in public spaces must consider how their behaviors might affect the humans around them, even when not directly interacting with them. To this end, it is often beneficial to be predictable and appear naturalistic. Existing methods for this purpose use human actor intent modeling or imitation learning techniques, but these approaches rarely capture all possible motivations for human behavior and/or require significant amounts of data. Our work extends a technique for modeling unimodal naturalistic behaviors with an explicit convex set representation, to account for multimodal behavior by using multiple convex sets. This more flexible representation provides a higher degree of fidelity in data-driven modeling of naturalistic behavior that arises in real-world scenarios in which human behavior is, in some sense, discrete, e.g. whether or not to yield at a roundabout. Equipped with this new set representation, we develop an optimization-based filter to project arbitrary trajectories into the set so that they appear naturalistic to humans in the scene, while also satisfying vehicle dynamics, actuator limits, etc. We demonstrate our methods on real-world human driving data from the inD (intersection) and rounD (roundabout) datasets.
△ Less
Submitted 3 May, 2025;
originally announced May 2025.
-
PSN Game: Game-theoretic Planning via a Player Selection Network
Authors:
Tianyu Qiu,
Eric Ouano,
Fernando Palafox,
Christian Ellis,
David Fridovich-Keil
Abstract:
While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving optimization problems with hundreds or thousands of variables, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose PSN Game: a novel game-theoretic planning framework that reduces runtime by learning a Player Sel…
▽ More
While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving optimization problems with hundreds or thousands of variables, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose PSN Game: a novel game-theoretic planning framework that reduces runtime by learning a Player Selection Network (PSN). A PSN outputs a player selection mask that distinguishes influential players from less relevant ones, enabling the ego player to solve a smaller, masked game involving only selected players. By reducing the number of variables in the optimization problem, PSN directly lowers computation time. The PSN Game framework is more flexible than existing player selection methods as it i) relies solely on observations of players' past trajectories, without requiring full state, control, or other game-specific information; and ii) requires no online parameter tuning. We train PSNs in an unsupervised manner using a differentiable dynamic game solver, with reference trajectories from full-player games guiding the learning. Experiments in both simulated scenarios and human trajectory datasets demonstrate that i) PSNs outperform baseline selection methods in trajectory smoothness and length, while maintaining comparable safety and achieving a 10x speedup in runtime; and ii) PSNs generalize effectively to real-world scenarios without fine-tuning. By selecting only the most relevant players for decision-making, PSNs offer a general mechanism for reducing planning complexity that can be seamlessly integrated into existing multi-agent planning frameworks.
△ Less
Submitted 30 April, 2025;
originally announced May 2025.
-
Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving
Authors:
Jacob Levy,
Jason Gibson,
Bogdan Vlahov,
Erica Tevere,
Evangelos Theodorou,
David Fridovich-Keil,
Patrick Spieler
Abstract:
High-speed off-road autonomous driving presents unique challenges due to complex, evolving terrain characteristics and the difficulty of accurately modeling terrain-vehicle interactions. While dynamics models used in model-based control can be learned from real-world data, they often struggle to generalize to unseen terrain, making real-time adaptation essential. We propose a novel framework that…
▽ More
High-speed off-road autonomous driving presents unique challenges due to complex, evolving terrain characteristics and the difficulty of accurately modeling terrain-vehicle interactions. While dynamics models used in model-based control can be learned from real-world data, they often struggle to generalize to unseen terrain, making real-time adaptation essential. We propose a novel framework that combines a Kalman filter-based online adaptation scheme with meta-learned parameters to address these challenges. Offline meta-learning optimizes the basis functions along which adaptation occurs, as well as the adaptation parameters, while online adaptation dynamically adjusts the onboard dynamics model in real time for model-based control. We validate our approach through extensive experiments, including real-world testing on a full-scale autonomous off-road vehicle, demonstrating that our method outperforms baseline approaches in prediction accuracy, performance, and safety metrics, particularly in safety-critical scenarios. Our results underscore the effectiveness of meta-learned dynamics model adaptation, advancing the development of reliable autonomous systems capable of navigating diverse and unseen environments. Video is available at: https://youtu.be/cCKHHrDRQEA
△ Less
Submitted 23 April, 2025;
originally announced April 2025.
-
A Framework for Finding Local Saddle Points in Two-Player Zero-Sum Black-Box Games
Authors:
Shubhankar Agarwal,
Hamzah I. Khan,
Sandeep P. Chinchali,
David Fridovich-Keil
Abstract:
Saddle point optimization is a critical problem employed in numerous real-world applications, including portfolio optimization, generative adversarial networks, and robotics. It has been extensively studied in cases where the objective function is known and differentiable. Existing work in black-box settings with unknown objectives that can only be sampled either assumes convexity-concavity in the…
▽ More
Saddle point optimization is a critical problem employed in numerous real-world applications, including portfolio optimization, generative adversarial networks, and robotics. It has been extensively studied in cases where the objective function is known and differentiable. Existing work in black-box settings with unknown objectives that can only be sampled either assumes convexity-concavity in the objective to simplify the problem or operates with noisy gradient estimators. In contrast, we introduce a framework inspired by Bayesian optimization which utilizes Gaussian processes to model the unknown (potentially nonconvex-nonconcave) objective and requires only zeroth-order samples. Our approach frames the saddle point optimization problem as a two-level process which can flexibly integrate existing and novel approaches to this problem. The upper level of our framework produces a model of the objective function by sampling in promising locations, and the lower level of our framework uses the existing model to frame and solve a general-sum game to identify locations to sample. This lower level procedure can be designed in complementary ways, and we demonstrate the flexibility of our approach by introducing variants which appropriately trade off between factors like runtime, the cost of function evaluations, and the number of available initial samples. We experimentally demonstrate these algorithms on synthetic and realistic datasets in black-box nonconvex-nonconcave settings, showcasing their ability to efficiently locate local saddle points in these contexts.
△ Less
Submitted 23 March, 2025;
originally announced March 2025.
-
More Information is Not Always Better: Connections between Zero-Sum Local Nash Equilibria in Feedback and Open-Loop Information Patterns
Authors:
Kushagra Gupta,
Ross Allen,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
Non-cooperative dynamic game theory provides a principled approach to modeling sequential decision-making among multiple noncommunicative agents. A key focus has been on finding Nash equilibria in two-agent zero-sum dynamic games under various information structures. A well-known result states that in linear-quadratic games, unique Nash equilibria under feedback and open-loop information structure…
▽ More
Non-cooperative dynamic game theory provides a principled approach to modeling sequential decision-making among multiple noncommunicative agents. A key focus has been on finding Nash equilibria in two-agent zero-sum dynamic games under various information structures. A well-known result states that in linear-quadratic games, unique Nash equilibria under feedback and open-loop information structures yield identical trajectories. Motivated by two key perspectives -- (i) many real-world problems extend beyond linear-quadratic settings and lack unique equilibria, making only local Nash equilibria computable, and (ii) local open-loop Nash equilibria (OLNE) are easier to compute than local feedback Nash equilibria (FBNE) -- it is natural to ask whether a similar result holds for local equilibria in zero-sum games. To this end, we establish that for a broad class of zero-sum games with potentially nonconvex-nonconcave objectives and nonlinear dynamics: (i) the state/control trajectory of a local FBNE satisfies local OLNE first-order optimality conditions, and vice versa, (ii) a local FBNE trajectory satisfies local OLNE second-order necessary conditions, (iii) a local FBNE trajectory satisfying feedback sufficiency conditions also constitutes a local OLNE, and (iv) with additional hard constraints on agents' actuations, a local FBNE where strict complementarity holds also satisfies local OLNE first-order optimality conditions, and vice versa.
△ Less
Submitted 19 March, 2025;
originally announced March 2025.
-
A Convex Formulation of Game-theoretic Hierarchical Routing
Authors:
Dong Ho Lee,
Kaitlyn Donnel,
Max Z. Li,
David Fridovich-Keil
Abstract:
Hierarchical decision-making is a natural paradigm for coordinating multi-agent systems in complex environments such as air traffic management. In this paper, we present a bilevel framework for game-theoretic hierarchical routing, where a high-level router assigns discrete routes to multiple vehicles who seek to optimize potentially noncooperative objectives that depend upon the assigned routes. T…
▽ More
Hierarchical decision-making is a natural paradigm for coordinating multi-agent systems in complex environments such as air traffic management. In this paper, we present a bilevel framework for game-theoretic hierarchical routing, where a high-level router assigns discrete routes to multiple vehicles who seek to optimize potentially noncooperative objectives that depend upon the assigned routes. To address computational challenges, we propose a reformulation that preserves the convexity of each agent's feasible set. This convex reformulation enables a solution to be identified efficiently via a customized branch-and-bound algorithm. Our approach ensures global optimality while capturing strategic interactions between agents at the lower level. We demonstrate the solution concept of our framework in two-vehicle and three-vehicle routing scenarios.
△ Less
Submitted 17 March, 2025;
originally announced March 2025.
-
Multi-Fidelity Policy Gradient Algorithms
Authors:
Xinjie Liu,
Cyrus Neary,
Kushagra Gupta,
Christian Ellis,
Ufuk Topcu,
David Fridovich-Keil
Abstract:
Many reinforcement learning (RL) algorithms require large amounts of data, prohibiting their use in applications where frequent interactions with operational systems are infeasible, or high-fidelity simulations are expensive or unavailable. Meanwhile, low-fidelity simulators--such as reduced-order models, heuristic reward functions, or generative world models--can cheaply provide useful data for R…
▽ More
Many reinforcement learning (RL) algorithms require large amounts of data, prohibiting their use in applications where frequent interactions with operational systems are infeasible, or high-fidelity simulations are expensive or unavailable. Meanwhile, low-fidelity simulators--such as reduced-order models, heuristic reward functions, or generative world models--can cheaply provide useful data for RL training, even if they are too coarse for direct sim-to-real transfer. We propose multi-fidelity policy gradients (MFPGs), an RL framework that mixes a small amount of data from the target environment with a large volume of low-fidelity simulation data to form unbiased, reduced-variance estimators (control variates) for on-policy policy gradients. We instantiate the framework by developing multi-fidelity variants of two policy gradient algorithms: REINFORCE and proximal policy optimization. Experimental results across a suite of simulated robotics benchmark problems demonstrate that when target-environment samples are limited, MFPG achieves up to 3.9x higher reward and improves training stability when compared to baselines that only use high-fidelity data. Moreover, even when the baselines are given more high-fidelity samples--up to 10x as many interactions with the target environment--MFPG continues to match or outperform them. Finally, we observe that MFPG is capable of training effective policies even when the low-fidelity environment is drastically different from the target environment. MFPG thus not only offers a novel paradigm for efficient sim-to-real transfer but also provides a principled approach to managing the trade-off between policy performance and data collection costs.
△ Less
Submitted 9 April, 2025; v1 submitted 7 March, 2025;
originally announced March 2025.
-
Noncooperative Equilibrium Selection via a Trading-based Auction
Authors:
Jaehan Im,
Filippos Fotiadis,
Daniel Delahaye,
Ufuk Topcu,
David Fridovich-Keil
Abstract:
Noncooperative multi-agent systems often face coordination challenges due to conflicting preferences among agents. In particular, agents acting in their own self-interest can settle on different equilibria, leading to suboptimal outcomes or even safety concerns. We propose an algorithm named trading auction for consensus (TACo), a decentralized approach that enables noncooperative agents to reach…
▽ More
Noncooperative multi-agent systems often face coordination challenges due to conflicting preferences among agents. In particular, agents acting in their own self-interest can settle on different equilibria, leading to suboptimal outcomes or even safety concerns. We propose an algorithm named trading auction for consensus (TACo), a decentralized approach that enables noncooperative agents to reach consensus without communicating directly or disclosing private valuations. TACo facilitates coordination through a structured trading-based auction, where agents iteratively select choices of interest and provably reach an agreement within an a priori bounded number of steps. A series of numerical experiments validate that the termination guarantees of TACo hold in practice, and show that TACo achieves a median performance that minimizes the total cost across all agents, while allocating resources significantly more fairly than baseline approaches.
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning
Authors:
Brett Barkley,
David Fridovich-Keil
Abstract:
Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data and thereby enhancing the sample efficiency of off-policy RL algorithms. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments with proprioceptive observati…
▽ More
Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data and thereby enhancing the sample efficiency of off-policy RL algorithms. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments with proprioceptive observations. We show that, while DMBRL algorithms perform well in OpenAI Gym, their performance can drop significantly in DeepMind Control Suite (DMC), even though these settings offer similar tasks and identical physics backends. Modern techniques designed to address several key issues that arise in these settings do not provide a consistent improvement across all environments, and overall our results show that adding synthetic rollouts to the training process -- the backbone of Dyna-style algorithms -- significantly degrades performance across most DMC environments. Our findings contribute to a deeper understanding of several fundamental challenges in model-based RL and show that, like many optimization fields, there is no free lunch when evaluating performance across diverse benchmarks in RL.
△ Less
Submitted 20 December, 2024; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations
Authors:
Cevahir Koprulu,
Po-han Li,
Tianyu Qiu,
Ruihan Zhao,
Tyler Westenbroek,
David Fridovich-Keil,
Sandeep Chinchali,
Ufuk Topcu
Abstract:
Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fix…
▽ More
Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.
△ Less
Submitted 24 April, 2025; v1 submitted 1 December, 2024;
originally announced December 2024.
-
Inferring Short-Sightedness in Dynamic Noncooperative Games
Authors:
Cade Armstrong,
Ryan Park,
Xinjie Liu,
Kushagra Gupta,
David Fridovich-Keil
Abstract:
Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in…
▽ More
Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness. We conjecture that quantifying and estimating each agent's foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an \emph{inverse} dynamic game. We consider a specific parametrization of each agent's objective function that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents' foresightedness. We conduct two types of experiments: one with synthetically generated pedestrian motion at a crosswalk and the other with real-world intersection data involving people walking, biking, and driving vehicles. The results of these experiments demonstrate that explicitly inferring agents' foresightedness enables game-theoretic models to more accurately model agents' behavior. Specifically, our results show 33% more accurate prediction of foresighted behavior on average compared to the baseline in real-world scenarios.
△ Less
Submitted 15 April, 2025; v1 submitted 1 December, 2024;
originally announced December 2024.
-
You Can't Always Get What You Want: Games of Ordered Preference
Authors:
Dong Ho Lee,
Lasse Peters,
David Fridovich-Keil
Abstract:
We study noncooperative games, in which each player's objective is composed of a sequence of ordered- and potentially conflicting-preferences. Problems of this type naturally model a wide variety of scenarios: for example, drivers at a busy intersection must balance the desire to make forward progress with the risk of collision. Mathematically, these problems possess a nested structure, and to beh…
▽ More
We study noncooperative games, in which each player's objective is composed of a sequence of ordered- and potentially conflicting-preferences. Problems of this type naturally model a wide variety of scenarios: for example, drivers at a busy intersection must balance the desire to make forward progress with the risk of collision. Mathematically, these problems possess a nested structure, and to behave properly players must prioritize their most important preference, and only consider less important preferences to the extent that they do not compromise performance on more important ones. We consider multi-agent, noncooperative variants of these problems, and seek generalized Nash equilibria in which each player's decision reflects both its hierarchy of preferences and other players' actions. We make two key contributions. First, we develop a recursive approach for deriving the first-order optimality conditions of each player's nested problem. Second, we propose a sequence of increasingly tight relaxations, each of which can be transcribed as a mixed complementarity problem and solved via existing methods. Experimental results demonstrate that our approach reliably converges to equilibrium solutions that strictly reflect players' individual ordered preferences.
△ Less
Submitted 21 January, 2025; v1 submitted 28 October, 2024;
originally announced October 2024.
-
Improving DeFi Mechanisms with Dynamic Games and Optimal Control: A Case Study in Stablecoins
Authors:
Nicholas Strohmeyer,
Sriram Vishwanath,
David Fridovich-Keil
Abstract:
Stablecoins are a class of cryptocurrencies which aim at providing consistency and predictability, typically by pegging the token's value to that of a real world asset. Designing resilient decentralized stablecoins is a challenge, and prominent stablecoins today either (i) give up on decentralization, or (ii) rely on user-owned cryptocurrencies as collateral, exposing the token to exogenous price…
▽ More
Stablecoins are a class of cryptocurrencies which aim at providing consistency and predictability, typically by pegging the token's value to that of a real world asset. Designing resilient decentralized stablecoins is a challenge, and prominent stablecoins today either (i) give up on decentralization, or (ii) rely on user-owned cryptocurrencies as collateral, exposing the token to exogenous price fluctuations. In this latter category, it is increasingly common to employ algorithmic mechanisms to automate risk management, helping maintain the peg. One example of this is Reflexer's RAI, which adapts its system-internal exchange rate (redemption price) to secondary market conditions according to a proportional control law. In this paper, we take this idea of active management a step further, and introduce a new kind of control scheme based on a Stackelberg game model between the token protocol and its users. By doing so, we show that (i) we can mitigate adverse depeg events that inevitably arise in a fixed-redemption scheme such as MakerDao's DAI and (ii) generally outperform a simpler, adaptive-redemption scheme such as RAI in the task of targeting a desired market price. We demonstrate these results through extensive simulations over a range of market conditions.
△ Less
Submitted 28 October, 2024;
originally announced October 2024.
-
Approximate Feedback Nash Equilibria with Sparse Inter-Agent Dependencies
Authors:
Xinjie Liu,
Jingqi Li,
Filippos Fotiadis,
Mustafa O. Karabag,
Jesse Milzman,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
Feedback Nash equilibrium strategies in multi-agent dynamic games require availability of all players' state information to compute control actions. However, in real-world scenarios, sensing and communication limitations between agents make full state feedback expensive or impractical, and such strategies can become fragile when state information from other agents is inaccurate. To this end, we pr…
▽ More
Feedback Nash equilibrium strategies in multi-agent dynamic games require availability of all players' state information to compute control actions. However, in real-world scenarios, sensing and communication limitations between agents make full state feedback expensive or impractical, and such strategies can become fragile when state information from other agents is inaccurate. To this end, we propose a regularized dynamic programming approach for finding sparse feedback policies that selectively depend on the states of a subset of agents in dynamic games. The proposed approach solves convex adaptive group Lasso problems to compute sparse policies approximating Nash equilibrium solutions. We prove the regularized solutions' asymptotic convergence to a neighborhood of Nash equilibrium policies in linear-quadratic (LQ) games. Further, we extend the proposed approach to general non-LQ games via an iterative algorithm. Simulation results in multi-robot interaction scenarios show that the proposed approach effectively computes feedback policies with varying sparsity levels. When agents have noisy observations of other agents' states, simulation results indicate that the proposed regularized policies consistently achieve lower costs than standard Nash equilibrium policies by up to 77% for all interacting agents whose costs are coupled with other agents' states.
△ Less
Submitted 9 April, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
Learning to Walk from Three Minutes of Real-World Data with Semi-structured Dynamics Models
Authors:
Jacob Levy,
Tyler Westenbroek,
David Fridovich-Keil
Abstract:
Traditionally, model-based reinforcement learning (MBRL) methods exploit neural networks as flexible function approximators to represent $\textit{a priori}$ unknown environment dynamics. However, training data are typically scarce in practice, and these black-box models often fail to generalize. Modeling architectures that leverage known physics can substantially reduce the complexity of system-id…
▽ More
Traditionally, model-based reinforcement learning (MBRL) methods exploit neural networks as flexible function approximators to represent $\textit{a priori}$ unknown environment dynamics. However, training data are typically scarce in practice, and these black-box models often fail to generalize. Modeling architectures that leverage known physics can substantially reduce the complexity of system-identification, but break down in the face of complex phenomena such as contact. We introduce a novel framework for learning semi-structured dynamics models for contact-rich systems which seamlessly integrates structured first principles modeling techniques with black-box auto-regressive models. Specifically, we develop an ensemble of probabilistic models to estimate external forces, conditioned on historical observations and actions, and integrate these predictions using known Lagrangian dynamics. With this semi-structured approach, we can make accurate long-horizon predictions with substantially less data than prior methods. We leverage this capability and propose Semi-Structured Reinforcement Learning ($\texttt{SSRL}$) a simple model-based learning framework which pushes the sample complexity boundary for real-world learning. We validate our approach on a real-world Unitree Go1 quadruped robot, learning dynamic gaits -- from scratch -- on both hard and soft surfaces with just a few minutes of real-world data. Video and code are available at: https://sites.google.com/utexas.edu/ssrl
△ Less
Submitted 28 October, 2024; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Learning responsibility allocations for multi-agent interactions: A differentiable optimization approach with control barrier functions
Authors:
Isaac Remy,
David Fridovich-Keil,
Karen Leung
Abstract:
From autonomous driving to package delivery, ensuring safe yet efficient multi-agent interaction is challenging as the interaction dynamics are influenced by hard-to-model factors such as social norms and contextual cues. Understanding these influences can aid in the design and evaluation of socially-aware autonomous agents whose behaviors are aligned with human values. In this work, we seek to co…
▽ More
From autonomous driving to package delivery, ensuring safe yet efficient multi-agent interaction is challenging as the interaction dynamics are influenced by hard-to-model factors such as social norms and contextual cues. Understanding these influences can aid in the design and evaluation of socially-aware autonomous agents whose behaviors are aligned with human values. In this work, we seek to codify factors governing safe multi-agent interactions via the lens of responsibility, i.e., an agent's willingness to deviate from their desired control to accommodate safe interaction with others. Specifically, we propose a data-driven modeling approach based on control barrier functions and differentiable optimization that efficiently learns agents' responsibility allocation from data. We demonstrate on synthetic and real-world datasets that we can obtain an interpretable and quantitative understanding of how much agents adjust their behavior to ensure the safety of others given their current environment.
△ Less
Submitted 9 October, 2024;
originally announced October 2024.
-
Second-Order Algorithms for Finding Local Nash Equilibria in Zero-Sum Games
Authors:
Kushagra Gupta,
Xinjie Liu,
Ross Allen,
Ufuk Topcu,
David Fridovich-Keil
Abstract:
Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors.…
▽ More
Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors. To overcome this challenge, algorithms must account for subtleties involving the curvatures of players' costs. To this end, we leverage dynamical system theory and develop a second-order algorithm for finding a local Nash equilibrium in the smooth, possibly nonconvex-nonconcave, zero-sum game setting. First, we prove that this novel method guarantees convergence to only local Nash equilibria with a local linear convergence rate. We then interpret a version of this method as a modified Gauss-Newton algorithm with local superlinear convergence to the neighborhood of a point that satisfies first-order local Nash equilibrium conditions. In comparison, current related state-of-the-art methods do not offer convergence rate guarantees. Furthermore, we show that this approach naturally generalizes to settings with convex and potentially coupled constraints while retaining earlier guarantees of convergence to only local (generalized) Nash equilibria.
△ Less
Submitted 3 October, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Act Natural! Projecting Autonomous System Trajectories Into Naturalistic Behavior Sets
Authors:
Hamzah I. Khan,
Adam J. Thorpe,
David Fridovich-Keil
Abstract:
Autonomous agents operating around human actors must consider how their behaviors might affect those humans, even when not directly interacting with them. To this end, it is often beneficial to be predictable and appear naturalistic. Existing methods to address this problem use human actor intent modeling or imitation learning techniques, but these approaches rarely capture all possible motivation…
▽ More
Autonomous agents operating around human actors must consider how their behaviors might affect those humans, even when not directly interacting with them. To this end, it is often beneficial to be predictable and appear naturalistic. Existing methods to address this problem use human actor intent modeling or imitation learning techniques, but these approaches rarely capture all possible motivations for human behavior or require significant amounts of data. In contrast, we propose a technique for modeling naturalistic behavior as a set of convex hulls computed over a relatively small dataset of human behavior. Given this set, we design an optimization-based filter which projects arbitrary trajectories into it to make them more naturalistic for autonomous agents to execute while also satisfying dynamics constraints. We demonstrate our methods on real-world human driving data from the inD intersection dataset (Bock et al., 2020).
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Sensing Resource Allocation Against Data-Poisoning Attacks in Traffic Routing
Authors:
Yue Yu,
Adam J. Thorpe,
Jesse Milzman,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
Data-poisoning attacks can disrupt the efficient operations of transportation systems by misdirecting traffic flows via falsified data. One challenge in countering these attacks is to reduce the uncertainties on the types of attacks, such as the distribution of their targets and intensities. We introduce a resource allocation method in transportation networks to detect and distinguish different ty…
▽ More
Data-poisoning attacks can disrupt the efficient operations of transportation systems by misdirecting traffic flows via falsified data. One challenge in countering these attacks is to reduce the uncertainties on the types of attacks, such as the distribution of their targets and intensities. We introduce a resource allocation method in transportation networks to detect and distinguish different types of attacks and facilitate efficient traffic routing. The idea is to first cluster different types of attacks based on the corresponding optimal routing strategies, then allocate sensing resources to a subset of network links to distinguish attacks from different clusters via lexicographical mixed-integer programming. We illustrate the application of the proposed method using the Anaheim network, a benchmark model in traffic routing that contains more than 400 nodes and 900 links.
△ Less
Submitted 10 September, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Smooth Information Gathering in Two-Player Noncooperative Games
Authors:
Fernando Palafox,
Jesse Milzman,
Dong Ho Lee,
Ryan Park,
David Fridovich-Keil
Abstract:
We present a mathematical framework for modeling two-player noncooperative games in which one player is uncertain of the other player's costs but can preemptively allocate information-gathering resources to reduce this uncertainty. We refer to the players as the uncertain player (UP) and the certain player (CP), respectively. We obtain UP's decisions by solving a two-stage problem where, in Stage…
▽ More
We present a mathematical framework for modeling two-player noncooperative games in which one player is uncertain of the other player's costs but can preemptively allocate information-gathering resources to reduce this uncertainty. We refer to the players as the uncertain player (UP) and the certain player (CP), respectively. We obtain UP's decisions by solving a two-stage problem where, in Stage 1, UP allocates information-gathering resources that smoothly transform the information structure in the second stage. Then, in Stage 2, a signal (that is, a function of the Stage 1 allocation) informs UP about CP's costs, and both players execute strategies which depend upon the signal's value. This framework allows for a smooth resource allocation, in contrast to existing literature on the topic. We also identify conditions under which the gradient of UP's overall cost with respect to the information-gathering resources is well-defined. Then we provide a gradient-based algorithm to solve the two-stage game. Finally, we apply our framework to a tower-defense game which can be interpreted as a variant of a Colonel Blotto game with smooth payoff functions and uncertainty over battlefield valuations. We include an analysis of how optimal decisions shift with changes in information-gathering allocations and perturbations in the cost functions.
△ Less
Submitted 24 October, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning
Authors:
Antonio Lopez,
David Fridovich-Keil
Abstract:
Recent methods using Reinforcement Learning (RL) have proven to be successful for training intelligent agents in unknown environments. However, RL has not been applied widely in real-world robotics scenarios. This is because current state-of-the-art RL methods require large amounts of data to learn a specific task, leading to unreasonable costs when deploying the agent to collect data in real-worl…
▽ More
Recent methods using Reinforcement Learning (RL) have proven to be successful for training intelligent agents in unknown environments. However, RL has not been applied widely in real-world robotics scenarios. This is because current state-of-the-art RL methods require large amounts of data to learn a specific task, leading to unreasonable costs when deploying the agent to collect data in real-world applications. In this paper, we build from existing work that reshapes the reward function in RL by introducing a Control Lyapunov Function (CLF), which is demonstrated to reduce the sample complexity. Still, this formulation requires knowing a CLF of the system, but due to the lack of a general method, it is often a challenge to identify a suitable CLF. Existing work can compute low-dimensional CLFs via a Hamilton-Jacobi reachability procedure. However, this class of methods becomes intractable on high-dimensional systems, a problem that we address by using a system decomposition technique to compute what we call Decomposed Control Lyapunov Functions (DCLFs). We use the computed DCLF for reward shaping, which we show improves RL performance. Through multiple examples, we demonstrate the effectiveness of this approach, where our method finds a policy to successfully land a quadcopter in less than half the amount of real-world data required by the state-of-the-art Soft-Actor Critic algorithm.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Coordination in Noncooperative Multiplayer Matrix Games via Reduced Rank Correlated Equilibria
Authors:
Jaehan Im,
Yue Yu,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
Coordination in multiplayer games enables players to avoid the lose-lose outcome that often arises at Nash equilibria. However, designing a coordination mechanism typically requires the consideration of the joint actions of all players, which becomes intractable in large-scale games. We develop a novel coordination mechanism, termed reduced rank correlated equilibria, which reduces the number of j…
▽ More
Coordination in multiplayer games enables players to avoid the lose-lose outcome that often arises at Nash equilibria. However, designing a coordination mechanism typically requires the consideration of the joint actions of all players, which becomes intractable in large-scale games. We develop a novel coordination mechanism, termed reduced rank correlated equilibria, which reduces the number of joint actions to be considered and thereby mitigates computational complexity. The idea is to approximate the set of all joint actions with the actions used in a set of pre-computed Nash equilibria via a convex hull operation. In a game with n players and each player having m actions, the proposed mechanism reduces the number of joint actions considered from O(m^n) to O(mn). We demonstrate the application of the proposed mechanism to an air traffic queue management problem. Compared with the correlated equilibrium-a popular benchmark coordination mechanism-the proposed approach is capable of solving a problem involving four thousand times more joint actions while yielding similar or better performance in terms of a fairness indicator and showing a maximum optimality gap of 0.066% in terms of the average delay cost. In the meantime, it yields a solution that shows up to 99.5% improvement in a fairness indicator and up to 50.4% reduction in average delay cost compared to the Nash solution, which does not involve coordination.
△ Less
Submitted 12 June, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Auto-Encoding Bayesian Inverse Games
Authors:
Xinjie Liu,
Lasse Peters,
Javier Alonso-Mora,
Ufuk Topcu,
David Fridovich-Keil
Abstract:
When multiple agents interact in a common environment, each agent's actions impact others' future decisions, and noncooperative dynamic games naturally capture this coupling. In interactive motion planning, however, agents typically do not have access to a complete model of the game, e.g., due to unknown objectives of other players. Therefore, we consider the inverse game problem, in which some pr…
▽ More
When multiple agents interact in a common environment, each agent's actions impact others' future decisions, and noncooperative dynamic games naturally capture this coupling. In interactive motion planning, however, agents typically do not have access to a complete model of the game, e.g., due to unknown objectives of other players. Therefore, we consider the inverse game problem, in which some properties of the game are unknown a priori and must be inferred from observations. Existing maximum likelihood estimation (MLE) approaches to solve inverse games provide only point estimates of unknown parameters without quantifying uncertainty, and perform poorly when many parameter values explain the observed behavior. To address these limitations, we take a Bayesian perspective and construct posterior distributions of game parameters. To render inference tractable, we employ a variational autoencoder (VAE) with an embedded differentiable game solver. This structured VAE can be trained from an unlabeled dataset of observed interactions, naturally handles continuous, multi-modal distributions, and supports efficient sampling from the inferred posteriors without computing game solutions at runtime. Extensive evaluations in simulated driving scenarios demonstrate that the proposed approach successfully learns the prior and posterior game parameter distributions, provides more accurate objective estimates than MLE baselines, and facilitates safer and more efficient game-theoretic motion planning.
△ Less
Submitted 15 June, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
The computation of approximate feedback Stackelberg equilibria in multi-player nonlinear constrained dynamic games
Authors:
Jingqi Li,
Somayeh Sojoudi,
Claire Tomlin,
David Fridovich-Keil
Abstract:
Solving feedback Stackelberg games with nonlinear dynamics and coupled constraints, a common scenario in practice, presents significant challenges. This work introduces an efficient method for computing approximate local feedback Stackelberg equilibria in multi-player general-sum dynamic games, with continuous state and action spaces. Different from existing (approximate) dynamic programming solut…
▽ More
Solving feedback Stackelberg games with nonlinear dynamics and coupled constraints, a common scenario in practice, presents significant challenges. This work introduces an efficient method for computing approximate local feedback Stackelberg equilibria in multi-player general-sum dynamic games, with continuous state and action spaces. Different from existing (approximate) dynamic programming solutions that are primarily designed for unconstrained problems, our approach involves reformulating a feedback Stackelberg dynamic game into a sequence of nested optimization problems, enabling the derivation of Karush-Kuhn-Tucker (KKT) conditions and the establishment of a second-order sufficient condition for local feedback Stackelberg equilibria. We propose a Newton-style primal-dual interior point method for solving constrained linear quadratic (LQ) feedback Stackelberg games, offering provable convergence guarantees. Our method is further extended to compute local feedback Stackelberg equilibria for more general nonlinear games by iteratively approximating them using LQ games, ensuring that their KKT conditions are locally aligned with those of the original nonlinear games. We prove the exponential convergence of our algorithm in constrained nonlinear games. In a feedback Stackelberg game with nonlinear dynamics and (nonconvex) coupled costs and constraints, our experimental results reveal the algorithm's ability to handle infeasible initial conditions and achieve exponential convergence towards an approximate local feedback Stackelberg equilibrium.
△ Less
Submitted 2 April, 2025; v1 submitted 28 January, 2024;
originally announced January 2024.
-
An Investigation of Time Reversal Symmetry in Reinforcement Learning
Authors:
Brett Barkley,
Amy Zhang,
David Fridovich-Keil
Abstract:
One of the fundamental challenges associated with reinforcement learning (RL) is that collecting sufficient data can be both time-consuming and expensive. In this paper, we formalize a concept of time reversal symmetry in a Markov decision process (MDP), which builds upon the established structure of dynamically reversible Markov chains (DRMCs) and time-reversibility in classical physics. Specific…
▽ More
One of the fundamental challenges associated with reinforcement learning (RL) is that collecting sufficient data can be both time-consuming and expensive. In this paper, we formalize a concept of time reversal symmetry in a Markov decision process (MDP), which builds upon the established structure of dynamically reversible Markov chains (DRMCs) and time-reversibility in classical physics. Specifically, we investigate the utility of this concept in reducing the sample complexity of reinforcement learning. We observe that utilizing the structure of time reversal in an MDP allows every environment transition experienced by an agent to be transformed into a feasible reverse-time transition, effectively doubling the number of experiences in the environment. To test the usefulness of this newly synthesized data, we develop a novel approach called time symmetric data augmentation (TSDA) and investigate its application in both proprioceptive and pixel-based state within the realm of off-policy, model-free RL. Empirical evaluations showcase how these synthetic transitions can enhance the sample efficiency of RL agents in time reversible scenarios without friction or contact. We also test this method in more realistic environments where these assumptions are not globally satisfied. We find that TSDA can significantly degrade sample efficiency and policy performance, but can also improve sample efficiency under the right conditions. Ultimately we conclude that time symmetry shows promise in enhancing the sample efficiency of reinforcement learning and provide guidance when the environment and reward structures are of an appropriate form for TSDA to be employed effectively.
△ Less
Submitted 28 November, 2023;
originally announced November 2023.
-
Learning Hyperplanes for Multi-Agent Collision Avoidance in Space
Authors:
Fernando Palafox,
Yue Yu,
David Fridovich-Keil
Abstract:
A core challenge of multi-robot interactions is collision avoidance among robots with potentially conflicting objectives. We propose a game-theoretic method for collision avoidance based on rotating hyperplane constraints. These constraints ensure collision avoidance by defining separating hyperplanes that rotate around a keep-out zone centered on certain robots. Since it is challenging to select…
▽ More
A core challenge of multi-robot interactions is collision avoidance among robots with potentially conflicting objectives. We propose a game-theoretic method for collision avoidance based on rotating hyperplane constraints. These constraints ensure collision avoidance by defining separating hyperplanes that rotate around a keep-out zone centered on certain robots. Since it is challenging to select the parameters that define a hyperplane without introducing infeasibilities, we propose to learn them from an expert trajectory i.e., one collected by recording human operators. To do so, we solve for the parameters whose corresponding equilibrium trajectory best matches the expert trajectory. We validate our method by learning hyperplane parameters from noisy expert trajectories and demonstrate the generalizability of the learned parameters to scenarios with more robots and previously unseen initial conditions.
△ Less
Submitted 15 November, 2023;
originally announced November 2023.
-
Leadership Inference for Multi-Agent Interactions
Authors:
Hamzah Khan,
David Fridovich-Keil
Abstract:
Effectively predicting intent and behavior requires inferring leadership in multi-agent interactions. Dynamic games provide an expressive theoretical framework for modeling these interactions. Employing this framework, we propose a novel method to infer the leader in a two-agent game by observing the agents' behavior in complex, long-horizon interactions. We make two contributions. First, we intro…
▽ More
Effectively predicting intent and behavior requires inferring leadership in multi-agent interactions. Dynamic games provide an expressive theoretical framework for modeling these interactions. Employing this framework, we propose a novel method to infer the leader in a two-agent game by observing the agents' behavior in complex, long-horizon interactions. We make two contributions. First, we introduce an iterative algorithm that solves dynamic two-agent Stackelberg games with nonlinear dynamics and nonquadratic costs, and demonstrate that it consistently converges. Second, we propose the Stackelberg Leadership Filter (SLF), an online method for identifying the leading agent in interactive scenarios based on observations of the game interactions. We validate the leadership filter's efficacy on simulated driving scenarios to demonstrate that the SLF can draw conclusions about leadership that match right-of-way expectations.
△ Less
Submitted 8 April, 2024; v1 submitted 27 October, 2023;
originally announced October 2023.
-
When Should a Leader Act Suboptimally? The Role of Inferability in Repeated Stackelberg Games
Authors:
Mustafa O. Karabag,
Sophia Smith,
Negar Mehr,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
When interacting with other decision-making agents in non-adversarial scenarios, it is critical for an autonomous agent to have inferable behavior: The agent's actions must convey their intention and strategy. We model the inferability problem using Stackelberg games with observations where a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed mixed strategy…
▽ More
When interacting with other decision-making agents in non-adversarial scenarios, it is critical for an autonomous agent to have inferable behavior: The agent's actions must convey their intention and strategy. We model the inferability problem using Stackelberg games with observations where a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed mixed strategy. The follower does not know the leader's strategy and dynamically reacts to the statistically inferred strategy based on the leader's previous actions. In the inference setting, the leader may have a lower performance compared to the setting where the follower has full information on the leader's strategy. We refer to the performance gap between these settings as the inferability gap. For a variety of game settings, we show that the inferability gap is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. We also analyze bimatrix Stackelberg games and identify a set of games where the leader's near-optimal strategy may potentially suffer from a large inferability gap.
△ Less
Submitted 31 May, 2025; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Symbolic Regression on Sparse and Noisy Data with Gaussian Processes
Authors:
Junette Hsin,
Shubhankar Agarwal,
Adam Thorpe,
Luis Sentis,
David Fridovich-Keil
Abstract:
In this paper, we address the challenge of deriving dynamical models from sparse and noisy data. High-quality data is crucial for symbolic regression algorithms; limited and noisy data can present modeling challenges. To overcome this, we combine Gaussian process regression with a sparse identification of nonlinear dynamics (SINDy) method to denoise the data and identify nonlinear dynamical equati…
▽ More
In this paper, we address the challenge of deriving dynamical models from sparse and noisy data. High-quality data is crucial for symbolic regression algorithms; limited and noisy data can present modeling challenges. To overcome this, we combine Gaussian process regression with a sparse identification of nonlinear dynamics (SINDy) method to denoise the data and identify nonlinear dynamical equations. Our approach GPSINDy offers improved robustness with sparse, noisy data compared to SINDy alone. We demonstrate its effectiveness on simulation data from Lotka-Volterra and unicycle models and hardware data from an NVIDIA JetRacer system. We show superior performance over baselines including more than 50% improvement over SINDy and other baselines in predicting future trajectories from noise-corrupted and sparse 5 Hz data.
△ Less
Submitted 10 October, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Game-theoretic Occlusion-Aware Motion Planning: an Efficient Hybrid-Information Approach
Authors:
Kushagra Gupta,
David Fridovich-Keil
Abstract:
We present a novel algorithm for game-theoretic trajectory planning, tailored for settings in which agents can only observe one another in specific regions of the state space. Such problems arise naturally in the context of multi-robot navigation, where occlusions due to environment geometry naturally mask agents' view of one another. In this paper, we formalize these settings as dynamic games wit…
▽ More
We present a novel algorithm for game-theoretic trajectory planning, tailored for settings in which agents can only observe one another in specific regions of the state space. Such problems arise naturally in the context of multi-robot navigation, where occlusions due to environment geometry naturally mask agents' view of one another. In this paper, we formalize these settings as dynamic games with a hybrid information structure, which interleaves so-called "open-loop" periods (in which agents cannot observe one another) with "feedback" periods (with full state observability). We present two main contributions. First, we study a canonical variant of these hybrid information games in which agents' dynamics are linear, and objectives are convex and quadratic. Here, we build upon classical solution methods for the open-loop and feedback variants of these games to derive an algorithm for the hybrid information case that matches the cubic runtime of the classical settings. Second, we consider a far broader class of problems in which agents' dynamics are nonlinear, and objectives are nonquadratic; we reduce these problems to sequences of hybrid information linear-quadratic games and empirically demonstrate that iteratively solving these simpler problems with the proposed algorithm yields reliable convergence to approximate Nash equilibria through simulation studies of overtaking and intersection traffic scenarios.
△ Less
Submitted 16 June, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
Connected Autonomous Vehicle Motion Planning with Video Predictions from Smart, Self-Supervised Infrastructure
Authors:
Jiankai Sun,
Shreyas Kousik,
David Fridovich-Keil,
Mac Schwager
Abstract:
Connected autonomous vehicles (CAVs) promise to enhance safety, efficiency, and sustainability in urban transportation. However, this is contingent upon a CAV correctly predicting the motion of surrounding agents and planning its own motion safely. Doing so is challenging in complex urban environments due to frequent occlusions and interactions among many agents. One solution is to leverage smart…
▽ More
Connected autonomous vehicles (CAVs) promise to enhance safety, efficiency, and sustainability in urban transportation. However, this is contingent upon a CAV correctly predicting the motion of surrounding agents and planning its own motion safely. Doing so is challenging in complex urban environments due to frequent occlusions and interactions among many agents. One solution is to leverage smart infrastructure to augment a CAV's situational awareness; the present work leverages a recently proposed "Self-Supervised Traffic Advisor" (SSTA) framework of smart sensors that teach themselves to generate and broadcast useful video predictions of road users. In this work, SSTA predictions are modified to predict future occupancy instead of raw video, which reduces the data footprint of broadcast predictions. The resulting predictions are used within a planning framework, demonstrating that this design can effectively aid CAV motion planning. A variety of numerical experiments study the key factors that make SSTA outputs useful for practical CAV planning in crowded urban environments.
△ Less
Submitted 14 September, 2023;
originally announced September 2023.
-
Risk-Minimizing Two-Player Zero-Sum Stochastic Differential Game via Path Integral Control
Authors:
Apurva Patil,
Yujing Zhou,
David Fridovich-Keil,
Takashi Tanaka
Abstract:
This paper addresses a continuous-time risk-minimizing two-player zero-sum stochastic differential game (SDG), in which each player aims to minimize its probability of failure. Failure occurs in the event when the state of the game enters into predefined undesirable domains, and one player's failure is the other's success. We derive a sufficient condition for this game to have a saddle-point equil…
▽ More
This paper addresses a continuous-time risk-minimizing two-player zero-sum stochastic differential game (SDG), in which each player aims to minimize its probability of failure. Failure occurs in the event when the state of the game enters into predefined undesirable domains, and one player's failure is the other's success. We derive a sufficient condition for this game to have a saddle-point equilibrium and show that it can be solved via a Hamilton-Jacobi-Isaacs (HJI) partial differential equation (PDE) with Dirichlet boundary condition. Under certain assumptions on the system dynamics and cost function, we establish the existence and uniqueness of the saddle-point of the game. We provide explicit expressions for the saddle-point policies which can be numerically evaluated using path integral control. This allows us to solve the game online via Monte Carlo sampling of system trajectories. We implement our control synthesis framework on two classes of risk-minimizing zero-sum SDGs: a disturbance attenuation problem and a pursuit-evasion game. Simulation studies are presented to validate the proposed control synthesis framework.
△ Less
Submitted 22 August, 2023;
originally announced August 2023.
-
Active Inverse Learning in Stackelberg Trajectory Games
Authors:
William Ward,
Yue Yu,
Jacob Levy,
Negar Mehr,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
Game-theoretic inverse learning is the problem of inferring a player's objectives from their actions. We formulate an inverse learning problem in a Stackelberg game between a leader and a follower, where each player's action is the trajectory of a dynamical system. We propose an active inverse learning method for the leader to infer which hypothesis among a finite set of candidates best describes…
▽ More
Game-theoretic inverse learning is the problem of inferring a player's objectives from their actions. We formulate an inverse learning problem in a Stackelberg game between a leader and a follower, where each player's action is the trajectory of a dynamical system. We propose an active inverse learning method for the leader to infer which hypothesis among a finite set of candidates best describes the follower's objective function. Instead of using passively observed trajectories like existing methods, we actively maximize the differences in the follower's trajectories under different hypotheses by optimizing the leader's control inputs. Compared with uniformly random inputs, the optimized inputs accelerate the convergence of the estimated probability of different hypotheses conditioned on the follower's trajectory. We demonstrate the proposed method in a receding-horizon repeated trajectory game and simulate the results using virtual TurtleBots in Gazebo.
△ Less
Submitted 11 October, 2024; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models
Authors:
Tyler Westenbroek,
Jacob Levy,
David Fridovich-Keil
Abstract:
We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient…
▽ More
We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach $1)$ uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and $2)$ uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.
△ Less
Submitted 6 November, 2023; v1 submitted 16 July, 2023;
originally announced July 2023.
-
Contingency Games for Multi-Agent Interaction
Authors:
Lasse Peters,
Andrea Bajcsy,
Chih-Yuan Chiu,
David Fridovich-Keil,
Forrest Laine,
Laura Ferranti,
Javier Alonso-Mora
Abstract:
Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work we take a game-theoretic perspective on contingency planning, tailored to multi-agent scenarios in which a robot's actions impact the decisions of other agents and vice versa. The resulting contin…
▽ More
Contingency planning, wherein an agent generates a set of possible plans conditioned on the outcome of an uncertain event, is an increasingly popular way for robots to act under uncertainty. In this work we take a game-theoretic perspective on contingency planning, tailored to multi-agent scenarios in which a robot's actions impact the decisions of other agents and vice versa. The resulting contingency game allows the robot to efficiently interact with other agents by generating strategic motion plans conditioned on multiple possible intents for other actors in the scene. Contingency games are parameterized via a scalar variable which represents a future time when intent uncertainty will be resolved. By estimating this parameter online, we construct a game-theoretic motion planner that adapts to changing beliefs while anticipating future certainty. We show that existing variants of game-theoretic planning under uncertainty are readily obtained as special cases of contingency games. Through a series of simulated autonomous driving scenarios, we demonstrate that contingency games close the gap between certainty-equivalent games that commit to a single hypothesis and non-contingent multi-hypothesis games that do not account for future uncertainty reduction.
△ Less
Submitted 21 December, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Scenario-Game ADMM: A Parallelized Scenario-Based Solver for Stochastic Noncooperative Games
Authors:
Jingqi Li,
Chih-Yuan Chiu,
Lasse Peters,
Fernando Palafox,
Mustafa Karabag,
Javier Alonso-Mora,
Somayeh Sojoudi,
Claire Tomlin,
David Fridovich-Keil
Abstract:
Decision-making in multi-player games can be extremely challenging, particularly under uncertainty. In this work, we propose a new sample-based approximation to a class of stochastic, general-sum, pure Nash games, where each player has an expected-value objective and a set of chance constraints. This new approximation scheme inherits the accuracy of objective approximation from the established sam…
▽ More
Decision-making in multi-player games can be extremely challenging, particularly under uncertainty. In this work, we propose a new sample-based approximation to a class of stochastic, general-sum, pure Nash games, where each player has an expected-value objective and a set of chance constraints. This new approximation scheme inherits the accuracy of objective approximation from the established sample average approximation (SAA) method and enjoys a feasibility guarantee derived from the scenario optimization literature. We characterize the sample complexity of this new game-theoretic approximation scheme, and observe that high accuracy usually requires a large number of samples, which results in a large number of sampled constraints. To accommodate this, we decompose the approximated game into a set of smaller games with few constraints for each sampled scenario, and propose a decentralized, consensus-based ADMM algorithm to efficiently compute a generalized Nash equilibrium (GNE) of the approximated game. We prove the convergence of our algorithm to a GNE and empirically demonstrate superior performance relative to a recent baseline algorithm based on ADMM and interior point method.
△ Less
Submitted 5 November, 2024; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning
Authors:
Shenghui Chen,
Yue Yu,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions. We introduce a novel solution concept, the soft-…
▽ More
Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players' reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outperform those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude.
△ Less
Submitted 8 September, 2023; v1 submitted 31 March, 2023;
originally announced April 2023.
-
Inferring Occluded Agent Behavior in Dynamic Games from Noise Corrupted Observations
Authors:
Tianyu Qiu,
David Fridovich-Keil
Abstract:
In mobile robotics and autonomous driving, it is natural to model agent interactions as the Nash equilibrium of a noncooperative, dynamic game. These methods inherently rely on observations from sensors such as lidars and cameras to identify agents participating in the game and, therefore, have difficulty when some agents are occluded. To address this limitation, this paper presents an occlusion-a…
▽ More
In mobile robotics and autonomous driving, it is natural to model agent interactions as the Nash equilibrium of a noncooperative, dynamic game. These methods inherently rely on observations from sensors such as lidars and cameras to identify agents participating in the game and, therefore, have difficulty when some agents are occluded. To address this limitation, this paper presents an occlusion-aware game-theoretic inference method to estimate the locations of potentially occluded agents, and simultaneously infer the intentions of both visible and occluded agents, which best accounts for the observations of visible agents. Additionally, we propose a receding horizon planning strategy based on an occlusion-aware contingency game designed to navigate in scenarios with potentially occluded agents. Monte Carlo simulations validate our approach, demonstrating that it accurately estimates the game model and trajectories for both visible and occluded agents using noisy observations of visible agents. Our planning pipeline significantly enhances navigation safety when compared to occlusion-ignorant baseline as well.
△ Less
Submitted 13 July, 2024; v1 submitted 16 March, 2023;
originally announced March 2023.
-
Online and Offline Learning of Player Objectives from Partial Observations in Dynamic Games
Authors:
Lasse Peters,
Vicenç Rubies-Royo,
Claire J. Tomlin,
Laura Ferranti,
Javier Alonso-Mora,
Cyrill Stachniss,
David Fridovich-Keil
Abstract:
Robots deployed to the real world must be able to interact with other agents in their environment. Dynamic game theory provides a powerful mathematical framework for modeling scenarios in which agents have individual objectives and interactions evolve over time. However, a key limitation of such techniques is that they require a-priori knowledge of all players' objectives. In this work, we address…
▽ More
Robots deployed to the real world must be able to interact with other agents in their environment. Dynamic game theory provides a powerful mathematical framework for modeling scenarios in which agents have individual objectives and interactions evolve over time. However, a key limitation of such techniques is that they require a-priori knowledge of all players' objectives. In this work, we address this issue by proposing a novel method for learning players' objectives in continuous dynamic games from noise-corrupted, partial state observations. Our approach learns objectives by coupling the estimation of unknown cost parameters of each player with inference of unobserved states and inputs through Nash equilibrium constraints. By coupling past state estimates with future state predictions, our approach is amenable to simultaneous online learning and prediction in receding horizon fashion. We demonstrate our method in several simulated traffic scenarios in which we recover players' preferences for, e.g., desired travel speed and collision-avoidance behavior. Results show that our method reliably estimates game-theoretic models from noise-corrupted data that closely matches ground-truth objectives, consistently outperforming state-of-the-art approaches.
△ Less
Submitted 14 May, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
GrAVITree: Graph-based Approximate Value Function In a Tree
Authors:
Patrick H. Washington,
David Fridovich-Keil,
Mac Schwager
Abstract:
In this paper, we introduce GrAVITree, a tree- and sampling-based algorithm to compute a near-optimal value function and corresponding feedback policy for indefinite time-horizon, terminal state-constrained nonlinear optimal control problems. Our algorithm is suitable for arbitrary nonlinear control systems with both state and input constraints. The algorithm works by sampling feasible control inp…
▽ More
In this paper, we introduce GrAVITree, a tree- and sampling-based algorithm to compute a near-optimal value function and corresponding feedback policy for indefinite time-horizon, terminal state-constrained nonlinear optimal control problems. Our algorithm is suitable for arbitrary nonlinear control systems with both state and input constraints. The algorithm works by sampling feasible control inputs and branching backwards in time from the terminal state to build the tree, thereby associating each vertex in the tree with a feasible control sequence to reach the terminal state. Additionally, we embed this stochastic tree within a larger graph structure, rewiring of which enables rapid adaptation to changes in problem structure due to, e.g., newly detected obstacles. Because our method reasons about global problem structure without relying on (potentially imprecise) derivative information, it is particularly well suited to controlling a system based on an imperfect deep neural network model of its dynamics. We demonstrate this capability in the context of an inverted pendulum, where we use a learned model of the pendulum with actuator limits and achieve robust stabilization in settings where competing graph-based and derivative-based techniques fail.
△ Less
Submitted 18 January, 2023;
originally announced January 2023.
-
Cost Inference for Feedback Dynamic Games from Noisy Partial State Observations and Incomplete Trajectories
Authors:
Jingqi Li,
Chih-Yuan Chiu,
Lasse Peters,
Somayeh Sojoudi,
Claire Tomlin,
David Fridovich-Keil
Abstract:
In multi-agent dynamic games, the Nash equilibrium state trajectory of each agent is determined by its cost function and the information pattern of the game. However, the cost and trajectory of each agent may be unavailable to the other agents. Prior work on using partial observations to infer the costs in dynamic games assumes an open-loop information pattern. In this work, we demonstrate that th…
▽ More
In multi-agent dynamic games, the Nash equilibrium state trajectory of each agent is determined by its cost function and the information pattern of the game. However, the cost and trajectory of each agent may be unavailable to the other agents. Prior work on using partial observations to infer the costs in dynamic games assumes an open-loop information pattern. In this work, we demonstrate that the feedback Nash equilibrium concept is more expressive and encodes more complex behavior. It is desirable to develop specific tools for inferring players' objectives in feedback games. Therefore, we consider the dynamic game cost inference problem under the feedback information pattern, using only partial state observations and incomplete trajectory data. To this end, we first propose an inverse feedback game loss function, whose minimizer yields a feedback Nash equilibrium state trajectory closest to the observation data. We characterize the landscape and differentiability of the loss function. Given the difficulty of obtaining the exact gradient, our main contribution is an efficient gradient approximator, which enables a novel inverse feedback game solver that minimizes the loss using first-order optimization. In thorough empirical evaluations, we demonstrate that our algorithm converges reliably and has better robustness and generalization performance than the open-loop baseline method when the observation data reflects a group of players acting in a feedback Nash game.
△ Less
Submitted 3 January, 2023;
originally announced January 2023.
-
Cost Design in Atomic Routing Games
Authors:
Yue Yu,
Shenghui Chen,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
An atomic routing game is a multiplayer game on a directed graph. Each player in the game chooses a path -- a sequence of links that connect its origin node to its destination node -- with the lowest cost, where the cost of each link is a function of all players' choices. We develop a novel numerical method to design the link cost function in atomic routing games such that the players' choices at…
▽ More
An atomic routing game is a multiplayer game on a directed graph. Each player in the game chooses a path -- a sequence of links that connect its origin node to its destination node -- with the lowest cost, where the cost of each link is a function of all players' choices. We develop a novel numerical method to design the link cost function in atomic routing games such that the players' choices at the Nash equilibrium minimize a given smooth performance function. This method first approximates the nonsmooth Nash equilibrium conditions with smooth ones, then iteratively improves the link cost function via implicit differentiation. We demonstrate the application of this method to atomic routing games that model noncooperative agents navigating in grid worlds.
△ Less
Submitted 17 May, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Robust Forecasting for Robotic Control: A Game-Theoretic Approach
Authors:
Shubhankar Agarwal,
David Fridovich-Keil,
Sandeep P. Chinchali
Abstract:
Modern robots require accurate forecasts to make optimal decisions in the real world. For example, self-driving cars need an accurate forecast of other agents' future actions to plan safe trajectories. Current methods rely heavily on historical time series to accurately predict the future. However, relying entirely on the observed history is problematic since it could be corrupted by noise, have o…
▽ More
Modern robots require accurate forecasts to make optimal decisions in the real world. For example, self-driving cars need an accurate forecast of other agents' future actions to plan safe trajectories. Current methods rely heavily on historical time series to accurately predict the future. However, relying entirely on the observed history is problematic since it could be corrupted by noise, have outliers, or not completely represent all possible outcomes. To solve this problem, we propose a novel framework for generating robust forecasts for robotic control. In order to model real-world factors affecting future forecasts, we introduce the notion of an adversary, which perturbs observed historical time series to increase a robot's ultimate control cost. Specifically, we model this interaction as a zero-sum two-player game between a robot's forecaster and this hypothetical adversary. We show that our proposed game may be solved to a local Nash equilibrium using gradient-based optimization techniques. Furthermore, we show that a forecaster trained with our method performs 30.14% better on out-of-distribution real-world lane change data than baselines.
△ Less
Submitted 4 April, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Alternating Direction Method of Multipliers for Decomposable Saddle-Point Problems
Authors:
Mustafa O. Karabag,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
Saddle-point problems appear in various settings including machine learning, zero-sum stochastic games, and regression problems. We consider decomposable saddle-point problems and study an extension of the alternating direction method of multipliers to such saddle-point problems. Instead of solving the original saddle-point problem directly, this algorithm solves smaller saddle-point problems by e…
▽ More
Saddle-point problems appear in various settings including machine learning, zero-sum stochastic games, and regression problems. We consider decomposable saddle-point problems and study an extension of the alternating direction method of multipliers to such saddle-point problems. Instead of solving the original saddle-point problem directly, this algorithm solves smaller saddle-point problems by exploiting the decomposable structure. We show the convergence of this algorithm for convex-concave saddle-point problems under a mild assumption. We also provide a sufficient condition for which the assumption holds. We demonstrate the convergence properties of the saddle-point alternating direction method of multipliers with numerical examples on a power allocation problem in communication channels and a network routing problem with adversarial costs.
△ Less
Submitted 27 December, 2022; v1 submitted 9 September, 2022;
originally announced September 2022.
-
Inverse Matrix Games with Unique Quantal Response Equilibrium
Authors:
Yue Yu,
Jonathan Salfity,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
In an inverse game problem, one needs to infer the cost function of the players in a game such that a desired joint strategy is a Nash equilibrium. We study the inverse game problem for a class of multiplayer matrix games, where the cost perceived by each player is corrupted by random noise. We provide sufficient conditions for the players' quantal response equilibrium -- a generalization of the N…
▽ More
In an inverse game problem, one needs to infer the cost function of the players in a game such that a desired joint strategy is a Nash equilibrium. We study the inverse game problem for a class of multiplayer matrix games, where the cost perceived by each player is corrupted by random noise. We provide sufficient conditions for the players' quantal response equilibrium -- a generalization of the Nash equilibrium to games with perception noise -- to be unique. We develop efficient optimization algorithms for inferring the cost matrix based on semidefinite programs and bilevel optimization. We demonstrate the application of these methods in encouraging collision avoidance and fair resource allocation.
△ Less
Submitted 13 October, 2022; v1 submitted 17 July, 2022;
originally announced July 2022.
-
Relationship Design for Socially-Aware Behavior in Static Games
Authors:
Shenghui Chen,
Yigit E. Bayiz,
David Fridovich-Keil,
Ufuk Topcu
Abstract:
Autonomous agents can adopt socially-aware behaviors to reduce social costs, mimicking the way animals interact in nature and humans in society. We present a new approach to model socially-aware decision-making that includes two key elements: bounded rationality and inter-agent relationships. We capture the interagent relationships by introducing a novel model called a relationship game and encode…
▽ More
Autonomous agents can adopt socially-aware behaviors to reduce social costs, mimicking the way animals interact in nature and humans in society. We present a new approach to model socially-aware decision-making that includes two key elements: bounded rationality and inter-agent relationships. We capture the interagent relationships by introducing a novel model called a relationship game and encode agents' bounded rationality using quantal response equilibria. For each relationship game, we define a social cost function and formulate a mechanism design problem to optimize weights for relationships that minimize social cost at the equilibrium. We address the multiplicity of equilibria by presenting the problem in two forms: Min-Max and Min-Min, aimed respectively at minimization of the highest and lowest social costs in the equilibria. We compute the quantal response equilibrium by solving a least-squares problem defined with its Karush-Kuhn-Tucker conditions, and propose two projected gradient descent algorithms to solve the mechanism design problems. Numerical results, including two-lane congestion and congestion with an ambulance, confirm that these algorithms consistently reach the equilibrium with the intended social costs.
△ Less
Submitted 25 January, 2024; v1 submitted 13 July, 2022;
originally announced July 2022.
-
Learning Mixed Strategies in Trajectory Games
Authors:
Lasse Peters,
David Fridovich-Keil,
Laura Ferranti,
Cyrill Stachniss,
Javier Alonso-Mora,
Forrest Laine
Abstract:
In multi-agent settings, game theory is a natural framework for describing the strategic interactions of agents whose objectives depend upon one another's behavior. Trajectory games capture these complex effects by design. In competitive settings, this makes them a more faithful interaction model than traditional "predict then plan" approaches. However, current game-theoretic planning methods have…
▽ More
In multi-agent settings, game theory is a natural framework for describing the strategic interactions of agents whose objectives depend upon one another's behavior. Trajectory games capture these complex effects by design. In competitive settings, this makes them a more faithful interaction model than traditional "predict then plan" approaches. However, current game-theoretic planning methods have important limitations. In this work, we propose two main contributions. First, we introduce an offline training phase which reduces the online computational burden of solving trajectory games. Second, we formulate a lifted game which allows players to optimize multiple candidate trajectories in unison and thereby construct more competitive "mixed" strategies. We validate our approach on a number of experiments using the pursuit-evasion game "tag."
△ Less
Submitted 3 May, 2022; v1 submitted 30 April, 2022;
originally announced May 2022.