Search | arXiv e-print repository

Distributed Event-Triggered Nash Equilibrium Seeking for Noncooperative Games

Authors: Victor Hugo Pereira Rodrigues, Tiago Roux Oliveira, Miroslav Krstic, Tamer Basar

Abstract: We propose locally convergent Nash equilibrium seeking algorithms for $N$-player noncooperative games, which use distributed event-triggered pseudo-gradient estimates. The proposed approach employs sinusoidal perturbations to estimate the pseudo-gradients of unknown quadratic payoff functions. This is the first instance of noncooperative games being tackled in a model-free fashion with event-triggered extremum seeking. Each player evaluates independently the deviation between the corresponding current pseudo-gradient estimate and its last broadcasted value from the event-triggering mechanism to tune individually the player action, while they preserve collectively the closed-loop stability/convergence. We guarantee Zeno behavior avoidance by establishing a minimum dwell-time to avoid infinitely fast switching. In particular, the stability analysis is carried out using Lyapunov's method and averaging for systems with discontinuous right-hand sides. We quantify the size of the ultimate small residual sets around the Nash equilibrium and illustrate the theoretical results numerically on an oligopoly setting.
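
As a rough illustration of the sinusoidal-perturbation idea mentioned in this abstract, the sketch below estimates the derivative of a scalar quadratic payoff by probing it with a sine dither and demodulating over one full period. The payoff coefficients, the amplitude `a`, the sample count, and the function names are hypothetical and not taken from the paper; the event-triggering mechanism and the multi-player coupling are omitted entirely.

```python
import math

def payoff(theta):
    # Hypothetical concave quadratic payoff with its maximizer at theta = 2.0.
    return 5.0 - 1.5 * (theta - 2.0) ** 2

def estimate_gradient(J, theta, a=0.1, samples=64):
    # Probe J at theta + a*sin(phase) over one full period and demodulate
    # with (2/a)*sin(phase). For a quadratic J the period average equals
    # J'(theta), because the averages of sin and sin^3 vanish and the
    # average of sin^2 is 1/2.
    total = 0.0
    for k in range(samples):
        phase = 2.0 * math.pi * k / samples
        s = math.sin(phase)
        total += (2.0 / a) * s * J(theta + a * s)
    return total / samples

# The exact derivative of the hypothetical payoff at theta = 0.5 is 4.5.
grad = estimate_gradient(payoff, theta=0.5)
```

Under these assumptions the estimate vanishes at the maximizer, which is the property a seeking scheme of this kind would exploit when it drives the action along the estimated gradient.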

Submitted 10 May, 2025; originally announced May 2025.

arXiv:2504.09638 [pdf, other]

Data-Driven Two-Stage Distributionally Robust Dispatch of Multi-Energy Microgrid

Authors: Xunhang Sun, Xiaoyu Cao, Bo Zeng, Miaomiao Li, Xiaohong Guan, Tamer Başar

Abstract: This paper studies adaptive distributionally robust dispatch (DRD) of the multi-energy microgrid under supply and demand uncertainties. A Wasserstein ambiguity set is constructed to support data-driven decision-making. By fully leveraging the special structure of worst-case expectation from the primal perspective, a novel and highly efficient decomposition algorithm under the framework of column-and-constraint generation is customized and developed to address the computational burden. Numerical studies demonstrate the effectiveness of our DRD approach, and shed light on its interrelationship with the traditional dispatch approaches through stochastic programming and robust optimization schemes. Also, comparisons with popular algorithms in the literature for two-stage distributionally robust optimization verify the powerful capacity of our algorithm in computing the DRD problem.

Submitted 13 April, 2025; originally announced April 2025.

arXiv:2504.09035 [pdf, ps, other]

InterQ: A DQN Framework for Optimal Intermittent Control

Authors: Shubham Aggarwal, Dipankar Maity, Tamer Başar

Abstract: In this letter, we explore the communication-control co-design of discrete-time stochastic linear systems through reinforcement learning. Specifically, we examine a closed-loop system involving two sequential decision-makers: a scheduler and a controller. The scheduler continuously monitors the system's state but transmits it to the controller intermittently to balance the communication cost and control performance. The controller, in turn, determines the control input based on the intermittently received information. Given the partially nested information structure, we show that the optimal control policy follows a certainty-equivalence form. Subsequently, we analyze the qualitative behavior of the scheduling policy. To develop the optimal scheduling policy, we propose InterQ, a deep reinforcement learning algorithm which uses a deep neural network to approximate the Q-function. Through extensive numerical evaluations, we analyze the scheduling landscape and further compare our approach against two baseline strategies: (a) a multi-period periodic scheduling policy, and (b) an event-triggered policy. The results demonstrate that our proposed method outperforms both baselines. The open source implementation can be found at https://github.com/AC-sh/InterQ.

Submitted 11 April, 2025; originally announced April 2025.

Comments: Submitted to IEEE for possible publication

arXiv:2503.00313 [pdf, other]

Communication and Control Co-design in Non-cooperative Games

Authors: Shubham Aggarwal, Tamer Başar, Dipankar Maity

Abstract: In this article, we revisit a communication-control co-design problem for a class of two-player stochastic differential games on an infinite horizon. Each 'player' represents two active decision makers, namely a scheduler and a remote controller, which cooperate to optimize over a global objective while competing with the other player. Each player's scheduler can only intermittently relay state information to its respective controller due to the associated communication cost/constraint. The scheduler's policy determines the information structure at the controller, thereby affecting the quality of the control inputs. Consequently, it leads to the classical communication-control trade-off problem. A high communication frequency improves the control performance of the player at the expense of a higher communication cost, and vice versa. Under suitable information structures of the players, we first compute the Nash controller policies for both players in terms of the conditional estimate of the state. Consequently, we reformulate the problem of computing Nash scheduler policies (within a class of parametrized randomized policies) into solving for the steady-state solution of a generalized Sylvester equation. Since the above-mentioned reformulation involves an infinite sum of powers of the policy parameters, we provide a projected gradient descent-based algorithm to numerically compute a Nash equilibrium using a truncated polynomial approximation. Finally, we demonstrate the performance of the Nash control and scheduler policies using extensive numerical simulations.

Submitted 28 February, 2025; originally announced March 2025.

Comments: Submitted to IEEE for possible publication

arXiv:2501.18718 [pdf, other]

Distributed Offloading in Multi-Access Edge Computing Systems: A Mean-Field Perspective

Authors: Shubham Aggarwal, Muhammad Aneeq uz Zaman, Melih Bastopcu, Sennur Ulukus, Tamer Başar

Abstract: Multi-access edge computing (MEC) technology is a promising solution to assist power-constrained IoT devices by providing additional computing resources for time-sensitive tasks. In this paper, we consider the problem of optimal task offloading in MEC systems with due consideration of the timeliness and scalability issues under two scenarios of equitable and priority access to the edge server (ES). In the first scenario, we consider a MEC system consisting of $N$ devices assisted by one ES, where the devices can split task execution between a local processor and the ES, with equitable access to the ES. In the second scenario, we consider a MEC system consisting of one primary user, $N$ secondary users and one ES. The primary user has priority access to the ES while the secondary users have equitable access to the ES amongst themselves. In both scenarios, due to the power consumption associated with utilizing the local resource and task offloading, the devices must optimize their actions. Additionally, since the ES is a shared resource, other users' offloading activity serves to increase latency incurred by each user. We thus model both scenarios using a non-cooperative game framework. However, the presence of a large number of users makes it nearly impossible to compute the equilibrium offloading policies for each user, which would require a significant information exchange overhead between users. Thus, to alleviate such scalability issues, we invoke the paradigm of mean-field games to compute approximate Nash equilibrium policies for each user using their local information, and further study the trade-offs between increasing information freshness and reducing power consumption for each user. Using numerical evaluations, we show that our approach can recover the offloading trends displayed under centralized solutions, and provide additional insights into the results obtained.

Submitted 30 January, 2025; originally announced January 2025.

Comments: Submitted to IEEE for possible publication

arXiv:2501.12256 [pdf, other]

Lie-Bracket Nash Equilibrium Seeking with Bounded Update Rates for Noncooperative Games

Authors: Victor Hugo Pereira Rodrigues, Tiago Roux Oliveira, Miroslav Krstic, Tamer Basar

Abstract: This paper proposes a novel approach for local convergence to Nash equilibrium in quadratic noncooperative games based on a distributed Lie-bracket extremum seeking control scheme. This is the first instance of noncooperative games being tackled in a model-free fashion integrated with the extremum seeking method of bounded update rates. In particular, the stability analysis is carried out using Lie-bracket approximation and Lyapunov's direct method. We quantify the size of the ultimate small residual sets around the Nash equilibrium and illustrate the theoretical results numerically on an example in an oligopoly setting.

Submitted 21 January, 2025; originally announced January 2025.

arXiv:2501.05660 [pdf, ps, other]

Fully Decentralized Computation Offloading in Priority-Driven Edge Computing Systems

Authors: Shubham Aggarwal, Melih Bastopcu, Muhammad Aneeq uz Zaman, Tamer Başar, Sennur Ulukus, Nail Akar

Abstract: We develop a novel framework for fully decentralized offloading policy design in multi-access edge computing (MEC) systems. The system comprises $N$ power-constrained user equipments (UEs) assisted by an edge server (ES) to process incoming tasks. Tasks are labeled with urgency flags, and in this paper, we classify them under three urgency levels, namely, high, moderate, and low urgency. We formulate the problem of designing computation decisions for the UEs within a large population noncooperative game framework, where each UE selfishly decides on how to split task execution between its local onboard processor and the ES. We employ the weighted average age of information (AoI) metric to quantify information freshness at the UEs. Increased onboard processing consumes more local power, while increased offloading may potentially incur a higher average AoI due to other UEs' packets being offloaded to the same ES. Thus, we use the mean-field game (MFG) formulation to compute approximate decentralized Nash equilibrium offloading and local computation policies for the UEs to balance between the information freshness and local power consumption. Finally, we provide a projected gradient descent-based algorithm to numerically assess the merits of our approach.

Submitted 9 January, 2025; originally announced January 2025.

Comments: Submitted to IEEE for possible publication

arXiv:2412.00679 [pdf, ps, other]

Remote Estimation Games with Random Walk Processes: Stackelberg Equilibrium

Authors: Atahan Dokme, Raj Kiriti Velicheti, Melih Bastopcu, Tamer Başar

Abstract: Remote estimation is a crucial element of real time monitoring of a stochastic process. While most of the existing works have concentrated on obtaining optimal sampling strategies, motivated by malicious attacks on cyber-physical systems, we model sensing under surveillance as a game between an attacker and a defender. This introduces strategic elements to conventional remote estimation problems. Additionally, inspired by increasing detection capabilities, we model an element of information leakage for each player. Parameterizing the game in terms of uncertainty on each side, information leakage, and cost of sampling, we consider the Stackelberg Equilibrium (SE) concept where one of the players acts as the leader and the other one as the follower. By focusing our attention on stationary probabilistic sampling policies, we characterize the SE of this game and provide simulations to show the efficacy of our results.

Submitted 1 December, 2024; originally announced December 2024.

arXiv:2411.13234 [pdf, other]

Extremum and Nash Equilibrium Seeking with Delays and PDEs: Designs & Applications

Authors: Tiago Roux Oliveira, Miroslav Krstić, Tamer Başar

Abstract: The development of extremum seeking (ES) has progressed, over the past hundred years, from static maps, to finite-dimensional dynamic systems, to networks of static and dynamic agents. Extending from ODE dynamics to maps and agents that incorporate delays or even partial differential equations (PDEs) is the next natural step in that progression through ascending research challenges. This paper reviews results on algorithm design and theory of ES for such infinite-dimensional systems. Both hyperbolic and parabolic dynamics are presented: delays or transport equations, heat-dominated equations, wave equations, and reaction-advection-diffusion equations. Nash equilibrium seeking (NES) methods are introduced for noncooperative game scenarios of the model-free kind and then specialized to single-agent optimization. Even heterogeneous PDE games, such as a duopoly with one parabolic and one hyperbolic agent, are considered. Several engineering applications are touched upon for illustration, including flow-traffic control for urban mobility, oil-drilling systems, deep-sea cable-actuated source seeking, additive manufacturing modeled by the Stefan PDE, biological reactors, light-source seeking with flexible-beam structures, and neuromuscular electrical stimulation.

Submitted 20 November, 2024; originally announced November 2024.

Comments: Preprint submitted to IEEE Control Systems Magazine (Special Issue: Into the Second Century of Extremum Seeking Control, 38 pages and 34 figures)

arXiv:2411.04913 [pdf, other]

Structure Matters: Dynamic Policy Gradient

Authors: Sara Klein, Xiangyuan Zhang, Tamer Başar, Simon Weissmann, Leif Döring

Abstract: In this work, we study $γ$-discounted infinite-horizon tabular Markov decision processes (MDPs) and introduce a framework called dynamic policy gradient (DynPG). The framework directly integrates dynamic programming with (any) policy gradient method, explicitly leveraging the Markovian property of the environment. DynPG dynamically adjusts the problem horizon during training, decomposing the original infinite-horizon MDP into a sequence of contextual bandit problems. By iteratively solving these contextual bandits, DynPG converges to the stationary optimal policy of the infinite-horizon MDP. To demonstrate the power of DynPG, we establish its non-asymptotic global convergence rate under the tabular softmax parametrization, focusing on the dependencies on salient but essential parameters of the MDP. By combining classical arguments from dynamic programming with more recent convergence arguments of policy gradient schemes, we prove that softmax DynPG scales polynomially in the effective horizon $(1-γ)^{-1}$. Our findings contrast recent exponential lower bound examples for vanilla policy gradient.

Submitted 7 November, 2024; originally announced November 2024.

Comments: 46 pages, 4 figures

arXiv:2411.01794 [pdf, other]

Revisiting Game-Theoretic Control in Socio-Technical Networks: Emerging Design Frameworks and Contemporary Applications

Authors: Quanyan Zhu, Tamer Başar

Abstract: Socio-technical networks represent emerging cyber-physical infrastructures that are tightly interwoven with human networks. The coupling between human and technical networks presents significant challenges in managing, controlling, and securing these complex, interdependent systems. This paper investigates game-theoretic frameworks for the design and control of socio-technical networks, with a focus on critical applications such as misinformation management, infrastructure optimization, and resilience in socio-cyber-physical systems (SCPS). Core methodologies, including Stackelberg games, mechanism design, and dynamic game theory, are examined as powerful tools for modeling interactions in hierarchical, multi-agent environments. Key challenges addressed include mitigating human-driven vulnerabilities, managing large-scale system dynamics, and countering adversarial threats. By bridging individual agent behaviors with overarching system goals, this work illustrates how the integration of game theory and control theory can lead to robust, resilient, and adaptive socio-technical networks. This paper highlights the potential of these frameworks to dynamically align decentralized agent actions with system-wide objectives of stability, security, and efficiency.

Submitted 5 November, 2024; v1 submitted 3 November, 2024; originally announced November 2024.

arXiv:2410.19696 [pdf, other]

Age of Coded Updates In Gossip Networks Under Memory and Memoryless Schemes

Authors: Erkan Bayram, Melih Bastopcu, Mohamed-Ali Belabbas, Tamer Başar

Abstract: We consider an information update system on a gossip network, where a source node encodes information into $n$ total keys such that any subset of at least $k+1$ keys can fully reconstruct the original information. This encoding process follows the principles of a $k$-out-of-$n$ threshold system. The encoded updates are then disseminated across the network through peer-to-peer communication. We have two different types of nodes in a network: subscriber nodes, which receive a unique key from the source node for every status update instantaneously, and nonsubscriber nodes, which receive a unique key for an update only if the node is selected by the source, and this selection is renewed for each update. For the message structure between nodes, we consider two different schemes: a memory scheme (in which the nodes keep the source's current and previous encrypted messages) and a memoryless scheme (in which the nodes are allowed to only keep the source's current message). We measure the timeliness of information updates by using a recent performance metric, the version age of information. We present explicit formulas for the time average AoI in a scalable homogeneous network as functions of the network parameters under a memoryless scheme. Additionally, we provide strict lower and upper bounds for the time average AoI under a memory scheme.
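
The abstract does not say which threshold construction is used. As one standard way to obtain the stated property (any $k+1$ of $n$ keys reconstruct the information), the sketch below uses Shamir-style polynomial secret sharing over a prime field. The field modulus, the function names, and the example values are assumptions made for illustration and are not taken from the paper.

```python
import random

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus (an assumption)

def make_keys(secret, k, n, rng):
    # Draw a random degree-k polynomial whose constant term is the secret;
    # key i is the point (i, f(i)) for i = 1..n.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k)]
    keys = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        keys.append((x, y))
    return keys

def reconstruct(keys):
    # Lagrange interpolation evaluated at x = 0 recovers the constant term.
    # Requires Python 3.8+ for the modular inverse pow(den, -1, PRIME).
    secret = 0
    for i, (xi, yi) in enumerate(keys):
        num, den = 1, 1
        for j, (xj, _) in enumerate(keys):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

rng = random.Random(0)
keys = make_keys(secret=123456789, k=2, n=5, rng=rng)
recovered = reconstruct(keys[:3])  # any k + 1 = 3 distinct keys suffice
```

In this sketch, a node holding any three of the five keys recovers the value, regardless of which three it collected through gossip.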

Submitted 25 October, 2024; originally announced October 2024.

Comments: A part of this work is presented at the ACSSC24. This work has been submitted to IEEE for possible publication. arXiv admin note: text overlap with arXiv:2402.11462

arXiv:2408.01327 [pdf, other]

Modeling Interfering Sources in Shared Queues for Timely Computations in Edge Computing Systems

Authors: Nail Akar, Melih Bastopcu, Sennur Ulukus, Tamer Başar

Abstract: Most existing stochastic models on age of information (AoI) focus on a single shared server serving status update packets from $N>1$ sources where each packet update stream is Poisson, i.e., single-hop scenario. In the current work, we study a two-hop edge computing system for which status updates from the information sources are still Poisson but they are not immediately available at the shared edge server, but instead they need to first receive service from a transmission server dedicated to each source. For exponentially distributed and heterogeneous service times for both the dedicated servers and the edge server, and bufferless preemptive resource management, we develop an analytical model using absorbing Markov chains (AMC) for obtaining the distribution of AoI for any source in the system. Moreover, for a given tagged source, the traffic arriving at the shared server from the $N-1$ un-tagged sources, namely the interference traffic, is not Poisson any more, but is instead a Markov modulated Poisson process (MMPP) whose state space grows exponentially with $N$. Therefore, we propose to employ a model reduction technique that approximates the behavior of the MMPP interference traffic with two states only, making it possible to approximately obtain the AoI statistics even for a very large number of sources. Numerical examples are presented to validate the proposed exact and approximate models.

Submitted 2 August, 2024; originally announced August 2024.

Comments: 9 pages, 4 figures

arXiv:2407.06528 [pdf, ps, other]

Semantic Communication in Multi-team Dynamic Games: A Mean Field Perspective

Authors: Shubham Aggarwal, Muhammad Aneeq uz Zaman, Melih Bastopcu, Tamer Başar

Abstract: Coordinating communication and control is a key component in the stability and performance of networked multi-agent systems. While single user networked control systems have gained a lot of attention within this domain, in this work, we address the more challenging problem of large population multi-team dynamic games. In particular, each team constitutes two decision makers (namely, the sensor and the controller) who coordinate over a shared network to control a dynamically evolving state of interest under costs on both actuation and sensing/communication. Due to the shared nature of the wireless channel, the overall cost of each team depends on other teams' policies, thereby leading to a noncooperative game setup. Due to the presence of a large number of teams, we compute approximate decentralized Nash equilibrium policies for each team using the paradigm of (extended) mean-field games, which is governed by (1) the mean traffic flowing over the channel, and (2) the value of information at the sensor, which highlights the semantic nature of the ensuing communication. In the process, we compute optimal controller policies and approximately optimal sensor policies for each representative team of the mean-field system to alleviate the problem of general non-contractivity of the mean-field fixed point operator associated with the finite cardinality of the sensor action space. Consequently, we also prove the $ε$-Nash property of the mean-field equilibrium solution which essentially characterizes how well the solution derived using mean-field analysis performs on the finite-team system. Finally, we provide extensive numerical simulations, which corroborate the theoretical findings and lead to additional insights on the properties of the results presented.

Submitted 24 June, 2025; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: Submitted to IEEE for possible publication

arXiv:2406.13992 [pdf, ps, other]

Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

Authors: Muhammad Aneeq uz Zaman, Mathieu Laurière, Alec Koppel, Tamer Başar

Abstract: In this paper, we study the problem of robust cooperative multi-agent reinforcement learning (RL) where a large number of cooperative agents with distributed information aim to learn policies in the presence of \emph{stochastic} and \emph{non-stochastic} uncertainties whose distributions are respectively known and unknown. Focusing on policy optimization that accounts for both types of uncertainties, we formulate the problem in a worst-case (minimax) framework, which is intractable in general. Thus, we focus on the Linear Quadratic setting to derive benchmark solutions. First, since no standard theory exists for this problem due to the distributed information structure, we utilize the Mean-Field Type Game (MFTG) paradigm to establish guarantees on the solution quality in the sense of achieved Nash equilibrium of the MFTG. This in turn allows us to compare the performance against the corresponding original robust multi-agent control problem. Then, we propose a Receding-horizon Gradient Descent Ascent RL algorithm to find the MFTG Nash equilibrium and we prove a non-asymptotic rate of convergence. Finally, we provide numerical experiments to demonstrate the efficacy of our approach relative to a baseline algorithm.

Submitted 12 June, 2025; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted for publication in L4DC 2024. Moved Disclaimer from footnote to unnumbered section

arXiv:2406.05632 [pdf, ps, other]

Best Response Strategies for Asymmetric Sensing in Linear-Quadratic Differential Games

Authors: Shubham Aggarwal, Tamer Başar, Dipankar Maity

Abstract: In this paper, we revisit the two-player continuous-time infinite-horizon linear quadratic differential game problem, where one of the players can sample the state of the system only intermittently due to a sensing constraint while the other player can do so continuously. Under these asymmetric sensing limitations between the players, we analyze the optimal sensing and control strategies for the player at a disadvantage while the other player continues to play its security strategy. We derive an optimal sensor policy within the class of stationary randomized policies. Finally, using simulations, we show that the expected cost accrued by the first player approaches its security level as its sensing limitation is relaxed.

Submitted 8 June, 2024; originally announced June 2024.

Comments: Accepted to IEEE L-CSS

arXiv:2405.15762 [pdf, other]

Sliding-Mode Nash Equilibrium Seeking for a Quadratic Duopoly Game

Authors: Victor Hugo Pereira Rodrigues, Tiago Roux Oliveira, Miroslav Krstić, Tamer Başar

Abstract: This paper introduces a new method to achieve stable convergence to Nash equilibrium in duopoly noncooperative games. Inspired by the recent fixed-time Nash Equilibrium seeking (NES) as well as prescribed-time extremum seeking (ES) and source seeking schemes, our approach employs a distributed sliding mode control (SMC) scheme, integrating extremum seeking with sinusoidal perturbation signals to estimate the pseudogradients of quadratic payoff functions. Notably, this is the first attempt to address noncooperative games without relying on models, combining classical extremum seeking with relay components instead of proportional control laws. We prove finite-time convergence of the closed-loop average system to Nash equilibrium using stability analysis techniques such as time-scaling, Lyapunov's direct method, and averaging theory for discontinuous systems. Additionally, we quantify the size of residual sets around the Nash equilibrium and validate our theoretical results through simulations.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 8 pages and 2 figures. arXiv admin note: substantial text overlap with arXiv:2404.07287

MSC Class: 91Axx; 91A05; 91A10; 93-XX; 93B52; 93C40; 93D30

arXiv:2405.00665 [pdf, other]

Optimizing Profitability in Timely Gossip Networks

Authors: Priyanka Kaswan, Melih Bastopcu, Sennur Ulukus, S. Rasoul Etesami, Tamer Başar

Abstract: We consider a communication system where a group of users, interconnected in a bidirectional gossip network, wishes to follow a time-varying source, e.g., updates on an event, in real-time. The users wish to maintain their expected version ages below a threshold, and can either rely on gossip from their neighbors or directly subscribe to a server publishing about the event, if the former option does not meet the timeliness requirements. The server wishes to maximize its profit by increasing subscriptions from users and minimizing event sampling frequency to reduce costs. This leads to a Stackelberg game between the server and the users where the sender is the leader deciding its sampling frequency and the users are the followers deciding their subscription strategies. We investigate equilibrium strategies for low-connectivity and high-connectivity topologies.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.16009 [pdf, other]

How to Make Money From Fresh Data: Subscription Strategies in Age-Based Systems

Authors: Priyanka Kaswan, Melih Bastopcu, Sennur Ulukus, S. Rasoul Etesami, Tamer Başar

Abstract: We consider a communication system consisting of a server that tracks and publishes updates about a time-varying data source or event, and a gossip network of users interested in closely tracking the event. The timeliness of the information is measured through the version age of information. The users wish to have their expected version ages remain below a threshold, and have the option to either rely on gossip from their neighbors or subscribe to the server directly to follow updates about the event if the former option does not meet the timeliness requirements. The server wishes to maximize its profit by increasing the number of subscribers and reducing costs associated with the frequent sampling of the event. We model the problem setup as a Stackelberg game between the server and the users, where the server commits to a frequency of sampling the event, and the users make decisions on whether to subscribe or not. As an initial work, we focus on directed networks with unidirectional flow of information and obtain the optimal equilibrium strategies for all the players. We provide simulation results to confirm the theoretical findings and provide additional insights.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.11013 [pdf, other]

Control Theoretic Approach to Fine-Tuning and Transfer Learning

Authors: Erkan Bayram, Shenyu Liu, Mohamed-Ali Belabbas, Tamer Başar

Abstract: Given a training set in the form of a paired $(\mathcal{X},\mathcal{Y})$, we say that the control system $\dot x = f(x,u)$ has learned the paired set via the control $u^*$ if the system steers each point of $\mathcal{X}$ to its corresponding target in $\mathcal{Y}$. If the training set is expanded, most existing methods for finding a new control $u^*$ require starting from scratch, resulting in a quadratic increase in complexity with the number of points. To overcome this limitation, we introduce the concept of $\textit{ tuning without forgetting}$. We develop $\textit{an iterative algorithm}$ to tune the control $u^*$ when the training set expands, whereby points already in the paired set are still matched, and new training samples are learned. At each update of our method, the control $u^*$ is projected onto the kernel of the end-point mapping generated by the controlled dynamics at the learned samples. It ensures keeping the end-points for the previously learned samples constant while iteratively learning additional samples.

Submitted 19 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.08509 [pdf, other]

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction

Authors: Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer

Abstract: Large language models (LLMs) have been driving a new wave of interactive AI applications across numerous domains. However, efficiently serving LLM inference requests is challenging due to their unpredictable execution times originating from the autoregressive nature of generative models. Existing LLM serving systems exploit first-come-first-serve (FCFS) scheduling, suffering from head-of-line blocking issues. To address the non-deterministic nature of LLMs and enable efficient interactive LLM serving, we present a speculative shortest-job-first (SSJF) scheduler that uses a light proxy model to predict LLM output sequence lengths. Our open-source SSJF implementation does not require changes to memory management or batching strategies. Evaluations on real-world datasets and production workload traces show that SSJF reduces average job completion times by 30.5-39.6% and increases throughput by 2.2-3.6x compared to FCFS schedulers, across no batching, dynamic batching, and continuous batching settings.

Submitted 25 November, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: Accepted at AIOps'24
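
The contrast between FCFS and shortest-job-first scheduling described in the entry above can be illustrated with a toy sketch. This is not the authors' SSJF implementation: the function names, the single-server and all-jobs-arrive-at-once assumptions, and the job lengths are hypothetical, and the "predicted" lengths simply stand in for the output of a proxy model.

```python
from typing import List


def average_completion_time(order: List[int], service_times: List[float]) -> float:
    """Mean completion time when jobs run back to back in the given order."""
    clock = 0.0
    total = 0.0
    for job in order:
        clock += service_times[job]  # job finishes once its service time elapses
        total += clock
    return total / len(order)


def fcfs_order(num_jobs: int) -> List[int]:
    """First-come-first-serve: run jobs in arrival order."""
    return list(range(num_jobs))


def predicted_sjf_order(predicted_lengths: List[float]) -> List[int]:
    """Shortest-job-first using predicted (not true) lengths."""
    return sorted(range(len(predicted_lengths)), key=lambda j: predicted_lengths[j])


# Hypothetical workload: true service times and imperfect predictions.
true_lengths = [8.0, 1.0, 3.0]
predicted_lengths = [7.0, 2.0, 2.5]

fcfs_avg = average_completion_time(fcfs_order(len(true_lengths)), true_lengths)
sjf_avg = average_completion_time(predicted_sjf_order(predicted_lengths), true_lengths)
print(f"FCFS average completion time: {fcfs_avg:.3f}")
print(f"Predicted-SJF average completion time: {sjf_avg:.3f}")
```

In this toy workload the long first job blocks the two short ones under FCFS, which is the head-of-line blocking the abstract refers to; ordering by predicted length avoids it even though the predictions are inexact.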

arXiv:2404.07287 [pdf, other]

Nash Equilibrium Seeking for Noncooperative Duopoly Games via Event-Triggered Control

Authors: Victor Hugo Pereira Rodrigues, Tiago Roux Oliveira, Miroslav Krstić, Tamer Başar

Abstract: This paper proposes a novel approach for locally stable convergence to Nash equilibrium in duopoly noncooperative games based on a distributed event-triggered control scheme. The proposed approach employs extremum seeking, with sinusoidal perturbation signals applied to estimate the Gradient (first derivative) of unknown quadratic payoff functions. This is the first instance of noncooperative games being tackled in a model-free fashion integrated with the event-triggered methodology. Each player evaluates independently the deviation between the corresponding current state variable and its last broadcasted value to update the player action, while they preserve control performance under limited bandwidth of the actuation paths and still guarantee stability for the closed-loop dynamics. In particular, the stability analysis is carried out using time-scaling technique, Lyapunov's direct method and averaging theory for discontinuous systems. We quantify the size of the ultimate small residual sets around the Nash equilibrium and illustrate the theoretical results numerically on an example.

Submitted 10 April, 2024; originally announced April 2024.
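
The idea of estimating the gradient of an unknown quadratic payoff with a sinusoidal perturbation, mentioned in the entry above, can be sketched for a single player as follows. This is a minimal illustration of the standard perturb-and-demodulate step only, not the paper's event-triggered scheme: the payoff parameters (`theta_star`, `hessian`, `peak`), the amplitude, and the function names are all assumptions made for the example.

```python
import math


def payoff(theta: float, theta_star: float = 2.0,
           hessian: float = -3.0, peak: float = 5.0) -> float:
    """Hypothetical quadratic payoff; the estimator only queries its values."""
    return peak + 0.5 * hessian * (theta - theta_star) ** 2


def estimate_gradient(theta: float, amplitude: float = 0.1,
                      samples: int = 1000) -> float:
    """Average (2/a) * sin(wt) * J(theta + a*sin(wt)) over one dither period."""
    total = 0.0
    for k in range(samples):
        s = math.sin(2.0 * math.pi * k / samples)  # dither sampled over one period
        total += (2.0 / amplitude) * s * payoff(theta + amplitude * s)
    return total / samples


theta = 1.0
true_gradient = -3.0 * (theta - 2.0)  # hessian * (theta - theta_star)
print(f"estimated gradient: {estimate_gradient(theta):.6f}")
print(f"true gradient:      {true_gradient:.6f}")
```

For a quadratic payoff the constant and cubic terms of the demodulated signal average to zero over a full period, so the averaged estimate matches the true derivative up to floating-point error; for non-quadratic payoffs it would only be an approximation that improves as the amplitude shrinks.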

arXiv:2404.02898 [pdf, ps, other]

Fully Decentralized Task Offloading in Multi-Access Edge Computing Systems

Authors: Shubham Aggarwal, Muhammad Aneeq uz Zaman, Melih Bastopcu, Sennur Ulukus, Tamer Başar

Abstract: We consider the problem of task offloading in multi-access edge computing (MEC) systems constituting $N$ devices assisted by an edge server (ES), where the devices can split task execution between a local processor and the ES. Since the local task execution and communication with the ES both consume power, each device must judiciously choose between the two. We model the problem as a large population non-cooperative game among the $N$ devices. Since computation of an equilibrium in this scenario is difficult due to the presence of a large number of devices, we employ the mean-field game framework to reduce the finite-agent game problem to a generic user's multi-objective optimization problem, with a coupled consistency condition. By leveraging the novel age of information (AoI) metric, we invoke techniques from stochastic hybrid systems (SHS) theory and study the tradeoffs between increasing information freshness and reducing power consumption. In numerical simulations, we validate that a higher load at the ES may lead devices to upload their task to the ES less often.

Submitted 28 October, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted to IEEE Globecom Workshops 2024

arXiv:2404.02407 [pdf, other]

Decision Transformer as a Foundation Model for Partially Observable Continuous Control

Authors: Xiangyuan Zhang, Weichao Mao, Haoran Qiu, Tamer Başar

Abstract: Closed-loop control of nonlinear dynamical systems with partial-state observability demands expert knowledge of a diverse, less standardized set of theoretical tools. Moreover, it requires a delicate integration of controller and estimator designs to achieve the desired system behavior. To establish a general controller synthesis framework, we explore the Decision Transformer (DT) architecture. Specifically, we first frame the control task as predicting the current optimal action based on past observations, actions, and rewards, eliminating the need for a separate estimator design. Then, we leverage the pre-trained language models, i.e., the Generative Pre-trained Transformer (GPT) series, to initialize DT and subsequently train it for control tasks using low-rank adaptation (LoRA). Our comprehensive experiments across five distinct control tasks, ranging from maneuvering aerospace systems to controlling partial differential equations (PDEs), demonstrate DT's capability to capture the parameter-agnostic structures intrinsic to control tasks. DT exhibits remarkable zero-shot generalization abilities for completely new tasks and rapidly surpasses expert performance levels with a minimal amount of demonstration data. These findings highlight the potential of DT as a foundational controller for general control applications.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Submitted to CDC 2024

arXiv:2404.00568 [pdf, other]

doi 10.1109/TSG.2024.3451993

Stochastic-Robust Planning of Networked Hydrogen-Electrical Microgrids: A Study on Induced Refueling Demand

Authors: Xunhang Sun, Xiaoyu Cao, Bo Zeng, Qiaozhu Zhai, Tamer Başar, Xiaohong Guan

Abstract: Hydrogen-electrical microgrids are increasingly assuming an important role on the pathway toward decarbonization of energy and transportation systems. This paper studies networked hydrogen-electrical microgrids planning (NHEMP), considering a critical but often-overlooked issue, i.e., the demand-inducing effect (DIE) associated with infrastructure development decisions. Specifically, higher refueling capacities will attract more refueling demand of hydrogen-powered vehicles (HVs). To capture such interactions between investment decisions and induced refueling demand, we introduce a decision-dependent uncertainty (DDU) set and build a trilevel stochastic-robust formulation. The upper-level determines optimal investment strategies for hydrogen-electrical microgrids, the lower-level optimizes the risk-aware operation schedules across a series of stochastic scenarios, and, for each scenario, the middle-level identifies the "worst" situation of refueling demand within an individual DDU set to ensure economic feasibility. Then, an adaptive and exact decomposition algorithm, based on Parametric Column-and-Constraint Generation (PC&CG), is customized and developed to address the computational challenge and to quantitatively analyze the impact of DIE. Case studies on an IEEE exemplary system validate the effectiveness of the proposed NHEMP model and the PC&CG algorithm. It is worth highlighting that DIE can make an important contribution to the economic benefits of NHEMP, yet its significance will gradually decrease when the main bottleneck transits to other system restrictions.

Submitted 27 August, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Journal ref: IEEE Transactions on Smart Grid, 16(1), 115-130, 2025

arXiv:2404.00045 [pdf, ps, other]

Policy Optimization finds Nash Equilibrium in Regularized General-Sum LQ Games

Authors: Muhammad Aneeq uz Zaman, Shubham Aggarwal, Melih Bastopcu, Tamer Başar

Abstract: In this paper, we investigate the impact of introducing relative entropy regularization on the Nash Equilibria (NE) of General-Sum $N$-agent games, revealing the fact that the NE of such games conform to linear Gaussian policies. Moreover, it delineates sufficient conditions, contingent upon the adequacy of entropy regularization, for the uniqueness of the NE within the game. As Policy Optimization serves as a foundational approach for Reinforcement Learning (RL) techniques aimed at finding the NE, in this work we prove the linear convergence of a policy optimization algorithm which (subject to the adequacy of entropy regularization) is capable of provably attaining the NE. Furthermore, in scenarios where the entropy regularization proves insufficient, we present a $δ$-augmentation technique, which facilitates the achievement of an $ε$-NE within the game.

Submitted 13 September, 2024; v1 submitted 25 March, 2024; originally announced April 2024.

Comments: Accepted for Conference on Decision and Control 2024

arXiv:2403.11345 [pdf, other]

Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

Authors: Muhammad Aneeq uz Zaman, Alec Koppel, Mathieu Laurière, Tamer Başar

Abstract: We address in this paper Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum (non-zero sum) competition across different teams. To develop an RL method that provably achieves a Nash equilibrium, we focus on a linear-quadratic structure. Moreover, to tackle the non-stationarity induced by multi-agent interactions in the finite population setting, we consider the case where the number of agents within each team is infinite, i.e., the mean-field setting. This results in a General-Sum LQ Mean-Field Type Game (GS-MFTG). We characterize the Nash equilibrium (NE) of the GS-MFTG, under a standard invertibility condition. This MFTG NE is then shown to be $O(1/M)$-NE for the finite population game where $M$ is a lower bound on the number of agents in each team. These structural results motivate an algorithm called Multi-player Receding-horizon Natural Policy Gradient (MRNPG), where each team minimizes its cumulative cost \emph{independently} in a receding-horizon manner. Despite the non-convexity of the problem, we establish that the resulting algorithm converges to a global NE through a novel problem decomposition into sub-problems using backward recursive discrete-time Hamilton-Jacobi-Isaacs (HJI) equations, in which \emph{independent natural policy gradient} is shown to exhibit linear convergence under time-independent diagonal dominance. Numerical studies included corroborate the theoretical results.

Submitted 8 February, 2025; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.08741 [pdf, ps, other]

Learning How to Strategically Disclose Information

Authors: Raj Kiriti Velicheti, Melih Bastopcu, S. Rasoul Etesami, Tamer Başar

Abstract: Strategic information disclosure, in its simplest form, considers a game between an information provider (sender) who has access to some private information that an information receiver is interested in. While the receiver takes an action that affects the utilities of both players, the sender can design information (or modify beliefs) of the receiver through signal commitment, hence posing a Stackelberg game. However, obtaining a Stackelberg equilibrium for this game traditionally requires the sender to have access to the receiver's objective. In this work, we consider an online version of information design where a sender interacts with a receiver of an unknown type who is adversarially chosen at each round. Restricting attention to Gaussian prior and quadratic costs for the sender and the receiver, we show that $\mathcal{O}(\sqrt{T})$ regret is achievable with full information feedback, where $T$ is the total number of interactions between the sender and the receiver. Further, we propose a novel parametrization that allows the sender to achieve $\mathcal{O}(\sqrt{T})$ regret for a general convex utility function. We then consider the Bayesian Persuasion problem with an additional cost term in the objective function, which penalizes signaling policies that are more informative and obtain $\mathcal{O}(\log(T))$ regret. Finally, we establish a sublinear regret bound for the partial information feedback setting and provide simulations to support our theoretical results.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.07890 [pdf, other]

$\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games

Authors: Weichao Mao, Haoran Qiu, Chen Wang, Hubertus Franke, Zbigniew Kalbarczyk, Tamer Başar

Abstract: No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.

Submitted 23 April, 2024; v1 submitted 2 February, 2024; originally announced March 2024.

arXiv:2403.06299 [pdf, other]

Disentangling Resilience from Robustness: Contextual Dualism, Interactionism, and Game-Theoretic Paradigms

Authors: Quanyan Zhu, Tamer Basar

Abstract: This article explains the distinctions between robustness and resilience in control systems. Resilience confronts a distinct set of challenges, posing new ones for designing controllers for feedback systems, networks, and machines that prioritize resilience over robustness. The concept of resilience is explored through a three-stage model, emphasizing the need for a proactive preparation and automated response to elastic events. A toy model is first used to illustrate the tradeoffs between resilience and robustness. Then, it delves into contextual dualism and interactionism, and introduces game-theoretic paradigms as a unifying framework to consolidate resilience and robustness. The article concludes by discussing the interplay between robustness and resilience, suggesting that a comprehensive theory of resilience and quantification metrics, and formalization through game-theoretic frameworks are necessary. The exploration extends to system-of-systems resilience and various mechanisms, including the integration of AI techniques and non-technical solutions, like cyber insurance, to achieve comprehensive resilience in control systems. As we approach 2030, the systems and control community is at the opportune moment to lay scientific foundations of resilience by bridging feedback control theory, game theory, and learning theory. Resilient control systems will enhance overall quality of life, enable the development of a resilient society, and create a societal-scale impact amid global challenges such as climate change, conflicts, and cyber insecurity.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.01005 [pdf, other]

Policy Optimization for PDE Control with a Warm Start

Authors: Xiangyuan Zhang, Saviz Mowlavi, Mouhacine Benosman, Tamer Başar

Abstract: Dimensionality reduction is crucial for controlling nonlinear partial differential equations (PDE) through a "reduce-then-design" strategy, which identifies a reduced-order model and then implements model-based control solutions. However, inaccuracies in the reduced-order modeling can substantially degrade controller performance, especially in PDEs with chaotic behavior. To address this issue, we augment the reduce-then-design procedure with a policy optimization (PO) step. The PO step fine-tunes the model-based controller to compensate for the modeling error from dimensionality reduction. This augmentation shifts the overall strategy into reduce-then-design-then-adapt, where the model-based controller serves as a warm start for PO. Specifically, we study the state-feedback tracking control of PDEs that aims to align the PDE state with a specific constant target subject to a linear-quadratic cost. Through extensive experiments, we show that a few iterations of PO can significantly improve the model-based controller performance. Our approach offers a cost-effective alternative to PDE control using end-to-end reinforcement learning.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.11462 [pdf, ps, other]

Age of $k$-out-of-$n$ Systems on a Gossip Network

Authors: Erkan Bayram, Melih Bastopcu, Mohamed-Ali Belabbas, Tamer Başar

Abstract: We consider information update systems on a gossip network, which consists of a single source and $n$ receiver nodes. The source encrypts the information into $n$ distinct keys with version stamps, sending a unique key to each node. For decoding the information in a $k$-out-of-$n$ system, each receiver node requires at least $k+1$ different keys with the same version, shared over peer-to-peer connections. Each node determines $k$ based on a given function, ensuring that as $k$ increases, the precision of the decoded information also increases. We consider two different schemes: a memory scheme (in which the nodes keep the source's current and previous encrypted messages) and a memoryless scheme (in which the nodes are allowed to only keep the source's current message). We measure the "timeliness" of information updates by using the $k$-keys version age of information. Our work focuses on determining closed-form expressions for the time average age of information in a heterogeneous random graph under both with memory and memoryless schemes.

Submitted 17 September, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: Accepted for publication in ACSSC24

arXiv:2311.18736 [pdf, other]

Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms

Authors: Xiangyuan Zhang, Weichao Mao, Saviz Mowlavi, Mouhacine Benosman, Tamer Başar

Abstract: We introduce controlgym, a library of thirty-six industrial control settings, and ten infinite-dimensional partial differential equation (PDE)-based control problems. Integrated within the OpenAI Gym/Gymnasium (Gym) framework, controlgym allows direct applications of standard reinforcement learning (RL) algorithms like stable-baselines3. Our control environments complement those in Gym with continuous, unbounded action and observation spaces, motivated by real-world control applications. Moreover, the PDE control environments uniquely allow the users to extend the state dimensionality of the system to infinity while preserving the intrinsic dynamics. This feature is crucial for evaluating the scalability of RL algorithms for control. This project serves the learning for dynamics & control (L4DC) community, aiming to explore key questions: the convergence of RL algorithms in learning control policies; the stability and robustness issues of learning-based controllers; and the scalability of RL algorithms to high- and potentially infinite-dimensional systems. We open-source the controlgym project at https://github.com/xiangyuan-zhang/controlgym.

Submitted 23 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 25 pages, 16 figures

arXiv:2311.04455 [pdf, ps, other]

Vector-Valued Gossip over $w$-Holonomic Networks

Authors: Erkan Bayram, Mohamed-Ali Belabbas, Tamer Başar

Abstract: We study the weighted average consensus problem for a gossip network of agents with vector-valued states. For a given matrix-weighted graph, the gossip process is described by a sequence of pairs of adjacent agents communicating and updating their states based on the edge matrix weight. Our key contribution is providing conditions for the convergence of this non-homogeneous Markov process as well as the characterization of its limit set. To this end, we introduce the notion of "$w$-holonomy" of a set of stochastic matrices, which enables the characterization of sequences of gossiping pairs resulting in reaching a desired consensus in a decentralized manner. Stated otherwise, our result characterizes the limiting behavior of infinite products of (non-commuting, possibly with absorbing states) stochastic matrices.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.13853 [pdf, other]

A Discrete-time Networked Competitive Bivirus SIS Model

Authors: Sebin Gracy, Ji Liu, Tamer Basar, Cesar A. Uribe

Abstract: The paper deals with the analysis of a discrete-time networked competitive bivirus susceptible-infected-susceptible (SIS) model. More specifically, we suppose that virus 1 and virus 2 are circulating in the population and are in competition with each other. We show that the model is strongly monotone, and that, under certain assumptions, it does not admit any periodic orbit. We identify a sufficient condition for exponential convergence to the disease-free equilibrium (DFE). Assuming only virus 1 (resp. virus 2) is alive, we establish a condition for global asymptotic convergence to the single-virus endemic equilibrium of virus 1 (resp. virus 2) -- our proof does not rely on the construction of a Lyapunov function. Assuming both virus 1 and virus 2 are alive, we establish a condition which ensures local exponential convergence to the single-virus equilibrium of virus 1 (resp. virus 2). Finally, we provide a sufficient (resp. necessary) condition for the existence of a coexistence equilibrium.

Submitted 20 October, 2023; originally announced October 2023.
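
Illustrative sketch (not from the paper): the abstract above does not state the model equations, so the code below assumes a commonly used Euler-type discretization of a two-virus networked SIS model, in which each node's infection level for a virus grows with the infection pressure from its neighbors, scaled by the node's healthy fraction, and decays at a healing rate. The function names, the infection-rate matrices B1 and B2, the healing rates d1 and d2, and the sampling period h are all hypothetical choices made for illustration.

```python
def bivirus_sis_step(x1, x2, B1, B2, d1, d2, h):
    """One step of an assumed discrete-time two-virus networked SIS model.

    x1[i], x2[i]: infection levels of node i for virus 1 and virus 2.
    B1, B2: infection-rate matrices; d1, d2: healing rates; h: sampling period.
    """
    n = len(x1)
    new_x1, new_x2 = [], []
    for i in range(n):
        healthy = 1.0 - x1[i] - x2[i]  # fraction of node i still susceptible
        pressure1 = sum(B1[i][j] * x1[j] for j in range(n))
        pressure2 = sum(B2[i][j] * x2[j] for j in range(n))
        # The two viruses compete only through the shared healthy fraction.
        new_x1.append(x1[i] + h * (healthy * pressure1 - d1[i] * x1[i]))
        new_x2.append(x2[i] + h * (healthy * pressure2 - d2[i] * x2[i]))
    return new_x1, new_x2


def simulate(x1, x2, B1, B2, d1, d2, h, steps):
    """Iterate the assumed model for a fixed number of steps."""
    for _ in range(steps):
        x1, x2 = bivirus_sis_step(x1, x2, B1, B2, d1, d2, h)
    return x1, x2


if __name__ == "__main__":
    # Weak infection, strong healing: both viruses should die out.
    B1 = [[0.0, 0.2], [0.2, 0.0]]
    B2 = [[0.0, 0.1], [0.1, 0.0]]
    d = [1.0, 1.0]
    x1, x2 = simulate([0.3, 0.2], [0.1, 0.4], B1, B2, d, d, h=0.1, steps=500)
    print(x1, x2)
```

With healing rates that dominate the infection rates, as in the usage example, both infection vectors shrink toward zero, which is the disease-free equilibrium mentioned in the abstract. The precise conditions established in the paper are not reproduced here.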

arXiv:2309.15423 [pdf, other]

Prosumers Participation in Markets: A Scalar-Parameterized Function Bidding Approach

Authors: Abdullah Alawad, Muhammad Aneeq uz Zaman, Khaled Alshehri, Tamer Başar

Abstract: In uniform-price markets, suppliers compete to supply a resource to consumers, resulting in a single market price determined by their competition. For sufficient flexibility, producers and consumers prefer to commit to a function as their strategies, indicating their preferred quantity at any given market price. Producers and consumers may wish to act as both, i.e., prosumers. In this paper, we examine the behavior of profit-maximizing prosumers in a uniform-price market for resource allocation with the objective of maximizing the social welfare. We propose a scalar-parameterized function bidding mechanism for the prosumers, in which we establish the existence and uniqueness of Nash equilibrium. Furthermore, we provide an efficient way to compute the Nash equilibrium through the computation of the market allocation at the Nash equilibrium. Finally, we present a case study to illustrate the welfare loss under different variations of market parameters, such as the market's supply capacity and inelastic demand.

Submitted 14 March, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Corrected typos in the figures

arXiv:2309.14317 [pdf, ps, other]

Online and Offline Dynamic Influence Maximization Games Over Social Networks

Authors: Melih Bastopcu, S. Rasoul Etesami, Tamer Başar

Abstract: In this work, we consider dynamic influence maximization games over social networks with multiple players (influencers). The goal of each influencer is to maximize their own reward subject to their limited total budget rate constraints. Thus, influencers need to carefully design their investment policies considering individuals' opinion dynamics and other influencers' investment strategies, leading to a dynamic game problem. We first consider the case of a single influencer who wants to maximize its utility subject to a total budget rate constraint. We study both offline and online versions of the problem where the opinion dynamics are either known or not known a priori. In the single-influencer case, we propose an online no-regret algorithm, meaning that as the number of campaign opportunities grows, the average utilities obtained by the offline and online solutions converge. Then, we consider the game formulation with multiple influencers in offline and online settings. For the offline setting, we show that the dynamic game admits a unique Nash equilibrium policy and provide a method to compute it. For the online setting and with two influencers, we show that if each influencer applies the same no-regret online algorithm proposed for the single-influencer maximization problem, they will converge to the set of $ε$-Nash equilibrium policies where $ε=O(\frac{1}{\sqrt{K}})$ scales in average inversely with the number of campaign times $K$ considering the average utilities of the influencers. Moreover, we extend this result to any finite number of influencers under more strict requirements on the information structure. Finally, we provide numerical analysis to validate our results under various settings.

Submitted 25 September, 2023; originally announced September 2023.

Comments: This work has been submitted to IEEE for possible publication

arXiv:2309.04831 [pdf, other]

Global Convergence of Receding-Horizon Policy Search in Learning Estimator Designs

Authors: Xiangyuan Zhang, Saviz Mowlavi, Mouhacine Benosman, Tamer Başar

Abstract: We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with provable global convergence in learning the optimal linear estimator designs, i.e., the Kalman filter (KF). Notably, the RHPG algorithm does not require any prior knowledge of the system for initialization and does not require the target system to be open-loop stable. The key of RHPG is that we integrate vanilla PG (or any other policy search directions) into a dynamic programming outer loop, which iteratively decomposes the infinite-horizon KF problem that is constrained and non-convex in the policy parameter into a sequence of static estimation problems that are unconstrained and strongly-convex, thus enabling global convergence. We further provide fine-grained analyses of the optimization landscape under RHPG and detail the convergence and sample complexity guarantees of the algorithm. This work serves as an initial attempt to develop reinforcement learning algorithms specifically for control applications with performance guarantees by utilizing classic control theory in both algorithmic design and theoretical analyses. Lastly, we validate our theories by deploying the RHPG algorithm to learn the Kalman filter design of a large-scale convection-diffusion model. We open-source the code repository at https://github.com/xiangyuan-zhang/LearningKF.

Submitted 9 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2301.12624

arXiv:2306.14886 [pdf, ps, other]

Value of Information in Games with Multiple Strategic Information Providers

Authors: Raj Kiriti Velicheti, Melih Bastopcu, Tamer Başar

Abstract: In the classical communication setting, multiple senders having access to the same source of information and transmitting it over channel(s) to a receiver in general leads to a decrease in estimation error at the receiver as compared with the single sender case. However, if the objectives of the information providers are different from that of the estimator, this might result in interesting strategic interactions and outcomes. In this work, we consider a hierarchical signaling game between multiple senders (information designers) and a single receiver (decision maker) each having their own, possibly misaligned, objectives. The senders lead the game by committing to individual information disclosure policies simultaneously, within the framework of a non-cooperative Nash game among themselves. This is followed by the receiver's action decision. With Gaussian information structure and quadratic objectives (which depend on underlying state and receiver's action) for all the players, we show that in general the equilibrium is not unique. We hence identify a set of equilibria and further show that linear noiseless policies can achieve a minimal element of this set. Additionally, we show that competition among the senders is beneficial to the receiver, as compared with cooperation among the senders. Further, we extend our analysis to a dynamic signaling game of finite horizon with Markovian information evolution. We show that linear memoryless policies can achieve equilibrium in this dynamic game. We also consider an extension to a game with multiple receivers having coupled objectives. We provide algorithms to compute the equilibrium strategies in all these cases. Finally, via extensive simulations, we analyze the effects of multiple senders in varying degrees of alignment among their objectives.

Submitted 26 June, 2023; originally announced June 2023.

Comments: This work has been submitted for possible journal publication

arXiv:2305.09068 [pdf, other]

Analysis, Control, and State Estimation for the Networked Competitive Multi-Virus SIR Model

Authors: Ciyuan Zhang, Sebin Gracy, Tamer Basar, Philip E. Pare

Abstract: This paper proposes a novel discrete-time multi-virus susceptible-infected-recovered (SIR) model that captures the spread of competing epidemics over a population network. First, we provide sufficient conditions for the infection level of all the viruses over the networked model to converge to zero in exponential time. Second, we propose an observation model which captures the summation of all the viruses' infection levels in each node, which represents the individuals who are infected by different viruses but share similar symptoms. Third, we present a sufficient condition for the model to be strongly locally observable, assuming that the network has only infected or recovered individuals. Fourth, we propose a Luenberger observer for estimating the states of our system. We prove that the estimation error of our proposed estimator converges to zero asymptotically with the observer gain. Finally, we present a distributed feedback controller which guarantees that each virus dies out at an exponential rate. We then show via simulations that the estimation error of the Luenberger observer converges to zero before the viruses die out.

Submitted 15 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:2204.00708

arXiv:2303.09515 [pdf, ps, other]

Large Population Games on Constrained Unreliable Networks

Authors: Shubham Aggarwal, Muhammad Aneeq uz Zaman, Melih Bastopcu, Tamer Başar

Abstract: This paper studies an $N$-agent cost-coupled game where the agents are connected via an unreliable capacity constrained network. Each agent receives state information over that network which loses packets with probability $p$. A Base station (BS) actively schedules agent communications over the network by minimizing a weighted Age of Information (WAoI) based cost function under a capacity limit $\mathcal{C} < N$ on the number of transmission attempts at each instant. Under a standard information structure, we show that the problem can be decoupled into a scheduling problem for the BS and a game problem for the $N$ agents. Since the scheduling problem is an NP-hard combinatorial problem, we propose an approximately optimal solution which approaches the optimal solution as $N \rightarrow \infty$. In the process, we also provide some insights on the case without channel erasure. Next, to solve the large population game problem, we use the mean-field game framework to compute an approximate decentralized Nash equilibrium. Finally, we validate the theoretical results using a numerical example.

Submitted 16 March, 2023; originally announced March 2023.

Comments: Submitted to IEEE for possible publication

arXiv:2302.13144 [pdf, other]

Revisiting LQR Control from the Perspective of Receding-Horizon Policy Gradient

Authors: Xiangyuan Zhang, Tamer Başar

Abstract: We revisit in this paper the discrete-time linear quadratic regulator (LQR) problem from the perspective of receding-horizon policy gradient (RHPG), a newly developed model-free learning framework for control applications. We provide a fine-grained sample complexity analysis for RHPG to learn a control policy that is both stabilizing and $ε$-close to the optimal LQR solution, and our algorithm does not require knowing a stabilizing control policy for initialization. Combined with the recent application of RHPG in learning the Kalman filter, we demonstrate the general applicability of RHPG in linear control and estimation with streamlined analyses.

Submitted 31 January, 2024; v1 submitted 25 February, 2023; originally announced February 2023.
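
Illustrative sketch (not the RHPG algorithm): for background on the discrete-time LQR problem named in the abstract above, the code below runs the standard model-based backward Riccati recursion for a scalar system x(t+1) = a x(t) + b u(t) with stage cost q x^2 + r u^2. It only shows how a finite-horizon dynamic-programming solution approaches the infinite-horizon LQR gain as the horizon grows; the model-free policy gradient steps of RHPG are not reproduced. The function names, the terminal weight, and the numerical values are hypothetical.

```python
def riccati_step(p, a, b, q, r):
    """One backward dynamic-programming step for scalar discrete-time LQR.

    p is the cost-to-go weight at the next stage; returns the weight one
    stage earlier and the feedback gain k (control law u = -k * x).
    """
    gain = a * b * p / (r + b * b * p)
    p_prev = q + a * a * p - a * b * p * gain
    return p_prev, gain


def receding_horizon_gain(a, b, q, r, horizon):
    """Run the recursion backward over `horizon` stages."""
    p = q  # terminal cost weight (an assumed choice)
    gain = 0.0
    for _ in range(horizon):
        p, gain = riccati_step(p, a, b, q, r)
    return p, gain


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # open-loop unstable since |a| > 1
    for horizon in (1, 5, 50):
        p, k = receding_horizon_gain(a, b, q, r, horizon)
        print(horizon, p, k, abs(a - b * k))
```

In this scalar example no stabilizing initial gain is supplied, and the closed-loop factor |a - b k| drops below one once the horizon is long enough. This is only the model-based analogue of the initialization-free property claimed in the abstract.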

arXiv:2301.12624 [pdf, other]

Learning the Kalman Filter with Fine-Grained Sample Complexity

Authors: Xiangyuan Zhang, Bin Hu, Tamer Başar

Abstract: We develop the first end-to-end sample complexity of model-free policy gradient (PG) methods in discrete-time infinite-horizon Kalman filtering. Specifically, we introduce the receding-horizon policy gradient (RHPG-KF) framework and demonstrate $\tilde{\mathcal{O}}(ε^{-2})$ sample complexity for RHPG-KF in learning a stabilizing filter that is $ε$-close to the optimal Kalman filter. Notably, the proposed RHPG-KF framework does not require the system to be open-loop stable nor assume any prior knowledge of a stabilizing filter. Our results shed light on applying model-free PG methods to control a linear dynamical system where the state measurements could be corrupted by statistical noises and other (possibly adversarial) disturbances.

Submitted 27 February, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

Comments: To appear in ACC 2023

arXiv:2212.07534 [pdf, ps, other]

Decentralized Nonconvex Optimization with Guaranteed Privacy and Accuracy

Authors: Yongqiang Wang, Tamer Basar

Abstract: Privacy protection and nonconvexity are two challenging problems in decentralized optimization and learning involving sensitive data. Despite some recent advances addressing each of the two problems separately, no results have been reported that have theoretical guarantees on both privacy protection and saddle/maximum avoidance in decentralized nonconvex optimization. We propose a new algorithm for decentralized nonconvex optimization that can enable both rigorous differential privacy and saddle/maximum avoiding performance. The new algorithm allows the incorporation of persistent additive noise to enable rigorous differential privacy for data samples, gradients, and intermediate optimization variables without losing provable convergence, and thus circumventing the dilemma of trading accuracy for privacy in differential privacy design. More interestingly, the algorithm is theoretically proven to be able to efficiently guarantee accuracy by avoiding convergence to local maxima and saddle points, which has not been reported before in the literature on decentralized nonconvex optimization. The algorithm is efficient in both communication (it only shares one variable in each iteration) and computation (it is encryption-free), and hence is promising for large-scale nonconvex optimization and learning involving high-dimensional optimization parameters. Numerical experiments for both a decentralized estimation problem and an Independent Component Analysis (ICA) problem confirm the effectiveness of the proposed approach.

Submitted 14 December, 2022; originally announced December 2022.

Comments: Accepted as a full paper to Automatica

arXiv:2212.02072 [pdf, ps, other]

Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control

Authors: Leilei Cui, Tamer Başar, Zhong-Ping Jiang

Abstract: This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of the classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics is unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm.

Submitted 6 December, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: 27 Pages, 13 Figures

arXiv:2211.07937 [pdf, other]

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Authors: Yanli Liu, Kaiqing Zhang, Tamer Başar, Wotao Yin

Abstract: In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations. More specifically, with the Fisher information matrix of the policy being positive definite: i) we show that a state-of-the-art variance-reduced PG method, which has only been shown to converge to stationary points, converges to the globally optimal value up to some inherent function approximation error due to policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance-reduction into the NPG update. Our improvements follow from an observation that the convergence of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied to NPG as well, and the global convergence analysis of NPG can help to establish the global convergence of (variance-reduced) PG methods. Our analysis carefully integrates the advantages of these two lines of works. Thanks to this improvement, we have also made variance-reduction for NPG possible, with both global convergence and an efficient finite-sample complexity.

Submitted 16 November, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: NeurIPS 2020 (improve the proof of Lemma B.1 and Proposition G.1.)

Journal ref: Advances in Neural Information Processing Systems 33 (2020): 7624-7636

arXiv:2210.04810 [pdf, other]

Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Authors: Bin Hu, Kaiqing Zhang, Na Li, Mehran Mesbahi, Maryam Fazel, Tamer Başar

Abstract: Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis, popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently-developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), $\mathcal{H}_\infty$ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.

Submitted 10 October, 2022; originally announced October 2022.

Comments: To Appear in Annual Review of Control, Robotics, and Autonomous Systems

arXiv:2210.00551 [pdf, ps, other]

Gradient-tracking based Distributed Optimization with Guaranteed Optimality under Noisy Information Sharing

Authors: Yongqiang Wang, Tamer Başar

Abstract: Distributed optimization enables networked agents to cooperatively solve a global optimization problem even with each participating agent only having access to a local partial view of the objective function. Despite making significant inroads, most existing results on distributed optimization rely on noise-free information sharing among the agents, which is problematic when communication channels are noisy, messages are coarsely quantized, or shared information is obscured by additive noise for the purpose of achieving differential privacy. The problem of information-sharing noise is particularly pronounced in the state-of-the-art gradient-tracking based distributed optimization algorithms, in that information-sharing noise will accumulate with iterations on the gradient-tracking estimate of these algorithms, and the ensuing variance will even grow unbounded when the noise is persistent. This paper proposes a new gradient-tracking based distributed optimization approach that can avoid information-sharing noise from accumulating in the gradient estimation. The approach is applicable even when the inter-agent interaction is time-varying, which is key to enable the incorporation of a decaying factor in inter-agent interaction to gradually eliminate the influence of information-sharing noise. In fact, we rigorously prove that the proposed approach can ensure the almost sure convergence of all agents to the same optimal solution even in the presence of persistent information-sharing noise. The approach is applicable to general directed graphs. It is also capable of ensuring the almost sure convergence of all agents to an optimal solution when the gradients are noisy, which is common in machine learning applications. Numerical simulations confirm the effectiveness of the proposed approach.

Submitted 2 October, 2022; originally announced October 2022.

Comments: Accepted to IEEE Transactions on Automatic Control as a full paper. arXiv admin note: text overlap with arXiv:2202.01113

arXiv:2209.12888 [pdf, ps, other]

Weighted Age of Information based Scheduling for Large Population Games on Networks

Authors: Shubham Aggarwal, Muhammad Aneeq uz Zaman, Melih Bastopcu, Tamer Başar

Abstract: In this paper, we consider a discrete-time multi-agent system involving $N$ cost-coupled networked rational agents solving a consensus problem and a central Base Station (BS), scheduling agent communications over a network. Due to a hard bandwidth constraint on the number of transmissions through the network, at most $R_d < N$ agents can concurrently access their state information through the network. Under standard assumptions on the information structure of the agents and the BS, we first show that the control actions of the agents are free of any dual effect, allowing for separation between estimation and control problems at each agent. Next, we propose a weighted age of information (WAoI) metric for the scheduling problem of the BS, where the weights depend on the estimation error of the agents. The BS aims to find the optimum scheduling policy that minimizes the WAoI, subject to the hard bandwidth constraint. Since this problem is NP-hard, we first relax the hard constraint to a soft update rate constraint, and then compute an optimal policy for the relaxed problem by reformulating it into a Markov Decision Process (MDP). This then inspires a sub-optimal policy for the bandwidth constrained problem, which is shown to approach the optimal policy as $N \rightarrow \infty$. Next, we solve the consensus problem using the mean-field game framework wherein we first design decentralized control policies for a limiting case of the $N$-agent system (as $N \rightarrow \infty$). By explicitly constructing the mean-field system, we prove the existence and uniqueness of the mean-field equilibrium. Consequently, we show that the obtained equilibrium policies constitute an $ε$-Nash equilibrium for the finite agent system. Finally, we validate the performance of both the scheduling and the control policies through numerical simulations.

Submitted 26 December, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: This work has been submitted to IEEE for possible publication

arXiv:2209.04938 [pdf, ps, other]

Ensuring both Provable Convergence and Differential Privacy in Nash Equilibrium Seeking on Directed Graphs

Authors: Yongqiang Wang, Tamer Basar

Abstract: We study in this paper privacy protection in fully distributed Nash equilibrium seeking where a player can only access its own cost function and receive information from its immediate neighbors over a directed communication network. In view of the non-cooperative nature of the underlying decision-making process, it is imperative to protect the privacy of individual players in networked games when sensitive information is involved. We propose an approach that can achieve both accurate convergence and rigorous differential privacy with finite cumulative privacy budget in distributed Nash equilibrium seeking, which is in sharp contrast to existing differential-privacy solutions for networked games that have to trade convergence accuracy for differential privacy. The approach is applicable even when the communication graph is unbalanced and it does not require individual players to have any global structure information of the communication graph. Since the approach utilizes independent noises for privacy protection, it can combat adversaries having access to all shared messages in the network. It is also encryption-free, ensuring high efficiency in communication and computation. Numerical comparison results with existing counterparts confirm the effectiveness of the proposed approach.

Submitted 10 April, 2023; v1 submitted 11 September, 2022; originally announced September 2022.

Showing 1–50 of 217 results for author: Başar, T