-
The Lagrangian Method for Solving Constrained Markov Games
Authors:
Soham Das,
Santiago Paternain,
Luiz F. O. Chamon,
Ceyhun Eksin
Abstract:
We propose the concept of a Lagrangian game to solve constrained Markov games. Such games model scenarios where agents face cost constraints in addition to their individual rewards, which depend on both the agents' joint actions and the evolving environment state over time. Constrained Markov games form the formal mechanism behind safe multiagent reinforcement learning, providing a structured model for dynamic multiagent interactions in a multitude of settings, such as autonomous teams operating under local energy and time constraints. We develop a primal-dual approach in which agents solve a Lagrangian game associated with the current Lagrange multiplier, simulate cost and reward trajectories over a fixed horizon, and update the multiplier using accrued experience. This update rule generates a new Lagrangian game, initiating the next iteration. Our key result shows that the sequence of solutions to these Lagrangian games yields a nonstationary Nash solution for the original constrained Markov game.
Submitted 13 March, 2025;
originally announced March 2025.
-
Almost Sure Convergence of Networked Policy Gradient over Time-Varying Networks in Markov Potential Games
Authors:
Sarper Aydin,
Ceyhun Eksin
Abstract:
We propose networked policy gradient play for solving Markov potential games, including those with continuous action and state spaces. In the decentralized algorithm, agents sample their actions from parametrized and differentiable policies that depend on the current state and other agents' policy parameters. During training, agents estimate their gradient information through two consecutive episodes, generating unbiased estimators of reward and policy score functions. Using this information, agents compute the stochastic gradients of their policy functions and update their parameters accordingly. Additionally, they update their estimates of other agents' policy parameters based on the local estimates received through a time-varying communication network. In Markov potential games, there exists a potential value function among agents with gradients corresponding to the gradients of local value functions. Using this structure, we prove the almost sure convergence of joint policy parameters to stationary points of the potential value function. We also show that the convergence rate of the networked policy gradient algorithm is $\mathcal{O}(1/\epsilon^2)$. Numerical experiments on a dynamic multi-agent newsvendor problem verify the convergence of local beliefs and gradients. They further show that networked policy gradient play converges as fast as independent policy gradient updates, while collecting higher rewards.
Submitted 26 October, 2024;
originally announced October 2024.
-
Simulation-Based Optimistic Policy Iteration For Multi-Agent MDPs with Kullback-Leibler Control Cost
Authors:
Khaled Nakhleh,
Ceyhun Eksin,
Sabit Ekin
Abstract:
This paper proposes an agent-based optimistic policy iteration (OPI) scheme for learning stationary optimal stochastic policies in multi-agent Markov Decision Processes (MDPs), in which agents incur a Kullback-Leibler (KL) divergence cost for their control efforts and an additional cost for the joint state. The proposed scheme consists of a greedy policy improvement step followed by an m-step temporal difference (TD) policy evaluation step. We use the separable structure of the instantaneous cost to show that the policy improvement step follows a Boltzmann distribution that depends on the current value function estimate and the uncontrolled transition probabilities. This allows agents to compute the improved joint policy independently. We show that both the synchronous (entire state space evaluation) and asynchronous (a uniformly sampled set of substates) versions of the OPI scheme with finite policy evaluation rollout converge to the optimal value function and an optimal joint policy asymptotically. Simulation results on a variant of the Stag-Hare game formulated as a multi-agent MDP with KL control cost validate our scheme's performance in terms of minimizing the cost return.
Submitted 19 October, 2024;
originally announced October 2024.
-
Learning graph-Fourier spectra of textured surface images for defect localization
Authors:
Tapan Ganatma Nakkina,
Adithyaa Karthikeyan,
Yuhao Zhong,
Ceyhun Eksin,
Satish T. S. Bukkapatnam
Abstract:
In the realm of industrial manufacturing, product inspection remains a significant bottleneck, with only a small fraction of manufactured items undergoing inspection for surface defects. Advances in imaging systems and AI can allow automated full inspection of manufactured surfaces. However, even the most contemporary imaging and machine learning methods perform poorly for detecting defects in images with highly textured backgrounds that stem from diverse manufacturing processes. This paper introduces an approach based on graph Fourier analysis to automatically identify defective images, as well as crucial graph Fourier coefficients that inform the defects in images amidst highly textured backgrounds. The approach capitalizes on the ability of graph representations to capture the complex dynamics inherent in high-dimensional data, preserving crucial locality properties in a lower dimensional space. A convolutional neural network model (1D-CNN) was trained with the coefficients of the graph Fourier transform of the images as the input to identify, with a classification accuracy of 99.4%, whether the image contains a defect. An explainable AI method using SHAP (SHapley Additive exPlanations) was used to further analyze the trained 1D-CNN model to discern important spectral coefficients for each image. This approach sheds light on the crucial contribution of low-frequency graph eigen waveforms to precisely localize surface defects in images, thereby advancing the realization of zero-defect manufacturing.
Submitted 1 December, 2023; v1 submitted 25 November, 2023;
originally announced November 2023.
-
Robust Social Welfare Maximization via Information Design in Linear-Quadratic-Gaussian Games
Authors:
Furkan Sezer,
Ceyhun Eksin
Abstract:
Information design in an incomplete information game includes a designer with the goal of influencing players' actions through signals generated from a designed probability distribution so that its objective function is optimized. We consider a setting in which the designer has partial knowledge of agents' utilities. We address the uncertainty about players' preferences by formulating a robust information design problem against worst-case payoffs. If the players have quadratic payoffs that depend on the players' actions and an unknown payoff-relevant state, and signals on the state that follow a Gaussian distribution conditional on the state realization, then the information design problem under quadratic design objectives is a semidefinite program (SDP). Specifically, we consider ellipsoid perturbations over payoff coefficients in linear-quadratic-Gaussian (LQG) games. We show that this leads to a tractable robust SDP formulation. Numerical studies are carried out to identify the relation between the perturbation levels and the optimal information structures.
Submitted 28 April, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
Average submodularity of maximizing anticoordination in network games
Authors:
Soham Das,
Ceyhun Eksin
Abstract:
We consider the control of decentralized learning dynamics for agents in an anti-coordination network game. In the anti-coordination network game, there is a preferred action in the absence of neighbors' actions, and the utility an agent receives from the preferred action decreases as more of its neighbors select the preferred action, potentially causing the agent to select a less desirable action. The decentralized dynamics, which are based on the iterated elimination of dominated strategies, converge for the considered game. Given a convergent action profile, we measure anti-coordination by the number of edges in the underlying graph that have at least one agent at either end of the edge not taking the preferred action. The maximum anti-coordination (MAC) problem seeks to find an optimal set of agents to control under a finite budget so that the overall network disconnect is maximized on game convergence as a result of the dynamics. We show that the MAC is submodular in expectation in dense bipartite networks for any realization of the utility constants in the population. Utilizing this result, we obtain a performance guarantee for the greedy agent selection algorithm for MAC. Finally, we provide a computational study to show the effectiveness of greedy node selection strategies to solve MAC on general bipartite networks.
Submitted 1 July, 2022;
originally announced July 2022.
-
Information Preferences of Individual Agents in Linear-Quadratic-Gaussian Network Games
Authors:
Furkan Sezer,
Ceyhun Eksin
Abstract:
We consider linear-quadratic-Gaussian (LQG) network games in which agents have quadratic payoffs that depend on their individual and neighbors' actions, and an unknown payoff-relevant state. An information designer determines the fidelity of information revealed to the agents about the payoff state to maximize the social welfare. Prior results show that full information disclosure is optimal under certain assumptions on the payoffs, i.e., it is beneficial for the average individual. In this paper, we provide conditions based on the strength of the dependence of payoffs on neighbors' actions, i.e., competition, under which a rational agent is expected to benefit, i.e., receive higher payoffs, from full information disclosure. We find that all agents benefit from information disclosure for the star network structure when the game is symmetric and submodular or supermodular. We also identify that the central agent benefits more than a peripheral agent from full information disclosure unless the competition is strong and the number of peripheral agents is small enough. Despite the fact that all agents expect to benefit from information disclosure ex-ante, a central agent can be worse-off from information disclosure in many realizations of the payoff state under strong competition, indicating that a risk-averse central agent can prefer uninformative signals ex-ante.
Submitted 24 March, 2022;
originally announced March 2022.
-
A Market Mechanism for Trading Flexibility Between Interconnected Electricity Markets
Authors:
Hossein Khazaei,
Ceyhun Eksin,
Roohallah Khatami,
Alfredo Garcia
Abstract:
Electricity markets differ in their ability to meet power imbalances on short notice in a controlled fashion. Relatively flexible markets have the ability to ramp up (or down) power flows across interties without compromising their ability to reliably meet internal demand. In this paper, a market mechanism to enable flexibility trading amongst market operators is introduced. In the proposed market mechanism, market operators exchange information regarding optimal terms of trade (nodal prices and flows) along interconnection lines at every trading round. Equipped with this information, each market operator then independently solves its own internal chance-constrained economic dispatch problem and broadcasts the updated optimal terms of trade for flows across markets. We show that the proposed decentralized market mechanism for flexibility trading converges to a Nash equilibrium of the intraday market coupling game, i.e. a combination of internal market clearing solutions (one for each participating market) and flows and prices along interconnection lines so that no individual market operator has an incentive to modify its own internal solution and/or the terms of trade along interties. For a specific class of chance constraints, we show that the limiting equilibrium outcome is efficient, i.e. it corresponds to the solution of the single market clearing problem for all participating markets. The proposed market mechanism is illustrated with an application to the three-area IEEE Reliability Test System.
Submitted 28 February, 2022;
originally announced February 2022.
-
Decentralized Fictitious Play Converges Near a Nash Equilibrium in Near-Potential Games
Authors:
Sarper Aydin,
Sina Arefizadeh,
Ceyhun Eksin
Abstract:
We investigate convergence of decentralized fictitious play (DFP) in near-potential games, wherein agents' preferences can almost be captured by a potential function. In DFP, agents keep local estimates of other agents' empirical frequencies, best-respond against these estimates, and receive information over a time-varying communication network. We prove that empirical frequencies of actions generated by DFP converge around a single Nash equilibrium (NE), assuming that there are only finitely many Nash equilibria and the difference in utility functions resulting from unilateral deviations is close enough to the difference in the potential function values. This result assures that DFP has the same convergence properties as standard fictitious play (FP) in near-potential games.
Submitted 27 January, 2022;
originally announced January 2022.
-
Decentralized Inertial Best-Response with Voluntary and Limited Communication in Random Communication Networks
Authors:
Sarper Aydın,
Ceyhun Eksin
Abstract:
Multiple autonomous agents interact over a random communication network to maximize their individual utility functions which depend on the actions of other agents. We consider decentralized best-response with inertia type algorithms in which agents form beliefs about the future actions of other players based on local information, and take an action that maximizes their expected utility computed with respect to these beliefs or continue to take their previous action. We show convergence of these types of algorithms to a Nash equilibrium in weakly acyclic games under the condition that the belief update and information exchange protocols successfully learn the actions of other players with positive probability in finite time given a static environment, i.e., when other agents' actions do not change. We design a decentralized fictitious play algorithm with voluntary and limited communication (DFP-VL) protocols that satisfy this condition. In the voluntary communication protocol, each agent decides whom to exchange information with by assessing the novelty of its information and the potential effect of its information on others' assessments of their utility functions. The limited communication protocol entails agents sending only their most frequent action to agents that they decide to communicate with. Numerical experiments on a target assignment game demonstrate that the voluntary and limited communication protocol can more than halve the number of communication attempts while retaining the same convergence rate as DFP in which agents constantly attempt to communicate.
Submitted 13 June, 2021;
originally announced June 2021.
-
Decentralized Fictitious Play in Near-Potential Games with Time-Varying Communication Networks
Authors:
Sarper Aydın,
Sina Arefizadeh,
Ceyhun Eksin
Abstract:
We study the convergence properties of decentralized fictitious play (DFP) for the class of near-potential games where the incentives of agents are nearly aligned with a potential function. In DFP, agents share information only with their current neighbors in a sequence of time-varying networks, keep estimates of other agents' empirical frequencies, and take actions to maximize their expected utility functions computed with respect to the estimated empirical frequencies. We show that empirical frequencies of actions converge to a set of strategies with potential function values that are larger than the potential function values obtained by approximate Nash equilibria of the closest potential game. This result establishes that DFP has the same convergence guarantees in near-potential games as standard fictitious play, in which agents observe the past actions of all the other agents.
Submitted 17 March, 2021;
originally announced March 2021.
-
Maximizing Social Welfare and Agreement via Information Design in Linear-Quadratic-Gaussian Games
Authors:
Furkan Sezer,
Hossein Khazaei,
Ceyhun Eksin
Abstract:
We consider linear-quadratic Gaussian (LQG) games in which players have quadratic payoffs that depend on the players' actions and an unknown payoff-relevant state, and signals on the state that follow a Gaussian distribution conditional on the state realization. An information designer decides the fidelity of information revealed to the players in order to maximize the social welfare of the players or reduce the disagreement among players' actions. Leveraging the semi-definiteness of the information design problem, we derive analytical solutions for these objectives under specific LQG games. We show that full information disclosure maximizes social welfare when there is a common payoff-relevant state, when there is strategic substitutability in the actions of players, or when the signals are public. Numerical results show that as strategic substitution increases, the value of the information disclosure increases. When the objective is to induce conformity among players' actions, hiding information is optimal. Lastly, we consider the information design objective that is a weighted combination of social welfare and cohesiveness of players' actions. We obtain an interval for the weights where full information disclosure is optimal under public signals for games with strategic substitutability. Numerical solutions show that the actual interval where full information disclosure is optimal gets close to the analytical interval obtained as substitution increases.
Submitted 26 February, 2023; v1 submitted 25 February, 2021;
originally announced February 2021.
-
An Iterative Mechanism for Coupling Electricity Markets
Authors:
Alfredo Garcia,
Roohallah Khatami,
Ceyhun Eksin,
Furkan Sezer
Abstract:
The coordinated operation of interconnected but locally controlled electricity markets is generally referred to as a "coupling". In this paper we propose a new mechanism design for efficient coupling of independent electricity markets. The mechanism operates after each individual market has settled (e.g. hour-ahead). Based upon the reported supply and demand functions for internal market optimization (clearing), each market operator is asked to iteratively quote the terms of energy trade (on behalf of the agents participating in its market) across the transmission lines connecting to other markets. The mechanism is scalable as the informational demands placed on each market operator at each iteration are limited. We show that the mechanism's outcome converges to the optimal flows between markets given the reported supply and demand functions from each individual market clearing. We show that the proposed market coupling design does not alter the structure of incentives in each internal market, i.e., any internal market equilibrium will remain so (approximately) after coupling is implemented. This is achieved via incentive transfers (updated at each iteration) that remunerate each market with its marginal contribution (i.e. cost savings) to all other participating markets. We identify a sufficient condition on a uniform participation fee for each market operator ensuring the mechanism incurs no deficit. The proposed decentralized mechanism is implemented on the three-area IEEE Reliability Test System, where the simulation results showcase the efficiency of the proposed model.
Submitted 12 January, 2021; v1 submitted 17 November, 2020;
originally announced November 2020.
-
A Best-Response Algorithm with Voluntary Communication and Mobility Protocols for Mobile Autonomous Teams Solving the Target Assignment Problem
Authors:
Sarper Aydin,
Ceyhun Eksin
Abstract:
We consider a team of mobile autonomous robots with the aim to cover a given set of targets. Each robot aims to select a target to cover and physically reach it by the final time in coordination with other robots given the locations of targets. Robots are unaware of which targets other robots intend to cover. Each robot can control its mobility and who to send information to. We assume communication happens over a wireless channel that is subject to fading and failures. Given the setup, we propose a decentralized algorithm based on decentralized fictitious play in which robots reason about the selections and locations of other robots to decide which target to select, whether to communicate or not, who to communicate with, and where to move. Specifically, the communication actions of the robots are learning-aware, and their mobility actions are sensitive to the success probability of communication. We show that the decentralized algorithm guarantees that robots will cover their targets in finite time. Numerical simulations and experiments using a team of mobile robots confirm the target coverage in finite time and show that mobility control for communication and learning-aware voluntary communication protocols reduce the number of communication attempts in comparison to a benchmark distributed algorithm that relies on communication after every decision epoch.
Submitted 22 November, 2021; v1 submitted 6 March, 2020;
originally announced March 2020.
-
Optimal evolutionary control for artificial selection on molecular phenotypes
Authors:
Armita Nourmohammad,
Ceyhun Eksin
Abstract:
Controlling an evolving population is an important task in modern molecular genetics, including directed evolution for improving the activity of molecules and enzymes, in breeding experiments in animals and in plants, and in devising public health strategies to suppress evolving pathogens. An optimal intervention to direct evolution should be designed by considering its impact over the entire stochastic evolutionary trajectory that follows. As a result, a seemingly suboptimal intervention at a given time can be globally optimal as it can open opportunities for desirable actions in the future. Here, we propose a feedback control formalism to devise a globally optimal artificial selection protocol to direct the evolution of molecular phenotypes. We show that artificial selection should be designed to counter evolutionary tradeoffs among multi-variate phenotypes to avoid undesirable outcomes in one phenotype by imposing selection on another. Control by artificial selection is challenged by our ability to predict molecular evolution. We develop an information theoretical framework and show that molecular time-scales for evolution under natural selection can inform how to monitor a population in order to acquire sufficient predictive information for an effective intervention with artificial selection. Our formalism opens a new avenue for devising artificial selection methods for directed evolution of molecular functions.
Submitted 12 January, 2021; v1 submitted 31 December, 2019;
originally announced December 2019.
-
Distributed Fictitious Play in Potential Games with Time-Varying Communication Networks
Authors:
Sina Arefizadeh,
Ceyhun Eksin
Abstract:
We propose a distributed algorithm for multiagent systems that aim to optimize a common objective when agents differ in their estimates of the objective-relevant state of the environment. Each agent keeps an estimate of the environment and a model of the behavior of other agents. The model of other agents' behavior assumes agents choose their actions randomly based on a stationary distribution determined by the empirical frequencies of past actions. At each step, each agent takes the action that maximizes its expectation of the common objective computed with respect to its estimate of the environment and its model of others. We propose a weighted averaging rule with non-doubly stochastic weights for agents to estimate the empirical frequency of past actions of all other agents by exchanging their estimates with their neighbors over a time-varying communication network. Under this averaging rule, we show agents' estimates converge to the actual empirical frequencies fast enough. This implies convergence of actions to a Nash equilibrium of the game with identical payoffs given by the expectation of the common objective with respect to an asymptotically agreed estimate of the state of the environment.
Submitted 7 December, 2019;
originally announced December 2019.
-
Distributed Networked Learning with Correlated Data
Authors:
Lingzhou Hong,
Alfredo Garcia,
Ceyhun Eksin
Abstract:
We consider a distributed estimation method in a setting with heterogeneous streams of correlated data distributed across nodes in a network. In the considered approach, linear models are estimated locally (i.e., with only local data) subject to a network regularization term that penalizes a local model that differs from neighboring models. We analyze computation dynamics (associated with stochastic gradient updates) and information exchange (associated with exchanging current models with neighboring nodes). We provide a finite-time characterization of convergence of the weighted ensemble average estimate and compare this result to federated learning, an alternative approach to estimation wherein a single model is updated by locally generated gradient updates. This comparison highlights the trade-off between speed and precision: while model updates take place at a faster rate in federated learning, the proposed networked approach to estimation enables the identification of models with higher precision. We illustrate the method's general applicability in two examples: estimating a Markov random field using wireless sensor networks and modeling prey escape behavior of flocking birds based on a publicly available dataset.
Submitted 9 February, 2021; v1 submitted 28 October, 2019;
originally announced October 2019.
-
Control of learning in anti-coordination network games
Authors:
Ceyhun Eksin,
Keith Paarporn
Abstract:
We consider control of heterogeneous players repeatedly playing an anti-coordination network game. In an anti-coordination game, each player has an incentive to differentiate its action from its neighbors. At each round of play, players take actions according to a learning algorithm that mimics the iterated elimination of strictly dominated strategies. We show that the learning dynamics may fail to reach anti-coordination in certain scenarios. We formulate an optimization problem with the objective to reach maximum anti-coordination while minimizing the number of players to control. We consider both static and dynamic control policy formulations. Relating the problem to a minimum vertex cover problem on bipartite networks, we develop a feasible dynamic policy that is efficient to compute. Solving for optimal policies on benchmark networks shows that the vertex cover based policy can be a loose upper bound when there is a potential to make use of cascades caused by the learning dynamics of uncontrolled players. We propose an algorithm that finds feasible, though possibly suboptimal, policies by sequentially adding players to control considering their cascade potential. Numerical experiments on random networks show the cascade-based algorithm can lower the control effort significantly compared to simpler control schemes.
Submitted 11 December, 2018; v1 submitted 8 December, 2018;
originally announced December 2018.
-
Optimal control policies for evolutionary dynamics with environmental feedback
Authors:
Keith Paarporn,
Ceyhun Eksin,
Joshua S. Weitz,
Yorai Wardi
Abstract:
We study a dynamical model of a population of cooperators and defectors whose actions have long-term consequences on environmental "commons" - what we term the "resource". Cooperators contribute to restoring the resource whereas defectors degrade it. The population dynamics evolve according to a replicator equation coupled with an environmental state. Our goal is to identify methods of influencing the population with the objective to maximize accumulation of the resource. In particular, we consider strategies that modify individual-level incentives. We then extend the model to incorporate a public opinion state that imperfectly tracks the true environmental state, and study strategies that influence opinion. We formulate optimal control problems and solve them using numerical techniques to characterize locally optimal control policies for three problem formulations: 1) control of incentives, and control of opinions through 2) propaganda-like strategies and 3) awareness campaigns. We show numerically that the resulting controllers in all formulations achieve the objective, albeit with an unintended consequence. The resulting dynamics include cycles between low and high resource states - a dynamical regime termed an "oscillating tragedy of the commons". This outcome may have desirable average properties, but includes risks of resource depletion. Our findings suggest the need for new approaches to controlling coupled population-environment dynamics.
Submitted 18 March, 2018;
originally announced March 2018.
-
Networked SIS Epidemics with Awareness
Authors:
Keith Paarporn,
Ceyhun Eksin,
Joshua S. Weitz,
Jeff S. Shamma
Abstract:
We study an SIS epidemic process over a static contact network where the nodes have partial information about the epidemic state. They react by limiting their interactions with their neighbors when they believe the epidemic is currently prevalent. A node's awareness is weighted by the fraction of infected neighbors in their social network, and a global broadcast of the fraction of infected nodes in the entire network. The dynamics of the benchmark (no awareness) and awareness models are described by discrete-time Markov chains, from which mean-field approximations (MFA) are derived. The states of the MFA are interpreted as the nodes' probabilities of being infected. We show that a sufficient condition for existence of a "metastable", or endemic, state of the awareness model coincides with that of the benchmark model. Furthermore, we use a coupling technique to give a full stochastic comparison analysis between the two chains, which serves as a probabilistic analogue to the MFA analysis. In particular, we show that adding awareness reduces the expectation of any epidemic metric on the space of sample paths, e.g. eradication time or total infections. We characterize the reduction in expectations in terms of the coupling distribution. In simulations, we evaluate the effect social distancing has on contact networks from different random graph families (geometric, Erdős-Rényi, and scale-free random networks).
Submitted 12 July, 2016; v1 submitted 8 July, 2016;
originally announced July 2016.
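Illustrative sketch (not taken from the paper): the abstract above describes a mean-field approximation whose states are the nodes' probabilities of being infected. The code below iterates one commonly used discrete-time mean-field recursion for a networked SIS process without awareness. The update form, the function name `sis_mean_field_step`, the ring network, and the values of `beta` and `delta` are all assumptions chosen for illustration; the paper's benchmark and awareness models may differ.

```python
import numpy as np

def sis_mean_field_step(x, adjacency, beta, delta):
    """One step of an assumed discrete-time mean-field SIS recursion.

    x[i] is node i's probability of being infected, beta is the per-contact
    infection probability, and delta is the recovery probability.
    """
    # Probability that node i escapes infection from every neighbor j.
    escape = np.prod(1.0 - beta * adjacency * x[np.newaxis, :], axis=1)
    # Stay infected with prob. (1 - delta), or become infected if susceptible.
    return (1.0 - delta) * x + (1.0 - x) * (1.0 - escape)

# Hypothetical 4-node ring network with a single initially infected node.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)
x0 = np.array([1.0, 0.0, 0.0, 0.0])

trajectory = [x0]
for _ in range(100):
    trajectory.append(sis_mean_field_step(trajectory[-1], ring, beta=0.3, delta=0.2))
trajectory = np.array(trajectory)
```

An awareness mechanism of the kind described in the abstract could be mimicked by shrinking `beta` as a function of the infected fraction, but the exact functional form is not stated in the abstract and is left out here.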
-
Distributed Inertial Best-Response Dynamics
Authors:
Brian Swenson,
Ceyhun Eksin,
Soummya Kar,
Alejandro Ribeiro
Abstract:
The note considers the problem of computing pure Nash equilibrium (NE) strategies in distributed (i.e., network-based) settings. The paper studies a class of inertial best response dynamics based on the fictitious play (FP) algorithm. It is shown that inertial best response dynamics are robust to informational limitations common in distributed settings. Fully distributed variants of FP with inertia and joint strategy FP with inertia are developed and convergence is proven to the set of pure NE. The distributed algorithms rely on consensus methods. Results are validated using numerical simulations.
Submitted 3 April, 2018; v1 submitted 2 May, 2016;
originally announced May 2016.
-
Disease dynamics on a network game: a little empathy goes a long way
Authors:
Ceyhun Eksin,
Jeff S. Shamma,
Joshua S. Weitz
Abstract:
Individuals change their behavior during an epidemic in response to whether they and/or those they interact with are healthy or sick. Healthy individuals are concerned about contracting a disease from their sick contacts and may utilize protective measures. Sick individuals may be concerned with spreading the disease to their healthy contacts and adopt preemptive measures. Yet, in practice both protective and preemptive changes in behavior come with costs. This paper proposes a stochastic network disease game model that captures the self-interests of individuals during the spread of a susceptible-infected-susceptible (SIS) disease where individuals react to current risk of disease spread, and their reactions together with the current state of the disease stochastically determine the next stage of the disease. We show that there is a critical level of concern, i.e., empathy, by the sick individuals above which disease is eradicated fast. Furthermore, we find that if the network and disease parameters are above the epidemic threshold, the risk averse behavior by the healthy individuals cannot eradicate the disease without the preemptive measures of the sick individuals. This imbalance in the role played by the response of the infected versus the susceptible individuals in disease eradication affords critical policy insights.
Submitted 15 April, 2016; v1 submitted 12 April, 2016;
originally announced April 2016.
-
Distributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information
Authors:
Ceyhun Eksin,
Alejandro Ribeiro
Abstract:
A multi-agent system operates in an uncertain environment about which agents have different and time-varying beliefs that, as time progresses, converge to a common belief. A global utility function that depends on the realized state of the environment and actions of all the agents determines the system's optimal behavior. We define the asymptotically optimal action profile as an equilibrium of the potential game defined by considering the expected utility with respect to the asymptotic belief. At finite time, however, agents have not entirely congruous beliefs about the state of the environment and may select conflicting actions. This paper proposes a variation of the fictitious play algorithm which is proven to converge to equilibrium actions if the state beliefs converge to a common distribution at a rate that is at least linear. In conventional fictitious play, agents build beliefs on others' future behavior by computing histograms of past actions and best respond to their expected payoffs integrated with respect to these histograms. In the variations developed here, histograms are built using knowledge of actions taken by nearby nodes and best responses are further integrated with respect to the local beliefs on the state of the environment. We exemplify the use of the algorithm in coordination and target covering games.
Submitted 5 February, 2016;
originally announced February 2016.
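Illustrative sketch (not taken from the paper): the conventional fictitious play step mentioned in the abstract above, where each agent builds a histogram of the other agent's past actions and best responds to the expected payoff under that histogram, can be written in a few lines for a two-player game. The function name `fictitious_play`, the 2x2 coordination payoff matrix, and the tie-breaking rule (lowest action index) are assumptions for illustration; the distributed variant with local state beliefs proposed in the paper is not reproduced here.

```python
import numpy as np

def fictitious_play(payoffs, rounds=50, seed=0):
    """Conventional two-player fictitious play (illustrative sketch).

    payoffs[i] is player i's payoff matrix indexed [own action, other's action].
    Returns the last action profile and each player's empirical action frequencies.
    """
    rng = np.random.default_rng(seed)
    n_actions = payoffs[0].shape[0]
    counts = np.zeros((2, n_actions))          # counts[i, a]: times player i played a
    actions = rng.integers(n_actions, size=2)  # arbitrary initial actions
    for _ in range(rounds):
        # Record the actions just played.
        for i in range(2):
            counts[i, actions[i]] += 1
        # Each player best responds to the histogram of the other's past play.
        new_actions = np.empty(2, dtype=int)
        for i in range(2):
            other = 1 - i
            histogram = counts[other] / counts[other].sum()
            expected = payoffs[i] @ histogram
            new_actions[i] = int(np.argmax(expected))  # ties go to the lowest index
        actions = new_actions
    return actions, counts / counts.sum(axis=1, keepdims=True)

# Hypothetical coordination game: both players are rewarded for matching actions.
coordination = np.array([[1.0, 0.0],
                         [0.0, 1.0]])
final_actions, frequencies = fictitious_play([coordination, coordination])
```

In this small example the deterministic tie-break is what lets the two players settle on a common action after an initial mismatch; other tie-breaking rules can behave differently.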
-
Demand Response with Communicating Rational Consumers
Authors:
Ceyhun Eksin,
Hakan Delic,
Alejandro Ribeiro
Abstract:
The performance of an energy system under a real-time pricing mechanism depends on the consumption behavior of its customers, which involves uncertainties. In this paper, we consider a system operator that charges its customers with a real-time price that depends on the total realized consumption. Customers have unknown and heterogeneous consumption preferences. We propose behavior models in which customers act selfishly, altruistically or as welfare-maximizers. In addition, we consider information models where customers keep their consumption levels private, communicate with a neighboring set of customers, or receive broadcasted demand from the operator. Our analysis focuses on the dispersion of the system performance under different consumption models. To this end, for each pair of behavior and information model we define and characterize optimal rational behavior, and provide a local algorithm that can be implemented by the consumption scheduler devices. Analytical comparisons of the two extreme information models, namely, private and complete information models, show that the communication model reduces demand uncertainty while having negligible effect on aggregate consumer utility and welfare. In addition, we show the impact that real-time price policy parameters have on the expected welfare loss due to selfish behavior, affording critical policy insights.
Submitted 26 September, 2016; v1 submitted 18 November, 2015;
originally announced November 2015.
-
Bayesian Quadratic Network Game Filters
Authors:
Ceyhun Eksin,
Pooya Molavi,
Alejandro Ribeiro,
Ali Jadbabaie
Abstract:
A repeated network game where agents have quadratic utilities that depend on information externalities -- an unknown underlying state -- as well as payoff externalities -- the actions of all other agents in the network -- is considered. Agents play Bayesian Nash Equilibrium strategies with respect to their beliefs on the state of the world and the actions of all other nodes in the network. These beliefs are refined over subsequent stages based on the observed actions of neighboring peers. This paper introduces the Quadratic Network Game (QNG) filter that agents can run locally to update their beliefs, select corresponding optimal actions, and eventually learn a sufficient statistic of the network's state. The QNG filter is demonstrated on a Cournot market competition game and a coordination game to implement navigation of an autonomous team.
Submitted 1 February, 2013;
originally announced February 2013.