-
Welfare Measure for Resource Allocation with Algorithmic Implementation: Beyond Average and Max-Min
Authors:
Ezra Tampubolon,
Holger Boche
Abstract:
In this work, we propose an axiomatic approach for measuring the performance/welfare of a resource-driven system of concurrent agents. Our approach provides a unifying view of popular system optimality principles, such as maximal average/total utility and max-min fairness. Moreover, it gives rise to other system optimality notions that have not been fully exploited yet, such as the maximal lowest total subgroup utility. For the axiomatically defined welfare measures, we provide a generic gradient-based method to find an optimal resource allocation and present a theoretical guarantee for its success. Lastly, we demonstrate the power of our approach through a power control application in wireless networks.
Submitted 16 April, 2021;
originally announced April 2021.
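The optimality notions named in the abstract above can be compared on a small example. The following is a minimal sketch, not the authors' axiomatic construction: it assumes that the "lowest total subgroup utility" is the smallest sum of utilities over all subgroups of a fixed size k, which equals the sum of the k smallest utilities. The function names and the utility values are illustrative assumptions.

```python
from itertools import combinations


def lowest_subgroup_total(utilities, k):
    """Smallest total utility over all subgroups of size k (assumed definition)."""
    if not 1 <= k <= len(utilities):
        raise ValueError("k must be between 1 and the number of agents")
    # The worst-off subgroup of size k consists of the k smallest utilities.
    return sum(sorted(utilities)[:k])


def lowest_subgroup_total_bruteforce(utilities, k):
    """Same quantity by enumerating every subgroup; only for checking small cases."""
    return min(sum(group) for group in combinations(utilities, k))


utilities = [4.0, 1.0, 3.0, 2.0]  # hypothetical agent utilities
n = len(utilities)

min_utility = lowest_subgroup_total(utilities, 1)    # k = 1: max-min objective
total_utility = lowest_subgroup_total(utilities, n)  # k = n: total utility
average_utility = total_utility / n                  # average utility
pair_welfare = lowest_subgroup_total(utilities, 2)   # k = 2: worst-off pair
```

Under this assumed definition, k = 1 recovers the quantity maximized by max-min fairness and k = n recovers the total utility, so intermediate values of k interpolate between the two principles.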
-
On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality
Authors:
Ezra Tampubolon,
Haris Ceribasic,
Holger Boche
Abstract:
In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of general independent learners. The resulting post-learning policies are almost optimal in the underlying game sense, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of the opponent's two subsequent actions and yields an optimal strategy provided that the opponent applies a stationary strategy, and we discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
Submitted 22 January, 2021; v1 submitted 21 October, 2020;
originally announced October 2020.
-
Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations
Authors:
Ezra Tampubolon,
Holger Boche
Abstract:
Competitive non-cooperative online decision-making agents whose actions increase congestion of scarce resources constitute a model for widespread modern large-scale applications. To ensure sustainable resource behavior, we introduce a novel method to steer the agents toward a stable population state that fulfills the given coupled resource constraints. The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian. Assuming that the online learning agents have only noisy first-order utility feedback, we show that for polynomially decaying agent step sizes/learning rates, the population dynamic almost surely converges to a generalized Nash equilibrium. A particular consequence is that the resource constraints are fulfilled in the asymptotic limit. Moreover, we investigate the finite-time quality of the proposed algorithm by giving a non-asymptotic, time-decaying bound on the expected amount of resource constraint violation.
Submitted 21 October, 2020;
originally announced October 2020.
-
Resource-Aware Control via Dynamic Pricing for Congestion Game with Finite-Time Guarantees
Authors:
Ezra Tampubolon,
Haris Ceribasic,
Holger Boche
Abstract:
The congestion game is a widely used model for modern networked applications. A central issue in such applications is that the selfish behavior of the participants may result in resource overloading and negative externalities for the system participants. In this work, we propose a pricing mechanism that guarantees a sub-linear increase of the time-cumulative violation of the resource load constraints. A key feature of our method is that it is resource-centric, in the sense that it depends on the congestion state of the resources and not on specific characteristics of the system participants. This feature makes our mechanism scalable, flexible, and privacy-preserving. Moreover, we show by numerical simulations that our pricing mechanism has no significant effect on the agents' welfare while improving the capacity violation.
Submitted 14 February, 2020;
originally announced February 2020.
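A resource-centric price update of the kind described above can be illustrated with a generic projected dual-ascent rule. This is a sketch of that standard technique, not the mechanism proposed in the paper: each resource raises its price when its load exceeds its capacity and lowers it otherwise, using only its own congestion state. The function `update_prices`, the step size, and all numbers are assumptions made for illustration.

```python
def update_prices(prices, loads, capacities, step=0.1):
    """One projected dual-ascent step per resource (generic technique, assumed here).

    Each price depends only on that resource's load and capacity,
    not on any characteristic of the agents causing the load.
    """
    return [
        max(0.0, price + step * (load - capacity))  # prices stay non-negative
        for price, load, capacity in zip(prices, loads, capacities)
    ]


prices = [0.0, 0.0]       # hypothetical initial prices
loads = [12.0, 3.0]       # hypothetical current resource loads
capacities = [10.0, 5.0]  # hypothetical resource capacities

new_prices = update_prices(prices, loads, capacities)
```

In this toy run the overloaded first resource becomes more expensive, while the price of the underused second resource is clipped at zero.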
-
Pricing Mechanism for Resource Sustainability in Competitive Online Learning Multi-Agent Systems
Authors:
Ezra Tampubolon,
Holger Boche
Abstract:
In this paper, we consider the problem of resource congestion control for competing online learning agents. Taking a non-cooperative game as the model for the interaction between the agents, and noisy online mirror ascent as the model for their rational behavior, we propose a novel pricing mechanism that gives the agents incentives for sustainable use of the resources. Our mechanism is distributed and resource-centric, in the sense that it is carried out by the resources themselves and not by a centralized instance, and that it is based on the congestion state of the resources rather than on the preferences of the agents. For the case that the noise is persistent, and for several choices of the intrinsic parameters of the agents, such as their learning rate, and of the mechanism parameters, such as the learning rate of -, the progressivity of the price-setters, and the extrinsic price sensitivity of the agents, we show that the accumulated violation of the resource constraints by the resulting iterates is sub-linear w.r.t. the time horizon. Moreover, we provide numerical simulations to support our theoretical findings.
Submitted 21 October, 2019;
originally announced October 2019.
-
Robust Online Learning for Resource Allocation -- Beyond Euclidean Projection and Dynamic Fit
Authors:
Ezra Tampubolon,
Holger Boche
Abstract:
The online-learning literature has focused on designing algorithms that ensure sub-linear growth of the cumulative long-term constraint violations. The drawback of this guarantee is that strictly feasible actions may cancel out constraint violations in other time slots. For this reason, we introduce a new performance measure called $\hCFit$, a particular instance of which is the cumulative positive part of the constraint violations. We propose a class of non-causal algorithms for online decision-making that guarantees, in slowly changing environments, sub-linear growth of this quantity despite noisy first-order feedback. Furthermore, we demonstrate by numerical experiments the performance gain of our method relative to the state of the art.
Submitted 21 October, 2019;
originally announced October 2019.
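The cancellation effect mentioned in the abstract above can be seen on a toy sequence. The sketch below is only an illustration of the two quantities being contrasted, not the paper's definition of its performance measure: it assumes constraints of the form g_t ≤ 0, so that a positive value is a violation and a negative value is slack. The function names and the numbers are assumptions.

```python
def cumulative_violation(g_values):
    """Plain sum of constraint values: slack (negative) offsets violations (positive)."""
    return sum(g_values)


def cumulative_positive_violation(g_values):
    """Sum of the positive parts only: slack cannot cancel a violation."""
    return sum(max(g, 0.0) for g in g_values)


# Hypothetical constraint values g_t over four time slots (g_t <= 0 means feasible).
g_values = [2.0, -2.0, 1.0, -1.0]

plain = cumulative_violation(g_values)
positive = cumulative_positive_violation(g_values)
```

Here the plain sum is zero even though the constraint was violated in two of the four slots, whereas the positive-part sum records a total violation of 3.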
-
Semi-Decentralized Coordinated Online Learning for Continuous Games with Coupled Constraints via Augmented Lagrangian
Authors:
Ezra Tampubolon,
Holger Boche
Abstract:
We consider a class of concave continuous games in which the admissible strategy profile of the players is subject to affine coupling constraints. We propose a novel algorithm that leads the relevant population dynamic toward a Nash equilibrium. This algorithm is based on a mirror ascent algorithm, which fits the framework of no-regret online learning, and on the augmented Lagrangian method. The algorithm is decentralized in the sense that the iterate of each player requires the local information about how she contributes to the coupling constraints and the price vector broadcast by a central coordinator. Thus, no player needs to know the population action. Moreover, no specific control by the central primary coordinator is required. We give a condition on the step sizes and the degree of the augmentation of the Lagrangian such that the proposed algorithm converges to a generalized Nash equilibrium.
Submitted 21 October, 2019;
originally announced October 2019.
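As background for the mirror ascent building block mentioned above, the following sketch shows one step of mirror ascent with the entropic regularizer on the probability simplex (the exponentiated-gradient update). It is a textbook instance chosen for illustration and leaves out the coupling constraints, the prices, and the augmented Lagrangian terms of the paper. The function name, the step size, and the inputs are assumptions.

```python
import math


def entropic_mirror_ascent_step(x, gradient, step):
    """One entropic mirror ascent step on the simplex: x_i is scaled by exp(step * g_i).

    Assumes x is a probability vector with strictly positive entries.
    """
    shift = max(gradient)  # subtracting the max avoids overflow and cancels after normalizing
    weights = [xi * math.exp(step * (gi - shift)) for xi, gi in zip(x, gradient)]
    total = sum(weights)
    return [w / total for w in weights]


x = [0.5, 0.5]          # hypothetical current mixed strategy
gradient = [1.0, 0.0]   # hypothetical utility gradient (first action looks better)
x_next = entropic_mirror_ascent_step(x, gradient, step=math.log(2.0))
```

With these inputs the probability of the first action grows from 1/2 to 2/3, and the result stays on the simplex without any explicit Euclidean projection.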