-
Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning
Authors:
Patrick Emami,
Xiangyu Zhang,
David Biagioni,
Ahmed S. Zamzam
Abstract:
In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define a periodic time encoding. In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment.
Submitted 17 July, 2023;
originally announced July 2023.
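The abstract above mentions a periodic time encoding built from the agents' timescales. A minimal sketch of that idea follows. It assumes each agent acts at an integer interval of environment steps and that the joint period is their least common multiple. The function names, the sine/cosine form, and the example intervals are illustrative assumptions, not the authors' implementation.

```python
import math
from functools import reduce


def joint_period(timescales):
    """Least common multiple of the agents' integer action intervals.

    Assumption: the joint behavior repeats once every agent has completed
    a whole number of its own cycles. Requires Python 3.9+ for math.lcm.
    """
    return reduce(math.lcm, timescales)


def periodic_time_encoding(t, period):
    """Map step t to a point on the unit circle that repeats every `period` steps."""
    phase = 2.0 * math.pi * (t % period) / period
    return (math.sin(phase), math.cos(phase))


# Hypothetical example: three agents acting every 1, 3, and 4 steps.
timescales = [1, 3, 4]
period = joint_period(timescales)
encoding = periodic_time_encoding(5, period)
```

Because the encoding depends only on `t % period`, a policy that receives it as an extra input sees identical time features at steps `t` and `t + period`.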
-
Interpreting Primal-Dual Algorithms for Constrained Multiagent Reinforcement Learning
Authors:
Daniel Tabas,
Ahmed S. Zamzam,
Baosen Zhang
Abstract:
Constrained multiagent reinforcement learning (C-MARL) is gaining importance as MARL algorithms find new applications in real-world systems ranging from energy systems to drone swarms. Most C-MARL algorithms use a primal-dual approach to enforce constraints through a penalty function added to the reward. In this paper, we study the structural effects of this penalty term on the MARL problem. First, we show that the standard practice of using the constraint function as the penalty leads to a weak notion of safety. However, by making simple modifications to the penalty term, we can enforce meaningful probabilistic (chance and conditional value at risk) constraints. Second, we quantify the effect of the penalty term on the value function, uncovering an improved value estimation procedure. We use these insights to propose a constrained multiagent advantage actor critic (C-MAA2C) algorithm. Simulations in a simple constrained multiagent environment affirm that our reinterpretation of the primal-dual method in terms of probabilistic constraints is effective, and that our proposed value estimate accelerates convergence to a safe joint policy.
Submitted 26 April, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
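The primal-dual approach described in the abstract above adds a penalty to the reward and adjusts a multiplier on the constraint. The sketch below shows only the standard practice the abstract refers to, where the constraint function itself is the penalty. It does not reproduce the modified penalties or the value estimate proposed in the paper, and all names and the projected dual-ascent step are illustrative assumptions.

```python
def penalized_reward(reward, cost, multiplier):
    """Reward seen by the learner: task reward minus the weighted constraint cost."""
    return reward - multiplier * cost


def dual_update(multiplier, mean_cost, limit, step_size):
    """Projected dual ascent: raise the multiplier when the average cost
    exceeds the limit, lower it otherwise, and never let it go negative."""
    return max(0.0, multiplier + step_size * (mean_cost - limit))


# Hypothetical numbers for a single update.
r = penalized_reward(reward=1.0, cost=0.5, multiplier=2.0)
lam = dual_update(multiplier=1.0, mean_cost=0.5, limit=0.25, step_size=2.0)
```

In a training loop, the policy update (primal step) would use `penalized_reward`, and `dual_update` would run between policy updates using the cost averaged over recent rollouts.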
-
PowerGridworld: A Framework for Multi-Agent Reinforcement Learning in Power Systems
Authors:
David Biagioni,
Xiangyu Zhang,
Dylan Wald,
Deepthi Vaidhynathan,
Rohit Chintala,
Jennifer King,
Ahmed S. Zamzam
Abstract:
We present the PowerGridworld software package to provide users with a lightweight, modular, and customizable framework for creating power-systems-focused, multi-agent Gym environments that readily integrate with existing training frameworks for reinforcement learning (RL). Although many frameworks exist for training multi-agent RL (MARL) policies, none can rapidly prototype and develop the environments themselves, especially in the context of heterogeneous (composite, multi-device) power systems where power flow solutions are required to define grid-level variables and costs. PowerGridworld is an open-source software package that helps to fill this gap. To highlight PowerGridworld's key features, we present two case studies and demonstrate learning MARL policies using both OpenAI's multi-agent deep deterministic policy gradient (MADDPG) and RLLib's proximal policy optimization (PPO) algorithms. In both cases, at least some subset of agents incorporates elements of the power flow solution at each time step as part of their reward (negative cost) structures.
Submitted 10 November, 2021;
originally announced November 2021.
-
OPF-Learn: An Open-Source Framework for Creating Representative AC Optimal Power Flow Datasets
Authors:
Trager Joswig-Jones,
Kyri Baker,
Ahmed S. Zamzam
Abstract:
Increasing levels of renewable generation motivate a growing interest in data-driven approaches for AC optimal power flow (AC OPF) to manage uncertainty; however, a lack of disciplined dataset creation and benchmarking prohibits useful comparison among approaches in the literature. To instill confidence, models must be able to reliably predict solutions across a wide range of operating conditions. This paper develops the OPF-Learn package for Julia and Python, which uses a computationally efficient approach to create representative datasets that span a wide spectrum of the AC OPF feasible region. Load profiles are uniformly sampled from a convex set that contains the AC OPF feasible set. For each infeasible point found, the convex set is reduced using infeasibility certificates, which are found by using properties of a relaxed formulation. The framework is shown to generate datasets that are more representative of the entire feasible space than those produced by traditional techniques seen in the literature, improving machine learning model performance.
Submitted 3 November, 2021; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Towards Quantifying the Carbon Emissions of Differentially Private Machine Learning
Authors:
Rakshit Naidu,
Harshita Diddee,
Ajinkya Mulay,
Aleti Vardhan,
Krithika Ramesh,
Ahmed Zamzam
Abstract:
In recent years, machine learning techniques utilizing large-scale datasets have achieved remarkable performance. Differential privacy, by means of adding noise, provides strong privacy guarantees for such learning algorithms. The cost of differential privacy is often a reduced model accuracy and a lowered convergence speed. This paper investigates the impact of differential privacy on learning algorithms in terms of their carbon footprint due to either longer run-times or failed experiments. Through extensive experiments, further guidance is provided on choosing the noise levels which can strike a balance between desired privacy levels and reduced carbon emissions.
Submitted 14 July, 2021;
originally announced July 2021.
-
PHASED: Phase-Aware Submodularity-Based Energy Disaggregation
Authors:
Faisal M. Almutairi,
Aritra Konar,
Ahmed S. Zamzam,
Nicholas D. Sidiropoulos
Abstract:
Energy disaggregation is the task of discerning the energy consumption of individual appliances from aggregated measurements, which holds promise for understanding and reducing energy usage. In this paper, we propose PHASED, an optimization approach for energy disaggregation that has two key features: PHASED (i) exploits the structure of power distribution systems to make use of readily available measurements that are neglected by existing methods, and (ii) poses the problem as a minimization of a difference of submodular functions. We leverage this form by applying a discrete optimization variant of the majorization-minimization algorithm to iteratively minimize a sequence of global upper bounds of the cost function to obtain high-quality approximate solutions. PHASED improves the disaggregation accuracy of state-of-the-art models by up to 61% and achieves better prediction on heavy load appliances.
Submitted 1 October, 2020;
originally announced October 2020.
-
Model-Free State Estimation Using Low-Rank Canonical Polyadic Decomposition
Authors:
Ahmed S. Zamzam,
Yajing Liu,
Andrey Bernstein
Abstract:
As electric grids experience high penetration levels of renewable generation, fundamental changes are required to address real-time situational awareness. This paper uses unique traits of tensors to devise a model-free situational awareness and energy forecasting framework for distribution networks. This work formulates the state of the network at multiple time instants as a three-way tensor; hence, recovering full state information of the network is tantamount to estimating all the values of the tensor. Given measurements received from $\mu$-phasor measurement units and/or smart meters, the recovery of unobserved quantities is carried out using the low-rank canonical polyadic decomposition of the state tensor; that is, the state estimation task is posed as a tensor imputation problem utilizing observed patterns in measured quantities. Two structured sampling schemes are considered: slab sampling and fiber sampling. For both schemes, we present sufficient conditions on the number of sampled slabs and fibers that guarantee identifiability of the factors of the state tensor. Numerical results demonstrate the ability of the proposed framework to achieve high estimation accuracy in multiple sampling scenarios.
Submitted 12 April, 2020;
originally announced April 2020.
-
GRATE: Granular Recovery of Aggregated Tensor Data by Example
Authors:
Ahmed S. Zamzam,
Bo Yang,
Nicholas D. Sidiropoulos
Abstract:
In this paper, we address the challenge of recovering an accurate breakdown of aggregated tensor data using disaggregation examples. This problem is motivated by several applications. For example, given the breakdown of energy consumption at some homes, how can we disaggregate the total energy consumed during the same period at other homes? In order to address this challenge, we propose GRATE, a principled method that turns the ill-posed task at hand into a constrained tensor factorization problem. Then, this optimization problem is tackled using an alternating least-squares algorithm. GRATE has the ability to handle exact aggregated data as well as inexact aggregation where some unobserved quantities contribute to the aggregated data. Special emphasis is given to the energy disaggregation problem where the goal is to provide energy breakdown for consumers from their monthly aggregated consumption. Experiments on two real datasets show the efficacy of GRATE in recovering more accurate disaggregation than state-of-the-art energy disaggregation methods.
Submitted 5 April, 2020; v1 submitted 27 March, 2020;
originally announced March 2020.
-
Learning-Accelerated ADMM for Distributed Optimal Power Flow
Authors:
David Biagioni,
Peter Graf,
Xiangyu Zhang,
Ahmed Zamzam,
Kyri Baker,
Jennifer King
Abstract:
We propose a novel data-driven method to accelerate the convergence of Alternating Direction Method of Multipliers (ADMM) for solving distributed DC optimal power flow (DC-OPF) where lines are shared between independent network partitions. Using previous observations of ADMM trajectories for a given system under varying load, the method trains a recurrent neural network (RNN) to predict the converged values of dual and consensus variables. Given a new realization of system load, a small number of initial ADMM iterations is taken as input to infer the converged values and directly inject them into the iteration. We empirically demonstrate that the online injection of these values into the ADMM iteration accelerates convergence by a significant factor for partitioned 14-, 118- and 2848-bus test systems under differing load scenarios. The proposed method has several advantages: it maintains the security of private decision variables inherent in consensus ADMM; inference is fast and so may be used in online settings; RNN-generated predictions can dramatically improve time to convergence but, by construction, can never result in infeasible ADMM subproblems; it can be easily integrated into existing software implementations. While we focus on the ADMM formulation of distributed DC-OPF in this paper, the ideas presented are naturally extended to other distributed optimization problems.
Submitted 15 September, 2020; v1 submitted 7 November, 2019;
originally announced November 2019.
-
Learning Optimal Solutions for Extremely Fast AC Optimal Power Flow
Authors:
Ahmed Zamzam,
Kyri Baker
Abstract:
In this paper, we develop an online method that leverages machine learning to obtain feasible solutions to the AC optimal power flow (OPF) problem with negligible optimality gaps on extremely fast timescales (e.g., milliseconds), bypassing solving an AC OPF altogether. This is motivated by the fact that as the power grid experiences increasing amounts of renewable power generation, controllable loads, and other inverter-interfaced devices, faster system dynamics and quicker fluctuations in the power supply are likely to occur. Currently, grid operators typically solve AC OPF every 15 minutes to determine economic generator settings while ensuring grid constraints are satisfied. Due to the computational challenges with solving this nonconvex problem, many efforts have focused on linearizing or approximating the problem in order to solve the AC OPF on faster timescales. However, many of these approximations can be fairly poor representations of the actual system state and still require solving an optimization problem, which can be time consuming for large networks. In this work, we leverage historical data to learn a mapping between the system loading and optimal generation values, enabling us to find near-optimal and feasible AC OPF solutions on extremely fast timescales without actually solving an optimization problem.
Submitted 27 September, 2019;
originally announced October 2019.
-
Energy Storage Management via Deep Q-Networks
Authors:
Ahmed S. Zamzam,
Bo Yang,
Nicholas D. Sidiropoulos
Abstract:
Energy storage devices represent environmentally friendly candidates to cope with volatile renewable energy generation. Motivated by the increase in privately owned storage systems, this paper studies the problem of real-time control of a storage unit co-located with a renewable energy generator and an inelastic load. Unlike many approaches in the literature, no distributional assumptions are made on the renewable energy generation or the real-time prices. Building on the deep Q-networks algorithm, a reinforcement learning approach utilizing a neural network is devised where the storage unit operational constraints are respected. The neural network approximates the action-value function, which dictates what action (charging, discharging, etc.) to take. Simulations indicate that near-optimal performance can be attained with the proposed learning-based control policy for the storage units.
Submitted 26 March, 2019;
originally announced March 2019.
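The abstract above states that the action-value function dictates the action while the storage unit's operational constraints are respected, but it does not say how. One common way to do this is to mask infeasible actions before taking the greedy choice. The sketch below illustrates that idea under assumed names, a three-action set, and a fixed charge/discharge amount per step; it is not the paper's method.

```python
def feasible_actions(soc, capacity, rate):
    """Actions that keep the state of charge within [0, capacity].

    `soc`, `capacity`, and `rate` (energy moved per step) are hypothetical
    parameters chosen for illustration.
    """
    actions = ["idle"]
    if soc + rate <= capacity:
        actions.append("charge")
    if soc - rate >= 0.0:
        actions.append("discharge")
    return actions


def greedy_action(q_values, soc, capacity, rate):
    """Pick the highest-valued action among those that are feasible."""
    allowed = feasible_actions(soc, capacity, rate)
    return max(allowed, key=lambda a: q_values[a])


# Stand-in for the network's output at one state (made-up values).
q = {"charge": 1.0, "idle": 0.2, "discharge": 0.5}
full_battery_choice = greedy_action(q, soc=10.0, capacity=10.0, rate=2.0)
half_battery_choice = greedy_action(q, soc=5.0, capacity=10.0, rate=2.0)
```

With a full battery, charging is masked out even though it has the largest value, so the next-best feasible action is selected instead.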
-
Coupled Graphs and Tensor Factorization for Recommender Systems and Community Detection
Authors:
Vassilis N. Ioannidis,
Ahmed S. Zamzam,
Georgios B. Giannakis,
Nicholas D. Sidiropoulos
Abstract:
Joint analysis of data from multiple information repositories facilitates uncovering the underlying structure in heterogeneous datasets. Single and coupled matrix-tensor factorization (CMTF) has been widely used in this context for imputation-based recommendation from ratings, social network, and other user-item data. When this side information is in the form of item-item correlation matrices or graphs, existing CMTF algorithms may fall short. Alleviating current limitations, we introduce a novel model coined coupled graph-tensor factorization (CGTF) that judiciously accounts for graph-related side information. The CGTF model has the potential to overcome practical challenges, such as missing slabs from the tensor and/or missing rows/columns from the correlation matrices. A novel alternating direction method of multipliers (ADMM) is also developed that recovers the nonnegative factors of CGTF. Our algorithm enjoys closed-form updates that result in reduced computational complexity and allow for convergence claims. A novel direction is further explored by employing the interpretable factors to detect graph communities having the tensor as side information. The resulting community detection approach is successful even when some links in the graphs are missing. Results with real data sets corroborate the merits of the proposed methods relative to state-of-the-art competing factorization techniques in providing recommendations and detecting communities.
Submitted 30 May, 2019; v1 submitted 21 September, 2018;
originally announced September 2018.