-
Inverse Delayed Reinforcement Learning
Authors:
Simon Sinong Zhan,
Qingyuan Wu,
Zhian Ruan,
Frank Yang,
Philip Wang,
Yixuan Wang,
Ruochen Jiao,
Chao Huang,
Qi Zhu
Abstract:
Inverse Reinforcement Learning (IRL) has demonstrated effectiveness in a variety of imitation tasks. In this paper, we introduce an IRL framework designed to extract rewarding features from expert trajectories affected by delayed disturbances. Instead of relying on direct observations, our approach employs an efficient off-policy adversarial training framework to derive expert features and recover optimal policies from augmented delayed observations. Empirical evaluations in the MuJoCo environment under diverse delay settings validate the effectiveness of our method. Furthermore, we provide a theoretical analysis showing that recovering expert policies from augmented delayed observations outperforms using direct delayed observations.
Submitted 3 December, 2024;
originally announced December 2024.
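The abstract above does not spell out how its "augmented delayed observations" are built. A common construction in the delayed-RL literature pairs the most recent delayed observation with the actions taken since it was generated. The sketch below illustrates that idea under the assumption of a constant, known delay. The class name `DelayAugmenter`, its methods, and the no-op padding convention are hypothetical and are not taken from the paper.

```python
from collections import deque


class DelayAugmenter:
    """Builds augmented states (s_{t-delay}, a_{t-delay}, ..., a_{t-1}).

    Hypothetical helper for illustration only; assumes a constant delay.
    """

    def __init__(self, delay, noop_action):
        if delay < 1:
            raise ValueError("delay must be at least 1")
        self.delay = delay
        self.noop_action = noop_action
        self._obs = deque()
        self._actions = deque()

    def reset(self, initial_obs):
        # Until real delayed observations arrive, the agent keeps seeing the
        # initial observation, and the action history is padded with no-ops.
        self._obs = deque([initial_obs] * (self.delay + 1), maxlen=self.delay + 1)
        self._actions = deque([self.noop_action] * self.delay, maxlen=self.delay)
        return self.augmented()

    def step(self, action, new_obs):
        # Record the action just taken and the (not yet visible) new observation.
        self._actions.append(action)
        self._obs.append(new_obs)
        return self.augmented()

    def augmented(self):
        # Oldest buffered observation plus every action taken since then.
        return (self._obs[0], tuple(self._actions))


if __name__ == "__main__":
    aug = DelayAugmenter(delay=2, noop_action=0)
    print(aug.reset("s0"))
    for t, action in enumerate([1, 2, 3], start=1):
        print(aug.step(action, f"s{t}"))
```

A policy trained on such tuples conditions on the pending actions as well as the stale observation, which is what distinguishes it from a policy that sees the delayed observation alone.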
-
Switching Controller Synthesis for Hybrid Systems Against STL Formulas
Authors:
Han Su,
Shenghua Feng,
Sinong Zhan,
Naijun Zhan
Abstract:
Switching controllers play a pivotal role in directing hybrid systems (HSs) towards the desired objective, embodying a "correct-by-construction" approach to HS design. Identifying these objectives is thus crucial for the synthesis of effective switching controllers. While most existing works focus on safety and liveness, few of them consider timing constraints. In this paper, we delve into the synthesis of switching controllers for HSs that meet system objectives given by a fragment of STL, which essentially corresponds to a reach-avoid problem with timing constraints. Our approach involves iteratively computing the state sets that can be driven to satisfy the reach-avoid specification with timing constraints. This technique supports the creation of switching controllers for both constant and non-constant HSs. We validate our method's soundness, and confirm its relative completeness for a certain subclass of HSs. Experimental results affirm the efficacy of our approach.
Submitted 24 June, 2024;
originally announced June 2024.
-
Distributed Online Feedback Optimization for Real-time Distribution System Voltage Regulation
Authors:
Sen Zhan,
Nikolaos G. Paterakis,
Wouter van den Akker,
Anne van der Molen,
Johan Morren,
J. G. Slootweg
Abstract:
We investigate the real-time voltage regulation problem in distribution systems employing online feedback optimization (OFO) with short-range communication between physical neighbours. OFO does not need an accurate grid model nor estimated consumption of non-controllable loads, affords fast calculations, and demonstrates robustness to uncertainties and disturbances, which render it particularly suitable for real-time distribution system applications. However, many OFO controllers require centralized communication, making them susceptible to single-point failures. This paper proposes a distributed OFO design based on a nested feedback optimization strategy and analyzes its convergence. Numerical study results demonstrate that the proposed design achieves effective voltage regulation and outperforms other distributed and local approaches.
Submitted 11 October, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Runtime Monitoring and Fault Detection for Neural Network-Controlled Systems
Authors:
Jianglin Lan,
Siyuan Zhan,
Ron Patton,
Xianxian Zhao
Abstract:
There is an emerging trend in applying deep learning methods to control complex nonlinear systems. This paper considers enhancing the runtime safety of nonlinear systems controlled by neural networks in the presence of disturbance and measurement noise. A robustly stable interval observer is designed to generate sound and precise lower and upper bounds for the neural network, nonlinear function, and system state. The obtained interval is utilised to monitor the real-time system safety and detect faults in the system outputs or actuators. An adaptive cruise control vehicular system is simulated to demonstrate the effectiveness of the proposed design.
Submitted 24 March, 2024;
originally announced March 2024.
-
Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays
Authors:
Qingyuan Wu,
Simon Sinong Zhan,
Yixuan Wang,
Yuhui Wang,
Chung-Wei Lin,
Chen Lv,
Qi Zhu,
Jürgen Schmidhuber,
Chao Huang
Abstract:
Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments. Specifically, AD-RL learns a value function for short delays and uses bootstrapping and policy improvement techniques to adjust it for long delays. We theoretically show that this can greatly reduce the sample complexity. On deterministic and stochastic benchmarks, our method significantly outperforms the SOTAs in both sample efficiency and policy performance. Code is available at https://github.com/QingyuanWuNothing/AD-RL.
Submitted 5 June, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Empowering Autonomous Driving with Large Language Models: A Safety Perspective
Authors:
Yixuan Wang,
Ruochen Jiao,
Sinong Simon Zhan,
Chengtian Lang,
Chao Huang,
Zhaoran Wang,
Zhuoran Yang,
Qi Zhu
Abstract:
Autonomous Driving (AD) encounters significant safety hurdles in long-tail unforeseen driving scenarios, largely stemming from the non-interpretability and poor generalization of the deep neural networks within the AD system, particularly in out-of-distribution and uncertain data. To this end, this paper explores the integration of Large Language Models (LLMs) into AD systems, leveraging their robust common-sense knowledge and reasoning abilities. The proposed methodologies employ LLMs as intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning, for enhancing driving performance and safety. We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine. Demonstrating superior performance and safety metrics compared to state-of-the-art approaches, our approach shows the promising potential for using LLMs for autonomous vehicles.
Submitted 22 March, 2024; v1 submitted 27 November, 2023;
originally announced December 2023.
-
State-Wise Safe Reinforcement Learning With Pixel Observations
Authors:
Simon Sinong Zhan,
Yixuan Wang,
Qingyuan Wu,
Ruochen Jiao,
Chao Huang,
Qi Zhu
Abstract:
In the context of safe exploration, Reinforcement Learning (RL) has long grappled with the challenges of balancing the tradeoff between maximizing rewards and minimizing safety violations, particularly in complex environments with contact-rich or non-smooth dynamics, and when dealing with high-dimensional pixel observations. Furthermore, incorporating state-wise safety constraints in the exploration and learning process, where the agent must avoid unsafe regions without prior knowledge, adds another layer of complexity. In this paper, we propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions through a newly introduced latent barrier-like function learning mechanism. As a joint learning framework, our approach begins by constructing a latent dynamics model with low-dimensional latent spaces derived from pixel observations. We then build and learn a latent barrier-like function on top of the latent dynamics and conduct policy optimization simultaneously, thereby improving both safety and the total expected return. Experimental evaluations on the safety-gym benchmark suite demonstrate that our proposed method significantly reduces safety violations throughout the training process, and demonstrates faster safety convergence compared to existing methods while achieving competitive results in reward return.
Submitted 11 December, 2023; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments
Authors:
Yixuan Wang,
Simon Sinong Zhan,
Ruochen Jiao,
Zhilu Wang,
Wanxin Jin,
Zhuoran Yang,
Zhaoran Wang,
Chao Huang,
Qi Zhu
Abstract:
It is quite challenging to ensure the safety of reinforcement learning (RL) agents in an unknown and stochastic environment under hard constraints that require the system state not to reach certain specified unsafe regions. Many popular safe RL methods such as those based on the Constrained Markov Decision Process (CMDP) paradigm formulate safety violations in a cost function and try to constrain the expectation of cumulative cost under a threshold. However, it is often difficult to effectively capture and enforce hard reachability-based safety constraints indirectly with such constraints on safety violation costs. In this work, we leverage the notion of barrier function to explicitly encode the hard safety constraints, and given that the environment is unknown, relax them to our design of generative-model-based soft barrier functions. Based on such soft barriers, we propose a safe RL approach that can jointly learn the environment and optimize the control policy, while effectively avoiding unsafe regions with safety probability optimization. Experiments on a set of examples demonstrate that our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
Submitted 13 June, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Joint Differentiable Optimization and Verification for Certified Reinforcement Learning
Authors:
Yixuan Wang,
Simon Zhan,
Zhilu Wang,
Chao Huang,
Zhaoran Wang,
Zhuoran Yang,
Qi Zhu
Abstract:
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller. However, as existing methods typically apply formal verification after the controller has been learned, it is sometimes difficult to obtain any certificate, even after many iterations between learning and verification. To address this challenge, we propose a framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is differentiable by the gradients from the value function and certificates. Experiments on a variety of examples demonstrate the significant advantages of our framework over the model-based stochastic value gradient (SVG) method and the model-free proximal policy optimization (PPO) method in finding feasible controllers with barrier functions and Lyapunov functions that ensure system safety and stability.
Submitted 21 March, 2023; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Distributionally Robust Chance-Constrained Flexibility Planning for Integrated Energy System
Authors:
Sen Zhan,
Peng Hou,
Guangya Yang
Abstract:
Inflexible combined heat and power (CHP) plants and uncertain wind power production result in excess power in distribution networks, which leads to inverse power flow that challenges grid operations. Power-to-X facilities such as electrolysers and electric boilers can offer extra flexibility to the integrated energy system. In this regard, we aim to jointly determine the optimal Power-to-X facility sizing and integrated energy system operations in this study. To account for wind power uncertainties, a distributionally robust chance-constrained model is developed to characterize wind power uncertainties using ambiguity sets. Linear decision rules are applied to analytically express real-time recourse actions when uncertainties are exposed, which allows the propagation of wind power uncertainties to gas and heat systems. Accordingly, the developed three-stage distributionally robust chance-constrained model is converted into a computationally tractable single-stage mixed-integer conic model. A case study validates the effectiveness of introducing the electrolyser and electric boiler into the integrated energy system, with respect to the decreased system cost, expanded CHP plant flexibility and reduced inverse power flow. The developed distributionally robust optimization model exhibits better effectiveness and robustness compared to a chance-constrained optimization model assuming wind forecast errors follow a Gaussian distribution. Detailed profit analysis reveals that although the overall system cost is minimized, the profit is distributed unevenly across various stakeholders in the system.
Submitted 17 May, 2021;
originally announced May 2021.
-
Optimal Real-time Coordination of Distributed Energy Resources in Low-voltage Grids
Authors:
Sen Zhan,
Johan Morren,
Wouter van den Akker,
Anne van der Molen,
Han Slootweg
Abstract:
This study proposes a real-time distributed energy resource (DER) coordination model that can exploit flexibility from the DERs to solve voltage and overloading issues using both active and reactive power. The model considers time-coupling devices including electric vehicles and heat pumps by deviating as little as possible from their original schedules while prioritizing DERs with the most urgent demand using dynamic cost terms. The model does not require a multi-period setting or a multi-period-ahead forecast, which enables the model to alleviate the computational difficulty and enhance its applicability for DSOs to manage the grids in real time. A case study using a Dutch low-voltage grid assuming a 100% penetration scenario of electric vehicles, heat pumps, and photovoltaics (PVs) in the households validates that the proposed model can resolve the network issues while not affecting user comfort.
Submitted 6 May, 2021; v1 submitted 4 May, 2021;
originally announced May 2021.
-
Technoeconomic Supplement of P2G Clusters with Hydrogen Pipeline for Coordinated Renewable Energy and HVDC Systems
Authors:
Jiarong Li,
Jin Lin,
Yonghua Song,
Jinyu Xiao,
Feng Liu,
Yuxuan Zhao,
Sen Zhan
Abstract:
Given the downward trend in renewable energy generator prices and the upward trend in hydrogen demand, this paper studies the technoeconomic supplement of P2G clusters with hydrogen pipeline for HVDC to jointly consume renewable energy. First, the planning and operation constraints of large-capacity P2G clusters are established. On this basis, the multistage coordinated planning model of renewable energy, HVDCs, P2Gs and hydrogen pipelines is proposed considering both variability and uncertainty, rendering a distributionally robust chance-constrained (DRCC) program. Then this model is applied in a case study based on the real Inner Mongolia-Shandong system. Compared with energy transmission via HVDC only, P2G can provide operation supplement with its operational flexibility and long-term economic supplement with increasing demand in the high-value transportation sector, which stimulates an extra 24 GW renewable energy exploration. Sensitivity analysis for both technical and economic factors further verifies the advantages of P2G in the presence of high variability due to renewable energy and the downward trend in renewable energy generator prices. However, since the additional levelized cost of the P2G (0.04 RMB/kWh) is approximately twice that of the HVDC (0.02 RMB/kWh), P2G is more sensitive to uncertainty from both renewable energy and hydrogen demand.
Submitted 1 February, 2021;
originally announced February 2021.
-
Ensemble emotion recognizing with multiple modal physiological signals
Authors:
Jing Zhang,
Yong Zhang,
Suhua Zhan,
Cheng Cheng
Abstract:
Physiological signals, which provide an objective representation of human affective states, are attracting increasing attention in the emotion recognition field. However, a single signal can hardly give a complete and accurate description of emotion. Models that fuse multiple physiological signals build a uniform classification model from the consistent and complementary information from different emotions to improve recognition performance. Existing fusion models usually rely on one particular classification method for recognition, which ignores the different distributions of the multiple signals. To address these problems, we propose an emotion classification model based on multiple modal physiological signals for different emotions. Features are extracted from EEG, EMG, and EOG signals to characterize the emotional state on valence and arousal levels. For characterization, four filtered bands (theta, beta, alpha, gamma) are adopted for signal preprocessing, and three Hjorth parameters are computed as features. To improve classification performance, an ensemble classifier is built. Experiments are conducted on the benchmark DEAP dataset. For the two-class task, the best result on arousal is 94.42% and the best result on valence is 94.02%. For the four-class task, the highest average classification accuracy is 90.74, and it shows good stability. The influence of different peripheral physiological signals on the results is also analyzed in this paper.
Submitted 1 January, 2020;
originally announced January 2020.
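The three Hjorth parameters mentioned in the abstract above have standard definitions: activity is the signal variance, mobility is the square root of the ratio between the variance of the first difference and the variance of the signal, and complexity is the mobility of the first difference divided by the mobility of the signal. The sketch below computes them for a single one-dimensional signal using finite differences. The function names are hypothetical, and the paper's band filtering and ensemble classifier are not reproduced here.

```python
import math
from statistics import pvariance


def first_difference(values):
    """Discrete approximation of the derivative: x[n+1] - x[n]."""
    return [b - a for a, b in zip(values, values[1:])]


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) for a 1-D sequence of floats."""
    d1 = first_difference(signal)
    d2 = first_difference(d1)

    var0 = pvariance(signal)  # variance of the signal
    var1 = pvariance(d1)      # variance of its first difference
    var2 = pvariance(d2)      # variance of its second difference

    activity = var0
    mobility = math.sqrt(var1 / var0)
    # Complexity compares the mobility of the derivative with that of the signal.
    complexity = math.sqrt(var2 / var1) / mobility
    return activity, mobility, complexity


if __name__ == "__main__":
    # Ten cycles of a unit sine wave sampled at 100 points per cycle.
    omega = 2 * math.pi / 100
    sine = [math.sin(omega * n) for n in range(1000)]
    print(hjorth_parameters(sine))
```

For a pure sinusoid the complexity is close to 1, which makes a sine wave a convenient sanity check before applying the function to band-filtered EEG, EMG, or EOG channels.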