Search | arXiv e-print repository

A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering

Authors: Qihan Qi, Xinsong Yang, Gang Xia, Daniel W. C. Ho, Pengyang Tang

Abstract: This paper proposes a safety modulator actor-critic (SMAC) method to address safety constraints and overestimation mitigation in model-free safe reinforcement learning (RL). A safety modulator is developed to satisfy safety constraints by modulating actions, allowing the policy to ignore the safety constraints and focus on maximizing reward. Additionally, a distributional critic with a theoretical update rule for SMAC is proposed to mitigate the overestimation of Q-values under safety constraints. Experiments on unmanned aerial vehicle (UAV) hovering, in both simulation and real-world scenarios, confirm that SMAC can effectively maintain safety constraints and outperform mainstream baseline algorithms.

Submitted 9 October, 2024; originally announced October 2024.

arXiv:2104.10637

doi 10.1088/1361-6420/ac23c3

Robust Kernel-based Distribution Regression

Authors: Zhan Yu, Daniel W. C. Ho, Ding-Xuan Zhou

Abstract: Regularization schemes for regression have been widely studied in learning theory and inverse problems. In this paper, we study distribution regression (DR), which involves two stages of sampling and aims at regressing from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS). Recently, theoretical analysis of DR has been carried out via kernel ridge regression, and several learning behaviors have been observed. However, the topic has not been explored and understood beyond least-squares-based DR. By introducing a robust loss function $l_σ$ for two-stage sampling problems, we present a novel robust distribution regression (RDR) scheme. With a windowing function $V$ and a scaling parameter $σ$ that can be appropriately chosen, $l_σ$ can include a wide range of commonly used loss functions that enrich the theme of DR. Moreover, the loss $l_σ$ is not necessarily convex, hence largely improving on the former regression class (least squares) in the DR literature. The learning rates under different regularity ranges of the regression function $f_ρ$ are comprehensively studied and derived via integral operator techniques. The scaling parameter $σ$ is shown to be crucial in providing robustness and satisfactory learning rates for RDR.

Submitted 21 April, 2021; originally announced April 2021.

Comments: 29 pages

arXiv:2006.09017

Estimates on Learning Rates for Multi-Penalty Distribution Regression

Authors: Zhan Yu, Daniel W. C. Ho

Abstract: This paper is concerned with functional learning by utilizing two-stage sampled distribution regression. We study a multi-penalty regularization algorithm for distribution regression under the framework of learning theory. The algorithm aims at regressing to real-valued outputs from probability measures. The theoretical analysis of distribution regression is far from mature and quite challenging, since only second-stage samples are observable in practical settings. In the algorithm, to transform information from samples, we embed the distributions into a reproducing kernel Hilbert space $\mathcal{H}_K$ associated with a Mercer kernel $K$ via the mean embedding technique. The main contribution of the paper is to present a novel multi-penalty regularization algorithm to capture more features of distribution regression and derive optimal learning rates for the algorithm. The work also derives learning rates for distribution regression in the nonstandard setting $f_ρ\notin\mathcal{H}_K$, which is not explored in the existing literature. Moreover, we propose a distribution regression-based distributed learning algorithm to handle large-scale data or information challenges. The optimal learning rates are derived for the distributed learning algorithm. By providing new algorithms and showing their learning rates, we improve on existing work in the literature in different aspects.

Submitted 28 November, 2023; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:1004.3617

Consensus over a Random Network Generated by i.i.d. Stochastic Matrices

Authors: Qingshuo Song, Guanrong Chen, Daniel W. C. Ho

Abstract: Our goal is to find a necessary and sufficient condition for consensus over a random network generated by i.i.d. stochastic matrices. We show that the consensus problems in three different convergence modes (almost surely, in probability, and in $L^1$) are equivalent, and thus have the same necessary and sufficient condition. We obtain the necessary and sufficient condition through the stability in a projected subspace.

Submitted 21 April, 2010; originally announced April 2010.

Showing 1–4 of 4 results for author: Ho, D W C