-
Convex Estimation of Sparse-Smooth Power Spectral Densities from Mixtures of Realizations with Application to Weather Radar
Authors:
Hiroki Kuroda,
Daichi Kitahara,
Eiichi Yoshikawa,
Hiroshi Kikuchi,
Tomoo Ushio
Abstract:
In this paper, we propose a convex optimization-based estimation of sparse and smooth power spectral densities (PSDs) of complex-valued random processes from mixtures of realizations. While the PSDs are related to the magnitude of the frequency components of the realizations, it has been a major challenge to exploit the smoothness of the PSDs, because penalizing the difference of the magnitude of the frequency components results in a nonconvex optimization problem that is difficult to solve. To address this challenge, we design a model that jointly estimates the complex-valued frequency components and the nonnegative PSDs, which are regularized to be sparse and sparse-smooth, respectively. By penalizing the difference of the nonnegative variable that estimates the PSDs, the proposed model can enhance the smoothness of the PSDs via convex optimization. Numerical experiments on the phased array weather radar, an advanced weather radar system, demonstrate that the proposed model achieves superior estimation accuracy compared to existing sparse estimation models, regardless of whether or not they are combined with a smoothing technique as a post-processing step.
Submitted 14 November, 2023; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Learning-based Bounded Synthesis for Semi-MDPs with LTL Specifications
Authors:
Ryohei Oura,
Toshimitsu Ushio
Abstract:
This letter proposes a learning-based bounded synthesis for a semi-Markov decision process (SMDP) with a linear temporal logic (LTL) specification. In the product of the SMDP and the deterministic $K$-co-Büchi automaton (d$K$cBA) converted from the LTL specification, we learn both the winning region for satisfying the LTL specification and the dynamics therein based on reinforcement learning and Bayesian inference. Then, we synthesize an optimal policy satisfying the following two conditions. (1) It maximizes the probability of reaching the winning region. (2) It minimizes a long-term risk for the dwell time within the winning region. The long-term risk is minimized based on the estimated dynamics and value iteration. We show that, if the discount factor is sufficiently close to one, the synthesized policy converges to the optimal policy as the amount of data obtained by exploration goes to infinity.
Submitted 9 April, 2022;
originally announced April 2022.
-
Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation
Authors:
Junya Ikemoto,
Toshimitsu Ushio
Abstract:
Deep reinforcement learning (DRL) has attracted much attention as an approach to solving optimal control problems without mathematical models of systems. In general, however, constraints may be imposed on optimal control problems. In this study, we consider optimal control problems with constraints for completing temporal control tasks. We describe the constraints using signal temporal logic (STL), which is useful for time-sensitive control tasks since it can specify continuous signals within bounded time intervals. To deal with the STL constraints, we introduce an extended constrained Markov decision process (CMDP), which is called a $τ$-CMDP. We formulate the STL-constrained optimal control problem as the $τ$-CMDP and propose a two-phase constrained DRL algorithm using the Lagrangian relaxation method. Through simulations, we also demonstrate the learning performance of the proposed algorithm.
Submitted 19 November, 2022; v1 submitted 20 January, 2022;
originally announced January 2022.
-
Deep Reinforcement Learning Based Networked Control with Network Delays for Signal Temporal Logic Specifications
Authors:
Junya Ikemoto,
Toshimitsu Ushio
Abstract:
We apply deep reinforcement learning (DRL) to the design of a networked controller with network delays to complete a temporal control task that is described by a signal temporal logic (STL) formula. STL is useful for dealing with specifications over bounded time intervals for dynamical systems. In general, an agent needs not only the current system state but also the past behavior of the system to determine a control action that satisfies the given STL formula. Additionally, we need to consider the effect of network delays on data transmissions. Thus, we propose an extended Markov decision process using past system states and control actions, which is called a $τd$-MDP, so that the agent can evaluate the satisfaction of the STL formula considering the network delays. Thereafter, we apply a DRL algorithm to design a networked controller using the $τd$-MDP. Through simulations, we also demonstrate the learning performance of the proposed algorithm.
Submitted 27 March, 2022; v1 submitted 3 August, 2021;
originally announced August 2021.
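The $τd$-MDP abstract above augments the state with past system states and control actions. A small buffer can illustrate this idea. It is a minimal sketch under several assumptions: fixed window lengths `tau` (past states) and `d` (past actions), and zero-padding before enough history exists. The class name `ExtendedStateBuffer` and its methods are hypothetical and are not taken from the paper.

```python
from collections import deque

import numpy as np


class ExtendedStateBuffer:
    """Hypothetical helper: stacks the last `tau` states and last `d` actions.

    Zero vectors pad the buffers before enough history is available
    (an assumption; the paper may initialize differently).
    """

    def __init__(self, state_dim, action_dim, tau, d):
        self.states = deque((np.zeros(state_dim) for _ in range(tau)), maxlen=tau)
        self.actions = deque((np.zeros(action_dim) for _ in range(d)), maxlen=d)

    def push_state(self, state):
        # Newest state goes to the right; the oldest one is dropped automatically.
        self.states.append(np.asarray(state, dtype=float))

    def push_action(self, action):
        self.actions.append(np.asarray(action, dtype=float))

    def extended_state(self):
        # Oldest-to-newest states, followed by oldest-to-newest actions.
        return np.concatenate(list(self.states) + list(self.actions))
```

In such a setup, the concatenated vector would be fed to the DRL agent in place of the raw current state.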
-
Collaborative rover-copter path planning and exploration with temporal logic specifications based on Bayesian update under uncertain environments
Authors:
Kazumune Hashimoto,
Natsuko Tsumagari,
Toshimitsu Ushio
Abstract:
This paper investigates collaborative rover-copter path planning and exploration with temporal logic specifications under uncertain environments. The objective of the rover is to complete a mission expressed by a syntactically co-safe linear temporal logic (scLTL) formula. The objective of the copter is to actively explore the environment and reduce its uncertainties, thereby assisting the rover and making mission completion more efficient. To formalize our approach, we first capture the environmental uncertainties by environmental beliefs of the atomic propositions, under the assumption that it is unknown which properties (or atomic propositions) are satisfied in each area of the environment. The environmental beliefs of the atomic propositions are updated according to Bayes' rule based on the Bernoulli-type sensor measurements provided by both the rover and the copter. Then, the optimal policy for the rover is synthesized by maximizing a belief of the satisfaction of the scLTL formula through automata-based model checking. An exploration policy for the copter is then synthesized by employing the notion of entropy. The entropy is evaluated based on the environmental beliefs of the atomic propositions and the path that the rover intends to follow according to the optimal policy. As such, the copter can actively explore regions whose uncertainties are high and that are relevant to the mission completion. Finally, some numerical examples illustrate the effectiveness of the proposed approach.
Submitted 20 July, 2021;
originally announced July 2021.
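The rover-copter abstract above updates beliefs of atomic propositions with Bayes' rule from Bernoulli-type sensor measurements and uses entropy to guide exploration. A minimal sketch of that machinery for a single proposition in a single cell follows. The detection and false-alarm probabilities are assumed known. The function names are hypothetical. `pick_target` is a deliberately simplified stand-in for the paper's exploration policy: it picks the most uncertain cell on the rover's intended path.

```python
import math


def bayes_update(belief, measurement, p_detect, p_false_alarm):
    """Posterior P(proposition true | measurement) for a Bernoulli sensor.

    p_detect:      P(sensor reports True | proposition true)   (assumed known)
    p_false_alarm: P(sensor reports True | proposition false)  (assumed known)
    """
    if measurement:
        like_true, like_false = p_detect, p_false_alarm
    else:
        like_true, like_false = 1.0 - p_detect, 1.0 - p_false_alarm
    numerator = like_true * belief
    return numerator / (numerator + like_false * (1.0 - belief))


def binary_entropy(b):
    """Entropy (in bits) of a belief b = P(proposition true)."""
    if b <= 0.0 or b >= 1.0:
        return 0.0
    return -(b * math.log2(b) + (1.0 - b) * math.log2(1.0 - b))


def pick_target(beliefs, rover_path):
    """Simplified rule: the most uncertain cell on the rover's intended path."""
    return max(rover_path, key=lambda cell: binary_entropy(beliefs[cell]))
```

For example, with a prior of 0.5, a detection probability of 0.9, and a false-alarm probability of 0.2, a positive reading raises the belief to 0.45 / 0.55 ≈ 0.82.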
-
Bounded Synthesis and Reinforcement Learning of Supervisors for Stochastic Discrete Event Systems with LTL Specifications
Authors:
Ryohei Oura,
Toshimitsu Ushio,
Ami Sakakibara
Abstract:
In this paper, we consider supervisory control of stochastic discrete event systems (SDESs) under linear temporal logic specifications. Applying bounded synthesis, we reduce the supervisor synthesis to a problem of satisfying a safety condition. First, we consider the synthesis problem of a directed controller using the safety condition. We assign a negative reward to the unsafe states and introduce an expected return with a state-dependent discount factor. We compute a winning region and a directed controller with the maximum satisfaction probability using a dynamic programming method, where the expected return is used as a value function. Next, we construct a permissive supervisor via the optimal value function. We show that the supervisor achieves the maximum satisfaction probability and maximizes the reachable set within the winning region. Finally, for an unknown SDES, we propose a two-stage model-free reinforcement learning method for efficient learning of the winning region and the directed controllers with the maximum satisfaction probability. We also demonstrate the effectiveness of the proposed method through simulation.
Submitted 9 April, 2022; v1 submitted 7 May, 2021;
originally announced May 2021.
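The abstract above reduces supervisor synthesis to a safety condition and computes a winning region by dynamic programming. The paper does this with a discounted return, a negative reward on unsafe states, and a state-dependent discount factor, none of which is reproduced here. The sketch below illustrates only the underlying safety computation. It runs standard value iteration for the maximal probability of never entering an unsafe state on a small, fully known MDP. The function names and the nested-dict layout of `P` are assumptions.

```python
def max_safety_probability(P, unsafe, tol=1e-10, max_iter=10_000):
    """Maximal probability of never entering `unsafe`, via value iteration.

    P maps state -> action -> {next_state: probability}. Every safe state is
    assumed to have at least one action; unsafe states may map to {}.
    """
    # Start from the optimistic value 1 on safe states; the iterates decrease
    # monotonically toward the greatest fixed point.
    V = {s: 0.0 if s in unsafe else 1.0 for s in P}
    for _ in range(max_iter):
        V_new = {
            s: 0.0 if s in unsafe else max(
                sum(p * V[t] for t, p in dist.items()) for dist in P[s].values()
            )
            for s in P
        }
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new
    return V


def winning_region(V, eps=1e-9):
    """States from which safety can be maintained with probability one."""
    return {s for s, v in V.items() if v >= 1.0 - eps}
```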
-
Continuous Deep Q-Learning with Simulator for Stabilization of Uncertain Discrete-Time Systems
Authors:
Junya Ikemoto,
Toshimitsu Ushio
Abstract:
Applications of reinforcement learning (RL) to stabilization problems of real systems are restricted since an agent needs many experiences to learn an optimal policy and may take dangerous actions during its exploration. If we know a mathematical model of a real system, a simulator is useful because it predicts the behavior of the real system using the mathematical model with a given system parameter vector. We can collect many experiences more efficiently than through interactions with the real system. However, it is difficult to identify the system parameter vector accurately. If there is an identification error, experiences obtained by the simulator may degrade the performance of the learned policy. Thus, we propose a practical RL algorithm that consists of two stages. At the first stage, we choose multiple system parameter vectors. Then, we have a mathematical model for each system parameter vector, which is called a virtual system. We obtain optimal Q-functions for multiple virtual systems using the continuous deep Q-learning algorithm. At the second stage, we represent a Q-function for the real system by a linear approximated function whose basis functions are the optimal Q-functions learned at the first stage. The agent learns the Q-function online through interactions with the real system. Through numerical simulations, we show the usefulness of our proposed method.
Submitted 19 April, 2021; v1 submitted 13 January, 2021;
originally announced January 2021.
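The second stage described above represents the real system's Q-function as a linear combination of Q-functions learned on virtual systems, $Q(s,a) \approx \sum_i w_i Q_i(s,a)$, and learns the weights online. The sketch below shows this under stated assumptions. It uses a semi-gradient Q-learning update for the weights. The maximization over next actions is taken over a finite candidate set, which is a simplification: continuous deep Q-learning instead gives the maximizer in closed form. The class name and the toy quadratic basis functions in the usage example are hypothetical.

```python
import numpy as np


class LinearQCombination:
    """Q(s, a) ~= sum_i w_i * Q_i(s, a) with fixed, pre-trained basis Q-functions."""

    def __init__(self, basis_qs, gamma=0.99, lr=0.01):
        self.basis_qs = list(basis_qs)
        self.w = np.full(len(self.basis_qs), 1.0 / len(self.basis_qs))
        self.gamma = gamma
        self.lr = lr

    def features(self, s, a):
        # The feature vector is simply the value of each basis Q-function.
        return np.array([q(s, a) for q in self.basis_qs])

    def q(self, s, a):
        return float(self.w @ self.features(s, a))

    def update(self, s, a, r, s_next, candidate_actions):
        # Semi-gradient Q-learning step on the weights only; the max over a
        # finite candidate set is a simplification for continuous actions.
        target = r + self.gamma * max(self.q(s_next, b) for b in candidate_actions)
        td_error = target - self.q(s, a)
        self.w += self.lr * td_error * self.features(s, a)
        return td_error
```

Keeping the basis Q-functions frozen means that only a handful of weights must be learned from real-system data. This matches the motivation of limiting interactions with the real system.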
-
Learning-based Symbolic Abstractions for Nonlinear Control Systems
Authors:
Kazumune Hashimoto,
Adnane Saoud,
Masako Kishida,
Toshimitsu Ushio,
Dimos Dimarogonas
Abstract:
Symbolic models or abstractions are known to be powerful tools for the control design of cyber-physical systems (CPSs) with logic specifications. In this paper, we investigate a novel learning-based approach to the construction of symbolic models for nonlinear control systems. In particular, the symbolic model is constructed based on two ingredients. The first is learning the un-modeled part of the dynamics from training data collected through state-space exploration. The second is the concept of an alternating simulation relation, which represents behavioral relationships with respect to the original control system. Moreover, we aim at achieving safe exploration, meaning that the trajectory of the system is guaranteed to remain in a safe region at all times while the training data are collected. In addition, we provide some techniques to reduce the computational load, in terms of memory and computation time, of constructing the symbolic models and synthesizing the safety controller, so as to make our approach practical. Finally, a numerical simulation illustrates the effectiveness of the proposed approach.
Submitted 3 August, 2022; v1 submitted 4 April, 2020;
originally announced April 2020.
-
On-Line Synthesis of Permissive Supervisors for Partially Observed Discrete Event Systems under scLTL Constraints
Authors:
Ami Sakakibara,
Toshimitsu Ushio
Abstract:
We consider a supervisory control problem of a discrete event system (DES) under partial observation, where a control specification is given by a fragment of linear temporal logic. We design an on-line supervisor that dynamically computes its control action with the complete information of the product automaton of the DES and an acceptor for the specification. The concepts of controllability and observability are defined by means of a ranking function on the product automaton, whose value decreases as an accepting state of the product automaton is approached. The proposed on-line control scheme leverages the ranking function and a permissiveness function, which represents a time-varying permissiveness level. As a result, if the product automaton is controllable and observable, the on-line supervisor achieves the specification while managing the tradeoff between its permissiveness and acceptance of the specification.
Submitted 27 March, 2020;
originally announced March 2020.
-
On-Line Permissive Supervisory Control of Discrete Event Systems for scLTL Specifications
Authors:
Ami Sakakibara,
Toshimitsu Ushio
Abstract:
We propose an on-line supervisory control scheme for discrete event systems (DESs), where a control specification is described by a fragment of linear temporal logic. On the product automaton of the DES and an acceptor for the specification, we define a ranking function that returns, for each state, the minimum number of steps required to reach an accepting state. In addition, we introduce a permissiveness function that indicates a time-varying permissiveness level. At each step of the on-line control scheme, the supervisor refers to the permissiveness function as well as the ranking function in order to guarantee the control specification while handling the tradeoff between its permissiveness and acceptance of the specification. The proposed scheme is demonstrated on a surveillance problem for a mobile robot.
Submitted 26 March, 2020;
originally announced March 2020.
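The ranking function described above returns the minimum number of steps from each state of the product automaton to an accepting state. On a finite automaton, this can be sketched as a backward breadth-first search from the accepting states. The sketch assumes a deterministic transition map and treats every event as controllable. A real supervisor must also account for uncontrollable events, which this sketch ignores. The `enabled_events` rule, with its integer `slack` standing in for the permissiveness level, is purely hypothetical and is not the paper's control law.

```python
import math
from collections import deque


def ranking_function(transitions, accepting):
    """Minimum number of steps from each state to an accepting state.

    transitions: dict state -> dict event -> next state (deterministic).
    accepting: set of accepting states. Unreachable states get math.inf.
    """
    # Reverse adjacency, so that we can search backward from accepting states.
    predecessors = {}
    for q, succ in transitions.items():
        for q_next in succ.values():
            predecessors.setdefault(q_next, set()).add(q)
    states = set(transitions) | set(predecessors) | set(accepting)
    rank = {q: math.inf for q in states}
    frontier = deque()
    for q in accepting:
        rank[q] = 0
        frontier.append(q)
    while frontier:
        q = frontier.popleft()
        for p in predecessors.get(q, ()):
            if rank[p] == math.inf:  # first visit gives the shortest distance
                rank[p] = rank[q] + 1
                frontier.append(p)
    return rank


def enabled_events(state, transitions, rank, slack):
    """Hypothetical rule: enable events whose successor rank <= rank - 1 + slack."""
    return {e for e, q_next in transitions.get(state, {}).items()
            if rank[q_next] <= rank[state] - 1 + slack}
```

With `slack=0`, only events that make strict progress toward acceptance are enabled. Larger values trade progress for permissiveness.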
-
Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Generalized Büchi Automata
Authors:
Ryohei Oura,
Ami Sakakibara,
Toshimitsu Ushio
Abstract:
This letter proposes a novel reinforcement learning method for the synthesis of a control policy satisfying a control specification described by a linear temporal logic formula. We assume that the controlled system is modeled by a Markov decision process (MDP). We convert the specification to a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets that accepts all infinite sequences satisfying the formula. The LDGBA is augmented so that it explicitly records the previous visits to accepting sets. We take a product of the augmented LDGBA and the MDP, based on which we define a reward function. The agent gets rewards whenever state transitions are in an accepting set that has not been visited for a certain number of steps. Consequently, sparsity of rewards is relaxed and optimal circulations among the accepting sets are learned. We show that the proposed method can learn an optimal policy when the discount factor is sufficiently close to one.
Submitted 26 March, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
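The abstract above augments the LDGBA so that it records previous visits to accepting sets and rewards transitions into accepting sets that have not yet been visited. A simplified sketch of such a visit-memory vector follows. It assumes transition-based acceptance, one binary flag per accepting set, and a reset once every set has been visited. The paper's exact augmentation and its "certain number of steps" criterion are not reproduced, and the function name `step_memory` is hypothetical.

```python
def step_memory(visited, accepting_sets, transition, reward_value=1.0):
    """Update the visit-memory vector after one automaton transition.

    visited: tuple of bools, one flag per accepting set.
    accepting_sets: list of sets of transitions (transition-based acceptance).
    Returns (new_visited, reward).
    """
    new_visited = list(visited)
    reward = 0.0
    for j, acc in enumerate(accepting_sets):
        # Reward only the first visit to an accepting set since the last reset.
        if transition in acc and not new_visited[j]:
            new_visited[j] = True
            reward = reward_value
    if all(new_visited):
        # Every accepting set has been visited: start a new round.
        new_visited = [False] * len(new_visited)
    return tuple(new_visited), reward
```

Because repeated visits to the same accepting set earn nothing until the others are visited, the agent is pushed to circulate among all accepting sets.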
-
Control of Timed Discrete Event Systems with Ticked Linear Temporal Logic Constraints
Authors:
Takuma Kinugawa,
Kazumune Hashimoto,
Toshimitsu Ushio
Abstract:
This paper presents a novel method of synthesizing a fragment of a timed discrete event system (TDES), introducing a novel linear temporal logic (LTL), called ticked LTL$_f$. The ticked LTL$_f$ is given as an extension to LTL$_f$, where the semantics is defined over a finite execution fragment. Unlike the standard LTL$_f$, the formula is defined as a variant of a metric temporal logic formula, where the temporal properties are described by counting the number of ticks in the fragment of the TDES. Moreover, we provide a scheme that encodes the problem as one that can be solved by integer linear programming (ILP). The effectiveness of the proposed approach is illustrated through a numerical path-planning example.
Submitted 5 December, 2019;
originally announced December 2019.
-
Learning self-triggered controllers with Gaussian processes
Authors:
Kazumune Hashimoto,
Yuichi Yoshimura,
Toshimitsu Ushio
Abstract:
This paper investigates the design of self-triggered controllers for networked control systems (NCSs), where the dynamics of the plant are unknown a priori. To deal with the unknown transition dynamics, we employ Gaussian process (GP) regression to learn the dynamics of the plant. To design the self-triggered controller, we formulate an optimal control problem such that the optimal control and communication policies can be jointly designed based on the GP model of the plant. Moreover, we provide an overall implementation algorithm that jointly learns the dynamics of the plant and the self-triggered controller based on a reinforcement learning framework. Finally, a numerical simulation illustrates the effectiveness of the proposed approach.
Submitted 8 March, 2020; v1 submitted 31 August, 2019;
originally announced September 2019.
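The abstract above learns unknown plant dynamics with GP regression. A minimal sketch of zero-mean GP regression for a one-step model $x_{k+1} = f(x_k, u_k)$ follows. The squared-exponential kernel, the fixed hyperparameters, and the toy linear dynamics used in the test are all assumptions. The self-triggered control and communication design from the paper is not shown.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    # Squared Euclidean distances between all rows of A and all rows of B.
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale ** 2)


def gp_posterior(Z_train, y_train, Z_test, noise_var=1e-4, length_scale=1.0, signal_var=1.0):
    """Posterior mean and variance of a zero-mean GP at the rows of Z_test.

    Each row of Z is a (state, input) pair; y holds the observed next states.
    """
    K = rbf_kernel(Z_train, Z_train, length_scale, signal_var) + noise_var * np.eye(len(Z_train))
    K_s = rbf_kernel(Z_train, Z_test, length_scale, signal_var)
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v ** 2, axis=0)  # prior diagonal of the RBF kernel
    return mean, np.maximum(var, 0.0)
```

The posterior variance is small near the training data and returns to the prior variance far away. An uncertainty estimate of this kind is what makes a GP model useful when deciding how long the plant can run between transmissions.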
-
Networked Control of Nonlinear Systems under Partial Observation Using Continuous Deep Q-Learning
Authors:
Junya Ikemoto,
Toshimitsu Ushio
Abstract:
In this paper, we propose a design of a model-free networked controller for a nonlinear plant whose mathematical model is unknown. In a networked control system, the controller and plant are located away from each other and exchange data over a network. This causes network delays that may fluctuate randomly due to network routing. We therefore assume that the current network delay is not known but the maximum value of the fluctuating network delays is known beforehand. Moreover, we assume that the sensor cannot observe all state variables of the plant. Under these assumptions, we apply continuous deep Q-learning to the design of the networked controller. Then, we introduce an extended state consisting of a sequence of past control inputs and outputs as inputs to the deep neural network. Simulations show that, using the extended state, the controller can learn a control policy that is robust to the fluctuation of the network delays under partial observation.
Submitted 29 August, 2019; v1 submitted 28 August, 2019;
originally announced August 2019.
-
Model-free Control of Chaos with Continuous Deep Q-learning
Authors:
Junya Ikemoto,
Toshimitsu Ushio
Abstract:
The OGY method is one of the control methods for chaotic systems. In this method, we have to calculate the periodic orbit to be stabilized, which is embedded in the chaotic attractor. Thus, we cannot use this method when a precise mathematical model of the chaotic system cannot be identified. In this case, the delayed feedback control proposed by Pyragas is useful. However, even in delayed feedback control, we need the mathematical model to determine a feedback gain that stabilizes the periodic orbit. To overcome this problem, we apply a model-free reinforcement learning algorithm to the design of a controller for the chaotic system. In recent years, model-free reinforcement learning algorithms with deep neural networks have attracted much attention. Those algorithms make it possible to control complex systems. However, it is known that model-free reinforcement learning algorithms are not efficient because learners must explore their control policies over the entire state space. Moreover, model-free reinforcement learning algorithms with deep neural networks have the disadvantage of taking much time to learn optimal control policies. Thus, we propose a data-based control policy consisting of two steps. We first determine a region including the periodic orbit to be stabilized, and then make the controller learn an optimal control policy for its stabilization. In the proposed method, the controller efficiently explores its control policy only within this region.
Submitted 24 August, 2019; v1 submitted 16 July, 2019;
originally announced July 2019.
-
Output Feedback Controller Design with Symbolic Observers for Cyber-physical Systems
Authors:
Masashi Mizoguchi,
Toshimitsu Ushio
Abstract:
In this paper, we design a symbolic output feedback controller for a cyber-physical system (CPS). The physical plant is modeled by an infinite transition system. We consider the situation in which a finite abstracted system of the physical plant, called a c-abstracted system, is given. There exists an approximate alternating simulation relation from the c-abstracted system to the physical plant. A desired behavior of the c-abstracted system is also given, and we have a symbolic state feedback controller of the physical plant. We consider the case where some states of the plant are not measured. Then, to estimate the states with abstracted outputs measured by sensors, we introduce a finite abstracted system of the physical plant, called an o-abstracted system, such that there exists an approximate simulation relation. The relation guarantees that an observer designed based on the state of the o-abstracted system estimates the current state of the plant. We construct a symbolic output feedback controller by composing these systems. Using a relation-based approach, we prove that the controlled system approximately exhibits the desired behavior.
Submitted 15 December, 2016;
originally announced December 2016.