-
Line of Sight Curvature for Missile Guidance using Reinforcement Meta-Learning
Authors:
Brian Gaudet,
Roberto Furfaro
Abstract:
We use reinforcement meta learning to optimize a line of sight curvature policy that increases the effectiveness of a guidance system against maneuvering targets. The policy is implemented as a recurrent neural network that maps navigation system outputs to a Euler 321 attitude representation. The attitude representation is then used to construct a direction cosine matrix that biases the observed…
▽ More
We use reinforcement meta learning to optimize a line of sight curvature policy that increases the effectiveness of a guidance system against maneuvering targets. The policy is implemented as a recurrent neural network that maps navigation system outputs to a Euler 321 attitude representation. The attitude representation is then used to construct a direction cosine matrix that biases the observed line of sight vector. The line of sight rotation rate derived from the biased line of sight is then mapped to a commanded acceleration by the guidance system. By varying the bias as a function of navigation system outputs, the policy enhances accuracy against highly maneuvering targets. Importantly, our method does not require an estimate of target acceleration. In our experiments, we demonstrate that when our method is combined with proportional navigation, the system significantly outperforms augmented proportional navigation with perfect knowledge of target acceleration, achieving improved accuracy with less control effort against a wide range of target maneuvers.
△ Less
Submitted 29 April, 2022;
originally announced May 2022.
-
Integrated Guidance and Control for Lunar Landing using a Stabilized Seeker
Authors:
Brian Gaudet,
Roberto Furfaro
Abstract:
We develop an integrated guidance and control system that in conjunction with a stabilized seeker and landing site detection software can achieve precise and safe planetary landing. The seeker tracks the designated landing site by adjusting seeker elevation and azimuth angles to center the designated landing site in the sensor field of view. The seeker angles, closing speed, and range to the desig…
▽ More
We develop an integrated guidance and control system that in conjunction with a stabilized seeker and landing site detection software can achieve precise and safe planetary landing. The seeker tracks the designated landing site by adjusting seeker elevation and azimuth angles to center the designated landing site in the sensor field of view. The seeker angles, closing speed, and range to the designated landing site are used to formulate a velocity field that is used by the guidance and control system to achieve a safe landing at the designated landing site. The guidance and control system maps this velocity field, attitude, and rotational velocity directly to a commanded thrust vector for the lander's four engines. The guidance and control system is implemented as a policy optimized using reinforcement meta learning. We demonstrate that the guidance and control system is compatible with multiple diverts during the powered descent phase, and is robust to seeker lag, actuator lag and degradation, and center of mass variation induced by fuel consumption. We outline several concepts of operations, including an approach using a preplaced landing beacon.
△ Less
Submitted 15 December, 2021;
originally announced December 2021.
-
Terminal Adaptive Guidance for Autonomous Hypersonic Strike Weapons via Reinforcement Learning
Authors:
Brian Gaudet,
Roberto Furfaro
Abstract:
An adaptive guidance system suitable for the terminal phase trajectory of a hypersonic strike weapon is optimized using reinforcement meta learning. The guidance system maps observations directly to commanded bank angle, angle of attack, and sideslip angle rates. Importantly, the observations are directly measurable from radar seeker outputs with minimal processing. The optimization framework impl…
▽ More
An adaptive guidance system suitable for the terminal phase trajectory of a hypersonic strike weapon is optimized using reinforcement meta learning. The guidance system maps observations directly to commanded bank angle, angle of attack, and sideslip angle rates. Importantly, the observations are directly measurable from radar seeker outputs with minimal processing. The optimization framework implements a shaping reward that minimizes the line of sight rotation rate, with a terminal reward given if the agent satisfies path constraints and meets terminal accuracy and speed criteria. We show that the guidance system can adapt to off-nominal flight conditions including perturbation of aerodynamic coefficient parameters, actuator failure scenarios, sensor scale factor errors, and actuator lag, while satisfying heating rate, dynamic pressure, and load path constraints, as well as a minimum impact speed constraint. We demonstrate precision strike capability against a maneuvering ground target and the ability to divert to a new target, the latter being important to maximize strike effectiveness for a group of hypersonic strike weapons. Moreover, we demonstrate a threat evasion strategy against interceptors with limited midcourse correction capability, where the hypersonic strike weapon implements multiple diverts to alternate targets, with the last divert to the actual target. Finally, we include preliminary results for an integrated guidance and control system in a six degrees-of-freedom environment.
△ Less
Submitted 16 October, 2021; v1 submitted 1 October, 2021;
originally announced October 2021.
-
Integrated and Adaptive Guidance and Control for Endoatmospheric Missiles via Reinforcement Learning
Authors:
Brian Gaudet,
Roberto Furfaro
Abstract:
We apply a reinforcement meta-learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile. The system is implemented as a policy that maps navigation system outputs directly to commanded rates of change for the missile's control surface deflections. The system induces intercept trajectories against a maneuvering target that satisfy control…
▽ More
We apply a reinforcement meta-learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile. The system is implemented as a policy that maps navigation system outputs directly to commanded rates of change for the missile's control surface deflections. The system induces intercept trajectories against a maneuvering target that satisfy control constraints on fin deflection angles, and path constraints on look angle and load. We test the optimized system in a six degrees-of-freedom simulator that includes a non-linear radome model and a strapdown seeker model, and demonstrate that the system adapts to both a large flight envelope and off-nominal flight conditions including perturbation of aerodynamic coefficient parameters and center of pressure locations, and flexible body dynamics. Moreover, we find that the system is robust to the parasitic attitude loop induced by radome refraction and imperfect seeker stabilization. We compare our system's performance to a longitudinal model of proportional navigation coupled with a three loop autopilot, and find that our system outperforms this benchmark by a large margin. Additional experiments investigate the impact of removing the recurrent layer from the policy and value function networks, performance with an infrared seeker, and flexible body dynamics.
△ Less
Submitted 3 May, 2022; v1 submitted 8 September, 2021;
originally announced September 2021.
-
Reinforcement Meta-Learning for Interception of Maneuvering Exoatmospheric Targets with Parasitic Attitude Loop
Authors:
Brian Gaudet,
Roberto Furfaro,
Richard Linares,
Andrea Scorsoglio
Abstract:
We use Reinforcement Meta-Learning to optimize an adaptive integrated guidance, navigation, and control system suitable for exoatmospheric interception of a maneuvering target. The system maps observations consisting of strapdown seeker angles and rate gyro measurements directly to thruster on-off commands. Using a high fidelity six degree-of-freedom simulator, we demonstrate that the optimized po…
▽ More
We use Reinforcement Meta-Learning to optimize an adaptive integrated guidance, navigation, and control system suitable for exoatmospheric interception of a maneuvering target. The system maps observations consisting of strapdown seeker angles and rate gyro measurements directly to thruster on-off commands. Using a high fidelity six degree-of-freedom simulator, we demonstrate that the optimized policy can adapt to parasitic effects including seeker angle measurement lag, thruster control lag, the parasitic attitude loop resulting from scale factor errors and Gaussian noise on angle and rotational velocity measurements, and a time varying center of mass caused by fuel consumption and slosh. Importantly, the optimized policy gives good performance over a wide range of challenging target maneuvers. Unlike previous work that enhances range observability by inducing line of sight oscillations, our system is optimized to use only measurements available from the seeker and rate gyros. Through extensive Monte Carlo simulation of randomized exoatmospheric interception scenarios, we demonstrate that the optimized policy gives performance close to that of augmented proportional navigation with perfect knowledge of the full engagement state. The optimized system is computationally efficient and requires minimal memory, and should be compatible with today's flight processors.
△ Less
Submitted 18 April, 2020;
originally announced April 2020.
-
Adaptive Generalized ZEM-ZEV Feedback Guidance for Planetary Landing via a Deep Reinforcement Learning Approach
Authors:
Roberto Furfaro,
Andrea Scorsoglio,
Richard Linares,
Mauro Massari
Abstract:
Precision landing on large and small planetary bodies is a technology of utmost importance for future human and robotic exploration of the solar system. In this context, the Zero-Effort-Miss/Zero-Effort-Velocity (ZEM/ZEV) feedback guidance algorithm has been studied extensively and is still a field of active research. The algorithm, although powerful in terms of accuracy and ease of implementation…
▽ More
Precision landing on large and small planetary bodies is a technology of utmost importance for future human and robotic exploration of the solar system. In this context, the Zero-Effort-Miss/Zero-Effort-Velocity (ZEM/ZEV) feedback guidance algorithm has been studied extensively and is still a field of active research. The algorithm, although powerful in terms of accuracy and ease of implementation, has some limitations. Therefore with this paper we present an adaptive guidance algorithm based on classical ZEM/ZEV in which machine learning is used to overcome its limitations and create a closed loop guidance algorithm that is sufficiently lightweight to be implemented on board spacecraft and flexible enough to be able to adapt to the given constraint scenario. The adopted methodology is an actor-critic reinforcement learning algorithm that learns the parameters of the above-mentioned guidance architecture according to the given problem constraints.
△ Less
Submitted 4 March, 2020;
originally announced March 2020.
-
Fuel-Efficient Powered Descent Guidance on Large Planetary Bodies via Theory of Functional Connections
Authors:
Hunter Johnston,
Enrico Schiassi,
Roberto Furfaro,
Daniele Mortari
Abstract:
In this paper we present a new approach to solve the fuel-efficient powered descent guidance problem on large planetary bodies with no atmosphere (e.g. the Moon or Mars) using the recently developed Theory of Functional Connections. The problem is formulated using the indirect method which casts the optimal guidance problem as a system of nonlinear two-point boundary value problems. Using the Theo…
▽ More
In this paper we present a new approach to solve the fuel-efficient powered descent guidance problem on large planetary bodies with no atmosphere (e.g. the Moon or Mars) using the recently developed Theory of Functional Connections. The problem is formulated using the indirect method which casts the optimal guidance problem as a system of nonlinear two-point boundary value problems. Using the Theory of Functional Connections, the problem constraints are analytically embedded into a "constrained expression," which maintains a free-function that is expanded using orthogonal polynomials with unknown coefficients. The constraints are satisfied regardless of the values of the unknown coefficients which convert the two-point boundary value problem into an unconstrained optimization problem. This process casts the solution into the admissible subspace of the problem and therefore simple numerical techniques can be used (i.e. in this paper a nonlinear least-squares method is used). In addition to the derivation of this technique, the method is validated in two scenarios and the results are compared to those obtained by the general purpose optimal control software, GPOPS-II. In general, the proposed technique produces solutions of $\mathcal{O}(10^{-10})$. Additionally, for the proposed test cases, it is reported that each individual TFC-based inner-loop iteration converges within 6 iterations, each iteration exhibiting a computational time between 72 and 81 milliseconds within the MATLAB legacy implementation. Consequently, the proposed methodology is potentially suitable for on-board generation of optimal trajectories in real-time.
△ Less
Submitted 10 January, 2020;
originally announced January 2020.
-
Six Degree-of-Freedom Body-Fixed Hovering over Unmapped Asteroids via LIDAR Altimetry and Reinforcement Meta-Learning
Authors:
Brian Gaudet,
Richard Linares,
Roberto Furfaro
Abstract:
We optimize a six degrees of freedom hovering policy using reinforcement meta-learning. The policy maps flash LIDAR measurements directly to on/off spacecraft body-frame thrust commands, allowing hovering at a fixed position and attitude in the asteroid body-fixed reference frame. Importantly, the policy does not require position and velocity estimates, and can operate in environments with unknown…
▽ More
We optimize a six degrees of freedom hovering policy using reinforcement meta-learning. The policy maps flash LIDAR measurements directly to on/off spacecraft body-frame thrust commands, allowing hovering at a fixed position and attitude in the asteroid body-fixed reference frame. Importantly, the policy does not require position and velocity estimates, and can operate in environments with unknown dynamics, and without an asteroid shape model or navigation aids. Indeed, during optimization the agent is confronted with a new randomly generated asteroid for each episode, insuring that it does not learn an asteroid's shape, texture, or environmental dynamics. This allows the deployed policy to generalize well to novel asteroid characteristics, which we demonstrate in our experiments. Moreover, our experiments show that the optimized policy adapts to actuator failure and sensor noise. Although the policy is optimized using randomly generated synthetic asteroids, it is tested on two shape models from actual asteroids: Bennu and Itokawa. We find that the policy generalizes well to these shape models. The hovering controller has the potential to simplify mission planning by allowing asteroid body-fixed hovering immediately upon the spacecraft's arrival to an asteroid. This in turn simplifies shape model generation and allows resource mapping via remote sensing immediately upon arrival at the target asteroid.
△ Less
Submitted 8 February, 2020; v1 submitted 15 November, 2019;
originally announced November 2019.
-
Space Objects Maneuvering Prediction via Maximum Causal Entropy Inverse Reinforcement Learning
Authors:
Bryce Doerr,
Richard Linares,
Roberto Furfaro
Abstract:
Inverse Reinforcement Learning (RL) can be used to determine the behavior of Space Objects (SOs) by estimating the reward function that an SO is using for control. The approach discussed in this work can be used to analyze maneuvering of SOs from observational data. The inverse RL problem is solved using maximum causal entropy. This approach determines the optimal reward function that a SO is usin…
▽ More
Inverse Reinforcement Learning (RL) can be used to determine the behavior of Space Objects (SOs) by estimating the reward function that an SO is using for control. The approach discussed in this work can be used to analyze maneuvering of SOs from observational data. The inverse RL problem is solved using maximum causal entropy. This approach determines the optimal reward function that a SO is using while maneuvering with random disturbances by assuming that the observed trajectories are optimal with respect to the SO's own reward function. Lastly, this paper develops results for scenarios involving Low Earth Orbit (LEO) station-keeping and Geostationary Orbit (GEO) station-keeping.
△ Less
Submitted 6 December, 2019; v1 submitted 1 November, 2019;
originally announced November 2019.
-
Seeker based Adaptive Guidance via Reinforcement Meta-Learning Applied to Asteroid Close Proximity Operations
Authors:
Brian Gaudet,
Richard Linares,
Roberto Furfaro
Abstract:
Current practice for asteroid close proximity maneuvers requires extremely accurate characterization of the environmental dynamics and precise spacecraft positioning prior to the maneuver. This creates a delay of several months between the spacecraft's arrival and the ability to safely complete close proximity maneuvers. In this work we develop an adaptive integrated guidance, navigation, and cont…
▽ More
Current practice for asteroid close proximity maneuvers requires extremely accurate characterization of the environmental dynamics and precise spacecraft positioning prior to the maneuver. This creates a delay of several months between the spacecraft's arrival and the ability to safely complete close proximity maneuvers. In this work we develop an adaptive integrated guidance, navigation, and control system that can complete these maneuvers in environments with unknown dynamics, with initial conditions spanning a large deployment region, and without a shape model of the asteroid. The system is implemented as a policy optimized using reinforcement meta-learning. The spacecraft is equipped with an optical seeker that locks to either a terrain feature, back-scattered light from a targeting laser, or an active beacon, and the policy maps observations consisting of seeker angles and LIDAR range readings directly to engine thrust commands. The policy implements a recurrent network layer that allows the deployed policy to adapt real time to both environmental forces acting on the agent and internal disturbances such as actuator failure and center of mass variation. We validate the guidance system through simulated landing maneuvers in a six degrees-of-freedom simulator. The simulator randomizes the asteroid's characteristics such as solar radiation pressure, density, spin rate, and nutation angle, requiring the guidance and control system to adapt to the environment. We also demonstrate robustness to actuator failure, sensor bias, and changes in the spacecraft's center of mass and inertia tensor. Finally, we suggest a concept of operations for asteroid close proximity maneuvers that is compatible with the guidance system.
△ Less
Submitted 13 July, 2019;
originally announced July 2019.
-
Reinforcement Learning for Angle-Only Intercept Guidance of Maneuvering Targets
Authors:
Brian Gaudet,
Roberto Furfaro,
Richard Linares
Abstract:
We present a novel guidance law that uses observations consisting solely of seeker line of sight angle measurements and their rate of change. The policy is optimized using reinforcement meta-learning and demonstrated in a simulated terminal phase of a mid-course exo-atmospheric interception. Importantly, the guidance law does not require range estimation, making it particularly suitable for passiv…
▽ More
We present a novel guidance law that uses observations consisting solely of seeker line of sight angle measurements and their rate of change. The policy is optimized using reinforcement meta-learning and demonstrated in a simulated terminal phase of a mid-course exo-atmospheric interception. Importantly, the guidance law does not require range estimation, making it particularly suitable for passive seekers. The optimized policy maps stabilized seeker line of sight angles and their rate of change directly to commanded thrust for the missile's divert thrusters. The use of reinforcement meta-learning allows the optimized policy to adapt to target acceleration, and we demonstrate that the policy performs as well as augmented zero-effort miss guidance with perfect target acceleration knowledge. The optimized policy is computationally efficient and requires minimal memory, and should be compatible with today's flight processors.
△ Less
Submitted 15 November, 2019; v1 submitted 5 June, 2019;
originally announced June 2019.
-
Adaptive Guidance and Integrated Navigation with Reinforcement Meta-Learning
Authors:
Brian Gaudet,
Richard Linares,
Roberto Furfaro
Abstract:
This paper proposes a novel adaptive guidance system developed using reinforcement meta-learning with a recurrent policy and value function approximator. The use of recurrent network layers allows the deployed policy to adapt real time to environmental forces acting on the agent. We compare the performance of the DR/DV guidance law, an RL agent with a non-recurrent policy, and an RL agent with a r…
▽ More
This paper proposes a novel adaptive guidance system developed using reinforcement meta-learning with a recurrent policy and value function approximator. The use of recurrent network layers allows the deployed policy to adapt real time to environmental forces acting on the agent. We compare the performance of the DR/DV guidance law, an RL agent with a non-recurrent policy, and an RL agent with a recurrent policy in four challenging environments with unknown but highly variable dynamics. These tasks include a safe Mars landing with random engine failure and a landing on an asteroid with unknown environmental dynamics. We also demonstrate the ability of a RL meta-learning optimized policy to implement a guidance law using observations consisting of only Doppler radar altimeter readings in a Mars landing environment, and LIDAR altimeter readings in an asteroid landing environment, thus integrating guidance and navigation.
△ Less
Submitted 18 April, 2019;
originally announced April 2019.
-
Learning Accurate Extended-Horizon Predictions of High Dimensional Trajectories
Authors:
Brian Gaudet,
Richard Linares,
Roberto Furfaro
Abstract:
We present a novel predictive model architecture based on the principles of predictive coding that enables open loop prediction of future observations over extended horizons. There are two key innovations. First, whereas current methods typically learn to make long-horizon open-loop predictions using a multi-step cost function, we instead run the model open loop in the forward pass during training…
▽ More
We present a novel predictive model architecture based on the principles of predictive coding that enables open loop prediction of future observations over extended horizons. There are two key innovations. First, whereas current methods typically learn to make long-horizon open-loop predictions using a multi-step cost function, we instead run the model open loop in the forward pass during training. Second, current predictive coding models initialize the representation layer's hidden state to a constant value at the start of an episode, and consequently typically require multiple steps of interaction with the environment before the model begins to produce accurate predictions. Instead, we learn a mapping from the first observation in an episode to the hidden state, allowing the trained model to immediately produce accurate predictions. We compare the performance of our architecture to a standard predictive coding model and demonstrate the ability of the model to make accurate long horizon open-loop predictions of simulated Doppler radar altimeter readings during a six degree of freedom Mars landing. Finally, we demonstrate a 2X reduction in sample complexity by using the model to implement a Dyna style algorithm to accelerate policy learning with proximal policy optimization.
△ Less
Submitted 12 January, 2019;
originally announced January 2019.
-
Deep Reinforcement Learning for Six Degree-of-Freedom Planetary Powered Descent and Landing
Authors:
Brian Gaudet,
Richard Linares,
Roberto Furfaro
Abstract:
Future Mars missions will require advanced guidance, navigation, and control algorithms for the powered descent phase to target specific surface locations and achieve pinpoint accuracy (landing error ellipse $<$ 5 m radius). The latter requires both a navigation system capable of estimating the lander's state in real-time and a guidance and control system that can map the estimated lander state to…
▽ More
Future Mars missions will require advanced guidance, navigation, and control algorithms for the powered descent phase to target specific surface locations and achieve pinpoint accuracy (landing error ellipse $<$ 5 m radius). The latter requires both a navigation system capable of estimating the lander's state in real-time and a guidance and control system that can map the estimated lander state to a commanded thrust for each lander engine. In this paper, we present a novel integrated guidance and control algorithm designed by applying the principles of reinforcement learning theory. The latter is used to learn a policy mapping the lander's estimated state directly to a commanded thrust for each engine, with the policy resulting in accurate and fuel-efficient trajectories. Specifically, we use proximal policy optimization, a policy gradient method, to learn the policy. Another contribution of this paper is the use of different discount rates for terminal and shaping rewards, which significantly enhances optimization performance. We present simulation results demonstrating the guidance and control system's performance in a 6-DOF simulation environment and demonstrate robustness to noise and system parameter uncertainty.
△ Less
Submitted 19 October, 2018;
originally announced October 2018.