-
Investigating the Impact of Observation Space Design Choices On Training Reinforcement Learning Solutions for Spacecraft Problems
Authors:
Nathaniel Hamilton,
Kyle Dunlap,
Kerianne L Hobbs
Abstract:
Recent research using Reinforcement Learning (RL) to learn autonomous control for spacecraft operations has shown great success. However, a recent study showed their performance could be improved by changing the action space, i.e. control outputs, used in the learning environment. This has opened the door for finding more improvements through further changes to the environment. The work in this paper focuses on how changes to the environment's observation space can impact the training and performance of RL agents learning the spacecraft inspection task. The studies are split into two groups. The first looks at the impact of sensors that were designed to help agents learn the task. The second looks at the impact of reference frames, reorienting the agent to see the world from a different perspective. The results show the sensors are not necessary, but most of them help agents learn more optimal behavior, and that the reference frame does not have a large impact, but is best kept consistent.
Submitted 10 January, 2025;
originally announced January 2025.
-
The Safe Trusted Autonomy for Responsible Space Program
Authors:
Kerianne L. Hobbs,
Sean Phillips,
Michelle Simon,
Joseph B. Lyons,
Jared Culbertson,
Hamilton Scott Clouse,
Nathaniel Hamilton,
Kyle Dunlap,
Zachary S. Lippay,
Joshua Aurand,
Zachary I. Bell,
Taleri Hammack,
Dorothy Ayres,
Rizza Lim
Abstract:
The Safe Trusted Autonomy for Responsible Space (STARS) program aims to advance autonomy technologies for space by leveraging machine learning technologies while mitigating barriers to trust, such as uncertainty, opaqueness, brittleness, and inflexibility. This paper presents the achievements and lessons learned from the STARS program in integrating reinforcement learning-based multi-satellite control, run time assurance approaches, and flexible human-autonomy teaming interfaces into a new integrated testing environment for collaborative autonomous satellite systems. The primary results describe analysis of the reinforcement learning multi-satellite control and run time assurance algorithms. These algorithms are integrated into a prototype human-autonomy interface using best practices from human-autonomy trust literature; however, detailed analysis of the effectiveness is left to future work. References are provided with additional detailed results of individual experiments.
Submitted 10 January, 2025;
originally announced January 2025.
-
Deep Reinforcement Learning for Scalable Multiagent Spacecraft Inspection
Authors:
Kyle Dunlap,
Nathaniel Hamilton,
Kerianne L. Hobbs
Abstract:
As the number of spacecraft in orbit continues to increase, it is becoming more challenging for human operators to manage each mission. As a result, autonomous control methods are needed to reduce this burden on operators. One method of autonomous control is Reinforcement Learning (RL), which has proven to have great success across a variety of complex tasks. For missions with multiple controlled spacecraft, or agents, it is critical for the agents to communicate and have knowledge of each other, where this information is typically given to the Neural Network Controller (NNC) as an input observation. As the number of spacecraft used for the mission increases or decreases, rather than modifying the size of the observation, this paper develops a scalable observation space that uses a constant observation size to give information on all of the other agents. This approach is similar to a lidar sensor, which determines the ranges of other objects in the environment. This observation space is applied to a spacecraft inspection task, where RL is used to train multiple deputy spacecraft to cooperate and inspect a passive chief spacecraft. It is expected that the scalable observation space will allow the agents to learn to complete the task more efficiently compared to a baseline solution where no information is communicated between agents.
Submitted 13 December, 2024;
originally announced December 2024.
-
Run Time Assured Reinforcement Learning for Six Degree-of-Freedom Spacecraft Inspection
Authors:
Kyle Dunlap,
Kochise Bennett,
David van Wijk,
Nathaniel Hamilton,
Kerianne Hobbs
Abstract:
The trial and error approach of reinforcement learning (RL) results in high performance across many complex tasks, but it can also lead to unsafe behavior. Run time assurance (RTA) approaches can be used to assure safety of the agent during training, allowing it to safely explore the environment. This paper investigates the application of RTA during RL training for a 6-Degree-of-Freedom spacecraft inspection task, where the agent must control its translational motion and attitude to inspect a passive chief spacecraft. Several safety constraints are developed based on position, velocity, attitude, temperature, and power of the spacecraft, and are all enforced simultaneously during training through the use of control barrier functions. This paper also explores simulating the RL agent and RTA at different frequencies to best balance training performance and safety assurance. The agent is trained with and without RTA, and the performance is compared across several metrics including inspection percentage and fuel usage.
Submitted 17 June, 2024;
originally announced June 2024.
-
Investigating the Impact of Choice on Deep Reinforcement Learning for Space Controls
Authors:
Nathaniel Hamilton,
Kyle Dunlap,
Kerianne L. Hobbs
Abstract:
For many space applications, traditional control methods are often used during operation. However, as the number of space assets continues to grow, autonomous operation can enable rapid development of control methods for different space related tasks. One method of developing autonomous control is Reinforcement Learning (RL), which has become increasingly popular after demonstrating promising performance and success across many complex tasks. While it is common for RL agents to learn bounded continuous control values, this may not be realistic or practical for many space tasks that traditionally prefer an on/off approach for control. This paper analyzes using discrete action spaces, where the agent must choose from a predefined list of actions. The experiments explore how the number of choices provided to the agents affects their measured performance during and after training. This analysis is conducted for an inspection task, where the agent must circumnavigate an object to inspect points on its surface, and a docking task, where the agent must move into proximity of another spacecraft and "dock" with a low relative speed. A common objective of both tasks, and most space tasks in general, is to minimize fuel usage, which motivates the agent to regularly choose an action that uses no fuel. Our results show that a limited number of discrete choices leads to optimal performance for the inspection task, while continuous control leads to optimal performance for the docking task.
Submitted 20 May, 2024;
originally announced May 2024.
-
Space Processor Computation Time Analysis for Reinforcement Learning and Run Time Assurance Control Policies
Authors:
Kyle Dunlap,
Nathaniel Hamilton,
Francisco Viramontes,
Derrek Landauer,
Evan Kain,
Kerianne L. Hobbs
Abstract:
As the number of spacecraft on orbit continues to grow, it is challenging for human operators to constantly monitor and plan for all missions. Autonomous control methods such as reinforcement learning (RL) have the power to solve complex tasks while reducing the need for constant operator intervention. By combining RL solutions with run time assurance (RTA), safety of these systems can be assured in real time. However, in order to use these algorithms on board a spacecraft, they must be able to run in real time on space grade processors, which are typically outdated and less capable than state-of-the-art equipment. In this paper, multiple RL-trained neural network controllers (NNCs) and RTA algorithms were tested on commercial-off-the-shelf (COTS) and radiation tolerant processors. The results show that all NNCs and most RTA algorithms can compute optimal and safe actions in well under 1 second with room for further optimization before deploying in the real world.
Submitted 10 May, 2024;
originally announced May 2024.
-
Demonstrating Reinforcement Learning and Run Time Assurance for Spacecraft Inspection Using Unmanned Aerial Vehicles
Authors:
Kyle Dunlap,
Nathaniel Hamilton,
Zachary Lippay,
Matthew Shubert,
Sean Phillips,
Kerianne L. Hobbs
Abstract:
On-orbit spacecraft inspection is an important capability for enabling servicing and manufacturing missions and extending the life of spacecraft. However, as space operations become increasingly more common and complex, autonomous control methods are needed to reduce the burden on operators to individually monitor each mission. In order for autonomous control methods to be used in space, they must exhibit safe behavior that demonstrates robustness to real world disturbances and uncertainty. In this paper, neural network controllers (NNCs) trained with reinforcement learning are used to solve an inspection task, which is a foundational capability for servicing missions. Run time assurance (RTA) is used to assure safety of the NNC in real time, enforcing several different constraints on position and velocity. The NNC and RTA are tested in the real world using unmanned aerial vehicles designed to emulate spacecraft dynamics. The results show this emulation is a useful demonstration of the capability of the NNC and RTA, and the algorithms demonstrate robustness to real world disturbances.
Submitted 10 May, 2024;
originally announced May 2024.
-
Collision Avoidance and Geofencing for Fixed-wing Aircraft with Control Barrier Functions
Authors:
Tamas G. Molnar,
Suresh K. Kannan,
James Cunningham,
Kyle Dunlap,
Kerianne L. Hobbs,
Aaron D. Ames
Abstract:
Safety-critical failures often have fatal consequences in aerospace control. Control systems on aircraft, therefore, must ensure the strict satisfaction of safety constraints, preferably with formal guarantees of safe behavior. This paper establishes the safety-critical control of fixed-wing aircraft in collision avoidance and geofencing tasks. A control framework is developed wherein a run-time assurance (RTA) system modulates the nominal flight controller of the aircraft whenever necessary to prevent it from colliding with other aircraft or crossing a boundary (geofence) in space. The RTA is formulated as a safety filter using control barrier functions (CBFs) with formal guarantees of safe behavior. CBFs are constructed and compared for a nonlinear kinematic fixed-wing aircraft model. The proposed CBF-based controllers showcase the capability of safely executing simultaneous collision avoidance and geofencing, as demonstrated by simulations on the kinematic model and a high-fidelity dynamical model.
Submitted 27 January, 2025; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Run Time Assurance for Simultaneous Constraint Satisfaction During Spacecraft Attitude Maneuvering
Authors:
Cassie-Kay McQuinn,
Kyle Dunlap,
Nathaniel Hamilton,
Jabari Wilson,
Kerianne L. Hobbs
Abstract:
A fundamental capability for On-orbit Servicing, Assembly, and Manufacturing (OSAM) is inspection of the vehicle to be serviced, or the structure being assembled. This research assumes autonomous slewing to maintain situational awareness of multiple vehicles operating in close proximity where several safety constraints must be satisfied. A variety of techniques may be used as the primary controller. The focus of this research is developing Run Time Assurance (RTA) filters that monitor system behavior and the output of the primary controller to enforce safety constraint satisfaction. Specifically, this research explores combining a subset of the constraints into an Active Set Invariance Filter (ASIF) RTA defined using control barrier functions. This method is minimally invasive to the primary control by minimizing deviation from the desired control output of the primary controller, while simultaneously enforcing all safety constraints. The RTA is designed to ensure the spacecraft maintains attitude requirements for communication and data transfer with a ground station during scheduled communication windows, adheres to conical attitude keep out zones, limits thermally unfavorable attitude duration, maintains attitude requirements for sufficient power generation, ensures maneuvers are below threshold to cause structural damage, ensures maximum angular velocity is below limits to maintain ability to respond quickly to new slewing commands, and conserves actuator use to prevent wear when possible. Slack variables are introduced into the ASIF controller to prioritize safety constraints when a solution to all safety constraints is infeasible. Monte Carlo simulation results as well as plots of example cases are shown and evaluated for a three degree of freedom spacecraft with reaction wheel attitude control.
Submitted 22 February, 2024;
originally announced February 2024.
-
Deep Reinforcement Learning for Autonomous Spacecraft Inspection using Illumination
Authors:
David van Wijk,
Kyle Dunlap,
Manoranjan Majji,
Kerianne L. Hobbs
Abstract:
This paper investigates the problem of on-orbit spacecraft inspection using a single free-flying deputy spacecraft, equipped with an optical sensor, whose controller is a neural network control system trained with Reinforcement Learning (RL). This work considers the illumination of the inspected spacecraft (chief) by the Sun in order to incentivize acquisition of well-illuminated optical data. The agent's performance is evaluated through statistically efficient metrics. Results demonstrate that the RL agent is able to inspect all points on the chief successfully, while maximizing illumination on inspected points in a simulated environment, using only low-level actions. Due to the stochastic nature of RL, 10 policies were trained using 10 random seeds to obtain a more holistic measure of agent performance. Over these 10 seeds, the interquartile mean (IQM) percentage of inspected points for the finalized model was 98.82%.
Submitted 4 August, 2023;
originally announced August 2023.
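The interquartile mean (IQM) cited in the abstract above is the mean of the middle half of a sample, with the lowest and highest quarters discarded. The following is a minimal sketch of that statistic, not the authors' evaluation code. The function name `interquartile_mean` and the per-seed scores are hypothetical, and the sketch drops whole samples only, whereas other IQM implementations may weight the boundary samples fractionally.

```python
def interquartile_mean(values):
    """Mean of the middle half of the data (lowest and highest quarters dropped)."""
    ordered = sorted(values)
    # Simplification (assumption): drop floor(n / 4) whole samples from each end.
    cut = len(ordered) // 4
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)


# Hypothetical per-seed inspection percentages for 10 seeds, NOT the paper's data.
scores = [91.0, 97.5, 98.0, 98.5, 99.0, 99.0, 99.5, 100.0, 100.0, 100.0]
print(interquartile_mean(scores))
```

Because the extreme seeds are discarded before averaging, a single unusually poor or unusually good training run has less influence on the reported score than it would on a plain mean.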
-
Run Time Assurance for Autonomous Spacecraft Inspection
Authors:
Kyle Dunlap,
David van Wijk,
Kerianne L. Hobbs
Abstract:
As autonomous systems become more prevalent in the real world, it is critical to ensure they operate safely. One approach is the use of Run Time Assurance (RTA), which is a real-time safety assurance technique that monitors a primary controller and intervenes to assure safety when necessary. As these autonomous systems become more complex, RTA is useful because it can be designed completely independent of the primary controller. This paper develops several translational motion safety constraints for a multi-agent autonomous spacecraft inspection problem, where all of these constraints can be enforced with RTA. A comparison is made between centralized and decentralized control, where simulations of the inspection problem then demonstrate that RTA can assure safety of all constraints. Monte Carlo analysis is then used to show that no scenarios were found where the centralized RTA cannot assure safety. While some scenarios were found where decentralized RTA cannot assure safety, solutions are discussed to mitigate these failures.
Submitted 7 August, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
A Universal Framework for Generalized Run Time Assurance with JAX Automatic Differentiation
Authors:
Umberto Ravaioli,
Kyle Dunlap,
Kerianne Hobbs
Abstract:
With the rise of increasingly complex autonomous systems powered by black box AI models, there is a growing need for Run Time Assurance (RTA) systems that provide online safety filtering to untrusted primary controller output. Currently, research in RTA tends to be ad hoc and inflexible, diminishing collaboration and the pace of innovation. The Safe Autonomy Run Time Assurance Framework presented in this paper provides a standardized interface for RTA modules and a set of universal implementations of constraint-based RTA capable of providing safety assurance given arbitrary dynamical systems and constraints. Built around JAX, this framework leverages automatic differentiation to populate advanced optimization based RTA methods minimizing user effort and error. To validate the feasibility of this framework, a simulation of a multi-agent spacecraft inspection problem is shown with safety constraints on position and velocity.
Submitted 2 September, 2022;
originally announced September 2022.
-
Systems Theoretic Process Analysis of a Run Time Assured Neural Network Control System
Authors:
Kerianne L. Hobbs,
Benjamin K. Heiner,
Lillian Busse,
Kyle Dunlap,
Jonathan Rowanhill,
Ashlie B. Hocking,
Aditya Zutshi
Abstract:
This research considers the problem of identifying safety constraints and developing Run Time Assurance (RTA) for Deep Reinforcement Learning (RL) Tactical Autopilots that use neural network control systems (NNCS). This research studies a specific use case of an NNCS performing autonomous formation flight while an RTA system provides collision avoidance and geofence assurances. First, Systems Theoretic Accident Models and Processes (STAMP) is applied to identify accidents, hazards, and safety constraints as well as define a functional control system block diagram of the ground station, manned flight lead, and surrogate unmanned wingman. Then, Systems Theoretic Process Analysis (STPA) is applied to the interactions of the ground station, manned flight lead, surrogate unmanned wingman, and internal elements of the wingman aircraft to identify unsafe control actions, scenarios leading to each, and safety requirements to mitigate risks. This research is the first application of STAMP and STPA to an NNCS bounded by RTA.
Submitted 9 November, 2022; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Ablation Study of How Run Time Assurance Impacts the Training and Performance of Reinforcement Learning Agents
Authors:
Nathaniel Hamilton,
Kyle Dunlap,
Taylor T Johnson,
Kerianne L Hobbs
Abstract:
Reinforcement Learning (RL) has become an increasingly important research area as the success of machine learning algorithms and methods grows. To combat the safety concerns surrounding the freedom given to RL agents while training, there has been an increase in work concerning Safe Reinforcement Learning (SRL). However, these new and safe methods have been held to less scrutiny than their unsafe counterparts. For instance, comparisons among safe methods often lack fair evaluation across similar initial condition bounds and hyperparameter settings, use poor evaluation metrics, and cherry-pick the best training runs rather than averaging over multiple random seeds. In this work, we conduct an ablation study using evaluation best practices to investigate the impact of run time assurance (RTA), which monitors the system state and intervenes to assure safety, on effective learning. By studying multiple RTA approaches in both on-policy and off-policy RL algorithms, we seek to understand which RTA methods are most effective, whether the agents become dependent on the RTA, and the importance of reward shaping versus safe exploration in RL agent training. Our conclusions shed light on the most promising directions of SRL, and our evaluation methodology lays the groundwork for creating better comparisons in future SRL work.
Submitted 8 July, 2022;
originally announced July 2022.
-
Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach
Authors:
Nathan A. Chi,
Peter Washington,
Aaron Kline,
Arman Husic,
Cathy Hou,
Chloe He,
Kaitlyn Dunlap,
Dennis Wall
Abstract:
Autism spectrum disorder (ASD) is a neurodevelopmental disorder which results in altered behavior, social development, and communication patterns. In past years, autism prevalence has tripled, with 1 in 54 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process, significant attention has been given to developing systems that automatically screen for autism. Prosody abnormalities are among the clearest signs of autism, with affected children displaying speech idiosyncrasies including echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. In this work, we present a suite of machine learning approaches to detect autism in self-recorded speech audio captured from autistic and neurotypical (NT) children in home environments. We consider three methods to detect autism in child speech: first, Random Forests trained on extracted audio features (including Mel-frequency cepstral coefficients); second, convolutional neural networks (CNNs) trained on spectrograms; and third, fine-tuned wav2vec 2.0--a state-of-the-art Transformer-based ASR model. We train our classifiers on our novel dataset of cellphone-recorded child speech audio curated from Stanford's Guess What? mobile game, an app designed to crowdsource videos of autistic and neurotypical children in a natural home environment. The Random Forest classifier achieves 70% accuracy, the fine-tuned wav2vec 2.0 model achieves 77% accuracy, and the CNN achieves 79% accuracy when classifying children's audio as either ASD or NT. Our models were able to predict autism status when training on a varied selection of home audio clips with inconsistent recording quality, which may be more generalizable to real world conditions. These results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment.
Submitted 3 January, 2022;
originally announced January 2022.
-
Comparing Run Time Assurance Approaches for Safe Spacecraft Docking
Authors:
Kyle Dunlap,
Michael Hibbard,
Mark Mote,
Kerianne Hobbs
Abstract:
Run Time Assurance (RTA) systems are online safety verification techniques that filter the output of a primary controller to assure safety. RTA approaches are used in safety-critical control to intervene when a performance-driven primary controller would cause the system to violate safety constraints. This paper presents four categories of RTA approaches based on their membership to explicit or implicit monitoring and switching or optimization interventions. To validate the feasibility of each approach and compare computation time, four RTAs are defined for a three-dimensional spacecraft docking example with safety constraints on velocity.
Submitted 1 October, 2021;
originally announced October 2021.
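The docking abstract above distinguishes explicit from implicit monitoring and switching from optimization interventions. As a rough illustration of the explicit, switching category only, the sketch below filters a one-dimensional acceleration command against a velocity limit. It is not taken from the paper: the function name `explicit_switching_rta`, the point-mass model, and every parameter value are assumptions chosen for illustration.

```python
def explicit_switching_rta(v, u_desired, v_max=1.0, dt=1.0, u_max=0.5):
    """Return a safe acceleration command for a 1-D point mass with velocity v.

    All names and default values here are hypothetical.
    """
    # Monitor: predict the velocity that the primary controller's command would produce.
    v_next = v + u_desired * dt
    if abs(v_next) <= v_max:
        # The desired command keeps the constraint satisfied, so pass it through.
        return u_desired
    # Intervention: switch to a backup controller that brakes toward zero
    # velocity, saturated at the assumed actuator limit u_max.
    brake = -v / dt
    return max(-u_max, min(u_max, brake))


print(explicit_switching_rta(0.2, 0.3))  # desired command is passed through
print(explicit_switching_rta(0.9, 0.5))  # backup braking command is substituted
```

A switching filter of this kind replaces the whole command whenever the monitor trips, whereas an optimization-based filter would instead search for the command closest to the desired one that still satisfies the constraint.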