-
Safe Reinforcement Learning of Robot Trajectories in the Presence of Moving Obstacles
Authors:
Jonas Kiemel,
Ludovic Righetti,
Torsten Kröger,
Tamim Asfour
Abstract:
In this paper, we present an approach for learning collision-free robot trajectories in the presence of moving obstacles. As a first step, we train a backup policy to generate evasive movements from arbitrary initial robot states using model-free reinforcement learning. When learning policies for other tasks, the backup policy can be used to estimate the potential risk of a collision and to offer an alternative action if the estimated risk is considered too high. No matter which action is selected, our action space ensures that the kinematic limits of the robot joints are not violated. We analyze and evaluate two different methods for estimating the risk of a collision. A physics simulation performed in the background is computationally expensive but provides the best results in deterministic environments. If a data-based risk estimator is used instead, the computational effort is significantly reduced, but an additional source of error is introduced. For evaluation, we successfully learn a reaching task and a basketball task while keeping the risk of collisions low. The results demonstrate the effectiveness of our approach for deterministic and stochastic environments, including a human-robot scenario and a ball environment, where no state can be considered permanently safe. By conducting experiments with a real robot, we show that our approach can generate safe trajectories in real time.
Submitted 8 November, 2024;
originally announced November 2024.
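The action-selection step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_action`, `estimate_risk` and `risk_threshold` are hypothetical names, and the risk estimator is an arbitrary callable standing in for either the background physics simulation or the data-based estimator.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]


def select_action(
    state: State,
    task_action: Action,
    backup_action: Action,
    estimate_risk: Callable[[State, Action], float],
    risk_threshold: float,
) -> Action:
    """Return the task policy's action unless its estimated collision risk is too high.

    Hypothetical sketch: `estimate_risk` stands in for either risk estimator
    (physics simulation or learned model); its internals are not modeled here.
    """
    risk = estimate_risk(state, task_action)
    if risk > risk_threshold:
        # Too risky: fall back to the evasive action proposed by the backup policy.
        return backup_action
    return task_action


if __name__ == "__main__":
    # Toy risk model (assumption): risk grows with the magnitude of the first action component.
    toy_risk = lambda s, a: abs(a[0])
    print(select_action([0.0], [0.9], [-0.1], toy_risk, 0.5))  # backup action is chosen
    print(select_action([0.0], [0.2], [-0.1], toy_risk, 0.5))  # task action is kept
```

Per the abstract, whichever action is selected is still interpreted through the same action space, so the kinematic joint limits hold in both branches.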
-
Jerk-limited Traversal of One-dimensional Paths and its Application to Multi-dimensional Path Tracking
Authors:
Jonas C. Kiemel,
Torsten Kröger
Abstract:
In this paper, we present an iterative method to quickly traverse multi-dimensional paths considering jerk constraints. As a first step, we analyze the traversal of each individual path dimension. We derive a range of feasible target accelerations for each intermediate waypoint of a one-dimensional path using a binary search algorithm. Computing a trajectory from waypoint to waypoint leads to the fastest progress on the path when selecting the highest feasible target acceleration. Similarly, it is possible to calculate a trajectory that leads to minimum progress along the path. This insight allows us to control the traversal of a one-dimensional path in such a way that a reference path length of a multi-dimensional path is approximately tracked over time. In order to improve the tracking accuracy, we propose an iterative scheme to adjust the temporal course of the selected reference path length. More precisely, the temporal region causing the largest position deviation is identified and updated at each iteration. In our evaluation, we thoroughly analyze the performance of our method using seven-dimensional reference paths with different path characteristics. We show that our method manages to quickly traverse the reference paths and compare the required traversing time and the resulting path accuracy with other state-of-the-art approaches.
Submitted 18 July, 2024;
originally announced July 2024.
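The binary search for the highest feasible target acceleration can be illustrated with a generic sketch. It assumes feasibility is monotone (every target acceleration below a feasible one is also feasible); the feasibility test `velocity_check` is a hypothetical toy that only checks a velocity limit after one interval, whereas the method in the abstract checks whether the remaining one-dimensional path can still be traversed. The lowest feasible acceleration (minimum progress) could be found with the mirrored search.

```python
from typing import Callable


def highest_feasible_acceleration(
    is_feasible: Callable[[float], bool],
    a_low: float,
    a_high: float,
    tol: float = 1e-6,
) -> float:
    """Binary search for the largest target acceleration in [a_low, a_high] accepted by is_feasible.

    Assumes is_feasible(a_low) is True and that feasibility is monotone:
    if a target acceleration is feasible, every smaller one is feasible too.
    """
    if not is_feasible(a_low):
        raise ValueError("lower bound is not feasible")
    if is_feasible(a_high):
        return a_high
    while a_high - a_low > tol:
        a_mid = 0.5 * (a_low + a_high)
        if is_feasible(a_mid):
            a_low = a_mid   # feasible: the boundary lies above a_mid
        else:
            a_high = a_mid  # infeasible: the boundary lies below a_mid
    return a_low


def velocity_check(v0: float, a0: float, dt: float, v_max: float) -> Callable[[float], bool]:
    """Hypothetical toy feasibility test: accelerate linearly from a0 to the target over dt
    and require the final velocity v0 + dt * (a0 + a) / 2 not to exceed v_max."""
    return lambda a: v0 + dt * (a0 + a) / 2.0 <= v_max
```

With `v0 = 0.9`, `a0 = 0`, `dt = 0.1` and `v_max = 1.0`, the search over `[-5, 5]` converges to a target acceleration of about 2.0.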
-
Learning Time-optimized Path Tracking with or without Sensory Feedback
Authors:
Jonas C. Kiemel,
Torsten Kröger
Abstract:
In this paper, we present a learning-based approach that allows a robot to quickly follow a reference path defined in joint space without exceeding limits on the position, velocity, acceleration and jerk of each robot joint. Unlike offline methods for time-optimal path parameterization, our approach allows the reference path to be changed during motion execution. In addition, our approach can utilize sensory feedback, for instance, to follow a reference path with a bipedal robot without losing balance. With our method, the robot is controlled by a neural network that is trained via reinforcement learning using data generated by a physics simulator. From a mathematical perspective, the problem of tracking a reference path in a time-optimized manner is formalized as a Markov decision process. Each state includes a fixed number of waypoints specifying the next part of the reference path. The action space is designed in such a way that all resulting motions comply with the specified kinematic joint limits. Finally, the reward function reflects the trade-off between the execution time, the deviation from the desired reference path and optional additional objectives like balancing. We evaluate our approach with and without additional objectives and show that time-optimized path tracking can be successfully learned for both industrial and humanoid robots. In addition, we demonstrate that networks trained in simulation can be successfully transferred to a real robot.
Submitted 20 October, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
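The fixed-size state described above, containing the next waypoints of the reference path, could be assembled as in the following sketch. The function names, the flat-vector layout and the choice to pad with the final waypoint near the end of the path are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence

Waypoint = Sequence[float]


def next_waypoints(path: Sequence[Waypoint], index: int, k: int) -> List[List[float]]:
    """Return exactly k waypoints starting at path[index].

    Once the end of the path is reached, the final waypoint is repeated so the
    state always has the same size (a padding choice assumed for this sketch).
    """
    last = len(path) - 1
    return [list(path[min(index + i, last)]) for i in range(k)]


def build_state(
    joint_pos: Sequence[float],
    joint_vel: Sequence[float],
    joint_acc: Sequence[float],
    path: Sequence[Waypoint],
    index: int,
    k: int,
) -> List[float]:
    """Concatenate the kinematic joint state with the next k waypoints into a flat vector."""
    state = list(joint_pos) + list(joint_vel) + list(joint_acc)
    for waypoint in next_waypoints(path, index, k):
        state.extend(waypoint)
    return state
```

Because the state is rebuilt from `path` at every decision step, replacing `path` during execution simply changes the waypoints the network sees at the next step, which matches the abstract's point that the reference path can change online.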
-
Learning Collision-free and Torque-limited Robot Trajectories based on Alternative Safe Behaviors
Authors:
Jonas C. Kiemel,
Torsten Kröger
Abstract:
This paper presents an approach for learning online generation of collision-free and torque-limited robot trajectories. In order to generate future motions, a neural network is periodically invoked. Based on the current kinematic state of the robot and the network output, a trajectory for the current time interval can be calculated. The main idea of our paper is to execute the computed motion only if a collision-free and torque-limited way to continue the trajectory is known. In practice, the motion computed for the current time interval is extended by a braking trajectory and simulated using a physics engine. If the simulated trajectory complies with all safety constraints, the computed motion is carried out. Otherwise, the braking trajectory calculated in the previous time interval serves as an alternative safe behavior. Given a task-specific reward function, the neural network is trained using reinforcement learning. The design of the action space used for reinforcement learning ensures that all computed trajectories comply with kinematic joint limits. For our evaluation, simulated humanoid robots and industrial robots are trained to reach as many randomly placed target points as possible. We show that our method reliably prevents collisions with static obstacles and collisions between the robot arms, while generating motions that respect both torque limits and kinematic joint limits. Experiments with a real robot demonstrate that safe trajectories can be generated in real time.
Submitted 20 October, 2022; v1 submitted 5 March, 2021;
originally announced March 2021.
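The decision rule for alternative safe behaviors can be sketched as a small state machine. This is a simplified illustration, not the authors' code: the class name is hypothetical, each time interval is reduced to a single joint position, and `is_safe` is a placeholder for the physics-engine check of collisions and torque limits.

```python
from typing import Callable, List, Sequence


class AlternativeSafeBehavior:
    """Execute a newly computed motion only if it can be safely continued by braking.

    Simplification for this sketch: each time interval is reduced to a single
    joint position, and `is_safe` stands in for the physics-engine check of
    collisions and torque limits.
    """

    def __init__(self, is_safe: Callable[[Sequence[float]], bool], rest_position: float):
        self.is_safe = is_safe
        # Remaining part of the last braking trajectory that was verified as safe.
        self.fallback: List[float] = [rest_position]

    def step(self, motion: float, braking: List[float]) -> float:
        """Return the position to execute in the current time interval."""
        if self.is_safe([motion] + braking):
            # Motion plus braking trajectory is safe: execute it and remember the braking part.
            self.fallback = list(braking) if braking else [motion]
            return motion
        # Otherwise continue the braking trajectory from the previous interval.
        if len(self.fallback) > 1:
            return self.fallback.pop(0)
        return self.fallback[0]  # already at standstill: hold the final position
```

The invariant is that `fallback` always holds a continuation that was verified before it was needed, so an unsafe proposal never leaves the robot without a known safe way to proceed.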
-
Learning Robot Trajectories subject to Kinematic Joint Constraints
Authors:
Jonas C. Kiemel,
Torsten Kröger
Abstract:
We present an approach to learn fast and dynamic robot motions without exceeding limits on the position $\theta$, velocity $\dot{\theta}$, acceleration $\ddot{\theta}$ and jerk $\dddot{\theta}$ of each robot joint. Movements are generated by mapping the predictions of a neural network to safely executable joint accelerations. The neural network is invoked periodically and trained via reinforcement learning. Our main contribution is an analytical procedure for calculating safe joint accelerations, which considers the prediction frequency $f_N$ of the neural network. As a result, the frequency $f_N$ can be freely chosen and treated as a hyperparameter. We show that our approach is preferable to penalizing constraint violations as it provides explicit guarantees and does not distort the desired optimization target. In addition, the influence of the selected prediction frequency on the learning performance and on the computing effort is highlighted by various experiments.
Submitted 27 March, 2021; v1 submitted 1 November, 2020;
originally announced November 2020.
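The idea of mapping a network prediction to a safely executable acceleration that depends on $f_N$ can be illustrated for a single joint. This sketch is much simpler than the analytical procedure in the paper, which covers position, velocity, acceleration and jerk limits: it assumes the acceleration is interpolated linearly over one network period $1/f_N$, checks the velocity limit only at the end of that period, and ignores position limits. Both function names are hypothetical.

```python
from typing import Tuple


def safe_acceleration_range(
    v0: float, a0: float, f_n: float, v_max: float, a_max: float, j_max: float
) -> Tuple[float, float]:
    """Range of next accelerations a1 for one joint under simplified assumptions.

    Assumptions of this sketch: the acceleration is interpolated linearly from a0
    to a1 over one network period dt = 1 / f_n, the velocity limit is checked only
    at the end of the period, and position limits are ignored.
    """
    dt = 1.0 / f_n
    # Acceleration and jerk limits: |a1| <= a_max and |a1 - a0| / dt <= j_max.
    lo = max(-a_max, a0 - j_max * dt)
    hi = min(a_max, a0 + j_max * dt)
    # Velocity at the end of the period: v1 = v0 + dt * (a0 + a1) / 2, with |v1| <= v_max.
    lo = max(lo, 2.0 * (-v_max - v0) / dt - a0)
    hi = min(hi, 2.0 * (v_max - v0) / dt - a0)
    if lo > hi:
        raise ValueError("no admissible acceleration under these simplified constraints")
    return lo, hi


def map_network_output(y: float, lo: float, hi: float) -> float:
    """Map a network output in [-1, 1] linearly onto the admissible range [lo, hi]."""
    y = min(max(y, -1.0), 1.0)
    return lo + 0.5 * (y + 1.0) * (hi - lo)
```

Since `dt` enters every bound, changing $f_N$ changes the admissible range rather than the network's output space, which is the sense in which the frequency can be treated as a hyperparameter.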
-
TrueRMA: Learning Fast and Smooth Robot Trajectories with Recursive Midpoint Adaptations in Cartesian Space
Authors:
Jonas C. Kiemel,
Pascal Meißner,
Torsten Kröger
Abstract:
We present TrueRMA, a data-efficient, model-free method to learn cost-optimized robot trajectories over a wide range of starting points and endpoints. The key idea is to calculate trajectory waypoints in Cartesian space by recursively predicting orthogonal adaptations relative to the midpoints of straight lines. We generate a differentiable path by adding circular blends around the waypoints, compute the corresponding joint positions with an inverse kinematics solver, and calculate a time-optimal parameterization considering velocity and acceleration limits. During training, the trajectory is executed in a physics simulator and costs are assigned according to a user-specified cost function, which is not required to be differentiable. Given a starting point and an endpoint as input, a neural network is trained via reinforcement learning to predict midpoint adaptations that minimize the cost of the resulting trajectory. We successfully train a KUKA iiwa robot to keep a ball on a plate while moving between specified points and compare the performance of TrueRMA against two baselines. The results show that our method requires less training data to learn the task while generating shorter and faster trajectories.
Submitted 5 June, 2020;
originally announced June 2020.
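The recursive midpoint adaptation behind TrueRMA can be sketched geometrically. To keep the example short it works in 2D, where a segment has a single orthogonal direction (in 3D Cartesian space there are two), and `predict_offset` is a hypothetical stand-in for the neural network. Circular blends, inverse kinematics and the time-optimal parameterization are omitted.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def recursive_midpoints(
    start: Point,
    end: Point,
    predict_offset: Callable[[Point, Point], float],
    depth: int,
) -> List[Point]:
    """Waypoints from start to end, built by recursively displacing segment midpoints.

    At each level, the midpoint of the straight line start-end is shifted
    orthogonally to that line by predict_offset(start, end); the two halves are
    then refined recursively. Returns 2**depth + 1 points including both endpoints.
    """
    if depth == 0:
        return [start, end]
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    # Unit normal of the segment (zero if start and end coincide).
    nx, ny = (-dy / length, dx / length) if length > 0.0 else (0.0, 0.0)
    offset = predict_offset(start, end)
    mid = ((start[0] + end[0]) / 2.0 + offset * nx, (start[1] + end[1]) / 2.0 + offset * ny)
    left = recursive_midpoints(start, mid, predict_offset, depth - 1)
    right = recursive_midpoints(mid, end, predict_offset, depth - 1)
    return left + right[1:]  # drop the duplicated shared midpoint
```

With a zero offset the recursion reproduces evenly spaced points on the straight line; a nonzero offset bends the path away from it, which is the degree of freedom the network would learn to exploit.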
-
TrueÆdapt: Learning Smooth Online Trajectory Adaptation with Bounded Jerk, Acceleration and Velocity in Joint Space
Authors:
Jonas C. Kiemel,
Robin Weitemeyer,
Pascal Meißner,
Torsten Kröger
Abstract:
We present TrueÆdapt, a model-free method to learn online adaptations of robot trajectories based on their effects on the environment. Given sensory feedback and future waypoints of the original trajectory, a neural network is trained to predict joint accelerations at regular intervals. The adapted trajectory is generated by linear interpolation of the predicted accelerations, leading to continuously differentiable joint velocities and positions. Bounded jerks, accelerations and velocities are guaranteed by calculating the range of valid accelerations at each decision step and clipping the network's output accordingly. A deviation penalty during the training process causes the adapted trajectory to follow the original one. Smooth movements are encouraged by penalizing high accelerations and jerks. We evaluate our approach by training a simulated KUKA iiwa robot to balance a ball on a plate while moving and demonstrate that the balancing policy can be directly transferred to a real robot.
Submitted 19 October, 2020; v1 submitted 30 May, 2020;
originally announced June 2020.
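The linear interpolation of predicted accelerations in TrueÆdapt implies constant jerk within each interval, so position and velocity follow in closed form. The sketch below shows that integration for one joint; the function names and interval length are assumptions for illustration, and the clipping of network outputs to the valid acceleration range is not repeated here.

```python
from typing import List, Tuple


def integrate_interval(
    p0: float, v0: float, a0: float, a1: float, dt: float, t: float
) -> Tuple[float, float, float]:
    """Joint position, velocity and acceleration at time t in [0, dt] when the
    acceleration is interpolated linearly from a0 to a1 (i.e., constant jerk)."""
    jerk = (a1 - a0) / dt
    a = a0 + jerk * t
    v = v0 + a0 * t + 0.5 * jerk * t ** 2
    p = p0 + v0 * t + 0.5 * a0 * t ** 2 + jerk * t ** 3 / 6.0
    return p, v, a


def rollout(
    p0: float, v0: float, a0: float, predicted_accelerations: List[float], dt: float
) -> List[Tuple[float, float, float]]:
    """Chain intervals: each predicted acceleration becomes the end value of one interval."""
    states = [(p0, v0, a0)]
    for a1 in predicted_accelerations:
        p0, v0, a0 = integrate_interval(p0, v0, a0, a1, dt, dt)
        states.append((p0, v0, a0))
    return states
```

Because the jerk in each interval equals $(a_1 - a_0)/\Delta t$, bounding how far each new prediction may move from the previous acceleration directly bounds the jerk, and the acceleration stays continuous across interval boundaries.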