
Accelerating Proximal Policy Optimization Learning Using Task Prediction for Solving Environments with Delayed Rewards

Authors: Ahmad Ahmad, Mehdi Kermanshah, Kevin Leahy, Zachary Serlin, Ho Chit Siu, Makai Mann, Cristian-Ioan Vasile, Roberto Tron, Calin Belta

Abstract: In this paper, we tackle the challenging problem of delayed rewards in reinforcement learning (RL). While Proximal Policy Optimization (PPO) has emerged as a leading Policy Gradient method, its performance can degrade under delayed rewards. We introduce two key enhancements to PPO: a hybrid policy architecture that combines an offline policy (trained on expert demonstrations) with an online PPO policy, and a reward shaping mechanism using Time Window Temporal Logic (TWTL). The hybrid architecture leverages offline data throughout training while maintaining PPO's theoretical guarantees. Building on the monotonic improvement framework of Trust Region Policy Optimization (TRPO), we prove that our approach ensures improvement over both the offline policy and previous iterations, with a bounded performance gap of $(2ςγα^2)/(1-γ)^2$, where $α$ is the mixing parameter, $γ$ is the discount factor, and $ς$ bounds the expected advantage. Additionally, we prove that our TWTL-based reward shaping preserves the optimal policy of the original problem. TWTL enables formal translation of temporal objectives into immediate feedback signals that guide learning. We demonstrate the effectiveness of our approach through extensive experiments on inverted pendulum and lunar lander environments, showing improvements in both learning speed and final performance compared to standard PPO and offline-only approaches.

Submitted 4 December, 2024; v1 submitted 26 November, 2024; originally announced November 2024.
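
As a rough illustration of the performance-gap bound quoted in the abstract above, the sketch below evaluates $(2ςγα^2)/(1-γ)^2$ numerically. The function name `performance_gap_bound` and the sample values of $ς$, $γ$, and $α$ are assumptions chosen for illustration; they are not taken from the paper.

```python
def performance_gap_bound(varsigma: float, gamma: float, alpha: float) -> float:
    """Evaluate (2 * varsigma * gamma * alpha**2) / (1 - gamma)**2.

    varsigma: bound on the expected advantage (hypothetical value here)
    gamma:    discount factor, must lie in [0, 1)
    alpha:    mixing parameter between the offline and online policies
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return (2.0 * varsigma * gamma * alpha ** 2) / (1.0 - gamma) ** 2


if __name__ == "__main__":
    # Illustrative values only: sweep the mixing parameter at a fixed discount.
    for alpha in (0.1, 0.2, 0.4):
        bound = performance_gap_bound(varsigma=1.0, gamma=0.9, alpha=alpha)
        print(f"alpha={alpha}: bound={bound}")
```

Because the expression is quadratic in $α$, doubling the mixing parameter quadruples the bound, and the $(1-γ)^2$ denominator makes the bound grow quickly as $γ$ approaches 1.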

arXiv:2406.02722

Model Predictive Control for Magnetically-Actuated Cellbots

Authors: Mehdi Kermanshah, Logan E. Beaver, Max Sokolich, Fatma Ceren Kirmizitas, Sambeeta Das, Roberto Tron, Ron Weiss, Calin Belta

Abstract: This paper presents a control framework for magnetically actuated cellbots, which combines Model Predictive Control (MPC) with Gaussian Processes (GPs) as a disturbance estimator for precise trajectory tracking. To address the challenges posed by unmodeled dynamics, we integrate data-driven modeling with model-based control to accurately track desired trajectories using a relatively small amount of data. To the best of our knowledge, this is the first work to integrate data-driven modeling with model-based control for the magnetic actuation of cellbots. The GP effectively learns and predicts unmodeled disturbances, and also provides uncertainty bounds. We validate our method through experiments with cellbots, demonstrating improved trajectory tracking accuracy.

Submitted 26 September, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2403.14519

Designing Robust Linear Output Feedback Controller based on CLF-CBF framework via Linear Programming (LP-CLF-CBF)

Authors: Mahroo Bahreinian, Mehdi Kermanshah, Roberto Tron

Abstract: We consider the problem of designing output feedback controllers that use measurements from a set of landmarks to navigate through a cell-decomposable environment using duality, Control Lyapunov and Barrier Functions (CLF, CBF), and Linear Programming. We propose two objectives for navigating in an environment, one to traverse the environment by making loops and one by converging to a stabilization point while smoothing the transition between consecutive cells. We test our algorithms in a simulation environment, evaluating the robustness of the approach to practical conditions, such as bearing-only measurements, and measurements acquired with a camera with a limited field of view.

Submitted 21 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2203.04416

arXiv:2310.08413

Control-Based Planning over Probability Mass Function Measurements via Robust Linear Programming

Authors: Mehdi Kermanshah, Calin Belta, Roberto Tron

Abstract: We propose an approach to synthesize linear feedback controllers for linear systems in polygonal environments. Our method focuses on designing a robust controller that can account for uncertainty in measurements. Its inputs are provided by a perception module that generates probability mass functions (PMFs) for predefined landmarks in the environment, such as distinguishable geometric features. We formulate an optimization problem with Control Lyapunov Function (CLF) and Control Barrier Function (CBF) constraints to derive a stable and safe controller. Using the strong duality of linear programs (LPs) and robust optimization, we convert the optimization problem to a linear program that can be efficiently solved offline. At a high level, our approach partially combines perception, planning, and real-time control into a single design problem. An additional advantage of our method is the ability to produce controllers capable of exhibiting nonlinear behavior while relying solely on an offline LP for control synthesis.

Submitted 12 October, 2023; originally announced October 2023.

Showing 1–4 of 4 results for author: Kermanshah, M