-
Closing the Intent-to-Behavior Gap via Fulfillment Priority Logic
Authors:
Bassel El Mabsout,
Abdelrahman AbdelGawad,
Renato Mancuso
Abstract:
Practitioners designing reinforcement learning policies face a fundamental challenge: translating intended behavioral objectives into representative reward functions. This challenge stems from behavioral intent requiring simultaneous achievement of multiple competing objectives, typically addressed through labor-intensive linear reward composition that yields brittle results. Consider the ubiquito…
▽ More
Practitioners designing reinforcement learning policies face a fundamental challenge: translating intended behavioral objectives into representative reward functions. This challenge stems from behavioral intent requiring simultaneous achievement of multiple competing objectives, typically addressed through labor-intensive linear reward composition that yields brittle results. Consider the ubiquitous robotics scenario where performance maximization directly conflicts with energy conservation. Such competitive dynamics are resistant to simple linear reward combinations. In this paper, we present the concept of objective fulfillment upon which we build Fulfillment Priority Logic (FPL). FPL allows practitioners to define logical formula representing their intentions and priorities within multi-objective reinforcement learning. Our novel Balanced Policy Gradient algorithm leverages FPL specifications to achieve up to 500\% better sample efficiency compared to Soft Actor Critic. Notably, this work constitutes the first implementation of non-linear utility scalarization design, specifically for continuous control problems.
△ Less
Submitted 22 March, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Survey on safe robot control via learning
Authors:
Bassel El Mabsout
Abstract:
Control systems are critical to modern technological infrastructure, spanning industries from aerospace to healthcare. This survey explores the landscape of safe robot learning, investigating methods that balance high-performance control with rigorous safety constraints. By examining classical control techniques, learning-based approaches, and embedded system design, the research seeks to understa…
▽ More
Control systems are critical to modern technological infrastructure, spanning industries from aerospace to healthcare. This survey explores the landscape of safe robot learning, investigating methods that balance high-performance control with rigorous safety constraints. By examining classical control techniques, learning-based approaches, and embedded system design, the research seeks to understand how robotic systems can be developed to prevent hazardous states while maintaining optimal performance across complex operational environments.
△ Less
Submitted 16 December, 2024;
originally announced January 2025.
-
Scrap Your Schedules with PopDescent
Authors:
Abhinav Pomalapally,
Bassel El Mabsout,
Renato Mansuco
Abstract:
In contemporary machine learning workloads, numerous hyper-parameter search algorithms are frequently utilized to efficiently discover high-performing hyper-parameter values, such as learning and regularization rates. As a result, a range of parameter schedules have been designed to leverage the capability of adjusting hyper-parameters during training to enhance loss performance. These schedules,…
▽ More
In contemporary machine learning workloads, numerous hyper-parameter search algorithms are frequently utilized to efficiently discover high-performing hyper-parameter values, such as learning and regularization rates. As a result, a range of parameter schedules have been designed to leverage the capability of adjusting hyper-parameters during training to enhance loss performance. These schedules, however, introduce new hyper-parameters to be searched and do not account for the current loss values of the models being trained.
To address these issues, we propose Population Descent (PopDescent), a progress-aware hyper-parameter tuning technique that employs a memetic, population-based search. By merging evolutionary and local search processes, PopDescent proactively explores hyper-parameter options during training based on their performance. Our trials on standard machine learning vision tasks show that PopDescent converges faster than existing search methods, finding model parameters with test-loss values up to 18% lower, even when considering the use of schedules. Moreover, we highlight the robustness of PopDescent to its initial training parameters, a crucial characteristic for hyper-parameter search techniques.
△ Less
Submitted 24 April, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Sim-Anchored Learning for On-the-Fly Adaptation
Authors:
Bassel El Mabsout,
Shahin Roozkhosh,
Siddharth Mysore,
Kate Saenko,
Renato Mancuso
Abstract:
Fine-tuning simulation-trained RL agents with real-world data often degrades crucial behaviors due to limited or skewed data distributions. We argue that designer priorities exist not just in reward functions, but also in simulation design choices like task selection and state initialization. When adapting to real-world data, agents can experience catastrophic forgetting in important but underrepr…
▽ More
Fine-tuning simulation-trained RL agents with real-world data often degrades crucial behaviors due to limited or skewed data distributions. We argue that designer priorities exist not just in reward functions, but also in simulation design choices like task selection and state initialization. When adapting to real-world data, agents can experience catastrophic forgetting in important but underrepresented scenarios. We propose framing live-adaptation as a multi-objective optimization problem, where policy objectives must be satisfied both in simulation and reality. Our approach leverages critics from simulation as "anchors for design intent" (anchor critics). By jointly optimizing policies against both anchor critics and critics trained on real-world experience, our method enables adaptation while preserving prioritized behaviors from simulation. Evaluations demonstrate robust behavior retention in sim-to-sim benchmarks and a sim-to-real scenario with a racing quadrotor, allowing for power consumption reductions of up to 50% without control loss. We also contribute SwaNNFlight, an open-source firmware for enabling live adaptation on similar robotic platforms.
△ Less
Submitted 1 May, 2025; v1 submitted 17 January, 2023;
originally announced January 2023.