Search | arXiv e-print repository

arXiv:2506.01980 [pdf, other]

Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance

Authors: Lianhao Yin, Ozanan Meireles, Guy Rosman, Daniela Rus

Abstract: Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). However, supervised learning approaches require large, annotated datasets that are scarce due to annotation efforts that are prohibitive, e.g., in medical fields. Although self-supervision methods can address such limitations, current self-supervised methods often fail to capture structural and physi… ▽ More Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). However, supervised learning approaches require large, annotated datasets that are scarce due to annotation efforts that are prohibitive, e.g., in medical fields. Although self-supervision methods can address such limitations, current self-supervised methods often fail to capture structural and physical information in a form that generalizes across tasks. We propose Compress-to-Explore (C2E), a novel self-supervised framework that leverages Kolmogorov complexity to learn compact, informative representations from surgical videos. C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data. Trained on large-scale unlabeled surgical datasets, C2E demonstrates strong generalization across a variety of surgical ML tasks, such as workflow classification, tool-tissue interaction classification, segmentation, and diagnosis tasks, providing improved performance as a surgical visual foundation model. As we further show in the paper, the model's internal compact representation better disentangles features from different structural parts of images. The resulting performance improvements highlight the yet untapped potential of self-supervised learning to enhance surgical AI and improve outcomes in MIS. △ Less

Submitted 16 May, 2025; originally announced June 2025.

arXiv:2505.03841 [pdf, other]

Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions

Authors: Kiwan Wong, Maximilian Stölzle, Wei Xiao, Cosimo Della Santina, Daniela Rus, Gioele Zardini

Abstract: Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements… ▽ More Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety and actively regulate environmental coupling, allowing, for instance, safe object manipulation under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance. △ Less

Submitted 5 May, 2025; originally announced May 2025.

Comments: 8 pages

arXiv:2406.13025 [pdf, other]

ABNet: Attention BarrierNet for Safe and Scalable Robot Learning

Authors: Wei Xiao, Tsun-Hsuan Wang, Daniela Rus

Abstract: Safe learning is central to AI-enabled robots where a single failure may lead to catastrophic results. Barrier-based method is one of the dominant approaches for safe robot learning. However, this method is not scalable, hard to train, and tends to generate unstable signals under noisy inputs that are challenging to be deployed for robots. To address these challenges, we propose a novel Attentio… ▽ More Safe learning is central to AI-enabled robots where a single failure may lead to catastrophic results. Barrier-based method is one of the dominant approaches for safe robot learning. However, this method is not scalable, hard to train, and tends to generate unstable signals under noisy inputs that are challenging to be deployed for robots. To address these challenges, we propose a novel Attention BarrierNet (ABNet) that is scalable to build larger foundational safe models in an incremental manner. Each head of BarrierNet in the ABNet could learn safe robot control policies from different features and focus on specific part of the observation. In this way, we do not need to one-shotly construct a large model for complex tasks, which significantly facilitates the training of the model while ensuring its stable output. Most importantly, we can still formally prove the safety guarantees of the ABNet. We demonstrate the strength of ABNet in 2D robot obstacle avoidance, safe robot manipulation, and vision-based end-to-end autonomous driving, with results showing much better robustness and guarantees over existing models. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 18 pages

arXiv:2401.13441 [pdf, other]

Guiding Soft Robots with Motor-Imagery Brain Signals and Impedance Control

Authors: Maximilian Stölzle, Sonal Santosh Baberwal, Daniela Rus, Shirley Coyle, Cosimo Della Santina

Abstract: Integrating Brain-Machine Interfaces into non-clinical applications like robot motion control remains difficult - despite remarkable advancements in clinical settings. Specifically, EEG-based motor imagery systems are still error-prone, posing safety risks when rigid robots operate near humans. This work presents an alternative pathway towards safe and effective operation by combining wearable EEG… ▽ More Integrating Brain-Machine Interfaces into non-clinical applications like robot motion control remains difficult - despite remarkable advancements in clinical settings. Specifically, EEG-based motor imagery systems are still error-prone, posing safety risks when rigid robots operate near humans. This work presents an alternative pathway towards safe and effective operation by combining wearable EEG with physically embodied safety in soft robots. We introduce and test a pipeline that allows a user to move a soft robot's end effector in real time via brain waves that are measured by as few as three EEG channels. A robust motor imagery algorithm interprets the user's intentions to move the position of a virtual attractor to which the end effector is attracted, thanks to a new Cartesian impedance controller. We specifically focus here on planar soft robot-based architected metamaterials, which require the development of a novel control architecture to deal with the peculiar nonlinearities - e.g., non-affinity in control. We preliminarily but quantitatively evaluate the approach on the task of setpoint regulation. We observe that the user reaches the proximity of the setpoint in 66% of steps and that for successful steps, the average response time is 21.5s. We also demonstrate the execution of simple real-world tasks involving interaction with the environment, which would be extremely hard to perform if it were not for the robot's softness. △ Less

Submitted 24 January, 2024; originally announced January 2024.

Comments: 8 pages, presented at 7th IEEE-RAS International Conference on Soft Robotics (2024)

arXiv:2309.04492 [pdf, other]

Safe Neural Control for Non-Affine Control Systems with Differentiable Control Barrier Functions

Authors: Wei Xiao, Ross Allen, Daniela Rus

Abstract: This paper addresses the problem of safety-critical control for non-affine control systems. It has been shown that optimizing quadratic costs subject to state and control constraints can be sub-optimally reduced to a sequence of quadratic programs (QPs) by using Control Barrier Functions (CBFs). Our recently proposed High Order CBFs (HOCBFs) can accommodate constraints of arbitrary relative degree… ▽ More This paper addresses the problem of safety-critical control for non-affine control systems. It has been shown that optimizing quadratic costs subject to state and control constraints can be sub-optimally reduced to a sequence of quadratic programs (QPs) by using Control Barrier Functions (CBFs). Our recently proposed High Order CBFs (HOCBFs) can accommodate constraints of arbitrary relative degree. The main challenges in this approach are that it requires affine control dynamics and the solution of the CBF-based QP is sub-optimal since it is solved point-wise. To address these challenges, we incorporate higher-order CBFs into neural ordinary differential equation-based learning models as differentiable CBFs to guarantee safety for non-affine control systems. The differentiable CBFs are trainable in terms of their parameters, and thus, they can address the conservativeness of CBFs such that the system state will not stay unnecessarily far away from safe set boundaries. Moreover, the imitation learning model is capable of learning complex and optimal control policies that are usually intractable online. We illustrate the effectiveness of the proposed framework on LiDAR-based autonomous driving and compare it with existing methods. △ Less

Submitted 6 September, 2023; originally announced September 2023.

Comments: 8 pages, accepted in IEEE CDC 2023

arXiv:2306.00148 [pdf, other]

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models

Authors: Wei Xiao, Tsun-Hsuan Wang, Chuang Gan, Daniela Rus

Abstract: Diffusion model-based approaches have shown promise in data-driven planning, but there are no safety guarantees, thus making it hard to be applied for safety-critical applications. To address these challenges, we propose a new method, called SafeDiffuser, to ensure diffusion probabilistic models satisfy specifications by using a class of control barrier functions. The key idea of our approach is t… ▽ More Diffusion model-based approaches have shown promise in data-driven planning, but there are no safety guarantees, thus making it hard to be applied for safety-critical applications. To address these challenges, we propose a new method, called SafeDiffuser, to ensure diffusion probabilistic models satisfy specifications by using a class of control barrier functions. The key idea of our approach is to embed the proposed finite-time diffusion invariance into the denoising diffusion procedure, which enables trustworthy diffusion data generation. Moreover, we demonstrate that our finite-time diffusion invariance method through generative models not only maintains generalization performance but also creates robustness in safe data generation. We test our method on a series of safe planning tasks, including maze path generation, legged robot locomotion, and 3D space manipulation, with results showing the advantages of robustness and guarantees over vanilla diffusion models. △ Less

Submitted 31 May, 2023; originally announced June 2023.

Comments: 19 pages, website: https://safediffuser.github.io/safediffuser/

arXiv:2304.02733 [pdf, other]

Learning Stability Attention in Vision-based End-to-end Driving Policies

Authors: Tsun-Hsuan Wang, Wei Xiao, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus

Abstract: Modern end-to-end learning systems can learn to explicitly infer control from perception. However, it is difficult to guarantee stability and robustness for these systems since they are often exposed to unstructured, high-dimensional, and complex observation spaces (e.g., autonomous driving from a stream of pixel inputs). We propose to leverage control Lyapunov functions (CLFs) to equip end-to-end… ▽ More Modern end-to-end learning systems can learn to explicitly infer control from perception. However, it is difficult to guarantee stability and robustness for these systems since they are often exposed to unstructured, high-dimensional, and complex observation spaces (e.g., autonomous driving from a stream of pixel inputs). We propose to leverage control Lyapunov functions (CLFs) to equip end-to-end vision-based policies with stability properties and introduce stability attention in CLFs (att-CLFs) to tackle environmental changes and improve learning flexibility. We also present an uncertainty propagation technique that is tightly integrated into att-CLFs. We demonstrate the effectiveness of att-CLFs via comparison with classical CLFs, model predictive control, and vanilla end-to-end learning in a photo-realistic simulator and on a real full-scale autonomous vehicle. △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: First two authors contributed equally; L4DC 2023

arXiv:2210.04763 [pdf, other]

On the Forward Invariance of Neural ODEs

Authors: Wei Xiao, Tsun-Hsuan Wang, Ramin Hasani, Mathias Lechner, Yutong Ban, Chuang Gan, Daniela Rus

Abstract: We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications by using invariance set propagation. Our approach uses a class of control barrier functions to transform output specifications into constraints on the parameters and inputs of the learning system. This setup allows us to achieve output specification guarantees simply by changing the constr… ▽ More We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications by using invariance set propagation. Our approach uses a class of control barrier functions to transform output specifications into constraints on the parameters and inputs of the learning system. This setup allows us to achieve output specification guarantees simply by changing the constrained parameters/inputs both during training and inference. Moreover, we demonstrate that our invariance set propagation through data-controlled neural ODEs not only maintains generalization performance but also creates an additional degree of robustness by enabling causal manipulation of the system's parameters/inputs. We test our method on a series of representation learning tasks, including modeling physical dynamics and convexity portraits, as well as safe collision avoidance for autonomous vehicles. △ Less

Submitted 31 May, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: 25 pages, accepted in ICML2023, website: https://weixy21.github.io/invariance/

arXiv:2209.12968 [pdf, ps, other]

Intention Communication and Hypothesis Likelihood in Game-Theoretic Motion Planning

Authors: Makram Chahine, Roya Firoozi, Wei Xiao, Mac Schwager, Daniela Rus

Abstract: Game-theoretic motion planners are a potent solution for controlling systems of multiple highly interactive robots. Most existing game-theoretic planners unrealistically assume a priori objective function knowledge is available to all agents. To address this, we propose a fault-tolerant receding horizon game-theoretic motion planner that leverages inter-agent communication with intention hypothesi… ▽ More Game-theoretic motion planners are a potent solution for controlling systems of multiple highly interactive robots. Most existing game-theoretic planners unrealistically assume a priori objective function knowledge is available to all agents. To address this, we propose a fault-tolerant receding horizon game-theoretic motion planner that leverages inter-agent communication with intention hypothesis likelihood. Specifically, robots communicate their objective function incorporating their intentions. A discrete Bayesian filter is designed to infer the objectives in real-time based on the discrepancy between observed trajectories and the ones from communicated intentions. In simulation, we consider three safety-critical autonomous driving scenarios of overtaking, lane-merging and intersection crossing, to demonstrate our planner's ability to capitalize on alternative intention hypotheses to generate safe trajectories in the presence of faulty transmissions in the communication network. △ Less

Submitted 26 September, 2022; originally announced September 2022.

Comments: This work has been submitted to the IEEE for possible publication

ACM Class: I.2.8; I.2.9; I.2.11

arXiv:2205.09117 [pdf, other]

Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks

Authors: Ryan Sander, Wilko Schwarting, Tim Seyde, Igor Gilitschenski, Sertac Karaman, Daniela Rus

Abstract: Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent advances in experience replay propose using Mixup (Zhang et al., 2018) to further improve sample efficiency via synthetic sample generation. We build upon this technique with Neighborhood Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that interpolates tra… ▽ More Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent advances in experience replay propose using Mixup (Zhang et al., 2018) to further improve sample efficiency via synthetic sample generation. We build upon this technique with Neighborhood Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that interpolates transitions with their closest neighbors in state-action space. NMER preserves a locally linear approximation of the transition manifold by only applying Mixup between transitions with vicinal state-action features. Under NMER, a given transition's set of state action neighbors is dynamic and episode agnostic, in turn encouraging greater policy generalizability via inter-episode interpolation. We combine our approach with recent off-policy deep reinforcement learning algorithms and evaluate on continuous control environments. We observe that NMER improves sample efficiency by an average 94% (TD3) and 29% (SAC) over baseline replay buffers, enabling agents to effectively recombine previous experiences and learn from limited data. △ Less

Submitted 17 May, 2022; originally announced May 2022.

Comments: Accepted to L4DC 2022

arXiv:2203.07978 [pdf, other]

Control Barrier Functions for Systems with Multiple Control Inputs

Authors: Wei Xiao, Christos G. Cassandras, Calin A. Belta, Daniela Rus

Abstract: Control Barrier Functions (CBFs) are becoming popular tools in guaranteeing safety for nonlinear systems and constraints, and they can reduce a constrained optimal control problem into a sequence of Quadratic Programs (QPs) for affine control systems. The recently proposed High Order Control Barrier Functions (HOCBFs) work for arbitrary relative degree constraints. One of the challenges in a HOCBF… ▽ More Control Barrier Functions (CBFs) are becoming popular tools in guaranteeing safety for nonlinear systems and constraints, and they can reduce a constrained optimal control problem into a sequence of Quadratic Programs (QPs) for affine control systems. The recently proposed High Order Control Barrier Functions (HOCBFs) work for arbitrary relative degree constraints. One of the challenges in a HOCBF is to address the relative degree problem when a system has multiple control inputs, i.e., the relative degree could be defined with respect to different components of the control vector. This paper proposes two methods for HOCBFs to deal with systems with multiple control inputs: a general integral control method and a method which is simpler but limited to specific classes of physical systems. When control bounds are involved, the feasibility of the above mentioned QPs can also be significantly improved with the proposed methods. We illustrate our approaches on a unicyle model with two control inputs, and compare the two proposed methods to demonstrate their effectiveness and performance. △ Less

Submitted 15 March, 2022; originally announced March 2022.

Comments: To appear in ACC2022

arXiv:2111.11277 [pdf, other]

BarrierNet: A Safety-Guaranteed Layer for Neural Networks

Authors: Wei Xiao, Ramin Hasani, Xiao Li, Daniela Rus

Abstract: This paper introduces differentiable higher-order control barrier functions (CBF) that are end-to-end trainable together with learning systems. CBFs are usually overly conservative, while guaranteeing safety. Here, we address their conservativeness by softening their definitions using environmental dependencies without loosing safety guarantees, and embed them into differentiable quadratic program… ▽ More This paper introduces differentiable higher-order control barrier functions (CBF) that are end-to-end trainable together with learning systems. CBFs are usually overly conservative, while guaranteeing safety. Here, we address their conservativeness by softening their definitions using environmental dependencies without loosing safety guarantees, and embed them into differentiable quadratic programs. These novel safety layers, termed a BarrierNet, can be used in conjunction with any neural network-based controller, and can be trained by gradient descent. BarrierNet allows the safety constraints of a neural controller be adaptable to changing environments. We evaluate them on a series of control problems such as traffic merging and robot navigations in 2D and 3D space, and demonstrate their effectiveness compared to state-of-the-art approaches. △ Less

Submitted 22 November, 2021; originally announced November 2021.

Comments: 23 pages

arXiv:2110.01358 [pdf, other]

doi 10.1109/MCS.2023.3253419

Model Based Control of Soft Robots: A Survey of the State of the Art and Open Challenges

Authors: Cosimo Della Santina, Christian Duriez, Daniela Rus

Abstract: Continuum soft robots are mechanical systems entirely made of continuously deformable elements. This design solution aims to bring robots closer to invertebrate animals and soft appendices of vertebrate animals (e.g., an elephant's trunk, a monkey's tail). This work aims to introduce the control theorist perspective to this novel development in robotics. We aim to remove the barriers to entry into… ▽ More Continuum soft robots are mechanical systems entirely made of continuously deformable elements. This design solution aims to bring robots closer to invertebrate animals and soft appendices of vertebrate animals (e.g., an elephant's trunk, a monkey's tail). This work aims to introduce the control theorist perspective to this novel development in robotics. We aim to remove the barriers to entry into this field by presenting existing results and future challenges using a unified language and within a coherent framework. Indeed, the main difficulty in entering this field is the wide variability of terminology and scientific backgrounds, making it quite hard to acquire a comprehensive view on the topic. Another limiting factor is that it is not obvious where to draw a clear line between the limitations imposed by the technology not being mature yet and the challenges intrinsic to this class of robots. In this work, we argue that the intrinsic effects are the continuum or multi-body dynamics, the presence of a non-negligible elastic potential field, and the variability in sensing and actuation strategies. △ Less

Submitted 3 September, 2023; v1 submitted 4 October, 2021; originally announced October 2021.

Comments: 69 pages, 13 figures

arXiv:2104.08614 [pdf]

Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales

Authors: Jacob Andreas, Gašper Beguš, Michael M. Bronstein, Roee Diamant, Denley Delaney, Shane Gero, Shafi Goldwasser, David F. Gruber, Sarah de Haas, Peter Malkin, Roger Payne, Giovanni Petri, Daniela Rus, Pratyusha Sharma, Dan Tchernov, Pernille Tønnesen, Antonio Torralba, Daniel Vogt, Robert J. Wood

Abstract: The past decade has witnessed a groundbreaking rise of machine learning for human language analysis, with current methods capable of automatically accurately recovering various aspects of syntax and semantics - including sentence structure and grounded word meaning - from large data collections. Recent research showed the promise of such tools for analyzing acoustic communication in nonhuman speci… ▽ More The past decade has witnessed a groundbreaking rise of machine learning for human language analysis, with current methods capable of automatically accurately recovering various aspects of syntax and semantics - including sentence structure and grounded word meaning - from large data collections. Recent research showed the promise of such tools for analyzing acoustic communication in nonhuman species. We posit that machine learning will be the cornerstone of future collection, processing, and analysis of multimodal streams of data in animal communication studies, including bioacoustic, behavioral, biological, and environmental data. Cetaceans are unique non-human model species as they possess sophisticated acoustic communications, but utilize a very different encoding system that evolved in an aquatic rather than terrestrial medium. Sperm whales, in particular, with their highly-developed neuroanatomical features, cognitive abilities, social structures, and discrete click-based encoding make for an excellent starting point for advanced machine learning tools that can be applied to other animals in the future. This paper details a roadmap toward this goal based on currently existing technology and multidisciplinary scientific community effort. We outline the key elements required for the collection and processing of massive bioacoustic data of sperm whales, detecting their basic communication units and language-like higher-level structures, and validating these models through interactive playback experiments. The technological capabilities developed by such an undertaking are likely to yield cross-applications and advancements in broader communities investigating non-human communication and animal behavioral research. △ Less

Submitted 17 April, 2021; originally announced April 2021.

arXiv:2103.10888 [pdf, other]

Feedback from Pixels: Output Regulation via Learning-Based Scene View Synthesis

Authors: Murad Abu-Khalaf, Sertac Karaman, Daniela Rus

Abstract: We propose a novel controller synthesis involving feedback from pixels, whereby the measurement is a high dimensional signal representing a pixelated image with Red-Green-Blue (RGB) values. The approach neither requires feature extraction, nor object detection, nor visual correspondence. The control policy does not involve the estimation of states or similar latent representations. Instead, tracki… ▽ More We propose a novel controller synthesis involving feedback from pixels, whereby the measurement is a high dimensional signal representing a pixelated image with Red-Green-Blue (RGB) values. The approach neither requires feature extraction, nor object detection, nor visual correspondence. The control policy does not involve the estimation of states or similar latent representations. Instead, tracking is achieved directly in image space, with a model of the reference signal embedded as required by the internal model principle. The reference signal is generated by a neural network with learning-based scene view synthesis capabilities. Our approach does not require an end-to-end learning of a pixel-to-action control policy. The approach is applied to a motion control problem, namely the longitudinal dynamics of a car-following problem. We show how this approach lend itself to a tractable stability analysis with associated bounds critical to establishing trustworthiness and interpretability of the closed-loop dynamics. △ Less

Submitted 23 April, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: Submitted to L4DC on November-20-2020; Accepted on March-5-2021

arXiv:2008.05245 [pdf, other]

doi 10.1109/LCSYS.2020.3039322

Covid-19 and Flattening the Curve: a Feedback Control Perspective

Authors: Francesco Di Lauro, István Z. Kiss, Daniela Rus, Cosimo Della Santina

Abstract: Many of the control policies that were put into place during the Covid-19 pandemic had a common goal: to flatten the curve of the number of infected people so that its peak remains under a critical threshold. This letter considers the challenge of engineering a strategy that enforces such a goal using control theory. We introduce a simple formulation of the optimal flattening problem, and provide… ▽ More Many of the control policies that were put into place during the Covid-19 pandemic had a common goal: to flatten the curve of the number of infected people so that its peak remains under a critical threshold. This letter considers the challenge of engineering a strategy that enforces such a goal using control theory. We introduce a simple formulation of the optimal flattening problem, and provide a closed form solution.This is augmented through nonlinear closed loop tracking of the nominal solution, with the aim of ensuring close-to-optimal performance under uncertain conditions. A key contribution ofthis paper is to provide validation of the method with extensive and realistic simulations in a Covid-19 scenario, with particular focus on the case of Codogno - a small city in Northern Italy that has been among the most harshly hit by the pandemic. △ Less

Submitted 22 October, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

Comments: 6 pages, 8 figures, letter

arXiv:1905.11524 [pdf, other]

Shared Linear Quadratic Regulation Control: A Reinforcement Learning Approach

Authors: Murad Abu-Khalaf, Sertac Karaman, Daniela Rus

Abstract: We propose controller synthesis for state regulation problems in which a human operator shares control with an autonomy system, running in parallel. The autonomy system continuously improves over human action, with minimal intervention, and can take over full-control. It additively combines user input with an adaptive optimal corrective signal. It is adaptive in that it neither estimates nor requi… ▽ More We propose controller synthesis for state regulation problems in which a human operator shares control with an autonomy system, running in parallel. The autonomy system continuously improves over human action, with minimal intervention, and can take over full-control. It additively combines user input with an adaptive optimal corrective signal. It is adaptive in that it neither estimates nor requires a model of the human's action policy, or the internal dynamics of the plant, and can adjust to changes in both. Our contribution is twofold; first, a new synthesis for shared control which we formulate as an adaptive optimal control problem for continuous-time linear systems and solve it online as a human-in-the-loop reinforcement learning. The result is an architecture that we call shared linear quadratic regulator (sLQR). Second, we provide new analysis of reinforcement learning for continuous-time linear systems in two parts. In the first analysis part, we avoid learning along a single state-space trajectory which we show leads to data collinearity under certain conditions. We make a clear separation between exploitation of learned policies and exploration of the state-space, and propose an exploration scheme that requires switching to new state-space trajectories rather than injecting noise continuously while learning. This avoidance of continuous noise injection minimizes interference with human action, and avoids bias in the convergence to the stabilizing solution of the underlying algebraic Riccati equation. We show that exploring a minimum number of pairwise distinct state-space trajectories is necessary to avoid collinearity in the learning data. In the second analysis part, we show conditions under which existence and uniqueness of solutions can be established for off-policy reinforcement learning in continuous-time linear systems; namely, prior knowledge of the input matrix. △ Less

Submitted 20 September, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

Comments: Accepted by IEEE CDC 2019

arXiv:1803.00387 [pdf, other]

A General Pipeline for 3D Detection of Vehicles

Authors: Xinxin Du, Marcelo H. Ang Jr., Sertac Karaman, Daniela Rus

Abstract: Autonomous driving requires 3D perception of vehicles and other objects in the in environment. Much of the current methods support 2D vehicle detection. This paper proposes a flexible pipeline to adopt any 2D detection network and fuse it with a 3D point cloud to generate 3D information with minimum changes of the 2D detection networks. To identify the 3D box, an effective model fitting algorithm… ▽ More Autonomous driving requires 3D perception of vehicles and other objects in the in environment. Much of the current methods support 2D vehicle detection. This paper proposes a flexible pipeline to adopt any 2D detection network and fuse it with a 3D point cloud to generate 3D information with minimum changes of the 2D detection networks. To identify the 3D box, an effective model fitting algorithm is developed based on generalised car models and score maps. A two-stage convolutional neural network (CNN) is proposed to refine the detected 3D box. This pipeline is tested on the KITTI dataset using two different 2D detection networks. The 3D detection results based on these two networks are similar, demonstrating the flexibility of the proposed pipeline. The results rank second among the 3D detection algorithms, indicating its competencies in 3D detection. △ Less

Submitted 12 February, 2018; originally announced March 2018.

Comments: Accepted at ICRA 2018

arXiv:1104.1159 [pdf, other]

LTL Control in Uncertain Environments with Probabilistic Satisfaction Guarantees

Authors: Xu Chu Ding, Stephen L. Smith, Calin Belta, Daniela Rus

Abstract: We present a method to generate a robot control strategy that maximizes the probability to accomplish a task. The task is given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied at the regions of a partitioned environment. We assume that the probabilities with which the properties are satisfied at the regions are known, and the robot can determine the truth va… ▽ More We present a method to generate a robot control strategy that maximizes the probability to accomplish a task. The task is given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied at the regions of a partitioned environment. We assume that the probabilities with which the properties are satisfied at the regions are known, and the robot can determine the truth value of a proposition only at the current region. Motivated by several results on partitioned-based abstractions, we assume that the motion is performed on a graph. To account for noisy sensors and actuators, we assume that a control action enables several transitions with known probabilities. We show that this problem can be reduced to the problem of generating a control policy for a Markov Decision Process (MDP) such that the probability of satisfying an LTL formula over its states is maximized. We provide a complete solution for the latter problem that builds on existing results from probabilistic model checking. We include an illustrative case study. △ Less

Submitted 7 April, 2011; v1 submitted 6 April, 2011; originally announced April 2011.

Comments: Technical Report accompanying IFAC 2011

arXiv:1103.4342 [pdf, other]

MDP Optimal Control under Temporal Logic Constraints

Authors: Xu Chu Ding, Stephen L. Smith, Calin Belta, Daniela Rus

Abstract: In this paper, we develop a method to automatically generate a control policy for a dynamical system modeled as a Markov Decision Process (MDP). The control specification is given as a Linear Temporal Logic (LTL) formula over a set of propositions defined on the states of the MDP. We synthesize a control policy such that the MDP satisfies the given specification almost surely, if such a policy exi… ▽ More In this paper, we develop a method to automatically generate a control policy for a dynamical system modeled as a Markov Decision Process (MDP). The control specification is given as a Linear Temporal Logic (LTL) formula over a set of propositions defined on the states of the MDP. We synthesize a control policy such that the MDP satisfies the given specification almost surely, if such a policy exists. In addition, we designate an "optimizing proposition" to be repeatedly satisfied, and we formulate a novel optimization criterion in terms of minimizing the expected cost in between satisfactions of this proposition. We propose a sufficient condition for a policy to be optimal, and develop a dynamic programming algorithm that synthesizes a policy that is optimal under some conditions, and sub-optimal otherwise. This problem is motivated by robotic applications requiring persistent tasks, such as environmental monitoring or data gathering, to be performed. △ Less

Submitted 23 March, 2011; v1 submitted 22 March, 2011; originally announced March 2011.

Comments: Technical report accompanying the CDC2011 submission

Showing 1–20 of 20 results for author: Rus, D