Search | arXiv e-print repository

From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving

Authors: Fabian Konstantinidis, Ariel Dallari Guerreiro, Raphael Trumpp, Moritz Sackmann, Ulrich Hofmann, Marco Caccamo, Christoph Stiller

Abstract: Accurate motion prediction of surrounding traffic participants is crucial for the safe and efficient operation of automated vehicles in dynamic environments. Marginal prediction models commonly forecast each agent's future trajectories independently, often leading to sub-optimal planning decisions for an automated vehicle. In contrast, joint prediction models explicitly account for the interactions between agents, yielding socially and physically consistent predictions on a scene level. However, existing approaches differ not only in their problem formulation but also in the model architectures and implementation details used, making it difficult to compare them. In this work, we systematically investigate different approaches to joint motion prediction, including post-processing of the marginal predictions, explicitly training the model for joint predictions, and framing the problem as a generative task. We evaluate each approach in terms of prediction accuracy, multi-modality, and inference efficiency, offering a comprehensive analysis of the strengths and limitations of each approach. Several prediction examples are available at https://frommarginaltojointpred.github.io/.

Submitted 7 July, 2025; originally announced July 2025.

Comments: Accepted at International Conference on Intelligent Transportation Systems 2025 (ITSC 2025)

arXiv:2506.02205 [pdf, ps, other]

Bregman Centroid Guided Cross-Entropy Method

Authors: Yuliang Gu, Hongpeng Cao, Marco Caccamo, Naira Hovakimyan

Abstract: The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM that leverages Bregman centroids for principled information aggregation and diversity control. $\mathcal{BC}$-EvoCEM computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that $\mathcal{BC}$-EvoCEM integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines demonstrate that $\mathcal{BC}$-EvoCEM enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.

Submitted 30 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.
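The premature-convergence behaviour this abstract refers to can be illustrated with a minimal sketch of plain, single-Gaussian CEM on a one-dimensional toy cost. This is not $\mathcal{BC}$-EvoCEM: the Bregman-centroid aggregation and the ensemble of workers are not shown, and the cost function, parameter names, and defaults below are assumptions chosen only for illustration.

```python
import random
import statistics


def cem_minimize(cost, mean, std, iterations=30, samples=64, elite_frac=0.125, seed=0):
    """Plain CEM with one Gaussian over a scalar decision variable (illustrative only)."""
    rng = random.Random(seed)
    n_elite = max(2, int(samples * elite_frac))
    for _ in range(iterations):
        # Sample candidates from the current unimodal sampling distribution.
        candidates = [rng.gauss(mean, std) for _ in range(samples)]
        # Keep the lowest-cost candidates and refit the Gaussian to them.
        elites = sorted(candidates, key=cost)[:n_elite]
        mean = statistics.fmean(elites)
        std = statistics.pstdev(elites) + 1e-6  # small floor avoids a zero std
    return mean


def bimodal_cost(x):
    # Hypothetical landscape: local minimum near x = -2 (cost 1),
    # global minimum near x = 3 (cost 0).
    return min((x + 2.0) ** 2 + 1.0, (x - 3.0) ** 2)


# Started near the local basin, the single Gaussian collapses onto x = -2
# and never samples the better basin around x = 3.
stuck = cem_minimize(bimodal_cost, mean=-2.5, std=0.5)
found = cem_minimize(bimodal_cost, mean=3.5, std=1.0)
```

The outcome depends entirely on where the initial distribution sits, which is the limitation that ensemble variants of CEM aim to reduce.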

arXiv:2505.11554 [pdf, ps, other]

doi 10.4230/LIPIcs.ECRTS.2025.7

Multi-Objective Memory Bandwidth Regulation and Cache Partitioning for Multicore Real-Time Systems

Authors: Binqi Sun, Zhihang Wei, Andrea Bastoni, Debayan Roy, Mirco Theile, Tomasz Kloda, Rodolfo Pellizzoni, Marco Caccamo

Abstract: Memory bandwidth regulation and cache partitioning are widely used techniques for achieving predictable timing in real-time computing systems. Combined with partitioned scheduling, these methods require careful co-allocation of tasks and resources to cores, as task execution times strongly depend on available allocated resources. To address this challenge, this paper presents a 0-1 linear program for task-resource co-allocation, along with a multi-objective heuristic designed to minimize resource usage while guaranteeing schedulability under a preemptive EDF scheduling policy. Our heuristic employs a multi-layer framework, where an outer layer explores resource allocations using Pareto-pruned search, and an inner layer optimizes task allocation by solving a knapsack problem using dynamic programming. To evaluate the performance of the proposed optimization algorithm, we profile real-world benchmarks on an embedded AMD UltraScale+ ZCU102 platform, with fine-grained resource partitioning enabled by the Jailhouse hypervisor, leveraging cache set partitioning and MemGuard for memory bandwidth regulation. Experiments based on the benchmarking results show that the proposed 0-1 linear program outperforms existing mixed-integer programs by finding more optimal solutions within the same time limit. Moreover, the proposed multi-objective multi-layer heuristic performs consistently better than the state-of-the-art multi-resource-task co-allocation algorithm in terms of schedulability, resource usage, number of non-dominated solutions, and computational efficiency.

Submitted 15 May, 2025; originally announced May 2025.

Comments: Accepted in the 37th Euromicro Conference on Real-Time Systems (ECRTS 2025)
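Why task and resource allocation have to be decided together can be sketched with the classical utilization test for preemptive EDF on one core. The sketch assumes independent tasks with implicit deadlines (deadline equal to period), for which total utilization of at most 1 is the standard schedulability condition; the paper's actual analysis, 0-1 linear program, and heuristic are not reproduced here. The task fields and the per-cache-allocation execution times are hypothetical.

```python
from fractions import Fraction


def core_is_schedulable(tasks, cache_parts):
    """Utilization test for preemptive EDF on a single core.

    Assumes implicit deadlines. Each task is a dict with a "period" and a
    hypothetical "wcet_by_cache" table mapping the number of cache partitions
    given to the core to the task's worst-case execution time.
    """
    utilization = sum(
        Fraction(task["wcet_by_cache"][cache_parts], task["period"]) for task in tasks
    )
    return utilization <= 1


# Made-up task set: execution times shrink when the core gets more cache.
tasks = [
    {"period": 10, "wcet_by_cache": {1: 6, 2: 4}},
    {"period": 10, "wcet_by_cache": {1: 5, 2: 3}},
]

with_one_partition = core_is_schedulable(tasks, cache_parts=1)   # utilization 11/10
with_two_partitions = core_is_schedulable(tasks, cache_parts=2)  # utilization 7/10
```

The same pair of tasks fits on the core only with the larger cache allocation, so a task-to-core assignment cannot be judged without fixing the resource assignment as well.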

arXiv:2505.08382 [pdf, ps, other]

Continuous World Coverage Path Planning for Fixed-Wing UAVs using Deep Reinforcement Learning

Authors: Mirco Theile, Andres R. Zapata Rodriguez, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

Abstract: Unmanned Aerial Vehicle (UAV) Coverage Path Planning (CPP) is critical for applications such as precision agriculture and search and rescue. While traditional methods rely on discrete grid-based representations, real-world UAV operations require power-efficient continuous motion planning. We formulate the UAV CPP problem in a continuous environment, minimizing power consumption while ensuring complete coverage. Our approach models the environment with variable-size axis-aligned rectangles and UAV motion with curvature-constrained Bézier curves. We train a reinforcement learning agent using an action-mapping-based Soft Actor-Critic (AM-SAC) algorithm employing a self-adaptive curriculum. Experiments on both procedurally generated and hand-crafted scenarios demonstrate the effectiveness of our method in learning energy-efficient coverage strategies.

Submitted 13 May, 2025; originally announced May 2025.

Comments: Submitted to IROS 2025

arXiv:2503.17038 [pdf, other]

Arm DynamIQ Shared Unit and Real-Time: An Empirical Evaluation

Authors: Ashutosh Pradhan, Daniele Ottaviano, Yi Jiang, Haozheng Huang, Alexander Zuepke, Andrea Bastoni, Marco Caccamo

Abstract: The increasing complexity of embedded hardware platforms poses significant challenges for real-time workloads. Architectural features such as Intel RDT, Arm QoS, and Arm MPAM are either unavailable on commercial embedded platforms or designed primarily for server environments optimized for average-case performance and might fail to deliver the expected real-time guarantees. Arm DynamIQ Shared Unit (DSU) includes isolation features (among others, hardware per-way cache partitioning) that can improve the real-time guarantees of complex embedded multicore systems and facilitate real-time analysis. However, the DSU also targets average cases, and its real-time capabilities have not yet been evaluated. This paper presents the first comprehensive analysis of three real-world deployments of the Arm DSU on Rockchip RK3568, Rockchip RK3588, and NVIDIA Orin platforms. We integrate support for the DSU at the operating system and hypervisor level and conduct a large-scale evaluation using both synthetic and real-world benchmarks with varying types and intensities of interference. Our results make extensive use of performance counters and indicate that, although effective, the quality of partitioning and isolation provided by the DSU depends on the type and the intensity of the interfering workloads. In addition, we uncover and analyze in detail the correlation between benchmarks and different types and intensities of interference.

Submitted 27 March, 2025; v1 submitted 21 March, 2025; originally announced March 2025.

Comments: Accepted for publication in the Proceedings of the 31st IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2025)

MSC Class: 68M20

ACM Class: C.3; C.4; D.4.7

arXiv:2503.05546 [pdf, other]

Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning

Authors: Raphael Trumpp, Ansgar Schäfftlein, Mirco Theile, Marco Caccamo

Abstract: As image-based deep reinforcement learning tackles more challenging tasks, increasing model size has become an important factor in improving performance. Recent studies achieved this by focusing on the parameter efficiency of scaled networks, typically using Impala-CNN, a 15-layer ResNet-inspired network, as the image encoder. However, while Impala-CNN evidently outperforms older CNN architectures, potential advancements in network design for deep reinforcement learning-specific image encoders remain largely unexplored. We find that replacing the flattening of output feature maps in Impala-CNN with global average pooling leads to a notable performance improvement. This approach outperforms larger and more complex models in the Procgen Benchmark, particularly in terms of generalization. We call our proposed encoder model Impoola-CNN. A decrease in the network's translation sensitivity may be central to this improvement, as we observe the most significant gains in games without agent-centered observations. Our results demonstrate that network scaling is not just about increasing model size - efficient network design is also an essential factor.

Submitted 7 March, 2025; originally announced March 2025.
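The difference between flattening and global average pooling that this abstract describes can be sketched on plain nested lists. This is only an illustration of the two operations, not the Impoola-CNN encoder; the function names and the toy feature maps are assumptions.

```python
def flatten(feature_maps):
    """Concatenate all values of all channels: output length is C * H * W."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def global_average_pool(feature_maps):
    """Average each channel over its spatial positions: output length is C."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


# One channel with a single activation, and the same activation shifted
# to another spatial position (toy values).
original = [[[1.0, 0.0], [0.0, 0.0]]]
shifted = [[[0.0, 0.0], [0.0, 1.0]]]

# Flattening keeps the position, so the two encodings differ ...
flat_differs = flatten(original) != flatten(shifted)
# ... while the pooled encodings are identical and much shorter.
pooled_matches = global_average_pool(original) == global_average_pool(shifted)
```

Pooling discards where an activation occurred, which is consistent with the reduced translation sensitivity the abstract suggests, and it shrinks the input to the following dense layer from C * H * W values to C.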

arXiv:2503.04794 [pdf, other]

Runtime Learning of Quadruped Robots in Wild Environments

Authors: Yihao Cai, Yanbing Mao, Lui Sha, Hongpeng Cao, Marco Caccamo

Abstract: This paper presents a runtime learning framework for quadruped robots, enabling them to learn and adapt safely in dynamic wild environments. The framework integrates sensing, navigation, and control, forming a closed-loop system for the robot. The core novelty of this framework lies in two interactive and complementary components within the control module: the high-performance (HP)-Student and the high-assurance (HA)-Teacher. HP-Student is a deep reinforcement learning (DRL) agent that engages in self-learning and teaching-to-learn to develop a safe and high-performance action policy. HA-Teacher is a simplified yet verifiable physics-model-based controller, with the role of teaching HP-Student about safety while providing a backup for the robot's safe locomotion. HA-Teacher is innovative due to its real-time physics model, real-time action policy, and real-time control goals, all tailored to respond effectively to real-time wild environments, ensuring safety. The framework also includes a coordinator who effectively manages the interaction between HP-Student and HA-Teacher. Experiments involving a Unitree Go2 robot in Nvidia Isaac Gym and comparisons with state-of-the-art safe DRLs demonstrate the effectiveness of the proposed runtime learning framework.

Submitted 1 March, 2025; originally announced March 2025.

arXiv:2502.15738 [pdf, ps, other]

Light Virtualization: a proof-of-concept for hardware-based virtualization

Authors: Francesco Ciraolo, Mattia Nicolella, Denis Hoornaert, Marco Caccamo, Renato Mancuso

Abstract: Virtualization has become widespread across all computing environments, from edge devices to cloud systems. Its main advantages are resource management through abstraction and improved isolation of platform resources and processes. However, there are still some important tradeoffs as it requires significant support from the existing hardware infrastructure and negatively impacts performance. Additionally, the current approaches to resource virtualization are inflexible, using a model that doesn't allow for dynamic adjustments during operation. This research introduces Light Virtualization (LightV), a new virtualization method for commercial platforms. LightV uses programmable hardware to direct cache coherence traffic, enabling precise and seamless control over which resources are virtualized. The paper explains the core principles of LightV, explores its capabilities, and shares initial findings from a basic proof-of-concept module tested on commercial hardware.

Submitted 6 February, 2025; originally announced February 2025.

arXiv:2412.13224 [pdf, other]

Physics-model-guided Worst-case Sampling for Safe Reinforcement Learning

Authors: Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

Abstract: Real-world accidents in learning-enabled CPS frequently occur in challenging corner cases. During the training of deep reinforcement learning (DRL) policy, the standard setup for training conditions is either fixed at a single initial condition or uniformly sampled from the admissible state space. This setup often overlooks the challenging but safety-critical corner cases. To bridge this gap, this paper proposes a physics-model-guided worst-case sampling strategy for training safe policies that can handle safety-critical cases toward guaranteed safety. Furthermore, we integrate the proposed worst-case sampling strategy into the physics-regulated deep reinforcement learning (Phy-DRL) framework to build a more data-efficient and safe learning algorithm for safety-critical CPS. We validate the proposed training strategy with Phy-DRL through extensive experiments on a simulated cart-pole system, a 2D quadrotor, a simulated and a real quadruped robot, showing remarkably improved sampling efficiency to learn more robust safe policies.

Submitted 16 December, 2024; originally announced December 2024.

Comments: under review

arXiv:2412.04327 [pdf, other]

Action Mapping for Reinforcement Learning in Continuous Environments with Constraints

Authors: Mirco Theile, Lukas Dirnberger, Raphael Trumpp, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

Abstract: Deep reinforcement learning (DRL) has had success across various domains, but applying it to environments with constraints remains challenging due to poor sample efficiency and slow convergence. Recent literature explored incorporating model knowledge to mitigate these problems, particularly through the use of models that assess the feasibility of proposed actions. However, integrating feasibility models efficiently into DRL pipelines in environments with continuous action spaces is non-trivial. We propose a novel DRL training strategy utilizing action mapping that leverages feasibility models to streamline the learning process. By decoupling the learning of feasible actions from policy optimization, action mapping allows DRL agents to focus on selecting the optimal action from a reduced feasible action set. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments with continuous action spaces, especially with imperfect feasibility models.

Submitted 5 December, 2024; originally announced December 2024.

arXiv:2409.05898 [pdf, other]

Simplex-enabled Safe Continual Learning Machine

Authors: Hongpeng Cao, Yanbing Mao, Yihao Cai, Lui Sha, Marco Caccamo

Abstract: This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, "using simplicity to control complexity") and physics-regulated deep reinforcement learning (Phy-DRL). The SeC-learning machine thus constitutes HP (high performance)-Student, HA (high assurance)-Teacher, and Coordinator. Specifically, the HP-Student is a pre-trained high-performance but not fully verified Phy-DRL, continuing to learn in a real plant to tune the action policy to be safe. In contrast, the HA-Teacher is a mission-reduced, physics-model-based, and verified design. As a complement, HA-Teacher has two missions: backing up safety and correcting unsafe learning. The Coordinator triggers the interaction and the switch between HP-Student and HA-Teacher. Powered by the three interactive components, the SeC-learning machine can i) assure lifetime safety (i.e., safety guarantee in any continual-learning stage, regardless of HP-Student's success or convergence), ii) address the Sim2Real gap, and iii) learn to tolerate unknown unknowns in real plants. The experiments on a cart-pole system and a real quadruped robot demonstrate the distinguished features of the SeC-learning machine, compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.

Submitted 5 October, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

arXiv:2404.12683 [pdf, other]

A Containerized Microservice Architecture for a ROS 2 Autonomous Driving Software: An End-to-End Latency Evaluation

Authors: Tobias Betz, Long Wen, Fengjunjie Pan, Gemb Kaljavesi, Alexander Zuepke, Andrea Bastoni, Marco Caccamo, Alois Knoll, Johannes Betz

Abstract: The automotive industry is transitioning from traditional ECU-based systems to software-defined vehicles. A central role of this revolution is played by containers, lightweight virtualization technologies that enable the flexible consolidation of complex software applications on a common hardware platform. Despite their widespread adoption, the impact of containerization on fundamental real-time metrics such as end-to-end latency, communication jitter, as well as memory and CPU utilization has remained virtually unexplored. This paper presents a microservice architecture for a real-world autonomous driving application where containers isolate each service. Our comprehensive evaluation shows the benefits in terms of end-to-end latency of such a solution even over standard bare-Linux deployments. Specifically, in the case of the presented microservice architecture, the mean end-to-end latency can be improved by 5-8%. Also, the maximum latencies were significantly reduced using container deployment.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2403.12856 [pdf, other]

doi 10.1109/IROS58592.2024.10801688

Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning

Authors: Mirco Theile, Hongpeng Cao, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

Abstract: In reinforcement learning (RL), exploiting environmental symmetries can significantly enhance efficiency, robustness, and performance. However, ensuring that the deep RL policy and value networks are respectively equivariant and invariant to exploit these symmetries is a substantial challenge. Related works try to design networks that are equivariant and invariant by construction, limiting them to a very restricted library of components, which in turn hampers the expressiveness of the networks. This paper proposes a method to construct equivariant policies and invariant value functions without specialized neural network components, which we term equivariant ensembles. We further add a regularization term for adding inductive bias during training. In a map-based path planning case study, we show how equivariant ensembles and regularization benefit sample efficiency and performance.

Submitted 25 August, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: Accepted at IROS 2024. A video can be found here: https://youtu.be/L6NOdvU7n7s. The code is available at https://github.com/theilem/uavSim

arXiv:2403.10726 [pdf, other]

doi 10.1109/RTAS61025.2024.00028

Strict Partitioning for Sporadic Rigid Gang Tasks

Authors: Binqi Sun, Tomasz Kloda, Marco Caccamo

Abstract: The rigid gang task model is based on the idea of executing multiple threads simultaneously on a fixed number of processors to increase efficiency and performance. Although there is extensive literature on global rigid gang scheduling, partitioned approaches have several practical advantages (e.g., task isolation and reduced scheduling overheads). In this paper, we propose a new partitioned scheduling strategy for rigid gang tasks, named strict partitioning. The method creates disjoint partitions of tasks and processors to avoid inter-partition interference. Moreover, it tries to assign tasks with similar volumes (i.e., parallelisms) to the same partition so that the intra-partition interference can be reduced. Within each partition, the tasks can be scheduled using any type of scheduler, which allows the use of a less pessimistic schedulability test. Extensive synthetic experiments and a case study based on Edge TPU benchmarks show that strict partitioning achieves better schedulability performance than state-of-the-art global gang schedulability analyses for both preemptive and non-preemptive rigid gang task sets.

Submitted 1 September, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: Published in IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2024)

arXiv:2403.07129 [pdf, other]

RaceMOP: Mapless Online Path Planning for Multi-Agent Autonomous Racing using Residual Policy Learning

Authors: Raphael Trumpp, Ehsan Javanmardi, Jin Nakazato, Manabu Tsukada, Marco Caccamo

Abstract: The interactive decision-making in multi-agent autonomous racing offers insights valuable beyond the domain of self-driving cars. Mapless online path planning is particularly of practical appeal but poses a challenge for safely overtaking opponents due to the limited planning horizon. To address this, we introduce RaceMOP, a novel method for mapless online path planning designed for multi-agent racing of F1TENTH cars. Unlike classical planners that rely on predefined racing lines, RaceMOP operates without a map, utilizing only local observations to execute high-speed overtaking maneuvers. Our approach combines an artificial potential field method as a base policy with residual policy learning to enable long-horizon planning. We advance the field by introducing a novel approach for policy fusion with the residual policy directly in probability space. Extensive experiments on twelve simulated racetracks validate that RaceMOP is capable of long-horizon decision-making with robust collision avoidance during overtaking maneuvers. RaceMOP demonstrates superior handling over existing mapless planners and generalizes to unknown racetracks, affirming its potential for broader applications in robotics. Our code is available at http://github.com/raphajaner/racemop.

Submitted 16 September, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems 2024

arXiv:2402.18558 [pdf, other]

Unifying F1TENTH Autonomous Racing: Survey, Methods and Benchmarks

Authors: Benjamin David Evans, Raphael Trumpp, Marco Caccamo, Felix Jahncke, Johannes Betz, Hendrik Willem Jordaan, Herman Arnold Engelbrecht

Abstract: The F1TENTH autonomous driving platform, consisting of 1:10-scale remote-controlled cars, has evolved into a well-established education and research platform. The many publications and real-world competitions span many domains, from classical path planning to novel learning-based algorithms. Consequently, the field is wide and disjointed, hindering direct comparison of developed methods and making it difficult to assess the state-of-the-art. Therefore, we aim to unify the field by surveying current approaches, describing common methods, and providing benchmark results to facilitate clear comparisons and establish a baseline for future work. This research aims to survey past and current work with F1TENTH vehicles in the classical and learning categories and explain the different solution approaches. We describe particle filter localisation, trajectory optimisation and tracking, model predictive contouring control, follow-the-gap, and end-to-end reinforcement learning. We provide an open-source evaluation of benchmark methods and investigate overlooked factors of control frequency and localisation accuracy for classical methods as well as reward signal and training map for learning methods. The evaluation shows that the optimisation and tracking method achieves the fastest lap times, followed by the online planning approach. Finally, our work identifies and outlines the relevant research aspects to help motivate future work in the F1TENTH domain.

Submitted 25 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 12 pages, 18 figures. Submitted for publication

arXiv:2310.02959 [pdf, other]

Co-Optimizing Cache Partitioning and Multi-Core Task Scheduling: Exploit Cache Sensitivity or Not?

Authors: Binqi Sun, Debayan Roy, Tomasz Kloda, Andrea Bastoni, Rodolfo Pellizzoni, Marco Caccamo

Abstract: Cache partitioning techniques have been successfully adopted to mitigate interference among concurrently executing real-time tasks on multi-core processors. Considering that the execution time of a cache-sensitive task strongly depends on the cache available for it to use, co-optimizing cache partitioning and task allocation improves the system's schedulability. In this paper, we propose a hybrid multi-layer design space exploration technique to solve this multi-resource management problem. We explore the interplay between cache partitioning and schedulability by systematically interleaving three optimization layers, viz., (i) in the outer layer, we perform a breadth-first search combined with proactive pruning for cache partitioning; (ii) in the middle layer, we exploit a first-fit heuristic for allocating tasks to cores; and (iii) in the inner layer, we use the well-known recurrence relation for the schedulability analysis of non-preemptive fixed-priority (NP-FP) tasks in a uniprocessor setting. Although our focus is on NP-FP scheduling, we evaluate the flexibility of our framework in supporting different scheduling policies (NP-EDF, P-EDF) by plugging in appropriate analysis methods in the inner layer. Experiments show that, compared to the state-of-the-art techniques, the proposed framework can improve the real-time schedulability of NP-FP task sets by an average of 15.2% with a maximum improvement of 233.6% (when tasks are highly cache-sensitive) and a minimum of 1.6% (when cache sensitivity is low). For such task sets, we found that clustering similar-period (or mutually compatible) tasks often leads to higher schedulability (on average 7.6%) than clustering by cache sensitivity. In our evaluation, the framework also achieves good results for preemptive and dynamic-priority scheduling policies.

Submitted 4 October, 2023; originally announced October 2023.

Comments: to be published in IEEE Real-Time Systems Symposium (RTSS), 2023
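
The middle and inner layers described in the abstract above (first-fit allocation of tasks to cores, with a per-core schedulability analysis that can be swapped out) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `Task` tuple, `first_fit`, and in particular `utilization_test`, which is a simple utilization check standing in for the NP-FP recurrence the authors actually use.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical task model: (worst-case execution time, period).
Task = Tuple[float, float]


def utilization_test(tasks: List[Task]) -> bool:
    """Placeholder per-core test: total utilization must not exceed 1.

    Assumption for illustration only; this is NOT the NP-FP recurrence
    relation mentioned in the abstract.
    """
    return sum(wcet / period for wcet, period in tasks) <= 1.0


def first_fit(
    tasks: List[Task],
    num_cores: int,
    is_schedulable: Callable[[List[Task]], bool] = utilization_test,
) -> Optional[List[List[Task]]]:
    """Place each task on the first core that still passes the test.

    Returns the per-core task lists, or None if some task fits nowhere.
    """
    cores: List[List[Task]] = [[] for _ in range(num_cores)]
    for task in tasks:
        for core in cores:
            if is_schedulable(core + [task]):
                core.append(task)
                break
        else:
            # No core accepted this task.
            return None
    return cores


if __name__ == "__main__":
    demo = [(2.0, 4.0), (3.0, 6.0), (1.0, 2.0), (1.0, 4.0)]
    print(first_fit(demo, num_cores=2))
```

Passing a different `is_schedulable` callable mirrors the "plugging in appropriate analysis methods" idea from the abstract: the allocation loop stays the same while the per-core analysis changes.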

arXiv:2309.03157 [pdf, other]

Learning to Recharge: UAV Coverage Path Planning through Deep Reinforcement Learning

Authors: Mirco Theile, Harald Bayerlein, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

Abstract: Coverage path planning (CPP) is a critical problem in robotics, where the goal is to find an efficient path that covers every point in an area of interest. This work addresses the power-constrained CPP problem with recharge for battery-limited unmanned aerial vehicles (UAVs). In this problem, a notable challenge emerges from integrating recharge journeys into the overall coverage strategy, highlighting the intricate task of making strategic, long-term decisions. We propose a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, utilizing action masking and discount factor scheduling to optimize coverage trajectories over the entire mission horizon. We further provide the agent with a position history to handle emergent state loops caused by the recharge capability. Our approach outperforms a baseline heuristic, generalizes to different target zones and maps, with limited generalization to unseen maps. We offer valuable insights into DRL algorithm design for long-horizon problems and provide a publicly available software framework for the CPP problem.

Submitted 7 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: This work has been submitted to the IEEE for possible publication
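
The abstract above mentions action masking without giving details. One common way to realize it, shown here as a sketch rather than as the authors' implementation, is to set the logits of invalid actions to negative infinity before the softmax so that they receive zero probability. The function name `masked_softmax` and the example inputs are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], valid: Sequence[bool]) -> List[float]:
    """Softmax over the valid actions only; invalid actions get probability 0."""
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid)]
    peak = max(masked)
    if peak == -math.inf:
        raise ValueError("at least one action must be valid")
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical policy output for three actions; the second is infeasible.
    print(masked_softmax([1.0, 2.0, 3.0], [True, False, True]))
```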

arXiv:2308.14647 [pdf, other]

doi 10.1109/TC.2024.3350243

Edge Generation Scheduling for DAG Tasks Using Deep Reinforcement Learning

Authors: Binqi Sun, Mirco Theile, Ziyuan Qin, Daniele Bernardini, Debayan Roy, Andrea Bastoni, Marco Caccamo

Abstract: Directed acyclic graph (DAG) tasks are currently adopted in the real-time domain to model complex applications from the automotive, avionics, and industrial domains that implement their functionalities through chains of intercommunicating tasks. This paper studies the problem of scheduling real-time DAG tasks by presenting a novel schedulability test based on the concept of trivial schedulability. Using this schedulability test, we propose a new DAG scheduling framework (edge generation scheduling -- EGS) that attempts to minimize the DAG width by iteratively generating edges while guaranteeing the deadline constraint. We study how to efficiently solve the problem of generating edges by developing a deep reinforcement learning algorithm combined with a graph representation neural network to learn an efficient edge generation policy for EGS. We evaluate the effectiveness of the proposed algorithm by comparing it with state-of-the-art DAG scheduling heuristics and an optimal mixed-integer linear programming baseline. Experimental results show that the proposed algorithm outperforms the state-of-the-art by requiring fewer processors to schedule the same DAG tasks. The code is available at https://github.com/binqi-sun/egs.

Submitted 10 January, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted for publication in IEEE Transactions on Computers

arXiv:2306.02029 [pdf, other]

Model-aided Federated Reinforcement Learning for Multi-UAV Trajectory Planning in IoT Networks

Authors: Jichao Chen, Omid Esrafilian, Harald Bayerlein, David Gesbert, Marco Caccamo

Abstract: Deploying teams of unmanned aerial vehicles (UAVs) to harvest data from distributed Internet of Things (IoT) devices requires efficient trajectory planning and coordination algorithms. Multi-agent reinforcement learning (MARL) has emerged as a solution, but requires extensive and costly real-world training data. To tackle this challenge, we propose a novel model-aided federated MARL algorithm to coordinate multiple UAVs on a data harvesting mission with only limited knowledge about the environment. The proposed algorithm alternates between building an environment simulation model from real-world measurements, specifically learning the radio channel characteristics and estimating unknown IoT device positions, and federated QMIX training in the simulated environment. Each UAV agent trains a local QMIX model in its simulated environment and continuously consolidates it through federated learning with other agents, accelerating the learning process. A performance comparison with standard MARL algorithms demonstrates that our proposed model-aided FedQMIX algorithm reduces the need for real-world training experiences by around three magnitudes while attaining similar data collection performance.

Submitted 7 October, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

arXiv:2305.19904 [pdf, other]

Efficient Learning of Urban Driving Policies Using Bird's-Eye-View State Representations

Authors: Raphael Trumpp, Martin Büchner, Abhinav Valada, Marco Caccamo

Abstract: Autonomous driving involves complex decision-making in highly interactive environments, requiring thoughtful negotiation with other traffic participants. While reinforcement learning provides a way to learn such interaction behavior, efficient learning critically depends on scalable state representations. Contrary to imitation learning methods, high-dimensional state representations still constitute a major bottleneck for deep reinforcement learning methods in autonomous driving. In this paper, we study the challenges of constructing bird's-eye-view representations for autonomous driving and propose a recurrent learning architecture for long-horizon driving. Our PPO-based approach, called RecurrDriveNet, is demonstrated on a simulated autonomous driving task in CARLA, where it outperforms traditional frame-stacking methods while only requiring one million experiences for efficient training. RecurrDriveNet causes less than one infraction per driven kilometer by interacting safely with other road users.

Submitted 15 August, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: IEEE International Conference on Intelligent Transportation Systems 2023

arXiv:2305.16614 [pdf, other]

Physics-Regulated Deep Reinforcement Learning: Invariant Embeddings

Authors: Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

Abstract: This paper proposes the Phy-DRL: a physics-regulated deep reinforcement learning (DRL) framework for safety-critical autonomous systems. The Phy-DRL has three distinguished invariant-embedding designs: i) residual action policy (i.e., integrating data-driven-DRL action policy and physics-model-based action policy), ii) automatically constructed safety-embedded reward, and iii) physics-model-guided neural network (NN) editing, including link editing and activation editing. Theoretically, the Phy-DRL exhibits 1) a mathematically provable safety guarantee and 2) strict compliance of critic and actor networks with physics knowledge about the action-value function and action policy. Finally, we evaluate the Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL features guaranteed safety compared to purely data-driven DRL and solely model-based design while offering remarkably fewer learning parameters and fast training towards safety guarantee.

Submitted 8 July, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2303.16860 [pdf, other]

Physical Deep Reinforcement Learning Towards Safety Guarantee

Authors: Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

Abstract: Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, the safety and stability still remain major concerns that hinder the applications of DRL to safety-critical autonomous systems. To address the concerns, we proposed the Phy-DRL: a physical deep reinforcement learning framework. The Phy-DRL is novel in two architectural designs: i) Lyapunov-like reward, and ii) residual control (i.e., integration of physics-model-based control and data-driven control). The concurrent physical reward and residual control empower the Phy-DRL the (mathematically) provable safety and stability guarantees. Through experiments on the inverted pendulum, we show that the Phy-DRL features guaranteed safety and stability and enhanced robustness, while offering remarkably accelerated training and enlarged reward.

Submitted 29 March, 2023; originally announced March 2023.

Comments: Working Paper

arXiv:2303.03153 [pdf, other]

Flexible Gear Assembly With Visual Servoing and Force Feedback

Authors: Junjie Ming, Daniel Bargmann, Hongpeng Cao, Marco Caccamo

Abstract: Gear assembly is an essential but challenging task in industrial automation. This paper presents a novel two-stage approach for achieving high-precision and flexible gear assembly. The proposed approach integrates YOLO to coarsely localize the workpiece in a searching phase and deep reinforcement learning (DRL) to complete the insertion. Specifically, DRL addresses the challenge of partial visibility when the on-wrist camera is too close to the workpiece. Additionally, force feedback is used to smoothly transit the process from the first phase to the second phase. To reduce the data collection effort for training deep neural networks, we use synthetic RGB images for training YOLO and construct an offline interaction environment leveraging sampled real-world data for training DRL agents. We evaluate the proposed approach in a gear assembly experiment with a precision tolerance of 0.3mm. The results show that our method can robustly and efficiently complete searching and insertion from arbitrary positions within an average of 15 seconds.

Submitted 6 March, 2023; originally announced March 2023.

Comments: Submitted to 2023 IEEE/RSJ International Conference on Intelligent Robots (IROS)

arXiv:2302.07035 [pdf, ps, other]

Residual Policy Learning for Vehicle Control of Autonomous Racing Cars

Authors: Raphael Trumpp, Denis Hoornaert, Marco Caccamo

Abstract: The development of vehicle controllers for autonomous racing is challenging because racing cars operate at their physical driving limit. Prompted by the demand for improved performance, autonomous racing research has seen the proliferation of machine learning-based controllers. While these approaches show competitive performance, their practical applicability is often limited. Residual policy learning promises to mitigate this drawback by combining classical controllers with learned residual controllers. The critical advantage of residual controllers is their high adaptability parallel to the classical controller's stable behavior. We propose a residual vehicle controller for autonomous racing cars that learns to amend a classical controller for the path-following of racing lines. In an extensive study, performance gains of our approach are evaluated for a simulated car of the F1TENTH autonomous racing series. The evaluation for twelve replicated real-world racetracks shows that the residual controller reduces lap times by an average of 4.55 % compared to a classical controller and even enables lap time gains on unknown racetracks.

Submitted 31 May, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: IEEE Intelligent Vehicles Symposium 2023
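
The core idea of residual policy learning described in the abstract above, a learned correction added to a classical controller's output, can be sketched in a few lines. This is an illustrative assumption-laden sketch and not the paper's controller: `residual_action`, the scalar state, the proportional controller, and the clipping bound `limit` on the learned correction are all hypothetical choices made here.

```python
from typing import Callable


def residual_action(
    state: float,
    classical: Callable[[float], float],
    residual: Callable[[float], float],
    limit: float = 0.1,
) -> float:
    """Classical command plus a bounded learned correction.

    Clipping the residual to [-limit, limit] is an assumption of this sketch;
    it keeps the combined command close to the classical controller's output.
    """
    base = classical(state)
    delta = max(-limit, min(limit, residual(state)))
    return base + delta


if __name__ == "__main__":
    # Hypothetical stand-ins: a proportional steering law and a fixed "learned" term.
    steer = residual_action(
        2.0,
        classical=lambda lateral_error: -0.5 * lateral_error,
        residual=lambda lateral_error: 1.0,
    )
    print(steer)
```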

arXiv:2301.11461 [pdf, other]

doi 10.1109/ACCESS.2024.3376739

Learning to Generate All Feasible Actions

Authors: Mirco Theile, Daniele Bernardini, Raphael Trumpp, Cristina Piazza, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

Abstract: Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.

Submitted 5 July, 2024; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2209.01710 [pdf, other]

doi 10.1002/stvr.1879

Perception Simplex: Verifiable Collision Avoidance in Autonomous Vehicles Amidst Obstacle Detection Faults

Authors: Ayoosh Bansal, Hunmin Kim, Simon Yu, Bo Li, Naira Hovakimyan, Marco Caccamo, Lui Sha

Abstract: Advances in deep learning have revolutionized cyber-physical applications, including the development of Autonomous Vehicles. However, real-world collisions involving autonomous control of vehicles have raised significant safety concerns regarding the use of Deep Neural Networks (DNN) in safety-critical tasks, particularly Perception. The inherent unverifiability of DNNs poses a key challenge in ensuring their safe and reliable operation. In this work, we propose Perception Simplex (PS), a fault-tolerant application architecture designed for obstacle detection and collision avoidance. We analyze an existing LiDAR-based classical obstacle detection algorithm to establish strict bounds on its capabilities and limitations. Such analysis and verification have not been possible for deep learning-based perception systems yet. By employing verifiable obstacle detection algorithms, PS identifies obstacle existence detection faults in the output of unverifiable DNN-based object detectors. When faults with potential collision risks are detected, appropriate corrective actions are initiated. Through extensive analysis and software-in-the-loop simulations, we demonstrate that PS provides predictable and deterministic fault tolerance against obstacle existence detection faults, establishing a robust safety guarantee.

Submitted 28 November, 2023; v1 submitted 4 September, 2022; originally announced September 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2208.14403

ACM Class: D.2.11; I.2.9; C.4; J.7

Journal ref: Software Testing, Verification and Reliability. 2024. e1879

arXiv:2208.14403 [pdf, other]

doi 10.1109/ISSRE55969.2022.00017

Verifiable Obstacle Detection

Authors: Ayoosh Bansal, Hunmin Kim, Simon Yu, Bo Li, Naira Hovakimyan, Marco Caccamo, Lui Sha

Abstract: Perception of obstacles remains a critical safety concern for autonomous vehicles. Real-world collisions have shown that the autonomy faults leading to fatal collisions originate from obstacle existence detection. Open source autonomous driving implementations show a perception pipeline with complex interdependent Deep Neural Networks. These networks are not fully verifiable, making them unsuitable for safety-critical tasks. In this work, we present a safety verification of an existing LiDAR based classical obstacle detection algorithm. We establish strict bounds on the capabilities of this obstacle detection algorithm. Given safety standards, such bounds allow for determining LiDAR sensor properties that would reliably satisfy the standards. Such analysis has as yet been unattainable for neural network based perception systems. We provide a rigorous analysis of the obstacle detection system with empirical results based on real-world sensor data.

Submitted 30 August, 2022; originally announced August 2022.

Comments: Accepted at ISSRE 2022

ACM Class: D.2.4; I.2.9; I.4.8

Journal ref: 33rd International Symposium on Software Reliability Engineering (ISSRE), pp. 61-72. IEEE, 2022

arXiv:2208.14288 [pdf, other]

6IMPOSE: Bridging the Reality Gap in 6D Pose Estimation for Robotic Grasping

Authors: Hongpeng Cao, Lukas Dirnberger, Daniele Bernardini, Cristina Piazza, Marco Caccamo

Abstract: 6D pose recognition has been a crucial factor in the success of robotic grasping, and recent deep learning based approaches have achieved remarkable results on benchmarks. However, their generalization capabilities in real-world applications remain unclear. To overcome this gap, we introduce 6IMPOSE, a novel framework for sim-to-real data generation and 6D pose estimation. 6IMPOSE consists of four modules: First, a data generation pipeline that employs the 3D software suite Blender to create synthetic RGBD image datasets with 6D pose annotations. Second, an annotated RGBD dataset of five household objects generated using the proposed pipeline. Third, a real-time two-stage 6D pose estimation approach that integrates the object detector YOLO-V4 and a streamlined, real-time version of the 6D pose estimation algorithm PVN3D optimized for time-sensitive robotics applications. Fourth, a codebase designed to facilitate the integration of the vision system into a robotic grasping experiment. Our approach demonstrates the efficient generation of large amounts of photo-realistic RGBD images and the successful transfer of the trained inference model to robotic grasping experiments, achieving an overall success rate of 87% in grasping five different household objects from cluttered backgrounds under varying lighting conditions. This is made possible by the fine-tuning of data generation and domain randomization techniques, and the optimization of the inference pipeline, overcoming the generalization and performance shortcomings of the original PVN3D algorithm. Finally, we make the code, synthetic dataset, and all the pretrained models available on Github.

Submitted 9 March, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

arXiv:2203.02230 [pdf, other]

doi 10.1109/IROS47612.2022.9981565

Cloud-Edge Training Architecture for Sim-to-Real Deep Reinforcement Learning

Authors: Hongpeng Cao, Mirco Theile, Federico G. Wyrwal, Marco Caccamo

Abstract: Deep reinforcement learning (DRL) is a promising approach to solve complex control tasks by learning policies through interactions with the environment. However, the training of DRL policies requires large amounts of training experiences, making it impractical to learn the policy directly on physical systems. Sim-to-real approaches leverage simulations to pretrain DRL policies and then deploy them in the real world. Unfortunately, the direct real-world deployment of pretrained policies usually suffers from performance deterioration due to the different dynamics, known as the reality gap. Recent sim-to-real methods, such as domain randomization and domain adaptation, focus on improving the robustness of the pretrained agents. Nevertheless, the simulation-trained policies often need to be tuned with real-world data to reach optimal performance, which is challenging due to the high cost of real-world samples. This work proposes a distributed cloud-edge architecture to train DRL agents in the real world in real-time. In the architecture, the inference and training are assigned to the edge and cloud, separating the real-time control loop from the computationally expensive training loop. To overcome the reality gap, our architecture exploits sim-to-real transfer strategies to continue the training of simulation-pretrained agents on a physical system. We demonstrate its applicability on a physical inverted-pendulum control system, analyzing critical parameters. The real-world experiments show that our architecture can adapt the pretrained DRL agents to unseen dynamics consistently and efficiently.

Submitted 28 July, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: Submitted to IROS 2022

arXiv:2107.09973 [pdf, other]

Multi-Agent Belief Sharing through Autonomous Hierarchical Multi-Level Clustering

Authors: Mirco Theile, Jonathan Ponniah, Or Dantsker, Marco Caccamo

Abstract: Coordination in multi-agent systems is challenging for agile robots such as unmanned aerial vehicles (UAVs), where relative agent positions frequently change due to unconstrained movement. The problem is exacerbated through the individual take-off and landing of agents for battery recharging leading to a varying number of active agents throughout the whole mission. This work proposes autonomous hierarchical multi-level clustering (MLC), which forms a clustering hierarchy utilizing decentralized methods. Through periodic cluster maintenance executed by cluster-heads, stable multi-level clustering is achieved. The resulting hierarchy is used as a backbone to solve the communication problem for locally-interactive applications such as UAV tracking problems. Using observation aggregation, compression, and dissemination, agents share local observations throughout the hierarchy, giving every agent a total system belief with spatially dependent resolution and freshness. Extensive simulations show that MLC yields a stable cluster hierarchy under different motion patterns and that the proposed belief sharing is highly applicable in wildfire front monitoring scenarios.

Submitted 21 July, 2021; originally announced July 2021.

Comments: Submitted to IEEE Transactions on Robotics, article extends on https://doi.org/10.2514/6.2021-0656

arXiv:2106.04146 [pdf, other]

doi 10.1109/MECO52532.2021.9460196

Risk Ranked Recall: Collision Safety Metric for Object Detection Systems in Autonomous Vehicles

Authors: Ayoosh Bansal, Jayati Singh, Micaela Verucchi, Marco Caccamo, Lui Sha

Abstract: Commonly used metrics for evaluation of object detection systems (precision, recall, mAP) do not give complete information about their suitability of use in safety critical tasks, like obstacle detection for collision avoidance in Autonomous Vehicles (AV). This work introduces the Risk Ranked Recall ($R^3$) metrics for object detection systems. The $R^3$ metrics categorize objects within three ranks. Ranks are assigned based on an objective cyber-physical model for the risk of collision. Recall is measured for each rank.

Submitted 8 June, 2021; originally announced June 2021.

Comments: Cyber-Physical Systems and Internet-of-Things 2021

ACM Class: I.2.9; J.7

Journal ref: 2021 10th Mediterranean Conference on Embedded Computing (MECO)
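
The last step named in the abstract above, measuring recall separately for each rank, can be sketched as follows. The sketch assumes the ranks have already been assigned; the paper's cyber-physical risk model that produces them is not reproduced here, and the function name `recall_per_rank` and the `(rank, detected)` input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recall_per_rank(objects: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Recall for each rank, given (rank, detected) pairs for ground-truth objects."""
    total: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for rank, detected in objects:
        total[rank] += 1
        if detected:
            hits[rank] += 1
    # Recall = detected ground-truth objects / all ground-truth objects in the rank.
    return {rank: hits[rank] / total[rank] for rank in sorted(total)}


if __name__ == "__main__":
    # Hypothetical ground-truth objects with pre-assigned ranks 1 to 3.
    ground_truth = [(1, True), (1, True), (2, True), (2, False), (3, False)]
    print(recall_per_rank(ground_truth))
```

Reporting the three values side by side, instead of a single pooled recall, keeps a missed object in one rank from being hidden by many detections in another.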

arXiv:2104.04528 [pdf, other]

SchedGuard: Protecting against Schedule Leaks Using Linux Containers

Authors: Jiyang Chen, Tomasz Kloda, Ayoosh Bansal, Rohan Tabish, Chien-Ying Chen, Bo Liu, Sibin Mohan, Marco Caccamo, Lui Sha

Abstract: Real-time systems have recently been shown to be vulnerable to timing inference attacks, mainly due to their predictable behavioral patterns. Existing solutions such as schedule randomization lack the ability to protect against such attacks, often limited by the system's real-time nature. This paper presents SchedGuard: a temporal protection framework for Linux-based hard real-time systems that protects against posterior scheduler side-channel attacks by preventing untrusted tasks from executing during specific time segments. SchedGuard is integrated into the Linux kernel using cgroups, making it amenable to use with container frameworks. We demonstrate the effectiveness of our system using a realistic radio-controlled rover platform and synthetically generated workloads. Not only is SchedGuard able to protect against the attacks mentioned above, but it also ensures that the real-time tasks/containers meet their temporal requirements.

Submitted 9 April, 2021; originally announced April 2021.

arXiv:2010.12461 [pdf, other]

doi 10.1109/OJCOMS.2021.3081996

Multi-UAV Path Planning for Wireless Data Harvesting with Deep Reinforcement Learning

Authors: Harald Bayerlein, Mirco Theile, Marco Caccamo, David Gesbert

Abstract: Harvesting data from distributed Internet of Things (IoT) devices with multiple autonomous unmanned aerial vehicles (UAVs) is a challenging problem requiring flexible path planning methods. We propose a multi-agent reinforcement learning (MARL) approach that, in contrast to previous work, can adapt to profound changes in the scenario parameters defining the data harvesting mission, such as the number of deployed UAVs, number, position and data amount of IoT devices, or the maximum flying time, without the need to perform expensive recomputations or relearn control policies. We formulate the path planning problem for a cooperative, non-communicating, and homogeneous team of UAVs tasked with maximizing collected data from distributed IoT sensor nodes subject to flying time and collision avoidance constraints. The path planning problem is translated into a decentralized partially observable Markov decision process (Dec-POMDP), which we solve through a deep reinforcement learning (DRL) approach, approximating the optimal UAV control policy without prior knowledge of the challenging wireless channel characteristics in dense urban environments. By exploiting a combination of centered global and local map representations of the environment that are fed into convolutional layers of the agents, we show that our proposed network architecture enables the agents to cooperate effectively by carefully dividing the data collection task among themselves, adapt to large complex environments and state spaces, and make movement decisions that balance data collection goals, flight-time efficiency, and navigation constraints. Finally, learning a control policy that generalizes over the scenario parameter space enables us to analyze the influence of individual parameters on collection performance and provide some intuition about system-level benefits.

Submitted 3 June, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: Modifications: final formatting; Code available under https://github.com/hbayerlein/uav_data_harvesting, article extends on arXiv:2007.00544

Journal ref: IEEE Open Journal of the Communications Society, vol. 2, pp. 1171-1187, 2021

arXiv:2010.06917 [pdf, other]

doi 10.1109/ICAR53236.2021.9659413

UAV Path Planning using Global and Local Map Information with Deep Reinforcement Learning

Authors: Mirco Theile, Harald Bayerlein, Richard Nai, David Gesbert, Marco Caccamo

Abstract: Path planning methods for autonomous unmanned aerial vehicles (UAVs) are typically designed for one specific type of mission. This work presents a method for autonomous UAV path planning based on deep reinforcement learning (DRL) that can be applied to a wide range of mission scenarios. Specifically, we compare coverage path planning (CPP), where the UAV's goal is to survey an area of interest to data harvesting (DH), where the UAV collects data from distributed Internet of Things (IoT) sensor devices. By exploiting structured map information of the environment, we train double deep Q-networks (DDQNs) with identical architectures on both distinctly different mission scenarios to make movement decisions that balance the respective mission goal with navigation constraints. By introducing a novel approach exploiting a compressed global map of the environment combined with a cropped but uncompressed local map showing the vicinity of the UAV agent, we demonstrate that the proposed method can efficiently scale to large environments. We also extend previous results for generalizing control policies that require no retraining when scenario parameters change and offer a detailed analysis of crucial map processing parameters' effects on path planning performance.

Submitted 21 October, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: ICAR 2021, code available at https://github.com/theilem/uavSim

arXiv:2007.00544 [pdf, other]

doi 10.1109/GLOBECOM42002.2020.9322234

UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach

Authors: Harald Bayerlein, Mirco Theile, Marco Caccamo, David Gesbert

Abstract: Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. While previous approaches, learning and non-learning based, must perform expensive recomputations or relearn a behavior when important scenario parameters such as the number of sensors, sensor positions, or maximum flying time, change, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By exploiting a multi-layer map of the environment fed through convolutional network layers to the agent, we show that our proposed network architecture enables the agent to make movement decisions for a variety of scenario parameters that balance the data collection goal with flight time efficiency and safety constraints. Considerable advantages in learning efficiency from using a map centered on the UAV's position over a non-centered map are also illustrated.

Submitted 26 October, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: Code available under https://github.com/hbayerlein/uav_data_harvesting, IEEE Global Communications Conference (GLOBECOM) 2020

arXiv:2003.02609 [pdf, other]

doi 10.1109/IROS45743.2020.9340934

UAV Coverage Path Planning under Varying Power Constraints using Deep Reinforcement Learning

Authors: Mirco Theile, Harald Bayerlein, Richard Nai, David Gesbert, Marco Caccamo

Abstract: Coverage path planning (CPP) is the task of designing a trajectory that enables a mobile agent to travel over every point of an area of interest. We propose a new method to control an unmanned aerial vehicle (UAV) carrying a camera on a CPP mission with random start positions and multiple options for landing positions in an environment containing no-fly zones. While numerous approaches have been proposed to solve similar CPP problems, we leverage end-to-end reinforcement learning (RL) to learn a control policy that generalizes over varying power constraints for the UAV. Despite recent improvements in battery technology, the maximum flying range of small UAVs is still a severe constraint, which is exacerbated by variations in the UAV's power consumption that are hard to predict. By using map-like input channels to feed spatial information through convolutional network layers to the agent, we are able to train a double deep Q-network (DDQN) to make control decisions for the UAV, balancing limited power budget and coverage goal. The proposed method can be applied to a wide variety of environments and harmonizes complex goal structures with system constraints.

Submitted 12 February, 2021; v1 submitted 5 March, 2020; originally announced March 2020.

Comments: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1909.05349 [pdf, other]

doi 10.1109/MECO49872.2020.9134262

Cache Where you Want! Reconciling Predictability and Coherent Caching

Authors: Ayoosh Bansal, Jayati Singh, Yifan Hao, Jen-Yang Wen, Renato Mancuso, Marco Caccamo

Abstract: Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of unpredictability. Large fluctuations in latency to access data shared between multiple cores is an important contributor to the overall execution-time variability. In addition to the temporal unpredictability introduced by caching, parallel applications with data shared across multiple cores also pay additional latency overheads due to data coherence. Analyzing the impact of data coherence on the worst-case execution-time of real-time applications is challenging because only scarce implementation details are revealed by manufacturers. This paper presents application level control for caching data at different levels of the cache hierarchy. The rationale is that by caching data only in shared cache it is possible to bypass private caches. The access latency to data present in caches becomes independent of its coherence state. We discuss the existing architectural support as well as the required hardware and OS modifications to support the proposed cacheability control. We evaluate the system on an architectural simulator. We show that the worst case execution time for a single memory write request is reduced by 52%. Benchmark evaluations show that proposed technique has a minimal impact on average performance.

Submitted 27 June, 2021; v1 submitted 11 September, 2019; originally announced September 2019.

Comments: 13 pages, 10 figures, v2 update includes overview section with formal solution definition. This is a long version of a prior publication

ACM Class: C.0; C.3; C.4; D.4.7; J.7

Journal ref: 2020 9th Mediterranean Conference on Embedded Computing (MECO), 2020, pp. 1-6

arXiv:1705.01520 [pdf, other]

Restart-Based Security Mechanisms for Safety-Critical Embedded Systems

Authors: Fardin Abdi, Chien-Ying Chen, Monowar Hasan, Songran Liu, Sibin Mohan, Marco Caccamo

Abstract: Many physical plants that are controlled by embedded systems have safety requirements that need to be respected at all times - any deviations from expected behavior can result in damage to the system (often to the physical plant), the environment or even endanger human life. In recent times, malicious attacks against such systems have increased - many with the intent to cause physical damage. In this paper, we aim to decouple the safety of the plant from security of the embedded system by taking advantage of the inherent inertia in such systems. In this paper we present a system-wide restart-based framework that combines hardware and software components to (a) maintain the system within the safety region and (b) thwart potential attackers from destabilizing the system. We demonstrate the feasibility of our approach using two realistic systems - an actual 3 degree of freedom (3-DoF) helicopter and a simulated warehouse temperature control unit. Our proof-of-concept implementation is tested against multiple emulated attacks on the control units of these systems.

Submitted 3 May, 2017; originally announced May 2017.

arXiv:1202.5722 [pdf, other]

S3A: Secure System Simplex Architecture for Enhanced Security of Cyber-Physical Systems

Authors: Sibin Mohan, Stanley Bak, Emiliano Betti, Heechul Yun, Lui Sha, Marco Caccamo

Abstract: Until recently, cyber-physical systems, especially those with safety-critical properties that manage critical infrastructure (e.g. power generation plants, water treatment facilities, etc.) were considered to be invulnerable against software security breaches. The recently discovered 'W32.Stuxnet' worm has drastically changed this perception by demonstrating that such systems are susceptible to external attacks. Here we present an architecture that enhances the security of safety-critical cyber-physical systems despite the presence of such malware. Our architecture uses the property that control systems have deterministic execution behavior, to detect an intrusion within 0.6 μs while still guaranteeing the safety of the plant. We also show that even if an attack is successful, the overall state of the physical system will still remain safe. Even if the operating system's administrative privileges have been compromised, our architecture will still be able to protect the physical system from coming to harm.

Submitted 25 February, 2012; originally announced February 2012.

Comments: 12 pages
