
Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach

Authors: Renzi Wang, Flavia Sofia Acerbo, Tong Duy Son, Panagiotis Patrinos

Abstract: This paper presents a novel approach to imitation learning from observations, where an autoregressive mixture of experts model is deployed to fit the underlying policy. The parameters of the model are learned via a two-stage framework. By leveraging the existing dynamics knowledge, the first stage of the framework estimates the control input sequences and hence reduces the problem complexity. At the second stage, the policy is learned by solving a regularized maximum-likelihood estimation problem using the estimated control input sequences. We further extend the learning procedure by incorporating a Lyapunov stability constraint to ensure asymptotic stability of the identified model, for accurate multi-step predictions. The effectiveness of the proposed framework is validated using two autonomous driving datasets collected from human demonstrations, demonstrating its practical applicability in modelling complex nonlinear dynamics.

Submitted 12 November, 2024; originally announced November 2024.

arXiv:2403.15102

Driving from Vision through Differentiable Optimal Control

Authors: Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son

Abstract: This paper proposes DriViDOC: a framework for Driving from Vision through Differentiable Optimal Control, and its application to learn autonomous driving controllers from human demonstrations. DriViDOC combines the automatic inference of relevant features from camera frames with the properties of nonlinear model predictive control (NMPC), such as constraint satisfaction. Our approach leverages the differentiability of parametric NMPC, allowing for end-to-end learning of the driving model from images to control. The model is trained on an offline dataset comprising various human demonstrations collected on a motion-base driving simulator. During online testing, the model demonstrates successful imitation of different driving styles, and the interpreted NMPC parameters provide insights into the achievement of specific driving behaviors. Our experimental results show that DriViDOC outperforms other methods involving NMPC and neural networks, exhibiting an average improvement of 20% in imitation scores.

Submitted 2 September, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: This work has been accepted for publication in the Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024). Accompanying video available at: https://youtu.be/ENHhphpbPLs

arXiv:2211.12111

Evaluation of MPC-based Imitation Learning for Human-like Autonomous Driving

Authors: Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son

Abstract: This work evaluates and analyzes the combination of imitation learning (IL) and differentiable model predictive control (MPC) for the application of human-like autonomous driving. We combine MPC with a hierarchical learning-based policy, and measure its performance in open-loop and closed-loop with metrics related to safety, comfort and similarity to human driving characteristics. We also demonstrate the value of augmenting open-loop behavioral cloning with closed-loop training for a more robust learning, approximating the policy gradient through time with the state space model used by the MPC. We perform experimental evaluations on a lane keeping control system, learned from demonstrations collected on a fixed-base driving simulator, and show that our imitative policies approach the human driving style preferences.

Submitted 26 June, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND. arXiv admin note: text overlap with arXiv:2206.12348

arXiv:2210.01747

Learning from Demonstrations of Critical Driving Behaviours Using Driver's Risk Field

Authors: Yurui Du, Flavia Sofia Acerbo, Jens Kober, Tong Duy Son

Abstract: In recent years, imitation learning (IL) has been widely used in industry as the core of autonomous vehicle (AV) planning modules. However, previous IL works show sample inefficiency and low generalisation in safety-critical scenarios, on which they are rarely tested. As a result, IL planners can reach a performance plateau where adding more training data ceases to improve the learnt policy. First, our work presents an IL model using the spline coefficient parameterisation and offline expert queries to enhance safety and training efficiency. Then, we expose the weakness of the learnt IL policy by synthetically generating critical scenarios through optimisation of parameters of the driver's risk field (DRF), a parametric human driving behaviour model implemented in a multi-agent traffic simulator based on the Lyft Prediction Dataset. To continuously improve the learnt policy, we retrain the IL model with augmented data. Thanks to the expressivity and interpretability of the DRF, the desired driving behaviours can be encoded and aggregated to the original training data. Our work constitutes a full development cycle that can efficiently and continuously improve the learnt IL policies in closed-loop. Finally, we show that our IL planner developed with less training resource still has superior performance compared to the previous state-of-the-art.

Submitted 31 March, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2206.12348

MPC-based Imitation Learning for Safe and Human-like Autonomous Driving

Authors: Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son

Abstract: To ensure user acceptance of autonomous vehicles (AVs), control systems are being developed to mimic human drivers from demonstrations of desired driving behaviors. Imitation learning (IL) algorithms serve this purpose, but struggle to provide safety guarantees on the resulting closed-loop system trajectories. On the other hand, Model Predictive Control (MPC) can handle nonlinear systems with safety constraints, but realizing human-like driving with it requires extensive domain knowledge. This work suggests the use of a seamless combination of the two techniques to learn safe AV controllers from demonstrations of desired driving behaviors, by using MPC as a differentiable control layer within a hierarchical IL policy. With this strategy, IL is performed in closed-loop and end-to-end, through parameters in the MPC cost, model or constraints. Experimental results of this methodology are analyzed for the design of a lane keeping control system, learned via behavioral cloning from observations (BCO), given human demonstrations on a fixed-base driving simulator.

Submitted 24 June, 2022; originally announced June 2022.

Comments: Accepted at the 1st Workshop on Safe Learning for Autonomous Driving (SL4AD), co-located with the 39th International Conference on Machine Learning (ICML 2022)

arXiv:2110.04052

Safe Imitation Learning on Real-Life Highway Data for Human-like Autonomous Driving

Authors: Flavia Sofia Acerbo, Mohsen Alirezaei, Herman Van der Auweraer, Tong Duy Son

Abstract: This paper presents a safe imitation learning approach for autonomous vehicle driving, with attention on real-life human driving data and experimental validation. In order to increase occupant's acceptance and gain drivers' trust, the autonomous driving function needs to provide a both safe and comfortable behavior such as risk-free and naturalistic driving. Our goal is to obtain such behavior via imitation learning of a planning policy from human driving data. In particular, we propose to incorporate barrier functions and smooth spline-based motion parametrization in the training loss function. The advantage is twofold: improving safety of the learning algorithm, while reducing the amount of needed training data. Moreover, the behavior is learned from highway driving data, which is collected consistently by a human driver and then processed towards a specific driving scenario. For development validation, a digital twin of the real test vehicle, sensors, and traffic scenarios are reconstructed toward high-fidelity and physics-based modeling technologies. These models are imported to simulation tools and co-simulated with the proposed algorithm for validation and further testing. Finally, we present experimental results and analyses, and compare with the conventional imitation learning technique (behavioral cloning) to justify the proposed development.

Submitted 8 October, 2021; originally announced October 2021.

Comments: Published in the proceedings of the 24th IEEE International Conference on Intelligent Transportation Systems - ITSC2021 September 19-22, 2021 (Indianapolis, IN, United States)

Showing 1–6 of 6 results for author: Acerbo, F S