-
Closing the Loop: Motion Prediction Models beyond Open-Loop Benchmarks
Authors:
Mohamed-Khalil Bouzidi,
Christian Schlauch,
Nicole Scheuerer,
Yue Yao,
Nadja Klein,
Daniel Göhring,
Jörg Reichardt
Abstract:
Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving st…
▽ More
Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving stack. In this work, we systematically evaluate the interplay between state-of-the-art motion predictors and motion planners. Our results show that higher open-loop accuracy does not always correlate with better closed-loop driving behavior and that other factors, such as temporal consistency of predictions and planner compatibility, also play a critical role. Furthermore, we investigate downsized variants of these models, and, surprisingly, find that in some cases models with up to 86% fewer parameters yield comparable or even superior closed-loop driving performance. Our code is available at https://github.com/continental/pred2plan.
△ Less
Submitted 8 May, 2025;
originally announced May 2025.
-
EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations
Authors:
Yue Yao,
Mohamed-Khalil Bouzidi,
Daniel Goehring,
Joerg Reichardt
Abstract:
As the prediction horizon increases, predicting the future evolution of traffic scenes becomes increasingly difficult due to the multi-modal nature of agent motion. Most state-of-the-art (SotA) prediction models primarily focus on forecasting the most likely future. However, for the safe operation of autonomous vehicles, it is equally important to cover the distribution for plausible motion altern…
▽ More
As the prediction horizon increases, predicting the future evolution of traffic scenes becomes increasingly difficult due to the multi-modal nature of agent motion. Most state-of-the-art (SotA) prediction models primarily focus on forecasting the most likely future. However, for the safe operation of autonomous vehicles, it is equally important to cover the distribution for plausible motion alternatives. To address this, we introduce EP-Diffuser, a novel parameter-efficient diffusion-based generative model designed to capture the distribution of possible traffic scene evolutions. Conditioned on road layout and agent history, our model acts as a predictor and generates diverse, plausible scene continuations. We benchmark EP-Diffuser against two SotA models in terms of accuracy and plausibility of predictions on the Argoverse 2 dataset. Despite its significantly smaller model size, our approach achieves both highly accurate and plausible traffic scene predictions. We further evaluate model generalization ability in an out-of-distribution (OoD) test setting using Waymo Open dataset and show superior robustness of our approach. The code and model checkpoints can be found here: https://github.com/continental/EP-Diffuser.
△ Less
Submitted 7 April, 2025;
originally announced April 2025.
-
Beyond In-Distribution Performance: A Cross-Dataset Study of Trajectory Prediction Robustness
Authors:
Yue Yao,
Daniel Goehring,
Joerg Reichardt
Abstract:
We study the Out-of-Distribution (OoD) generalization ability of three SotA trajectory prediction models with comparable In-Distribution (ID) performance but different model designs. We investigate the influence of inductive bias, size of training data and data augmentation strategy by training the models on Argoverse 2 (A2) and testing on Waymo Open Motion (WO) and vice versa. We find that the sm…
▽ More
We study the Out-of-Distribution (OoD) generalization ability of three SotA trajectory prediction models with comparable In-Distribution (ID) performance but different model designs. We investigate the influence of inductive bias, size of training data and data augmentation strategy by training the models on Argoverse 2 (A2) and testing on Waymo Open Motion (WO) and vice versa. We find that the smallest model with highest inductive bias exhibits the best OoD generalization across different augmentation strategies when trained on the smaller A2 dataset and tested on the large WO dataset. In the converse setting, training all models on the larger WO dataset and testing on the smaller A2 dataset, we find that all models generalize poorly, even though the model with the highest inductive bias still exhibits the best generalization ability. We discuss possible reasons for this surprising finding and draw conclusions about the design and test of trajectory prediction models and benchmarks.
△ Less
Submitted 27 January, 2025;
originally announced January 2025.
-
Multi-modal NeRF Self-Supervision for LiDAR Semantic Segmentation
Authors:
Xavier Timoneda,
Markus Herb,
Fabian Duerr,
Daniel Goehring,
Fisher Yu
Abstract:
LiDAR Semantic Segmentation is a fundamental task in autonomous driving perception consisting of associating each LiDAR point to a semantic label. Fully-supervised models have widely tackled this task, but they require labels for each scan, which either limits their domain or requires impractical amounts of expensive annotations. Camera images, which are generally recorded alongside LiDAR pointclo…
▽ More
LiDAR Semantic Segmentation is a fundamental task in autonomous driving perception consisting of associating each LiDAR point to a semantic label. Fully-supervised models have widely tackled this task, but they require labels for each scan, which either limits their domain or requires impractical amounts of expensive annotations. Camera images, which are generally recorded alongside LiDAR pointclouds, can be processed by the widely available 2D foundation models, which are generic and dataset-agnostic. However, distilling knowledge from 2D data to improve LiDAR perception raises domain adaptation challenges. For example, the classical perspective projection suffers from the parallax effect produced by the position shift between both sensors at their respective capture times. We propose a Semi-Supervised Learning setup to leverage unlabeled LiDAR pointclouds alongside distilled knowledge from the camera images. To self-supervise our model on the unlabeled scans, we add an auxiliary NeRF head and cast rays from the camera viewpoint over the unlabeled voxel features. The NeRF head predicts densities and semantic logits at each sampled ray location which are used for rendering pixel semantics. Concurrently, we query the Segment-Anything (SAM) foundation model with the camera image to generate a set of unlabeled generic masks. We fuse the masks with the rendered pixel semantics from LiDAR to produce pseudo-labels that supervise the pixel predictions. During inference, we drop the NeRF head and run our model with only LiDAR. We show the effectiveness of our approach in three public LiDAR Semantic Segmentation benchmarks: nuScenes, SemanticKITTI and ScribbleKITTI.
△ Less
Submitted 5 November, 2024;
originally announced November 2024.
-
Impact of 3D LiDAR Resolution in Graph-based SLAM Approaches: A Comparative Study
Authors:
J. Jorge,
T. Barros,
C. Premebida,
M. Aleksandrov,
D. Goehring,
U. J. Nunes
Abstract:
Simultaneous Localization and Mapping (SLAM) is a key component of autonomous systems operating in environments that require a consistent map for reliable localization. SLAM has been a widely studied topic for decades with most of the solutions being camera or LiDAR based. Early LiDAR-based approaches primarily relied on 2D data, whereas more recent frameworks use 3D data. In this work, we survey…
▽ More
Simultaneous Localization and Mapping (SLAM) is a key component of autonomous systems operating in environments that require a consistent map for reliable localization. SLAM has been a widely studied topic for decades with most of the solutions being camera or LiDAR based. Early LiDAR-based approaches primarily relied on 2D data, whereas more recent frameworks use 3D data. In this work, we survey recent 3D LiDAR-based Graph-SLAM methods in urban environments, aiming to compare their strengths, weaknesses, and limitations. Additionally, we evaluate their robustness regarding the LiDAR resolution namely 64 $vs$ 128 channels. Regarding SLAM methods, we evaluate SC-LeGO-LOAM, SC-LIO-SAM, Cartographer, and HDL-Graph on real-world urban environments using the KITTI odometry dataset (a LiDAR with 64-channels only) and a new dataset (AUTONOMOS-LABS). The latter dataset, collected using instrumented vehicles driving in Berlin suburban area, comprises both 64 and 128 LiDARs. The experimental results are reported in terms of quantitative `metrics' and complemented by qualitative maps.
△ Less
Submitted 22 October, 2024;
originally announced October 2024.
-
MaskVal: Simple but Effective Uncertainty Quantification for 6D Pose Estimation
Authors:
Philipp Quentin,
Daniel Goehring
Abstract:
For the use of 6D pose estimation in robotic applications, reliable poses are of utmost importance to ensure a safe, reliable and predictable operational performance. Despite these requirements, state-of-the-art 6D pose estimators often do not provide any uncertainty quantification for their pose estimates at all, or if they do, it has been shown that the uncertainty provided is only weakly correl…
▽ More
For the use of 6D pose estimation in robotic applications, reliable poses are of utmost importance to ensure a safe, reliable and predictable operational performance. Despite these requirements, state-of-the-art 6D pose estimators often do not provide any uncertainty quantification for their pose estimates at all, or if they do, it has been shown that the uncertainty provided is only weakly correlated with the actual true error. To address this issue, we investigate a simple but effective uncertainty quantification, that we call MaskVal, which compares the pose estimates with their corresponding instance segmentations by rendering and does not require any modification of the pose estimator itself. Despite its simplicity, MaskVal significantly outperforms a state-of-the-art ensemble method on both a dataset and a robotic setup. We show that by using MaskVal, the performance of a state-of-the-art 6D pose estimator is significantly improved towards a safe and reliable operation. In addition, we propose a new and specific approach to compare and evaluate uncertainty quantification methods for 6D pose estimation in the context of robotic manipulation.
△ Less
Submitted 5 September, 2024;
originally announced September 2024.
-
Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations
Authors:
Yue Yao,
Shengchao Yan,
Daniel Goehring,
Wolfram Burgard,
Joerg Reichardt
Abstract:
Robustness against Out-of-Distribution (OoD) samples is a key performance indicator of a trajectory prediction model. However, the development and ranking of state-of-the-art (SotA) models are driven by their In-Distribution (ID) performance on individual competition datasets. We present an OoD testing protocol that homogenizes datasets and prediction tasks across two large-scale motion datasets.…
▽ More
Robustness against Out-of-Distribution (OoD) samples is a key performance indicator of a trajectory prediction model. However, the development and ranking of state-of-the-art (SotA) models are driven by their In-Distribution (ID) performance on individual competition datasets. We present an OoD testing protocol that homogenizes datasets and prediction tasks across two large-scale motion datasets. We introduce a novel prediction algorithm based on polynomial representations for agent trajectory and road geometry on both the input and output sides of the model. With a much smaller model size, training effort, and inference time, we reach near SotA performance for ID testing and significantly improve robustness in OoD testing. Within our OoD testing protocol, we further study two augmentation strategies of SotA models and their effects on model generalization. Highlighting the contrast between ID and OoD performance, we suggest adding OoD testing to the evaluation criteria of trajectory prediction models.
△ Less
Submitted 25 January, 2025; v1 submitted 18 July, 2024;
originally announced July 2024.
-
GroundGrid:LiDAR Point Cloud Ground Segmentation and Terrain Estimation
Authors:
Nicolai Steinke,
Daniel Göhring,
Raùl Rojas
Abstract:
The precise point cloud ground segmentation is a crucial prerequisite of virtually all perception tasks for LiDAR sensors in autonomous vehicles. Especially the clustering and extraction of objects from a point cloud usually relies on an accurate removal of ground points. The correct estimation of the surrounding terrain is important for aspects of the drivability of a surface, path planning, and…
▽ More
The precise point cloud ground segmentation is a crucial prerequisite of virtually all perception tasks for LiDAR sensors in autonomous vehicles. Especially the clustering and extraction of objects from a point cloud usually relies on an accurate removal of ground points. The correct estimation of the surrounding terrain is important for aspects of the drivability of a surface, path planning, and obstacle prediction. In this article, we propose our system GroundGrid which relies on 2D elevation maps to solve the terrain estimation and point cloud ground segmentation problems. We evaluate the ground segmentation and terrain estimation performance of GroundGrid and compare it to other state-of-the-art methods using the SemanticKITTI dataset and a novel evaluation method relying on airborne LiDAR scanning. The results show that GroundGrid is capable of outperforming other state-of-the-art systems with an average IoU of 94.78% while maintaining a high run-time performance of 171Hz. The source code is available at https://github.com/dcmlr/groundgrid
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Motion Planning under Uncertainty: Integrating Learning-Based Multi-Modal Predictors into Branch Model Predictive Control
Authors:
Mohamed-Khalil Bouzidi,
Bojan Derajic,
Daniel Goehring,
Joerg Reichardt
Abstract:
In complex traffic environments, autonomous vehicles face multi-modal uncertainty about other agents' future behavior. To address this, recent advancements in learningbased motion predictors output multi-modal predictions. We present our novel framework that leverages Branch Model Predictive Control(BMPC) to account for these predictions. The framework includes an online scenario-selection process…
▽ More
In complex traffic environments, autonomous vehicles face multi-modal uncertainty about other agents' future behavior. To address this, recent advancements in learningbased motion predictors output multi-modal predictions. We present our novel framework that leverages Branch Model Predictive Control(BMPC) to account for these predictions. The framework includes an online scenario-selection process guided by topology and collision risk criteria. This efficiently selects a minimal set of predictions, rendering the BMPC realtime capable. Additionally, we introduce an adaptive decision postponing strategy that delays the planner's commitment to a single scenario until the uncertainty is resolved. Our comprehensive evaluations in traffic intersection and random highway merging scenarios demonstrate enhanced comfort and safety through our method.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Learning-Aided Warmstart of Model Predictive Control in Uncertain Fast-Changing Traffic
Authors:
Mohamed-Khalil Bouzidi,
Yue Yao,
Daniel Goehring,
Joerg Reichardt
Abstract:
Model Predictive Control lacks the ability to escape local minima in nonconvex problems. Furthermore, in fast-changing, uncertain environments, the conventional warmstart, using the optimal trajectory from the last timestep, often falls short of providing an adequately close initial guess for the current optimal trajectory. This can potentially result in convergence failures and safety issues. The…
▽ More
Model Predictive Control lacks the ability to escape local minima in nonconvex problems. Furthermore, in fast-changing, uncertain environments, the conventional warmstart, using the optimal trajectory from the last timestep, often falls short of providing an adequately close initial guess for the current optimal trajectory. This can potentially result in convergence failures and safety issues. Therefore, this paper proposes a framework for learning-aided warmstarts of Model Predictive Control algorithms. Our method leverages a neural network based multimodal predictor to generate multiple trajectory proposals for the autonomous vehicle, which are further refined by a sampling-based technique. This combined approach enables us to identify multiple distinct local minima and provide an improved initial guess. We validate our approach with Monte Carlo simulations of traffic scenarios.
△ Less
Submitted 4 October, 2023;
originally announced October 2023.
-
Industrial Application of 6D Pose Estimation for Robotic Manipulation in Automotive Internal Logistics
Authors:
Philipp Quentin,
Dino Knoll,
Daniel Goehring
Abstract:
Despite the advances in robotics a large proportion of the of parts handling tasks in the automotive industry's internal logistics are not automated but still performed by humans. A key component to competitively automate these processes is a 6D pose estimation that can handle a large number of different parts, is adaptable to new parts with little manual effort, and is sufficiently accurate and r…
▽ More
Despite the advances in robotics a large proportion of the of parts handling tasks in the automotive industry's internal logistics are not automated but still performed by humans. A key component to competitively automate these processes is a 6D pose estimation that can handle a large number of different parts, is adaptable to new parts with little manual effort, and is sufficiently accurate and robust with respect to industry requirements. In this context, the question arises as to the current status quo with respect to these measures. To address this we built a representative 6D pose estimation pipeline with state-of-the-art components from economically scalable real to synthetic data generation to pose estimators and evaluated it on automotive parts with regards to a realistic sequencing process. We found that using the data generation approaches, the performance of the trained 6D pose estimators are promising, but do not meet industry requirements. We reveal that the reason for this is the inability of the estimators to provide reliable uncertainties for their poses, rather than the ability of to provide sufficiently accurate poses. In this context we further analyzed how RGB- and RGB-D-based approaches compare against this background and show that they are differently vulnerable to the domain gap induced by synthetic data.
△ Less
Submitted 9 April, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
An Empirical Bayes Analysis of Object Trajectory Representation Models
Authors:
Yue Yao,
Daniel Goehring,
Joerg Reichardt
Abstract:
Linear trajectory models provide mathematical advantages to autonomous driving applications such as motion prediction. However, linear models' expressive power and bias for real-world trajectories have not been thoroughly analyzed. We present an in-depth empirical analysis of the trade-off between model complexity and fit error in modelling object trajectories. We analyze vehicle, cyclist, and ped…
▽ More
Linear trajectory models provide mathematical advantages to autonomous driving applications such as motion prediction. However, linear models' expressive power and bias for real-world trajectories have not been thoroughly analyzed. We present an in-depth empirical analysis of the trade-off between model complexity and fit error in modelling object trajectories. We analyze vehicle, cyclist, and pedestrian trajectories. Our methodology estimates observation noise and prior distributions over model parameters from several large-scale datasets. Incorporating these priors can then regularize prediction models. Our results show that linear models do represent real-world trajectories with high fidelity at very moderate model complexity. This suggests the feasibility of using linear trajectory models in future motion prediction systems with inherent mathematical advantages.
△ Less
Submitted 21 May, 2025; v1 submitted 3 November, 2022;
originally announced November 2022.