-
Time to Retrain? Detecting Concept Drifts in Machine Learning Systems
Authors:
Tri Minh Triet Pham,
Karthikeyan Premkumar,
Mohamed Naili,
Jinqiu Yang
Abstract:
With the boom of machine learning (ML) techniques, software practitioners build ML systems to process the massive volume of streaming data for diverse software engineering tasks such as failure prediction in AIOps. Trained using historical data, such ML models encounter performance degradation caused by concept drift, i.e., data and inter-relationship (concept) changes between training and product…
▽ More
With the boom of machine learning (ML) techniques, software practitioners build ML systems to process the massive volume of streaming data for diverse software engineering tasks such as failure prediction in AIOps. Trained using historical data, such ML models encounter performance degradation caused by concept drift, i.e., data and inter-relationship (concept) changes between training and production. It is essential to use concept rift detection to monitor the deployed ML models and re-train the ML models when needed. In this work, we explore applying state-of-the-art (SOTA) concept drift detection techniques on synthetic and real-world datasets in an industrial setting. Such an industrial setting requires minimal manual effort in labeling and maximal generality in ML model architecture. We find that current SOTA semi-supervised methods not only require significant labeling effort but also only work for certain types of ML models. To overcome such limitations, we propose a novel model-agnostic technique (CDSeer) for detecting concept drift. Our evaluation shows that CDSeer has better precision and recall compared to the state-of-the-art while requiring significantly less manual labeling. We demonstrate the effectiveness of CDSeer at concept drift detection by evaluating it on eight datasets from different domains and use cases. Results from internal deployment of CDSeer on an industrial proprietary dataset show a 57.1% improvement in precision while using 99% fewer labels compared to the SOTA concept drift detection method. The performance is also comparable to the supervised concept drift detection method, which requires 100% of the data to be labeled. The improved performance and ease of adoption of CDSeer are valuable in making ML systems more reliable.
△ Less
Submitted 11 October, 2024;
originally announced October 2024.
-
Robustness of LiDAR-Based Pose Estimation: Evaluating and Improving Odometry and Localization Under Common Point Cloud Corruptions
Authors:
Bo Yang,
Tri Minh Triet Pham,
Jinqiu Yang
Abstract:
Accurate and reliable pose estimation, i.e., determining the precise position and orientation of autonomous robots and vehicles, is critical for tasks like navigation and mapping. LiDAR is a widely used sensor for pose estimation, with odometry and localization being two primary tasks. LiDAR odometry estimates the relative motion between consecutive scans, while LiDAR localization aligns real-time…
▽ More
Accurate and reliable pose estimation, i.e., determining the precise position and orientation of autonomous robots and vehicles, is critical for tasks like navigation and mapping. LiDAR is a widely used sensor for pose estimation, with odometry and localization being two primary tasks. LiDAR odometry estimates the relative motion between consecutive scans, while LiDAR localization aligns real-time scans with a pre-recorded map to obtain a global pose. Although they have different objectives and application scenarios, both rely on point cloud registration as the underlying technique and face shared challenges of data corruption caused by adverse conditions (e.g., rain). While state-of-the-art (SOTA) pose estimation systems achieved high accuracy on clean data, their robustness to corrupted data remains unclear. In this work, we propose a framework to systematically evaluate five SOTA LiDAR pose estimation systems across 18 synthetic real-world point cloud corruptions. Our experiments reveal that odometry systems degrade significantly under specific corruptions, with relative position errors increasing from 0.5% to more than 80%, while localization systems remain highly robust. We further demonstrate that denoising techniques can effectively mitigate the adverse effects of noise-induced corruptions, and re-training learning-based systems with corrupted data significantly enhances the robustness against various corruption types.
△ Less
Submitted 4 March, 2025; v1 submitted 16 September, 2024;
originally announced September 2024.
-
Perception-Guided Fuzzing for Simulated Scenario-Based Testing of Autonomous Driving Systems
Authors:
Tri Minh Triet Pham,
Bo Yang,
Jinqiu Yang
Abstract:
Autonomous Driving Systems (ADS) have made huge progress and started on-road testing or even commercializing trials. ADS are complex and difficult to test: they receive input data from multiple sensors and make decisions using a combination of multiple deep neural network models and code logic. The safety of ADS is of utmost importance as their misbehavior can result in costly catastrophes, includ…
▽ More
Autonomous Driving Systems (ADS) have made huge progress and started on-road testing or even commercializing trials. ADS are complex and difficult to test: they receive input data from multiple sensors and make decisions using a combination of multiple deep neural network models and code logic. The safety of ADS is of utmost importance as their misbehavior can result in costly catastrophes, including the loss of human life. In this work, we propose SimsV, which performs system-level testing on multi-module ADS. SimsV targets perception failures of ADS and further assesses the impact of perception failure on the system as a whole. SimsV leverages a high-fidelity simulator for test input and oracle generation by continuously applying predefined mutation operators. In addition, SimsV leverages various metrics to guide the testing process. We implemented a prototype SimsV for testing a commercial-grade Level 4 ADS (i.e., Apollo) using a popular open-source driving platform simulator. Our evaluation shows that SimsV is capable of finding weaknesses in the perception of Apollo. Furthermore, we show that by exploiting such weakness, SimsV finds severe problems in Apollo, including collisions.
△ Less
Submitted 24 August, 2024;
originally announced August 2024.
-
Evaluating the Robustness of LiDAR-based 3D Obstacles Detection and Its Impacts on Autonomous Driving Systems
Authors:
Tri Minh Triet Pham,
Bo Yang,
Jinqiu Yang
Abstract:
Autonomous driving systems (ADSs) require real-time input from multiple sensors to make time-sensitive decisions using deep neural networks. This makes the correctness of these decisions crucial to ADSs' adoption as errors can cause significant loss. Sensors such as LiDAR are sensitive to environmental changes and built-in inaccuracies and may fluctuate between frames. While there has been extensi…
▽ More
Autonomous driving systems (ADSs) require real-time input from multiple sensors to make time-sensitive decisions using deep neural networks. This makes the correctness of these decisions crucial to ADSs' adoption as errors can cause significant loss. Sensors such as LiDAR are sensitive to environmental changes and built-in inaccuracies and may fluctuate between frames. While there has been extensive work to test ADSs, it remains unclear whether current ADSs are robust against very subtle changes in LiDAR point cloud data. In this work, we study the impact of the built-in inaccuracies in LiDAR sensors on LiDAR-3D obstacle detection models to provide insight into how they can impact obstacle detection (i.e., robustness) and by extension trajectory prediction (i.e., how the robustness of obstacle detection would impact ADSs).
We propose a framework SORBET, that applies subtle perturbations to LiDAR data, evaluates the robustness of LiDAR-3D obstacle detection, and assesses the impacts on the trajectory prediction module and ADSs. We applied SORBET to evaluate the robustness of five classic LiDAR-3D obstacle detection models, including one from an industry-grade Level 4 ADS (Baidu's Apollo). Furthermore, we studied how changes in the obstacle detection results would negatively impact trajectory prediction in a cascading fashion. Our evaluation highlights the importance of testing the robustness of LiDAR-3D obstacle detection models against subtle perturbations. We find that even very subtle changes in point cloud data (i.e., removing two points) may introduce a non-trivial decrease in the detection performance. Furthermore, such a negative impact will further propagate to other modules, and endanger the safety of ADSs.
△ Less
Submitted 24 August, 2024;
originally announced August 2024.