-
Time to Retrain? Detecting Concept Drifts in Machine Learning Systems
Authors:
Tri Minh Triet Pham,
Karthikeyan Premkumar,
Mohamed Naili,
Jinqiu Yang
Abstract:
With the boom of machine learning (ML) techniques, software practitioners build ML systems to process the massive volume of streaming data for diverse software engineering tasks such as failure prediction in AIOps. Trained using historical data, such ML models encounter performance degradation caused by concept drift, i.e., data and inter-relationship (concept) changes between training and production. It is essential to use concept drift detection to monitor the deployed ML models and re-train the ML models when needed. In this work, we explore applying state-of-the-art (SOTA) concept drift detection techniques on synthetic and real-world datasets in an industrial setting. Such an industrial setting requires minimal manual effort in labeling and maximal generality in ML model architecture. We find that current SOTA semi-supervised methods not only require significant labeling effort but also only work for certain types of ML models. To overcome such limitations, we propose a novel model-agnostic technique (CDSeer) for detecting concept drift. Our evaluation shows that CDSeer has better precision and recall compared to the state-of-the-art while requiring significantly less manual labeling. We demonstrate the effectiveness of CDSeer at concept drift detection by evaluating it on eight datasets from different domains and use cases. Results from internal deployment of CDSeer on an industrial proprietary dataset show a 57.1% improvement in precision while using 99% fewer labels compared to the SOTA concept drift detection method. The performance is also comparable to the supervised concept drift detection method, which requires 100% of the data to be labeled. The improved performance and ease of adoption of CDSeer are valuable in making ML systems more reliable.
Submitted 11 October, 2024;
originally announced October 2024.
-
Robustness of LiDAR-Based Pose Estimation: Evaluating and Improving Odometry and Localization Under Common Point Cloud Corruptions
Authors:
Bo Yang,
Tri Minh Triet Pham,
Jinqiu Yang
Abstract:
Accurate and reliable pose estimation, i.e., determining the precise position and orientation of autonomous robots and vehicles, is critical for tasks like navigation and mapping. LiDAR is a widely used sensor for pose estimation, with odometry and localization being two primary tasks. LiDAR odometry estimates the relative motion between consecutive scans, while LiDAR localization aligns real-time scans with a pre-recorded map to obtain a global pose. Although they have different objectives and application scenarios, both rely on point cloud registration as the underlying technique and face shared challenges of data corruption caused by adverse conditions (e.g., rain). While state-of-the-art (SOTA) pose estimation systems achieved high accuracy on clean data, their robustness to corrupted data remains unclear. In this work, we propose a framework to systematically evaluate five SOTA LiDAR pose estimation systems across 18 synthetic real-world point cloud corruptions. Our experiments reveal that odometry systems degrade significantly under specific corruptions, with relative position errors increasing from 0.5% to more than 80%, while localization systems remain highly robust. We further demonstrate that denoising techniques can effectively mitigate the adverse effects of noise-induced corruptions, and re-training learning-based systems with corrupted data significantly enhances the robustness against various corruption types.
Submitted 4 March, 2025; v1 submitted 16 September, 2024;
originally announced September 2024.
-
Perception-Guided Fuzzing for Simulated Scenario-Based Testing of Autonomous Driving Systems
Authors:
Tri Minh Triet Pham,
Bo Yang,
Jinqiu Yang
Abstract:
Autonomous Driving Systems (ADS) have made huge progress and have begun on-road testing and even commercial trials. ADS are complex and difficult to test: they receive input data from multiple sensors and make decisions using a combination of multiple deep neural network models and code logic. The safety of ADS is of utmost importance as their misbehavior can result in costly catastrophes, including the loss of human life. In this work, we propose SimsV, which performs system-level testing on multi-module ADS. SimsV targets perception failures of ADS and further assesses the impact of perception failure on the system as a whole. SimsV leverages a high-fidelity simulator for test input and oracle generation by continuously applying predefined mutation operators. In addition, SimsV leverages various metrics to guide the testing process. We implemented a prototype of SimsV for testing a commercial-grade Level 4 ADS (i.e., Apollo) using a popular open-source driving platform simulator. Our evaluation shows that SimsV is capable of finding weaknesses in the perception of Apollo. Furthermore, we show that by exploiting such weaknesses, SimsV finds severe problems in Apollo, including collisions.
Submitted 24 August, 2024;
originally announced August 2024.
-
Evaluating the Robustness of LiDAR-based 3D Obstacles Detection and Its Impacts on Autonomous Driving Systems
Authors:
Tri Minh Triet Pham,
Bo Yang,
Jinqiu Yang
Abstract:
Autonomous driving systems (ADSs) require real-time input from multiple sensors to make time-sensitive decisions using deep neural networks. This makes the correctness of these decisions crucial to ADSs' adoption as errors can cause significant loss. Sensors such as LiDAR are sensitive to environmental changes and built-in inaccuracies and may fluctuate between frames. While there has been extensive work to test ADSs, it remains unclear whether current ADSs are robust against very subtle changes in LiDAR point cloud data. In this work, we study the impact of the built-in inaccuracies in LiDAR sensors on LiDAR-3D obstacle detection models to provide insight into how they can impact obstacle detection (i.e., robustness) and by extension trajectory prediction (i.e., how the robustness of obstacle detection would impact ADSs).
We propose a framework, SORBET, that applies subtle perturbations to LiDAR data, evaluates the robustness of LiDAR-3D obstacle detection, and assesses the impacts on the trajectory prediction module and ADSs. We applied SORBET to evaluate the robustness of five classic LiDAR-3D obstacle detection models, including one from an industry-grade Level 4 ADS (Baidu's Apollo). Furthermore, we studied how changes in the obstacle detection results would negatively impact trajectory prediction in a cascading fashion. Our evaluation highlights the importance of testing the robustness of LiDAR-3D obstacle detection models against subtle perturbations. We find that even very subtle changes in point cloud data (i.e., removing two points) may introduce a non-trivial decrease in the detection performance. Furthermore, such a negative impact will propagate to other modules and endanger the safety of ADSs.
Submitted 24 August, 2024;
originally announced August 2024.
-
A War Beyond Deepfake: Benchmarking Facial Counterfeits and Countermeasures
Authors:
Minh Tam Pham,
Thanh Trung Huynh,
Van Vinh Tong,
Thanh Tam Nguyen,
Thanh Thi Nguyen,
Hongzhi Yin,
Quoc Viet Hung Nguyen
Abstract:
In recent years, visual forgery has reached a level of sophistication at which humans cannot identify fraud, which poses a significant threat to information security. A wide range of malicious applications have emerged, such as fake news, defamation or blackmailing of celebrities, impersonation of politicians in political warfare, and the spreading of rumours to attract views. As a result, a rich body of visual forensic techniques has been proposed in an attempt to stop this dangerous trend. In this paper, we present a benchmark that provides in-depth insights into visual forgery and visual forensics, using a comprehensive and empirical approach. More specifically, we develop an independent framework that integrates state-of-the-art counterfeit generators and detectors, and measure the performance of these techniques using various criteria. We also perform an exhaustive analysis of the benchmarking results, to determine the characteristics of the methods that serve as a comparative reference in this never-ending war between measures and countermeasures.
Submitted 7 April, 2022; v1 submitted 25 November, 2021;
originally announced November 2021.
-
Anonymous communication system provides a secure environment without leaking metadata, which has many application scenarios in IoT
Authors:
Ngoc Ai Van Nguyen,
Minh Thuy Truc Pham
Abstract:
Anonymous Identity Based Encryption (AIBET) scheme allows a tracer to use the tracing key to reveal the recipient's identity from the ciphertext while keeping other data anonymous. This special feature makes AIBET a promising solution to distributed IoT data security. In this paper, we construct an efficient quantum-safe Hierarchical Identity-Based cryptosystem with Traceable Identities (AHIBET) with fully anonymous ciphertexts. We prove the security of the AHIBET scheme under the Learning with Errors (LWE) problem in the standard model.
Submitted 9 November, 2021;
originally announced November 2021.
-
Recognition of 26 Degrees of Freedom of Hands Using Model-based approach and Depth-Color Images
Authors:
Cong Hoang Quach,
Minh Trien Pham,
Anh Viet Dang,
Dinh Tuan Pham,
Thuan Hoang Tran,
Manh Duong Phung
Abstract:
In this study, we present a model-based approach to recognize the full 26 degrees of freedom of a human hand. Input data include RGB-D images acquired from a Kinect camera and a 3D model of the hand constructed from its anatomy and graphical matrices. A cost function is then defined so that its minimum value is achieved when the model and observation images are matched. To solve the optimization problem in the 26-dimensional space, an improved particle swarm optimization algorithm is used. In addition, parallel computation on graphics processing units (GPUs) is utilized to handle computationally expensive tasks. Simulation and experimental results show that the system can recognize 26 degrees of freedom of hands with a processing time of 0.8 seconds per frame. The algorithm is robust to noise and the hardware requirement is simple, with a single camera.
Submitted 13 May, 2020;
originally announced May 2020.
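
The abstract above describes minimizing a cost function over a 26-dimensional pose space with particle swarm optimization. A minimal sketch of plain PSO follows, not the authors' improved variant: `sphere_cost` is a hypothetical stand-in for the model-to-image matching cost, and the swarm size, coefficients, and bounds are assumed values chosen only for illustration.

```python
import random


def sphere_cost(x):
    """Hypothetical stand-in cost: zero when every parameter equals the assumed target of 0."""
    return sum(v * v for v in x)


def pso_minimize(cost, dim, n_particles=30, n_iters=200, bounds=(-1.0, 1.0),
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Plain particle swarm optimization; returns (best_position, best_cost, initial_best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # per-particle best positions
    best_cost = [cost(p) for p in pos]      # per-particle best costs
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # swarm-wide best
    initial_cost = g_cost
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
    return g_pos, g_cost, initial_cost


if __name__ == "__main__":
    pose, final_cost, start_cost = pso_minimize(sphere_cost, dim=26)
    print(f"best cost went from {start_cost:.4f} to {final_cost:.6f}")
```

In a real hand tracker the cost would compare a rendered hand model against the observed RGB-D frame, which is the expensive step the abstract offloads to the GPU; here it is replaced by a cheap analytic function so the loop can run anywhere.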
-
Real-time Lane Marker Detection Using Template Matching with RGB-D Camera
Authors:
Cong Hoang Quach,
Van Lien Tran,
Duy Hung Nguyen,
Viet Thang Nguyen,
Minh Trien Pham,
Manh Duong Phung
Abstract:
This paper addresses the problem of lane detection which is fundamental for self-driving vehicles. Our approach exploits both colour and depth information recorded by a single RGB-D camera to better deal with negative factors such as lighting conditions and lane-like objects. In the approach, colour and depth images are first converted to a half-binary format and a 2D matrix of 3D points. They are then used as the inputs of template matching and geometric feature extraction processes to form a response map so that its values represent the probability of pixels being lane markers. To further improve the results, the template and lane surfaces are finally refined by principal component analysis and lane model fitting techniques. A number of experiments have been conducted on both synthetic and real datasets. The result shows that the proposed approach can effectively eliminate unwanted noise to accurately detect lane markers in various scenarios. Moreover, the processing speed of 20 frames per second under hardware configuration of a popular laptop computer allows the proposed algorithm to be implemented for real-time autonomous driving applications.
Submitted 5 June, 2018;
originally announced June 2018.
-
Image segmentation based on histogram of depth and an application in driver distraction detection
Authors:
Tran Hiep Dinh,
Minh Trien Pham,
Manh Duong Phung,
Duc Manh Nguyen,
Van Manh Hoang,
Quang Vinh Tran
Abstract:
This study proposes an approach to segment human objects from a depth image based on a histogram of depth values. The region of interest is first extracted based on a predefined threshold for histogram regions. A region growing process is then employed to separate multiple human bodies within the same depth interval. Our contribution is the identification of an adaptive growth threshold based on the detected histogram region. To demonstrate the effectiveness of the proposed method, an application in driver distraction detection is introduced. After successfully extracting the driver's position inside the car, we devise a simple solution to track the driver's motion. By analyzing the difference between the initial and current frames, a change of cluster position or depth value in the region of interest that crosses the preset threshold is considered a distracted activity. The experimental results demonstrate the success of the algorithm in detecting typical distracted driving activities, such as using a phone for calling or texting, adjusting internal devices, and drinking, in real time.
Submitted 31 August, 2016;
originally announced September 2016.
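
The first step named in the abstract above, extracting a region of interest from a histogram of depth values using a predefined threshold, can be sketched as follows. This is an illustrative reading rather than the authors' algorithm: the bin width, the count threshold `min_count`, the choice of the nearest dense run of bins, and the convention that a depth of 0 means "no reading" are all assumptions, and the region-growing step is not shown.

```python
def depth_histogram_mask(depth, bin_width=100, min_count=3):
    """Return a boolean mask selecting pixels in the nearest dense depth interval.

    depth: 2D list of depth values (assumed millimetres; 0 is assumed to mean no reading).
    """
    # Build the histogram of valid depth values.
    counts = {}
    for row in depth:
        for d in row:
            if d > 0:
                b = d // bin_width
                counts[b] = counts.get(b, 0) + 1

    # Keep only bins that reach the predefined count threshold.
    dense = sorted(b for b, c in counts.items() if c >= min_count)
    if not dense:
        return [[False] * len(row) for row in depth]

    # Take the nearest run of consecutive dense bins as the region of interest.
    start = end = dense[0]
    for b in dense[1:]:
        if b == end + 1:
            end = b
        else:
            break

    lo, hi = start * bin_width, (end + 1) * bin_width
    return [[d > 0 and lo <= d < hi for d in row] for row in depth]


if __name__ == "__main__":
    frame = [
        [0, 820, 830, 2500],
        [0, 850, 910, 2500],
        [0, 880, 2500, 2500],
    ]
    for mask_row in depth_histogram_mask(frame):
        print(mask_row)
```

Choosing the nearest dense interval reflects the assumption that the driver is the closest large object to the camera; a different camera placement would call for a different selection rule.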
-
Multiagent Conflict Resolution for a Specification Network of Discrete-Event Coordinating Agents
Authors:
Manh Tung Pham,
Kiam Tian Seow
Abstract:
This paper presents a novel compositional approach to distributed coordination module (CM) synthesis for multiple discrete-event agents in the formal languages and automata framework. The approach is supported by two original ideas. The first is a new formalism called the Distributed Constraint Specification Network (DCSN) that can comprehensibly describe the networking constraint relationships among distributed agents. The second is multiagent conflict resolution planning, which entails generating and using AND/OR graphs to compactly represent conflict resolution (synthesis-process) plans for a DCSN. Together with the framework of local CM design developed in the authors' earlier work, the systematic approach supports separately designing local and deconflicting CMs for individual agents in accordance with a selected conflict resolution plan. Composing the agent models and the designed CMs furnishes an overall nonblocking coordination solution that meets the set of inter-agent constraints specified in a given DCSN.
Submitted 19 March, 2014;
originally announced March 2014.