-
Assembly line balancing considering stochastic task times and production defects
Authors:
Gazi Nazia Nur,
Mohammad Ahnaf Sadat,
Basit Mahmud Shahriar
Abstract:
In this paper, we address the inherent limitations in traditional assembly line balancing, specifically the assumptions that task times are constant and no defective outputs occur. These assumptions often do not hold in practical scenarios, leading to inefficiencies. To address these challenges, we introduce a framework utilizing an "adjusted processing time" approach based on the distributional information of both processing times and defect occurrences. We validate our framework through the analysis of two case studies from existing literature, demonstrating its robustness and adaptability. Our framework is characterized by its simplicity, both in understanding and implementation, marking a substantial advancement in the field. It presents a viable and efficient solution for industries seeking to enhance operational efficiency through improved resource allocation.
Submitted 26 March, 2025;
originally announced March 2025.
-
Impact of Adversarial Attacks on Deep Learning Model Explainability
Authors:
Gazi Nazia Nur,
Mohammad Ahnaf Sadat
Abstract:
In this paper, we investigate the impact of adversarial attacks on the explainability of deep learning models, which are commonly criticized for their black-box nature despite their capacity for autonomous feature extraction. This black-box nature can affect the perceived trustworthiness of these models. To address this, explainability techniques such as GradCAM, SmoothGrad, and LIME have been developed to clarify model decision-making processes. Our research focuses on the robustness of these explanations when models are subjected to adversarial attacks, specifically those involving subtle image perturbations that are imperceptible to humans but can significantly mislead models. For this, we utilize attack methods like the Fast Gradient Sign Method (FGSM) and the Basic Iterative Method (BIM) and observe their effects on model accuracy and explanations. The results reveal a substantial decline in model accuracy, with accuracy dropping from 89.94% to 58.73% under FGSM and to 45.50% under BIM. Despite these declines in accuracy, the models' explanations, as measured by metrics such as Intersection over Union (IoU) and Root Mean Square Error (RMSE), show negligible changes, suggesting that these metrics may not be sensitive enough to detect the presence of adversarial perturbations.
Submitted 15 December, 2024;
originally announced December 2024.
-
When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems
Authors:
Asir Saadat,
Tasmia Binte Sogir,
Md Taukir Azam Chowdhury,
Syem Aziz
Abstract:
Large language models (LLMs) are increasingly relied upon to solve complex mathematical word problems. However, being susceptible to hallucination, they may generate inaccurate results when presented with unanswerable questions, raising concerns about their potential harm. While GPT models are now widely used and trusted, how they can effectively abstain from answering unanswerable math problems, and how their abstention capabilities can be enhanced, has not been rigorously investigated. In this paper, we investigate whether GPTs can appropriately respond to unanswerable math word problems by applying prompts typically used in solvable mathematical scenarios. Our experiments utilize the Unanswerable Word Math Problem (UWMP) dataset, directly leveraging GPT model APIs. Evaluation metrics are introduced, which integrate three key factors: abstention, correctness and confidence. Our findings reveal critical gaps in GPT models and the hallucinations they suffer from on unsolvable problems, highlighting the need for improved models capable of better managing uncertainty and complex reasoning in math word problem-solving contexts.
Submitted 16 October, 2024;
originally announced October 2024.
-
Contextual Breach: Assessing the Robustness of Transformer-based QA Models
Authors:
Asir Saadat,
Nahian Ibn Asad,
Md Farhan Ishmam
Abstract:
Contextual question-answering models are susceptible to adversarial perturbations to input context, commonly observed in real-world scenarios. These adversarial noises are designed to degrade the performance of the model by distorting the textual input. We introduce a unique dataset that incorporates seven distinct types of adversarial noise into the context, each applied at five different intensity levels on the SQuAD dataset. To quantify the robustness, we utilize robustness metrics providing a standardized measure for assessing model performance across varying noise types and levels. Experiments on transformer-based question-answering models reveal robustness vulnerabilities and important insights into the model's performance in realistic textual input.
Submitted 20 September, 2024; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Federated Impression for Learning with Distributed Heterogeneous Data
Authors:
Atrin Arya,
Sana Ayromlou,
Armin Saadat,
Purang Abolmaesumi,
Xiaoxiao Li
Abstract:
Standard deep learning-based classification approaches may not always be practical in real-world clinical applications, as they require a centralized collection of all samples. Federated learning (FL) provides a paradigm that can learn from distributed datasets across clients without requiring them to share data, which can help mitigate privacy and data ownership issues. In FL, sub-optimal convergence caused by data heterogeneity is common among data from different health centers due to the variety in data collection protocols and patient demographics across centers. Through experimentation in this study, we show that data heterogeneity leads to the phenomenon of catastrophic forgetting during local training. We propose FedImpres which alleviates catastrophic forgetting by restoring synthetic data that represents the global information as federated impression. To achieve this, we distill the global model resulting from each communication round. Subsequently, we use the synthetic data alongside the local data to enhance the generalization of local training. Extensive experiments show that the proposed method achieves state-of-the-art performance on both the BloodMNIST and Retina datasets, which contain label imbalance and domain shift, with an improvement in classification accuracy of up to 20%.
Submitted 9 October, 2024; v1 submitted 11 September, 2024;
originally announced September 2024.
-
LSSF-Net: Lightweight Segmentation with Self-Awareness, Spatial Attention, and Focal Modulation
Authors:
Hamza Farooq,
Zuhair Zafar,
Ahsan Saadat,
Tariq M Khan,
Shahzaib Iqbal,
Imran Razzak
Abstract:
Accurate segmentation of skin lesions within dermoscopic images plays a crucial role in the timely identification of skin cancer for computer-aided diagnosis on mobile platforms. However, varying shapes of the lesions, lack of defined edges, and the presence of obstructions such as hair strands and marker colors make this challenge more complex. Additionally, skin lesions often exhibit subtle variations in texture and color that are difficult to differentiate from surrounding healthy skin, necessitating models that can capture both fine-grained details and broader contextual information. Currently, melanoma segmentation models are commonly based on fully connected networks and U-Nets. However, these models often struggle with capturing the complex and varied characteristics of skin lesions, such as the presence of indistinct boundaries and diverse lesion appearances, which can lead to suboptimal segmentation performance. To address these challenges, we propose a novel lightweight network specifically designed for skin lesion segmentation utilizing mobile devices, featuring a minimal number of learnable parameters (only 0.8 million). This network comprises an encoder-decoder architecture that incorporates conformer-based focal modulation attention, self-aware local and global spatial attention, and split channel-shuffle. The efficacy of our model has been evaluated on four well-established benchmark datasets for skin lesion segmentation: ISIC 2016, ISIC 2017, ISIC 2018, and PH2. Empirical findings substantiate its state-of-the-art performance, notably reflected in a high Jaccard index.
Submitted 2 September, 2024;
originally announced September 2024.
-
Visual Robustness Benchmark for Visual Question Answering (VQA)
Authors:
Md Farhan Ishmam,
Ishmam Tashdeed,
Talukder Asir Saadat,
Md Hamjajul Ashmafee,
Abu Raihan Mostofa Kamal,
Md. Azam Hossain
Abstract:
Can Visual Question Answering (VQA) systems perform just as well when deployed in the real world? Or are they susceptible to realistic corruption effects e.g. image blur, which can be detrimental in sensitive applications, such as medical VQA? While linguistic or textual robustness has been thoroughly explored in the VQA literature, there has yet to be any significant work on the visual robustness of VQA models. We propose the first large-scale benchmark comprising 213,000 augmented images, challenging the visual robustness of multiple VQA models and assessing the strength of realistic visual corruptions. Additionally, we have designed several robustness evaluation metrics that can be aggregated into a unified metric and tailored to fit a variety of use cases. Our experiments reveal several insights into the relationships between model size, performance, and robustness with the visual corruptions. Our benchmark highlights the need for a balanced approach in model development that considers model performance without compromising the robustness.
Submitted 29 October, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving
Authors:
Sourav Biswas,
Sergio Casas,
Quinlan Sykora,
Ben Agro,
Abbas Sadat,
Raquel Urtasun
Abstract:
A self-driving vehicle must understand its environment to determine the appropriate action. Traditional autonomy systems rely on object detection to find the agents in the scene. However, object detection assumes a discrete set of objects and loses information about uncertainty, so any errors compound when predicting the future behavior of those agents. Alternatively, dense occupancy grid maps have been utilized to understand free-space. However, predicting a grid for the entire scene is wasteful since only certain spatio-temporal regions are reachable and relevant to the self-driving vehicle. We present a unified, interpretable, and efficient autonomy framework that moves away from cascading modules that first perceive, then predict, and finally plan. Instead, we shift the paradigm to have the planner query occupancy at relevant spatio-temporal points, restricting the computation to those regions of interest. Exploiting this representation, we evaluate candidate trajectories around key factors such as collision avoidance, comfort, and progress for safety and interpretability. Our approach achieves better highway driving quality than the state-of-the-art in high-fidelity closed-loop simulations.
Submitted 1 April, 2024;
originally announced April 2024.
-
Constrained Bayesian Optimization Using a Lagrange Multiplier Applied to Power Transistor Design
Authors:
Ping-Ju Chuang,
Ali Saadat,
Sara Ghazvini,
Hal Edwards,
William G. Vandenberghe
Abstract:
We propose a novel constrained Bayesian Optimization (BO) algorithm optimizing the design process of Laterally-Diffused Metal-Oxide-Semiconductor (LDMOS) transistors while realizing a target Breakdown Voltage (BV). We convert the constrained BO problem into a conventional BO problem using a Lagrange multiplier. Instead of directly optimizing the traditional Figure-of-Merit (FOM), we set the Lagrangian as the objective function of BO. This adaptive objective function with a changeable Lagrange multiplier can address constrained BO problems which have constraints that require costly evaluations, without the need for additional surrogate models to approximate constraints. Our algorithm enables a device designer to set the target BV in the design space, and obtain a device that satisfies the optimized FOM and the target BV constraint automatically. Utilizing this algorithm, we have also explored the physical limits of the FOM for our devices in 30 - 50 V range within the defined design space.
Submitted 18 August, 2023;
originally announced August 2023.
-
Capturing Spectral and Long-term Contextual Information for Speech Emotion Recognition Using Deep Learning Techniques
Authors:
Samiul Islam,
Md. Maksudul Haque,
Abu Jobayer Md. Sadat
Abstract:
Traditional approaches in speech emotion recognition, such as LSTM, CNN, RNN, SVM, and MLP, have limitations such as difficulty capturing long-term dependencies in sequential data, temporal dynamics, and complex patterns and relationships in multimodal data. This research addresses these shortcomings by proposing an ensemble model that combines Graph Convolutional Networks (GCN) for processing textual data and the HuBERT transformer for analyzing audio signals. We found that GCNs excel at capturing long-term contextual dependencies and relationships within textual data by leveraging graph-based representations of text and thus detecting the contextual meaning and semantic relationships between words. On the other hand, HuBERT utilizes self-attention mechanisms to capture long-range dependencies, enabling the modeling of temporal dynamics present in speech and capturing subtle nuances and variations that contribute to emotion recognition. By combining GCN and HuBERT, our ensemble model can leverage the strengths of both approaches. This allows for the simultaneous analysis of multimodal data, and the fusion of these modalities enables the extraction of complementary information, enhancing the discriminative power of the emotion recognition system. The results indicate that the combined model can overcome the limitations of traditional methods, leading to enhanced accuracy in recognizing emotions from speech.
Submitted 4 August, 2023;
originally announced August 2023.
-
ProtoASNet: Dynamic Prototypes for Inherently Interpretable and Uncertainty-Aware Aortic Stenosis Classification in Echocardiography
Authors:
Hooman Vaseli,
Ang Nan Gu,
S. Neda Ahmadi Amiri,
Michael Y. Tsang,
Andrea Fung,
Nima Kondori,
Armin Saadat,
Purang Abolmaesumi,
Teresa S. M. Tsang
Abstract:
Aortic stenosis (AS) is a common heart valve disease that requires accurate and timely diagnosis for appropriate treatment. Most current automatic AS severity detection methods rely on black-box models with a low level of trustworthiness, which hinders clinical adoption. To address this issue, we propose ProtoASNet, a prototypical network that directly detects AS from B-mode echocardiography videos, while making interpretable predictions based on the similarity between the input and learned spatio-temporal prototypes. This approach provides supporting evidence that is clinically relevant, as the prototypes typically highlight markers such as calcification and restricted movement of aortic valve leaflets. Moreover, ProtoASNet utilizes abstention loss to estimate aleatoric uncertainty by defining a set of prototypes that capture ambiguity and insufficient information in the observed data. This provides a reliable system that can detect and explain when it may fail. We evaluate ProtoASNet on a private dataset and the publicly available TMED-2 dataset, where it outperforms existing state-of-the-art methods with an accuracy of 80.0% and 79.7%, respectively. Furthermore, ProtoASNet provides interpretability and an uncertainty measure for each prediction, which can improve transparency and facilitate the interactive usage of deep networks to aid clinical decision-making. Our source code is available at: https://github.com/hooman007/ProtoASNet.
Submitted 26 July, 2023;
originally announced July 2023.
-
A Residual Encoder-Decoder Network for Segmentation of Retinal Image-Based Exudates in Diabetic Retinopathy Screening
Authors:
Malik A. Manan,
Tariq M. Khan,
Ahsan Saadat,
Muhammad Arsalan,
Syed S. Naqvi
Abstract:
Diabetic retinopathy refers to the pathology of the retina induced by diabetes and is one of the leading causes of preventable blindness in the world. Early detection of diabetic retinopathy is critical to avoid vision problems through continuous screening and treatment. In traditional clinical practice, the involved lesions are manually detected using photographs of the fundus. However, this task is cumbersome and time-consuming and requires intense effort due to the small size of the lesions and the low contrast of the images. Thus, computer-assisted diagnosis of diabetic retinopathy based on the detection of red lesions has been actively explored recently. In this paper, we present a convolutional neural network with residual skip connections for the segmentation of exudates in retinal images. To improve the performance of the network architecture, a suitable image augmentation technique is used. The proposed network can robustly segment exudates with high accuracy, which makes it suitable for diabetic retinopathy screening. A comparative performance analysis on three benchmark databases, HEI-MED, E-ophtha, and DiaretDB1, is presented. It is shown that the proposed method achieves accuracy (0.98, 0.99, 0.98) and sensitivity (0.97, 0.92, 0.95) on E-ophtha, HEI-MED, and DiaretDB1, respectively.
Submitted 15 January, 2022;
originally announced January 2022.
-
MP3: A Unified Model to Map, Perceive, Predict and Plan
Authors:
Sergio Casas,
Abbas Sadat,
Raquel Urtasun
Abstract:
High-definition maps (HD maps) are a key component of most modern self-driving systems due to their valuable semantic and geometric information. Unfortunately, building HD maps has proven hard to scale due to their cost as well as the requirements they impose on the localization system, which has to work everywhere with centimeter-level accuracy. Being able to drive without an HD map would be very beneficial to scale self-driving solutions as well as to increase the failure tolerance of existing ones (e.g., if localization fails or the map is not up-to-date). Towards this goal, we propose MP3, an end-to-end approach to mapless driving where the input is raw sensor data and a high-level command (e.g., turn left at the intersection). MP3 predicts intermediate representations in the form of an online map and the current and future state of dynamic agents, and exploits them in a novel neural motion planner to make interpretable decisions taking into account uncertainty. We show that our approach is significantly safer, more comfortable, and can follow commands better than the baselines in challenging long-term closed-loop simulations, as well as when compared to an expert driver in a large-scale real-world dataset.
Submitted 17 January, 2021;
originally announced January 2021.
-
Deep Multi-Task Learning for Joint Localization, Perception, and Prediction
Authors:
John Phillips,
Julieta Martinez,
Ioan Andrei Bârsan,
Sergio Casas,
Abbas Sadat,
Raquel Urtasun
Abstract:
Over the last few years, we have witnessed tremendous progress on many subtasks of autonomous driving, including perception, motion forecasting, and motion planning. However, these systems often assume that the car is accurately localized against a high-definition map. In this paper we question this assumption, and investigate the issues that arise in state-of-the-art autonomy stacks under localization error. Based on our observations, we design a system that jointly performs perception, prediction, and localization. Our architecture is able to reuse computation between both tasks, and is thus able to correct localization errors efficiently. We show experiments on a large-scale autonomy dataset, demonstrating the efficiency and accuracy of our proposed approach.
Submitted 10 April, 2021; v1 submitted 17 January, 2021;
originally announced January 2021.
-
End-to-end Interpretable Neural Motion Planner
Authors:
Wenyuan Zeng,
Wenjie Luo,
Simon Suo,
Abbas Sadat,
Bin Yang,
Sergio Casas,
Raquel Urtasun
Abstract:
In this paper, we propose a neural motion planner (NMP) for learning to drive autonomously in complex urban scenarios that include traffic-light handling, yielding, and interactions with multiple road-users. Towards this goal, we design a holistic model that takes as input raw LIDAR data and an HD map and produces interpretable intermediate representations in the form of 3D detections and their future trajectories, as well as a cost volume defining the goodness of each position that the self-driving car can take within the planning horizon. We then sample a set of diverse physically possible trajectories and choose the one with the minimum learned cost. Importantly, our cost volume is able to naturally capture multi-modality. We demonstrate the effectiveness of our approach in real-world driving data captured in several cities in North America. Our experiments show that the learned cost volume can generate safer planning than all the baselines.
Submitted 17 January, 2021;
originally announced January 2021.
-
Diverse Complexity Measures for Dataset Curation in Self-driving
Authors:
Abbas Sadat,
Sean Segal,
Sergio Casas,
James Tu,
Bin Yang,
Raquel Urtasun,
Ersin Yumer
Abstract:
Modern self-driving autonomy systems heavily rely on deep learning. As a consequence, their performance is influenced significantly by the quality and richness of the training data. Data collecting platforms can generate many hours of raw data on a daily basis; however, it is not feasible to label everything. It is thus of key importance to have a mechanism to identify "what to label". Active learning approaches identify examples to label, but their interestingness is tied to a fixed model performing a particular task. These assumptions are not valid in self-driving, where we have to solve a diverse set of tasks (i.e., perception and motion forecasting) and our models evolve frequently over time. In this paper we introduce a novel approach and propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes. Our experiments on a wide range of tasks and models show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
Submitted 16 January, 2021;
originally announced January 2021.
-
AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles
Authors:
Jingkang Wang,
Ava Pun,
James Tu,
Sivabalan Manivasagam,
Abbas Sadat,
Sergio Casas,
Mengye Ren,
Raquel Urtasun
Abstract:
As self-driving systems become better, simulating scenarios where the autonomy stack may fail becomes more important. Traditionally, those scenarios are generated for a few scenes with respect to the planning module that takes ground-truth actor states as input. This does not scale and cannot identify all possible autonomy failures, such as perception failures due to occlusion. In this paper, we propose AdvSim, an adversarial framework to generate safety-critical scenarios for any LiDAR-based autonomy system. Given an initial traffic scenario, AdvSim modifies the actors' trajectories in a physically plausible manner and updates the LiDAR sensor data to match the perturbed world. Importantly, by simulating directly from sensor data, we obtain adversarial scenarios that are safety-critical for the full autonomy stack. Our experiments show that our approach is general and can identify thousands of semantically meaningful safety-critical scenarios for a wide range of modern self-driving systems. Furthermore, we show that the robustness and safety of these systems can be further improved by training them with scenarios generated by AdvSim.
Submitted 16 April, 2023; v1 submitted 16 January, 2021;
originally announced January 2021.
-
LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving
Authors:
Alexander Cui,
Sergio Casas,
Abbas Sadat,
Renjie Liao,
Raquel Urtasun
Abstract:
In this paper, we present LookOut, a novel autonomy system that perceives the environment, predicts a diverse set of futures of how the scene might unroll and estimates the trajectory of the SDV by optimizing a set of contingency plans over these future realizations. In particular, we learn a diverse joint distribution over multi-agent future trajectories in a traffic scene that covers a wide range of future modes with high sample efficiency while leveraging the expressive power of generative models. Unlike previous work in diverse motion forecasting, our diversity objective explicitly rewards sampling future scenarios that require distinct reactions from the self-driving vehicle for improved safety. Our contingency planner then finds comfortable and non-conservative trajectories that ensure safe reactions to a wide range of future scenarios. Through extensive evaluations, we show that our model demonstrates significantly more diverse and sample-efficient motion forecasting in a large-scale self-driving dataset as well as safer and less-conservative motion plans in long-term closed-loop simulations when compared to current state-of-the-art models.
Submitted 7 May, 2021; v1 submitted 16 January, 2021;
originally announced January 2021.
-
Universal Embeddings for Spatio-Temporal Tagging of Self-Driving Logs
Authors:
Sean Segal,
Eric Kee,
Wenjie Luo,
Abbas Sadat,
Ersin Yumer,
Raquel Urtasun
Abstract:
In this paper, we tackle the problem of spatio-temporal tagging of self-driving scenes from raw sensor data. Our approach learns a universal embedding for all tags, enabling efficient tagging of many attributes and faster learning of new attributes with limited data. Importantly, the embedding is spatio-temporally aware, allowing the model to naturally output spatio-temporal tag values. Values can then be pooled over arbitrary regions to, for example, compute the pedestrian density in front of the SDV or determine whether a car is blocking another car at a 4-way intersection. We demonstrate the effectiveness of our approach on a new large-scale self-driving dataset, SDVScenes, containing 15 attributes relating to vehicle and pedestrian density, the actions of each actor, the speed of each actor, interactions between actors, and the topology of the road map.
Submitted 11 November, 2020;
originally announced November 2020.
-
Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction
Authors:
Kelvin Wong,
Qiang Zhang,
Ming Liang,
Bin Yang,
Renjie Liao,
Abbas Sadat,
Raquel Urtasun
Abstract:
We present a novel method for testing the safety of self-driving vehicles in simulation. We propose an alternative to sensor simulation, as sensor simulation is expensive and has large domain gaps. Instead, we directly simulate the outputs of the self-driving vehicle's perception and prediction system, enabling realistic motion planning testing. Specifically, we use paired data in the form of ground truth labels and real perception and prediction outputs to train a model that predicts what the online system will produce. Importantly, the inputs to our system consist of high-definition maps, bounding boxes, and trajectories, which can be easily sketched by a test engineer in a matter of minutes. This makes our approach a much more scalable solution. Quantitative results on two large-scale datasets demonstrate that we can realistically test motion planning using our simulations.
Submitted 13 August, 2020;
originally announced August 2020.
-
Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations
Authors:
Abbas Sadat,
Sergio Casas,
Mengye Ren,
Xinyu Wu,
Pranaab Dhawan,
Raquel Urtasun
Abstract:
In this paper, we propose a novel end-to-end learnable network that performs joint perception, prediction and motion planning for self-driving vehicles and produces interpretable intermediate representations. Unlike existing neural motion planners, our motion planning costs are consistent with our perception and prediction estimates. This is achieved by a novel differentiable semantic occupancy representation that is explicitly used as cost by the motion planning process. Our network is learned end-to-end from human demonstrations. Experiments on a large-scale manual-driving dataset and in closed-loop simulation show that the proposed model significantly outperforms state-of-the-art planners in imitating human behavior while producing much safer trajectories.
Submitted 13 August, 2020;
originally announced August 2020.
-
Initializing Successive Linear Programming Solver for ACOPF using Machine Learning
Authors:
Sayed Abdullah Sadat,
Mostafa Sahraei-Ardakani
Abstract:
Successive linear programming (SLP) is one of the favored approaches for solving large-scale nonlinear optimization problems. Solving an alternating current optimal power flow (ACOPF) problem is no exception, particularly considering the large real-world transmission networks across the country. It is, however, essential to improve the computational performance of the SLP algorithm. One way to achieve this goal is through the efficient initialization of the algorithm with a near-optimal solution. This paper examines various machine learning (ML) algorithms available in the Scikit-Learn library, both linear and nonlinear, to initialize an SLP-ACOPF solver. We evaluate the quality of each of these machine learning algorithms for predicting variables needed for a power flow solution. The solution is then used as an initialization for an SLP-ACOPF algorithm. The approach is tested on congested and non-congested 3-bus systems. The results obtained from the best-performing ML algorithm in this work are compared with the results of a DCOPF solution for the initialization of an SLP-ACOPF solver.
Submitted 17 July, 2020;
originally announced July 2020.
-
Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
Authors:
Abbas Sadat,
Mengye Ren,
Andrei Pokrovsky,
Yen-Chen Lin,
Ersin Yumer,
Raquel Urtasun
Abstract:
The motion planners used in self-driving vehicles need to generate trajectories that are safe, comfortable, and obey the traffic rules. This is usually achieved by two modules: a behavior planner, which handles high-level decisions and produces a coarse trajectory, and a trajectory planner, which generates a smooth, feasible trajectory for the duration of the planning horizon. These planners, however, are typically developed separately, and changes in the behavior planner might affect the trajectory planner in unexpected ways. Furthermore, the final trajectory output by the trajectory planner might differ significantly from the one generated by the behavior planner, as they do not share the same objective. In this paper, we propose a jointly learnable behavior and trajectory planner. Unlike most existing learnable motion planners, which either address only behavior planning or use an uninterpretable neural network to represent the entire logic from sensors to driving commands, our approach features an interpretable cost function on top of perception, prediction and vehicle dynamics, and a joint learning algorithm that learns a shared cost function employed by our behavior and trajectory components. Experiments on real-world self-driving data demonstrate that the jointly learned planner performs significantly better in terms of both similarity to human driving and other safety metrics, compared to baselines that do not adopt joint behavior and trajectory learning.
Submitted 10 October, 2019;
originally announced October 2019.
-
Exact Blur Measure Outperforms Conventional Learned Features for Depth Finding
Authors:
Akbar Saadat
Abstract:
Image analysis methods that are based on exact blur values face computational complexity due to blur measurement error. This has encouraged researchers to turn to handcrafted and learned features for finding depth from a single image. This paper introduces a novel exact realization for blur measures on digital images and implements it on a new measure of defocus Gaussian blur at edge points in Depth From Defocus (DFD) methods, with the potential to change this trend. Experiments on real images indicate the superiority of the proposed measure in error performance over conventional learned features in state-of-the-art single-image-based depth estimation methods.
Submitted 31 August, 2017;
originally announced September 2017.