-
Iterative Active-Inactive Obstacle Classification for Time-Optimal Collision Avoidance
Authors:
Mehmetcan Kaymaz,
Nazim Kemal Ure
Abstract:
Time-optimal obstacle avoidance is a prevalent problem in fields such as robotics and autonomous vehicles, where the task is to determine a path for a moving vehicle to reach its goal while navigating around the obstacles in its environment. The problem becomes increasingly challenging as the number of obstacles rises. We propose an iterative active-inactive obstacle approach, which identifies a subset of the obstacles as "active" and considers only the effect of these active obstacles on the path of the moving vehicle. The remaining obstacles are considered "inactive" and are excluded from the path planning process. Obstacles are classified as "active" on the basis of findings from prior iterations. This approach allows a more efficient calculation of the optimal path by reducing the number of obstacles that need to be considered. The effectiveness of the proposed method is demonstrated with two different dynamic models and varying numbers of obstacles. The results show that the proposed method finds the optimal path in a timely manner while handling a large number of obstacles in the environment and the constraints on the motion of the object.
Submitted 20 March, 2024;
originally announced March 2024.
-
An Integrated Imitation and Reinforcement Learning Methodology for Robust Agile Aircraft Control with Limited Pilot Demonstration Data
Authors:
Gulay Goktas Sever,
Umut Demir,
Abdullah Sadik Satir,
Mustafa Cagatay Sahin,
Nazim Kemal Ure
Abstract:
In this paper, we present a methodology for constructing data-driven maneuver generation models for agile aircraft that can generalize across a wide range of trim conditions and aircraft model parameters. Maneuver generation models play a crucial role in the testing and evaluation of aircraft prototypes, providing insights into the maneuverability and agility of the aircraft. However, constructing the models typically requires extensive amounts of real pilot data, which can be time-consuming and costly to obtain. Moreover, models built with limited data often struggle to generalize beyond the specific flight conditions covered in the original dataset. To address these challenges, we propose a hybrid architecture that leverages a simulation model, referred to as the source model. This open-source agile aircraft simulator shares similar dynamics with the target aircraft and allows us to generate unlimited data for building a proxy maneuver generation model. We then fine-tune this model to the target aircraft using a limited amount of real pilot data. Our approach combines techniques from imitation learning, transfer learning, and reinforcement learning to achieve this objective. To validate our methodology, we utilize real agile pilot data provided by Turkish Aerospace Industries (TAI). By employing the F-16 as the source model, we demonstrate that it is possible to construct a maneuver generation model that generalizes across various trim conditions and aircraft parameters without requiring any additional real pilot data. Our results showcase the effectiveness of our approach in developing robust and adaptable models for agile aircraft.
Submitted 27 December, 2023;
originally announced January 2024.
-
Beyond Traditional DoE: Deep Reinforcement Learning for Optimizing Experiments in Model Identification of Battery Dynamics
Authors:
Gokhan Budan,
Francesca Damiani,
Can Kurtulus,
N. Kemal Ure
Abstract:
Model identification of battery dynamics is a central problem in energy research; many energy management systems and design processes rely on accurate battery models for efficiency optimization. The standard methodology for battery modelling is traditional design of experiments (DoE), where the battery dynamics are excited with many different current profiles and the measured outputs are used to estimate the system dynamics. Although useful models can be obtained with the traditional approach, the process is time-consuming and expensive because many different current-profile configurations must be swept. In the present work, a novel DoE approach is developed based on deep reinforcement learning, which alters the configuration of the experiments on the fly based on the statistics of past experiments. Instead of sticking to a library of predefined current profiles, the proposed approach modifies the current profiles dynamically by updating the output space covered by past measurements, so that only the current profiles that are informative for future experiments are applied. Simulations and real experiments show that the proposed approach gives models that are as accurate as those obtained with traditional DoE while using 85% fewer resources.
Submitted 12 October, 2023;
originally announced October 2023.
-
Reinforcement Learning Based Self-play and State Stacking Techniques for Noisy Air Combat Environment
Authors:
Ahmet Semih Tasbas,
Safa Onur Sahin,
Nazim Kemal Ure
Abstract:
Reinforcement learning (RL) has recently proven itself as a powerful instrument for solving complex problems and has even surpassed human performance in several challenging applications. This suggests that RL algorithms can be used in the autonomous air combat problem, which has been studied for many years. The complexity of air combat arises from aggressive close-range maneuvers and agile enemy behaviors. In addition to these complexities, real-life scenarios may involve uncertainties due to sensor errors, which prevent estimation of the actual position of the enemy. Autonomous aircraft should be successful even in such noisy environments. In this study, we developed an air combat simulation that provides noisy observations to the agents, which makes the air combat problem even more challenging. We present a state stacking method for noisy RL environments as a noise reduction technique. In our extensive set of experiments, the proposed method significantly outperforms the baseline algorithms in terms of the winning ratio, and the performance improvement is even more pronounced at high noise levels. In addition, we incorporate a self-play scheme into our training process by periodically updating the enemy with a frozen copy of the training agent. In this way, the training agent performs air combat simulations against an enemy with smarter strategies, which improves the performance and robustness of the agents. In our simulations, we demonstrate that the self-play scheme provides important performance gains compared to classical RL training.
Submitted 6 March, 2023;
originally announced March 2023.
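Illustrative note on the entry above: the abstract does not spell out how its state stacking method works, so the following is only a minimal sketch of the generic idea, assuming that the agent's input is the concatenation of the last k noisy observations. The class name `StateStacker`, its methods, and the choice to pad with the first observation on reset are all hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Deque, List, Sequence


class StateStacker:
    """Keep the last k observations and expose them as one flat vector.

    Hypothetical helper: a generic state-stacking buffer, not the authors' code.
    """

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        # deque with maxlen drops the oldest frame automatically on append
        self._frames: Deque[List[float]] = deque(maxlen=k)

    def reset(self, first_obs: Sequence[float]) -> List[float]:
        """Start a new episode by filling the buffer with the first observation."""
        self._frames.clear()
        for _ in range(self.k):
            self._frames.append(list(first_obs))
        return self.stacked()

    def step(self, obs: Sequence[float]) -> List[float]:
        """Push the newest observation and return the stacked state."""
        self._frames.append(list(obs))
        return self.stacked()

    def stacked(self) -> List[float]:
        """Concatenate frames from oldest to newest."""
        return [x for frame in self._frames for x in frame]
```

With k = 3 and two-dimensional observations, the policy would receive a six-dimensional input, which gives it several noisy samples of the same underlying state to average over implicitly.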
-
IQ-Flow: Mechanism Design for Inducing Cooperative Behavior to Self-Interested Agents in Sequential Social Dilemmas
Authors:
Bengisu Guresti,
Abdullah Vanlioglu,
Nazim Kemal Ure
Abstract:
Achieving and maintaining cooperation between agents to accomplish a common objective is one of the central goals of Multi-Agent Reinforcement Learning (MARL). Nevertheless in many real-world scenarios, separately trained and specialized agents are deployed into a shared environment, or the environment requires multiple objectives to be achieved by different coexisting parties. These variations among specialties and objectives are likely to cause mixed motives that eventually result in a social dilemma where all the parties are at a loss. In order to resolve this issue, we propose the Incentive Q-Flow (IQ-Flow) algorithm, which modifies the system's reward setup with an incentive regulator agent such that the cooperative policy also corresponds to the self-interested policy for the agents. Unlike the existing methods that learn to incentivize self-interested agents, IQ-Flow does not make any assumptions about agents' policies or learning algorithms, which enables the generalization of the developed framework to a wider array of applications. IQ-Flow performs an offline evaluation of the optimality of the learned policies using the data provided by other agents to determine cooperative and self-interested policies. Next, IQ-Flow uses meta-gradient learning to estimate how policy evaluation changes according to given incentives and modifies the incentive such that the greedy policy for cooperative objective and self-interested objective yield the same actions. We present the operational characteristics of IQ-Flow in Iterated Matrix Games. We demonstrate that IQ-Flow outperforms the state-of-the-art incentive design algorithm in Escape Room and 2-Player Cleanup environments. We further demonstrate that the pretrained IQ-Flow mechanism significantly outperforms the performance of the shared reward setup in the 2-Player Cleanup environment.
Submitted 4 March, 2023; v1 submitted 28 February, 2023;
originally announced February 2023.
-
Scalable Planning and Learning Framework Development for Swarm-to-Swarm Engagement Problems
Authors:
Umut Demir,
A. Sadik Satir,
Gulay Goktas Sever,
Cansu Yikilmaz,
Nazim Kemal Ure
Abstract:
The development of guidance, navigation, and control frameworks and algorithms for swarms has attracted significant attention in recent years. However, planning swarm allocations and trajectories for engaging enemy swarms remains a largely understudied problem. Although small-scale scenarios can be addressed with tools from differential game theory, existing approaches fail to scale to large-scale multi-agent pursuit-evasion (PE) scenarios. In this work, we propose a reinforcement learning (RL) based framework that decomposes large-scale swarm engagement problems into a number of independent multi-agent pursuit-evasion games. We simulate a variety of multi-agent PE scenarios, where finite-time capture is guaranteed under certain conditions. The calculated PE statistics are provided as a reward signal to the high-level allocation layer, which uses an RL algorithm to allocate controlled swarm units to eliminate enemy swarm units with maximum efficiency. We verify our approach in large-scale swarm-to-swarm engagement simulations.
Submitted 6 December, 2022;
originally announced December 2022.
-
Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms
Authors:
Resul Dagdanov,
Halil Durmus,
Nazim Kemal Ure
Abstract:
In this work, we propose a self-improving artificial intelligence system to enhance the safety performance of reinforcement learning (RL)-based autonomous driving (AD) agents using black-box verification methods. RL algorithms have become popular in AD applications in recent years. However, the performance of existing RL algorithms heavily depends on the diversity of training scenarios. A lack of safety-critical scenarios during the training phase could result in poor generalization performance in real-world driving applications. We propose a novel framework in which the weaknesses of the training set are explored through black-box verification methods. After discovering AD failure scenarios, the RL agent's training is re-initiated via transfer learning to improve the performance of previously unsafe scenarios. Simulation results demonstrate that our approach efficiently discovers safety failures of action decisions in RL-based adaptive cruise control (ACC) applications and significantly reduces the number of vehicle collisions through iterative applications of our method. The source code is publicly available at https://github.com/data-and-decision-lab/self-improving-RL.
Submitted 9 July, 2023; v1 submitted 29 October, 2022;
originally announced October 2022.
-
DeFIX: Detecting and Fixing Failure Scenarios with Reinforcement Learning in Imitation Learning Based Autonomous Driving
Authors:
Resul Dagdanov,
Feyza Eksen,
Halil Durmus,
Ferhat Yurdakul,
Nazim Kemal Ure
Abstract:
Safely navigating through an urban environment without violating any traffic rules is a crucial performance target for reliable autonomous driving. In this paper, we present a Reinforcement Learning (RL) based methodology to DEtect and FIX (DeFIX) failures of an Imitation Learning (IL) agent by extracting infraction spots and re-constructing mini-scenarios on these infraction areas to train an RL agent for fixing the shortcomings of the IL approach. DeFIX is a continuous learning framework, where extraction of failure scenarios and training of RL agents are executed in an infinite loop. After each new policy is trained and added to the library of policies, a policy classifier method effectively decides on which policy to activate at each step during the evaluation. It is demonstrated that even with only one RL agent trained on failure scenario of an IL agent, DeFIX method is either competitive or does outperform state-of-the-art IL and RL based autonomous urban driving benchmarks. We trained and validated our approach on the most challenging map (Town05) of CARLA simulator which involves complex, realistic, and adversarial driving scenarios. The source code is publicly available at https://github.com/data-and-decision-lab/DeFIX
Submitted 29 October, 2022;
originally announced October 2022.
-
A Scalable Reinforcement Learning Approach for Attack Allocation in Swarm to Swarm Engagement Problems
Authors:
Umut Demir,
Nazim Kemal Ure
Abstract:
In this work we propose a reinforcement learning (RL) framework that controls the density of a large-scale swarm for engaging with adversarial swarm attacks. Although there is a significant amount of existing work in applying artificial intelligence methods to swarm control, analysis of interactions between two adversarial swarms is a rather understudied area. Most of the existing work in this subject develop strategies by making hard assumptions regarding the strategy and dynamics of the adversarial swarm. Our main contribution is the formulation of the swarm to swarm engagement problem as a Markov Decision Process and development of RL algorithms that can compute engagement strategies without the knowledge of strategy/dynamics of the adversarial swarm. Simulation results show that the developed framework can handle a wide array of large-scale engagement scenarios in an efficient manner.
Submitted 15 October, 2022;
originally announced October 2022.
-
Obstacle Identification and Ellipsoidal Decomposition for Fast Motion Planning in Unknown Dynamic Environments
Authors:
Mehmetcan Kaymaz,
Nazim Kemal Ure
Abstract:
Collision avoidance in the presence of dynamic obstacles in unknown environments is one of the most critical challenges for unmanned systems. In this paper, we present a method that identifies obstacles in terms of ellipsoids to estimate linear and angular obstacle velocities. Our proposed method is based on the idea that any object can be approximately expressed by ellipsoids. To achieve this, we propose a method based on variational Bayesian estimation of a Gaussian mixture model, the Khachiyan algorithm, and a refinement algorithm. Our proposed method does not require knowledge of the number of clusters and can operate in real time, unlike existing optimization-based methods. In addition, we define an ellipsoid-based feature vector to match obstacles between two point frames that are close in time. Our method can be applied to any environment with static and dynamic obstacles, including environments with rotating obstacles. We compare our algorithm with other clustering methods and show that, when coupled with a trajectory planner, the overall system can efficiently traverse unknown environments in the presence of dynamic obstacles.
Submitted 9 July, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
-
GAN-based Intrinsic Exploration For Sample Efficient Reinforcement Learning
Authors:
Doğay Kamar,
Nazım Kemal Üre,
Gözde Ünal
Abstract:
In this study, we address the problem of efficient exploration in reinforcement learning. Most common exploration approaches depend on random action selection, however these approaches do not work well in environments with sparse or no rewards. We propose Generative Adversarial Network-based Intrinsic Reward Module that learns the distribution of the observed states and sends an intrinsic reward that is computed as high for states that are out of distribution, in order to lead agent to unexplored states. We evaluate our approach in Super Mario Bros for a no reward setting and in Montezuma's Revenge for a sparse reward setting and show that our approach is indeed capable of exploring efficiently. We discuss a few weaknesses and conclude by discussing future works.
Submitted 28 June, 2022;
originally announced June 2022.
-
Quality Characteristics of a Software Platform for Human-AI Teaming in Smart Manufacturing
Authors:
Philipp Haindl,
Thomas Hoch,
Javier Dominguez,
Julen Aperribai,
Nazim Kemal Ure,
Mehmet Tunçel
Abstract:
As AI-enabled software systems become more prevalent in smart manufacturing, their role shifts from a reactive to a proactive one that provides context-specific support to machine operators. In the context of an international research project, we develop an AI-based software platform that shall facilitate the collaboration between human operators and manufacturing machines. We conducted 14 structured interviews with stakeholders of the prospective software platform in order to determine the individual relevance of selected quality characteristics for human-AI teaming in smart manufacturing. These characteristics include the ISO 25010:2011 standard for software quality and AI-specific quality characteristics such as trustworthiness, explicability, and auditability. The interviewees rated trustworthiness, functional suitability, reliability, and security as the most important quality characteristics for this context, and portability, compatibility, and maintainability as the least important. Also, we observed agreement regarding the relevance of the quality characteristics among interviewees having the same role. On the other hand, the relevance of each quality characteristics varied depending on the concrete use case of the prospective software platform. The interviewees also were asked about the key success factors related to human-AI teaming in smart manufacturing. They identified improving the production cycle, increasing operator efficiency, reducing scrap, and reducing ergonomic risks as key success criteria. In this paper, we also discuss metrics for measuring the fulfillment of these quality characteristics, which we intend to operationalize and monitor during operation of the prospective software platform.
Submitted 31 May, 2022;
originally announced May 2022.
-
Evaluating Generalization and Transfer Capacity of Multi-Agent Reinforcement Learning Across Variable Number of Agents
Authors:
Bengisu Guresti,
Nazim Kemal Ure
Abstract:
Multi-agent Reinforcement Learning (MARL) problems often require cooperation among agents in order to solve a task. Centralization and decentralization are two approaches used for cooperation in MARL. While fully decentralized methods are prone to converge to suboptimal solutions due to partial observability and nonstationarity, the methods involving centralization suffer from scalability limitations and lazy agent problem. Centralized training decentralized execution paradigm brings out the best of these two approaches; however, centralized training still has an upper limit of scalability not only for acquired coordination performance but also for model size and training time. In this work, we adopt the centralized training with decentralized execution paradigm and investigate the generalization and transfer capacity of the trained models across variable number of agents. This capacity is assessed by training variable number of agents in a specific MARL problem and then performing greedy evaluations with variable number of agents for each training configuration. Thus, we analyze the evaluation performance for each combination of agent count for training versus evaluation. We perform experimental evaluations on predator prey and traffic junction environments and demonstrate that it is possible to obtain similar or higher evaluation performance by training with less agents. We conclude that optimal number of agents to perform training may differ from the target number of agents and argue that transfer across large number of agents can be a more efficient solution to scaling up than directly increasing number of agents during training.
Submitted 28 November, 2021;
originally announced November 2021.
-
Nonlinear Model Based Guidance with Deep Learning Based Target Trajectory Prediction Against Aerial Agile Attack Patterns
Authors:
A. Sadik Satir,
Umut Demir,
Gulay Goktas Sever,
N. Kemal Ure
Abstract:
In this work, we propose a novel missile guidance algorithm that combines deep learning based trajectory prediction with nonlinear model predictive control. Although missile guidance and threat interception is a well-studied problem, existing algorithms' performance degrades significantly when the target is pulling high acceleration attack maneuvers while rapidly changing its direction. We argue that since most threats execute similar attack maneuvers, these nonlinear trajectory patterns can be processed with modern machine learning methods to build high accuracy trajectory prediction algorithms. We train a long short-term memory network (LSTM) based on a class of simulated structured agile attack patterns, then combine this predictor with quadratic programming based nonlinear model predictive control (NMPC). Our method, named nonlinear model based predictive control with target acceleration predictions (NMPC-TAP), significantly outperforms compared approaches in terms of miss distance, for the scenarios where the target/threat is executing agile maneuvers.
Submitted 6 April, 2021;
originally announced April 2021.
-
Investigating Value of Curriculum Reinforcement Learning in Autonomous Driving Under Diverse Road and Weather Conditions
Authors:
Anil Ozturk,
Mustafa Burak Gunel,
Resul Dagdanov,
Mirac Ekim Vural,
Ferhat Yurdakul,
Melih Dal,
Nazim Kemal Ure
Abstract:
Applications of reinforcement learning (RL) are popular in autonomous driving tasks. That being said, tuning the performance of an RL agent and guaranteeing the generalization performance across variety of different driving scenarios is still largely an open problem. In particular, getting good performance on complex road and weather conditions require exhaustive tuning and computation time. Curriculum RL, which focuses on solving simpler automation tasks in order to transfer knowledge to complex tasks, is attracting attention in RL community. The main contribution of this paper is a systematic study for investigating the value of curriculum reinforcement learning in autonomous driving applications. For this purpose, we setup several different driving scenarios in a realistic driving simulator, with varying road complexity and weather conditions. Next, we train and evaluate performance of RL agents on different sequences of task combinations and curricula. Results show that curriculum RL can yield significant gains in complex driving tasks, both in terms of driving performance and sample complexity. Results also demonstrate that different curricula might enable different benefits, which hints future research directions for automated curriculum training.
Submitted 2 August, 2021; v1 submitted 14 March, 2021;
originally announced March 2021.
-
PURSUhInT: In Search of Informative Hint Points Based on Layer Clustering for Knowledge Distillation
Authors:
Reyhan Kevser Keser,
Aydin Ayanzadeh,
Omid Abdollahi Aghdam,
Caglar Kilcioglu,
Behcet Ugur Toreyin,
Nazim Kemal Ure
Abstract:
One of the most efficient methods for model compression is hint distillation, where the student model is injected with information (hints) from several different layers of the teacher model. Although the selection of hint points can drastically alter the compression performance, conventional distillation approaches overlook this fact and use the same hint points as in the early studies. Therefore, we propose a clustering based hint selection methodology, where the layers of teacher model are clustered with respect to several metrics and the cluster centers are used as the hint points. Our method is applicable for any student network, once it is applied on a chosen teacher network. The proposed approach is validated in CIFAR-100 and ImageNet datasets, using various teacher-student pairs and numerous hint distillation methods. Our results show that hint points selected by our algorithm results in superior compression performance compared to state-of-the-art knowledge distillation algorithms on the same student models and datasets.
Submitted 3 November, 2022; v1 submitted 26 February, 2021;
originally announced March 2021.
-
Learning How to Trade-Off Safety with Agility Using Deep Covariance Estimation for Perception Driven UAV Motion Planning
Authors:
Onur Akgun,
Kamil Canberk Atik,
Mustafa Erdem,
Mehmetcan Kaymaz,
Bugrahan Yamak,
N. Kemal Ure
Abstract:
We investigate how to utilize predictive models for selecting appropriate motion planning strategies based on perception uncertainty estimation for agile unmanned aerial vehicle (UAV) navigation tasks. Although there is a variety of motion planning and perception algorithms for such tasks, the impact of perception uncertainty is not explicitly handled in many of the current motion algorithms, which leads to performance loss in real-life scenarios where the measurements are often noisy due to external disturbances. We develop a novel framework for embedding perception uncertainty into high-level motion planning management, in order to select the best available motion planning approach for the currently estimated perception uncertainty. We estimate the uncertainty in visual inputs using a deep neural network (CovNet) that explicitly predicts the covariance of the current measurements. Next, we train a high-level machine learning model for predicting the lowest-cost motion planning algorithm given the current estimate of covariance as well as the UAV states. We demonstrate on both real-life data and drone racing simulations that our approach, named uncertainty driven motion planning switcher (UDS), yields the safest and fastest trajectories among compared alternatives. Furthermore, we show that the developed approach learns how to trade off safety with agility by switching to motion planners that lead to more agile trajectories when the estimated covariance is high and vice versa.
Submitted 11 December, 2020;
originally announced December 2020.
-
Decentralized State-Dependent Markov Chain Synthesis with an Application to Swarm Guidance
Authors:
Samet Uzun,
Nazim Kemal Ure,
Behcet Acikmese
Abstract:
This paper introduces a decentralized state-dependent Markov chain synthesis (DSMC) algorithm for finite-state Markov chains. We present a state-dependent consensus protocol that achieves exponential convergence under mild technical conditions, without relying on any connectivity assumptions regarding the dynamic network topology. Utilizing the proposed consensus protocol, we develop the DSMC algorithm, updating the Markov matrix based on the current state while ensuring the convergence conditions of the consensus protocol. This result establishes the desired steady-state distribution for the resulting Markov chain, ensuring exponential convergence from all initial distributions while adhering to transition constraints and minimizing state transitions. The DSMC's performance is demonstrated through a probabilistic swarm guidance example, which interprets the spatial distribution of a swarm comprising a large number of mobile agents as a probability distribution and utilizes the Markov chain to compute transition probabilities between states. Simulation results demonstrate faster convergence for the DSMC-based algorithm compared to previous Markov-chain-based swarm guidance algorithms.
Submitted 26 April, 2024; v1 submitted 4 December, 2020;
originally announced December 2020.
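To make the probabilistic swarm guidance idea in the abstract above concrete, here is a minimal sketch. It is not the DSMC algorithm: it swaps in the classical Metropolis-Hastings construction to build a fixed Markov matrix whose stationary distribution equals a desired swarm distribution while only using allowed transitions. The function names, the row-stochastic convention, the four-bin line graph, and the target distribution are all assumptions made for illustration. The construction requires a symmetric adjacency and a strictly positive target.

```python
import random


def metropolis_hastings_matrix(target, adjacency):
    """Row-stochastic matrix P (P[i][j] = probability of moving from bin i
    to bin j) that has `target` as its stationary distribution and moves
    agents only along edges allowed by the symmetric `adjacency`."""
    n = len(target)
    degree = [sum(1 for j in range(n) if j != i and adjacency[i][j]) for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i][j]:
                q_ij = 1.0 / degree[i]  # uniform proposal over neighbours of i
                q_ji = 1.0 / degree[j]
                accept = min(1.0, (target[j] * q_ji) / (target[i] * q_ij))
                P[i][j] = q_ij * accept
        # Remaining probability mass stays in the current bin.
        P[i][i] = max(0.0, 1.0 - sum(P[i]))
    return P


def propagate(dist, P):
    """One step of the swarm density: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def step_agent(bin_index, P, rng):
    """Decentralized update: one agent samples its next bin from its own row."""
    return rng.choices(range(len(P)), weights=P[bin_index])[0]


# Four bins on a line; agents may only move to adjacent bins (assumed setup).
target = [0.1, 0.2, 0.3, 0.4]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
P = metropolis_hastings_matrix(target, adjacency)

# Start with the whole swarm in bin 0 and propagate the density.
dist = [1.0, 0.0, 0.0, 0.0]
for _ in range(500):
    dist = propagate(dist, P)
```

Each agent needs only its current bin and the corresponding row of `P`, so agents can call `step_agent` independently of one another. The matrix here is fixed, whereas the abstract describes a matrix that is updated based on the current state.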
-
A Probabilistic Guidance Approach to Swarm-to-Swarm Engagement Problem
Authors:
Samet Uzun,
Nazim Kemal Ure
Abstract:
This paper introduces a probabilistic guidance approach for the swarm-to-swarm engagement problem. The idea is based on driving the controlled swarm towards an adversary swarm, where the adversary swarm aims to converge to a stationary distribution that corresponds to a defended base location. The probabilistic approach is based on designing a Markov chain for the distribution of the swarm to converge to a stationary distribution. This approach is decentralized, so each agent can propagate its position independently of other agents. Our main contribution is the formulation of the swarm-to-swarm engagement as an optimization problem in which the population of each swarm decays with each engagement, together with the determination of a desired time-varying distribution for the controlled swarm to converge to, so that it eliminates agents of the adversary swarm until the adversary swarm enters the defended base location. We demonstrate the validity of the proposed approach on several swarm engagement scenarios.
Submitted 28 November, 2020;
originally announced December 2020.
-
A New Approach for Tactical Decision Making in Lane Changing: Sample Efficient Deep Q Learning with a Safety Feedback Reward
Authors:
M. Ugur Yavas,
N. Kemal Ure,
Tufan Kumbasar
Abstract:
Automated lane change is one of the most challenging tasks to be solved for highly automated vehicles due to its safety-critical, uncertain and multi-agent nature. This paper presents the novel deployment of a state-of-the-art Q-learning method, namely Rainbow DQN, that uses a new safety-driven rewarding scheme to tackle the issues in a dynamic and uncertain simulation environment. We present various comparative results to show that our novel approach of having reward feedback from the safety layer dramatically increases both the agent's performance and sample efficiency. Furthermore, through the novel deployment of Rainbow DQN, it is shown that more intuition about the agent's actions is extracted by examining the distributions of generated Q values of the agents. The proposed algorithm shows superior performance to the baseline algorithm in the challenging scenarios with only 200,000 training steps (i.e., equivalent to 55 hours of driving).
Submitted 24 September, 2020;
originally announced September 2020.
-
Sample Efficient Interactive End-to-End Deep Learning for Self-Driving Cars with Selective Multi-Class Safe Dataset Aggregation
Authors:
Yunus Bicer,
Ali Alizadeh,
Nazim Kemal Ure,
Ahmetcan Erdogan,
Orkun Kizilirmak
Abstract:
The objective of this paper is to develop a sample-efficient end-to-end deep learning method for self-driving cars, where we attempt to increase the value of the information extracted from samples, through careful analysis obtained from each call to the expert driver's policy. End-to-end imitation learning is a popular method for computing self-driving car policies. The standard approach relies on collecting pairs of inputs (camera images) and outputs (steering angle, etc.) from an expert policy and fitting a deep neural network to this data to learn the driving policy. Although this approach had some successful demonstrations in the past, learning a good policy might require a lot of samples from the expert driver, which might be resource-consuming. In this work, we develop a novel framework based on the Safe Dataset Aggregation (Safe DAgger) approach, where the current learned policy is automatically segmented into different trajectory classes, and the algorithm identifies trajectory segments or classes with weak performance at each step. Once the trajectory segments with weak performance are identified, the sampling algorithm focuses on calling the expert policy only on these segments, which improves the convergence rate. The presented simulation results show that the proposed approach can yield significantly better performance compared to the standard Safe DAgger algorithm while using the same amount of samples from the expert.
Submitted 29 July, 2020;
originally announced July 2020.
-
Development of A Stochastic Traffic Environment with Generative Time-Series Models for Improving Generalization Capabilities of Autonomous Driving Agents
Authors:
Anil Ozturk,
Mustafa Burak Gunel,
Melih Dal,
Ugur Yavas,
Nazim Kemal Ure
Abstract:
Automated lane changing is a critical feature for advanced autonomous driving systems. In recent years, reinforcement learning (RL) algorithms trained on traffic simulators yielded successful results in computing lane changing policies that strike a balance between safety, agility and compensating for traffic uncertainty. However, many RL algorithms exhibit simulator bias and policies trained on simple simulators do not generalize well to realistic traffic scenarios. In this work, we develop a data-driven traffic simulator by training a generative adversarial network (GAN) on real-life trajectory data. The simulator generates randomized trajectories that resemble real-life traffic interactions between vehicles, which enables training the RL agent on much richer and more realistic scenarios. We demonstrate through simulations that RL agents trained on the GAN-based traffic simulator have stronger generalization capabilities compared to RL agents trained on simple rule-driven simulators.
Submitted 10 June, 2020;
originally announced June 2020.
-
Automated Lane Change Decision Making using Deep Reinforcement Learning in Dynamic and Uncertain Highway Environment
Authors:
Ali Alizadeh,
Majid Moghadam,
Yunus Bicer,
Nazim Kemal Ure,
Ugur Yavas,
Can Kurtulus
Abstract:
Autonomous lane changing is a critical feature for advanced autonomous driving systems that involves several challenges, such as uncertainty in other drivers' behaviors and the trade-off between safety and agility. In this work, we develop a novel simulation environment that emulates these challenges and train a deep reinforcement learning agent that yields consistent performance in a variety of dynamic and uncertain traffic scenarios. Results show that the proposed data-driven approach performs significantly better in noisy environments compared to methods that rely solely on heuristics.
Submitted 17 September, 2019;
originally announced September 2019.