-
Applying Neural Monte Carlo Tree Search to Unsignalized Multi-intersection Scheduling for Autonomous Vehicles
Authors:
Yucheng Shi,
Wenlong Wang,
Xiaowen Tao,
Ivana Dusparic,
Vinny Cahill
Abstract:
Dynamic scheduling of access to shared resources by autonomous systems is a challenging problem, characterized as being NP-hard. The complexity of this task leads to a combinatorial explosion of possibilities in highly dynamic systems where arriving requests must be continuously scheduled subject to strong safety and time constraints. An example of such a system is an unsignalized intersection, where automated vehicles' access to potential conflict zones must be dynamically scheduled. In this paper, we apply Neural Monte Carlo Tree Search (NMCTS) to the challenging task of scheduling platoons of vehicles crossing unsignalized intersections. Crucially, we introduce a transformation model that maps successive sequences of potentially conflicting road-space reservation requests from platoons of vehicles into a series of board-game-like problems and use NMCTS to search for solutions representing optimal road-space allocation schedules in the context of past allocations. To optimize search, we incorporate a prioritized re-sampling method with parallel NMCTS (PNMCTS) to improve the quality of training data. To optimize training, a curriculum learning strategy is used to train the agent to schedule progressively more complex boards culminating in overlapping boards that represent busy intersections. In a busy single four-way unsignalized intersection simulation, PNMCTS solved 95% of unseen scenarios, reducing crossing time by 43% in light and 52% in heavy traffic versus first-in, first-out control. In a 3x3 multi-intersection network, the proposed method maintained free-flow in light traffic when all intersections are under control of PNMCTS and outperformed state-of-the-art RL-based traffic-light controllers in average travel time by 74.5% and total throughput by 16% in heavy traffic.
Submitted 24 October, 2024;
originally announced October 2024.
-
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
Authors:
Wenlong Wang,
Ivana Dusparic,
Yucheng Shi,
Ke Zhang,
Vinny Cahill
Abstract:
Model-based reinforcement learning (RL) offers a solution to the data inefficiency that plagues most model-free RL algorithms. However, learning a robust world model often requires complex and deep architectures, which are computationally expensive and challenging to train. Within the world model, sequence models play a critical role in accurate predictions, and various architectures have been explored, each with its own challenges. Currently, recurrent neural network (RNN)-based world models struggle with vanishing gradients and capturing long-term dependencies. Transformers, on the other hand, suffer from the quadratic memory and computational complexity of self-attention mechanisms, scaling as $O(n^2)$, where $n$ is the sequence length.
To address these challenges, we propose a state space model (SSM)-based world model, Drama, specifically leveraging Mamba, that achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies and enabling efficient training with longer sequences. We also introduce a novel sampling method to mitigate the suboptimality caused by an incorrect world model in the early training stages. Combining these techniques, Drama achieves a normalised score on the Atari100k benchmark that is competitive with other state-of-the-art (SOTA) model-based RL algorithms, using only a 7 million-parameter world model. Drama is accessible and trainable on off-the-shelf hardware, such as a standard laptop. Our code is available at https://github.com/realwenlongwang/Drama.git.
Submitted 16 May, 2025; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Semifactual Explanations for Reinforcement Learning
Authors:
Jasmina Gajcin,
Jovan Jeromela,
Ivana Dusparic
Abstract:
Reinforcement Learning (RL) is a learning paradigm in which the agent learns from its environment through trial and error. Deep reinforcement learning (DRL) algorithms represent the agent's policies using neural networks, making their decisions difficult to interpret. Explaining the behaviour of DRL agents is necessary to advance user trust, increase engagement, and facilitate integration with real-life tasks. Semifactual explanations aim to explain an outcome by providing "even if" scenarios, such as "even if the car were moving twice as slowly, it would still have to swerve to avoid crashing". Semifactuals help users understand the effects of different factors on the outcome and support the optimisation of resources. While extensively studied in psychology and even utilised in supervised learning, semifactuals have not been used to explain the decisions of RL systems. In this work, we develop a first approach to generating semifactual explanations for RL agents. We start by defining five properties of desirable semifactual explanations in RL and then introduce SGRL-Rewind and SGRL-Advance, the first algorithms for generating semifactual explanations in RL. We evaluate the algorithms in two standard RL environments and find that they generate semifactuals that are easier to reach, represent the agent's policy better, and are more diverse compared to baselines. Lastly, we conduct and analyse a user study to assess participants' perception of semifactual explanations of the agent's actions.
Submitted 9 September, 2024;
originally announced September 2024.
-
Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems
Authors:
Juan C. Rosero,
Ivana Dusparic,
Nicolás Cardozo
Abstract:
Reinforcement Learning (RL) is used extensively in Autonomous Systems (AS) as it enables learning at runtime without the need for a model of the environment or predefined actions. However, most applications of RL in AS, such as those based on Q-learning, can only optimize one objective, making it necessary in multi-objective systems to combine multiple objectives in a single objective function with predefined weights. A number of Multi-Objective Reinforcement Learning (MORL) techniques exist, but they have mostly been applied in RL benchmarks rather than real-world AS. In this work, we use a MORL technique called Deep W-Learning (DWN) and apply it to the Emergent Web Servers exemplar, a self-adaptive server, to find the optimal configuration for runtime performance optimization. We compare DWN to two single-objective optimization implementations: an ε-greedy algorithm and Deep Q-Networks (DQN). Our initial evaluation shows that DWN optimizes multiple objectives simultaneously with results similar to those of the DQN and ε-greedy approaches, performing better on some metrics, and avoids issues associated with combining multiple objectives into a single utility function.
Submitted 30 September, 2024; v1 submitted 2 August, 2024;
originally announced August 2024.
-
ACTER: Diverse and Actionable Counterfactual Sequences for Explaining and Diagnosing RL Policies
Authors:
Jasmina Gajcin,
Ivana Dusparic
Abstract:
Understanding how failure occurs and how it can be prevented in reinforcement learning (RL) is necessary to enable debugging, maintain user trust, and develop personalized policies. Counterfactual reasoning has often been used to assign blame and understand failure by searching for the closest possible world in which the failure is avoided. However, current counterfactual state explanations in RL can only explain an outcome using just the current state features and offer no actionable recourse on how a negative outcome could have been prevented. In this work, we propose ACTER (Actionable Counterfactual Sequences for Explaining Reinforcement Learning Outcomes), an algorithm for generating counterfactual sequences that provides actionable advice on how failure can be avoided. ACTER investigates actions leading to a failure and uses the evolutionary algorithm NSGA-II to generate counterfactual sequences of actions that prevent it with minimal changes and high certainty even in stochastic environments. Additionally, ACTER generates a set of multiple diverse counterfactual sequences that enable users to correct failure in the way that best fits their preferences. We also introduce three diversity metrics that can be used for evaluating the diversity of counterfactual sequences. We evaluate ACTER in two RL environments, with both discrete and continuous actions, and show that it can generate actionable and diverse counterfactual sequences. We conduct a user study to explore how explanations generated by ACTER help users identify and correct failure.
Submitted 9 February, 2024;
originally announced February 2024.
-
Learning Recovery Strategies for Dynamic Self-healing in Reactive Systems
Authors:
Mateo Sanabria,
Ivana Dusparic,
Nicolas Cardozo
Abstract:
Self-healing systems depend on following a set of predefined instructions to recover from a known failure state. Failure states are generally detected based on domain-specific specialized metrics. Failure fixes are applied at predefined application hooks that are not sufficiently expressive to manage different failure types. Self-healing is usually applied in the context of distributed systems, where the detection of failures is constrained to communication problems, and resolution strategies often consist of replacing complete components. Our proposal targets complex reactive systems, defining monitors as predicates specifying satisfiability conditions of system properties. Such monitors are functionally expressive and can be defined at run time to detect failure states at any execution point. Once failure states are detected, we use a Reinforcement Learning-based technique to learn a recovery strategy based on users' corrective sequences. Finally, to execute the learned strategies, we extract them as COP variations that activate dynamically whenever the failure state is detected, overwriting the base system behavior with the recovery strategy for that state. We validate the feasibility and effectiveness of our framework through a prototypical reactive application for tracking mouse movements, and the DeltaIoT exemplar for self-healing systems. Our results demonstrate that with just the definition of monitors, the system is effective in detecting and recovering from failures in 55% to 92% of cases in the first application, and on par with the predefined strategies in the second application.
Submitted 22 January, 2024;
originally announced January 2024.
-
Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification
Authors:
Jasmina Gajcin,
James McCarthy,
Rahul Nair,
Radu Marinescu,
Elizabeth Daly,
Ivana Dusparic
Abstract:
A well-defined reward function is crucial for successful training of a reinforcement learning (RL) agent. However, defining a suitable reward function is a notoriously challenging task, especially in complex, multi-objective environments. Developers often have to resort to starting with an initial, potentially misspecified reward function and iteratively adjusting its parameters based on the observed learned behavior. In this work, we aim to automate this process by proposing ITERS, an iterative reward shaping approach using human feedback for mitigating the effects of a misspecified reward function. Our approach allows the user to provide trajectory-level feedback on the agent's behavior during training, which can be integrated as a reward shaping signal in the following training iteration. We also allow the user to provide explanations of their feedback, which are used to augment the feedback and reduce user effort and feedback frequency. We evaluate ITERS in three environments and show that it can successfully correct misspecified reward functions.
Submitted 30 August, 2023;
originally announced August 2023.
-
Density-Aware Reinforcement Learning to Optimise Energy Efficiency in UAV-Assisted Networks
Authors:
Babatunji Omoniwa,
Boris Galkin,
Ivana Dusparic
Abstract:
Unmanned aerial vehicles (UAVs) serving as aerial base stations can be deployed to provide wireless connectivity to mobile users, such as vehicles. However, the density of vehicles on roads often varies spatially and temporally, primarily due to mobility and traffic situations in a geographical area, making it difficult to provide ubiquitous service. Moreover, as energy-constrained UAVs hover in the sky while serving mobile users, they may be faced with interference from nearby UAV cells or other access points sharing the same frequency band, thereby impacting the system's energy efficiency (EE). Recent multi-agent reinforcement learning (MARL) approaches applied to optimise user coverage worked well with reasonably even densities but might not perform as well with uneven user distributions, i.e., in urban road networks with an uneven concentration of vehicles. In this work, we propose a density-aware communication-enabled multi-agent decentralised double deep Q-network (DACEMAD-DDQN) approach that maximises the total system's EE by jointly optimising the trajectory of each UAV, the number of connected users, and the UAVs' energy consumption while keeping track of dense and uneven user distributions. Our approach outperforms state-of-the-art MARL approaches in terms of EE by as much as 65% to 85%.
Submitted 14 June, 2023;
originally announced June 2023.
-
Prevalence of Code Smells in Reinforcement Learning Projects
Authors:
Nicolás Cardozo,
Ivana Dusparic,
Christian Cabrera
Abstract:
Reinforcement Learning (RL) is being increasingly used to learn and adapt application behavior in many domains, including large-scale and safety-critical systems such as autonomous driving. With the advent of plug-n-play RL libraries, its applicability has further increased, enabling integration of RL algorithms by users. We note, however, that the majority of such code is not developed by RL engineers, which may consequently lead to poor program quality, yielding bugs, suboptimal performance, and maintainability and evolution problems for RL-based projects. In this paper we begin the exploration of this hypothesis, specific to code utilizing RL, analyzing different projects found in the wild to assess their quality from a software engineering perspective. Our study includes 24 popular RL-based Python projects, analyzed with standard software engineering metrics. Our results, aligned with similar analyses for ML code in general, show that popular and widely reused RL repositories contain many code smells (3.95% of the code base on average), significantly affecting the projects' maintainability. The most common code smells detected are long method and long method chain, highlighting problems in the definition and interaction of agents. Detected code smells suggest problems in responsibility separation and in the appropriateness of current abstractions for the definition of RL algorithms.
Submitted 3 August, 2023; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Reservation of Virtualized Resources with Optimistic Online Learning
Authors:
Jean-Baptiste Monteil,
George Iosifidis,
Ivana Dusparic
Abstract:
The virtualization of wireless networks enables new services to access network resources made available by the Network Operator (NO) through a Network Slicing market. The different service providers (SPs) have the opportunity to lease the network resources from the NO to constitute slices that address the demand of their specific network service. The goal of any SP is to maximize its service utility and minimize costs from leasing resources while facing uncertainties in the prices of the resources and the users' demand. In this paper, we propose a solution that allows the SP to decide its online reservation policy, which aims to maximize its service utility and minimize its cost of reservation simultaneously. We design the Optimistic Online Learning for Reservation (OOLR) solution, a decision algorithm built upon the Follow-the-Regularized Leader (FTRL), that incorporates key predictions to assist the decision-making process. Our solution achieves an $\mathcal{O}(\sqrt{T})$ regret bound, where $T$ represents the horizon. We integrate a prediction model into the OOLR solution and demonstrate through numerical results the efficacy of the combined models' solution against the FTRL baseline.
Submitted 15 March, 2023;
originally announced March 2023.
-
RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning
Authors:
Jasmina Gajcin,
Ivana Dusparic
Abstract:
While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve the desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly probable desired outcomes. We use a heuristic tree search of the agent's execution trajectories to find the most suitable counterfactuals based on the defined properties. We evaluate RACCER in two tasks as well as conduct a user study to show that RL-specific counterfactuals help users better understand agents' behavior compared to the current state-of-the-art approaches.
Submitted 10 October, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning
Authors:
Alberto Castagna,
Ivana Dusparic
Abstract:
Transfer learning in Reinforcement Learning (RL) has been widely studied to overcome training issues of Deep-RL, i.e., exploration cost, data availability and convergence time, by introducing a way to enhance the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for a novice agent, a good understanding of the task by the expert agent is required for such transfer to be effective. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems. No dedicated expert exists, and the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interaction. We demonstrate the effectiveness of EF-OnTL against a no-transfer scenario and advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall comparable performance to advice-based baselines while requiring neither external input nor threshold tuning. EF-OnTL outperforms no-transfer with an improvement related to the complexity of the task addressed.
Submitted 28 July, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
Causal Counterfactuals for Improving the Robustness of Reinforcement Learning
Authors:
Tom He,
Jasmina Gajcin,
Ivana Dusparic
Abstract:
Reinforcement learning (RL) is used in various robotic applications. RL enables agents to learn tasks autonomously by interacting with the environment. The more critical the tasks are, the higher the demand for the robustness of the RL systems. Causal RL combines RL and causal inference to make RL more robust. Causal RL agents use a causal representation to capture the invariant causal mechanisms that can be transferred from one task to another. Currently, there is limited research in Causal RL, and existing solutions are usually not complete or feasible for real-world applications. In this work, we propose CausalCF, the first complete Causal RL solution incorporating ideas from Causal Curiosity and CoPhy. Causal Curiosity provides an approach for using interventions, and CoPhy is modified to enable the RL agent to perform counterfactuals. Causal Curiosity has been applied to robotic grasping and manipulation tasks in CausalWorld. CausalWorld provides a realistic simulation environment based on the TriFinger robot. We apply CausalCF to complex robotic tasks and show that it improves the RL agent's robustness using CausalWorld.
Submitted 5 June, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Deep W-Networks: Solving Multi-Objective Optimisation Problems With Deep Reinforcement Learning
Authors:
Jernej Hribar,
Luke Hackett,
Ivana Dusparic
Abstract:
In this paper, we build on advances introduced by the Deep Q-Networks (DQN) approach to extend the multi-objective tabular Reinforcement Learning (RL) algorithm W-learning to large state spaces. The W-learning algorithm can naturally solve the competition between multiple single policies in multi-objective environments. However, the tabular version does not scale well to environments with large state spaces. To address this issue, we replace the underlying Q-tables with DQN and propose the addition of W-Networks as a replacement for tabular weight (W) representations. We evaluate the resulting Deep W-Networks (DWN) approach in two widely accepted multi-objective RL benchmarks: deep sea treasure and multi-objective mountain car. We show that DWN solves the competition between multiple policies while outperforming the baseline in the form of a DQN solution. Additionally, we demonstrate that the proposed algorithm can find the Pareto front in both tested environments.
Submitted 23 February, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities
Authors:
Jasmina Gajcin,
Ivana Dusparic
Abstract:
While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the majority of explanation methods for AI are focused on developers and expert users. Counterfactual explanations are local explanations that offer users advice on what can be changed in the input for the output of the black-box model to change. Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system. While counterfactuals have been extensively researched in supervised learning, there are few methods applying them to reinforcement learning (RL). In this work, we explore the reasons for the underrepresentation of a powerful explanation method in RL. We start by reviewing the current work in counterfactual explanations in supervised learning. Additionally, we explore the differences between counterfactual explanations in supervised learning and RL and identify the main challenges that prevent the adoption of methods from supervised learning in RL. Finally, we redefine counterfactuals for RL and propose research directions for implementing counterfactuals in RL.
Submitted 9 February, 2024; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Communication-Enabled Deep Reinforcement Learning to Optimise Energy-Efficiency in UAV-Assisted Networks
Authors:
Babatunji Omoniwa,
Boris Galkin,
Ivana Dusparic
Abstract:
Unmanned aerial vehicles (UAVs) are increasingly deployed to provide wireless connectivity to static and mobile ground users in situations of increased network demand or points of failure in existing terrestrial cellular infrastructure. However, UAVs are energy-constrained and experience the challenge of interference from nearby UAV cells sharing the same frequency spectrum, thereby impacting the system's energy efficiency (EE). Recent approaches focus on optimising the system's EE by optimising the trajectory of UAVs serving only static ground users and neglecting mobile users. Several others neglect the impact of interference from nearby UAV cells, assuming an interference-free network environment. Despite growing research interest in decentralised over centralised UAV control, direct collaboration among UAVs to improve coordination while optimising the system's EE has not been adequately explored. To address this, we propose a direct collaborative communication-enabled multi-agent decentralised double deep Q-network (CMAD-DDQN) approach. The CMAD-DDQN is a collaborative algorithm that allows UAVs to explicitly share their telemetry via existing 3GPP guidelines by communicating with their nearest neighbours. This allows the agent-controlled UAVs to optimise their 3D flight trajectories by filling up knowledge gaps and converging to optimal policies. Simulation results show that the proposed approach outperforms existing baselines in terms of maximising the system's EE without degrading coverage performance in the network. The CMAD-DDQN approach outperforms the MAD-DDQN that neglects direct collaboration among UAVs, the multi-agent deep deterministic policy gradient (MADDPG) and random policy approaches that consider a 2D UAV deployment design while neglecting interference from nearby UAV cells by about 15%, 65% and 85%, respectively.
Submitted 27 June, 2023; v1 submitted 30 September, 2022;
originally announced October 2022.
-
Boolean Decision Rules for Reinforcement Learning Policy Summarisation
Authors:
James McCarthy,
Rahul Nair,
Elizabeth Daly,
Radu Marinescu,
Ivana Dusparic
Abstract:
Explainability of Reinforcement Learning (RL) policies remains a challenging research problem, particularly when considering RL in a safety context. Understanding the decisions and intentions of an RL policy offers avenues to incorporate safety into the policy by limiting undesirable actions. We propose the use of a Boolean Decision Rules model to create a post-hoc rule-based summary of an agent's policy. We evaluate our proposed approach using a DQN agent trained on an implementation of a lava gridworld and show that, given a hand-crafted feature representation of this gridworld, simple generalised rules can be created, giving a post-hoc explainable summary of the agent's policy. We discuss possible avenues to introduce safety into an RL agent's policy by using rules generated by this rule-based model as constraints imposed on the agent's policy, and how creating simple rule summaries of an agent's policy may help in debugging RL agents.
Submitted 18 July, 2022;
originally announced July 2022.
-
Guidelines for Artifacts to Support Industry-Relevant Research on Self-Adaptation
Authors:
Danny Weyns,
Ilias Gerostathopoulos,
Barbora Buhnova,
Nicolas Cardozo,
Emilia Cioroaica,
Ivana Dusparic,
Lars Grunske,
Pooyan Jamshidi,
Christine Julien,
Judith Michael,
Gabriel Moreno,
Shiva Nejati,
Patrizio Pelliccione,
Federico Quin,
Genaina Rodrigues,
Bradley Schmerl,
Marco Vieira,
Thomas Vogel,
Rebekka Wohlrab
Abstract:
Artifacts support evaluating new research results and help compare them with the state of the art in a field of interest. Over the past years, several artifacts have been introduced to support research in the field of self-adaptive systems. While these artifacts have shown their value, it is not clear to what extent they support research on problems in self-adaptation that are relevant to industry. This paper provides a set of guidelines for artifacts that aim at supporting industry-relevant research on self-adaptation. The guidelines, which are grounded in data obtained from a survey with practitioners, were derived during working sessions at the 17th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. Artifact providers can use the guidelines for aligning future artifacts with industry needs; they can also be used to evaluate the industrial relevance of existing artifacts. We also propose an artifact template.
Submitted 24 June, 2022;
originally announced June 2022.
-
FedSA: Accelerating Intrusion Detection in Collaborative Environments with Federated Simulated Annealing
Authors:
Helio N. Cunha Neto,
Ivana Dusparic,
Diogo M. F. Mattos,
Natalia C. Fernandes
Abstract:
Fast identification of new network attack patterns is crucial for improving network security. Nevertheless, identifying an ongoing attack in a heterogeneous network is a non-trivial task. Federated learning emerges as a solution to collaborative training for an Intrusion Detection System (IDS). The federated learning-based IDS trains a global model using local machine learning models provided by federated participants without sharing local data. However, optimization challenges are intrinsic to federated learning. This paper proposes the Federated Simulated Annealing (FedSA) metaheuristic to select the hyperparameters and a subset of participants for each aggregation round in federated learning. FedSA optimizes hyperparameters linked to the global model convergence. The proposal reduces aggregation rounds and speeds up convergence. Thus, FedSA accelerates learning extraction from local models, requiring fewer IDS updates. The assessment of the proposal shows that the FedSA global model converges in fewer than ten communication rounds. The proposal requires up to 50% fewer aggregation rounds than the conventional aggregation approach to achieve approximately 97% accuracy in attack detection.
Submitted 23 May, 2022;
originally announced May 2022.
-
Optimising Energy Efficiency in UAV-Assisted Networks using Deep Reinforcement Learning
Authors:
Babatunji Omoniwa,
Boris Galkin,
Ivana Dusparic
Abstract:
In this letter, we study the energy efficiency (EE) optimisation of unmanned aerial vehicles (UAVs) providing wireless coverage to static and mobile ground users. Recent multi-agent reinforcement learning approaches optimise the system's EE using a 2D trajectory design, neglecting interference from nearby UAV cells. We aim to maximise the system's EE by jointly optimising each UAV's 3D trajectory, number of connected users, and the energy consumed, while accounting for interference. Thus, we propose a cooperative Multi-Agent Decentralised Double Deep Q-Network (MAD-DDQN) approach. Our approach outperforms existing baselines in terms of EE by as much as 55-80%.
Submitted 4 April, 2022;
originally announced April 2022.
-
ReCCoVER: Detecting Causal Confusion for Explainable Reinforcement Learning
Authors:
Jasmina Gajcin,
Ivana Dusparic
Abstract:
Despite notable results in various fields in recent years, deep reinforcement learning (DRL) algorithms lack transparency, affecting user trust and hindering their deployment to high-risk tasks. Causal confusion refers to a phenomenon where an agent learns spurious correlations between features which might not hold across the entire state space, preventing safe deployment to real tasks where such correlations might be broken. In this work, we examine whether an agent relies on spurious correlations in critical states, and propose an alternative subset of features on which it should base its decisions instead, to make it less susceptible to causal confusion. Our goal is to increase the transparency of DRL agents by exposing the influence of learned spurious correlations on their decisions, and offering advice to developers about feature selection in different parts of the state space, to avoid causal confusion. We propose ReCCoVER, an algorithm which detects causal confusion in an agent's reasoning before deployment, by executing its policy in alternative environments where certain correlations between features do not hold. We demonstrate our approach in taxi and grid world environments, where ReCCoVER detects states in which an agent relies on spurious correlations and offers a set of features that should be considered instead.
Submitted 21 March, 2022;
originally announced March 2022.
-
Enabling Deep Reinforcement Learning on Energy Constrained Devices at the Edge of the Network
Authors:
Jernej Hribar,
Ivana Dusparic
Abstract:
Deep Reinforcement Learning (DRL) solutions are becoming pervasive at the edge of the network as they enable autonomous decision-making in a dynamic environment. However, to be able to adapt to the ever-changing environment, the DRL solution implemented on an embedded device has to continue to occasionally take exploratory actions even after initial convergence. In other words, the device has to occasionally take random actions and update the value function, i.e., re-train the Artificial Neural Network (ANN), to ensure its performance remains optimal. Unfortunately, embedded devices often lack the processing power and energy required to train the ANN. The energy aspect is particularly challenging when the edge device is powered only by means of Energy Harvesting (EH). To overcome this problem, we propose a two-part algorithm in which the DRL process is trained at the sink. Then the weights of the fully trained underlying ANN are periodically transferred to the EH-powered embedded device taking actions. Using an EH-powered sensor and a real-world measurement dataset, and optimizing for the Age of Information (AoI) metric, we demonstrate that such a DRL solution can operate without any degradation in performance, with only a few ANN updates per day.
Submitted 18 January, 2022;
originally announced January 2022.
-
Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents
Authors:
Jasmina Gajcin,
Rahul Nair,
Tejaswini Pedapati,
Radu Marinescu,
Elizabeth Daly,
Ivana Dusparic
Abstract:
In complex tasks where the reward function is not straightforward and consists of a set of objectives, multiple reinforcement learning (RL) policies that perform the task adequately but employ different strategies can be trained by adjusting the impact of individual objectives on the reward function. Understanding the differences in strategies between policies is necessary to enable users to choose between offered policies, and can help developers understand different behaviors that emerge from various reward functions and training hyperparameters in RL systems. In this work we compare the behavior of two policies trained on the same task, but with different preferences in objectives. We propose a method for distinguishing differences in behavior that stem from different abilities from those that are a consequence of opposing preferences of two RL agents. Furthermore, we use only data on preference-based differences in order to generate contrasting explanations about agents' preferences. Finally, we test and evaluate our approach on an autonomous driving task and compare the behavior of a safety-oriented policy and one that prefers speed.
Submitted 17 December, 2021;
originally announced December 2021.
-
Multi-Agent Transfer Learning in Reinforcement Learning-Based Ride-Sharing Systems
Authors:
Alberto Castagna,
Ivana Dusparic
Abstract:
Reinforcement learning (RL) has been used in a range of simulated real-world tasks, e.g., sensor coordination, traffic light control, and on-demand mobility services. However, real-world deployments are rare, as RL struggles with the dynamic nature of real-world environments, requiring time for learning a task and adapting to changes in the environment. Transfer Learning (TL) can help lower these adaptation times. In particular, there is significant potential for applying TL in multi-agent RL systems, where multiple agents can share knowledge with each other, as well as with new agents that join the system. To obtain the most from inter-agent transfer, transfer roles (i.e., determining which agents act as sources and which as targets), as well as relevant transfer content parameters (e.g., transfer size), should be selected dynamically in each particular situation. As a first step towards fully dynamic transfers, in this paper we investigate the impact of TL transfer parameters with fixed source and target roles. Specifically, we label every agent-environment interaction with the agent's epistemic confidence, and we filter the shared examples using varying threshold levels and sample sizes. We investigate the impact of these parameters in two scenarios, a standard predator-prey RL benchmark and a simulation of a ride-sharing system with 200 vehicle agents and 10,000 ride-requests.
Submitted 1 December, 2021;
originally announced December 2021.
-
Multi-Agent Deep Reinforcement Learning For Optimising Energy Efficiency of Fixed-Wing UAV Cellular Access Points
Authors:
Boris Galkin,
Babatunji Omoniwa,
Ivana Dusparic
Abstract:
Unmanned Aerial Vehicles (UAVs) promise to become an intrinsic part of next generation communications, as they can be deployed to provide wireless connectivity to ground users to supplement existing terrestrial networks. The majority of the existing research into the use of UAV access points for cellular coverage considers rotary-wing UAV designs (i.e. quadcopters). However, we expect fixed-wing UAVs to be more appropriate for connectivity purposes in scenarios where long flight times are necessary (such as for rural coverage), as fixed-wing UAVs rely on a more energy-efficient form of flight when compared to the rotary-wing design. As fixed-wing UAVs are typically incapable of hovering in place, their deployment optimisation involves optimising their individual flight trajectories in a way that allows them to deliver high quality service to the ground users in an energy-efficient manner. In this paper, we propose a multi-agent deep reinforcement learning approach to optimise the energy efficiency of fixed-wing UAV cellular access points while still allowing them to deliver high-quality service to users on the ground. In our decentralized approach, each UAV is equipped with a Dueling Deep Q-Network (DDQN) agent which can adjust the 3D trajectory of the UAV over a series of timesteps. By coordinating with their neighbours, the UAVs adjust their individual flight trajectories in a manner that optimises the total system energy efficiency. We benchmark the performance of our approach against a series of heuristic trajectory planning strategies, and demonstrate that our method can improve the system energy efficiency by as much as 70%.
Submitted 3 November, 2021;
originally announced November 2021.
-
Analyse or Transmit: Utilising Correlation at the Edge with Deep Reinforcement Learning
Authors:
Jernej Hribar,
Ryoichi Shinkuma,
George Iosifidis,
Ivana Dusparic
Abstract:
Millions of sensors, cameras, meters, and other edge devices are deployed in networks to collect and analyse data. In many cases, such devices are powered only by Energy Harvesting (EH) and have limited energy available to analyse acquired data. When edge infrastructure is available, a device has a choice: to perform analysis locally or offload the task to other resource-rich devices such as cloudlet servers. However, such a choice carries a price in terms of consumed energy and accuracy. On the one hand, transmitting raw data can result in a higher energy cost in comparison to the energy required to process data locally. On the other hand, performing data analytics on servers can improve the task's accuracy. Additionally, due to the correlation between information sent by multiple devices, accuracy might not be affected if some edge devices decide to neither process nor send data and preserve energy instead. For such a scenario, we propose a Deep Reinforcement Learning (DRL) based solution capable of learning and adapting the policy to the time-varying energy arrival due to EH patterns. We leverage two datasets, one to model the energy an EH device can collect and the other to model the correlation between cameras. Furthermore, we compare the performance of the proposed solution to three baseline policies. Our results show that we can increase accuracy by 15% in comparison to conventional approaches while preventing outages.
Submitted 29 September, 2021;
originally announced September 2021.
-
Energy-aware optimization of UAV base stations placement via decentralized multi-agent Q-learning
Authors:
Babatunji Omoniwa,
Boris Galkin,
Ivana Dusparic
Abstract:
Unmanned aerial vehicles serving as aerial base stations (UAV-BSs) can be deployed to provide wireless connectivity to ground devices in events of increased network demand, points-of-failure in existing infrastructure, or disasters. However, it is challenging to conserve the energy of UAVs during prolonged coverage tasks, considering their limited on-board battery capacity. Reinforcement learning (RL)-based approaches have been previously used to improve energy utilization of multiple UAVs; however, a central cloud controller is assumed to have complete knowledge of the end-devices' locations, i.e., the controller periodically scans and sends updates for UAV decision-making. This assumption is impractical in dynamic network environments with UAVs serving mobile ground devices. To address this problem, we propose a decentralized Q-learning approach, where each UAV-BS is equipped with an autonomous agent that maximizes the connectivity of mobile ground devices while improving its energy utilization. Experimental results show that the proposed design significantly outperforms the centralized approaches in jointly maximizing the number of connected ground devices and the energy utilization of the UAV-BSs.
Submitted 4 November, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
A reinforcement learning approach to improve communication performance and energy utilization in fog-based IoT
Authors:
Babatunji Omoniwa,
Maxime Gueriau,
Ivana Dusparic
Abstract:
Recent research has shown the potential of using available mobile fog devices (such as smartphones, drones, domestic and industrial robots) as relays to minimize communication outages between sensors and destination devices, where localized Internet-of-Things services (e.g., manufacturing process control, health and security monitoring) are delivered. However, these mobile relays deplete energy when they move and transmit to distant destinations. As such, power-control mechanisms and intelligent mobility of the relay devices are critical in improving communication performance and energy utilization. In this paper, we propose a Q-learning-based decentralized approach where each mobile fog relay agent (MFRA) is controlled by an autonomous agent which uses reinforcement learning to simultaneously improve communication performance and energy utilization. Each autonomous agent learns, based on the feedback from the destination and its own energy levels, whether to remain active and forward the message, or become passive for that transmission phase. We evaluate the approach by comparing it with the centralized approach, and observe that, with a smaller number of MFRAs, our approach is able to ensure reliable delivery of data and reduce overall energy cost by 56.76% to 88.03%.
Submitted 1 June, 2021;
originally announced June 2021.
-
Adaptation to Unknown Situations as the Holy Grail of Learning-Based Self-Adaptive Systems: Research Directions
Authors:
Ivana Dusparic,
Nicolas Cardozo
Abstract:
Self-adaptive systems continuously adapt to changes in their execution environment. Capturing all possible changes to define suitable behaviour beforehand is infeasible, or even impossible in the case of unknown changes, hence human intervention may be required. We argue that adapting to unknown situations is the ultimate challenge for self-adaptive systems. Learning-based approaches are used to learn the suitable behaviour to exhibit in the case of unknown situations, to minimize or fully remove human intervention. While such approaches can, to a certain extent, generalize existing adaptations to new situations, there are a number of breakthroughs that need to be achieved before systems can adapt to general unknown and unforeseen situations. We posit the research directions that need to be explored to achieve unanticipated adaptation from the perspective of learning-based self-adaptive systems. At minimum, systems need to define internal representations of previously unseen situations on-the-fly, extrapolate the relationship to the previously encountered situations to evolve existing adaptations, and reason about the feasibility of achieving their intrinsic goals in the new set of conditions. We close by discussing whether, even when we can, we should indeed build systems that define their own behaviour and adapt their goals, without involving a human supervisor.
Submitted 11 March, 2021;
originally announced March 2021.
-
Auto-COP: Adaptation Generation in Context-Oriented Programming using Reinforcement Learning Options
Authors:
Nicolás Cardozo,
Ivana Dusparic
Abstract:
Self-adaptive software systems continuously adapt in response to internal and external changes in their execution environment, captured as contexts. The COP paradigm posits a technique for the development of self-adaptive systems, capturing their main characteristics with specialized programming language constructs. COP adaptations are specified as independent modules composed in and out of the base system as contexts are activated and deactivated in response to sensed circumstances from the surrounding environment. However, adaptations, their contexts, and associated specialized behavior need to be specified at design time. In complex CPS this is intractable due to new unpredicted operating conditions. We propose Auto-COP, a new technique to enable generation of adaptations at run time. Auto-COP uses RL options to build action sequences, based on the previous instances of the system execution. Options are explored in interaction with the environment, and the most suitable options for each context are used to generate adaptations exploiting COP. To validate Auto-COP, we present two case studies exhibiting different system characteristics and application domains: a driving assistant and a robot delivery system. We present examples of Auto-COP code generated at run time, to illustrate the types of circumstances (contexts) requiring adaptation, and the corresponding generated adaptations for each context. We confirm that the generated adaptations exhibit correct system behavior, measured by domain-specific performance metrics, while reducing the number of required execution/actuation steps by a factor of two, showing that the adaptations are regularly selected by the running system, as adaptive behavior is more appropriate than the execution of primitive actions.
Submitted 3 August, 2023; v1 submitted 11 March, 2021;
originally announced March 2021.
-
Mobility for Cellular-Connected UAVs: challenges for the network provider
Authors:
Erika Fonseca,
Boris Galkin,
Marvin Kelly,
Luiz A. DaSilva,
Ivana Dusparic
Abstract:
Unmanned Aerial Vehicle (UAV) technology is becoming more prevalent and more diverse in its application. 5G and beyond networks must enable UAV connectivity. This will require the network operator to consider this new type of user in the planning and operation of the network. This work presents the challenges an operator will encounter and should consider in the future as UAVs become users of the network. We analyse the 3GPP specifications, the existing research literature, and a publicly available UAV connectivity dataset, to describe the challenges. We classify these challenges into network planning and network optimisation categories. We discuss the challenge of planning network coverage when considering coverage for flying users and the PCI collision and confusion issues that can be aggravated by these users. In discussing network optimisation challenges, we introduce Automatic Neighbouring Relation (ANR) and handover challenges, specifically the number of neighbours in the Neighbour Relation Table (NRT) and their potential deletion and block-listing, the high number of handovers, and the possibility that the UAV disconnects because of handover issues. We discuss possible approaches to address the presented challenges and use a real-world dataset to support our findings about these challenges and their importance.
Submitted 25 February, 2021;
originally announced February 2021.
-
Experimental Evaluation of a UAV User QoS from a Two-Tier 3.6GHz Spectrum Network
Authors:
Boris Galkin,
Erika Fonseca,
Gavin Lee,
Conor Duff,
Marvin Kelly,
Edward Emmanuel,
Ivana Dusparic
Abstract:
Unmanned Aerial Vehicle (UAV) technology is becoming increasingly used in a variety of applications such as video surveillance and deliveries. To enable safe and efficient use of UAVs, the devices will need to be connected into cellular networks. Existing research on UAV cellular connectivity shows that UAVs encounter significant issues with existing networks, such as strong interference and antenna misalignment. In this work, we perform a novel measurement campaign of the performance of a UAV user when it connects to an experimental two-tier cellular network in two different areas of Dublin city's Smart Docklands, which includes massive MIMO macrocells and wirelessly-backhauled small cells. We measure Reference Signal Received Power (RSRP), Reference Signal Received Quality (RSRQ), Signal to Interference and Noise Ratio (SINR), the downlink throughput, and the small cell handover rate. Our results show that increasing the UAV height reduces the performance in both tiers, due to issues such as antenna misalignment. The small cell tier, however, can maintain relatively stable performance across the entire range of UAV heights, suggesting that UAV users can successfully connect to small cells during their flight. Furthermore, we demonstrate that while the UAV handover rate significantly fluctuates at different heights, the overall observed handover rates are very low. Our results highlight the potential for small cells in urban areas to provide connectivity to UAVs.
Submitted 9 April, 2021; v1 submitted 6 November, 2020;
originally announced November 2020.
-
REQIBA: Regression and Deep Q-Learning for Intelligent UAV Cellular User to Base Station Association
Authors:
Boris Galkin,
Erika Fonseca,
Ramy Amer,
Luiz A. DaSilva,
Ivana Dusparic
Abstract:
Unmanned Aerial Vehicles (UAVs) are emerging as important users of next-generation cellular networks. By operating in the sky, UAV users experience very different radio conditions than terrestrial users, due to factors such as strong Line-of-Sight (LoS) channels (and interference) and Base Station (BS) antenna misalignment. As a consequence, the UAVs may experience significant degradation to their received quality of service, particularly when they are moving and are subject to frequent handovers. The solution is to allow the UAV to be aware of its surrounding environment, and intelligently connect into the cellular network taking advantage of this awareness. In this paper we present REgression and deep Q-learning for Intelligent UAV cellular user to Base station Association (REQIBA), a solution that allows a UAV flying over an urban area to intelligently connect to underlying BSs, using information about the received signal powers, the BS locations, and the surrounding building topology. We demonstrate how REQIBA can as much as double the total UAV throughput, when compared to heuristic association schemes similar to those commonly used by terrestrial users. We also evaluate how environmental factors such as UAV height, building density, and throughput loss due to handovers impact the performance of our solution.
Submitted 3 November, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Adaptive Height Optimisation for Cellular-Connected UAVs using Reinforcement Learning
Authors:
Erika Fonseca,
Boris Galkin,
Ramy Amer,
Luiz A. DaSilva,
Ivana Dusparic
Abstract:
Providing reliable connectivity to cellular-connected UAVs can be very challenging; their performance depends heavily on the nature of the surrounding environment, such as the density and heights of the ground BSs. On the other hand, tall buildings might block undesired interference signals from ground BSs, thereby improving the connectivity between the UAVs and their serving BSs. To address the connectivity of UAVs in such environments, this paper proposes an RL algorithm to dynamically optimise the height of a UAV as it moves through the environment, with the goal of increasing the throughput or spectrum efficiency that it experiences. The proposed solution is evaluated in two settings: using a series of generated environments where we vary the number of BS and building densities, and in a scenario using real-world data obtained from an experiment in Dublin, Ireland. Results show that our proposed RL-based solution improves UAV QoS by 6% to 41%, depending on the scenario. We also conclude that, when flying at heights higher than the buildings, building density variation has no impact on UAV QoS. On the other hand, BS density can negatively impact UAV QoS, with higher numbers of BSs generating more interference and deteriorating UAV performance.
Submitted 13 April, 2022; v1 submitted 27 July, 2020;
originally announced July 2020.
-
Multi-agent Deep Reinforcement Learning for Zero Energy Communities
Authors:
Amit Prasad,
Ivana Dusparic
Abstract:
Advances in renewable energy generation and the introduction of government targets to improve energy efficiency have given rise to the concept of a Zero Energy Building (ZEB). A ZEB is a building whose net energy usage over a year is zero, i.e., its energy use is not larger than its overall renewables generation. A collection of ZEBs forms a Zero Energy Community (ZEC). This paper addresses the problem of energy sharing in such a community. This differs from previously addressed energy sharing between buildings in that our focus is on improving the community's energy status, whereas research has traditionally focused on reducing losses due to transmission and storage, or on achieving economic gains. We model this problem in a multi-agent environment and propose a Deep Reinforcement Learning (DRL) based solution. Each building is represented by an intelligent agent that learns over time the appropriate behaviour to share energy. We have evaluated the proposed solution in a multi-agent simulation built using osBrain. Results indicate that, over time, agents learn to collaborate and learn a policy comparable to the optimal policy, which in turn improves the ZEC's energy status. Buildings with no renewables preferred to request energy from their neighbours rather than from the supply grid.
Submitted 27 June, 2019; v1 submitted 8 October, 2018;
originally announced October 2018.
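As a rough illustration of the ZEB definition in the abstract above (annual energy use not larger than annual renewables generation), the following sketch computes a building's net energy and a community's pooled balance. The names `Building`, `net_energy`, `is_zeb`, and `community_net_energy`, the kWh figures, and the choice to judge the community on its pooled annual balance are assumptions made for illustration; they are not taken from the paper and do not represent its DRL method.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Building:
    """Hypothetical record of one building's yearly energy totals."""
    name: str
    annual_use_kwh: float         # assumed yearly consumption
    annual_generation_kwh: float  # assumed yearly renewable generation


def net_energy(building: Building) -> float:
    """Net annual energy: positive means the building uses more than it generates."""
    return building.annual_use_kwh - building.annual_generation_kwh


def is_zeb(building: Building) -> bool:
    """A building meets the ZEB condition when use does not exceed generation."""
    return net_energy(building) <= 0


def community_net_energy(buildings: Iterable[Building]) -> float:
    """Pooled net energy across a community (an illustrative assumption)."""
    return sum(net_energy(b) for b in buildings)


if __name__ == "__main__":
    surplus = Building("surplus", annual_use_kwh=4000, annual_generation_kwh=5000)
    deficit = Building("deficit", annual_use_kwh=3000, annual_generation_kwh=2500)
    for b in (surplus, deficit):
        print(b.name, net_energy(b), is_zeb(b))
    print("community", community_net_energy([surplus, deficit]))
```

In this toy case the second building is not a ZEB on its own, yet the pooled balance is negative, which is the kind of situation where sharing surplus energy between neighbours could improve the community's overall status.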
-
Volunteers in the Smart City: Comparison of Contribution Strategies on Human-Centered Measures
Authors:
Stefano Bennati,
Ivana Dusparic,
Rhythima Shinde,
Catholijn M. Jonker
Abstract:
Several smart city services rely on user contributions, e.g., data, which can be costly for the users in terms of privacy. High costs lead to reduced user participation, which undermines the success of smart city technologies. This work develops a scenario-independent design principle, based on public good theory, for resource management in smart city applications, where provision of a service depends on contributors and free-riders, who benefit from the service without contributing their own resources. Following this design principle, different classes of algorithms for resource management are evaluated with respect to human-centered measures, i.e., privacy, fairness and social welfare. Trade-offs that characterize algorithms are discussed across two smart city application scenarios. These results might help smart city application designers to choose a suitable algorithm given a scenario-specific set of requirements, and users to choose a service based on an algorithm that matches their preferences.
Submitted 23 May, 2018;
originally announced May 2018.
-
Decentralised Multi-Agent Reinforcement Learning for Dynamic and Uncertain Environments
Authors:
Andrei Marinescu,
Ivana Dusparic,
Adam Taylor,
Vinny Cahill,
Siobhán Clarke
Abstract:
Multi-Agent Reinforcement Learning (MARL) is a widely used technique for optimization in decentralised control problems. However, most applications of MARL are in static environments, and are not suitable when agent behaviour and environment conditions are dynamic and uncertain. Addressing uncertainty in such environments remains a challenging problem for MARL-based systems. The dynamic nature of the environment causes previous knowledge of how agents interact to become outdated. Advance knowledge of potential changes through prediction significantly helps agents converge to near-optimal control solutions. In this paper we propose P-MARL, a decentralised MARL algorithm enhanced by a prediction mechanism that provides accurate information regarding upcoming changes in the environment. This prediction is achieved by employing an Artificial Neural Network combined with a Self-Organising Map that detects and matches changes in the environment. The proposed algorithm is validated in a realistic smart-grid scenario, and provides a 92% Pareto-efficient solution to an electric vehicle charging problem.
Submitted 16 September, 2014;
originally announced September 2014.