Search | arXiv e-print repository

Applying Neural Monte Carlo Tree Search to Unsignalized Multi-intersection Scheduling for Autonomous Vehicles

Authors: Yucheng Shi, Wenlong Wang, Xiaowen Tao, Ivana Dusparic, Vinny Cahill

Abstract: Dynamic scheduling of access to shared resources by autonomous systems is a challenging problem, characterized as being NP-hard. The complexity of this task leads to a combinatorial explosion of possibilities in highly dynamic systems where arriving requests must be continuously scheduled subject to strong safety and time constraints. An example of such a system is an unsignalized intersection, where automated vehicles' access to potential conflict zones must be dynamically scheduled. In this paper, we apply Neural Monte Carlo Tree Search (NMCTS) to the challenging task of scheduling platoons of vehicles crossing unsignalized intersections. Crucially, we introduce a transformation model that maps successive sequences of potentially conflicting road-space reservation requests from platoons of vehicles into a series of board-game-like problems and use NMCTS to search for solutions representing optimal road-space allocation schedules in the context of past allocations. To optimize search, we incorporate a prioritized re-sampling method with parallel NMCTS (PNMCTS) to improve the quality of training data. To optimize training, a curriculum learning strategy is used to train the agent to schedule progressively more complex boards culminating in overlapping boards that represent busy intersections. In a busy single four-way unsignalized intersection simulation, PNMCTS solved 95% of unseen scenarios, reducing crossing time by 43% in light and 52% in heavy traffic versus first-in, first-out control. In a 3x3 multi-intersection network, the proposed method maintained free-flow in light traffic when all intersections are under control of PNMCTS and outperformed state-of-the-art RL-based traffic-light controllers in average travel time by 74.5% and total throughput by 16% in heavy traffic.

Submitted 24 October, 2024; originally announced October 2024.

arXiv:2410.08893 [pdf, other]

Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient

Authors: Wenlong Wang, Ivana Dusparic, Yucheng Shi, Ke Zhang, Vinny Cahill

Abstract: Model-based reinforcement learning (RL) offers a solution to the data inefficiency that plagues most model-free RL algorithms. However, learning a robust world model often requires complex and deep architectures, which are computationally expensive and challenging to train. Within the world model, sequence models play a critical role in accurate predictions, and various architectures have been explored, each with its own challenges. Currently, recurrent neural network (RNN)-based world models struggle with vanishing gradients and capturing long-term dependencies. Transformers, on the other hand, suffer from the quadratic memory and computational complexity of self-attention mechanisms, scaling as $O(n^2)$, where $n$ is the sequence length. To address these challenges, we propose a state space model (SSM)-based world model, Drama, specifically leveraging Mamba, that achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies and enabling efficient training with longer sequences. We also introduce a novel sampling method to mitigate the suboptimality caused by an incorrect world model in the early training stages. Combining these techniques, Drama achieves a normalised score on the Atari100k benchmark that is competitive with other state-of-the-art (SOTA) model-based RL algorithms, using only a 7 million-parameter world model. Drama is accessible and trainable on off-the-shelf hardware, such as a standard laptop. Our code is available at https://github.com/realwenlongwang/Drama.git.

Submitted 16 May, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

Comments: Published as a conference paper at ICLR 2025
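
To make the complexity contrast in the abstract above concrete, the following is a minimal sketch and not Drama's or Mamba's actual implementation. It assumes a scalar, time-invariant linear state space recurrence, $h_t = a h_{t-1} + b x_t$ and $y_t = c h_t$; the names `ssm_scan`, `attention_pair_count`, `a`, `b` and `c` are hypothetical and chosen only for illustration.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run a scalar linear state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t

    Each step does a constant amount of work on a constant-size carried
    state, so the running time grows linearly with the sequence length.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_pair_count(n):
    """Number of query-key score entries in full self-attention: n * n."""
    return n * n


if __name__ == "__main__":
    # An impulse at the first step decays geometrically through the state.
    print(ssm_scan([1.0, 0.0, 0.0]))
    # Score-matrix size for a sequence of 1000 tokens.
    print(attention_pair_count(1000))
```

The recurrence touches each input once, whereas the attention score matrix has one entry per pair of positions, which is where the $O(n)$ versus $O(n^2)$ scaling comes from. Mamba's selective, input-dependent parameters are deliberately left out of this sketch.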

arXiv:2409.05435 [pdf, other]

Semifactual Explanations for Reinforcement Learning

Authors: Jasmina Gajcin, Jovan Jeromela, Ivana Dusparic

Abstract: Reinforcement Learning (RL) is a learning paradigm in which the agent learns from its environment through trial and error. Deep reinforcement learning (DRL) algorithms represent the agent's policies using neural networks, making their decisions difficult to interpret. Explaining the behaviour of DRL agents is necessary to advance user trust, increase engagement, and facilitate integration with real-life tasks. Semifactual explanations aim to explain an outcome by providing "even if" scenarios, such as "even if the car were moving twice as slowly, it would still have to swerve to avoid crashing". Semifactuals help users understand the effects of different factors on the outcome and support the optimisation of resources. While extensively studied in psychology and even utilised in supervised learning, semifactuals have not been used to explain the decisions of RL systems. In this work, we develop a first approach to generating semifactual explanations for RL agents. We start by defining five properties of desirable semifactual explanations in RL and then introduce SGRL-Rewind and SGRL-Advance, the first algorithms for generating semifactual explanations in RL. We evaluate the algorithms in two standard RL environments and find that they generate semifactuals that are easier to reach, represent the agent's policy better, and are more diverse compared to baselines. Lastly, we conduct and analyse a user study to assess participants' perception of semifactual explanations of the agent's actions.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 9 pages, 2 figures, 4 tables

arXiv:2408.01188 [pdf, other]

Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems

Authors: Juan C. Rosero, Ivana Dusparic, Nicolás Cardozo

Abstract: Reinforcement Learning (RL) is used extensively in Autonomous Systems (AS) as it enables learning at runtime without the need for a model of the environment or predefined actions. However, most applications of RL in AS, such as those based on Q-learning, can only optimize one objective, making it necessary in multi-objective systems to combine multiple objectives in a single objective function with predefined weights. A number of Multi-Objective Reinforcement Learning (MORL) techniques exist, but they have mostly been applied in RL benchmarks rather than real-world AS. In this work, we use a MORL technique called Deep W-Learning (DWN) and apply it to the Emergent Web Servers exemplar, a self-adaptive server, to find the optimal configuration for runtime performance optimization. We compare DWN to two single-objective optimization implementations: the ε-greedy algorithm and Deep Q-Networks (DQN). Our initial evaluation shows that DWN optimizes multiple objectives simultaneously with results similar to those of the DQN and ε-greedy approaches, performing better on some metrics, and avoids issues associated with combining multiple objectives into a single utility function.

Submitted 30 September, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

Comments: pages, Accepted to AI4AS 2024 workshop

arXiv:2402.06503 [pdf, other]

ACTER: Diverse and Actionable Counterfactual Sequences for Explaining and Diagnosing RL Policies

Authors: Jasmina Gajcin, Ivana Dusparic

Abstract: Understanding how failure occurs and how it can be prevented in reinforcement learning (RL) is necessary to enable debugging, maintain user trust, and develop personalized policies. Counterfactual reasoning has often been used to assign blame and understand failure by searching for the closest possible world in which the failure is avoided. However, current counterfactual state explanations in RL can only explain an outcome using just the current state features and offer no actionable recourse on how a negative outcome could have been prevented. In this work, we propose ACTER (Actionable Counterfactual Sequences for Explaining Reinforcement Learning Outcomes), an algorithm for generating counterfactual sequences that provides actionable advice on how failure can be avoided. ACTER investigates actions leading to a failure and uses the evolutionary algorithm NSGA-II to generate counterfactual sequences of actions that prevent it with minimal changes and high certainty even in stochastic environments. Additionally, ACTER generates a set of multiple diverse counterfactual sequences that enable users to correct failure in the way that best fits their preferences. We also introduce three diversity metrics that can be used for evaluating the diversity of counterfactual sequences. We evaluate ACTER in two RL environments, with both discrete and continuous actions, and show that it can generate actionable and diverse counterfactual sequences. We conduct a user study to explore how explanations generated by ACTER help users identify and correct failure.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 17 pages, 4 Figures

arXiv:2401.12405 [pdf, other]

Learning Recovery Strategies for Dynamic Self-healing in Reactive Systems

Authors: Mateo Sanabria, Ivana Dusparic, Nicolas Cardozo

Abstract: Self-healing systems depend on following a set of predefined instructions to recover from a known failure state. Failure states are generally detected based on domain-specific specialized metrics. Failure fixes are applied at predefined application hooks that are not sufficiently expressive to manage different failure types. Self-healing is usually applied in the context of distributed systems, where the detection of failures is constrained to communication problems, and resolution strategies often consist of replacing complete components. Our proposal targets complex reactive systems, defining monitors as predicates specifying satisfiability conditions of system properties. Such monitors are functionally expressive and can be defined at run time to detect failure states at any execution point. Once failure states are detected, we use a Reinforcement Learning-based technique to learn a recovery strategy based on users' corrective sequences. Finally, to execute the learned strategies, we extract them as COP variations that activate dynamically whenever the failure state is detected, overwriting the base system behavior with the recovery strategy for that state. We validate the feasibility and effectiveness of our framework through a prototypical reactive application for tracking mouse movements, and the DeltaIoT exemplar for self-healing systems. Our results demonstrate that with just the definition of monitors, the system is effective in detecting and recovering from failures in 55%-92% of the cases in the first application, and on par with the predefined strategies in the second application.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Preprint accepted to 19th International Conference on Software Engineering for Adaptive and Self-Managing Systems (SEAMS24)

arXiv:2308.15969 [pdf, other]

Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification

Authors: Jasmina Gajcin, James McCarthy, Rahul Nair, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

Abstract: A well-defined reward function is crucial for successful training of a reinforcement learning (RL) agent. However, defining a suitable reward function is a notoriously challenging task, especially in complex, multi-objective environments. Developers often have to resort to starting with an initial, potentially misspecified reward function and iteratively adjusting its parameters based on observed learned behavior. In this work, we aim to automate this process by proposing ITERS, an iterative reward shaping approach using human feedback for mitigating the effects of a misspecified reward function. Our approach allows the user to provide trajectory-level feedback on the agent's behavior during training, which can be integrated as a reward shaping signal in the following training iteration. We also allow the user to provide explanations of their feedback, which are used to augment the feedback and reduce user effort and feedback frequency. We evaluate ITERS in three environments and show that it can successfully correct misspecified reward functions.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 7 pages, 2 figures

arXiv:2306.08785 [pdf, other]

Density-Aware Reinforcement Learning to Optimise Energy Efficiency in UAV-Assisted Networks

Authors: Babatunji Omoniwa, Boris Galkin, Ivana Dusparic

Abstract: Unmanned aerial vehicles (UAVs) serving as aerial base stations can be deployed to provide wireless connectivity to mobile users, such as vehicles. However, the density of vehicles on roads often varies spatially and temporally, primarily due to mobility and traffic situations in a geographical area, making it difficult to provide ubiquitous service. Moreover, as energy-constrained UAVs hover in the sky while serving mobile users, they may be faced with interference from nearby UAV cells or other access points sharing the same frequency band, thereby impacting the system's energy efficiency (EE). Recent multi-agent reinforcement learning (MARL) approaches applied to optimise the users' coverage worked well in reasonably even densities but might not perform as well under uneven user distributions, i.e., in urban road networks with uneven concentration of vehicles. In this work, we propose a density-aware communication-enabled multi-agent decentralised double deep Q-network (DACEMAD-DDQN) approach that maximises the total system's EE by jointly optimising the trajectory of each UAV, the number of connected users, and the UAVs' energy consumption while keeping track of dense and uneven user distributions. Our result outperforms state-of-the-art MARL approaches in terms of EE by as much as 65%-85%.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 7 pages, To appear in the conference proceedings of IEEE WiMob 2023, Montreal, Canada

arXiv:2303.10236 [pdf, other]

Prevalence of Code Smells in Reinforcement Learning Projects

Authors: Nicolás Cardozo, Ivana Dusparic, Christian Cabrera

Abstract: Reinforcement Learning (RL) is being increasingly used to learn and adapt application behavior in many domains, including large-scale and safety-critical systems such as autonomous driving. With the advent of plug-n-play RL libraries, its applicability has further increased, enabling integration of RL algorithms by users. We note, however, that the majority of such code is not developed by RL engineers, which, as a consequence, may lead to poor program quality yielding bugs, suboptimal performance, maintainability, and evolution problems for RL-based projects. In this paper we begin the exploration of this hypothesis, specific to code utilizing RL, analyzing different projects found in the wild to assess their quality from a software engineering perspective. Our study includes 24 popular RL-based Python projects, analyzed with standard software engineering metrics. Our results, aligned with similar analyses for ML code in general, show that popular and widely reused RL repositories contain many code smells (3.95% of the code base on average), significantly affecting the projects' maintainability. The most common code smells detected are long method and long method chain, highlighting problems in the definition and interaction of agents. Detected code smells suggest problems in responsibility separation and in the appropriateness of current abstractions for the definition of RL algorithms.

Submitted 3 August, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: Paper preprint for the 2nd International Conference on AI Engineering - Software Engineering for AI (CAIN 2023)

arXiv:2303.08772 [pdf, other]

Reservation of Virtualized Resources with Optimistic Online Learning

Authors: Jean-Baptiste Monteil, George Iosifidis, Ivana Dusparic

Abstract: The virtualization of wireless networks enables new services to access network resources made available by the Network Operator (NO) through a Network Slicing market. The different service providers (SPs) have the opportunity to lease the network resources from the NO to constitute slices that address the demand of their specific network service. The goal of any SP is to maximize its service utility and minimize costs from leasing resources while facing uncertainties of the prices of the resources and the users' demand. In this paper, we propose a solution that allows the SP to decide its online reservation policy, which aims to maximize its service utility and minimize its cost of reservation simultaneously. We design the Optimistic Online Learning for Reservation (OOLR) solution, a decision algorithm built upon the Follow-the-Regularized Leader (FTRL), that incorporates key predictions to assist the decision-making process. Our solution achieves an $\mathcal{O}(\sqrt{T})$ regret bound where $T$ represents the horizon. We integrate a prediction model into the OOLR solution and we demonstrate through numerical results the efficacy of the combined models' solution against the FTRL baseline.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 7 pages, 4 figures, ICC 2023 conference
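
The abstract above builds on Follow-the-Regularized-Leader (FTRL) with predictions. The following is a generic textbook-style sketch of that idea, not the OOLR algorithm itself. It assumes a single scalar decision in an interval, linear per-round losses $g_t x$, and a quadratic regulariser $x^2 / (2\eta)$; the names `ftrl_decision`, `run_ftrl`, `eta` and `predictions` are hypothetical.

```python
def ftrl_decision(grad_sum, eta, lo, hi, prediction=0.0):
    """Minimise (grad_sum + prediction) * x + x**2 / (2 * eta) over [lo, hi].

    For a one-dimensional convex quadratic, the constrained minimiser is the
    unconstrained one, -eta * (grad_sum + prediction), clipped to the interval.
    """
    x = -eta * (grad_sum + prediction)
    return min(max(x, lo), hi)


def run_ftrl(gradients, eta, lo=0.0, hi=1.0, predictions=None):
    """Play FTRL against a sequence of linear-loss gradients.

    predictions[t], if given, is a guess of gradients[t] that is available
    before round t is played (the "optimistic" part). Without predictions
    this reduces to plain FTRL.
    """
    grad_sum = 0.0
    decisions = []
    for t, g in enumerate(gradients):
        hint = predictions[t] if predictions is not None else 0.0
        decisions.append(ftrl_decision(grad_sum, eta, lo, hi, hint))
        grad_sum += g  # the true gradient is revealed only after deciding
    return decisions


if __name__ == "__main__":
    grads = [-1.0, -1.0, -1.0]
    print(run_ftrl(grads, eta=0.25))
    print(run_ftrl(grads, eta=0.25, predictions=grads))
```

With accurate predictions the learner moves one round earlier than plain FTRL, which is the intuition behind using forecasts of prices and demand. For bounded gradients, a step size `eta` proportional to $1/\sqrt{T}$ is the standard choice that gives $\mathcal{O}(\sqrt{T})$ regret for plain FTRL with this regulariser.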

arXiv:2303.04475 [pdf, other]

RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning

Authors: Jasmina Gajcin, Ivana Dusparic

Abstract: While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve the desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly probable desired outcomes. We use a heuristic tree search of the agent's execution trajectories to find the most suitable counterfactuals based on the defined properties. We evaluate RACCER in two tasks as well as conduct a user study to show that RL-specific counterfactuals help users better understand agents' behavior compared to the current state-of-the-art approaches.

Submitted 10 October, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: 10 pages, 3 figures, 3 tables

arXiv:2303.01170 [pdf, other]

Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning

Authors: Alberto Castagna, Ivana Dusparic

Abstract: Transfer learning in Reinforcement Learning (RL) has been widely studied to overcome training issues of Deep-RL, i.e., exploration cost, data availability and convergence time, by introducing a way to enhance the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for a novice agent, a good understanding of the task by the expert agent is required for such transfer to be effective. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems. No dedicated expert exists, and the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interaction. We demonstrate the effectiveness of EF-OnTL against a no-transfer scenario and advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall comparable performance when compared against advice-based baselines while not requiring any external input or threshold tuning. EF-OnTL outperforms no-transfer with an improvement related to the complexity of the task addressed.

Submitted 28 July, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

arXiv:2211.05551 [pdf, other]

Causal Counterfactuals for Improving the Robustness of Reinforcement Learning

Authors: Tom He, Jasmina Gajcin, Ivana Dusparic

Abstract: Reinforcement learning (RL) is used in various robotic applications. RL enables agents to learn tasks autonomously by interacting with the environment. The more critical the tasks are, the higher the demand for the robustness of the RL systems. Causal RL combines RL and causal inference to make RL more robust. Causal RL agents use a causal representation to capture the invariant causal mechanisms that can be transferred from one task to another. Currently, there is limited research in Causal RL, and existing solutions are usually not complete or feasible for real-world applications. In this work, we propose CausalCF, the first complete Causal RL solution incorporating ideas from Causal Curiosity and CoPhy. Causal Curiosity provides an approach for using interventions, and CoPhy is modified to enable the RL agent to perform counterfactuals. Causal Curiosity has been applied to robotic grasping and manipulation tasks in CausalWorld. CausalWorld provides a realistic simulation environment based on the TriFinger robot. We apply CausalCF to complex robotic tasks and show that it improves the RL agent's robustness using CausalWorld.

Submitted 5 June, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted to ARMS-2023 (ARMS-2023: AAMAS 2023 Workshop on Autonomous Robots and Multirobot Systems)

arXiv:2211.04813 [pdf, other]

doi 10.5220/0011610300003393

Deep W-Networks: Solving Multi-Objective Optimisation Problems With Deep Reinforcement Learning

Authors: Jernej Hribar, Luke Hackett, Ivana Dusparic

Abstract: In this paper, we build on advances introduced by the Deep Q-Networks (DQN) approach to extend the multi-objective tabular Reinforcement Learning (RL) algorithm W-learning to large state spaces. The W-learning algorithm can naturally solve the competition between multiple single policies in multi-objective environments. However, the tabular version does not scale well to environments with large state spaces. To address this issue, we replace the underlying Q-tables with DQN and propose the addition of W-Networks as a replacement for tabular weight (W) representations. We evaluate the resulting Deep W-Networks (DWN) approach in two widely accepted multi-objective RL benchmarks: deep sea treasure and multi-objective mountain car. We show that DWN solves the competition between multiple policies while outperforming the baseline in the form of a DQN solution. Additionally, we demonstrate that the proposed algorithm can find the Pareto front in both tested environments.

Submitted 23 February, 2023; v1 submitted 9 November, 2022; originally announced November 2022.
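
The competition between single-objective policies mentioned in the abstract above can be illustrated with a small tabular sketch. This is not the DWN implementation: it assumes the usual tabular formulation of W-learning, in which each objective keeps its own Q-table plus a per-state W-value, the objective with the largest W-value in the current state wins control, and its greedy action is executed. The names `select_action`, `q_tables` and `w_tables` are hypothetical.

```python
def select_action(q_tables, w_tables, state):
    """Pick the objective with the largest W-value in `state`, then act greedily for it.

    q_tables[i][state] is a list of Q-values (one per action) for objective i;
    w_tables[i][state] is that objective's W-value in `state`.
    Returns (winning_objective_index, action_index).
    """
    winner = max(range(len(w_tables)), key=lambda i: w_tables[i][state])
    q_row = q_tables[winner][state]
    action = max(range(len(q_row)), key=lambda a: q_row[a])
    return winner, action


if __name__ == "__main__":
    # Two objectives, one state (0), two actions.
    q_tables = [{0: [1.0, 0.0]}, {0: [0.0, 2.0]}]
    w_tables = [{0: 0.1}, {0: 0.7}]
    # Objective 1 has the larger W-value, so its preferred action is taken.
    print(select_action(q_tables, w_tables, 0))
```

In the deep variant described in the abstract, the dictionary lookups would be replaced by forward passes through per-objective Q-networks and W-networks; the selection logic itself stays the same under the assumption stated here.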

arXiv:2210.11846 [pdf, other]

Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities

Authors: Jasmina Gajcin, Ivana Dusparic

Abstract: While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the majority of explanation methods for AI are focused on developers and expert users. Counterfactual explanations are local explanations that offer users advice on what can be changed in the input for the output of the black-box model to change. Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system. While counterfactuals have been extensively researched in supervised learning, few methods apply them to reinforcement learning (RL). In this work, we explore the reasons for the underrepresentation of a powerful explanation method in RL. We start by reviewing the current work in counterfactual explanations in supervised learning. Additionally, we explore the differences between counterfactual explanations in supervised learning and RL and identify the main challenges that prevent the adoption of methods from supervised learning in reinforcement learning. Finally, we redefine counterfactuals for RL and propose research directions for implementing counterfactuals in RL.

Submitted 9 February, 2024; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: 32 pages, 6 figures

arXiv:2210.00041 [pdf, other]

Communication-Enabled Deep Reinforcement Learning to Optimise Energy-Efficiency in UAV-Assisted Networks

Authors: Babatunji Omoniwa, Boris Galkin, Ivana Dusparic

Abstract: Unmanned aerial vehicles (UAVs) are increasingly deployed to provide wireless connectivity to static and mobile ground users in situations of increased network demand or points of failure in existing terrestrial cellular infrastructure. However, UAVs are energy-constrained and experience the challenge of interference from nearby UAV cells sharing the same frequency spectrum, thereby impacting the system's energy efficiency (EE). Recent approaches focus on optimising the system's EE by optimising the trajectory of UAVs serving only static ground users and neglecting mobile users. Several others neglect the impact of interference from nearby UAV cells, assuming an interference-free network environment. Despite growing research interest in decentralised control over centralised UAVs' control, direct collaboration among UAVs to improve coordination while optimising the system's EE has not been adequately explored. To address this, we propose a direct collaborative communication-enabled multi-agent decentralised double deep Q-network (CMAD-DDQN) approach. The CMAD-DDQN is a collaborative algorithm that allows UAVs to explicitly share their telemetry via existing 3GPP guidelines by communicating with their nearest neighbours. This allows the agent-controlled UAVs to optimise their 3D flight trajectories by filling up knowledge gaps and converging to optimal policies. Simulation results show that the proposed approach outperforms existing baselines in terms of maximising the system's EE without degrading coverage performance in the network. The CMAD-DDQN approach outperforms the MAD-DDQN that neglects direct collaboration among UAVs, the multi-agent deep deterministic policy gradient (MADDPG) and random policy approaches that consider a 2D UAV deployment design while neglecting interference from nearby UAV cells by about 15%, 65% and 85%, respectively.

Submitted 27 June, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

Comments: 16 pages, 22 figures, under review. Extension of the work arXiv:2204.01597

arXiv:2207.08651 [pdf, other]

Boolean Decision Rules for Reinforcement Learning Policy Summarisation

Authors: James McCarthy, Rahul Nair, Elizabeth Daly, Radu Marinescu, Ivana Dusparic

Abstract: Explainability of Reinforcement Learning (RL) policies remains a challenging research problem, particularly when considering RL in a safety context. Understanding the decisions and intentions of an RL policy offers avenues to incorporate safety into the policy by limiting undesirable actions. We propose the use of a Boolean Decision Rules model to create a post-hoc rule-based summary of an agent's policy. We evaluate our proposed approach using a DQN agent trained on an implementation of a lava gridworld and show that, given a hand-crafted feature representation of this gridworld, simple generalised rules can be created, giving a post-hoc explainable summary of the agent's policy. We discuss possible avenues to introduce safety into an RL agent's policy by using rules generated by this rule-based model as constraints imposed on the agent's policy, and discuss how creating simple rule summaries of an agent's policy may help in the debugging process of RL agents.

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.12492 [pdf, other]

Guidelines for Artifacts to Support Industry-Relevant Research on Self-Adaptation

Authors: Danny Weyns, Ilias Gerostathopoulos, Barbora Buhnova, Nicolas Cardozo, Emilia Cioroaica, Ivana Dusparic, Lars Grunske, Pooyan Jamshidi, Christine Julien, Judith Michael, Gabriel Moreno, Shiva Nejati, Patrizio Pelliccione, Federico Quin, Genaina Rodrigues, Bradley Schmerl, Marco Vieira, Thomas Vogel, Rebekka Wohlrab

Abstract: Artifacts support evaluating new research results and help compare them with the state of the art in a field of interest. Over the past years, several artifacts have been introduced to support research in the field of self-adaptive systems. While these artifacts have shown their value, it is not clear to what extent these artifacts support research on problems in self-adaptation that are relevant to industry. This paper provides a set of guidelines for artifacts that aim at supporting industry-relevant research on self-adaptation. The guidelines, which are grounded in data obtained from a survey with practitioners, were derived during working sessions at the 17th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. Artifact providers can use the guidelines for aligning future artifacts with industry needs; they can also be used to evaluate the industrial relevance of existing artifacts. We also propose an artifact template.

Submitted 24 June, 2022; originally announced June 2022.

Comments: 7 pages

arXiv:2205.11519 [pdf, other]

FedSA: Accelerating Intrusion Detection in Collaborative Environments with Federated Simulated Annealing

Authors: Helio N. Cunha Neto, Ivana Dusparic, Diogo M. F. Mattos, Natalia C. Fernandes

Abstract: Fast identification of new network attack patterns is crucial for improving network security. Nevertheless, identifying an ongoing attack in a heterogeneous network is a non-trivial task. Federated learning emerges as a solution to collaborative training for an Intrusion Detection System (IDS). The federated learning-based IDS trains a global model using local machine learning models provided by federated participants without sharing local data. However, optimization challenges are intrinsic to federated learning. This paper proposes the Federated Simulated Annealing (FedSA) metaheuristic to select the hyperparameters and a subset of participants for each aggregation round in federated learning. FedSA optimizes hyperparameters linked to the global model convergence. The proposal reduces aggregation rounds and speeds up convergence. Thus, FedSA accelerates learning extraction from local models, requiring fewer IDS updates. The proposal assessment shows that the FedSA global model converges in fewer than ten communication rounds. The proposal requires up to 50% fewer aggregation rounds to achieve approximately 97% accuracy in attack detection than the conventional aggregation approach.

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2204.01597 [pdf, other]

Optimising Energy Efficiency in UAV-Assisted Networks using Deep Reinforcement Learning

Authors: Babatunji Omoniwa, Boris Galkin, Ivana Dusparic

Abstract: In this letter, we study the energy efficiency (EE) optimisation of unmanned aerial vehicles (UAVs) providing wireless coverage to static and mobile ground users. Recent multi-agent reinforcement learning approaches optimise the system's EE using a 2D trajectory design, neglecting interference from nearby UAV cells. We aim to maximise the system's EE by jointly optimising each UAV's 3D trajectory, number of connected users, and the energy consumed, while accounting for interference. Thus, we propose a cooperative Multi-Agent Decentralised Double Deep Q-Network (MAD-DDQN) approach. Our approach outperforms existing baselines in terms of EE by as much as 55-80%.

Submitted 4 April, 2022; originally announced April 2022.

Comments: 5 pages, Submitted for publication in the IEEE Wireless Communication Letters

arXiv:2203.11211 [pdf, other]

ReCCoVER: Detecting Causal Confusion for Explainable Reinforcement Learning

Authors: Jasmina Gajcin, Ivana Dusparic

Abstract: Despite notable results in various fields over the recent years, deep reinforcement learning (DRL) algorithms lack transparency, affecting user trust and hindering their deployment to high-risk tasks. Causal confusion refers to a phenomenon where an agent learns spurious correlations between features which might not hold across the entire state space, preventing safe deployment to real tasks where such correlations might be broken. In this work, we examine whether an agent relies on spurious correlations in critical states, and propose an alternative subset of features on which it should base its decisions instead, to make it less susceptible to causal confusion. Our goal is to increase transparency of DRL agents by exposing the influence of learned spurious correlations on their decisions, and offering advice to developers about feature selection in different parts of state space, to avoid causal confusion. We propose ReCCoVER, an algorithm which detects causal confusion in an agent's reasoning before deployment, by executing its policy in alternative environments where certain correlations between features do not hold. We demonstrate our approach in taxi and grid world environments, where ReCCoVER detects states in which an agent relies on spurious correlations and offers a set of features that should be considered instead.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 18 pages, 4 tables, 4 figures

arXiv:2201.07308 [pdf, other]

doi 10.1109/WCNC51071.2022.9771901

Enabling Deep Reinforcement Learning on Energy Constrained Devices at the Edge of the Network

Authors: Jernej Hribar, Ivana Dusparic

Abstract: Deep Reinforcement Learning (DRL) solutions are becoming pervasive at the edge of the network as they enable autonomous decision-making in a dynamic environment. However, to be able to adapt to the ever-changing environment, the DRL solution implemented on an embedded device has to continue to occasionally take exploratory actions even after initial convergence. In other words, the device has to occasionally take random actions and update the value function, i.e., re-train the Artificial Neural Network (ANN), to ensure its performance remains optimal. Unfortunately, embedded devices often lack the processing power and energy required to train the ANN. The energy aspect is particularly challenging when the edge device is powered only by means of Energy Harvesting (EH). To overcome this problem, we propose a two-part algorithm in which the DRL process is trained at the sink. Then the weights of the fully trained underlying ANN are periodically transferred to the EH-powered embedded device taking actions. Using an EH-powered sensor, real-world measurements dataset, and optimizing for Age of Information (AoI) metric, we demonstrate that such a DRL solution can operate without any degradation in the performance, with only a few ANN updates per day.

Submitted 18 January, 2022; originally announced January 2022.

arXiv:2112.09462 [pdf, other]

Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents

Authors: Jasmina Gajcin, Rahul Nair, Tejaswini Pedapati, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

Abstract: In complex tasks where the reward function is not straightforward and consists of a set of objectives, multiple reinforcement learning (RL) policies that perform the task adequately but employ different strategies can be trained by adjusting the impact of individual objectives on the reward function. Understanding the differences in strategies between policies is necessary to enable users to choose between offered policies, and can help developers understand different behaviors that emerge from various reward functions and training hyperparameters in RL systems. In this work we compare the behavior of two policies trained on the same task, but with different preferences in objectives. We propose a method for distinguishing differences in behavior that stem from different abilities from those that are a consequence of opposing preferences of two RL agents. Furthermore, we use only data on preference-based differences in order to generate contrasting explanations about agents' preferences. Finally, we test and evaluate our approach on an autonomous driving task and compare the behavior of a safety-oriented policy and one that prefers speed.

Submitted 17 December, 2021; originally announced December 2021.

Comments: 7 pages, 3 figures

arXiv:2112.00424 [pdf, other]

Multi-Agent Transfer Learning in Reinforcement Learning-Based Ride-Sharing Systems

Authors: Alberto Castagna, Ivana Dusparic

Abstract: Reinforcement learning (RL) has been used in a range of simulated real-world tasks, e.g., sensor coordination, traffic light control, and on-demand mobility services. However, real-world deployments are rare, as RL struggles with the dynamic nature of real-world environments, requiring time for learning a task and adapting to changes in the environment. Transfer Learning (TL) can help lower these adaptation times. In particular, there is significant potential in applying TL in multi-agent RL systems, where multiple agents can share knowledge with each other, as well as with new agents that join the system. To obtain the most from inter-agent transfer, transfer roles (i.e., determining which agents act as sources and which as targets), as well as relevant transfer content parameters (e.g., transfer size) should be selected dynamically in each particular situation. As a first step towards fully dynamic transfers, in this paper we investigate the impact of TL transfer parameters with fixed source and target roles. Specifically, we label every agent-environment interaction with the agent's epistemic confidence, and we filter the shared examples using varying threshold levels and sample sizes. We investigate the impact of these parameters in two scenarios, a standard predator-prey RL benchmark and a simulation of a ride-sharing system with 200 vehicle agents and 10,000 ride-requests.

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2111.02258 [pdf, other]

Multi-Agent Deep Reinforcement Learning For Optimising Energy Efficiency of Fixed-Wing UAV Cellular Access Points

Authors: Boris Galkin, Babatunji Omoniwa, Ivana Dusparic

Abstract: Unmanned Aerial Vehicles (UAVs) promise to become an intrinsic part of next generation communications, as they can be deployed to provide wireless connectivity to ground users to supplement existing terrestrial networks. The majority of the existing research into the use of UAV access points for cellular coverage considers rotary-wing UAV designs (i.e. quadcopters). However, we expect fixed-wing UAVs to be more appropriate for connectivity purposes in scenarios where long flight times are necessary (such as for rural coverage), as fixed-wing UAVs rely on a more energy-efficient form of flight when compared to the rotary-wing design. As fixed-wing UAVs are typically incapable of hovering in place, their deployment optimisation involves optimising their individual flight trajectories in a way that allows them to deliver high quality service to the ground users in an energy-efficient manner. In this paper, we propose a multi-agent deep reinforcement learning approach to optimise the energy efficiency of fixed-wing UAV cellular access points while still allowing them to deliver high-quality service to users on the ground. In our decentralized approach, each UAV is equipped with a Dueling Deep Q-Network (DDQN) agent which can adjust the 3D trajectory of the UAV over a series of timesteps. By coordinating with their neighbours, the UAVs adjust their individual flight trajectories in a manner that optimises the total system energy efficiency. We benchmark the performance of our approach against a series of heuristic trajectory planning strategies, and demonstrate that our method can improve the system energy efficiency by as much as 70%.

Submitted 3 November, 2021; originally announced November 2021.

arXiv:2109.14535 [pdf, other]

doi 10.1109/GLOBECOM46510.2021.9685166

Analyse or Transmit: Utilising Correlation at the Edge with Deep Reinforcement Learning

Authors: Jernej Hribar, Ryoichi Shinkuma, George Iosifidis, Ivana Dusparic

Abstract: Millions of sensors, cameras, meters, and other edge devices are deployed in networks to collect and analyse data. In many cases, such devices are powered only by Energy Harvesting (EH) and have limited energy available to analyse acquired data. When edge infrastructure is available, a device has a choice: to perform analysis locally or offload the task to other resource-rich devices such as cloudlet servers. However, such a choice carries a price in terms of consumed energy and accuracy. On the one hand, transmitting raw data can result in a higher energy cost in comparison to the required energy to process data locally. On the other hand, performing data analytics on servers can improve the task's accuracy. Additionally, due to the correlation between information sent by multiple devices, accuracy might not be affected if some edge devices decide to neither process nor send data and preserve energy instead. For such a scenario, we propose a Deep Reinforcement Learning (DRL) based solution capable of learning and adapting the policy to the time-varying energy arrival due to EH patterns. We leverage two datasets, one to model the energy an EH device can collect and the other to model the correlation between cameras. Furthermore, we compare the proposed solution's performance to three baseline policies. Our results show that we can increase accuracy by 15% in comparison to conventional approaches while preventing outages.

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2106.00845 [pdf, other]

Energy-aware optimization of UAV base stations placement via decentralized multi-agent Q-learning

Authors: Babatunji Omoniwa, Boris Galkin, Ivana Dusparic

Abstract: Unmanned aerial vehicles serving as aerial base stations (UAV-BSs) can be deployed to provide wireless connectivity to ground devices in events of increased network demand, points-of-failure in existing infrastructure, or disasters. However, it is challenging to conserve the energy of UAVs during prolonged coverage tasks, considering their limited on-board battery capacity. Reinforcement learning-based (RL) approaches have been previously used to improve energy utilization of multiple UAVs; however, a central cloud controller is assumed to have complete knowledge of the end-devices' locations, i.e., the controller periodically scans and sends updates for UAV decision-making. This assumption is impractical in dynamic network environments with UAVs serving mobile ground devices. To address this problem, we propose a decentralized Q-learning approach, where each UAV-BS is equipped with an autonomous agent that maximizes the connectivity of mobile ground devices while improving its energy utilization. Experimental results show that the proposed design significantly outperforms the centralized approaches in jointly maximizing the number of connected ground devices and the energy utilization of the UAV-BSs.

Submitted 4 November, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: Accepted paper for presentation and publication in the Proceedings of IEEE CCNC 2022, Las Vegas, USA

arXiv:2106.00654 [pdf, other]

A reinforcement learning approach to improve communication performance and energy utilization in fog-based IoT

Authors: Babatunji Omoniwa, Maxime Gueriau, Ivana Dusparic

Abstract: Recent research has shown the potential of using available mobile fog devices (such as smartphones, drones, domestic and industrial robots) as relays to minimize communication outages between sensors and destination devices, where localized Internet-of-Things services (e.g., manufacturing process control, health and security monitoring) are delivered. However, these mobile relays deplete energy when they move and transmit to distant destinations. As such, power-control mechanisms and intelligent mobility of the relay devices are critical in improving communication performance and energy utilization. In this paper, we propose a Q-learning-based decentralized approach where each mobile fog relay agent (MFRA) is controlled by an autonomous agent which uses reinforcement learning to simultaneously improve communication performance and energy utilization. Each autonomous agent learns based on the feedback from the destination and its own energy levels whether to remain active and forward the message, or become passive for that transmission phase. We evaluate the approach by comparing with the centralized approach, and observe that with a smaller number of MFRAs, our approach is able to ensure reliable delivery of data and reduce overall energy cost by 56.76% to 88.03%.

Submitted 1 June, 2021; originally announced June 2021.

Comments: Submitted and published in IEEE proceedings

arXiv:2103.06908 [pdf, ps, other]

Adaptation to Unknown Situations as the Holy Grail of Learning-Based Self-Adaptive Systems: Research Directions

Authors: Ivana Dusparic, Nicolas Cardozo

Abstract: Self-adaptive systems continuously adapt to changes in their execution environment. Capturing all possible changes to define suitable behaviour beforehand is unfeasible, or even impossible in the case of unknown changes, hence human intervention may be required. We argue that adapting to unknown situations is the ultimate challenge for self-adaptive systems. Learning-based approaches are used to learn the suitable behaviour to exhibit in the case of unknown situations, to minimize or fully remove human intervention. While such approaches can, to a certain extent, generalize existing adaptations to new situations, there are a number of breakthroughs that need to be achieved before systems can adapt to general unknown and unforeseen situations. We posit the research directions that need to be explored to achieve unanticipated adaptation from the perspective of learning-based self-adaptive systems. At minimum, systems need to define internal representations of previously unseen situations on-the-fly, extrapolate the relationship to the previously encountered situations to evolve existing adaptations, and reason about the feasibility of achieving their intrinsic goals in the new set of conditions. We close by discussing whether, even when we can, we should indeed build systems that define their own behaviour and adapt their goals, without involving a human supervisor.

Submitted 11 March, 2021; originally announced March 2021.

arXiv:2103.06757 [pdf, other]

Auto-COP: Adaptation Generation in Context-Oriented Programming using Reinforcement Learning Options

Authors: Nicolás Cardozo, Ivana Dusparic

Abstract: Self-adaptive software systems continuously adapt in response to internal and external changes in their execution environment, captured as contexts. The COP paradigm posits a technique for the development of self-adaptive systems, capturing their main characteristics with specialized programming language constructs. COP adaptations are specified as independent modules composed in and out of the base system as contexts are activated and deactivated in response to sensed circumstances from the surrounding environment. However, the definition of adaptations, their contexts and associated specialized behavior, need to be specified at design time. In complex CPS this is intractable due to new unpredicted operating conditions. We propose Auto-COP, a new technique to enable generation of adaptations at run time. Auto-COP uses RL options to build action sequences, based on the previous instances of the system execution. Options are explored in interaction with the environment, and the most suitable options for each context are used to generate adaptations exploiting COP. To validate Auto-COP, we present two case studies exhibiting different system characteristics and application domains: a driving assistant and a robot delivery system. We present examples of Auto-COP code generated at run time, to illustrate the types of circumstances (contexts) requiring adaptation, and the corresponding generated adaptations for each context. We confirm that the generated adaptations exhibit correct system behavior measured by domain-specific performance metrics, while reducing the number of required execution/actuation steps by a factor of two, showing that the adaptations are regularly selected by the running system as adaptive behavior is more appropriate than the execution of primitive actions.

Submitted 3 August, 2023; v1 submitted 11 March, 2021; originally announced March 2021.

Comments: Submitted to journal of Information and software technology. 22 pages

arXiv:2102.12899 [pdf, other]

Mobility for Cellular-Connected UAVs: challenges for the network provider

Authors: Erika Fonseca, Boris Galkin, Marvin Kelly, Luiz A. DaSilva, Ivana Dusparic

Abstract: Unmanned Aerial Vehicle (UAV) technology is becoming more prevalent and more diverse in its application. 5G and beyond networks must enable UAV connectivity. This will require the network operator to consider this new type of user in the planning and operation of the network. This work presents the challenges an operator will encounter and should consider in the future as UAVs become users of the network. We analyse the 3GPP specifications, the existing research literature, and a publicly available UAV connectivity dataset, to describe the challenges. We classify these challenges into network planning and network optimisation categories. We discuss the challenge of planning network coverage when considering coverage for flying users and the PCI collision and confusion issues that can be aggravated by these users. In discussing network optimisation challenges, we introduce Automatic Neighbouring Relation (ANR) and handover challenges, specifically the number of neighbours in the Neighbour Relation Table (NRT), and their potential deletion and block-listing, the frequent number of handovers and the possibility that the UAV disconnects because of handover issues. We discuss possible approaches to address the presented challenges and use a real-world dataset to support our findings about these challenges and their importance.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 6 pages, 4 figures

arXiv:2011.03236 [pdf, other]

Experimental Evaluation of a UAV User QoS from a Two-Tier 3.6GHz Spectrum Network

Authors: Boris Galkin, Erika Fonseca, Gavin Lee, Conor Duff, Marvin Kelly, Edward Emmanuel, Ivana Dusparic

Abstract: Unmanned Aerial Vehicle (UAV) technology is becoming increasingly used in a variety of applications such as video surveillance and deliveries. To enable safe and efficient use of UAVs, the devices will need to be connected into cellular networks. Existing research on UAV cellular connectivity shows that UAVs encounter significant issues with existing networks, such as strong interference and antenna misalignment. In this work, we perform a novel measurement campaign of the performance of a UAV user when it connects to an experimental two-tier cellular network in two different areas of Dublin city's Smart Docklands, which includes massive MIMO macrocells and wirelessly-backhauled small cells. We measure Reference Signal Received Power (RSRP), Reference Signal Received Quality (RSRQ), Signal to Interference and Noise Ratio (SINR), the downlink throughput, and the small cell handover rate. Our results show that increasing the UAV height reduces the performance in both tiers, due to issues such as antenna misalignment. The small cell tier, however, can maintain relatively stable performance across the entire range of UAV heights, suggesting that UAV users can successfully connect to small cells during their flight. Furthermore, we demonstrate that while the UAV handover rate significantly fluctuates at different heights, the overall observed handover rates are very low. Our results highlight the potential for small cells in urban areas to provide connectivity to UAVs.

Submitted 9 April, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

arXiv:2010.01126 [pdf, ps, other]

REQIBA: Regression and Deep Q-Learning for Intelligent UAV Cellular User to Base Station Association

Authors: Boris Galkin, Erika Fonseca, Ramy Amer, Luiz A. DaSilva, Ivana Dusparic

Abstract: Unmanned Aerial Vehicles (UAVs) are emerging as important users of next-generation cellular networks. By operating in the sky, UAV users experience very different radio conditions than terrestrial users, due to factors such as strong Line-of-Sight (LoS) channels (and interference) and Base Station (BS) antenna misalignment. As a consequence, the UAVs may experience significant degradation to their received quality of service, particularly when they are moving and are subject to frequent handovers. The solution is to allow the UAV to be aware of its surrounding environment, and intelligently connect into the cellular network taking advantage of this awareness. In this paper we present REgression and deep Q-learning for Intelligent UAV cellular user to Base station Association (REQIBA), a solution that allows a UAV flying over an urban area to intelligently connect to underlying BSs, using information about the received signal powers, the BS locations, and the surrounding building topology. We demonstrate how REQIBA can as much as double the total UAV throughput, when compared to heuristic association schemes similar to those commonly used by terrestrial users. We also evaluate how environmental factors such as UAV height, building density, and throughput loss due to handovers impact the performance of our solution.

Submitted 3 November, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

Comments: To appear in IEEE Transactions on Vehicular Technology (TVT)

arXiv:2007.13695 [pdf, other]

Adaptive Height Optimisation for Cellular-Connected UAVs using Reinforcement Learning

Authors: Erika Fonseca, Boris Galkin, Ramy Amer, Luiz A. DaSilva, Ivana Dusparic

Abstract: Providing reliable connectivity to cellular-connected UAVs can be very challenging; their performance highly depends on the nature of the surrounding environment, such as density and heights of the ground BSs. On the other hand, tall buildings might block undesired interference signals from ground BSs, thereby improving the connectivity between the UAVs and their serving BSs. To address the connectivity of UAVs in such environments, this paper proposes an RL algorithm to dynamically optimise the height of a UAV as it moves through the environment, with the goal of increasing the throughput or spectrum efficiency that it experiences. The proposed solution is evaluated in two settings: using a series of generated environments where we vary the number of BS and building densities, and in a scenario using real-world data obtained from an experiment in Dublin, Ireland. Results show that our proposed RL-based solution improves UAV QoS by 6% to 41%, depending on the scenario. We also conclude that, when flying at heights higher than the buildings, building density variation has no impact on UAV QoS. On the other hand, BS density can negatively impact UAV QoS, with higher numbers of BSs generating more interference and deteriorating UAV performance.

Submitted 13 April, 2022; v1 submitted 27 July, 2020; originally announced July 2020.

arXiv:1810.03679 [pdf, other]

Multi-agent Deep Reinforcement Learning for Zero Energy Communities

Authors: Amit Prasad, Ivana Dusparic

Abstract: Advances in renewable energy generation and the introduction of government targets to improve energy efficiency gave rise to the concept of a Zero Energy Building (ZEB). A ZEB is a building whose net energy usage over a year is zero, i.e., its energy use is not larger than its overall renewables generation. A collection of ZEBs forms a Zero Energy Community (ZEC). This paper addresses the problem of energy sharing in such a community. This is different from previously addressed energy sharing between buildings as our focus is on the improvement of community energy status, while traditionally research focused on reducing losses due to transmission and storage, or achieving economic gains. We model this problem in a multi-agent environment and propose a Deep Reinforcement Learning (DRL) based solution. Each building is represented by an intelligent agent that learns over time the appropriate behaviour to share energy. We have evaluated the proposed solution in a multi-agent simulation built using osBrain. Results indicate that with time agents learn to collaborate and learn a policy comparable to the optimal policy, which in turn improves the ZEC's energy status. Buildings with no renewables preferred to request energy from their neighbours rather than from the supply grid.

Submitted 27 June, 2019; v1 submitted 8 October, 2018; originally announced October 2018.

Comments: Accepted at ISGT Europe 2019

MSC Class: 97R40 ACM Class: I.2.11; I.2.6

arXiv:1805.09090 [pdf, other]

Volunteers in the Smart City: Comparison of Contribution Strategies on Human-Centered Measures

Authors: Stefano Bennati, Ivana Dusparic, Rhythima Shinde, Catholijn M. Jonker

Abstract: Several smart city services rely on user contributions, e.g., data, which can be costly for the users in terms of privacy. High costs lead to reduced user participation, which undermines the success of smart city technologies. This work develops a scenario-independent design principle, based on public good theory, for resource management in smart city applications, where provision of a service depends on contributors and free-riders, who benefit from the service without contributing their own resources. Following this design principle, different classes of algorithms for resource management are evaluated with respect to human-centered measures, i.e., privacy, fairness and social welfare. Trade-offs that characterize algorithms are discussed across two smart city application scenarios. These results might help Smart City application designers to choose a suitable algorithm given a scenario-specific set of requirements, and users to choose a service based on an algorithm that matches their preferences.

Submitted 23 May, 2018; originally announced May 2018.

arXiv:1409.4561 [pdf, other]

Decentralised Multi-Agent Reinforcement Learning for Dynamic and Uncertain Environments

Authors: Andrei Marinescu, Ivana Dusparic, Adam Taylor, Vinny Cahill, Siobhán Clarke

Abstract: Multi-Agent Reinforcement Learning (MARL) is a widely used technique for optimization in decentralised control problems. However, most applications of MARL are in static environments, and are not suitable when agent behaviour and environment conditions are dynamic and uncertain. Addressing uncertainty in such environments remains a challenging problem for MARL-based systems. The dynamic nature of the environment causes previous knowledge of how agents interact to become outdated. Advance knowledge of potential changes through prediction significantly supports agents converging to near-optimal control solutions. In this paper we propose P-MARL, a decentralised MARL algorithm enhanced by a prediction mechanism that provides accurate information regarding upcoming changes in the environment. This prediction is achieved by employing an Artificial Neural Network combined with a Self-Organising Map that detects and matches changes in the environment. The proposed algorithm is validated in a realistic smart-grid scenario, and provides a 92% Pareto efficient solution to an electric vehicle charging problem.

Submitted 16 September, 2014; originally announced September 2014.

Comments: 7 pages, 7 figures, 1 Table, 1 algorithm, conference

Showing 1–37 of 37 results for author: Dusparic, I