Search | arXiv e-print repository

UCB-driven Utility Function Search for Multi-objective Reinforcement Learning

Authors: Yucheng Shi, Alexandros Agapitos, David Lynch, Giorgio Cruciata, Cengis Hasan, Hao Wang, Yayu Yao, Aleksandar Milenovic

Abstract: In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours that trade-off between multiple, possibly conflicting, objectives. MORL based on decomposition is a family of solution methods that employ a number of utility functions to decompose the multi-objective problem into individual single-objective problems solved simultaneously in order to appr… ▽ More In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours that trade-off between multiple, possibly conflicting, objectives. MORL based on decomposition is a family of solution methods that employ a number of utility functions to decompose the multi-objective problem into individual single-objective problems solved simultaneously in order to approximate a Pareto front of policies. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process, with the aim of maximising the hypervolume of the resulting Pareto front. The proposed method is shown to outperform various MORL baselines on Mujoco benchmark problems across different random seeds. The code is online at: https://github.com/SYCAMORE-1/ucb-MOPPO. △ Less

Submitted 16 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.19462 [pdf, other]

Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation

Authors: Cengis Hasan, Alexandros Agapitos, David Lynch, Alberto Castagna, Giorgio Cruciata, Hao Wang, Aleksandar Milenovic

Abstract: We present a method that addresses the pain point of long lead-time required to deploy cell-level parameter optimisation policies to new wireless network sites. Given a sequence of action spaces represented by overlapping subsets of cell-level configuration parameters provided by domain experts, we formulate throughput optimisation as Continual Reinforcement Learning of control policies. Simulatio… ▽ More We present a method that addresses the pain point of long lead-time required to deploy cell-level parameter optimisation policies to new wireless network sites. Given a sequence of action spaces represented by overlapping subsets of cell-level configuration parameters provided by domain experts, we formulate throughput optimisation as Continual Reinforcement Learning of control policies. Simulation results suggest that the proposed system is able to shorten the end-to-end deployment lead-time by two-fold compared to a reinitialise-and-retrain baseline without any drop in optimisation gain. △ Less

Submitted 30 April, 2024; originally announced April 2024.

Comments: Published at ECML 2023

arXiv:2111.08587 [pdf, other]

Offline Contextual Bandits for Wireless Network Optimization

Authors: Miguel Suau, Alexandros Agapitos, David Lynch, Derek Farrell, Mingqi Zhou, Aleksandar Milenovic

Abstract: The explosion in mobile data traffic together with the ever-increasing expectations for higher quality of service call for the development of AI algorithms for wireless network optimization. In this paper, we investigate how to learn policies that can automatically adjust the configuration parameters of every cell in the network in response to the changes in the user demand. Our solution combines… ▽ More The explosion in mobile data traffic together with the ever-increasing expectations for higher quality of service call for the development of AI algorithms for wireless network optimization. In this paper, we investigate how to learn policies that can automatically adjust the configuration parameters of every cell in the network in response to the changes in the user demand. Our solution combines existent methods for offline learning and adapts them in a principled way to overcome crucial challenges arising in this context. Empirical results suggest that our proposed method will achieve important performance gains when deployed in the real network while satisfying practical constrains on computational efficiency. △ Less

Submitted 11 November, 2021; originally announced November 2021.

Showing 1–3 of 3 results for author: Milenovic, A