Some Supervision Required: Incorporating Oracle Policies in Reinforcement Learning via Epistemic Uncertainty Metrics

Authors: Jun Jet Tai, Jordan K. Terry, Mauro S. Innocente, James Brusey, Nadjim Horri

Abstract: An inherent problem of reinforcement learning is performing exploration of an environment through random actions, of which a large portion can be unproductive. Instead, exploration can be improved by initializing the learning policy with an existing (previously learned or hard-coded) oracle policy, offline data, or demonstrations. In the case of using an oracle policy, it can be unclear how best to incorporate the oracle policy's experience into the learning policy in a way that maximizes learning sample efficiency. In this paper, we propose a method termed Critic Confidence Guided Exploration (CCGE) for incorporating such an oracle policy into standard actor-critic reinforcement learning algorithms. More specifically, CCGE takes in the oracle policy's actions as suggestions and incorporates this information into the learning scheme when uncertainty is high, while ignoring it when the uncertainty is low. CCGE is agnostic to methods of estimating uncertainty, and we show that it is equally effective with two different techniques. Empirically, we evaluate the effect of CCGE on various benchmark reinforcement learning tasks, and show that this idea can lead to improved sample efficiency and final performance. Furthermore, when evaluated on sparse reward environments, CCGE is able to perform competitively against adjacent algorithms that also leverage an oracle policy. Our experiments show that it is possible to utilize uncertainty as a heuristic to guide exploration using an oracle in reinforcement learning.
We expect that this will inspire more research in this direction, where various heuristics are used to determine the direction of guidance provided to learning.
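
The gating idea in this abstract (follow the oracle's suggestion when uncertainty is high, ignore it when uncertainty is low) can be sketched as below. This is only an illustration under stated assumptions, not the CCGE implementation: the hard threshold, the use of disagreement across a critic ensemble as the uncertainty estimate, and the names `choose_action`, `critics`, and `threshold` are all hypothetical. The abstract itself says the method is agnostic to how uncertainty is estimated.

```python
from statistics import pstdev


def choose_action(state, learner_policy, oracle_policy, critics, threshold):
    """Pick the oracle's action only when the critics disagree strongly.

    Assumption: epistemic uncertainty is approximated by the population
    standard deviation of Q-value estimates from several critics.
    """
    learner_action = learner_policy(state)
    # Each critic is a callable q(state, action) -> float (hypothetical interface).
    q_values = [q(state, learner_action) for q in critics]
    uncertainty = pstdev(q_values)
    if uncertainty > threshold:
        # High uncertainty: defer to the oracle's suggestion.
        return oracle_policy(state), uncertainty
    # Low uncertainty: trust the learning policy.
    return learner_action, uncertainty
```

A hard threshold is the simplest possible gate; a smoother rule would fit the same description.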

Submitted 21 August, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: Under review at TMLR

arXiv:2109.10761 [pdf, other]

Stigmergy-based collision-avoidance algorithm for self-organising swarms

Authors: Paolo Grasso, Mauro Sebastián Innocente

Abstract: Real-time multi-agent collision-avoidance algorithms comprise a key enabling technology for the practical use of self-organising swarms of drones. This paper proposes a decentralised reciprocal collision-avoidance algorithm, which is based on stigmergy and scalable. The algorithm is computationally inexpensive, based on the gradient of the locally measured dynamic cumulative signal strength field which results from the signals emitted by the swarm. The signal strength acts as a repulsor on each drone, which then tends to steer away from the noisiest regions (cluttered environment), thus avoiding collisions. The magnitudes of these repulsive forces can be tuned to control the relative importance assigned to collision avoidance with respect to the other phenomena affecting the agent's dynamics. We carried out numerical experiments on a self-organising swarm of drones aimed at fighting wildfires autonomously. As expected, it has been found that the collision rate can be reduced either by decreasing the cruise speed of the agents and/or by increasing the sampling frequency of the global signal strength field. A convenient by-product of the proposed collision-avoidance algorithm is that it helps maintain diversity in the swarm, thus enhancing exploration.
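
The gradient-based repulsion described in this abstract can be illustrated with a small sketch. It assumes a two-dimensional scalar signal-strength field that can be sampled near the drone, a central finite-difference gradient estimate, and a tunable `gain`. The function name, the sampling step `h`, and the field interface are hypothetical and are not taken from the paper.

```python
def repulsive_force(field, position, gain=1.0, h=1e-3):
    """Return a 2-D force pointing away from regions of stronger signal.

    field:    callable field(x, y) -> float, the locally measured signal strength
    position: (x, y) of the drone
    gain:     tunable magnitude of the repulsion (assumed linear scaling)
    h:        finite-difference step (assumption)
    """
    x, y = position
    # Central differences approximate the gradient of the field.
    dfdx = (field(x + h, y) - field(x - h, y)) / (2.0 * h)
    dfdy = (field(x, y + h) - field(x, y - h)) / (2.0 * h)
    # Steer down the gradient, i.e. away from the noisiest region.
    return (-gain * dfdx, -gain * dfdy)
```

For a field that peaks at the origin, a drone at (1, 0) is pushed in the +x direction, away from the peak.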

Submitted 22 September, 2021; originally announced September 2021.

Comments: Accepted for publication in Proceedings of the 5th International Conference on Computational Vision and Bio Inspired Computing. To be published in Springer's Advances in Intelligent Systems and Computing

arXiv:2104.12475 [pdf, other]

Particle Swarms Reformulated towards a Unified and Flexible Framework

Authors: Mauro Sebastián Innocente

Abstract: The Particle Swarm Optimisation (PSO) algorithm has undergone countless modifications and adaptations since its original formulation in 1995. Some of these have become mainstream whereas many others have not been adopted and faded away. Thus, a myriad of alternative formulations have been proposed to the extent that the question arises as to what the basic features of an algorithm must be to belong in the PSO family. The aim of this paper is to establish what defines a PSO algorithm and to attempt to formulate it in such a way that it encompasses many existing variants. Therefore, different versions of the method may be posed as settings within the proposed unified framework. In addition, the proposed formulation generalises, decouples and incorporates features to the method providing more flexibility to the behaviour of each particle. The closed forms of the trajectory difference equation are obtained, different types of behaviour are identified, stochasticity is decoupled, and traditionally global features such as sociometries and constraint-handling are re-defined as particle's attributes.

Submitted 26 April, 2021; originally announced April 2021.

Comments: Preprint. The final authenticated article will be published by Springer-Nature in the Lecture Notes in Computer Science series. This research will be presented at the Twelfth International Conference on Swarm Intelligence (ICSI 2021)

Journal ref: Lecture Notes in Computer Science, ICSI 2021

arXiv:2101.11944 [pdf]

Coefficients' Settings in Particle Swarm Optimization: Insight and Guidelines

Authors: Mauro S. Innocente, Johann Sienz

Abstract: Particle Swarm Optimization is a population-based and gradient-free optimization method developed by mimicking social behaviour observed in nature. Its ability to optimize is not specifically implemented but emerges in the global level from local interactions. In its canonical version, there are three factors that govern a particle's trajectory: 1) inertia from its previous displacement; 2) attraction to its best experience; and 3) attraction to a given neighbour's best experience. The importance given to each of these factors is regulated by three coefficients: 1) the inertia; 2) the individuality; and 3) the sociality weights. Their settings rule the trajectory of the particle when pulled by these two attractors. Different speeds and forms of convergence of a particle towards its attractor(s) take place for different settings of the coefficients. A more general formulation is presented aiming for a better control of the embedded randomness. Guidelines to select the coefficients' settings to obtain the desired behaviour are offered. The convergence speed of the algorithm also depends on the speed of spread of information within the swarm. The latter is governed by the structure of the neighbourhood, whose study is beyond the scope of this paper.
The objective here is to help understand the core of the PSO paradigm from the bottom up by offering some insight into the form of the particles' trajectories, and to provide some guidelines as to how to decide upon the settings of the coefficients in the particles' velocity update equation in the proposed formulation to obtain the type of behaviour desired for the problem at hand. General-purpose settings are also suggested. The relationship between the proposed formulation and both the classical and constricted PSO formulations is also provided.
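
The three factors named in this abstract (inertia, attraction to the particle's own best experience, attraction to a neighbour's best experience) correspond to the three terms of the widely used classical PSO update, sketched below. This is the textbook form rather than the more general formulation the paper proposes. The default values `w=0.7298` and `c_ind=c_soc=1.49618` are commonly quoted general-purpose settings and are not taken from this paper; the function and parameter names are illustrative.

```python
import random


def pso_step(x, v, pbest, nbest, w=0.7298, c_ind=1.49618, c_soc=1.49618, rng=random):
    """Advance one particle by one time-step (classical PSO, per-dimension).

    x, v:   current position and velocity (lists of floats)
    pbest:  the particle's own best position so far
    nbest:  the best position known to the particle's neighbourhood
    w:      inertia weight
    c_ind:  individuality weight
    c_soc:  sociality weight
    """
    new_x, new_v = [], []
    for xi, vi, pi, ni in zip(x, v, pbest, nbest):
        r1 = rng.random()  # fresh random weight for the individual term
        r2 = rng.random()  # fresh random weight for the social term
        vel = w * vi + c_ind * r1 * (pi - xi) + c_soc * r2 * (ni - xi)
        new_v.append(vel)
        new_x.append(xi + vel)
    return new_x, new_v
```

When a particle sits exactly on both attractors, only the inertia term remains, so its velocity shrinks by a factor of `w` per step whenever `w` is below 1.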

Submitted 28 January, 2021; originally announced January 2021.

Comments: Preprint submitted to E. Dvorkin, M. Goldschmit, & M. Storti (Eds.), Mecánica Computacional: Computational Intelligence Techniques for Optimization and Data Modeling (B) (Vol. XXIX, pp. 9253-9269). Asociación Argentina de Mecánica Computacional, Buenos Aires, Argentina, 2010. Open access published version here: https://cimec.org.ar/ojs/index.php/mc/article/view/3666

arXiv:2101.11441 [pdf]

doi 10.4203/ccp.93.123

Pseudo-Adaptive Penalization to Handle Constraints in Particle Swarm Optimizers

Authors: Mauro S. Innocente, Johann Sienz

Abstract: The penalization method is a popular technique to provide particle swarm optimizers with the ability to handle constraints. The downside is the need of penalization coefficients whose settings are problem-specific. While adaptive coefficients can be found in the literature, a different adaptive scheme is proposed in this paper, where coefficients are kept constant. A pseudo-adaptive relaxation of the tolerances for constraint violations while penalizing only violations beyond such tolerances results in a pseudo-adaptive penalization. A particle swarm optimizer is tested on a suite of benchmark problems for three types of tolerance relaxation: no relaxation; self-tuned initial relaxation with deterministic decrease; and self-tuned initial relaxation with pseudo-adaptive decrease. Other authors' results are offered as frames of reference.
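
The idea of penalizing only violations beyond a tolerance, with a constant coefficient, can be sketched as follows. It assumes inequality constraints written as g(x) <= 0 and an additive, linear penalty. The penalty form, the default `coeff`, and the function name are assumptions for illustration, and the schedule by which the paper relaxes and tightens the tolerance is not reproduced here.

```python
def penalized_objective(f, constraints, x, tolerance=0.0, coeff=1e6):
    """Objective value plus a penalty on constraint violations beyond `tolerance`.

    f:           objective function, f(x) -> float (to be minimized)
    constraints: list of callables g with g(x) <= 0 meaning "satisfied"
    tolerance:   violations up to this amount are not penalized
    coeff:       constant penalization coefficient (assumed value)
    """
    penalty = 0.0
    for g in constraints:
        # Only the part of the violation exceeding the tolerance counts.
        excess = max(0.0, g(x) - tolerance)
        penalty += coeff * excess
    return f(x) + penalty
```

With a relaxed tolerance a slightly infeasible point is scored by its objective alone; tightening the tolerance towards zero over the run gradually restores the original constraints.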

Submitted 25 January, 2021; originally announced January 2021.

Comments: Preprint submitted to Proceedings of the tenth International Conference on Computational Structures Technology

arXiv:2101.11439 [pdf]

doi 10.4203/csets.26.10

Individual and Social Behaviour in Particle Swarm Optimizers

Authors: Johann Sienz, Mauro S. Innocente

Abstract: Three basic factors govern the individual behaviour of a particle: the inertia from its previous displacement; the attraction to its own best experience; and the attraction to a given neighbour's best experience. The importance awarded to each factor is controlled by three coefficients: the inertia; the individuality; and the sociality weights. The social behaviour is ruled by the structure of the social network, which defines the neighbours that are to inform of their experiences to a given particle. This paper presents a study of the influence of different settings of the coefficients as well as of the combined effect of different settings and different neighbourhood topologies on the speed and form of convergence.

Submitted 25 January, 2021; originally announced January 2021.

Comments: Preprint submitted to Developments and Applications in Engineering Computational Technology

arXiv:2101.11096 [pdf]

doi 10.4203/csets.20.6

Particle Swarm Optimization: Fundamental Study and its Application to Optimization and to Jetty Scheduling Problems

Authors: Johann Sienz, Mauro S. Innocente

Abstract: The advantages of evolutionary algorithms with respect to traditional methods have been greatly discussed in the literature. While particle swarm optimizers share such advantages, they outperform evolutionary algorithms in that they require lower computational cost and easier implementation, involving no operator design and few coefficients to be tuned. However, even marginal variations in the settings of these coefficients greatly influence the dynamics of the swarm. Since this paper does not intend to study their tuning, general-purpose settings are taken from previous studies, and virtually the same algorithm is used to optimize a variety of notably different problems. Thus, following a review of the paradigm, the algorithm is tested on a set of benchmark functions and engineering problems taken from the literature. Later, complementary lines of code are incorporated to adapt the method to combinatorial optimization as it occurs in scheduling problems, and a real case is solved using the same optimizer with the same settings. The aim is to show the flexibility and robustness of the approach, which can handle a wide variety of problems.

Submitted 24 January, 2021; originally announced January 2021.

Comments: Preprint submitted to Trends in Engineering Computational Technology. arXiv admin note: text overlap with arXiv:2101.10933

arXiv:2101.10936 [pdf]

Combining Particle Swarm Optimizer with SQP Local Search for Constrained Optimization Problems

Authors: Carwyn Pelley, Mauro S. Innocente, Johann Sienz

Abstract: The combining of a General-Purpose Particle Swarm Optimizer (GP-PSO) with a Sequential Quadratic Programming (SQP) algorithm for constrained optimization problems has been shown to be highly beneficial to the refinement, and in some cases, the success of finding a global optimum solution. It is shown that the likely difference between leading algorithms is in their local search ability. A comparison with other leading optimizers on the tested benchmark suite indicates that the hybrid GP-PSO with implemented local search competes alongside other leading PSO algorithms.

Submitted 25 January, 2021; originally announced January 2021.

Comments: Preprint submitted to the 8th ASMO UK Conference on Engineering Design Optimization

arXiv:2101.10935 [pdf]

Numerical Comparison of Neighbourhood Topologies in Particle Swarm Optimization

Authors: Mauro S. Innocente, Johann Sienz

Abstract: Particle Swarm Optimization is a global optimizer in the sense that it has the ability to escape poor local optima. However, if the spread of information within the population is not adequately performed, premature convergence may occur. The convergence speed and hence the reluctance of the algorithm to get trapped in suboptimal solutions are controlled by the settings of the coefficients in the velocity update equation as well as by the neighbourhood topology. The coefficients' settings govern the trajectories of the particles towards the good locations identified, whereas the neighbourhood topology controls the form and speed of spread of information within the population (i.e. the update of the social attractor). Numerous neighbourhood topologies have been proposed and implemented in the literature. This paper offers a numerical comparison of the performances exhibited by five different neighbourhood topologies combined with four different coefficients' settings when optimizing a set of benchmark unconstrained problems. Despite the optimum topology being problem-dependent, it appears that dynamic neighbourhoods with the number of interconnections increasing as the search progresses should be preferred for a non-problem-specific optimizer.

Submitted 25 January, 2021; originally announced January 2021.

Comments: Preprint submitted to the 8th ASMO UK Conference on Engineering Design Optimization

arXiv:2101.10933 [pdf]

Constraint-Handling Techniques for Particle Swarm Optimization Algorithms

Authors: Mauro S. Innocente, Johann Sienz

Abstract: Population-based methods can cope with a variety of different problems, including problems of remarkably higher complexity than those traditional methods can handle. The main procedure consists of successively updating a population of candidate solutions, performing a parallel exploration instead of traditional sequential exploration. While the origins of the PSO method are linked to bird flock simulations, it is a stochastic optimization method in the sense that it relies on random coefficients to introduce creativity, and a bottom-up artificial intelligence-based approach in the sense that its intelligent behaviour emerges in a higher level than the individuals' rather than deterministically programmed. As opposed to EAs, the PSO involves no operator design and few coefficients to be tuned. Since this paper does not intend to study such tuning, general-purpose settings are taken from previous studies. The PSO algorithm requires the incorporation of some technique to handle constraints. A popular one is the penalization method, which turns the original constrained problem into unconstrained by penalizing infeasible solutions. Other techniques can be specifically designed for PSO. Since these strategies present advantages and disadvantages when compared to one another, there is no obvious best constraint-handling technique (CHT) for all problems. The aim here is to develop and compare different CHTs suitable for PSOs, which are incorporated to an algorithm with general-purpose settings.
The comparisons are performed keeping the remaining features of the algorithm the same, while comparisons to other authors' results are offered as a frame of reference for the optimizer as a whole. Thus, the penalization, preserving feasibility and bisection methods are discussed, implemented, and tested on two suites of benchmark problems. Three neighbourhood sizes are also considered in the experiments.

Submitted 24 January, 2021; originally announced January 2021.

Comments: Preprint submitted to the 7th ASMO UK Conference on Engineering Design Optimization

arXiv:2101.10901 [pdf]

Population-Based Methods: PARTICLE SWARM OPTIMIZATION -- Development of a General-Purpose Optimizer and Applications

Authors: Mauro S. Innocente

Abstract: This thesis is concerned with continuous, static, and single-objective optimization problems subject to inequality constraints. Nevertheless, some methods to handle other kinds of problems are briefly reviewed. The particle swarm optimization paradigm was inspired by previous simulations of the cooperative behaviour observed in social beings. It is a bottom-up, randomly weighted, population-based method whose ability to optimize emerges from local, individual-to-individual interactions. As opposed to traditional methods, it can deal with different problems with little or no adaptation because it does not profit from problem-specific features of the problem at issue but performs a parallel, cooperative exploration of the search-space by means of a population of individuals. The main goal of this thesis consists of developing an optimizer that can perform reasonably well on most problems. Hence, the influence of the settings of the algorithm's parameters on the behaviour of the system is studied, some general-purpose settings are sought, and some variations to the canonical version are proposed aiming to turn it into a more general-purpose optimizer. Since no termination condition is included in the canonical version, this thesis is also concerned with the design of some stopping criteria which allow the iterative search to be terminated if further significant improvement is unlikely, or if a certain number of time-steps are reached.
In addition, some constraint-handling techniques are incorporated into the canonical algorithm to handle inequality constraints. Finally, the capabilities of the proposed general-purpose optimizers are illustrated by optimizing a few benchmark problems.
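
The two termination conditions mentioned in this abstract (stop when further significant improvement is unlikely, or when a maximum number of time-steps is reached) could look like the sketch below. It assumes minimization and treats "improvement unlikely" as "the best value improved by less than `min_improvement` over the last `window` steps". That rule, its default values, and the function name are assumptions and not the stopping criteria designed in the thesis.

```python
def should_stop(best_history, max_steps, window=50, min_improvement=1e-8):
    """Decide whether to terminate an iterative minimization.

    best_history:    best objective value found so far, one entry per time-step
    max_steps:       hard limit on the number of time-steps
    window:          how many recent steps to look back over (assumption)
    min_improvement: smallest decrease still counted as progress (assumption)
    """
    steps_done = len(best_history)
    if steps_done >= max_steps:
        return True  # step budget exhausted
    if steps_done > window:
        # Decrease of the best value over the last `window` steps.
        improvement = best_history[-window - 1] - best_history[-1]
        return improvement < min_improvement
    return False  # too early to judge stagnation
```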

Submitted 25 January, 2021; originally announced January 2021.

Comments: MSc Thesis

arXiv:2101.10326 [pdf]

A Study of the Fundamental Parameters of Particle Swarm Optimizers

Authors: Mauro S. Innocente, Johann Sienz

Abstract: The range of applications of traditional optimization methods is limited by the features of the object variables, and of both the objective and the constraint functions. In contrast, population-based algorithms whose optimization capabilities are emergent properties, such as evolutionary algorithms and particle swarm optimization, present almost no restriction on those features and can handle different optimization problems with few or no adaptations. Their main drawbacks consist of their comparatively higher computational cost and difficulty in handling equality constraints. The particle swarm optimization method is sometimes viewed as an evolutionary algorithm because of their many similarities, despite not being inspired by the same metaphor: they evolve a population of individuals taking into account previous experiences and using stochastic operators to introduce new responses. The advantages of evolutionary algorithms with respect to traditional methods have been greatly discussed in the literature for decades. While the particle swarm optimizers share such advantages, their main desirable features when compared to evolutionary algorithms are their lower computational cost and easier implementation, involving no operator design and few parameters to be tuned. However, even slight modifications of these parameters greatly influence the dynamics of the swarm. This paper deals with the effect of the settings of the parameters of the particles' velocity update equation on the behaviour of the system.

Submitted 24 January, 2021; originally announced January 2021.

Comments: submitted to the 7th World Congress on Structural and Multidisciplinary Optimization, COEX Seoul, 21 May - 25 May 2007, Korea. arXiv admin note: substantial text overlap with arXiv:2101.09835

arXiv:2101.09974 [pdf]

Optimal Flexural Design of FRP-Reinforced Concrete Beams Using a Particle Swarm Optimizer

Authors: M. S. Innocente, Ll. Torres, X. Cahís, G. Barbeta, A. Catalán

Abstract: The design of the cross-section of an FRP-reinforced concrete beam is an iterative process of estimating both its dimensions and the reinforcement ratio, followed by the check of the compliance of a number of strength and serviceability constraints. The process continues until a suitable solution is found. Since there are infinite solutions to the problem, it appears convenient to define some optimality criteria so as to measure the relative goodness of the different solutions. This paper intends to develop a preliminary least-cost section design model that follows the recommendations in the ACI 440.1 R-06, and uses a relatively new artificial intelligence technique called particle swarm optimization (PSO) to handle the optimization tasks. The latter is based on the intelligence that emerges from the low-level interactions among a number of relatively non-intelligent individuals within a population.

Submitted 25 January, 2021; originally announced January 2021.

Comments: Submitted to FRPRCS-8, University of Patras, Patras, Greece, July 16-18, 2007

arXiv:2101.09835 [pdf]

Particle Swarm Optimization: Development of a General-Purpose Optimizer

Authors: Mauro S. Innocente, Johann Sienz

Abstract: Traditional methods present a very restrictive range of applications, mainly limited by the features of the function to be optimized and of the constraint functions. In contrast, evolutionary algorithms present almost no restriction to the features of these functions, although the most appropriate constraint-handling technique is still an open question. The particle swarm optimization (PSO) method is sometimes viewed as another evolutionary algorithm because of their many similarities, despite not being inspired by the same metaphor. Namely, they evolve a population of individuals taking into consideration previous experiences and using stochastic operators to introduce new responses. The advantages of evolutionary algorithms with respect to traditional methods have been greatly discussed in the literature for decades. While all such advantages are valid when comparing the PSO paradigm to traditional methods, its main advantages with respect to evolutionary algorithms consist of its noticeably lower computational cost and easier implementation. In fact, the plain version can be programmed in a few lines of code, involving no operator design and few parameters to be tuned.
This paper deals with three important aspects of the method: the influence of the parameters' tuning on the behaviour of the system; the design of stopping criteria so that the reliability of the solution found can be somehow estimated and computational cost can be saved; and the development of appropriate techniques to handle constraints, given that the original method is designed for unconstrained optimization problems.

Submitted 24 January, 2021; originally announced January 2021.

Comments: 6th ASMO UK / ISSMO conference. Oxford, 3rd-4th July 2006

arXiv:2012.07968 [pdf, other]

FasteNet: A Fast Railway Fastener Detector

Authors: Jun Jet Tai, Mauro S. Innocente, Owais Mehmood

Abstract: In this work, a novel high-speed railway fastener detector is introduced. This fully convolutional network, dubbed FasteNet, foregoes the notion of bounding boxes and performs detection directly on a predicted saliency map. FasteNet uses transposed convolutions and skip connections; the effective receptive field of the network is 1.5$\times$ larger than the average size of a fastener, enabling the network to make predictions with high confidence, without sacrificing output resolution. In addition, due to the saliency map approach, the network is able to vote for the presence of a fastener up to 30 times per fastener, boosting prediction accuracy. FasteNet is capable of running at 110 FPS on an Nvidia GTX 1080, while taking in inputs of 1600$\times$512 with an average of 14 fasteners per image. Our source is open here: https://github.com/jjshoots/DL_FasteNet.git

Submitted 14 December, 2020; originally announced December 2020.

Comments: 8 pages

Showing 1–15 of 15 results for author: Innocente, M S