-
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Authors:
Andrei Kucharavy,
Rachid Guerraoui,
Ljiljana Dolamic
Abstract:
Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Arti…
▽ More
Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Artificial Neural Networks (ANNs), leading to better machine learning algorithms. However, SGD is not applicable in a non-differentiable setting, leaving all that prior research off the table.
In this paper, we show that a class of evolutionary algorithms (EAs) inspired by the Gillespie-Orr Mutational Landscapes model for natural evolution is formally equivalent to SGD in certain settings and, in practice, is well adapted to large ANNs. We refer to such EAs as Gillespie-Orr EA class (GO-EAs) and empirically show how an insight transfer from SGD can work for them. We then show that for ANNs trained to near-optimality or in the transfer learning setting, the equivalence also allows transferring the insights from the Mutational Landscapes model to SGD.
We then leverage this equivalence to experimentally show how SGD and GO-EAs can provide mutual insight through examples of minima flatness, transfer learning, and mixing of individuals in EAs applied to large models.
△ Less
Submitted 20 May, 2023;
originally announced June 2023.
-
Host-Pathongen Co-evolution Inspired Algorithm Enables Robust GAN Training
Authors:
Andrei Kucharavy,
El Mahdi El Mhamdi,
Rachid Guerraoui
Abstract:
Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained one against each other. The outputs from a generator are mixed with the real-world inputs to the discriminator and both networks are trained until an equilibrium is reached, where the discriminator cannot distinguish generated inputs from real ones. Since their introduction, GANs have allowed for the ge…
▽ More
Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained one against each other. The outputs from a generator are mixed with the real-world inputs to the discriminator and both networks are trained until an equilibrium is reached, where the discriminator cannot distinguish generated inputs from real ones. Since their introduction, GANs have allowed for the generation of impressive imitations of real-life films, images and texts, whose fakeness is barely noticeable to humans. Despite their impressive performance, training GANs remains to this day more of an art than a reliable procedure, in a large part due to training process stability. Generators are susceptible to mode dropping and convergence to random patterns, which have to be mitigated by computationally expensive multiple restarts. Curiously, GANs bear an uncanny similarity to a co-evolution of a pathogen and its host's immune system in biology. In a biological context, the majority of potential pathogens indeed never make it and are kept at bay by the hots' immune system. Yet some are efficient enough to present a risk of a serious condition and recurrent infections. Here, we explore that similarity to propose a more robust algorithm for GANs training. We empirically show the increased stability and a better ability to generate high-quality images while using less computational power.
△ Less
Submitted 9 June, 2020; v1 submitted 22 May, 2020;
originally announced June 2020.
-
Learning to Gather without Communication
Authors:
El Mahdi El Mhamdi,
Rachid Guerraoui,
Alexandre Maurer,
Vladislav Tempez
Abstract:
A standard belief on emerging collective behavior is that it emerges from simple individual rules. Most of the mathematical research on such collective behavior starts from imperative individual rules, like always go to the center. But how could an (optimal) individual rule emerge during a short period within the group lifetime, especially if communication is not available. We argue that such rule…
▽ More
A standard belief on emerging collective behavior is that it emerges from simple individual rules. Most of the mathematical research on such collective behavior starts from imperative individual rules, like always go to the center. But how could an (optimal) individual rule emerge during a short period within the group lifetime, especially if communication is not available. We argue that such rules can actually emerge in a group in a short span of time via collective (multi-agent) reinforcement learning, i.e learning via rewards and punishments. We consider the gathering problem: several agents (social animals, swarming robots...) must gather around a same position, which is not determined in advance. They must do so without communication on their planned decision, just by looking at the position of other agents. We present the first experimental evidence that a gathering behavior can be learned without communication in a partially observable environment. The learned behavior has the same properties as a self-stabilizing distributed algorithm, as processes can gather from any initial state (and thus tolerate any transient failure). Besides, we show that it is possible to tolerate the brutal loss of up to 90\% of agents without significant impact on the behavior.
△ Less
Submitted 21 February, 2018;
originally announced February 2018.
-
When Neurons Fail
Authors:
El Mahdi El Mhamdi,
Rachid Guerraoui
Abstract:
We view a neural network as a distributed system of which neurons can fail independently, and we evaluate its robustness in the absence of any (recovery) learning phase. We give tight bounds on the number of neurons that can fail without harming the result of a computation. To determine our bounds, we leverage the fact that neural activation functions are Lipschitz-continuous. Our bound is on a qu…
▽ More
We view a neural network as a distributed system of which neurons can fail independently, and we evaluate its robustness in the absence of any (recovery) learning phase. We give tight bounds on the number of neurons that can fail without harming the result of a computation. To determine our bounds, we leverage the fact that neural activation functions are Lipschitz-continuous. Our bound is on a quantity, we call the \textit{Forward Error Propagation}, capturing how much error is propagated by a neural network when a given number of components is failing, computing this quantity only requires looking at the topology of the network, while experimentally assessing the robustness of a network requires the costly experiment of looking at all the possible inputs and testing all the possible configurations of the network corresponding to different failure situations, facing a discouraging combinatorial explosion.
We distinguish the case of neurons that can fail and stop their activity (crashed neurons) from the case of neurons that can fail by transmitting arbitrary values (Byzantine neurons). Interestingly, as we show in the paper, our bound can easily be extended to the case where synapses can fail.
We show how our bound can be leveraged to quantify the effect of memory cost reduction on the accuracy of a neural network, to estimate the amount of information any neuron needs from its preceding layer, enabling thereby a boosting scheme that prevents neurons from waiting for unnecessary signals. We finally discuss the trade-off between neural networks robustness and learning cost.
△ Less
Submitted 27 June, 2017;
originally announced June 2017.