-
Bayesian continual learning and forgetting in neural networks
Authors:
Djohan Bonnet,
Kellian Cottart,
Tifenn Hirtzlin,
Tarcisius Januel,
Thomas Dalgaty,
Elisa Vianello,
Damien Querlioz
Abstract:
Biological synapses effortlessly balance memory retention and flexibility, yet artificial neural networks still struggle with the extremes of catastrophic forgetting and catastrophic remembering. Here, we introduce Metaplasticity from Synaptic Uncertainty (MESU), a Bayesian framework that updates network parameters according to their uncertainty. This approach allows a principled combination of learning and forgetting that ensures critical knowledge is preserved while unused or outdated information is gradually released. Unlike standard Bayesian approaches, which risk becoming overly constrained, and popular continual-learning methods, which rely on explicit task boundaries, MESU seamlessly adapts to streaming data. It further provides reliable epistemic uncertainty estimates, enabling out-of-distribution detection; the only additional computational cost is sampling the weights multiple times to obtain proper output statistics. Experiments on image-classification benchmarks demonstrate that MESU mitigates catastrophic forgetting while maintaining plasticity for new tasks. When trained on 200 sequential permuted MNIST tasks, MESU outperforms established continual learning techniques in terms of accuracy, capability to learn additional tasks, and out-of-distribution data detection. Additionally, because it does not rely on task boundaries, MESU consistently outperforms conventional learning techniques on the incremental training of CIFAR-100 tasks across a wide range of scenarios. Our results unify ideas from metaplasticity, Bayesian inference, and Hessian-based regularization, offering a biologically inspired pathway to robust, perpetual learning.
Submitted 18 April, 2025;
originally announced April 2025.
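The MESU abstract notes that out-of-distribution detection only requires sampling the weights several times to obtain output statistics. The sketch below illustrates that idea generically, assuming a diagonal Gaussian distribution over the weights of a toy logistic-regression model. The function `mc_epistemic`, the parameters `mu` and `sigma`, and the use of mutual information as the epistemic score are our assumptions for illustration, not details taken from the paper.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def binary_entropy(p: float) -> float:
    # Entropy in nats; p is clamped to avoid log(0).
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def mc_epistemic(x, mu, sigma, n_samples=200, seed=0):
    """Monte Carlo estimate of the predictive probability and epistemic uncertainty.

    mu, sigma: per-weight mean and standard deviation of a hypothetical diagonal
    Gaussian over the weights of a toy logistic-regression model.
    Returns (mean predicted probability, mutual information in nats).
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(n_samples):
        # Draw one weight vector w ~ N(mu, diag(sigma^2)) and run the model once.
        w = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        probs.append(sigmoid(sum(wi * xi for wi, xi in zip(w, x))))
    p_mean = sum(probs) / n_samples
    total = binary_entropy(p_mean)                                   # predictive entropy
    expected = sum(binary_entropy(p) for p in probs) / n_samples     # expected (aleatoric) entropy
    return p_mean, total - expected                                  # mutual information (epistemic part)
```

With `mu = [2.0, 0.0]` and `sigma = [0.05, 3.0]`, an input along the well-constrained first weight gives a mutual information near zero, whereas an input along the uncertain second weight gives a much larger value; thresholding such a score is one common way to flag out-of-distribution inputs.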
-
Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA
Authors:
Hiroshi Nakano,
Krzysztof Blachut,
Kamil Jeziorek,
Piotr Wzorek,
Manon Dampfhoffer,
Thomas Mesquida,
Hiroaki Nishi,
Tomasz Kryjak,
Thomas Dalgaty
Abstract:
As the quantities of data recorded by embedded edge sensors grow, so too does the need for intelligent local processing. Such data often comes in the form of time-series signals, from which real-time predictions can be made locally using an AI model. However, this requires a hardware-software approach capable of making low-latency predictions with low power consumption. In this paper, we present a hardware implementation of an event-graph neural network for time-series classification. We leverage an artificial cochlea model to convert the input time-series signals into a sparse event-data format that allows the event-graph to drastically reduce the number of calculations relative to other AI methods. We implemented the design on a SoC FPGA and applied it to the real-time processing of the Spiking Heidelberg Digits (SHD) dataset to benchmark our approach against competitive solutions. Our method achieves a floating-point accuracy of 92.7% on the SHD dataset for the base model, which is only 2.4% and 2% lower than the state-of-the-art models while using over 10% and 67% fewer model parameters, respectively. It also outperforms FPGA-based spiking neural network implementations by 19.3% and 4.5%, achieving 92.3% accuracy for the quantised model while using fewer computational resources and reducing latency.
Submitted 9 March, 2025;
originally announced March 2025.
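The event-graph abstract attributes its efficiency to the sparsity of cochlea-generated events. As a rough illustration, the sketch below builds a causal graph by linking each event to earlier events that fall within a time radius and a channel radius. The function `build_event_graph` and the parameters `radius_t` and `radius_c` are hypothetical; the paper's actual graph construction may differ.

```python
from typing import List, Tuple

Event = Tuple[float, int]  # (timestamp in seconds, cochlea channel index)

def build_event_graph(events: List[Event], radius_t: float, radius_c: int) -> List[Tuple[int, int]]:
    """Return directed edges (src, dst) from an earlier event src to a later event dst
    when they lie within radius_t in time and radius_c in channel index.

    Events must be sorted by timestamp, and radius_t must be non-negative.
    """
    edges = []
    start = 0  # index of the oldest event still inside the sliding time window
    for j, (tj, cj) in enumerate(events):
        # Slide the window forward: discard events older than tj - radius_t.
        while events[start][0] < tj - radius_t:
            start += 1
        # Only past events inside the window are candidates, so the work per
        # event depends on local event density, not on a dense time x channel grid.
        for i in range(start, j):
            ti, ci = events[i]
            if abs(ci - cj) <= radius_c:
                edges.append((i, j))
    return edges
```

For example, `build_event_graph([(0.000, 3), (0.001, 4), (0.002, 10), (0.010, 3)], radius_t=0.005, radius_c=2)` returns `[(0, 1)]`: only the first two events are close in both time and channel.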
-
Bayesian Metaplasticity from Synaptic Uncertainty
Authors:
Djohan Bonnet,
Tifenn Hirtzlin,
Tarcisius Januel,
Thomas Dalgaty,
Damien Querlioz,
Elisa Vianello
Abstract:
Catastrophic forgetting remains a challenge for neural networks, especially in lifelong learning scenarios. In this study, we introduce MEtaplasticity from Synaptic Uncertainty (MESU), inspired by metaplasticity and Bayesian inference principles. MESU harnesses synaptic uncertainty to retain information over time, with an update rule that closely approximates a diagonal Newton's method for synaptic updates. Through continual learning experiments on permuted MNIST tasks, we demonstrate MESU's remarkable capability to maintain learning performance across 100 tasks without the need for explicit task boundaries.
Submitted 15 December, 2023;
originally announced December 2023.
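The abstract states that MESU's update rule closely approximates a diagonal Newton's method. A standard identity makes that connection concrete: for a Gaussian belief over each weight and a quadratic loss, the exact Bayesian update adds the local curvature to the precision, and the mean then moves by the gradient scaled by the posterior variance. The sketch below shows this generic Laplace-style update under those assumptions; `uncertainty_scaled_step` is a hypothetical name, and this is not the MESU rule itself (the 2025 abstract above also describes a forgetting mechanism that this sketch omits).

```python
def uncertainty_scaled_step(mu, var, grad, hess_diag):
    """One diagonal-Gaussian Bayesian update per weight, exact for a quadratic loss.

    mu, var:    current mean and variance of each weight.
    grad:       loss gradient evaluated at mu.
    hess_diag:  diagonal of the loss Hessian (curvature per weight).
    Returns the updated (mu, var).
    """
    new_mu, new_var = [], []
    for m, v, g, h in zip(mu, var, grad, hess_diag):
        v_new = 1.0 / (1.0 / v + h)     # precision accumulates curvature
        new_mu.append(m - v_new * g)    # the variance acts as a per-weight learning rate
        new_var.append(v_new)
    return new_mu, new_var
```

With a very broad belief (large `var`), `v_new` is close to `1 / h` and the step reduces to the diagonal Newton step `m - g / h`, which lands on the minimum of a separable quadratic in one update. With a narrow belief (small `var`), the same gradient barely moves the weight, which is how uncertainty-scaled updates protect consolidated parameters.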
-
Scaling-up Memristor Monte Carlo with magnetic domain-wall physics
Authors:
Thomas Dalgaty,
Shogo Yamada,
Anca Molnos,
Eiji Kawasaki,
Thomas Mesquida,
François Rummens,
Tatsuo Shibata,
Yukihiro Urakawa,
Yukio Terasaki,
Tomoyuki Sasaki,
Marc Duranton
Abstract:
By exploiting the intrinsic random nature of nanoscale devices, Memristor Monte Carlo (MMC) is a promising enabler of edge learning systems. However, due to multiple algorithmic and device-level limitations, existing demonstrations have been restricted to very small neural network models and datasets. We discuss these limitations and describe how they can be overcome by mapping the stochastic gradient Langevin dynamics (SGLD) algorithm onto the physics of magnetic domain-wall memristors to scale up MMC models by five orders of magnitude. We propose the push-pull pulse programming method, which realises SGLD in-physics, and use it to train a domain-wall-based ResNet18 on the CIFAR-10 dataset. On this task, we observe no performance degradation relative to a floating-point model down to an update precision of between 6 and 7 bits, indicating that we have made a step towards a large-scale edge learning system leveraging noisy analogue devices.
Submitted 5 December, 2023;
originally announced December 2023.
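The MMC abstract maps stochastic gradient Langevin dynamics (SGLD) onto device physics. For reference, the textbook SGLD step is θ ← θ + (ε/2)∇log p(θ | data) + η with η ~ N(0, ε). The sketch below runs that step in software for a scalar parameter; in SGLD proper the gradient is a minibatch estimate, while here an exact gradient of a toy Gaussian target stands in for it. The function name and the target are illustrative assumptions, and nothing here models the push-pull pulse programming method.

```python
import math
import random

def sgld_samples(grad_log_post, theta0, step, n_steps, burn_in, seed=0):
    """Stochastic gradient Langevin dynamics for a scalar parameter.

    Each step: theta += (step / 2) * grad_log_post(theta) + N(0, step).
    The injected Gaussian noise turns gradient ascent into an approximate sampler
    of the posterior. Samples drawn after burn_in steps are returned.
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for t in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(step))  # N(0, step): std is sqrt(step)
        theta += 0.5 * step * grad_log_post(theta) + noise
        if t >= burn_in:
            samples.append(theta)
    return samples
```

For a target N(3, 0.5²), `grad_log_post = lambda th: -(th - 3.0) / 0.25`; with `step=0.01`, the retained samples have a mean near 3 and a variance near 0.25, up to a small discretisation bias.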
-
Hardware calibrated learning to compensate heterogeneity in analog RRAM-based Spiking Neural Networks
Authors:
Filippo Moro,
E. Esmanhotto,
T. Hirtzlin,
N. Castellani,
A. Trabelsi,
T. Dalgaty,
G. Molas,
F. Andrieu,
S. Brivio,
S. Spiga,
G. Indiveri,
M. Payvand,
E. Vianello
Abstract:
Spiking Neural Networks (SNNs) can unleash the full power of analog Resistive Random Access Memory (RRAM)-based circuits for low-power signal processing. Their inherent computational sparsity naturally results in energy-efficiency benefits. The main challenge in implementing robust SNNs is the intrinsic variability (heterogeneity) of both analog CMOS circuits and RRAM technology. In this work, we assessed the performance and variability of RRAM-based neuromorphic circuits that were designed and fabricated using a 130 nm technology node. Based on these results, we propose a Neuromorphic Hardware Calibrated (NHC) SNN, where the learning circuits are calibrated on the measured data. We show that by taking into account the measured heterogeneity characteristics in the off-chip learning phase, the NHC SNN self-corrects its hardware non-idealities and learns to solve benchmark tasks with high accuracy. This work demonstrates how to cope with the heterogeneity of neurons and synapses to increase classification accuracy in temporal tasks.
Submitted 10 February, 2022;
originally announced February 2022.
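The NHC abstract's key step is to include measured heterogeneity in off-chip training so that the learned parameters compensate for it. The sketch below illustrates only that general principle with a linear toy model rather than an SNN: each synapse applies a fixed multiplicative gain (a hypothetical stand-in for measured device mismatch), and training either ignores the gains or folds them into the gradient. The functions `train_linear` and `mse_on_chip` and the gain values are our assumptions, not the paper's method.

```python
def train_linear(xs, ys, gains, lr=0.05, epochs=200):
    """Fit weights w so that y ~ sum_i gains[i] * w[i] * x[i], by per-sample gradient descent.

    gains: per-synapse multiplicative mismatch, as (hypothetically) measured on chip.
    Passing all-ones gains corresponds to training while ignoring the heterogeneity.
    """
    n = len(gains)
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = sum(g * wi * xi for g, wi, xi in zip(gains, w, x))
            err = pred - y
            for i in range(n):
                w[i] -= lr * err * gains[i] * x[i]  # chain rule through the known gain
    return w

def mse_on_chip(w, xs, ys, gains):
    """Mean squared error when weights w run on hardware that applies the true gains."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(g * wi * xi for g, wi, xi in zip(gains, w, x))
        total += (pred - y) ** 2
    return total / len(xs)
```

Weights trained with all-ones gains fit the task in simulation but leave a residual error once deployed on a chip whose gains differ from one, whereas weights trained with the measured gains absorb the mismatch and reach near-zero error on the same chip.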
-
PCM-trace: Scalable Synaptic Eligibility Traces with Resistivity Drift of Phase-Change Materials
Authors:
Yigit Demirag,
Filippo Moro,
Thomas Dalgaty,
Gabriele Navarro,
Charlotte Frenkel,
Giacomo Indiveri,
Elisa Vianello,
Melika Payvand
Abstract:
Dedicated hardware implementations of spiking neural networks that combine the advantages of mixed-signal neuromorphic circuits with those of emerging memory technologies have the potential to enable ultra-low-power pervasive sensory processing. To endow these systems with additional flexibility and the ability to learn to solve specific tasks, it is important to develop appropriate on-chip learning mechanisms. Recently, a new class of three-factor spike-based learning rules has been proposed that can solve the temporal credit assignment problem and approximate the error back-propagation algorithm on complex tasks. However, the efficient implementation of these rules on hybrid CMOS/memristive architectures is still an open challenge. Here we present a new neuromorphic building block, called PCM-trace, which exploits the drift behavior of phase-change materials to implement long-lasting eligibility traces, a critical ingredient of three-factor learning rules. We demonstrate that the proposed approach improves the area efficiency by >10× compared to existing solutions, and we demonstrate a technologically plausible learning algorithm supported by experimental data from device measurements.
Submitted 16 February, 2021; v1 submitted 14 February, 2021;
originally announced February 2021.
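PCM-trace uses resistance drift in phase-change materials as a slowly decaying eligibility trace. The sketch below assumes the widely used empirical power-law drift model R(t) = R0 (t / t0)^ν, so that a conductance programmed at the moment of pre/post coincidence decays as G(t) = G0 (t / t0)^(-ν), and reads that conductance out when a later third factor (for example a reward) arrives. The drift exponent `nu`, the function names, and the read-out scheme are hypothetical choices for illustration, not the circuit described in the paper.

```python
def drifted_conductance(g0, elapsed, t0=1.0, nu=0.05):
    """Power-law drift: R(t) = R0 * (t / t0)**nu, hence G(t) = G0 * (t / t0)**(-nu).

    nu is a hypothetical drift exponent. For elapsed < t0 the power law is not
    applied and the programmed value g0 is returned (a simplifying assumption).
    """
    if elapsed < t0:
        return g0
    return g0 * (elapsed / t0) ** (-nu)

def three_factor_update(w, lr, reward, g0, t_eligible, t_reward, **drift):
    """Weight change = lr * third factor (reward) * eligibility trace.

    The trace is the conductance of a device programmed to g0 at t_eligible
    (when pre- and post-synaptic activity coincided), left to drift until t_reward.
    """
    trace = drifted_conductance(g0, t_reward - t_eligible, **drift)
    return w + lr * reward * trace
```

Because the decay is a slow power law rather than an exponential, the trace remains substantial over long delays, which is the property the abstract highlights as "long-lasting" eligibility traces.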
-
In-situ learning harnessing intrinsic resistive memory variability through Markov Chain Monte Carlo Sampling
Authors:
Thomas Dalgaty,
Niccolo Castellani,
Damien Querlioz,
Elisa Vianello
Abstract:
Resistive memory technologies promise to be a key component in unlocking the next generation of intelligent in-memory computing systems that can act and learn locally at the edge. However, current approaches to in-memory machine learning often focus on implementing models and algorithms that cannot be reconciled with the true physical properties of resistive memory. Consequently, these properties, in particular cycle-to-cycle conductance variability, are considered non-idealities that require mitigation. Here, by contrast, we embrace these properties by selecting a more appropriate machine learning model and algorithm. We implement a Markov Chain Monte Carlo sampling algorithm within a fabricated array of 16,384 devices, configured as a Bayesian machine learning model. The algorithm is realised in situ by exploiting the devices, through their cycle-to-cycle conductance variability, as random variables. We experimentally train the memory array to perform an illustrative supervised learning task as well as a malignant breast tissue recognition task, achieving an accuracy of 96.3%. Then, using a behavioural model of resistive memory calibrated on array-level measurements, we apply the same approach to the Cartpole reinforcement learning task. In all cases, our proposed approach outperformed software-based neural network models realised using an equivalent number of memory elements. This result lays a foundation for a new path for in-memory machine learning, compatible with the true properties of resistive memory technologies, that can bring localised learning capabilities to intelligent edge computing systems.
Submitted 30 January, 2020;
originally announced January 2020.
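The in-situ learning abstract implements Markov Chain Monte Carlo sampling with the randomness supplied by device variability. The sketch below is a plain software random-walk Metropolis-Hastings sampler on a toy problem (inferring the mean of Gaussian data). Here `rng.gauss` generates the proposal noise; in the setting the abstract describes, that role would be played by cycle-to-cycle conductance variability. The model, names, and parameter values are assumptions for illustration only.

```python
import math
import random

def log_posterior(mu, data, prior_std=10.0, noise_std=1.0):
    """Unnormalised log posterior of a Gaussian mean with a Gaussian prior (toy model)."""
    log_prior = -0.5 * (mu / prior_std) ** 2
    log_lik = sum(-0.5 * ((x - mu) / noise_std) ** 2 for x in data)
    return log_prior + log_lik

def metropolis_hastings(data, n_steps=5000, proposal_std=0.3, seed=0):
    """Random-walk Metropolis-Hastings sampler for the mean mu.

    Returns (chain of samples, acceptance rate). The symmetric Gaussian proposal
    stands in for a stochastic device read-out (an assumption of this sketch).
    """
    rng = random.Random(seed)
    mu = 0.0
    lp = log_posterior(mu, data)
    samples, accepted = [], 0
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, proposal_std)  # symmetric proposal
        lp_new = log_posterior(proposal, data)
        # Accept with probability min(1, p(proposal) / p(mu)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_new - lp:
            mu, lp = proposal, lp_new
            accepted += 1
        samples.append(mu)
    return samples, accepted / n_steps
```

After discarding an initial burn-in, the chain's average approximates the posterior mean, which for this conjugate toy model is n·x̄ / (n + 1/prior_std²); for eight data points averaging 2.05, that is about 2.047.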