Bayesian continual learning and forgetting in neural networks

Authors: Djohan Bonnet, Kellian Cottart, Tifenn Hirtzlin, Tarcisius Januel, Thomas Dalgaty, Elisa Vianello, Damien Querlioz

Abstract: Biological synapses effortlessly balance memory retention and flexibility, yet artificial neural networks still struggle with the extremes of catastrophic forgetting and catastrophic remembering. Here, we introduce Metaplasticity from Synaptic Uncertainty (MESU), a Bayesian framework that updates network parameters according to their uncertainty. This approach allows a principled combination of learning and forgetting that ensures that critical knowledge is preserved while unused or outdated information is gradually released. Unlike standard Bayesian approaches, which risk becoming overly constrained, and popular continual-learning methods, which rely on explicit task boundaries, MESU seamlessly adapts to streaming data. It further provides reliable epistemic uncertainty estimates, allowing out-of-distribution detection, the only computational cost being to sample the weights multiple times to provide proper output statistics. Experiments on image-classification benchmarks demonstrate that MESU mitigates catastrophic forgetting while maintaining plasticity for new tasks. When training 200 sequential permuted MNIST tasks, MESU outperforms established continual learning techniques in terms of accuracy, capability to learn additional tasks, and out-of-distribution data detection. Additionally, due to its non-reliance on task boundaries, MESU consistently outperforms conventional learning techniques on the incremental training of CIFAR-100 tasks in a wide range of scenarios.
Our results unify ideas from metaplasticity, Bayesian inference, and Hessian-based regularization, offering a biologically-inspired pathway to robust, perpetual learning.

Submitted 18 April, 2025; originally announced April 2025.

arXiv:2503.06629

Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA

Authors: Hiroshi Nakano, Krzysztof Blachut, Kamil Jeziorek, Piotr Wzorek, Manon Dampfhoffer, Thomas Mesquida, Hiroaki Nishi, Tomasz Kryjak, Thomas Dalgaty

Abstract: As the quantities of data recorded by embedded edge sensors grow, so too does the need for intelligent local processing. Such data often comes in the form of time-series signals, based on which real-time predictions can be made locally using an AI model. However, a hardware-software approach capable of making low-latency predictions with low power consumption is required. In this paper, we present a hardware implementation of an event-graph neural network for time-series classification. We leverage an artificial cochlea model to convert the input time-series signals into a sparse event-data format that allows the event-graph to drastically reduce the number of calculations relative to other AI methods. We implemented the design on a SoC FPGA and applied it to the real-time processing of the Spiking Heidelberg Digits (SHD) dataset to benchmark our approach against competitive solutions. Our method achieves a floating-point accuracy of 92.7% on the SHD dataset for the base model, which is only 2.4% and 2% less than the state-of-the-art models with over 10% and 67% fewer model parameters, respectively. It also outperforms FPGA-based spiking neural network implementations by 19.3% and 4.5%, achieving 92.3% accuracy for the quantised model while using fewer computational resources and reducing latency.

Submitted 9 March, 2025; originally announced March 2025.

Comments: Paper accepted for the 21st International Symposium on Applied Reconfigurable Computing ARC 2025, Sevilla, Spain, April 9-11, 2025

arXiv:2312.10153

Bayesian Metaplasticity from Synaptic Uncertainty

Authors: Djohan Bonnet, Tifenn Hirtzlin, Tarcisius Januel, Thomas Dalgaty, Damien Querlioz, Elisa Vianello

Abstract: Catastrophic forgetting remains a challenge for neural networks, especially in lifelong learning scenarios. In this study, we introduce MEtaplasticity from Synaptic Uncertainty (MESU), inspired by metaplasticity and Bayesian inference principles. MESU harnesses synaptic uncertainty to retain information over time, with its update rule closely approximating the diagonal Newton's method for synaptic updates. Through continual learning experiments on permuted MNIST tasks, we demonstrate MESU's remarkable capability to maintain learning performance across 100 tasks without the need for explicit task boundaries.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.02771

Scaling-up Memristor Monte Carlo with magnetic domain-wall physics

Authors: Thomas Dalgaty, Shogo Yamada, Anca Molnos, Eiji Kawasaki, Thomas Mesquida, François Rummens, Tatsuo Shibata, Yukihiro Urakawa, Yukio Terasaki, Tomoyuki Sasaki, Marc Duranton

Abstract: By exploiting the intrinsic random nature of nanoscale devices, Memristor Monte Carlo (MMC) is a promising enabler of edge learning systems. However, due to multiple algorithmic and device-level limitations, existing demonstrations have been restricted to very small neural network models and datasets. We discuss these limitations and describe how they can be overcome by mapping the stochastic gradient Langevin dynamics (SGLD) algorithm onto the physics of magnetic domain-wall memristors to scale up MMC models by five orders of magnitude. We propose the push-pull pulse programming method that realises SGLD in-physics, and use it to train a domain-wall based ResNet18 on the CIFAR-10 dataset. On this task, we observe no performance degradation relative to a floating-point model down to an update precision of between 6 and 7 bits, indicating we have made a step towards a large-scale edge learning system leveraging noisy analogue devices.

Submitted 5 December, 2023; originally announced December 2023.

Comments: Presented at the 1st workshop on Machine Learning with New Compute Paradigms (MLNCP) at NeurIPS 2023 (New Orleans, USA)

arXiv:2202.05094

Hardware calibrated learning to compensate heterogeneity in analog RRAM-based Spiking Neural Networks

Authors: Filippo Moro, E. Esmanhotto, T. Hirtzlin, N. Castellani, A. Trabelsi, T. Dalgaty, G. Molas, F. Andrieu, S. Brivio, S. Spiga, G. Indiveri, M. Payvand, E. Vianello

Abstract: Spiking Neural Networks (SNNs) can unleash the full power of analog Resistive Random Access Memory (RRAM)-based circuits for low-power signal processing. Their inherent computational sparsity naturally results in energy efficiency benefits. The main challenge in implementing robust SNNs is the intrinsic variability (heterogeneity) of both analog CMOS circuits and RRAM technology. In this work, we assessed the performance and variability of RRAM-based neuromorphic circuits that were designed and fabricated using a 130 nm technology node. Based on these results, we propose a Neuromorphic Hardware Calibrated (NHC) SNN, where the learning circuits are calibrated on the measured data. We show that by taking into account the measured heterogeneity characteristics in the off-chip learning phase, the NHC SNN self-corrects its hardware non-idealities and learns to solve benchmark tasks with high accuracy. This work demonstrates how to cope with the heterogeneity of neurons and synapses for increasing classification accuracy in temporal tasks.

Submitted 10 February, 2022; originally announced February 2022.

Comments: Preprint for ISCAS2022

arXiv:2102.07260

PCM-trace: Scalable Synaptic Eligibility Traces with Resistivity Drift of Phase-Change Materials

Authors: Yigit Demirag, Filippo Moro, Thomas Dalgaty, Gabriele Navarro, Charlotte Frenkel, Giacomo Indiveri, Elisa Vianello, Melika Payvand

Abstract: Dedicated hardware implementations of spiking neural networks that combine the advantages of mixed-signal neuromorphic circuits with those of emerging memory technologies have the potential of enabling ultra-low power pervasive sensory processing. To endow these systems with additional flexibility and the ability to learn to solve specific tasks, it is important to develop appropriate on-chip learning mechanisms. Recently, a new class of three-factor spike-based learning rules has been proposed that can solve the temporal credit assignment problem and approximate the error back-propagation algorithm on complex tasks. However, the efficient implementation of these rules on hybrid CMOS/memristive architectures is still an open challenge. Here we present a new neuromorphic building block, called PCM-trace, which exploits the drift behavior of phase-change materials to implement long-lasting eligibility traces, a critical ingredient of three-factor learning rules. We demonstrate how the proposed approach improves the area efficiency by >10X compared to existing solutions and demonstrate a technologically plausible learning algorithm supported by experimental data from device measurements.

Submitted 16 February, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

Comments: Typos are fixed

arXiv:2001.11426

In-situ learning harnessing intrinsic resistive memory variability through Markov Chain Monte Carlo Sampling

Authors: Thomas Dalgaty, Niccolo Castellani, Damien Querlioz, Elisa Vianello

Abstract: Resistive memory technologies promise to be a key component in unlocking the next generation of intelligent in-memory computing systems that can act and learn locally at the edge. However, current approaches to in-memory machine learning often focus on the implementation of models and algorithms which cannot be reconciled with the true, physical properties of resistive memory. Consequently, these properties, in particular cycle-to-cycle conductance variability, are considered as non-idealities that require mitigation. Here, by contrast, we embrace these properties by selecting a more appropriate machine learning model and algorithm. We implement a Markov Chain Monte Carlo sampling algorithm within a fabricated array of 16,384 devices, configured as a Bayesian machine learning model. The algorithm is realised in-situ, by exploiting the devices as random variables from the perspective of their cycle-to-cycle conductance variability. We experimentally train the memory array to perform an illustrative supervised learning task as well as a malignant breast tissue recognition task, achieving an accuracy of 96.3%. Then, using a behavioural model of resistive memory calibrated on array level measurements, we apply the same approach to the Cartpole reinforcement learning task. In all cases our proposed approach outperformed software-based neural network models realised using an equivalent number of memory elements.
This result lays a foundation for a new path in in-memory machine learning, compatible with the true properties of resistive memory technologies, that can bring localised learning capabilities to intelligent edge computing systems.

Submitted 30 January, 2020; originally announced January 2020.

Showing 1–7 of 7 results for author: Dalgaty, T