-
FeNN: A RISC-V vector processor for Spiking Neural Network acceleration
Authors:
Zainab Aizaz,
James C. Knight,
Thomas Nowotny
Abstract:
Spiking Neural Networks (SNNs) have the potential to drastically reduce the energy requirements of AI systems. However, mainstream accelerators like GPUs and TPUs are designed for the high arithmetic intensity of standard ANNs so are not well-suited to SNN simulation. FPGAs are well-suited to applications with low arithmetic intensity as they have high off-chip memory bandwidth and large amounts o…
▽ More
Spiking Neural Networks (SNNs) have the potential to drastically reduce the energy requirements of AI systems. However, mainstream accelerators like GPUs and TPUs are designed for the high arithmetic intensity of standard ANNs so are not well-suited to SNN simulation. FPGAs are well-suited to applications with low arithmetic intensity as they have high off-chip memory bandwidth and large amounts of on-chip memory. Here, we present a novel RISC-V-based soft vector processor (FeNN), tailored to simulating SNNs on FPGAs. Unlike most dedicated neuromorphic hardware, FeNN is fully programmable and designed to be integrated with applications running on standard computers from the edge to the cloud. We demonstrate that, by using stochastic rounding and saturation, FeNN can achieve high numerical precision with low hardware utilisation and that a single FeNN core can simulate an SNN classifier faster than both an embedded GPU and the Loihi neuromorphic system.
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
Constructive community race: full-density spiking neural network model drives neuromorphic computing
Authors:
Johanna Senk,
Anno C. Kurth,
Steve Furber,
Tobias Gemmeke,
Bruno Golosio,
Arne Heittmann,
James C. Knight,
Eric Müller,
Tobias Noll,
Thomas Nowotny,
Gorka Peraza Coppola,
Luca Peres,
Oliver Rhodes,
Andrew Rowley,
Johannes Schemmel,
Tim Stadtmann,
Tom Tetzlaff,
Gianmarco Tiddia,
Sacha J. van Albada,
José Villamar,
Markus Diesmann
Abstract:
The local circuitry of the mammalian brain is a focus of the search for generic computational principles because it is largely conserved across species and modalities. In 2014 a model was proposed representing all neurons and synapses of the stereotypical cortical microcircuit below $1\,\text{mm}^2$ of brain surface. The model reproduces fundamental features of brain activity but its impact remain…
▽ More
The local circuitry of the mammalian brain is a focus of the search for generic computational principles because it is largely conserved across species and modalities. In 2014 a model was proposed representing all neurons and synapses of the stereotypical cortical microcircuit below $1\,\text{mm}^2$ of brain surface. The model reproduces fundamental features of brain activity but its impact remained limited because of its computational demands. For theory and simulation, however, the model was a breakthrough because it removes uncertainties of downscaling, and larger models are less densely connected. This sparked a race in the neuromorphic computing community and the model became a de facto standard benchmark. Within a few years real-time performance was reached and surpassed at significantly reduced energy consumption. We review how the computational challenge was tackled by different simulation technologies and derive guidelines for the next generation of benchmarks and other domains of science.
△ Less
Submitted 2 June, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
-
Eventprop training for efficient neuromorphic applications
Authors:
Thomas Shoesmith,
James C. Knight,
Balázs Mészáros,
Jonathan Timcheck,
Thomas Nowotny
Abstract:
Neuromorphic computing can reduce the energy requirements of neural networks and holds the promise to `repatriate' AI workloads back from the cloud to the edge. However, training neural networks on neuromorphic hardware has remained elusive. Here, we instead present a pipeline for training spiking neural networks on GPUs, using the efficient event-driven Eventprop algorithm implemented in mlGeNN,…
▽ More
Neuromorphic computing can reduce the energy requirements of neural networks and holds the promise to `repatriate' AI workloads back from the cloud to the edge. However, training neural networks on neuromorphic hardware has remained elusive. Here, we instead present a pipeline for training spiking neural networks on GPUs, using the efficient event-driven Eventprop algorithm implemented in mlGeNN, and deploying them on Intel's Loihi 2 neuromorphic chip. Our benchmarking on keyword spotting tasks indicates that there is almost no loss in accuracy between GPU and Loihi 2 implementations and that classifying a sample on Loihi 2 is up to 10X faster and uses 200X less energy than on an NVIDIA Jetson Orin Nano.
△ Less
Submitted 6 March, 2025;
originally announced March 2025.
-
Efficient Event-based Delay Learning in Spiking Neural Networks
Authors:
Balázs Mészáros,
James C. Knight,
Thomas Nowotny
Abstract:
Spiking Neural Networks (SNNs) compute using sparse communication and are attracting increased attention as a more energy-efficient alternative to traditional Artificial Neural Networks~(ANNs). While standard ANNs are stateless, spiking neurons are stateful and hence intrinsically recurrent, making them well-suited for spatio-temporal tasks. However, the duration of this intrinsic memory is limite…
▽ More
Spiking Neural Networks (SNNs) compute using sparse communication and are attracting increased attention as a more energy-efficient alternative to traditional Artificial Neural Networks~(ANNs). While standard ANNs are stateless, spiking neurons are stateful and hence intrinsically recurrent, making them well-suited for spatio-temporal tasks. However, the duration of this intrinsic memory is limited by synaptic and membrane time constants. Delays are a powerful additional mechanism and, in this paper, we propose a novel event-based training method for SNNs with delays, grounded in the EventProp formalism which enables the calculation of exact gradients with respect to weights and delays. Our method supports multiple spikes per neuron and, to the best of our knowledge, is the first delay learning algorithm to be applied to recurrent SNNs. We evaluate our method on a simple sequence detection task, as well as the Yin-Yang, Spiking Heidelberg Digits, Spiking Speech Commands and Braille letter reading datasets, demonstrating that our algorithm can optimise delays from suboptimal initial conditions and enhance classification accuracy compared to architectures without delays. We also find that recurrent delays are particularly beneficial in small networks. Finally, we show that our approach uses less than half the memory of the current state-of-the-art delay-learning method and is up to 26x faster.
△ Less
Submitted 26 June, 2025; v1 submitted 13 January, 2025;
originally announced January 2025.
-
Learning Delays Through Gradients and Structure: Emergence of Spatiotemporal Patterns in Spiking Neural Networks
Authors:
Balázs Mészáros,
James Knight,
Thomas Nowotny
Abstract:
We present a Spiking Neural Network (SNN) model that incorporates learnable synaptic delays through two approaches: per-synapse delay learning via Dilated Convolutions with Learnable Spacings (DCLS) and a dynamic pruning strategy that also serves as a form of delay learning. In the latter approach, the network dynamically selects and prunes connections, optimizing the delays in sparse connectivity…
▽ More
We present a Spiking Neural Network (SNN) model that incorporates learnable synaptic delays through two approaches: per-synapse delay learning via Dilated Convolutions with Learnable Spacings (DCLS) and a dynamic pruning strategy that also serves as a form of delay learning. In the latter approach, the network dynamically selects and prunes connections, optimizing the delays in sparse connectivity settings. We evaluate both approaches on the Raw Heidelberg Digits keyword spotting benchmark using Backpropagation Through Time with surrogate gradients.
Our analysis of the spatio-temporal structure of synaptic interactions reveals that, after training, excitation and inhibition group together in space and time. Notably, the dynamic pruning approach, which employs DEEP R for connection removal and RigL for reconnection, not only preserves these spatio-temporal patterns but outperforms per-synapse delay learning in sparse networks.
Our results demonstrate the potential of combining delay learning with dynamic pruning to develop efficient SNN models for temporal data processing. Moreover, the preservation of spatio-temporal dynamics throughout pruning and rewiring highlights the robustness of these features, providing a solid foundation for future neuromorphic computing applications.
△ Less
Submitted 8 November, 2024; v1 submitted 7 July, 2024;
originally announced July 2024.
-
Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks
Authors:
Kade M. Heckel,
Thomas Nowotny
Abstract:
As the role of artificial intelligence becomes increasingly pivotal in modern society, the efficient training and deployment of deep neural networks have emerged as critical areas of focus. Recent advancements in attention-based large neural architectures have spurred the development of AI accelerators, facilitating the training of extensive, multi-billion parameter models. Despite their effective…
▽ More
As the role of artificial intelligence becomes increasingly pivotal in modern society, the efficient training and deployment of deep neural networks have emerged as critical areas of focus. Recent advancements in attention-based large neural architectures have spurred the development of AI accelerators, facilitating the training of extensive, multi-billion parameter models. Despite their effectiveness, these powerful networks often incur high execution costs in production environments. Neuromorphic computing, inspired by biological neural processes, offers a promising alternative. By utilizing temporally-sparse computations, Spiking Neural Networks (SNNs) offer to enhance energy efficiency through a reduced and low-power hardware footprint. However, the training of SNNs can be challenging due to their recurrent nature which cannot as easily leverage the massive parallelism of modern AI accelerators. To facilitate the investigation of SNN architectures and dynamics researchers have sought to bridge Python-based deep learning frameworks such as PyTorch or TensorFlow with custom-implemented compute kernels. This paper introduces Spyx, a new and lightweight SNN simulation and optimization library designed in JAX. By pre-staging data in the expansive vRAM of contemporary accelerators and employing extensive JIT compilation, Spyx allows for SNN optimization to be executed as a unified, low-level program on NVIDIA GPUs or Google TPUs. This approach achieves optimal hardware utilization, surpassing the performance of many existing SNN training frameworks while maintaining considerable flexibility.
△ Less
Submitted 29 February, 2024;
originally announced February 2024.
-
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems
Authors:
Jason Yik,
Korneel Van den Berghe,
Douwe den Blanken,
Younes Bouhadjar,
Maxime Fabre,
Paul Hueber,
Weijie Ke,
Mina A Khoei,
Denis Kleyko,
Noah Pacik-Nelson,
Alessandro Pierro,
Philipp Stratmann,
Pao-Sheng Vincent Sun,
Guangzhi Tang,
Shenqi Wang,
Biyan Zhou,
Soikat Hasan Ahmed,
George Vathakkattil Joseph,
Benedetto Leto,
Aurora Micheli,
Anurag Kumar Mishra,
Gregor Lenz,
Tao Sun,
Zergham Ahmed,
Mahmoud Akl
, et al. (75 additional authors not shown)
Abstract:
Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neu…
▽ More
Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of researchers across industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we outline tasks and guidelines for benchmarks across multiple application domains, and present initial performance baselines across neuromorphic and conventional approaches for both benchmark tracks. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community.
△ Less
Submitted 14 January, 2025; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Learning efficient backprojections across cortical hierarchies in real time
Authors:
Kevin Max,
Laura Kriener,
Garibaldi Pineda García,
Thomas Nowotny,
Ismael Jaras,
Walter Senn,
Mihai A. Petrovici
Abstract:
Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which however requires biologically implausible weight transport from feed-forward to feedback paths.
We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered…
▽ More
Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which however requires biologically implausible weight transport from feed-forward to feedback paths.
We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity and using only information locally available to the synapses. Our method is completely phase-free (no forward and backward passes or phased learning) and allows for efficient error propagation across multi-layer cortical hierarchies, while maintaining biologically plausible signal transport and learning.
Our method is applicable to a wide class of models and improves on previously known biologically plausible ways of credit assignment: compared to random synaptic feedback, it can solve complex tasks with less neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding.
△ Less
Submitted 2 February, 2024; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Loss shaping enhances exact gradient learning with Eventprop in spiking neural networks
Authors:
Thomas Nowotny,
James P. Turner,
James C. Knight
Abstract:
Event-based machine learning promises more energy-efficient AI on future neuromorphic hardware. Here, we investigate how the recently discovered Eventprop algorithm for gradient descent on exact gradients in spiking neural networks can be scaled up to challenging keyword recognition benchmarks. We implemented Eventprop in the GPU-enhanced Neural Networks framework and used it for training recurren…
▽ More
Event-based machine learning promises more energy-efficient AI on future neuromorphic hardware. Here, we investigate how the recently discovered Eventprop algorithm for gradient descent on exact gradients in spiking neural networks can be scaled up to challenging keyword recognition benchmarks. We implemented Eventprop in the GPU-enhanced Neural Networks framework and used it for training recurrent spiking neural networks on the Spiking Heidelberg Digits and Spiking Speech Commands datasets. We found that learning depended strongly on the loss function and extended Eventprop to a wider class of loss functions to enable effective training. We then tested a large number of data augmentations and regularisations as well as exploring different network structures; and heterogeneous and trainable timescales. We found that when combined with two specific augmentations, the right regularisation and a delay line input, Eventprop networks with one recurrent layer achieved state-of-the-art performance on Spiking Heidelberg Digits and good accuracy on Spiking Speech Commands. In comparison to a leading surrogate-gradient-based SNN training method, our GeNN Eventprop implementation is 3X faster and uses 4X less memory. This work is a significant step towards a low-power neuromorphic alternative to current machine learning paradigms.
△ Less
Submitted 31 January, 2025; v1 submitted 2 December, 2022;
originally announced December 2022.
-
Scalability and Optimization Strategies for GPU Enhanced Neural Networks (GeNN)
Authors:
Naresh Balaji,
Esin Yavuz,
Thomas Nowotny
Abstract:
Simulation of spiking neural networks has been traditionally done on high-performance supercomputers or large-scale clusters. Utilizing the parallel nature of neural network computation algorithms, GeNN (GPU Enhanced Neural Network) provides a simulation environment that performs on General Purpose NVIDIA GPUs with a code generation based approach. GeNN allows the users to design and simulate neur…
▽ More
Simulation of spiking neural networks has been traditionally done on high-performance supercomputers or large-scale clusters. Utilizing the parallel nature of neural network computation algorithms, GeNN (GPU Enhanced Neural Network) provides a simulation environment that performs on General Purpose NVIDIA GPUs with a code generation based approach. GeNN allows the users to design and simulate neural networks by specifying the populations of neurons at different stages, their synapse connection densities and the model of individual neurons. In this report we describe work on how to scale synaptic weights based on the configuration of the user-defined network to ensure sufficient spiking and subsequent effective learning. We also discuss optimization strategies particular to GPU computing: sparse representation of synapse connections and occupancy based block-size determination.
△ Less
Submitted 1 December, 2014;
originally announced December 2014.
-
ICT System Design & Implementation Using Wireless Sensors to Support Elderly In-home Assistance
Authors:
Thomas J. Lampoltshammer,
Thomas Nowotny,
Stefan Plank
Abstract:
Around the globe the number of older people in relation to the rest is constantly growing. As a result, medical and care facilities cannot handle the growing number of patients. Therefore, elderly in-home assistance gets more attention an importance. Due to issues regarding memory, physical strength and reduced self-assessment, old people face a lot of challenges in accomplishing their activities…
▽ More
Around the globe the number of older people in relation to the rest is constantly growing. As a result, medical and care facilities cannot handle the growing number of patients. Therefore, elderly in-home assistance gets more attention an importance. Due to issues regarding memory, physical strength and reduced self-assessment, old people face a lot of challenges in accomplishing their activities of daily living. This thesis is meant to address these problems by analysing the required infrastructure of a home-care facility as well as the arising issues regarding used components, especially wireless sensors. After the analysis, a prototype of a home-care system is designed and implemented. Furthermore, the issue of energy consumption of the used wireless sensor node is addressed by modifying the intelligence of the used sensor. After that, the design and components of the prototype used for the energy consumption analysis is explained, together with the programming structure of the sensor nodes used in this thesis. Thereupon, the results are of the simulations are discussed and compared with the authors' expectations. Finally the overall outcomes of the thesis are analysed and summed up, followed by a short outlook of further possible improvements and developments.
△ Less
Submitted 2 March, 2013;
originally announced March 2013.