-
Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models
Authors:
Jan Finkbeiner,
Thomas Gmeinder,
Mark Pupilli,
Alexander Titterton,
Emre Neftci
Abstract:
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), that excel at accelerating parallel workloads and dense vector-matrix multiplications. Potentially more efficient neural network models utilizing sparsity and recurrence cannot leverage the full power of SIMD processors and are thus at a severe disadvantage compared to today's prominent parallel architectures like Transformers and CNNs, thereby hindering the path towards more sustainable AI. To overcome this limitation, we explore sparse and recurrent model training on a massively parallel multiple instruction multiple data (MIMD) architecture with distributed local memory. We implement a training routine based on backpropagation through time (BPTT) for the brain-inspired class of Spiking Neural Networks (SNNs) that feature binary sparse activations. We observe a massive advantage in using sparse activation tensors with a MIMD processor, the Intelligence Processing Unit (IPU), compared to GPUs. On training workloads, our results demonstrate 5-10x throughput gains compared to A100 GPUs and up to 38x gains for higher levels of activation sparsity, without a significant slowdown in training convergence or reduction in final model performance. Furthermore, our results show highly promising trends for both single- and multi-IPU configurations as we scale up to larger model sizes. Our work paves the way towards more efficient, non-standard models via AI training hardware beyond GPUs, and competitive large-scale SNN models.
Submitted 7 November, 2023;
originally announced November 2023.
-
Intelligence Processing Units Accelerate Neuromorphic Learning
Authors:
Pao-Sheng Vincent Sun,
Alexander Titterton,
Anjlee Gopiani,
Tim Santos,
Arindam Basu,
Wei D. Lu,
Jason K. Eshraghian
Abstract:
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency when performing inference with deep learning workloads. Error backpropagation is presently regarded as the most effective method for training SNNs, but in a twist of irony, when training on modern graphics processing units (GPUs) this becomes more expensive than non-spiking networks. The emergence of Graphcore's Intelligence Processing Units (IPUs) balances the parallelized nature of deep learning workloads with the sequential, reusable, and sparsified nature of operations prevalent when training SNNs. IPUs adopt multi-instruction multi-data (MIMD) parallelism by running individual processing threads on smaller data blocks, which is a natural fit for the sequential, non-vectorized steps required to solve spiking neuron dynamical state equations. We present an IPU-optimized release of our custom SNN Python package, snnTorch, which exploits fine-grained parallelism by utilizing low-level, pre-compiled custom operations to accelerate irregular and sparse data access patterns that are characteristic of training SNN workloads. We provide a rigorous performance assessment across a suite of commonly used spiking neuron models, and propose methods to further reduce training run-time via half-precision training. By amortizing the cost of sequential processing into vectorizable population codes, we ultimately demonstrate the potential for integrating domain-specific accelerators with the next generation of neural networks.
Submitted 19 November, 2022;
originally announced November 2022.
-
Studying the potential of Graphcore IPUs for applications in Particle Physics
Authors:
Lakshan Ram Madhan Mohan,
Alexander Marshall,
Samuel Maddrell-Mander,
Daniel O'Hanlon,
Konstantinos Petridis,
Jonas Rademacker,
Victoria Rege,
Alexander Titterton
Abstract:
This paper presents the first study of Graphcore's Intelligence Processing Unit (IPU) in the context of particle physics applications. The IPU is a new type of processor optimised for machine learning. Comparisons are made for neural-network-based event simulation, multiple-scattering correction, and flavour tagging, implemented on IPUs, GPUs and CPUs, using a variety of neural network architectures and hyperparameters. Additionally, a Kálmán filter for track reconstruction is implemented on IPUs and GPUs. The results indicate that IPUs hold considerable promise in addressing the rapidly increasing compute needs in particle physics.
Submitted 20 August, 2020;
originally announced August 2020.
-
Exploring Sensitivity to NMSSM Signatures with Low Missing Transverse Energy at the LHC
Authors:
A. Titterton,
U. Ellwanger,
H. U. Flaecher,
S. Moretti,
C. H. Shepherd-Themistocleous
Abstract:
We examine scenarios in the Next-to-Minimal Supersymmetric Standard Model (NMSSM), where pair-produced squarks and gluinos decay via two cascades, each ending in a stable neutralino as Lightest Supersymmetric Particle (LSP) and a Standard Model (SM)-like Higgs boson, with mass spectra such that the missing transverse energy, $E_{T}^{\text{miss}}$, is very small. Performing two-dimensional parameter scans and focusing on the hadronic $H\rightarrow b\bar{b}$ decay giving a $b\bar{b}b\bar{b} + E_{T}^{\text{miss}}$ final state, we explore the sensitivity of a current LHC general-purpose jets+$E_{T}^{\text{miss}}$ analysis to such scenarios.
Submitted 28 September, 2018; v1 submitted 27 July, 2018;
originally announced July 2018.