-
Neural Signal Compression using RAMAN tinyML Accelerator for BCI Applications
Authors:
Adithya Krishna,
Sohan Debnath,
André van Schaik,
Mahesh Mehendale,
Chetan Singh Thakur
Abstract:
High-quality, multi-channel neural recording is indispensable for neuroscience research and clinical applications. Large-scale brain recordings often produce vast amounts of data that must be wirelessly transmitted for subsequent offline analysis and decoding, especially in brain-computer interfaces (BCIs) utilizing high-density intracortical recordings with hundreds or thousands of electrodes. However, transmitting raw neural data presents significant challenges due to limited communication bandwidth and resultant excessive heating. To address this challenge, we propose a neural signal compression scheme utilizing Convolutional Autoencoders (CAEs), which achieves a compression ratio of up to 150 for compressing local field potentials (LFPs). The CAE encoder section is implemented on RAMAN, an energy-efficient tinyML accelerator designed for edge computing, and subsequently deployed on an Efinix Ti60 FPGA with 37.3k LUTs and 8.6k register utilization. RAMAN leverages sparsity in activation and weights through zero skipping, gating, and weight compression techniques. Additionally, we employ hardware-software co-optimization by pruning CAE encoder model parameters using a hardware-aware balanced stochastic pruning strategy, resolving workload imbalance issues and eliminating indexing overhead to reduce parameter storage requirements by up to 32.4%. Using the proposed compact depthwise separable convolutional autoencoder (DS-CAE) model, the compressed neural data from RAMAN is reconstructed offline with superior signal-to-noise and distortion ratios (SNDR) of 22.6 dB and 27.4 dB, along with R² scores of 0.81 and 0.94, respectively, evaluated on two monkey neural recordings.
Submitted 9 April, 2025;
originally announced April 2025.
-
Enhancing Celestial Imaging: High Dynamic Range with Neuromorphic Cameras
Authors:
Satyapreet Singh Yadav,
Nirupam Roy,
Chetan Singh Thakur
Abstract:
Conventional frame-based cameras often struggle with limited dynamic range, leading to saturation and loss of detail when capturing scenes with significant brightness variations. Neuromorphic cameras, inspired by the human retina, offer a solution by providing an inherently high dynamic range. This capability enables them to capture both bright and faint celestial objects without saturation effects, preserving details across a wide range of luminosities. This paper investigates the application of neuromorphic imaging technology for capturing celestial bodies across a wide range of flux levels. Its advantages are demonstrated through examples such as the bright planet Saturn with its faint moons and the bright star Sirius A alongside its faint companion, Sirius B.
Submitted 28 March, 2025;
originally announced March 2025.
-
Neuromorphic Cameras in Astronomy: Unveiling the Future of Celestial Imaging Beyond Conventional Limits
Authors:
Satyapreet Singh Yadav,
Bikram Pradhan,
Kenil Rajendrabhai Ajudiya,
T. S. Kumar,
Nirupam Roy,
Andre Van Schaik,
Chetan Singh Thakur
Abstract:
To deepen our understanding of optical astronomy, we must advance imaging technology to overcome conventional frame-based cameras' limited dynamic range and temporal resolution. Our Perspective paper examines how neuromorphic cameras can effectively address these challenges. Drawing inspiration from the human retina, neuromorphic cameras excel in speed and high dynamic range by utilizing asynchronous pixel operation and logarithmic photocurrent conversion, making them highly effective for celestial imaging. We use a 1300 mm terrestrial telescope to demonstrate the neuromorphic camera's ability to simultaneously capture faint and bright celestial sources while preventing saturation effects. We illustrate its photometric capabilities through aperture photometry of a star field with faint stars. Detection of the faint gas cloud structure of the Trapezium cluster during a full moon night highlights the camera's high dynamic range, effectively mitigating static glare from lunar illumination. Our investigations also include detecting a meteorite passing near the Moon and Earth, as well as imaging satellites and anthropogenic debris with exceptionally high temporal resolution using a 200 mm telescope. Our observations show the immense potential of neuromorphic cameras in advancing astronomical optical imaging and pushing the boundaries of observational astronomy.
Submitted 20 March, 2025;
originally announced March 2025.
-
Neuromorphic Retina: An FPGA-based Emulator
Authors:
Prince Philip,
Pallab Kumar Nath,
Kapil Jainwal,
Andre van Schaik,
Chetan Singh Thakur
Abstract:
Implementing accurate models of the retina is a challenging task, particularly in the context of creating visual prosthetics and devices. Although diverse artificial renditions of the retina exist, the pursuit of a more realistic model remains imperative. In this work, we emulate a neuromorphic retina model on an FPGA. The key feature of this model is its powerful adaptation to luminance and contrast, which allows it to accurately emulate the sensitivity of the biological retina to changes in light levels. Phasic and tonic cells are realizable in the retina in the simplest way possible. Our FPGA implementation of the proposed biologically inspired digital retina, incorporating a receptive field with a center-surround structure, is reconfigurable and can support 128×128-pixel images at a frame rate of 200 fps. It consumes 1720 slices, approximately 3.7k Look-Up Tables (LUTs), and Flip-Flops (FFs) on the FPGA. This implementation provides a high-performance, low-power, and small-area solution and could be a significant step forward in the development of biologically plausible retinal prostheses with enhanced information processing capabilities.
Submitted 15 January, 2025;
originally announced January 2025.
-
KALAM: toolKit for Automating high-Level synthesis of Analog computing systeMs
Authors:
Ankita Nandi,
Krishil Gandhi,
Mahendra Pratap Singh,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
Diverse computing paradigms have emerged to meet the growing needs for intelligent energy-efficient systems. The Margin Propagation (MP) framework, being one such initiative in the analog computing domain, stands out due to its scalability across biasing conditions, temperatures, and diminishing process technology nodes. However, the lack of digital-like automation tools for designing analog systems (including that of MP analog) hinders their adoption for designing large systems. The inherent scalability and modularity of MP systems present a unique opportunity in this regard. This paper introduces KALAM (toolKit for Automating high-Level synthesis of Analog computing systeMs), which leverages factor graphs as the foundational paradigm for synthesizing MP-based analog computing systems. Factor graphs are the basis of various signal processing tasks and, when coupled with MP, can be used to design scalable and energy-efficient analog signal processors. Using Python scripting language, the KALAM automation flow translates an input factor graph to its equivalent SPICE-compatible circuit netlist that can be used to validate the intended functionality. KALAM also allows the integration of design optimization strategies such as precision tuning, variable elimination, and mathematical simplification. We demonstrate KALAM's versatility for tasks such as Bayesian inference, Low-Density Parity Check (LDPC) decoding, and Artificial Neural Networks (ANN). Simulation results of the netlists align closely with software implementations, affirming the efficacy of our proposed automation tool.
Submitted 30 October, 2024;
originally announced October 2024.
-
Low-latency machine learning FPGA accelerator for multi-qubit-state discrimination
Authors:
Pradeep Kumar Gautam,
Shantharam Kalipatnapu,
Shankaranarayanan H,
Ujjawal Singhal,
Benjamin Lienhard,
Vibhor Singh,
Chetan Singh Thakur
Abstract:
Measuring a qubit state is a fundamental yet error-prone operation in quantum computing. These errors can arise from various sources, such as crosstalk, spontaneous state transitions, and excitations caused by the readout pulse. Here, we utilize an integrated approach to deploy neural networks onto field-programmable gate arrays (FPGA). We demonstrate that implementing a fully connected neural network accelerator for multi-qubit readout is advantageous, balancing computational complexity with low latency requirements without significant loss in accuracy. The neural network is implemented by quantizing weights, activation functions, and inputs. The hardware accelerator performs frequency-multiplexed readout of five superconducting qubits in less than 50 ns on a radio frequency system on chip (RFSoC) ZCU111 FPGA, marking the advent of RFSoC-based low-latency multi-qubit readout using neural networks. These modules can be implemented and integrated into existing quantum control and readout platforms, making the RFSoC ZCU111 ready for experimental deployment.
Submitted 14 August, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
EventF2S: Asynchronous and Sparse Spiking AER Framework using Neuromorphic-Friendly Algorithm
Authors:
Lakshmi Annamalai,
Chetan Singh Thakur
Abstract:
Bio-inspired Address Event Representation (AER) sensors have attracted significant popularity owing to their low power consumption, high sparsity, and high temporal resolution. Spiking Neural Network (SNN) has become the inherent choice for AER data processing. However, the integration of the AER-SNN paradigm has not adequately explored asynchronous processing, neuromorphic compatibility, and sparse spiking, which are the key requirements of resource-constrained applications. To address this gap, we introduce a brain-inspired AER-SNN object recognition solution, which includes a data encoder integrated with a First-To-Spike recognition network. Being fascinated by the functionality of neurons in the visual cortex, we designed the solution to be asynchronous and compatible with neuromorphic hardware. Furthermore, we have adapted the principle of denoising and First-To-Spike coding to achieve optimal spike signaling, significantly reducing computation costs. Experimental evaluation has demonstrated that the proposed method incurs significantly less computation cost to achieve state-of-the-art competitive accuracy. Overall, the proposed solution offers an asynchronous and cost-effective AER recognition system that harnesses the full potential of AER sensors.
Submitted 28 January, 2024;
originally announced February 2024.
-
Margin Propagation based XOR-SAT Solvers for Decoding of LDPC Codes
Authors:
Ankita Nandi,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
Decoding of Low-Density Parity Check (LDPC) codes can be viewed as a special case of XOR-SAT problems, for which low-computational complexity bit-flipping algorithms have been proposed in the literature. However, a performance gap exists between the bit-flipping LDPC decoding algorithms and the benchmark LDPC decoding algorithms, such as the Sum-Product Algorithm (SPA). In this paper, we propose an XOR-SAT solver using log-sum-exponential functions and demonstrate its advantages for LDPC decoding. This is then approximated using the Margin Propagation formulation to attain a low-complexity LDPC decoder. The proposed algorithm uses soft information to decide the bit-flips that maximize the number of parity check constraints satisfied over an optimization function. The proposed solver can achieve results that are within 0.1 dB of the Sum-Product Algorithm for the same number of code iterations. It is also at least 10x lesser than other Gradient-Descent Bit Flipping decoding algorithms, which are also bit-flipping algorithms based on optimization functions. The approximation using the Margin Propagation formulation does not require any multipliers, resulting in significantly lower computational complexity than other soft-decision Bit-Flipping LDPC decoders.
Submitted 7 February, 2024;
originally announced February 2024.
-
RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge
Authors:
Adithya Krishna,
Srikanth Rohit Nudurupati,
Chandana D G,
Pritesh Dwivedi,
André van Schaik,
Mahesh Mehendale,
Chetan Singh Thakur
Abstract:
Deep Neural Network (DNN) based inference at the edge is challenging as these compute and data-intensive algorithms need to be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity, in both activations and weights inherent to DNNs, is a key knob to leverage. In this paper, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit the sparsity to reduce area (storage), power as well as latency. RAMAN can be configured to support a wide range of DNN topologies - consisting of different convolution layer types and a range of layer parameters (feature-map size and the number of channels). RAMAN can also be configured to support accuracy vs power/latency tradeoffs using techniques deployed at compile-time and run-time. We present the salient features of the architecture, provide implementation results and compare the same with the state-of-the-art. RAMAN employs novel dataflow inspired by Gustavson's algorithm that has optimal input activation (IA) and output activation (OA) reuse to minimize memory access and the overall data movement cost. The dataflow allows RAMAN to locally reduce the partial sum (Psum) within a processing element array to eliminate the Psum writeback traffic. Additionally, we suggest a method to reduce peak activation memory by overlapping IA and OA on the same memory space, which can reduce storage requirements by up to 50%. RAMAN was implemented on a low-power and resource-constrained Efinix Ti60 FPGA with 37.2K LUTs and 8.6K register utilization. RAMAN processes all layers of the MobileNetV1 model at 98.47 GOp/s/W and the DS-CNN model at 79.68 GOp/s/W by leveraging both weight and activation sparsity.
Submitted 10 June, 2023;
originally announced June 2023.
-
Neuromorphic Computing with AER using Time-to-Event-Margin Propagation
Authors:
Madhuvanthi Srivatsav R,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
Address-Event-Representation (AER) is a spike-routing protocol that allows the scaling of neuromorphic and spiking neural network (SNN) architectures to a size that is comparable to that of digital neural network architectures. However, in conventional neuromorphic architectures, the AER protocol and, in general, any virtual interconnect plays only a passive role in computation, i.e., only for routing spikes and events. In this paper, we show how causal temporal primitives like delay, triggering, and sorting inherent in the AER protocol itself can be exploited for scalable neuromorphic computing using our proposed technique called Time-to-Event Margin Propagation (TEMP). The proposed TEMP-based AER architecture is fully asynchronous and relies on interconnect delays for memory and computing as opposed to conventional and local multiply-and-accumulate (MAC) operations. We show that the time-based encoding in the TEMP neural network produces a spatio-temporal representation that can encode a large number of discriminatory patterns. As a proof-of-concept, we show that a trained TEMP-based convolutional neural network (CNN) can demonstrate an accuracy greater than 99% on the MNIST dataset. Overall, our work is a biologically inspired computing paradigm that brings forth a new dimension of research to the field of neuromorphic computing.
Submitted 26 April, 2023;
originally announced April 2023.
-
Multiplierless In-filter Computing for tinyML Platforms
Authors:
Abhishek Ramdas Nair,
Pallab Kumar Nath,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
Wildlife conservation using continuous monitoring of environmental factors and biomedical classification, which generate a vast amount of sensor data, is a challenge due to limited bandwidth in the case of remote monitoring. It becomes critical to have classification where data is generated, and only classified data is used for monitoring. We present a novel multiplierless framework for in-filter acoustic classification using Margin Propagation (MP) approximation used in low-power edge devices deployable in remote areas with limited connectivity. The entire design of this classification framework is based on a template-based kernel machine, which includes feature extraction and inference, and uses basic primitives like addition/subtraction, shift, and comparator operations for hardware implementation. Unlike full-precision training methods for traditional classification, we use MP-based approximation for training, including backpropagation mitigating approximation errors. The proposed framework is general enough for acoustic classification. However, we demonstrate the hardware friendliness of this framework by implementing a parallel Finite Impulse Response (FIR) filter bank in a kernel machine classifier optimized for a Field Programmable Gate Array (FPGA). The FIR filter acts as the feature extractor and non-linear kernel for the kernel machine implemented using MP approximation and a downsampling method to reduce the order of the filters. The FPGA implementation on Spartan 7 shows that the MP-approximated in-filter kernel machine is more efficient than traditional classification frameworks with just less than 1K slices.
Submitted 24 April, 2023;
originally announced April 2023.
-
Theoretical Insight into Batch Normalization: Data Dependant Auto-Tuning of Regularization Rate
Authors:
Lakshmi Annamalai,
Chetan Singh Thakur
Abstract:
Batch normalization is widely used in deep learning to normalize intermediate activations. Deep networks suffer from notoriously increased training complexity, mandating careful initialization of weights, requiring lower learning rates, etc. These issues have been addressed by Batch Normalization (BN), by normalizing the inputs of activations to zero mean and unit standard deviation. Making this batch normalization part of the training process dramatically accelerates the training process of very deep networks. A new field of research has been going on to examine the exact theoretical explanation behind the success of BN. Most of these theoretical insights attempt to explain the benefits of BN by placing them on its influence on optimization, weight scale invariance, and regularization. Despite BN's undeniable success in accelerating generalization, an analytical relation between the effect of BN and the regularization parameter is still missing. This paper aims to bring out the data-dependent auto-tuning of the regularization parameter by BN with analytical proofs. We have posed BN as a constrained optimization imposed on non-BN weights, through which we demonstrate its data-statistics-dependent auto-tuning of the regularization parameter. We have also given analytical proof for its behavior under a noisy input scenario, which reveals the signal vs. noise tuning of the regularization parameter. We have also substantiated our claim with empirical results from the MNIST dataset experiments.
Submitted 18 October, 2022; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Process, Bias and Temperature Scalable CMOS Analog Computing Circuits for Machine Learning
Authors:
Pratik Kumar,
Ankita Nandi,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
Analog computing is attractive compared to digital computing due to its potential for achieving higher computational density and higher energy efficiency. However, unlike digital circuits, conventional analog computing circuits cannot be easily mapped across different process nodes due to differences in transistor biasing regimes, temperature variations and limited dynamic range. In this work, we generalize the previously reported margin-propagation-based analog computing framework for designing novel shape-based analog computing (S-AC) circuits that can be easily cross-mapped across different process nodes. Similar to digital designs, S-AC designs can also be scaled for precision, speed, and power. As a proof-of-concept, we show several examples of S-AC circuits implementing mathematical functions that are commonly used in machine learning (ML) architectures. Using circuit simulations, we demonstrate that the circuit input/output characteristics remain robust when mapped from a planar CMOS 180 nm process to a FinFET 7 nm process. Also, using benchmark datasets, we demonstrate that the classification accuracy of an S-AC based neural network remains robust when mapped across the two processes and to changes in temperature.
Submitted 4 January, 2023; v1 submitted 11 May, 2022;
originally announced May 2022.
-
SQ-CARS: A Scalable Quantum Control and Readout System
Authors:
Ujjawal Singhal,
Shantharam Kalipatnapu,
Pradeep Kumar Gautam,
Sourav Majumder,
Vaibhav Venkata Lakshmi Pabbisetty,
Srivatsava Jandhyala,
Vibhor Singh,
Chetan Singh Thakur
Abstract:
Qubits, the basic building blocks of a quantum processor, require electromagnetic pulses in the gigahertz frequency range and nanosecond latency for control and readout. In this paper, we address three main challenges associated with room temperature electronics used for controlling and measuring superconducting qubits: scalability, direct microwave synthesis, and a unified user interface. To tackle these challenges, we have developed SQ-CARS, a system based on the ZCU111 evaluation kit. SQ-CARS is designed to be scalable, configurable, and phase synchronized, providing multi-qubit control and readout capabilities. The system offers an interactive Python framework, making it user-friendly. Scalability to a larger number of qubits is achieved by deterministic synchronization of multiple channels. The system supports direct synthesis of arbitrary vector microwave pulses using the second-Nyquist zone technique, from 4 to 9 GHz. It also features on-board data processing like tunable low pass filters and configurable rotation blocks, enabling lock-in detection and low-latency active feedback for quantum experiments. All control and readout features are accessible through an on-board Python framework. To validate the performance of SQ-CARS, we conducted various time-domain measurements to characterize a superconducting transmon qubit. Our results were compared against traditional setups commonly used in similar experiments. With deterministic synchronisation of control and readout channels, and an open-source approach for programming, SQ-CARS paves the way for advanced experiments with superconducting qubits.
Submitted 6 August, 2023; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Bias-Scalable Near-Memory CMOS Analog Processor for Machine Learning
Authors:
Pratik Kumar,
Ankita Nandi,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
Bias-scalable analog computing is attractive for implementing machine learning (ML) processors with distinct power-performance specifications. For instance, ML implementations for server workloads are focused on higher computational throughput for faster training, whereas ML implementations for edge devices are focused on energy-efficient inference. In this paper, we demonstrate the implementation of bias-scalable approximate analog computing circuits using the generalization of the margin-propagation principle called shape-based analog computing (S-AC). The resulting S-AC core integrates several near-memory compute elements, which include: (a) non-linear activation functions; (b) inner-product compute circuits; and (c) a mixed-signal compressive memory, all of which can be scaled for performance or power while preserving its functionality. Using measured results from prototypes fabricated in a 180nm CMOS process, we demonstrate that the performance of computing modules remains robust to transistor biasing and variations in temperature. In this paper, we also demonstrate the effect of bias-scalability and computational accuracy on a simple ML regression task.
Submitted 4 January, 2023; v1 submitted 10 February, 2022;
originally announced February 2022.
-
In-filter Computing For Designing Ultra-light Acoustic Pattern Recognizers
Authors:
Abhishek Ramdas Nair,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
We present a novel in-filter computing framework that can be used for designing ultra-light acoustic classifiers for use in smart internet-of-things (IoTs). Unlike a conventional acoustic pattern recognizer, where the feature extraction and classification are designed independently, the proposed architecture integrates the convolution and nonlinear filtering operations directly into the kernels of a Support Vector Machine (SVM). The result of this integration is a template-based SVM whose memory and computational footprint (training and inference) is light enough to be implemented on an FPGA-based IoT platform. While the proposed in-filter computing framework is general enough, in this paper, we demonstrate this concept using a Cascade of Asymmetric Resonator with Inner Hair Cells (CAR-IHC) based acoustic feature extraction algorithm. The complete system has been optimized using time-multiplexing and parallel-pipeline techniques for a Xilinx Spartan 7 series Field Programmable Gate Array (FPGA). We show that the system can achieve robust classification performance on benchmark sound recognition tasks using only ~ 1.5k Look-Up Tables (LUTs) and ~ 2.8k Flip-Flops (FFs), a significant improvement over other approaches.
Submitted 11 September, 2021;
originally announced September 2021.
-
Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices
Authors:
Abhishek Ramdas Nair,
Pallab Kumar Nath,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
We present a novel framework for designing multiplierless kernel machines that can be used on resource-constrained platforms like intelligent edge devices. The framework uses a piecewise linear (PWL) approximation based on a margin propagation (MP) technique and uses only addition/subtraction, shift, comparison, and register underflow/overflow operations. We propose a hardware-friendly MP-based inference and online training algorithm that has been optimized for a Field Programmable Gate Array (FPGA) platform. Our FPGA implementation eliminates the need for DSP units and reduces the number of LUTs. By reusing the same hardware for inference and training, we show that the platform can overcome classification errors and local minima artifacts that result from the MP approximation. The implementation of the proposed multiplierless MP-kernel machine on FPGA results in an estimated energy consumption of 13.4 pJ and power consumption of 107 mW with ~9k LUTs and ~9k FFs for a 256 x 32 kernel, making it superior in terms of power, performance, and area to other comparable implementations.
Submitted 9 September, 2022; v1 submitted 3 June, 2021;
originally announced June 2021.
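The margin propagation (MP) step named in the abstract above is commonly written as a reverse water-filling constraint: given inputs x_i and a hyper-parameter gamma > 0, find z such that the sum of max(x_i - z, 0) equals gamma. The following is a minimal floating-point reference sketch of that constraint only. The function name `margin_propagation` and the sort-based solver are illustrative assumptions; it uses a division and is not the multiplierless, shift-and-compare procedure the paper implements on the FPGA.

```python
def margin_propagation(inputs, gamma):
    """Return z such that sum(max(x - z, 0) for x in inputs) == gamma.

    Reference solver (assumption, not the paper's hardware algorithm):
    sort the inputs in descending order and find how many of them lie above z.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    xs = sorted(inputs, reverse=True)
    running = 0.0
    for k, x in enumerate(xs, start=1):
        running += x
        # If exactly the k largest inputs are above z, then
        # sum(x_i - z) over those k inputs = gamma  =>  z = (sum - gamma) / k.
        candidate = (running - gamma) / k
        next_x = xs[k] if k < len(xs) else float("-inf")
        # The candidate is consistent once no remaining input exceeds it.
        if candidate >= next_x:
            return candidate
    raise ValueError("inputs must be non-empty")


def mp_residual(inputs, gamma, z):
    """How far z is from satisfying the MP constraint (0 means exact)."""
    return sum(max(x - z, 0.0) for x in inputs) - gamma
```

For example, `margin_propagation([3.0, 1.0, 0.5], 1.0)` keeps only the largest input above the threshold, while a larger gamma pulls more inputs into the active set.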
-
Event-LSTM: An Unsupervised and Asynchronous Learning-based Representation for Event-based Data
Authors:
Lakshmi Annamalai,
Vignesh Ramanathan,
Chetan Singh Thakur
Abstract:
Event cameras are activity-driven, bio-inspired vision sensors that offer advantages such as sparsity, high temporal resolution, low latency, and low power consumption. Given the different sensing modality of event cameras and the high quality of the conventional vision paradigm, event processing is predominantly solved by transforming the sparse and asynchronous events into a 2D grid and subsequently applying standard vision pipelines. Despite the promising results displayed by learning approaches in 2D grid generation, these approaches treat the task in a supervised manner, and labeled task-specific ground truth event data is challenging to acquire. To overcome this limitation, we propose Event-LSTM, an unsupervised auto-encoder architecture made up of LSTM layers, as a promising alternative for learning a 2D grid representation from an event sequence. Compared to competing supervised approaches, ours is a task-agnostic approach ideally suited for the event domain, where task-specific labeled data is scarce. We also tailor the proposed solution to exploit the asynchronous nature of the event stream, which gives it desirable characteristics such as speed-invariant and energy-efficient 2D grid generation. In addition, we push state-of-the-art event de-noising forward by introducing memory into the de-noising process. Evaluations on activity recognition and gesture recognition demonstrate that our approach yields improvement over state-of-the-art approaches, while providing the flexibility to learn from unlabelled data.
Submitted 10 May, 2021;
originally announced May 2021.
-
Source localization using particle filtering on FPGA for robotic navigation with imprecise binary measurement
Authors:
Adithya Krishna,
André van Schaik,
Chetan Singh Thakur
Abstract:
Particle filtering is a recursive Bayesian estimation technique that has gained popularity recently for tracking and localization applications. It uses Monte Carlo simulation and has proven to be a very reliable technique to model non-Gaussian and non-linear elements of physical systems. Particle filters outperform various other traditional filters like Kalman filters in non-Gaussian and non-linear settings due to their non-analytical and non-parametric nature. However, a significant drawback of particle filters is their computational complexity, which inhibits their use in real-time applications with conventional CPU or DSP based implementation schemes. This paper proposes a modification to the existing particle filter algorithm and presents a high-speed, dedicated hardware architecture. The architecture incorporates pipelining and parallelization in the design to reduce execution time considerably. The design is validated for a source localization problem wherein we estimate the position of a source in real time using the particle filter algorithm implemented on hardware. The validation setup relies on an Unmanned Ground Vehicle (UGV) with a photodiode housing on top to sense and localize a light source. We have prototyped the design using an Artix-7 field-programmable gate array (FPGA), and resource utilization for the proposed system is presented. Further, we show the execution time and estimation accuracy of the high-speed architecture and observe a significant reduction in computational time. Our implementation of particle filters on FPGA is scalable and modular, with a low execution time of about 5.62 μs for processing 1024 particles, and can be deployed for real-time applications.
Submitted 22 October, 2020;
originally announced October 2020.
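For readers unfamiliar with the technique, the predict, weight, and resample loop of a textbook bootstrap particle filter can be sketched as follows. This is a generic software illustration under stated assumptions, not the modified algorithm or the pipelined hardware architecture from the paper: it localizes a static 1D source from noisy scalar readings with a Gaussian likelihood (the paper uses imprecise binary measurements), and every name and parameter (`particle_filter`, `meas_sigma`, `jitter`, the search interval) is hypothetical.

```python
import math
import random


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples in proportion to normalized weights."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step          # single random offset, then even strides
    out, i, cumulative = [], 0, weights[0]
    for _ in range(n):
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
        u += step
    return out


def particle_filter(measurements, n_particles=500, meas_sigma=0.5,
                    jitter=0.05, lo=0.0, hi=10.0, seed=0):
    """Estimate a static 1D source position from noisy position readings."""
    rng = random.Random(seed)
    particles = [rng.uniform(lo, hi) for _ in range(n_particles)]
    estimate = sum(particles) / n_particles
    for y in measurements:
        # Predict: small random walk keeps the particle set diverse.
        particles = [p + rng.gauss(0.0, jitter) for p in particles]
        # Update: weight each particle by an assumed Gaussian likelihood.
        weights = [math.exp(-0.5 * ((y - p) / meas_sigma) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimate = sum(p * w for p, w in zip(particles, weights))
        # Resample: concentrate particles where the weights are large.
        particles = systematic_resample(particles, weights, rng)
    return estimate
```

The per-particle predict and weight steps are independent of one another, which is the property that makes pipelined and parallel hardware implementations attractive.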
-
A Neuromorphic Proto-Object Based Dynamic Visual Saliency Model with an FPGA Implementation
Authors:
Jamal Lottier Molin,
Chetan Singh Thakur,
Ralph Etienne-Cummings,
Ernst Niebur
Abstract:
The ability to attend to salient regions of a visual scene is an innate and necessary preprocessing step for both biological and engineered systems performing high-level visual tasks (e.g. object detection, tracking, and classification). Computational efficiency, in regard to processing bandwidth and speed, is improved by only devoting computational resources to salient regions of the visual stimuli. In this paper, we first present a neuromorphic, bottom-up, dynamic visual saliency model based on the notion of proto-objects. This is achieved by incorporating the temporal characteristics of the visual stimulus into the model, similarly to the manner in which early stages of the human visual system extract temporal information. This neuromorphic model outperforms state-of-the-art dynamic visual saliency models in predicting human eye fixations on a commonly used video dataset with associated eye tracking data. Secondly, for this model to have practical applications, it must be capable of performing its computations in real time under low-power, small-size, and lightweight constraints. To address this, we introduce a Field-Programmable Gate Array implementation of the model on an Opal Kelly 7350 Kintex-7 board. This novel hardware implementation allows for processing of up to 23.35 frames per second running on a 100 MHz clock, a better than 26x speedup over the software implementation.
Submitted 11 April, 2020; v1 submitted 26 February, 2020;
originally announced February 2020.
-
Real-Time Object Detection and Localization in Compressive Sensed Video on Embedded Hardware
Authors:
Yeshwanth Ravi Theja Bethi,
Sathyaprakash Narayanan,
Venkat Rangan,
Chetan Singh Thakur
Abstract:
Every day around the world, terabytes of data are captured for surveillance purposes. A typical 1-2 MP CCTV camera generates around 7-12 GB of data per day. Frame-by-frame processing of such an enormous amount of data requires hefty computational resources. In recent years, compressive sensing approaches have shown impressive results in signal processing by reducing the sampling bandwidth. Different sampling mechanisms were developed to incorporate compressive sensing in image and video acquisition. Pixel-wise coded exposure is one of the promising sensing paradigms for capturing videos in the compressed domain, and it has also been realized in an all-CMOS sensor \cite{Xiong2017}. Though cameras that perform compressive sensing save a lot of bandwidth at the time of sampling and minimize the memory required to store videos, little processing can be done until the videos are reconstructed to the original frames, and the reconstruction of compressive-sensed (CS) videos is still slow and computationally expensive. In this work, we show that object detection and localization are possible directly on the CS frames (easily up to 20x compression). To our knowledge, this is the first time that the problem of object detection and localization on CS frames has been attempted. Hence, we also created a dataset for training in the CS domain. The proposed model achieves an accuracy of 46.27% mAP (mean average precision) with an inference time of 23 ms directly on the compressed frames (approx. 20 original-domain frames), which enables real-time inference, as verified on an NVIDIA TX2 embedded board. Our framework will significantly reduce the communication bandwidth, and thus power, as the video compression will be done at the image sensor processing core.
Submitted 18 April, 2021; v1 submitted 18 December, 2019;
originally announced December 2019.
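As an illustration of how pixel-wise coded exposure folds several frames into one compressed frame, the sketch below sums, for each pixel, only the frames that fall inside that pixel's own exposure window. It assumes the usual single-window formulation (one contiguous exposure per pixel within a block of T frames). The function name `coded_exposure`, the nested-list frame layout, and the per-pixel `start` map are illustrative assumptions rather than details taken from the paper or its sensor.

```python
def coded_exposure(frames, start, length):
    """Collapse T frames into one coded frame.

    frames: list of T frames, each a list of rows of pixel values.
    start:  per-pixel index of the first frame in the exposure window.
    length: number of consecutive frames each pixel integrates.
    """
    n_frames = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    coded = [[0.0] * width for _ in range(height)]
    for t in range(n_frames):
        for r in range(height):
            for c in range(width):
                # A pixel only integrates light while its shutter is open.
                if start[r][c] <= t < start[r][c] + length:
                    coded[r][c] += frames[t][r][c]
    return coded


# Four frames of a 1x2 image; frame t has brightness t + 1 everywhere.
frames = [[[float(t + 1), float(t + 1)]] for t in range(4)]
start = [[0, 2]]  # left pixel exposes frames 0-1, right pixel frames 2-3
coded = coded_exposure(frames, start, length=2)
```

Here the two pixels record different time slices of the same four-frame block, which is how the temporal information ends up multiplexed into a single coded image.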
-
EvAn: Neuromorphic Event-based Anomaly Detection
Authors:
Lakshmi Annamalai,
Anirban Chakraborty,
Chetan Singh Thakur
Abstract:
Event-based cameras are novel bio-inspired sensors that asynchronously record changes in illumination in the form of events, resulting in significant advantages over conventional cameras in terms of low power utilization, high dynamic range, and no motion blur. Moreover, such cameras, by design, encode only the relative motion between the scene and the sensor (and not the static background) to yield a very sparse data structure, which can be utilized for various motion analytics tasks. In this paper, for the first time in the event data analytics community, we leverage these advantages of an event camera towards a critical vision application: video anomaly detection. We propose to model the motion dynamics in the event domain with a dual-discriminator conditional generative adversarial network (cGAN) built on state-of-the-art architectures. To adapt event data for use as input to the cGAN, we also put forward a deep learning solution to learn a novel representation of event data, which retains the sparsity of the data and encodes the temporal information readily available from these sensors. Since there is no existing dataset for anomaly detection in the event domain, we also provide an anomaly detection event dataset with an exhaustive set of anomalies. Careful analysis reveals that the proposed method results in a large reduction in computational complexity compared to previous state-of-the-art conventional anomaly detection networks.
Submitted 15 February, 2020; v1 submitted 21 November, 2019;
originally announced November 2019.
-
Unlimited Dynamic Range Analog-to-Digital Conversion
Authors:
Adithya Krishna,
Sunil Rudresh,
Vishal Shaw,
Hemanth Reddy Sabbella,
Chandra Sekhar Seelamantula,
Chetan Singh Thakur
Abstract:
Analog-to-digital converters (ADCs) provide the link between continuous-time signals and their discrete-time counterparts, and the Shannon-Nyquist sampling theorem provides the mathematical foundation. Real-world signals have a variable amplitude range, whereas ADCs, by design, have a limited input dynamic range, which results in out-of-range signals getting clipped. In this paper, we propose an unlimited dynamic range ADC (UDR-ADC) that is based on the modulo operation (self-reset feature) to alleviate the problem of clipping. The self-reset feature allows for wrapping of the input amplitudes, which preserves the input dynamic range. We present the signal model and a reconstruction technique to recover the original signal samples from the modulo measurements. We validate the operation of the proposed ADC using circuit simulations in 65 nm complementary metal-oxide-semiconductor (CMOS) process technology. The validation is supplemented by a hardware prototype designed using discrete components. A performance assessment in terms of area, power requirement, and the signal-to-quantization-noise ratio (SQNR) shows that the UDR-ADC outperforms the standard ones.
Submitted 21 November, 2019;
originally announced November 2019.
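The self-reset (modulo) idea in the abstract above can be illustrated with a short sketch: the front end folds the input into a fixed range instead of clipping it, and the original samples are recovered afterwards. The recovery shown here is a simple difference-based unwrapping that assumes the signal is sampled densely enough that consecutive samples differ by less than `lam`, and that the first sample is already in range. It is a stand-in for illustration, not the reconstruction technique presented in the paper, and the names `fold`, `unfold`, and `lam` are hypothetical.

```python
import math


def fold(x, lam):
    """Centered modulo: wrap x into the interval [-lam, lam)."""
    return (x + lam) % (2.0 * lam) - lam


def unfold(folded, lam):
    """Recover samples from modulo measurements.

    Assumes |x[n] - x[n-1]| < lam and that folded[0] equals x[0]. Under that
    assumption, re-folding the difference of two folded samples returns the
    true difference, so a running sum rebuilds the signal.
    """
    out = [folded[0]]
    for prev, cur in zip(folded, folded[1:]):
        out.append(out[-1] + fold(cur - prev, lam))
    return out


# A sine of amplitude 3 driven into a converter whose range is [-1, 1).
lam = 1.0
signal = [3.0 * math.sin(2.0 * math.pi * n / 200.0) for n in range(400)]
measured = [fold(s, lam) for s in signal]   # never leaves [-1, 1)
recovered = unfold(measured, lam)
max_error = max(abs(a - b) for a, b in zip(signal, recovered))
```

A conventional converter with the same range would clip this input at plus or minus 1, whereas the folded samples retain enough information to rebuild the full-amplitude waveform.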
-
Multiplierless and Sparse Machine Learning based on Margin Propagation Networks
Authors:
Nazreen P. M.,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
The new generation of machine learning processors has evolved from multi-core and parallel architectures that were designed to efficiently implement matrix-vector-multiplications (MVMs). This is because, at the fundamental level, neural network and machine learning operations extensively use MVM operations, and hardware compilers exploit the inherent parallelism in MVM operations to achieve hardware acceleration on GPUs and FPGAs. However, many IoT and edge computing platforms require embedded ML devices close to the network in order to compensate for communication cost and latency. Hence a natural question to ask is whether MVM operations are even necessary to implement ML algorithms and whether simpler hardware primitives can be used to implement an ultra-energy-efficient ML processor/architecture. In this paper, we propose an alternate hardware-software codesign of ML and neural network architectures where, instead of using MVM operations and non-linear activation functions, the architecture only uses simple addition and thresholding operations to implement inference and learning. At the core of the proposed approach is margin-propagation (MP) based computation that maps multiplications into additions and additions into dynamic rectifying-linear-unit (ReLU) operations. This mapping results in significant improvement in computational and hence energy cost. We show how the MP network formulation can be applied for designing linear classifiers, shallow multi-layer perceptrons and support vector networks suitable for IoT platforms and tiny ML applications. We show that these MP-based classifiers give results comparable to those of their traditional counterparts for benchmark UCI datasets, with the added advantage of reduced computational complexity, enabling an improvement in energy efficiency.
Submitted 5 November, 2020; v1 submitted 5 October, 2019;
originally announced October 2019.
-
A closed-loop all-electronic pixel-wise adaptive imaging system for high dynamic range video
Authors:
Jie Zhang,
Jonathan P. Newman,
Xiao Wang,
Chetan Singh Thakur,
John Rattray,
Ralph Etienne-Cummings,
Matthew A. Wilson
Abstract:
We demonstrate a CMOS imaging system that adapts each pixel's exposure and sampling rate to capture high dynamic range (HDR) videos. The system consists of a custom-designed image sensor with pixel-wise exposure configurability and a real-time pixel exposure controller. These parts operate in a closed loop to sample, detect and optimize each pixel's exposure and sampling rate to minimize underexposure, overexposure and motion blur in each local region. Exposure control is implemented using all-integrated electronics without external optical modulation. This reduces overall system size and power consumption.
The image sensor is implemented using a standard 130nm CMOS process, while the exposure controller is implemented on a computer. We performed experiments under complex lighting and motion conditions to test the performance of the system, and demonstrate the benefit of pixel-wise adaptive imaging on the performance of computer vision tasks such as segmentation, motion estimation and object recognition.
Submitted 24 June, 2019;
originally announced June 2019.
-
A Compressive Sensing Video dataset using Pixel-wise coded exposure
Authors:
Sathyaprakash Narayanan,
Yeshwanth Bethi,
Chetan Singh Thakur
Abstract:
A vast amount of video data is generated every minute, for purposes ranging from surveillance to broadcasting. Two roadblocks restrain us from using this data as such. The first is storage: hardware constraints restrict how much information can be stored. The second is computation: processing this data is so expensive that working on it becomes infeasible. Compressive sensing (CS) [2] is a signal processing technique [11] in which, through optimization, the sparsity of a signal is exploited to recover it from far fewer samples than required by the Shannon-Nyquist sampling theorem. There are two conditions under which recovery is possible. The first is sparsity, which requires the signal to be sparse in some domain. The second is incoherence, which is applied through the isometric property, which is sufficient for sparse signals [9][10]. To sustain these characteristics, preserving all attributes in the uncompressed domain would help any kind of in this field. However, existing datasets fall short in terms of continuous tracking of all the objects present in the scene; very few video datasets have comprehensive continuous tracking of objects. To address these problems collectively, in this work we propose a new comprehensive video dataset, where the data is compressed using pixel-wise coded exposure [3], which resolves various other impediments.
Submitted 8 July, 2019; v1 submitted 24 May, 2019;
originally announced May 2019.
-
A high-performance MoS2 synaptic device with floating gate engineering for Neuromorphic Computing
Authors:
Tathagata Paul,
Tanweer Ahmed,
Krishna Kanhaiya Tiwari,
Chetan Singh Thakur,
Arindam Ghosh
Abstract:
As one of the most important members of the two dimensional chalcogenide family, molybdenum disulphide (MoS2) has played a fundamental role in the advancement of low dimensional electronic, optoelectronic and piezoelectric designs. Here, we demonstrate a new approach to solid state synaptic transistors using two dimensional MoS2 floating gate memories. By using an extended floating gate architecture which allows the device to be operated at near-ideal subthreshold swing of 77 mV/decade over four decades of drain current, we have realised a charge tunneling based synaptic memory with performance comparable to the state of the art in neuromorphic designs. The device successfully demonstrates various features of a biological synapse, including pulsed potentiation and relaxation of channel conductance, as well as spike time dependent plasticity (STDP). Our device returns excellent energy efficiency figures and provides a robust platform based on ultrathin two dimensional nanosheets for future neuromorphic applications.
Submitted 6 April, 2019;
originally announced April 2019.
-
Neuromorphic In-Memory Computing Framework using Memtransistor Cross-bar based Support Vector Machines
Authors:
P. Kumar,
A. R. Nair,
O. Chatterjee,
T. Paul,
A. Ghosh,
S. Chakrabartty,
C. S. Thakur
Abstract:
This paper presents a novel framework for designing support vector machines (SVMs) that does not require the SVM kernel to be positive-definite and allows the user to define memory constraints in terms of fixed template vectors. This makes the framework scalable and enables its implementation for low-power, high-density and memory-constrained embedded applications. An efficient hardware implementation is also discussed, which utilizes a novel low-power memtransistor-based cross-bar architecture and is robust to device mismatch and randomness. Using memtransistor measurement data, we show that the designed SVMs can achieve classification accuracy comparable to traditional SVMs on both synthetic and real-world benchmark datasets. This framework would be beneficial for the design of SVM-based wake-up systems for internet of things (IoT) and edge devices, where memtransistors can be used to optimize the system's energy efficiency and perform in-memory matrix-vector multiplication (MVM).
Submitted 29 May, 2019; v1 submitted 28 March, 2019;
originally announced March 2019.
-
Large-Scale Neuromorphic Spiking Array Processors: A quest to mimic the brain
Authors:
Chetan Singh Thakur,
Jamal Molin,
Gert Cauwenberghs,
Giacomo Indiveri,
Kundan Kumar,
Ning Qiao,
Johannes Schemmel,
Runchun Wang,
Elisabetta Chicca,
Jennifer Olson Hasler,
Jae-sun Seo,
Shimeng Yu,
Yu Cao,
André van Schaik,
Ralph Etienne-Cummings
Abstract:
Neuromorphic engineering (NE) encompasses a diverse range of approaches to information processing that are inspired by neurobiological systems, and this feature distinguishes neuromorphic systems from conventional computing systems. The brain has evolved over billions of years to solve difficult engineering problems by using efficient, parallel, low-power computation. The goal of NE is to design systems capable of brain-like computation. Numerous large-scale neuromorphic projects have emerged recently. This interdisciplinary field was listed among the top 10 technology breakthroughs of 2014 by the MIT Technology Review and among the top 10 emerging technologies of 2015 by the World Economic Forum. NE has two goals: first, a scientific goal to understand the computational properties of biological neural systems by using models implemented in integrated circuits (ICs); second, an engineering goal to exploit the known properties of biological systems to design and implement efficient devices for engineering applications. Building hardware neural emulators can be extremely useful for simulating large-scale neural models to explain how intelligent behavior arises in the brain. The principal advantages of neuromorphic emulators are that they are highly energy efficient, parallel and distributed, and require a small silicon area. Thus, compared to conventional CPUs, these neuromorphic emulators are beneficial in many engineering applications, such as the porting of deep learning algorithms for various recognition tasks. In this review article, we describe some of the most significant neuromorphic spiking emulators, compare the different architectures and approaches used by them, illustrate their advantages and drawbacks, and highlight the capabilities that each can deliver to neural modelers.
Submitted 22 May, 2018;
originally announced May 2018.
-
An FPGA-based Massively Parallel Neuromorphic Cortex Simulator
Authors:
Runchun Wang,
Chetan Singh Thakur,
Andre van Schaik
Abstract:
This paper presents a massively parallel and scalable neuromorphic cortex simulator designed for simulating large and structurally connected spiking neural networks, such as complex models of various areas of the cortex. The main novelty of this work is the abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogously to the fundamental structural units observed in neurobiology. Without this approach, simulating large-scale fully connected networks needs prohibitively large memory to store look-up tables for point-to-point connections. Instead, we use a novel architecture, based on the structural connectivity in the neocortex, such that all the required parameters and connections can be stored in on-chip memory. The cortex simulator can be easily reconfigured for simulating different neural networks without any change in hardware structure by programming the memory. A hierarchical communication scheme allows one neuron to have a fan-out of up to 200k neurons. As a proof-of-concept, an implementation on one Altera Stratix V FPGA was able to simulate 20 million to 2.6 billion leaky-integrate-and-fire (LIF) neurons in real time. We verified the system by emulating a simplified auditory cortex (with 100 million neurons). This cortex simulator achieved a low power dissipation of 1.62 μW per neuron. With the advent of commercially available FPGA boards, our system offers an accessible and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.
Submitted 8 March, 2018;
originally announced March 2018.
-
A Stochastic Approach to STDP
Authors:
Runchun Wang,
Chetan Singh Thakur,
Tara Julia Hamilton,
Jonathan Tapson,
André van Schaik
Abstract:
We present a digital implementation of the Spike Timing Dependent Plasticity (STDP) learning rule. The proposed digital implementation consists of an exponential decay generator array and an STDP adaptor array. On the arrival of a pre- and post-synaptic spike, the STDP adaptor will send a digital spike to the decay generator. The decay generator will then generate an exponential decay, which will be used by the STDP adaptor to perform the weight adaptation. The exponential decay, which is computationally expensive, is efficiently implemented by using a novel stochastic approach, which we analyse and characterise here. We use a time multiplexing approach to achieve 8192 (8k) virtual STDP adaptors and decay generators with only one physical implementation of each. We have validated our stochastic STDP approach with measurement results of a balanced excitation/inhibition experiment. Our stochastic approach is ideal for implementing the STDP learning rule in large-scale spiking neural networks running in real time.
Submitted 13 March, 2016;
originally announced March 2016.
-
A Reconfigurable Mixed-signal Implementation of a Neuromorphic ADC
Authors:
Ying Xu,
Chetan Singh Thakur,
Tara Julia Hamilton,
Jonathan Tapson,
Runchun Wang,
Andre van Schaik
Abstract:
We present a neuromorphic Analogue-to-Digital Converter (ADC), which uses integrate-and-fire (I&F) neurons as the encoders of the analogue signal, with modulated inhibitions to decohere the neuronal spike trains. The architecture consists of an analogue chip and a control module. The analogue chip comprises two scan chains and a two-dimensional integrate-and-fire neuronal array. Individual neurons are accessed via the chains one by one without any encoder, decoder, or arbiter. The control module is implemented on an FPGA (Field Programmable Gate Array), which sends scan enable signals to the scan chains and controls the inhibition for individual neurons. Since the control module is implemented on an FPGA, it can be easily reconfigured. Additionally, we propose a pulse width modulation methodology for the lateral inhibition, which uses different pulse widths to indicate different strengths of inhibition for each individual neuron to decohere neuronal spikes. Software simulations in this paper tested the robustness of the proposed ADC architecture to fixed random noise. A circuit simulation using ten neurons shows the performance and the feasibility of the architecture.
Submitted 3 September, 2015;
originally announced September 2015.
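As a rough illustration of how an integrate-and-fire neuron can encode an analogue level as a spike rate, the sketch below models a single idealised neuron in discrete time. It is an assumption-laden simplification: the scan chains, the neuronal array, and the pulse-width-modulated inhibition described in the abstract are not modelled, and the names `integrate_and_fire` and `decode_rate` are hypothetical.

```python
def integrate_and_fire(samples, threshold=1.0):
    """Encode a sampled analogue signal as a spike train.

    The membrane value accumulates each input sample. When it reaches
    `threshold`, the neuron emits a spike (1) and the threshold is
    subtracted, so any overshoot carries over to the next step.
    Inputs are assumed to be non-negative and below `threshold`.
    """
    membrane = 0.0
    spikes = []
    for x in samples:
        membrane += x
        if membrane >= threshold:
            spikes.append(1)
            membrane -= threshold
        else:
            spikes.append(0)
    return spikes

def decode_rate(spikes, threshold=1.0):
    """Estimate the mean input level from the spike rate."""
    return threshold * sum(spikes) / len(spikes)

if __name__ == "__main__":
    spikes = integrate_and_fire([0.25] * 100)
    print(sum(spikes), decode_rate(spikes))
```

With a constant input, the spike rate is proportional to the input level, which is what lets a counter on the digital side recover the analogue value.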
-
A compact aVLSI conductance-based silicon neuron
Authors:
Runchun Wang,
Chetan Singh Thakur,
Tara Julia Hamilton,
Jonathan Tapson,
Andre van Schaik
Abstract:
We present an analogue Very Large Scale Integration (aVLSI) implementation that uses first-order lowpass filters to implement a conductance-based silicon neuron for high-speed neuromorphic systems. The aVLSI neuron consists of a soma (cell body) and a single synapse, which is capable of linearly summing both the excitatory and inhibitory postsynaptic potentials (EPSP and IPSP) generated by the spikes arriving from different sources. Rather than biasing the silicon neuron with different parameters for different spiking patterns, as is typically done, we provide digital control signals, generated by an FPGA, to the silicon neuron to obtain different spiking behaviours. The proposed neuron is only ~26.5 µm² in the IBM 130 nm process and thus can be integrated at very high density. Circuit simulations show that this neuron can emulate different spiking behaviours observed in biological neurons.
Submitted 3 September, 2015;
originally announced September 2015.
-
A neuromorphic hardware architecture using the Neural Engineering Framework for pattern recognition
Authors:
Runchun Wang,
Chetan Singh Thakur,
Tara Julia Hamilton,
Jonathan Tapson,
Andre van Schaik
Abstract:
We present a hardware architecture that uses the Neural Engineering Framework (NEF) to implement large-scale neural networks on Field Programmable Gate Arrays (FPGAs) for performing pattern recognition in real time. NEF is a framework that is capable of synthesising large-scale cognitive systems from subnetworks. We will first present the architecture of the proposed neural network implemented using fixed-point numbers and demonstrate a routine that computes the decoding weights by using the online pseudoinverse update method (OPIUM) in a parallel and distributed manner. The proposed system is efficiently implemented on a compact digital neural core. This neural core consists of 64 neurons that are instantiated by a single physical neuron using a time-multiplexing approach. As a proof of concept, we combined 128 identical neural cores together to build a handwritten digit recognition system using the MNIST database and achieved a recognition rate of 96.55%. The system is implemented on a state-of-the-art FPGA and can process 5.12 million digits per second. The architecture is not limited to handwriting recognition, but is generally applicable as an extremely fast pattern recognition processor for various kinds of patterns such as speech and images.
Submitted 20 July, 2015;
originally announced July 2015.
-
A Trainable Neuromorphic Integrated Circuit that Exploits Device Mismatch
Authors:
Chetan Singh Thakur,
Runchun Wang,
Tara Julia Hamilton,
Jonathan Tapson,
Andre van Schaik
Abstract:
Random device mismatch that arises as a result of scaling of the CMOS (complementary metal-oxide semiconductor) technology into the deep submicron regime degrades the accuracy of analogue circuits. Methods to combat this increase the complexity of design. We have developed a novel neuromorphic system called a Trainable Analogue Block (TAB), which exploits device mismatch as a means for random projections of the input to a higher dimensional space. The TAB framework is inspired by the principles of neural population coding operating in the biological nervous system. Three neuronal layers, namely input, hidden, and output, constitute the TAB framework, with the number of hidden layer neurons far exceeding the input layer neurons. Here, we present measurement results of the first prototype TAB chip built using a 65nm process technology and show its learning capability for various regression tasks. Our TAB chip exploits inherent randomness and variability arising due to the fabrication process to perform various learning tasks. Additionally, we characterise each neuron and discuss the statistical variability of its tuning curve that arises due to random device mismatch, a desirable property for the learning capability of the TAB. We also discuss the effect of the number of hidden neurons and the resolution of output weights on the accuracy of the learning capability of the TAB.
Submitted 10 July, 2015;
originally announced July 2015.
-
An Online Learning Algorithm for Neuromorphic Hardware Implementation
Authors:
Chetan Singh Thakur,
Runchun Wang,
Saeed Afshar,
Gregory Cohen,
Tara Julia Hamilton,
Jonathan Tapson,
Andre van Schaik
Abstract:
We propose a sign-based online learning (SOL) algorithm for a neuromorphic hardware framework called Trainable Analogue Block (TAB). The TAB framework utilises the principles of neural population coding, implying that it encodes the input stimulus using a large pool of nonlinear neurons. The SOL algorithm is a simple weight update rule that employs the sign of the hidden layer activation and the sign of the output error, which is the difference between the target output and the predicted output. The SOL algorithm is easily implementable in hardware, and can be used in any artificial neural network framework that learns weights by minimising a convex cost function. We show that the TAB framework can be trained for various regression tasks using the SOL algorithm.
Submitted 30 July, 2017; v1 submitted 11 May, 2015;
originally announced May 2015.
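The weight update described in the abstract above, which uses only the sign of the hidden-layer activation and the sign of the output error, can be sketched roughly as follows. This is a minimal software illustration under stated assumptions, not the authors' implementation: the `tanh` hidden neurons with random gains and offsets, the learning rate `eta`, and all function names are hypothetical choices made for the example.

```python
import math
import random

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def make_hidden_layer(n_hidden, rng):
    """Hypothetical hidden layer: one random (gain, offset) pair per neuron."""
    return [(rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)) for _ in range(n_hidden)]

def hidden_activations(x, layer):
    """Nonlinear population response to a scalar input (tanh is an assumption)."""
    return [math.tanh(gain * x + offset) for gain, offset in layer]

def predict(weights, h):
    """Linear readout of the hidden activations."""
    return sum(w * hj for w, hj in zip(weights, h))

def sol_update(weights, h, error, eta):
    """Sign-based step: each weight moves by +eta, -eta, or 0.

    The direction is sign(hidden activation) * sign(output error),
    where error = target - prediction.
    """
    s_err = sign(error)
    return [w + eta * sign(hj) * s_err for w, hj in zip(weights, h)]

def train(samples, layer, eta=1e-3, epochs=50):
    """Online training: one sign-based update per (input, target) pair."""
    weights = [0.0] * len(layer)
    for _ in range(epochs):
        for x, target in samples:
            h = hidden_activations(x, layer)
            error = target - predict(weights, h)
            weights = sol_update(weights, h, error, eta)
    return weights

if __name__ == "__main__":
    rng = random.Random(0)
    layer = make_hidden_layer(50, rng)
    samples = [(i / 20, math.sin(i / 20)) for i in range(-20, 21)]
    weights = train(samples, layer)
    print(predict(weights, hidden_activations(0.5, layer)))
```

Because every weight changes by a fixed step, the update needs no multiplication by the activation or the error, which is the property that makes a rule of this kind attractive for hardware. With a constant step size the weights are expected to hover around a solution rather than settle exactly.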
-
A neuromorphic hardware framework based on population coding
Authors:
Chetan Singh Thakur,
Tara Julia Hamilton,
Runchun Wang,
Jonathan Tapson,
André van Schaik
Abstract:
In the biological nervous system, large neuronal populations work collaboratively to encode sensory stimuli. These neuronal populations are characterised by a diverse distribution of tuning curves, ensuring that the entire range of input stimuli is encoded. Based on these principles, we have designed a neuromorphic system called a Trainable Analogue Block (TAB), which encodes given input stimuli using a large population of neurons with a heterogeneous tuning curve profile. Heterogeneity of tuning curves is achieved using random device mismatches in the VLSI (Very Large Scale Integration) process and by adding a systematic offset to each hidden neuron. Here, we present measurement results of a single test cell fabricated in a 65nm technology to verify the TAB framework. We have mimicked a large population of neurons by re-using measurement results from the test cell while varying the offset. We thus demonstrate the learning capability of the system for various regression tasks. The TAB system may pave the way to improve the design of analogue circuits for commercial applications, by rendering circuits insensitive to random mismatch that arises due to the manufacturing process.
Submitted 2 March, 2015;
originally announced March 2015.
-
FPGA Implementation of the CAR Model of the Cochlea
Authors:
Chetan Singh Thakur,
Tara Julia Hamilton,
Jonathan Tapson,
Richard F. Lyon,
André van Schaik
Abstract:
The front end of the human auditory system, the cochlea, converts sound signals from the outside world into neural impulses transmitted along the auditory pathway for further processing. The cochlea senses and separates sound in a nonlinear active fashion, exhibiting remarkable sensitivity and frequency discrimination. Although several electronic models of the cochlea have been proposed and implemented, none of these are able to reproduce all the characteristics of the cochlea, including large dynamic range, large gain and sharp tuning at low sound levels, and low gain and broad tuning at intense sound levels. Here, we implement the Cascade of Asymmetric Resonators (CAR) model of the cochlea on an FPGA. CAR represents the basilar membrane filter in the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model. CAR-FAC is a neuromorphic model of hearing based on a pole-zero filter cascade model of auditory filtering. It uses simple nonlinear extensions of conventional digital filter stages that are well suited to FPGA implementations, so that we are able to implement up to 1224 cochlear sections on a Virtex-6 FPGA to process sound data in real time. The FPGA implementation of the electronic cochlea described here may be used as a front-end sound analyser for various machine-hearing applications.
Submitted 2 March, 2015;
originally announced March 2015.