Towards a physically realistic computationally efficient DVS pixel model

Abstract: Dynamic Vision Sensor (DVS) event camera models are important tools for predicting camera response, optimizing biases, and generating realistic simulated datasets. Existing DVS models have been useful, but have not demonstrated high realism for challenging HDR scenes combined with adequate computational efficiency for array-level scene simulation. This paper reports progress towards a physically realistic and computationally efficient DVS model based on large-signal differential equations derived from circuit analysis, with parameters fitted from pixel measurements and circuit simulation. These are combined with an efficient stochastic event generation mechanism based on first-passage-time theory, allowing accurate noise generation with timesteps greater than 1000x longer than previous methods.

Submitted 12 May, 2025; originally announced May 2025.

Comments: Presented in 2025 International Image Sensor Workshop

arXiv:2411.02019 [pdf, other]

Modulating State Space Model with SlowFast Framework for Compute-Efficient Ultra Low-Latency Speech Enhancement

Authors: Longbiao Cheng, Ashutosh Pandey, Buye Xu, Tobi Delbruck, Vamsi Krishna Ithapu, Shih-Chii Liu

Abstract: Deep learning-based speech enhancement (SE) methods often face significant computational challenges when needing to meet low-latency requirements because of the increased number of frames to be processed. This paper introduces the SlowFast framework, which aims to reduce computation costs specifically when low-latency enhancement is needed. The framework consists of a slow branch that analyzes the acoustic environment at a low frame rate, and a fast branch that performs SE in the time domain at the needed higher frame rate to match the required latency. Specifically, the fast branch employs a state space model whose state transition process is dynamically modulated by the slow branch. Experiments on an SE task with a 2 ms algorithmic latency requirement using the Voice Bank + Demand dataset show that our approach reduces computation cost by 70% compared to a baseline single-branch network with equivalent parameters, without compromising enhancement performance. Furthermore, by leveraging the SlowFast framework, we implemented a network that achieves an algorithmic latency of just 62.5 μs (one sample point at 16 kHz sample rate) with a computation cost of 100 M MACs/s, while scoring a PESQ-NB of 3.12 and SISNR of 16.62.

Submitted 4 January, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

Comments: Accepted to ICASSP 2025
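
The latency figure quoted in this abstract follows from the sample rate alone: one sample at 16 kHz lasts 1/16000 s. The short Python sketch below checks that arithmetic and derives the per-sample compute budget implied by 100 M MACs/s. The function names are illustrative, and the per-sample budget is a derived number that the abstract does not state.

```python
def sample_period_us(sample_rate_hz: float) -> float:
    """Duration of one audio sample in microseconds."""
    return 1e6 / sample_rate_hz


def macs_per_sample(macs_per_second: float, sample_rate_hz: float) -> float:
    """Average multiply-accumulate budget available per audio sample."""
    return macs_per_second / sample_rate_hz


if __name__ == "__main__":
    # One sample at 16 kHz, matching the algorithmic latency in the abstract.
    print(sample_period_us(16_000))        # 62.5
    # Derived, not stated in the abstract: MACs available per sample.
    print(macs_per_sample(100e6, 16_000))  # 6250.0
```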

arXiv:2409.09648 [pdf, other]

SciDVS: A Scientific Event Camera with 1.7% Temporal Contrast Sensitivity at 0.7 lux

Authors: Rui Graca, Sheng Zhou, Brian McReynolds, Tobi Delbruck

Abstract: This paper reports a Dynamic Vision Sensor (DVS) event camera that is 6x more sensitive at 14x lower illumination than existing commercial and prototype cameras. Event cameras output a sparse stream of brightness change events. Their high dynamic range (HDR), quick response, and high temporal resolution provide key advantages for scientific applications that involve low lighting conditions and sparse visual events. However, current DVS are hindered by low sensitivity, resulting from shot noise and pixel-to-pixel mismatch. Commercial DVS have a minimum brightness change threshold of >10%. Sensitive prototypes achieved as low as 1%, but required kilo-lux illumination. Our SciDVS prototype fabricated in a 180nm CMOS image sensor process achieves 1.7% sensitivity at chip illumination of 0.7 lx and 18 Hz bandwidth. Novel features of SciDVS are (1) an auto-centering in-pixel preamplifier providing intrascene HDR and increased sensitivity, (2) improved control of bandwidth to limit shot noise, and (3) optional pixel binning, allowing the user to trade spatial resolution for sensitivity.

Submitted 15 September, 2024; originally announced September 2024.

Comments: Presented at ESSERC 2024

arXiv:2408.12425 [pdf, other]

doi 10.21437/Interspeech.2024-958

Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement

Authors: Longbiao Cheng, Ashutosh Pandey, Buye Xu, Tobi Delbruck, Shih-Chii Liu

Abstract: This paper introduces a new Dynamic Gated Recurrent Neural Network (DG-RNN) for compute-efficient speech enhancement models running on resource-constrained hardware platforms. It leverages the slow evolution characteristic of RNN hidden states over steps, and updates only a selected set of neurons at each step by adding a newly proposed select gate to the RNN model. This select gate allows the computation cost of the conventional RNN to be reduced during network inference. As a realization of the DG-RNN, we further propose the Dynamic Gated Recurrent Unit (D-GRU), which does not require additional parameters. Test results obtained from several state-of-the-art compute-efficient RNN-based speech enhancement architectures using the DNS challenge dataset show that the D-GRU based model variants maintain speech intelligibility and quality metrics comparable to the baseline GRU based models even with an average 50% reduction in GRU computes.

Submitted 24 September, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

Comments: Proceedings of Interspeech 2024

arXiv:2407.08681 [pdf, other]

Hardware Neural Control of CartPole and F1TENTH Race Car

Authors: Marcin Paluch, Florian Bolli, Xiang Deng, Antonio Rios Navarro, Chang Gao, Tobi Delbruck

Abstract: Nonlinear model predictive control (NMPC) has proven to be an effective control method, but it is expensive to compute. This work demonstrates the use of hardware FPGA neural network controllers trained to imitate NMPC with supervised learning. We use these Neural Controllers (NCs) implemented on inexpensive embedded FPGA hardware for high frequency control on a physical cartpole and an F1TENTH race car. Our results show that the NCs match the control performance of the NMPCs in simulation and outperform them in reality, due to the faster control rate that is afforded by the quick FPGA NC inference. We demonstrate kHz control rates for a physical cartpole and offloading control to the FPGA hardware on the F1TENTH car. Code and hardware implementation for this paper are available at https://github.com/SensorsINI/Neural-Control-Tools.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2405.03905 [pdf, other]

doi 10.1109/TCASAI.2024.3507694

DeltaKWS: A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

Authors: Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

Abstract: This paper introduces DeltaKWS, to the best of our knowledge, the first $Δ$RNN-enabled fine-grained temporal sparsity-aware KWS IC for voice-controlled devices. The 65 nm prototype chip features a number of techniques to enhance performance, area, and power efficiencies, specifically: 1) a bio-inspired delta-gated recurrent neural network ($Δ$RNN) classifier leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses; 2) an IIR BPF-based FEx that leverages mixed-precision quantization, low-cost computing structure and channel selection; 3) a 24 kB 0.6 V near-$V_\text{TH}$ weight SRAM that achieves 6.6X lower read power than the foundry-provided SRAM. From chip measurement results, we show that the DeltaKWS achieves an 11/12-class GSCD accuracy of 90.5%/89.5% respectively and energy consumption of 36 nJ/decision in 65 nm CMOS process. At 87% temporal sparsity, computing latency and energy/inference are reduced by 2.4X/3.4X, respectively. The IIR BPF-based FEx, $Δ$RNN accelerator, and 24 kB near-$V_\text{TH}$ SRAM blocks occupy 0.084 mm$^{2}$, 0.319 mm$^{2}$, and 0.381 mm$^{2}$ respectively (0.78 mm$^{2}$ in total).

Submitted 26 November, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: This paper has been accepted for publication in the IEEE Transactions on Circuits and Systems for Artificial Intelligence (TCASAI)

arXiv:2304.04706 [pdf, other]

Shining light on the DVS pixel: A tutorial and discussion about biasing and optimization

Authors: Rui Graça, Brian McReynolds, Tobi Delbruck

Abstract: The operation of the DVS event camera is controlled by the user through adjusting different bias parameters. These biases affect the response of the camera by controlling - among other parameters - the bandwidth, sensitivity, and maximum firing rate of the pixels. Besides determining the response of the camera to input signals, biases significantly impact its noise performance. Bias optimization is a multivariate process depending on the task and the scene, to which the user's knowledge about pixel design and non-idealities can be of great importance. In this paper, we go step-by-step along the signal pathway of the DVS pixel, shining light on its low-level operation and non-idealities, comparing pixel level measurements with array level measurements, and discussing how biasing and illumination affect the pixel's behavior. With the results and discussion presented, we aim to help DVS users achieve more hardware-aware camera utilization and modelling.

Submitted 11 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted at 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); 4th International Workshop on Event-Based Vision

arXiv:2304.04019 [pdf, other]

Optimal biasing and physical limits of DVS event noise

Authors: Rui Graca, Brian McReynolds, Tobi Delbruck

Abstract: Under dim lighting conditions, the output of Dynamic Vision Sensor (DVS) event cameras is strongly affected by noise. Photon and electron shot-noise cause a high rate of non-informative events that reduce Signal to Noise ratio. DVS noise performance depends not only on the scene illumination, but also on the user-controllable biasing of the camera. In this paper, we explore the physical limits of DVS noise, showing that the DVS photoreceptor is limited to a theoretical minimum of 2x photon shot noise, and we discuss how biasing the DVS with high photoreceptor bias and adequate source-follower bias approaches optimal noise performance. We support our conclusions with pixel-level measurements of a DAVIS346 and analysis of a theoretical pixel model.

Submitted 12 April, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

Comments: Accepted to the 2023 International Image Sensor Workshop (IISW)

arXiv:2304.03494 [pdf, other]

Exploiting Alternating DVS Shot Noise Event Pair Statistics to Reduce Background Activity

Authors: Brian McReynolds, Rui Graca, Tobi Delbruck

Abstract: Dynamic Vision Sensors (DVS) record "events" corresponding to pixel-level brightness changes, resulting in data-efficient representation of a dynamic visual scene. As DVS expand into increasingly diverse applications, non-ideal behaviors in their output under extreme sensing conditions are important to consider. Under low illumination (below ~10 lux) their output begins to be dominated by shot noise events (SNEs) which increase the data output and obscure true signal. SNE rates can be controlled to some degree by tuning circuit parameters to reduce sensitivity or temporal response bandwidth at the cost of signal loss. Alternatively, an improved understanding of SNE statistics can be leveraged to develop novel techniques for minimizing uninformative sensor output. We first explain a fundamental observation about sequential pairing of opposite polarity SNEs based on pixel circuit logic and validate our theory using DVS recordings and simulations. Finally, we derive a practical result from this new understanding and demonstrate two novel biasing techniques to reduce SNEs by 50% and 80% respectively while still retaining sensitivity and/or temporal resolution.

Submitted 12 April, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: IISW 2023, paper R5.6

arXiv:2208.00693 [pdf, other]

doi 10.1109/JSSC.2022.3195610

A 23 $μ$W Keyword Spotting IC with Ring-Oscillator-Based Time-Domain Feature Extraction

Authors: Kwantae Kim, Chang Gao, Rui Graça, Ilya Kiselev, Hoi-Jun Yoo, Tobi Delbruck, Shih-Chii Liu

Abstract: This article presents the first keyword spotting (KWS) IC which uses a ring-oscillator-based time-domain processing technique for its analog feature extractor (FEx). Its extensive usage of time-encoding schemes allows the analog audio signal to be processed in a fully time-domain manner except for the voltage-to-time conversion stage of the analog front-end. Benefiting from fundamental building blocks based on digital logic gates, it offers better technology scalability compared to conventional voltage-domain designs. Fabricated in a 65 nm CMOS process, the prototyped KWS IC occupies 2.03 mm$^{2}$ and dissipates 23 $μ$W including the analog FEx and digital neural network classifier. The 16-channel time-domain FEx achieves 54.89 dB dynamic range for 16 ms frame shift size while consuming 9.3 $μ$W. The measurement result verifies that the proposed IC performs a 12-class KWS task on the Google Speech Command Dataset (GSCD) with >86% accuracy and 12.4 ms latency.

Submitted 1 August, 2022; originally announced August 2022.

Comments: 14 pages, 21 figures, 2 tables

arXiv:2109.08640 [pdf, other]

Unraveling the paradox of intensity-dependent DVS pixel noise

Authors: Rui Graca, Tobi Delbruck

Abstract: Dynamic vision sensor (DVS) event camera output is affected by noise, particularly in dim lighting conditions. A theory explaining how photon and electron noise affect DVS output events has so far not been developed. Moreover, there is no clear understanding of how DVS parameters and operating conditions affect noise. There is an apparent paradox between the real noise data observed from the DVS output and the reported noise measurements of the logarithmic photoreceptor. While measurements of the logarithmic photoreceptor predict that the photoreceptor is approximately a first-order system with RMS noise voltage independent of the photocurrent, DVS output shows higher noise event rates at low light intensity. This paper unravels this paradox by showing how the DVS photoreceptor is a second-order system, and the assumption that it is first-order is generally not reasonable. As we show, at higher photocurrents, the photoreceptor amplifier dominates the frequency response, causing a drop in RMS noise voltage and noise event rate. We bring light to the noise performance of the DVS photoreceptor by presenting a theoretical explanation supported by both transistor-level simulation results and chip measurements.

Submitted 17 September, 2021; originally announced September 2021.

Comments: Presented in 2021 International Image Sensor Workshop (IISW)

Journal ref: 2021 International Image Sensor Workshop (IISW)

arXiv:2002.03197 [pdf, other]

doi 10.1109/ICRA40945.2020.9196984

Recurrent Neural Network Control of a Hybrid Dynamic Transfemoral Prosthesis with EdgeDRNN Accelerator

Authors: Chang Gao, Rachel Gehlhar, Aaron D. Ames, Shih-Chii Liu, Tobi Delbruck

Abstract: Lower leg prostheses could improve the life quality of amputees by increasing comfort and reducing energy to locomote, but current control methods are limited in modulating behaviors based upon the human's experience. This paper describes the first steps toward learning complex controllers for dynamical robotic assistive devices. We provide the first example of behavioral cloning to control a powered transfemoral prosthesis using a Gated Recurrent Unit (GRU) based recurrent neural network (RNN) running on a custom hardware accelerator that exploits temporal sparsity. The RNN is trained on data collected from the original prosthesis controller. The RNN inference is realized by a novel EdgeDRNN accelerator in real-time. Experimental results show that the RNN can replace the nominal PD controller to realize end-to-end control of the AMPRO3 prosthetic leg walking on flat ground and unforeseen slopes with comparable tracking accuracy. EdgeDRNN computes the RNN about 240 times faster than real time, opening the possibility of running larger networks for more complex tasks in the future. Implementing an RNN on this real-time dynamical system with impacts sets the groundwork to incorporate other learned elements of the human-prosthesis system into prosthesis control.

Submitted 28 July, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

Comments: Accepted at 2020 International Conference on Robotics and Automation (ICRA 2020)

Journal ref: 2020 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:1912.12193 [pdf, ps, other]

doi 10.1109/AICAS48895.2020.9074001

EdgeDRNN: Enabling Low-latency Recurrent Neural Network Edge Inference

Authors: Chang Gao, Antonio Rios-Navarro, Xi Chen, Tobi Delbruck, Shih-Chii Liu

Abstract: This paper presents a Gated Recurrent Unit (GRU) based recurrent neural network (RNN) accelerator called EdgeDRNN designed for portable edge computing. EdgeDRNN adopts the spiking neural network inspired delta network algorithm to exploit temporal sparsity in RNNs. It reduces off-chip memory access by a factor of up to 10x with tolerable accuracy loss. Experimental results on a 10 million parameter 2-layer GRU-RNN, with weights stored in DRAM, show that EdgeDRNN computes them in under 0.5 ms. With 2.42 W wall plug power on an entry level USB powered FPGA board, it achieves latency comparable with a 92 W Nvidia 1080 GPU. It outperforms NVIDIA Jetson Nano, Jetson TX2 and Intel Neural Compute Stick 2 in latency by 6X. For a batch size of 1, EdgeDRNN achieves a mean effective throughput of 20.2 GOp/s and a wall plug power efficiency that is over 4X higher than all other platforms.

Submitted 28 July, 2020; v1 submitted 22 December, 2019; originally announced December 2019.

Comments: This paper has been accepted for publication at the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Genoa, 2020

Journal ref: 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
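
The temporal-sparsity idea mentioned in this abstract can be illustrated with a thresholded delta update of a matrix-vector product: an input element is propagated only when it has changed by at least a threshold since it was last propagated, so unchanged inputs need no weight-column fetch. The Python sketch below is a minimal illustration under that assumption, not the EdgeDRNN implementation; the class name `DeltaMatVec`, its fields, and the threshold value are all hypothetical.

```python
class DeltaMatVec:
    """Running y = W @ x_ref, updated only for inputs that changed enough."""

    def __init__(self, weights, threshold):
        self.w = weights                         # list of rows
        self.threshold = threshold               # minimum |change| to propagate
        self.x_ref = [0.0] * len(weights[0])     # last propagated input values
        self.y = [0.0] * len(weights)            # accumulated product

    def step(self, x):
        """Apply one input vector; return (y, number of weight columns used)."""
        touched = 0
        for j, xj in enumerate(x):
            delta = xj - self.x_ref[j]
            if abs(delta) < self.threshold:
                continue                         # skip: column j is not fetched
            self.x_ref[j] = xj
            touched += 1
            for i, row in enumerate(self.w):
                self.y[i] += row[j] * delta
        return list(self.y), touched


if __name__ == "__main__":
    mv = DeltaMatVec([[1.0, 2.0], [3.0, 4.0]], threshold=0.5)
    print(mv.step([1.0, 1.0]))  # both inputs change: two columns used
    print(mv.step([1.1, 2.0]))  # first input changes by 0.1: one column used
```

Skipped changes are not lost in this sketch: `x_ref` keeps the last propagated value, so small changes accumulate until they cross the threshold.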

Showing 1–13 of 13 results for author: Delbruck, T