Event-driven Spectrotemporal Feature Extraction and Classification using a Silicon Cochlea Model
Authors:
Ying Xu,
Samalika Perera,
Yeshwanth Bethi,
Saeed Afshar,
André van Schaik
Abstract:
This paper presents a reconfigurable digital implementation of an event-based binaural cochlear system on a Field Programmable Gate Array (FPGA). It consists of a pair of the Cascade of Asymmetric Resonators with Fast Acting Compression (CAR FAC) cochlea models and leaky integrate and fire (LIF) neurons. Additionally, we propose an event-driven SpectroTemporal Receptive Field (STRF) Feature Extrac…
▽ More
This paper presents a reconfigurable digital implementation of an event-based binaural cochlear system on a Field Programmable Gate Array (FPGA). It consists of a pair of the Cascade of Asymmetric Resonators with Fast Acting Compression (CAR FAC) cochlea models and leaky integrate and fire (LIF) neurons. Additionally, we propose an event-driven SpectroTemporal Receptive Field (STRF) Feature Extraction using Adaptive Selection Thresholds (FEAST). It is tested on the TIDIGTIS benchmark and compared with current event-based auditory signal processing approaches and neural networks.
△ Less
Submitted 14 December, 2022;
originally announced December 2022.
Real-Time Object Detection and Localization in Compressive Sensed Video on Embedded Hardware
Authors:
Yeshwanth Ravi Theja Bethi,
Sathyaprakash Narayanan,
Venkat Rangan,
Chetan Singh Thakur
Abstract:
Every day around the world, interminable terabytes of data are being captured for surveillance purposes. A typical 1-2MP CCTV camera generates around 7-12GB of data per day. Frame-by-frame processing of such enormous amount of data requires hefty computational resources. In recent years, compressive sensing approaches have shown impressive results in signal processing by reducing the sampling band…
▽ More
Every day around the world, interminable terabytes of data are being captured for surveillance purposes. A typical 1-2MP CCTV camera generates around 7-12GB of data per day. Frame-by-frame processing of such enormous amount of data requires hefty computational resources. In recent years, compressive sensing approaches have shown impressive results in signal processing by reducing the sampling bandwidth. Different sampling mechanisms were developed to incorporate compressive sensing in image and video acquisition. Pixel-wise coded exposure is one among the promising sensing paradigms for capturing videos in the compressed domain, which was also realized into an all-CMOS sensor \cite{Xiong2017}. Though cameras that perform compressive sensing save a lot of bandwidth at the time of sampling and minimize the memory required to store videos, we cannot do much in terms of processing until the videos are reconstructed to the original frames. But, the reconstruction of compressive-sensed (CS) videos still takes a lot of time and is also computationally expensive. In this work, we show that object detection and localization can be possible directly on the CS frames (easily upto 20x compression). To our knowledge, this is the first time that the problem of object detection and localization on CS frames has been attempted. Hence, we also created a dataset for training in the CS domain. We were able to achieve a good accuracy of 46.27\% mAP(Mean Average Precision) with the proposed model with an inference time of 23ms directly on the compressed frames(approx. 20 original domain frames), this facilitated for real-time inference which was verified on NVIDIA TX2 embedded board. Our framework will significantly reduce the communication bandwidth, and thus reduction in power as the video compression will be done at the image sensor processing core.
△ Less
Submitted 18 April, 2021; v1 submitted 18 December, 2019;
originally announced December 2019.