Search | arXiv e-print repository

Random Quantum Neural Networks (RQNN) for Noisy Image Recognition

Authors: Debanjan Konar, Erol Gelenbe, Soham Bhandary, Aditya Das Sarma, Attila Cangi

Abstract: Classical Random Neural Networks (RNNs) have demonstrated effective applications in decision making, signal processing, and image recognition tasks. However, their implementation has been limited to deterministic digital systems that output probability distributions in lieu of stochastic behaviors of random spiking signals. We introduce the novel class of supervised Random Quantum Neural Networks… ▽ More Classical Random Neural Networks (RNNs) have demonstrated effective applications in decision making, signal processing, and image recognition tasks. However, their implementation has been limited to deterministic digital systems that output probability distributions in lieu of stochastic behaviors of random spiking signals. We introduce the novel class of supervised Random Quantum Neural Networks (RQNNs) with a robust training strategy to better exploit the random nature of the spiking RNN. The proposed RQNN employs hybrid classical-quantum algorithms with superposition state and amplitude encoding features, inspired by quantum information theory and the brain's spatial-temporal stochastic spiking property of neuron information encoding. We have extensively validated our proposed RQNN model, relying on hybrid classical-quantum algorithms via the PennyLane Quantum simulator with a limited number of \emph{qubits}. Experiments on the MNIST, FashionMNIST, and KMNIST datasets demonstrate that the proposed RQNN model achieves an average classification accuracy of $94.9\%$. Additionally, the experimental findings illustrate the proposed RQNN's effectiveness and resilience in noisy settings, with enhanced image classification accuracy when compared to the classical counterparts (RNNs), classical Spiking Neural Networks (SNNs), and the classical convolutional neural network (AlexNet). Furthermore, the RQNN can deal with noise, which is useful for various applications, including computer vision in NISQ devices. The PyTorch code (https://github.com/darthsimpus/RQN) is made available on GitHub to reproduce the results reported in this manuscript. △ Less

Submitted 3 March, 2022; originally announced March 2022.

arXiv:2010.03909 [pdf, other]

Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech

Authors: Biswajit Dev Sarma, Rohan Kumar Das

Abstract: Emotional state of a speaker is found to have significant effect in speech production, which can deviate speech from that arising from neutral state. This makes identifying speakers with different emotions a challenging task as generally the speaker models are trained using neutral speech. In this work, we propose to overcome this problem by creation of emotion invariant speaker embedding. We lear… ▽ More Emotional state of a speaker is found to have significant effect in speech production, which can deviate speech from that arising from neutral state. This makes identifying speakers with different emotions a challenging task as generally the speaker models are trained using neutral speech. In this work, we propose to overcome this problem by creation of emotion invariant speaker embedding. We learn an extractor network that maps the test embeddings with different emotions obtained using i-vector based system to an emotion invariant space. The resultant test embeddings thus become emotion invariant and thereby compensate the mismatch between various emotional states. The studies are conducted using four different emotion classes from IEMOCAP database. We obtain an absolute improvement of 2.6% in accuracy for speaker identification studies using emotion invariant speaker embedding against average speaker model based framework with different emotions. △ Less

Submitted 8 October, 2020; originally announced October 2020.

Comments: Accepted for publication in APSIPA ASC 2020

arXiv:2007.08847 [pdf, ps, other]

Two-stream Fusion Model for Dynamic Hand Gesture Recognition using 3D-CNN and 2D-CNN Optical Flow guided Motion Template

Authors: Debajit Sarma, V. Kavyasree, M. K. Bhuyan

Abstract: The use of hand gestures can be a useful tool for many applications in the human-computer interaction community. In a broad range of areas hand gesture techniques can be applied specifically in sign language recognition, robotic surgery, etc. In the process of hand gesture recognition, proper detection, and tracking of the moving hand become challenging due to the varied shape and size of the hand… ▽ More The use of hand gestures can be a useful tool for many applications in the human-computer interaction community. In a broad range of areas hand gesture techniques can be applied specifically in sign language recognition, robotic surgery, etc. In the process of hand gesture recognition, proper detection, and tracking of the moving hand become challenging due to the varied shape and size of the hand. Here the objective is to track the movement of the hand irrespective of the shape, size, and color of the hand. And, for this, a motion template guided by optical flow (OFMT) is proposed. OFMT is a compact representation of the motion information of a gesture encoded into a single image. In the experimentation, different datasets using bare hand with an open palm, and folded palm wearing green-glove are used, and in both cases, we could generate the OFMT images with equal precision. Recently, deep network-based techniques have shown impressive improvements as compared to conventional hand-crafted feature-based techniques. Moreover, in the literature, it is seen that the use of different streams with informative input data helps to increase the performance in the recognition accuracy. This work basically proposes a two-stream fusion model for hand gesture recognition and a compact yet efficient motion template based on optical flow. Specifically, the two-stream network consists of two layers: a 3D convolutional neural network (C3D) that takes gesture videos as input and a 2D-CNN that takes OFMT images as input. C3D has shown its efficiency in capturing spatio-temporal information of a video. Whereas OFMT helps to eliminate irrelevant gestures providing additional motion information. Though each stream can work independently, they are combined with a fusion scheme to boost the recognition results. We have shown the efficiency of the proposed two-stream network on two databases. △ Less

Submitted 17 July, 2020; originally announced July 2020.

Comments: 7 pages, 6 figures, 2 tables. Keywords: Action and gesture recognition, Two-stream fusion model, Optical flow guided motion template (OFMT), 2D and 3D-CNN

Showing 1–3 of 3 results for author: Sarma, D