Showing 1–2 of 2 results for author: Thanda, A

  1. arXiv:2001.10832

    eess.AS cs.LG cs.MM cs.SD eess.IV

    Audio-Visual Decision Fusion for WFST-based and seq2seq Models

    Authors: Rohith Aralikatti, Sharad Roy, Abhinav Thanda, Dilip Kumar Margam, Pujitha Appan Kandala, Tanay Sharma, Shankar M Venkatesan

    Abstract: Under noisy conditions, speech recognition systems suffer from high Word Error Rates (WER). In such cases, information from the visual modality comprising the speaker lip movements can help improve the performance. In this work, we propose novel methods to fuse information from audio and visual modalities at inference time. This enables us to train the acoustic and visual models independently. Fir…

    Submitted 29 January, 2020; originally announced January 2020.

    Comments: Submitted for review to ICASSP 2020 on October 21st, 2019

  2. arXiv:1906.12170

    cs.CV cs.LG cs.SD eess.AS eess.IV

    LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

    Authors: Dilip Kumar Margam, Rohith Aralikatti, Tanay Sharma, Abhinav Thanda, Pujitha A K, Sharad Roy, Shankar M Venkatesan

    Abstract: In recent years, deep learning based machine lipreading has gained prominence. To this end, several architectures such as LipNet, LCANet and others have been proposed which perform extremely well compared to traditional lipreading DNN-HMM hybrid systems trained on DCT features. In this work, we propose a simpler architecture of 3D-2D-CNN-BLSTM network with a bottleneck layer. We also present analy…

    Submitted 25 June, 2019; originally announced June 2019.

    Comments: Submitted to Interspeech 2019