
Showing 1–6 of 6 results for author: Mavandadi, S

  1. arXiv:2303.15293  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    A Deliberation-based Joint Acoustic and Text Decoder

    Authors: Sepand Mavandadi, Tara N. Sainath, Ke Hu, Zelin Wu

    Abstract: We propose a new two-pass E2E speech recognition model that improves ASR performance by training on a combination of paired data and unpaired text data. Previously, the joint acoustic and text decoder (JATD) has shown promising results through the use of text data during model training and the recently introduced deliberation architecture has reduced recognition errors by leveraging first-pass dec…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Interspeech 2021

  2. arXiv:2209.06058  [pdf, other]

    eess.AS cs.CL

    Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

    Authors: Chao Zhang, Bo Li, Tara Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-yiin Chang, Parisa Haghani

    Abstract: Language identification is critical for many downstream tasks in automatic speech recognition (ASR), and is beneficial to integrate into multilingual end-to-end ASR as an additional task. In this paper, we propose to modify the structure of the cascaded-encoder-based recurrent neural network transducer (RNN-T) model by integrating a per-frame language identifier (LID) predictor. RNN-T with cascade…

    Submitted 13 September, 2022; originally announced September 2022.

  3. arXiv:2206.14716  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving Deliberation by Text-Only and Semi-Supervised Training

    Authors: Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang

    Abstract: Text-only and semi-supervised training based on audio-only data has gained popularity recently due to the wide availability of unlabeled text and speech data. In this work, we propose incorporating text-only and semi-supervised training into an attention-based deliberation model. By incorporating text-only data in training a bidirectional encoder representation from transformer (BERT) for the deli…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022

  4. arXiv:2204.07553  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving Rare Word Recognition with LM-aware MWER Training

    Authors: Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach

    Abstract: Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E) models on words rarely seen during training, when used in either the shallow fusion or the rescoring setups. In this work, we introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework, to mitigate the training versus inference gap regarding the use…

    Submitted 27 June, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: To appear in INTERSPEECH 2022

  5. arXiv:2008.10491  [pdf, other]

    eess.AS cs.LG

    Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

    Authors: Cal Peyser, Sepand Mavandadi, Tara N. Sainath, James Apfel, Ruoming Pang, Shankar Kumar

    Abstract: End-to-end (E2E) automatic speech recognition (ASR) systems lack the distinct language model (LM) component that characterizes traditional speech systems. While this simplifies the model architecture, it complicates the task of incorporating text-only data into training, which is important to the recognition of tail words that do not occur often in audio-text pairs. While shallow fusion has been p…

    Submitted 25 August, 2020; v1 submitted 24 August, 2020; originally announced August 2020.

  6. arXiv:1212.0992  [pdf]

    cs.OH

    BigFoot: Analysis, monitoring, tracking and sharing of bio-medical features of human appendages using consumer-grade home and office based imaging devices

    Authors: Sam Mavandadi, Steve Feng, Frank Yu, Richard Yu, Aydogan Ozcan

    Abstract: Here we describe a system for personal and professional management and analysis of bio-medical images captured using off-the-shelf, consumer-grade imaging devices such as scanners, digital cameras, cellphones, webcams and tablet PCs. Specifically, we describe an implementation of this system for the analysis, monitoring and tracking of conditions and features of human feet using a flatbed scanner…

    Submitted 5 December, 2012; originally announced December 2012.