Showing 1–5 of 5 results for author: Sukhadia, V N

  1. arXiv:2406.13431

    cs.CL cs.SD eess.AS

    Children's Speech Recognition through Discrete Token Enhancement

    Authors: Vrunda N. Sukhadia, Shammur Absar Chowdhury

    Abstract: Children's speech recognition is considered a low-resource task mainly due to the lack of publicly available data. There are several reasons for such data scarcity, including expensive data collection and annotation processes, and data privacy, among others. Transforming speech signals into discrete tokens that do not carry sensitive information but capture both linguistic and acoustic information…

    Submitted 24 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2305.19584

    cs.CL eess.AS

    The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR

    Authors: Kaousheik Jayakumar, Vrunda N. Sukhadia, A Arunkumar, S. Umesh

    Abstract: Building a multilingual Automated Speech Recognition (ASR) system in a linguistically diverse country like India can be a challenging task due to the differences in scripts and the limited availability of speech data. This problem can be solved by exploiting the fact that many of these languages are phonetically similar. These languages can be converted into a Common Label Set (CLS) by mapping sim…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 5 pages, 5 figures, submitted to Interspeech 2023

  3. arXiv:2211.01669

    eess.AS cs.SD eess.SP

    Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR

    Authors: Vrunda N. Sukhadia, A. Arunkumar, S. Umesh

    Abstract: This paper proposes a novel technique to obtain better downstream ASR performance from a joint encoder-decoder self-supervised model when trained with speech pooled from two different channels (narrow and wide band). The joint encoder-decoder self-supervised model extends the HuBERT model with a Transformer decoder. HuBERT performs clustering of features and predicts the class of every input frame…

    Submitted 3 June, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: 5 pages, 5 figures

  4. Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition

    Authors: A Arunkumar, Vrunda N Sukhadia, S. Umesh

    Abstract: Self-supervised learning (SSL) based models have been shown to generate powerful representations that can be used to improve the performance of downstream speech tasks. Several state-of-the-art SSL models are available, and each of these models optimizes a different loss which gives rise to the possibility of their features being complementary. This paper proposes using an ensemble of such SSL rep…

    Submitted 11 June, 2022; originally announced June 2022.

    Comments: 4 pages, 2 figures, submitted to Interspeech 2022

  5. Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models

    Authors: Vrunda N. Sukhadia, S. Umesh

    Abstract: In this paper, we investigate domain adaptation for low-resource Automatic Speech Recognition (ASR) of target-domain data, when a well-trained ASR model trained with a large dataset is available. We argue that in the encoder-decoder framework, the decoder of the well-trained ASR model is largely tuned towards the source-domain, hurting the performance of target-domain models in vanilla transfer-le…

    Submitted 29 May, 2023; v1 submitted 18 February, 2022; originally announced February 2022.

    Comments: 5 pages, 2 figures