Showing 1–6 of 6 results for author: Adiga, N

  1. arXiv:2502.05837  [pdf, other]

    eess.AS

    Synergistic Effects of Knowledge Distillation and Structured Pruning for Self-Supervised Speech Models

    Authors: Shiva Kumar C, Jitendra Kumar Dhiman, Nagaraj Adiga, Shatrughan Singh

    Abstract: Traditionally, Knowledge Distillation (KD) is used for model compression, often leading to suboptimal performance. In this paper, we evaluate the impact of combining KD loss with alternative pruning techniques, including Low-Rank Factorization (LRF) and l0 regularization, on a conformer-based pre-trained network under the paradigm of Self-Supervised Learning (SSL). We also propose a strategy to jo…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: 5 pages, 2 figures, 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing

  2. arXiv:2312.09842  [pdf, ps, other]

    cs.SD eess.AS

    On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition

    Authors: Nagaraj Adiga, Jinhwan Park, Chintigari Shiva Kumar, Shatrughan Singh, Kyungmin Lee, Chanwoo Kim, Dhananjaya Gowda

    Abstract: Recently, the cascaded two-pass architecture has emerged as a strong contender for on-device automatic speech recognition (ASR). A cascade of causal and shallow non-causal encoders coupled with a shared decoder enables operation in both streaming and look-ahead modes. In this paper, we propose shallow cascaded model by combining various model compression techniques such as knowledge distillation,…

    Submitted 15 December, 2023; originally announced December 2023.

  3. arXiv:2206.04305  [pdf, other]

    eess.AS cs.CL cs.SD

    Context-based out-of-vocabulary word recovery for ASR systems in Indian languages

    Authors: Arun Baby, Saranya Vinnaitherthan, Akhil Kerhalkar, Pranav Jawale, Sharath Adavanne, Nagaraj Adiga

    Abstract: Detecting and recovering out-of-vocabulary (OOV) words is always challenging for Automatic Speech Recognition (ASR) systems. Many existing methods focus on modeling OOV words by modifying acoustic and language models and integrating context words cleverly into models. To train such complex models, we need a large amount of data with context words, additional training time, and increased model size…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: 12 pages

  4. arXiv:2106.10870  [pdf, other]

    eess.AS cs.CL cs.SD

    Non-native English lexicon creation for bilingual speech synthesis

    Authors: Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga, Sharath Adavanne

    Abstract: Bilingual English speakers speak English as one of their languages. Their English is of a non-native kind, and their conversations are of a code-mixed fashion. The intelligibility of a bilingual text-to-speech (TTS) system for such non-native English speakers depends on a lexicon that captures the phoneme sequence used by non-native speakers. However, due to the lack of non-native English lexicon,…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted for Presentation at Speech Synthesis Workshop (SSW), 2021 (August 2021)

  5. A non-causal FFTNet architecture for speech enhancement

    Authors: Muhammed PV Shifas, Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou

    Abstract: In this paper, we suggest a new parallel, non-causal and shallow waveform domain architecture for speech enhancement based on FFTNet, a neural network for generating high quality audio waveform. In contrast to other waveform based approaches like WaveNet, FFTNet uses an initial wide dilation pattern. Such an architecture better represents the long term correlated structure of speech in the time do…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: 5 pages

  6. arXiv:2006.01463  [pdf, other]

    cs.SD eess.AS

    An ASR Guided Speech Intelligibility Measure for TTS Model Selection

    Authors: Arun Baby, Saranya Vinnaitherthan, Nagaraj Adiga, Pranav Jawale, Sumukh Badam, Sharath Adavanne, Srikanth Konjeti

    Abstract: The perceptual quality of neural text-to-speech (TTS) is highly dependent on the choice of the model during training. Selecting the model using a training-objective metric such as the least mean squared error does not always correlate with human perception. In this paper, we propose an objective metric based on the phone error rate (PER) to select the TTS model with the best speech intelligibility…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: Submitted to INTERSPEECH 2020