Showing 1–4 of 4 results for author: Chakrabarty, D

Searching in archive eess.
  1. arXiv:2505.19774  [pdf, ps, other]

    eess.AS

    DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation

    Authors: Prabash Reddy Male, Swayambhu Nath Ray, Harish Arsikere, Akshat Jaiswal, Prakhar Swarup, Prantik Sen, Debmalya Chakrabarty, K V Vijay Girish, Nikhil Bhave, Frederick Weber, Sambuddha Bhattacharya, Sri Garimella

    Abstract: Recent advancements in speech encoders have drawn attention due to their integration with Large Language Models for various speech tasks. While most research has focused on either causal or full-context speech encoders, there's limited exploration to effectively handle both streaming and non-streaming applications, while achieving state-of-the-art performance. We introduce DuRep, a Dual-mode Speec…

    Submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2110.09890  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-Modal Pre-Training for Automated Speech Recognition

    Authors: David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

    Abstract: Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance. Unfortunately, approaches relying on such hyper-local information tend to be vulnerable to both local-level corruption (such as audio-frame drops, or loud noises) and global-level noise (such as environmental noise, or background noise…

    Submitted 15 September, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Presented at ICASSP 2022

  3. arXiv:2008.03923  [pdf, other]

    cs.CL eess.AS

    Knowledge Distillation and Data Selection for Semi-Supervised Learning in CTC Acoustic Models

    Authors: Prakhar Swarup, Debmalya Chakrabarty, Ashtosh Sapru, Hitesh Tulsiani, Harish Arsikere, Sri Garimella

    Abstract: Semi-supervised learning (SSL) is an active area of research which aims to utilize unlabelled data in order to improve the accuracy of speech recognition systems. The current study proposes a methodology for integration of two key ideas: 1) SSL using connectionist temporal classification (CTC) objective and teacher-student based learning 2) Designing effective data-selection mechanisms for leverag…

    Submitted 10 August, 2020; originally announced August 2020.

  4. arXiv:1811.04048  [pdf, ps, other]

    eess.AS cs.SD

    Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection

    Authors: Sandeep Kothinti, Keisuke Imoto, Debmalya Chakrabarty, Gregory Sell, Shinji Watanabe, Mounya Elhilali

    Abstract: Sound event detection is a challenging task, especially for scenes with multiple simultaneous events. While event classification methods tend to be fairly accurate, event localization presents additional challenges, especially when large amounts of labeled data are not available. Task4 of the 2018 DCASE challenge presents an event detection task that requires accuracy in both segmentation and reco…

    Submitted 9 November, 2018; originally announced November 2018.

    Comments: Submitted to ICASSP 2019