Skip to main content

Showing 1–7 of 7 results for author: Paul, D

Searching in archive eess. Search in all archives.
.
  1. arXiv:2501.12501  [pdf, other

    eess.AS cs.SD

    A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data

    Authors: Minh Tran, Yutong Pang, Debjyoti Paul, Laxmi Pandey, Kevin Jiang, Jinxi Guo, Ke Li, Shun Zhang, Xuedong Zhang, Xin Lei

    Abstract: We introduce DAS (Domain Adaptation with Synthetic data), a novel domain adaptation framework for pre-trained ASR model, designed to efficiently adapt to various language-defined domains without requiring any real data. In particular, DAS first prompts large language models (LLMs) to generate domain-specific texts before converting these texts to speech via text-to-speech technology. The synthetic… ▽ More

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: ICASSP 2025

  2. arXiv:2407.16664  [pdf, other

    cs.CL eess.AS

    Towards scalable efficient on-device ASR with transfer learning

    Authors: Laxmi Pandey, Ke Li, Jinxi Guo, Debjyoti Paul, Arthur Guo, Jay Mahadeokar, Xuedong Zhang

    Abstract: Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  3. arXiv:2207.09674  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    Improving Data Driven Inverse Text Normalization using Data Augmentation

    Authors: Laxmi Pandey, Debjyoti Paul, Pooja Chitkara, Yutong Pang, Xuedong Zhang, Kjell Schubert, Mark Chou, Shu Liu, Yatharth Saraf

    Abstract: Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile neural modeling approaches require quality large-scale spoken-written pair examples in the same or similar domain as the ASR system (in-domain data), to train. Both these… ▽ More

    Submitted 20 July, 2022; originally announced July 2022.

  4. arXiv:2008.05809  [pdf, other

    cs.SD cs.LG eess.AS

    Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion

    Authors: Dipjyoti Paul, Muhammed PV Shifas, Yannis Pantazis, Yannis Stylianou

    Abstract: The increased adoption of digital assistants makes text-to-speech (TTS) synthesis systems an indispensable feature of modern mobile devices. It is hence desirable to build a system capable of generating highly intelligible speech in the presence of noise. Past studies have investigated style conversion in TTS synthesis, yet degraded synthesized quality often leads to worse intelligibility. To over… ▽ More

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: Accepted in INTERSPEECH 2020

  5. arXiv:2008.05289  [pdf, other

    eess.AS cs.LG cs.SD

    Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions

    Authors: Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou

    Abstract: Recent advancements in deep learning led to human-level performance in single-speaker speech synthesis. However, there are still limitations in terms of speech quality when generalizing those systems into multiple-speaker models especially for unseen speakers and unseen recording qualities. For instance, conventional neural vocoders are adjusted to the training speaker and have poor generalization… ▽ More

    Submitted 9 August, 2020; originally announced August 2020.

    Comments: Accepted in INTERSPEECH 2020

  6. arXiv:1905.09369  [pdf, other

    math.ST cs.LG eess.SP

    Sparse Equisigned PCA: Algorithms and Performance Bounds in the Noisy Rank-1 Setting

    Authors: Arvind Prasadan, Raj Rao Nadakuditi, Debashis Paul

    Abstract: Singular value decomposition (SVD) based principal component analysis (PCA) breaks down in the high-dimensional and limited sample size regime below a certain critical eigen-SNR that depends on the dimensionality of the system and the number of samples. Below this critical eigen-SNR, the estimates returned by the SVD are asymptotically uncorrelated with the latent principal components. We consider… ▽ More

    Submitted 16 December, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: To appear, Electronic Journal of Statistics, 2020

  7. arXiv:1901.08025  [pdf, ps, other

    cs.MM eess.AS

    Generalization of Spoofing Countermeasures: a Case Study with ASVspoof 2015 and BTAS 2016 Corpora

    Authors: Dipjyoti Paul, Md Sahidullah, Goutam Saha

    Abstract: Voice-based biometric systems are highly prone to spoofing attacks. Recently, various countermeasures have been developed for detecting different kinds of attacks such as replay, speech synthesis (SS) and voice conversion (VC). Most of the existing studies are conducted with a specific training set defined by the evaluation protocol. However, for realistic scenarios, selecting appropriate training… ▽ More

    Submitted 23 January, 2019; originally announced January 2019.

    Journal ref: Published in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), New Orleans, LA, USA