Skip to main content

Showing 1–5 of 5 results for author: Bhatia, A

Searching in archive eess. Search in all archives.
.
  1. arXiv:2412.18566  [pdf, other

    cs.CL eess.AS

    Zero-resource Speech Translation and Recognition with LLMs

    Authors: Karel Mundnich, Xing Niu, Prashant Mathur, Srikanth Ronanki, Brady Houston, Veera Raghavendra Elluru, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Anshu Bhatia, Daniel Garcia-Romero, Kyu J. Han, Katrin Kirchhoff

    Abstract: Despite recent advancements in speech processing, zero-resource speech translation (ST) and automatic speech recognition (ASR) remain challenging problems. In this work, we propose to leverage a multilingual Large Language Model (LLM) to perform ST and ASR in languages for which the model has never seen paired audio-text data. We achieve this by using a pre-trained multilingual speech encoder, a m… ▽ More

    Submitted 30 December, 2024; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: ICASSP 2025, 5 pages, 2 figures, 2 tables

  2. arXiv:2411.12719  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation

    Authors: Praveen Srinivasa Varadhan, Amogh Gulati, Ashwin Sankar, Srija Anand, Anirudh Gupta, Anirudh Mukherjee, Shiva Kumar Marepally, Ankur Bhatia, Saloni Jaju, Suvrat Bhooshan, Mitesh M. Khapra

    Abstract: Despite rapid advancements in TTS models, a consistent and robust human evaluation framework is still lacking. For example, MOS tests fail to differentiate between similar models, and CMOS's pairwise comparisons are time-intensive. The MUSHRA test is a promising alternative for evaluating multiple TTS systems simultaneously, but in this work we show that its reliance on matching human reference sp… ▽ More

    Submitted 26 May, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

    Comments: Accepted in TMLR

  3. arXiv:2405.08317  [pdf, other

    cs.CL cs.SD eess.AS

    SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

    Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 9+6 pages, Submitted to ACL 2024

  4. arXiv:2307.00453  [pdf, other

    cs.CL cs.SD eess.AS

    Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters

    Authors: Anshu Bhatia, Sanchit Sinha, Saket Dingliwal, Karthik Gopalakrishnan, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully toward several downstream tasks. However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose an… ▽ More

    Submitted 1 July, 2023; originally announced July 2023.

  5. arXiv:2011.15000  [pdf, other

    cs.CV cs.LG eess.IV

    Fast, Self Supervised, Fully Convolutional Color Normalization of H&E Stained Images

    Authors: Abhijeet Patil, Mohd. Talha, Aniket Bhatia, Nikhil Cherian Kurian, Sammed Mangale, Sunil Patel, Amit Sethi

    Abstract: Performance of deep learning algorithms decreases drastically if the data distributions of the training and testing sets are different. Due to variations in staining protocols, reagent brands, and habits of technicians, color variation in digital histopathology images is quite common. Color variation causes problems for the deployment of deep learning-based solutions for automatic diagnosis system… ▽ More

    Submitted 30 November, 2020; originally announced November 2020.

    Comments: --