Showing 1–4 of 4 results for author: Houston, B

  1. arXiv:2412.18566

    cs.CL eess.AS

    Zero-resource Speech Translation and Recognition with LLMs

    Authors: Karel Mundnich, Xing Niu, Prashant Mathur, Srikanth Ronanki, Brady Houston, Veera Raghavendra Elluru, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Anshu Bhatia, Daniel Garcia-Romero, Kyu J. Han, Katrin Kirchhoff

    Abstract: Despite recent advancements in speech processing, zero-resource speech translation (ST) and automatic speech recognition (ASR) remain challenging problems. In this work, we propose to leverage a multilingual Large Language Model (LLM) to perform ST and ASR in languages for which the model has never seen paired audio-text data. We achieve this by using a pre-trained multilingual speech encoder, a m…

    Submitted 30 December, 2024; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: ICASSP 2025, 5 pages, 2 figures, 2 tables

  2. arXiv:2412.16530

    cs.SD cs.CL cs.CV cs.MM eess.AS

    Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation

    Authors: Lucas Goncalves, Prashant Mathur, Xing Niu, Brady Houston, Chandrashekhar Lavania, Srikanth Vishnubhotla, Lijia Sun, Anthony Ferritto

    Abstract: Audio-Visual Speech-to-Speech Translation typically prioritizes improving translation quality and naturalness. However, an equally critical aspect in audio-visual content is lip-synchrony (ensuring that the movements of the lips match the spoken content), which is essential for maintaining realism in dubbed videos. Despite its importance, the inclusion of lip-synchrony constraints in AVS2S models has been lar…

    Submitted 21 December, 2024; originally announced December 2024.

    Comments: Accepted at ICASSP, 4 pages

  3. Sequential Editing for Lifelong Training of Speech Recognition Models

    Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Nikolaos Pappas, Srikanth Ronanki

    Abstract: Automatic Speech Recognition (ASR) traditionally assumes known domains, but adding data from a new domain raises concerns about computational inefficiencies linked to retraining models on both existing and new domains. Fine-tuning solely on the new domain risks Catastrophic Forgetting (CF). To address this, Lifelong Learning (LLL) algorithms have been proposed for ASR. Prior research has explored tech…

    Submitted 18 September, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  4. arXiv:2307.00759

    cs.CL cs.SD eess.AS

    Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

    Authors: Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati

    Abstract: Connectionist Temporal Classification (CTC) models are popular for their balance between speed and performance for Automatic Speech Recognition (ASR). However, these CTC models still struggle in other areas, such as personalization towards custom words. A recent approach explores Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom enti…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Published at INTERSPEECH 2023