Showing 1–8 of 8 results for author: Tobin, J

  1. arXiv:2501.00039

    eess.AS cs.CL cs.LG cs.SD

    Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning

    Authors: Chirag Nagpal, Subhashini Venugopalan, Jimmy Tobin, Marilyn Ladewig, Katherine Heller, Katrin Tomanek

    Abstract: We introduce a large language model (LLM) capable of processing speech inputs and show that tuning it further with reinforcement learning on human preference (RLHF) enables it to adapt better to disordered speech than traditional fine-tuning. Our method replaces low-frequency text tokens in an LLM's vocabulary with audio tokens and enables the model to recognize speech by fine-tuning it on speech…

    Submitted 24 December, 2024; originally announced January 2025.

    Comments: Accepted at ICASSP 2025

  2. arXiv:2412.19315

    eess.AS

    Towards a Single ASR Model That Generalizes to Disordered Speech

    Authors: Jimmy Tobin, Katrin Tomanek, Subhashini Venugopalan

    Abstract: This study investigates the impact of integrating a dataset of disordered speech recordings ($\sim$1,000 hours) into the fine-tuning of a near state-of-the-art ASR baseline system. Contrary to what one might expect, despite the data being less than 1% of the training data of the ASR system, we find a considerable improvement in disordered speech recognition accuracy. Specifically, we observe a 33%…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Accepted at ICASSP 2025

  3. arXiv:2409.09190

    eess.AS cs.SD

    Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech

    Authors: Pan-Pan Jiang, Jimmy Tobin, Katrin Tomanek, Robert L. MacDonald, Katie Seaver, Richard Cave, Marilyn Ladewig, Rus Heywood, Jordan R. Green

    Abstract: Project Euphonia, a Google initiative, is dedicated to improving automatic speech recognition (ASR) of disordered speech. A central objective of the project is to create a large, high-quality, and diverse speech corpus. This report describes the project's latest advancements in data collection and annotation methodologies, such as expanding speaker diversity in the database, adding human-reviewed…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: Interspeech 2024

  4. arXiv:2303.07533

    eess.AS cs.SD

    Speech Intelligibility Classifiers from 550k Disordered Speech Samples

    Authors: Subhashini Venugopalan, Jimmy Tobin, Samuel J. Yang, Katie Seaver, Richard J. N. Cave, Pan-Pan Jiang, Neil Zeghidour, Rus Heywood, Jordan Green, Michael P. Brenner

    Abstract: We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale. We trained three models following different deep learning approaches and evaluated them on ~94K utterances from 100 speakers. We further found…

    Submitted 15 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023 camera-ready

  5. arXiv:2209.10591

    eess.AS cs.CL cs.LG

    Assessing ASR Model Quality on Disordered Speech using BERTScore

    Authors: Jimmy Tobin, Qisheng Li, Subhashini Venugopalan, Katie Seaver, Richard Cave, Katrin Tomanek

    Abstract: Word Error Rate (WER) is the primary metric used to assess automatic speech recognition (ASR) model quality. It has been shown that ASR models tend to have much higher WER on speakers with speech impairments than typical English speakers. It is hard to determine if models can be useful at such high error rates. This study investigates the use of BERTScore, an evaluation metric for text generati…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Accepted to Interspeech 2022 Workshop on Speech for Social Good

  6. arXiv:2110.04612

    eess.AS cs.CL cs.LG cs.SD

    Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets

    Authors: Jimmy Tobin, Katrin Tomanek

    Abstract: This study investigates the performance of personalized automatic speech recognition (ASR) for recognizing disordered speech using small amounts of per-speaker adaptation data. We trained personalized models for 195 individuals with different types and severities of speech impairment with training sets ranging in size from <1 minute to 18-20 minutes of speech data. Word error rate (WER) thresholds…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  7. arXiv:2107.03985

    eess.AS cs.LG cs.SD

    Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases

    Authors: Subhashini Venugopalan, Joel Shor, Manoj Plakal, Jimmy Tobin, Katrin Tomanek, Jordan R. Green, Michael P. Brenner

    Abstract: Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of speech impairment. Classification approaches can also help identify hard-to-recognize speech samples to teach ASR systems about the variable manifestations of impaired speech. Here, we develop and compare different deep learning techniques to classify the intelligibility of diso…

    Submitted 8 July, 2021; originally announced July 2021.

    Comments: Accepted at INTERSPEECH 2021

  8. arXiv:1610.03518

    cs.RO cs.AI cs.LG eess.SY

    Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

    Authors: Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba

    Abstract: Developing control policies in simulation is often more practical and safer than directly running experiments in the real world. This applies to policies obtained from planning and optimization, and even more so to policies obtained from reinforcement learning, which is often very data demanding. However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevert…

    Submitted 11 October, 2016; originally announced October 2016.