Showing 1–2 of 2 results for author: Mille, Z

  1. arXiv:2412.00265

    eess.AS

    SSDM 2.0: Time-Accurate Speech Rich Transcription with Non-Fluencies

    Authors: Jiachen Lian, Xuanru Zhou, Zoe Ezzes, Jet Vonk, Brittany Morin, David Baquirin, Zachary Mille, Maria Luisa Gorno Tempini, Gopala Krishna Anumanchipalli

    Abstract: Speech is a hierarchical collection of text, prosody, emotions, dysfluencies, etc. Automatic transcription of speech that goes beyond text (words) is an underexplored problem. We focus on transcribing speech along with non-fluencies (dysfluencies). The current state-of-the-art pipeline SSDM suffers from complex architecture design, training complexity, and significant shortcomings in the local seq…

    Submitted 29 November, 2024; originally announced December 2024.

  2. arXiv:2408.16221

    eess.AS cs.AI cs.CL cs.SD

    SSDM: Scalable Speech Dysfluency Modeling

    Authors: Jiachen Lian, Xuanru Zhou, Zoe Ezzes, Jet Vonk, Brittany Morin, David Baquirin, Zachary Mille, Maria Luisa Gorno Tempini, Gopala Krishna Anumanchipalli

    Abstract: Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, there are three challenges. First, current state-of-the-art solutions \cite{lian2023unconstrained-udm, lian-anumanchipalli-2024-towards-hudm} suffer from poor scalability. Second, there is a lack of a large-scale dysfluency corpus. Third, there is no effective learning framework. In this pap…

    Submitted 3 October, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: 2024 NeurIPS