Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
Authors:
Jinming Zhang,
Xuanru Zhou,
Jiachen Lian,
Shuhe Li,
William Li,
Zoe Ezzes,
Rian Bogley,
Lisa Wauters,
Zachary Miller,
Jet Vonk,
Brittany Morin,
Maria Gorno-Tempini,
Gopala Anumanchipalli
Abstract:
Speech dysfluency detection is crucial for clinical diagnosis and language assessment, but existing methods are limited by the scarcity of high-quality annotated data. Although recent advances in TTS models have enabled synthetic dysfluency generation, existing synthetic datasets suffer from unnatural prosody and limited contextual diversity. To address these limitations, we propose LLM-Dys -- the most comprehensive dysfluent speech corpus with LLM-enhanced dysfluency simulation. This dataset captures 11 dysfluency categories spanning both word and phoneme levels. Building upon this resource, we improve an end-to-end dysfluency detection framework. Experimental validation demonstrates state-of-the-art performance. All data, models, and code are open-sourced at https://github.com/Berkeley-Speech-Group/LLM-Dys.
Submitted 28 May, 2025;
originally announced May 2025.
Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection
Authors:
Chenxu Guo,
Jiachen Lian,
Xuanru Zhou,
Jinming Zhang,
Shuhe Li,
Zongli Ye,
Hwi Joo Park,
Anaisha Das,
Zoe Ezzes,
Jet Vonk,
Brittany Morin,
Rian Bogley,
Lisa Wauters,
Zachary Miller,
Maria Gorno-Tempini,
Gopala Anumanchipalli
Abstract:
Automatic detection of speech dysfluency aids speech-language pathologists in efficient transcription of disordered speech, enhancing diagnostics and treatment planning. Traditional methods, often limited to classification, provide insufficient clinical insight, and text-independent models misclassify dysfluency, especially in context-dependent cases. This work introduces Dysfluent-WFST, a zero-shot decoder that simultaneously transcribes phonemes and detects dysfluency. Unlike previous models, Dysfluent-WFST operates with upstream encoders like WavLM and requires no additional training. It achieves state-of-the-art performance in both phonetic error rate and dysfluency detection on simulated and real speech data. Our approach is lightweight, interpretable, and effective, demonstrating that explicit modeling of pronunciation behavior in decoding, rather than complex architectures, is key to improving dysfluency processing systems.
Submitted 24 May, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.