-
Perceptual Ratings Predict Speech Inversion Articulatory Kinematics in Childhood Speech Sound Disorders
Authors:
Nina R. Benway,
Saba Tabatabaee,
Dongliang Wang,
Benjamin Munson,
Jonathan L. Preston,
Carol Espy-Wilson
Abstract:
Purpose: This study evaluated whether articulatory kinematics, inferred by Articulatory Phonology speech inversion neural networks, aligned with perceptual ratings of /r/ and /s/ in the speech of children with speech sound disorders.
Methods: Articulatory Phonology vocal tract variables were inferred for 5,961 utterances from 118 children and 3 adults, aged 2.25-45 years. Perceptual ratings were…
▽ More
Purpose: This study evaluated whether articulatory kinematics, inferred by Articulatory Phonology speech inversion neural networks, aligned with perceptual ratings of /r/ and /s/ in the speech of children with speech sound disorders.
Methods: Articulatory Phonology vocal tract variables were inferred for 5,961 utterances from 118 children and 3 adults, aged 2.25-45 years. Perceptual ratings were standardized using the novel 5-point PERCEPT Rating Scale and training protocol. Two research questions examined if the articulatory patterns of inferred vocal tract variables aligned with the perceptual error category for the phones investigated (e.g., tongue tip is more anterior in dentalized /s/ productions than in correct /s/). A third research question examined if gradient PERCEPT Rating Scale scores predicted articulatory proximity to correct productions.
Results: Estimated marginal means from linear mixed models supported 17 of 18 /r/ hypotheses, involving tongue tip and tongue body constrictions. For /s/, estimated marginal means from a second linear mixed model supported 7 of 15 hypotheses, particularly those related to the tongue tip. A third linear mixed model revealed that PERCEPT Rating Scale scores significantly predicted articulatory proximity of errored phones to correct productions.
Conclusion: Inferred vocal tract variables differentiated category and magnitude of articulatory errors for /r/, and to a lesser extent for /s/, aligning with perceptual judgments. These findings support the clinical interpretability of speech inversion vocal tract variables and the PERCEPT Rating Scale in quantifying articulatory proximity to the target sound, particularly for /r/.
△ Less
Submitted 2 July, 2025;
originally announced July 2025.
-
Prospective Validation of Motor-Based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders
Authors:
Nina R Benway,
Jonathan L Preston
Abstract:
Because lab accuracy of clinical speech technology systems may be overoptimistic, clinical validation is vital to demonstrate system reproducibility - in this case, the ability of the PERCEPT-R Classifier to predict clinician judgment of American English /r/ during ChainingAI motor-based speech sound disorder intervention. All five participants experienced statistically-significant improvement in…
▽ More
Because lab accuracy of clinical speech technology systems may be overoptimistic, clinical validation is vital to demonstrate system reproducibility - in this case, the ability of the PERCEPT-R Classifier to predict clinician judgment of American English /r/ during ChainingAI motor-based speech sound disorder intervention. All five participants experienced statistically-significant improvement in untreated words following 10 sessions of combined human-ChainingAI treatment. These gains, despite a wide range of PERCEPT-human and human-human (F1-score) agreement, raise questions about best measuring classification performance for clinical speech that may be perceptually ambiguous.
△ Less
Submitted 30 May, 2023;
originally announced May 2023.
-
Classifying Rhoticity of /r/ in Speech Sound Disorder using Age-and-Sex Normalized Formants
Authors:
Nina R Benway,
Jonathan L Preston,
Asif Salekin,
Yi Xiao,
Harshit Sharma,
Tara McAllister
Abstract:
Mispronunciation detection tools could increase treatment access for speech sound disorders impacting, e.g., /r/. We show age-and-sex normalized formant estimation outperforms cepstral representation for detection of fully rhotic vs. derhotic /r/ in the PERCEPT-R Corpus. Gated recurrent neural networks trained on this feature set achieve a mean test participant-specific F1-score =.81 (σx=.10, med…
▽ More
Mispronunciation detection tools could increase treatment access for speech sound disorders impacting, e.g., /r/. We show age-and-sex normalized formant estimation outperforms cepstral representation for detection of fully rhotic vs. derhotic /r/ in the PERCEPT-R Corpus. Gated recurrent neural networks trained on this feature set achieve a mean test participant-specific F1-score =.81 (σx=.10, med = .83, n = 48), with post hoc modeling showing no significant effect of child age or sex.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders
Authors:
Nina R Benway,
Yashish M Siriwardena,
Jonathan L Preston,
Elaine Hitchcock,
Tara McAllister,
Carol Espy-Wilson
Abstract:
Acoustic-to-articulatory speech inversion could enhance automated clinical mispronunciation detection to provide detailed articulatory feedback unattainable by formant-based mispronunciation detection algorithms; however, it is unclear the extent to which a speech inversion system trained on adult speech performs in the context of (1) child and (2) clinical speech. In the absence of an articulator…
▽ More
Acoustic-to-articulatory speech inversion could enhance automated clinical mispronunciation detection to provide detailed articulatory feedback unattainable by formant-based mispronunciation detection algorithms; however, it is unclear the extent to which a speech inversion system trained on adult speech performs in the context of (1) child and (2) clinical speech. In the absence of an articulatory dataset in children with rhotic speech sound disorders, we show that classifiers trained on tract variables from acoustic-to-articulatory speech inversion meet or exceed the performance of state-of-the-art features when predicting clinician judgment of rhoticity. Index Terms: rhotic, speech sound disorder, mispronunciation detection
△ Less
Submitted 25 May, 2023;
originally announced May 2023.