
Perceptual Ratings Predict Speech Inversion Articulatory Kinematics in Childhood Speech Sound Disorders

Authors: Nina R. Benway, Saba Tabatabaee, Dongliang Wang, Benjamin Munson, Jonathan L. Preston, Carol Espy-Wilson

Abstract: Purpose: This study evaluated whether articulatory kinematics, inferred by Articulatory Phonology speech inversion neural networks, aligned with perceptual ratings of /r/ and /s/ in the speech of children with speech sound disorders. Methods: Articulatory Phonology vocal tract variables were inferred for 5,961 utterances from 118 children and 3 adults, aged 2.25-45 years. Perceptual ratings were standardized using the novel 5-point PERCEPT Rating Scale and training protocol. Two research questions examined if the articulatory patterns of inferred vocal tract variables aligned with the perceptual error category for the phones investigated (e.g., tongue tip is more anterior in dentalized /s/ productions than in correct /s/). A third research question examined if gradient PERCEPT Rating Scale scores predicted articulatory proximity to correct productions. Results: Estimated marginal means from linear mixed models supported 17 of 18 /r/ hypotheses, involving tongue tip and tongue body constrictions. For /s/, estimated marginal means from a second linear mixed model supported 7 of 15 hypotheses, particularly those related to the tongue tip. A third linear mixed model revealed that PERCEPT Rating Scale scores significantly predicted articulatory proximity of errored phones to correct productions. Conclusion: Inferred vocal tract variables differentiated category and magnitude of articulatory errors for /r/, and to a lesser extent for /s/, aligning with perceptual judgments. These findings support the clinical interpretability of speech inversion vocal tract variables and the PERCEPT Rating Scale in quantifying articulatory proximity to the target sound, particularly for /r/.

Submitted 2 July, 2025; originally announced July 2025.

Comments: This manuscript is in submission for publication. It has not yet been peer reviewed.

arXiv:2305.19090 [pdf]

doi 10.21437/Interspeech.2023-1882

Prospective Validation of Motor-Based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders

Authors: Nina R Benway, Jonathan L Preston

Abstract: Because lab accuracy of clinical speech technology systems may be overoptimistic, clinical validation is vital to demonstrate system reproducibility - in this case, the ability of the PERCEPT-R Classifier to predict clinician judgment of American English /r/ during ChainingAI motor-based speech sound disorder intervention. All five participants experienced statistically-significant improvement in untreated words following 10 sessions of combined human-ChainingAI treatment. These gains, despite a wide range of PERCEPT-human and human-human (F1-score) agreement, raise questions about best measuring classification performance for clinical speech that may be perceptually ambiguous.

Submitted 30 May, 2023; originally announced May 2023.

Comments: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023

Journal ref: Proc. INTERSPEECH 2023, 4558-4562

arXiv:2305.16111 [pdf]

doi 10.21437/Interspeech.2023-312

Classifying Rhoticity of /r/ in Speech Sound Disorder using Age-and-Sex Normalized Formants

Authors: Nina R Benway, Jonathan L Preston, Asif Salekin, Yi Xiao, Harshit Sharma, Tara McAllister

Abstract: Mispronunciation detection tools could increase treatment access for speech sound disorders impacting, e.g., /r/. We show age-and-sex normalized formant estimation outperforms cepstral representation for detection of fully rhotic vs. derhotic /r/ in the PERCEPT-R Corpus. Gated recurrent neural networks trained on this feature set achieve a mean test participant-specific F1-score =.81 (σx=.10, med = .83, n = 48), with post hoc modeling showing no significant effect of child age or sex.

Submitted 25 May, 2023; originally announced May 2023.

Comments: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023

Journal ref: Proc. INTERSPEECH 2023, 4563-4567

arXiv:2305.16085 [pdf]

doi 10.21437/Interspeech.2023-1924

Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders

Authors: Nina R Benway, Yashish M Siriwardena, Jonathan L Preston, Elaine Hitchcock, Tara McAllister, Carol Espy-Wilson

Abstract: Acoustic-to-articulatory speech inversion could enhance automated clinical mispronunciation detection to provide detailed articulatory feedback unattainable by formant-based mispronunciation detection algorithms; however, it is unclear the extent to which a speech inversion system trained on adult speech performs in the context of (1) child and (2) clinical speech. In the absence of an articulatory dataset in children with rhotic speech sound disorders, we show that classifiers trained on tract variables from acoustic-to-articulatory speech inversion meet or exceed the performance of state-of-the-art features when predicting clinician judgment of rhoticity. Index Terms: rhotic, speech sound disorder, mispronunciation detection

Submitted 25 May, 2023; originally announced May 2023.

Comments: *denotes equal contribution. To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023

Journal ref: Proc. INTERSPEECH 2023, 4568-4572

Showing 1–4 of 4 results for author: Preston, J L