-
Perceptual Ratings Predict Speech Inversion Articulatory Kinematics in Childhood Speech Sound Disorders
Authors:
Nina R. Benway,
Saba Tabatabaee,
Dongliang Wang,
Benjamin Munson,
Jonathan L. Preston,
Carol Espy-Wilson
Abstract:
Purpose: This study evaluated whether articulatory kinematics, inferred by Articulatory Phonology speech inversion neural networks, aligned with perceptual ratings of /r/ and /s/ in the speech of children with speech sound disorders.
Methods: Articulatory Phonology vocal tract variables were inferred for 5,961 utterances from 118 children and 3 adults, aged 2.25-45 years. Perceptual ratings were standardized using the novel 5-point PERCEPT Rating Scale and training protocol. Two research questions examined if the articulatory patterns of inferred vocal tract variables aligned with the perceptual error category for the phones investigated (e.g., tongue tip is more anterior in dentalized /s/ productions than in correct /s/). A third research question examined if gradient PERCEPT Rating Scale scores predicted articulatory proximity to correct productions.
Results: Estimated marginal means from linear mixed models supported 17 of 18 /r/ hypotheses, involving tongue tip and tongue body constrictions. For /s/, estimated marginal means from a second linear mixed model supported 7 of 15 hypotheses, particularly those related to the tongue tip. A third linear mixed model revealed that PERCEPT Rating Scale scores significantly predicted articulatory proximity of errored phones to correct productions.
Conclusion: Inferred vocal tract variables differentiated category and magnitude of articulatory errors for /r/, and to a lesser extent for /s/, aligning with perceptual judgments. These findings support the clinical interpretability of speech inversion vocal tract variables and the PERCEPT Rating Scale in quantifying articulatory proximity to the target sound, particularly for /r/.
Submitted 2 July, 2025;
originally announced July 2025.
-
Enhancing Acoustic-to-Articulatory Speech Inversion by Incorporating Nasality
Authors:
Saba Tabatabaee,
Suzanne Boyce,
Liran Oren,
Mark Tiede,
Carol Espy-Wilson
Abstract:
Speech is produced through the coordination of vocal tract constricting organs: lips, tongue, velum, and glottis. Previous works developed Speech Inversion (SI) systems to recover acoustic-to-articulatory mappings for lip and tongue constrictions, called oral tract variables (TVs), which were later enhanced by including source information (periodic and aperiodic energies, and F0 frequency) as proxies for glottal control. Comparison of the nasometric measures with high-speed nasopharyngoscopy showed that nasalance can serve as ground truth, and that an SI system trained with it reliably recovers velum movement patterns for American English speakers. Here, two SI training approaches are compared: baseline models that estimate oral TVs and nasalance independently, and a synergistic model that combines oral TVs and source features with nasalance. The synergistic model shows relative improvements of 5% in oral TVs estimation and 9% in nasalance estimation compared to the baseline models.
Submitted 10 June, 2025;
originally announced June 2025.
-
FT-Boosted SV: Towards Noise Robust Speaker Verification for English Speaking Classroom Environments
Authors:
Saba Tabatabaee,
Jing Liu,
Carol Espy-Wilson
Abstract:
Speaker Verification (SV) systems that are robust to classroom noise, such as babble, are crucial for developing AI tools that assist educational environments. In this work, we study the efficacy of finetuning with augmented children's speech datasets to adapt the x-vector and ECAPA-TDNN models to classroom environments. We demonstrate that this finetuning is effective and reduces the Equal Error Rate (EER) of the x-vector and ECAPA-TDNN models on both classroom datasets and children's speech datasets. Notably, this method reduces the EER of the ECAPA-TDNN model on average by half (a 5% improvement) for classrooms in the MPT dataset compared to the ECAPA-TDNN baseline model. The x-vector model shows an 8% average improvement for classrooms in the NCTE dataset compared to its baseline.
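For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how it can be estimated from verification scores by sweeping a threshold; the function name `equal_error_rate` and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    target_scores: scores for same-speaker trials (higher means more similar).
    nontarget_scores: scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR where the two rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical, for illustration only).
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(same_speaker, different_speaker))
```

With real systems the threshold sweep is usually interpolated on a ROC or DET curve; the discrete sweep above is only meant to make the definition concrete.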
Submitted 26 May, 2025;
originally announced May 2025.
-
Joint User Association and UAV Location Optimization for Two-Tiered Visible Light Communication Networks
Authors:
Alireza Qazavi,
Foroogh S Tabataba,
Mehdi Naderi Soorki
Abstract:
In this paper, an unmanned aerial vehicle (UAV)-assisted visible light communication (VLC) network with two tiers is considered: UAV-to-centroid and device-to-device (D2D). In the UAV-to-centroid tier, each UAV can simultaneously provide communications and illumination for the centroids of the ground users over VLC links. In the D2D tier, the centroids retransmit data received from the UAV over D2D links to the cluster members. For this network, the problem of joint user association and UAV deployment location is formulated to maximize the received data while satisfying constraints on illumination and user cluster size. An iterative algorithm is first proposed to transform the optimization problem into a series of two interdependent subproblems. Following the smallest enclosing disk theorem, a random incremental construction method is designed to find the optimal UAV locations. Then, inspired by unsupervised learning methods, a clustering algorithm is proposed to find a suboptimal user association. Our simulation results show that the proposed scheme on average provides users with illumination 0.77 lux above their threshold requirements. Moreover, the received bitrate plus the number of D2D-connected users under our proposed method is 50.69% higher than in the scenario in which an RF link is used instead of a VLC link and the UAV location is not optimized.
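The abstract does not spell out the construction, so the following is only a sketch of the textbook random incremental algorithm for the smallest enclosing disk of a set of 2D points, which is the general technique the abstract names. Treating the disk center as a candidate UAV ground position is an assumption made here for illustration; the function names and the sample points are hypothetical.

```python
import math
import random


def _disk_two(a, b):
    # Smallest disk with a and b on its boundary: centered at their midpoint.
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)


def _disk_three(a, b, c):
    # Circumcircle of a, b, c; for collinear points fall back to the widest pair.
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return max((_disk_two(a, b), _disk_two(a, c), _disk_two(b, c)),
                   key=lambda disk: disk[2])
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), a))


def _inside(disk, p, eps=1e-9):
    return math.dist((disk[0], disk[1]), p) <= disk[2] + eps


def smallest_enclosing_disk(points, seed=0):
    """Return (cx, cy, radius) of the smallest disk containing all points."""
    pts = list(points)
    random.Random(seed).shuffle(pts)  # random insertion order gives expected linear time
    disk = None
    for i, p in enumerate(pts):
        if disk is None or not _inside(disk, p):
            # p must lie on the boundary of the new disk.
            disk = (p[0], p[1], 0.0)
            for j in range(i):
                q = pts[j]
                if not _inside(disk, q):
                    # p and q must both lie on the boundary.
                    disk = _disk_two(p, q)
                    for k in range(j):
                        r = pts[k]
                        if not _inside(disk, r):
                            disk = _disk_three(p, q, r)
    return disk


# Hypothetical ground-user positions in meters.
users = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
print(smallest_enclosing_disk(users))
```

The disk radius bounds the farthest user's horizontal distance from the center, which is why this construction is a natural fit for coverage-style placement problems.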
Submitted 2 June, 2023;
originally announced June 2023.
-
Dynamic Reflections: Optimizing Energy Efficiency in Multi-IRS Empowered Green Networks
Authors:
Alireza Qazavi Khorasgani,
Foroogh S. Tabataba,
Mehdi Naderi Soorki,
Mohammad Sadegh Fazel
Abstract:
Intelligent Reflecting Surface (IRS) technology is revolutionizing wireless communications by shifting from channel adaptation to a responsive wireless environment. This paper introduces a multi-IRS assisted millimeter wave (mm-wave) system, allowing intelligent on/off control of individual IRS elements. Our objective is to optimize energy efficiency under Quality of Service (QoS) constraints. We propose an algorithm where the Access Point (AP) adjusts transmit beamforming, and IRS elements control phase shifts and on/off status until convergence. Utilizing a fractional programming (FP) approach for AP beamforming and Simulated Annealing (SA) for IRS subproblems, we achieve a suboptimal solution. A modified nested FP approach addresses the beamforming subproblem. Performance analysis in a practical scenario reveals an improvement of up to 132.16% in energy efficiency compared to scenarios with randomly selected IRS on/off status. This highlights the efficacy of our algorithm in enhancing mm-wave communication systems' overall efficiency.
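To make the Simulated Annealing step concrete, here is a generic sketch of SA over a binary on/off vector. It is not the paper's algorithm: the objective `toy_energy_efficiency`, the gain values, the power constants, and the cooling schedule are all hypothetical stand-ins chosen only so that the example runs.

```python
import math
import random


def toy_energy_efficiency(state, gains, p_fixed=1.0, p_element=0.1):
    # Hypothetical objective: rate divided by consumed power (not the paper's model).
    signal = sum(g for g, on in zip(gains, state) if on)
    rate = math.log2(1 + signal)
    power = p_fixed + p_element * sum(state)
    return rate / power


def simulated_annealing(score, n_elements, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Maximize score(state) over boolean lists of length n_elements."""
    rng = random.Random(seed)
    state = [rng.random() < 0.5 for _ in range(n_elements)]
    current = score(state)
    best_state, best = state[:], current
    temp = t0
    for _ in range(steps):
        i = rng.randrange(n_elements)
        state[i] = not state[i]  # propose flipping one element on or off
        candidate = score(state)
        delta = candidate - current
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if current > best:
                best_state, best = state[:], current
        else:
            state[i] = not state[i]  # reject: undo the flip
        temp *= cooling  # geometric cooling schedule
    return best_state, best


# Hypothetical per-element gains.
gains = [0.2, 1.5, 0.05, 0.9, 0.3, 0.01, 2.0, 0.4]
best_state, best_value = simulated_annealing(
    lambda s: toy_energy_efficiency(s, gains), len(gains)
)
print(best_state, best_value)
```

Accepting occasional worse moves at high temperature is what lets the search escape poor local optima of the on/off configuration before the schedule cools.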
Submitted 2 January, 2024; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Resource Allocation for mmWave-NOMA Communication through Multiple Access Points Considering Human Blockages
Authors:
Foad Barghikar,
Foroogh S. Tabataba,
Mehdi Naderi Soorki
Abstract:
In this paper, a new framework for optimizing the resource allocation in millimeter-wave non-orthogonal multiple access (mmWave-NOMA) communication for crowded venues is proposed. MmWave communications suffer from severe blockage caused by obstacles such as the human body, especially in a dense region. Thus, a detailed method for modeling the blockage events in in-venue scenarios is introduced. Also, several mmWave access points are considered in different locations. To maximize the network sum rate, the resource allocation problem is formulated as a mixed-integer nonlinear programming problem, which is NP-hard in general. Hence, a three-stage low-complexity solution is proposed to solve the problem. First, a user scheduling algorithm, i.e., modified worst connection swapping (MWCS), is proposed. Second, the antenna allocation problem is solved using the simulated annealing algorithm. Third, to maximize the network sum rate and guarantee the quality of service constraints, a non-convex power allocation optimization problem is solved by adopting the difference of convex programming approach. The simulation results show that, under the blockage effect, the proposed mmWave-NOMA scheme performs on average 23% better than the conventional mmWave-orthogonal multiple access scheme. Moreover, the performance of the proposed solution is 11.4% lower than the optimal value while reducing complexity by 96%.
Submitted 11 October, 2020; v1 submitted 27 May, 2020;
originally announced May 2020.