-
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment
Authors:
Tien-Hong Lo,
Meng-Ting Tsai,
Berlin Chen
Abstract:
Second language (L2) learners can improve their pronunciation by imitating golden speech, especially when the speech aligns with their respective speech characteristics. This study explores the hypothesis that learner-specific golden speech generated with zero-shot text-to-speech (ZS-TTS) techniques can be harnessed as an effective metric for measuring the pronunciation proficiency of L2 learners. Building on this exploration, the contributions of this study are at least two-fold: 1) the design and development of a systematic framework for assessing the ability of a synthesis model to generate golden speech, and 2) in-depth investigations of the effectiveness of using golden speech in automatic pronunciation assessment (APA). Comprehensive experiments conducted on the L2-ARCTIC and Speechocean762 benchmark datasets suggest that our proposed modeling can yield significant performance improvements on various assessment metrics relative to some prior approaches. To our knowledge, this study is the first to explore the role of golden speech in both ZS-TTS and APA, offering a promising regime for computer-assisted pronunciation training (CAPT).
Submitted 11 September, 2024;
originally announced September 2024.
-
Angle-of-Arrival Estimation of Narrow Gaussian Beams for Mobile FSO Platforms
Authors:
Ming-Cheng Tsai,
Muhammad Salman Bashir,
Mohamed-Slim Alouini
Abstract:
Due to the narrow beamwidths of laser Gaussian beams, accurate tracking of the laser beam's angle-of-arrival is an important problem in mobile free-space optical communications. In most optical receivers today, fine tracking of the angle-of-arrival involves estimating the location of the focused beam spot projected onto a focal plane array. However, for very thin Gaussian beams, both the location and the energy of the spot vary considerably with the angle-of-arrival. In this study, we analyze the relationship between the angle-of-arrival and the energy of the laser spot on the focal plane. We then exploit this relationship to enhance the angle-of-arrival estimation performance of our proposed receiver, which takes into account both the location and the energy of the laser spot when estimating the angle-of-arrival. The derived Cramér-Rao bounds indicate that system performance can be enhanced significantly for narrow Gaussian beams when both the spot location and energy are exploited for angle-of-arrival estimation.
Submitted 29 July, 2023;
originally announced July 2023.
-
Study on the Correlation between Objective Evaluations and Subjective Speech Quality and Intelligibility
Authors:
Hsin-Tien Chiang,
Kuo-Hsuan Hung,
Szu-Wei Fu,
Heng-Cheng Kuo,
Ming-Hsueh Tsai,
Yu Tsao
Abstract:
Subjective tests are the gold standard for evaluating speech quality and intelligibility; however, they are time-consuming and expensive. Thus, objective measures that align with human perception are crucial. This study evaluates the correlation between commonly used objective measures and subjective speech quality and intelligibility using a Chinese speech dataset. Moreover, new objective measures are proposed that combine current objective measures using deep learning techniques to predict subjective quality and intelligibility. The proposed deep learning model reduces the amount of training data required without significantly affecting prediction performance. We analyzed the deep learning model to understand how objective measures reflect subjective quality and intelligibility. We also explored the impact of including subjective speech quality ratings on speech intelligibility prediction. Our findings offer valuable insights into the relationship between objective measures and human perception.
Submitted 10 October, 2023; v1 submitted 10 July, 2023;
originally announced July 2023.