Search | arXiv e-print repository

Pairwise Evaluation of Accent Similarity in Speech Synthesis

Authors: Jinzuomu Zhong, Suyuan Liu, Dan Wells, Korin Richmond

Abstract: Despite growing interest in generating high-fidelity accents, evaluating accent similarity in speech synthesis has been underexplored. We aim to enhance both subjective and objective evaluation methods for accent similarity. Subjectively, we refine the XAB listening test by adding components that achieve higher statistical significance with fewer listeners and lower costs. Our method involves prov… ▽ More Despite growing interest in generating high-fidelity accents, evaluating accent similarity in speech synthesis has been underexplored. We aim to enhance both subjective and objective evaluation methods for accent similarity. Subjectively, we refine the XAB listening test by adding components that achieve higher statistical significance with fewer listeners and lower costs. Our method involves providing listeners with transcriptions, having them highlight perceived accent differences, and implementing meticulous screening for reliability. Objectively, we utilise pronunciation-related metrics, based on distances between vowel formants and phonetic posteriorgrams, to evaluate accent generation. Comparative experiments reveal that these metrics, alongside accent similarity, speaker similarity, and Mel Cepstral Distortion, can be used. Moreover, our findings underscore significant limitations of common metrics like Word Error Rate in assessing underrepresented accents. △ Less

Submitted 20 May, 2025; originally announced May 2025.

Comments: Accepted by INTERSPEECH 2025

arXiv:2504.19818 [pdf, other]

PhenoAssistant: A Conversational Multi-Agent AI System for Automated Plant Phenotyping

Authors: Feng Chen, Ilias Stogiannidis, Andrew Wood, Danilo Bueno, Dominic Williams, Fraser Macfarlane, Bruce Grieve, Darren Wells, Jonathan A. Atkinson, Malcolm J. Hawkesford, Stephen A. Rolfe, Tracy Lawson, Tony Pridmore, Mario Valerio Giuffrida, Sotirios A. Tsaftaris

Abstract: Plant phenotyping increasingly relies on (semi-)automated image-based analysis workflows to improve its accuracy and scalability. However, many existing solutions remain overly complex, difficult to reimplement and maintain, and pose high barriers for users without substantial computational expertise. To address these challenges, we introduce PhenoAssistant: a pioneering AI-driven system that stre… ▽ More Plant phenotyping increasingly relies on (semi-)automated image-based analysis workflows to improve its accuracy and scalability. However, many existing solutions remain overly complex, difficult to reimplement and maintain, and pose high barriers for users without substantial computational expertise. To address these challenges, we introduce PhenoAssistant: a pioneering AI-driven system that streamlines plant phenotyping via intuitive natural language interaction. PhenoAssistant leverages a large language model to orchestrate a curated toolkit supporting tasks including automated phenotype extraction, data visualisation and automated model training. We validate PhenoAssistant through several representative case studies and a set of evaluation tasks. By significantly lowering technical hurdles, PhenoAssistant underscores the promise of AI-driven methodologies to democratising AI adoption in plant biology. △ Less

Submitted 28 April, 2025; originally announced April 2025.

arXiv:2502.04635 [pdf, other]

Exercise Specialists Evaluation of Robot-led Physical Therapy for People with Parkinsons Disease

Authors: Matthew Lamsey, Meredith D. Wells, Lydia Hamby, Paige Scanlon, Rouida Siddiqui, You Liang Tan, Jerry Feldman, Charles C. Kemp, Madeleine E. Hackney

Abstract: Robot-led physical therapy (PT) offers a promising avenue to enhance the care provided by clinical exercise specialists (ES) and physical and occupational therapists to improve patients' adherence to prescribed exercises outside of a clinic, such as at home. Collaborative efforts among roboticists, ES, physical and occupational therapists, and patients are essential for developing interactive, per… ▽ More Robot-led physical therapy (PT) offers a promising avenue to enhance the care provided by clinical exercise specialists (ES) and physical and occupational therapists to improve patients' adherence to prescribed exercises outside of a clinic, such as at home. Collaborative efforts among roboticists, ES, physical and occupational therapists, and patients are essential for developing interactive, personalized exercise systems that meet each stakeholder's needs. We conducted a user study in which 11 ES evaluated a novel robot-led PT system for people with Parkinson's disease (PD), introduced in [1], focusing on the system's perceived efficacy and acceptance. Utilizing a mixed-methods approach, including technology acceptance questionnaires, task load questionnaires, and semi-structured interviews, we gathered comprehensive insights into ES perspectives and experiences after interacting with the system. Findings reveal a broadly positive reception, which highlights the system's capacity to augment traditional PT for PD, enhance patient engagement, and ensure consistent exercise support. We also identified two key areas for improvement: incorporating more human-like feedback systems and increasing the robot's ease of use. This research emphasizes the value of incorporating robotic aids into PT for PD, offering insights that can guide the development of more effective and user-friendly rehabilitation technologies. △ Less

Submitted 6 February, 2025; originally announced February 2025.

Comments: 11 pages, 4 figures

arXiv:2406.08911 [pdf, other]

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

Authors: Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi

Abstract: Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on… ▽ More Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on 12 languages using limited data with various fine-tuning configurations. We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance. Additionally, we find that the fine-tuning dataset size and number of speakers influence adaptability. Surprisingly, we also observed that using paired data for fine-tuning is not always optimal compared to audio-only data. Beyond speech intelligibility, our analysis covers speaker similarity, language identification, and predicted MOS. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2312.14398 [pdf, other]

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Authors: Cheng Gong, Xin Wang, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang, Korin Richmond, Junichi Yamagishi

Abstract: Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. TTS systems are typically built using a single speaker's voices, but there is growing interest in developing systems that can synthesize voices for new… ▽ More Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. TTS systems are typically built using a single speaker's voices, but there is growing interest in developing systems that can synthesize voices for new speakers using only a few seconds of their speech. This paper presents ZMM-TTS, a multilingual and multispeaker framework utilizing quantized latent speech representations from a large-scale, pre-trained, self-supervised model. Our paper combines text-based and speech-based self-supervised learning models for multilingual speech synthesis. Our proposed model has zero-shot generalization ability not only for unseen speakers but also for unseen languages. We have conducted comprehensive subjective and objective evaluations through a series of experiments. Our model has proven effective in terms of speech naturalness and similarity for both seen and unseen speakers in six high-resource languages. We also tested the efficiency of our method on two hypothetically low-resource languages. The results are promising, indicating that our proposed approach can synthesize audio that is intelligible and has a high degree of similarity to the target speaker's voice, even without any training data for the new, unseen language. △ Less

Submitted 26 August, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted by IEEE/ACM TASLP, 16 pages plus 1 page of bio and photos

arXiv:2312.13279 [pdf, other]

Stretch with Stretch: Physical Therapy Exercise Games Led by a Mobile Manipulator

Authors: Matthew Lamsey, You Liang Tan, Meredith D. Wells, Madeline Beatty, Zexuan Liu, Arjun Majumdar, Kendra Washington, Jerry Feldman, Naveen Kuppuswamy, Elizabeth Nguyen, Arielle Wallenstein, Madeleine E. Hackney, Charles C. Kemp

Abstract: Physical therapy (PT) is a key component of many rehabilitation regimens, such as treatments for Parkinson's disease (PD). However, there are shortages of physical therapists and adherence to self-guided PT is low. Robots have the potential to support physical therapists and increase adherence to self-guided PT, but prior robotic systems have been large and immobile, which can be a barrier to use… ▽ More Physical therapy (PT) is a key component of many rehabilitation regimens, such as treatments for Parkinson's disease (PD). However, there are shortages of physical therapists and adherence to self-guided PT is low. Robots have the potential to support physical therapists and increase adherence to self-guided PT, but prior robotic systems have been large and immobile, which can be a barrier to use in homes and clinics. We present Stretch with Stretch (SWS), a novel robotic system for leading stretching exercise games for older adults with PD. SWS consists of a compact and lightweight mobile manipulator (Hello Robot Stretch RE1) that visually and verbally guides users through PT exercises. The robot's soft end effector serves as a target that users repetitively reach towards and press with a hand, foot, or knee. For each exercise, target locations are customized for the individual via a visually estimated kinematic model, a haptically estimated range of motion, and the person's exercise performance. The system includes sound effects and verbal feedback from the robot to keep users engaged throughout a session and augment physical exercise with cognitive exercise. We conducted a user study for which people with PD (n=10) performed 6 exercises with the system. Participants perceived the SWS to be useful and easy to use. They also reported mild to moderate perceived exertion (RPE). △ Less

Submitted 21 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2111.14788 [pdf, other]

doi 10.21468/SciPostPhys.12.6.187

Comparing Machine Learning and Interpolation Methods for Loop-Level Calculations

Authors: Ibrahim Chahrour, James D. Wells

Abstract: The need to approximate functions is ubiquitous in science, either due to empirical constraints or high computational cost of accessing the function. In high-energy physics, the precise computation of the scattering cross-section of a process requires the evaluation of computationally intensive integrals. A wide variety of methods in machine learning have been used to tackle this problem, but ofte… ▽ More The need to approximate functions is ubiquitous in science, either due to empirical constraints or high computational cost of accessing the function. In high-energy physics, the precise computation of the scattering cross-section of a process requires the evaluation of computationally intensive integrals. A wide variety of methods in machine learning have been used to tackle this problem, but often the motivation of using one method over another is lacking. Comparing these methods is typically highly dependent on the problem at hand, so we specify to the case where we can evaluate the function a large number of times, after which quick and accurate evaluation can take place. We consider four interpolation and three machine learning techniques and compare their performance on three toy functions, the four-point scalar Passarino-Veltman $D_0$ function, and the two-loop self-energy master integral $M$. We find that in low dimensions ($d = 3$), traditional interpolation techniques like the Radial Basis Function perform very well, but in higher dimensions ($d=5, 6, 9$) we find that multi-layer perceptrons (a.k.a neural networks) do not suffer as much from the curse of dimensionality and provide the fastest and most accurate predictions. △ Less

Submitted 28 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: 30 pages, 17 figures, v2:added a few references, v3: new title, added a few references

Journal ref: SciPost Phys. 12, 187 (2022)

arXiv:1910.13247 [pdf, other]

doi 10.1016/j.camwa.2020.02.022

The deal.II finite element library: design, features, and insights

Authors: Daniel Arndt, Wolfgang Bangerth, Denis Davydov, Timo Heister, Luca Heltai, Martin Kronbichler, Matthias Maier, Jean-Paul Pelteret, Bruno Turcksin, David Wells

Abstract: deal.II is a state-of-the-art finite element library focused on generality, dimension-independent programming, parallelism, and extensibility. Herein, we outline its primary design considerations and its sophisticated features such as distributed meshes, $hp$-adaptivity, support for complex geometries, and matrix-free algorithms. But deal.II is more than just a software library: It is also a diver… ▽ More deal.II is a state-of-the-art finite element library focused on generality, dimension-independent programming, parallelism, and extensibility. Herein, we outline its primary design considerations and its sophisticated features such as distributed meshes, $hp$-adaptivity, support for complex geometries, and matrix-free algorithms. But deal.II is more than just a software library: It is also a diverse and worldwide community of developers and users, as well as an educational platform. We therefore also discuss some of the technical and social challenges and lessons learned in running a large community software project over the course of two decades. △ Less

Submitted 17 February, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

Comments: 36 pages, 3 figures

Journal ref: Computers {\&} Mathematics with Applications, 81: 407--422, 2021

Showing 1–8 of 8 results for author: Wells, D