-
Pairwise Evaluation of Accent Similarity in Speech Synthesis
Authors:
Jinzuomu Zhong,
Suyuan Liu,
Dan Wells,
Korin Richmond
Abstract:
Despite growing interest in generating high-fidelity accents, evaluating accent similarity in speech synthesis has been underexplored. We aim to enhance both subjective and objective evaluation methods for accent similarity. Subjectively, we refine the XAB listening test by adding components that achieve higher statistical significance with fewer listeners and lower costs. Our method involves prov…
▽ More
Despite growing interest in generating high-fidelity accents, evaluating accent similarity in speech synthesis has been underexplored. We aim to enhance both subjective and objective evaluation methods for accent similarity. Subjectively, we refine the XAB listening test by adding components that achieve higher statistical significance with fewer listeners and lower costs. Our method involves providing listeners with transcriptions, having them highlight perceived accent differences, and implementing meticulous screening for reliability. Objectively, we utilise pronunciation-related metrics, based on distances between vowel formants and phonetic posteriorgrams, to evaluate accent generation. Comparative experiments reveal that these metrics, alongside accent similarity, speaker similarity, and Mel Cepstral Distortion, can be used. Moreover, our findings underscore significant limitations of common metrics like Word Error Rate in assessing underrepresented accents.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
PhenoAssistant: A Conversational Multi-Agent AI System for Automated Plant Phenotyping
Authors:
Feng Chen,
Ilias Stogiannidis,
Andrew Wood,
Danilo Bueno,
Dominic Williams,
Fraser Macfarlane,
Bruce Grieve,
Darren Wells,
Jonathan A. Atkinson,
Malcolm J. Hawkesford,
Stephen A. Rolfe,
Tracy Lawson,
Tony Pridmore,
Mario Valerio Giuffrida,
Sotirios A. Tsaftaris
Abstract:
Plant phenotyping increasingly relies on (semi-)automated image-based analysis workflows to improve its accuracy and scalability. However, many existing solutions remain overly complex, difficult to reimplement and maintain, and pose high barriers for users without substantial computational expertise. To address these challenges, we introduce PhenoAssistant: a pioneering AI-driven system that stre…
▽ More
Plant phenotyping increasingly relies on (semi-)automated image-based analysis workflows to improve its accuracy and scalability. However, many existing solutions remain overly complex, difficult to reimplement and maintain, and pose high barriers for users without substantial computational expertise. To address these challenges, we introduce PhenoAssistant: a pioneering AI-driven system that streamlines plant phenotyping via intuitive natural language interaction. PhenoAssistant leverages a large language model to orchestrate a curated toolkit supporting tasks including automated phenotype extraction, data visualisation and automated model training. We validate PhenoAssistant through several representative case studies and a set of evaluation tasks. By significantly lowering technical hurdles, PhenoAssistant underscores the promise of AI-driven methodologies to democratising AI adoption in plant biology.
△ Less
Submitted 28 April, 2025;
originally announced April 2025.
-
Exercise Specialists Evaluation of Robot-led Physical Therapy for People with Parkinsons Disease
Authors:
Matthew Lamsey,
Meredith D. Wells,
Lydia Hamby,
Paige Scanlon,
Rouida Siddiqui,
You Liang Tan,
Jerry Feldman,
Charles C. Kemp,
Madeleine E. Hackney
Abstract:
Robot-led physical therapy (PT) offers a promising avenue to enhance the care provided by clinical exercise specialists (ES) and physical and occupational therapists to improve patients' adherence to prescribed exercises outside of a clinic, such as at home. Collaborative efforts among roboticists, ES, physical and occupational therapists, and patients are essential for developing interactive, per…
▽ More
Robot-led physical therapy (PT) offers a promising avenue to enhance the care provided by clinical exercise specialists (ES) and physical and occupational therapists to improve patients' adherence to prescribed exercises outside of a clinic, such as at home. Collaborative efforts among roboticists, ES, physical and occupational therapists, and patients are essential for developing interactive, personalized exercise systems that meet each stakeholder's needs. We conducted a user study in which 11 ES evaluated a novel robot-led PT system for people with Parkinson's disease (PD), introduced in [1], focusing on the system's perceived efficacy and acceptance. Utilizing a mixed-methods approach, including technology acceptance questionnaires, task load questionnaires, and semi-structured interviews, we gathered comprehensive insights into ES perspectives and experiences after interacting with the system. Findings reveal a broadly positive reception, which highlights the system's capacity to augment traditional PT for PD, enhance patient engagement, and ensure consistent exercise support. We also identified two key areas for improvement: incorporating more human-like feedback systems and increasing the robot's ease of use. This research emphasizes the value of incorporating robotic aids into PT for PD, offering insights that can guide the development of more effective and user-friendly rehabilitation technologies.
△ Less
Submitted 6 February, 2025;
originally announced February 2025.
-
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Authors:
Cheng Gong,
Erica Cooper,
Xin Wang,
Chunyu Qiang,
Mengzhe Geng,
Dan Wells,
Longbiao Wang,
Jianwu Dang,
Marc Tessier,
Aidan Pine,
Korin Richmond,
Junichi Yamagishi
Abstract:
Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on…
▽ More
Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on 12 languages using limited data with various fine-tuning configurations. We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance. Additionally, we find that the fine-tuning dataset size and number of speakers influence adaptability. Surprisingly, we also observed that using paired data for fine-tuning is not always optimal compared to audio-only data. Beyond speech intelligibility, our analysis covers speaker similarity, language identification, and predicted MOS.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Authors:
Cheng Gong,
Xin Wang,
Erica Cooper,
Dan Wells,
Longbiao Wang,
Jianwu Dang,
Korin Richmond,
Junichi Yamagishi
Abstract:
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. TTS systems are typically built using a single speaker's voices, but there is growing interest in developing systems that can synthesize voices for new…
▽ More
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. TTS systems are typically built using a single speaker's voices, but there is growing interest in developing systems that can synthesize voices for new speakers using only a few seconds of their speech. This paper presents ZMM-TTS, a multilingual and multispeaker framework utilizing quantized latent speech representations from a large-scale, pre-trained, self-supervised model. Our paper combines text-based and speech-based self-supervised learning models for multilingual speech synthesis. Our proposed model has zero-shot generalization ability not only for unseen speakers but also for unseen languages. We have conducted comprehensive subjective and objective evaluations through a series of experiments. Our model has proven effective in terms of speech naturalness and similarity for both seen and unseen speakers in six high-resource languages. We also tested the efficiency of our method on two hypothetically low-resource languages. The results are promising, indicating that our proposed approach can synthesize audio that is intelligible and has a high degree of similarity to the target speaker's voice, even without any training data for the new, unseen language.
△ Less
Submitted 26 August, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Stretch with Stretch: Physical Therapy Exercise Games Led by a Mobile Manipulator
Authors:
Matthew Lamsey,
You Liang Tan,
Meredith D. Wells,
Madeline Beatty,
Zexuan Liu,
Arjun Majumdar,
Kendra Washington,
Jerry Feldman,
Naveen Kuppuswamy,
Elizabeth Nguyen,
Arielle Wallenstein,
Madeleine E. Hackney,
Charles C. Kemp
Abstract:
Physical therapy (PT) is a key component of many rehabilitation regimens, such as treatments for Parkinson's disease (PD). However, there are shortages of physical therapists and adherence to self-guided PT is low. Robots have the potential to support physical therapists and increase adherence to self-guided PT, but prior robotic systems have been large and immobile, which can be a barrier to use…
▽ More
Physical therapy (PT) is a key component of many rehabilitation regimens, such as treatments for Parkinson's disease (PD). However, there are shortages of physical therapists and adherence to self-guided PT is low. Robots have the potential to support physical therapists and increase adherence to self-guided PT, but prior robotic systems have been large and immobile, which can be a barrier to use in homes and clinics. We present Stretch with Stretch (SWS), a novel robotic system for leading stretching exercise games for older adults with PD. SWS consists of a compact and lightweight mobile manipulator (Hello Robot Stretch RE1) that visually and verbally guides users through PT exercises. The robot's soft end effector serves as a target that users repetitively reach towards and press with a hand, foot, or knee. For each exercise, target locations are customized for the individual via a visually estimated kinematic model, a haptically estimated range of motion, and the person's exercise performance. The system includes sound effects and verbal feedback from the robot to keep users engaged throughout a session and augment physical exercise with cognitive exercise. We conducted a user study for which people with PD (n=10) performed 6 exercises with the system. Participants perceived the SWS to be useful and easy to use. They also reported mild to moderate perceived exertion (RPE).
△ Less
Submitted 21 December, 2023; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Comparing Machine Learning and Interpolation Methods for Loop-Level Calculations
Authors:
Ibrahim Chahrour,
James D. Wells
Abstract:
The need to approximate functions is ubiquitous in science, either due to empirical constraints or high computational cost of accessing the function. In high-energy physics, the precise computation of the scattering cross-section of a process requires the evaluation of computationally intensive integrals. A wide variety of methods in machine learning have been used to tackle this problem, but ofte…
▽ More
The need to approximate functions is ubiquitous in science, either due to empirical constraints or high computational cost of accessing the function. In high-energy physics, the precise computation of the scattering cross-section of a process requires the evaluation of computationally intensive integrals. A wide variety of methods in machine learning have been used to tackle this problem, but often the motivation of using one method over another is lacking. Comparing these methods is typically highly dependent on the problem at hand, so we specify to the case where we can evaluate the function a large number of times, after which quick and accurate evaluation can take place. We consider four interpolation and three machine learning techniques and compare their performance on three toy functions, the four-point scalar Passarino-Veltman $D_0$ function, and the two-loop self-energy master integral $M$. We find that in low dimensions ($d = 3$), traditional interpolation techniques like the Radial Basis Function perform very well, but in higher dimensions ($d=5, 6, 9$) we find that multi-layer perceptrons (a.k.a neural networks) do not suffer as much from the curse of dimensionality and provide the fastest and most accurate predictions.
△ Less
Submitted 28 March, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
The deal.II finite element library: design, features, and insights
Authors:
Daniel Arndt,
Wolfgang Bangerth,
Denis Davydov,
Timo Heister,
Luca Heltai,
Martin Kronbichler,
Matthias Maier,
Jean-Paul Pelteret,
Bruno Turcksin,
David Wells
Abstract:
deal.II is a state-of-the-art finite element library focused on generality, dimension-independent programming, parallelism, and extensibility. Herein, we outline its primary design considerations and its sophisticated features such as distributed meshes, $hp$-adaptivity, support for complex geometries, and matrix-free algorithms. But deal.II is more than just a software library: It is also a diver…
▽ More
deal.II is a state-of-the-art finite element library focused on generality, dimension-independent programming, parallelism, and extensibility. Herein, we outline its primary design considerations and its sophisticated features such as distributed meshes, $hp$-adaptivity, support for complex geometries, and matrix-free algorithms. But deal.II is more than just a software library: It is also a diverse and worldwide community of developers and users, as well as an educational platform. We therefore also discuss some of the technical and social challenges and lessons learned in running a large community software project over the course of two decades.
△ Less
Submitted 17 February, 2020; v1 submitted 24 October, 2019;
originally announced October 2019.