Expanding and Analyzing ODAQ -- the Open Dataset of Audio Quality

Authors: Sascha Dick, Christoph Thompson, Chih-Wei Wu, Matteo Torcoli, Pablo Delgado, Phillip A. Williams, Emanuel Habets

Abstract: The Open Dataset of Audio Quality (ODAQ) was recently introduced to address the scarcity of openly available audio datasets with corresponding subjective quality scores. The dataset, released under permissive licenses, comprises audio material processed using six different signal processing methods operating at five quality levels, along with corresponding subjective test results. To expand the dataset, we provided listener training to university students to conduct further subjective tests and obtained results consistent with previous expert listeners. We also showed how different training approaches affect the use of absolute scales and anchors. The expanded dataset now comprises results from three international laboratories providing a total of 42 listeners and 10080 subjective scores. This paper provides the details of the expansion and an in-depth analysis. As part of this analysis, we initiate the use of ODAQ as a benchmark to evaluate objective audio quality metrics in their ability to predict subjective scores.

Submitted 1 April, 2025; originally announced April 2025.

Comments: Accepted for presentation at the Audio Engineering Society (AES) 157th Convention, October 2024, New York, USA

arXiv:2503.24063 [pdf, other]

A robot-assisted pipeline to rapidly scan 1.7 million historical aerial photographs

Authors: Sheila Masson, Alan Potts, Allan Williams, Steve Berggreen, Kevin McLaren, Sam Martin, Eugenio Noda, Nicklas Nordfors, Nic Ruecroft, Hannah Druckenmiller, Solomon Hsiang, Andreas Madestam, Anna Tompsett

Abstract: During the 20th Century, aerial surveys captured hundreds of millions of high-resolution photographs of the earth's surface. These images, the precursors to modern satellite imagery, represent an extraordinary visual record of the environmental and social upheavals of the 20th Century. However, most of these images currently languish in physical archives where retrieval is difficult and costly. Digitization could revolutionize access, but manual scanning is slow and expensive. Here, we describe and validate a novel robot-assisted pipeline that increases worker productivity in scanning 30-fold, applied at scale to digitize an archive of 1.7 million historical aerial photographs from 65 countries.

Submitted 8 April, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

arXiv:2503.14304 [pdf, other]

RoMedFormer: A Rotary-Embedding Transformer Foundation Model for 3D Genito-Pelvic Structure Segmentation in MRI and CT

Authors: Yuheng Li, Mingzhe Hu, Richard L. J. Qiu, Maria Thor, Andre Williams, Deborah Marshall, Xiaofeng Yang

Abstract: Deep learning-based segmentation of genito-pelvic structures in MRI and CT is crucial for applications such as radiation therapy, surgical planning, and disease diagnosis. However, existing segmentation models often struggle with generalizability across imaging modalities and anatomical variations. In this work, we propose RoMedFormer, a rotary-embedding transformer-based foundation model designed for 3D female genito-pelvic structure segmentation in both MRI and CT. RoMedFormer leverages self-supervised learning and rotary positional embeddings to enhance spatial feature representation and capture long-range dependencies in 3D medical data. We pre-train our model using a diverse dataset of 3D MRI and CT scans and fine-tune it for downstream segmentation tasks. Experimental results demonstrate that RoMedFormer achieves superior performance segmenting genito-pelvic organs. Our findings highlight the potential of transformer-based architectures in medical image segmentation and pave the way for more transferable segmentation frameworks.

Submitted 18 March, 2025; originally announced March 2025.

arXiv:2412.08983 [pdf, other]

An Event-Triggered Framework for Trust-Mediated Human-Autonomy Interaction

Authors: Daniel A. Williams, Airlie Chapman, Chris Manzie

Abstract: Inspired by the increased cooperation between humans and autonomous systems, we present a new hybrid systems framework capturing the interconnected dynamics underlying these interactions. The framework accommodates models arising from both the autonomous systems and cognitive psychology literature in order to represent key elements such as human trust in the autonomous system. The intermittent nature of human interactions is incorporated by asynchronous event-triggered sampling at the framework's human-autonomous system interfaces. We illustrate important considerations for tuning framework parameters by investigating a practical application to an autonomous robotic swarm search and rescue scenario. In this way, we demonstrate how the proposed framework may assist in designing more efficient and effective interactions between humans and autonomous systems.

Submitted 12 December, 2024; originally announced December 2024.

arXiv:2411.08135 [pdf, other]

On the Role of Speech Data in Reducing Toxicity Detection Bias

Authors: Samuel J. Bell, Mariano Coria Meglioli, Megan Richards, Eduardo Sánchez, Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà

Abstract: Text toxicity detection systems exhibit significant biases, producing disproportionate rates of false positives on samples mentioning demographic groups. But what about toxicity detection in speech? To investigate the extent to which text-based biases are mitigated by speech-based systems, we produce a set of high-quality group annotations for the multilingual MuTox dataset, and then leverage these annotations to systematically compare speech- and text-based toxicity classifiers. Our findings indicate that access to speech data during inference supports reduced bias against group mentions, particularly for ambiguous and disagreement-inducing samples. Our results also suggest that improving classifiers, rather than transcription pipelines, is more helpful for reducing group bias. We publicly release our annotations and provide recommendations for future toxicity dataset construction.

Submitted 16 May, 2025; v1 submitted 12 November, 2024; originally announced November 2024.

Comments: Accepted at NAACL 2025

Journal ref: In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (Volume 1), pages 1454-1468

arXiv:2411.00697 [pdf]

All-Optical Excitable Spiking Laser Neuron in InP Generic Integration Technology

Authors: Lukas Puts, Daan Lenstra, Kevin A. Williams, Weiming Yao

Abstract: Brain-inspired, neuromorphic devices implemented in integrated photonic hardware have attracted significant interest recently as part of efforts towards novel non-von Neumann computing paradigms that make use of the low loss, high-speed and parallel operations in optics. An all-optical spiking laser neuron fabricated on the indium-phosphide generic integration technology platform may be a practical alternative to other semi-integrated photonic and electronic-based spiking neuron implementations. Owing to the large number of predefined building blocks, a plethora of applications have benefitted already from the generic integration process. This technology platform has now been utilised for the first time to demonstrate an all-optical spiking laser neuron. This paper presents and discusses the design and measurement of the ultra-fast and rich spiking dynamics in these devices. We show that under external pulse injection and operated slightly below the lasing threshold, the laser neuron exhibits an excitable mode, in addition to a self-spiking mode far above the threshold when no pulse is injected. In the excitable mode, the required injected pulse energy is much lower than that of the generated excited response, meeting an important requirement for neuron cascadability. In addition, we investigate excitability at different injection wavelengths below the lasing wavelength, as well as the ultra-fast temporal properties of the spiking response. All of the discussed characteristics point to the laser neuron being an important candidate for scaling up to future fully-connected, multi-wavelength all-optical photonic spiking neural networks in indium-phosphide generic integration technology.

Submitted 1 November, 2024; originally announced November 2024.

Comments: 21 pages, 13 figures

arXiv:2401.00197 [pdf, other]

ODAQ: Open Dataset of Audio Quality

Authors: Matteo Torcoli, Chih-Wei Wu, Sascha Dick, Phillip A. Williams, Mhd Modar Halimeh, William Wolcott, Emanuel A. P. Habets

Abstract: Research into the prediction and analysis of perceived audio quality is hampered by the scarcity of openly available datasets of audio signals accompanied by corresponding subjective quality scores. To address this problem, we present the Open Dataset of Audio Quality (ODAQ), a new dataset containing the results of a MUSHRA listening test conducted with expert listeners from 2 international laboratories. ODAQ contains 240 audio samples and corresponding quality scores. Each audio sample is rated by 26 listeners. The audio samples are stereo audio signals sampled at 44.1 or 48 kHz and are processed by a total of 6 method classes, each operating at different quality levels. The processing method classes are designed to generate quality degradations possibly encountered during audio coding and source separation, and the quality levels for each method class span the entire quality range. The diversity of the processing methods, the large span of quality levels, the high sampling frequency, and the pool of international listeners make ODAQ particularly suited for further research into subjective and objective audio quality. The dataset is released with permissive licenses, and the software used to conduct the listening test is also made publicly available.

Submitted 30 December, 2023; originally announced January 2024.

Comments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024

arXiv:2312.14069 [pdf, other]

EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models

Authors: Maureen de Seyssel, Antony D'Avirro, Adina Williams, Emmanuel Dupoux

Abstract: We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply this to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the ability of the model to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a change of speaker and language. As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.

Submitted 14 October, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted at EMNLP 2024 (Main)

arXiv:2309.02539 [pdf, other]

doi 10.1109/OJSP.2023.3339428

A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

Authors: Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, William Wolcott

Abstract: Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psychoacoustically motivated frequency scales were used to inform the band definitions which are now defined with redundancy for more reliable feature extraction. A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed. We additionally exploit the information-sharing property of a common-encoder setup to reduce computational complexity during both training and inference, improve separation performance for hard-to-generalize classes of sounds, and allow flexibility during inference time with detachable decoders. Our best model sets the state of the art on the Divide and Remaster dataset with performance above the ideal ratio mask for the dialogue stem.

Submitted 1 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted to the IEEE Open Journal of Signal Processing (ICASSP 2024 Track)

Journal ref: IEEE Open Journal of Signal Processing, vol. 5, pp. 73-81, 2024

arXiv:2211.13846 [pdf, other]

Asynchronous Event-Triggered Control for Non-Linear Systems

Authors: Daniel A. Williams, Airlie Chapman, Chris Manzie

Abstract: With the increasing ubiquity of networked control systems, various strategies for sampling constituent subsystems' outputs have emerged. In contrast with periodic sampling, event-triggered control provides a way to efficiently sample a subsystem and conserve network resource usage, by triggering an update only when a state-dependent error threshold is satisfied. Herein we describe a novel scheme for asynchronous event-triggered measurement and control (ETC) of a nonlinear plant using sampler subsystems with hybrid dynamics. We extend existing ETC literature by adopting a more general representation of the sampler subsystem dynamics that do not require trigger periodicity or simultaneity, thus accommodating different sampling schemes for both synchronous and asynchronous ETC applications. We ensure that the plant and controller trigger rules are not susceptible to Zeno behavior by employing auxiliary timer variables in conjunction with state-dependent error thresholds. We conclude with a numerical example in order to illustrate important practical considerations when applying such schemes.

Submitted 5 April, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

arXiv:2203.14437 [pdf, other]

Individual and Team Trust Preferences for Robotic Swarm Behaviors

Authors: Elena M Vella, Daniel A Williams, Airlie Chapman, Chris Manzie

Abstract: Trust between humans and multi-agent robotic swarms may be analyzed using human preferences. These preferences are expressed by an individual as a sequence of ordered comparisons between pairs of swarm behaviors. An individual's preference graph can be formed from this sequence. In addition, swarm behaviors may be mapped to a feature vector space. We formulate a linear optimization problem to locate a trusted behavior in the feature space. Extending to human teams, we define a novel distinctiveness metric using a sparse optimization formulation to cluster similar individuals from a collection of individuals' labeled pairwise preferences. The case of anonymized unlabeled pairwise preferences is also examined to find the average trusted behavior and minimum covariance bound, providing insights into group cohesion. A user study was conducted, with results suggesting that individuals with similar trust profiles can be clustered to facilitate human-swarm teaming.

Submitted 27 March, 2022; originally announced March 2022.

arXiv:2110.00044 [pdf, other]

doi 10.1109/TAES.2022.3218496

Trajectory Planning with Deep Reinforcement Learning in High-Level Action Spaces

Authors: Kyle R. Williams, Rachel Schlossman, Daniel Whitten, Joe Ingram, Srideep Musuvathy, Anirudh Patel, James Pagan, Kyle A. Williams, Sam Green, Anirban Mazumdar, Julie Parish

Abstract: This paper presents a technique for trajectory planning based on continuously parameterized high-level actions (motion primitives) of variable duration. This technique leverages deep reinforcement learning (Deep RL) to formulate a policy which is suitable for real-time implementation. There is no separation of motion primitive generation and trajectory planning: each individual short-horizon motion is formed during the Deep RL training to achieve the full-horizon objective. Effectiveness of the technique is demonstrated numerically on a well-studied trajectory generation problem and a planning problem on a known obstacle-rich map. This paper also develops a new loss function term for policy-gradient-based Deep RL, which is analogous to an anti-windup mechanism in feedback control. We demonstrate the inclusion of this new term in the underlying optimization increases the average policy return in our numerical example.

Submitted 12 August, 2022; v1 submitted 30 September, 2021; originally announced October 2021.

Journal ref: IEEE Transactions on Aerospace and Electronic Systems, 59 (2023) 2513-2529

arXiv:2001.07739 [pdf, ps, other]

EMOPAIN Challenge 2020: Multimodal Pain Evaluation from Facial and Bodily Expressions

Authors: Joy O. Egede, Siyang Song, Temitayo A. Olugbade, Chongyang Wang, Amanda Williams, Hongying Meng, Min Aung, Nicholas D. Lane, Michel Valstar, Nadia Bianchi-Berthouze

Abstract: The EmoPain 2020 Challenge is the first international competition aimed at creating a uniform platform for the comparison of machine learning and multimedia processing methods of automatic chronic pain assessment from human expressive behaviour, and also the identification of pain-related behaviours. The objective of the challenge is to promote research in the development of assistive technologies that help improve the quality of life for people with chronic pain via real-time monitoring and feedback to help manage their condition and remain physically active. The challenge also aims to encourage the use of the relatively underutilised, albeit vital bodily expression signals for automatic pain and pain-related emotion recognition. This paper presents a description of the challenge, competition guidelines, bench-marking dataset, and the baseline systems' architecture and performance on the three sub-tasks: pain estimation from facial expressions, pain recognition from multimodal movement, and protective movement behaviour detection.

Submitted 9 March, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

Comments: 8 pages

Showing 1–13 of 13 results for author: Williams, A