-
Designing Neural Synthesizers for Low-Latency Interaction
Authors:
Franco Caspe,
Jordie Shier,
Mark Sandler,
Charalampos Saitis,
Andrew McPherson
Abstract:
Neural Audio Synthesis (NAS) models offer interactive musical control over high-quality, expressive audio generators. While these models can operate in real-time, they often suffer from high latency, making them unsuitable for intimate musical interaction. The impact of architectural choices in deep learning models on audio latency remains largely unexplored in the NAS literature. In this work, we investigate the sources of latency and jitter typically found in interactive NAS models. We then apply this analysis to the task of timbre transfer using RAVE, a convolutional variational autoencoder for audio waveforms introduced by Caillon et al. in 2021. Finally, we present an iterative design approach for optimizing latency. This culminates with a model we call BRAVE (Bravely Realtime Audio Variational autoEncoder), which is low-latency and exhibits better pitch and loudness replication while showing timbre modification capabilities similar to RAVE. We implement it in a specialized inference framework for low-latency, real-time inference and present a proof-of-concept audio plugin compatible with audio signals from musical instruments. We expect the challenges and guidelines described in this document to support NAS researchers in designing models for low-latency inference from the ground up, enriching the landscape of possibilities for musicians.
Submitted 11 April, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Robotically adjustable kinematics in a wrist-driven orthosis eases grasping across tasks
Authors:
Erin Y. Chang,
Andrew I. W. McPherson,
Hannah S. Stuart
Abstract:
Without finger function, people with C5-7 spinal cord injury (SCI) regularly utilize wrist extension to passively close the fingers and thumb together for grasping. Wearable assistive grasping devices often focus on this familiar wrist-driven technique to provide additional support and amplify grasp force. Despite recent research advances in modernizing these tools, people with SCI often abandon such wearable assistive devices in the long term. We suspect that the wrist constraints imposed by such devices generate undesirable reach and grasp kinematics. Here we show that using continuous robotic motor assistance to give users more adaptability in their wrist posture prior to wrist-driven grasping reduces task difficulty and perceived exertion. Our results demonstrate that more free wrist mobility allows users to select comfortable and natural postures depending on task needs, which improves the versatility of the assistive grasping device for easier use across different hand poses in the arm's workspace. This behavior holds the potential to improve ease of use and desirability of future device designs through new modes of combining both body-power and robotic automation.
Submitted 22 July, 2024;
originally announced July 2024.
-
Real-time Timbre Remapping with Differentiable DSP
Authors:
Jordie Shier,
Charalampos Saitis,
Andrew Robertson,
Andrew McPherson
Abstract:
Timbre is a primary mode of expression in diverse musical contexts. However, prevalent audio-driven synthesis methods predominantly rely on pitch and loudness envelopes, effectively flattening timbral expression from the input. Our approach draws on the concept of timbre analogies and investigates how timbral expression from an input signal can be mapped onto controls for a synthesizer. Leveraging differentiable digital signal processing, our method facilitates direct optimization of synthesizer parameters through a novel feature difference loss. This loss function, designed to learn relative timbral differences between musical events, prioritizes the subtleties of graded timbre modulations within phrases, allowing for meaningful translations in a timbre space. Using snare drum performances as a case study, where timbral expression is central, we demonstrate real-time timbre remapping from acoustic snare drums to a differentiable synthesizer modeled after the Roland TR-808.
Submitted 5 July, 2024;
originally announced July 2024.
-
FM Tone Transfer with Envelope Learning
Authors:
Franco Caspe,
Andrew McPherson,
Mark Sandler
Abstract:
Tone Transfer is a novel deep-learning technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while keeping their musical form content. Due to its good audio quality results and continuous controllability, it has been recently applied in several audio processing tools. Nevertheless, it still presents several shortcomings related to poor sound diversity, and limited transient and dynamic rendering, which we believe hinder its possibilities of articulation and phrasing in a real-time performance context.
In this work, we present a discussion on current Tone Transfer architectures for the task of controlling synthetic audio with musical instruments and discuss their challenges in allowing expressive performances. Next, we introduce Envelope Learning, a novel method for designing Tone Transfer architectures that map musical events using a training objective at the synthesis parameter level. Our technique can render note beginnings and endings accurately and for a variety of sounds; these are essential steps for improving musical articulation, phrasing, and sound diversity with Tone Transfer. Finally, we implement a VST plugin for real-time live use and discuss possibilities for improvement.
Submitted 7 October, 2023;
originally announced October 2023.
-
Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis
Authors:
Jordie Shier,
Franco Caspe,
Andrew Robertson,
Mark Sandler,
Charalampos Saitis,
Andrew McPherson
Abstract:
Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified synthesis framework aiming to address transient generation and percussive synthesis within a DDSP framework. To this end, we propose a model for percussive synthesis that builds on sinusoidal modeling synthesis and incorporates a modulated temporal convolutional network for transient generation. We use a modified sinusoidal peak picking algorithm to generate time-varying non-harmonic sinusoids and pair it with differentiable noise and transient encoders that are jointly trained to reconstruct drumset sounds. We compute a set of reconstruction metrics using a large dataset of acoustic and electronic percussion samples that show that our method leads to improved onset signal reconstruction for membranophone percussion instruments.
Submitted 12 September, 2023;
originally announced September 2023.
-
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
Authors:
Ben Hayes,
Jordie Shier,
György Fazekas,
Andrew McPherson,
Charalampos Saitis
Abstract:
The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably. Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.
Submitted 29 August, 2023;
originally announced August 2023.
-
Real-time Percussive Technique Recognition and Embedding Learning for the Acoustic Guitar
Authors:
Andrea Martelloni,
Andrew P McPherson,
Mathieu Barthet
Abstract:
Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We formulate several design objectives for RT-MIR systems for augmented instrument performance: (i) causal constraint, (ii) perceptually negligible action-to-sound latency, (iii) control intimacy support, (iv) synthesis control support. We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and CNNs jointly trained with variational autoencoders (VAEs). We introduce a taxonomy of guitar body percussion based on hand part and location. We follow a cross-dataset evaluation approach by collecting three datasets labelled according to the taxonomy. The embedding quality of the models is assessed using KL-Divergence across distributions corresponding to different taxonomic classes. Results indicate that the networks are strong classifiers especially in a simplified 2-class recognition task, and the VAEs yield improved class separation compared to CNNs as evidenced by increased KL-Divergence across distributions. We argue that the VAE embedding quality could support control intimacy and rich interaction when the latent space's parameters are used to control an external synthesis engine. Further design challenges around generalisation to different datasets have been identified.
Submitted 13 July, 2023;
originally announced July 2023.
-
Pipeline for recording datasets and running neural networks on the Bela embedded hardware platform
Authors:
Teresa Pelinski,
Rodrigo Diaz,
Adán L. Benito Temprano,
Andrew McPherson
Abstract:
Deploying deep learning models on embedded devices is an arduous task: oftentimes, there exist no platform-specific instructions, and compilation times can be considerably large due to the limited computational resources available on-device. Moreover, many music-making applications demand real-time inference. Embedded hardware platforms for audio, such as Bela, offer an entry point for beginners into physical audio computing; however, the need for cross-compilation environments and low-level software development tools for deploying embedded deep learning models imposes high entry barriers on non-expert users. We present a pipeline for deploying neural networks in the Bela embedded hardware platform. In our pipeline, we include a tool to record a multichannel dataset of sensor signals. Additionally, we provide a dockerised cross-compilation environment for faster compilation. With this pipeline, we aim to provide a template for programmers and makers to prototype and experiment with neural networks for real-time embedded musical applications.
Submitted 20 June, 2023;
originally announced June 2023.
-
DDX7: Differentiable FM Synthesis of Musical Instrument Sounds
Authors:
Franco Caspe,
Andrew McPherson,
Mark Sandler
Abstract:
FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Typically featuring a MIDI interface, it is usually impractical to control it from an audio source. On the other hand, Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers from arbitrary sound inputs. The training process involves a corpus of audio for supervision, and spectral reconstruction loss functions. Such functions, while being great to match spectral amplitudes, present a lack of pitch direction which can hinder the joint optimization of the parameters of FM synthesizers. In this paper, we take steps towards enabling continuous control of a well-established FM synthesis architecture from an audio input. Firstly, we discuss a set of design constraints that ease spectral optimization of a differentiable FM synthesizer via a standard reconstruction loss. Next, we present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds in terms of a compact set of parameters. We train the model on instrument samples extracted from the URMP dataset, and quantitatively demonstrate its comparable audio quality against selected benchmarks.
Submitted 12 August, 2022;
originally announced August 2022.
-
Tenodesis Grasp Emulator: Kinematic Assessment of Wrist-Driven Orthotic Control
Authors:
Erin Y. Chang,
Raghid Mardini,
Andrew I. W. McPherson,
Yuri Gloumakov,
Hannah S. Stuart
Abstract:
Wrist-driven orthotics have been designed to assist people with C6-7 spinal cord injury; however, the kinematic constraint imposed by such a control strategy can impede mobility and lead to abnormal body motion. This study characterizes body compensation using the novel Tenodesis Grasp Emulator, an adaptor orthotic that allows for the investigation of tenodesis grasping in subjects with unimpaired hand function. Subjects perform a series of grasp-and-release tasks in order to compare normal (test control) and constrained wrist-driven modes, showing significant compensation as a result of the constraint. A motor-augmented mode is also compared against traditional wrist-driven operation, to explore the potential role of hybrid human-robot control. We find that the passive wrist-driven and motor-augmented modes fulfill different roles throughout the various tasks tested. Thus, we conclude that a flexible control scheme that can alter intervention based on the task at hand holds the potential to reduce compensation in future work.
Submitted 9 November, 2023; v1 submitted 22 November, 2021;
originally announced November 2021.
-
An embedded multichannel sound acquisition system for drone audition
Authors:
Michael Clayton,
Lin Wang,
Andrew McPherson,
Andrea Cavallaro
Abstract:
Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. To encourage the research in drone audition, we present an embedded sound acquisition and recording system with eight microphones and a multichannel sound recorder mounted on a quadcopter. In addition to recording and storing locally the sound from multiple microphones simultaneously, the embedded system can connect wirelessly to a remote terminal to transfer audio files for further processing. This will be the first stage towards creating a fully embedded solution for drone audition. We present experimental results obtained by state-of-the-art drone audition algorithms applied to the sound recorded by the embedded system.
Submitted 17 January, 2021;
originally announced January 2021.
-
Build and Execution Environment (BEE): an Encapsulated Environment Enabling HPC Applications Running Everywhere
Authors:
Jieyang Chen,
Qiang Guan,
Xin Liang,
Louis James Vernon,
Allen McPherson,
Li-Ta Lo,
Zizhong Chen,
James Paul Ahrens
Abstract:
Variations in High Performance Computing (HPC) system software configurations mean that applications are typically configured and built for specific HPC environments. Building applications can require a significant investment of time and effort for application users and requires application users to have additional technical knowledge. Linux container technologies such as Docker and Charliecloud bring great benefits to the application development, build and deployment processes. While cloud platforms already widely support containers, HPC systems still have non-uniform support of container technologies. In this work, we propose a unified runtime framework -- Build and Execution Environment (BEE) across both HPC and cloud platforms that allows users to run their containerized HPC applications across all supported platforms without modification. We design four BEE backends for four different classes of HPC or cloud platform so that together they cover the majority of mainstream computing platforms for HPC users. Evaluations show that BEE provides an easy-to-use unified user interface, execution environment, and comparable performance.
Submitted 27 February, 2021; v1 submitted 19 December, 2017;
originally announced December 2017.
-
Joint Inference of Genome Structure and Content in Heterogeneous Tumour Samples
Authors:
Andrew McPherson,
Andrew Roth,
Gavin Ha,
Sohrab P. Shah,
Cedric Chauve,
S. Cenk Sahinalp
Abstract:
For a genomically unstable cancer, a single tumour biopsy will often contain a mixture of competing tumour clones. These tumour clones frequently differ with respect to their genomic content (copy number of each gene) and structure (order of genes on each chromosome). Modern bulk genome sequencing mixes the signals of tumour clones and contaminating normal cells, complicating inference of genomic content and structure. We propose a method to unmix tumour and contaminating normal signals and jointly predict genomic structure and content of each tumour clone. We use genome graphs to represent tumour clones, and model the likelihood of the observed reads given clones and mixing proportions. Our use of haplotype blocks allows us to accurately measure allele specific read counts, and infer allele specific copy number for each clone. The proposed method is a heuristic local search based on applying incremental, locally optimal modifications of the genome graphs. Using simulated data, we show that our method predicts copy counts and gene adjacencies with reasonable accuracy.
Submitted 24 April, 2015; v1 submitted 13 April, 2015;
originally announced April 2015.