Showing 1–31 of 31 results for author: Reiss, J D

  1. arXiv:2503.05858  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Bimodal Connection Attention Fusion for Speech Emotion Recognition

    Authors: Jiachen Luo, Huy Phan, Lin Wang, Joshua D. Reiss

    Abstract: Multi-modal emotion recognition is challenging due to the difficulty of extracting features that capture subtle emotional differences. Understanding multi-modal interactions and connections is key to building effective bimodal speech emotion recognition systems. In this work, we propose the Bimodal Connection Attention Fusion (BCAF) method, which includes three main modules: the interactive connection…

    Submitted 22 March, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

  2. arXiv:2502.14405  [pdf, other]

    cs.SD eess.AS

    Differentiable Black-box and Gray-box Modeling of Nonlinear Audio Effects

    Authors: Marco Comunità, Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Audio effects are extensively used at every stage of audio and music content creation. The majority of differentiable audio effects modeling approaches fall into the black-box or gray-box paradigms; and most models have been proposed and applied to nonlinear effects like guitar amplifiers, overdrive, distortion, fuzz and compressor. Although a plethora of architectures have been introduced for the…

    Submitted 20 February, 2025; originally announced February 2025.

  3. arXiv:2502.11668  [pdf, other]

    cs.SD

    NablAFx: A Framework for Differentiable Black-box and Gray-box Modeling of Audio Effects

    Authors: Marco Comunità, Christian J. Steinmetz, Joshua D. Reiss

    Abstract: We present NablAFx, an open-source framework developed to support research in differentiable black-box and gray-box modeling of audio effects. Built in PyTorch, NablAFx offers a versatile ecosystem to configure, train, evaluate, and compare various architectural approaches. It includes classes to manage model architectures, datasets, and training, along with features to compute and log losses, met…

    Submitted 25 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  4. arXiv:2412.15023  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    FolAI: Synchronized Foley Sound Generation with Semantic and Temporal Alignment

    Authors: Riccardo Fosco Gramaccioni, Christian Marinoni, Emilian Postolache, Marco Comunità, Luca Cosmo, Joshua D. Reiss, Danilo Comminiello

    Abstract: Traditional sound design workflows rely on manual alignment of audio events to visual cues, as in Foley sound design, where everyday actions like footsteps or object interactions are recreated to match the on-screen motion. This process is time-consuming, difficult to scale, and lacks automation tools that preserve creative intent. Despite recent advances in vision-to-audio generation, producing t…

    Submitted 5 May, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

  5. arXiv:2412.03373  [pdf]

    cs.SD eess.AS

    Exploring trends in audio mixes and masters: Insights from a dataset analysis

    Authors: Angeliki Mourgela, Elio Quinton, Spyridon Bissas, Joshua D. Reiss, David Ronan

    Abstract: We present an analysis of a dataset of audio metrics and aesthetic considerations about mixes and masters provided by the web platform MixCheck studio. The platform is designed for educational purposes, primarily targeting amateur music producers, and aimed at analysing their recordings prior to them being released. The analysis focuses on the following data points: integrated loudness, mono compa…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: 11 pages, 6 figures, Presented at the AES 157th Convention October 2024, New York, USA

  6. arXiv:2410.21233  [pdf, other]

    cs.SD eess.AS

    ST-ITO: Controlling Audio Effects for Style Transfer with Inference-Time Optimization

    Authors: Christian J. Steinmetz, Shubhr Singh, Marco Comunità, Ilias Ibnyahya, Shanxin Yuan, Emmanouil Benetos, Joshua D. Reiss

    Abstract: Audio production style transfer is the task of processing an input to impart stylistic elements from a reference recording. Existing approaches often train a neural network to estimate control parameters for a set of audio effects. However, these approaches are limited in that they can only control a fixed set of effects, where the effects must be differentiable or otherwise employ specialized tra…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: Accepted to ISMIR 2024. Code available https://github.com/csteinmetz1/st-ito

  7. arXiv:2404.17821  [pdf]

    cs.SD cs.MM eess.AS

    An automatic mixing speech enhancement system for multi-track audio

    Authors: Xiaojing Liu, Hongwei Ai, Joshua D. Reiss

    Abstract: We propose a speech enhancement system for multitrack audio. The system will minimize auditory masking while allowing one to hear multiple simultaneous speakers. The system can be used in multiple communication scenarios e.g., teleconferencing, invoice gaming, and live streaming. The ITU-R BS.1387 Perceptual Evaluation of Audio Quality (PEAQ) model is used to evaluate the amount of masking in the…

    Submitted 18 October, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: 5 pages

  8. arXiv:2404.07970  [pdf, other]

    eess.AS cs.LG cs.SD

    Differentiable All-pole Filters for Time-varying Audio Systems

    Authors: Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas

    Abstract: Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous wo…

    Submitted 18 October, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Published at DAFx 2024

  9. arXiv:2310.15247  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis

    Authors: Marco Comunità, Riccardo F. Gramaccioni, Emilian Postolache, Emanuele Rodolà, Danilo Comminiello, Joshua D. Reiss

    Abstract: Sound design involves creatively selecting, recording, and editing sound effects for various media like cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no refer…

    Submitted 23 October, 2023; originally announced October 2023.

  10. arXiv:2310.11364  [pdf, other]

    cs.SD eess.AS

    High-Fidelity Noise Reduction with Differentiable Signal Processing

    Authors: Christian J. Steinmetz, Thomas Walther, Joshua D. Reiss

    Abstract: Noise reduction techniques based on deep learning have demonstrated impressive performance in enhancing the overall quality of recorded speech. While these approaches are highly performant, their application in audio engineering can be limited due to a number of factors. These include operation only on speech without support for music, lack of real-time capability, lack of interpretable control pa…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted for publication at the 155th Convention of the Audio Engineering Society

  11. arXiv:2309.14761  [pdf, other]

    eess.AS cs.SD

    Optimization Techniques for a Physical Model of Human Vocalisation

    Authors: Mateo Cámara, Zhiyuan Xu, Yisu Zong, José Luis Blanco, Joshua D. Reiss

    Abstract: We present a non-supervised approach to optimize and evaluate the synthesis of non-speech audio effects from a speech production model. We use the Pink Trombone synthesizer as a case study of a simplified production model of the vocal tract to target non-speech human audio signals, specifically yawns. We selected and optimized the control parameters of the synthesizer to minimize the difference between rea…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to DAFx 2023

  12. arXiv:2308.16177  [pdf, other]

    cs.SD eess.AS

    General Purpose Audio Effect Removal

    Authors: Matthew Rice, Christian J. Steinmetz, George Fazekas, Joshua D. Reiss

    Abstract: Although the design and application of audio effects is well understood, the inverse problem of removing these effects is significantly more challenging and far less studied. Recently, deep learning has been applied to audio effect removal; however, existing approaches have focused on narrow formulations considering only one effect or source type at a time. In realistic scenarios, multiple effects…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

  13. arXiv:2307.04702  [pdf, other]

    cs.SD eess.AS

    Vocal Tract Area Estimation by Gradient Descent

    Authors: David Südholt, Mateo Cámara, Zhiyuan Xu, Joshua D. Reiss

    Abstract: Articulatory features can provide interpretable and flexible controls for the synthesis of human vocalizations by allowing the user to directly modify parameters like vocal strain or lip position. To make this manipulation through resynthesis possible, we need to estimate the features that result in a desired vocalization directly from audio recordings. In this work, we propose a white-box optimiz…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Accepted to DAFx 2023

  14. arXiv:2305.13262  [pdf, other]

    cs.SD cs.LG eess.AS

    Modulation Extraction for LFO-driven Audio Effects

    Authors: Christopher Mitcheltree, Christian J. Steinmetz, Marco Comunità, Joshua D. Reiss

    Abstract: Low frequency oscillator (LFO) driven audio effects such as phaser, flanger, and chorus, modify an input signal using time-varying filters and delays, resulting in characteristic sweeping or widening effects. It has been shown that these effects can be modeled using neural networks when conditioned with the ground truth LFO signal. However, in most cases, the LFO signal is not accessible and measu…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to DAFx 2023. Listening samples and plugins can be found at https://christhetree.github.io/mod_extraction/

  15. arXiv:2211.00497  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Modelling black-box audio effects with time-varying feature modulation

    Authors: Marco Comunità, Christian J. Steinmetz, Huy Phan, Joshua D. Reiss

    Abstract: Deep learning approaches for black-box modelling of audio effects have shown promise; however, the majority of existing work focuses on nonlinear effects with behaviour on relatively short time-scales, such as guitar amplifiers and distortion. While recurrent and convolutional architectures can theoretically be extended to capture behaviour at longer time scales, we show that simply scaling the wi…

    Submitted 9 May, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

  16. arXiv:2207.08759  [pdf, other]

    cs.SD eess.AS

    Style Transfer of Audio Effects with Differentiable Signal Processing

    Authors: Christian J. Steinmetz, Nicholas J. Bryan, Joshua D. Reiss

    Abstract: We present a framework that can impose the audio effects and production style from one recording to another by example with the goal of simplifying the audio production process. We train a deep neural network to analyze an input recording and a style reference recording, and predict the control parameters of audio effects used to render the output. In contrast to past work, we integrate audio effe…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Preprint. To appear in the Journal of the Audio Engineering Society

  17. arXiv:2112.02926  [pdf, other]

    eess.AS cs.SD

    Steerable discovery of neural audio effects

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Applications of deep learning for audio effects often focus on modeling analog effects or learning to control effects to emulate a trained audio engineer. However, deep learning approaches also have the potential to expand creativity through neural audio effects that enable new sound transformations. While recent work demonstrated that neural networks with random weights produce compelling audio e…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: Accepted to NeurIPS 2021 Workshop on Machine Learning for Creativity and Design

  18. arXiv:2110.09605  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks

    Authors: Marco Comunità, Huy Phan, Joshua D. Reiss

    Abstract: Footsteps are among the most ubiquitous sound effects in multimedia applications. There is substantial research into understanding the acoustic features and developing synthesis models for footstep sound effects. In this paper, we present a first attempt at adopting neural synthesis for this task. We implemented two GAN-based architectures and compared the results with real recordings as well as s…

    Submitted 10 December, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

  19. arXiv:2110.03691  [pdf, other]

    eess.SP cs.LG cs.SD eess.AS

    Direct design of biquad filter cascades with deep learning by sampling random polynomials

    Authors: Joseph T. Colonel, Christian J. Steinmetz, Marcus Michelen, Joshua D. Reiss

    Abstract: Designing infinite impulse response filters to match an arbitrary magnitude response requires specialized techniques. Methods like modified Yule-Walker are relatively efficient, but may not be sufficiently accurate in matching high order responses. On the other hand, iterative optimization techniques often enable superior performance, but come at the cost of longer run-times and are sensitive to i…

    Submitted 16 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  20. arXiv:2110.01436  [pdf, other]

    eess.AS cs.SD

    WaveBeat: End-to-end beat and downbeat tracking in the time domain

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Deep learning approaches for beat and downbeat tracking have brought advancements. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking operating directly on waveforms. This method forgoes engineered spectra…

    Submitted 4 October, 2021; originally announced October 2021.

    Comments: To appear at the 151st AES Convention

  21. arXiv:2102.06200  [pdf, other]

    eess.AS cs.SD

    Efficient neural networks for real-time modeling of analog dynamic range compression

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Deep learning approaches have demonstrated success in modeling analog audio effects. Nevertheless, challenges remain in modeling more complex effects that involve time-varying nonlinear elements, such as dynamic range compressors. Existing neural network approaches for modeling compression either ignore the device parameters, do not attain sufficient accuracy, or otherwise require large noncausal…

    Submitted 15 April, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: Updated and will appear at 152nd AES Convention (note title change)

  22. arXiv:2012.03216  [pdf, other]

    cs.SD cs.LG eess.AS

    Guitar Effects Recognition and Parameter Estimation with Convolutional Neural Networks

    Authors: Marco Comunità, Dan Stowell, Joshua D. Reiss

    Abstract: Despite the popularity of guitar effects, there is very little existing research on classification and parameter estimation of specific plugins or effect units from guitar recordings. In this paper, convolutional neural networks were used for classification and parameter estimation for 13 overdrive, distortion and fuzz guitar effects. A novel dataset of processed electric guitar samples was assemb…

    Submitted 6 December, 2020; originally announced December 2020.

    Journal ref: JAES Volume 69 Issue 7/8 pp. 594-604; July 2021

  23. arXiv:2010.04237  [pdf, other]

    eess.AS cs.SD

    Randomized Overdrive Neural Networks

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: By processing audio signals in the time-domain with randomly weighted temporal convolutional networks (TCNs), we uncover a wide range of novel, yet controllable overdrive effects. We discover that architectural aspects, such as the depth of the network, the kernel size, the number of channels, the activation function, as well as the weight initialization, all have a clear impact on the sonic chara…

    Submitted 4 August, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: Updating project URL. Now https://csteinmetz1.github.io/ronn

  24. arXiv:1910.10105  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Modeling plate and spring reverberation using a DSP-informed deep neural network

    Authors: Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss

    Abstract: Plate and spring reverberators are electromechanical systems first used and researched as means to substitute real room reverberation. Nowadays they are often used in music production for aesthetic reasons due to their particular sonic characteristics. The modeling of these audio processors and their perceptual qualities is difficult since they use mechanical elements together with analog electron…

    Submitted 17 April, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020. Source code, dataset, audio examples and more detailed diagrams: https://mchijmma.github.io/modeling-plate-spring-reverb/

  25. arXiv:1905.06148  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    A general-purpose deep learning approach to model time-varying audio effects

    Authors: Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss

    Abstract: Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation based audio effects. Most existing methods for modeling these types of effect units are often optimized to a very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning a…

    Submitted 21 June, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: audio files: https://mchijmma.github.io/modeling-time-varying/

  26. arXiv:1901.11436  [pdf, other]

    stat.ML cs.LG cs.SD eess.AS eess.SP

    End-to-End Probabilistic Inference for Nonstationary Audio Analysis

    Authors: William J. Wilkinson, Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, Arno Solin

    Abstract: A typical audio signal processing pipeline includes multiple disjoint analysis stages, including calculation of a time-frequency representation followed by spectrogram-based feature analysis. We show how time-frequency analysis and nonnegative matrix factorisation can be jointly formulated as a spectral mixture Gaussian process model with nonstationary priors over the amplitude variance parameters…

    Submitted 27 April, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

    Comments: Accepted to the Thirty-sixth International Conference on Machine Learning (ICML) 2019

  27. arXiv:1811.02489  [pdf, other]

    eess.SP cs.LG cs.SD eess.AS stat.ML

    Unifying Probabilistic Models for Time-Frequency Analysis

    Authors: William J. Wilkinson, Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, Arno Solin

    Abstract: In audio signal processing, probabilistic time-frequency models have many benefits over their non-probabilistic counterparts. They adapt to the incoming signal, quantify uncertainty, and measure correlation between the signal's amplitude and phase information, making time domain resynthesis straightforward. However, these models are still not widely used since they come at a high computational cos…

    Submitted 12 February, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019

  28. arXiv:1810.06603  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Modeling of nonlinear audio effects with end-to-end deep neural networks

    Authors: Marco A. Martínez Ramírez, Joshua D. Reiss

    Abstract: In the context of music production, distortion effects are mainly used for aesthetic reasons and are usually applied to electric musical instruments. Most existing methods for nonlinear modeling are often either simplified or optimized to a very specific circuit. In this work, we investigate deep learning architectures for audio processing and we aim to find a general purpose end-to-end deep neura…

    Submitted 6 March, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: Presented at the 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK, May 2019

  29. arXiv:1803.11154  [pdf, other]

    eess.IV cs.SD eess.AS

    An empirical approach to the relationship between emotion and music production quality

    Authors: David Ronan, Joshua D. Reiss, Hatice Gunes

    Abstract: In music production, the role of the mix engineer is to take recorded music and convey the expressed emotions as professionally sounding as possible. We investigated the relationship between music production quality and musically induced and perceived emotions. A listening test was performed where 10 critical listeners and 10 non-critical listeners evaluated 10 songs. There were two mixes of each…

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: 12 Pages

  30. arXiv:1803.09960   

    eess.AS cs.SD

    Automatic Minimisation of Masking in Multitrack Audio using Subgroups

    Authors: David Ronan, Zheng Ma, Paul Mc Namara, Hatice Gunes, Joshua D. Reiss

    Abstract: The iterative process of masking minimisation when mixing multitrack audio is a challenging optimisation problem, in part due to the complexity and non-linearity of auditory perception. In this article, we first propose a multitrack masking metric inspired by the MPEG psychoacoustic model. We investigate different audio processing techniques to manipulate the frequency and dynamic characteristics…

    Submitted 5 January, 2021; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: Need to resolve ownership of intellectual property

  31. arXiv:1802.00680  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    A Generative Model for Natural Sounds Based on Latent Force Modelling

    Authors: William J. Wilkinson, Joshua D. Reiss, Dan Stowell

    Abstract: Recent advances in analysis of subband amplitude envelopes of natural sounds have resulted in convincing synthesis, showing subband amplitudes to be a crucial component of perception. Probabilistic latent variable analysis is particularly revealing, but existing approaches don't incorporate prior knowledge about the physical behaviour of amplitude envelopes, such as exponential decay and feedback.…

    Submitted 27 March, 2019; v1 submitted 2 February, 2018; originally announced February 2018.

    Comments: 10 pages, 5 figures