
Showing 1–18 of 18 results for author: Steinmetz, C J

  1. arXiv:2502.14405 [pdf, other]

    cs.SD eess.AS

    Differentiable Black-box and Gray-box Modeling of Nonlinear Audio Effects

    Authors: Marco Comunità, Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Audio effects are extensively used at every stage of audio and music content creation. The majority of differentiable audio effects modeling approaches fall into the black-box or gray-box paradigms; and most models have been proposed and applied to nonlinear effects like guitar amplifiers, overdrive, distortion, fuzz and compressor. Although a plethora of architectures have been introduced for the…

    Submitted 20 February, 2025; originally announced February 2025.

  2. arXiv:2410.21233 [pdf, other]

    cs.SD eess.AS

    ST-ITO: Controlling Audio Effects for Style Transfer with Inference-Time Optimization

    Authors: Christian J. Steinmetz, Shubhr Singh, Marco Comunità, Ilias Ibnyahya, Shanxin Yuan, Emmanouil Benetos, Joshua D. Reiss

    Abstract: Audio production style transfer is the task of processing an input to impart stylistic elements from a reference recording. Existing approaches often train a neural network to estimate control parameters for a set of audio effects. However, these approaches are limited in that they can only control a fixed set of effects, where the effects must be differentiable or otherwise employ specialized tra…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: Accepted to ISMIR 2024. Code available https://github.com/csteinmetz1/st-ito

  3. arXiv:2403.16331 [pdf, other]

    cs.SD cs.LG eess.AS

    Modeling Analog Dynamic Range Compressors using Deep Learning and State-space Models

    Authors: Hanzhi Yin, Gang Cheng, Christian J. Steinmetz, Ruibin Yuan, Richard M. Stern, Roger B. Dannenberg

    Abstract: We describe a novel approach for developing realistic digital models of dynamic range compressors for digital audio production by analyzing their analog prototypes. While realistic digital dynamic compressors are potentially useful for many applications, the design process is challenging because the compressors operate nonlinearly over long time scales. Our approach is based on the structured stat…

    Submitted 24 March, 2024; originally announced March 2024.

  4. arXiv:2311.01526 [pdf, other]

    cs.SD cs.LG eess.AS

    ATGNN: Audio Tagging Graph Neural Network

    Authors: Shubhr Singh, Christian J. Steinmetz, Emmanouil Benetos, Huy Phan, Dan Stowell

    Abstract: Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging. Recent works have shown that despite stacking multiple layers, the receptive field of CNNs remains severely limited. Transformers on the other hand are able to map global context through self-attention, but treat the spectrogram as a sequence of patches which is not flexible enough…

    Submitted 2 November, 2023; originally announced November 2023.

  5. arXiv:2310.11364 [pdf, other]

    cs.SD eess.AS

    High-Fidelity Noise Reduction with Differentiable Signal Processing

    Authors: Christian J. Steinmetz, Thomas Walther, Joshua D. Reiss

    Abstract: Noise reduction techniques based on deep learning have demonstrated impressive performance in enhancing the overall quality of recorded speech. While these approaches are highly performant, their application in audio engineering can be limited due to a number of factors. These include operation only on speech without support for music, lack of real-time capability, lack of interpretable control pa…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted for publication at the 155th Convention of the Audio Engineering Society

  6. arXiv:2308.16177 [pdf, other]

    cs.SD eess.AS

    General Purpose Audio Effect Removal

    Authors: Matthew Rice, Christian J. Steinmetz, George Fazekas, Joshua D. Reiss

    Abstract: Although the design and application of audio effects is well understood, the inverse problem of removing these effects is significantly more challenging and far less studied. Recently, deep learning has been applied to audio effect removal; however, existing approaches have focused on narrow formulations considering only one effect or source type at a time. In realistic scenarios, multiple effects…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

  7. arXiv:2305.13262 [pdf, other]

    cs.SD cs.LG eess.AS

    Modulation Extraction for LFO-driven Audio Effects

    Authors: Christopher Mitcheltree, Christian J. Steinmetz, Marco Comunità, Joshua D. Reiss

    Abstract: Low frequency oscillator (LFO) driven audio effects such as phaser, flanger, and chorus, modify an input signal using time-varying filters and delays, resulting in characteristic sweeping or widening effects. It has been shown that these effects can be modeled using neural networks when conditioned with the ground truth LFO signal. However, in most cases, the LFO signal is not accessible and measu…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to DAFx 2023. Listening samples and plugins can be found at https://christhetree.github.io/mod_extraction/

  8. arXiv:2304.04394 [pdf, other]

    eess.AS cs.SD

    Leveraging Neural Representations for Audio Manipulation

    Authors: Scott H. Hawley, Christian J. Steinmetz

    Abstract: We investigate applying audio manipulations using pretrained neural network-based autoencoders as an alternative to traditional signal processing methods, since the former may provide greater semantic or perceptual organization. To establish the potential of this approach, we first establish if representations from these models encode information about manipulations. We carry out experiments and p…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted as Express Paper for AES Europe 2023, https://aeseurope.com/

  9. arXiv:2211.00497 [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Modelling black-box audio effects with time-varying feature modulation

    Authors: Marco Comunità, Christian J. Steinmetz, Huy Phan, Joshua D. Reiss

    Abstract: Deep learning approaches for black-box modelling of audio effects have shown promise, however, the majority of existing work focuses on nonlinear effects with behaviour on relatively short time-scales, such as guitar amplifiers and distortion. While recurrent and convolutional architectures can theoretically be extended to capture behaviour at longer time scales, we show that simply scaling the wi…

    Submitted 9 May, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

  10. arXiv:2207.08759 [pdf, other]

    cs.SD eess.AS

    Style Transfer of Audio Effects with Differentiable Signal Processing

    Authors: Christian J. Steinmetz, Nicholas J. Bryan, Joshua D. Reiss

    Abstract: We present a framework that can impose the audio effects and production style from one recording to another by example with the goal of simplifying the audio production process. We train a deep neural network to analyze an input recording and a style reference recording, and predict the control parameters of audio effects used to render the output. In contrast to past work, we integrate audio effe…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Preprint. To appear in the Journal of the Audio Engineering Society

  11. arXiv:2203.03022 [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS stat.ML

    HEAR: Holistic Evaluation of Audio Representations

    Authors: Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

    Abstract: What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, in…

    Submitted 29 May, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

  12. arXiv:2112.02926 [pdf, other]

    eess.AS cs.SD

    Steerable discovery of neural audio effects

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Applications of deep learning for audio effects often focus on modeling analog effects or learning to control effects to emulate a trained audio engineer. However, deep learning approaches also have the potential to expand creativity through neural audio effects that enable new sound transformations. While recent work demonstrated that neural networks with random weights produce compelling audio e…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: Accepted to NeurIPS 2021 Workshop on Machine Learning for Creativity and Design

  13. arXiv:2110.03691 [pdf, other]

    eess.SP cs.LG cs.SD eess.AS

    Direct design of biquad filter cascades with deep learning by sampling random polynomials

    Authors: Joseph T. Colonel, Christian J. Steinmetz, Marcus Michelen, Joshua D. Reiss

    Abstract: Designing infinite impulse response filters to match an arbitrary magnitude response requires specialized techniques. Methods like modified Yule-Walker are relatively efficient, but may not be sufficiently accurate in matching high order responses. On the other hand, iterative optimization techniques often enable superior performance, but come at the cost of longer run-times and are sensitive to i…

    Submitted 16 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  14. arXiv:2110.01436 [pdf, other]

    eess.AS cs.SD

    WaveBeat: End-to-end beat and downbeat tracking in the time domain

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Deep learning approaches for beat and downbeat tracking have brought advancements. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking operating directly on waveforms. This method forgoes engineered spectra…

    Submitted 4 October, 2021; originally announced October 2021.

    Comments: To appear at the 151st AES Convention

  15. arXiv:2107.07503 [pdf, other]

    eess.AS cs.SD

    Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech

    Authors: Christian J. Steinmetz, Vamsi Krishna Ithapu, Paul Calamia

    Abstract: Deep learning approaches have emerged that aim to transform an audio signal so that it sounds as if it was recorded in the same room as a reference recording, with applications both in audio post-production and augmented reality. In this work, we propose FiNS, a Filtered Noise Shaping network that directly estimates the time domain room impulse response (RIR) from reverberant speech. Our domain-in…

    Submitted 15 July, 2021; originally announced July 2021.

    Comments: Accepted to WASPAA 2021. See details at https://facebookresearch.github.io/FiNS/

  16. arXiv:2102.06200 [pdf, other]

    eess.AS cs.SD

    Efficient neural networks for real-time modeling of analog dynamic range compression

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Deep learning approaches have demonstrated success in modeling analog audio effects. Nevertheless, challenges remain in modeling more complex effects that involve time-varying nonlinear elements, such as dynamic range compressors. Existing neural network approaches for modeling compression either ignore the device parameters, do not attain sufficient accuracy, or otherwise require large noncausal…

    Submitted 15 April, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: Updated and will appear at 152nd AES Convention (note title change)

  17. arXiv:2010.10291 [pdf, other]

    eess.AS cs.SD

    Automatic multitrack mixing with a differentiable mixing console of neural audio effects

    Authors: Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà

    Abstract: Applications of deep learning to automatic multitrack mixing are largely unexplored. This is partly due to the limited available data, coupled with the fact that such data is relatively unstructured and variable. To address these challenges, we propose a domain-inspired model with a strong inductive bias for the mixing task. We achieve this with the application of pre-trained sub-networks and weig…

    Submitted 20 October, 2020; originally announced October 2020.

  18. arXiv:2010.04237 [pdf, other]

    eess.AS cs.SD

    Randomized Overdrive Neural Networks

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: By processing audio signals in the time-domain with randomly weighted temporal convolutional networks (TCNs), we uncover a wide range of novel, yet controllable overdrive effects. We discover that architectural aspects, such as the depth of the network, the kernel size, the number of channels, the activation function, as well as the weight initialization, all have a clear impact on the sonic chara…

    Submitted 4 August, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: Updating project URL. Now https://csteinmetz1.github.io/ronn