Showing 1–30 of 30 results for author: Välimäki, V

Searching in archive eess.
  1. arXiv:2504.04751

    eess.AS cs.AI

    Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial approaches

    Authors: Eloi Moliner, Michal Švento, Alec Wright, Lauri Juvela, Pavel Rajmic, Vesa Välimäki

    Abstract: Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this application, based on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Submitted to the 28th International Conference on Digital Audio Effects (DAFx25)

  2. arXiv:2501.18470

    eess.AS cs.LG cs.SD eess.SP

    Resampling Filter Design for Multirate Neural Audio Effect Processing

    Authors: Alistair Carson, Vesa Välimäki, Alec Wright, Stefan Bilbao

    Abstract: Neural networks have become ubiquitous in audio effects modelling, especially for guitar amplifiers and distortion pedals. One limitation of such models is that the sample rate of the training data is implicitly encoded in the model weights and therefore not readily adjustable at inference. Recent work explored modifications to recurrent neural network architecture to approximate a sample rate ind…

    Submitted 27 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing

  3. arXiv:2501.05959

    eess.AS

    Estimation and Restoration of Unknown Nonlinear Distortion using Diffusion

    Authors: Michal Švento, Eloi Moliner, Lauri Juvela, Alec Wright, Vesa Välimäki

    Abstract: The restoration of nonlinearly distorted audio signals, alongside the identification of the applied memoryless nonlinear operation, is studied. The paper focuses on the difficult but practically important case in which both the nonlinearity and the original input signal are unknown. The proposed method uses a generative diffusion model trained unconditionally on guitar or speech signals to jointly…

    Submitted 10 January, 2025; originally announced January 2025.

    Comments: Submitted to the Journal of Audio Engineering Society, special issue "The Sound of Digital Audio Effects"

  4. arXiv:2410.01562

    eess.AS cs.LG cs.SD

    HRTF Estimation using a Score-based Prior

    Authors: Etienne Thuillier, Jean-Marie Lemercier, Eloi Moliner, Timo Gerkmann, Vesa Välimäki

    Abstract: We present a head-related transfer function (HRTF) estimation method which relies on a data-driven prior given by a score-based diffusion model. The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech. The impulse response of the room is estimated along with the HRTF by optimizing a parametric model of reverberation based on the statistical behaviour o…

    Submitted 2 October, 2024; originally announced October 2024.

  5. arXiv:2409.08723

    eess.AS

    FLAMO: An Open-Source Library for Frequency-Domain Differentiable Audio Processing

    Authors: Gloria Dal Santo, Gian Marco De Bortoli, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

    Abstract: We present FLAMO, a Frequency-sampling Library for Audio-Module Optimization designed to implement and optimize differentiable linear time-invariant audio systems. The library is open-source and built on the frequency-sampling filter design method, allowing for the creation of differentiable modules that can be used stand-alone or within the computation graph of neural networks, simplifying the de…

    Submitted 14 April, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

  6. arXiv:2408.14836

    eess.AS

    Similarity Metrics For Late Reverberation

    Authors: Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

    Abstract: Automatic tuning of reverberation algorithms relies on the optimization of a cost function. While general audio similarity metrics are useful, they are not optimized for the specific statistical properties of reverberation in rooms. This paper presents two novel metrics for assessing the similarity of late reverberation in room impulse responses. These metrics are differentiable and can be utilize…

    Submitted 27 August, 2024; originally announced August 2024.

  7. arXiv:2408.07472

    eess.AS cs.LG cs.SD

    Unsupervised Blind Joint Dereverberation and Room Acoustics Estimation with Diffusion Models

    Authors: Jean-Marie Lemercier, Eloi Moliner, Simon Welker, Vesa Välimäki, Timo Gerkmann

    Abstract: This paper presents an unsupervised method for single-channel blind dereverberation and room impulse response (RIR) estimation, called BUDDy. The algorithm is rooted in Bayesian posterior sampling: it combines a likelihood model enforcing fidelity to the reverberant measurement, and an anechoic speech prior implemented by an unconditional diffusion model. We design a parametric filter representing…

    Submitted 25 March, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  8. arXiv:2407.13242

    eess.AS cs.SD

    Fade-in Reverberation in Multi-room Environments Using the Common-Slope Model

    Authors: Kyung Yun Lee, Nils Meyer-Kahlen, Georg Götz, U. Peter Svensson, Sebastian J. Schlecht, Vesa Välimäki

    Abstract: In multi-room environments, modelling the sound propagation is complex due to the coupling of rooms and diverse source-receiver positions. A common scenario is when the source and the receiver are in different rooms without a clear line of sight. For such source-receiver configurations, an initial increase in energy is observed, referred to as the "fade-in" of reverberation. Based on recent work o…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 2024 AES 5th International Conference on Audio for Virtual and Augmented Reality

  9. arXiv:2406.06293

    eess.AS cs.SD eess.SP

    Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing

    Authors: Alistair Carson, Alec Wright, Jatin Chowdhury, Vesa Välimäki, Stefan Bilbao

    Abstract: In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in Proc. DAFx24, Guildford, UK, September 2024

  10. arXiv:2405.04272

    eess.AS cs.LG cs.SD

    BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models

    Authors: Eloi Moliner, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann, Vesa Välimäki

    Abstract: In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models. We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance gets refined along the rever…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Submitted to IWAENC 2024

  11. arXiv:2403.20090

    eess.AS

    Non-Exponential Reverberation Modeling Using Dark Velvet Noise

    Authors: Jon Fagerström, Sebastian J. Schlecht, Vesa Välimäki

    Abstract: Previous research on late-reverberation modeling has mainly focused on exponentially decaying room impulse responses, whereas methods for accurately modeling non-exponential reverberation remain challenging. This paper extends the previously proposed basic dark-velvet-noise reverberation algorithm and proposes a parametrization scheme for modeling late reverberation with arbitrary temporal energy…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted for publication in the Journal of Audio Engineering Society

  12. arXiv:2403.18636

    eess.AS cs.SD

    A Diffusion-Based Generative Equalizer for Music Restoration

    Authors: Eloi Moliner, Maija Turunen, Filip Elvander, Vesa Välimäki

    Abstract: This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a nove…

    Submitted 13 March, 2025; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Presented at DAFx24. Historical music restoration examples are available at: http://research.spa.aalto.fi/publications/papers/dafx-babe2/

  13. arXiv:2402.11216

    eess.AS cs.SD

    Optimizing tiny colorless feedback delay networks

    Authors: Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

    Abstract: A common bane of artificial reverberation algorithms is spectral coloration in the synthesized sound, typically manifesting as metallic ringing, leading to a degradation in the perceived sound quality. In delay network methods, coloration is more pronounced when fewer delay lines are used. This paper presents an optimization framework in which a tiny differentiable feedback delay network, with as…

    Submitted 12 March, 2025; v1 submitted 17 February, 2024; originally announced February 2024.

  14. arXiv:2402.09821

    eess.AS cs.LG cs.SD

    Diffusion Models for Audio Restoration

    Authors: Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, Vesa Välimäki, Timo Gerkmann

    Abstract: With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and interferences originating at the recording side or caused by an imperfect transmission pipeline. To address this problem, audio restoration methods aim to reco…

    Submitted 11 November, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Currently in revision for IEEE Signal Processing Magazine Special Issue "Model-based and Data-Driven Audio Signal Processing"

    Journal ref: IEEE Signal Processing Magazine, Jan 2025

  15. arXiv:2312.14586

    eess.AS cs.SD

    Noise Morphing for Audio Time Stretching

    Authors: Eloi Moliner, Leonardo Fierro, Alec Wright, Matti Hämäläinen, Vesa Välimäki

    Abstract: This letter introduces an innovative method to enhance the quality of audio time stretching by precisely decomposing a sound into sines, transients, and noise and by improving the processing of the latter component. While there are established methods for time-stretching sines and transients with high quality, the manipulation of noise or residual components has lacked robust solutions in prior re…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: submitted to IEEE Signal Processing Letters

  16. arXiv:2310.13430

    eess.AS cs.LG

    HRTF Interpolation using a Spherical Neural Process Meta-Learner

    Authors: Etienne Thuillier, Craig Jin, Vesa Välimäki

    Abstract: Several individualization methods have recently been proposed to estimate a subject's Head-Related Transfer Function (HRTF) using convenient input modalities such as anthropometric measurements or pinnae photographs. There exists a need for adaptively correcting the estimation error committed by such methods using a few data point samples from the subject's HRTF, acquired using acoustic measuremen…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 12 pages. 11 figures. Submitted for publication in IEEE/ACM Transactions on Audio, Speech and Language Processing (T-ASL)

    ACM Class: I.5.4; J.2

  17. arXiv:2306.01433

    eess.AS cs.LG cs.SD

    Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach

    Authors: Eloi Moliner, Filip Elvander, Vesa Välimäki

    Abstract: Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveragi…

    Submitted 30 January, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  18. arXiv:2305.16862

    eess.AS cs.LG cs.SD

    Neural modeling of magnetic tape recorders

    Authors: Otto Mikkonen, Alec Wright, Eloi Moliner, Vesa Välimäki

    Abstract: The sound of magnetic recording media, such as open-reel and cassette tape recorders, is still sought after by today's sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed system consists of three main components: the hystere…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to DAFx 2023. For accompanying web page, see http://research.spa.aalto.fi/publications/papers/dafx23-neural-tape/

  19. Diffusion-Based Audio Inpainting

    Authors: Eloi Moliner, Vesa Välimäki

    Abstract: Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most existing methods produce plausible reconstructions when the gap lengths are short, but struggle to reconstruct gaps larger than about 100 ms. This paper explores recent advancements in deep learning and, particularly, diffusion models, for the task of audio inpainting. The proposed method uses an unconditionally…

    Submitted 10 January, 2025; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Submitted for publication to the Journal of Audio Engineering Society on January 30th, 2023

    Journal ref: Journal of the Audio Engineering Society 72, no. 3 (2024): 100-113

  20. arXiv:2211.16992

    eess.AS cs.SD

    Extreme Audio Time Stretching Using Neural Synthesis

    Authors: Leonardo Fierro, Alec Wright, Vesa Välimäki, Matti Hämäläinen

    Abstract: A deep neural network solution for time-scale modification (TSM) focused on large stretching factors is proposed, targeting environmental sounds. Traditional TSM artifacts such as transient smearing, loss of presence, and phasiness are heavily accentuated and cause poor audio quality when the TSM factor is four or larger. The weakness of established TSM methods, often based on a phase vocoder stru…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: Submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023 on Oct 27, 2022

  21. arXiv:2211.00943

    eess.AS cs.SD

    Adversarial Guitar Amplifier Modelling With Unpaired Data

    Authors: Alec Wright, Vesa Välimäki, Lauri Juvela

    Abstract: We propose an audio effects processing framework that learns to emulate a target electric guitar tone from a recording. We train a deep neural network using an adversarial approach, with the goal of transforming the timbre of a guitar into the timbre of another guitar after audio effects processing has been applied, for example, by a guitar amplifier. The model training requires no paired data, a…

    Submitted 20 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  22. arXiv:2210.15228

    eess.AS cs.SD

    Solving Audio Inverse Problems with a Diffusion Model

    Authors: Eloi Moliner, Jaakko Lehtinen, Vesa Välimäki

    Abstract: This paper presents CQT-Diff, a data-driven generative audio model that can, once trained, be used for solving various different audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture that is carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by preconditioning the model with an invertible Constant-Q Tra…

    Submitted 18 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted at ICASSP 2023

  23. arXiv:2210.14041

    eess.AS cs.SD

    Enhanced Fuzzy Decomposition of Sound Into Sines, Transients, and Noise

    Authors: Leonardo Fierro, Vesa Välimäki

    Abstract: The decomposition of sounds into sines, transients, and noise is a long-standing research problem in audio processing. The current solutions for this three-way separation detect either horizontal and vertical structures or anisotropy and orientations in the spectrogram to identify the properties of each spectral bin and classify it as sinusoidal, transient, or noise. This paper proposes an enhance…

    Submitted 30 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: Submitted for publication to the Journal of Audio Engineering Society on October 20th, 2022

  24. arXiv:2206.06259

    eess.AS

    Realistic Gramophone Noise Synthesis using a Diffusion Model

    Authors: Eloi Moliner, Vesa Välimäki

    Abstract: This paper introduces a novel data-driven strategy for synthesizing gramophone noise audio textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples of length equal to one disk revolution, but a method to generate plausible periodic variations between revolutions is also proposed. A guided approach is…

    Submitted 30 June, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: accepted at DAFx 20in22

  25. arXiv:2205.01897

    eess.AS cs.LG cs.SD

    Virtual Analog Modeling of Distortion Circuits Using Neural Ordinary Differential Equations

    Authors: Jan Wilczek, Alec Wright, Vesa Välimäki, Emanuël Habets

    Abstract: Recent research in deep learning has shown that neural networks can learn differential equations governing dynamical systems. In this paper, we adapt this concept to Virtual Analog (VA) modeling to learn the ordinary differential equations (ODEs) governing the first-order and the second-order diode clipper. The proposed models achieve performance comparable to state-of-the-art recurrent neural net…

    Submitted 1 July, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: 8 pages, 10 figures, accepted for DAFx 2022 conference, for associated audio examples, see https://thewolfsound.com/publications/dafx2022/

  26. arXiv:2204.06478

    eess.AS cs.SD

    BEHM-GAN: Bandwidth Extension of Historical Music using Generative Adversarial Networks

    Authors: Eloi Moliner, Vesa Välimäki

    Abstract: Audio bandwidth extension aims to expand the spectrum of narrow-band audio signals. Although this topic has been broadly studied during recent years, the particular problem of extending the bandwidth of historical music recordings remains an open challenge. This paper proposes BEHM-GAN, a model based on generative adversarial networks, as a practical solution to this problem. The proposed method w…

    Submitted 28 June, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted at IEEE Transactions on Audio, Speech, and Language Processing

  27. arXiv:2202.08702

    eess.AS cs.SD

    A Two-Stage U-Net for High-Fidelity Denoising of Historical Recordings

    Authors: Eloi Moliner, Vesa Välimäki

    Abstract: Enhancing the sound quality of historical music recordings is a long-standing problem. This paper presents a novel denoising method based on a fully-convolutional deep neural network. A two-stage U-Net model architecture is designed to model and suppress the degradations with high fidelity. The method processes the time-frequency representation of audio, and is trained using realistic noisy data t…

    Submitted 19 February, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: Accepted at ICASSP 2022

  28. arXiv:2110.04082

    eess.AS cs.SD eess.SP

    A Method for Capturing and Reproducing Directional Reverberation in Six Degrees of Freedom

    Authors: Benoit Alary, Vesa Välimäki

    Abstract: The reproduction of acoustics is an important aspect of the preservation of cultural heritage. A common approach is to capture an impulse response in a hall and auralize it by convolving an input signal with the measured reverberant response. For immersive applications, it is typical to acquire spatial impulse responses using a spherical microphone array to capture the reverberant sound field. Whi…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: This work has been accepted for the I3DA 2021 International Conference and will be submitted to IEEE Xplore Digital Library for possible publication

  29. arXiv:1911.08922

    eess.AS cs.SD

    Perceptual Loss Function for Neural Modelling of Audio Systems

    Authors: Alec Wright, Vesa Välimäki

    Abstract: This work investigates alternate pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. In our previous work, the error-to-signal ratio loss function was used during network training, with a first-order highpass pre-emphasis filter applied to both the target signal and neural network output. This work considers more perceptually releva…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: Submitted to ICASSP 2020

  30. arXiv:1811.00334

    eess.AS cs.SD

    Deep Learning for Tube Amplifier Emulation

    Authors: Eero-Pekka Damskägg, Lauri Juvela, Etienne Thuillier, Vesa Välimäki

    Abstract: Analog audio effects and synthesizers often owe their distinct sound to circuit nonlinearities. Faithfully modeling such a significant aspect of the original sound in virtual analog software can prove challenging. The current work proposes a generic data-driven approach to virtual analog modeling and applies it to the Fender Bassman 56F-A vacuum-tube amplifier. Specifically, a feedforward variant of…

    Submitted 20 February, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

    Comments: Accepted to ICASSP 2019