
Differentiable Grouped Feedback Delay Networks for Learning Coupled Volume Acoustics

Authors: Orchisama Das, Gloria Dal Santo, Sebastian J. Schlecht, Vesa Välimäki, Zoran Cvetkovic

Abstract: Rendering dynamic reverberation in a complicated acoustic space for moving sources and listeners is challenging but crucial for enhancing user immersion in extended-reality (XR) applications. Capturing spatially varying room impulse responses (RIRs) is costly and often impractical. Moreover, dynamic convolution with measured RIRs is computationally expensive with high memory demands, typically not available on wearable computing devices. Grouped Feedback Delay Networks (GFDNs), on the other hand, allow efficient rendering of coupled room acoustics. However, their parameters need to be tuned to match the reverberation profile of a coupled space. In this work, we propose the concept of Differentiable GFDNs (DiffGFDNs), which have tunable parameters that are optimised to match the late reverberation profile of a set of RIRs captured from a space that exhibits multi-slope decay. Once trained on a finite set of measurements, the DiffGFDN generalises to unmeasured locations in the space. We propose a parallel processing pipeline that has multiple DiffGFDNs with frequency-independent parameters processing each octave band. The parameters of the DiffGFDN can be updated rapidly during inference as sources and listeners move. We evaluate the proposed architecture against the Common Slopes (CS) model on a dataset of RIRs for three coupled rooms. The proposed architecture generates multi-slope late reverberation with low memory and computational requirements, achieving better energy decay relief (EDR) error and slightly worse octave-band energy decay curve (EDC) errors compared to the CS model. Furthermore, DiffGFDN requires an order of magnitude fewer floating-point operations per sample than the CS renderer.

Submitted 8 August, 2025; originally announced August 2025.

arXiv:2503.18600

Joint Spectrogram Separation and TDOA Estimation using Optimal Transport

Authors: Linda Fabiani, Sebastian J. Schlecht, Isabel Haasler, Filip Elvander

Abstract: Separating sources is a common challenge in applications such as speech enhancement and telecommunications, where distinguishing between overlapping sounds helps reduce interference and improve signal quality. Additionally, in multichannel systems, correct calibration and synchronization are essential to separate and locate source signals accurately. This work introduces a method for blind source separation and estimation of the Time Difference of Arrival (TDOA) of signals in the time-frequency domain. Our proposed method effectively separates signal mixtures into their original source spectrograms while simultaneously estimating the relative delays between receivers, using Optimal Transport (OT) theory. By exploiting the structure of the OT problem, we combine the separation and delay estimation processes into a unified framework, optimizing the system through a block coordinate descent algorithm. We analyze the performance of the OT-based estimator under various noise conditions and compare it with conventional TDOA and source separation methods. Numerical simulation results demonstrate that our proposed approach can achieve a significant level of accuracy across diverse noise scenarios for physical speech signals in both TDOA and source separation tasks.

Submitted 24 March, 2025; originally announced March 2025.

arXiv:2412.04534

Modeling nonuniform energy decay through the modal decomposition of acoustic radiance transfer (MoD-ART)

Authors: Matteo Scerbo, Sebastian J. Schlecht, Randall Ali, Lauri Savioja, Enzo De Sena

Abstract: Modeling late reverberation in real-time interactive applications is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent on the sound sources' and listeners' positions, and therefore must be adapted to their movements in real time. We present a novel approach to the task, named modal decomposition of acoustic radiance transfer (MoD-ART), which can handle highly complex scenarios with efficiency. The approach is based on the geometrical acoustics method of acoustic radiance transfer, from which we extract a set of energy decay modes and their positional relationships with sources and listeners. In this paper, we describe the physical and mathematical significance of MoD-ART, highlighting its advantages and applicability to different scenarios. Through an analysis of the method's computational complexity, we show that it compares very favorably with ray-tracing. We also present simulation results showing that MoD-ART can capture multiple decay slopes and flutter echoes.

Submitted 21 July, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

arXiv:2409.08723

FLAMO: An Open-Source Library for Frequency-Domain Differentiable Audio Processing

Authors: Gloria Dal Santo, Gian Marco De Bortoli, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Abstract: We present FLAMO, a Frequency-sampling Library for Audio-Module Optimization designed to implement and optimize differentiable linear time-invariant audio systems. The library is open-source and built on the frequency-sampling filter design method, allowing for the creation of differentiable modules that can be used stand-alone or within the computation graph of neural networks, simplifying the development of differentiable audio systems. It includes predefined filtering modules and auxiliary classes for constructing, training, and logging the optimized systems, all accessible through an intuitive interface. Practical application of these modules is demonstrated through two case studies: the optimization of an artificial reverberator and an active acoustics system for improved response coloration.

Submitted 14 April, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

arXiv:2408.14836

Similarity Metrics For Late Reverberation

Authors: Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Abstract: Automatic tuning of reverberation algorithms relies on the optimization of a cost function. While general audio similarity metrics are useful, they are not optimized for the specific statistical properties of reverberation in rooms. This paper presents two novel metrics for assessing the similarity of late reverberation in room impulse responses. These metrics are differentiable and can be utilized within a machine-learning framework. We compare the performance of these metrics to two popular audio metrics using a large dataset of room impulse responses encompassing various room configurations and microphone positions. The results indicate that the proposed functions based on averaged power and frequency-band energy decay outperform the baselines, with the former exhibiting the most suitable profile towards the minimum. The proposed work holds promise as an improvement to the design and evaluation of reverberation similarity metrics.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2407.13242

Fade-in Reverberation in Multi-room Environments Using the Common-Slope Model

Authors: Kyung Yun Lee, Nils Meyer-Kahlen, Georg Götz, U. Peter Svensson, Sebastian J. Schlecht, Vesa Välimäki

Abstract: In multi-room environments, modelling the sound propagation is complex due to the coupling of rooms and diverse source-receiver positions. A common scenario is when the source and the receiver are in different rooms without a clear line of sight. For such source-receiver configurations, an initial increase in energy is observed, referred to as the "fade-in" of reverberation. Based on recent work on representing inhomogeneous and anisotropic reverberation with common decay times, this work proposes an extended parametric model that enables the modelling of the fade-in phenomenon. The method performs fitting on the envelopes, instead of energy decay functions, and allows negative amplitudes of decaying exponentials. We evaluate the method on simulated and measured multi-room environments, where we show that the proposed approach can now model the fade-ins that were unrealisable with the previous method.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 2024 AES 5th International Conference on Audio for Virtual and Augmented Reality

arXiv:2403.20090

Non-Exponential Reverberation Modeling Using Dark Velvet Noise

Authors: Jon Fagerström, Sebastian J. Schlecht, Vesa Välimäki

Abstract: Previous research on late-reverberation modeling has mainly focused on exponentially decaying room impulse responses, whereas methods for accurately modeling non-exponential reverberation remain challenging. This paper extends the previously proposed basic dark-velvet-noise reverberation algorithm and proposes a parametrization scheme for modeling late reverberation with arbitrary temporal energy decay. Each pulse in the velvet-noise sequence is routed to a single dictionary filter that is selected from a set of filters based on weighted probabilities. The probabilities control the spectral evolution of the late-reverberation model and are optimized to fit a target impulse response via non-negative least-squares optimization. In this way, the frequency-dependent energy decay of a target late-reverberation impulse response can be fitted with mean and maximum T60 errors of 4% and 8%, respectively, requiring about 50% fewer coloration filters than a previously proposed filtered velvet-noise algorithm. Furthermore, the extended dark-velvet-noise reverberation algorithm allows the modeled impulse response to be gated, the frequency-dependent reverberation time to be modified, and the model's spectral evolution and broadband decay to be decoupled. The proposed method is suitable for the parametric late-reverberation synthesis of various acoustic environments, especially spaces that exhibit a non-exponential energy decay, motivating its use in musical audio and virtual reality.

Submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted for publication in the Journal of Audio Engineering Society

arXiv:2402.11216

Optimizing tiny colorless feedback delay networks

Authors: Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Abstract: A common bane of artificial reverberation algorithms is spectral coloration in the synthesized sound, typically manifesting as metallic ringing, leading to a degradation in the perceived sound quality. In delay network methods, coloration is more pronounced when fewer delay lines are used. This paper presents an optimization framework in which a tiny differentiable feedback delay network, with as few as four delay lines, is used to learn a set of parameters to iteratively reduce coloration. The parameters under optimization include the feedback matrix, as well as the input and output gains. The optimization objective is twofold: to maximize spectral flatness through a spectral loss while maintaining temporal density by penalizing sparseness in the parameter values. A favorable narrow distribution of modal excitation is achieved while maintaining the desired impulse response density. In a subjective assessment, the new method proves effective in reducing perceptual coloration of late reverberation. Compared to the authors' previous work, which serves as the baseline and utilizes a sparsity loss in the time domain, the proposed method achieves computational savings while maintaining performance. The effectiveness of this work is demonstrated through two application scenarios where smooth-sounding synthetic room impulse responses are obtained via the introduction of attenuation filters and an optimizable scattering feedback matrix.

Submitted 12 March, 2025; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.00859

Deep Room Impulse Response Completion

Authors: Jackie Lin, Georg Götz, Sebastian J. Schlecht

Abstract: Rendering immersive spatial audio in virtual reality (VR) and video games demands a fast and accurate generation of room impulse responses (RIRs) to recreate auditory environments plausibly. However, the conventional methods for simulating or measuring long RIRs are either computationally intensive or challenged by low signal-to-noise ratios. This study is propelled by the insight that direct sound and early reflections encapsulate sufficient information about room geometry and absorption characteristics. Building upon this premise, we propose a novel task termed "RIR completion," aimed at synthesizing the late reverberation given only the early portion (50 ms) of the response. To this end, we introduce DECOR, Deep Exponential Completion Of Room impulse responses, a deep neural network structured as an autoencoder designed to predict multi-exponential decay envelopes of filtered noise sequences. The interpretability of DECOR's output facilitates its integration with diverse rendering techniques. The proposed method is compared against an adapted state-of-the-art network, and its comparable performance supports the feasibility of the RIR completion task. The RIR completion can be widely adapted to enhance RIR generation tasks where fast late reverberation approximation is required.

Submitted 1 February, 2024; originally announced February 2024.

Comments: The following article has been submitted to the EURASIP Journal on Audio, Speech, and Music Processing

arXiv:2310.07363

Damping Density of an Absorptive Shoebox Room Derived from the Image-Source Method

Authors: Sebastian J. Schlecht, Karolina Prawda, Rudolf Rabenstein, Maximilian Schäfer

Abstract: The image-source method is widely applied to compute room impulse responses (RIRs) of shoebox rooms with arbitrary absorption. However, with increasing RIR lengths, the number of image sources grows rapidly, leading to slow computation. In this paper, we derive a closed-form expression for the damping density, which characterizes the overall multi-slope energy decay. The omnidirectional energy decay over time is directly derived from the damping density. The resulting energy decay model accurately matches the late reverberation simulated via the image-source method. The proposed model allows the fast stochastic synthesis of late reverberation by shaping noise with the energy envelope. Simulations of various wall damping coefficients demonstrate the model's accuracy. The proposed model consistently predicts the energy decay more accurately than a state-of-the-art approximation method. The paper elaborates on the proposed damping density's applicability to modeling multi-sloped sound energy decay, predicting reverberation time in non-diffuse sound fields, and fast frequency-dependent RIR synthesis.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2205.09644

doi 10.1121/10.0013416

Neural network for multi-exponential sound energy decay analysis

Authors: Georg Götz, Ricardo Falcón Pérez, Sebastian J. Schlecht, Ville Pulkki

Abstract: An established model for sound energy decay functions (EDFs) is the superposition of multiple exponentials and a noise term. This work proposes a neural-network-based approach for estimating the model parameters from EDFs. The network is trained on synthetic EDFs and evaluated on two large datasets of over 20000 EDF measurements conducted in various acoustic environments. The evaluation shows that the proposed neural network architecture robustly estimates the model parameters from large datasets of measured EDFs, while being lightweight and computationally efficient. An implementation of the proposed neural network is publicly available.

Submitted 19 May, 2022; originally announced May 2022.

Comments: The following article has been submitted to the Journal of the Acoustical Society of America (JASA). After it is published, it will be found at http://asa.scitation.org/journal/jas

Journal ref: J. Acoust. Soc. Am., Vol. 152, No. 2, pp. 942-953, 2022
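
Note on the decay model named in the abstract above: the following is a minimal sketch, assuming the commonly used form in which each exponential has an amplitude and a reverberation time, and the noise term is the backward integral of a constant noise power. The function and parameter names are illustrative and are not taken from the authors' published implementation.

```python
import math

# Energy drops by 60 dB (a factor of 1e6) over one reverberation time.
LN_60DB = math.log(1e6)

def edf_model(t, amplitudes, decay_times, noise_power, length):
    """Multi-exponential energy decay function with a noise term.

    t            time in seconds, 0 <= t <= length
    amplitudes   amplitude of each exponential (assumed parameter name)
    decay_times  reverberation time of each exponential, in seconds
    noise_power  constant noise power (assumed to be stationary)
    length       total length of the response in seconds
    """
    decay = sum(a * math.exp(-LN_60DB * t / rt)
                for a, rt in zip(amplitudes, decay_times))
    # Backward integration of constant noise gives a linearly falling term.
    noise = noise_power * (length - t)
    return decay + noise

def to_db(energy):
    """Convert an energy value to decibels."""
    return 10.0 * math.log10(energy)

# Single-slope, noise-free example: the level at t = decay time is -60 dB.
level_at_rt = to_db(edf_model(1.0, [1.0], [1.0], 0.0, 2.0))
```

With two or more entries in `amplitudes` and `decay_times`, the same function produces the multi-slope curves that the estimation task is concerned with.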

arXiv:2204.10125

Physical Modeling using Recurrent Neural Networks with Fast Convolutional Layers

Authors: Julian D. Parker, Sebastian J. Schlecht, Rudolf Rabenstein, Maximilian Schäfer

Abstract: Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied the techniques of machine learning to construct such models automatically from data for the case of systems which have lumped states described by scalar values, such as electrical circuits. In this work, we examine how similar techniques are able to construct models of systems which have spatially distributed rather than lumped states. We describe several novel recurrent neural network structures, and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained with this data to reproduce the behavior of these systems.

Submitted 1 June, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: Accepted to DAFx2022

arXiv:2007.07337

doi 10.1109/TSP.2021.3053507

Allpass Feedback Delay Networks

Authors: Sebastian J. Schlecht

Abstract: In the 1960s, Schroeder and Logan introduced delay line-based allpass filters, which are still popular due to their computational efficiency and versatile applicability in artificial reverberation, decorrelation, and dispersive system design. In this work, we extend the theory of allpass systems to any arbitrary connection of delay lines, namely feedback delay networks (FDNs). We present a characterization of uniallpass FDNs, i.e., FDNs that are allpass for an arbitrary choice of delays. Further, we develop a solution to the completion problem, i.e., given an FDN feedback matrix, determining the remaining gain parameters such that the FDN is allpass. Particularly useful for the completion problem are feedback matrices that yield a homogeneous decay of all system modes. Finally, we apply the uniallpass characterization to previous FDN designs, namely Schroeder's series allpass and Gardner's nested allpass for single-input, single-output systems, and Poletti's unitary reverberator for multi-input, multi-output systems, and demonstrate the significant extension of the design space.

Submitted 18 January, 2021; v1 submitted 14 July, 2020; originally announced July 2020.
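
Note on the delay line-based allpass filter mentioned in the abstract above: the following is a minimal sketch of the classic single-delay structure with transfer function H(z) = (-g + z^-m) / (1 - g z^-m). It illustrates only this textbook building block, not the FDN generalisation developed in the paper; the function names and the chosen delay and gain are illustrative assumptions.

```python
import cmath

def schroeder_allpass(x, delay, gain):
    """Filter x with y[n] = -g*x[n] + x[n-m] + g*y[n-m]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y[n] = -gain * x[n] + x_delayed + gain * y_delayed
    return y

def magnitude_response(delay, gain, omega):
    """|H(e^{j*omega})| for H(z) = (-g + z^-m) / (1 - g*z^-m)."""
    w = cmath.exp(-1j * omega * delay)
    return abs((-gain + w) / (1.0 - gain * w))

# Impulse response for a delay of 3 samples and a gain of 0.7.
impulse = [1.0] + [0.0] * 599
h = schroeder_allpass(impulse, delay=3, gain=0.7)

# An allpass filter preserves signal energy, so this sum is close to 1.
energy = sum(v * v for v in h)
```

The magnitude response equals one at every frequency for any real gain with |g| < 1, which is what makes the structure useful as a colourless building block.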

arXiv:1901.08865

doi 10.1109/TSP.2019.2937286

Modal Decomposition of Feedback Delay Networks

Authors: Sebastian J. Schlecht, Emanuël A. P. Habets

Abstract: Feedback delay networks (FDNs) belong to a general class of recursive filters which are widely used in sound synthesis and physical modeling applications. We present a numerical technique to compute the modal decomposition of the FDN transfer function. The proposed pole finding algorithm is based on the Ehrlich-Aberth iteration for matrix polynomials and improves computational performance by up to three orders of magnitude compared to a scalar polynomial root finder. We demonstrate how explicit knowledge of the FDN's modal behavior facilitates analysis and improvements for artificial reverberation. The statistical distribution of mode frequency and residue magnitudes demonstrates that relatively few modes contribute a large portion of impulse response energy.

Submitted 25 January, 2019; originally announced January 2019.

arXiv:1606.07729

doi 10.1109/TSP.2016.2637323

On Lossless Feedback Delay Networks

Authors: Sebastian J. Schlecht, Emanuël A. P. Habets

Abstract: Lossless Feedback Delay Networks (FDNs) are commonly used as a design prototype for artificial reverberation algorithms. The lossless property is dependent on the feedback matrix, which connects the output of a set of delays to their inputs, and the lengths of the delays. Both unitary and triangular feedback matrices are known to constitute lossless FDNs; however, the most general class of lossless feedback matrices has not been identified. In this contribution, it is shown that the FDN is lossless for any set of delays if all irreducible components of the feedback matrix are diagonally similar to a unitary matrix. The necessity of the generalized class of feedback matrices is demonstrated by examples of FDN designs proposed in literature.

Submitted 24 June, 2016; originally announced June 2016.
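
Note on the lossless condition stated in the abstract above: the following is a minimal numerical sketch for one hand-picked case, assuming a 2x2 feedback matrix A = D^-1 U D with U a rotation (unitary) and D diagonal. It checks that a D-weighted energy of the delay-line contents stays constant when the network runs without input or output taps. The function name, matrix, and delays are illustrative assumptions, and the check is an illustration of a single instance, not the paper's proof.

```python
import math
from collections import deque

def fdn_weighted_energy(feedback, weights, delays, steps):
    """Run a feedback-only FDN and record the weighted stored energy."""
    n = len(delays)
    lines = [deque([0.0] * m) for m in delays]
    lines[0][0] = 1.0  # initial excitation in the first delay line
    energies = []
    for _ in range(steps):
        outs = [line.popleft() for line in lines]
        ins = [sum(feedback[i][j] * outs[j] for j in range(n))
               for i in range(n)]
        for line, value in zip(lines, ins):
            line.append(value)
        energies.append(sum(w * w * sum(s * s for s in line)
                            for w, line in zip(weights, lines)))
    return energies

theta = 0.6
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
d = [1.0, 2.0]  # diagonal of D

# A = D^-1 U D is not unitary itself but is diagonally similar to U.
A = [[U[i][j] * d[j] / d[i] for j in range(2)] for i in range(2)]

energies = fdn_weighted_energy(A, d, delays=[3, 5], steps=1000)
```

Because the weighted state evolves under U, each step removes and re-inserts the same weighted energy, so every entry of `energies` stays at the initial value of 1.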

Showing 1–15 of 15 results for author: Schlecht, S J