FlexiSAGA: A Flexible Systolic Array GEMM Accelerator for Sparse and Dense Processing

Authors: Mika Markus Müller, Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Oliver Bringmann

Abstract: Artificial Intelligence (AI) algorithms, such as Deep Neural Networks (DNNs), have become an important tool for a wide range of applications, from computer vision to natural language processing. However, the computational complexity of DNN inference poses a significant challenge, particularly for processing on resource-constrained edge devices. One promising approach to address this challenge is the exploitation of sparsity in DNN operator weights. In this work, we present FlexiSAGA, an architecturally configurable and dataflow-flexible AI hardware accelerator for the sparse and dense processing of general matrix multiplications (GEMMs). FlexiSAGA supports seven different sparse and dense dataflows, enabling efficient processing of resource intensive DNN operators. Additionally, we propose a DNN pruning method specifically tailored towards the FlexiSAGA architecture, allowing for near-optimal processing of dense and sparse convolution and fully-connected operators, facilitating a DNN/HW co-design flow. Our results show a whole DNN sparse-over-dense inference speedup ranging from 1.41 up to 4.28, outperforming commercial and literature-reported accelerator platforms.

Submitted 2 June, 2025; originally announced June 2025.

Comments: Accepted Version for: SAMOS XXV

arXiv:2502.14405 [pdf, other]

Differentiable Black-box and Gray-box Modeling of Nonlinear Audio Effects

Authors: Marco Comunità, Christian J. Steinmetz, Joshua D. Reiss

Abstract: Audio effects are extensively used at every stage of audio and music content creation. The majority of differentiable audio effects modeling approaches fall into the black-box or gray-box paradigms; and most models have been proposed and applied to nonlinear effects like guitar amplifiers, overdrive, distortion, fuzz and compressor. Although a plethora of architectures have been introduced for the task at hand, there is still a lack of understanding on the state of the art, since most publications experiment with one type of nonlinear audio effect and a very small number of devices. In this work we aim to shed light on the audio effects modeling landscape by comparing black-box and gray-box architectures on a large number of nonlinear audio effects, identifying the most suitable for a wide range of devices. In the process, we also: introduce time-varying gray-box models and propose models for compressor, distortion and fuzz, publish a large dataset for audio effects research - ToneTwist AFx https://github.com/mcomunita/tonetwist-afx-dataset - that is also the first open to community contributions, evaluate models on a variety of metrics and conduct extensive subjective evaluation. Code https://github.com/mcomunita/nablafx and supplementary material https://github.com/mcomunita/nnlinafx-supp-material are also available.

Submitted 20 February, 2025; originally announced February 2025.

arXiv:2502.11668 [pdf, other]

NablAFx: A Framework for Differentiable Black-box and Gray-box Modeling of Audio Effects

Authors: Marco Comunità, Christian J. Steinmetz, Joshua D. Reiss

Abstract: We present NablAFx, an open-source framework developed to support research in differentiable black-box and gray-box modeling of audio effects. Built in PyTorch, NablAFx offers a versatile ecosystem to configure, train, evaluate, and compare various architectural approaches. It includes classes to manage model architectures, datasets, and training, along with features to compute and log losses, metrics and media, and plotting functions to facilitate detailed analysis. It incorporates implementations of established black-box architectures and conditioning methods, as well as differentiable DSP blocks and controllers, enabling the creation of both parametric and non-parametric gray-box signal chains. The code is accessible at https://github.com/mcomunita/nablafx.

Submitted 25 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

arXiv:2412.13330 [pdf, other]

Simulating imperfect quantum optical circuits using unsymmetrized bases

Authors: John Steinmetz, Maike Ostmann, Alex Neville, Brendan Pankovich, Adel Sohbi

Abstract: Fault-tolerant photonic quantum computing requires the generation of large entangled resource states. The required size of these states makes it challenging to simulate the effects of errors such as loss and partial distinguishability. For an interferometer with $N$ partially distinguishable input photons and $M$ spatial modes, the Fock basis can have up to ${N+NM-1\choose N}$ elements. We show that it is possible to use a much smaller unsymmetrized basis with size $M^N$ without discarding any information. This enables simulations of the joint effect of loss and partial distinguishability on larger states than is otherwise possible. We demonstrate the technique by providing the first-ever simulations of the generation of imperfect qubits encoded using quantum parity codes, including an example where the Hilbert space is over $60$ orders of magnitude smaller than the $N$-photon Fock space. As part of the analysis, we derive the loss mechanism for partially distinguishable photons.

Submitted 17 December, 2024; originally announced December 2024.

Comments: 16+7 pages, 9 figures
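
The two basis sizes quoted in the abstract above can be compared numerically. The following is a minimal sketch that only evaluates the two stated expressions, $\binom{N+NM-1}{N}$ and $M^N$; the function names are hypothetical and are not taken from the paper or its code.

```python
from math import comb


def fock_basis_size(n_photons: int, n_modes: int) -> int:
    """Upper bound on the Fock basis size quoted in the abstract: C(N + N*M - 1, N)."""
    return comb(n_photons + n_photons * n_modes - 1, n_photons)


def unsymmetrized_basis_size(n_photons: int, n_modes: int) -> int:
    """Size of the unsymmetrized basis quoted in the abstract: M**N."""
    return n_modes ** n_photons


if __name__ == "__main__":
    # Illustrative (N, M) pairs chosen here; they are not values from the paper.
    for n, m in [(2, 2), (3, 4), (8, 16)]:
        print(n, m, fock_basis_size(n, m), unsymmetrized_basis_size(n, m))
```

For example, with N = 2 and M = 2 the two expressions evaluate to 10 and 4 respectively.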

arXiv:2410.21233 [pdf, other]

ST-ITO: Controlling Audio Effects for Style Transfer with Inference-Time Optimization

Authors: Christian J. Steinmetz, Shubhr Singh, Marco Comunità, Ilias Ibnyahya, Shanxin Yuan, Emmanouil Benetos, Joshua D. Reiss

Abstract: Audio production style transfer is the task of processing an input to impart stylistic elements from a reference recording. Existing approaches often train a neural network to estimate control parameters for a set of audio effects. However, these approaches are limited in that they can only control a fixed set of effects, where the effects must be differentiable or otherwise employ specialized training techniques. In this work, we introduce ST-ITO, Style Transfer with Inference-Time Optimization, an approach that instead searches the parameter space of an audio effect chain at inference. This method enables control of arbitrary audio effect chains, including unseen and non-differentiable effects. Our approach employs a learned metric of audio production style, which we train through a simple and scalable self-supervised pretraining strategy, along with a gradient-free optimizer. Due to the limited existing evaluation methods for audio production style transfer, we introduce a multi-part benchmark to evaluate audio production style metrics and style transfer systems. This evaluation demonstrates that our audio representation better captures attributes related to audio production and enables expressive style transfer via control of arbitrary audio effects.

Submitted 28 October, 2024; originally announced October 2024.

Comments: Accepted to ISMIR 2024. Code available https://github.com/csteinmetz1/st-ito

arXiv:2409.08595 [pdf, ps, other]

doi 10.1145/3715122

Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators

Authors: Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Felix Wedlich, Mika Markus Müller, Federico Nicolás Peccia, Felix Thömmes, Jannik Steinmetz, Valentin Biermaier, Adrian Frischknecht, Paul Palomero Bernardo, Oliver Bringmann

Abstract: Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their performance characteristics when executing the intended AI workload. To facilitate this, we present an automated generation approach for fast performance models to accurately estimate the latency of a DNN mapped onto systematically modeled and concisely described accelerator architectures. Using our accelerator architecture description method, we modeled representative DNN accelerators such as Gemmini, UltraTrail, Plasticine-derived, and a parameterizable systolic array. Together with DNN mappings for those modeled architectures, we perform a combined DNN/hardware dependency graph analysis, which enables us, in the best case, to evaluate only 154 loop kernel iterations to estimate the performance for 4.19 billion instructions achieving a significant speedup. We outperform regression and analytical models in terms of mean absolute percentage error (MAPE) compared to simulation results, while being several magnitudes faster than an RTL simulation.

Submitted 13 September, 2024; originally announced September 2024.

Comments: Accepted version for: ACM Transactions on Embedded Computing Systems

Journal ref: Volume 24, Year 2025, Issue 2, Pages 1 - 32

arXiv:2406.08330 [pdf, ps, other]

doi 10.1007/978-3-031-78377-7

It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives

Authors: Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Jonathan Gietz, Konstantin Lübeck, Oliver Bringmann

Abstract: Statistical models are widely used to estimate the performance of commercial off-the-shelf (COTS) AI hardware accelerators. However, training of statistical performance models often requires vast amounts of data, leading to a significant time investment and can be difficult in case of limited hardware availability. To alleviate this problem, we propose a novel performance modeling methodology that significantly reduces the number of training samples while maintaining good accuracy. Our approach leverages knowledge of the target hardware architecture and initial parameter sweeps to identify a set of Performance Representatives (PR) for deep neural network (DNN) layers. These PRs are then used for benchmarking, building a statistical performance model, and making estimations. This targeted approach drastically reduces the number of training samples needed, opposed to random sampling, to achieve a better estimation accuracy. We achieve a Mean Absolute Percentage Error (MAPE) of as low as 0.02% for single-layer estimations and 0.68% for whole DNN estimations with less than 10000 training samples. The results demonstrate the superiority of our method for single-layer estimations compared to models trained with randomly sampled datasets of the same size.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted version for: SAMOS'24

Journal ref: Embedded Computer Systems: Architectures, Modeling, and Simulation, LNCS, Volume 15226, Year 2024, Pages 59-75
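
The entry above reports accuracy as Mean Absolute Percentage Error (MAPE). As a minimal sketch, assuming the standard textbook definition (the mean of |measured - estimated| / |measured|, expressed in percent), the metric could be computed as follows; the function name and the sample latencies are hypothetical and do not come from the paper.

```python
from typing import Sequence


def mape(measured: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent, under the standard definition."""
    if len(measured) != len(estimated) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    total = sum(abs((m - e) / m) for m, e in zip(measured, estimated))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Hypothetical layer latencies (arbitrary units), for illustration only.
    print(mape([100.0, 200.0], [110.0, 190.0]))
```

Because each error is normalised by the measured value, the metric is undefined for zero-valued measurements, which is why the sketch rejects them explicitly.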

arXiv:2403.16331 [pdf, other]

Modeling Analog Dynamic Range Compressors using Deep Learning and State-space Models

Authors: Hanzhi Yin, Gang Cheng, Christian J. Steinmetz, Ruibin Yuan, Richard M. Stern, Roger B. Dannenberg

Abstract: We describe a novel approach for developing realistic digital models of dynamic range compressors for digital audio production by analyzing their analog prototypes. While realistic digital dynamic compressors are potentially useful for many applications, the design process is challenging because the compressors operate nonlinearly over long time scales. Our approach is based on the structured state space sequence model (S4), as implementing the state-space model (SSM) has proven to be efficient at learning long-range dependencies and is promising for modeling dynamic range compressors. We present in this paper a deep learning model with S4 layers to model the Teletronix LA-2A analog dynamic range compressor. The model is causal, executes efficiently in real time, and achieves roughly the same quality as previous deep-learning models but with fewer parameters.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2311.01526 [pdf, other]

ATGNN: Audio Tagging Graph Neural Network

Authors: Shubhr Singh, Christian J. Steinmetz, Emmanouil Benetos, Huy Phan, Dan Stowell

Abstract: Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging. Recent works have shown that despite stacking multiple layers, the receptive field of CNNs remains severely limited. Transformers on the other hand are able to map global context through self-attention, but treat the spectrogram as a sequence of patches which is not flexible enough to capture irregular audio objects. In this work, we treat the spectrogram in a more flexible way by considering it as a graph structure and process it with a novel graph neural architecture called ATGNN. ATGNN not only combines the capability of CNNs with the global information sharing ability of Graph Neural Networks, but also maps semantic relationships between learnable class embeddings and corresponding spectrogram regions. We evaluate ATGNN on two audio tagging tasks, where it achieves 0.585 mAP on the FSD50K dataset and 0.335 mAP on the AudioSet-balanced dataset, achieving comparable results to Transformer based models with a significantly lower number of learnable parameters.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.11364 [pdf, other]

High-Fidelity Noise Reduction with Differentiable Signal Processing

Authors: Christian J. Steinmetz, Thomas Walther, Joshua D. Reiss

Abstract: Noise reduction techniques based on deep learning have demonstrated impressive performance in enhancing the overall quality of recorded speech. While these approaches are highly performant, their application in audio engineering can be limited due to a number of factors. These include operation only on speech without support for music, lack of real-time capability, lack of interpretable control parameters, operation at lower sample rates, and a tendency to introduce artifacts. On the other hand, signal processing-based noise reduction algorithms offer fine-grained control and operation on a broad range of content, however, they often require manual operation to achieve the best results. To address the limitations of both approaches, in this work we introduce a method that leverages a signal processing-based denoiser that when combined with a neural network controller, enables fully automatic and high-fidelity noise reduction on both speech and music signals. We evaluate our proposed method with objective metrics and a perceptual listening test. Our evaluation reveals that speech enhancement models can be extended to music, however training the model to remove only stationary noise is critical. Furthermore, our proposed approach achieves performance on par with the deep learning models, while being significantly more efficient and introducing fewer artifacts in some cases. Listening examples are available online at https://tape.it/research/denoiser .

Submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted for publication at the 155th Convention of the Audio Engineering Society

arXiv:2308.16177 [pdf, other]

General Purpose Audio Effect Removal

Authors: Matthew Rice, Christian J. Steinmetz, George Fazekas, Joshua D. Reiss

Abstract: Although the design and application of audio effects is well understood, the inverse problem of removing these effects is significantly more challenging and far less studied. Recently, deep learning has been applied to audio effect removal; however, existing approaches have focused on narrow formulations considering only one effect or source type at a time. In realistic scenarios, multiple effects are applied with varying source content. This motivates a more general task, which we refer to as general purpose audio effect removal. We developed a dataset for this task using five audio effects across four different sources and used it to train and evaluate a set of existing architectures. We found that no single model performed optimally on all effect types and sources. To address this, we introduced RemFX, an approach designed to mirror the compositionality of applied effects. We first trained a set of the best-performing effect-specific removal models and then leveraged an audio effect classification model to dynamically construct a graph of our models at inference. We found our approach to outperform single model baselines, although examples with many effects present remain challenging.

Submitted 30 August, 2023; originally announced August 2023.

Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

arXiv:2305.13262 [pdf, other]

Modulation Extraction for LFO-driven Audio Effects

Authors: Christopher Mitcheltree, Christian J. Steinmetz, Marco Comunità, Joshua D. Reiss

Abstract: Low frequency oscillator (LFO) driven audio effects such as phaser, flanger, and chorus, modify an input signal using time-varying filters and delays, resulting in characteristic sweeping or widening effects. It has been shown that these effects can be modeled using neural networks when conditioned with the ground truth LFO signal. However, in most cases, the LFO signal is not accessible and measurement from the audio signal is nontrivial, hindering the modeling process. To address this, we propose a framework capable of extracting arbitrary LFO signals from processed audio across multiple digital audio effects, parameter settings, and instrument configurations. Since our system imposes no restrictions on the LFO signal shape, we demonstrate its ability to extract quasiperiodic, combined, and distorted modulation signals that are relevant to effect modeling. Furthermore, we show how coupling the extraction model with a simple processing network enables training of end-to-end black-box models of unseen analog or digital LFO-driven audio effects using only dry and wet audio pairs, overcoming the need to access the audio effect or internal LFO signal. We make our code available and provide the trained audio effect models in a real-time VST plugin.

Submitted 22 May, 2023; originally announced May 2023.

Comments: Accepted to DAFx 2023. Listening samples and plugins can be found at https://christhetree.github.io/mod_extraction/

arXiv:2304.04394 [pdf, other]

Leveraging Neural Representations for Audio Manipulation

Authors: Scott H. Hawley, Christian J. Steinmetz

Abstract: We investigate applying audio manipulations using pretrained neural network-based autoencoders as an alternative to traditional signal processing methods, since the former may provide greater semantic or perceptual organization. To establish the potential of this approach, we first establish if representations from these models encode information about manipulations. We carry out experiments and produce visualizations using representations from two different pretrained autoencoders. Our findings indicate that, while some information about audio manipulations is encoded, this information is both limited and encoded in a non-trivial way. This is supported by our attempts to visualize these representations, which demonstrated that trajectories of representations for common manipulations are typically nonlinear and content dependent, even for linear signal manipulations. As a result, it is not yet clear how these pretrained autoencoders can be used to manipulate audio signals, however, our results indicate this may be due to the lack of disentanglement with respect to common audio manipulations.

Submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted as Express Paper for AES Europe 2023, https://aeseurope.com/

arXiv:2303.05913 [pdf, other]

Bootstrap Consistency for the Mack Bootstrap

Authors: Julia Steinmetz, Carsten Jentsch

Abstract: Mack's distribution-free chain ladder reserving model is among the most popular approaches in non-life insurance mathematics. Proposed to determine the first two moments of the reserve, it does not allow the whole distribution of the reserve to be identified. For this purpose, Mack's model is usually equipped with a tailor-made bootstrap procedure. Although widely used in practice to estimate the reserve risk, no theoretical bootstrap consistency results exist that justify this approach. To fill this gap in the literature, we adopt the framework proposed by Steinmetz and Jentsch (2022) to derive asymptotic theory in Mack's model. By splitting the reserve into two parts corresponding to process and estimation uncertainty, this enables - for the first time - a rigorous investigation also of the validity of the Mack bootstrap. We prove that the (conditional) distribution of the asymptotically dominating process uncertainty part is correctly mimicked by Mack's bootstrap if the parametric family of distributions of the individual development factors is correctly specified. Otherwise, this is not the case. In contrast, the (conditional) distribution of the estimation uncertainty part is generally not correctly captured by Mack's bootstrap. To tackle this, we propose an alternative Mack-type bootstrap, which is designed to capture also the distribution of the estimation uncertainty part. We illustrate our findings by simulations and show that the newly proposed alternative Mack bootstrap outperforms the Mack bootstrap.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2211.07718 [pdf, other]

Time-Dependent Hamiltonian Reconstruction using Continuous Weak Measurements

Authors: Karthik Siva, Gerwin Koolstra, John Steinmetz, William P. Livingston, Debmalya Das, Larry Chen, John Mark Kreikebaum, Noah Stevenson, Christian Jünger, David I. Santiago, Irfan Siddiqi, Andrew N. Jordan

Abstract: Reconstructing the Hamiltonian of a quantum system is an essential task for characterizing and certifying quantum processors and simulators. Existing techniques either rely on projective measurements of the system before and after coherent time evolution and do not explicitly reconstruct the full time-dependent Hamiltonian or interrupt evolution for tomography. Here, we experimentally demonstrate that an a priori unknown, time-dependent Hamiltonian can be reconstructed from continuous weak measurements concurrent with coherent time evolution in a system of two superconducting transmons coupled by a flux-tunable coupler. In contrast to previous work, our technique does not require interruptions, which would distort the recovered Hamiltonian. We introduce an algorithm which recovers the Hamiltonian and density matrix from an incomplete set of continuous measurements and demonstrate that it reliably extracts amplitudes of a variety of single qubit and entangling two qubit Hamiltonians. We further demonstrate how this technique reveals deviations from a theoretical control Hamiltonian which would otherwise be missed by conventional techniques. Our work opens up novel applications for continuous weak measurements, such as studying non-idealities in gates, certifying analog quantum simulators, and performing quantum metrology.

Submitted 14 November, 2022; originally announced November 2022.

Comments: Main text: 12 pages, 4 figures. Appendix: 10 pages, 4 figures

arXiv:2211.00497 [pdf, other]

doi 10.1109/ICASSP49357.2023.10097173

Modelling black-box audio effects with time-varying feature modulation

Authors: Marco Comunità, Christian J. Steinmetz, Huy Phan, Joshua D. Reiss

Abstract: Deep learning approaches for black-box modelling of audio effects have shown promise, however, the majority of existing work focuses on nonlinear effects with behaviour on relatively short time-scales, such as guitar amplifiers and distortion. While recurrent and convolutional architectures can theoretically be extended to capture behaviour at longer time scales, we show that simply scaling the width, depth, or dilation factor of existing architectures does not result in satisfactory performance when modelling audio effects such as fuzz and dynamic range compression. To address this, we propose the integration of time-varying feature-wise linear modulation into existing temporal convolutional backbones, an approach that enables learnable adaptation of the intermediate activations. We demonstrate that our approach more accurately captures long-range dependencies for a range of fuzz and compressor implementations across both time and frequency domain metrics. We provide sound examples, source code, and pretrained models to facilitate reproducibility.

Submitted 9 May, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2207.08759 [pdf, other]

Style Transfer of Audio Effects with Differentiable Signal Processing

Authors: Christian J. Steinmetz, Nicholas J. Bryan, Joshua D. Reiss

Abstract: We present a framework that can impose the audio effects and production style from one recording to another by example with the goal of simplifying the audio production process. We train a deep neural network to analyze an input recording and a style reference recording, and predict the control parameters of audio effects used to render the output. In contrast to past work, we integrate audio effects as differentiable operators in our framework, perform backpropagation through audio effects, and optimize end-to-end using an audio-domain loss. We use a self-supervised training strategy enabling automatic control of audio effects without the use of any labeled or paired training data. We survey a range of existing and new approaches for differentiable signal processing, showing how each can be integrated into our framework while discussing their trade-offs. We evaluate our approach on both speech and music tasks, demonstrating that our approach generalizes both to unseen recordings and even to sample rates different than those seen during training. Our approach produces convincing production style transfer results with the ability to transform input recordings to produced recordings, yielding audio effect control parameters that enable interpretability and user interaction.

Submitted 18 July, 2022; originally announced July 2022.

Comments: Preprint. To appear in the Journal of the Audio Engineering Society

arXiv:2203.06252 [pdf, other]

doi 10.1103/PhysRevA.106.032424

Quantum Telescopy Clock Games

Authors: Robert Czupryniak, Eric Chitambar, John Steinmetz, Andrew N. Jordan

Abstract: We consider the clock game, a task formulated in the framework of quantum information theory that can be used to improve the existing schemes of quantum-enhanced telescopy. The problem of learning when a stellar photon reaches a telescope is translated into an abstract game, which we call the clock game. A winning strategy is provided that involves performing a quantum non-demolition measurement that verifies which stellar spatio-temporal modes are occupied by a photon without disturbing the phase information. We prove tight lower bounds on the entanglement cost needed to win the clock game, with the amount of necessary entangled bits equaling the number of time-bins being distinguished. This lower bound on the entanglement cost applies to any telescopy protocol that aims to non-destructively extract the time-bin information of an incident photon through local measurements, and our result implies that the protocol of Khabiboulline et al. [Phys. Rev. Lett. 123, 70504 (2019)] is optimal in terms of entanglement consumption. The full task of the phase extraction is also considered, and we show that the quantum Fisher information of the stellar phase can be achieved by local measurements and shared entanglement without the necessity of nonlinear optical operations. The optimal phase measurement is achieved asymptotically with increasing number of ancilla qubits, whereas a single qubit pair is required if nonlinear operations are allowed.

Submitted 23 November, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

Comments: 18 pages, 7 figures

arXiv:2203.03022 [pdf, ps, other]

HEAR: Holistic Evaluation of Audio Representations

Authors: Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

Abstract: What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. HEAR was launched as a NeurIPS 2021 shared challenge. In the spirit of shared exchange, each participant submitted an audio embedding model following a common API that is general-purpose, open-source, and freely available to use. Twenty-nine models by thirteen external teams were evaluated on nineteen diverse downstream tasks derived from sixteen datasets. Open evaluation code, submitted models and datasets are key contributions, enabling comprehensive and reproducible evaluation, as well as previously impossible longitudinal studies. It still remains an open question whether one single general-purpose audio representation can perform as holistically as the human ear.

Submitted 29 May, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

arXiv:2112.02926 [pdf, other]

Steerable discovery of neural audio effects

Authors: Christian J. Steinmetz, Joshua D. Reiss

Abstract: Applications of deep learning for audio effects often focus on modeling analog effects or learning to control effects to emulate a trained audio engineer. However, deep learning approaches also have the potential to expand creativity through neural audio effects that enable new sound transformations. While recent work demonstrated that neural networks with random weights produce compelling audio effects, control of these effects is limited and unintuitive. To address this, we introduce a method for the steerable discovery of neural audio effects. This method enables the design of effects using example recordings provided by the user. We demonstrate how this method produces an effect similar to the target effect, along with interesting inaccuracies, while also providing perceptually relevant controls.

Submitted 6 December, 2021; originally announced December 2021.

Comments: Accepted to NeurIPS 2021 Workshop on Machine Learning for Creativity and Design

arXiv:2110.03691 [pdf, other]

Direct design of biquad filter cascades with deep learning by sampling random polynomials

Authors: Joseph T. Colonel, Christian J. Steinmetz, Marcus Michelen, Joshua D. Reiss

Abstract: Designing infinite impulse response filters to match an arbitrary magnitude response requires specialized techniques. Methods like modified Yule-Walker are relatively efficient, but may not be sufficiently accurate in matching high order responses. On the other hand, iterative optimization techniques often enable superior performance, but come at the cost of longer run-times and are sensitive to initial conditions, requiring manual tuning. In this work, we address some of these limitations by learning a direct mapping from the target magnitude response to the filter coefficient space with a neural network trained on millions of random filters. We demonstrate our approach enables both fast and accurate estimation of filter coefficients given a desired response. We investigate training with different families of random filters, and find training with a variety of filter families enables better generalization when estimating real-world filters, using head-related transfer functions and guitar cabinets as case studies. We compare our method against existing methods including modified Yule-Walker and gradient descent and show our approach is, on average, both faster and more accurate.

Submitted 16 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: Accepted to ICASSP 2022

arXiv:2110.01436 [pdf, other]

WaveBeat: End-to-end beat and downbeat tracking in the time domain

Authors: Christian J. Steinmetz, Joshua D. Reiss

Abstract: Deep learning approaches for beat and downbeat tracking have brought advancements. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking operating directly on waveforms. This method forgoes engineered spectral features, and instead, produces beat and downbeat predictions directly from the waveform, the first of its kind for this task. Our model utilizes temporal convolutional networks (TCNs) operating on waveforms that achieve a very large receptive field ($\geq$ 30 s) at audio sample rates in a memory efficient manner by employing rapidly growing dilation factors with fewer layers. With a straightforward data augmentation strategy, our method outperforms previous state-of-the-art methods on some datasets, while producing comparable results on others, demonstrating the potential for time domain approaches.

Submitted 4 October, 2021; originally announced October 2021.

Comments: To appear at the 151st AES Convention

arXiv:2108.01170 [pdf, other]

doi 10.1103/PhysRevA.108.052408

Optimal qubit circuits for quantum-enhanced telescopes

Authors: Robert Czupryniak, John Steinmetz, Paul G. Kwiat, Andrew N. Jordan

Abstract: We propose two optimal phase-estimation schemes that can be used for quantum-enhanced long-baseline interferometry. By using distributed entanglement, it is possible to eliminate the loss of stellar photons during transmission over the baselines. The first protocol is a sequence of gates using nonlinear optical elements, optimized over all possible measurement schemes to saturate the Cramér-Rao bound. The second approach builds on an existing protocol, which encodes the time of arrival of the stellar photon into a quantum memory. Our modified version reduces both the number of ancilla qubits and the number of gate operations by a factor of two.

Submitted 15 November, 2023; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: 14 pages, 9 figures

Journal ref: Phys. Rev. A 108, 052408 (2023)

arXiv:2107.07503 [pdf, other]

Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech

Authors: Christian J. Steinmetz, Vamsi Krishna Ithapu, Paul Calamia

Abstract: Deep learning approaches have emerged that aim to transform an audio signal so that it sounds as if it was recorded in the same room as a reference recording, with applications both in audio post-production and augmented reality. In this work, we propose FiNS, a Filtered Noise Shaping network that directly estimates the time domain room impulse response (RIR) from reverberant speech. Our domain-inspired architecture features a time domain encoder and a filtered noise shaping decoder that models the RIR as a summation of decaying filtered noise signals, along with direct sound and early reflection components. Previous methods for acoustic matching utilize either large models to transform audio to match the target room or predict parameters for algorithmic reverberators. Instead, blind estimation of the RIR enables efficient and realistic transformation with a single convolution. An evaluation demonstrates our model not only synthesizes RIRs that match parameters of the target room, such as the $T_{60}$ and DRR, but also more accurately reproduces perceptual characteristics of the target room, as shown in a listening test when compared to deep learning baselines.

Submitted 15 July, 2021; originally announced July 2021.

Comments: Accepted to WASPAA 2021. See details at https://facebookresearch.github.io/FiNS/

arXiv:2105.00808 [pdf, ps, other]

doi 10.1103/PhysRevA.105.052229

Continuous measurement of a qudit using dispersively coupled radiation

Authors: John Steinmetz, Debmalya Das, Irfan Siddiqi, Andrew N. Jordan

Abstract: We analyze the continuous monitoring of a qudit coupled to a cavity using both phase-preserving and phase-sensitive amplification. The quantum trajectories of the system are described by a stochastic master equation, for which we derive the appropriate Lindblad operators. The measurement back-action causes spiraling in the state coordinates during collapse, which increases as the system levels become less distinguishable. We discuss two examples: a two-level system and an $N$-dimensional system and meter with rotational symmetry in the quadrature space. We also provide a comparison of the effects of phase-preserving and phase-sensitive detection on the master equation, and show that the average behavior is the same in both cases, but individual trajectories collapse at different rates depending on the measurement axis in the quadrature plane.

Submitted 4 June, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: 13 pages, 4 figures

Journal ref: Phys. Rev. A 105, 052229 (2022)

arXiv:2103.15752 [pdf, other]

doi 10.1364/OE.444216

Enhanced on-chip frequency measurement using weak value amplification

Authors: John Steinmetz, Kevin Lyons, Meiting Song, Jaime Cardenas, Andrew N. Jordan

Abstract: We present an integrated design to precisely measure optical frequency using weak value amplification with a multi-mode interferometer. The technique involves introducing a weak perturbation to the system and then post-selecting the data in such a way that the signal is amplified without amplifying the technical noise, as has previously been demonstrated in a free-space setup. We demonstrate the advantages of a Bragg grating with two band gaps for obtaining simultaneous, stable high transmission and high dispersion. We numerically model the interferometer in order to demonstrate the amplification effect. The device is shown to have advantages over both the free-space implementation and other methods of measuring optical frequency on a chip, such as an integrated Mach-Zehnder interferometer.

Submitted 29 March, 2021; originally announced March 2021.

Comments: 13 pages, 10 figures

arXiv:2102.06200 [pdf, other]

Efficient neural networks for real-time modeling of analog dynamic range compression

Authors: Christian J. Steinmetz, Joshua D. Reiss

Abstract: Deep learning approaches have demonstrated success in modeling analog audio effects. Nevertheless, challenges remain in modeling more complex effects that involve time-varying nonlinear elements, such as dynamic range compressors. Existing neural network approaches for modeling compression either ignore the device parameters, do not attain sufficient accuracy, or otherwise require large noncausal models prohibiting real-time operation. In this work, we propose a modification to temporal convolutional networks (TCNs) enabling greater efficiency without sacrificing performance. By utilizing very sparse convolutional kernels through rapidly growing dilations, our model attains a significant receptive field using fewer layers, reducing computation. Through a detailed evaluation we demonstrate our efficient and causal approach achieves state-of-the-art performance in modeling the analog LA-2A, is capable of real-time operation on CPU, and only requires 10 minutes of training data.

Submitted 15 April, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

Comments: Updated and will appear at 152nd AES Convention (note title change)

arXiv:2010.10291 [pdf, other]

Automatic multitrack mixing with a differentiable mixing console of neural audio effects

Authors: Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà

Abstract: Applications of deep learning to automatic multitrack mixing are largely unexplored. This is partly due to the limited available data, coupled with the fact that such data is relatively unstructured and variable. To address these challenges, we propose a domain-inspired model with a strong inductive bias for the mixing task. We achieve this with the application of pre-trained sub-networks and weight sharing, as well as with a sum/difference stereo loss function. The proposed model can be trained with a limited number of examples, is permutation invariant with respect to the input ordering, and places no limit on the number of input sources. Furthermore, it produces human-readable mixing parameters, allowing users to manually adjust or refine the generated mix. Results from a perceptual evaluation involving audio engineers indicate that our approach generates mixes that outperform baseline approaches. To the best of our knowledge, this work demonstrates the first approach in learning multitrack mixing conventions from real-world data at the waveform level, without knowledge of the underlying mixing parameters.

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2010.04237 [pdf, other]

Randomized Overdrive Neural Networks

Authors: Christian J. Steinmetz, Joshua D. Reiss

Abstract: By processing audio signals in the time-domain with randomly weighted temporal convolutional networks (TCNs), we uncover a wide range of novel, yet controllable overdrive effects. We discover that architectural aspects, such as the depth of the network, the kernel size, the number of channels, the activation function, as well as the weight initialization, all have a clear impact on the sonic character of the resultant effect, without the need for training. In practice, these effects range from conventional overdrive and distortion, to more extreme effects, as the receptive field grows, similar to a fusion of distortion, equalization, delay, and reverb. To enable use by musicians and producers, we provide a real-time plugin implementation. This allows users to dynamically design networks, listening to the results in real-time. We provide a demonstration and code at https://csteinmetz1.github.io/ronn.

Submitted 4 August, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

Comments: Updating project URL. Now https://csteinmetz1.github.io/ronn

arXiv:1803.07615 [pdf, other]

doi 10.1103/PhysRevA.98.012141

Chaos in Continuously Monitored Quantum Systems: An Optimal Path Approach

Authors: Philippe Lewalle, John Steinmetz, Andrew N. Jordan

Abstract: We predict that continuously monitored quantum dynamics can be chaotic. The optimal paths between past and future boundary conditions can diverge exponentially in time when there is time-dependent evolution and continuous weak monitoring. Optimal paths are defined by extremizing the global probability density to move between two boundary conditions. We investigate the onset of chaos in pure-state qubit systems with optimal paths generated by a periodic Hamiltonian. Specifically, chaotic quantum dynamics are demonstrated in a scheme where two non-commuting observables of a qubit are continuously monitored, and one measurement strength is periodically modulated. The optimal quantum paths in this example bear similarities to the trajectories of the kicked rotor, or standard map, which is a paradigmatic example of classical chaos. We emphasize connections with the concept of resonance between integrable optimal paths and weak periodic perturbations, as well as our previous work on "multipaths", and connect the optimal path chaos to instabilities in the underlying quantum trajectories.

Submitted 3 August, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

Comments: 12+11 pages, 11 figures. Supplemental Animations can be found at << https://drive.google.com/file/d/1cx__Aggt40s3r8ueTe8LlqZyAWLb5NV1/view?usp=sharing >> (all animations in .pdf format), or << https://drive.google.com/drive/folders/12LgI0dCiSjRYoHO9oiWDaXm7S0PzM7Y1?usp=sharing >> (as individual .mp4 files)

Journal ref: Phys. Rev. A 98, 012141 (2018)

arXiv:1512.03667 [pdf]

An Intuitively Complete Analysis of Godel's Incompleteness

Authors: Jason W. Steinmetz

Abstract: A detailed and rigorous analysis of Gödel's proof of his first incompleteness theorem is presented. The purpose of this analysis is two-fold. The first is to reveal what Gödel actually proved to provide a clear and solid foundation upon which to base future research. The second is to construct a coherent explication of Gödel's proof that is not only approachable by the non-specialist, but also brings to light the core principles underlying Gödel's proof.

Submitted 28 April, 2020; v1 submitted 8 December, 2015; originally announced December 2015.

Comments: 31 pages plus 2 appendices. In v2 multiple minor clarifications were made, two errors were fixed, and PDF bookmarks were added

MSC Class: 03F40 (Primary) 03F50; 03A99 (Secondary) ACM Class: F.4.1; I.2.3

arXiv:1110.1658

Algorithm that Solves 3-SAT in Polynomial Time

Authors: Jason W. Steinmetz

Abstract: The question of whether the complexity class P is equal to the complexity class NP has been a seemingly intractable problem for over 4 decades. It has been clear that if an algorithm existed that would solve the problems in the NP class in polynomial time then P would equal NP. However, no one has yet been able to create that algorithm or to successfully prove that such an algorithm cannot exist. The algorithm that will be presented in this paper solves the 3-satisfiability or 3-CNF-SAT problem, which has been proven to be NP-complete.

Submitted 2 June, 2015; v1 submitted 5 October, 2011; originally announced October 2011.

Comments: This paper has been withdrawn by the author because the integer operations within the algorithm cannot be proven to have a polynomial run time

ACM Class: F.1.3; I.1.2

Showing 1–32 of 32 results for author: Steinmetz, J