Showing 1–18 of 18 results for author: Casebeer, J

Searching in archive eess.
  1. arXiv:2506.00681  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Learning to Upsample and Upmix Audio in the Latent Domain

    Authors: Dimitrios Bralios, Paris Smaragdis, Jonah Casebeer

    Abstract: Neural audio autoencoders create compact latent representations that preserve perceptually important information, serving as the foundation for both modern audio compression systems and generation approaches like next-token prediction and latent diffusion. Despite their prevalence, most audio processing operations, such as spatial and spectral up-sampling, still inefficiently operate on raw wavefo…

    Submitted 31 May, 2025; originally announced June 2025.

  2. arXiv:2410.05167  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Presto! Distilling Steps and Layers for Accelerating Music Generation

    Authors: Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan

    Abstract: Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion mo…

    Submitted 16 April, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted as Spotlight at ICLR 2025

  3. arXiv:2403.00977  [pdf, other]

    cs.SD eess.AS

    Scaling Up Adaptive Filter Optimizers

    Authors: Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

    Abstract: We introduce a new online adaptive filtering method called supervised multi-step adaptive filters (SMS-AF). Our method uses neural networks to control or optimize linear multi-delay or multi-channel frequency-domain filters and can flexibly scale-up performance at the cost of increased compute -- a property rarely addressed in the AF literature, but critical for many applications. To do so, we ext…

    Submitted 1 March, 2024; originally announced March 2024.

  4. arXiv:2312.10605  [pdf, other]

    cs.SD eess.AS

    Meta-AF Echo Cancellation for Improved Keyword Spotting

    Authors: Jonah Casebeer, Junkai Wu, Paris Smaragdis

    Abstract: Adaptive filters (AFs) are vital for enhancing the performance of downstream tasks, such as speech recognition, sound event detection, and keyword spotting. However, traditional AF design prioritizes isolated signal-level objectives, often overlooking downstream task performance. This can lead to suboptimal performance. Recent research has leveraged meta-learning to automatically learn AF update r…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 5 pages, 2 figures, ICASSP 2024

  5. arXiv:2209.09955  [pdf, other]

    cs.SD eess.AS

    Meta-Learning for Adaptive Filters with Higher-Order Frequency Dependencies

    Authors: Junkai Wu, Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

    Abstract: Adaptive filters are applicable to many signal processing tasks including acoustic echo cancellation, beamforming, and more. Adaptive filters are typically controlled using algorithms such as least-mean squares (LMS), recursive least squares (RLS), or Kalman filter updates. Such models are often applied in the frequency domain, assume frequency independent processing, and do not exploit higher-order…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: Source code and audio examples: https://jmcasebeer.github.io/metaaf/higher-order

  6. arXiv:2204.11942  [pdf, other]

    cs.SD eess.AS eess.SP

    Meta-AF: Meta-Learning for Adaptive Filters

    Authors: Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

    Abstract: Adaptive filtering algorithms are pervasive throughout signal processing and have had a material impact on a wide variety of domains including audio processing, telecommunications, biomedical sensing, astrophysics and cosmology, seismology, and many more. Adaptive filters typically operate via specialized online, iterative optimization methods such as least-mean squares or recursive least squares…

    Submitted 21 November, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: Accepted to ACM/IEEE TASLP. Source code and audio examples: https://jmcasebeer.github.io/projects/metaaf

  7. arXiv:2112.04613  [pdf, other]

    cs.SD eess.AS

    NICE-Beam: Neural Integrated Covariance Estimators for Time-Varying Beamformers

    Authors: Jonah Casebeer, Jacob Donley, Daniel Wong, Buye Xu, Anurag Kumar

    Abstract: Estimating a time-varying spatial covariance matrix for a beamforming algorithm is a challenging task, especially for wearable devices, as the algorithm must compensate for time-varying signal statistics due to rapid pose-changes. In this paper, we propose Neural Integrated Covariance Estimators for Beamformers, NICE-Beam. NICE-Beam is a general technique for learning how to estimate time-varying…

    Submitted 8 December, 2021; originally announced December 2021.

  8. arXiv:2110.04284  [pdf, other]

    cs.SD eess.AS

    Auto-DSP: Learning to Optimize Acoustic Echo Cancellers

    Authors: Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

    Abstract: Adaptive filtering algorithms are commonplace in signal processing and have wide-ranging applications from single-channel denoising to multi-channel acoustic echo cancellation and adaptive beamforming. Such algorithms typically operate via specialized online, iterative optimization methods and have achieved tremendous success, but require expert knowledge, are slow to develop, and are difficult to…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Accepted to the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Source code and audio examples: https://jmcasebeer.github.io/projects/auto-dsp/

  9. arXiv:2105.07596  [pdf, other]

    cs.SD eess.AS

    Sound Event Detection with Adaptive Frequency Selection

    Authors: Zhepei Wang, Jonah Casebeer, Adam Clemmitt, Efthymios Tzinis, Paris Smaragdis

    Abstract: In this work, we present HIDACT, a novel network architecture for adaptive computation for efficiently recognizing acoustic events. We evaluate the model on a sound event detection task where we train it to adaptively process frequency bands. The model learns to adapt to the input without requesting all frequency sub-bands provided. It can make confident predictions within fewer processing steps,…

    Submitted 29 July, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: Accepted by IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2021

  10. arXiv:2105.04727  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Separate but Together: Unsupervised Federated Learning for Speech Enhancement from Non-IID Data

    Authors: Efthymios Tzinis, Jonah Casebeer, Zhepei Wang, Paris Smaragdis

    Abstract: We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients. We simulate a real-world scenario where each client only has access to a few noisy recordings from a limited and disjoint number of speakers (hence non-IID). Each client trains their model in isolation using mixture invariant training…

    Submitted 26 September, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted to WASPAA 21

    Journal ref: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

  11. arXiv:2102.06610  [pdf, other]

    eess.AS cs.LG

    Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders

    Authors: Jonah Casebeer, Vinjai Vale, Umut Isik, Jean-Marc Valin, Ritwik Giri, Arvindh Krishnaswamy

    Abstract: Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Based on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompa…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, ICASSP 2021

  12. arXiv:2011.07348  [pdf, other]

    cs.SD eess.AS

    Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays

    Authors: Jonah Casebeer, Jamshed Kaikaus, Paris Smaragdis

    Abstract: In this paper, we present a method for jointly-learning a microphone selection mechanism and a speech enhancement network for multi-channel speech enhancement with an ad-hoc microphone array. The attention-based microphone selection mechanism is trained to reduce communication-costs through a penalty term which represents a task-performance/communication-cost trade-off. While working within the t…

    Submitted 21 April, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

    Comments: 5 pages, 4 figures, ICASSP 2021

  13. arXiv:2002.09286  [pdf, other]

    eess.AS cs.LG cs.NE cs.SD stat.ML

    Efficient Trainable Front-Ends for Neural Speech Enhancement

    Authors: Jonah Casebeer, Umut Isik, Shrikant Venkataramani, Arvindh Krishnaswamy

    Abstract: Many neural speech enhancement and source separation systems operate in the time-frequency domain. Such models often benefit from making their Short-Time Fourier Transform (STFT) front-ends trainable. In current literature, these are implemented as large Discrete Fourier Transform matrices, which are prohibitively inefficient for low-compute systems. We present an efficient, trainable front-end ba…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 5 pages, 5 figures, ICASSP 2020

  14. arXiv:1905.01391  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Deep Tensor Factorization for Spatially-Aware Scene Decomposition

    Authors: Jonah Casebeer, Michael Colomb, Paris Smaragdis

    Abstract: We propose a completely unsupervised method to understand audio scenes observed with random microphone arrangements by decomposing the scene into its constituent sources and their relative presence in each microphone. To this end, we formulate a neural network architecture that can be interpreted as a nonnegative tensor factorization of a multi-channel audio recording. By clustering on the learned…

    Submitted 26 September, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

    Comments: 5 pages, 5 figures, accepted to WASPAA 2019

  15. arXiv:1811.07065  [pdf, other]

    eess.AS cs.SD eess.SP

    Multipath-enabled private audio with noise

    Authors: Anadi Chaman, Yu-Jeh Liu, Jonah Casebeer, Ivan Dokmanić

    Abstract: We address the problem of privately communicating audio messages to multiple listeners in a reverberant room using a set of loudspeakers. We propose two methods based on emitting noise. In the first method, the loudspeakers emit noise signals that are appropriately filtered so that after echoing along multiple paths in the room, they sum up and descramble to yield distinct meaningful audio message…

    Submitted 13 March, 2019; v1 submitted 16 November, 2018; originally announced November 2018.

  16. arXiv:1811.01251  [pdf, other]

    cs.SD eess.AS

    Multi-View Networks For Multi-Channel Audio Classification

    Authors: Jonah Casebeer, Zhepei Wang, Paris Smaragdis

    Abstract: In this paper we introduce the idea of multi-view networks for sound classification with multiple sensors. We show how one can build a multi-channel sound recognition model trained on a fixed number of channels, and deploy it to scenarios with arbitrary (and potentially dynamically changing) number of input channels and not observe degradation in performance. We demonstrate that at inference time…

    Submitted 26 February, 2019; v1 submitted 3 November, 2018; originally announced November 2018.

    Comments: 5 pages, 7 figures, Accepted to ICASSP 2019

  17. arXiv:1809.05862  [pdf, other]

    cs.SD eess.AS

    Cocktails, but no party: multipath-enabled private audio

    Authors: Yu-Jeh Liu, Jonah Casebeer, Ivan Dokmanić

    Abstract: We describe a private audio messaging system that uses echoes to unscramble messages at a few predetermined locations in a room. The system works by splitting the audio into short chunks and emitting them from different loudspeakers. The chunks are filtered so that as they echo around the room, they sum to noise everywhere except at a few chosen focusing spots where they exactly reproduce the inte…

    Submitted 16 September, 2018; originally announced September 2018.

  18. arXiv:1806.05296  [pdf, other]

    eess.AS cs.SD

    Multi-View Networks for Denoising of Arbitrary Numbers of Channels

    Authors: Jonah Casebeer, Brian Luc, Paris Smaragdis

    Abstract: We propose a set of denoising neural networks capable of operating on an arbitrary number of channels at runtime, irrespective of how many channels they were trained on. We coin the proposed models multi-view networks since they operate using multiple views of the same data. We explore two such architectures and show how they outperform traditional denoising models in multi-channel scenarios. Addi…

    Submitted 23 July, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: 5 pages, 6 figures, IWAENC 2018