Showing 1–10 of 10 results for author: Strauss, M

Searching in archive eess.
  1. FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates

    Authors: Nicola Pia, Martin Strauss, Markus Multrus, Bernd Edler

    Abstract: This paper introduces FlowMAC, a novel neural audio codec for high-quality general audio compression at low bit rates based on conditional flow matching (CFM). FlowMAC jointly learns a mel spectrogram encoder, quantizer and decoder. At inference time the decoder integrates a continuous normalizing flow via an ODE solver to generate a high-quality mel spectrogram. This is the first time that a CFM-…

    Submitted 6 April, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  2. arXiv:2408.12982  [pdf, other]

    eess.AS

    Inference-Adaptive Neural Steering for Real-Time Area-Based Sound Source Separation

    Authors: Martin Strauss, Wolfgang Mack, María Luis Valero, Okan Köpüklü

    Abstract: We propose a novel Neural Steering technique that adapts the target area of a spatial-aware multi-microphone sound source separation algorithm during inference without the necessity of retraining the deep neural network (DNN). To achieve this, we first train a DNN aiming to retain speech within a target region, defined by an angular span, while suppressing sound sources stemming from other directi…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  3. arXiv:2408.09810  [pdf, other]

    eess.AS cs.SD

    Efficient Area-based and Speaker-Agnostic Source Separation

    Authors: Martin Strauss, Okan Köpüklü

    Abstract: This paper introduces an area-based source separation method designed for virtual meeting scenarios. The aim is to preserve speech signals from an unspecified number of sources within a defined spatial area in front of a linear microphone array, while suppressing all other sounds. Therefore, we employ an efficient neural network architecture adapted for multi-channel input to encompass the predefi…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Preprint. Accepted to the International Workshop on Acoustic Signal Enhancement (IWAENC 2024)

  4. SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

    Authors: Martin Strauss, Nicola Pia, Nagashree K. S. Rao, Bernd Edler

    Abstract: This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech conditioned on noisy speech using a Normalizing Flow (NF) as generator in a GAN framework. While the combination of likelihood models and GANs is not trivial, SEFG…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

  5. arXiv:2305.19100  [pdf, other]

    eess.AS cs.SD

    Predicting Preferred Dialogue-to-Background Loudness Difference in Dialogue-Separated Audio

    Authors: Luca Resti, Martin Strauss, Matteo Torcoli, Emanuël Habets, Bernd Edler

    Abstract: Dialogue Enhancement (DE) enables the rebalancing of dialogue and background sounds to fit personal preferences and needs in the context of broadcast audio. When individual audio stems are unavailable from production, Dialogue Separation (DS) can be applied to the final audio mixture to obtain estimates of these stems. This work focuses on Preferred Loudness Differences (PLDs) between dialogue and…

    Submitted 31 May, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Paper accepted at the 15th International Conference on Quality of Multimedia Experience (QoMEX), 4 pages, 2 figures

  6. arXiv:2305.08812  [pdf, other]

    cs.LO cs.SE eess.SY

    Slow Down, Move Over: A Case Study in Formal Verification, Refinement, and Testing of the Responsibility-Sensitive Safety Model for Self-Driving Cars

    Authors: Megan Strauss, Stefan Mitsch

    Abstract: Technology advances give us the hope of driving without human error, reducing vehicle emissions and simplifying an everyday task with the future of self-driving cars. Making sure these vehicles are safe is very important to the continuation of this field. In this paper, we formalize the Responsibility-Sensitive Safety model (RSS) for self-driving cars and prove the safety and optimality of this mo…

    Submitted 15 May, 2023; originally announced May 2023.

  7. arXiv:2210.11654  [pdf, other]

    eess.AS cs.SD

    Improved Normalizing Flow-Based Speech Enhancement using an All-pole Gammatone Filterbank for Conditional Input Representation

    Authors: Martin Strauss, Matteo Torcoli, Bernd Edler

    Abstract: Deep generative models for Speech Enhancement (SE) have received increasing attention in recent years. The most prominent examples are Generative Adversarial Networks (GANs), while normalizing flows (NF) have received less attention despite their potential. Building on previous work, architectural modifications are proposed, along with an investigation of different conditional input representations. Despite…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted for Presentation at IEEE SLT 2022

  8. arXiv:2106.09093  [pdf, other]

    eess.AS cs.SD

    A Hands-on Comparison of DNNs for Dialog Separation Using Transfer Learning from Music Source Separation

    Authors: Martin Strauss, Jouni Paulus, Matteo Torcoli, Bernd Edler

    Abstract: This paper describes a hands-on comparison on using state-of-the-art music source separation deep neural networks (DNNs) before and after task-specific fine-tuning for separating speech content from non-speech content in broadcast audio (i.e., dialog separation). The music separation models are selected as they share the number of channels (2) and sampling rate (44.1 kHz or higher) with the consid…

    Submitted 22 June, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: accepted in INTERSPEECH 2021

  9. A Flow-Based Neural Network for Time Domain Speech Enhancement

    Authors: Martin Strauss, Bernd Edler

    Abstract: Speech enhancement involves the distinction of a target speech signal from an intrusive background. Although generative approaches using Variational Autoencoders or Generative Adversarial Networks (GANs) have increasingly been used in recent years, normalizing flow (NF) based systems are still scarce, despite their success in related fields. Thus, in this paper we propose an NF framework to directl…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to ICASSP 2021

  10. arXiv:1907.04655  [pdf, other]

    eess.SP cs.SD eess.AS

    Audio-Based Search and Rescue with a Drone: Highlights from the IEEE Signal Processing Cup 2019 Student Competition

    Authors: Antoine Deleforge, Diego Di Carlo, Martin Strauss, Romain Serizel, Lucio Marcenaro

    Abstract: Unmanned aerial vehicles (UAV), commonly referred to as drones, have raised increasing interest in recent years. Search and rescue scenarios where humans in emergency situations need to be quickly found in areas difficult to access constitute an important field of application for this technology. While research efforts have mostly focused on developing video-based solutions for this task \cite{lop…

    Submitted 3 July, 2019; originally announced July 2019.

    Journal ref: IEEE Signal Processing Magazine, Institute of Electrical and Electronics Engineers, In press