Showing 1–6 of 6 results for author: Westhausen, N L

  1. Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios

    Authors: Nils L. Westhausen, Hendrik Kayser, Theresa Jansen, Bernd T. Meyer

    Abstract: Deep learning has the potential to enhance speech signals and increase their intelligibility for users of hearing aids. Deep models suited for real-world application should feature a low computational complexity and low processing delay of only a few milliseconds. In this paper, we explore deep speech enhancement that matches these requirements and contrast monaural and binaural processing algorit…

    Submitted 30 October, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: This work is published in IEEE/ACM TASLP. This version corresponds to the accepted version

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4596-4606, 2024

  2. arXiv:2312.05173  [pdf, other]

    eess.AS cs.SD

    Binaural multichannel blind speaker separation with a causal low-latency and low-complexity approach

    Authors: Nils L. Westhausen, Bernd T. Meyer

    Abstract: In this paper, we introduce a causal low-latency low-complexity approach for binaural multichannel blind speaker separation in noisy reverberant conditions. The model, referred to as Group Communication Binaural Filter and Sum Network (GCBFSnet), predicts complex filters for filter-and-sum beamforming in the time-frequency domain. We apply Group Communication (GC), i.e., latent model variable…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Accepted for publication at IEEE ICASSP 2024 OJSP track

  3. arXiv:2204.01300  [pdf, other]

    eess.AS cs.SD

    tPLCnet: Real-time Deep Packet Loss Concealment in the Time Domain Using a Short Temporal Context

    Authors: Nils L. Westhausen, Bernd T. Meyer

    Abstract: This paper introduces a real-time time-domain packet loss concealment (PLC) neural network (tPLCnet). It efficiently predicts lost frames from a short context buffer in a sequence-to-one (seq2one) fashion. Because of its seq2one structure, continuous inference of the model is not required, since it can be triggered when packet loss is actually detected. It is trained on 64 h of open-source speech…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  4. arXiv:2111.02351  [pdf, other]

    cs.SD cs.LG eess.AS

    Weight, Block or Unit? Exploring Sparsity Tradeoffs for Speech Enhancement on Tiny Neural Accelerators

    Authors: Marko Stamenovic, Nils L. Westhausen, Li-Chia Yang, Carl Jensen, Alex Pawlicki

    Abstract: We explore network sparsification strategies with the aim of compressing neural speech enhancement (SE) down to an optimal configuration for a new generation of low-power microcontroller-based neural accelerators (microNPUs). We examine three unique sparsity structures: weight pruning, block pruning and unit pruning; and discuss their benefits and drawbacks when applied to SE. We focus on the int…

    Submitted 9 November, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: To appear in NeurIPS 2021 Efficient Natural Language and Speech Processing Workshop as oral-spotlight presentation

  5. arXiv:2111.01914  [pdf, other]

    eess.AS cs.SD

    Reduction of Subjective Listening Effort for TV Broadcast Signals with Recurrent Neural Networks

    Authors: Nils L. Westhausen, Rainer Huber, Hannah Baumgartner, Ragini Sinha, Jan Rennies, Bernd T. Meyer

    Abstract: Listening to the audio of TV broadcast signals can be challenging for hearing-impaired as well as normal-hearing listeners, especially when background sounds are prominent or too loud compared to the speech signal. This can result in reduced satisfaction and increased listening effort for the listeners. Since the broadcast sound is usually premixed, we perform a subjective evaluation for quantify…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. This version is the authors' version and may vary from the final publication in details

  6. arXiv:2005.07551  [pdf, other]

    eess.AS cs.SD

    Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression

    Authors: Nils L. Westhausen, Bernd T. Meyer

    Abstract: This paper introduces a dual-signal transformation LSTM network (DTLN) for real-time speech enhancement as part of the Deep Noise Suppression Challenge (DNS-Challenge). This approach combines a short-time Fourier transform (STFT) and a learned analysis and synthesis basis in a stacked-network approach with less than one million parameters. The model was trained on 500 h of noisy speech provided by…

    Submitted 22 October, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

    Comments: Accepted by Interspeech 2020