Showing 1–11 of 11 results for author: Meyer, B T

Searching in archive eess.
  1. Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios

    Authors: Nils L. Westhausen, Hendrik Kayser, Theresa Jansen, Bernd T. Meyer

    Abstract: Deep learning has the potential to enhance speech signals and increase their intelligibility for users of hearing aids. Deep models suited for real-world application should feature a low computational complexity and low processing delay of only a few milliseconds. In this paper, we explore deep speech enhancement that matches these requirements and contrast monaural and binaural processing algorit…

    Submitted 30 October, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: This work is published in IEEE/ACM TASLP. This version corresponds to the accepted version

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4596-4606, 2024

  2. arXiv:2312.05173

    eess.AS cs.SD

    Binaural multichannel blind speaker separation with a causal low-latency and low-complexity approach

    Authors: Nils L. Westhausen, Bernd T. Meyer

    Abstract: In this paper, we introduce a causal low-latency low-complexity approach for binaural multichannel blind speaker separation in noisy reverberant conditions. The model, referred to as Group Communication Binaural Filter and Sum Network (GCBFSnet), predicts complex filters for filter-and-sum beamforming in the time-frequency domain. We apply Group Communication (GC), i.e., latent model variable…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Accepted for publication at IEEE ICASSP 2024 OJSP track

  3. arXiv:2307.08858

    eess.AS

    Low bit rate binaural link for improved ultra low-latency low-complexity multichannel speech enhancement in Hearing Aids

    Authors: Nils L. Westhausen, Bernd T. Meyer

    Abstract: Speech enhancement in hearing aids is a challenging task since the hardware limits the number of possible operations and the latency needs to be in the range of only a few milliseconds. We propose a deep-learning model compatible with these limitations, which we refer to as Group-Communication Filter-and-Sum Network (GCFSnet). GCFSnet is a causal multiple-input single-output enhancement model usin…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted at WASPAA 2023

  4. arXiv:2304.09585

    eess.AS

    Multilingual Query-by-Example Keyword Spotting with Metric Learning and Phoneme-to-Embedding Mapping

    Authors: Paul M. Reuter, Christian Rollwage, Bernd T. Meyer

    Abstract: In this paper, we propose a multilingual query-by-example keyword spotting (KWS) system based on a residual neural network. The model is trained as a classifier on a multilingual keyword dataset extracted from Common Voice sentences and fine-tuned using circle loss. We demonstrate the generalization ability of the model to new languages and report a mean reduction in EER of 59.2 % for previously s…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: Accepted to ICASSP 2023

  5. arXiv:2204.01300

    eess.AS cs.SD

    tPLCnet: Real-time Deep Packet Loss Concealment in the Time Domain Using a Short Temporal Context

    Authors: Nils L. Westhausen, Bernd T. Meyer

    Abstract: This paper introduces a real-time time-domain packet loss concealment (PLC) neural-network (tPLCnet). It efficiently predicts lost frames from a short context buffer in a sequence-to-one (seq2one) fashion. Because of its seq2one structure, a continuous inference of the model is not required since it can be triggered when packet loss is actually detected. It is trained on 64 h of open-source speech…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  6. arXiv:2203.09148

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Prediction of speech intelligibility with DNN-based performance measures

    Authors: Angel Mario Castro Martinez, Constantin Spille, Jana Roßbach, Birger Kollmeier, Bernd T. Meyer

    Abstract: This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. This model does not require the clean speech reference nor the word labels during testing as the ASR decoding step, which finds the most likely sequence…

    Submitted 17 March, 2022; originally announced March 2022.

    Journal ref: Computer Speech & Language, 74, p.101329 (2022)

  7. arXiv:2111.01914

    eess.AS cs.SD

    Reduction of Subjective Listening Effort for TV Broadcast Signals with Recurrent Neural Networks

    Authors: Nils L. Westhausen, Rainer Huber, Hannah Baumgartner, Ragini Sinha, Jan Rennies, Bernd T. Meyer

    Abstract: Listening to the audio of TV broadcast signals can be challenging for hearing-impaired as well as normal-hearing listeners, especially when background sounds are prominent or too loud compared to the speech signal. This can result in a reduced satisfaction and increased listening effort of the listeners. Since the broadcast sound is usually premixed, we perform a subjective evaluation for quantify…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. This version is the authors' version and may vary from the final publication in details

  8. arXiv:2010.14337

    eess.AS

    Acoustic echo cancellation with the dual-signal transformation LSTM network

    Authors: Nils L. Westhausen, Bernd T. Meyer

    Abstract: This paper applies the dual-signal transformation LSTM network (DTLN) to the task of real-time acoustic echo cancellation (AEC). The DTLN combines a short-time Fourier transformation and a learned feature representation in a stacked network approach, which enables robust information processing in the time-frequency and in the time domain, which also includes phase information. The model is only tr…

    Submitted 23 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: Submitted to ICASSP 2021

  9. EEG-based Auditory Attention Decoding: Towards Neuro-Steered Hearing Devices

    Authors: Simon Geirnaert, Servaas Vandecappelle, Emina Alickovic, Alain de Cheveigné, Edmund Lalor, Bernd T. Meyer, Sina Miran, Tom Francart, Alexander Bertrand

    Abstract: People suffering from hearing impairment often have difficulties participating in conversations in so-called 'cocktail party' scenarios with multiple people talking simultaneously. Although advanced algorithms exist to suppress background noise in these situations, a hearing device also needs information on which of these speakers the user actually aims to attend to. The correct (attended) speaker…

    Submitted 23 April, 2021; v1 submitted 11 August, 2020; originally announced August 2020.

  10. arXiv:2005.07551

    eess.AS cs.SD

    Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression

    Authors: Nils L. Westhausen, Bernd T. Meyer

    Abstract: This paper introduces a dual-signal transformation LSTM network (DTLN) for real-time speech enhancement as part of the Deep Noise Suppression Challenge (DNS-Challenge). This approach combines a short-time Fourier transform (STFT) and a learned analysis and synthesis basis in a stacked-network approach with less than one million parameters. The model was trained on 500 h of noisy speech provided by…

    Submitted 22 October, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

    Comments: Accepted by Interspeech 2020

  11. DNN-Based Speech Presence Probability Estimation for Multi-Frame Single-Microphone Speech Enhancement

    Authors: Marvin Tammen, Dörte Fischer, Bernd T. Meyer, Simon Doclo

    Abstract: Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-power-distortionless-response (MFMPDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, it has been shown that multi-frame approaches achieve a substantial noise reduction with hardly any speech distortion, pro…

    Submitted 14 November, 2022; v1 submitted 21 May, 2019; originally announced May 2019.