
Showing 1–5 of 5 results for author: Hilmes, B

Searching in archive eess.
  1. arXiv:2509.10031  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    Unified Learnable 2D Convolutional Feature Extraction for ASR

    Authors: Peter Vieting, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

    Abstract: Neural front-ends represent a promising approach to feature extraction for automatic speech recognition (ASR) systems, as they make it possible to learn features specifically tailored to different tasks. Yet, many of the existing techniques remain heavily influenced by classical methods. While this inductive bias may ease the system design, our work aims to develop a more generic front-end for feature extra…

    Submitted 12 September, 2025; originally announced September 2025.

    Comments: Accepted at ITG Conference on Speech Communication 2025

  2. arXiv:2506.09804  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    Regularizing Learnable Feature Extraction for Automatic Speech Recognition

    Authors: Peter Vieting, Maximilian Kannen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

    Abstract: Neural front-ends are an appealing alternative to traditional, fixed feature extraction pipelines for automatic speech recognition (ASR) systems since they can be directly trained to fit the acoustic model. However, their performance often falls short compared to classical methods, which we show is largely due to their increased susceptibility to overfitting. This work therefore investigates regul…

    Submitted 30 September, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

    Comments: Proceedings of Interspeech 2025

  3. arXiv:2506.01503  [pdf, ps, other]

    cs.LG cs.SD eess.AS

    Analyzing the Importance of Blank for CTC-Based Knowledge Distillation

    Authors: Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter

    Abstract: With the rise of large pre-trained foundation models for automatic speech recognition, new challenges appear. While the performance of these models is good, the runtime and cost of inference increase. One approach to make use of their strength while retaining efficiency is to distill their knowledge to smaller models during training. In this work, we explore different CTC-based distillation variants,…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted for Interspeech 2025

  4. arXiv:2407.17997  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures

    Authors: Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter

    Abstract: In this work, we evaluate the utility of synthetic data for training automatic speech recognition (ASR). We use the ASR training data to train a text-to-speech (TTS) system similar to FastSpeech-2. With this TTS, we reproduce the original training data, training ASR systems solely on synthetic data. For ASR, we use three different architectures: attention-based encoder-decoder, hybrid deep neural ne…

    Submitted 26 October, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: Published at the SynData4GenAI 2024 workshop

  5. arXiv:2310.08132  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition

    Authors: Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter

    Abstract: Synthetic data generated by text-to-speech (TTS) systems can be used to improve automatic speech recognition (ASR) systems in low-resource or domain-mismatch tasks. It has been shown that TTS-generated outputs still do not have the same qualities as real data. In this work, we focus on the temporal structure of synthetic data and its relation to ASR training. By using a novel oracle setup, we show h…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: To appear at ASRU 2023