Showing 1–5 of 5 results for author: Wiesler, S

  1. arXiv:2306.06954  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition

    Authors: Belen Alastruey, Lukas Drude, Jahn Heymann, Simon Wiesler

    Abstract: Convolutional frontends are a typical choice for Transformer-based automatic speech recognition to preprocess the spectrogram, reduce its sequence length, and combine local information in time and frequency similarly. However, the width and height of an audio spectrogram denote different information, e.g., due to reverberation as well as the articulatory system, the time axis has a clear left-to-r…

    Submitted 12 June, 2023; originally announced June 2023.

  2. arXiv:2210.16238  [pdf, ps, other]

    eess.AS cs.LG cs.SD eess.SP

    Contextual-Utterance Training for Automatic Speech Recognition

    Authors: Alejandro Gomez-Alanis, Lukas Drude, Andreas Schwarz, Rupak Vignesh Swaminathan, Simon Wiesler

    Abstract: Recent studies of streaming automatic speech recognition (ASR) recurrent neural network transducer (RNN-T)-based systems have fed the encoder with past contextual information in order to improve its word error rate (WER) performance. In this paper, we first propose a contextual-utterance training technique which makes use of the previous and future contextual utterances in order to do an implicit…

    Submitted 27 October, 2022; originally announced October 2022.

  3. arXiv:2011.10538  [pdf, other]

    eess.AS cs.SD

    Improving RNN-T ASR Accuracy Using Context Audio

    Authors: Andreas Schwarz, Ilya Sklyar, Simon Wiesler

    Abstract: We present a training scheme for streaming automatic speech recognition (ASR) based on recurrent neural network transducers (RNN-T) which allows the encoder network to learn to exploit context audio from a stream, using segmented or partially labeled sequences of the stream during training. We show that the use of context audio during training and inference can lead to word error rate reductions o…

    Submitted 15 June, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

  4. arXiv:2008.04034  [pdf, other]

    eess.AS cs.SD

    Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition

    Authors: Egor Lakomkin, Jahn Heymann, Ilya Sklyar, Simon Wiesler

    Abstract: Subwords are the most widely used output units in end-to-end speech recognition. They combine the best of two worlds by modeling the majority of frequent words directly and at the same time allow open vocabulary speech recognition by backing off to shorter units or characters to construct words unseen during training. However, mapping text to subwords is ambiguous and often multiple segmentation v…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted at Interspeech 2020

  5. arXiv:1611.06148  [pdf, other]

    stat.ML cs.LG cs.NE

    Compacting Neural Network Classifiers via Dropout Training

    Authors: Yotaro Kubo, George Tucker, Simon Wiesler

    Abstract: We introduce dropout compaction, a novel method for training feed-forward neural networks which realizes the performance gains of training a large model with dropout regularization, yet extracts a compact neural network for run-time efficiency. In the proposed method, we introduce a sparsity-inducing prior on the per unit dropout retention probability so that the optimizer can effectively prune hi…

    Submitted 24 May, 2017; v1 submitted 18 November, 2016; originally announced November 2016.

    Comments: Submitted to AISTATS 2017 (a short version was accepted to the NIPS Workshop on Efficient Methods for Deep Neural Networks)