Showing 1–4 of 4 results for author: Nakagome, Y

  1. arXiv:2506.01263  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing

    Authors: Yu Nakagome, Michael Hentschel

    Abstract: Despite recent advances in end-to-end speech recognition methods, the output tends to be biased to the training data's vocabulary, resulting in inaccurate recognition of proper nouns and other unknown terms. To address this issue, we propose a method to improve recognition accuracy of such rare words in CTC-based models without additional training or text-to-speech systems. Specifically, keyword s…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  2. arXiv:2406.14890  [pdf, other]

    cs.CL eess.AS

    InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions

    Authors: Yu Nakagome, Michael Hentschel

    Abstract: Despite recent advances in end-to-end speech recognition methods, their output is biased to the training data's vocabulary, resulting in inaccurate recognition of unknown terms or proper nouns. To improve the recognition accuracy for a given set of such terms, we propose an adaptation parameter-free approach based on Self-conditioned CTC. Our method improves the recognition accuracy of misrecogniz…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  3. arXiv:2204.00174  [pdf, other]

    cs.CL cs.SD eess.AS

    InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

    Authors: Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida

    Abstract: This paper proposes InterAug: a novel training method for CTC-based ASR using augmented intermediate representations for conditioning. The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning with "noisy" intermediate predictions. During the training, intermediate predictions are changed to incorrect intermediate predictions, and fed in…

    Submitted 31 March, 2022; originally announced April 2022.

    Comments: This paper was submitted to INTERSPEECH2022

  4. arXiv:1911.04228  [pdf, ps, other]

    eess.AS cs.SD

    Unsupervised Training for Deep Speech Source Separation with Kullback-Leibler Divergence Based Probabilistic Loss Function

    Authors: Masahito Togami, Yoshiki Masuyama, Tatsuya Komatsu, Yu Nakagome

    Abstract: In this paper, we propose a multi-channel speech source separation method with a deep neural network (DNN) that is trained under the condition that no clean signal is available. As an alternative to a clean signal, the proposed method adopts a speech signal estimated by unsupervised speech source separation with a statistical model. As a statistical model of microphone input signal, we adopt a time…

    Submitted 11 November, 2019; originally announced November 2019.