Showing 1–4 of 4 results for author: Kumakura, T

  1. arXiv:2201.09427  [pdf, other]

    eess.AS cs.SD

    Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end

    Authors: Rem Hida, Masaki Hamada, Chie Kamada, Emiru Tsunoo, Toshiyuki Sekiya, Toshiyuki Kumakura

    Abstract: Although end-to-end text-to-speech (TTS) models can generate natural speech, challenges still remain when it comes to estimating sentence-level phonetic and prosodic information from raw text in Japanese TTS systems. In this paper, we propose a method for polyphone disambiguation (PD) and accent prediction (AP). The proposed method incorporates explicit features extracted from morphological analys…

    Submitted 23 January, 2022; originally announced January 2022.

    Comments: 5 pages, 2 figures. Accepted to ICASSP2022

  2. arXiv:1910.11871  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Towards Online End-to-end Transformer Automatic Speech Recognition

    Authors: Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe

    Abstract: The Transformer self-attention network has recently shown promising performance as an alternative to recurrent neural networks in end-to-end (E2E) automatic speech recognition (ASR) systems. However, Transformer has a drawback in that the entire input sequence is required to compute self-attention. We have proposed a block processing method for the Transformer encoder by introducing a context-awar…

    Submitted 25 October, 2019; originally announced October 2019.

    Comments: arXiv admin note: text overlap with arXiv:1910.07204

  3. arXiv:1910.07204  [pdf, ps, other]

    eess.AS cs.CL

    Transformer ASR with Contextual Block Processing

    Authors: Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe

    Abstract: The Transformer self-attention network has recently shown promising performance as an alternative to recurrent neural networks (RNNs) in end-to-end (E2E) automatic speech recognition (ASR) systems. However, the Transformer has a drawback in that the entire input sequence is required to compute self-attention. In this paper, we propose a new block processing method for the Transformer encoder by in…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: Accepted for ASRU 2019

  4. arXiv:1905.07149  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System

    Authors: Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura

    Abstract: An on-device DNN-HMM speech recognition system efficiently works with a limited vocabulary in the presence of a variety of predictable noise. In such a case, vocabulary and environment adaptation is highly effective. In this paper, we propose a novel method of end-to-end (E2E) adaptation, which adjusts not only an acoustic model (AM) but also a weighted finite-state transducer (WFST). We convert a…

    Submitted 24 June, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

    Comments: accepted for Interspeech 2019