Showing 1–5 of 5 results for author: Omachi, M

Searching in archive eess.
  1. arXiv:2211.05967  [pdf, ps, other]

    cs.CL eess.AS

    Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

    Authors: Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe

    Abstract: The black-box nature of end-to-end speech translation (E2E ST) systems makes it difficult to understand how source language inputs are being mapped to the target language. To solve this problem, we would like to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word. A major challenge arises f…

    Submitted 10 November, 2022; originally announced November 2022.

  2. arXiv:2012.10128  [pdf, other]

    eess.AS

    Toward Streaming ASR with Non-Autoregressive Insertion-based Model

    Authors: Yuya Fujita, Tianzi Wang, Shinji Watanabe, Motoi Omachi

    Abstract: Neural end-to-end (E2E) models have become a promising technique to realize practical automatic speech recognition (ASR) systems. When realizing such a system, one important issue is the segmentation of audio to deal with streaming input or long recording. After audio segmentation, the ASR model with a small real-time factor (RTF) is preferable because the latency of the system can be faster. Rece…

    Submitted 16 July, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

  3. arXiv:2005.13211  [pdf, other]

    eess.AS cs.SD

    Insertion-Based Modeling for End-to-End Automatic Speech Recognition

    Authors: Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chang

    Abstract: End-to-end (E2E) models have gained attention in the research field of automatic speech recognition (ASR). Many E2E models proposed so far assume left-to-right autoregressive generation of an output token sequence except for connectionist temporal classification (CTC) and its variants. However, left-to-right decoding cannot consider the future output context, and it is not always optimal for ASR.…

    Submitted 15 November, 2020; v1 submitted 27 May, 2020; originally announced May 2020.

    Comments: INTERSPEECH 2020

  4. arXiv:1912.11793  [pdf, ps, other]

    eess.AS

    Attention-based ASR with Lightweight and Dynamic Convolutions

    Authors: Yuya Fujita, Aswin Shanmugam Subramanian, Motoi Omachi, Shinji Watanabe

    Abstract: End-to-end (E2E) automatic speech recognition (ASR) with sequence-to-sequence models has gained attention because of its simple model training compared with conventional hidden Markov model based ASR. Recently, several studies report the state-of-the-art E2E ASR results obtained by Transformer. Compared to recurrent neural network (RNN) based E2E models, training of Transformer is more efficient a…

    Submitted 19 February, 2020; v1 submitted 26 December, 2019; originally announced December 2019.

    Comments: ICASSP 2020

  5. arXiv:1810.10727  [pdf, other]

    eess.AS cs.LG cs.SD

    Speaker Selective Beamformer with Keyword Mask Estimation

    Authors: Yusuke Kida, Dung Tran, Motoi Omachi, Toru Taniguchi, Yuya Fujita

    Abstract: This paper addresses the problem of automatic speech recognition (ASR) of a target speaker in background speech. The novelty of our approach is that we focus on a wakeup keyword, which is usually used for activating ASR systems like smart speakers. The proposed method firstly utilizes a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker an…

    Submitted 7 November, 2018; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: Accepted by SLT2018