
Showing 1–5 of 5 results for author: Someki, M

  1. arXiv:2506.01845

    eess.AS cs.LG cs.SD

    On-device Streaming Discrete Speech Units

    Authors: Kwanghee Choi, Masao Someki, Emma Strubell, Shinji Watanabe

    Abstract: Discrete speech units (DSUs) are derived from clustering the features of self-supervised speech models (S3Ms). DSUs offer significant advantages for on-device streaming speech applications due to their rich phonetic information, high transmission efficiency, and seamless integration with large language models. However, conventional DSU-based approaches are impractical as they require full-length s…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025, source code at https://github.com/Masao-Someki/StreamingDSU

  2. arXiv:2505.14874

    cs.CL cs.SD eess.AS

    Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages

    Authors: Chin-Jou Li, Eunjung Yeo, Kwanghee Choi, Paula Andrea Pérez-Toro, Masao Someki, Rohan Kumar Das, Zhengjun Yue, Juan Rafael Orozco-Arroyave, Elmar Nöth, David R. Mortensen

    Abstract: Automatic speech recognition (ASR) for dysarthric speech remains challenging due to data scarcity, particularly in non-English languages. To address this, we fine-tune a voice conversion model on English dysarthric speech (UASpeech) to encode both speaker characteristics and prosodic distortions, then apply it to convert healthy non-English speech (FLEURS) into non-English dysarthric-like speech.…

    Submitted 30 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: 5 pages, 1 figure, Accepted to Interspeech 2025

  3. arXiv:2409.09506

    cs.SD cs.AI eess.AS

    ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

    Authors: Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe

    Abstract: We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on two major aspects: (i) easy fine-tuning and inference of existing ESPnet models on various tasks and (ii) easy integration with popular deep neural network frameworks such as PyTorch-Lightning, Hugging Face transformers and datasets, a…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: Accepted to SLT 2024

  4. Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

    Authors: Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe

    Abstract: Attention-based encoder-decoder models with autoregressive (AR) decoding have proven to be the dominant approach for automatic speech recognition (ASR) due to their superior accuracy. However, they often suffer from slow inference. This is primarily attributed to the incremental calculation of the decoder. This work proposes a partially AR framework, which employs segment-level vectorized beam sea…

    Submitted 30 September, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted at ASRU 2023

    Journal ref: IEEE Automatic Speech Recognition and Understanding Workshop 2023

  5. A Comparative Study on Transformer vs RNN in Speech Applications

    Authors: Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang

    Abstract: Sequence-to-sequence models have been widely used in end-to-end speech processing, for example, automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS). This paper focuses on an emergent sequence-to-sequence model called Transformer, which achieves state-of-the-art performance in neural machine translation and other natural language processing applications. We underto…

    Submitted 28 September, 2019; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: Accepted at ASRU 2019

    Journal ref: IEEE Automatic Speech Recognition and Understanding Workshop 2019