Showing 1–4 of 4 results for author: Onda, K

  1. arXiv:2505.16207  [pdf, ps, other]

    cs.SD eess.AS

    Differentiable K-means for Fully-optimized Discrete Token-based ASR

    Authors: Kentaro Onda, Yosuke Kashiwagi, Emiru Tsunoo, Hayato Futami, Shinji Watanabe

    Abstract: Recent studies have highlighted the potential of discrete tokens derived from self-supervised learning (SSL) models for various speech-related tasks. These tokens serve not only as substitutes for text in language modeling but also as intermediate representations for tasks such as automatic speech recognition (ASR). However, discrete tokens are typically obtained via k-means clustering of SSL feat…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech2025

  2. arXiv:2505.16191  [pdf, ps, other]

    cs.SD eess.AS

    Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora

    Authors: Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu

    Abstract: Recently, a method for synthesizing foreign-accented speech only with native speech data using discrete tokens obtained from self-supervised learning (SSL) models was proposed. Considering limited availability of accented speech data, this method is expected to make it much easier to simulate foreign accents. By using the synthesized accented speech as listening materials for humans or training da…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech2025

  3. arXiv:2505.16182  [pdf, ps, other]

    cs.SD eess.AS

    Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data

    Authors: Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu

    Abstract: In this study, we gained insight that contributes to achieving accent-robust ASR using only native speech data. In human perception of non-native speech, the phenomenon known as "interlanguage speech intelligibility benefit" (ISIB) is observed, where non-native listeners who share the native language with the speaker understand the speech better compared even to native listeners. Based on the idea…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech2025

  4. arXiv:2407.11370  [pdf, other]

    cs.SD cs.CL eess.AS

    A Pilot Study of GSLM-based Simulation of Foreign Accentuation Only Using Native Speech Corpora

    Authors: Kentaro Onda, Joonyong Park, Nobuaki Minematsu, Daisuke Saito

    Abstract: We propose a method of simulating the human process of foreign accentuation using Generative Spoken Language Model (GSLM) only with native speech corpora. When one listens to spoken words of a foreign language and repeats them, the repeated speech is often with the accent of that listener's L1. This is said to be because the spoken words are mentally represented as a sequence of phonological units…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH2024