Showing 1–8 of 8 results for author: Fukayama, S

  1. arXiv:2505.16191  [pdf, ps, other]

    cs.SD eess.AS

    Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora

    Authors: Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu

    Abstract: Recently, a method for synthesizing foreign-accented speech only with native speech data using discrete tokens obtained from self-supervised learning (SSL) models was proposed. Considering limited availability of accented speech data, this method is expected to make it much easier to simulate foreign accents. By using the synthesized accented speech as listening materials for humans or training da…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech2025

  2. arXiv:2505.16182  [pdf, ps, other]

    cs.SD eess.AS

    Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data

    Authors: Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu

    Abstract: In this study, we gained insight that contributes to achieving accent-robust ASR using only native speech data. In human perception of non-native speech, the phenomenon known as "interlanguage speech intelligibility benefit" (ISIB) is observed, where non-native listeners who share the native language with the speaker understand the speech better compared even to native listeners. Based on the idea…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech2025

  3. arXiv:2501.06562  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Discrete Speech Unit Extraction via Independent Component Analysis

    Authors: Tomohiko Nakamura, Kwanghee Choi, Keigo Hojo, Yoshiaki Bando, Satoru Fukayama, Shinji Watanabe

    Abstract: Self-supervised speech models (S3Ms) have become a common tool for the speech processing community, leveraging representations for downstream tasks. Clustering S3M representations yields discrete speech units (DSUs), which serve as compact representations for speech signals. DSUs are typically obtained by k-means clustering. Using DSUs often leads to strong performance in various tasks, including…

    Submitted 11 January, 2025; originally announced January 2025.

    Comments: Accepted to ICASSP 2025 SALMA Workshop. Code available at https://github.com/TomohikoNakamura/ica_dsu_espnet

  4. arXiv:2409.12549  [pdf, other]

    cs.SD eess.AS

    FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs

    Authors: Hitoshi Suda, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, Jun Ogata

    Abstract: This study presents FruitsMusic, a metadata corpus of Japanese idol-group songs in the real world, precisely annotated with who sings what and when. Japanese idol-group songs, vital to Japanese pop culture, feature a unique vocal arrangement style, where songs are divided into several segments, and a specific individual or multiple singers are assigned to each segment. To enhance singer diarizatio…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: Accepted at the 25th International Society for Music Information Retrieval (ISMIR) Conference 2024, San Francisco, United States

  5. arXiv:2406.08619  [pdf, other]

    cs.CL cs.LG eess.AS

    Self-Supervised Speech Representations are More Phonetic than Semantic

    Authors: Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised speech models (S3Ms) have become an effective backbone for speech applications. Various analyses suggest that S3Ms encode linguistic properties. In this work, we seek a more fine-grained analysis of the word-level linguistic properties encoded in S3Ms. Specifically, we curate a novel dataset of near homophone (phonetically similar) and synonym (semantically similar) word pairs and…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024. Source code at https://github.com/juice500ml/phonetic_semantic_probing

  6. JaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus

    Authors: Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, Hiroshi Saruwatari

    Abstract: We construct a corpus of Japanese a cappella vocal ensembles (jaCappella corpus) for vocal ensemble separation and synthesis. It consists of 35 copyright-cleared vocal ensemble songs and their audio recordings of individual voice parts. These songs were arranged from out-of-copyright Japanese children's songs and have six voice parts (lead vocal, soprano, alto, tenor, bass, and vocal percussion).…

    Submitted 24 February, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted for ICASSP2023

    Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023, 5 pages

  7. arXiv:2209.13211  [pdf, other]

    eess.AS

    Hyperbolic Timbre Embedding for Musical Instrument Sound Synthesis Based on Variational Autoencoders

    Authors: Futa Nakashima, Tomohiko Nakamura, Norihiro Takamune, Satoru Fukayama, Hiroshi Saruwatari

    Abstract: In this paper, we propose a musical instrument sound synthesis (MISS) method based on a variational autoencoder (VAE) that has a hierarchy-inducing latent space for timbre. VAE-based MISS methods embed an input signal into a low-dimensional latent representation that captures the characteristics of the input. Adequately manipulating this representation leads to sound morphing and timbre replacemen…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 8 pages, 4 figures, to be published in Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2022

    Journal ref: 2022 Asia Pacific Signal and Information Processing Association Annual Summit and Conference

  8. arXiv:2001.02360  [pdf, other]

    cs.SD cs.LG eess.AS

    Automatic Melody Harmonization with Triad Chords: A Comparative Study

    Authors: Yin-Cheng Yeh, Wen-Yi Hsiao, Satoru Fukayama, Tetsuro Kitahara, Benjamin Genchel, Hao-Min Liu, Hao-Wen Dong, Yian Chen, Terence Leong, Yi-Hsuan Yang

    Abstract: Several prior works have proposed various methods for the task of automatic melody harmonization, in which a model aims to generate a sequence of chords to serve as the harmonic accompaniment of a given multiple-bar melody sequence. In this paper, we present a comparative study evaluating and comparing the performance of a set of canonical approaches to this task, including a template matching bas…

    Submitted 27 April, 2021; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: 20 pages, 6 figures, published in Journal of New Music Research (JNMR), Volume 50 Issue 1