
Showing 1–7 of 7 results for author: Maekaku, T

  1. arXiv:2502.15218  [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet-SpeechLM: An Open Speech Language Model Toolkit

    Authors: Jinchuan Tian, Jiatong Shi, William Chen, Siddhant Arora, Yoshiki Masuyama, Takashi Maekaku, Yihan Wu, Junyi Peng, Shikhar Bharadwaj, Yiwen Zhao, Samuele Cornell, Yifan Peng, Xiang Yue, Chao-Han Huck Yang, Graham Neubig, Shinji Watanabe

    Abstract: We present ESPnet-SpeechLM, an open toolkit designed to democratize the development of speech language models (SpeechLMs) and voice-driven agentic applications. The toolkit standardizes speech processing tasks by framing them as universal sequential modeling problems, encompassing a cohesive workflow of data preprocessing, pre-training, inference, and task evaluation. With ESPnet-SpeechLM, users c…

    Submitted 24 February, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

  2. arXiv:2403.19207  [pdf, other]

    eess.AS

    LV-CTC: Non-autoregressive ASR with CTC and latent variable models

    Authors: Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku

    Abstract: Non-autoregressive (NAR) models for automatic speech recognition (ASR) aim to achieve high accuracy and fast inference by simplifying the autoregressive (AR) generation process of conventional models. Connectionist temporal classification (CTC) is one of the key techniques used in NAR ASR models. In this paper, we propose a new model combining CTC and a latent variable model, which is one of the s…

    Submitted 28 March, 2024; originally announced March 2024.

  3. arXiv:2309.15800  [pdf, other]

    cs.CL cs.SD eess.AS

    Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

    Authors: Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang

    Abstract: Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies that cause inefficiencies in sequence modeling. High-dimensional speech features such as spectrograms are often used as the input for the subsequent model. However, they can still be redundant. Recent investigations proposed the use of discrete speech units derived from self-supervised learning repre…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  4. arXiv:2305.18108  [pdf, other]

    cs.SD eess.AS

    Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning

    Authors: Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe

    Abstract: Self-supervised learning (SSL) of speech has shown impressive results in speech-related tasks, particularly in automatic speech recognition (ASR). While most methods employ the output of intermediate layers of the SSL model as real-valued features for downstream tasks, there is potential in exploring alternative approaches that use discretized token sequences. This approach offers benefits such as…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted at INTERSPEECH 2023

  5. arXiv:2204.00540  [pdf, other]

    cs.SD cs.CL eess.AS

    End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation

    Authors: Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe

    Abstract: This work presents our end-to-end (E2E) automatic speech recognition (ASR) model targeting robust speech recognition, called Integrated speech Recognition with enhanced speech Input for Self-supervised learning representation (IRIS). Compared with conventional E2E ASR models, the proposed E2E model integrates two important modules including a speech enhancement (SE) module and a self-supervise…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  6. arXiv:2110.04590  [pdf, other]

    cs.CL cs.SD eess.AS

    An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition

    Authors: Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe

    Abstract: Self-supervised pretraining on speech data has made substantial progress. High-fidelity representations of the speech signal are learned from large amounts of untranscribed data and show promising performance. Several recent works have focused on evaluating the quality of self-supervised pretrained representations on various tasks without domain restriction, e.g. SUPERB. However, such evaluations…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: To appear in ASRU2021

  7. arXiv:2107.05899  [pdf, ps, other]

    cs.SD eess.AS

    Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021

    Authors: Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky

    Abstract: We present a system for the Zero Resource Speech Challenge 2021, which combines Contrastive Predictive Coding (CPC) with deep cluster. In deep cluster, we first prepare pseudo-labels obtained by clustering the outputs of a CPC network with k-means. Then, we train an additional autoregressive model to classify the previously obtained pseudo-labels in a supervised manner. Phoneme discriminative re…

    Submitted 16 February, 2022; v1 submitted 13 July, 2021; originally announced July 2021.