Showing 1–7 of 7 results for author: Asami, T

  1. arXiv:2502.09859  [pdf, other]

    eess.AS eess.SP

    Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge

    Authors: Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, Shoko Araki

    Abstract: In this paper, we introduce a multi-talker distant automatic speech recognition (DASR) system we designed for the DASR task 1 of the CHiME-8 challenge. Our system performs speaker counting, diarization, and ASR. It handles various recording conditions, from dinner parties to professional meetings and from two to eight speakers. We perform diarization first, followed by speech enhancement, and then…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: 55 pages, 12 figures

  2. arXiv:2409.20313  [pdf, other]

    eess.AS cs.CL cs.SD

    Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding

    Authors: Takafumi Moriya, Takanori Ashihara, Masato Mimura, Hiroshi Sato, Kohei Matsuura, Ryo Masumura, Taichi Asami

    Abstract: A hybrid autoregressive transducer (HAT) is a variant of neural transducer that models blank and non-blank posterior distributions separately. In this paper, we propose a novel internal acoustic model (IAM) training strategy to enhance HAT-based speech recognition. IAM consists of encoder and joint networks, which are fully shared and jointly trained with HAT. This joint training not only enhances…

    Submitted 30 September, 2024; originally announced September 2024.

    Comments: Accepted to Interspeech 2024

  3. arXiv:2409.05554  [pdf, other]

    eess.AS

    NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge

    Authors: Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, Shoko Araki

    Abstract: We present a distant automatic speech recognition (DASR) system developed for the CHiME-8 DASR track. It consists of a diarization-first pipeline. For diarization, we use end-to-end diarization with vector clustering (EEND-VC) followed by target speaker voice activity detection (TS-VAD) refinement. To deal with various numbers of speakers, we developed a new multi-channel speaker counting approach…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 5 pages, 4 figures, CHiME8 challenge

  4. arXiv:2401.17632  [pdf, other]

    cs.CL cs.SD eess.AS

    What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

    Authors: Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima

    Abstract: Self-supervised learning (SSL) has attracted increased attention for learning meaningful speech representations. Speech SSL models, such as WavLM, employ masked prediction training to encode general-purpose representations. In contrast, speaker SSL models, exemplified by DINO-based models, adopt utterance-level training objectives primarily for speaker representation. Understanding how these model…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  5. arXiv:2306.08374  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

    Authors: Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma

    Abstract: Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition. More recently, speech SSL models have also been shown to be beneficial in advancing spoken language understanding tasks, implying that the SSL models have the potential to learn not only acoustic but also linguistic information. In this paper,…

    Submitted 27 August, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023. This paper has been extended in a subsequent journal paper, see https://ieeexplore.ieee.org/abstract/document/10597571

  6. arXiv:2305.15971  [pdf, other]

    eess.AS

    Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

    Authors: Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami

    Abstract: Neural transducer (RNNT)-based target-speaker speech recognition (TS-RNNT) directly transcribes a target speaker's voice from a multi-talker mixture. It is a promising approach for streaming applications because it does not incur the extra computation costs of a target speech extraction frontend, which is a critical barrier to quick response. TS-RNNT is trained end-to-end given the input speech (i…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  7. arXiv:1710.00487  [pdf, ps, other]

    physics.ins-det cond-mat.mtrl-sci

    YUI and HANA: Control and Visualization Programs for HRC in J-PARC

    Authors: Daichi Kawana, Minoru Soda, Masahiro Yoshida, Yoichi Ikeda, Toshio Asami, Ryosuke Sugiura, Hideki Yoshizawa, Takatsugu Masuda, Takafumi Hawai, Soshi Ibuka, Tetsuya Yokoo, Shinichi Itoh

    Abstract: We developed control and visualization programs, YUI and HANA, for the High-Resolution Chopper spectrometer (HRC) installed at BL12 in MLF, J-PARC. YUI is a comprehensive program to control DAQ-middleware, the accessories, and sample environment devices. HANA is a program for the data transformation and visualization of inelastic neutron scattering spectra. In this paper, we describe the basic system…

    Submitted 2 October, 2017; v1 submitted 2 October, 2017; originally announced October 2017.

    Comments: 9 pages, 4 figures