
Showing 1–18 of 18 results for author: Ikeshita, R

Searching in archive eess.
  1. arXiv:2502.09859

    eess.AS eess.SP

    Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge

    Authors: Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, Shoko Araki

    Abstract: In this paper, we introduce a multi-talker distant automatic speech recognition (DASR) system we designed for the DASR task 1 of the CHiME-8 challenge. Our system performs speaker counting, diarization, and ASR. It handles various recording conditions, from dinner parties to professional meetings and from two to eight speakers. We perform diarization first, followed by speech enhancement, and then…

    Submitted 18 June, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: 55 pages, 12 figures

  2. arXiv:2409.05554

    eess.AS

    NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge

    Authors: Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, Shoko Araki

    Abstract: We present a distant automatic speech recognition (DASR) system developed for the CHiME-8 DASR track. It consists of a diarization-first pipeline. For diarization, we use end-to-end diarization with vector clustering (EEND-VC) followed by target speaker voice activity detection (TS-VAD) refinement. To deal with various numbers of speakers, we developed a new multi-channel speaker counting approach…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 5 pages, 4 figures, CHiME-8 challenge

  3. arXiv:2404.14860

    eess.AS cs.SD

    Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance

    Authors: Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

    Abstract: It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with a single-channel speech enhancement (SE) front-end. This is generally attributed to the processing distortions caused by the nonlinear processing of single-channel SE front-ends. However, the causes of such degraded ASR performance have not been fully investigated. How to design single-channel SE f…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 13 pages, 6 figures, Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing

  4. arXiv:2311.11599

    eess.AS

    How does end-to-end speech recognition training impact speech enhancement artifacts?

    Authors: Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

    Abstract: Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence of processing distortion generated by single-channel SE on ASR. In this paper, we investigate the effect of such joint training on the signal-level characteristics of the enhanced signals from the viewpoint of the decomposed noise a…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 5 pages, 1 figure, 1 table

  5. arXiv:2311.11595

    eess.AS

    Neural network-based virtual microphone estimation with virtual microphone and beamformer-level multi-task loss

    Authors: Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada, Shoji Makino

    Abstract: Array processing performance depends on the number of microphones available. Virtual microphone estimation (VME) has been proposed to increase the number of microphone signals artificially. Neural network-based VME (NN-VME) trains an NN with a VM-level loss to predict a signal at a microphone location that is available during training but not at inference. However, this training objective may not…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures, 1 table

  6. arXiv:2306.12820

    cs.SD eess.AS

    NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction

    Authors: Koki Nishida, Norihiro Takamune, Rintaro Ikeshita, Daichi Kitamura, Hiroshi Saruwatari, Tomohiro Nakatani

    Abstract: In this paper, we address the multichannel blind source extraction (BSE) of a single source in diffuse noise environments. To solve this problem even faster than by fast multichannel nonnegative matrix factorization (FastMNMF) and its variant, we propose a BSE method called NoisyILRMA, which is a modification of independent low-rank matrix analysis (ILRMA) to account for diffuse noise. NoisyILRMA…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 5 pages, 3 figures, accepted for European Signal Processing Conference 2023 (EUSIPCO 2023)

  7. arXiv:2202.00875

    eess.SP eess.AS

    ISS2: An Extension of Iterative Source Steering Algorithm for Majorization-Minimization-Based Independent Vector Analysis

    Authors: Rintaro Ikeshita, Tomohiro Nakatani

    Abstract: A majorization-minimization (MM) algorithm for independent vector analysis optimizes a separation matrix $W = [w_1, \ldots, w_m]^h \in \mathbb{C}^{m \times m}$ by minimizing a surrogate function of the form $\mathcal{L}(W) = \sum_{i = 1}^m w_i^h V_i w_i - \log | \det W |^2$, where $m \in \mathbb{N}$ is the number of sensors and positive definite matrices…

    Submitted 16 June, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: Accepted for publication in the 30th European Signal Processing Conference (EUSIPCO 2022)
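    The surrogate in entry 7 can be illustrated with a small NumPy sketch. It evaluates $\mathcal{L}(W)$, treating row $i$ of $W$ as $w_i^h$. It then applies the original iterative source steering (ISS) rank-one update, which ISS2 extends. This is not the ISS2 algorithm itself. The helper names `surrogate` and `iss_step` are hypothetical, and the randomly generated positive definite $V_i$ are stand-ins, not matrices estimated from audio.

```python
import numpy as np

def surrogate(W, V):
    """L(W) = sum_i w_i^h V_i w_i - log|det W|^2, where row i of W is w_i^h."""
    quad = sum(np.real(W[i] @ V[i] @ W[i].conj()) for i in range(W.shape[0]))
    _, logabsdet = np.linalg.slogdet(W)
    return quad - 2.0 * logabsdet

def iss_step(W, V, k):
    """One ISS update W <- W - v w_k^h, with v minimizing L(W) given the other rows."""
    m = W.shape[0]
    w_k = W[k].conj()  # column vector w_k (row k of W is w_k^h)
    v = np.empty(m, dtype=complex)
    for i in range(m):
        if i == k:
            # rescale output k: v_k = 1 - (w_k^h V_k w_k)^(-1/2)
            v[i] = 1.0 - 1.0 / np.sqrt(np.real(W[k] @ V[k] @ w_k))
        else:
            # v_i = (w_i^h V_i w_k) / (w_k^h V_i w_k)
            v[i] = (W[i] @ V[i] @ w_k) / np.real(W[k] @ V[i] @ w_k)
    return W - np.outer(v, W[k])

# Hypothetical test problem: m sensors, random Hermitian positive definite V_i.
rng = np.random.default_rng(0)
m = 4
V = []
for _ in range(m):
    A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    V.append(A @ A.conj().T + m * np.eye(m))

W = np.eye(m, dtype=complex)
history = [surrogate(W, V)]
for sweep in range(10):
    for k in range(m):
        W = iss_step(W, V, k)
        history.append(surrogate(W, V))
print(history[0], history[-1])
```

    With the $V_i$ held fixed, each step is an exact minimizer over one steering vector, so the recorded values of $\mathcal{L}(W)$ should not increase. In a full MM loop, the $V_i$ would be re-estimated from the separated signals between sweeps.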

  8. arXiv:2201.06685

    eess.AS cs.SD

    How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

    Authors: Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

    Abstract: It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using orthogonal projection-based decomposition (OPD). OPD decomposes the SE errors into noise and artifact components. The artifact component is defined as t…

    Submitted 30 March, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

    Comments: 5 pages, 5 figures, submitted to Interspeech 2022
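    Entry 8 decomposes SE errors by orthogonal projection. The following is a simplified NumPy sketch of that idea. It assumes a zero-lag projection with no distortion-filter taps, in the style of BSS Eval. The paper's exact OPD definition is cut off above, so this is an illustration rather than the authors' implementation. The function names and signals are hypothetical.

```python
import numpy as np

def project(basis, x):
    """Orthogonal projection of x onto the column span of basis (T x K)."""
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return basis @ coef

def decompose(s_hat, s, n):
    """Split an enhanced signal s_hat into target, noise, and artifact parts."""
    p_s = project(s[:, None], s_hat)                 # part explained by clean speech
    p_sn = project(np.stack([s, n], axis=1), s_hat)  # part explained by speech + noise
    noise = p_sn - p_s                               # residual noise component
    artifact = s_hat - p_sn                          # orthogonal to both s and n
    return p_s, noise, artifact

rng = np.random.default_rng(0)
T = 16000
s = rng.standard_normal(T)  # hypothetical clean speech
n = rng.standard_normal(T)  # hypothetical noise
s_hat = 0.9 * s + 0.2 * n + 0.05 * rng.standard_normal(T)  # hypothetical SE output

target, noise, artifact = decompose(s_hat, s, n)
for name, c in [("target", target), ("noise", noise), ("artifact", artifact)]:
    print(name, 10 * np.log10(np.sum(c**2) / np.sum(s_hat**2)))
```

    Here the artifact part comes only from the added `0.05 * ...` term, because the rest of `s_hat` lies in the span of `s` and `n`.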

  9. arXiv:2111.10574

    eess.AS cs.SD eess.SP

    Switching Independent Vector Analysis and Its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms

    Authors: Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo, Shoko Araki

    Abstract: This paper develops a framework that can perform denoising, dereverberation, and source separation accurately by using a relatively small number of microphones. It has been empirically confirmed that Independent Vector Analysis (IVA) can blindly separate $N$ sources from their sound mixture even with diffuse noise when a sufficiently large number $M$ of microphones is available (i.e., $M \gg N$). Howev…

    Submitted 24 February, 2022; v1 submitted 20 November, 2021; originally announced November 2021.

    Comments: Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing on 27 July 2021, accepted on 22 Feb. 2022

  10. Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation

    Authors: Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki

    Abstract: This paper proposes an approach for optimizing a Convolutional BeamFormer (CBF) that can jointly perform denoising (DN), dereverberation (DR), and source separation (SS). First, we develop a blind CBF optimization algorithm that requires no prior information on the sources or the room acoustics, by extending a conventional joint DR and SS method. For making the optimization computationally tractab…

    Submitted 4 August, 2021; originally announced August 2021.

    Comments: Accepted by IEEE ICASSP 2021

  11. arXiv:2106.05529

    cs.SD eess.AS

    Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation

    Authors: Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Tomohiro Nakatani

    Abstract: We address the determined audio source separation problem in the time-frequency domain. In independent deeply learned matrix analysis (IDLMA), it is assumed that the inter-frequency correlation of each source spectrum is zero, which is inappropriate for modeling nonstationary signals such as music signals. To account for the correlation between frequencies, independent positive semidefinite tensor…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: 5 pages, 2 figures, accepted for European Signal Processing Conference 2021 (EUSIPCO 2021)

  12. arXiv:2102.04696

    eess.AS cs.SD eess.SP

    Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation

    Authors: Rintaro Ikeshita, Tomohiro Nakatani

    Abstract: We address a blind source separation (BSS) problem in a noisy reverberant environment in which the number of microphones $M$ is greater than the number of sources of interest, and the other noise components can be approximated as stationary and Gaussian distributed. Conventional BSS algorithms for the optimization of a multi-input multi-output convolutional beamformer have suffered from a huge com…

    Submitted 21 April, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: Accepted to IEEE Signal Processing Letters

  13. arXiv:2101.08563

    cs.SD eess.AS

    A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter

    Authors: Nobutaka Ito, Rintaro Ikeshita, Hiroshi Sawada, Tomohiro Nakatani

    Abstract: This paper presents a computationally efficient approach to blind source separation (BSS) of audio signals, applicable even when there are more sources than microphones (i.e., the underdetermined case). When there are as many sources as microphones (i.e., the determined case), BSS can be performed computationally efficiently by independent component analysis (ICA). Unfortunately, however, ICA is b…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing
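    Entry 13 pairs joint diagonalization with the multichannel Wiener filter (MWF). The generic linear-algebra payoff can be sketched for a hypothetical two-source, determined case. Once a matrix $P$ jointly diagonalizes both source covariances, the MWF reduces to per-component scalar gains. This is not the paper's algorithm, which targets the underdetermined case and is not described in the truncated abstract. The covariances and helper names below are placeholders.

```python
import numpy as np

def joint_diagonalizer(A, B):
    """Return (d, P) with P^H B P = I and P^H A P = diag(d); A Hermitian, B Hermitian PD."""
    L = np.linalg.cholesky(B)              # B = L L^H
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.conj().T         # whitened, still Hermitian
    d, U = np.linalg.eigh(C)               # C = U diag(d) U^H
    P = L_inv.conj().T @ U
    return d, P

def random_psd(M, rng):
    """Hypothetical full-rank spatial covariance matrix."""
    X = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
    return X @ X.conj().T / (2 * M)

rng = np.random.default_rng(0)
M = 3
R1, R2 = random_psd(M, rng), random_psd(M, rng)  # placeholder source covariances

d, P = joint_diagonalizer(R1, R2)
# MWF for source 1: W1 = R1 (R1 + R2)^{-1} = P^{-H} diag(d / (d + 1)) P^H
W1_fast = np.linalg.inv(P.conj().T) @ np.diag(d / (d + 1)) @ P.conj().T
W1_direct = R1 @ np.linalg.inv(R1 + R2)
print(np.allclose(W1_fast, W1_direct))
```

    The identity follows from $R_2 = P^{-h} P^{-1}$ and $R_1 = P^{-h} \operatorname{diag}(d) P^{-1}$. With these, $(R_1 + R_2)^{-1}$ becomes $P \operatorname{diag}(1/(d+1)) P^h$, so no per-frame matrix inversion is needed once $P$ is known.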

  14. arXiv:2101.04315

    eess.AS cs.LG cs.SD

    Neural Network-based Virtual Microphone Estimator

    Authors: Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki

    Abstract: Developing microphone array technologies for a small number of microphones is important due to the constraints of many devices. One direction to address this situation consists of virtually augmenting the number of microphone signals, e.g., based on several physical model assumptions. However, such assumptions are not necessarily met in realistic conditions. In this paper, as an alternative approa…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: 5 pages, 2 figures, submitted to ICASSP 2021

  15. Block Coordinate Descent Algorithms for Auxiliary-Function-Based Independent Vector Extraction

    Authors: Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki

    Abstract: In this paper, we address the problem of extracting all super-Gaussian source signals from a linear mixture in which (i) the number of super-Gaussian sources $K$ is less than that of sensors $M$, and (ii) there are up to $M - K$ stationary Gaussian noises that do not need to be extracted. To solve this problem, independent vector extraction (IVE) using a majorization-minimization and block coordin…

    Submitted 3 May, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: Accepted by IEEE Transactions on Signal Processing

  16. Jointly optimal denoising, dereverberation, and source separation

    Authors: Tomohiro Nakatani, Christoph Boeddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach

    Abstract: This paper proposes methods that can optimize a Convolutional BeamFormer (CBF) for jointly performing denoising, dereverberation, and source separation (DN+DR+SS) in a computationally efficient way. Conventionally, a cascade configuration composed of a Weighted Prediction Error minimization (WPE) dereverberation filter followed by a Minimum Variance Distortionless Response beamformer has been used as…

    Submitted 2 August, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing on 12 Feb 2020, Accepted to IEEE/ACM Trans. Audio, Speech, and Language Processing on 14 July 2020
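    Entry 16 contrasts its approach with the conventional cascade of WPE followed by an MVDR beamformer. The MVDR stage alone can be sketched for one frequency bin using the textbook solution $w = R^{-1} h / (h^h R^{-1} h)$. This solution minimizes the output power $w^h R w$ subject to the distortionless constraint $w^h h = 1$. The steering vector $h$ and noise covariance $R$ below are random placeholders, and WPE is omitted. This is not the paper's convolutional beamformer.

```python
import numpy as np

def mvdr_weights(R, h):
    """MVDR beamformer: minimize w^H R w subject to w^H h = 1."""
    Rinv_h = np.linalg.solve(R, h)
    return Rinv_h / (h.conj() @ Rinv_h)

rng = np.random.default_rng(0)
M = 6                                                     # microphones (hypothetical)
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # placeholder steering vector
A = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
R = A @ A.conj().T / (2 * M) + 1e-3 * np.eye(M)          # placeholder noise covariance

w = mvdr_weights(R, h)
d = h / np.vdot(h, h).real  # delay-and-sum-style weights, also distortionless
print(np.vdot(w, h))        # w^H h, should equal 1
print(np.real(np.vdot(w, R @ w)), np.real(np.vdot(d, R @ d)))
```

    Both weight vectors pass the target unchanged. The MVDR weights should give output noise power no larger than the delay-and-sum weights, because MVDR is the minimizer over all distortionless filters.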

  17. Overdetermined independent vector analysis

    Authors: Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki

    Abstract: We address the convolutive blind source separation problem for the (over-)determined case where (i) the number of nonstationary target sources $K$ is less than that of microphones $M$, and (ii) there are up to $M - K$ stationary Gaussian noises that need not be extracted. Independent vector analysis (IVA) can solve the problem by separating into $M$ sources and selecting the top $K$ highly nons…

    Submitted 5 March, 2020; originally announced March 2020.

    Comments: To appear at the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

  18. arXiv:2002.08582

    cs.SD eess.AS eess.SP

    Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's t Distribution

    Authors: Tatsuki Kondo, Kanta Fukushige, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Rintaro Ikeshita, Tomohiro Nakatani

    Abstract: In this paper, we address a blind source separation (BSS) problem and propose a new extended framework of independent positive semidefinite tensor analysis (IPSDTA). IPSDTA is a state-of-the-art BSS method that enables us to take interfrequency correlations into account, but the generative model is limited within the multivariate Gaussian distribution and its parameter optimization algorithm does…

    Submitted 20 February, 2020; originally announced February 2020.

    Comments: 5 pages, 3 figures, to appear in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020