Showing 1–20 of 20 results for author: Kitamura, D

  1. arXiv:2407.08951

    cs.SD eess.AS

    Audio Spotforming Using Nonnegative Tensor Factorization with Attractor-Based Regularization

    Authors: Shoma Ayano, Li Li, Shogo Seki, Daichi Kitamura

    Abstract: Spotforming is a target-speaker extraction technique that uses multiple microphone arrays. This method applies beamforming (BF) to each microphone array, and the common components among the BF outputs are estimated as the target source. This study proposes a new common component extraction method based on nonnegative tensor factorization (NTF) for higher model interpretability and more robust spot…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted at EUSIPCO2024

  2. arXiv:2306.12820

    cs.SD eess.AS

    NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction

    Authors: Koki Nishida, Norihiro Takamune, Rintaro Ikeshita, Daichi Kitamura, Hiroshi Saruwatari, Tomohiro Nakatani

    Abstract: In this paper, we address the multichannel blind source extraction (BSE) of a single source in diffuse noise environments. To solve this problem even faster than by fast multichannel nonnegative matrix factorization (FastMNMF) and its variant, we propose a BSE method called NoisyILRMA, which is a modification of independent low-rank matrix analysis (ILRMA) to account for diffuse noise. NoisyILRMA…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 5 pages, 3 figures, accepted for European Signal Processing Conference 2023 (EUSIPCO 2023)

  3. arXiv:2202.00200

    cs.SD cs.LG eess.AS

    Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds

    Authors: Masaya Kawamura, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo

    Abstract: A differentiable digital signal processing (DDSP) autoencoder is a musical sound synthesizer that combines a deep neural network (DNN) and spectral modeling synthesis. It allows us to flexibly edit sounds by changing the fundamental frequency, timbre feature, and loudness (synthesis parameters) extracted from an input sound. However, it is designed for a monophonic harmonic sound and cannot handle…

    Submitted 31 January, 2022; originally announced February 2022.

    Comments: 5 pages, 2 figures, to appear in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022)

  4. arXiv:2109.04658

    cs.SD eess.AS

    Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis

    Authors: Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, Shoji Makino

    Abstract: Rank-constrained spatial covariance matrix estimation (RCSCME) is a method for the situation that the directional target speech and the diffuse noise are mixed. In conventional RCSCME, independent low-rank matrix analysis (ILRMA) is used as the preprocessing method. We propose RCSCME using independent deeply learned matrix analysis (IDLMA), which is a supervised extension of ILRMA. In this method,…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: accepted for APSIPA2021

  5. arXiv:2109.00704

    cs.SD eess.AS

    Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models

    Authors: Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo

    Abstract: Independent deeply learned matrix analysis (IDLMA) is one of the state-of-the-art multichannel audio source separation methods using the source power estimation based on deep neural networks (DNNs). The DNN-based power estimation works well for sounds having timbres similar to the DNN training data. However, the sounds to which IDLMA is applied do not always have such timbres, and the timbral mism…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: 8 pages, 5 figures, accepted for Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2021 (APSIPA ASC 2021)

  6. arXiv:2109.00237

    cs.SD eess.AS

    Prior Distribution Design for Music Bleeding-Sound Reduction Based on Nonnegative Matrix Factorization

    Authors: Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo

    Abstract: When we place microphones close to a sound source near other sources in audio recording, the obtained audio signal includes undesired sound from the other sources, which is often called cross-talk or bleeding sound. For many audio applications including onstage sound reinforcement and sound editing after a live performance, it is important to reduce the bleeding sound in each recorded signal. Howe…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: Accepted and will be presented at APSIPA2021

  7. arXiv:2106.05529

    cs.SD eess.AS

    Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation

    Authors: Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Tomohiro Nakatani

    Abstract: We address the determined audio source separation problem in the time-frequency domain. In independent deeply learned matrix analysis (IDLMA), it is assumed that the inter-frequency correlation of each source spectrum is zero, which is inappropriate for modeling nonstationary signals such as music signals. To account for the correlation between frequencies, independent positive semidefinite tensor…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: 5 pages, 2 figures, accepted for European Signal Processing Conference 2021 (EUSIPCO 2021)

  8. arXiv:2106.03492

    cs.SD eess.AS

    Empirical Bayesian Independent Deeply Learned Matrix Analysis For Multichannel Audio Source Separation

    Authors: Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo

    Abstract: Independent deeply learned matrix analysis (IDLMA) is one of the state-of-the-art supervised multichannel audio source separation methods. It blindly estimates the demixing filters on the basis of source independence, using the source model estimated by the deep neural network (DNN). However, since the ratios of the source to interferer signals vary widely among time-frequency (TF) slots, it is di…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: 5 pages, 4 figures, accepted for European Signal Processing Conference 2021 (EUSIPCO 2021)

  9. arXiv:2105.02491

    cs.SD eess.AS

    Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction

    Authors: Yuto Kondo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

    Abstract: Rank-constrained spatial covariance matrix estimation (RCSCME) is a state-of-the-art blind speech extraction method applied to cases where one directional target speech and diffuse noise are mixed. In this paper, we proposed a new algorithmic extension of RCSCME. RCSCME complements a deficient one rank of the diffuse noise spatial covariance matrix, which cannot be estimated via preprocessing such…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: 5 pages, 3 figures, ICASSP2021

  10. arXiv:2007.00416

    cs.SD eess.AS

    Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Sub-Gaussian Distribution

    Authors: Keigo Kamo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo

    Abstract: In this paper, we address a statistical model extension of multichannel nonnegative matrix factorization (MNMF) for blind source separation, and we propose a new parameter update algorithm used in the sub-Gaussian model. MNMF employs full-rank spatial covariance matrices and can simulate situations in which the reverberation is strong and the sources are not point sources. In conventional MNMF, sp…

    Submitted 30 June, 2020; originally announced July 2020.

    Comments: 5 pages, 3 figures, To appear in the Proceedings of the 28th European Signal Processing Conference (EUSIPCO 2020). arXiv admin note: text overlap with arXiv:2002.00579

  11. Consistent Independent Low-Rank Matrix Analysis for Determined Blind Source Separation

    Authors: Daichi Kitamura, Kohei Yatabe

    Abstract: Independent low-rank matrix analysis (ILRMA) is the state-of-the-art algorithm for blind source separation (BSS) in the determined situation (the number of microphones is greater than or equal to that of source signals). ILRMA achieves a great separation performance by modeling the power spectrograms of the source signals via the nonnegative matrix factorization (NMF). Such a highly developed sour…

    Submitted 1 November, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: Submitted to EURASIP J. Adv. Signal. Process. Accepted on Oct. 30, 2020

  12. arXiv:2004.14091

    eess.AS cs.SD eess.SP

    Determined BSS based on time-frequency masking and its application to harmonic vector analysis

    Authors: Kohei Yatabe, Daichi Kitamura

    Abstract: This paper proposes harmonic vector analysis (HVA) based on a general algorithmic framework of audio blind source separation (BSS) that is also presented in this paper. BSS for a convolutive audio mixture is usually performed by multichannel linear filtering when the numbers of microphones and sources are equal (determined situation). This paper addresses such determined BSS based on batch process…

    Submitted 14 April, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

  13. arXiv:2002.08582

    cs.SD eess.AS eess.SP

    Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's t Distribution

    Authors: Tatsuki Kondo, Kanta Fukushige, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Rintaro Ikeshita, Tomohiro Nakatani

    Abstract: In this paper, we address a blind source separation (BSS) problem and propose a new extended framework of independent positive semidefinite tensor analysis (IPSDTA). IPSDTA is a state-of-the-art BSS method that enables us to take interfrequency correlations into account, but the generative model is limited within the multivariate Gaussian distribution and its parameter optimization algorithm does…

    Submitted 20 February, 2020; originally announced February 2020.

    Comments: 5 pages, 3 figures, to appear in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020

  14. arXiv:2002.00579

    cs.SD eess.AS

    Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-based Prior Distribution of Joint-Diagonalization Process

    Authors: Keigo Kamo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo

    Abstract: In this paper, we address a convolutive blind source separation (BSS) problem and propose a new extended framework of FastMNMF by introducing prior information for joint diagonalization of the spatial covariance matrix model. Recently, FastMNMF has been proposed as a fast version of multichannel nonnegative matrix factorization under the assumption that the spatial covariance matrices of multiple…

    Submitted 3 February, 2020; originally announced February 2020.

    Comments: 5 pages, 3 figures, To appear in the Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020

  15. arXiv:1908.01964

    cs.SD eess.AS

    Acceleration of rank-constrained spatial covariance matrix estimation for blind speech extraction

    Authors: Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

    Abstract: In this paper, we propose new accelerated update rules for rank-constrained spatial covariance model estimation, which efficiently extracts a directional target source in diffuse background noise. The naive update rule requires heavy computation such as matrix inversion or matrix multiplication. We resolve this problem by expanding matrix inversion to reduce computational complexity; in the parame…

    Submitted 6 August, 2019; originally announced August 2019.

    Comments: 7 pages, 3 figures, To appear in the Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2019 (APSIPA 2019)

  16. arXiv:1906.02482

    cs.SD eess.AS

    Efficient Full-Rank Spatial Covariance Estimation Using Independent Low-Rank Matrix Analysis for Blind Source Separation

    Authors: Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

    Abstract: In this paper, we propose a new algorithm that efficiently separates a directional source and diffuse background noise based on independent low-rank matrix analysis (ILRMA). ILRMA is one of the state-of-the-art techniques of blind source separation (BSS) and is based on a rank-1 spatial model. Although such a model does not hold for diffuse noise, ILRMA can accurately estimate the spatial paramete…

    Submitted 18 June, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: 5 pages, 3 figures, To appear in the Proceedings of the 27th European Signal Processing Conference (EUSIPCO 2019)

  17. arXiv:1807.03474

    cs.SD eess.AS

    Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network

    Authors: Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

    Abstract: This paper presents a deep neural network (DNN)-based phase reconstruction from amplitude spectrograms. In audio signal and speech processing, the amplitude spectrogram is often used for processing, and the corresponding phase spectrogram is reconstructed from the amplitude spectrogram on the basis of the Griffin-Lim method. However, the Griffin-Lim method causes unnatural artifacts in synthetic s…

    Submitted 10 July, 2018; originally announced July 2018.

    Comments: To appear in the Proc. of IWAENC2018

  18. arXiv:1806.10307

    eess.AS cs.SD

    Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation

    Authors: Shinichi Mogami, Hayato Sumino, Daichi Kitamura, Norihiro Takamune, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono

    Abstract: In this paper, we address a multichannel audio source separation task and propose a new efficient method called independent deeply learned matrix analysis (IDLMA). IDLMA estimates the demixing matrix in a blind manner and updates the time-frequency structures of each source using a pretrained deep neural network (DNN). Also, we introduce a complex Student's t-distribution as a generalized source g…

    Submitted 27 June, 2018; originally announced June 2018.

    Comments: 5 pages, 4 figures, To appear in the Proceedings of the 26th European Signal Processing Conference (EUSIPCO 2018)

  19. arXiv:1710.01589

    cs.SD eess.AS

    Independent Low-Rank Matrix Analysis Based on Parametric Majorization-Equalization Algorithm

    Authors: Yoshiki Mitsui, Daichi Kitamura, Norihiro Takamune, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo

    Abstract: In this paper, we propose a new optimization method for independent low-rank matrix analysis (ILRMA) based on a parametric majorization-equalization algorithm. ILRMA is an efficient blind source separation technique that simultaneously estimates a spatial demixing matrix (spatial model) and the power spectrograms of each estimated source (source model). In ILRMA, since both models are alternately…

    Submitted 4 October, 2017; originally announced October 2017.

    Comments: Preprint Manuscript of 2017 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2017)

  20. arXiv:1708.04795

    cs.SD

    Independent Low-Rank Matrix Analysis Based on Complex Student's $t$-Distribution for Blind Audio Source Separation

    Authors: Shinichi Mogami, Daichi Kitamura, Yoshiki Mitsui, Norihiro Takamune, Hiroshi Saruwatari, Nobutaka Ono

    Abstract: In this paper, we generalize a source generative model in a state-of-the-art blind source separation (BSS), independent low-rank matrix analysis (ILRMA). ILRMA is a unified method of frequency-domain independent component analysis and nonnegative matrix factorization and can provide better performance for audio BSS tasks. To further improve the performance and stability of the separation, we intro…

    Submitted 16 August, 2017; originally announced August 2017.

    Comments: Preprint manuscript of 2017 IEEE International Workshop on Machine Learning for Signal Processing