-
Audio Spotforming Using Nonnegative Tensor Factorization with Attractor-Based Regularization
Authors:
Shoma Ayano,
Li Li,
Shogo Seki,
Daichi Kitamura
Abstract:
Spotforming is a target-speaker extraction technique that uses multiple microphone arrays. This method applies beamforming (BF) to each microphone array, and the common components among the BF outputs are estimated as the target source. This study proposes a new common component extraction method based on nonnegative tensor factorization (NTF) for higher model interpretability and more robust spotforming against hyperparameters. Moreover, attractor-based regularization was introduced to facilitate the automatic selection of optimal target bases in the NTF. Experimental results show that the proposed method performs better than conventional methods in spotforming performance and also shows some characteristics suitable for practical use.
Submitted 11 July, 2024;
originally announced July 2024.
-
NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction
Authors:
Koki Nishida,
Norihiro Takamune,
Rintaro Ikeshita,
Daichi Kitamura,
Hiroshi Saruwatari,
Tomohiro Nakatani
Abstract:
In this paper, we address the multichannel blind source extraction (BSE) of a single source in diffuse noise environments. To solve this problem even faster than by fast multichannel nonnegative matrix factorization (FastMNMF) and its variant, we propose a BSE method called NoisyILRMA, which is a modification of independent low-rank matrix analysis (ILRMA) to account for diffuse noise. NoisyILRMA can achieve considerably fast BSE by incorporating an algorithm developed for independent vector extraction. In addition, to improve the BSE performance of NoisyILRMA, we propose a mechanism to switch the source model with ILRMA-like nonnegative matrix factorization to a more expressive source model during optimization. In the experiment, we show that NoisyILRMA runs faster than a FastMNMF algorithm while maintaining the BSE performance. We also confirm that the switching mechanism improves the BSE performance of NoisyILRMA.
Submitted 22 June, 2023;
originally announced June 2023.
-
Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds
Authors:
Masaya Kawamura,
Tomohiko Nakamura,
Daichi Kitamura,
Hiroshi Saruwatari,
Yu Takahashi,
Kazunobu Kondo
Abstract:
A differentiable digital signal processing (DDSP) autoencoder is a musical sound synthesizer that combines a deep neural network (DNN) and spectral modeling synthesis. It allows us to flexibly edit sounds by changing the fundamental frequency, timbre feature, and loudness (synthesis parameters) extracted from an input sound. However, it is designed for a monophonic harmonic sound and cannot handle mixtures of harmonic sounds. In this paper, we propose a model (DDSP mixture model) that represents a mixture as the sum of the outputs of multiple pretrained DDSP autoencoders. By fitting the output of the proposed model to the observed mixture, we can directly estimate the synthesis parameters of each source. Through synthesis parameter extraction experiments, we show that the proposed method has high and stable performance compared with a straightforward method that applies the DDSP autoencoder to the signals separated by an audio source separation method.
Submitted 31 January, 2022;
originally announced February 2022.
-
Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis
Authors:
Sota Misawa,
Norihiro Takamune,
Tomohiko Nakamura,
Daichi Kitamura,
Hiroshi Saruwatari,
Masakazu Une,
Shoji Makino
Abstract:
Rank-constrained spatial covariance matrix estimation (RCSCME) is a method for the situation that the directional target speech and the diffuse noise are mixed. In conventional RCSCME, independent low-rank matrix analysis (ILRMA) is used as the preprocessing method. We propose RCSCME using independent deeply learned matrix analysis (IDLMA), which is a supervised extension of ILRMA. In this method, IDLMA requires deep neural networks (DNNs) to separate the target speech and the noise. We use Denoiser, which is a single-channel speech enhancement DNN, in IDLMA to estimate not only the target speech but also the noise. We also propose noise self-supervised RCSCME, in which we estimate the noise-only time intervals using the output of Denoiser and design the prior distribution of the noise spatial covariance matrix for RCSCME. We confirm that the proposed methods outperform the conventional methods under several noise conditions.
Submitted 10 September, 2021;
originally announced September 2021.
-
Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models
Authors:
Takuya Hasumi,
Tomohiko Nakamura,
Norihiro Takamune,
Hiroshi Saruwatari,
Daichi Kitamura,
Yu Takahashi,
Kazunobu Kondo
Abstract:
Independent deeply learned matrix analysis (IDLMA) is one of the state-of-the-art multichannel audio source separation methods using the source power estimation based on deep neural networks (DNNs). The DNN-based power estimation works well for sounds having timbres similar to the DNN training data. However, the sounds to which IDLMA is applied do not always have such timbres, and the timbral mismatch causes the performance degradation of IDLMA. To tackle this problem, we focus on a blind source separation counterpart of IDLMA, independent low-rank matrix analysis. It uses nonnegative matrix factorization (NMF) as the source model, which can capture source spectral components that only appear in the target mixture, using the low-rank structure of the source spectrogram as a clue. We thus extend the DNN-based source model to encompass the NMF-based source model on the basis of the product-of-expert concept, which we call the product of source models (PoSM). For the proposed PoSM-based IDLMA, we derive a computationally efficient parameter estimation algorithm based on an optimization principle called the majorization-minimization algorithm. Experimental evaluations show the effectiveness of the proposed method.
Submitted 2 September, 2021;
originally announced September 2021.
-
Prior Distribution Design for Music Bleeding-Sound Reduction Based on Nonnegative Matrix Factorization
Authors:
Yusaku Mizobuchi,
Daichi Kitamura,
Tomohiko Nakamura,
Hiroshi Saruwatari,
Yu Takahashi,
Kazunobu Kondo
Abstract:
When we place microphones close to a sound source near other sources in audio recording, the obtained audio signal includes undesired sound from the other sources, which is often called cross-talk or bleeding sound. For many audio applications including onstage sound reinforcement and sound editing after a live performance, it is important to reduce the bleeding sound in each recorded signal. However, since microphones are spatially apart from each other in this situation, typical phase-aware blind source separation (BSS) methods cannot be used. We propose a phase-insensitive method for blind bleeding-sound reduction. This method is based on time-channel nonnegative matrix factorization, which is a BSS method using only amplitude spectrograms. With the proposed method, we introduce the gamma-distribution-based prior for leakage levels of bleeding sounds. Its optimization can be interpreted as maximum a posteriori estimation. The experimental results of music bleeding-sound reduction indicate that the proposed method is more effective for bleeding-sound reduction of music signals compared with other BSS methods.
Submitted 1 September, 2021;
originally announced September 2021.
-
Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation
Authors:
Naoki Narisawa,
Rintaro Ikeshita,
Norihiro Takamune,
Daichi Kitamura,
Tomohiko Nakamura,
Hiroshi Saruwatari,
Tomohiro Nakatani
Abstract:
We address the determined audio source separation problem in the time-frequency domain. In independent deeply learned matrix analysis (IDLMA), it is assumed that the inter-frequency correlation of each source spectrum is zero, which is inappropriate for modeling nonstationary signals such as music signals. To account for the correlation between frequencies, independent positive semidefinite tensor analysis has been proposed. This unsupervised (blind) method, however, severely restricts the structure of frequency covariance matrices (FCMs) to reduce the number of model parameters. As an extension of these conventional approaches, we here propose a supervised method that models FCMs using deep neural networks (DNNs). It is difficult to directly infer FCMs using DNNs. Therefore, we also propose a new FCM model represented as a convex combination of a diagonal FCM and a rank-1 FCM. Our FCM model is flexible enough to not only consider inter-frequency correlation, but also capture the dynamics of time-varying FCMs of nonstationary signals. We infer the proposed FCMs using two DNNs: a DNN for power spectrum estimation and a DNN for time-domain signal estimation. An experimental result of separating music signals shows that the proposed method provides higher separation performance than IDLMA.
Submitted 10 June, 2021;
originally announced June 2021.
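The FCM model mentioned in the abstract above (a convex combination of a diagonal FCM and a rank-1 FCM) can be written out as a rough sketch. All symbols here are illustrative assumptions and are not taken from the paper: $\boldsymbol{\sigma}_{n,t}$ stands for an estimated power spectrum of source $n$ at frame $t$, $\boldsymbol{s}_{n,t}$ for an estimated complex spectrum of the same frame, and $\alpha$ for a mixing weight. The paper's actual parameterization may differ.

```latex
% Hypothetical notation: F frequency bins, source index n, frame index t.
% Diagonal term: no inter-frequency correlation (IDLMA-like assumption).
% Rank-1 term: full inter-frequency correlation along one direction.
\[
  \mathbf{R}_{n,t}
  = (1-\alpha)\,\operatorname{diag}\!\left(\boldsymbol{\sigma}_{n,t}\right)
  + \alpha\,\boldsymbol{s}_{n,t}\boldsymbol{s}_{n,t}^{\mathsf{H}},
  \qquad 0 \le \alpha \le 1,
  \quad \boldsymbol{\sigma}_{n,t} \in \mathbb{R}_{\ge 0}^{F},
  \quad \boldsymbol{s}_{n,t} \in \mathbb{C}^{F}.
\]
```

Under these assumptions, both terms are positive semidefinite, so any such combination is positive semidefinite as well. Setting $\alpha = 0$ gives a purely diagonal covariance, and because the estimates depend on $t$, the covariance can vary from frame to frame.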
-
Empirical Bayesian Independent Deeply Learned Matrix Analysis For Multichannel Audio Source Separation
Authors:
Takuya Hasumi,
Tomohiko Nakamura,
Norihiro Takamune,
Hiroshi Saruwatari,
Daichi Kitamura,
Yu Takahashi,
Kazunobu Kondo
Abstract:
Independent deeply learned matrix analysis (IDLMA) is one of the state-of-the-art supervised multichannel audio source separation methods. It blindly estimates the demixing filters on the basis of source independence, using the source model estimated by the deep neural network (DNN). However, since the ratios of the source to interferer signals vary widely among time-frequency (TF) slots, it is difficult to obtain reliable estimated power spectrograms of sources at all TF slots. In this paper, we propose an IDLMA extension, empirical Bayesian IDLMA (EB-IDLMA), by introducing a prior distribution of source power spectrograms and treating the source power spectrograms as latent random variables. This treatment allows us to implicitly consider the reliability of the estimated source power spectrograms for the estimation of demixing filters through the hyperparameters of the prior distribution estimated by the DNN. Experimental evaluations show the effectiveness of EB-IDLMA and the importance of introducing the reliability of the estimated source power spectrograms.
Submitted 7 June, 2021;
originally announced June 2021.
-
Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction
Authors:
Yuto Kondo,
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari
Abstract:
Rank-constrained spatial covariance matrix estimation (RCSCME) is a state-of-the-art blind speech extraction method applied to cases where one directional target speech and diffuse noise are mixed. In this paper, we propose a new algorithmic extension of RCSCME. RCSCME complements a deficient one rank of the diffuse noise spatial covariance matrix, which cannot be estimated via preprocessing such as independent low-rank matrix analysis, and estimates the source model parameters simultaneously. In the conventional RCSCME, the direction of the deficient basis is fixed in advance and only the scale is estimated; however, the candidate for this deficient basis is not unique in general. In the proposed RCSCME model, the deficient basis itself can be accurately estimated as a vector variable by solving a vector optimization problem. Also, we derive new update rules based on the EM algorithm. We confirm that the proposed method outperforms conventional methods under several noise conditions.
Submitted 6 May, 2021;
originally announced May 2021.
-
Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Sub-Gaussian Distribution
Authors:
Keigo Kamo,
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari,
Yu Takahashi,
Kazunobu Kondo
Abstract:
In this paper, we address a statistical model extension of multichannel nonnegative matrix factorization (MNMF) for blind source separation, and we propose a new parameter update algorithm used in the sub-Gaussian model. MNMF employs full-rank spatial covariance matrices and can simulate situations in which the reverberation is strong and the sources are not point sources. In conventional MNMF, spectrograms of observed signals are assumed to follow a multivariate Gaussian distribution. In this paper, first, to extend the MNMF model, we introduce the multivariate generalized Gaussian distribution as the multivariate sub-Gaussian distribution. Since the cost function of MNMF based on this multivariate sub-Gaussian model is difficult to minimize, we additionally introduce the joint-diagonalizability constraint in spatial covariance matrices to MNMF similarly to FastMNMF, and transform the cost function to the form to which we can apply the auxiliary functions to derive the valid parameter update rules. Finally, from blind source separation experiments, we show that the proposed method outperforms the conventional methods in source-separation accuracy.
Submitted 30 June, 2020;
originally announced July 2020.
-
Consistent Independent Low-Rank Matrix Analysis for Determined Blind Source Separation
Authors:
Daichi Kitamura,
Kohei Yatabe
Abstract:
Independent low-rank matrix analysis (ILRMA) is the state-of-the-art algorithm for blind source separation (BSS) in the determined situation (the number of microphones is greater than or equal to that of source signals). ILRMA achieves a great separation performance by modeling the power spectrograms of the source signals via the nonnegative matrix factorization (NMF). Such a highly developed source model can solve the permutation problem of the frequency-domain BSS to a large extent, which is the reason for the excellence of ILRMA. In this paper, we further improve the separation performance of ILRMA by additionally considering the general structure of spectrograms, which is called consistency, and hence we call the proposed method Consistent ILRMA. Since a spectrogram is calculated by an overlapping window (and a window function induces spectral smearing called main- and side-lobes), the time-frequency bins depend on each other. In other words, the time-frequency components are related to each other via the uncertainty principle. Such co-occurrence among the spectral components can function as an assistant for solving the permutation problem, which has been demonstrated by a recent study. On the basis of these facts, we propose an algorithm for realizing Consistent ILRMA by slightly modifying the original algorithm. Its performance was extensively evaluated through experiments performed with various window lengths and shift lengths. The results indicated several tendencies of the original and proposed ILRMA that include some topics not fully discussed in the literature. For example, the proposed Consistent ILRMA tends to outperform the original ILRMA when the window length is sufficiently long compared to the reverberation time of the mixing system.
Submitted 1 November, 2020; v1 submitted 1 July, 2020;
originally announced July 2020.
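The abstract above notes that ILRMA models the power spectrograms of the sources via NMF. The following is a minimal sketch of that source-model idea only, assuming the Itakura-Saito divergence with square-root multiplicative updates; it is not the authors' implementation, it omits the demixing-matrix estimation and the consistency step entirely, and the names `is_nmf`, `is_divergence`, `T`, and `V` are illustrative.

```python
import numpy as np


def is_divergence(X, Y):
    """Itakura-Saito divergence between X and Y, summed over all entries."""
    R = X / Y
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(X, n_bases, n_iter=50, seed=0, eps=1e-12):
    """Approximate a strictly positive power spectrogram X (freq x frames) by T @ V.

    T holds nonnegative spectral bases and V nonnegative activations.
    The square-root multiplicative updates used here come from a
    majorization-minimization derivation for the Itakura-Saito divergence.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    T = rng.random((n_freq, n_bases)) + eps   # spectral bases
    V = rng.random((n_bases, n_frames)) + eps  # temporal activations
    costs = []
    for _ in range(n_iter):
        Y = T @ V
        T *= np.sqrt(((X / Y**2) @ V.T) / ((1.0 / Y) @ V.T))
        Y = T @ V
        V *= np.sqrt((T.T @ (X / Y**2)) / (T.T @ (1.0 / Y)))
        costs.append(is_divergence(X, T @ V))
    return T, V, costs


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((16, 20)) + 0.1  # toy stand-in for a power spectrogram
    T, V, costs = is_nmf(X, n_bases=4)
    print(costs[0], costs[-1])  # the cost should shrink over the iterations
```

In a full ILRMA-style method, the low-rank model `T @ V` of each source would be re-estimated alternately with the demixing matrices; only the low-rank fitting is shown here.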
-
Determined BSS based on time-frequency masking and its application to harmonic vector analysis
Authors:
Kohei Yatabe,
Daichi Kitamura
Abstract:
This paper proposes harmonic vector analysis (HVA) based on a general algorithmic framework of audio blind source separation (BSS) that is also presented in this paper. BSS for a convolutive audio mixture is usually performed by multichannel linear filtering when the numbers of microphones and sources are equal (determined situation). This paper addresses such determined BSS based on batch processing. To estimate the demixing filters, effective modeling of the source signals is important. One successful example is independent vector analysis (IVA) that models the signals via co-occurrence among the frequency components in each source. To give more freedom to the source modeling, a general framework of determined BSS is presented in this paper. It is based on the plug-and-play scheme using a primal-dual splitting algorithm and enables us to model the source signals implicitly through a time-frequency mask. By using the proposed framework, determined BSS algorithms can be developed by designing masks that enhance the source signals. As an example of its application, we propose HVA by defining a time-frequency mask that enhances the harmonic structure of audio signals via sparsity of cepstrum. The experiments showed that HVA outperforms IVA and independent low-rank matrix analysis (ILRMA) for both speech and music signals. A MATLAB code is provided along with the paper for a reference ( https://doi.org/10.24433/CO.9507820.v1 ).
Submitted 14 April, 2021; v1 submitted 29 April, 2020;
originally announced April 2020.
-
Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's t Distribution
Authors:
Tatsuki Kondo,
Kanta Fukushige,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari,
Rintaro Ikeshita,
Tomohiro Nakatani
Abstract:
In this paper, we address a blind source separation (BSS) problem and propose a new extended framework of independent positive semidefinite tensor analysis (IPSDTA). IPSDTA is a state-of-the-art BSS method that enables us to take interfrequency correlations into account, but the generative model is limited within the multivariate Gaussian distribution and its parameter optimization algorithm does not guarantee stable convergence. To resolve these problems, first, we propose to extend the generative model to a parametric multivariate Student's t distribution that can deal with various types of signal. Secondly, we derive a new parameter optimization algorithm that guarantees the monotonic nonincrease in the cost function, providing stable convergence. Experimental results reveal that the cost function in the conventional IPSDTA does not display monotonically nonincreasing properties. On the other hand, the proposed method guarantees the monotonic nonincrease in the cost function and outperforms the conventional ILRMA and IPSDTA in the source-separation performance.
Submitted 20 February, 2020;
originally announced February 2020.
-
Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-based Prior Distribution of Joint-Diagonalization Process
Authors:
Keigo Kamo,
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari,
Yu Takahashi,
Kazunobu Kondo
Abstract:
In this paper, we address a convolutive blind source separation (BSS) problem and propose a new extended framework of FastMNMF by introducing prior information for joint diagonalization of the spatial covariance matrix model. Recently, FastMNMF has been proposed as a fast version of multichannel nonnegative matrix factorization under the assumption that the spatial covariance matrices of multiple sources can be jointly diagonalized. However, its source-separation performance was not improved and the physical meaning of the joint-diagonalization process was unclear. To resolve these problems, we first reveal a close relationship between the joint-diagonalization process and the demixing system used in independent low-rank matrix analysis (ILRMA). Next, motivated by this fact, we propose a new regularized FastMNMF supported by ILRMA and derive convergence-guaranteed parameter update rules. From BSS experiments, we show that the proposed method outperforms the conventional FastMNMF in source-separation accuracy with almost the same computation time.
Submitted 3 February, 2020;
originally announced February 2020.
-
Acceleration of rank-constrained spatial covariance matrix estimation for blind speech extraction
Authors:
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari
Abstract:
In this paper, we propose new accelerated update rules for rank-constrained spatial covariance model estimation, which efficiently extracts a directional target source in diffuse background noise. The naive update rule requires heavy computation such as matrix inversion or matrix multiplication. We resolve this problem by expanding matrix inversion to reduce computational complexity; in the parameter update step, we need neither matrix inversion nor multiplication. In an experiment, we show that the proposed accelerated update rule achieves 87 times faster calculation than the naive one.
Submitted 6 August, 2019;
originally announced August 2019.
-
Efficient Full-Rank Spatial Covariance Estimation Using Independent Low-Rank Matrix Analysis for Blind Source Separation
Authors:
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari
Abstract:
In this paper, we propose a new algorithm that efficiently separates a directional source and diffuse background noise based on independent low-rank matrix analysis (ILRMA). ILRMA is one of the state-of-the-art techniques of blind source separation (BSS) and is based on a rank-1 spatial model. Although such a model does not hold for diffuse noise, ILRMA can accurately estimate the spatial parameters of the directional source. Motivated by this fact, we utilize these estimates to restore the lost spatial basis of diffuse noise, which can be considered as an efficient full-rank spatial covariance estimation. BSS experiments show the efficacy of the proposed method in terms of the computational cost and separation performance.
Submitted 18 June, 2019; v1 submitted 6 June, 2019;
originally announced June 2019.
-
Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network
Authors:
Shinnosuke Takamichi,
Yuki Saito,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari
Abstract:
This paper presents a deep neural network (DNN)-based phase reconstruction from amplitude spectrograms. In audio signal and speech processing, the amplitude spectrogram is often used for processing, and the corresponding phase spectrogram is reconstructed from the amplitude spectrogram on the basis of the Griffin-Lim method. However, the Griffin-Lim method causes unnatural artifacts in synthetic speech. Addressing this problem, we introduce the von-Mises-distribution DNN for phase reconstruction. The DNN is a generative model having the von Mises distribution that can model distributions of a periodic variable such as a phase, and the model parameters of the DNN are estimated on the basis of the maximum likelihood criterion. Furthermore, we propose a group-delay loss for DNN training to make the predicted group delay close to a natural group delay. The experimental results demonstrate that 1) the trained DNN can predict group delay more accurately than the phases themselves, and 2) our phase reconstruction methods achieve better speech quality than the conventional Griffin-Lim method.
Submitted 10 July, 2018;
originally announced July 2018.
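The abstract above describes maximum likelihood training under a von Mises distribution, which suits a periodic variable such as phase. As a minimal sketch of what such a criterion looks like, the function below evaluates the mean negative log-likelihood of observed phases given predicted mean phases. It assumes a single fixed concentration `kappa`, which is a simplification made here; the function name and arguments are illustrative and are not taken from the paper.

```python
import numpy as np


def von_mises_nll(phase, mean, kappa):
    """Mean negative log-likelihood of phases under a von Mises distribution.

    Density: p(x | mu, kappa) = exp(kappa * cos(x - mu)) / (2 * pi * I0(kappa)),
    where I0 is the modified Bessel function of the first kind, order 0.
    """
    phase = np.asarray(phase, dtype=float)
    mean = np.asarray(mean, dtype=float)
    log_norm = np.log(2.0 * np.pi * np.i0(kappa))
    return float(np.mean(log_norm - kappa * np.cos(phase - mean)))


if __name__ == "__main__":
    target = np.array([0.1, -3.0, 2.5])
    good = target + 2.0 * np.pi  # same angles, wrapped by one full turn
    bad = target + 1.0           # one radian off everywhere
    print(von_mises_nll(target, good, kappa=2.0))
    print(von_mises_nll(target, bad, kappa=2.0))
```

Because the loss depends on the phases only through `cos(phase - mean)`, predictions that differ by a multiple of 2π are scored identically, which a squared error on raw phase values would not do.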
-
Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation
Authors:
Shinichi Mogami,
Hayato Sumino,
Daichi Kitamura,
Norihiro Takamune,
Shinnosuke Takamichi,
Hiroshi Saruwatari,
Nobutaka Ono
Abstract:
In this paper, we address a multichannel audio source separation task and propose a new efficient method called independent deeply learned matrix analysis (IDLMA). IDLMA estimates the demixing matrix in a blind manner and updates the time-frequency structures of each source using a pretrained deep neural network (DNN). Also, we introduce a complex Student's t-distribution as a generalized source generative model including both complex Gaussian and Cauchy distributions. Experiments are conducted using music signals with a training dataset, and the results show the validity of the proposed method in terms of separation accuracy and computational cost.
Submitted 27 June, 2018;
originally announced June 2018.
-
Independent Low-Rank Matrix Analysis Based on Parametric Majorization-Equalization Algorithm
Authors:
Yoshiki Mitsui,
Daichi Kitamura,
Norihiro Takamune,
Hiroshi Saruwatari,
Yu Takahashi,
Kazunobu Kondo
Abstract:
In this paper, we propose a new optimization method for independent low-rank matrix analysis (ILRMA) based on a parametric majorization-equalization algorithm. ILRMA is an efficient blind source separation technique that simultaneously estimates a spatial demixing matrix (spatial model) and the power spectrograms of each estimated source (source model). In ILRMA, since both models are alternately optimized by iterative update rules, the difference in the convergence speeds between these models often results in a poor local solution. To solve this problem, we introduce a new parameter that controls the convergence speed of the source model and find the best balance between the optimizations in the spatial and source models for ILRMA.
Submitted 4 October, 2017;
originally announced October 2017.
-
Independent Low-Rank Matrix Analysis Based on Complex Student's $t$-Distribution for Blind Audio Source Separation
Authors:
Shinichi Mogami,
Daichi Kitamura,
Yoshiki Mitsui,
Norihiro Takamune,
Hiroshi Saruwatari,
Nobutaka Ono
Abstract:
In this paper, we generalize a source generative model in a state-of-the-art blind source separation (BSS), independent low-rank matrix analysis (ILRMA). ILRMA is a unified method of frequency-domain independent component analysis and nonnegative matrix factorization and can provide better performance for audio BSS tasks. To further improve the performance and stability of the separation, we introduce an isotropic complex Student's $t$-distribution as a source generative model, which includes the isotropic complex Gaussian distribution used in conventional ILRMA. Experiments are conducted using both music and speech BSS tasks, and the results show the validity of the proposed method.
Submitted 16 August, 2017;
originally announced August 2017.