-
30+ Years of Source Separation Research: Achievements and Future Challenges
Authors:
Shoko Araki,
Nobutaka Ito,
Reinhold Haeb-Umbach,
Gordon Wichern,
Zhong-Qiu Wang,
Yuki Mitsufuji
Abstract:
Source separation (SS) of acoustic signals is a research field that emerged in the mid-1990s and has flourished ever since. On the occasion of ICASSP's 50th anniversary, we review the major contributions and advancements in the past three decades in the speech, audio, and music SS research field. We will cover both single- and multi-channel SS approaches. We will also look back on key efforts to foster a culture of scientific evaluation in the research field, including challenges, performance metrics, and datasets. We will conclude by discussing current trends and future research directions.
Submitted 20 January, 2025;
originally announced January 2025.
-
Predictive Simultaneous Interpretation: Harnessing Large Language Models for Democratizing Real-Time Multilingual Communication
Authors:
Kurando Iida,
Kenjiro Mimura,
Nobuo Ito
Abstract:
This study introduces a groundbreaking approach to simultaneous interpretation by directly leveraging the predictive capabilities of Large Language Models (LLMs). We present a novel algorithm that generates real-time translations by predicting speaker utterances and expanding multiple possibilities in a tree-like structure. This method demonstrates unprecedented flexibility and adaptability, potentially overcoming the structural differences between languages more effectively than existing systems. Our theoretical analysis, supported by illustrative examples, suggests that this approach could lead to more natural and fluent translations with minimal latency. The primary purpose of this paper is to share this innovative concept with the academic community, stimulating further research and development in this field. We discuss the theoretical foundations, potential advantages, and implementation challenges of this technique, positioning it as a significant step towards democratizing multilingual communication.
Submitted 2 July, 2024;
originally announced July 2024.
-
Gaussian Process Classification Bandits
Authors:
Tatsuya Hayashi,
Naoki Ito,
Koji Tabata,
Atsuyoshi Nakamura,
Katsumasa Fujita,
Yoshinori Harada,
Tamiki Komatsuzaki
Abstract:
Classification bandits are multi-armed bandit problems whose task is to classify a given set of arms into either the positive or the negative class, depending on whether the rate of arms with an expected reward of at least h is at least w, for given thresholds h and w. We study a special classification bandit problem in which arms correspond to points x in d-dimensional real space with expected rewards f(x) that are generated according to a Gaussian process prior. We develop a framework algorithm for the problem using various arm selection policies and propose policies called FCB and FTSV. We show a smaller sample complexity upper bound for FCB than that for the existing level set estimation algorithm, in which it must be decided for every arm's x whether f(x) is at least h. Arm selection policies that depend on an estimated rate of arms with rewards of at least h are also proposed and shown to improve empirical sample complexity. According to our experimental results, the rate-estimation versions of FCB and FTSV, together with that of the popular active learning policy that selects the point with the maximum variance, outperform other policies for synthetic functions, and the rate-estimation version of FTSV is also the best performer for our real-world dataset.
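The classification task defined above can be illustrated with a generic confidence-bound stopping rule. This is a minimal sketch and is not the FCB or FTSV policy. It assumes per-arm confidence bounds of the form mu ± beta·sigma, taken from a GP posterior. The helper names `confidence_bounds` and `classify_arms` and the width parameter `beta` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def confidence_bounds(mu: Sequence[float], sigma: Sequence[float],
                      beta: float) -> Tuple[List[float], List[float]]:
    """Per-arm bounds mu -/+ beta * sigma (beta is an assumed, tunable width)."""
    lower = [m - beta * s for m, s in zip(mu, sigma)]
    upper = [m + beta * s for m, s in zip(mu, sigma)]
    return lower, upper


def classify_arms(lower: Sequence[float], upper: Sequence[float],
                  h: float, w: float) -> Optional[bool]:
    """Return True (positive), False (negative), or None (keep sampling).

    The instance is positive iff the rate of arms with f(x) >= h is at least w.
    """
    n = len(lower)
    n_above = sum(1 for lo in lower if lo >= h)  # arms certainly at or above h
    n_below = sum(1 for up in upper if up < h)   # arms certainly below h
    # Positive even if every undecided arm turns out to be below h.
    if n_above >= w * n:
        return True
    # Negative even if every undecided arm turns out to be at or above h.
    if n - n_below < w * n:
        return False
    return None
```

Unlike level set estimation, this rule can stop as soon as the rate question is settled, even while individual arms remain undecided.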
Submitted 26 December, 2022;
originally announced December 2022.
-
Audio Signal Enhancement with Learning from Positive and Unlabelled Data
Authors:
Nobutaka Ito,
Masashi Sugiyama
Abstract:
Supervised learning is a mainstream approach to audio signal enhancement (SE) and requires parallel training data consisting of both noisy signals and the corresponding clean signals. Such data can only be synthesised and are mismatched with real data, which can result in poor performance on real data. Moreover, clean signals may be inaccessible in certain scenarios, which renders this conventional approach infeasible. Here we explore SE using non-parallel training data consisting of noisy signals and noise, which can be easily recorded. We define the positive (P) and the negative (N) classes as signal inactivity and activity, respectively. We observe that the spectrogram patches of noise clips can be used as P data and those of noisy signal clips as unlabelled data. Thus, learning from positive and unlabelled data enables a convolutional neural network to learn to classify each spectrogram patch as P or N to enable SE.
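Learning from positive and unlabelled data needs a training objective that uses only P and U samples. As an assumption, the sketch below uses one widely used choice, the non-negative PU risk of Kiryo et al. (2017), with the sigmoid loss. The abstract does not confirm that this is the paper's exact estimator. Here the P scores would come from noise-clip patches and the U scores from noisy-signal patches. The class prior `prior_p` is assumed to be known. The function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid_loss(z: float, y: int) -> float:
    """l(z, y) = 1 / (1 + exp(y * z)); small when the score z agrees with label y."""
    return 1.0 / (1.0 + math.exp(y * z))


def nnpu_risk(scores_p: Sequence[float], scores_u: Sequence[float],
              prior_p: float) -> float:
    """Non-negative PU risk estimate from classifier scores on P and U patches."""
    r_p_pos = sum(sigmoid_loss(z, +1) for z in scores_p) / len(scores_p)
    r_p_neg = sum(sigmoid_loss(z, -1) for z in scores_p) / len(scores_p)
    r_u_neg = sum(sigmoid_loss(z, -1) for z in scores_u) / len(scores_u)
    # The estimated negative-class risk is clipped at zero to limit overfitting.
    return prior_p * r_p_pos + max(0.0, r_u_neg - prior_p * r_p_neg)
```

Scores of very large magnitude (around 700 or more) would overflow `math.exp`, so a production version would use a numerically stable sigmoid.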
Submitted 26 April, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Non-learning Stereo-aided Depth Completion under Mis-projection via Selective Stereo Matching
Authors:
Yasuhiro Yao,
Ryoichi Ishikawa,
Shingo Ando,
Kana Kurata,
Naoki Ito,
Jun Shimamura,
Takeshi Oishi
Abstract:
We propose a non-learning depth completion method for a sparse depth map captured using a light detection and ranging (LiDAR) sensor guided by a pair of stereo images. Generally, conventional stereo-aided depth completion methods have two limitations. (i) They assume the given sparse depth map is accurately aligned to the input image, whereas such alignment is difficult to achieve in practice. (ii) They have limited accuracy at long range because the depth is estimated from pixel disparity. To address these limitations, we propose selective stereo matching (SSM), which searches for the most appropriate depth value for each image pixel among the LiDAR points projected in its neighborhood, based on an energy minimization framework. This depth selection approach can handle any type of mis-projection. Moreover, SSM has an advantage in long-range depth accuracy because it directly uses the LiDAR measurements rather than depth acquired from stereo. Because SSM is a discrete process, we apply variational smoothing with a binary anisotropic diffusion tensor (B-ADT) to generate a continuous depth map while preserving depth discontinuities across object boundaries. Experimentally, compared with the previous state-of-the-art stereo-aided depth completion method, the proposed method reduced the mean absolute error (MAE) of depth estimation to 0.65 times that of the previous method and was approximately twice as accurate at long range. Moreover, under various LiDAR-camera calibration errors, the proposed method reduced the depth estimation MAE to 0.34-0.93 times that of previous depth completion methods.
Submitted 4 October, 2022;
originally announced October 2022.
-
A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter
Authors:
Nobutaka Ito,
Rintaro Ikeshita,
Hiroshi Sawada,
Tomohiro Nakatani
Abstract:
This paper presents a computationally efficient approach to blind source separation (BSS) of audio signals, applicable even when there are more sources than microphones (i.e., the underdetermined case). When there are as many sources as microphones (i.e., the determined case), BSS can be performed computationally efficiently by independent component analysis (ICA). Unfortunately, however, ICA is basically inapplicable to the underdetermined case. Another BSS approach using the multichannel Wiener filter (MWF) is applicable even to this case, and encompasses full-rank spatial covariance analysis (FCA) and multichannel non-negative matrix factorization (MNMF). However, these methods require massive numbers of matrix inversions to design the MWF, and are thus computationally inefficient. To overcome this drawback, we exploit the well-known property of diagonal matrices that matrix inversion amounts to mere inversion of the diagonal elements and can thus be performed computationally efficiently. This makes it possible to drastically reduce the computational cost of the above matrix inversions based on a joint diagonalization (JD) idea, leading to computationally efficient BSS. Specifically, we restrict the N spatial covariance matrices (SCMs) of all N sources to a class of (exactly) jointly diagonalizable matrices. Based on this approach, we present FastFCA, a computationally efficient extension of FCA. We also present a unified framework for underdetermined and determined audio BSS, which highlights a theoretical connection between FastFCA and other methods. Moreover, we reveal that FastFCA can be regarded as a regularized version of approximate joint diagonalization (AJD).
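The saving from joint diagonalization described above can be checked numerically. This minimal sketch assumes, as the abstract states, that every source SCM has the form R_n = P diag(λ_n) P^H for one shared invertible matrix P, with M microphones and N sources, where N may exceed M. Under that assumption, the MWF W_n = v_n R_n (Σ_k v_k R_k)^{-1} reduces to P diag(v_n λ_n / Σ_k v_k λ_k) P^{-1}, so only scalar divisions remain. The function name and array layout are hypothetical. This is an illustration of the idea, not the authors' FastFCA code.

```python
import numpy as np


def mwf_jointly_diagonalizable(P, lambdas, v):
    """MWFs W_n = v_n R_n (sum_k v_k R_k)^{-1} for R_n = P diag(lambdas[n]) P^H.

    P: (M, M) invertible; lambdas: (N, M) positive; v: (N,) positive source variances.
    Returns W with shape (N, M, M).
    """
    d = v[:, None] * lambdas        # diagonal entries of v_n * diag(lambdas[n])
    total = d.sum(axis=0)           # diagonal of the mixture covariance in the P-basis
    gains = d / total               # elementwise division replaces matrix inversion
    P_inv = np.linalg.inv(P)        # one inversion, shareable across time-frequency points
    # W_n = P diag(gains[n]) P^{-1}
    return np.einsum('ij,nj,jk->nik', P, gains, P_inv)
```

Here P depends only on frequency, so its inverse is computed once per frequency bin. The per-point work is then the elementwise division.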
Submitted 21 January, 2021;
originally announced January 2021.
-
From Natural Language Instructions to Complex Processes: Issues in Chaining Trigger Action Rules
Authors:
Nobuhiro Ito,
Yuya Suzuki,
Akiko Aizawa
Abstract:
Automation services for complex business processes usually require a high level of information technology literacy. There is strong demand for an intelligent process automation (IPA) service that enables even general users to easily use advanced automation. A natural language interface for such automation is expected to be a key technology for realizing IPA. The workflows targeted by IPA are generally composed of a combination of multiple tasks. However, semantic parsing, a natural language processing technique, has not yet been fully studied for such complex workflows. The reasons are that (1) the formal expression and grammar of workflows required for semantic analysis have not been sufficiently examined, and (2) no dataset pairing formal workflow expressions with their corresponding natural language descriptions, as required for learning workflow semantics, existed. This paper defines a new grammar for complex workflows that chains machine-executable meaning representations for semantic parsing. The representations are at a high abstraction level. Additionally, an approach to creating datasets is proposed based on this grammar.
Submitted 8 January, 2020;
originally announced January 2020.
-
FastFCA-AS: Joint Diagonalization Based Acceleration of Full-Rank Spatial Covariance Analysis for Separating Any Number of Sources
Authors:
Nobutaka Ito,
Tomohiro Nakatani
Abstract:
Here we propose FastFCA-AS, an accelerated algorithm for Full-rank spatial Covariance Analysis (FCA), which is a robust audio source separation method proposed by Duong et al. ["Under-determined reverberant audio source separation using a full-rank spatial covariance model," IEEE Trans. ASLP, vol. 18, no. 7, pp. 1830-1840, Sept. 2010]. In the conventional FCA, matrix inversion and matrix multiplication are required at each time-frequency point in each iteration of an iterative parameter estimation algorithm. This causes a heavy computational load, thereby rendering the FCA infeasible in many applications. To overcome this drawback, we take a joint diagonalization approach, whereby matrix inversion and matrix multiplication are reduced to mere inversion and multiplication of diagonal entries. This makes the FastFCA-AS significantly faster than the FCA and even applicable to observed data of long duration or a situation with restricted computational resources. Although we have already proposed another acceleration of the FCA for two sources, the proposed FastFCA-AS is applicable to an arbitrary number of sources. In an experiment with three sources and three microphones, the FastFCA-AS was over 420 times faster than the FCA with a slightly better source separation performance.
Submitted 23 May, 2018;
originally announced May 2018.
-
FastFCA: A Joint Diagonalization Based Fast Algorithm for Audio Source Separation Using A Full-Rank Spatial Covariance Model
Authors:
Nobutaka Ito,
Shoko Araki,
Tomohiro Nakatani
Abstract:
A source separation method using a full-rank spatial covariance model has been proposed by Duong et al. ["Under-determined Reverberant Audio Source Separation Using a Full-rank Spatial Covariance Model," IEEE Trans. ASLP, vol. 18, no. 7, pp. 1830-1840, Sep. 2010], which is referred to as full-rank spatial covariance analysis (FCA) in this paper. Here we propose a fast algorithm for estimating the model parameters of the FCA, named FastFCA, which is applicable to the two-source case. Though quite effective in source separation, the conventional FCA has the major drawback of expensive computation. Indeed, the conventional algorithm for estimating the model parameters of the FCA requires frame-wise matrix inversion and matrix multiplication. Therefore, the conventional FCA may be infeasible in applications with restricted computational resources. In contrast, the proposed FastFCA bypasses matrix inversion and matrix multiplication owing to joint diagonalization based on the generalized eigenvalue problem. Furthermore, the FastFCA is strictly equivalent to the conventional algorithm. An experiment has shown that the FastFCA was over 250 times faster than the conventional algorithm with virtually the same source separation performance.
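For two sources, joint diagonalization via the generalized eigenvalue problem can be sketched with plain NumPy. The construction assumes R1 is Hermitian and R2 is Hermitian positive definite. The generalized eigenvectors V of R1 v = w R2 v satisfy V^H R2 V = I and V^H R1 V = diag(w). Consequently (v1 R1 + v2 R2)^{-1} = V diag(1 / (v1 w + v2)) V^H, so each inverse reduces to scalar reciprocals. The function names are hypothetical, and how FastFCA organizes this computation internally is not stated in the abstract. `scipy.linalg.eigh(R1, R2)` solves the same generalized problem directly.

```python
import numpy as np


def joint_diagonalize_pair(R1, R2):
    """Return (V, w) with V^H R2 V = I and V^H R1 V = diag(w)."""
    L = np.linalg.cholesky(R2)              # R2 = L L^H, L lower triangular
    L_inv = np.linalg.inv(L)
    C = L_inv @ R1 @ L_inv.conj().T         # Hermitian matrix with the pencil's eigenvalues
    w, U = np.linalg.eigh(C)                # real eigenvalues, unitary U
    V = L_inv.conj().T @ U                  # generalized eigenvectors of (R1, R2)
    return V, w


def inv_mixture(V, w, v1, v2):
    """(v1 R1 + v2 R2)^{-1} = V diag(1 / (v1 w + v2)) V^H: only scalar reciprocals."""
    return (V * (1.0 / (v1 * w + v2))) @ V.conj().T
```

V and w depend only on the SCMs, not on the time-varying variances v1 and v2. They can therefore be computed once per frequency bin and reused at every frame.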
Submitted 16 May, 2018;
originally announced May 2018.
-
An open-source job management framework for parameter-space exploration: OACIS
Authors:
Yohsuke Murase,
Takeshi Uchitane,
Nobuyasu Ito
Abstract:
We present an open-source software framework for parameter-space exploration, named OACIS, which is useful for systematically managing a vast number of simulation jobs and results. Recent developments in high-performance computing have made comprehensive exploration of parameter spaces possible; in such cases, however, manual management of the workflow is practically impossible. OACIS was developed to reduce the cost of these repetitive tasks in simulation studies by automating job submission and data management. In this article, an overview of OACIS as well as a getting-started guide are presented.
Submitted 19 April, 2018;
originally announced May 2018.
-
The 2018 Signal Separation Evaluation Campaign
Authors:
Fabian-Robert Stöter,
Antoine Liutkus,
Nobutaka Ito
Abstract:
This paper reports the organization and results for the 2018 community-based Signal Separation Evaluation Campaign (SiSEC 2018). This year's edition was focused on audio and pursued the effort towards scaling up and making it easier to prototype audio separation software in an era of machine-learning based systems. For this purpose, we prepared a new music separation database: MUSDB18, featuring close to 10 h of audio. Additionally, open-source software was released to automatically load, process and report performance on MUSDB18. Furthermore, a new official Python version for the BSSEval toolbox was released, along with reference implementations for three oracle separation methods: ideal binary mask, ideal ratio mask, and multichannel Wiener filter. We finally report the results obtained by the participants.
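Two of the oracle methods named above, the ideal binary mask and the ideal ratio mask, can be sketched on single-channel STFTs. The sketch uses one common definition of each mask. The campaign's reference implementations may differ in exponent, thresholding, and multichannel handling, and the function names and `alpha` parameter are hypothetical. `S` holds the true source STFTs with shape (J, F, T), and `X` is the mixture STFT with shape (F, T).

```python
import numpy as np


def ideal_binary_masks(S):
    """1 for the source with the largest magnitude in each time-frequency bin, else 0."""
    winner = np.argmax(np.abs(S), axis=0)                        # (F, T) dominant source
    return (np.arange(S.shape[0])[:, None, None] == winner[None]).astype(float)


def ideal_ratio_masks(S, alpha=2.0, eps=1e-12):
    """|S_j|^alpha / sum_k |S_k|^alpha; alpha=2 gives a power-ratio mask."""
    p = np.abs(S) ** alpha
    return p / (p.sum(axis=0, keepdims=True) + eps)


def apply_masks(X, masks):
    """Estimate every source by masking the mixture STFT X of shape (F, T)."""
    return masks * X[None]
```

These masks are "oracle" methods because they require the true sources. In an evaluation campaign they serve as upper-bound references rather than as practical separation systems.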
Submitted 6 July, 2018; v1 submitted 17 April, 2018;
originally announced April 2018.
-
A tool for parameter-space explorations
Authors:
Yohsuke Murase,
Takeshi Uchitane,
Nobuyasu Ito
Abstract:
A software tool for managing simulation jobs and results, named "OACIS", is presented. It controls a large number of simulation jobs executed on various remote servers, keeps their results in an organized way, and manages the analyses of these results. The software has a web browser front end, from which users can easily submit various jobs to appropriate remote hosts. After these jobs are finished, all the result files are automatically downloaded from the computational hosts and stored in a traceable way, together with logs of the date, host, and elapsed time of the jobs. Some visualization functions are also provided so that users can easily grasp an overview of results distributed in a high-dimensional parameter space. Thus, OACIS is especially beneficial for complex simulation models with many parameters that require extensive parameter searches. Using the OACIS API, it is easy to write code that automates parameter selection based on previous simulation results. A few examples of automated parameter selection are also demonstrated.
Submitted 15 April, 2014;
originally announced April 2014.
-
Group Formation through Indirect Reciprocity
Authors:
Koji Oishi,
Takashi Shimada,
Nobuyasu Ito
Abstract:
The emergence of structure in cooperative relation is studied in a game theoretical model. It is proved that specific types of reciprocity norm lead individuals to split into two groups. The condition for the evolutionary stability of the norms is also revealed. This result suggests a connection between group formation and a specific type of reciprocity norm in our society.
Submitted 30 November, 2012;
originally announced November 2012.