-
U-Mamba-Net: A highly efficient Mamba-based U-net style network for noisy and reverberant speech separation
Authors:
Shaoxiang Dang,
Tetsuya Matsumoto,
Yoshinori Takeuchi,
Hiroaki Kudo
Abstract:
The topic of speech separation involves separating mixed speech with multiple overlapping speakers into several streams, with each stream containing speech from only one speaker. Many highly effective models have emerged and proliferated rapidly over time. However, the size and computational load of these models have also increased accordingly. This is a disaster for the community, as researchers…
▽ More
The topic of speech separation involves separating mixed speech with multiple overlapping speakers into several streams, with each stream containing speech from only one speaker. Many highly effective models have emerged and proliferated rapidly over time. However, the size and computational load of these models have also increased accordingly. This is a disaster for the community, as researchers need more time and computational resources to reproduce and compare existing models. In this paper, we propose U-mamba-net: a lightweight Mamba-based U-style model for speech separation in complex environments. Mamba is a state space sequence model that incorporates feature selection capabilities. U-style network is a fully convolutional neural network whose symmetric contracting and expansive paths are able to learn multi-resolution features. In our work, Mamba serves as a feature filter, alternating with U-Net. We test the proposed model on Libri2mix. The results show that U-Mamba-Net achieves improved performance with quite low computational cost.
△ Less
Submitted 24 December, 2024;
originally announced December 2024.
-
Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features
Authors:
Shaoxiang Dang,
Tetsuya Matsumoto,
Yoshinori Takeuchi,
Takashi Tsuboi,
Yasuhiro Tanaka,
Daisuke Nakatsubo,
Satoshi Maesawa,
Ryuta Saito,
Masahisa Katsuno,
Hiroaki Kudo
Abstract:
The potential of deep learning in clinical speech processing is immense, yet the hurdles of limited and imbalanced clinical data samples loom large. This article addresses these challenges by showcasing the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech. This innovative approach aims to estimate voice qua…
▽ More
The potential of deep learning in clinical speech processing is immense, yet the hurdles of limited and imbalanced clinical data samples loom large. This article addresses these challenges by showcasing the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech. This innovative approach aims to estimate voice quality of patients with impaired vocal systems. Experiments involve checks on PVQD dataset, covering various causes of vocal system damage in English, and a Japanese dataset focusing on patients with Parkinson's disease before and after undergoing subthalamic nucleus deep brain stimulation (STN-DBS) surgery. The results on PVQD reveal a notable correlation (>0.8 on PCC) and an extraordinary accuracy (<0.5 on MSE) in predicting Grade, Breathy, and Asthenic indicators. Meanwhile, progress has been achieved in predicting the voice quality of patients in the context of STN-DBS.
△ Less
Submitted 22 August, 2024;
originally announced August 2024.
-
A very fast iterative algorithm for TV-regularized image reconstruction with applications to low-dose and few-view CT
Authors:
Hiroyuki Kudo,
Fukashi Yamazaki,
Takuya Nemoto,
Keita Takaki
Abstract:
This paper concerns iterative reconstruction for low-dose and few-view CT by minimizing a data-fidelity term regularized with the Total Variation (TV) penalty. We propose a very fast iterative algorithm to solve this problem. The algorithm derivation is outlined as follows. First, the original minimization problem is reformulated into the saddle point (primal-dual) problem by using the Lagrangian…
▽ More
This paper concerns iterative reconstruction for low-dose and few-view CT by minimizing a data-fidelity term regularized with the Total Variation (TV) penalty. We propose a very fast iterative algorithm to solve this problem. The algorithm derivation is outlined as follows. First, the original minimization problem is reformulated into the saddle point (primal-dual) problem by using the Lagrangian duality, to which we apply the first-order primal-dual iterative methods. Second, we precondition the iteration formula using the ramp flter of Filtered Backprojection (FBP) reconstruction algorithm in such a way that the problem solution is not altered. The resulting algorithm resembles the structure of so-called iterative FBP algorithm, and it converges to the exact minimizer of cost function very fast.
△ Less
Submitted 20 September, 2016;
originally announced September 2016.
-
Proposal of fault-tolerant tomographic image reconstruction
Authors:
Hiroyuki Kudo,
Keita Takaki,
Fukashi Yamazaki,
Takuya Nemoto
Abstract:
This paper deals with tomographic image reconstruction under the situation where some of projection data bins are contaminated with abnormal data. Such situations occur in various instances of tomography. We propose a new reconstruction algorithm called the Fault-Tolerant reconstruction outlined as follows. The least-squares (L2-norm) error function ||Ax-b||_2^2 used in ordinary iterative reconstr…
▽ More
This paper deals with tomographic image reconstruction under the situation where some of projection data bins are contaminated with abnormal data. Such situations occur in various instances of tomography. We propose a new reconstruction algorithm called the Fault-Tolerant reconstruction outlined as follows. The least-squares (L2-norm) error function ||Ax-b||_2^2 used in ordinary iterative reconstructions is sensitive to the existence of abnormal data. The proposed algorithm utilizes the L1-norm error function ||Ax-b||_1^1 instead of the L2-norm, and we develop a row-action-type iterative algorithm using the proximal splitting framework in convex optimization fields. We also propose an improved version of the L1-norm reconstruction called the L1-TV reconstruction, in which a weak Total Variation (TV) penalty is added to the cost function. Simulation results demonstrate that reconstructed images with the L2-norm were severely damaged by the effect of abnormal bins, whereas images with the L1-norm and L1-TV reconstructions were robust to the existence of abnormal bins.
△ Less
Submitted 20 September, 2016;
originally announced September 2016.