-
An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec
Authors:
Linping Xu,
Jiawei Jiang,
Dejun Zhang,
Xianjun Xia,
Li Chen,
Yijian Xiao,
Piao Ding,
Shenyi Song,
Sixing Yin,
Ferdous Sohel
Abstract:
Recently, neural networks have proven to be effective in performing speech coding task at low bitrates. However, under-utilization of intra-frame correlations and the error of quantizer specifically degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleave…
▽ More
Recently, neural networks have proven to be effective in performing speech coding task at low bitrates. However, under-utilization of intra-frame correlations and the error of quantizer specifically degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure using 1D-CNN and Intra-BRNN is designed to exploit the intra-frame correlations more efficiently. Furthermore, Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce the quantization noise. CBRC encodes audio every 20ms with no additional latency, which is suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3kbps with Opus at 12kbps.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
A flexible and accurate total variation and cascaded denoisers-based image reconstruction algorithm for hyperspectrally compressed ultrafast photography
Authors:
Zihan Guo,
Jiali Yao,
Dalong Qi,
Pengpeng Ding,
Chengzhi Jin,
Ning Xu,
Zhiling Zhang,
Yunhua Yao,
Lianzhong Deng,
Zhiyong Wang,
Zhenrong Sun,
Shian Zhang
Abstract:
Hyperspectrally compressed ultrafast photography (HCUP) based on compressed sensing and the time- and spectrum-to-space mappings can simultaneously realize the temporal and spectral imaging of non-repeatable or difficult-to-repeat transient events passively in a single exposure. It possesses an incredibly high frame rate of tens of trillions of frames per second and a sequence depth of several hun…
▽ More
Hyperspectrally compressed ultrafast photography (HCUP) based on compressed sensing and the time- and spectrum-to-space mappings can simultaneously realize the temporal and spectral imaging of non-repeatable or difficult-to-repeat transient events passively in a single exposure. It possesses an incredibly high frame rate of tens of trillions of frames per second and a sequence depth of several hundred, and plays a revolutionary role in single-shot ultrafast optical imaging. However, due to the ultra-high data compression ratio induced by the extremely large sequence depth as well as the limited fidelities of traditional reconstruction algorithms over the reconstruction process, HCUP suffers from a poor image reconstruction quality and fails to capture fine structures in complex transient scenes. To overcome these restrictions, we propose a flexible image reconstruction algorithm based on the total variation (TV) and cascaded denoisers (CD) for HCUP, named the TV-CD algorithm. It applies the TV denoising model cascaded with several advanced deep learning-based denoising models in the iterative plug-and-play alternating direction method of multipliers framework, which can preserve the image smoothness while utilizing the deep denoising networks to obtain more priori, and thus solving the common sparsity representation problem in local similarity and motion compensation. Both simulation and experimental results show that the proposed TV-CD algorithm can effectively improve the image reconstruction accuracy and quality of HCUP, and further promote the practical applications of HCUP in capturing high-dimensional complex physical, chemical and biological ultrafast optical scenes.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
Harmonic enhancement using learnable comb filter for light-weight full-band speech enhancement model
Authors:
Xiaohuai Le,
Tong Lei,
Li Chen,
Yiqing Guo,
Chao He,
Cheng Chen,
Xianjun Xia,
Hua Gao,
Yijian Xiao,
Piao Ding,
Shenyi Song,
Jing Lu
Abstract:
With fewer feature dimensions, filter banks are often used in light-weight full-band speech enhancement models. In order to further enhance the coarse speech in the sub-band domain, it is necessary to apply a post-filtering for harmonic retrieval. The signal processing-based comb filters used in RNNoise and PercepNet have limited performance and may cause speech quality degradation due to inaccura…
▽ More
With fewer feature dimensions, filter banks are often used in light-weight full-band speech enhancement models. In order to further enhance the coarse speech in the sub-band domain, it is necessary to apply a post-filtering for harmonic retrieval. The signal processing-based comb filters used in RNNoise and PercepNet have limited performance and may cause speech quality degradation due to inaccurate fundamental frequency estimation. To tackle this problem, we propose a learnable comb filter to enhance harmonics. Based on the sub-band model, we design a DNN-based fundamental frequency estimator to estimate the discrete fundamental frequencies and a comb filter for harmonic enhancement, which are trained via an end-to-end pattern. The experiments show the advantages of our proposed method over PecepNet and DeepFilterNet.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
Accelerated Alternating Minimization for X-ray Tomographic Reconstruction
Authors:
Peijian Ding
Abstract:
While Computerized Tomography (CT) images can help detect disease such as Covid-19, regular CT machines are large and expensive. Cheaper and more portable machines suffer from errors in geometry acquisition that downgrades CT image quality. The errors in geometry can be represented with parameters in the mathematical model for image reconstruction. To obtain a good image, we formulate a nonlinear…
▽ More
While Computerized Tomography (CT) images can help detect disease such as Covid-19, regular CT machines are large and expensive. Cheaper and more portable machines suffer from errors in geometry acquisition that downgrades CT image quality. The errors in geometry can be represented with parameters in the mathematical model for image reconstruction. To obtain a good image, we formulate a nonlinear least squares problem that simultaneously reconstructs the image and corrects for errors in the geometry parameters. We develop an accelerated alternating minimization scheme to reconstruct the image and geometry parameters.
△ Less
Submitted 2 August, 2021;
originally announced August 2021.
-
Deep Residual Dense U-Net for Resolution Enhancement in Accelerated MRI Acquisition
Authors:
Pak Lun Kevin Ding,
Zhiqiang Li,
Yuxiang Zhou,
Baoxin Li
Abstract:
Typical Magnetic Resonance Imaging (MRI) scan may take 20 to 60 minutes. Reducing MRI scan time is beneficial for both patient experience and cost considerations. Accelerated MRI scan may be achieved by acquiring less amount of k-space data (down-sampling in the k-space). However, this leads to lower resolution and aliasing artifacts for the reconstructed images. There are many existing approaches…
▽ More
Typical Magnetic Resonance Imaging (MRI) scan may take 20 to 60 minutes. Reducing MRI scan time is beneficial for both patient experience and cost considerations. Accelerated MRI scan may be achieved by acquiring less amount of k-space data (down-sampling in the k-space). However, this leads to lower resolution and aliasing artifacts for the reconstructed images. There are many existing approaches for attempting to reconstruct high-quality images from down-sampled k-space data, with varying complexity and performance. In recent years, deep-learning approaches have been proposed for this task, and promising results have been reported. Still, the problem remains challenging especially because of the high fidelity requirement in most medical applications employing reconstructed MRI images. In this work, we propose a deep-learning approach, aiming at reconstructing high-quality images from accelerated MRI acquisition. Specifically, we use Convolutional Neural Network (CNN) to learn the differences between the aliased images and the original images, employing a U-Net-like architecture. Further, a micro-architecture termed Residual Dense Block (RDB) is introduced for learning a better feature representation than the plain U-Net. Considering the peculiarity of the down-sampled k-space data, we introduce a new term to the loss function in learning, which effectively employs the given k-space data during training to provide additional regularization on the update of the network weights. To evaluate the proposed approach, we compare it with other state-of-the-art methods. In both visual inspection and evaluation using standard metrics, the proposed approach is able to deliver improved performance, demonstrating its potential for providing an effective solution.
△ Less
Submitted 13 January, 2020;
originally announced January 2020.
-
Conservative Wasserstein Training for Pose Estimation
Authors:
Xiaofeng Liu,
Yang Zou,
Tong Che,
Peng Ding,
Ping Jia,
Jane You,
Kumar B. V. K
Abstract:
This paper targets the task with discrete and periodic class labels ($e.g.,$ pose/orientation estimation) in the context of deep learning. The commonly used cross-entropy or regression loss is not well matched to this problem as they ignore the periodic nature of the labels and the class similarity, or assume labels are continuous value. We propose to incorporate inter-class correlations in a Wass…
▽ More
This paper targets the task with discrete and periodic class labels ($e.g.,$ pose/orientation estimation) in the context of deep learning. The commonly used cross-entropy or regression loss is not well matched to this problem as they ignore the periodic nature of the labels and the class similarity, or assume labels are continuous value. We propose to incorporate inter-class correlations in a Wasserstein training framework by pre-defining ($i.e.,$ using arc length of a circle) or adaptively learning the ground metric. We extend the ground metric as a linear, convex or concave increasing function $w.r.t.$ arc length from an optimization perspective. We also propose to construct the conservative target labels which model the inlier and outlier noises using a wrapped unimodal-uniform mixture distribution. Unlike the one-hot setting, the conservative label makes the computation of Wasserstein distance more challenging. We systematically conclude the practical closed-form solution of Wasserstein distance for pose data with either one-hot or conservative target label. We evaluate our method on head, body, vehicle and 3D object pose benchmarks with exhaustive ablation studies. The Wasserstein loss obtaining superior performance over the current methods, especially using convex mapping function for ground metric, conservative label, and closed-form solution.
△ Less
Submitted 3 November, 2019;
originally announced November 2019.