-
Low algorithmic delay implementation of convolutional beamformer for online joint source separation and dereverberation
Authors:
Kaien Mo,
Xianrui Wang,
Yichen Yang,
Shoji Makino,
Jingdong Chen
Abstract:
Blind-audio-source-separation (BASS) techniques, particularly those with low latency, play an important role in a wide range of real-time systems, e.g., hearing aids, in-car hand-free voice communication, real-time human-machine interaction, etc. Most existing BASS algorithms are deduced to run on batch mode, and therefore large latency is unavoidable. Recently, some online algorithms were develop…
▽ More
Blind-audio-source-separation (BASS) techniques, particularly those with low latency, play an important role in a wide range of real-time systems, e.g., hearing aids, in-car hand-free voice communication, real-time human-machine interaction, etc. Most existing BASS algorithms are deduced to run on batch mode, and therefore large latency is unavoidable. Recently, some online algorithms were developed, which achieve separation on a frame-by-frame basis in the short-time-Fourier-transform (STFT) domain and the latency is significantly reduced as compared to those batch methods. However, the latency with these algorithms may still be too long for many real-time systems to bear. To further reduce latency while achieving good separation performance, we propose in this work to integrate a weighted prediction error (WPE) module into a non-causal sample-truncating-based independent vector analysis (NST-IVA). The resulting algorithm can maintain the algorithmic delay as NST-IVA if the delay with WPE is appropriately controlled while achieving significantly better performance, which is validated by simulations.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Real-time Virtual Intraoperative CT for Image Guided Surgery
Authors:
Yangming Li,
Neeraja Konuthula,
Ian M. Humphreys,
Kris Moe,
Blake Hannaford,
Randall Bly
Abstract:
Abstract. Purpose: This paper presents a scheme for generating virtual intraoperative CT scans in order to improve surgical completeness in Endoscopic Sinus Surgeries (ESS). Approach: The work presents three methods, the tip motion-based, the tip trajectory-based, and the instrument based, along with non-parametric smoothing and Gaussian Process Regression, for virtual intraoperative CT generation…
▽ More
Abstract. Purpose: This paper presents a scheme for generating virtual intraoperative CT scans in order to improve surgical completeness in Endoscopic Sinus Surgeries (ESS). Approach: The work presents three methods, the tip motion-based, the tip trajectory-based, and the instrument based, along with non-parametric smoothing and Gaussian Process Regression, for virtual intraoperative CT generation. Results: The proposed methods studied and compared on ESS performed on cadavers. Surgical results show all three methods improve the Dice Similarity Coefficients > 86%, with F-score > 92% and precision > 89.91%. The tip trajectory-based method was found to have best performance and reached 96.87% precision in surgical completeness evaluation. Conclusions: This work demonstrated that virtual intraoperative CT scans improves the consistency between the actual surgical scene and the reference model, and improves surgical completeness in ESS. Comparing with actual intraoperative CT scans, the proposed scheme has no impact on existing surgical protocols, does not require extra hardware other than the one is already available in most ESS overcome the high costs, the repeated radiation, and the elongated anesthesia caused by actual intraoperative CTs, and is practical in ESS.
△ Less
Submitted 5 December, 2021;
originally announced December 2021.
-
Rethinking Sampling in 3D Point Cloud Generative Adversarial Networks
Authors:
He Wang,
Zetian Jiang,
Li Yi,
Kaichun Mo,
Hao Su,
Leonidas J. Guibas
Abstract:
In this paper, we examine the long-neglected yet important effects of point sampling patterns in point cloud GANs. Through extensive experiments, we show that sampling-insensitive discriminators (e.g.PointNet-Max) produce shape point clouds with point clustering artifacts while sampling-oversensitive discriminators (e.g.PointNet++, DGCNN) fail to guide valid shape generation. We propose the concep…
▽ More
In this paper, we examine the long-neglected yet important effects of point sampling patterns in point cloud GANs. Through extensive experiments, we show that sampling-insensitive discriminators (e.g.PointNet-Max) produce shape point clouds with point clustering artifacts while sampling-oversensitive discriminators (e.g.PointNet++, DGCNN) fail to guide valid shape generation. We propose the concept of sampling spectrum to depict the different sampling sensitivities of discriminators. We further study how different evaluation metrics weigh the sampling pattern against the geometry and propose several perceptual metrics forming a sampling spectrum of metrics. Guided by the proposed sampling spectrum, we discover a middle-point sampling-aware baseline discriminator, PointNet-Mix, which improves all existing point cloud generators by a large margin on sampling-related metrics. We point out that, though recent research has been focused on the generator design, the main bottleneck of point cloud GAN actually lies in the discriminator design. Our work provides both suggestions and tools for building future discriminators. We will release the code to facilitate future research.
△ Less
Submitted 12 June, 2020;
originally announced June 2020.
-
LC-GAN: Image-to-image Translation Based on Generative Adversarial Network for Endoscopic Images
Authors:
Shan Lin,
Fangbo Qin,
Yangming Li,
Randall A. Bly,
Kris S. Moe,
Blake Hannaford
Abstract:
Intelligent vision is appealing in computer-assisted and robotic surgeries. Vision-based analysis with deep learning usually requires large labeled datasets, but manual data labeling is expensive and time-consuming in medical problems. We investigate a novel cross-domain strategy to reduce the need for manual data labeling by proposing an image-to-image translation model live-cadaver GAN (LC-GAN)…
▽ More
Intelligent vision is appealing in computer-assisted and robotic surgeries. Vision-based analysis with deep learning usually requires large labeled datasets, but manual data labeling is expensive and time-consuming in medical problems. We investigate a novel cross-domain strategy to reduce the need for manual data labeling by proposing an image-to-image translation model live-cadaver GAN (LC-GAN) based on generative adversarial networks (GANs). We consider a situation when a labeled cadaveric surgery dataset is available while the task is instrument segmentation on an unlabeled live surgery dataset. We train LC-GAN to learn the mappings between the cadaveric and live images. For live image segmentation, we first translate the live images to fake-cadaveric images with LC-GAN and then perform segmentation on the fake-cadaveric images with models trained on the real cadaveric dataset. The proposed method fully makes use of the labeled cadaveric dataset for live image segmentation without the need to label the live dataset. LC-GAN has two generators with different architectures that leverage the deep feature representation learned from the cadaveric image based segmentation task. Moreover, we propose the structural similarity loss and segmentation consistency loss to improve the semantic consistency during translation. Our model achieves better image-to-image translation and leads to improved segmentation performance in the proposed cross-domain segmentation task.
△ Less
Submitted 13 August, 2020; v1 submitted 10 March, 2020;
originally announced March 2020.