-
Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge
Authors:
Muhammad Imran,
Jonathan R. Krebs,
Vishal Balaji Sivaraman,
Teng Zhang,
Amarjeet Kumar,
Walker R. Ueland,
Michael J. Fassler,
Jinlong Huang,
Xiao Sun,
Lisheng Wang,
Pengcheng Shi,
Maximilian Rokuss,
Michael Baumgartner,
Yannick Kirchhof,
Klaus H. Maier-Hein,
Fabian Isensee,
Shuolin Liu,
Bing Han,
Bong Thanh Nguyen,
Dong-jin Shin,
Park Ji-Woo,
Mathew Choi,
Kwang-Hyun Uhm,
Sung-Jea Ko,
Chanwoong Lee, et al. (38 additional authors not shown)
Abstract:
Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently available to support the development of multi-class aortic segmentation methods. To address this gap, we organized the AortaSeg24 MICCAI Challenge, introducing the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This dataset was designed to facilitate both model development and validation. The challenge attracted 121 teams worldwide, with participants leveraging state-of-the-art frameworks such as nnU-Net and exploring novel techniques, including cascaded models, data augmentation strategies, and custom loss functions. We evaluated the submitted algorithms using the Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD), highlighting the approaches adopted by the top five performing teams. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms. The annotated dataset, evaluation code, and implementations of the leading methods are publicly available to support further research. All resources can be accessed at https://aortaseg24.grand-challenge.org.
Submitted 7 February, 2025;
originally announced February 2025.
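As an illustration of the Dice Similarity Coefficient named in the abstract above, the following is a minimal sketch of a per-class DSC for a multi-class label map. It is not the challenge's evaluation code: the function name `dice_per_class`, the flat-list representation of the label map, the toy labels, and the choice to skip classes absent from both masks are all assumptions made for this example.

```python
def dice_per_class(pred, truth, num_classes):
    """Return {label: DSC} for labels 1..num_classes (label 0 is background).

    pred and truth are flat sequences of integer labels of equal length.
    DSC = 2 * |P ∩ T| / (|P| + |T|) for each label.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    scores = {}
    for label in range(1, num_classes + 1):
        pred_count = sum(1 for v in pred if v == label)
        truth_count = sum(1 for v in truth if v == label)
        overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
        if pred_count + truth_count == 0:
            # Label absent from both masks: skipped here (an assumed convention).
            continue
        scores[label] = 2.0 * overlap / (pred_count + truth_count)
    return scores


# Hypothetical toy label maps with two foreground classes.
pred = [0, 1, 1, 2, 2, 2, 0, 1]
truth = [0, 1, 1, 1, 2, 2, 0, 0]
scores = dice_per_class(pred, truth, num_classes=2)
mean_dsc = sum(scores.values()) / len(scores)
```

Averaging the per-class scores, as in `mean_dsc`, is one common way to summarize multi-class results; how the challenge aggregates across its 23 classes is not stated in the abstract.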
-
Prostate-Specific Foundation Models for Enhanced Detection of Clinically Significant Cancer
Authors:
Jeong Hoon Lee,
Cynthia Xinran Li,
Hassan Jahanandish,
Indrani Bhattacharya,
Sulaiman Vesal,
Lichun Zhang,
Shengtian Sang,
Moon Hyung Choi,
Simon John Christoph Soerensen,
Steve Ran Zhou,
Elijah Richard Sommer,
Richard Fan,
Pejman Ghanouni,
Yuze Song,
Tyler M. Seibert,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Accurate prostate cancer diagnosis remains challenging. Even when using MRI, radiologists exhibit low specificity and significant inter-observer variability, leading to potential delays or inaccuracies in identifying clinically significant cancers. This leads to numerous unnecessary biopsies and risks of missing clinically significant cancers. Here we present prostate vision contrastive network (ProViCNet), prostate organ-specific vision foundation models for Magnetic Resonance Imaging (MRI) and Trans-Rectal Ultrasound imaging (TRUS) for comprehensive cancer detection. ProViCNet was trained and validated using 4,401 patients across six institutions, as a prostate cancer detection model on radiology images relying on patch-level contrastive learning guided by biopsy-confirmed radiologist annotations. ProViCNet demonstrated consistent performance across multiple internal and external validation cohorts with area under the receiver operating characteristic curve values ranging from 0.875 to 0.966, significantly outperforming radiologists in the reader study (0.907 versus 0.805, p<0.001) for mpMRI, while achieving 0.670 to 0.740 for TRUS. We also integrated ProViCNet with standard PSA to develop a virtual screening test, and we showed that we can maintain the high sensitivity for detecting clinically significant cancers while more than doubling specificity from 15% to 38% (p<0.001), thereby substantially reducing unnecessary biopsies. These findings highlight ProViCNet's potential to enhance prostate cancer diagnosis accuracy and reduce unnecessary biopsies, thereby optimizing diagnostic pathways.
Submitted 4 February, 2025; v1 submitted 1 February, 2025;
originally announced February 2025.
-
Mask Enhanced Deeply Supervised Prostate Cancer Detection on B-mode Micro-Ultrasound
Authors:
Lichun Zhang,
Steve Ran Zhou,
Moon Hyung Choi,
Jeong Hoon Lee,
Shengtian Sang,
Adam Kinnaird,
Wayne G. Brisbane,
Giovanni Lughezzani,
Davide Maffei,
Vittorio Fasulo,
Patrick Albers,
Sulaiman Vesal,
Wei Shao,
Ahmed N. El Kaffas,
Richard E. Fan,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Prostate cancer is a leading cause of cancer-related deaths among men. The recent development of high frequency, micro-ultrasound imaging offers improved resolution compared to conventional ultrasound and potentially a better ability to differentiate clinically significant cancer from normal tissue. However, the features of prostate cancer remain subtle, with ambiguous borders with normal tissue and large variations in appearance, making it challenging for both machine learning and humans to localize it on micro-ultrasound images.
We propose a novel Mask Enhanced Deeply-supervised Micro-US network, termed MedMusNet, to automatically and more accurately segment prostate cancer to be used as potential targets for biopsy procedures. MedMusNet leverages predicted masks of prostate cancer to enforce the learned features layer-wise within the network, reducing the influence of noise and improving overall consistency across frames.
MedMusNet successfully detected 76% of clinically significant cancer with a Dice Similarity Coefficient of 0.365, significantly outperforming the baseline Swin-M2F in specificity and accuracy (Wilcoxon test, Bonferroni correction, p-value<0.05). While the lesion-level and patient-level analyses showed improved performance compared to human experts and different baselines, the improvements did not reach statistical significance, likely on account of the small cohort.
We have presented a novel approach to automatically detect and segment clinically significant prostate cancer on B-mode micro-ultrasound images. Our MedMusNet model outperformed other models, surpassing even human experts. These preliminary results suggest the potential for aiding urologists in prostate cancer diagnosis via biopsy and treatment decision-making.
Submitted 14 December, 2024;
originally announced December 2024.
-
Tumor aware recurrent inter-patient deformable image registration of computed tomography scans with lung cancer
Authors:
Jue Jiang,
Chloe Min Seo Choi,
Maria Thor,
Joseph O. Deasy,
Harini Veeraraghavan
Abstract:
Background: Voxel-based analysis (VBA) for population level radiotherapy (RT) outcomes modeling requires topology preserving inter-patient deformable image registration (DIR) that preserves tumors on moving images while avoiding unrealistic deformations due to tumors occurring on fixed images. Purpose: We developed a tumor-aware recurrent registration (TRACER) deep learning (DL) method and evaluated its suitability for VBA. Methods: TRACER consists of encoder layers implemented with stacked 3D convolutional long short term memory network (3D-CLSTM) followed by decoder and spatial transform layers to compute dense deformation vector field (DVF). Multiple CLSTM steps are used to compute a progressive sequence of deformations. Input conditioning was applied by including tumor segmentations with 3D image pairs as input channels. Bidirectional tumor rigidity, image similarity, and deformation smoothness losses were used to optimize the network in an unsupervised manner. TRACER and multiple DL methods were trained with 204 3D CT image pairs from patients with lung cancers (LC) and evaluated using (a) Dataset I (N = 308 pairs) with DL segmented LCs, (b) Dataset II (N = 765 pairs) with manually delineated LCs, and (c) Dataset III with 42 LC patients treated with RT. Results: TRACER accurately aligned normal tissues. It best preserved tumors, indicated by the smallest tumor volume difference of 0.24%, 0.40%, and 0.13% and mean square error in CT intensities of 0.005, 0.005, 0.004, computed between original and resampled moving image tumors, for Datasets I, II, and III, respectively. It resulted in the smallest planned RT tumor dose difference computed between original and resampled moving images of 0.01 Gy and 0.013 Gy when using a female and a male reference.
Submitted 18 September, 2024;
originally announced September 2024.
-
PEERNet: An End-to-End Profiling Tool for Real-Time Networked Robotic Systems
Authors:
Aditya Narayanan,
Pranav Kasibhatla,
Minkyu Choi,
Po-han Li,
Ruihan Zhao,
Sandeep Chinchali
Abstract:
Networked robotic systems balance compute, power, and latency constraints in applications such as self-driving vehicles, drone swarms, and teleoperated surgery. A core problem in this domain is deciding when to offload a computationally expensive task to the cloud, a remote server, at the cost of communication latency. Task offloading algorithms often rely on precise knowledge of system-specific performance metrics, such as sensor data rates, network bandwidth, and machine learning model latency. While these metrics can be modeled during system design, uncertainties in connection quality, server load, and hardware conditions introduce real-time performance variations, hindering overall performance. We introduce PEERNet, an end-to-end and real-time profiling tool for cloud robotics. PEERNet enables performance monitoring on heterogeneous hardware through targeted yet adaptive profiling of system components such as sensors, networks, deep-learning pipelines, and devices. We showcase PEERNet's capabilities through networked robotics tasks, such as image-based teleoperation of a Franka Emika Panda arm and querying vision language models using an Nvidia Jetson Orin. PEERNet reveals non-intuitive behavior in robotic systems, such as asymmetric network transmission and bimodal language model output. Our evaluation underscores the effectiveness and importance of benchmarking in networked robotics, demonstrating PEERNet's adaptability. Our code is open-source and available at github.com/UTAustin-SwarmLab/PEERNet.
Submitted 26 November, 2024; v1 submitted 9 September, 2024;
originally announced September 2024.
-
Online-Score-Aided Federated Learning: Taming the Resource Constraints in Wireless Networks
Authors:
Md-Ferdous Pervej,
Minseok Choi,
Andreas F. Molisch
Abstract:
While federated learning (FL) is a widely popular distributed machine learning (ML) strategy that protects data privacy, time-varying wireless network parameters and heterogeneous configurations of the wireless devices pose significant challenges. Although the limited radio and computational resources of the network and the clients, respectively, are widely acknowledged, two critical yet often ignored aspects are (a) wireless devices can only dedicate a small chunk of their limited storage for the FL task and (b) new training samples may arrive in an online manner in many practical wireless applications. Therefore, we propose a new FL algorithm called online-score-aided federated learning (OSAFL), specifically designed to learn tasks relevant to wireless applications under these practical considerations. Since clients' local training steps differ under resource constraints, which may lead to client drift under statistically heterogeneous data distributions, we leverage normalized gradient similarities and exploit weighting clients' updates based on optimized scores that facilitate the convergence rate of the proposed OSAFL algorithm without incurring any communication overheads to the clients or requiring any statistical data information from them. Our extensive simulation results on two different datasets with four popular ML models validate the effectiveness of OSAFL compared to five modified state-of-the-art FL baselines.
Submitted 13 April, 2025; v1 submitted 11 August, 2024;
originally announced August 2024.
-
Deep Learning-based Unsupervised Domain Adaptation via a Unified Model for Prostate Lesion Detection Using Multisite Bi-parametric MRI Datasets
Authors:
Hao Li,
Han Liu,
Heinrich von Busch,
Robert Grimm,
Henkjan Huisman,
Angela Tong,
David Winkel,
Tobias Penzkofer,
Ivan Shabunin,
Moon Hyung Choi,
Qingsong Yang,
Dieter Szolar,
Steven Shea,
Fergus Coakley,
Mukesh Harisinghani,
Ipek Oguz,
Dorin Comaniciu,
Ali Kamen,
Bin Lou
Abstract:
Our hypothesis is that UDA using diffusion-weighted images, generated with a unified model, offers a promising and reliable strategy for enhancing the performance of supervised learning models in multi-site prostate lesion detection, especially when various b-values are present. This retrospective study included data from 5,150 patients (14,191 samples) collected across nine different imaging centers. A novel UDA method using a unified generative model was developed for multi-site PCa detection. This method translates diffusion-weighted imaging (DWI) acquisitions, including apparent diffusion coefficient (ADC) and individual DW images acquired using various b-values, to align with the style of images acquired using b-values recommended by Prostate Imaging Reporting and Data System (PI-RADS) guidelines. The generated ADC and DW images replace the original images for PCa detection. An independent set of 1,692 test cases (2,393 samples) was used for evaluation. The area under the receiver operating characteristic curve (AUC) was used as the primary metric, and statistical analysis was performed via bootstrapping. For all test cases, the AUC values for baseline SL and UDA methods were 0.73 and 0.79 (p<.001), respectively, for PI-RADS>=3, and 0.77 and 0.80 (p<.001) for PI-RADS>=4 PCa lesions. In the 361 test cases under the most unfavorable image acquisition setting, the AUC values for baseline SL and UDA were 0.49 and 0.76 (p<.001) for PI-RADS>=3, and 0.50 and 0.77 (p<.001) for PI-RADS>=4 PCa lesions. The results indicate the proposed UDA with generated images improved the performance of SL methods in multi-site PCa lesion detection across datasets with various b values, especially for images acquired with significant deviations from the PI-RADS recommended DWI protocol (e.g. with an extremely high b-value).
Submitted 8 August, 2024;
originally announced August 2024.
-
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
Authors:
Jin Woo Lee,
Jaehyun Park,
Min Jun Choi,
Kyogu Lee
Abstract:
While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling within a neural network framework. Our model leverages physical properties and fundamental frequencies as inputs, outputting string states across time and space that solve the partial differential equation characterizing the nonlinear string. Empirical evaluations demonstrate that the proposed architecture achieves superior accuracy in string motion simulation compared to existing baseline architectures. The code and demo are available online.
Submitted 30 October, 2024; v1 submitted 7 July, 2024;
originally announced July 2024.
-
Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation
Authors:
Miseul Kim,
Soo-Whan Chung,
Youna Ji,
Hong-Goo Kang,
Min-Seok Choi
Abstract:
This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments. AST promises an immersive experience in speech perception by adapting the acoustic scene behind speech signals to desired environments. We propose AST-LDM for the AST task, which generates speech signals accompanied by the target acoustic scene of the reference prompt. Specifically, AST-LDM is a latent diffusion model conditioned by CLAP embeddings that describe target acoustic scenes in either audio or text modalities. The contributions of this paper include introducing the AST task and implementing its baseline model. For AST-LDM, we emphasize its core framework, which is to preserve the input speech and generate audio consistently with both the given speech and the target acoustic environment. Experiments, including objective and subjective tests, validate the feasibility and efficacy of our approach.
Submitted 18 June, 2024;
originally announced June 2024.
-
Compressed Meta-Optical Encoder for Image Classification
Authors:
Anna Wirth-Singh,
Jinlin Xiang,
Minho Choi,
Johannes E. Fröch,
Luocheng Huang,
Shane Colburn,
Eli Shlizerman,
Arka Majumdar
Abstract:
Optical and hybrid convolutional neural networks (CNNs) have recently attracted increasing interest for achieving low-latency, low-power image classification and computer vision tasks. However, implementing optical nonlinearity is challenging, and omitting the nonlinear layers in a standard CNN comes at the cost of a significant reduction in accuracy. In this work, we use knowledge distillation to compress a modified AlexNet to a single linear convolutional layer and an electronic backend (two fully connected layers). We obtain comparable performance to a purely electronic CNN with five convolutional layers and three fully connected layers. We implement the convolution optically via engineering the point spread function of an inverse-designed meta-optic. Using this hybrid approach, we estimate a reduction in multiply-accumulate operations from 17M in a conventional electronic modified AlexNet to only 86K in the hybrid compressed network enabled by the optical frontend. This constitutes over two orders of magnitude reduction in latency and power consumption. Furthermore, we experimentally demonstrate that the classification accuracy of the system exceeds 93% on the MNIST dataset.
Submitted 14 June, 2024; v1 submitted 22 April, 2024;
originally announced June 2024.
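The "over two orders of magnitude" figure in the abstract above can be checked with a few lines of arithmetic. The sketch below takes the rounded operation counts quoted in the abstract (17M and 86K) at face value; the variable names are illustrative. Note that it only compares multiply-accumulate counts: treating latency and power as scaling in proportion to that count is an assumption, not something this calculation establishes.

```python
import math

# Rounded multiply-accumulate (MAC) counts as quoted in the abstract.
macs_electronic = 17_000_000  # conventional electronic modified AlexNet ("17M")
macs_hybrid = 86_000          # hybrid compressed network ("86K")

# Reduction factor and its size in orders of magnitude (base-10 logarithm).
reduction_factor = macs_electronic / macs_hybrid
orders_of_magnitude = math.log10(reduction_factor)

print(f"reduction factor: {reduction_factor:.1f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

A reduction factor a little under 200 corresponds to slightly more than two orders of magnitude, which is consistent with the wording of the abstract.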
-
A Unified Multi-Phase CT Synthesis and Classification Framework for Kidney Cancer Diagnosis with Incomplete Data
Authors:
Kwang-Hyun Uhm,
Seung-Won Jung,
Moon Hyung Choi,
Sung-Hoo Hong,
Sung-Jea Ko
Abstract:
Multi-phase CT is widely adopted for the diagnosis of kidney cancer due to the complementary information among phases. However, the complete set of multi-phase CT is often not available in practical clinical applications. In recent years, there have been some studies to generate the missing modality image from the available data. Nevertheless, the generated images are not guaranteed to be effective for the diagnosis task. In this paper, we propose a unified framework for kidney cancer diagnosis with incomplete multi-phase CT, which simultaneously recovers missing CT images and classifies cancer subtypes using the completed set of images. The advantage of our framework is that it encourages a synthesis model to explicitly learn to generate missing CT phases that are helpful for classifying cancer subtypes. We further incorporate a lesion segmentation network into our framework to exploit lesion-level features for effective cancer classification in whole CT volumes. The proposed framework is based on fully 3D convolutional neural networks to jointly optimize both synthesis and classification of 3D CT volumes. Extensive experiments on both in-house and external datasets demonstrate the effectiveness of our framework for the diagnosis with incomplete data compared with state-of-the-art baselines. In particular, cancer subtype classification using the completed CT data by our method achieves higher performance than the classification using the given incomplete data.
Submitted 9 December, 2023;
originally announced December 2023.
-
ProsDectNet: Bridging the Gap in Prostate Cancer Detection via Transrectal B-mode Ultrasound Imaging
Authors:
Sulaiman Vesal,
Indrani Bhattacharya,
Hassan Jahanandish,
Xinran Li,
Zachary Kornberg,
Steve Ran Zhou,
Elijah Richard Sommer,
Moon Hyung Choi,
Richard E. Fan,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Interpreting traditional B-mode ultrasound images can be challenging due to image artifacts (e.g., shadowing, speckle), leading to low sensitivity and limited diagnostic accuracy. While Magnetic Resonance Imaging (MRI) has been proposed as a solution, it is expensive and not widely available. Furthermore, most biopsies are guided by Transrectal Ultrasound (TRUS) alone and can miss up to 52% of cancers, highlighting the need for improved targeting. To address this issue, we propose ProsDectNet, a multi-task deep learning approach that localizes prostate cancer on B-mode ultrasound. Our model is pre-trained using radiologist-labeled data and fine-tuned using biopsy-confirmed labels. ProsDectNet includes a lesion detection and patch classification head, with uncertainty minimization using entropy to improve model performance and reduce false positive predictions. We trained and validated ProsDectNet using a cohort of 289 patients who underwent MRI-TRUS fusion targeted biopsy. We then tested our approach on a group of 41 patients and found that ProsDectNet outperformed the average expert clinician in detecting prostate cancer on B-mode ultrasound images, achieving a patient-level ROC-AUC of 82%, a sensitivity of 74%, and a specificity of 67%. Our results demonstrate that ProsDectNet has the potential to be used as a computer-aided diagnosis system to improve targeted biopsy and treatment planning.
Submitted 8 December, 2023;
originally announced December 2023.
-
String Sound Synthesizer on GPU-accelerated Finite Difference Scheme
Authors:
Jin Woo Lee,
Min Jun Choi,
Kyogu Lee
Abstract:
This paper introduces a nonlinear string sound synthesizer, based on a finite difference simulation of the dynamic behavior of strings under various excitations. The presented synthesizer features a versatile string simulation engine capable of stochastic parameterization, encompassing fundamental frequency modulation, stiffness, tension, frequency-dependent loss, and excitation control. This open-source physical model simulator not only benefits the audio signal processing community but also contributes to the burgeoning field of neural network-based audio synthesis by serving as a novel dataset construction tool. Implemented in PyTorch, this synthesizer offers flexibility, facilitating both CPU and GPU utilization, thereby enhancing its applicability as a simulator. GPU utilization expedites computation by parallelizing operations across spatial and batch dimensions, further enhancing its utility as a data generator.
Submitted 8 January, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
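To make the idea of a finite difference string simulation concrete, here is a deliberately simplified sketch. It is not the synthesizer described in the abstract above: it models an ideal linear string with fixed ends using the standard explicit scheme for the 1D wave equation, in plain Python rather than PyTorch, and omits stiffness, frequency-dependent loss, nonlinearity, and excitation control. The function name `simulate_string`, its parameters, and the sine-shaped initial displacement are all assumptions chosen for illustration.

```python
import math


def simulate_string(num_points=50, num_steps=200, courant=1.0):
    """Explicit finite difference scheme for an ideal string with fixed ends.

    courant is c * dt / dx; the scheme is stable for courant <= 1.
    Returns the displacement at the middle grid point after each time step.
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("courant must be in (0, 1] for stability")

    # Initial displacement: half a sine wave along the string (assumed shape).
    u_prev = [math.sin(math.pi * i / (num_points - 1)) for i in range(num_points)]
    u_prev[0] = u_prev[-1] = 0.0  # fixed boundary conditions
    u_curr = list(u_prev)         # first-order approximation of zero initial velocity

    lam2 = courant ** 2
    pickup_index = num_points // 2
    output = []
    for _ in range(num_steps):
        u_next = [0.0] * num_points  # end points stay at zero
        for i in range(1, num_points - 1):
            # Second-order central differences in both time and space.
            u_next[i] = (2.0 * u_curr[i] - u_prev[i]
                         + lam2 * (u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]))
        u_prev, u_curr = u_curr, u_next
        output.append(u_curr[pickup_index])
    return output


samples = simulate_string()
```

The inner loop over grid points is independent for each `i` within a time step, which is the kind of structure that lends itself to the spatial and batch parallelization on a GPU mentioned in the abstract.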
-
An empirical study on speech restoration guided by self supervised speech representation
Authors:
Jaeuk Byun,
Youna Ji,
Soo Whan Chung,
Soyeon Choe,
Min Seok Choi
Abstract:
Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Speech restoration aims to recover speech components from these distortions. This paper focuses on exploring the impact of self-supervised speech representation learning on the speech restoration task. Specifically, we employ speech representation in various speech restoration networks and evaluate their performance under complicated distortion scenarios. Our experiments demonstrate that the contextual information provided by the self-supervised speech representation can enhance speech restoration performance in various distortion scenarios, while also increasing robustness against the duration of speech attenuation and mismatched test conditions.
Submitted 30 May, 2023;
originally announced May 2023.
-
Cross-domain Denoising for Low-dose Multi-frame Spiral Computed Tomography
Authors:
Yucheng Lu,
Zhixin Xu,
Moon Hyung Choi,
Jimin Kim,
Seung-Won Jung
Abstract:
Computed tomography (CT) has been used worldwide as a non-invasive test to assist in diagnosis. However, the ionizing nature of X-ray exposure raises concerns about potential health risks such as cancer. The desire for lower radiation doses has driven researchers to improve reconstruction quality. Although previous studies on low-dose computed tomography (LDCT) denoising have demonstrated the effectiveness of learning-based methods, most were developed on simulated data. However, the real-world scenario differs significantly from the simulation domain, especially when using the multi-slice spiral scanner geometry. This paper proposes a two-stage method for the commercially available multi-slice spiral CT scanners that better exploits the complete reconstruction pipeline for LDCT denoising across different domains. Our approach makes good use of the high redundancy of multi-slice projections and the volumetric reconstructions while alleviating the over-smoothing problem in conventional cascaded frameworks caused by aggressive denoising. The dedicated design also provides a more explicit interpretation of the data flow. Extensive experiments on various datasets showed that the proposed method could remove up to 70% of noise without compromising spatial resolution, and subjective evaluations by two experienced radiologists further supported its superior performance against state-of-the-art methods in clinical practice.
Submitted 28 June, 2024; v1 submitted 21 April, 2023;
originally announced April 2023.
-
DECISIVE Benchmarking Data Report: sUAS Performance Results from Phase I
Authors:
Adam Norton,
Reza Ahmadzadeh,
Kshitij Jerath,
Paul Robinette,
Jay Weitzen,
Thanuka Wickramarathne,
Holly Yanco,
Minseop Choi,
Ryan Donald,
Brendan Donoghue,
Christian Dumas,
Peter Gavriel,
Alden Giedraitis,
Brendan Hertel,
Jack Houle,
Nathan Letteri,
Edwin Meriaux,
Zahra Rezaei Khavas,
Rakshith Singh,
Gregg Willcox,
Naye Yoni
Abstract:
This report reviews all results derived from performance benchmarking conducted during Phase I of the Development and Execution of Comprehensive and Integrated Subterranean Intelligent Vehicle Evaluations (DECISIVE) project by the University of Massachusetts Lowell, using the test methods specified in the DECISIVE Test Methods Handbook v1.1 for evaluating small unmanned aerial systems (sUAS) performance in subterranean and constrained indoor environments, spanning communications, field readiness, interface, obstacle avoidance, navigation, mapping, autonomy, trust, and situation awareness. Using those 20 test methods, over 230 tests were conducted across 8 sUAS platforms: Cleo Robotics Dronut X1P (P = prototype), FLIR Black Hornet PRS, Flyability Elios 2 GOV, Lumenier Nighthawk V3, Parrot ANAFI USA GOV, Skydio X2D, Teal Golden Eagle, and Vantage Robotics Vesper. Best-in-class criteria are specified for each applicable test method, and the sUAS that meet these criteria are named for each test method, along with a high-level executive summary of their performance.
Submitted 20 January, 2023; v1 submitted 18 January, 2023;
originally announced January 2023.
-
DECISIVE Test Methods Handbook: Test Methods for Evaluating sUAS in Subterranean and Constrained Indoor Environments, Version 1.1
Authors:
Adam Norton,
Reza Ahmadzadeh,
Kshitij Jerath,
Paul Robinette,
Jay Weitzen,
Thanuka Wickramarathne,
Holly Yanco,
Minseop Choi,
Ryan Donald,
Brendan Donoghue,
Christian Dumas,
Peter Gavriel,
Alden Giedraitis,
Brendan Hertel,
Jack Houle,
Nathan Letteri,
Edwin Meriaux,
Zahra Rezaei Khavas,
Rakshith Singh,
Gregg Willcox,
Naye Yoni
Abstract:
This handbook outlines all test methods developed under the Development and Execution of Comprehensive and Integrated Subterranean Intelligent Vehicle Evaluations (DECISIVE) project by the University of Massachusetts Lowell for evaluating small unmanned aerial systems (sUAS) performance in subterranean and constrained indoor environments, spanning communications, field readiness, interface, obstacle avoidance, navigation, mapping, autonomy, trust, and situation awareness. For sUAS deployment in subterranean and constrained indoor environments, the handbook puts forth two assumptions about the sUAS to be evaluated using these test methods: (1) the sUAS is able to operate without access to a GPS signal, and (2) its width from prop tip to prop tip does not exceed 91 cm (36 in) (i.e., it can physically fit through a typical doorway, although successful navigation through is not guaranteed). All test methods are specified using a common format: Purpose, Summary of Test Method, Apparatus and Artifacts, Equipment, Metrics, Procedure, and Example Data. All test methods are designed to be run in real-world environments (e.g., MOUT sites) or using fabricated apparatuses (e.g., test bays built from wood, or contained inside of one or more shipping containers).
Submitted 20 January, 2023; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Diffusion-based Generative Speech Source Separation
Authors:
Robin Scheibler,
Youna Ji,
Soo-Whan Chung,
Jaeuk Byun,
Soyeon Choe,
Min-Seok Choi
Abstract:
We propose DiffSep, a new single-channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous-time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a neural network to approximate the score function of the marginal probabilities of the diffusion-mixing process. Then, we use it to solve the reverse-time SDE that progressively separates the sources starting from their mixture. We propose a modified training strategy to handle model mismatch and source permutation ambiguity. Experiments on the WSJ0 2mix dataset demonstrate the potential of the method. Furthermore, the method is also suitable for speech enhancement and shows performance competitive with prior work on the VoiceBank-DEMAND dataset.
Submitted 2 November, 2022; v1 submitted 31 October, 2022;
originally announced October 2022.
-
SWIPT-enabled NOMA in Distributed Antenna System with Imperfect Channel State Information for Max-Sum-Rate and Max-Min Fairness
Authors:
Dongjae Kim,
Minseok Choi,
Dong-Wook Seo
Abstract:
Motivated by the fact that the data rate of non-orthogonal multiple access (NOMA) can be greatly increased with the help of the distributed antenna system (DAS), we present a framework in which the DAS contributes not only to the data rate but also to the energy harvesting of simultaneous wireless information and power transfer (SWIPT) enabled NOMA. This study considers the sum-rate maximization problem and the max-min fairness problem for SWIPT-enabled NOMA in DAS and proposes two different schemes of power splitting and power allocation for SWIPT and NOMA, respectively, with imperfect channel state information (CSI). Numerical results validate the theoretical findings and demonstrate that the proposed framework of using SWIPT-enabled NOMA in DAS achieves higher data rates than the existing SWIPT-enabled NOMA while guaranteeing the minimum harvested energy.
Submitted 18 March, 2022;
originally announced March 2022.
-
Quality-Aware Deep Reinforcement Learning for Streaming in Infrastructure-Assisted Connected Vehicles
Authors:
Won Joon Yun,
Dohyun Kwon,
Minseok Choi,
Joongheon Kim,
Giuseppe Caire,
Andreas F. Molisch
Abstract:
This paper proposes a deep reinforcement learning-based video streaming scheme for mobility-aware vehicular networks, e.g., vehicles on the highway. We consider infrastructure-assisted and mmWave-based scenarios in which the macro base station (MBS) cannot directly provide the streaming service to vehicles due to the short range of mmWave beams, so small mmWave base stations (mBSs) along the road deliver the desired videos to users. For a smoother streaming service, the MBS proactively pushes video chunks to mBSs. This is done to support vehicles that are currently covered, or will soon be covered, by each mBS. We formulate the dynamic video delivery scheme, which adaptively determines 1) which content, 2) what quality, and 3) how many chunks are to be proactively delivered from the MBS to mBSs, as a Markov decision process (MDP). Since it is difficult for the MBS to track all the channel conditions and the network states have extensive dimensions, we adopt the deep deterministic policy gradient (DDPG) algorithm for the DRL-based video delivery scheme. This paper finally shows that the DRL agent learns a streaming policy that pursues high average quality while limiting packet drops, avoiding playback stalls, reducing quality fluctuations, and saving backhaul usage.
Submitted 12 October, 2021;
originally announced November 2021.
-
Joint Mobile Charging and Coverage-Time Extension for Unmanned Aerial Vehicles
Authors:
Soohyun Park,
Won-Yong Shin,
Minseok Choi,
Joongheon Kim
Abstract:
In modern networks, the use of drones as mobile base stations (MBSs) has been discussed for coverage flexibility. However, the realization of drone-based networks raises several issues. One critical issue is that drones are extremely power-hungry. To overcome this, we need to characterize a new type of drone, the so-called charging drone, which can deliver energy to MBS drones. Motivated by the fact that the charging drones also need to be charged, we deploy ground-mounted charging towers for delivering energy to the charging drones. We introduce a new energy-efficiency maximization problem, which is partitioned into two independently separable tasks. More specifically, as our first optimization task, two-stage charging matching is proposed due to the inherent nature of our network model, where the first matching schedules between charging towers and charging drones while the second matching schedules between charging drones and MBS drones. We analyze how to convert the formulation containing non-convex terms into one with only convex terms. As our second optimization task, each MBS drone conducts energy-aware minimization of its time-average transmit power allocation subject to stability via Lyapunov optimization. Our solutions enable the MBS drones to extend their lifetimes; in turn, network coverage-time can be extended.
Submitted 27 June, 2021;
originally announced June 2021.
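The Lyapunov-optimization step mentioned in the abstract above can be illustrated with a generic drift-plus-penalty rule. This is only a minimal sketch under stated assumptions, not the authors' formulation: the function names, the discrete `power_levels` set, the trade-off weight `V`, and the Shannon-style rate with unit bandwidth and unit noise power are all hypothetical choices made for illustration.

```python
import math


def drift_plus_penalty_power(queue, channel_gain, power_levels, V):
    """Pick the power level that minimizes V * p - queue * rate(p) in one slot.

    Assumption: rate(p) = log2(1 + channel_gain * p), i.e. unit bandwidth
    and unit noise power. V weights power saving against queue backlog.
    """
    def rate(p):
        return math.log2(1.0 + channel_gain * p)

    return min(power_levels, key=lambda p: V * p - queue * rate(p))


def update_queue(queue, arrival, service):
    """Standard queue recursion: serve first, then admit new arrivals."""
    return max(queue - service, 0.0) + arrival
```

With an empty queue the rule picks the lowest power level, and as the backlog grows it shifts toward higher power; a larger `V` favors power saving at the cost of a longer queue.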
-
Isogeometric Configuration Design Optimization of Three-dimensional Curved Beam Structures for Maximal Fundamental Frequency
Authors:
Myung-Jin Choi,
Jae-Hyun Kim,
Bonyong Koo,
Seonho Cho
Abstract:
This paper presents a configuration design optimization method for three-dimensional curved beam built-up structures having maximized fundamental eigenfrequency. We develop a method for computing the design velocity field and for the optimal design of beam structures constrained on a curved surface, where the designs of both the embedded beams and the curved surface are varied simultaneously during the optimal design process. A shear-deformable beam model is used in the response analyses of structural vibrations within an isogeometric framework using the NURBS basis functions. An analytical design sensitivity expression of repeated eigenvalues is derived. The developed method is demonstrated through several illustrative examples.
Submitted 23 January, 2021;
originally announced January 2021.
-
A Computational Analysis of Real-World DJ Mixes using Mix-To-Track Subsequence Alignment
Authors:
Taejun Kim,
Minsuk Choi,
Evan Sacks,
Yi-Hsuan Yang,
Juhan Nam
Abstract:
A DJ mix is a sequence of music tracks concatenated seamlessly, typically rendered for audiences in a live setting by a DJ on stage. As a DJ mix is produced in a studio or the live version is recorded for music streaming services, computational methods to analyze DJ mixes, for example, extracting track information or understanding DJ techniques, have drawn research interest. Many previous works are, however, limited to identifying individual tracks in a mix or segmenting it, and the sizes of the datasets are usually small. In this paper, we provide an in-depth analysis of DJ music by aligning a mix to its original music tracks. We set up the subsequence alignment such that the audio features are less sensitive to the tempo or key change of the original track in a mix. This approach provides temporally tight mix-to-track matching from which we can obtain cue-points, transition length, mix segmentation, and musical changes in DJ performance. Using 1,557 mixes from 1001Tracklists including 13,728 tracks and 20,765 transitions, we conduct the proposed analysis and show a wide range of statistics, which may elucidate the creative process of DJ music making.
Submitted 24 August, 2020;
originally announced August 2020.
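The mix-to-track subsequence alignment described in the abstract above can be sketched with textbook subsequence dynamic time warping (DTW). This is an illustrative sketch only, not the authors' pipeline: the function name `subsequence_dtw`, the plain Euclidean frame cost, and the generic feature arrays are assumptions, and the tempo- and key-robust features mentioned in the abstract are not modeled here.

```python
import numpy as np


def subsequence_dtw(track, mix):
    """Find the region of `mix` (M x d) that best matches `track` (N x d).

    Returns (start, end, cost): inclusive frame indices into `mix` and the
    accumulated alignment cost. Assumes a Euclidean frame-to-frame cost.
    """
    track = np.asarray(track, dtype=float)
    mix = np.asarray(mix, dtype=float)
    n, m = len(track), len(mix)
    # Pairwise cost between every track frame and every mix frame.
    cost = np.linalg.norm(track[:, None, :] - mix[None, :, :], axis=2)
    acc = np.full((n, m), np.inf)
    acc[0, :] = cost[0, :]  # the match may start at any mix frame for free
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, m):
            acc[i, j] = cost[i, j] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    end = int(np.argmin(acc[-1]))  # the match may end at any mix frame
    # Backtrack to the first track frame to recover the start index.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return j, end, float(acc[-1, end])
```

The returned start and end frames play the role of cue-point estimates for where a track enters and leaves the mix; the quadratic loops are kept for readability and would need a vectorized or library implementation for full-length recordings.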
-
Joint Distributed Link Scheduling and Power Allocation for Content Delivery in Wireless Caching Networks
Authors:
Minseok Choi,
Andreas F. Molisch,
Joongheon Kim
Abstract:
In wireless caching networks, the design of the content delivery method must consider random user requests, caching states, network topology, and interference management. In this paper, we establish a general framework for content delivery in wireless caching networks without stringent assumptions that restrict the network structure, delivery link, and interference model. Based on the framework, we propose a dynamic and distributed link scheduling and power allocation scheme for content delivery that is assisted by belief-propagation (BP) algorithms. Considering content-requesting users and potential caching nodes, the scheme achieves three critical purposes of wireless caching networks: 1) limiting the delay in satisfying user requests, 2) maintaining the power efficiency of caching nodes, and 3) managing interference among users. In addition, we address the intrinsic problem of the BP algorithm in our network model, proposing a matching algorithm for one-to-one link scheduling. Simulation results show that the proposed scheme provides almost the same delay performance as the optimal scheme found through an exhaustive search, at the expense of a little additional power consumption, and does not require a clustering method or orthogonal resources in a large-scale D2D network.
Submitted 29 November, 2019;
originally announced November 2019.
-
Spectral data analysis methods for the two-dimensional imaging diagnostics
Authors:
Minjun J. Choi
Abstract:
Some spectral data analysis methods that are useful for the two-dimensional imaging diagnostics data are introduced. It is shown that the frequency spectrum, the local dispersion relation, the flow shear, and the nonlinear energy transfer rates can be estimated using the proper analysis methods.
Submitted 27 August, 2019; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Dynamic Power Allocation and User Scheduling for Power-Efficient and Low-Latency Communications
Authors:
Minseok Choi,
Joongheon Kim,
Jaekyun Moon
Abstract:
In this paper, we propose a joint dynamic power control and user pairing algorithm for power-efficient and low-latency hybrid multiple access systems. In a hybrid multiple access system, user pairing determines whether the transmitter should serve a certain user by orthogonal multiple access (OMA) or non-orthogonal multiple access (NOMA). The proposed optimization framework minimizes the long-term time-average transmit power expenditure while reducing the queueing delay and satisfying time-average data rate requirements. The proposed technique observes channel and queue state information and adjusts queue backlogs to avoid an excessive queueing delay by appropriate user pairing and power allocation. Further, user scheduling for determining the activation of a given user link as well as flexible use of resources are captured in the proposed algorithm. Data-intensive simulation results show that the proposed scheme guarantees an end-to-end delay smaller than 1 ms with high power-efficiency and high reliability, based on the short frame structure designed for ultra-reliable low-latency communications (URLLC).
Submitted 28 June, 2018;
originally announced July 2018.