-
Multi-Platform Methane Plume Detection via Model and Domain Adaptation
Authors:
Vassiliki Mancoridis,
Brian Bue,
Jake H. Lee,
Andrew K. Thorpe,
Daniel Cusworth,
Alana Ayasse,
Philip G. Brodrick,
Riley Duren
Abstract:
Prioritizing methane for near-term climate action is crucial due to its significant impact on global warming. Previous work used columnwise matched filter products from the airborne AVIRIS-NG imaging spectrometer to detect methane plume sources; convolutional neural networks (CNNs) discerned anthropogenic methane plumes from false positive enhancements. However, as an increasing number of remote s…
▽ More
Prioritizing methane for near-term climate action is crucial due to its significant impact on global warming. Previous work used columnwise matched filter products from the airborne AVIRIS-NG imaging spectrometer to detect methane plume sources; convolutional neural networks (CNNs) discerned anthropogenic methane plumes from false positive enhancements. However, as an increasing number of remote sensing platforms are used for methane plume detection, there is a growing need to address cross-platform alignment. In this work, we describe model- and data-driven machine learning approaches that leverage airborne observations to improve spaceborne methane plume detection, reconciling the distributional shifts inherent with performing the same task across platforms. We develop a spaceborne methane plume classifier using data from the EMIT imaging spectroscopy mission. We refine classifiers trained on airborne imagery from AVIRIS-NG campaigns using transfer learning, outperforming the standalone spaceborne model. Finally, we use CycleGAN, an unsupervised image-to-image translation technique, to align the data distributions between airborne and spaceborne contexts. Translating spaceborne EMIT data to the airborne AVIRIS-NG domain using CycleGAN and applying airborne classifiers directly yields the best plume detection results. This methodology is useful not only for data simulation, but also for direct data alignment. Though demonstrated on the task of methane plume detection, our work more broadly demonstrates a data-driven approach to align related products obtained from distinct remote sensing instruments.
△ Less
Submitted 1 June, 2025;
originally announced June 2025.
-
Decoding Human Attentive States from Spatial-temporal EEG Patches Using Transformers
Authors:
Yi Ding,
Joon Hei Lee,
Shuailei Zhang,
Tianze Luo,
Cuntai Guan
Abstract:
Learning the spatial topology of electroencephalogram (EEG) channels and their temporal dynamics is crucial for decoding attention states. This paper introduces EEG-PatchFormer, a transformer-based deep learning framework designed specifically for EEG attention classification in Brain-Computer Interface (BCI) applications. By integrating a Temporal CNN for frequency-based EEG feature extraction, a…
▽ More
Learning the spatial topology of electroencephalogram (EEG) channels and their temporal dynamics is crucial for decoding attention states. This paper introduces EEG-PatchFormer, a transformer-based deep learning framework designed specifically for EEG attention classification in Brain-Computer Interface (BCI) applications. By integrating a Temporal CNN for frequency-based EEG feature extraction, a pointwise CNN for feature enhancement, and Spatial and Temporal Patching modules for organizing features into spatial-temporal patches, EEG-PatchFormer jointly learns spatial-temporal information from EEG data. Leveraging the global learning capabilities of the self-attention mechanism, it captures essential features across brain regions over time, thereby enhancing EEG data decoding performance. Demonstrating superior performance, EEG-PatchFormer surpasses existing benchmarks in accuracy, area under the ROC curve (AUC), and macro-F1 score on a public cognitive attention dataset. The code can be found via: https://github.com/yi-ding-cs/EEG-PatchFormer .
△ Less
Submitted 18 May, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Registration-Enhanced Segmentation Method for Prostate Cancer in Ultrasound Images
Authors:
Shengtian Sang,
Hassan Jahanandish,
Cynthia Xinran Li,
Indrani Bhattachary,
Jeong Hoon Lee,
Lichun Zhang,
Sulaiman Vesal,
Pejman Ghanouni,
Richard Fan,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Prostate cancer is a major cause of cancer-related deaths in men, where early detection greatly improves survival rates. Although MRI-TRUS fusion biopsy offers superior accuracy by combining MRI's detailed visualization with TRUS's real-time guidance, it is a complex and time-intensive procedure that relies heavily on manual annotations, leading to potential errors. To address these challenges, we…
▽ More
Prostate cancer is a major cause of cancer-related deaths in men, where early detection greatly improves survival rates. Although MRI-TRUS fusion biopsy offers superior accuracy by combining MRI's detailed visualization with TRUS's real-time guidance, it is a complex and time-intensive procedure that relies heavily on manual annotations, leading to potential errors. To address these challenges, we propose a fully automatic MRI-TRUS fusion-based segmentation method that identifies prostate tumors directly in TRUS images without requiring manual annotations. Unlike traditional multimodal fusion approaches that rely on naive data concatenation, our method integrates a registration-segmentation framework to align and leverage spatial information between MRI and TRUS modalities. This alignment enhances segmentation accuracy and reduces reliance on manual effort. Our approach was validated on a dataset of 1,747 patients from Stanford Hospital, achieving an average Dice coefficient of 0.212, outperforming TRUS-only (0.117) and naive MRI-TRUS fusion (0.132) methods, with significant improvements (p $<$ 0.01). This framework demonstrates the potential for reducing the complexity of prostate cancer diagnosis and provides a flexible architecture applicable to other multimodal medical imaging tasks.
△ Less
Submitted 2 February, 2025;
originally announced February 2025.
-
Prostate-Specific Foundation Models for Enhanced Detection of Clinically Significant Cancer
Authors:
Jeong Hoon Lee,
Cynthia Xinran Li,
Hassan Jahanandish,
Indrani Bhattacharya,
Sulaiman Vesal,
Lichun Zhang,
Shengtian Sang,
Moon Hyung Choi,
Simon John Christoph Soerensen,
Steve Ran Zhou,
Elijah Richard Sommer,
Richard Fan,
Pejman Ghanouni,
Yuze Song,
Tyler M. Seibert,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Accurate prostate cancer diagnosis remains challenging. Even when using MRI, radiologists exhibit low specificity and significant inter-observer variability, leading to potential delays or inaccuracies in identifying clinically significant cancers. This leads to numerous unnecessary biopsies and risks of missing clinically significant cancers. Here we present prostate vision contrastive network (P…
▽ More
Accurate prostate cancer diagnosis remains challenging. Even when using MRI, radiologists exhibit low specificity and significant inter-observer variability, leading to potential delays or inaccuracies in identifying clinically significant cancers. This leads to numerous unnecessary biopsies and risks of missing clinically significant cancers. Here we present prostate vision contrastive network (ProViCNet), prostate organ-specific vision foundation models for Magnetic Resonance Imaging (MRI) and Trans-Rectal Ultrasound imaging (TRUS) for comprehensive cancer detection. ProViCNet was trained and validated using 4,401 patients across six institutions, as a prostate cancer detection model on radiology images relying on patch-level contrastive learning guided by biopsy confirmed radiologist annotations. ProViCNet demonstrated consistent performance across multiple internal and external validation cohorts with area under the receiver operating curve values ranging from 0.875 to 0.966, significantly outperforming radiologists in the reader study (0.907 versus 0.805, p<0.001) for mpMRI, while achieving 0.670 to 0.740 for TRUS. We also integrated ProViCNet with standard PSA to develop a virtual screening test, and we showed that we can maintain the high sensitivity for detecting clinically significant cancers while more than doubling specificity from 15% to 38% (p<0.001), thereby substantially reducing unnecessary biopsies. These findings highlight that ProViCNet's potential for enhancing prostate cancer diagnosis accuracy and reduce unnecessary biopsies, thereby optimizing diagnostic pathways.
△ Less
Submitted 4 February, 2025; v1 submitted 1 February, 2025;
originally announced February 2025.
-
Multimodal MRI-Ultrasound AI for Prostate Cancer Detection Outperforms Radiologist MRI Interpretation: A Multi-Center Study
Authors:
Hassan Jahanandish,
Shengtian Sang,
Cynthia Xinran Li,
Sulaiman Vesal,
Indrani Bhattacharya,
Jeong Hoon Lee,
Richard Fan,
Geoffrey A. Sonna,
Mirabela Rusu
Abstract:
Pre-biopsy magnetic resonance imaging (MRI) is increasingly used to target suspicious prostate lesions. This has led to artificial intelligence (AI) applications improving MRI-based detection of clinically significant prostate cancer (CsPCa). However, MRI-detected lesions must still be mapped to transrectal ultrasound (TRUS) images during biopsy, which results in missing CsPCa. This study systemat…
▽ More
Pre-biopsy magnetic resonance imaging (MRI) is increasingly used to target suspicious prostate lesions. This has led to artificial intelligence (AI) applications improving MRI-based detection of clinically significant prostate cancer (CsPCa). However, MRI-detected lesions must still be mapped to transrectal ultrasound (TRUS) images during biopsy, which results in missing CsPCa. This study systematically evaluates a multimodal AI framework integrating MRI and TRUS image sequences to enhance CsPCa identification. The study included 3110 patients from three cohorts across two institutions who underwent prostate biopsy. The proposed framework, based on the 3D UNet architecture, was evaluated on 1700 test cases, comparing performance to unimodal AI models that use either MRI or TRUS alone. Additionally, the proposed model was compared to radiologists in a cohort of 110 patients. The multimodal AI approach achieved superior sensitivity (80%) and Lesion Dice (42%) compared to unimodal MRI (73%, 30%) and TRUS models (49%, 27%). Compared to radiologists, the multimodal model showed higher specificity (88% vs. 78%) and Lesion Dice (38% vs. 33%), with equivalent sensitivity (79%). Our findings demonstrate the potential of multimodal AI to improve CsPCa lesion targeting during biopsy and treatment planning, surpassing current unimodal models and radiologists; ultimately improving outcomes for prostate cancer patients.
△ Less
Submitted 31 January, 2025;
originally announced February 2025.
-
Processing and Analyzing Real-World Driving Data: Insights on Trips, Scenarios, and Human Driving Behaviors
Authors:
Jihun Han,
Dominik Karbowski,
Ayman Moawad,
Namdoo Kim,
Aymeric Rousseau,
Shihong Fan,
Jason Hoon Lee,
Jinho Ha
Abstract:
Analyzing large volumes of real-world driving data is essential for providing meaningful and reliable insights into real-world trips, scenarios, and human driving behaviors. To this end, we developed a multi-level data processing approach that adds new information, segments data, and extracts desired parameters. Leveraging a confidential but extensive dataset (over 1 million km), this approach lea…
▽ More
Analyzing large volumes of real-world driving data is essential for providing meaningful and reliable insights into real-world trips, scenarios, and human driving behaviors. To this end, we developed a multi-level data processing approach that adds new information, segments data, and extracts desired parameters. Leveraging a confidential but extensive dataset (over 1 million km), this approach leads to three levels of in-depth analysis: trip, scenario, and driving. The trip-level analysis explains representative properties observed in real-world trips, while the scenario-level analysis focuses on scenario conditions resulting from road events that reduce vehicle speed. The driving-level analysis identifies the cause of driving regimes for specific situations and characterizes typical human driving behaviors. Such analyses can support the design of both trip- and scenario-based tests, the modeling of human drivers, and the establishment of guidelines for connected and automated vehicles.
△ Less
Submitted 15 January, 2025;
originally announced January 2025.
-
Mask Enhanced Deeply Supervised Prostate Cancer Detection on B-mode Micro-Ultrasound
Authors:
Lichun Zhang,
Steve Ran Zhou,
Moon Hyung Choi,
Jeong Hoon Lee,
Shengtian Sang,
Adam Kinnaird,
Wayne G. Brisbane,
Giovanni Lughezzani,
Davide Maffei,
Vittorio Fasulo,
Patrick Albers,
Sulaiman Vesal,
Wei Shao,
Ahmed N. El Kaffas,
Richard E. Fan,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Prostate cancer is a leading cause of cancer-related deaths among men. The recent development of high frequency, micro-ultrasound imaging offers improved resolution compared to conventional ultrasound and potentially a better ability to differentiate clinically significant cancer from normal tissue. However, the features of prostate cancer remain subtle, with ambiguous borders with normal tissue a…
▽ More
Prostate cancer is a leading cause of cancer-related deaths among men. The recent development of high frequency, micro-ultrasound imaging offers improved resolution compared to conventional ultrasound and potentially a better ability to differentiate clinically significant cancer from normal tissue. However, the features of prostate cancer remain subtle, with ambiguous borders with normal tissue and large variations in appearance, making it challenging for both machine learning and humans to localize it on micro-ultrasound images.
We propose a novel Mask Enhanced Deeply-supervised Micro-US network, termed MedMusNet, to automatically and more accurately segment prostate cancer to be used as potential targets for biopsy procedures. MedMusNet leverages predicted masks of prostate cancer to enforce the learned features layer-wisely within the network, reducing the influence of noise and improving overall consistency across frames.
MedMusNet successfully detected 76% of clinically significant cancer with a Dice Similarity Coefficient of 0.365, significantly outperforming the baseline Swin-M2F in specificity and accuracy (Wilcoxon test, Bonferroni correction, p-value<0.05). While the lesion-level and patient-level analyses showed improved performance compared to human experts and different baseline, the improvements did not reach statistical significance, likely on account of the small cohort.
We have presented a novel approach to automatically detect and segment clinically significant prostate cancer on B-mode micro-ultrasound images. Our MedMusNet model outperformed other models, surpassing even human experts. These preliminary results suggest the potential for aiding urologists in prostate cancer diagnosis via biopsy and treatment decision-making.
△ Less
Submitted 14 December, 2024;
originally announced December 2024.
-
Multi-technology co-optimization approach for sustainable hydrogen and electricity supply chains considering variability and demand scale
Authors:
Sunwoo Kim,
Joungho Park,
Jay H. Lee
Abstract:
In the pursuit of a carbon-neutral future, hydrogen emerges as a pivotal element, serving as a carbon-free energy carrier and feedstock. As efforts to decarbonize sectors such as heating and transportation intensify, understanding and navigating through the dynamics of hydrogen demand expansion becomes critical. Transitioning to hydrogen economy is complicated by varying regional scales and types…
▽ More
In the pursuit of a carbon-neutral future, hydrogen emerges as a pivotal element, serving as a carbon-free energy carrier and feedstock. As efforts to decarbonize sectors such as heating and transportation intensify, understanding and navigating through the dynamics of hydrogen demand expansion becomes critical. Transitioning to hydrogen economy is complicated by varying regional scales and types of hydrogen demand, with forecasts indicating a rise in variable demand that calls for diverse production technologies. Currently, steam methane reforming is prevalent, but its significant carbon emissions make a shift to cleaner alternatives like blue and green hydrogen imperative. Each production method possesses distinct characteristics, necessitating a thorough exploration and co-optimization with electricity supply chains as well as carbon capture, utilization, and storage systems. Our study fills existing research gaps by introducing a superstructure optimization framework that accommodates various demand scenarios and technologies. Through case studies in California, we underscore the critical role of demand profiles in shaping the optimal configurations and economics of supply chains and emphasize the need for diversified portfolios and co-optimization to facilitate sustainable energy transitions.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Integrating solid direct air capture systems with green hydrogen production: Economic synergy of sector coupling
Authors:
Sunwoo Kim,
Joungho Park,
Jay H. Lee
Abstract:
In the global pursuit of sustainable energy solutions, mitigating carbon dioxide (CO2) emissions stands as a pivotal challenge. With escalating atmospheric CO2 levels, the imperative of direct air capture (DAC) systems becomes evident. Simultaneously, green hydrogen (GH) emerges as a pivotal medium for renewable energy. Nevertheless, the substantial expenses associated with these technologies impe…
▽ More
In the global pursuit of sustainable energy solutions, mitigating carbon dioxide (CO2) emissions stands as a pivotal challenge. With escalating atmospheric CO2 levels, the imperative of direct air capture (DAC) systems becomes evident. Simultaneously, green hydrogen (GH) emerges as a pivotal medium for renewable energy. Nevertheless, the substantial expenses associated with these technologies impede widespread adoption, primarily due to significant installation costs and underutilized operational advantages when deployed independently. Integration through sector coupling enhances system efficiency and sustainability, while shared power sources and energy storage devices offer additional economic benefits. In this study, we assess the economic viability of polymer electrolyte membrane electrolyzers versus alkaline electrolyzers within the context of sector coupling. Our findings indicate that combining GH production with solid DAC systems yields significant economic advantages, with approximately a 10% improvement for PEM electrolyzers and a 20% enhancement for alkaline electrolyzers. These results highlight a substantial opportunity to improve the efficiency and economic viability of renewable energy and green hydrogen initiatives, thereby facilitating the broader adoption of cleaner technologies.
△ Less
Submitted 19 October, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
DeRO: Dead Reckoning Based on Radar Odometry With Accelerometers Aided for Robot Localization
Authors:
Hoang Viet Do,
Yong Hun Kim,
Joo Han Lee,
Min Ho Lee,
Jin Woo Song
Abstract:
In this paper, we propose a radar odometry structure that directly utilizes radar velocity measurements for dead reckoning while maintaining its ability to update estimations within the Kalman filter framework. Specifically, we employ the Doppler velocity obtained by a 4D Frequency Modulated Continuous Wave (FMCW) radar in conjunction with gyroscope data to calculate poses. This approach helps mit…
▽ More
In this paper, we propose a radar odometry structure that directly utilizes radar velocity measurements for dead reckoning while maintaining its ability to update estimations within the Kalman filter framework. Specifically, we employ the Doppler velocity obtained by a 4D Frequency Modulated Continuous Wave (FMCW) radar in conjunction with gyroscope data to calculate poses. This approach helps mitigate high drift resulting from accelerometer biases and double integration. Instead, tilt angles measured by gravitational force are utilized alongside relative distance measurements from radar scan matching for the filter's measurement update. Additionally, to further enhance the system's accuracy, we estimate and compensate for the radar velocity scale factor. The performance of the proposed method is verified through five real-world open-source datasets. The results demonstrate that our approach reduces position error by 62% and rotation error by 66% on average compared to the state-of-the-art radar-inertial fusion method in terms of absolute trajectory error.
△ Less
Submitted 24 November, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
The Sound Demixing Challenge 2023 $\unicode{x2013}$ Music Demixing Track
Authors:
Giorgio Fabbro,
Stefan Uhlich,
Chieh-Hsin Lai,
Woosung Choi,
Marco Martínez-Ramírez,
Weihsiang Liao,
Igor Gadelha,
Geraldo Ramos,
Eddie Hsu,
Hugo Rodrigues,
Fabian-Robert Stöter,
Alexandre Défossez,
Yi Luo,
Jianwei Yu,
Dipam Chakraborty,
Sharada Mohanty,
Roman Solovyev,
Alexander Stempkovskiy,
Tatiana Habruseva,
Nabarun Goswami,
Tatsuya Harada,
Minseok Kim,
Jun Hyung Lee,
Yuanliang Dong,
Xinran Zhang
, et al. (2 additional authors not shown)
Abstract:
This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce t…
▽ More
This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best performing system achieved an improvement of over 1.6dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems and report here the results. Finally, we provide our insights into the organization of the competition and our prospects for future editions.
△ Less
Submitted 19 April, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
Sound Demixing Challenge 2023 Music Demixing Track Technical Report: TFC-TDF-UNet v3
Authors:
Minseok Kim,
Jun Hyung Lee,
Soonyoung Jung
Abstract:
In this report, we present our award-winning solutions for the Music Demixing Track of Sound Demixing Challenge 2023. First, we propose TFC-TDF-UNet v3, a time-efficient music source separation model that achieves state-of-the-art results on the MUSDB benchmark. We then give full details regarding our solutions for each Leaderboard, including a loss masking approach for noise-robust training. Code…
▽ More
In this report, we present our award-winning solutions for the Music Demixing Track of Sound Demixing Challenge 2023. First, we propose TFC-TDF-UNet v3, a time-efficient music source separation model that achieves state-of-the-art results on the MUSDB benchmark. We then give full details regarding our solutions for each Leaderboard, including a loss masking approach for noise-robust training. Code for reproducing model training and final submissions is available at github.com/kuielab/sdx23.
△ Less
Submitted 21 July, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Enhancing Generative Networks for Chest Anomaly Localization through Automatic Registration-Based Unpaired-to-Pseudo-Paired Training Data Translation
Authors:
Kyungsu Kim,
Seong Je Oh,
Chae Yeon Lim,
Ju Hwan Lee,
Tae Uk Kim,
Myung Jin Chung
Abstract:
Image translation based on a generative adversarial network (GAN-IT) is a promising method for the precise localization of abnormal regions in chest X-ray images (AL-CXR) even without the pixel-level annotation. However, heterogeneous unpaired datasets undermine existing methods to extract key features and distinguish normal from abnormal cases, resulting in inaccurate and unstable AL-CXR. To addr…
▽ More
Image translation based on a generative adversarial network (GAN-IT) is a promising method for the precise localization of abnormal regions in chest X-ray images (AL-CXR) even without the pixel-level annotation. However, heterogeneous unpaired datasets undermine existing methods to extract key features and distinguish normal from abnormal cases, resulting in inaccurate and unstable AL-CXR. To address this problem, we propose an improved two-stage GAN-IT involving registration and data augmentation. For the first stage, we introduce an advanced deep-learning-based registration technique that virtually and reasonably converts unpaired data into paired data for learning registration maps, by sequentially utilizing linear-based global and uniform coordinate transformation and AI-based non-linear coordinate fine-tuning. This approach enables independent and complex coordinate transformation of each detailed location of the lung while recognizing the entire lung structure, thereby achieving higher registration performance with resolving inherent artifacts caused by unpaired conditions. For the second stage, we apply data augmentation to diversify anomaly locations by swapping the left and right lung regions on the uniform registered frames, further improving the performance by alleviating imbalance in data distribution showing left and right lung lesions. The proposed method is model agnostic and shows consistent AL-CXR performance improvement in representative AI models. Therefore, we believe GAN-IT for AL-CXR can be clinically implemented by using our basis framework, even if learning data are scarce or difficult for the pixel-level disease annotation.
△ Less
Submitted 15 June, 2024; v1 submitted 21 July, 2022;
originally announced July 2022.
-
AI-based computer-aided diagnostic system of chest digital tomography synthesis: Demonstrating comparative advantage with X-ray-based AI systems
Authors:
Kyung-Su Kim,
Ju Hwan Lee,
Seong Je Oh,
Myung Jin Chung
Abstract:
Compared with chest X-ray (CXR) imaging, which is a single image projected from the front of the patient, chest digital tomosynthesis (CDTS) imaging can be more advantageous for lung lesion detection because it acquires multiple images projected from multiple angles of the patient. Various clinical comparative analysis and verification studies have been reported to demonstrate this, but there were…
▽ More
Compared with chest X-ray (CXR) imaging, which is a single image projected from the front of the patient, chest digital tomosynthesis (CDTS) imaging can be more advantageous for lung lesion detection because it acquires multiple images projected from multiple angles of the patient. Various clinical comparative analysis and verification studies have been reported to demonstrate this, but there were no artificial intelligence (AI)-based comparative analysis studies. Existing AI-based computer-aided detection (CAD) systems for lung lesion diagnosis have been developed mainly based on CXR images; however, CAD-based on CDTS, which uses multi-angle images of patients in various directions, has not been proposed and verified for its usefulness compared to CXR-based counterparts. This study develops/tests a CDTS-based AI CAD system to detect lung lesions to demonstrate performance improvements compared to CXR-based AI CAD. We used multiple projection images as input for the CDTS-based AI model and a single-projection image as input for the CXR-based AI model to fairly compare and evaluate the performance between models. The proposed CDTS-based AI CAD system yielded sensitivities of 0.782 and 0.785 and accuracies of 0.895 and 0.837 for the performance of detecting tuberculosis and pneumonia, respectively, against normal subjects. These results show higher performance than sensitivities of 0.728 and 0.698 and accuracies of 0.874 and 0.826 for detecting tuberculosis and pneumonia through the CXR-based AI CAD, which only uses a single projection image in the frontal direction. We found that CDTS-based AI CAD improved the sensitivity of tuberculosis and pneumonia by 5.4% and 8.7% respectively, compared to CXR-based AI CAD without loss of accuracy. Therefore, we comparatively prove that CDTS-based AI CAD technology can improve performance more than CXR, enhancing the clinical applicability of CDTS.
△ Less
Submitted 18 June, 2022;
originally announced June 2022.
-
3D unsupervised anomaly detection and localization through virtual multi-view projection and reconstruction: Clinical validation on low-dose chest computed tomography
Authors:
Kyung-Su Kim,
Seong Je Oh,
Ju Hwan Lee,
Myung Jin Chung
Abstract:
Computer-aided diagnosis for low-dose computed tomography (CT) based on deep learning has recently attracted attention as a first-line automatic testing tool because of its high accuracy and low radiation exposure. However, existing methods rely on supervised learning, imposing an additional burden to doctors for collecting disease data or annotating spatial labels for network training, consequent…
▽ More
Computer-aided diagnosis for low-dose computed tomography (CT) based on deep learning has recently attracted attention as a first-line automatic testing tool because of its high accuracy and low radiation exposure. However, existing methods rely on supervised learning, imposing an additional burden to doctors for collecting disease data or annotating spatial labels for network training, consequently hindering their implementation. We propose a method based on a deep neural network for computer-aided diagnosis called virtual multi-view projection and reconstruction for unsupervised anomaly detection. Presumably, this is the first method that only requires data from healthy patients for training to identify three-dimensional (3D) regions containing any anomalies. The method has three key components. Unlike existing computer-aided diagnosis tools that use conventional CT slices as the network input, our method 1) improves the recognition of 3D lung structures by virtually projecting an extracted 3D lung region to obtain two-dimensional (2D) images from diverse views to serve as network inputs, 2) accommodates the input diversity gain for accurate anomaly detection, and 3) achieves 3D anomaly/disease localization through a novel 3D map restoration method using multiple 2D anomaly maps. The proposed method based on unsupervised learning improves the patient-level anomaly detection by 10% (area under the curve, 0.959) compared with a gold standard based on supervised learning (area under the curve, 0.848), and it localizes the anomaly region with 93% accuracy, demonstrating its high performance.
△ Less
Submitted 18 June, 2022;
originally announced June 2022.
-
iExam: A Novel Online Exam Monitoring and Analysis System Based on Face Detection and Recognition
Authors:
Xu Yang,
Daoyuan Wu,
Xiao Yi,
Jimmy H. M. Lee,
Tan Lee
Abstract:
Online exams via video conference software like Zoom have been adopted in many schools due to COVID-19. While it is convenient, it is challenging for teachers to supervise online exams from simultaneously displayed student Zoom windows. In this paper, we propose iExam, an intelligent online exam monitoring and analysis system that can not only use face detection to assist invigilators in real-time…
▽ More
Online exams via video conference software like Zoom have been adopted in many schools due to COVID-19. While it is convenient, it is challenging for teachers to supervise online exams from simultaneously displayed student Zoom windows. In this paper, we propose iExam, an intelligent online exam monitoring and analysis system that can not only use face detection to assist invigilators in real-time student identification, but also be able to detect common abnormal behaviors (including face disappearing, rotating faces, and replacing with a different person during the exams) via a face recognition-based post-exam video analysis. To build such a novel system in its first kind, we overcome three challenges. First, we discover a lightweight approach to capturing exam video streams and analyzing them in real time. Second, we utilize the left-corner names that are displayed on each student's Zoom window and propose an improved OCR (optical character recognition) technique to automatically gather the ground truth for the student faces with dynamic positions. Third, we perform several experimental comparisons and optimizations to efficiently shorten the training and testing time required on teachers' PC. Our evaluation shows that iExam achieves high accuracy, 90.4% for real-time face detection and 98.4% for post-exam face recognition, while maintaining acceptable runtime performance. We have made iExam's source code available at https://github.com/VPRLab/iExam.
△ Less
Submitted 27 June, 2022;
originally announced June 2022.
-
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations
Authors:
Hyeong-Seok Choi,
Juheon Lee,
Wansoo Kim,
Jie Hwan Lee,
Hoon Heo,
Kyogu Lee
Abstract:
We present a neural analysis and synthesis (NANSY) framework that can manipulate voice, pitch, and speed of an arbitrary speech signal. Most of the previous works have focused on using information bottleneck to disentangle analysis features for controllable synthesis, which usually results in poor reconstruction quality. We address this issue by proposing a novel training strategy based on informa…
▽ More
We present a neural analysis and synthesis (NANSY) framework that can manipulate voice, pitch, and speed of an arbitrary speech signal. Most of the previous works have focused on using information bottleneck to disentangle analysis features for controllable synthesis, which usually results in poor reconstruction quality. We address this issue by proposing a novel training strategy based on information perturbation. The idea is to perturb information in the original input signal (e.g., formant, pitch, and frequency response), thereby letting synthesis networks selectively take essential attributes to reconstruct the input signal. Because NANSY does not need any bottleneck structures, it enjoys both high reconstruction quality and controllability. Furthermore, NANSY does not require any labels associated with speech data such as text and speaker information, but rather uses a new set of analysis features, i.e., wav2vec feature and newly proposed pitch feature, Yingram, which allows for fully self-supervised training. Taking advantage of fully self-supervised training, NANSY can be easily extended to a multilingual setting by simply training it with a multilingual dataset. The experiments show that NANSY can achieve significant improvement in performance in several applications such as zero-shot voice conversion, pitch shift, and time-scale modification.
△ Less
Submitted 28 October, 2021; v1 submitted 27 October, 2021;
originally announced October 2021.
-
Real-time Denoising and Dereverberation with Tiny Recurrent U-Net
Authors:
Hyeong-Seok Choi,
Sungjin Park,
Jie Hwan Lee,
Hoon Heo,
Dongsuk Jeon,
Kyogu Lee
Abstract:
Modern deep learning-based models have seen outstanding performance improvement with speech enhancement tasks. The number of parameters of state-of-the-art models, however, is often too large to be deployed on devices for real-world applications. To this end, we propose Tiny Recurrent U-Net (TRU-Net), a lightweight online inference model that matches the performance of current state-of-the-art mod…
▽ More
Modern deep learning-based models have seen outstanding performance improvement with speech enhancement tasks. The number of parameters of state-of-the-art models, however, is often too large to be deployed on devices for real-world applications. To this end, we propose Tiny Recurrent U-Net (TRU-Net), a lightweight online inference model that matches the performance of current state-of-the-art models. The size of the quantized version of TRU-Net is 362 kilobytes, which is small enough to be deployed on edge devices. In addition, we combine the small-sized model with a new masking method called phase-aware $β$-sigmoid mask, which enables simultaneous denoising and dereverberation. Results of both objective and subjective evaluations have shown that our model can achieve competitive performance with the current state-of-the-art models on benchmark datasets using fewer parameters by orders of magnitude.
△ Less
Submitted 22 June, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
-
A Dynamic Penalty Function Approach for Constraints-Handling in Reinforcement Learning
Authors:
Haeun Yoo,
Victor M. Zavala,
Jay H. Lee
Abstract:
Reinforcement learning (RL) is attracting attention as an effective way to solve sequential optimization problems that involve high dimensional state/action space and stochastic uncertainties. Many such problems involve constraints expressed by inequality constraints. This study focuses on using RL to solve constrained optimal control problems. Most RL application studies have dealt with inequalit…
▽ More
Reinforcement learning (RL) is attracting attention as an effective way to solve sequential optimization problems that involve high dimensional state/action space and stochastic uncertainties. Many such problems involve constraints expressed by inequality constraints. This study focuses on using RL to solve constrained optimal control problems. Most RL application studies have dealt with inequality constraints by adding soft penalty terms for violating the constraints to the reward function. However, while training neural networks to learn the value (or Q) function, one can run into computational issues caused by the sharp change in the function value at the constraint boundary due to the large penalty imposed. This difficulty during training can lead to convergence problems and ultimately lead to poor closed-loop performance. To address this issue, this study proposes a dynamic penalty (DP) approach where the penalty factor is gradually and systematically increased during training as the iteration episodes proceed. We first examine the ability of a neural network to represent a value function when uniform, linear, or DP functions are added to prevent constraint violation. The agent trained by a Deep Q Network (DQN) algorithm with the DP function approach was compared with agents with other constant penalty functions in a simple vehicle control problem. Results show that the proposed approach can improve the neural network approximation accuracy and provide faster convergence when close to a solution.
△ Less
Submitted 31 March, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition
Authors:
Junghyun Koo,
Jie Hwan Lee,
Jaewoo Pyo,
Yujin Jo,
Kyogu Lee
Abstract:
Collecting and accessing a large amount of medical data is very time-consuming and laborious, not only because it is difficult to find specific patients but also because it is required to resolve the confidentiality of a patient's medical records. On the other hand, there are deep learning models, trained on easily collectible, large scale datasets such as Youtube or Wikipedia, offering useful rep…
▽ More
Collecting and accessing a large amount of medical data is very time-consuming and laborious, not only because it is difficult to find specific patients but also because it is required to resolve the confidentiality of a patient's medical records. On the other hand, there are deep learning models, trained on easily collectible, large scale datasets such as Youtube or Wikipedia, offering useful representations. It could therefore be very advantageous to utilize the features from these pre-trained networks for handling a small amount of data at hand. In this work, we exploit various multi-modal features extracted from pre-trained networks to recognize Alzheimer's Dementia using a neural network, with a small dataset provided by the ADReSS Challenge at INTERSPEECH 2020. The challenge regards to discern patients suspicious of Alzheimer's Dementia by providing acoustic and textual data. With the multi-modal features, we modify a Convolutional Recurrent Neural Network based structure to perform classification and regression tasks simultaneously and is capable of computing conversations with variable lengths. Our test results surpass baseline's accuracy by 18.75%, and our validation result for the regression task shows the possibility of classifying 4 classes of cognitive impairment with an accuracy of 78.70%.
△ Less
Submitted 2 March, 2021; v1 submitted 8 September, 2020;
originally announced September 2020.
-
Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net
Authors:
Hyeong-Seok Choi,
Hoon Heo,
Jie Hwan Lee,
Kyogu Lee
Abstract:
In this work, we tackle a denoising and dereverberation problem with a single-stage framework. Although denoising and dereverberation may be considered two separate challenging tasks, and thus, two modules are typically required for each task, we show that a single deep network can be shared to solve the two problems. To this end, we propose a new masking method called phase-aware beta-sigmoid mas…
▽ More
In this work, we tackle a denoising and dereverberation problem with a single-stage framework. Although denoising and dereverberation may be considered two separate challenging tasks, and thus, two modules are typically required for each task, we show that a single deep network can be shared to solve the two problems. To this end, we propose a new masking method called phase-aware beta-sigmoid mask (PHM), which reuses the estimated magnitude values to estimate the clean phase by respecting the triangle inequality in the complex domain between three signal components such as mixture, source and the rest. Two PHMs are used to deal with direct and reverberant source, which allows controlling the proportion of reverberation in the enhanced speech at inference time. In addition, to improve the speech enhancement performance, we propose a new time-domain loss function and show a reasonable performance gain compared to MSE loss in the complex domain. Finally, to achieve a real-time inference, an optimization strategy for U-Net is proposed which significantly reduces the computational overhead up to 88.9% compared to the naïve version.
△ Less
Submitted 31 May, 2020;
originally announced June 2020.
-
Semi-Relaxed Quantization with DropBits: Training Low-Bit Neural Networks via Bit-wise Regularization
Authors:
Jung Hyun Lee,
Jihun Yun,
Sung Ju Hwang,
Eunho Yang
Abstract:
Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged as one of the key ingredients to reduce the size of neural networks for their deployments to resource-limited devices. In order to overcome the nature of transforming continuous activations and weights to discrete ones, recent study called Relaxed Quantization (RQ) [Louizos et al. 2019] s…
▽ More
Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged as one of the key ingredients to reduce the size of neural networks for their deployments to resource-limited devices. In order to overcome the nature of transforming continuous activations and weights to discrete ones, recent study called Relaxed Quantization (RQ) [Louizos et al. 2019] successfully employ the popular Gumbel-Softmax that allows this transformation with efficient gradient-based optimization. However, RQ with this Gumbel-Softmax relaxation still suffers from bias-variance trade-off depending on the temperature parameter of Gumbel-Softmax. To resolve the issue, we propose a novel method, Semi-Relaxed Quantization (SRQ) that uses multi-class straight-through estimator to effectively reduce the bias and variance, along with a new regularization technique, DropBits that replaces dropout regularization to randomly drop the bits instead of neurons to further reduce the bias of the multi-class straight-through estimator in SRQ. As a natural extension of DropBits, we further introduce the way of learning heterogeneous quantization levels to find proper bit-length for each layer using DropBits. We experimentally validate our method on various benchmark datasets and network architectures, and also support the quantized lottery ticket hypothesis: learning heterogeneous quantization levels outperforms the case using the same but fixed quantization levels from scratch.
△ Less
Submitted 7 September, 2021; v1 submitted 29 November, 2019;
originally announced November 2019.
-
Quantum tensor singular value decomposition with applications to recommendation systems
Authors:
Xiaoqiang Wang,
Lejia Gu,
Joseph Heung-wing Joseph Lee,
Guofeng Zhang
Abstract:
In this paper, we present a quantum singular value decomposition algorithm for third-order tensors inspired by the classical algorithm of tensor singular value decomposition (t-svd) and then extend it to order-$p$ tensors. It can be proved that the quantum version of the t-svd for a third-order tensor $\mathcal{A} \in \mathbb{R}^{N\times N \times N}$ achieves the complexity of…
▽ More
In this paper, we present a quantum singular value decomposition algorithm for third-order tensors inspired by the classical algorithm of tensor singular value decomposition (t-svd) and then extend it to order-$p$ tensors. It can be proved that the quantum version of the t-svd for a third-order tensor $\mathcal{A} \in \mathbb{R}^{N\times N \times N}$ achieves the complexity of $\mathcal{O}(N{\rm polylog}(N))$, an exponential speedup compared with its classical counterpart. As an application, we propose a quantum algorithm for recommendation systems which incorporates the contextual situation of users to the personalized recommendation. We provide recommendations varying with contexts by measuring the output quantum state corresponding to an approximation of this user's preferences. This algorithm runs in expected time $\mathcal{O}(N{\rm polylog}(N){\rm poly}(k)),$ if every frontal slice of the preference tensor has a good rank-$k$ approximation. At last, we provide a quantum algorithm for tensor completion based on a different truncation method which is tested to have a good performance in dynamic video completion.
△ Less
Submitted 3 February, 2020; v1 submitted 2 October, 2019;
originally announced October 2019.
-
Audio query-based music source separation
Authors:
Jie Hwan Lee,
Hyeong-Seok Choi,
Kyogu Lee
Abstract:
In recent years, music source separation has been one of the most intensively studied research areas in music information retrieval. Improvements in deep learning lead to a big progress in music source separation performance. However, most of the previous studies are restricted to separating a few limited number of sources, such as vocals, drums, bass, and other. In this study, we propose a networ…
▽ More
In recent years, music source separation has been one of the most intensively studied research areas in music information retrieval. Improvements in deep learning lead to a big progress in music source separation performance. However, most of the previous studies are restricted to separating a few limited number of sources, such as vocals, drums, bass, and other. In this study, we propose a network for audio query-based music source separation that can explicitly encode the source information from a query signal regardless of the number and/or kind of target signals. The proposed method consists of a Query-net and a Separator: given a query and a mixture, the Query-net encodes the query into the latent space, and the Separator estimates masks conditioned by the latent vector, which is then applied to the mixture for separation. The Separator can also generate masks using the latent vector from the training samples, allowing separation in the absence of a query. We evaluate our method on the MUSDB18 dataset, and experimental results show that the proposed method can separate multiple sources with a single network. In addition, through further investigation of the latent space we demonstrate that our method can generate continuous outputs via latent vector interpolation.
△ Less
Submitted 19 August, 2019;
originally announced August 2019.