Showing 1–50 of 62 results for author: Yoon, S

Searching in archive eess.
  1. arXiv:2505.22511  [pdf, ps, other]

    eess.IV cs.CV

    Surf2CT: Cascaded 3D Flow Matching Models for Torso 3D CT Synthesis from Skin Surface

    Authors: Siyeop Yoon, Yujin Oh, Pengfei Jin, Sifan Song, Matthew Tivnan, Dufan Wu, Xiang Li, Quanzheng Li

    Abstract: We present Surf2CT, a novel cascaded flow matching framework that synthesizes full 3D computed tomography (CT) volumes of the human torso from external surface scans and simple demographic data (age, sex, height, weight). This is the first approach capable of generating realistic volumetric internal anatomy images solely based on external body shape and demographics, without any internal imaging.…

    Submitted 28 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: NeurIPS 2025 submitted

  2. arXiv:2505.22489  [pdf, other]

    eess.IV cs.CV cs.GR

    Cascaded 3D Diffusion Models for Whole-body 3D 18-F FDG PET/CT synthesis from Demographics

    Authors: Siyeop Yoon, Sifan Song, Pengfei Jin, Matthew Tivnan, Yujin Oh, Sekeun Kim, Dufan Wu, Xiang Li, Quanzheng Li

    Abstract: We propose a cascaded 3D diffusion model framework to synthesize high-fidelity 3D PET/CT volumes directly from demographic variables, addressing the growing need for realistic digital twins in oncologic imaging, virtual trials, and AI-driven data augmentation. Unlike deterministic phantoms, which rely on predefined anatomical and metabolic templates, our method employs a two-stage generative proce…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: MICCAI 2025 submitted version

  3. arXiv:2503.04966  [pdf, other]

    eess.IV cs.AI cs.CV

    Prediction of Frozen Region Growth in Kidney Cryoablation Intervention Using a 3D Flow-Matching Model

    Authors: Siyeop Yoon, Yujin Oh, Matthew Tivnan, Sifan Song, Pengfei Jin, Sekeun Kim, Hyun Jin Cho, Dufan Wu, Raul Uppot, Quanzheng Li

    Abstract: This study presents a 3D flow-matching model designed to predict the progression of the frozen region (iceball) during kidney cryoablation. Precise intraoperative guidance is critical in cryoablation to ensure complete tumor eradication while preserving adjacent healthy tissue. However, conventional methods, typically based on physics-driven or diffusion-based simulations, are computationally dema…

    Submitted 11 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: MICCAI 2025 submitted version (author list included)

  4. arXiv:2502.19759  [pdf, other]

    cs.SD eess.AS

    Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models

    Authors: Heeseung Kim, Che Hyun Lee, Sangkwon Park, Jiheum Yeom, Nohil Park, Sangwon Yu, Sungroh Yoon

    Abstract: Recent advancements in multi-turn voice interaction models have improved user-model communication. However, while closed-source models effectively retain and recall past utterances, whether open-source models share this ability remains unexplored. To fill this gap, we systematically evaluate how well open-source interaction models utilize past utterances using ContextDialog, a benchmark we propose…

    Submitted 23 May, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: ACL 2025 Findings, Project Page: https://contextdialog.github.io/

  5. arXiv:2502.00619  [pdf, other]

    eess.IV cs.AI cs.CV

    Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective

    Authors: Yujin Oh, Pengfei Jin, Sangjoon Park, Sekeun Kim, Siyeop Yoon, Kyungsang Kim, Jin Sung Kim, Xiang Li, Quanzheng Li

    Abstract: Ensuring fairness in medical image segmentation is critical due to biases in imbalanced clinical data acquisition caused by demographic attributes (e.g., age, sex, race) and clinical factors (e.g., disease severity). To address these challenges, we introduce Distribution-aware Mixture of Experts (dMoE), inspired by optimal control theory. We provide a comprehensive analysis of its underlying mecha…

    Submitted 27 May, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: ICML 2025 spotlight, see https://openreview.net/forum?id=BUONdewsBa

  6. arXiv:2501.11225  [pdf, other]

    cond-mat.mtrl-sci cs.CV eess.IV

    CNN-based TEM image denoising from first principles

    Authors: Jinwoong Chae, Sungwook Hong, Sungkyu Kim, Sungroh Yoon, Gunn Kim

    Abstract: Transmission electron microscope (TEM) images are often corrupted by noise, hindering their interpretation. To address this issue, we propose a deep learning-based approach using simulated images. Using density functional theory calculations with a set of pseudo-atomic orbital basis sets, we generate highly accurate ground truth images. We introduce four types of noise into these simulations to cr…

    Submitted 19 January, 2025; originally announced January 2025.

    Comments: 10 pages and 4 figures

  7. arXiv:2412.01140  [pdf, other]

    cs.CV eess.IV

    Dense Dispersed Structured Light for Hyperspectral 3D Imaging of Dynamic Scenes

    Authors: Suhyun Shin, Seungwoo Yoon, Ryota Maeda, Seung-Hwan Baek

    Abstract: Hyperspectral 3D imaging captures both depth maps and hyperspectral images, enabling comprehensive geometric and material analysis. Recent methods achieve high spectral and depth accuracy; however, they require long acquisition times, often over several minutes, or rely on large, expensive systems, restricting their use to static scenes. We present Dense Dispersed Structured Light (DDSL), an accurat…

    Submitted 2 December, 2024; originally announced December 2024.

  8. arXiv:2410.00184  [pdf, other]

    eess.IV cs.CV cs.LG

    Volumetric Conditional Score-based Residual Diffusion Model for PET/MR Denoising

    Authors: Siyeop Yoon, Rui Hu, Yuang Wang, Matthew Tivnan, Young-don Son, Dufan Wu, Xiang Li, Kyungsang Kim, Quanzheng Li

    Abstract: PET imaging is a powerful modality offering quantitative assessments of molecular and physiological processes. The necessity for PET denoising arises from the intrinsic high noise levels in PET imaging, which can significantly hinder the accurate interpretation and quantitative analysis of the scans. With advances in deep learning techniques, diffusion model-based PET denoising techniques have sho…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Accepted to MICCAI 2024

  9. arXiv:2409.15760  [pdf, other]

    cs.SD eess.AS

    NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers

    Authors: Nohil Park, Heeseung Kim, Che Hyun Lee, Jooyoung Choi, Jiheum Yeom, Sungroh Yoon

    Abstract: We present NanoVoice, a personalized text-to-speech model that efficiently constructs voice adapters for multiple speakers simultaneously. NanoVoice introduces a batch-wise speaker adaptation technique capable of fine-tuning multiple references in parallel, significantly reducing training time. Beyond building separate adapters for each speaker, we also propose a parameter sharing technique that r…

    Submitted 20 December, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025, Demo Page: https://nanovoice.github.io/

  10. arXiv:2409.15759  [pdf, other]

    cs.SD eess.AS

    VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance

    Authors: Jiheum Yeom, Heeseung Kim, Jooyoung Choi, Che Hyun Lee, Nohil Park, Sungroh Yoon

    Abstract: When applying parameter-efficient finetuning via LoRA onto speaker adaptive text-to-speech models, adaptation performance may decline compared to full-finetuned counterparts, especially for out-of-domain speakers. Here, we propose VoiceGuider, a parameter-efficient speaker adaptive text-to-speech system reinforced with autoguidance to enhance the speaker adaptation performance, reducing the gap ag…

    Submitted 20 December, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025, Demo Page: https://voiceguider.github.io/

  11. arXiv:2408.14739  [pdf, other]

    cs.SD eess.AS

    VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech

    Authors: Heeseung Kim, Sang-gil Lee, Jiheum Yeom, Che Hyun Lee, Sungwon Kim, Sungroh Yoon

    Abstract: We propose VoiceTailor, a parameter-efficient speaker-adaptive text-to-speech (TTS) system, by equipping a pre-trained diffusion-based TTS model with a personalized adapter. VoiceTailor identifies pivotal modules that benefit from the adapter based on a weight change ratio analysis. We utilize Low-Rank Adaptation (LoRA) as a parameter-efficient adaptation method and incorporate the adapter into pi…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: INTERSPEECH 2024

  12. arXiv:2408.05769  [pdf, other]

    cs.CL cs.SD eess.AS

    LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition

    Authors: Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime exemplification is TTA for Automatic Speech Recognition (ASR), which enhances model performance by leveraging output prediction entropy minimization as a self-supervision signal. However, a key limitation of this self-su…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: INTERSPEECH 2024

  13. arXiv:2407.12780  [pdf, other]

    physics.med-ph eess.IV

    Hallucination Index: An Image Quality Metric for Generative Reconstruction Models

    Authors: Matthew Tivnan, Siyeop Yoon, Zhennong Chen, Xiang Li, Dufan Wu, Quanzheng Li

    Abstract: Generative image reconstruction algorithms such as measurement-conditioned diffusion models are increasingly popular in the field of medical imaging. These powerful models can transform low signal-to-noise ratio (SNR) inputs into outputs with the appearance of high SNR. However, the outputs can have a new type of error called hallucinations. In medical imaging, these hallucinations may not be obvi…

    Submitted 17 July, 2024; originally announced July 2024.

  14. arXiv:2407.04162  [pdf, other]

    eess.IV cs.CV

    Measurement Embedded Schrödinger Bridge for Inverse Problems

    Authors: Yuang Wang, Pengfei Jin, Siyeop Yoon, Matthew Tivnan, Quanzheng Li, Li Zhang, Dufan Wu

    Abstract: Score-based diffusion models are frequently employed as structural priors in inverse problems. However, their iterative denoising process, initiated from Gaussian noise, often results in slow inference speeds. The Image-to-Image Schrödinger Bridge (I²SB), which begins with the corrupted image, presents a promising alternative as a prior for addressing inverse problems. In this work, we introduc…

    Submitted 22 May, 2024; originally announced July 2024.

    Comments: 14 pages, 2 figures, NeurIPS preprint

  15. arXiv:2407.02321  [pdf]

    physics.optics eess.IV

    Implementation of reflection matrix microscopy: An algorithm perspective

    Authors: Sungsam Kang, Seokchan Yoon, Wonshik Choi

    Abstract: Over the past decade, reflection matrix microscopy (RMM) and advanced image reconstruction algorithms have emerged to address the fundamental imaging depth limitations of optical microscopy in thick biological tissues and complex media. In this study, we introduce significant advancements in reflection matrix processing algorithms, including logical indexing, power iterations, and low-frequency bl…

    Submitted 2 July, 2024; originally announced July 2024.

  16. arXiv:2403.11578  [pdf, other]

    eess.AS

    AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

    Authors: SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regulariz…

    Submitted 18 March, 2024; originally announced March 2024.

  17. arXiv:2403.06940  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction

    Authors: Qing Xiao, Siyeop Yoon, Hui Ren, Matthew Tivnan, Lichao Sun, Quanzheng Li, Tianming Liu, Yu Zhang, Xiang Li

    Abstract: Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffe…

    Submitted 11 March, 2024; originally announced March 2024.

  18. arXiv:2403.06069  [pdf, other]

    eess.IV cs.CV cs.LG

    Implicit Image-to-Image Schrodinger Bridge for Image Restoration

    Authors: Yuang Wang, Siyeop Yoon, Pengfei Jin, Matthew Tivnan, Sifan Song, Zhennong Chen, Rui Hu, Li Zhang, Quanzheng Li, Zhiqiang Chen, Dufan Wu

    Abstract: Diffusion-based models have demonstrated remarkable effectiveness in image restoration tasks; however, their iterative denoising process, which starts from Gaussian noise, often leads to slow inference speeds. The Image-to-Image Schrödinger Bridge (I²SB) offers a promising alternative by initializing the generative process from corrupted images while leveraging training techniques from score-ba…

    Submitted 21 March, 2025; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: 27 pages, 8 figures, accepted by Pattern Recognition

  19. arXiv:2402.05706  [pdf, other]

    cs.CL cs.SD eess.AS

    Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation

    Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Soyoon Kim, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Jung-Woo Ha, Sungroh Yoon, Kang Min Yoo

    Abstract: Recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech. However, an LLM-based strategy for modeling spoken dialogs remains elusive, calling for further investigation. This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken respons…

    Submitted 27 November, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2024, Project Page: https://unifiedsdm.github.io/

  20. HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

    Authors: Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

    Abstract: Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts in developing VGD systems to improve the quality of their responses, existing systems are competent only to incorporate the information in the video and text and tend to struggle in extracting the necessary information f…

    Submitted 13 April, 2025; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: EMNLP 2023, 14 pages, 13 figures

  21. arXiv:2312.05790  [pdf, other]

    cs.LG cs.AI eess.SP

    SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation

    Authors: Hyun Ryu, Sunjae Yoon, Hee Suk Yoon, Eunseop Yoon, Chang D. Yoo

    Abstract: Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency…

    Submitted 19 January, 2025; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: AAAI 2024 camera-ready version w/ Appendix

  22. arXiv:2310.10088  [pdf, other]

    eess.IV cs.CV cs.LG

    PUCA: Patch-Unshuffle and Channel Attention for Enhanced Self-Supervised Image Denoising

    Authors: Hyemi Jang, Junsung Park, Dahuin Jung, Jaihyun Lew, Ho Bae, Sungroh Yoon

    Abstract: Although supervised image denoising networks have shown remarkable performance on synthesized noisy images, they often fail in practice due to the difference between real and synthesized noise. Since clean-noisy image pairs from the real world are extremely costly to gather, self-supervised learning, which utilizes noisy input itself as a target, has been studied. To prevent a self-supervised deno…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  23. arXiv:2310.08598  [pdf, other]

    eess.IV cs.AI cs.CV

    Domain Generalization for Medical Image Analysis: A Review

    Authors: Jee Seok Yoon, Kwanseok Oh, Yooseung Shin, Maciej A. Mazurowski, Heung-Il Suk

    Abstract: Medical image analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, deploying DL models for MedIA in real-world situations remains challenging due to their failure to generalize across the distributional gap bet…

    Submitted 7 December, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of the IEEE, Volume 112, Issue 10, 2024

  24. Deep Video Inpainting Guided by Audio-Visual Self-Supervision

    Authors: Kyuyeon Kim, Junsik Jung, Woo Jae Kim, Sung-Eui Yoon

    Abstract: Humans can easily imagine a scene from auditory information based on their prior knowledge of audio-visual events. In this paper, we mimic this innate human ability in deep learning models to improve the quality of video inpainting. To implement the prior knowledge, we first train the audio-visual network, which learns the correspondence between auditory and visual information. Then, the audio-vis…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at ICASSP 2022

  25. arXiv:2308.08442  [pdf, other]

    cs.CL cs.SD eess.AS

    Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

    Authors: Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

    Abstract: Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5 referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or parag…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023

  26. arXiv:2306.16083  [pdf, other]

    cs.SD eess.AS

    UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data

    Authors: Heeseung Kim, Sungwon Kim, Jiheum Yeom, Sungroh Yoon

    Abstract: We propose UnitSpeech, a speaker-adaptive speech synthesis method that fine-tunes a diffusion-based text-to-speech (TTS) model using minimal untranscribed data. To achieve this, we use the self-supervised unit representation as a pseudo transcript and integrate the unit encoder into the pre-trained TTS model. We train the unit encoder to provide speech content to the diffusion-based decoder and th…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: INTERSPEECH 2023, Oral

  27. arXiv:2305.16371  [pdf, other]

    cs.CL cs.SD eess.AS

    INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

    Authors: Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained based on self-supervised speech representation learning. However, these pre-trained speech models suffer from representational bias as they tend to better represent those prominent accents (i.e., native (L1) English accent) in the pre-training speech corpus than less represented…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  28. arXiv:2302.04224  [pdf]

    eess.SP

    Data Poisoning Attacks on EEG Signal-based Risk Assessment Systems

    Authors: Zhibo Zhang, Sani Umar, Ahmed Y. Al Hammadi, Sangyoung Yoon, Ernesto Damiani, Chan Yeob Yeun

    Abstract: Industrial insider risk assessment using electroencephalogram (EEG) signals has consistently attracted a lot of research attention. However, EEG signal-based risk assessment systems, which could evaluate the emotional states of humans, have shown several vulnerabilities to data poison attacks. In this paper, from the attackers' perspective, data poison attacks involving label-flipping occurring in…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: 2nd International Conference on Business Analytics For Technology and Security (ICBATS)

  29. Explainable Data Poison Attacks on Human Emotion Evaluation Systems based on EEG Signals

    Authors: Zhibo Zhang, Sani Umar, Ahmed Y. Al Hammadi, Sangyoung Yoon, Ernesto Damiani, Claudio Agostino Ardagna, Nicola Bena, Chan Yeob Yeun

    Abstract: The major aim of this paper is to explain the data poisoning attacks using label-flipping during the training stage of the electroencephalogram (EEG) signal-based human emotion evaluation systems deploying Machine Learning models from the attackers' perspective. Human emotion evaluation using EEG signals has consistently attracted a lot of research attention. The identification of human emotional…

    Submitted 17 January, 2023; originally announced January 2023.

    Journal ref: IEEE Access 2023

  30. arXiv:2211.11381  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    LISA: Localized Image Stylization with Audio via Implicit Neural Representation

    Authors: Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

    Abstract: We present a novel framework, Localized Image Stylization with Audio (LISA) which performs audio-driven localized image stylization. Sound often provides information about the specific context of the scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a pa…

    Submitted 21 November, 2022; originally announced November 2022.

  31. arXiv:2210.11592  [pdf, other]

    cs.CR eess.SY

    New data poison attacks on machine learning classifiers for mobile exfiltration

    Authors: Miguel A. Ramirez, Sangyoung Yoon, Ernesto Damiani, Hussam Al Hamadi, Claudio Agostino Ardagna, Nicola Bena, Young-Ji Byon, Tae-Yeon Kim, Chung-Suk Cho, Chan Yeob Yeun

    Abstract: Most recent studies have shown several vulnerabilities to attacks with the potential to jeopardize the integrity of the model, opening in recent years a new window of opportunity in terms of cybersecurity. The main interest of this paper is directed towards data poisoning attacks involving label-flipping; such attacks occur during the training phase, being the aim of the attacker to…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2202.10276

  32. arXiv:2207.13223  [pdf, other]

    cs.LG eess.IV

    XADLiME: eXplainable Alzheimer's Disease Likelihood Map Estimation via Clinically-guided Prototype Learning

    Authors: Ahmad Wisnu Mulyadi, Wonsik Jung, Kwanseok Oh, Jee Seok Yoon, Heung-Il Suk

    Abstract: Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-le…

    Submitted 26 July, 2022; originally announced July 2022.

  33. Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text

    Authors: Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung

    Abstract: In this paper, we propose a novel speech emotion recognition model called Cross Attention Network (CAN) that uses aligned audio and text signals as inputs. It is inspired by the fact that humans recognize speech as a combination of simultaneously produced acoustic and textual signals. First, our method segments the audio and the underlying text signals into an equal number of steps in an aligned way…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: 5 pages, accepted by INTERSPEECH 2020

    Journal ref: Proc. Interspeech 2020, 2717-2721

  34. arXiv:2206.07578   

    cs.CV cs.LG eess.IV

    E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations

    Authors: Jongwan Kim, DongJin Lee, Byunggook Na, Seongsik Park, Jeonghee Jo, Sungroh Yoon

    Abstract: Event cameras respond to brightness changes in the scene asynchronously and independently for every pixel. Due to these properties, these cameras have distinct features: high dynamic range (HDR), high temporal resolution, and low power consumption. However, the results of event cameras should be processed into an alternative representation for computer vision tasks. Also, they are usually noisy and…

    Submitted 13 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: This submission has been withdrawn by arXiv administrators due to inappropriate text overlap with external sources. Additional information at https://doi.org/10.1109/CVPR52688.2022.01319

    Journal ref: The IEEE / CVF Computer Vision and Pattern Recognition Conference 2022

  35. arXiv:2206.04658  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    BigVGAN: A Universal Neural Vocoder with Large-Scale Training

    Authors: Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

    Abstract: Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments. In this work, we present BigVGAN, a universal vocoder that generalizes well for various out-of-distribution scenarios without fine-tun…

    Submitted 16 February, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: To appear at ICLR 2023. Listen to audio samples from BigVGAN at: https://bigvgan-demo.github.io/

  36. arXiv:2205.15370  [pdf, other]

    cs.SD cs.AI eess.AS

    Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data

    Authors: Sungwon Kim, Heeseung Kim, Sungroh Yoon

    Abstract: We propose Guided-TTS 2, a diffusion-based generative model for high-quality adaptive TTS using untranscribed data. Guided-TTS 2 combines a speaker-conditional diffusion model with a speaker-dependent phoneme classifier for adaptive text-to-speech. We train the speaker-conditional diffusion model on large-scale untranscribed datasets for a classifier-free guidance method and further fine-tune the…

    Submitted 30 May, 2022; originally announced May 2022.

  37. arXiv:2112.01535  [pdf, other]

    eess.IV cs.AI cs.LG

    Robust End-to-End Focal Liver Lesion Detection using Unregistered Multiphase Computed Tomography Images

    Authors: Sang-gil Lee, Eunji Kim, Jae Seok Bae, Jung Hoon Kim, Sungroh Yoon

    Abstract: The computer-aided diagnosis of focal liver lesions (FLLs) can help improve workflow and enable correct diagnoses; FLL detection is the first step in such a computer-aided diagnosis. Despite the recent success of deep-learning-based approaches in detecting FLLs, current methods are not sufficiently robust for assessing misaligned multiphase data. By introducing an attention-guided multiphase align…

    Submitted 16 December, 2021; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: IEEE TETCI. 14 pages, 8 figures, 5 tables

  38. arXiv:2112.00007  [pdf, other]

    cs.GR cs.CV cs.LG cs.SD eess.AS

    Sound-Guided Semantic Image Manipulation

    Authors: Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim

    Abstract: The recent success of generative models shows that leveraging a multi-modal embedding space makes it possible to manipulate an image using text information. However, manipulating an image with sources other than text, such as sound, is not easy due to the dynamic characteristics of the sources. In particular, sound can convey vivid emotions and dynamic expressions of the real world. Here, we propose a fra…

    Submitted 30 November, 2021; originally announced December 2021.

  39. arXiv:2111.11755  [pdf, other]

    cs.SD cs.AI eess.AS

    Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance

    Authors: Heeseung Kim, Sungwon Kim, Sungroh Yoon

    Abstract: We propose Guided-TTS, a high-quality text-to-speech (TTS) model that, using classifier guidance, does not require any transcript of the target speaker. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained phoneme classifier for classifier guidance. Our unconditional diffusion model learns to generate speech without any context from untranscribed speech data. For…

    Submitted 10 June, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: 15 pages, 5 figures, ICML'2022

  40. arXiv:2109.02342  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Automated Cardiac Resting Phase Detection Targeted on the Right Coronary Artery

    Authors: Seung Su Yoon, Elisabeth Preuhs, Michaela Schmidt, Christoph Forman, Teodora Chitiboi, Puneet Sharma, Juliano Lara Fernandes, Christoph Tillmanns, Jens Wetzl, Andreas Maier

    Abstract: Static cardiac imaging such as late gadolinium enhancement, mapping, or 3-D coronary angiography requires prior information, e.g., the phase during a cardiac cycle with the least motion, called the resting phase (RP). The purpose of this work is to propose a fully automated framework that allows the detection of the right coronary artery (RCA) RP within CINE series. The proposed prototype system consists o…

    Submitted 31 January, 2023; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted for publication in the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2023:001

    Journal ref: Machine Learning for Biomedical Imaging 2 (2023)

  41. arXiv:2108.02716

    cs.NI eess.SP

    Link Quality-Guaranteed Minimum-Cost Millimeter-Wave Base Station Deployment

    Authors: Miaomiao Dong, Taejoon Kim, Minsung Cho, Kangeun Lee, Sungrok Yoon

    Abstract: Today's growth in the volume of wireless devices coupled with the promise of supporting data-intensive 5G-&-beyond use cases is driving the industry to deploy more millimeter-wave (mmWave) base stations (BSs). Although mmWave cellular systems can carry a larger volume of traffic, dense deployment, in turn, increases the BS installation and maintenance cost, which has been largely ignored in their…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: 16 pages, submitted to IEEE Transactions on Wireless Communications

  42. arXiv:2106.06406

    stat.ML cs.LG cs.SD eess.AS

    PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior

    Authors: Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu

    Abstract: Denoising diffusion probabilistic models have been recently proposed to generate high-quality samples by estimating the gradient of the data density. The framework defines the prior noise as a standard Gaussian distribution, whereas the corresponding data distribution may be more complicated than the standard Gaussian distribution, which potentially introduces inefficiency in denoising the prior n…

    Submitted 20 February, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: ICLR 2022. 19 pages, 7 figures, 8 tables. Audio samples: https://speechresearch.github.io/priorgrad/

  43. arXiv:2105.03072

    eess.IV cs.CV

    NTIRE 2021 Challenge on Perceptual Image Quality Assessment

    Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, et al. (25 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These o…

    Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  44. arXiv:2104.14730

    cs.CV eess.IV

    Perceptual Image Quality Assessment with Transformers

    Authors: Manri Cheon, Sung-Jun Yoon, Byungyeon Kang, Junwoo Lee

    Abstract: In this paper, we propose an image quality transformer (IQT) that successfully applies a transformer architecture to a perceptual full-reference image quality assessment (IQA) task. Perceptual representation is becoming more important in image quality assessment. In this context, we extract the perceptual feature representations from each of the input images using a convolutional neural network (CNN) back…

    Submitted 4 May, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Accepted to the NTIRE workshop at CVPR 2021. 1st place in the NTIRE 2021 perceptual IQA challenge. https://github.com/manricheon/IQT

  45. arXiv:2101.04662

    eess.SY

    Output Regulation of Linear Aperiodic Sampled-Data Systems

    Authors: Himadri Basu, Francesco Ferrante, Se Young Yoon

    Abstract: This paper deals with the output regulation problem of a linear time-invariant system in the presence of sporadically available measurement streams. A regulator with a continuous intersample injection term is proposed, where the intersample injection is provided by a linear dynamical system whose state is reset upon the arrival of each new measurement update. The resulting system is a…

    Submitted 15 February, 2022; v1 submitted 12 January, 2021; originally announced January 2021.

    Comments: Accepted for presentation at the American Control Conference 2022

  46. arXiv:2010.14231

    eess.IV physics.med-ph

    Virtual Alignment Method and its application to the dental prostheses and diagnosis

    Authors: Kyungtaek Jun, Seokhwan Yoon, Jae-Hong Lim, SeungJoon Noh

    Abstract: The recently proposed Virtual Alignment Method (VAM), a new alignment solution for X-ray tomography, allowed more accurate removal of the errors that can limit the resolution and clarity of the reconstructed image. In dentistry, patient movement during scanning is one of the major factors degrading the final reconstructed image quality. Here, the patient…

    Submitted 25 October, 2020; originally announced October 2020.

    Comments: 21 pages, 5 figures

  47. arXiv:2010.11457

    eess.AS cs.SD

    Momentum Contrast Speaker Representation Learning

    Authors: Jangho Lee, Jaihyun Koh, Sungroh Yoon

    Abstract: Unsupervised representation learning has achieved remarkable results by narrowing the performance gap with supervised feature learning, especially in the image domain. In this study, to extend unsupervised learning techniques to the speech domain, we propose Momentum Contrast for VoxCeleb (MoCoVox) as a learning mechanism. We pre-trained MoCoVox on VoxCeleb1 by implementi…

    Submitted 22 October, 2020; originally announced October 2020.

  48. arXiv:2005.11129

    eess.AS cs.SD

    Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

    Authors: Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon

    Abstract: Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been proposed to generate mel-spectrograms from text in parallel. Despite the advantage, the parallel TTS models cannot be trained without guidance from autoregressive TTS models as their external aligners. In this work, we propose Glow-TTS, a flow-based generative model for parallel TTS that does not require any external al…

    Submitted 22 October, 2020; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: Accepted at NeurIPS 2020

  49. arXiv:2005.08374

    eess.SP cs.LG

    Intelligent O-RAN for Beyond 5G and 6G Wireless Networks

    Authors: Solmaz Niknam, Abhishek Roy, Harpreet S. Dhillon, Sukhdeep Singh, Rahul Banerji, Jeffery H. Reed, Navrati Saxena, Seungil Yoon

    Abstract: Building on the principles of openness and intelligence, there has been a concerted global effort from the operators towards enhancing the radio access network (RAN) architecture. The objective is to build an operator-defined RAN architecture (and associated interfaces) on open hardware that provides intelligent radio control for beyond fifth generation (5G) as well as future sixth generation (6G)…

    Submitted 17 May, 2020; originally announced May 2020.

  50. arXiv:1910.11122

    cs.CV cs.LG eess.IV

    Peanut Maturity Classification using Hyperspectral Imagery

    Authors: Sheng Zou, Yu-Chien Tseng, Alina Zare, Diane Rowland, Barry Tillman, Seung-Chul Yoon

    Abstract: Seed maturity in peanut (Arachis hypogaea L.) determines economic return to a producer because of its impact on seed weight (yield), and critically influences seed vigor and other quality characteristics. During seed development, the inner mesocarp layer of the pericarp (hull) transitions in color from white to black as the seed matures. The maturity assessment process involves the removal of the…

    Submitted 24 October, 2019; v1 submitted 20 October, 2019; originally announced October 2019.