Showing 1–9 of 9 results for author: Byun, K

Searching in archive eess.
  1. arXiv:2505.15254

    cs.SD eess.AS

    Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework

    Authors: Kyungguen Byun, Jason Filos, Erik Visser, Sunkuk Moon

    Abstract: We propose a speech enhancement system that combines speaker-agnostic speech restoration with voice conversion (VC) to obtain a studio-level quality speech signal. While voice conversion models are typically used to change speaker characteristics, they can also serve as a means of speech restoration when the target speaker is the same as the source speaker. However, since VC models are vulnerable…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 5 pages, 3 figures, Accepted to INTERSPEECH 2025

  2. arXiv:2410.00046

    eess.IV cs.CV cs.LG

    Mixture of Multicenter Experts in Multimodal Generative AI for Advanced Radiotherapy Target Delineation

    Authors: Yujin Oh, Sangjoon Park, Xiang Li, Wang Yi, Jonathan Paly, Jason Efstathiou, Annie Chan, Jun Won Kim, Hwa Kyung Byun, Ik Jae Lee, Jaeho Cho, Chan Woo Wee, Peng Shu, Peilong Wang, Nathan Yu, Jason Holmes, Jong Chul Ye, Quanzheng Li, Wei Liu, Woong Sub Koom, Jin Sung Kim, Kyungsang Kim

    Abstract: Clinical experts employ diverse philosophies and strategies in patient care, influenced by regional patient populations. However, existing medical artificial intelligence (AI) models are often trained on data distributions that disproportionately reflect highly prevalent patterns, reinforcing biases and overlooking the diverse expertise of clinicians. To overcome this limitation, we introduce the…

    Submitted 26 October, 2024; v1 submitted 27 September, 2024; originally announced October 2024.

    Comments: 39 pages

  3. arXiv:2409.06126

    eess.AS cs.SD

    VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion

    Authors: Kyungguen Byun, Jason Filos, Erik Visser, Sunkuk Moon

    Abstract: Noise suppression (NS) algorithms are effective in improving speech quality in many cases. However, aggressive noise suppression can damage the target speech, reducing both speech intelligibility and quality despite removing the noise. This study proposes an explicit speech restoration method using a voice conversion (VC) technique for restoration after noise suppression. We observed that high-qua…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 5 pages, 3 figures, submitted to ICASSP 2025

  4. LLM-driven Multimodal Target Volume Contouring in Radiation Oncology

    Authors: Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Yeona Cho, Ik Jae Lee, Jin Sung Kim, Jong Chul Ye

    Abstract: Target volume contouring for radiation therapy is considered significantly more challenging than the normal organ segmentation tasks as it necessitates the utilization of both image and text-based clinical information. Inspired by the recent advancement of large language models (LLMs) that can facilitate the integration of the textural information and images, here we present a novel LLM-driven mul…

    Submitted 24 October, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Published in Nature Communications, see https://www.nature.com/articles/s41467-024-53387-y

    Journal ref: Nat Commun 15, 9186 (2024)

  5. arXiv:2309.03364

    cs.SD eess.AS

    Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model with Frame-level Prosody Feature

    Authors: Kyungguen Byun, Sunkuk Moon, Erik Visser

    Abstract: We propose a highly controllable voice manipulation system that can perform any-to-any voice conversion (VC) and prosody modulation simultaneously. State-of-the-art VC systems can transfer sentence-level characteristics such as speaker, emotion, and speaking style. However, manipulating the frame-level prosody, such as pitch, energy and speaking rate, still remains challenging. Our proposed model…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures, submitted to ICASSP 2024

  6. arXiv:2309.02730

    eess.AS cs.AI cs.SD

    Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

    Authors: Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser

    Abstract: While many recent any-to-any voice conversion models succeed in transferring some target speech's style information to the converted speech, they still lack the ability to faithfully reproduce the speaking style of the target speaker. In this work, we propose a novel method to extract rich style information from target utterances and to efficiently transfer it to source speech content without requ…

    Submitted 14 December, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures, 2 tables

  7. arXiv:1911.01635

    eess.AS cs.SD

    Emotional speech synthesis with rich and granularized control

    Authors: Se-Yun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang

    Abstract: This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is essential to determine embedding vectors representing the TTS input. We introduce an inter-to-intra emotional distance ratio algorithm to the embedding vectors that can minimize the distance to the target emotion…

    Submitted 5 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: Submitted to ICASSP 2020

  8. arXiv:1811.04769

    eess.AS cs.LG cs.SD

    ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems

    Authors: Eunwoo Song, Kyungguen Byun, Hong-Goo Kang

    Abstract: This paper proposes a WaveNet-based neural excitation model (ExcitNet) for statistical parametric speech synthesis systems. Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an auto-regressive framework. However, they often suffer from noisy outputs because of…

    Submitted 21 August, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: Accepted to the conference of EUSIPCO 2019. arXiv admin note: text overlap with arXiv:1811.03311

  9. arXiv:1811.03311

    eess.AS cs.LG cs.SD

    Speaker-adaptive neural vocoders for parametric speech synthesis systems

    Authors: Eunwoo Song, Jin-Seob Kim, Kyungguen Byun, Hong-Goo Kang

    Abstract: This paper proposes speaker-adaptive neural vocoders for parametric text-to-speech (TTS) systems. Recently proposed WaveNet-based neural vocoding systems successfully generate a time sequence of speech signal with an autoregressive framework. However, it remains a challenge to synthesize high-quality speech when the amount of a target speaker's training data is insufficient. To generate more natur…

    Submitted 1 August, 2020; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: Accepted to the IEEE Workshop of MMSP 2020