
Showing 1–13 of 13 results for author: Yejin

Searching in archive eess.
  1. arXiv:2505.17093

    eess.AS cs.CL

    Voicing Personas: Rewriting Persona Descriptions into Style Prompts for Controllable Text-to-Speech

    Authors: Yejin Lee, Jaehoon Kang, Kyuhong Shim

    Abstract: In this paper, we propose a novel framework to control voice style in prompt-based, controllable text-to-speech systems by leveraging textual personas as voice style prompts. We present two persona rewriting strategies to transform generic persona descriptions into speech-oriented prompts, enabling fine-grained manipulation of prosodic attributes such as pitch, emotion, and speaking rate. Experime…

    Submitted 20 May, 2025; originally announced May 2025.

  2. arXiv:2505.12887

    eess.IV cs.CV

    RetinaLogos: Fine-Grained Synthesis of High-Resolution Retinal Images Through Captions

    Authors: Junzhi Ning, Cheng Tang, Kaijin Zhou, Diping Song, Lihao Liu, Ming Hu, Wei Li, Yanzhou Su, Tianbing Li, Jiyao Liu, Yejin, Sheng Zhang, Yuanfeng Ji, Junjun He

    Abstract: The scarcity of high-quality, labelled retinal imaging data, which presents a significant challenge in the development of machine learning models for ophthalmology, hinders progress in the field. To synthesise Colour Fundus Photographs (CFPs), existing methods primarily relying on predefined disease labels face significant limitations. However, current methods remain limited, thus failing to gener…

    Submitted 24 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: The paper is being withdrawn due to issues with the authorship list. Specifically, one or more contributors were unintentionally omitted in the initial submission.

  3. arXiv:2409.18622

    cs.SD eess.AS

    Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech

    Authors: Youngjae Kim, Yejin Jeon, Gary Geunbae Lee

    Abstract: The difficulty of acquiring abundant, high-quality data, especially in multi-lingual contexts, has sparked interest in addressing low-resource scenarios. Moreover, current literature relies on fixed expressions from language IDs, which results in the inadequate learning of language representations, and the failure to generate speech in unseen languages. To address these challenges, we propose a nove…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024 Findings

  4. arXiv:2408.06065

    cs.CL cs.AI cs.SD eess.AS

    An Investigation Into Explainable Audio Hate Speech Detection

    Authors: Jinmyeong An, Wonjun Lee, Yejin Jeon, Jungseul Ok, Yunsu Kim, Gary Geunbae Lee

    Abstract: Research on hate speech has predominantly revolved around detection and interpretation from textual inputs, leaving verbal content largely unexplored. While there has been limited exploration into hate speech detection within verbal acoustic speech inputs, the aspect of interpretability has been overlooked. Therefore, we introduce a new task of explainable audio hate speech detection. Specifically…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to SIGDIAL 2024

  5. arXiv:2407.00570

    eess.SY

    An Application of Model Reference Adaptive Control for Multi-Agent Synchronization in Drone Networks

    Authors: Miguel F. Arevalo-Castiblanco, Yejin Wi, Marzia Cescon, Cesar A. Uribe

    Abstract: This paper presents the application of a Distributed Model Reference Adaptive Control (DMRAC) strategy for robust multi-agent synchronization of a network of drones. The proposed approach enables the development of controllers capable of accommodating differences in real-life model parameters between agents, thereby enhancing overall network performance. We compare the performance of the adaptive…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 8 pages, 13 figures, extended version of a conference paper

  6. arXiv:2404.02592

    cs.CL cs.SD eess.AS

    Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation

    Authors: Yejin Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Contemporary neural speech synthesis models have indeed demonstrated remarkable proficiency in synthetic speech generation as they have attained a level of quality comparable to that of human-produced speech. Nevertheless, it is important to note that these achievements have predominantly been verified within the context of high-resource languages such as English. Furthermore, the Tacotron and Fas…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024

  7. arXiv:2403.04111

    cs.SD eess.AS

    Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication

    Authors: Yejin Jeon, Gary Geunbae Lee

    Abstract: This paper explores the task of language-agnostic speaker replication, a novel endeavor that seeks to replicate a speaker's voice irrespective of the language they are speaking. Towards this end, we introduce a multi-level attention aggregation approach that systematically probes and amplifies various speaker-specific attributes in a hierarchical manner. Through rigorous evaluations across a wide…

    Submitted 3 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to EACL Main 2024

  8. arXiv:2401.02014

    cs.SD eess.AS

    Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations

    Authors: Yejin Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Zero-shot multi-speaker TTS aims to synthesize speech with the voice of a chosen target speaker without any fine-tuning. Prevailing methods, however, encounter limitations in adapting to new speakers in out-of-domain settings, primarily due to inadequate speaker disentanglement and content leakage. To overcome these constraints, we propose an innovative negation feature learning paradigm that mode…

    Submitted 5 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  9. arXiv:2312.01842

    cs.SD cs.AI eess.AS

    Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking

    Authors: Jihyun Lee, Yejin Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee

    Abstract: Dialogue state tracking plays a crucial role in extracting information in task-oriented dialogue systems. However, prior research is limited to textual modalities, primarily due to the shortage of authentic human audio datasets. We address this by investigating synthetic audio data for audio-based DST. To this end, we develop cascading and end-to-end models, train them with our synthetic audi…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted in ASRU 2023

  10. arXiv:2201.02639

    cs.CV cs.CL cs.LG cs.SD eess.AS

    MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

    Authors: Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi

    Abstract: As humans, we navigate a multimodal world, building a holistic understanding from all our senses. We introduce MERLOT Reserve, a model that represents videos jointly over time -- through a new training objective that learns from audio, subtitles, and video frames. Given a video, we replace snippets of text and audio with a MASK token; the model learns by choosing the correct masked-out snippet. Ou…

    Submitted 13 May, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: CVPR 2022. Project page at https://rowanzellers.com/merlotreserve

  11. arXiv:2112.08995

    cs.SD cs.CL cs.CV eess.AS

    Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

    Authors: Yanpeng Zhao, Jack Hessel, Youngjae Yu, Ximing Lu, Rowan Zellers, Yejin Choi

    Abstract: Machines that can represent and describe environmental soundscapes have practical potential, e.g., for audio tagging and captioning systems. Prevailing learning paradigms have been relying on parallel audio-text data, which is, however, scarcely available on the web. We propose VIP-ANT, which induces Audio-Text alignment without using any parallel audio-text data. Our key idea is t…

    Submitted 2 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted to NAACL 2022. Our code is available at https://github.com/zhaoyanpeng/vipant

  12. Multi-Channel Volumetric Neural Network for Knee Cartilage Segmentation in Cone-beam CT

    Authors: Jennifer Maier, Luis Carlos Rivera Monroy, Christopher Syben, Yejin Jeon, Jang-Hwan Choi, Mary Elizabeth Hall, Marc Levenston, Garry Gold, Rebecca Fahrig, Andreas Maier

    Abstract: Analyzing knee cartilage thickness and strain under load can help to further the understanding of the effects of diseases like osteoarthritis. A precise segmentation of the cartilage is a necessary prerequisite for this analysis. This segmentation task has mainly been addressed in Magnetic Resonance Imaging, and was rarely investigated on contrast-enhanced Computed Tomography, where contrast agent…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: 6 pages, accepted at BVM 2020

  13. arXiv:1905.05861

    eess.IV cs.IR q-bio.QM

    From Brain Imaging to Graph Analysis: a study on ADNI's patient cohort

    Authors: Rui Zhang, Luca Giancardo, Danilo A. Pena, Yejin Kim, Hanghang Tong, Xiaoqian Jiang

    Abstract: In this paper, we studied the association between the change of structural brain volumes and the potential development of Alzheimer's disease (AD). Using a simple abstraction technique, we converted regional cortical and subcortical volume differences over two time points for each study subject into a graph. We then obtained substructures of interest using a graph decomposition algorithm in order t…

    Submitted 14 May, 2019; originally announced May 2019.