
Showing 1–15 of 15 results for author: Yoon, H S

Searching in archive cs.
  1. arXiv:2507.04976  [pdf, ps, other]

    cs.CV cs.CL

    Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models

    Authors: Eunseop Yoon, Hee Suk Yoon, Mark A. Hasegawa-Johnson, Chang D. Yoo

    Abstract: In the broader context of deep learning, Multimodal Large Language Models have achieved significant breakthroughs by leveraging powerful Large Language Models as a backbone to align different modalities into the language space. A prime exemplification is the development of Video Large Language Models (Video-LLMs). While numerous advancements have been proposed to enhance the video understanding ca…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: ICLR 2025

  2. arXiv:2506.08712  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization

    Authors: Hee Suk Yoon, Eunseop Yoon, Mark Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo

    Abstract: We introduce ConfPO, a method for preference learning in Large Language Models (LLMs) that identifies and optimizes preference-critical tokens based solely on the training policy's confidence, without requiring any auxiliary models or compute. Unlike prior Direct Alignment Algorithms (DAAs) such as Direct Preference Optimization (DPO), which uniformly adjust all token probabilities regardless of t…

    Submitted 12 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  3. arXiv:2411.08378  [pdf, other]

    cs.LG cs.AI

    Physics Informed Distillation for Diffusion Models

    Authors: Joshua Tian Jin Tee, Kang Zhang, Hee Suk Yoon, Dhananjaya Nagaraja Gowda, Chanwoo Kim, Chang D. Yoo

    Abstract: Diffusion models have recently emerged as a potent tool in generative modeling. However, their inherent iterative nature often results in sluggish image generation due to the requirement for multiple model evaluations. Recent progress has unveiled the intrinsic link between diffusion models and Probability Flow Ordinary Differential Equations (ODEs), thus enabling us to conceptualize diffusion mod…

    Submitted 13 November, 2024; originally announced November 2024.

  4. arXiv:2408.05926  [pdf, other]

    cs.AI cs.LG cs.MM

    BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation

    Authors: Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Kang Zhang, Yu-Jung Heo, Du-Seong Chang, Chang D. Yoo

    Abstract: Multimodal Dialogue Response Generation (MDRG) is a recently proposed task where the model needs to generate responses in texts, images, or a blend of both based on the dialogue context. Due to the lack of a large-scale dataset specifically for this task and the benefits of leveraging powerful pre-trained models, previous work relies on the text modality as an intermediary step for both the image…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  5. arXiv:2408.05769  [pdf, other]

    cs.CL cs.SD eess.AS

    LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition

    Authors: Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime exemplification is TTA for Automatic Speech Recognition (ASR), which enhances model performance by leveraging output prediction entropy minimization as a self-supervision signal. However, a key limitation of this self-su…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: INTERSPEECH 2024

  6. arXiv:2407.16574  [pdf, other]

    cs.CL

    TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback

    Authors: Eunseop Yoon, Hee Suk Yoon, SooHwan Eom, Gunsoo Han, Daniel Wontae Nam, Daejin Jo, Kyoung-Woon On, Mark A. Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo

    Abstract: Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and tokens, which are autoregressively generated from the language model. Although several recent approaches have tri…

    Submitted 8 December, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: ACL 2024 Findings

  7. arXiv:2403.14119  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion

    Authors: Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo

    Abstract: In deep learning, test-time adaptation has gained attention as a method for model fine-tuning without the need for labeled data. A prime exemplification is the recently proposed test-time prompt tuning for large-scale vision-language models such as CLIP. Unfortunately, these prompts have been mainly developed to improve accuracy, overlooking the importance of calibration, which is a crucial aspect…

    Submitted 31 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  8. HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

    Authors: Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

    Abstract: Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts in developing VGD systems to improve the quality of their responses, existing systems are competent only to incorporate the information in the video and text and tend to struggle in extracting the necessary information f…

    Submitted 13 April, 2025; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: EMNLP 2023, 14 pages, 13 figures

  9. arXiv:2312.05790  [pdf, other]

    cs.LG cs.AI eess.SP

    SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation

    Authors: Hyun Ryu, Sunjae Yoon, Hee Suk Yoon, Eunseop Yoon, Chang D. Yoo

    Abstract: Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency…

    Submitted 19 January, 2025; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: AAAI 2024 camera-ready version w/ Appendix

  10. arXiv:2308.08442  [pdf, other]

    cs.CL cs.SD eess.AS

    Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

    Authors: Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

    Abstract: Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5 referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or parag…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023

  11. arXiv:2305.16371  [pdf, other]

    cs.CL cs.SD eess.AS

    INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

    Authors: Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained based on self-supervised speech representation learning. However, these pre-trained speech models suffer from representational bias as they tend to better represent those prominent accents (i.e., native (L1) English accent) in the pre-training speech corpus than less represented…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  12. arXiv:2303.02472  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure

    Authors: Hee Suk Yoon, Joshua Tian Jin Tee, Eunseop Yoon, Sunjae Yoon, Gwangsu Kim, Yingzhen Li, Chang D. Yoo

    Abstract: Studies have shown that modern neural networks tend to be poorly calibrated due to over-confident predictions. Traditionally, post-processing methods have been used to calibrate the model after training. In recent years, various trainable calibration measures have been proposed to incorporate them directly into the training process. However, these methods all incorporate internal hyperparameters,…

    Submitted 18 January, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: ICLR 2023

  13. arXiv:2212.07072  [pdf, other]

    cs.CL cs.LG

    SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation

    Authors: Hee Suk Yoon, Eunseop Yoon, John Harvill, Sunjae Yoon, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Word Sense Disambiguation (WSD) is an NLP task aimed at determining the correct sense of a word in a sentence from discrete sense choices. Although current systems have attained unprecedented performances for such tasks, the nonuniform distribution of word senses during training generally results in systems performing poorly on rare senses. To this end, we consider data augmentation to increase th…

    Submitted 21 December, 2022; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: EMNLP 2022

  14. Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue

    Authors: Sunjae Yoon, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

    Abstract: Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning to generate answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question. This is due to lear…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: 12 pages, Accepted in EMNLP 2022

  15. Selective Query-guided Debiasing for Video Corpus Moment Retrieval

    Authors: Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, Chang D. Yoo

    Abstract: Video moment retrieval (VMR) aims to localize target moments in untrimmed videos pertinent to a given textual query. Existing retrieval systems tend to rely on retrieval bias as a shortcut and thus, fail to sufficiently learn multi-modal interactions between query and video. This retrieval bias stems from learning frequent co-occurrence patterns between query and moments, which spuriously correlat…

    Submitted 13 April, 2025; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: 16 pages, 6 figures, Accepted in ECCV 2022

    Journal ref: In European Conference on Computer Vision (pp. 185-200). Springer, Cham (2022)