Showing 51–100 of 1,041 results for author: Yoon, S

  1. arXiv:2503.04966  [pdf, other]

    eess.IV cs.AI cs.CV

    Prediction of Frozen Region Growth in Kidney Cryoablation Intervention Using a 3D Flow-Matching Model

    Authors: Siyeop Yoon, Yujin Oh, Matthew Tivnan, Sifan Song, Pengfei Jin, Sekeun Kim, Hyun Jin Cho, Dufan Wu, Raul Uppot, Quanzheng Li

    Abstract: This study presents a 3D flow-matching model designed to predict the progression of the frozen region (iceball) during kidney cryoablation. Precise intraoperative guidance is critical in cryoablation to ensure complete tumor eradication while preserving adjacent healthy tissue. However, conventional methods, typically based on physics driven or diffusion based simulations, are computationally dema…

    Submitted 11 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: MICCAI 2025 submitted version (author list included)

  2. arXiv:2503.00030  [pdf, other]

    cs.LG cs.AI

    Game-Theoretic Regularized Self-Play Alignment of Large Language Models

    Authors: Xiaohang Tang, Sangwoong Yoon, Seongho Son, Huizhuo Yuan, Quanquan Gu, Ilija Bogunovic

    Abstract: Self-play alignment algorithms have been developed as effective methods for fine-tuning large language models (LLMs), formulating preference optimization as a two-player game. However, the regularization with respect to the reference policy, which is crucial for mitigating over-optimization, has been insufficiently investigated in self-play alignment. In this paper, we show that our regularization…

    Submitted 24 February, 2025; originally announced March 2025.

    Comments: Preprint

  3. arXiv:2502.19765  [pdf, ps, other]

    cs.CL cs.LG

    EdiText: Controllable Coarse-to-Fine Text Editing with Diffusion Language Models

    Authors: Che Hyun Lee, Heeseung Kim, Jiheum Yeom, Sungroh Yoon

    Abstract: We propose EdiText, a controllable text editing method that modifies the reference text to desired attributes at various scales. We integrate an SDEdit-based editing technique that allows for broad adjustments in the degree of text editing. Additionally, we introduce a novel fine-level editing method based on self-conditioning, which allows subtle control of reference text. While being capable of…

    Submitted 2 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: ACL 2025

  4. arXiv:2502.19759  [pdf, other]

    cs.SD eess.AS

    Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models

    Authors: Heeseung Kim, Che Hyun Lee, Sangkwon Park, Jiheum Yeom, Nohil Park, Sangwon Yu, Sungroh Yoon

    Abstract: Recent advancements in multi-turn voice interaction models have improved user-model communication. However, while closed-source models effectively retain and recall past utterances, whether open-source models share this ability remains unexplored. To fill this gap, we systematically evaluate how well open-source interaction models utilize past utterances using ContextDialog, a benchmark we propose…

    Submitted 23 May, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: ACL 2025 Findings, Project Page: https://contextdialog.github.io/

  5. arXiv:2502.19207  [pdf, other]

    cs.CL cs.AI

    FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

    Authors: Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung

    Abstract: Various studies have attempted to remove sensitive or private knowledge from a language model to prevent its unauthorized exposure. However, prior studies have overlooked the complex and interconnected nature of knowledge, where related knowledge must be carefully examined. Specifically, they have failed to evaluate whether an unlearning method faithfully erases interconnected knowledge that shoul…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 16 pages

  6. Letters from Future Self: Augmenting the Letter-Exchange Exercise with LLM-based Agents to Enhance Young Adults' Career Exploration

    Authors: Hayeon Jeon, Suhwoo Yoon, Keyeun Lee, Seo Hyeong Kim, Esther Hehsun Kim, Seonghye Cho, Yena Ko, Soeun Yang, Laura Dabbish, John Zimmerman, Eun-mee Kim, Hajin Lim

    Abstract: Young adults often encounter challenges in career exploration. Self-guided interventions, such as the letter-exchange exercise, where participants envision and adopt the perspective of their future selves by exchanging letters with their envisioned future selves, can support career development. However, the broader adoption of such interventions may be limited without structured guidance. To addre…

    Submitted 5 May, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: 21 pages, 9 figures, Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (Best Paper Award, Top 1%)

  7. arXiv:2502.13952  [pdf, other]

    astro-ph.IM physics.ins-det

    Characterization of a TES-based Anti-Coincidence Detector for Future Large Field-of-View X-ray Calorimetry Missions

    Authors: Samuel V. Hull, Joseph S. Adams, Simon R. Bandler, Matthew Cherry, James A. Chervenak, Renata Cumbee, Xavier Defay, Enectali Figueroa-Feliciano, Fred M. Finkbeiner, Joshua Fuhrman, Richard L. Kelley, Christopher Kenney, Caroline A. Kilbourne, Noah Kurinsky, Jennette Mateo, Haruka Muramatsu, Frederick S. Porter, Kazuhiro Sakai, Aviv Simchony, Stephen J. Smith, Zoe Smith, Nicholas A. Wakeham, Edward J. Wassell, Sang H. Yoon, Betty A. Young

    Abstract: Microcalorimeter instruments aboard future X-ray observatories will require an anti-coincidence (anti-co) detector to veto charged particle events and reduce the non-X-ray background. We have developed a large-format, TES-based prototype anti-coincidence detector that is particularly suitable for use with spatially-extended (~10 cm^2) TES microcalorimeter arrays, as would be used for a future la…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: 26 pages, 16 figures

  8. arXiv:2502.13280  [pdf, other]

    cs.LG

    Value Gradient Sampler: Sampling as Sequential Decision Making

    Authors: Sangwoong Yoon, Himchan Hwang, Hyeokju Jeong, Dong Kyu Shin, Che-Sang Park, Sehee Kweon, Frank Chongwoo Park

    Abstract: We propose the Value Gradient Sampler (VGS), a trainable sampler based on the interpretation of sampling as discrete-time sequential decision-making. VGS generates samples from a given unnormalized density (i.e., energy) by drifting and diffusing randomly initialized particles. In VGS, finding the optimal drift is equivalent to solving an optimal control problem where the cost is the upper bound o…

    Submitted 1 March, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: Code: https://github.com/swyoon/value-gradient-sampler/

  9. arXiv:2502.11767  [pdf, ps, other]

    cs.LG cs.CL

    From Selection to Generation: A Survey of LLM-based Active Learning

    Authors: Yu Xia, Subhojyoti Mukherjee, Zhouhang Xie, Junda Wu, Xintong Li, Ryan Aponte, Hanjia Lyu, Joe Barrow, Hongjie Chen, Franck Dernoncourt, Branislav Kveton, Tong Yu, Ruiyi Zhang, Jiuxiang Gu, Nesreen K. Ahmed, Yu Wang, Xiang Chen, Hanieh Deilamsalehy, Sungchul Kim, Zhengmian Hu, Yue Zhao, Nedim Lipka, Seunghyun Yoon, Ting-Hao Kenneth Huang, Zichao Wang, et al. (9 additional authors not shown)

    Abstract: Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the incre…

    Submitted 31 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: ACL 2025

  10. arXiv:2502.10764  [pdf, other]

    cs.LG

    Learning to Explain Air Traffic Situation

    Authors: Hong-ah Chai, Seokbin Yoon, Keumjin Lee

    Abstract: Understanding how air traffic controllers construct a mental 'picture' of complex air traffic situations is crucial but remains a challenge due to the inherently intricate, high-dimensional interactions between aircraft, pilots, and controllers. Previous work on modeling the strategies of air traffic controllers and their mental image of traffic situations often centers on specific air traffic con…

    Submitted 27 May, 2025; v1 submitted 15 February, 2025; originally announced February 2025.

    Comments: 5 pages, 3 figures, minor revisions to address reviewer feedback for final submission to the First US-Europe Air Transportation Research and Development (ATRD) Symposium

  11. arXiv:2502.09793  [pdf, other]

    cs.CV

    Noise Controlled CT Super-Resolution with Conditional Diffusion Model

    Authors: Yuang Wang, Siyeop Yoon, Rui Hu, Baihui Yu, Duhgoon Lee, Rajiv Gupta, Li Zhang, Zhiqiang Chen, Dufan Wu

    Abstract: Improving the spatial resolution of CT images is a meaningful yet challenging task, often accompanied by the issue of noise amplification. This article introduces an innovative framework for noise-controlled CT super-resolution utilizing the conditional diffusion model. The model is trained on hybrid datasets, combining noise-matched simulation data with segmented details from real data. Experimen…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: The 8th International Conference on Image Formation in X-Ray Computed Tomography, Bamberg, Germany, August 5 - 9, 2024

  12. arXiv:2502.08662  [pdf, ps, other]

    cs.CL cs.AI

    RoToR: Towards More Reliable Responses for Order-Invariant Inputs

    Authors: Soyoung Yoon, Dongha Ahn, Youngwon Lee, Minkyu Jung, HyungJoo Jang, Seung-won Hwang

    Abstract: Mitigating positional bias of language models (LMs) for listwise inputs is a well-known and important problem (e.g., lost-in-the-middle). While zero-shot order-invariant LMs have been proposed to solve this issue, their success on practical listwise problems has been limited. In this work, as a first contribution, we identify and overcome two limitations to make zero-shot invariant LMs more practi…

    Submitted 2 June, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted at ACL 2025 main

  13. arXiv:2502.06802  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Solving the Content Gap in Roblox Game Recommendations: LLM-Based Profile Generation and Reranking

    Authors: Chen Wang, Xiaokai Wei, Yexi Jiang, Frank Ong, Kevin Gao, Xiao Yu, Zheng Hui, Se-eun Yoon, Philip Yu, Michelle Gong

    Abstract: With the vast and dynamic user-generated content on Roblox, creating effective game recommendations requires a deep understanding of game content. Traditional recommendation models struggle with the inconsistent and sparse nature of game text features such as titles and descriptions. Recent advancements in large language models (LLMs) offer opportunities to enhance recommendation systems by analyz…

    Submitted 1 February, 2025; originally announced February 2025.

  14. arXiv:2502.05167  [pdf, other]

    cs.CL

    NoLiMa: Long-Context Evaluation Beyond Literal Matching

    Authors: Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Trung Bui, Ryan A. Rossi, Seunghyun Yoon, Hinrich Schütze

    Abstract: Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves retrieving a "needle" (relevant information) from a "haystack" (long irrelevant context). Extensions of this approach include increasing distractors, fact chaining, and in-context reasoning. However, in…

    Submitted 26 March, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

  15. arXiv:2502.05055  [pdf]

    cs.CV cs.AI cs.GR cs.LG

    Differentiable Mobile Display Photometric Stereo

    Authors: Gawoon Ban, Hyeongjun Kim, Seokjun Choi, Seungwoo Yoon, Seung-Hwan Baek

    Abstract: Display photometric stereo uses a display as a programmable light source to illuminate a scene with diverse illumination conditions. Recently, differentiable display photometric stereo (DDPS) demonstrated improved normal reconstruction accuracy by using learned display patterns. However, DDPS faced limitations in practicality, requiring a fixed desktop imaging setup using a polarization camera and…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 9 pages

  16. arXiv:2502.01419  [pdf, other]

    cs.CV cs.AI

    Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models

    Authors: Mingi Jung, Saehuyng Lee, Eunji Kim, Sungroh Yoon

    Abstract: Detailed image captioning is essential for tasks like data generation and aiding visually impaired individuals. High-quality captions require a balance between precision and recall, which remains challenging for current multimodal large language models (MLLMs). In this work, we hypothesize that this limitation stems from weakening and increasingly noisy visual attention as responses lengthen. To a…

    Submitted 3 February, 2025; originally announced February 2025.

    ACM Class: I.2.7

  17. arXiv:2502.01059  [pdf, other]

    cs.CL cs.AI

    Knowledge Synthesis of Photosynthesis Research Using a Large Language Model

    Authors: Seungri Yoon, Woosang Jeon, Sanghyeok Choi, Taehyeong Kim, Tae In Ahn

    Abstract: The development of biological data analysis tools and large language models (LLMs) has opened up new possibilities for utilizing AI in plant science research, with the potential to contribute significantly to knowledge integration and research gap identification. Nonetheless, current LLMs struggle to handle complex biological data and theoretical models in photosynthesis research and often fail to…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: 17 pages, 6 figures

  18. arXiv:2502.00654  [pdf, other]

    cs.CV

    EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis

    Authors: Junuk Cha, Seongro Yoon, Valeriya Strizhkova, Francois Bremond, Seungryul Baek

    Abstract: 3D Gaussian splatting-based talking head synthesis has recently gained attention for its ability to render high-fidelity images with real-time inference speed. However, since it is typically trained on only a short video that lacks the diversity in facial emotions, the resultant talking heads struggle to represent a wide range of emotions. To address this issue, we propose a lip-aligned emotional…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: 22 pages

  19. arXiv:2502.00619  [pdf, other]

    eess.IV cs.AI cs.CV

    Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective

    Authors: Yujin Oh, Pengfei Jin, Sangjoon Park, Sekeun Kim, Siyeop Yoon, Kyungsang Kim, Jin Sung Kim, Xiang Li, Quanzheng Li

    Abstract: Ensuring fairness in medical image segmentation is critical due to biases in imbalanced clinical data acquisition caused by demographic attributes (e.g., age, sex, race) and clinical factors (e.g., disease severity). To address these challenges, we introduce Distribution-aware Mixture of Experts (dMoE), inspired by optimal control theory. We provide a comprehensive analysis of its underlying mecha…

    Submitted 27 May, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: ICML 2025 spotlight, see https://openreview.net/forum?id=BUONdewsBa

  20. The SPHEREx Target List of Ice Sources (SPLICES)

    Authors: Matthew L. N. Ashby, Joseph L. Hora, Kiran Lakshmipathaiah, Sarita Vig, Rama Krishna Sai Subrahmanyam Gorthi, Miju Kang, Volker Tolls, Gary J. Melnick, Michael W. Werner, Brendan P. Crill, Daniel C. Masters, Carlos Contreras Pena, Jeong-Eun Lee, Jaeyeong Kim, Ho-Gyu Lee, Sung-Yong Yoon, Soung-Chul Yang, Nicholas Flagey, Bertrand Mennesson

    Abstract: One of the primary objectives of the SPHEREx mission is to understand the origin of molecules such as H2O, CO2, and other volatile compounds at the early stages of planetary system formation. Because the vast majority of these compounds -- typically exceeding 95% -- exist in the solid phase rather than the gaseous phase in the systems of concern here, the observing strategy planned to characterize…

    Submitted 29 January, 2025; originally announced January 2025.

    Comments: Published by ApJ. 21 pages, 6 figures. This article documents the original version of SPLICES (7.1). The current version as well as the complete catalog is publicly available along with release notes documenting all additions and changes at the NASA/IPAC Infrared Science Archive (IRSA) at this URL: https://irsa.ipac.caltech.edu/data/SPHEREx/SPLICES/

    Journal ref: ApJ, 949, 105 (2023)

  21. arXiv:2501.11225  [pdf, other]

    cond-mat.mtrl-sci cs.CV eess.IV

    CNN-based TEM image denoising from first principles

    Authors: Jinwoong Chae, Sungwook Hong, Sungkyu Kim, Sungroh Yoon, Gunn Kim

    Abstract: Transmission electron microscope (TEM) images are often corrupted by noise, hindering their interpretation. To address this issue, we propose a deep learning-based approach using simulated images. Using density functional theory calculations with a set of pseudo-atomic orbital basis sets, we generate highly accurate ground truth images. We introduce four types of noise into these simulations to cr…

    Submitted 19 January, 2025; originally announced January 2025.

    Comments: 10 pages and 4 figures

  22. arXiv:2501.10913  [pdf, other]

    cs.CV cs.CL

    Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP

    Authors: Junsung Park, Jungbeom Lee, Jongyoon Song, Sangwon Yu, Dahuin Jung, Sungroh Yoon

    Abstract: While CLIP has significantly advanced multimodal understanding by bridging vision and language, the inability to grasp negation - such as failing to differentiate concepts like "parking" from "no parking" - poses substantial challenges. By analyzing the data used in the public CLIP model's pre-training, we posit this limitation stems from a lack of negation-inclusive data. To address this, we intr…

    Submitted 31 March, 2025; v1 submitted 18 January, 2025; originally announced January 2025.

  23. arXiv:2501.04970  [pdf, other]

    cs.LG cs.AI

    Battling the Non-stationarity in Time Series Forecasting via Test-time Adaptation

    Authors: HyunGi Kim, Siwon Kim, Jisoo Mok, Sungroh Yoon

    Abstract: Deep Neural Networks have spearheaded remarkable advancements in time series forecasting (TSF), one of the major tasks in time series modeling. Nonetheless, the non-stationarity of time series undermines the reliability of pre-trained source time series forecasters in mission-critical deployment settings. In this study, we introduce a pioneering test-time adaptation framework tailored for TSF (TSF…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: Accepted at AAAI 2025

  24. arXiv:2501.01409  [pdf, other]

    cs.CV cs.AI

    JOG3R: Towards 3D-Consistent Video Generators

    Authors: Chun-Hao Paul Huang, Niloy Mitra, Hyeonho Jeong, Jae Shin Yoon, Duygu Ceylan

    Abstract: Emergent capabilities of image generators have led to many impactful zero- or few-shot applications. Inspired by this success, we investigate whether video generators similarly exhibit 3D-awareness. Using structure-from-motion as a 3D-aware task, we test if intermediate features of a video generator - OpenSora in our case - can support camera pose estimation. Surprisingly, at first, we only find a…

    Submitted 26 March, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

  25. arXiv:2412.18711  [pdf, other]

    hep-ex

    Measurement of reactor antineutrino oscillation amplitude and frequency using 3800 days of complete data sample of the RENO experiment

    Authors: S. Jeon, H. I. Kim, J. H. Choi, H. I. Jang, J. S. Jang, K. K. Joo, D. E. Jung, J. G. Kim, J. H. Kim, J. Y. Kim, S. B. Kim, S. Y. Kim, W. Kim, E. Kwon, D. H. Lee, H. G. Lee, W. J. Lee, I. T. Lim, D. H. Moon, M. Y. Pac, J. S. Park, R. G. Park, H. Seo, J. W. Seo, C. D. Shin, et al. (5 additional authors not shown)

    Abstract: We report an updated neutrino mixing angle of $\theta_{13}$ obtained from a complete data sample of the RENO experiment. The experiment has measured the amplitude and frequency of reactor anti-electron-neutrinos ($\bar{\nu}_{e}$) oscillations at the Hanbit nuclear power plant, Younggwang, Korea, since August 2011. As of March 2023, the data acquisition was completed after a total of 3800 live days of detec…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: 13 pages, 11 figures

  26. arXiv:2412.15484  [pdf, ps, other]

    cs.CV

    Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage

    Authors: Saehyung Lee, Seunghyun Yoon, Trung Bui, Jing Shi, Sungroh Yoon

    Abstract: Multimodal large language models (MLLMs) excel at generating highly detailed captions but often produce hallucinations. Our analysis reveals that existing hallucination detection methods struggle with detailed captions. We attribute this to the increasing reliance of MLLMs on their generated text, rather than the input image, as the sequence length grows. To address this issue, we propose a multia…

    Submitted 29 May, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

    Comments: ICML 2025

  27. arXiv:2412.14568  [pdf, other]

    cs.CV

    Improving Geometry in Sparse-View 3DGS via Reprojection-based DoF Separation

    Authors: Yongsung Kim, Minjun Park, Jooyoung Choi, Sungroh Yoon

    Abstract: Recent learning-based Multi-View Stereo models have demonstrated state-of-the-art performance in sparse-view 3D reconstruction. However, directly applying 3D Gaussian Splatting (3DGS) as a refinement step following these models presents challenges. We hypothesize that the excessive positional degrees of freedom (DoFs) in Gaussians induce geometry distortion, fitting color patterns at the cost of s…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: 11 pages

  28. arXiv:2412.13875  [pdf, other]

    cs.CV

    Denoising Nearest Neighbor Graph via Continuous CRF for Visual Re-ranking without Fine-tuning

    Authors: Jaeyoon Kim, Yoonki Cho, Taeyong Kim, Sung-Eui Yoon

    Abstract: Visual re-ranking using Nearest Neighbor graph (NN graph) has been adapted to yield high retrieval accuracy, since it is beneficial to exploring a high-dimensional manifold and applicable without additional fine-tuning. The quality of visual re-ranking using NN graph, however, is limited to that of connectivity, i.e., edges of the NN graph. Some edges can be misconnected with negative images. Thi…

    Submitted 18 December, 2024; originally announced December 2024.

  29. arXiv:2412.13734  [pdf, other]

    cs.CV

    Text2Relight: Creative Portrait Relighting with Text Guidance

    Authors: Junuk Cha, Mengwei Ren, Krishna Kumar Singh, He Zhang, Yannick Hold-Geoffroy, Seunghyun Yoon, HyunJoon Jung, Jae Shin Yoon, Seungryul Baek

    Abstract: We present a lighting-aware image editing pipeline that, given a portrait image and a text prompt, performs single image relighting. Our model modifies the lighting and color of both the foreground and background to align with the provided text description. The unbounded nature in creativeness of a text allows us to describe the lighting of a scene with any sensory features including temperature,…

    Submitted 18 December, 2024; originally announced December 2024.

  30. arXiv:2412.13646  [pdf, other]

    cs.NI

    Transmit What You Need: Task-Adaptive Semantic Communications for Visual Information

    Authors: Jeonghun Park, Sung Whan Yoon

    Abstract: Recently, semantic communications have drawn great attention as the groundbreaking concept surpasses the limited capacity of Shannon's theory. Specifically, semantic communications probably become crucial in realizing visual tasks that demand massive network traffic. Although highly distinctive forms of visual semantics exist for computer vision tasks, a thorough investigation of what visual seman…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  31. arXiv:2412.13501  [pdf, other]

    cs.AI cs.HC

    GUI Agents: A Survey

    Authors: Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Thien Huu Nguyen, et al. (4 additional authors not shown)

    Abstract: Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and funda…

    Submitted 17 December, 2024; originally announced December 2024.

  32. arXiv:2412.13422  [pdf, other]

    cs.AI cs.SE

    Generating Diverse Hypotheses for Inductive Reasoning

    Authors: Kang-il Lee, Hyukhun Koh, Dongryeol Lee, Seunghyun Yoon, Minsung Kim, Kyomin Jung

    Abstract: Inductive reasoning - the process of inferring general rules from a small number of observations - is a fundamental aspect of human intelligence. Recent works suggest that large language models (LLMs) can engage in inductive reasoning by sampling multiple hypotheses about the rules and selecting the one that best explains the observations. However, due to the IID sampling, semantically redundant h…

    Submitted 8 February, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: NAACL 2025

  33. arXiv:2412.10436  [pdf, other]

    cs.CV cs.LG

    Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation

    Authors: SeungBum Ha, Taehwan Lee, Jiyoun Lim, Sung Whan Yoon

    Abstract: Federated learning (FL) has recently garnered attention as a data-decentralized training framework that enables the learning of deep models from locally distributed samples while keeping data privacy. Built upon the framework, immense efforts have been made to establish FL benchmarks, which provide rigorous evaluation settings that control data heterogeneity across clients. Prior efforts have main…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  34. arXiv:2412.09921  [pdf, other]

    cs.CV

    FaceShield: Defending Facial Image against Deepfake Threats

    Authors: Jaehwan Jeong, Sumin In, Sieun Kim, Hannie Shin, Jongheon Jeong, Sang Ho Yoon, Jaewook Chung, Sangpil Kim

    Abstract: The rising use of deepfakes in criminal activities presents a significant issue, inciting widespread controversy. While numerous studies have tackled this problem, most primarily focus on deepfake detection. These reactive solutions are insufficient as a fundamental approach for crimes where authenticity is disregarded. Existing proactive defenses also have limitations, as they are effective only…

    Submitted 10 March, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

  35. arXiv:2412.04680  [pdf, other]

    cs.CV

    Superpixel Tokenization for Vision Transformers: Preserving Semantic Integrity in Visual Tokens

    Authors: Jaihyun Lew, Soohyuk Jang, Jaehoon Lee, Seungryong Yoo, Eunji Kim, Saehyung Lee, Jisoo Mok, Siwon Kim, Sungroh Yoon

    Abstract: Transformers, a groundbreaking architecture proposed for Natural Language Processing (NLP), have also achieved remarkable success in Computer Vision. A cornerstone of their success lies in the attention mechanism, which models relationships among tokens. While the tokenization process in NLP inherently ensures that a single token does not contain multiple semantics, the tokenization of Vision Tran…

    Submitted 24 March, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: Project page: https://github.com/jangsoohyuk/SuiT

  36. arXiv:2412.03736  [pdf, other]

    cs.CL

    Domain-specific Question Answering with Hybrid Search

    Authors: Dewang Sultania, Zhaoyu Lu, Twisha Naik, Franck Dernoncourt, David Seunghyun Yoon, Sanat Sharma, Trung Bui, Ashok Gupta, Tushar Vatsa, Suhas Suresha, Ishita Verma, Vibha Belavadi, Cheng Chen, Michael Friedrich

    Abstract: Domain specific question answering is an evolving field that requires specialized solutions to address unique challenges. In this paper, we show that a hybrid approach combining a fine-tuned dense retriever with keyword based sparse search methods significantly enhances performance. Our system leverages a linear combination of relevance signals, including cosine similarity from dense retrieval, BM…

    Submitted 21 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

    Comments: AAAI-25 Workshop on Document Understanding and Intelligence

  37. arXiv:2412.01756  [pdf, other]

    cs.CR cs.LG

    Adversarial Sample-Based Approach for Tighter Privacy Auditing in Final Model-Only Scenarios

    Authors: Sangyeon Yoon, Wonje Jeung, Albert No

    Abstract: Auditing Differentially Private Stochastic Gradient Descent (DP-SGD) in the final model setting is challenging and often results in empirical lower bounds that are significantly looser than theoretical privacy guarantees. We introduce a novel auditing method that achieves tighter empirical lower bounds without additional assumptions by crafting worst-case adversarial samples through loss-based inp…

    Submitted 24 February, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: 10 pages, NeurIPS (SFLLM Workshop)

  38. arXiv:2412.01140  [pdf, other]

    cs.CV eess.IV

    Dense Dispersed Structured Light for Hyperspectral 3D Imaging of Dynamic Scenes

    Authors: Suhyun Shin, Seungwoo Yoon, Ryota Maeda, Seung-Hwan Baek

    Abstract: Hyperspectral 3D imaging captures both depth maps and hyperspectral images, enabling comprehensive geometric and material analysis. Recent methods achieve high spectral and depth accuracy; however, they require long acquisition times often over several minutes or rely on large, expensive systems, restricting their use to static scenes. We present Dense Dispersed Structured Light (DDSL), an accurat…

    Submitted 2 December, 2024; originally announced December 2024.

  39. arXiv:2411.19352  [pdf, other]

    cs.AI

    OMuleT: Orchestrating Multiple Tools for Practicable Conversational Recommendation

    Authors: Se-eun Yoon, Xiaokai Wei, Yexi Jiang, Rachit Pareek, Frank Ong, Kevin Gao, Julian McAuley, Michelle Gong

    Abstract: In this paper, we present a systematic effort to design, evaluate, and implement a realistic conversational recommender system (CRS). The objective of our system is to allow users to input free-form text to request recommendations, and then receive a list of relevant and diverse items. While previous work on synthetic queries augments large language models (LLMs) with 1-3 tools, we argue that a mo…

    Submitted 31 December, 2024; v1 submitted 28 November, 2024; originally announced November 2024.

  40. arXiv:2411.18040  [pdf, other]

    astro-ph.GA

    A New Rarity Assessment of the `Disk of Satellites': the Milky Way System Is the Exception Rather than the Rule in the $Λ$CDM Cosmology

    Authors: Chanoul Seo, Suk-Jin Yoon, Sanjaya Paudel, Sung-Ho An, Jun-Sung Moon

    Abstract: The majority of satellite galaxies around the Milky Way (MW) show disk-like distributions (the disk of satellites; DoS), which is a small-scale problem of the $Λ$CDM cosmology. The conventional definition of the MW-like DoS is a satellite system with a minor-to-major axis ratio ($c$/$a$) lower than the MW's $c$/$a$ value of 0.181. Here we question the validity of the $c$/$a$-based DoS rarity asses…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: 23 pages, 15 figures

  41. arXiv:2411.15466  [pdf, other]

    cs.CV

    Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

    Authors: Chaehun Shin, Jooyoung Choi, Heeseung Kim, Sungroh Yoon

    Abstract: Subject-driven text-to-image generation aims to produce images of a new subject within a desired context by accurately capturing both the visual characteristics of the subject and the semantic content of a text prompt. Traditional methods rely on time- and resource-intensive fine-tuning for subject alignment, while recent zero-shot approaches leverage on-the-fly image prompting, often sacrificing…

    Submitted 23 November, 2024; originally announced November 2024.

  42. arXiv:2411.14793  [pdf, other]

    cs.CV

    Style-Friendly SNR Sampler for Style-Driven Generation

    Authors: Jooyoung Choi, Chaehun Shin, Yeongtak Oh, Heeseung Kim, Jungbeom Lee, Sungroh Yoon

    Abstract: Recent text-to-image diffusion models generate high-quality images but struggle to learn new, personalized styles, which limits the creation of unique style templates. In style-driven generation, users typically supply reference images exemplifying the desired style, together with text prompts that specify desired stylistic attributes. Previous approaches popularly rely on fine-tuning, yet it ofte…

    Submitted 20 March, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: Project page: https://stylefriendly.github.io/

  43. arXiv:2411.13036  [pdf, other]

    cs.CV cs.AI

    Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization

    Authors: Sanghyeob Song, Jaihyun Lew, Hyemi Jang, Sungroh Yoon

    Abstract: Estimating the homography between two images is crucial for mid- or high-level vision tasks, such as image stitching and fusion. However, using supervised learning methods is often challenging or costly due to the difficulty of collecting ground-truth data. In response, unsupervised learning approaches have emerged. Most early methods, though, assume that the given image pairs are from the same ca…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: This paper is accepted to the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)

  44. arXiv:2411.11471  [pdf, other]

    cs.CV

    Generalizable Person Re-identification via Balancing Alignment and Uniformity

    Authors: Yoonki Cho, Jaeyoon Kim, Woo Jae Kim, Junsik Jung, Sung-eui Yoon

    Abstract: Domain generalizable person re-identification (DG re-ID) aims to learn discriminative representations that are robust to distributional shifts. While data augmentation is a straightforward solution to improve generalization, certain augmentations exhibit a polarized effect in this task, enhancing in-distribution performance while deteriorating out-of-distribution performance. In this paper, we inv…

    Submitted 18 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  45. Discovery of a Rare Group of Dwarf Galaxies in the Local Universe

    Authors: Sanjaya Paudel, Cristiano G. Sabiu, Suk-Jin Yoon, Pierre-Alain Duc, Jaewon Yoo, Oliver Müller

    Abstract: We report the discovery of a rare isolated group of five dwarf galaxies located at z = 0.0086 ($D$ = 36 Mpc). All member galaxies are star-forming, blue, and gas-rich with $g-r$ indices ranging from 0.2 to 0.6 mag, and two of them show signs of ongoing mutual interaction. The most massive member of the group has a stellar mass that is half of the Small Magellanic Cloud stellar mass, and the median…

    Submitted 15 November, 2024; originally announced November 2024.

    Comments: Accepted for publication in ApJL

  46. arXiv:2411.09944  [pdf, other]

    cs.CL

    SlimLM: An Efficient Small Language Model for On-Device Document Assistance

    Authors: Thang M. Pham, Phat T. Nguyen, Seunghyun Yoon, Viet Dac Lai, Franck Dernoncourt, Trung Bui

    Abstract: While small language models (SLMs) show promise for mobile deployment, their real-world performance and applications on smartphones remain underexplored. We present SlimLM, a series of SLMs optimized for document assistance tasks on mobile devices. Through extensive experiments on a Samsung Galaxy S24, we identify the optimal trade-offs between model size (ranging from 125M to 7B parameters), co…

    Submitted 25 November, 2024; v1 submitted 14 November, 2024; originally announced November 2024.

  47. arXiv:2411.08378  [pdf, other]

    cs.LG cs.AI

    Physics Informed Distillation for Diffusion Models

    Authors: Joshua Tian Jin Tee, Kang Zhang, Hee Suk Yoon, Dhananjaya Nagaraja Gowda, Chanwoo Kim, Chang D. Yoo

    Abstract: Diffusion models have recently emerged as a potent tool in generative modeling. However, their inherent iterative nature often results in sluggish image generation due to the requirement for multiple model evaluations. Recent progress has unveiled the intrinsic link between diffusion models and Probability Flow Ordinary Differential Equations (ODEs), thus enabling us to conceptualize diffusion mod…

    Submitted 13 November, 2024; originally announced November 2024.

  48. arXiv:2411.07542  [pdf, other]

    astro-ph.HE astro-ph.SR

    Radio Follow-up Observations of SN 2023ixf by Japanese and Korean VLBIs

    Authors: Yuhei Iwata, Masanori Akimoto, Tomoki Matsuoka, Keiichi Maeda, Yoshinori Yonekura, Nozomu Tominaga, Takashi J. Moriya, Kenta Fujisawa, Kotaro Niinuma, Sung-Chul Yoon, Jae-Joon Lee, Taehyun Jung, Do-Young Byun

    Abstract: We report on radio follow-up observations of the nearby Type II supernova, SN 2023ixf, spanning from 1.7 to 269.9 days after the explosion, conducted using three very long baseline interferometers (VLBIs), which are the Japanese VLBI Network (JVN), the VLBI Exploration of Radio Astrometry (VERA), and the Korean VLBI Network (KVN). In three observation epochs (152.3, 206.1, and 269.9 days), we dete…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: 12 pages, 3 figures, 3 tables. Accepted for publication in ApJ

  49. Moving Groups in the Solar Neighborhood with Gaia, APOGEE, GALAH, and LAMOST: Dynamical Effects Gather Gas and the Ensuing Star Formation Plays an Important Role in Shaping the Stellar Velocity Distributions

    Authors: Xilong Liang, Suk-Jin Yoon, Jingkun Zhao

    Abstract: With Gaia, APOGEE, GALAH, and LAMOST data, we investigate the positional, kinematic, chemical, and age properties of nine moving groups in the solar neighborhood. We find that each moving group has a distinct distribution in the velocity space in terms of its metallicity, $α$ abundance, and age. Comparison of the moving groups with their underlying background stars suggests that they have experien…

    Submitted 8 November, 2024; originally announced November 2024.

    Comments: 22 pages, 9 figures

    Journal ref: AJ 168 277 (2024)

  50. arXiv:2411.05793  [pdf, other]

    cs.LG cs.AI

    A Comprehensive Survey of Deep Learning for Time Series Forecasting: Architectural Diversity and Open Challenges

    Authors: Jongseon Kim, Hyungjoon Kim, HyunGi Kim, Dongjun Lee, Sungroh Yoon

    Abstract: Time series forecasting is a critical task that provides key information for decision-making. After traditional statistical and machine learning approaches, various fundamental deep learning architectures such as MLPs, CNNs, RNNs, and GNNs have been developed. However, the structural limitations caused by the inductive biases of each deep learning architecture constrained their performance. Transf…

    Submitted 1 May, 2025; v1 submitted 24 October, 2024; originally announced November 2024.

    Comments: This is the accepted manuscript of the article published in Artificial Intelligence Review. The final authenticated version is available at: https://doi.org/10.1007/s10462-025-11223-9