Showing 1–50 of 258 results for author: Jeong, J

  1. arXiv:2507.04790  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.LG

    Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning

    Authors: Giwon Lee, Wooseong Jeong, Daehee Park, Jaewoo Jeong, Kuk-Jin Yoon

    Abstract: Motion planning is a crucial component of autonomous robot driving. While various trajectory datasets exist, effectively utilizing them for a target domain remains challenging due to differences in agent interactions and environmental characteristics. Conventional approaches, such as domain adaptation or ensemble learning, leverage multiple source datasets but suffer from domain imbalance, catastr…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Accepted at ICCV 2025

  2. arXiv:2507.04349  [pdf, ps, other]

    cs.SD eess.AS

    TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet

    Authors: Jaeseok Jeong, Yuna Lee, Mingi Kwon, Youngjung Uh

    Abstract: Recent advances in text-to-speech (TTS) have enabled natural speech synthesis, but fine-grained, time-varying emotion control remains challenging. Existing methods often allow only utterance-level control and require full model fine-tuning with a large emotion speech dataset, which can degrade performance. Inspired by adding conditional control to the existing model in ControlNet (Zhang et al, 202…

    Submitted 6 July, 2025; originally announced July 2025.

  3. arXiv:2506.24016  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations

    Authors: Hyunjong Kim, Sangyeop Kim, Jongheon Jeong, Yeongjae Cho, Sungzoon Cho

    Abstract: Recent advances in large language models and vision-language models have led to growing interest in explainable evaluation metrics for image captioning. However, these metrics generate explanations without standardized criteria, and the overall quality of the generated explanations remains unverified. In this paper, we propose EXPERT, a reference-free evaluation metric that provides structured exp…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Accepted at ACL 2025 Findings

  4. arXiv:2506.19174  [pdf, ps, other]

    cs.CV

    MOSCARD -- Causal Reasoning and De-confounding for Multimodal Opportunistic Screening of Cardiovascular Adverse Events

    Authors: Jialu Pi, Juan Maria Farina, Rimita Lahiri, Jiwoong Jeong, Archana Gurudu, Hyung-Bok Park, Chieh-Ju Chao, Chadi Ayoub, Reza Arsanjani, Imon Banerjee

    Abstract: Major Adverse Cardiovascular Events (MACE) remain the leading cause of mortality globally, as reported in the Global Disease Burden Study 2021. Opportunistic screening leverages data collected from routine health check-ups and multimodal data can play a key role to identify at-risk individuals. Chest X-rays (CXR) provide insights into chronic conditions contributing to major adverse cardiovascular…

    Submitted 23 June, 2025; originally announced June 2025.

  5. arXiv:2506.18248  [pdf, ps, other]

    cs.CV cs.AI

    Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability

    Authors: Jongoh Jeong, Hunmin Yang, Jaeseok Jeong, Kuk-Jin Yoon

    Abstract: Generative adversarial attacks train a perturbation generator on a white-box surrogate model and subsequently apply the crafted perturbations to unseen black-box victim models. In contrast to iterative attacks, these methods deliver superior inference-time efficiency, scalability, and transferability; however, up until now, existing studies have not fully exploited the representational capacity of…

    Submitted 2 July, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

  6. arXiv:2506.15948  [pdf, ps, other]

    cs.IT eess.IV

    Information-computation trade-offs in non-linear transforms

    Authors: Connor Ding, Abhiram Rao Gorle, Jiwon Jeong, Naomi Sagan, Tsachy Weissman

    Abstract: In this work, we explore the interplay between information and computation in non-linear transform-based compression for broad classes of modern information-processing tasks. We first investigate two emerging nonlinear data transformation frameworks for image compression: Implicit Neural Representations (INRs) and 2D Gaussian Splatting (GS). We analyze their representational properties, behavior u…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Authors listed in alphabetical order of last name

  7. arXiv:2506.15380  [pdf, ps, other]

    cs.RO

    Efficient Navigation Among Movable Obstacles using a Mobile Manipulator via Hierarchical Policy Learning

    Authors: Taegeun Yang, Jiwoo Hwang, Jeil Jeong, Minsung Yoon, Sung-Eui Yoon

    Abstract: We propose a hierarchical reinforcement learning (HRL) framework for efficient Navigation Among Movable Obstacles (NAMO) using a mobile manipulator. Our approach combines interaction-based obstacle property estimation with structured pushing strategies, facilitating the dynamic manipulation of unforeseen obstacles while adhering to a pre-planned global path. The high-level policy generates pushing…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 8 pages, 6 figures, Accepted to IROS 2025. Supplementary Video: https://youtu.be/sZ8_z7sYVP0

  8. arXiv:2506.09417  [pdf, ps, other]

    cs.CV

    ODG: Occupancy Prediction Using Dual Gaussians

    Authors: Yunxiao Shi, Yinhao Zhu, Shizhong Han, Jisoo Jeong, Amin Ansari, Hong Cai, Fatih Porikli

    Abstract: Occupancy prediction infers fine-grained 3D geometry and semantics from camera images of the surrounding environment, making it a critical perception task for autonomous driving. Existing methods either adopt dense grids as scene representation, which is difficult to scale to high resolution, or learn the entire scene using a single set of sparse queries, which is insufficient to handle the variou…

    Submitted 12 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

  9. arXiv:2506.08964  [pdf, other]

    cs.CV

    ORIDa: Object-centric Real-world Image Composition Dataset

    Authors: Jinwoo Kim, Sangmin Han, Jinho Jeong, Jiwoo Choi, Dongyoung Kim, Seon Joo Kim

    Abstract: Object compositing, the task of placing and harmonizing objects in images of diverse visual scenes, has become an important task in computer vision with the rise of generative models. However, existing datasets lack the diversity and scale required to comprehensively explore real-world scenarios. We introduce ORIDa (Object-centric Real-world Image Composition Dataset), a large-scale, real-captured…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Accepted at CVPR 2025

  10. arXiv:2506.07002  [pdf, ps, other]

    cs.CV

    BePo: Leveraging Birds Eye View and Sparse Points for Efficient and Accurate 3D Occupancy Prediction

    Authors: Yunxiao Shi, Hong Cai, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Amin Ansari, Fatih Porikli

    Abstract: 3D occupancy provides fine-grained 3D geometry and semantics for scene understanding which is critical for autonomous driving. Most existing methods, however, carry high compute costs, requiring dense 3D feature volume and cross-attention to effectively aggregate information. More recent works have adopted Bird's Eye View (BEV) or sparse points as scene representation with much reduced cost, but s…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: Two-page abstract version available at CVPR 2025 Embodied AI Workshop

  11. arXiv:2506.06261  [pdf, ps, other]

    cs.AI cs.LG

    Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

    Authors: Jihwan Jeong, Xiaoyu Wang, Jingmin Wang, Scott Sanner, Pascal Poupart

    Abstract: Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifie…

    Submitted 6 June, 2025; originally announced June 2025.

  12. arXiv:2506.03290  [pdf, other]

    cs.CV

    Learning Optical Flow Field via Neural Ordinary Differential Equation

    Authors: Leyla Mirvakhabova, Hong Cai, Jisoo Jeong, Hanno Ackermann, Farhad Zanjani, Fatih Porikli

    Abstract: Recent works on optical flow estimation use neural networks to predict the flow field that maps positions of one image to positions of the other. These networks consist of a feature extractor, a correlation volume, and finally several refinement steps. These refinement steps mimic the iterative refinements performed by classical optimization algorithms and are usually implemented by neural layers…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: CVPRW 2025

  13. arXiv:2506.02125  [pdf, ps, other]

    cs.AI

    Descriptive History Representations: Learning Representations by Answering Questions

    Authors: Guy Tennenholtz, Jihwan Jeong, Chih-Wei Hsu, Yinlam Chow, Craig Boutilier

    Abstract: Effective decision making in partially observable environments requires compressing long interaction histories into informative representations. We introduce Descriptive History Representations (DHRs): sufficient statistics characterized by their capacity to answer relevant questions about past interactions and potential future outcomes. DHRs focus on capturing the information necessary to address…

    Submitted 2 June, 2025; originally announced June 2025.

  14. arXiv:2506.00324  [pdf, other]

    cs.CV

    Improving Optical Flow and Stereo Depth Estimation by Leveraging Uncertainty-Based Learning Difficulties

    Authors: Jisoo Jeong, Hong Cai, Jamie Menjay Lin, Fatih Porikli

    Abstract: Conventional training for optical flow and stereo depth models typically employs a uniform loss function across all pixels. However, this one-size-fits-all approach often overlooks the significant variations in learning difficulty among individual pixels and contextual regions. This paper investigates the uncertainty-based confidence maps which capture these spatially varying learning difficulties…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: CVPRW2025

  15. arXiv:2505.23847  [pdf, ps, other]

    cs.CR cs.AI

    Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems

    Authors: Ronny Ko, Jiseong Jeong, Shuyuan Zheng, Chuan Xiao, Tae-Wan Kim, Makoto Onizuka, Won-Yong Shin

    Abstract: Large language models (LLMs) are rapidly evolving into autonomous agents that cooperate across organizational boundaries, enabling joint disaster response, supply-chain optimization, and other tasks that demand decentralized expertise without surrendering data ownership. Yet, cross-domain collaboration shatters the unified trust assumptions behind current alignment and containment techniques. An a…

    Submitted 5 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  16. arXiv:2505.20609  [pdf, other]

    cs.AI cs.CL

    Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients

    Authors: Hyungjun Park, Chang-Yun Woo, Seungjo Lim, Seunghwan Lim, Keunho Kwak, Ju Young Jeong, Chong Hyun Suh

    Abstract: Objective: To develop an LLM-based real-time compound diagnostic medical AI interface and perform a clinical trial comparing this interface and physicians for common internal medicine cases based on the United States Medical License Exam (USMLE) Step 2 Clinical Skill (CS) style exams. Methods: A nonrandomized clinical trial was conducted on August 20, 2024. We recruited one general physician, two i…

    Submitted 26 May, 2025; originally announced May 2025.

  17. arXiv:2505.18816  [pdf, ps, other]

    cs.CV

    Reasoning Segmentation for Images and Videos: A Survey

    Authors: Yiqing Shen, Chenjia Li, Fei Xiong, Jeong-O Jeong, Tianpeng Wang, Michael Latman, Mathias Unberath

    Abstract: Reasoning Segmentation (RS) aims to delineate objects based on implicit text queries, the interpretation of which requires reasoning and knowledge integration. Unlike the traditional formulation of segmentation problems that relies on fixed semantic categories or explicit prompting, RS bridges the gap between visual perception and human-like reasoning capabilities, facilitating more intuitive huma…

    Submitted 24 May, 2025; originally announced May 2025.

  18. arXiv:2505.17612  [pdf, other]

    cs.CL cs.AI

    Distilling LLM Agent into Small Models with Retrieval and Code Tools

    Authors: Minki Kang, Jongwon Jeong, Seanie Lee, Jaewoong Cho, Sung Ju Hwang

    Abstract: Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, limiting their practical deployment. To address this, recent works have focused on distilling reasoning capabilities into smaller language models (sLMs) using chain-of-thought (CoT) traces from teacher LLMs. However, this approach struggles in scenarios requiring rare factual knowledge or precise co…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: preprint, v1

  19. arXiv:2505.15389  [pdf, other]

    cs.CL cs.CR cs.CV

    Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study

    Authors: DongGeon Lee, Joonwon Jang, Jihae Jeong, Hwanjo Yu

    Abstract: Rapid deployment of vision-language models (VLMs) magnifies safety risks, yet most evaluations rely on artificial images. This study asks: How safe are current VLMs when confronted with meme images that ordinary users share? To investigate this question, we introduce MemeSafetyBench, a 50,430-instance benchmark pairing real meme images with both harmful and benign instructions. Using a comprehensi…

    Submitted 21 May, 2025; originally announced May 2025.

  20. arXiv:2505.13553  [pdf, ps, other]

    cs.SE cs.LG

    Selective Code Generation for Functional Guarantees

    Authors: Jaewoo Jeong, Taesoo Kim, Sangdon Park

    Abstract: Large language models (LLMs) show human-level performance and their specialized descendants, code generation models, play core roles in solving complex tasks, including mathematical reasoning and software development. On the downside, the hallucination of LLMs mainly hinders their applicability to systems requiring higher safety standards, thus drawing the attention of the AI community. However, t…

    Submitted 19 May, 2025; originally announced May 2025.

  21. arXiv:2505.13232  [pdf, ps, other]

    cs.AI cs.CV

    StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment

    Authors: Younghyun Kim, Jongheon Jeong, Sangkyung Kwak, Kyungmin Lee, Juho Lee, Jinwoo Shin

    Abstract: Learning robust representations from data often requires scale, which has led to the success of recent zero-shot models such as CLIP. However, the obtained robustness can easily be deteriorated when these models are fine-tuned on other downstream tasks (e.g., of smaller scales). Previous works often interpret this phenomenon in the context of domain shift, developing fine-tuning methods that aim t…

    Submitted 27 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: IJCAI 2025; Code is available at https://github.com/alinlab/StarFT

  22. arXiv:2505.11755  [pdf, ps, other]

    cs.RO cs.AI

    Reachability Barrier Networks: Learning Hamilton-Jacobi Solutions for Smooth and Flexible Control Barrier Functions

    Authors: Matthew Kim, William Sharpless, Hyun Joe Jeong, Sander Tonkens, Somil Bansal, Sylvia Herbert

    Abstract: Recent developments in autonomous driving and robotics underscore the necessity of safety-critical controllers. Control barrier functions (CBFs) are a popular method for appending safety guarantees to a general control framework, but they are notoriously difficult to generate beyond low dimensions. Existing methods often yield non-differentiable or inaccurate approximations that lack integrity, an…

    Submitted 19 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

    Comments: 15 pages, 7 figures

  23. arXiv:2505.03562  [pdf, ps, other]

    cs.CV cs.AI

    Real-Time Person Image Synthesis Using a Flow Matching Model

    Authors: Jiwoo Jeong, Kirok Kim, Wooju Kim, Nam-Joon Kim

    Abstract: Pose-Guided Person Image Synthesis (PGPIS) generates realistic person images conditioned on a target pose and a source image. This task plays a key role in various real-world applications, such as sign language video generation, AR/VR, gaming, and live streaming. In these scenarios, real-time PGPIS is critical for providing immediate visual feedback and maintaining user immersion. However, achievin…

    Submitted 6 May, 2025; originally announced May 2025.

  24. arXiv:2504.17219  [pdf, other]

    cs.LG cs.AI cs.CR

    Enhancing Variational Autoencoders with Smooth Robust Latent Encoding

    Authors: Hyomin Lee, Minseon Kim, Sangwon Jang, Jongheon Jeong, Sung Ju Hwang

    Abstract: Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models, as in Stable Diffusion, yet questions regarding their robustness remain largely underexplored. Although adversarial training has been an established technique for enhancing robustness in predictive models, it has been overlooked for generative models due to concerns about potential fidelity degr…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: Under review

  25. arXiv:2504.15333  [pdf, other]

    cs.CY

    Measuring Interest Group Positions on Legislation: An AI-Driven Analysis of Lobbying Reports

    Authors: Jiseon Kim, Dongkwan Kim, Joohye Jeong, Alice Oh, In Song Kim

    Abstract: Special interest groups (SIGs) in the U.S. participate in a range of political activities, such as lobbying and making campaign donations, to influence policy decisions in the legislative and executive branches. The competing interests of these SIGs have profound implications for global issues such as international trade policies, immigration, climate change, and global health challenges. Despite…

    Submitted 21 April, 2025; originally announced April 2025.

  26. arXiv:2504.08756  [pdf, ps, other]

    cs.IR cs.AI

    MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation

    Authors: Jeongsoo Lee, Daeyong Kwon, Kyohoon Jin, Junnyeong Jeong, Minwoo Sim, Minwoo Kim

    Abstract: Existing RAG benchmarks often overlook query difficulty, leading to inflated performance on simpler questions and unreliable evaluations. A robust benchmark dataset must satisfy three key criteria: quality, diversity, and difficulty, where difficulty captures the complexity of reasoning based on hops and the distribution of supporting evidence. In this paper, we propose MHTS (Multi-Hop Tree Structure), a no…

    Submitted 29 May, 2025; v1 submitted 29 March, 2025; originally announced April 2025.

  27. arXiv:2504.06838  [pdf, other]

    cs.CV cs.LG

    ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models

    Authors: Seonghwan Park, Jaehyeon Jeong, Yongjun Kim, Jaeho Lee, Namhoon Lee

    Abstract: Recent studies have introduced various approaches for prompt-tuning black-box vision-language models, referred to as black-box prompt-tuning (BBPT). While BBPT has demonstrated considerable potential, it is often found that many existing methods require an excessive number of queries (i.e., function evaluations), which poses a significant challenge in real-world scenarios where the number of allow…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: ICLR 2025

  28. arXiv:2504.05770  [pdf, other]

    cs.CV cs.AI

    A Lightweight Multi-Module Fusion Approach for Korean Character Recognition

    Authors: Inho Jake Park, Jaehoon Jay Jeong, Ho-Sang Jo

    Abstract: Optical Character Recognition (OCR) is essential in applications such as document processing, license plate recognition, and intelligent surveillance. However, existing OCR models often underperform in real-world scenarios due to irregular text layouts, poor image quality, character variability, and high computational costs. This paper introduces SDA-Net (Stroke-Sensitive Attention and Dynamic C…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 12 pages, 5 figures, 5 tables

    MSC Class: 68T07; ACM Class: I.2.10

  29. arXiv:2504.04718  [pdf, other]

    cs.CL cs.AI

    T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models

    Authors: Minki Kang, Jongwon Jeong, Jaewoong Cho

    Abstract: Recent studies have demonstrated that test-time compute scaling effectively improves the performance of small language models (sLMs). However, prior research has mainly examined test-time compute scaling with an additional larger model as a verifier, leaving self-verification by sLMs underexplored. In this work, we investigate whether sLMs can reliably self-verify their outputs under test-time sca…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Preprint

  30. arXiv:2503.23363  [pdf, other]

    cs.AI cs.CL cs.LG

    Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation

    Authors: Jiwon Jeong, Hyeju Jang, Hogun Park

    Abstract: The advancement of Large Language Models (LLMs) has greatly improved our ability to process complex language. However, accurately detecting logical fallacies remains a significant challenge. This study presents a novel and effective prompt formulation approach for logical fallacy detection, applicable in both supervised (fine-tuned) and unsupervised (zero-shot) settings. Our method enriches input…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: Accepted to NAACL 2025 Findings

  31. arXiv:2503.22201  [pdf, other]

    cs.CV

    Multi-modal Knowledge Distillation-based Human Trajectory Forecasting

    Authors: Jaewoo Jeong, Seohee Lee, Daehee Park, Giwon Lee, Kuk-Jin Yoon

    Abstract: Pedestrian trajectory forecasting is crucial in various applications such as autonomous driving and mobile robot navigation. In such applications, camera-based perception enables the extraction of additional modalities (human pose, text) to enhance prediction accuracy. Indeed, we find that textual descriptions play a crucial role in integrating additional modalities into a unified understanding. H…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  32. arXiv:2503.20823  [pdf, other]

    cs.CR

    Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

    Authors: Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang

    Abstract: Despite the remarkable versatility of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) to generalize across both language and vision tasks, LLMs and MLLMs have shown vulnerability to jailbreaking, generating textual outputs that undermine safety, ethical, and bias standards when exposed to harmful or sensitive inputs. With the recent advancement of safety alignment via preference-tuning fr…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR2025

  33. arXiv:2503.18446  [pdf, other]

    cs.CV

    Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

    Authors: Jinho Jeong, Sangmin Han, Jinwoo Kim, Seon Joo Kim

    Abstract: In this paper, we propose LSRNA, a novel framework for higher-resolution (exceeding 1K) image generation using diffusion models by leveraging super-resolution directly in the latent space. Existing diffusion models struggle with scaling beyond their training resolutions, often leading to structural distortions or content repetition. Reference-based methods address the issues by upsampling a low-re…

    Submitted 25 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025

  34. arXiv:2503.15865  [pdf]

    cs.LG cs.AI

    Active management of battery degradation in wireless sensor network using deep reinforcement learning for group battery replacement

    Authors: Jong-Hyun Jeong, Hongki Jo, Qiang Zhou, Tahsin Afroz Hoque Nishat, Lang Wu

    Abstract: Wireless sensor networks (WSNs) have become a promising solution for structural health monitoring (SHM), especially in hard-to-reach or remote locations. Battery-powered WSNs offer various advantages over wired systems; however, limited battery life has always been one of the biggest obstacles in practical use of the WSNs, regardless of energy harvesting methods. While various methods have been stu…

    Submitted 22 March, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

  35. arXiv:2503.04257  [pdf, ps, other]

    cs.CV cs.AI

    How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects

    Authors: Wonkwang Lee, Jongwon Jeong, Taehong Moon, Hyeon-Jong Kim, Jaehyeon Kim, Gunhee Kim, Byeong-Uk Lee

    Abstract: Motion synthesis for diverse object categories holds great potential for 3D content creation but remains underexplored due to two key challenges: (1) the lack of comprehensive motion datasets that include a wide range of high-quality motions and annotations, and (2) the absence of methods capable of handling heterogeneous skeletal templates from diverse objects. To address these challenges, we con…

    Submitted 30 June, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted to ICML 2025

  36. arXiv:2502.12523  [pdf, other]

    cs.SI

    Cohesive Subgraph Discovery in Hypergraphs: A Locality-Driven Indexing Framework

    Authors: Song Kim, Dahee Kim, Taejoon Han, Junghoon Kim, Hyun Ji Jeong, Jungeun Kim

    Abstract: Hypergraphs, increasingly utilised for modelling complex and diverse relationships in modern networks, have gained much attention for representing intricate higher-order interactions. Among various challenges, cohesive subgraph discovery is one of the fundamental problems and offers deep insights into these structures, yet the task of selecting appropriate parameters is an open question. To handle that quest…

    Submitted 9 April, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  37. arXiv:2502.03740  [pdf, ps, other]

    cs.LG cs.AI

    Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs

    Authors: Hee-Jun Jung, Jaehyoung Jeong, Kangil Kim

    Abstract: Disentanglement learning is a core issue for understanding and re-using trained information in Variational AutoEncoder (VAE), and effective inductive bias has been reported as a key factor. However, the actual implementation of such bias is still vague. In this paper, we propose a novel method, called Multiple Invertible and partial-equivariant transformation (MIPE-transformation), to inject induc…

    Submitted 5 June, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  38. arXiv:2501.12001  [pdf, other]

    cs.HC

    Conversation Progress Guide : UI System for Enhancing Self-Efficacy in Conversational AI

    Authors: Daeun Jeong, Sungbok Shin, Jongwook Jeong

    Abstract: In this study, we introduce the Conversation Progress Guide (CPG), a system designed for text-based conversational AI interactions that provides a visual interface to represent progress. Users often encounter failures when interacting with conversational AI, which can negatively affect their self-efficacy (an individual's belief in their capabilities), reducing their willingness to engage with these…

    Submitted 24 February, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: Accepted to ACM CHI2025

  39. arXiv:2501.10609  [pdf, other]

    eess.SP cs.IT

    Universal Discrete Filtering with Lookahead or Delay

    Authors: Pumiao Yan, Jiwon Jeong, Naomi Sagan, Tsachy Weissman

    Abstract: We consider the universal discrete filtering problem, where an input sequence generated by an unknown source passes through a discrete memoryless channel, and the goal is to estimate its components based on the output sequence with limited lookahead or delay. We propose and establish the universality of a family of schemes for this setting. These schemes are induced by universal Sequential Probabi…

    Submitted 17 January, 2025; originally announced January 2025.

  40. arXiv:2501.03714  [pdf, other]

    cs.CV

    MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

    Authors: Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh, Munchurl Kim

    Abstract: 3D Gaussian Splatting (3DGS) has made significant strides in scene representation and neural rendering, with intense efforts focused on adapting it for dynamic scenes. Despite delivering remarkable rendering quality and speed, existing methods struggle with storage demands and representing complex real-world motions. To tackle these issues, we propose MoDecGS, a memory-efficient Gaussian splatting…

    Submitted 24 March, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: CVPR2025 (camera ready ver.). The last two authors are co-corresponding authors. Please visit our project page at https://kaist-viclab.github.io/MoDecGS-site/

  41. arXiv:2412.12629  [pdf, other]

    eess.IV cs.AI cs.CV

    a2z-1 for Multi-Disease Detection in Abdomen-Pelvis CT: External Validation and Performance Analysis Across 21 Conditions

    Authors: Pranav Rajpurkar, Julian N. Acosta, Siddhant Dogra, Jaehwan Jeong, Deepanshu Jindal, Michael Moritz, Samir Rajpurkar

    Abstract: We present a comprehensive evaluation of a2z-1, an artificial intelligence (AI) model designed to analyze abdomen-pelvis CT scans for 21 time-sensitive and actionable findings. Our study focuses on rigorous assessment of the model's performance and generalizability. Large-scale retrospective analysis demonstrates an average AUC of 0.931 across 21 conditions. External validation across two distinct…

    Submitted 17 December, 2024; originally announced December 2024.

  42. arXiv:2412.11463  [pdf, other]

    cs.CV cs.AI cs.LG

    FedCAR: Cross-client Adaptive Re-weighting for Generative Models in Federated Learning

    Authors: Minjun Kim, Minjee Kim, Jinhoon Jeong

    Abstract: Generative models trained on multi-institutional datasets can provide an enriched understanding through diverse data distributions. However, training the models on medical images is often challenging due to hospitals' reluctance to share data for privacy reasons. Federated learning (FL) has emerged as a privacy-preserving solution for training distributed datasets across data centers by aggregating…

    Submitted 16 December, 2024; originally announced December 2024.

  43. arXiv:2412.09921  [pdf, other]

    cs.CV

    FaceShield: Defending Facial Image against Deepfake Threats

    Authors: Jaehwan Jeong, Sumin In, Sieun Kim, Hannie Shin, Jongheon Jeong, Sang Ho Yoon, Jaewook Chung, Sangpil Kim

    Abstract: The rising use of deepfakes in criminal activities presents a significant issue, inciting widespread controversy. While numerous studies have tackled this problem, most primarily focus on deepfake detection. These reactive solutions are insufficient as a fundamental approach for crimes where authenticity is disregarded. Existing proactive defenses also have limitations, as they are effective only…

    Submitted 10 March, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

  44. arXiv:2412.09122  [pdf, other]

    cs.CV

    LVMark: Robust Watermark for Latent Video Diffusion Models

    Authors: MinHyuk Jang, Youngdong Jang, JaeHyeok Lee, Feng Yang, Gyeongrok Oh, Jongheon Jeong, Sangpil Kim

    Abstract: Rapid advancements in video diffusion models have enabled the creation of realistic videos, raising concerns about unauthorized use and driving the demand for techniques to protect model ownership. Existing watermarking methods, while effective for image diffusion models, do not account for temporal consistency, leading to degraded video quality and reduced robustness against video distortions. To…

    Submitted 28 March, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

  45. arXiv:2412.07333  [pdf, other]

    cs.CV cs.AI

    Fusion Embedding for Pose-Guided Person Image Synthesis with Diffusion Model

    Authors: Donghwna Lee, Kyungha Min, Kirok Kim, Seyoung Jeong, Jiwoo Jeong, Wooju Kim

    Abstract: Pose-Guided Person Image Synthesis (PGPIS) aims to synthesize high-quality person images corresponding to target poses while preserving the appearance of the source image. Recently, PGPIS methods that use diffusion models have achieved competitive performance. Most approaches involve extracting representations of the target pose and source image and learning their relationships in the generative m…

    Submitted 10 December, 2024; originally announced December 2024.

  46. arXiv:2412.03162  [pdf]

    cs.CY

    LLM-Mirror: A Generated-Persona Approach for Survey Pre-Testing

    Authors: Sunwoong Kim, Jongho Jeong, Jin Soo Han, Donghyuk Shin

    Abstract: Surveys are widely used in social sciences to understand human behavior, but their implementation often involves iterative adjustments that demand significant effort and resources. To this end, researchers have increasingly turned to large language models (LLMs) to simulate human behavior. While existing studies have focused on distributional similarities, individual-level comparisons remain under…

    Submitted 5 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

    Comments: 11 pages, 5 figures

  47. arXiv:2411.08933  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness

    Authors: Suhyeok Jang, Seojin Kim, Jinwoo Shin, Jongheon Jeong

    Abstract: The remarkable advances in deep learning have led to the emergence of many off-the-shelf classifiers, e.g., large pre-trained models. However, since they are typically trained on clean data, they remain vulnerable to adversarial attacks. Despite this vulnerability, their superior performance and transferability make off-the-shelf classifiers still valuable in practice, demanding further work to pr…

    Submitted 15 November, 2024; v1 submitted 13 November, 2024; originally announced November 2024.

    Comments: 26 pages; TMLR 2024; Code is available at https://github.com/suhyeok24/FT-CADIS

  48. arXiv:2411.01281  [pdf, other]

    cs.CL cs.AI

    Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models

    Authors: Seonil Son, Ju-Min Oh, Heegon Jin, Cheolhun Jang, Jeongbeom Jeong, Kuntae Kim

    Abstract: Most existing benchmarking approaches for evaluating the output quality of large language models (LLMs) rely on comparing LLM responses to predefined references. Such methods, based on static datasets, quickly become outdated as LLM capabilities and use cases evolve. In this work, we introduce VARCO Arena--a novel, cost-effective, and robust benchmarking approach that leverages a single-eliminatio…

    Submitted 18 February, 2025; v1 submitted 2 November, 2024; originally announced November 2024.

    Comments: 8 pages for main body, 17 pages in total

  49. arXiv:2411.00626  [pdf, other]

    cs.CV

    ZIM: Zero-Shot Image Matting for Anything

    Authors: Beomyoung Kim, Chanyong Shin, Joonhyun Jeong, Hyungsik Jung, Se-Yun Lee, Sewhan Chun, Dong-Hyun Hwang, Joonsang Yu

    Abstract: The recent segmentation foundation model, Segment Anything Model (SAM), exhibits strong zero-shot segmentation capabilities, but it falls short in generating fine-grained precise masks. To address this limitation, we propose a novel zero-shot image matting model, called ZIM, with two key contributions: First, we develop a label converter that transforms segmentation labels into detailed matte labe…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: preprint (21 pages, 16 figures, and 8 tables)

  50. arXiv:2410.07832  [pdf, other]

    cs.CV cs.AI cs.RO

    LaB-CL: Localized and Balanced Contrastive Learning for improving parking slot detection

    Authors: U Jin Jeong, Sumin Roh, Il Yong Chun

    Abstract: Parking slot detection is an essential technology in autonomous parking systems. In general, the classification problem of parking slot detection consists of two tasks, a task determining whether localized candidates are junctions of parking slots or not, and the other that identifies a shape of detected junctions. Both classification tasks can easily face biased learning toward the majority class…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 7 pages, 6 figures