Showing 1–50 of 470 results for author: Shin, J

Searching in archive cs.
  1. arXiv:2509.25973  [pdf, ps, other]

    cs.AI

    Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions

    Authors: Junbeom Kim, Kyuyoung Kim, Jihoon Tack, Dongha Lim, Jinwoo Shin

    Abstract: Language models trained on web-scale corpora risk memorizing and exposing sensitive information, prompting the need for effective machine unlearning. Prior methods mainly focus on input queries to suppress sensitive outputs, yet this often fails to eliminate the underlying knowledge and limits scalability. To address this, we propose Corrective Unlearning with Retrieved Exclusions (CURE), a novel…

    Submitted 30 September, 2025; originally announced September 2025.

    ACM Class: I.2.6

  2. arXiv:2509.25897  [pdf, ps, other]

    cs.CL cs.AI cs.CY

    RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity

    Authors: Jisu Shin, Hoyun Song, Juhyun Oh, Changgeon Ko, Eunsu Kim, Chani Jung, Alice Oh

    Abstract: Humans often encounter role conflicts -- social dilemmas where the expectations of multiple roles clash and cannot be simultaneously fulfilled. As large language models (LLMs) become increasingly influential in human decision-making, understanding how they behave in complex social situations is essential. While previous research has evaluated LLMs' social abilities in contexts with predefined corr…

    Submitted 30 September, 2025; originally announced September 2025.

  3. arXiv:2509.25465  [pdf, ps, other]

    cs.SE

    BloomAPR: A Bloom's Taxonomy-based Framework for Assessing the Capabilities of LLM-Powered APR Solutions

    Authors: Yinghang Ma, Jiho Shin, Leuson Da Silva, Zhen Ming Jiang, Song Wang, Foutse Khomh, Shin Hwei Tan

    Abstract: Recent advances in large language models (LLMs) have accelerated the development of AI-driven automated program repair (APR) solutions. However, these solutions are typically evaluated using static benchmarks such as Defects4J and SWE-bench, which suffer from two key limitations: (1) the risk of data contamination, potentially inflating evaluation results due to overlap with LLM training data, and…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 22 pages, 7 figures, Manuscript submitted to ACM Transactions on Software Engineering and Methodology

  4. arXiv:2509.24328  [pdf, ps, other]

    cs.CL

    Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding

    Authors: Sungkyun Kim, Jaemin Kim, Dogyung Yoon, Jiho Shin, Junyeol Lee, Jiwon Seo

    Abstract: LLMs have low GPU efficiency and high latency due to autoregressive decoding. Speculative decoding (SD) mitigates this using a small draft model to speculatively generate multiple tokens, which are then verified in parallel by a target model. However, when speculation accuracy is low, the overhead from rejected tokens can offset the benefits, limiting SD's effectiveness, especially at large batch…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 14 pages, 6 figures

  5. arXiv:2509.21013  [pdf, ps, other]

    cs.LG cs.AI

    Predicting LLM Reasoning Performance with Small Proxy Model

    Authors: Woosung Koh, Juyoung Suk, Sungjun Han, Se-Young Yun, Jamin Shin

    Abstract: Given the prohibitive cost of pre-training large language models, it is essential to leverage smaller proxy models to optimize datasets before scaling up. However, this approach becomes challenging for reasoning capabilities, which exhibit emergent behavior that only appears reliably at larger model sizes, often exceeding 7B parameters. To address this, we introduce rBridge, showing that small prox…

    Submitted 30 September, 2025; v1 submitted 25 September, 2025; originally announced September 2025.

    Comments: Pre-print

  6. arXiv:2509.18190  [pdf, ps, other]

    cs.CV cs.AI

    HazeFlow: Revisit Haze Physical Model as ODE and Non-Homogeneous Haze Generation for Real-World Dehazing

    Authors: Junseong Shin, Seungwoo Chung, Yunjeong Yang, Tae Hyun Kim

    Abstract: Dehazing involves removing haze or fog from images to restore clarity and improve visibility by estimating atmospheric scattering effects. While deep learning methods show promise, the lack of paired real-world training data and the resulting domain gap hinder generalization to real-world scenarios. In this context, physics-grounded learning becomes crucial; however, traditional methods based on t…

    Submitted 25 September, 2025; v1 submitted 19 September, 2025; originally announced September 2025.

  7. arXiv:2509.15513  [pdf, ps, other]

    cs.LG cs.RO eess.SY

    KoopCast: Trajectory Forecasting via Koopman Operators

    Authors: Jungjin Lee, Jaeuk Shin, Gihwan Kim, Joonho Han, Insoon Yang

    Abstract: We present KoopCast, a lightweight yet efficient model for trajectory forecasting in general dynamic environments. Our approach leverages Koopman operator theory, which enables a linear representation of nonlinear dynamics by lifting trajectories into a higher-dimensional space. The framework follows a two-stage design: first, a probabilistic neural goal estimator predicts plausible long-term targ…

    Submitted 18 September, 2025; originally announced September 2025.

  8. arXiv:2509.14285  [pdf, ps, other]

    cs.CR cs.LG

    A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks

    Authors: S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin

    Abstract: Prompt injection attacks represent a major vulnerability in Large Language Model (LLM) deployments, where malicious instructions embedded in user inputs can override system prompts and induce unintended behaviors. This paper presents a novel multi-agent defense framework that employs specialized LLM agents in coordinated pipelines to detect and neutralize prompt injection attacks in real-time. We…

    Submitted 16 September, 2025; originally announced September 2025.

  9. arXiv:2509.13218  [pdf, ps, other]

    cs.LG

    FOSSIL: Regret-minimizing weighting for robust learning under imbalance and small data

    Authors: J. Cha, J. Lee, J. Cho, J. Shin

    Abstract: Imbalanced and small data regimes are pervasive in domains such as rare disease imaging, genomics, and disaster response, where labeled samples are scarce and naive augmentation often introduces artifacts. Existing solutions such as oversampling, focal loss, or meta-weighting address isolated aspects of this challenge but remain fragile or complex. We introduce FOSSIL (Flexible Optimization via Sa…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: 24 pages, 6 figures, submitted to ICLR 2025

  10. OCELOT 2023: Cell Detection from Cell-Tissue Interaction Challenge

    Authors: JaeWoong Shin, Jeongun Ryu, Aaron Valero Puche, Jinhee Lee, Biagio Brattoli, Wonkyung Jung, Soo Ick Cho, Kyunghyun Paeng, Chan-Young Ock, Donggeun Yoo, Zhaoyang Li, Wangkai Li, Huayu Mai, Joshua Millward, Zhen He, Aiden Nibali, Lydia Anette Schoenpflug, Viktor Hendrik Koelzer, Xu Shuoyu, Ji Zheng, Hu Bin, Yu-Wen Lo, Ching-Hui Yang, Sérgio Pereira

    Abstract: Pathologists routinely alternate between different magnifications when examining Whole-Slide Images, allowing them to evaluate both broad tissue morphology and intricate cellular details to form comprehensive diagnoses. However, existing deep learning-based cell detection models struggle to replicate these behaviors and learn the interdependent semantics between structures at different magnificati…

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: This is the accepted manuscript of an article published in Medical Image Analysis (Elsevier). The final version is available at: https://doi.org/10.1016/j.media.2025.103751

    Journal ref: Medical Image Analysis 106 (2025) 103751

  11. arXiv:2509.04476  [pdf, ps, other]

    cs.CL cs.AI

    Training Text-to-Molecule Models with Context-Aware Tokenization

    Authors: Seojin Kim, Hyeontae Song, Jaehyun Nam, Jinwoo Shin

    Abstract: Recently, text-to-molecule models have shown great potential across various chemical applications, e.g., drug-discovery. These models adapt language models to molecular data by representing molecules as sequences of atoms. However, they rely on atom-level tokenizations, which primarily focus on modeling local connectivity, thereby limiting the ability of models to capture the global structural con…

    Submitted 17 September, 2025; v1 submitted 30 August, 2025; originally announced September 2025.

    Comments: EMNLP 2025 Findings

  12. arXiv:2509.02537  [pdf, ps, other]

    cs.HC

    Octo's Heartland: Supporting Children with Congenital Heart Disease through Digital Health Education

    Authors: Irene Zeng, Neda Barbazi, Ji Youn Shin, Gurumurthy Hiremath, Carlye Anne Lauff

    Abstract: Children with congenital heart disease (CHD) often face challenges that require them to understand complex medical information from an early age in order to support lifelong care and improve health outcomes. However, prior research has rarely included young children in designing and evaluating digital tools to support health education using developmentally appropriate strategies. This study is par…

    Submitted 2 September, 2025; originally announced September 2025.

  13. arXiv:2508.12166  [pdf, ps, other]

    cs.RO cs.LG eess.SY

    Belief-Conditioned One-Step Diffusion: Real-Time Trajectory Planning with Just-Enough Sensing

    Authors: Gokul Puthumanaillam, Aditya Penumarti, Manav Vora, Paulo Padrao, Jose Fuentes, Leonardo Bobadilla, Jane Shin, Melkior Ornik

    Abstract: Robots equipped with rich sensor suites can localize reliably in partially-observable environments, but powering every sensor continuously is wasteful and often infeasible. Belief-space planners address this by propagating pose-belief covariance through analytic models and switching sensors heuristically--a brittle, runtime-expensive approach. Data-driven approaches--including diffusion models--le…

    Submitted 27 August, 2025; v1 submitted 16 August, 2025; originally announced August 2025.

    Comments: Accepted to CoRL 2025 (Conference on Robot Learning)

  14. arXiv:2508.11890  [pdf, ps, other]

    cs.RO cs.AI

    Integrating Symbolic RL Planning into a BDI-based Autonomous UAV Framework: System Integration and SIL Validation

    Authors: Sangwoo Jeon, Juchul Shin, YeonJe Cho, Gyeong-Tae Kim, Seongwoo Kim

    Abstract: Modern autonomous drone missions increasingly require software frameworks capable of seamlessly integrating structured symbolic planning with adaptive reinforcement learning (RL). Although traditional rule-based architectures offer robust structured reasoning for drone autonomy, their capabilities fall short in dynamically complex operational environments that require adaptive symbolic planning. S…

    Submitted 15 August, 2025; originally announced August 2025.

  15. arXiv:2508.10954  [pdf, ps, other]

    cs.LG cs.AI

    Towards Efficient Prompt-based Continual Learning in Distributed Medical AI

    Authors: Gyutae Oh, Jitae Shin

    Abstract: Modern AI models achieve state-of-the-art performance with large-scale, high-quality datasets; however, ethical, social, and institutional constraints in the medical domain severely restrict data sharing, rendering centralized learning nearly impossible. Each institution must incrementally update models using only local data. Traditional training overfits new samples and suffers from catastrophic…

    Submitted 14 August, 2025; originally announced August 2025.

    Comments: 10 pages

  16. arXiv:2508.10757  [pdf, ps, other]

    cs.HC cs.CY

    "I Want My Chart to Be Just for Me": Community-Engaged Design to Support Outpatient Healthcare for Resettled Communities

    Authors: Zhanming Chen, Juan F. Maestre, May Hang, Alisha Ghaju, Ji Youn Shin

    Abstract: Individuals resettled in a new environment often face challenges in accessing adequate healthcare services, particularly within the complex processes of outpatient clinic care. Cultural differences, language barriers, and low socioeconomic status contribute to these difficulties. While previous studies have identified barriers and proposed technology-mediated solutions for resettled populations, m…

    Submitted 14 August, 2025; originally announced August 2025.

  17. arXiv:2508.10747  [pdf, ps, other]

    cs.AI cs.RO

    Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning

    Authors: Sangwoo Jeon, Juchul Shin, Gyeong-Tae Kim, YeonJe Cho, Seongwoo Kim

    Abstract: Generalized planning using deep reinforcement learning (RL) combined with graph neural networks (GNNs) has shown promising results in various symbolic planning domains described by PDDL. However, existing approaches typically represent planning states as fully connected graphs, leading to a combinatorial explosion in edge information and substantial sparsity as problem scales grow, especially evid…

    Submitted 19 August, 2025; v1 submitted 14 August, 2025; originally announced August 2025.

  18. arXiv:2508.08879  [pdf, ps, other]

    cs.CL cs.AI

    Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models

    Authors: Haeun Yu, Seogyeong Jeong, Siddhesh Pawar, Jisu Shin, Jiho Jin, Junho Myung, Alice Oh, Isabelle Augenstein

    Abstract: The growing deployment of large language models (LLMs) across diverse cultural contexts necessitates a better understanding of how the overgeneralization of less documented cultures within LLMs' representations impacts their cultural understanding. Prior work only performs extrinsic evaluation of LLMs' cultural competence, without accounting for how LLMs' internal mechanisms lead to cultural (mis)…

    Submitted 12 August, 2025; originally announced August 2025.

    Comments: 16 pages, 7 figures

  19. arXiv:2508.07747  [pdf, ps, other]

    cs.CV

    Grouped Speculative Decoding for Autoregressive Image Generation

    Authors: Junhyuk So, Juncheol Shin, Hyunho Kook, Eunhyeok Park

    Abstract: Recently, autoregressive (AR) image models have demonstrated remarkable generative capabilities, positioning themselves as a compelling alternative to diffusion models. However, their sequential nature leads to long inference times, limiting their practical scalability. In this work, we introduce Grouped Speculative Decoding (GSD), a novel, training-free acceleration method for AR image models. Wh…

    Submitted 11 August, 2025; originally announced August 2025.

    Comments: Accepted to ICCV 2025

  20. arXiv:2508.07519  [pdf, ps, other]

    cs.CV

    Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing

    Authors: Joonghyuk Shin, Alchan Hwang, Yujin Kim, Daneul Kim, Jaesik Park

    Abstract: Transformer-based diffusion models have recently superseded traditional U-Net architectures, with multimodal diffusion transformers (MM-DiT) emerging as the dominant approach in state-of-the-art models like Stable Diffusion 3 and Flux.1. Previous approaches have relied on unidirectional cross-attention mechanisms, with information flowing from text embeddings to image latents. In contrast, MMDiT i…

    Submitted 10 August, 2025; originally announced August 2025.

    Comments: ICCV 2025. Project webpage: https://joonghyuk.com/exploring-mmdit-web/

  21. arXiv:2508.03365  [pdf, ps, other]

    cs.SD cs.AI cs.CR eess.AS

    When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs

    Authors: Bodam Kim, Hiskias Dingeto, Taeyoun Kwon, Dasol Choi, DongGeon Lee, Haon Park, JaeHoon Lee, Jongho Shin

    Abstract: As large language models become increasingly integrated into daily life, audio has emerged as a key interface for human-AI interaction. However, this convenience also introduces new vulnerabilities, making audio a potential attack surface for adversaries. Our research introduces WhisperInject, a two-stage adversarial audio attack framework that can manipulate state-of-the-art audio language models…

    Submitted 20 August, 2025; v1 submitted 5 August, 2025; originally announced August 2025.

  22. arXiv:2508.00548  [pdf, ps, other]

    cs.CV

    Video Color Grading via Look-Up Table Generation

    Authors: Seunghyun Shin, Dongmin Shin, Jisu Shin, Hae-Gon Jeon, Joon-Young Lee

    Abstract: Different from color correction and transfer, color grading involves adjusting colors for artistic or storytelling purposes in a video, which is used to establish a specific look or mood. However, due to the complexity of the process and the need for specialized editing skills, video color grading remains primarily the domain of professional colorists. In this paper, we present a reference-based v…

    Submitted 1 August, 2025; originally announced August 2025.

    Comments: ICCV 2025

  23. arXiv:2507.20907  [pdf, ps, other]

    cs.CV cs.AI

    SCORPION: Addressing Scanner-Induced Variability in Histopathology

    Authors: Jeongun Ryu, Heon Song, Seungeun Lee, Soo Ick Cho, Jiwon Shin, Kyunghyun Paeng, Sérgio Pereira

    Abstract: Ensuring reliable model performance across diverse domains is a critical challenge in computational pathology. A particular source of variability in Whole-Slide Images is introduced by differences in digital scanners, thus calling for better scanner generalization. This is critical for the real-world adoption of computational pathology, where the scanning devices may differ per institution or hosp…

    Submitted 17 September, 2025; v1 submitted 28 July, 2025; originally announced July 2025.

    Comments: Accepted in UNSURE 2025 workshop in MICCAI

  24. arXiv:2507.20469  [pdf, ps, other]

    cs.CV

    Priority-Aware Clinical Pathology Hierarchy Training for Multiple Instance Learning

    Authors: Sungrae Hong, Kyungeun Kim, Juhyeon Kim, Sol Lee, Jisu Shin, Chanjae Song, Mun Yong Yi

    Abstract: Multiple Instance Learning (MIL) is increasingly being used as a support tool within clinical settings for pathological diagnosis decisions, achieving high performance and removing the annotation burden. However, existing approaches for clinical MIL tasks have not adequately addressed the priority issues that exist in relation to pathological symptoms and diagnostic classes, causing MIL models to…

    Submitted 31 July, 2025; v1 submitted 27 July, 2025; originally announced July 2025.

    Comments: 10 pages, 4 figures, Accepted for oral presentation by The 2nd MICCAI Student Board (MSB) EMERGE Workshop

  25. arXiv:2507.19773  [pdf, ps, other]

    cs.CV

    Self-Guided Masked Autoencoder

    Authors: Jeongwoo Shin, Inseo Lee, Junho Lee, Joonseok Lee

    Abstract: Masked Autoencoder (MAE) is a self-supervised approach for representation learning, widely applicable to a variety of downstream tasks in computer vision. In spite of its success, it is still not fully uncovered what and how MAE exactly learns. In this paper, with an in-depth analysis, we discover that MAE intrinsically learns pattern-based patch-level clustering from surprisingly early stages of…

    Submitted 25 July, 2025; originally announced July 2025.

  26. arXiv:2507.17998  [pdf, ps, other]

    cs.CV

    Registration beyond Points: General Affine Subspace Alignment via Geodesic Distance on Grassmann Manifold

    Authors: Jaeho Shin, Hyeonjae Gil, Junwoo Jang, Maani Ghaffari, Ayoung Kim

    Abstract: Affine Grassmannian has been favored for expressing proximity between lines and planes due to its theoretical exactness in measuring distances among features. Despite this advantage, the existing method can only measure the proximity without yielding the distance as an explicit function of rigid body transformation. Thus, an optimizable distance function on the manifold has remained underdeveloped…

    Submitted 25 July, 2025; v1 submitted 23 July, 2025; originally announced July 2025.

  27. arXiv:2507.15892  [pdf, ps, other]

    cs.SE

    StaAgent: An Agentic Framework for Testing Static Analyzers

    Authors: Elijah Nnorom, Md Basim Uddin Ahmed, Jiho Shin, Hung Viet Pham, Song Wang

    Abstract: Static analyzers play a critical role in identifying bugs early in the software development lifecycle, but their rule implementations are often under-tested and prone to inconsistencies. To address this, we propose StaAgent, an agentic framework that harnesses the generative capabilities of Large Language Models (LLMs) to systematically evaluate static analyzer rules. StaAgent comprises four speci…

    Submitted 20 July, 2025; originally announced July 2025.

  28. arXiv:2507.15748  [pdf, ps, other]

    cs.CV

    CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction

    Authors: Jisu Shin, Richard Shaw, Seunghyun Shin, Zhensong Zhang, Hae-Gon Jeon, Eduardo Perez-Pellitero

    Abstract: Modern camera pipelines apply extensive on-device processing, such as exposure adjustment, white balance, and color correction, which, while beneficial individually, often introduce photometric inconsistencies across views. These appearance variations violate multi-view consistency and degrade novel view synthesis. Joint optimization of scene-specific representations and per-image appearance embed…

    Submitted 30 September, 2025; v1 submitted 21 July, 2025; originally announced July 2025.

  29. arXiv:2507.15541  [pdf, ps, other]

    cs.CV

    Towards Holistic Surgical Scene Graph

    Authors: Jongmin Shin, Enki Cho, Ka Young Kim, Jung Yong Kim, Seong Tae Kim, Namkee Oh

    Abstract: Surgical scene understanding is crucial for computer-assisted intervention systems, requiring visual comprehension of surgical scenes that involves diverse elements such as surgical tools, anatomical structures, and their interactions. To effectively represent the complex information in surgical scenes, graph-based approaches have been explored to structurally model surgical entities and their rel…

    Submitted 23 July, 2025; v1 submitted 21 July, 2025; originally announced July 2025.

    Comments: Accepted to MICCAI 2025

  30. arXiv:2507.12050  [pdf, ps, other]

    cs.CR cs.CV

    IDFace: Face Template Protection for Efficient and Secure Identification

    Authors: Sunpill Kim, Seunghun Paik, Chanwoo Hwang, Dongsoo Kim, Junbum Shin, Jae Hong Seo

    Abstract: As face recognition systems (FRS) become more widely used, user privacy becomes more important. A key privacy issue in FRS is protecting the user's face template, as the characteristics of the user's face image can be recovered from the template. Although recent advances in cryptographic tools such as homomorphic encryption (HE) have provided opportunities for securing the FRS, HE cannot be used d…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: Accepted to ICCV 2025

    ACM Class: I.5.4; K.6.5; D.4.6; I.4.7

  31. arXiv:2507.09984  [pdf, ps, other]

    cs.CV

    Latent Diffusion Models with Masked AutoEncoders

    Authors: Junho Lee, Jeongwoo Shin, Hyungwook Choi, Joonseok Lee

    Abstract: In spite of the remarkable potential of Latent Diffusion Models (LDMs) in image generation, the desired properties and optimal design of the autoencoders have been underexplored. In this work, we analyze the role of autoencoders in LDMs and identify three key properties: latent smoothness, perceptual compression quality, and reconstruction quality. We demonstrate that existing autoencoders fail to…

    Submitted 23 July, 2025; v1 submitted 14 July, 2025; originally announced July 2025.

  32. arXiv:2507.08806  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Think Clearly: Improving Reasoning via Redundant Token Pruning

    Authors: Daewon Choi, Jimin Lee, Jihoon Tack, Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati

    Abstract: Recent large language models have shown promising capabilities in long-form reasoning, following structured chains of thought before arriving at a final answer. However, we observe that these reasoning paths tend to include substantial redundancy; analyzing attention patterns reveals that attention scores are widely scattered, particularly incorrect answers exhibit greater attention sparsity. In t…

    Submitted 17 June, 2025; originally announced July 2025.

  33. arXiv:2507.04748  [pdf, ps, other]

    cs.AI

    LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction

    Authors: Sungmin Lee, Minju Kang, Joonhee Lee, Seungyong Lee, Dongju Kim, Jingi Hong, Jun Shin, Pei Zhang, JeongGil Ko

    Abstract: Question-answering (QA) interfaces powered by large language models (LLMs) present a promising direction for improving interactivity with HVAC system insights, particularly for non-expert users. However, enabling accurate, real-time, and context-aware interactions with HVAC systems introduces unique challenges, including the integration of frequently updated sensor data, domain-specific knowledge…

    Submitted 7 July, 2025; originally announced July 2025.

  34. arXiv:2507.04310  [pdf, ps, other]

    cs.LG cs.DC

    Heterogeneous Federated Learning with Prototype Alignment and Upscaling

    Authors: Gyuejeong Lee, Jihwan Shin, Daeyoung Choi

    Abstract: Heterogeneity in data distributions and model architectures remains a significant challenge in federated learning (FL). Various heterogeneous FL (HtFL) approaches have recently been proposed to address this challenge. Among them, prototype-based FL (PBFL) has emerged as a practical framework that only shares per-class mean activations from the penultimate layer. However, PBFL approaches often suff…

    Submitted 6 July, 2025; originally announced July 2025.

  35. arXiv:2506.23851  [pdf]

    cs.CY cs.ET cs.HC

    Comparative Studies: Cloud-Enabled Adaptive Learning System for Scalable Education in Sub-Saharan

    Authors: Israel Fianyi, Soonja Yeom, Ju-Hyun Shin

    Abstract: The integration of cloud computing in education can revolutionise learning in advanced (Australia & South Korea) and middle-income (Ghana & Nigeria) countries, while offering scalable, cost-effective and equitable access to adaptive learning systems. This paper explores how cloud computing and adaptive learning technologies are deployed across different socio-economic and infrastructure contexts.…

    Submitted 30 June, 2025; originally announced June 2025.

  36. arXiv:2506.23552  [pdf, ps, other]

    cs.CV cs.SD eess.AS

    JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching

    Authors: Mingi Kwon, Joonghyuk Shin, Jaeseok Jung, Jaesik Park, Youngjung Uh

    Abstract: The intrinsic link between facial motion and speech is often overlooked in generative modeling, where talking head synthesis and text-to-speech (TTS) are typically addressed as separate tasks. This paper introduces JAM-Flow, a unified framework to simultaneously synthesize and condition on both facial motion and speech. Our approach leverages flow matching and a novel Multi-Modal Diffusion Transfo…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Project page: https://joonghyuk.com/jamflow-web (under review; preprint published on arXiv)

  37. arXiv:2506.19352  [pdf, ps, other]

    cs.CL cs.AI cs.HC

    Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation

    Authors: Jisu Shin, Juhyun Oh, Eunsu Kim, Hoyun Song, Alice Oh

    Abstract: Ensuring persona fidelity in large language models (LLMs) is essential for maintaining coherent and engaging human-AI interactions. However, LLMs often exhibit Out-of-Character (OOC) behavior, where generated responses deviate from an assigned persona, leading to inconsistencies that affect model reliability. Existing evaluation methods typically assign single scores to entire responses, strugglin…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Findings of ACL 2025; github repo: https://github.com/ddindidu/atomic-persona-evaluation/

  38. arXiv:2506.18369  [pdf, ps, other]

    cs.CV

    RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models

    Authors: Yeongtak Oh, Jisoo Mok, Dohyun Chung, Juhyeon Shin, Sangha Park, Johan Barthelemy, Sungroh Yoon

    Abstract: Recent multi-modal large language models (MLLMs) often struggle to generate personalized image captions, even when trained on high-quality captions. In this work, we observe that such limitations persist in existing post-training-based MLLM personalization methods. Specifically, despite being post-tuned with large-scale caption data through supervised fine-tuning (SFT), these models frequently fai…

    Submitted 18 September, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted to NeurIPS 2025

  39. arXiv:2506.15692  [pdf, ps, other]

    cs.LG

    MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement

    Authors: Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Jinwoo Shin, Sercan Ö. Arık, Tomas Pfister

    Abstract: Agents based on large language models (LLMs) for machine learning engineering (MLE) can automatically implement ML models via code generation. However, existing approaches to build such agents often rely heavily on inherent LLM knowledge and employ coarse exploration strategies that modify the entire code structure at once. This limits their ability to select effective task-specific models and per…

    Submitted 28 August, 2025; v1 submitted 27 May, 2025; originally announced June 2025.

  40. arXiv:2506.11772  [pdf, ps, other]

    cs.CV cs.LG

    CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection

    Authors: Byeongchan Lee, John Won, Seunghyun Lee, Jinwoo Shin

    Abstract: Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types (e.g., local and global defect), and the scarcity of training data. As such, it necessitates a comprehensive model capable of capturing both low-level and high-level features, even with limited data. To address this, we propose CLIPFUSION, a method that leverages both discriminative an…

    Submitted 7 August, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

  41. arXiv:2506.11578  [pdf, ps, other]

    cs.AI

    Efficient LLM Collaboration via Planning

    Authors: Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Kyungjoon Park, Dongjun Lee, Jinwoo Shin

    Abstract: Recently, large language models (LLMs) have demonstrated strong performance, ranging from simple to complex tasks. However, while large proprietary models (e.g., models with over 100B parameters) achieve remarkable results across diverse tasks, they are often accessible through costly APIs, making frequent use too costly for many applications. In contrast, small open-source models (e.g., models wi…

    Submitted 27 September, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

  42. arXiv:2506.11098  [pdf, ps, other]

    cs.LG cs.AI

    Debiasing Online Preference Learning via Preference Feature Preservation

    Authors: Dongyoung Kim, Jinsung Yoon, Jinwoo Shin, Jaehyung Kim

    Abstract: Recent preference learning frameworks for large language models (LLMs) simplify human preferences with binary pairwise comparisons and scalar rewards. This simplification could make LLMs' responses biased to mostly preferred features, and would be exacerbated during the iterations of online preference learning steps. To address these challenges, we propose a novel framework coined PFP (Preference…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 20 pages, 20 figures

  43. arXiv:2506.08456  [pdf, ps, other]

    cs.CV

    Enhancing Motion Dynamics of Image-to-Video Models via Adaptive Low-Pass Guidance

    Authors: June Suk Choi, Kyungmin Lee, Sihyun Yu, Yisol Choi, Jinwoo Shin, Kimin Lee

    Abstract: Recent text-to-video (T2V) models have demonstrated strong capabilities in producing high-quality, dynamic videos. To improve the visual controllability, recent works have considered fine-tuning pre-trained T2V models to support image-to-video (I2V) generation. However, such adaptation frequently suppresses motion dynamics of generated outputs, resulting in more static videos compared to their T2V…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Preprint. Under review. Project page available at http://choi403.github.io/ALG

  44. arXiv:2506.05843  [pdf, ps, other]

    cs.CV

    FontAdapter: Instant Font Adaptation in Visual Text Generation

    Authors: Myungkyu Koo, Subin Kim, Sangkyung Kwak, Jaehyun Nam, Seojin Kim, Jinwoo Shin

    Abstract: Text-to-image diffusion models have significantly improved the seamless integration of visual text into diverse image contexts. Recent approaches further improve control over font styles through fine-tuning with predefined font dictionaries. However, adapting unseen fonts outside the preset is computationally expensive, often requiring tens of minutes, making real-time customization impractical. I…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Project page: https://fontadapter.github.io/

  45. arXiv:2506.04708  [pdf, other]

    cs.CL

    Accelerated Test-Time Scaling with Model-Free Speculative Sampling

    Authors: Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati

    Abstract: Language models have demonstrated remarkable capabilities in reasoning tasks through test-time scaling techniques like best-of-N sampling and tree search. However, these approaches often demand substantial computational resources, creating a critical trade-off between performance and efficiency. We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding appro…

    Submitted 5 June, 2025; originally announced June 2025.

  46. arXiv:2506.01420  [pdf, ps, other]

    cs.CL cs.LG

    Self-Refining Language Model Anonymizers via Adversarial Distillation

    Authors: Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin

    Abstract: Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data from seemingly benign text poses emerging privacy risks. While recent LLM-based anonymization methods help mitigate such risks, they often rely on proprietary models (e.g., GPT-4), raising concerns about cost and the potential exposure of sensitive data to untrusted external systems.…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Preprint

  47. arXiv:2506.01215  [pdf, other]

    cs.CL cs.LG

    Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers

    Authors: Woomin Song, Sai Muralidhar Jayanthi, Srikanth Ronanki, Kanthashree Mysore Sathyendra, Jinwoo Shin, Aram Galstyan, Shubham Katiyar, Sravan Babu Bodapati

    Abstract: As large language models increasingly gain popularity in real-world applications, processing extremely long contexts, often exceeding the model's pre-trained context limits, has emerged as a critical challenge. While existing approaches to efficient long-context processing show promise, recurrent compression-based methods struggle with information preservation, whereas random access approaches req…

    Submitted 1 June, 2025; originally announced June 2025.

  48. arXiv:2506.01206  [pdf, other]

    cs.CL cs.AI

    Mamba Drafters for Speculative Decoding

    Authors: Daewon Choi, Seunghyuk Oh, Saket Dingliwal, Jihoon Tack, Kyuyoung Kim, Woomin Song, Seojin Kim, Insu Han, Jinwoo Shin, Aram Galstyan, Shubham Katiyar, Sravan Babu Bodapati

    Abstract: Speculative decoding has emerged as a promising approach to accelerating large language model (LLM) generation using a fast drafter while maintaining alignment with the target model's distribution. However, existing approaches face a trade-off: external drafters offer flexibility but can suffer from slower drafting, while self-speculation methods use drafters tailored to the target model but requi…

    Submitted 1 June, 2025; originally announced June 2025.

  49. arXiv:2506.00070  [pdf, ps, other]

    cs.RO cs.AI

    Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

    Authors: Dongyoung Kim, Sumin Park, Huiwon Jang, Jinwoo Shin, Jaehyung Kim, Younggyo Seo

    Abstract: Large Vision-Language Models (LVLMs) have recently shown great promise in advancing robotics by combining embodied reasoning with robot control. A common approach involves training on embodied reasoning tasks related to robot control using Supervised Fine-Tuning (SFT). However, SFT datasets are often heuristically constructed and not explicitly optimized for improving robot control. Furthermore, S…

    Submitted 29 May, 2025; originally announced June 2025.

    Comments: 26 pages, 14 figures

  50. arXiv:2505.23651  [pdf, ps, other]

    cs.LG cs.CV

    Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation

    Authors: Juncheol Shin, Minsang Seok, Seonggon Kim, Eunhyeok Park

    Abstract: Model merging has emerged as a powerful technique for combining task-specific weights, achieving superior performance in multi-target domain adaptation. However, when applied to practical scenarios, such as quantized models, new challenges arise. In practical scenarios, quantization is often applied to target-specific data, but this process restricts the domain of interest and introduces discretiz…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: ICML 2025. Code: https://github.com/ewsn1593/HDRQ