
Showing 1–50 of 1,483 results for author: Kim, D

Searching in archive cs.

  1. arXiv:2507.06125  [pdf, ps, other]

    cs.LG cs.AI

    Subspace-based Approximate Hessian Method for Zeroth-Order Optimization

    Authors: Dongyoon Kim, Sungjae Lee, Wonjin Lee, Kwang In Kim

    Abstract: Zeroth-order optimization addresses problems where gradient information is inaccessible or impractical to compute. While most existing methods rely on first-order approximations, incorporating second-order (curvature) information can, in principle, significantly accelerate convergence. However, the high cost of function evaluations required to estimate Hessian matrices often limits practical appli…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: 20 pages, 8 figures

  2. arXiv:2507.04772  [pdf, ps, other]

    cs.AR

    Jack Unit: An Area- and Energy-Efficient Multiply-Accumulate (MAC) Unit Supporting Diverse Data Formats

    Authors: Seock-Hwan Noh, Sungju Kim, Seohyun Kim, Daehoon Kim, Jaeha Kung, Yeseong Kim

    Abstract: In this work, we introduce an area- and energy-efficient multiply-accumulate (MAC) unit, named Jack unit, that is a jack-of-all-trades, supporting various data formats such as integer (INT), floating point (FP), and microscaling data format (MX). It provides bit-level flexibility and enhances hardware efficiency by i) replacing the carry-save multiplier (CSM) in the FP multiplier with a precision-…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Accepted for publication at the 30th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED 2025)

  3. arXiv:2507.04748  [pdf, ps, other]

    cs.AI

    LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction

    Authors: Sungmin Lee, Minju Kang, Joonhee Lee, Seungyong Lee, Dongju Kim, Jingi Hong, Jun Shin, Pei Zhang, JeongGil Ko

    Abstract: Question-answering (QA) interfaces powered by large language models (LLMs) present a promising direction for improving interactivity with HVAC system insights, particularly for non-expert users. However, enabling accurate, real-time, and context-aware interactions with HVAC systems introduces unique challenges, including the integration of frequently updated sensor data, domain-specific knowledge…

    Submitted 7 July, 2025; originally announced July 2025.

  4. arXiv:2507.02910  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Causal-Paced Deep Reinforcement Learning

    Authors: Geonwoo Cho, Jaegyun Im, Doyoon Kim, Sundong Kim

    Abstract: Designing effective task sequences is crucial for curriculum reinforcement learning (CRL), where agents must gradually acquire skills by training on intermediate tasks. A key challenge in CRL is to identify tasks that promote exploration, yet are similar enough to support effective transfer. While a recent approach suggests comparing tasks via their Structural Causal Models (SCMs), the method requir…

    Submitted 24 June, 2025; originally announced July 2025.

    Comments: Workshop on Causal Reinforcement Learning, Reinforcement Learning Conference (RLC) 2025

  5. arXiv:2507.02302  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.LG

    DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning

    Authors: Dohoon Kim, Donghun Kang, Taesup Moon

    Abstract: Domain-Adaptive Pre-training (DAP) has recently gained attention for its effectiveness in fine-tuning pre-trained models. Building on this, continual DAP has been explored to develop pre-trained models capable of incrementally incorporating different domain datasets. However, existing continual DAP methods face several limitations: (1) high computational cost and GPU memory usage during training;…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 22 pages, 5 figures, ACL 2025 Main

  6. arXiv:2507.01409  [pdf, ps, other]

    cs.CV

    CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning

    Authors: Kuniaki Saito, Donghyun Kim, Kwanyong Park, Atsushi Hashimoto, Yoshitaka Ushiku

    Abstract: An image captioning model flexibly switching its language pattern, e.g., descriptiveness and length, should be useful since it can be applied to diverse applications. However, despite the dramatic improvement in generative vision-language models, fine-grained control over the properties of generated captions is not easy due to two reasons: (i) existing models are not given the properties as a cond…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted to ICCV2025

  7. arXiv:2507.01333  [pdf, ps, other]

    cs.NI cs.IT

    Multi-User Generative Semantic Communication with Intent-Aware Semantic-Splitting Multiple Access

    Authors: Jiayi Lu, Wanting Yang, Zehui Xiong, Rahim Tafazolli, Tony Q. S. Quek, Mérouane Debbah, Dong In Kim

    Abstract: With the booming development of generative artificial intelligence (GAI), semantic communication (SemCom) has emerged as a new paradigm for reliable and efficient communication. This paper considers a multi-user downlink SemCom system, using vehicular networks as the representative scenario for multi-user content dissemination. To address diverse yet overlapping user demands, we propose a multi-us…

    Submitted 1 July, 2025; originally announced July 2025.

  8. arXiv:2507.00459  [pdf]

    cond-mat.mtrl-sci cs.AI

    Process-aware and high-fidelity microstructure generation using stable diffusion

    Authors: Hoang Cuong Phan, Minh Tien Tran, Chihun Lee, Hoheok Kim, Sehyok Oh, Dong-Kyu Kim, Ho Won Lee

    Abstract: Synthesizing realistic microstructure images conditioned on processing parameters is crucial for understanding process-structure relationships in materials design. However, this task remains challenging due to limited training micrographs and the continuous nature of processing variables. To overcome these challenges, we present a novel process-aware generative modeling approach based on Stable Di…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 46 pages, 13 figures, 5 tables, 3rd World Congress on Artificial Intelligence in Materials & Manufacturing 2025

  9. arXiv:2506.23547  [pdf, ps, other]

    cs.CV

    Oneta: Multi-Style Image Enhancement Using Eigentransformation Functions

    Authors: Jiwon Kim, Soohyun Hwang, Dong-O Kim, Changsu Han, Min Kyu Park, Chang-Su Kim

    Abstract: The first algorithm, called Oneta, for a novel task of multi-style image enhancement is proposed in this work. Oneta uses two point operators sequentially: intensity enhancement with a transformation function (TF) and color correction with a color correction matrix (CCM). This two-step enhancement model, though simple, achieves a high performance upper bound. Also, we introduce eigentransformation…

    Submitted 30 June, 2025; originally announced June 2025.

  10. arXiv:2506.23102  [pdf, ps, other]

    eess.IV cs.CV

    MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation

    Authors: Sunggu Kyung, Jinyoung Seo, Hyunseok Lim, Dongyeong Kim, Hyungbin Park, Jimin Sung, Jihyun Kim, Wooyoung Jo, Yoojin Nam, Namkug Kim

    Abstract: The recent release of RadGenome-Chest CT has significantly advanced CT-based report generation. However, existing methods primarily focus on global features, making it challenging to capture region-specific details, which may cause certain abnormalities to go unnoticed. To address this, we propose MedRegion-CT, a region-focused Multi-Modal Large Language Model (MLLM) framework, featuring three key…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 14 pages, 5 figures, submitted to ICCV 2025

  11. arXiv:2506.22762  [pdf, ps, other]

    cs.CV

    VSRM: A Robust Mamba-Based Framework for Video Super-Resolution

    Authors: Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim

    Abstract: Video super-resolution remains a major challenge in low-level vision tasks. To date, CNN- and Transformer-based methods have delivered impressive results. However, CNNs are limited by local receptive fields, while Transformers struggle with quadratic complexity, posing challenges for processing long sequences in VSR. Recently, Mamba has drawn attention for its long-sequence modeling, linear comple…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  12. arXiv:2506.21222  [pdf, ps, other]

    cs.CL cs.IR

    Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval

    Authors: Yongchan Chun, Minhyuk Kim, Dongjun Kim, Chanjun Park, Heuiseok Lim

    Abstract: Automatic Term Extraction (ATE) identifies domain-specific expressions that are crucial for downstream tasks such as machine translation and information retrieval. Although large language models (LLMs) have significantly advanced various NLP tasks, their potential for ATE has scarcely been examined. We propose a retrieval-based prompting strategy that, in the few-shot setting, selects demonstratio…

    Submitted 26 June, 2025; originally announced June 2025.

  13. arXiv:2506.20090  [pdf, ps, other]

    cs.LG

    A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression

    Authors: Ainaz Jamshidi, Dongchan Kim, Muhammad Arif

    Abstract: Predictive maintenance (PdM) has become a crucial element of modern industrial practice. PdM plays a significant role in operational dependability and cost management by decreasing unforeseen downtime and optimizing asset life cycle management. Machine learning and deep learning have enabled more precise forecasts of equipment failure and remaining useful life (RUL). Although many studies have bee…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 13 pages, 7 figures

  14. arXiv:2506.19217  [pdf, ps, other]

    cs.CV cs.AI

    MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports

    Authors: Sunggu Kyung, Hyungbin Park, Jinyoung Seo, Jimin Sung, Jihyun Kim, Dongyeong Kim, Wooyoung Jo, Yoojin Nam, Sangah Park, Taehee Kwon, Sang Min Lee, Namkug Kim

    Abstract: Computed Tomography (CT) plays a crucial role in clinical diagnosis, but the growing demand for CT examinations has raised concerns about diagnostic errors. While Multimodal Large Language Models (MLLMs) demonstrate promising comprehension of medical knowledge, their tendency to produce inaccurate information highlights the need for rigorous validation. However, existing medical visual question an…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 14 pages, 5 figures, submitted to CVPR 2025

  15. arXiv:2506.18557  [pdf, ps, other]

    cs.CV

    Object-aware Sound Source Localization via Audio-Visual Scene Understanding

    Authors: Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim

    Abstract: The audio-visual sound source localization task aims to spatially localize sound-making objects within visual scenes by integrating visual and audio cues. However, existing methods struggle with accurately localizing sound-making objects in complex scenes, particularly when visually similar silent objects coexist. This limitation arises primarily from their reliance on simple audio-visual corresponden…

    Submitted 23 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted at CVPR 2025

    Journal ref: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 8342-8351

  16. arXiv:2506.18335  [pdf, ps, other]

    eess.IV cs.CV

    Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention

    Authors: Saad Wazir, Daeyoung Kim

    Abstract: Segmenting biomarkers in medical images is crucial for various biotech applications. Despite advances, Transformer and CNN based methods often struggle with variations in staining and morphology, limiting feature extraction. In medical image segmentation, where datasets often have limited sample availability, recent state-of-the-art (SOTA) methods achieve higher accuracy by leveraging pre-trained…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 30861-30871

  17. arXiv:2506.17756  [pdf, ps, other]

    cond-mat.mtrl-sci cs.AI

    Residual Connection-Enhanced ConvLSTM for Lithium Dendrite Growth Prediction

    Authors: Hosung Lee, Byeongoh Hwang, Dasan Kim, Myungjoo Kang

    Abstract: The growth of lithium dendrites significantly impacts the performance and safety of rechargeable batteries, leading to short circuits and capacity degradation. This study proposes a Residual Connection-Enhanced ConvLSTM model to predict dendrite growth patterns with improved accuracy and computational efficiency. By integrating residual connections into ConvLSTM, the model mitigates the vanishing…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 14 pages, 6 figures, accepted to Journal of The Electrochemical Society

  18. arXiv:2506.17251  [pdf, ps, other]

    cs.LG cs.AI

    Training-free LLM Verification via Recycling Few-shot Examples

    Authors: Dongseok Lee, Jimyung Hong, Dongyoung Kim, Jaehyung Kim

    Abstract: Although LLMs have achieved remarkable performance, the inherent stochasticity of their reasoning process and varying conclusions present significant challenges. Majority voting or Best-of-N with external verification models has been explored to find the most promising solution among multiple LLM outputs. However, these approaches have certain limitations, such as limited applicability or the cost…

    Submitted 8 June, 2025; originally announced June 2025.

  19. arXiv:2506.15601  [pdf, ps, other]

    cs.AR

    CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies

    Authors: Donghyun Gouk, Seungkwan Kang, Seungjun Lee, Jiseon Kim, Kyungkuk Nam, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Hanyeoreum Bae, Myoungsoo Jung

    Abstract: This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). We developed and siliconized a custom CXL controller integrated at the hardware RTL level, achieving two-digit nanosecond roundtrip latency, the first in the field. This study also includes speculative read…

    Submitted 18 June, 2025; originally announced June 2025.

  20. arXiv:2506.14539  [pdf, ps, other]

    cs.AI cs.CR

    Doppelganger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack

    Authors: Daewon Kang, YeongHwan Shin, Doyeon Kim, Kyu-Hwan Jung, Meong Hi Son

    Abstract: Since the advent of large language models, prompt engineering now enables the rapid, low-effort creation of diverse autonomous agents that are already in widespread use. Yet this convenience raises urgent concerns about the safety, robustness, and behavioral consistency of the underlying prompts, along with the pressing challenge of preventing those prompts from being exposed to user's attempts. I…

    Submitted 26 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

  21. arXiv:2506.14107  [pdf, ps, other]

    cs.DC cs.CV

    Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse

    Authors: Jinwoo Hwang, Daeun Kim, Sangyeop Lee, Yoonsung Kim, Guseul Heo, Hojoon Kim, Yunseok Jeong, Tadiwos Meaza, Eunhyeok Park, Jeongseob Ahn, Jongse Park

    Abstract: Recently, Video-Language Models (VideoLMs) have demonstrated remarkable capabilities, offering significant potential for flexible and powerful video query systems. These models typically rely on Vision Transformers (ViTs), which process video frames individually to extract visual embeddings. However, generating embeddings for large-scale videos requires ViT inferencing across numerous frames, posi…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted to 2025 VLDB

  22. arXiv:2506.13771  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    LittleBit: Ultra Low-Bit Quantization via Latent Factorization

    Authors: Banseok Lee, Dongkyu Kim, Youngcheon You, Youngmin Kim

    Abstract: Deploying large language models (LLMs) often faces challenges from substantial memory and computational costs. Quantization offers a solution, yet performance degradation in the sub-1-bit regime remains particularly difficult. This paper introduces LittleBit, a novel method for extreme LLM compression. It targets levels like 0.1 bits per weight (BPW), achieving nearly 31$\times$ memory reduction,…

    Submitted 30 May, 2025; originally announced June 2025.

  23. Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark

    Authors: Suyeon Kim, SeongKu Kang, Dongwoo Kim, Jungseul Ok, Hwanjo Yu

    Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art performance in node classification tasks but struggle with label noise in real-world data. Existing studies on graph learning with label noise commonly rely on class-dependent label noise, overlooking the complexities of instance-dependent noise and falling short of capturing real-world corruption patterns. We introduce BeGIN (Benchmarkin…

    Submitted 16 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

    Comments: 12 pages

    Journal ref: KDD 2025

  24. arXiv:2506.11815  [pdf, ps, other]

    eess.SP cs.AI cs.LG eess.IV

    Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection

    Authors: Tae-Seong Han, Jae-Wook Heo, Hakseung Kim, Cheol-Hui Lee, Hyub Huh, Eue-Keun Choi, Dong-Joo Kim

    Abstract: Electrocardiography (ECG) signals are often degraded by noise, which complicates diagnosis in clinical and wearable settings. This study proposes a diffusion-based framework for ECG noise quantification via reconstruction-based anomaly detection, addressing annotation inconsistencies and the limited generalizability of conventional methods. We introduce a distributional evaluation using the Wasser…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: This manuscript contains 17 pages, 10 figures, and 3 tables

  25. arXiv:2506.11578  [pdf, ps, other]

    cs.AI

    Collaborative LLM Inference via Planning for Efficient Reasoning

    Authors: Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Jinwoo Shin

    Abstract: Large language models (LLMs) excel at complex reasoning tasks, but those with strong capabilities (e.g., whose numbers of parameters are larger than 100B) are often accessible only through paid APIs, making them too costly for applications of frequent use. In contrast, smaller open-sourced LLMs (e.g., whose numbers of parameters are less than 3B) are freely available and easy to deploy locally (e.…

    Submitted 13 June, 2025; originally announced June 2025.

  26. arXiv:2506.11098  [pdf, ps, other]

    cs.LG cs.AI

    Debiasing Online Preference Learning via Preference Feature Preservation

    Authors: Dongyoung Kim, Jinsung Yoon, Jinwoo Shin, Jaehyung Kim

    Abstract: Recent preference learning frameworks for large language models (LLMs) simplify human preferences with binary pairwise comparisons and scalar rewards. This simplification could make LLMs' responses biased to mostly preferred features, and would be exacerbated during the iterations of online preference learning steps. To address these challenges, we propose a novel framework coined PFP (Preference…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 20 pages, 20 figures

  27. arXiv:2506.10612  [pdf, ps, other]

    cs.CV cs.AI

    TexTailor: Customized Text-aligned Texturing via Effective Resampling

    Authors: Suin Lee, Dae-Shik Kim

    Abstract: We present TexTailor, a novel method for generating consistent object textures from textual descriptions. Existing text-to-texture synthesis approaches utilize depth-aware diffusion models to progressively generate images and synthesize textures across predefined multiple viewpoints. However, these approaches lead to a gradual shift in texture properties across viewpoints due to (1) insufficient i…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Submitted to ICLR 2025

    ACM Class: I.2.10

  28. arXiv:2506.08964  [pdf, other]

    cs.CV

    ORIDa: Object-centric Real-world Image Composition Dataset

    Authors: Jinwoo Kim, Sangmin Han, Jinho Jeong, Jiwoo Choi, Dongyoung Kim, Seon Joo Kim

    Abstract: Object compositing, the task of placing and harmonizing objects in images of diverse visual scenes, has become an important task in computer vision with the rise of generative models. However, existing datasets lack the diversity and scale required to comprehensively explore real-world scenarios. We introduce ORIDa (Object-centric Real-world Image Composition Dataset), a large-scale, real-captured…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Accepted at CVPR 2025

  29. arXiv:2506.07984  [pdf, ps, other]

    cs.CV cs.LG

    CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray

    Authors: Mingquan Lin, Gregory Holste, Song Wang, Yiliang Zhou, Yishu Wei, Imon Banerjee, Pengyi Chen, Tianjie Dai, Yuexi Du, Nicha C. Dvornek, Yuyan Ge, Zuowei Guo, Shouhei Hanaoka, Dongkyun Kim, Pablo Messina, Yang Lu, Denis Parra, Donghyun Son, Álvaro Soto, Aisha Urooj, René Vidal, Yosuke Yamagishi, Zefan Yang, Ruichi Zhang, Yang Zhou, et al. (8 additional authors not shown)

    Abstract: The CXR-LT series is a community-driven initiative designed to enhance lung disease classification using chest X-rays (CXR). It tackles challenges in open long-tailed lung disease classification and enhances the measurability of state-of-the-art techniques. The first event, CXR-LT 2023, aimed to achieve these goals by providing high-quality benchmark CXR data for model development and conducting c…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 17 pages, 3 figures

  30. arXiv:2506.07750  [pdf, ps, other]

    cs.CV

    Difference Inversion: Interpolate and Isolate the Difference with Token Consistency for Image Analogy Generation

    Authors: Hyunsoo Kim, Donghyun Kim, Suhyun Kim

    Abstract: How can we generate an image B' that satisfies A:A'::B:B', given the input images A, A' and B? Recent works have tackled this challenge through approaches like visual in-context learning or visual instruction. However, these methods are typically limited to specific models (e.g., InstructPix2Pix, inpainting models) rather than general diffusion models (e.g., Stable Diffusion, SDXL). This dependency m…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Published at CVPR 2025

  31. arXiv:2506.07548  [pdf, ps, other]

    cs.AI cs.RO

    Curriculum Learning With Counterfactual Group Relative Policy Advantage For Multi-Agent Reinforcement Learning

    Authors: Weiqiang Jin, Hongyang Du, Guizhong Liu, Dong In Kim

    Abstract: Multi-agent reinforcement learning (MARL) has achieved strong performance in cooperative adversarial tasks. However, most existing methods typically train agents against fixed opponent strategies and rely on such meta-static difficulty conditions, which limits their adaptability to changing environments and often leads to suboptimal policies. Inspired by the success of curriculum learning (CL) in…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 16 pages; 12 figures

  32. arXiv:2506.07205  [pdf, other]

    cs.CV

    TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation

    Authors: Min-Jung Kim, Dongjin Kim, Seokju Yun, Jaegul Choo

    Abstract: Video editing has garnered increasing attention alongside the rapid progress of diffusion-based video generation models. As part of these advancements, there is a growing demand for more accessible and controllable forms of video editing, such as prompt-based editing. Previous studies have primarily focused on tasks such as style transfer, background replacement, object substitution, and attribute…

    Submitted 8 June, 2025; originally announced June 2025.

  33. arXiv:2506.05637  [pdf, ps, other]

    cs.IT eess.SP

    Joint User Association and Beamforming Design for ISAC Networks with Large Language Models

    Authors: Haoyun Li, Ming Xiao, Kezhi Wang, Robert Schober, Dong In Kim, Yong Liang Guan

    Abstract: Integrated sensing and communication (ISAC) has been envisioned to play a more important role in future wireless networks. However, the design of ISAC networks is challenging, especially when there are multiple communication and sensing (C&S) nodes and multiple sensing targets. We investigate a multi-base station (BS) ISAC network in which multiple BSs equipped with multiple antennas simultaneous…

    Submitted 5 June, 2025; originally announced June 2025.

  34. arXiv:2506.05340  [pdf, ps, other]

    cs.LG cs.AI

    Exploring Diffusion Transformer Designs via Grafting

    Authors: Keshigeyan Chandrasegaran, Michael Poli, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei

    Abstract: Designing model architectures requires decisions such as selecting operators (e.g., attention, convolution) and configurations (e.g., depth, width). However, evaluating the impact of these decisions on model quality requires costly pretraining, limiting architectural investigation. Inspired by how new software is built on existing code, we ask: can new architecture designs be studied using pretrai…

    Submitted 6 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: 22 pages; Project website: https://grafting.stanford.edu

  35. arXiv:2506.05184  [pdf, other]

    cs.CV

    Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis

    Authors: Neeraj Kumar, Swaraj Nanda, Siddharth Singi, Jamal Benhamida, David Kim, Jie-Fu Chen, Amir Momeni-Boroujeni, Gregory M. Goldgof, Gabriele Campanella, Chad Vanderbilt

    Abstract: Pathology foundation models (PFMs) have emerged as powerful tools for analyzing whole slide images (WSIs). However, adapting these pretrained PFMs for specific clinical tasks presents considerable challenges, primarily due to the availability of only weak (WSI-level) labels for gigapixel images, necessitating a multiple instance learning (MIL) paradigm for effective WSI analysis. This paper proposes…

    Submitted 5 June, 2025; originally announced June 2025.

  36. arXiv:2506.04694  [pdf, ps, other]

    cs.LG cs.AI

    Influence Functions for Edge Edits in Non-Convex Graph Neural Networks

    Authors: Jaeseung Heo, Kyeongheung Yun, Seokwon Yoon, MoonJeong Park, Jungseul Ok, Dongwoo Kim

    Abstract: Understanding how individual edges influence the behavior of graph neural networks (GNNs) is essential for improving their interpretability and robustness. Graph influence functions have emerged as promising tools to efficiently estimate the effects of edge deletions without retraining. However, existing influence prediction methods rely on strict convexity assumptions, exclusively consider the in…

    Submitted 5 June, 2025; originally announced June 2025.

  37. arXiv:2506.04653  [pdf, ps, other]

    cs.LG

    The Oversmoothing Fallacy: A Misguided Narrative in GNN Research

    Authors: MoonJeong Park, Sunghyun Choi, Jaeseung Heo, Eunhyeok Park, Dongwoo Kim

    Abstract: Oversmoothing has been recognized as a main obstacle to building deep Graph Neural Networks (GNNs), limiting the performance. This position paper argues that the influence of oversmoothing has been overstated and advocates for a further exploration of deep GNN architectures. Given the three core operations of GNNs, aggregation, linear transformation, and non-linear activation, we show that prior s…

    Submitted 5 June, 2025; originally announced June 2025.

  38. arXiv:2506.03259  [pdf]

    cs.CL

    Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems

    Authors: Michael E. Garcia-Alcoser, Mobina GhojoghNejad, Fakrul Islam Tushar, David Kim, Kyle J. Lafata, Geoffrey D. Rubin, Joseph Y. Lo

    Abstract: Purpose: This study aims to evaluate the effectiveness of large language models (LLMs) in automating disease annotation of CT radiology reports. We compare a rule-based algorithm (RBA), RadBERT, and three lightweight open-weight LLMs for multi-disease labeling of chest, abdomen, and pelvis (CAP) CT reports. Materials and Methods: This retrospective study analyzed 40,833 CT reports from 29,540 pa…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 23 pages, 10 figures, to be submitted in Radiology: Artificial Intelligence

    ACM Class: I.2.7

  39. arXiv:2506.00910  [pdf, other]

    cs.LG cs.AI

    PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models

    Authors: Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Dongseop Kim, Sung Ju Hwang

    Abstract: Knowledge distillation (KD) is a widely used framework for training compact, task-specific models by leveraging the knowledge of teacher models. However, its application to active learning (AL), which aims to minimize annotation costs through iterative sample selection, remains underexplored. This gap stems from the fact that KD typically assumes access to sufficient labeled data, whereas AL opera…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 35 pages, 30 figures

  40. arXiv:2506.00417  [pdf, ps, other]

    cs.AI

    World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks

    Authors: Changyuan Zhao, Ruichen Zhang, Jiacheng Wang, Gaosheng Zhao, Dusit Niyato, Geng Sun, Shiwen Mao, Dong In Kim

    Abstract: World models are emerging as a transformative paradigm in artificial intelligence, enabling agents to construct internal representations of their environments for predictive reasoning, planning, and decision-making. By learning latent dynamics, world models provide a sample-efficient framework that is especially valuable in data-constrained or safety-critical scenarios. In this paper, we present a…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: 7 pages, 4 figures

  41. arXiv:2506.00386  [pdf, ps, other]

    cs.CL cs.HC

    Adaptive-VP: A Framework for LLM-Based Virtual Patients that Adapts to Trainees' Dialogue to Facilitate Nurse Communication Training

    Authors: Keyeun Lee, Seolhee Lee, Esther Hehsun Kim, Yena Ko, Jinsu Eun, Dahee Kim, Hyewon Cho, Haiyi Zhu, Robert E. Kraut, Eunyoung Suh, Eun-mee Kim, Hajin Lim

    Abstract: Effective communication training is essential to preparing nurses for high-quality patient care. While standardized patient (SP) simulations provide valuable experiential learning, they are often costly and inflexible. Virtual patient (VP) systems offer a scalable alternative, but most fail to adapt to the varying communication skills of trainees. In particular, when trainees respond ineffectively…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings, 34 pages, 9 figures

  42. arXiv:2506.00070  [pdf, ps, other]

    cs.RO cs.AI

    Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

    Authors: Dongyoung Kim, Sumin Park, Huiwon Jang, Jinwoo Shin, Jaehyung Kim, Younggyo Seo

    Abstract: Large Vision-Language Models (LVLMs) have recently shown great promise in advancing robotics by combining embodied reasoning with robot control. A common approach involves training on embodied reasoning tasks related to robot control using Supervised Fine-Tuning (SFT). However, SFT datasets are often heuristically constructed and not explicitly optimized for improving robot control. Furthermore, S…

    Submitted 29 May, 2025; originally announced June 2025.

    Comments: 26 pages, 14 figures

  43. arXiv:2505.24209  [pdf, other]

    cs.RO

    Safety-Aware Robust Model Predictive Control for Robotic Arms in Dynamic Environments

    Authors: Sanghyeon Nam, Dongmin Kim, Seung-Hwan Choi, Chang-Hyun Kim, Hyoeun Kwon, Hiroaki Kawamoto, Suwoong Lee

    Abstract: Robotic manipulators are essential for precise industrial pick-and-place operations, yet planning collision-free trajectories in dynamic environments remains challenging due to uncertainties such as sensor noise and time-varying delays. Conventional control methods often fail under these conditions, motivating the development of Robust MPC (RMPC) strategies with constraint tightening. In this pape…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted to the CASE 2025 conference

  44. arXiv:2505.23941  [pdf, ps, other]

    cs.LG cs.CV

    Vision Language Models are Biased

    Authors: An Vo, Khai-Nguyen Nguyen, Mohammad Reza Taesiri, Vy Tuong Dang, Anh Totti Nguyen, Daeyoung Kim

    Abstract: Large language models (LLMs) memorize a vast amount of prior knowledge from the Internet that helps them on downstream tasks but also may notoriously sway their outputs towards wrong or biased answers. In this work, we test how the knowledge about popular subjects hurts the accuracy of vision language models (VLMs) on standard, objective visual tasks of counting and identification. We find that stat…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Code and qualitative examples are available at: vlmsarebiased.github.io

  45. arXiv:2505.23806  [pdf, ps, other]

    cs.CL cs.AI

    MedOrchestra: A Hybrid Cloud-Local LLM Approach for Clinical Data Interpretation

    Authors: Sihyeon Lee, Hyunjoo Song, Jong-chan Lee, Yoon Jin Lee, Boram Lee, Hee-Eon Lim, Dongyeong Kim, Jinwook Seo, Bohyoung Kim

    Abstract: Deploying large language models (LLMs) in clinical settings faces critical trade-offs: cloud LLMs, with their extensive parameters and superior performance, pose risks to sensitive clinical data privacy, while local LLMs preserve privacy but often fail at complex clinical interpretation tasks. We propose MedOrchestra, a hybrid framework where a cloud LLM decomposes complex clinical tasks into mana…

    Submitted 27 May, 2025; originally announced May 2025.

  46. arXiv:2505.23006  [pdf, ps, other]

    cs.CL cs.AI

    A Practical Approach for Building Production-Grade Conversational Agents with Workflow Graphs

    Authors: Chiwan Park, Wonjun Jang, Daeryong Kim, Aelim Ahn, Kichang Yang, Woosung Hwang, Jihyeon Roh, Hyerin Park, Hyosun Wang, Min Seok Kim, Jihoon Kang

    Abstract: The advancement of Large Language Models (LLMs) has led to significant improvements in various service domains, including search, recommendation, and chatbot applications. However, applying state-of-the-art (SOTA) research to industrial settings presents challenges, as it requires maintaining flexible conversational abilities while also strictly complying with service-specific constraints. This ca…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to ACL 2025 Industry Track. 12 pages, 5 figures

    ACM Class: I.2.7

  47. arXiv:2505.22343  [pdf, ps, other]

    eess.SP cs.AI

    Empowering Intelligent Low-altitude Economy with Large AI Model Deployment

    Authors: Zhonghao Lyu, Yulan Gao, Junting Chen, Hongyang Du, Jie Xu, Kaibin Huang, Dong In Kim

    Abstract: Low-altitude economy (LAE) represents an emerging economic paradigm that redefines commercial and social aerial activities. Large artificial intelligence models (LAIMs) offer transformative potential to further enhance the intelligence of LAE services. However, deploying LAIMs in LAE poses several challenges, including the significant gap between their computational/storage demands and the limited…

    Submitted 3 July, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  48. arXiv:2505.20109  [pdf, other]

    cs.CL cs.AI

    Language-Agnostic Suicidal Risk Detection Using Large Language Models

    Authors: June-Woo Kim, Wonkyo Oh, Haram Yoon, Sung-Hoon Yoon, Dae-Jin Kim, Dong-Ho Lee, Sang-Yeol Lee, Chan-Mo Yang

    Abstract: Suicidal risk detection in adolescents is a critical challenge, yet existing methods rely on language-specific models, limiting scalability and generalization. This study introduces a novel language-agnostic framework for suicidal risk assessment with large language models (LLMs). We generate Chinese transcripts from speech using an ASR model and then employ LLMs with prompt-based queries to extra…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Accepted to InterSpeech 2025

  49. arXiv:2505.18817  [pdf, ps, other]

    physics.comp-ph cs.AI

    High-order Equivariant Flow Matching for Density Functional Theory Hamiltonian Prediction

    Authors: Seongsu Kim, Nayoung Kim, Dongwoo Kim, Sungsoo Ahn

    Abstract: Density functional theory (DFT) is a fundamental method for simulating quantum chemical properties, but it remains expensive due to the iterative self-consistent field (SCF) process required to solve the Kohn-Sham equations. Recently, deep learning methods are gaining attention as a way to bypass this step by directly predicting the Hamiltonian. However, they rely on deterministic regression and d…

    Submitted 24 May, 2025; originally announced May 2025.

  50. arXiv:2505.18545  [pdf, other]

    cs.LG cs.CL

    B-score: Detecting biases in large language models using response history

    Authors: An Vo, Mohammad Reza Taesiri, Daeyoung Kim, Anh Totti Nguyen

    Abstract: Large language models (LLMs) often exhibit strong biases, e.g., against women or in favor of the number 7. We investigate whether LLMs would be able to output less biased answers when allowed to observe their prior answers to the same question in a multi-turn conversation. To understand which types of questions invite more biased answers, we test LLMs on our proposed set of questions that span 9 to…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML 2025 (Main track)