
Showing 1–50 of 1,582 results for author: Kim, D

  1. arXiv:2509.26436

    cs.CV

    Post-Training Quantization via Residual Truncation and Zero Suppression for Diffusion Models

    Authors: Donghoon Kim, Dongyoung Lee, Ik Joon Chang, Sung-Ho Bae

    Abstract: Diffusion models achieve high-quality image generation but face deployment challenges due to their high computational requirements. Although 8-bit outlier-aware post-training quantization (PTQ) matches full-precision performance, extending PTQ to 4 bits remains challenging. Larger step sizes in 4-bit quantization amplify rounding errors in dense, low-magnitude activations, leading to the loss of f…

    Submitted 30 September, 2025; originally announced September 2025.

  2. arXiv:2509.25739

    cs.CV

    LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion

    Authors: Donghwan Kim, Tae-Kyun Kim

    Abstract: We tackle the problem of Human Mesh Recovery (HMR) from a single RGB image, formulating it as an image-conditioned human pose and shape generation. While recovering 3D human pose from 2D observations is inherently ambiguous, most existing approaches have regressed a single deterministic output. Probabilistic methods attempt to address this by generating multiple plausible outputs to model the ambi…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 17 pages, 13 figures

  3. arXiv:2509.25504

    cs.HC cs.AI cs.GR cs.SE

    XR Blocks: Accelerating Human-centered AI + XR Innovation

    Authors: David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongyi Zhou, Evgenii Alekseev, Geonsun Lee, Alex Cooper, Min Xia, Scott Chung, Jeremy Nelson, Xiuxiu Yuan, Jolica Dias, Tim Bettridge, Benjamin Hersh, Michelle Huynh, Konrad Piascik, Ricardo Cabello, David Kim, Ruofei Du

    Abstract: We are on the cusp where Artificial Intelligence (AI) and Extended Reality (XR) are converging to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like JAX and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction p…

    Submitted 29 September, 2025; originally announced September 2025.

    Report number: d343857f-8888-4790-b03c-664e952bf8b1 ACM Class: H.5.1; D.2.2; H.5.m; D.2.m

  4. arXiv:2509.24240

    cs.CR

    Takedown: How It's Done in Modern Coding Agent Exploits

    Authors: Eunkyu Lee, Donghyeon Kim, Wonyoung Kim, Insu Yun

    Abstract: Coding agents, which are LLM-driven agents specialized in software development, have become increasingly prevalent in modern programming environments. Unlike traditional AI coding assistants, which offer simple code completion and suggestions, modern coding agents tackle more complex tasks with greater autonomy, such as generating entire programs from natural language instructions. To enable such…

    Submitted 28 September, 2025; originally announced September 2025.

  5. arXiv:2509.24192

    cs.CV cs.AI

    Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection

    Authors: Sojung An, Kwanyong Park, Yong Jae Lee, Donghyun Kim

    Abstract: While vision-language models (VLMs) have made significant progress in multimodal perception (e.g., open-vocabulary object detection) with simple language queries, state-of-the-art VLMs still show limited ability to perceive complex queries involving descriptive attributes and relational clauses. Our in-depth analysis shows that these limitations mainly stem from text encoders in VLMs. Such text en…

    Submitted 28 September, 2025; originally announced September 2025.

    Comments: 23 pages, 17 figures

  6. TREAT-Net: Tabular-Referenced Echocardiography Analysis for Acute Coronary Syndrome Treatment Prediction

    Authors: Diane Kim, Minh Nguyen Nhat To, Sherif Abdalla, Teresa S. M. Tsang, Purang Abolmaesumi, Christina Luong

    Abstract: Coronary angiography remains the gold standard for diagnosing Acute Coronary Syndrome (ACS). However, its resource-intensive and invasive nature can expose patients to procedural risks and diagnostic delays, leading to postponed treatment initiation. In this work, we introduce TREAT-Net, a multimodal deep learning framework for ACS treatment prediction that leverages non-invasive modalities, inclu…

    Submitted 28 September, 2025; originally announced September 2025.

    Comments: 11 pages, 2 figures, MICCAI ASMUS 2025 paper

    Journal ref: Simplifying Medical Ultrasound (ASMUS 2025), LNCS 16165, Springer, 2026

  7. arXiv:2509.22263

    cs.LG

    Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning

    Authors: Nakyeong Yang, Dong-Kyum Kim, Jea Kwon, Minsung Kim, Kyomin Jung, Meeyoung Cha

    Abstract: Large language models trained on web-scale data can memorize private or sensitive knowledge, raising significant privacy risks. Although some unlearning methods mitigate these risks, they remain vulnerable to "relearning" during subsequent training, allowing a substantial portion of forgotten knowledge to resurface. In this paper, we show that widely used unlearning methods cause shallow alignment…

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 15 pages

  8. arXiv:2509.21993

    cs.AI cs.LG

    Bilinear relational structure fixes reversal curse and enables consistent model editing

    Authors: Dong-Kyum Kim, Minsung Kim, Jea Kwon, Nakyeong Yang, Meeyoung Cha

    Abstract: The reversal curse -- a language model's (LM) inability to infer an unseen fact "B is A" from a learned fact "A is B" -- is widely considered a fundamental limitation. We show that this is not an inherent failure but an artifact of how models encode knowledge. By training LMs from scratch on a synthetic dataset of relational knowledge graphs, we demonstrate that bilinear relational structure e…

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 9 pages

  9. arXiv:2509.21578

    cs.LG stat.ML

    Interpretable time series analysis with Gumbel dynamics

    Authors: Yiliu Wang, Timothy Doyeon Kim, Eric Shea-Brown, Uygar Sümbül

    Abstract: Switching dynamical systems can model complicated time series data while maintaining interpretability by inferring a finite set of dynamics primitives and explaining different portions of the observed time series with one of these primitives. However, due to the discrete nature of this set, such models struggle to capture smooth, variable-speed transitions, as well as stochastic mixtures of overla…

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: 15 pages, 5 figures

  10. arXiv:2509.20842

    cs.LG cs.AI

    Robust Multi-Omics Integration from Incomplete Modalities Significantly Improves Prediction of Alzheimer's Disease

    Authors: Sungjoon Park, Kyungwook Lee, Soorin Yim, Doyeong Hwang, Dongyun Kim, Soonyoung Lee, Amy Dunn, Daniel Gatti, Elissa Chesler, Kristen O'Connell, Kiyoung Kim

    Abstract: Multi-omics data capture complex biomolecular interactions and provide insights into metabolism and disease. However, missing modalities hinder integrative analysis across heterogeneous omics. To address this, we present MOIRA (Multi-Omics Integration with Robustness to Absent modalities), an early integration method enabling robust learning from incomplete omics data via representation alignment…

    Submitted 25 September, 2025; originally announced September 2025.

    ACM Class: I.2.1; J.3

  11. arXiv:2509.20783

    cs.LG cs.AI

    IConv: Focusing on Local Variation with Channel Independent Convolution for Multivariate Time Series Forecasting

    Authors: Gawon Lee, Hanbyeol Park, Minseop Kim, Dohee Kim, Hyerim Bae

    Abstract: Real-world time-series data often exhibit non-stationarity, including changing trends, irregular seasonality, and residuals. In terms of changing trends, recently proposed multi-layer perceptron (MLP)-based models have shown excellent performance owing to their computational efficiency and ability to capture long-term dependency. However, the linear nature of MLP architectures poses limitations wh…

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: Submitted to AAAI

  12. arXiv:2509.20354

    cs.CL cs.AI

    EmbeddingGemma: Powerful and Lightweight Text Representations

    Authors: Henrique Schechter Vera, Sahil Dua, Biao Zhang, Daniel Salz, Ryan Mullins, Sindhu Raghuram Panyam, Sara Smoot, Iftekhar Naim, Joe Zou, Feiyang Chen, Daniel Cer, Alice Lisak, Min Choi, Lucas Gonzalez, Omar Sanseviero, Glenn Cameron, Ian Ballantyne, Kat Black, Kaifeng Chen, Weiyi Wang, Zhe Li, Gus Martins, Jinhyuk Lee, Mark Sherwood, Juyeong Ji, et al. (64 additional authors not shown)

    Abstract: We introduce EmbeddingGemma, a new lightweight, open text embedding model based on the Gemma 3 language model family. Our innovative training recipe strategically captures knowledge from larger models via encoder-decoder initialization and geometric embedding distillation. We improve model robustness and expressiveness with a spread-out regularizer, and ensure generalizability by merging checkpoin…

    Submitted 28 September, 2025; v1 submitted 24 September, 2025; originally announced September 2025.

    Comments: 18 pages. Models are available in HuggingFace (at https://huggingface.co/collections/google/embeddinggemma-68b9ae3a72a82f0562a80dc4), Kaggle (at https://www.kaggle.com/models/google/embeddinggemma/), and Vertex AI (at https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/embeddinggemma)

  13. arXiv:2509.19651

    cs.NI

    RIS-assisted Data Collection and Wireless Power Transfer in Low-altitude Wireless Networks

    Authors: Wenwen Xie, Geng Sun, Jiahui Li, Jiacheng Wang, Yinqiu Liu, Dusit Niyato, Dong In Kim, Shiwen Mao

    Abstract: Low-altitude wireless networks (LAWNs) have become effective solutions for collecting data from low-power Internet-of-Things devices (IoTDs) in remote areas with limited communication infrastructure. However, some outdoor IoTDs deployed in such areas face both energy constraints and low-channel quality challenges, making it challenging to ensure timely data collection from these IoTDs in LAWNs. In…

    Submitted 23 September, 2025; originally announced September 2025.

  14. arXiv:2509.19087

    cs.CV

    Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications

    Authors: Ganesh Mallya, Yotam Gigi, Dahun Kim, Maxim Neumann, Genady Beryozkin, Tomer Shekel, Anelia Angelova

    Abstract: Multi-spectral imagery plays a crucial role in diverse Remote Sensing applications including land-use classification, environmental monitoring and urban planning. These images are widely adopted because their additional spectral bands correlate strongly with physical materials on the ground, such as ice, water, and vegetation. This allows for more accurate identification, and their public availabi…

    Submitted 23 September, 2025; originally announced September 2025.

  15. arXiv:2509.18771

    cs.AI

    Experience Scaling: Post-Deployment Evolution For Large Language Models

    Authors: Xingkun Yin, Kaibin Huang, Dong In Kim, Hongyang Du

    Abstract: Scaling model size, training data, and compute power have driven advances in large language models (LLMs), but these approaches are reaching saturation as human-generated text is exhausted and further gains diminish. We propose experience scaling, a framework for continuous post-deployment evolution for LLMs through autonomous interaction with the environment and collaborative sharing of accumulat…

    Submitted 23 September, 2025; originally announced September 2025.

  16. arXiv:2509.17452

    cs.CV cs.AI

    Training-Free Label Space Alignment for Universal Domain Adaptation

    Authors: Dujin Lee, Sojung An, Jungmyung Wi, Kuniaki Saito, Donghyun Kim

    Abstract: Universal domain adaptation (UniDA) transfers knowledge from a labeled source domain to an unlabeled target domain, where label spaces may differ and the target domain may contain private classes. Previous UniDA methods primarily focused on visual space alignment but often struggled with visual ambiguities due to content differences, which limited their robustness and generalizability. To overcome…

    Submitted 22 September, 2025; originally announced September 2025.

    Comments: 22 pages, 12 figures

  17. arXiv:2509.15922

    cs.SD eess.AS

    DISPATCH: Distilling Selective Patches for Speech Enhancement

    Authors: Dohwan Kim, Jung-Woo Choi

    Abstract: In speech enhancement, knowledge distillation (KD) compresses models by transferring a high-capacity teacher's knowledge to a compact student. However, conventional KD methods train the student to mimic the teacher's output entirely, which forces the student to imitate the regions where the teacher performs poorly and to apply distillation to the regions where the student already performs well, wh…

    Submitted 19 September, 2025; originally announced September 2025.

    Comments: submitted to ICASSP 2026

  18. arXiv:2509.15234

    cs.CV

    Exploring the Capabilities of LLM Encoders for Image-Text Retrieval in Chest X-rays

    Authors: Hanbin Ko, Gihun Cho, Inhyeok Baek, Donguk Kim, Joonbeom Koo, Changi Kim, Dongheon Lee, Chang Min Park

    Abstract: Vision-language pretraining has advanced image-text alignment, yet progress in radiology remains constrained by the heterogeneity of clinical reports, including abbreviations, impression-only notes, and stylistic variability. Unlike general-domain settings where more data often leads to better performance, naively scaling to large collections of noisy reports can plateau or even degrade model lear…

    Submitted 17 September, 2025; originally announced September 2025.

    Comments: 24 pages, 2 figures, under review

    MSC Class: 68T07; 68U10; 92C55 ACM Class: I.2.10; I.2.7

  19. arXiv:2509.14589

    cs.CR cs.AI

    ATLANTIS: AI-driven Threat Localization, Analysis, and Triage Intelligence System

    Authors: Taesoo Kim, HyungSeok Han, Soyeon Park, Dae R. Jeong, Dohyeok Kim, Dongkwan Kim, Eunsoo Kim, Jiho Kim, Joshua Wang, Kangsu Kim, Sangwoo Ji, Woosun Song, Hanqing Zhao, Andrew Chin, Gyejin Lee, Kevin Stevens, Mansour Alharthi, Yizhuo Zhai, Cen Zhang, Joonun Jang, Yeongjin Jang, Ammar Askar, Dongju Kim, Fabian Fleischer, Jeongin Cho, et al. (21 additional authors not shown)

    Abstract: We present ATLANTIS, the cyber reasoning system developed by Team Atlanta that won 1st place in the Final Competition of DARPA's AI Cyber Challenge (AIxCC) at DEF CON 33 (August 2025). AIxCC (2023-2025) challenged teams to build autonomous cyber reasoning systems capable of discovering and patching vulnerabilities at the speed and scale of modern software. ATLANTIS integrates large language models…

    Submitted 17 September, 2025; originally announced September 2025.

    Comments: Version 1.0 (September 17, 2025). Technical Report. Team Atlanta -- 1st place in DARPA AIxCC Final Competition. Project page: https://team-atlanta.github.io/

  20. arXiv:2509.13535

    cs.SE

    Crash Report Enhancement with Large Language Models: An Empirical Study

    Authors: S M Farah Al Fahim, Md Nakhla Rafi, Zeyang Ma, Dong Jae Kim, Tse-Hsun Chen

    Abstract: Crash reports are central to software maintenance, yet many lack the diagnostic detail developers need to debug efficiently. We examine whether large language models can enhance crash reports by adding fault locations, root-cause explanations, and repair suggestions. We study two enhancement strategies: Direct-LLM, a single-shot approach that uses stack-trace context, and Agentic-LLM, an iterative…

    Submitted 16 September, 2025; originally announced September 2025.

  21. arXiv:2509.13200

    cs.RO

    StageACT: Stage-Conditioned Imitation for Robust Humanoid Door Opening

    Authors: Moonyoung Lee, Dong Ki Kim, Jai Krishna Bandi, Max Smith, Aileen Liao, Ali-akbar Agha-mohammadi, Shayegan Omidshafiei

    Abstract: Humanoid robots promise to operate in everyday human environments without requiring modifications to the surroundings. Among the many skills needed, opening doors is essential, as doors are the most common gateways in built spaces and often limit where a robot can go. Door opening, however, poses unique challenges as it is a long-horizon task under partial observability, such as reasoning about th…

    Submitted 18 September, 2025; v1 submitted 16 September, 2025; originally announced September 2025.

    Comments: 7 pages

  22. arXiv:2509.12716

    cs.NI cs.AI

    Joint AoI and Handover Optimization in Space-Air-Ground Integrated Network

    Authors: Zifan Lang, Guixia Liu, Geng Sun, Jiahui Li, Jiacheng Wang, Weijie Yuan, Dusit Niyato, Dong In Kim

    Abstract: Despite the widespread deployment of terrestrial networks, providing reliable communication services to remote areas and maintaining connectivity during emergencies remains challenging. Low Earth orbit (LEO) satellite constellations offer promising solutions with their global coverage capabilities and reduced latency, yet struggle with intermittent coverage and limited communication windows due to…

    Submitted 16 September, 2025; originally announced September 2025.

  23. arXiv:2509.12649

    cs.CR cs.AI

    A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs

    Authors: Kiho Lee, Jungkon Kim, Doowon Kim, Hyoungshick Kim

    Abstract: Code-generating Large Language Models (LLMs) significantly accelerate software development. However, their frequent generation of insecure code presents serious risks. We present a comprehensive evaluation of seven parameter-efficient fine-tuning (PEFT) techniques, demonstrating substantial gains in secure code generation without compromising functionality. Our research identifies prompt-tuning as…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: 25 pages

  24. arXiv:2509.10825

    cs.LG cs.AI stat.ML

    FACTORS: Factorial Approximation for Complementary Two-factor Optimization with Risk-aware Scoring

    Authors: Dongseok Kim, Wonjun Jeong, Gisung Oh

    Abstract: We propose FACTORS, a framework that combines design of experiments with Shapley decomposition to address performance and stability issues that are sensitive to combinations of training factors. Our approach consistently estimates main effects and two-factor interactions, then integrates them into a risk-adjusted objective function that jointly accounts for uncertainty and cost, enabling reliable…

    Submitted 13 September, 2025; originally announced September 2025.

    Comments: 43 pages, 8 figures

  25. arXiv:2509.09255

    cs.HC

    Sensible Agent: A Framework for Unobtrusive Interaction with Proactive AR Agents

    Authors: Geonsun Lee, Min Xia, Nels Numan, Xun Qian, David Li, Yanhe Chen, Achin Kulshrestha, Ishan Chatterjee, Yinda Zhang, Dinesh Manocha, David Kim, Ruofei Du

    Abstract: Proactive AR agents promise context-aware assistance, but their interactions often rely on explicit voice prompts or responses, which can be disruptive or socially awkward. We introduce Sensible Agent, a framework designed for unobtrusive interaction with these proactive agents. Sensible Agent dynamically adapts both "what" assistance to offer and, crucially, "how" to deliver it, based on real-tim…

    Submitted 11 September, 2025; originally announced September 2025.

  26. arXiv:2509.07979

    cs.CV

    Visual Representation Alignment for Multimodal Large Language Models

    Authors: Heeji Yoon, Jaewoo Jung, Junwan Kim, Hyungyu Choi, Heeseong Shin, Sangbeom Lim, Honggyu An, Chaehyun Kim, Jisang Han, Donghyun Kim, Chanho Eom, Sunghwan Hong, Seungryong Kim

    Abstract: Multimodal large language models (MLLMs) trained with visual instruction tuning have achieved strong performance across diverse tasks, yet they remain limited in vision-centric tasks such as object counting or spatial reasoning. We attribute this gap to the prevailing text-only supervision paradigm, which provides only indirect guidance for the visual pathway and often leads MLLMs to discard fine-…

    Submitted 9 September, 2025; originally announced September 2025.

    Comments: Project Page: https://cvlab-kaist.github.io/VIRAL/

  27. arXiv:2509.07530

    cs.CV

    Universal Few-Shot Spatial Control for Diffusion Models

    Authors: Kiet T. Nguyen, Chanhuyk Lee, Donggyun Kim, Dong Hoon Lee, Seunghoon Hong

    Abstract: Spatial conditioning in pretrained text-to-image diffusion models has significantly improved fine-grained control over the structure of generated images. However, existing control adapters exhibit limited adaptability and incur high training costs when encountering novel spatial control conditions that differ substantially from the training tasks. To address this limitation, we propose Universal F…

    Submitted 9 September, 2025; originally announced September 2025.

  28. A Decade-long Landscape of Advanced Persistent Threats: Longitudinal Analysis and Global Trends

    Authors: Shakhzod Yuldoshkhujaev, Mijin Jeon, Doowon Kim, Nick Nikiforakis, Hyungjoon Koo

    Abstract: An advanced persistent threat (APT) refers to a covert, long-term cyberattack, typically conducted by state-sponsored actors, targeting critical sectors and often remaining undetected for long periods. In response, collective intelligence from around the globe collaborates to identify and trace surreptitious activities, generating substantial documentation on APT campaigns publicly available on th…

    Submitted 9 September, 2025; originally announced September 2025.

    Comments: 18 pages, 13 figures (including subfigures), 11 tables. To appear in the Proceedings of the ACM Conference on Computer and Communications Security (CCS) 2025

  29. arXiv:2509.04815

    cs.LG cs.MA

    An Arbitration Control for an Ensemble of Diversified DQN variants in Continual Reinforcement Learning

    Authors: Wonseo Jang, Dongjae Kim

    Abstract: Deep reinforcement learning (RL) models, despite their efficiency in learning an optimal policy in static environments, easily lose previously learned knowledge (i.e., catastrophic forgetting). It leads RL models to poor performance in continual reinforcement learning (CRL) scenarios. To address this, we present an arbitration control mechanism over an ensemble of RL agents. It is motivated by an…

    Submitted 5 September, 2025; originally announced September 2025.

    Comments: 8 pages, 8 figures

  30. arXiv:2509.04602

    cs.CV

    Sali4Vid: Saliency-Aware Video Reweighting and Adaptive Caption Retrieval for Dense Video Captioning

    Authors: MinJu Jeon, Si-Woo Kim, Ye-Chan Kim, HyunGee Kim, Dong-Jin Kim

    Abstract: Dense video captioning aims to temporally localize events in video and generate captions for each event. While recent works propose end-to-end models, they suffer from two limitations: (1) applying timestamp supervision only to text while treating all video frames equally, and (2) retrieving captions from fixed-size video chunks, overlooking scene transitions. To address these, we propose Sali4Vid…

    Submitted 4 September, 2025; originally announced September 2025.

    Comments: Accepted in EMNLP 2025

  31. arXiv:2509.03972

    cs.CL cs.AI cs.LG

    Expanding Foundational Language Capabilities in Open-Source LLMs through a Korean Case Study

    Authors: Junghwan Lim, Gangwon Jo, Sungmin Lee, Jiyoung Park, Dongseok Kim, Jihwan Kim, Junhyeok Lee, Wai Ting Cheung, Dahye Choi, Kibong Choi, Jaeyeon Huh, Beomgyu Kim, Jangwoong Kim, Taehyun Kim, Haesol Lee, Jeesoo Lee, Dongpin Oh, Changseok Song, Daewon Suh

    Abstract: We introduce Llama-3-Motif, a language model consisting of 102 billion parameters, specifically designed to enhance Korean capabilities while retaining strong performance in English. Developed on the Llama 3 architecture, Llama-3-Motif employs advanced training techniques, including LlamaPro and Masked Structure Growth, to effectively scale the model without altering its core Transformer architect…

    Submitted 4 September, 2025; originally announced September 2025.

  32. arXiv:2509.03426

    cs.CV

    Time-Scaling State-Space Models for Dense Video Captioning

    Authors: AJ Piergiovanni, Ganesh Satish Mallya, Dahun Kim, Anelia Angelova

    Abstract: Dense video captioning is a challenging video understanding task which aims to simultaneously segment the video into a sequence of meaningful consecutive events and to generate detailed captions to accurately describe each event. Existing methods often encounter difficulties when working with the long videos associated with dense video captioning, due to the computational complexity and memory lim…

    Submitted 3 September, 2025; originally announced September 2025.

    Comments: BMVC 2025

  33. arXiv:2509.03049

    cs.NI eess.SY

    Multi-layer Digital Twin System for Future Mobile Metaverse

    Authors: Gaosheng Zhao, Dong In Kim

    Abstract: In the upcoming 6G era, the communication networks are expected to face unprecedented challenges in terms of complexity and dynamics. Digital Twin (DT) technology, with its various digital capabilities, holds great potential to facilitate the transformation of the communication network from passive responding to proactive adaptation. Thus, in this paper, we propose a multi-layer DT system that coo…

    Submitted 3 September, 2025; originally announced September 2025.

    Comments: This article has been accepted for publication in IEEE Wireless Communications

  34. arXiv:2509.02391

    cs.LG cs.GT stat.ML

    Gaming and Cooperation in Federated Learning: What Can Happen and How to Monitor It

    Authors: Dongseok Kim, Wonjun Jeong, Gisung Oh

    Abstract: The success of Federated Learning depends on the actions that participants take out of sight. We model Federated Learning not as a mere optimization task but as a strategic system entangled with rules and incentives. From this perspective, we present an analytical framework that makes it possible to clearly identify where behaviors that genuinely improve performance diverge from those that merely…

    Submitted 2 September, 2025; originally announced September 2025.

    Comments: 51 pages, 7 figures

  35. arXiv:2509.01324

    cs.CL

    KoBLEX: Open Legal Question Answering with Multi-hop Reasoning

    Authors: Jihyung Lee, Daehui Kim, Seonjeong Hwang, Hyounghun Kim, Gary Lee

    Abstract: Large Language Models (LLM) have achieved remarkable performances in general domains and are now extending into the expert domain of law. Several benchmarks have been proposed to evaluate LLMs' legal capabilities. However, these benchmarks fail to evaluate open-ended and provision-grounded Question Answering (QA). To address this, we introduce a Korean Benchmark for Legal EXplainable QA (KoBLEX),…

    Submitted 1 September, 2025; originally announced September 2025.

    Comments: EMNLP 2025 Main Conference

  36. arXiv:2509.01182

    cs.AI cs.CL cs.HC cs.IR cs.MA

    Question-to-Knowledge: Multi-Agent Generation of Inspectable Facts for Product Mapping

    Authors: Wonduk Seo, Taesub Shin, Hyunjin An, Dokyun Kim, Seunghyun Lee

    Abstract: Identifying whether two product listings refer to the same Stock Keeping Unit (SKU) is a persistent challenge in ecommerce, especially when explicit identifiers are missing and product names vary widely across platforms. Rule based heuristics and keyword similarity often misclassify products by overlooking subtle distinctions in brand, specification, or bundle configuration. To overcome these limi…

    Submitted 1 September, 2025; originally announced September 2025.

    Comments: Preprint

  37. arXiv:2508.19870

    cs.NI

    Secure Multi-LLM Agentic AI and Agentification for Edge General Intelligence by Zero-Trust: A Survey

    Authors: Yinqiu Liu, Ruichen Zhang, Haoxiang Luo, Yijing Lin, Geng Sun, Dusit Niyato, Hongyang Du, Zehui Xiong, Yonggang Wen, Abbas Jamalipour, Dong In Kim, Ping Zhang

    Abstract: Agentification serves as a critical enabler of Edge General Intelligence (EGI), transforming massive edge devices into cognitive agents through integrating Large Language Models (LLMs) and perception, reasoning, and acting modules. These agents collaborate across heterogeneous edge infrastructures, forming multi-LLM agentic AI systems that leverage collective intelligence and specialized capabilit…

    Submitted 27 August, 2025; originally announced August 2025.

    Comments: 35 pages

  38. arXiv:2508.19649

    cs.CV

    IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising

    Authors: Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali, Tae Hyun Kim

    Abstract: Image denoising is a fundamental challenge in computer vision, with applications in photography and medical imaging. While deep learning-based methods have shown remarkable success, their reliance on specific noise distributions limits generalization to unseen noise types and levels. Existing approaches attempt to address this with extensive training data and high computational resources but they…

    Submitted 27 August, 2025; originally announced August 2025.

    Comments: ICCV 2025. Project Page: https://dongjinkim9.github.io/projects/idf/

  39. arXiv:2508.18859

    cs.CV

    Harnessing Meta-Learning for Controllable Full-Frame Video Stabilization

    Authors: Muhammad Kashif Ali, Eun Woo Im, Dongjin Kim, Tae Hyun Kim, Vivek Gupta, Haonan Luo, Tianrui Li

    Abstract: Video stabilization remains a fundamental problem in computer vision; in particular, pixel-level synthesis solutions for video stabilization, which synthesize full-frame outputs, add to the complexity of this task. These methods aim to enhance stability while synthesizing full-frame videos, but the inherent diversity in motion profiles and visual content present in each video sequence makes robust g…

    Submitted 26 August, 2025; originally announced August 2025.

  40. arXiv:2508.18734

    cs.CV cs.AI cs.MM eess.AS eess.SP

    Improving Noise Robust Audio-Visual Speech Recognition via Router-Gated Cross-Modal Feature Fusion

    Authors: DongHoon Lim, YoungChae Kim, Dong-Hyun Kim, Da-Hee Yang, Joon-Hyuk Chang

    Abstract: Robust audio-visual speech recognition (AVSR) in noisy environments remains challenging, as existing systems struggle to estimate audio reliability and dynamically adjust modality reliance. We propose router-gated cross-modal feature fusion, a novel AVSR framework that adaptively reweights audio and visual features based on token-level acoustic corruption scores. Using an audio-visual feature fusi…

    Submitted 26 August, 2025; originally announced August 2025.

    Comments: Accepted to IEEE ASRU 2025

  41. arXiv:2508.18725

    cs.NI cs.IT

    Toward Edge General Intelligence with Agentic AI and Agentification: Concepts, Technologies, and Future Directions

    Authors: Ruichen Zhang, Guangyuan Liu, Yinqiu Liu, Changyuan Zhao, Jiacheng Wang, Yunting Xu, Dusit Niyato, Jiawen Kang, Yonghui Li, Shiwen Mao, Sumei Sun, Xuemin Shen, Dong In Kim

    Abstract: The rapid expansion of sixth-generation (6G) wireless networks and the Internet of Things (IoT) has catalyzed the evolution from centralized cloud intelligence towards decentralized edge general intelligence. However, traditional edge intelligence methods, characterized by static models and limited cognitive autonomy, fail to address the dynamic, heterogeneous, and resource-constrained scenarios i…

    Submitted 26 August, 2025; originally announced August 2025.

  42. arXiv:2508.18661

    cs.IR

    Extracting Information from Scientific Literature via Visual Table Question Answering Models

    Authors: Dongyoun Kim, Hyung-do Choi, Youngsun Jang, John Kim

    Abstract: This study explores three approaches to processing table data in scientific papers to enhance extractive question answering and develop a software tool for the systematic review process. The methods evaluated include: (1) Optical Character Recognition (OCR) for extracting information from documents, (2) Pre-trained models for document visual question answering, and (3) Table detection and structur…

    Submitted 26 August, 2025; originally announced August 2025.

    Comments: Accepted at ACM International Conference on Research in Adaptive and Convergent Systems, November 5-8, 2024, Pompei, Italy

    Journal ref: Proceedings of the ACM International Conference on Research in Adaptive and Convergent Systems (RACS 24), November 5-8, 2024, Pompei, Italy. ACM

  43. arXiv:2508.18039

    cs.RO eess.SY

    Modeling and Control Framework for Autonomous Space Manipulator Handover Operations

    Authors: Diego Quevedo, Sarah Hudson, Donghoon Kim

    Abstract: Autonomous space robotics is poised to play a vital role in future space missions, particularly for In-space Servicing, Assembly, and Manufacturing (ISAM). A key capability in such missions is the Robot-to-Robot (R2R) handover of mission-critical objects. This work presents a dynamic model of a dual-arm space manipulator system and compares various tracking control laws. The key contributions of t…

    Submitted 25 August, 2025; originally announced August 2025.

    Comments: 14 pages, submitted to 2025 Astrodynamics Specialist Conference proceedings

  44. arXiv:2508.17412

    cs.LG cs.AI stat.ML

    Convergence and Generalization of Anti-Regularization for Parametric Models

    Authors: Dongseok Kim, Wonjun Jeong, Gisung Oh

    Abstract: Anti-regularization introduces a reward term with a reversed sign into the loss function, deliberately amplifying model expressivity in small-sample regimes while ensuring that the intervention gradually vanishes as the sample size grows through a power-law decay schedule. We formalize spectral safety conditions and trust-region constraints, and we design a lightweight safeguard that combines a pr…

    Submitted 7 September, 2025; v1 submitted 24 August, 2025; originally announced August 2025.

    Comments: v2: Clarity edits; toned-down phrasing; figures replaced by tables; results, formulas, reproducibility unchanged

  45. arXiv:2508.15838

    cs.NI cs.GT eess.SY

    Safeguarding ISAC Performance in Low-Altitude Wireless Networks Under Channel Access Attack

    Authors: Jiacheng Wang, Jialing He, Geng Sun, Zehui Xiong, Dusit Niyato, Shiwen Mao, Dong In Kim, Tao Xiang

    Abstract: The increasing saturation of terrestrial resources has driven the exploration of low-altitude applications such as air taxis. Low-altitude wireless networks (LAWNs) serve as the foundation for these applications, and integrated sensing and communication (ISAC) constitutes one of the core technologies within LAWNs. However, the open nature of low-altitude airspace makes LAWNs vulnerable to mali…

    Submitted 19 August, 2025; originally announced August 2025.

  46. arXiv:2508.15732

    cs.RO eess.SY

    Understanding and Utilizing Dynamic Coupling in Free-Floating Space Manipulators for On-Orbit Servicing

    Authors: Gargi Das, Daegyun Choi, Donghoon Kim

    Abstract: This study proposes a dynamic coupling-informed trajectory optimization algorithm for free-floating space manipulator systems (SMSs). Dynamic coupling between the base and the manipulator arms plays a critical role in influencing the system's behavior. While prior research has predominantly focused on minimizing this coupling, often overlooking its potential advantages, this work investigates how…

    Submitted 21 August, 2025; originally announced August 2025.

    Comments: 17 pages, 7 figures, 2025 AAS/AIAA Astrodynamics Specialist Conference

  47. arXiv:2508.15268

    cs.NI

    Toward Autonomous Digital Populations for Communication-Sensing-Computation Ecosystem

    Authors: Gaosheng Zhao, Dong In Kim

    Abstract: Future communication networks are expected to achieve deep integration of communication, sensing, and computation, forming a tightly coupled and autonomously operating infrastructure system. However, current reliance on centralized control, static design, and human intervention continues to constrain the multidimensional evolution of network functions and applications, limiting adaptability and re…

    Submitted 21 August, 2025; originally announced August 2025.

  48. arXiv:2508.15258

    cs.HC

    Spatio-Temporal Mixed and Augmented Reality Experience Description for Interactive Playback

    Authors: Dooyoung Kim, Woontack Woo

    Abstract: We propose the Spatio-Temporal Mixed and Augmented Reality Experience Description (MAR-ED), a novel framework to standardize the representation of past events for interactive and adaptive playback in a user's present physical space. While current spatial media technologies have primarily focused on capturing or replaying content as static assets, often disconnected from the viewer's environment or…

    Submitted 21 August, 2025; originally announced August 2025.

    Comments: 4 pages, 2 figures, Accepted in the IEEE ISMAR 2025 XRStand Workshop

  49. arXiv:2508.14258

    cs.RO physics.bio-ph

    Adapting Biological Reflexes for Dynamic Reorientation in Space Manipulator Systems

    Authors: Daegyun Choi, Alhim Vera, Donghoon Kim

    Abstract: Robotic arms mounted on spacecraft, known as space manipulator systems (SMSs), are critical for enabling on-orbit assembly, satellite servicing, and debris removal. However, controlling these systems in microgravity remains a significant challenge due to the dynamic coupling between the manipulator and the spacecraft base. This study explores the potential of using biological inspiration to addres…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: 18 pages, 11 figures, 2025 AAS/AIAA Astrodynamics Specialist Conference

  50. arXiv:2508.14138

    cs.LG cs.AI cs.CV cs.NE

    STAS: Spatio-Temporal Adaptive Computation Time for Spiking Transformers

    Authors: Donghwa Kang, Doohyun Kim, Sang-Ki Ko, Jinkyu Lee, Brent ByungHoon Kang, Hyeongboo Baek

    Abstract: Spiking neural networks (SNNs) offer energy efficiency over artificial neural networks (ANNs) but suffer from high latency and computational overhead due to their multi-timestep operational nature. While various dynamic computation methods have been developed to mitigate this by targeting spatial, temporal, or architecture-specific redundancies, they remain fragmented. While the principles of adap…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: 8 pages