Showing 1–50 of 581 results for author: Kim, B

Searching in archive cs.

  1. arXiv:2505.02722  [pdf, other]

    cs.AI cs.LG

    Enhancing LLMs' Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry

    Authors: Junu Kim, Chaeeun Shim, Sungjin Park, Su Yeon Lee, Gee Young Suh, Chae-Man Lim, Seong Jin Choi, Song Mi Moon, Kyoung-Ho Song, Eu Suk Kim, Hong Bin Kim, Sejoong Kim, Chami Im, Dong-Wan Kang, Yong Soo Kim, Hee-Joon Bae, Sung Yoon Lim, Han-Gil Jeong, Edward Choi

    Abstract: Although large language models (LLMs) have demonstrated impressive reasoning capabilities across general domains, their effectiveness in real-world clinical practice remains limited. This is likely due to their insufficient exposure to real-world clinical data during training, as such data is typically not included due to privacy concerns. To address this, we propose enhancing the clinical reasoni…

    Submitted 5 May, 2025; originally announced May 2025.

  2. arXiv:2505.02499  [pdf, other]

    cs.CR cs.IT

    An Efficient Hybrid Key Exchange Mechanism

    Authors: Benjamin D. Kim, Vipindev Adat Vasudevan, Alejandro Cohen, Rafael G. L. D'Oliveira, Thomas Stahlbuhk, Muriel Médard

    Abstract: We present \textsc{CHOKE}, a novel code-based hybrid key-encapsulation mechanism (KEM) designed to securely and efficiently transmit multiple session keys simultaneously. By encoding $n$ independent session keys with an individually secure linear code and encapsulating each resulting coded symbol using a separate KEM, \textsc{CHOKE} achieves computational individual security -- each key remains se…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 7 pages, 2 figures

  3. arXiv:2505.01658  [pdf, other]

    cs.CL

    A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency

    Authors: Sihyeong Park, Sungryeol Jeon, Chaelyn Lee, Seokhun Jeon, Byung-Soo Kim, Jemin Lee

    Abstract: Large language models (LLMs) are widely applied in chatbots, code generators, and search engines. Workloads such as chain-of-thought, complex reasoning, and agent services significantly increase the inference cost by invoking the model repeatedly. Optimization methods such as parallelism, compression, and caching have been adopted to reduce costs, but the diverse service requirements make it hard…

    Submitted 8 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

    Comments: Under review; 65 pages; 27 figures

  4. arXiv:2505.00333  [pdf, other]

    cs.LG eess.SP

    Communication-Efficient Wireless Federated Fine-Tuning for Large-Scale AI Models

    Authors: Bumjun Kim, Wan Choi

    Abstract: Transformer-based large language models (LLMs) have achieved remarkable success across various tasks. Yet, fine-tuning such massive models in federated learning (FL) settings poses significant challenges due to resource constraints and communication overhead. Low-Rank Adaptation (LoRA) addresses these issues by training compact, low-rank matrices instead of fully fine-tuning large models. This pap…

    Submitted 1 May, 2025; originally announced May 2025.

  5. arXiv:2504.20924  [pdf, other]

    cs.AI

    A Domain-Agnostic Scalable AI Safety Ensuring Framework

    Authors: Beomjun Kim, Kangyeon Kim, Sunwoo Kim, Heejin Ahn

    Abstract: Ensuring the safety of AI systems has recently emerged as a critical priority for real-world deployment, particularly in physical AI applications. Current approaches to AI safety typically address predefined domain-specific safety conditions, limiting their ability to generalize across contexts. We propose a novel AI safety framework that ensures AI systems comply with any user-defined constraint,…

    Submitted 30 April, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: Theoretical supplementary material (Part 1) is available in submitted files. Experimental supplementary material (Part 2) will be available before May 22 23:59PM AOE

  6. arXiv:2504.18112  [pdf]

    cs.CV

    Study on Real-Time Road Surface Reconstruction Using Stereo Vision

    Authors: Deepak Ghimire, Byoungjun Kim, Donghoon Kim, SungHwan Jeong

    Abstract: Road surface reconstruction plays a crucial role in autonomous driving, providing essential information for safe and smooth navigation. This paper enhances the RoadBEV [1] framework for real-time inference on edge devices by optimizing both efficiency and accuracy. To achieve this, we propose applying Isomorphic Global Structured Pruning to the stereo feature extraction backbone, reducing network…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Stereo Vision, Efficient CNN, Pruning, Optimization. 2025 Intelligent Information and Control Conference (IICC 2025), Jeonju, Korea

  7. An Addendum to NeBula: Towards Extending TEAM CoSTAR's Solution to Larger Scale Environments

    Authors: Ali Agha, Kyohei Otsu, Benjamin Morrell, David D. Fan, Sung-Kyun Kim, Muhammad Fadhil Ginting, Xianmei Lei, Jeffrey Edlund, Seyed Fakoorian, Amanda Bouman, Fernando Chavez, Taeyeon Kim, Gustavo J. Correa, Maira Saboia, Angel Santamaria-Navarro, Brett Lopez, Boseong Kim, Chanyoung Jung, Mamoru Sobue, Oriana Claudia Peltzer, Joshua Ott, Robert Trybula, Thomas Touma, Marcel Kaufmann, Tiago Stegun Vaquero, et al. (64 additional authors not shown)

    Abstract: This paper presents an appendix to the original NeBula autonomy solution developed by the TEAM CoSTAR (Collaborative SubTerranean Autonomous Robots), participating in the DARPA Subterranean Challenge. Specifically, this paper presents extensions to NeBula's hardware, software, and algorithmic components that focus on increasing the range and scale of the exploration environment. From the algorithm…

    Submitted 18 April, 2025; originally announced April 2025.

    Journal ref: IEEE Transactions on Field Robotics, vol. 1, pp. 476-526, 2024

  8. arXiv:2504.11474  [pdf, other]

    eess.IV cs.AI cs.CV

    Local Temporal Feature Enhanced Transformer with ROI-rank Based Masking for Diagnosis of ADHD

    Authors: Byunggun Kim, Younghun Kwon

    Abstract: In modern society, Attention-Deficit/Hyperactivity Disorder (ADHD) is a common mental disorder found not only in children but also in adults. In this context, we propose an ADHD diagnosis transformer model that can simultaneously and effectively find important spatiotemporal brain biomarkers from resting-state functional magnetic resonance imaging (rs-fMRI). This model not only learns spatiotempo…

    Submitted 11 April, 2025; originally announced April 2025.

  9. arXiv:2504.11019  [pdf, other]

    cs.CV

    DRIFT open dataset: A drone-derived intelligence for traffic analysis in urban environment

    Authors: Hyejin Lee, Seokjun Hong, Jeonghoon Song, Haechan Cho, Zhixiong Jin, Byeonghun Kim, Joobin Jin, Jaegyun Im, Byeongjoon Noh, Hwasoo Yeo

    Abstract: Reliable traffic data are essential for understanding urban mobility and developing effective traffic management strategies. This study introduces the DRone-derived Intelligence For Traffic analysis (DRIFT) dataset, a large-scale urban traffic dataset collected systematically from synchronized drone videos at approximately 250 meters altitude, covering nine interconnected intersections in Daejeon,…

    Submitted 25 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: 30 pages, 15 figures

    ACM Class: I.2.10; I.4.8; H.2.8; J.7

  10. Migrating Code At Scale With LLMs At Google

    Authors: Celal Ziftci, Stoyan Nikolov, Anna Sjövall, Bo Kim, Daniele Codecasa, Max Kim

    Abstract: Developers often evolve an existing software system by making internal changes, called migration. Moving to a new framework, changing implementation to improve efficiency, and upgrading a dependency to its latest version are examples of migrations. Migration is a common and typically continuous maintenance task undertaken either manually or through tooling. Certain migrations are labor intensive…

    Submitted 13 April, 2025; originally announced April 2025.

  11. arXiv:2504.09522  [pdf, other]

    cs.CL cs.AI

    How new data permeates LLM knowledge and how to dilute it

    Authors: Chen Sun, Renat Aksitov, Andrey Zhmoginov, Nolan Andrew Miller, Max Vladymyrov, Ulrich Rueckert, Been Kim, Mark Sandler

    Abstract: Large language models learn and continually learn through the accumulation of gradient-based updates, but how individual pieces of new information affect existing knowledge, leading to both beneficial generalization and problematic hallucination, remains poorly understood. We demonstrate that when learning new information, LLMs exhibit a "priming" effect: learning a new fact can cause the model to…

    Submitted 13 April, 2025; originally announced April 2025.

  12. arXiv:2504.07729  [pdf, other]

    cs.CV cs.AI

    Benchmarking Multi-Organ Segmentation Tools for Multi-Parametric T1-weighted Abdominal MRI

    Authors: Nicole Tran, Anisa Prasad, Yan Zhuang, Tejas Sudharshan Mathai, Boah Kim, Sydney Lewis, Pritam Mukherjee, Jianfei Liu, Ronald M. Summers

    Abstract: The segmentation of multiple organs in multi-parametric MRI studies is critical for many applications in radiology, such as correlating imaging biomarkers with disease status (e.g., cirrhosis, diabetes). Recently, three publicly available tools, namely MRSegmentator (MRSeg), TotalSegmentator MRI (TS), and TotalVibeSegmentator (VIBE), have been proposed for multi-organ segmentation in MRI. However…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Published at SPIE Medical Imaging 2025

  13. arXiv:2504.03716  [pdf, other]

    cs.LG cs.AI cs.CY

    Ethical AI on the Waitlist: Group Fairness Evaluation of LLM-Aided Organ Allocation

    Authors: Hannah Murray, Brian Hyeongseok Kim, Isabelle Lee, Jason Byun, Dani Yogatama, Evi Micha

    Abstract: Large Language Models (LLMs) are becoming ubiquitous, promising automation even in high-stakes scenarios. However, existing evaluation methods often fall short -- benchmarks saturate, accuracy-based metrics are overly simplistic, and many inherently ambiguous problems lack a clear ground truth. Given these limitations, evaluating fairness becomes complex. To address this, we reframe fairness evalu…

    Submitted 29 March, 2025; originally announced April 2025.

  14. arXiv:2504.01274  [pdf, other]

    q-bio.NC cs.CV

    BOLDSimNet: Examining Brain Network Similarity between Task and Resting-State fMRI

    Authors: Boseong Kim, Debashis Das Chakladar, Haejun Chung, Ikbeom Jang

    Abstract: Traditional causal connectivity methods in task-based and resting-state functional magnetic resonance imaging (fMRI) face challenges in accurately capturing directed information flow due to their sensitivity to noise and inability to model multivariate dependencies. These limitations hinder the effective comparison of brain networks between cognitive states, making it difficult to analyze network…

    Submitted 1 April, 2025; originally announced April 2025.

  15. arXiv:2504.00557  [pdf, other]

    cs.CV cs.LG

    Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features

    Authors: Jewon Lee, Ki-Ung Song, Seungmin Yang, Donguk Lim, Jaeyeon Kim, Wooksu Shin, Bo-Kyeong Kim, Yong Jae Lee, Tae-Ho Kim

    Abstract: Visual token reduction lowers inference costs caused by extensive image features in large vision-language models (LVLMs). Unlike relevant studies that prune tokens in self-attention-only LVLMs, our work uniquely addresses cross-attention-based models, which achieve superior performance. We identify that the key-value (KV) cache size for image tokens in cross-attention layers significantly exceeds…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: accepted at CVPR 2025 Workshop on ELVM

  16. Shaping the Future of VR Hand Interactions: Lessons Learned from Modern Methods

    Authors: ByungMin Kim, DongHeun Han, HyeongYeop Kang

    Abstract: In virtual reality, it is widely assumed that increased realism in hand-object interactions enhances user immersion and overall experience. However, recent studies challenge this assumption, suggesting that faithfully replicating real-world physics and visuals is not always necessary for improved usability or immersion. This has led to ambiguity for developers when choosing optimal hand interactio…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: Published in IEEE VR 2025

  17. arXiv:2503.23796   

    cs.CV

    On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices

    Authors: Bosung Kim, Kyuhwan Lee, Isu Jeong, Jungmin Cheon, Yeojin Lee, Seulki Lee

    Abstract: We present On-device Sora, the first model training-free solution for diffusion-based on-device text-to-video generation that operates efficiently on smartphone-grade devices. To address the challenges of diffusion-based text-to-video generation on computation- and memory-limited mobile devices, the proposed On-device Sora applies three novel techniques to pre-trained video generative models. Firs…

    Submitted 31 March, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

    Comments: Replicated Submission. arXiv:2502.04363 submitted as second version of the paper

  18. arXiv:2503.22746  [pdf]

    cs.CL cs.AI cs.CY

    Susceptibility of Large Language Models to User-Driven Factors in Medical Queries

    Authors: Kyung Ho Lim, Ujin Kang, Xiang Li, Jin Sung Kim, Young-Chul Jung, Sangjoon Park, Byung-Hoon Kim

    Abstract: Large language models (LLMs) are increasingly used in healthcare, but their reliability is heavily influenced by user-driven factors such as question phrasing and the completeness of clinical information. In this study, we examined how misinformation framing, source authority, model persona, and omission of key clinical details affect the diagnostic accuracy and reliability of LLM outputs. We cond…

    Submitted 26 March, 2025; originally announced March 2025.

  19. arXiv:2503.22674  [pdf, other]

    cs.AI cs.CL cs.LG

    QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?

    Authors: Belinda Z. Li, Been Kim, Zi Wang

    Abstract: Recently, a large amount of work has focused on improving large language models' (LLMs') performance on reasoning benchmarks such as math and logic. However, past work has largely assumed that tasks are well-defined. In the real world, queries to LLMs are often underspecified, only solvable through acquiring missing information. We formalize this as a constraint satisfaction problem (CSP) with mis…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Code and dataset are available at https://github.com/google-deepmind/questbench

  20. arXiv:2503.22143  [pdf]

    eess.SP cs.AI cs.CV cs.LG

    A Self-Supervised Learning of a Foundation Model for Analog Layout Design Automation

    Authors: Sungyu Jeong, Won Joon Choi, Junung Choi, Anik Biswas, Byungsub Kim

    Abstract: We propose a UNet-based foundation model and its self-supervised learning method to address two key challenges: 1) lack of qualified annotated analog layout data, and 2) excessive variety in analog layout design tasks. For self-supervised learning, we propose random patch sampling and random masking techniques to automatically obtain enough training data from a small unannotated layout dataset. Th…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 8 pages, 11 figures

  21. arXiv:2503.17417  [pdf, other]

    cs.LG cs.AI

    Generative Modeling of Class Probability for Multi-Modal Representation Learning

    Authors: Jungkyoo Shin, Bumsoo Kim, Eunwoo Kim

    Abstract: Multi-modal understanding plays a crucial role in artificial intelligence by enabling models to jointly interpret inputs from different modalities. However, conventional approaches such as contrastive learning often struggle with modality discrepancies, leading to potential misalignments. In this paper, we propose a novel class anchor alignment approach that leverages class probability distributio…

    Submitted 14 April, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

    Comments: To appear in CVPR 2025 (Highlight)

  22. arXiv:2503.15855  [pdf, other]

    cs.CV cs.AI

    VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling

    Authors: Hyojun Go, Byeongjun Park, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim

    Abstract: We propose VideoRFSplat, a direct text-to-3D model leveraging a video generation model to generate realistic 3D Gaussian Splatting (3DGS) for unbounded real-world scenes. To generate diverse camera poses and unbounded spatial extent of real-world scenes, while ensuring generalization to arbitrary text prompts, previous methods fine-tune 2D generative models to jointly model camera poses and multi-…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Project page: https://gohyojun15.github.io/VideoRFSplat/

  23. arXiv:2503.12686  [pdf, other]

    cs.LG cs.PL cs.SE

    Can LLMs Formally Reason as Abstract Interpreters for Program Analysis?

    Authors: Jacqueline L. Mitchell, Brian Hyeongseok Kim, Chenyu Zhou, Chao Wang

    Abstract: LLMs have demonstrated impressive capabilities in code generation and comprehension, but their potential to perform program analysis in a formal, automatic manner remains under-explored. To that end, we systematically investigate whether LLMs can reason about programs using a program analysis framework called abstract interpretation. We prompt LLMs to follow two different strategies,…

    Submitted 16 March, 2025; originally announced March 2025.

  24. arXiv:2503.12024  [pdf, other]

    cs.CV

    SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering

    Authors: Byeongjun Park, Hyojun Go, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim

    Abstract: Recent progress in 3D/4D scene generation emphasizes the importance of physical alignment throughout video generation and scene reconstruction. However, existing methods improve the alignment separately at each stage, making it difficult to manage subtle misalignments arising from another stage. Here, we present SteerX, a zero-shot inference-time steering method that unifies scene reconstruction i…

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: Project page: https://byeongjun-park.github.io/SteerX/

  25. arXiv:2503.09975  [pdf, ps, other]

    cs.AR

    Faster Inference of LLMs using FP8 on the Intel Gaudi

    Authors: Joonhyung Lee, Shmulik Markovich-Golan, Daniel Ohayon, Yair Hanani, Gunho Park, Byeongwook Kim, Asaf Karnieli, Uri Livne, Haihao Shen, Tai Huang, Se Jung Kwon, Dongsoo Lee

    Abstract: Low-precision data types are essential in modern neural networks during both training and inference as they enhance throughput and computational capacity by better exploiting available hardware resources. Despite the incorporation of FP8 in commercially available neural network accelerators, a comprehensive exposition of its underlying mechanisms, along with rigorous performance and accuracy evalu…

    Submitted 16 March, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

  26. arXiv:2503.09650  [pdf, other]

    cs.PF cs.AR

    A Review on Proprietary Accelerators for Large Language Models

    Authors: Sihyeong Park, Jemin Lee, Byung-Soo Kim, Seokhun Jeon

    Abstract: With the advancement of Large Language Models (LLMs), the importance of accelerators that efficiently process LLM computations has been increasing. This paper discusses the necessity of LLM accelerators and provides a comprehensive analysis of the hardware and software characteristics of the main commercial LLM accelerators. Based on this analysis, we propose considerations for the development of…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 4 pages, accepted in AICompS 2024

  27. arXiv:2503.08136  [pdf, other]

    cs.CV cs.AI cs.LG

    FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems

    Authors: Jeongsol Kim, Bryan Sangwoo Kim, Jong Chul Ye

    Abstract: Flow matching is a recent state-of-the-art framework for generative modeling based on ordinary differential equations (ODEs). While closely related to diffusion models, it provides a more general perspective on generative modeling. Although inverse problem solving has been extensively explored using diffusion models, it has not been rigorously examined within the broader context of flow models. Th…

    Submitted 11 March, 2025; originally announced March 2025.

  28. arXiv:2503.08061  [pdf, other]

    cs.RO cs.GR cs.HC cs.LG

    ForceGrip: Reference-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation

    Authors: DongHeun Han, Byungmin Kim, RoUn Lee, KyeongMin Kim, Hyoseok Hwang, HyeongYeop Kang

    Abstract: Realistic hand manipulation is a key component of immersive virtual reality (VR), yet existing methods often rely on kinematic approaches or motion-capture datasets that omit crucial physical attributes such as contact forces and finger torques. Consequently, these approaches prioritize tight, one-size-fits-all grips rather than reflecting users' intended force levels. We present ForceGrip, a deep l…

    Submitted 30 April, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 11 pages, 11 figures. Accepted to SIGGRAPH Conference Papers '25. Project page: https://han-dongheun.github.io/ForceGrip

    Journal ref: SIGGRAPH Conference Papers '25, Vancouver, BC, Canada, August 10-14, 2025

  29. arXiv:2503.07390  [pdf, other]

    cs.CV

    PersonaBooth: Personalized Text-to-Motion Generation

    Authors: Boeun Kim, Hea In Jeong, JungHoon Sung, Yihua Cheng, Jeongmin Lee, Ju Yong Chang, Sang-Il Choi, Younggeun Choi, Saim Shin, Jungho Kim, Hyung Jin Chang

    Abstract: This paper introduces Motion Personalization, a new task that generates personalized motions aligned with text descriptions using several basic motions containing Persona. To support this novel task, we introduce a new large-scale motion dataset called PerMo (PersonaMotion), which captures the unique personas of multiple actors. We also propose a multi-modal finetuning method of a pretrained motio…

    Submitted 21 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

  30. arXiv:2503.07216  [pdf, other]

    cs.LG

    FedRand: Enhancing Privacy in Federated Learning with Randomized LoRA Subparameter Updates

    Authors: Sangwoo Park, Seanie Lee, Byungjoo Kim, Sung Ju Hwang

    Abstract: Federated Learning (FL) is a widely used framework for training models in a decentralized manner, ensuring that the central server does not have direct access to data from local clients. However, this approach may still fail to fully preserve data privacy, as models from local clients are exposed to the central server during the aggregation process. This issue becomes even more critical when train…

    Submitted 11 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

    Comments: Preprint

  31. arXiv:2503.01905  [pdf, other]

    cs.LG cs.AI

    PaCA: Partial Connection Adaptation for Efficient Fine-Tuning

    Authors: Sunghyeon Woo, Sol Namkung, Sunwoo Lee, Inho Jeong, Beomseok Kim, Dongsuk Jeon

    Abstract: Prior parameter-efficient fine-tuning (PEFT) algorithms reduce memory usage and computational costs of fine-tuning large neural network models by training only a few additional adapter parameters, rather than the entire model. However, the reduction in computational costs due to PEFT does not necessarily translate to a reduction in training time; although the computational costs of the adapter lay…

    Submitted 11 March, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

  32. arXiv:2502.20843  [pdf, other]

    cs.RO cs.AI cs.LG

    Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments

    Authors: Yoonyoung Cho, Junhyek Han, Jisu Han, Beomjoon Kim

    Abstract: For robots to operate in general environments like households, they must be able to perform non-prehensile manipulation actions such as toppling and rolling to manipulate ungraspable objects. However, prior works on non-prehensile manipulation cannot yet generalize across environments with diverse geometries. The main challenge lies in adapting to varying environmental constraints: within a cabine…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: http://unicorn-hamnet.github.io/

  33. arXiv:2502.18934  [pdf, other]

    cs.CL cs.LG

    Kanana: Compute-efficient Bilingual Language Models

    Authors: Kanana LLM Team, Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, Seungjae Jung, Daniel Wontae Nam, Taegyeong Eo, Donghun Lee, Doohae Jung, Boseop Kim, Nayeon Kim, Jaesun Park, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Kyoung-Woon On, Seulye Baeg, Junrae Cho, Sunghee Jung, Jieun Kang, EungGyun Kim, Eunhwa Kim, Byeongil Ko, Daniel Lee, et al. (4 additional authors not shown)

    Abstract: We introduce Kanana, a series of bilingual language models that demonstrate exceeding performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high quality dat…

    Submitted 28 February, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: 40 pages, 15 figures

  34. arXiv:2502.18015  [pdf, other]

    cs.RO

    $\texttt{SPIN}$: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation

    Authors: Haewon Jung, Donguk Lee, Haecheol Park, JunHyeop Kim, Beomjoon Kim

    Abstract: Current robots struggle with long-horizon manipulation tasks requiring sequences of prehensile and non-prehensile skills, contact-rich interactions, and long-term reasoning. We present $\texttt{SPIN}$ ($\textbf{S}$kill $\textbf{P}$lanning to $\textbf{IN}$ference), a framework that distills a computationally intensive planning algorithm into a policy via imitation learning. We propose…

    Submitted 7 May, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    Comments: Project website: https://sites.google.com/view/skill-rrt

  35. arXiv:2502.17708  [pdf, other]

    stat.AP cs.DL

    A Unified Model of Text and Citations for Topic-Specific Citation Networks

    Authors: ByungKoo Kim, Saki Kuzushima, Yuki Shiraito

    Abstract: Social scientists analyze citation networks to study how documents influence subsequent work across various domains such as judicial politics and international relations. However, conventional approaches that summarize document attributes in citation networks often overlook the diverse semantic contexts in which citations occur. This paper develops the paragraph-citation topic model (PCTM), which…

    Submitted 24 February, 2025; originally announced February 2025.

    MSC Class: 62P25; 91C20; 62F15

  36. arXiv:2502.16908  [pdf, other]

    cs.RO

    Design of a low-cost and lightweight 6 DoF bimanual arm for dynamic and contact-rich manipulation

    Authors: Jaehyung Kim, Jiho Kim, Dongryung Lee, Yujin Jang, Beomjoon Kim

    Abstract: Dynamic and contact-rich object manipulation, such as striking, snatching, or hammering, remains challenging for robotic systems due to hardware limitations. Most existing robots are constrained by high-inertia design, limited compliance, and reliance on expensive torque sensors. To address this, we introduce ARMADA (Affordable Robot for Manipulation and Dynamic Actions), a 6 degrees-of-freedom bi…

    Submitted 24 February, 2025; originally announced February 2025.

  37. arXiv:2502.11789  [pdf, other]

    cs.CL

    Personality Editing for Language Models through Relevant Knowledge Editing

    Authors: Seojin Hwang, Yumin Kim, Byeongjeong Kim, Hwanhee Lee

    Abstract: Large Language Models (LLMs) play a vital role in applications like conversational agents and content creation, where controlling a model's personality is crucial for maintaining tone, consistency, and engagement. However, traditional prompt-based techniques for controlling personality often fall short, as they do not effectively mitigate the model's inherent biases. In this paper, we introduce a…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 15 pages, 3 figures, 16 tables

  38. arXiv:2502.11438  [pdf, other]

    cs.CL

    SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL

    Authors: Jimin Lee, Ingeol Baek, Byeongjeong Kim, Hwanhee Lee

    Abstract: Text-to-SQL aims to convert natural language questions into executable SQL queries. While previous approaches, such as skeleton-masked selection, have demonstrated strong performance by retrieving similar training examples to guide large language models (LLMs), they struggle in real-world scenarios where such examples are unavailable. To overcome this limitation, we propose Self-Augmentation in-co…

    Submitted 16 February, 2025; originally announced February 2025.

    Comments: 13 pages, 5 figures, 10 tables

  39. arXiv:2502.07586  [pdf, other]

    cs.CL cs.AI

    We Can't Understand AI Using our Existing Vocabulary

    Authors: John Hewitt, Robert Geirhos, Been Kim

    Abstract: This position paper argues that, in order to understand AI, we cannot rely on our existing vocabulary of human words. Instead, we should strive to develop neologisms: new words that represent precise human concepts that we want to teach machines, or machine concepts that we need to learn. We start from the premise that humans and machines have differing concepts. This means interpretability can be…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: Position paper

  40. arXiv:2502.06516  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation

    Authors: Soobin Um, Beomsu Kim, Jong Chul Ye

    Abstract: Minority samples are underrepresented instances located in low-density regions of a data manifold, and are valuable in many generative AI applications, such as data augmentation, creative content generation, etc. Unfortunately, existing diffusion-based minority generators often rely on computationally expensive guidance dedicated for minority generation. To address this, here we present a simple y…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 29 pages, 11 figures

  41. arXiv:2502.04892  [pdf, other]

    cs.LG q-bio.NC stat.ML

    A Foundational Brain Dynamics Model via Stochastic Optimal Control

    Authors: Joonhyeong Park, Byoungwoo Park, Chang-Bae Bang, Jungwon Choi, Hyungjin Chung, Byung-Hoon Kim, Juho Lee

    Abstract: We introduce a foundational model for brain dynamics that utilizes stochastic optimal control (SOC) and amortized inference. Our method features a continuous-discrete state space model (SSM) that can robustly handle the intricate and noisy nature of fMRI signals. To address computational limitations, we implement an approximation strategy grounded in the SOC framework. Additionally, we present a s…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: The first two authors contributed equally

  42. arXiv:2502.04363  [pdf, other]

    cs.CV

    On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices

    Authors: Bosung Kim, Kyuhwan Lee, Isu Jeong, Jungmin Cheon, Yeojin Lee, Seulki Lee

    Abstract: We present On-device Sora, the first model training-free solution for diffusion-based on-device text-to-video generation that operates efficiently on smartphone-grade devices. To address the challenges of diffusion-based text-to-video generation on computation- and memory-limited mobile devices, the proposed On-device Sora applies three novel techniques to pre-trained video generative models. Firs…

    Submitted 31 March, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  43. arXiv:2502.04074  [pdf, other]

    cs.CV

    3D Prior is All You Need: Cross-Task Few-shot 2D Gaze Estimation

    Authors: Yihua Cheng, Hengfei Wang, Zhongqun Zhang, Yang Yue, Bo Eun Kim, Feng Lu, Hyung Jin Chang

    Abstract: 3D and 2D gaze estimation share the fundamental objective of capturing eye movements but are traditionally treated as two distinct research domains. In this paper, we introduce a novel cross-task few-shot 2D gaze estimation approach, aiming to adapt a pre-trained 3D gaze estimation network for 2D gaze prediction on unseen devices using only a few training images. This task is highly challenging du…

    Submitted 24 March, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

    Comments: CVPR 2025

  44. arXiv:2502.03966  [pdf, other]

    cs.CV cs.AI cs.LG

    MultiFloodSynth: Multi-Annotated Flood Synthetic Dataset Generation

    Authors: YoonJe Kang, Yonghoon Jung, Wonseop Shin, Bumsoo Kim, Sanghyun Seo

    Abstract: In this paper, we present synthetic data generation framework for flood hazard detection system. For high fidelity and quality, we characterize several real-world properties into virtual world and simulate the flood situation by controlling them. For the sake of efficiency, recent generative models in image-to-3D and urban city synthesis are leveraged to easily composite flood environments so that…

    Submitted 13 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

    Comments: 6 pages, 6 figures. Accepted as Oral Presentation to AAAI 2025 Workshop on Good-Data

  45. arXiv:2502.03468  [pdf]

    cs.CY cs.DL

    AI Governance in the Context of the EU AI Act: A Bibliometric and Literature Review Approach

    Authors: Byeong-Je Kim, Seunghoo Jeong, Bong-Kyung Cho, Ji-Bum Chung

    Abstract: The rapid advancement of artificial intelligence (AI) has brought about significant societal changes, necessitating robust AI governance frameworks. This study analyzed the research trends in AI governance within the framework of the EU AI Act. This study conducted a bibliometric analysis to examine the publications indexed in the Web of Science database. Our findings reveal that research on AI go…

    Submitted 8 January, 2025; originally announced February 2025.

    Comments: 16 pages, 3 figures, 9 tables, submitted to IEEE Access

  46. arXiv:2502.02732  [pdf, other]

    cs.LG cs.AI cs.CL

    Peri-LN: Revisiting Layer Normalization in the Transformer Architecture

    Authors: Jeonghoon Kim, Byeongchan Lee, Cheonbok Park, Yeontaek Oh, Beomjun Kim, Taehwan Yoo, Seongjin Shin, Dongyoon Han, Jinwoo Shin, Kang Min Yoo

    Abstract: Designing Transformer architectures with the optimal layer normalization (LN) strategy that ensures large-scale training stability and expedite convergence has remained elusive, even in this era of large language models (LLMs). To this end, we present a comprehensive analytical foundation for understanding how different LN strategies influence training dynamics in large-scale Transformer training.…

    Submitted 6 February, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

    Comments: Preprint

  47. arXiv:2502.01070  [pdf, other]

    cs.LG cs.PF

    An Inquiry into Datacenter TCO for LLM Inference with FP8

    Authors: Jiwoo Kim, Joonhyung Lee, Gunho Park, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee, Youngjoo Lee

    Abstract: As large language models (LLMs) continue to scale, their inference demands present significant challenges, particularly due to the high power consumption of AI accelerators in datacenters. These facilities require specialized cooling and power management systems, substantially increasing the total cost of ownership (TCO) for cloud service providers (CSPs). In this work, we analyze the computationa…

    Submitted 29 April, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  48. arXiv:2501.17683  [pdf, other]

    cs.LG

    Temperature-Free Loss Function for Contrastive Learning

    Authors: Bum Jun Kim, Sang Woo Kim

    Abstract: As one of the most promising methods in self-supervised learning, contrastive learning has achieved a series of breakthroughs across numerous fields. A predominant approach to implementing contrastive learning is applying InfoNCE loss: By capturing the similarities between pairs, InfoNCE loss enables learning the representation of data. Albeit its success, adopting InfoNCE loss requires tuning a t…

    Submitted 29 January, 2025; originally announced January 2025.

    Comments: 10 pages, 5 figures

  49. arXiv:2501.15076  [pdf, other]

    cs.CR cs.IT cs.LG

    Cryptanalysis via Machine Learning Based Information Theoretic Metrics

    Authors: Benjamin D. Kim, Vipindev Adat Vasudevan, Rafael G. L. D'Oliveira, Alejandro Cohen, Thomas Stahlbuhk, Muriel Médard

    Abstract: The fields of machine learning (ML) and cryptanalysis share an interestingly common objective of creating a function, based on a given set of inputs and outputs. However, the approaches and methods in doing so vary vastly between the two fields. In this paper, we explore integrating the knowledge from the ML domain to provide empirical evaluations of cryptosystems. Particularly, we utilize informa…

    Submitted 24 January, 2025; originally announced January 2025.

  50. arXiv:2501.14013  [pdf, other]

    eess.IV cs.AI cs.CV

    Leveraging Multiphase CT for Quality Enhancement of Portal Venous CT: Utility for Pancreas Segmentation

    Authors: Xinya Wang, Tejas Sudharshan Mathai, Boah Kim, Ronald M. Summers

    Abstract: Multiphase CT studies are routinely obtained in clinical practice for diagnosis and management of various diseases, such as cancer. However, the CT studies can be acquired with low radiation doses, different scanners, and are frequently affected by motion and metal artifacts. Prior approaches have targeted the quality improvement of one specific CT phase (e.g., non-contrast CT). In this work, we h…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: ISBI 2025

    MSC Class: 92C55; ACM Class: I.4.6