
Showing 1–50 of 2,627 results for author: Lee, S

Searching in archive cs.
  1. arXiv:2507.06125

    cs.LG cs.AI

    Subspace-based Approximate Hessian Method for Zeroth-Order Optimization

    Authors: Dongyoon Kim, Sungjae Lee, Wonjin Lee, Kwang In Kim

    Abstract: Zeroth-order optimization addresses problems where gradient information is inaccessible or impractical to compute. While most existing methods rely on first-order approximations, incorporating second-order (curvature) information can, in principle, significantly accelerate convergence. However, the high cost of function evaluations required to estimate Hessian matrices often limits practical appli…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: 20 pages, 8 figures

  2. arXiv:2507.05686

    cs.CL

    Smoothie-Qwen: Post-Hoc Smoothing to Reduce Language Bias in Multilingual LLMs

    Authors: SeungWon Ji, Jungyup Lee, Jemin Kim, Sang Park, SeungJae Lee

    Abstract: Multilingual large language models (LLMs) often exhibit language confusion, a tendency to generate responses in a dominant language irrespective of the prompt's language. To address this, we propose Smoothie-Qwen, a lightweight, post-hoc method that mitigates language bias without retraining. This technique selectively adjusts token-level output probabilities to effectively suppress undesired lang…

    Submitted 8 July, 2025; originally announced July 2025.

  3. arXiv:2507.05556

    cs.AR cs.CR

    Per-Row Activation Counting on Real Hardware: Demystifying Performance Overheads

    Authors: Jumin Kim, Seungmin Baek, Minbok Wi, Hwayong Nam, Michael Jaemin Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn

    Abstract: Per-Row Activation Counting (PRAC), a DRAM read disturbance mitigation method, modifies key DRAM timing parameters, reportedly causing significant performance overheads in simulator-based studies. However, given known discrepancies between simulators and real hardware, real-machine experiments are vital for accurate PRAC performance estimation. We present the first real-machine performance analysi…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 4 pages, 4 figures, to appear at IEEE Computer Architecture Letters

  4. arXiv:2507.05418

    cs.CL cs.AI cs.LG

    Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning

    Authors: Jaedong Hwang, Kumar Tanmay, Seok-Jin Lee, Ayush Agrawal, Hamid Palangi, Kumar Ayush, Ila Fiete, Paul Pu Liang

    Abstract: Large Language Models (LLMs) have achieved strong performance in domains like mathematics, factual QA, and code generation, yet their multilingual reasoning capabilities in these tasks remain underdeveloped. Especially for low-resource languages such as Swahili or Thai, LLMs can often misinterpret prompts or default to reasoning in English. This implicit bias toward high-resource languages undermi…

    Submitted 7 July, 2025; originally announced July 2025.

  5. arXiv:2507.04748

    cs.AI

    LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction

    Authors: Sungmin Lee, Minju Kang, Joonhee Lee, Seungyong Lee, Dongju Kim, Jingi Hong, Jun Shin, Pei Zhang, JeongGil Ko

    Abstract: Question-answering (QA) interfaces powered by large language models (LLMs) present a promising direction for improving interactivity with HVAC system insights, particularly for non-expert users. However, enabling accurate, real-time, and context-aware interactions with HVAC systems introduces unique challenges, including the integration of frequently updated sensor data, domain-specific knowledge…

    Submitted 7 July, 2025; originally announced July 2025.

  6. arXiv:2507.04660

    eess.IV cs.CV

    CP-Dilatation: A Copy-and-Paste Augmentation Method for Preserving the Boundary Context Information of Histopathology Images

    Authors: Sungrae Hong, Sol Lee, Mun Yong Yi

    Abstract: Medical AI diagnosis including histopathology segmentation has derived benefits from the recent development of deep learning technology. However, deep learning itself requires a large amount of training data and the medical image segmentation masking, in particular, requires an extremely high cost due to the shortage of medical specialists. To mitigate this issue, we propose a new data augmentatio…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 5 pages, 5 figures

  7. arXiv:2507.04388

    cs.CV

    Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers

    Authors: Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon, Seong-Whan Lee

    Abstract: The feature attribution method reveals the contribution of input variables to the decision-making process to provide an attribution map for explanation. Existing methods grounded on the information bottleneck principle compute information in a specific layer to obtain attributions, compressing the features by injecting noise via a parametric damping ratio. However, the attribution obtained in a sp…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: CVPR 2025 (highlight)

  8. Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning

    Authors: Nayeon Kim, Eojin Jeon, Jun-Hyung Park, SangKeun Lee

    Abstract: In this study, we introduce KOPL, a novel framework for handling Korean OOV words with Phoneme representation Learning. Our work is based on the linguistic property of Korean as a phonemic script, the high correlation between phonemes and letters. KOPL incorporates phoneme and word representations for Korean OOV words, facilitating Korean OOV word representations to capture both text and phoneme i…

    Submitted 5 July, 2025; originally announced July 2025.

    Journal ref: Advances in Knowledge Discovery and Data Mining. PAKDD 2025

  9. arXiv:2507.04014

    cs.CL cs.AI cs.CY

    Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition

    Authors: Kyuhee Kim, Sangah Lee

    Abstract: As large language models (LLMs) become key advisors in various domains, their cultural sensitivity and reasoning skills are crucial in multicultural environments. We introduce Nunchi-Bench, a benchmark designed to evaluate LLMs' cultural understanding, with a focus on Korean superstitions. The benchmark consists of 247 questions spanning 31 topics, assessing factual knowledge, culturally appropria…

    Submitted 5 July, 2025; originally announced July 2025.

  10. arXiv:2507.03328

    cs.SE

    scikit-package -- software packaging standards and roadmap for sharing reproducible scientific software

    Authors: S. Lee, C. Myers, A. Yang, T. Zhang, S. J. L. Billinge

    Abstract: Scientific advancement relies on the ability to share and reproduce results. When data analysis or calculations are carried out using software written by scientists there are special challenges around code versions, quality and code sharing. scikit-package provides a roadmap to facilitate code reuse and sharing with minimal effort through tutorials coupled with automated and centralized reusable w…

    Submitted 8 July, 2025; v1 submitted 4 July, 2025; originally announced July 2025.

    Comments: GitHub: https://github.com/scikit-package/scikit-package Doc: https://scikit-package.github.io/scikit-package/

  11. arXiv:2507.03114

    cs.DC

    Characterizing Compute-Communication Overlap in GPU-Accelerated Distributed Deep Learning: Performance and Power Implications

    Authors: Seonho Lee, Jihwan Oh, Junkyum Kim, Seokjin Go, Jongse Park, Divya Mahajan

    Abstract: This paper provides an in-depth characterization of GPU-accelerated systems, to understand the interplay between overlapping computation and communication which is commonly employed in distributed training settings. Due to the large size of models, distributing them across multiple devices is required. Overlapping strategies, which enable concurrent computation and communication, are critical for…

    Submitted 3 July, 2025; originally announced July 2025.

  12. arXiv:2507.02393

    cs.CV cs.GR

    PLOT: Pseudo-Labeling via Video Object Tracking for Scalable Monocular 3D Object Detection

    Authors: Seokyeong Lee, Sithu Aung, Junyong Choi, Seungryong Kim, Ig-Jae Kim, Junghyun Cho

    Abstract: Monocular 3D object detection (M3OD) has long faced challenges due to data scarcity caused by high annotation costs and inherent 2D-to-3D ambiguity. Although various weakly supervised methods and pseudo-labeling methods have been proposed to address these issues, they are mostly limited by domain-specific learning or rely solely on shape information from a single observation. In this paper, we pro…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 18 pages, 16 figures

  13. arXiv:2507.02080

    cs.MM cs.SD

    TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation

    Authors: Yubeen Lee, Sangeun Lee, Chaewon Park, Junyeop Cha, Eunil Park

    Abstract: Multimodal emotion recognition often suffers from performance degradation in valence-arousal estimation due to noise and misalignment between audio and visual modalities. To address this challenge, we introduce TAGF, a Time-aware Gated Fusion framework for multimodal emotion recognition. The TAGF adaptively modulates the contribution of recursive attention outputs based on temporal dynamics. Speci…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 9 pages, 2 figures, 2 tables

  14. arXiv:2507.02068

    cs.SE

    How do Software Engineering Candidates Prepare for Technical Interviews?

    Authors: Brian Bell, Teresa Thomas, Sang Won Lee, Chris Brown

    Abstract: To obtain employment, aspiring software engineers must complete technical interviews -- a hiring process which involves candidates writing code while communicating to an audience. However, the complexities of tech interviews are difficult to prepare for and seldom faced in computing curricula. To this end, we seek to understand how candidates prepare for technical interviews, investigating the eff…

    Submitted 2 July, 2025; originally announced July 2025.

  15. arXiv:2507.00683

    cond-mat.mtrl-sci cs.LG

    Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer

    Authors: Satadeep Bhattacharjee, Seung-Cheol Lee

    Abstract: The recently proposed physics-based framework by Huo and Johnson [huo2024capturing] models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive th…

    Submitted 4 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

  16. arXiv:2506.24119

    cs.AI cs.CL cs.LG

    SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

    Authors: Bo Liu, Leon Guertler, Simon Yu, Zichen Liu, Penghui Qi, Daniel Balcells, Mickel Liu, Cheston Tan, Weiyan Shi, Min Lin, Wee Sun Lee, Natasha Jaques

    Abstract: Recent advances in reinforcement learning have shown that language models can develop sophisticated reasoning through training on tasks with verifiable rewards, but these approaches depend on human-curated problem-answer pairs and domain-specific reward engineering. We introduce SPIRAL, a self-play framework where models learn by playing multi-turn, zero-sum games against continuously improving ve…

    Submitted 30 June, 2025; v1 submitted 30 June, 2025; originally announced June 2025.

    Comments: Work in Progress

  17. arXiv:2506.23516

    cs.LG cs.AI cs.CV

    FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization

    Authors: Seung-Wook Kim, Seongyeol Kim, Jiah Kim, Seowon Ji, Se-Ho Lee

    Abstract: Federated learning (FL) often suffers from performance degradation due to key challenges such as data heterogeneity and communication constraints. To address these limitations, we present a novel FL framework called FedWSQ, which integrates weight standardization (WS) and the proposed distribution-aware non-uniform quantization (DANUQ). WS enhances FL performance by filtering out biased components…

    Submitted 30 June, 2025; originally announced June 2025.

  18. arXiv:2506.23326

    cs.RO

    Simplifying Data-Driven Modeling of the Volume-Flow-Pressure Relationship in Hydraulic Soft Robotic Actuators

    Authors: Sang-Yoep Lee, Leonardo Zamora Yanez, Jacob Rogatinsky, Vi T. Vo, Tanvi Shingade, Tommaso Ranzani

    Abstract: Soft robotic systems are known for their flexibility and adaptability, but traditional physics-based models struggle to capture their complex, nonlinear behaviors. This study explores a data-driven approach to modeling the volume-flow-pressure relationship in hydraulic soft actuators, focusing on low-complexity models with high accuracy. We perform regression analysis on a stacked balloon actuator…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  19. arXiv:2506.22806

    cs.CV cs.LG

    Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate

    Authors: Byung Hyun Lee, Sungjin Lim, Seunggyu Lee, Dong Un Kang, Se Young Chun

    Abstract: Remarkable progress in text-to-image diffusion models has brought a major concern about potentially generating images on inappropriate or trademarked concepts. Concept erasing has been investigated with the goals of deleting target concepts in diffusion models while preserving other concepts with minimal distortion. To achieve these goals, recent concept erasing methods usually fine-tune the cross…

    Submitted 28 June, 2025; originally announced June 2025.

  20. arXiv:2506.21855

    cs.CV

    Periodic-MAE: Periodic Video Masked Autoencoder for rPPG Estimation

    Authors: Jiho Choi, Sang Jun Lee

    Abstract: In this paper, we propose a method that learns a general representation of periodic signals from unlabeled facial videos by capturing subtle changes in skin tone over time. The proposed framework employs the video masked autoencoder to learn a high-dimensional spatio-temporal representation of the facial region through self-supervised learning. Capturing quasi-periodic signals in the video is cruc…

    Submitted 26 June, 2025; originally announced June 2025.

  21. arXiv:2506.21765

    eess.IV cs.CV

    TUS-REC2024: A Challenge to Reconstruct 3D Freehand Ultrasound Without External Tracker

    Authors: Qi Li, Shaheer U. Saeed, Yuliang Huang, Mingyuan Luo, Zhongnuo Yan, Jiongquan Chen, Xin Yang, Dong Ni, Nektarios Winter, Phuc Nguyen, Lucas Steinberger, Caelan Haney, Yuan Zhao, Mingjie Jiang, Bowen Ren, SiYeoul Lee, Seonho Kim, MinKyung Seo, MinWoo Kim, Yimeng Dou, Zhiwei Zhang, Yin Li, Tomy Varghese, Dean C. Barratt, Matthew J. Clarkson, et al. (2 additional authors not shown)

    Abstract: Trackerless freehand ultrasound reconstruction aims to reconstruct 3D volumes from sequences of 2D ultrasound images without relying on external tracking systems, offering a low-cost, portable, and widely deployable alternative for volumetric imaging. However, it presents significant challenges, including accurate inter-frame motion estimation, minimisation of drift accumulation over long sequence…

    Submitted 26 June, 2025; originally announced June 2025.

  22. arXiv:2506.21582

    cs.CL cs.AI cs.HC

    VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents

    Authors: Sam Yu-Te Lee, Chengyang Ji, Shicheng Wen, Lifu Huang, Dongyi Liu, Kwan-Liu Ma

    Abstract: Text analytics has traditionally required specialized knowledge in Natural Language Processing (NLP) or text analysis, which presents a barrier for entry-level analysts. Recent advances in large language models (LLMs) have changed the landscape of NLP by enabling more accessible and automated text analysis (e.g., topic detection, summarization, information extraction, etc.). We introduce VIDEE, a…

    Submitted 17 June, 2025; originally announced June 2025.

  23. arXiv:2506.21039

    cs.LG cs.AI

    Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

    Authors: Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han

    Abstract: Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces singl…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 9 technical pages, followed by references and appendix

  24. arXiv:2506.20922

    cs.CV

    M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization

    Authors: Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee

    Abstract: Image editing techniques have rapidly advanced, facilitating both innovative use cases and malicious manipulation of digital images. Deep learning-based methods have recently achieved high accuracy in pixel-level forgery localization, yet they frequently struggle with computational overhead and limited representation power, particularly for subtle or complex tampering. In this paper, we propose M2…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted in International Conference on Computer Vision (ICCV) 2025

  25. arXiv:2506.20702

    cs.AI cs.CY

    The Singapore Consensus on Global AI Safety Research Priorities

    Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai, et al. (63 additional authors not shown)

    Abstract: Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on…

    Submitted 30 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Final report from the "2025 Singapore Conference on AI (SCAI)" held April 26: https://www.scai.gov.sg/2025/scai2025-report

  26. arXiv:2506.20357

    cs.AI

    Tabular Feature Discovery With Reasoning Type Exploration

    Authors: Sungwon Han, Sungkyu Park, Seungeon Lee

    Abstract: Feature engineering for tabular data remains a critical yet challenging step in machine learning. Recently, large language models (LLMs) have been used to automatically generate new features by leveraging their vast knowledge. However, existing LLM-based approaches often produce overly simple or repetitive features, partly due to inherent biases in the transformations the LLM chooses and the lack…

    Submitted 25 June, 2025; originally announced June 2025.

  27. arXiv:2506.20112

    cs.CL

    A Multi-Pass Large Language Model Framework for Precise and Efficient Radiology Report Error Detection

    Authors: Songsoo Kim, Seungtae Lee, See Young Lee, Joonho Kim, Keechan Kan, Dukyong Yoon

    Abstract: Background: The positive predictive value (PPV) of large language model (LLM)-based proofreading for radiology reports is limited due to the low error prevalence. Purpose: To assess whether a three-pass LLM framework enhances PPV and reduces operational costs compared with baseline approaches. Materials and Methods: A retrospective analysis was performed on 1,000 consecutive radiology reports (250…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 29 pages, 5 figures, 4 tables. Code available at https://github.com/radssk/mp-rred

    ACM Class: I.2.7

  28. arXiv:2506.19602

    cs.RO

    Soft Robotic Delivery of Coiled Anchors for Cardiac Interventions

    Authors: Leonardo Zamora Yanez, Jacob Rogatinsky, Dominic Recco, Sang-Yoep Lee, Grace Matthews, Andrew P. Sabelhaus, Tommaso Ranzani

    Abstract: Trans-catheter cardiac intervention has become an increasingly available option for high-risk patients without the complications of open heart surgery. However, current catheter-based platforms suffer from a lack of dexterity, force application, and compliance required to perform complex intracardiac procedures. An exemplary task that would significantly ease minimally invasive intracardiac procedu…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  29. arXiv:2506.19451

    eess.SP cs.LG

    Low-Complexity Semantic Packet Aggregation for Token Communication via Lookahead Search

    Authors: Seunghun Lee, Jihong Park, Jinho Choi, Hyuncheol Park

    Abstract: Tokens are fundamental processing units of generative AI (GenAI) and large language models (LLMs), and token communication (TC) is essential for enabling remote AI-generated content (AIGC) and wireless LLM applications. Unlike traditional bits, each of which is independently treated, the semantics of each token depends on its surrounding context tokens. This inter-token dependency makes TC vulnerab…

    Submitted 24 June, 2025; originally announced June 2025.

  30. arXiv:2506.19417

    cs.LG cs.MA

    Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Reinforcement Learning

    Authors: Yisak Park, Sunwoo Lee, Seungyul Han

    Abstract: Cooperative multi-agent reinforcement learning (MARL) under sparse rewards presents a fundamental challenge due to limited exploration and insufficient coordinated attention among agents. In this work, we propose the Focusing Influence Mechanism (FIM), a novel framework that enhances cooperation by directing agent influence toward task-critical elements, referred to as Center of Gravity (CoG) stat…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 9 technical pages, followed by references and appendix

  31. arXiv:2506.19217

    cs.CV cs.AI

    MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports

    Authors: Sunggu Kyung, Hyungbin Park, Jinyoung Seo, Jimin Sung, Jihyun Kim, Dongyeong Kim, Wooyoung Jo, Yoojin Nam, Sangah Park, Taehee Kwon, Sang Min Lee, Namkug Kim

    Abstract: Computed Tomography (CT) plays a crucial role in clinical diagnosis, but the growing demand for CT examinations has raised concerns about diagnostic errors. While Multimodal Large Language Models (MLLMs) demonstrate promising comprehension of medical knowledge, their tendency to produce inaccurate information highlights the need for rigorous validation. However, existing medical visual question an…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 14 pages, 5 figures, submitted to CVPR 2025

  32. arXiv:2506.18557

    cs.CV

    Object-aware Sound Source Localization via Audio-Visual Scene Understanding

    Authors: Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim

    Abstract: Audio-visual sound source localization task aims to spatially localize sound-making objects within visual scenes by integrating visual and audio cues. However, existing methods struggle with accurately localizing sound-making objects in complex scenes, particularly when visually similar silent objects coexist. This limitation arises primarily from their reliance on simple audio-visual corresponden…

    Submitted 23 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted at CVPR 2025

    Journal ref: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 8342-8351

  33. arXiv:2506.18284

    cs.CV cs.AI

    Open Set Recognition for Endoscopic Image Classification: A Deep Learning Approach on the Kvasir Dataset

    Authors: Kasra Moazzami, Seoyoun Son, John Lin, Sun Min Lee, Daniel Son, Hayeon Lee, Jeongho Lee, Seongji Lee

    Abstract: Endoscopic image classification plays a pivotal role in medical diagnostics by identifying anatomical landmarks and pathological findings. However, conventional closed-set classification frameworks are inherently limited in open-world clinical settings, where previously unseen conditions can arise and compromise model reliability. To address this, we explore the application of Open Set Recognition…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 9 pages, 3 figures, 3 tables

  34. arXiv:2506.17951

    cs.CL

    A Comprehensive Graph Framework for Question Answering with Mode-Seeking Preference Alignment

    Authors: Quanwei Tang, Sophia Yat Mei Lee, Junshuang Wu, Dong Zhang, Shoushan Li, Erik Cambria, Guodong Zhou

    Abstract: Recent advancements in retrieval-augmented generation (RAG) have enhanced large language models in question answering by integrating external knowledge. However, challenges persist in achieving global understanding and aligning responses with human ethical and quality preferences. To address these issues, we propose GraphMPA, a comprehensive graph-based framework with mode-seeking preference align…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings

  35. arXiv:2506.16617

    cs.AI cs.HC

    The Role of Explanation Styles and Perceived Accuracy on Decision Making in Predictive Process Monitoring

    Authors: Soobin Chae, Suhwan Lee, Hanna Hauptmann, Hajo A. Reijers, Xixi Lu

    Abstract: Predictive Process Monitoring (PPM) often uses deep learning models to predict the future behavior of ongoing processes, such as predicting process outcomes. While these models achieve high accuracy, their lack of interpretability undermines user trust and adoption. Explainable AI (XAI) aims to address this challenge by providing the reasoning behind the predictions. However, current evaluations o…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted at CAiSE'25

  36. arXiv:2506.15977

    cs.CV

    Towards Classifying Histopathological Microscope Images as Time Series Data

    Authors: Sungrae Hong, Hyeongmin Park, Youngsin Ko, Sol Lee, Bryan Wong, Mun Yong Yi

    Abstract: As the frontline data for cancer diagnosis, microscopic pathology images are fundamental for providing patients with rapid and accurate treatment. However, despite their practical value, the deep learning community has largely overlooked their usage. This paper proposes a novel approach to classifying microscopy images as time series data, addressing the unique challenges posed by their manual acq…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 5 pages, 4 figures, Accepted by International Symposium on Biomedical Imaging (ISBI) 2025

  37. arXiv:2506.15766

    hep-th cs.LG math.DG

    Approximate Ricci-flat Metrics for Calabi-Yau Manifolds

    Authors: Seung-Joo Lee, Andre Lukas

    Abstract: We outline a method to determine analytic Kähler potentials with associated approximately Ricci-flat Kähler metrics on Calabi-Yau manifolds. Key ingredients are numerically calculating Ricci-flat Kähler potentials via machine learning techniques and fitting the numerical results to Donaldson's Ansatz. We apply this method to the Dwork family of quintic hypersurfaces in $\mathbb{P}^4$ and an analog…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 15 pages, 6 figures

  38. arXiv:2506.15645

    cs.CV cs.AI

    Demystifying the Visual Quality Paradox in Multimodal Large Language Models

    Authors: Shuo Xing, Lanqing Guo, Hongyuan Hua, Seoyoung Lee, Peiran Li, Yufei Wang, Zhangyang Wang, Zhengzhong Tu

    Abstract: Recent Multimodal Large Language Models (MLLMs) excel on benchmark vision-language tasks, yet little is known about how input visual quality shapes their responses. Does higher perceptual quality of images already translate to better MLLM understanding? We conduct the first systematic study spanning leading MLLMs and a suite of vision-language benchmarks, applying controlled degradations and styli…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 18 pages

  39. arXiv:2506.15613

    cs.AR

    From Block to Byte: Transforming PCIe SSDs with CXL Memory Protocol and Instruction Annotation

    Authors: Miryeong Kwon, Donghyun Gouk, Junhyeok Jang, Jinwoo Baek, Hyunwoo You, Sangyoon Ji, Hongjoo Jung, Junseok Moon, Seungkwan Kang, Seungjun Lee, Myoungsoo Jung

    Abstract: This paper explores how Compute Express Link (CXL) can transform PCIe-based block storage into a scalable, byte-addressable working memory. We address the challenges of adapting block storage to CXL's memory-centric model by emphasizing cacheability as a key enabler and advocating for Type 3 endpoint devices, referred to as CXL-SSDs. To validate our approach, we prototype a CXL-SSD on a custom FPG…

    Submitted 18 June, 2025; originally announced June 2025.

  40. arXiv:2506.15601

    cs.AR

    CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies

    Authors: Donghyun Gouk, Seungkwan Kang, Seungjun Lee, Jiseon Kim, Kyungkuk Nam, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Hanyeoreum Bae, Myoungsoo Jung

    Abstract: This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). We developed and siliconized a custom CXL controller integrated at the hardware RTL level, achieving two-digit nanosecond roundtrip latency, the first in the field. This study also includes speculative read…

    Submitted 18 June, 2025; originally announced June 2025.

  41. arXiv:2506.15524

    cs.CV

    NTIRE 2025 Image Shadow Removal Challenge Report

    Authors: Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailian Chen, Zongwei Wu, Radu Timofte, Mingjia Li, Jin Hu, Hainuo Wang, Hengxing Liu, Jiarui Wang, Qiming Hu, Xiaojie Guo, Xin Lu, Jiarong Yang, Yuanfei Bao, Anya Hu, Zihao Fan, Kunyu Wang, Jie Xiao, Xi Wang, Xueyang Fu, Zheng-Jun Zha, Yu-Fan Lin, Chia-Ming Lee, et al. (57 additional authors not shown)

    Abstract: This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were e…

    Submitted 18 June, 2025; originally announced June 2025.

  42. arXiv:2506.15138  [pdf, ps, other]

    cs.CL cs.AI

    Thunder-Tok: Minimizing Tokens per Word in Tokenizing Korean Texts for Generative Language Models

    Authors: Gyeongje Cho, Yeonkyoun So, Chanwoo Park, Sangmin Lee, Sungmok Jung, Jaejin Lee

    Abstract: This paper introduces Thunder-Tok, a new Korean tokenizer designed to reduce token fertility without compromising model performance. Our approach uses a rule-based pre-tokenization method that aligns with the linguistic structure of the Korean language. We also create a seed vocabulary containing tokens that resemble linguistic units and employ a branching entropy-based selection algorithm. These…

    Submitted 18 June, 2025; originally announced June 2025.

  43. arXiv:2506.14285  [pdf, ps, other]

    cs.CL

    From What to Respond to When to Respond: Timely Response Generation for Open-domain Dialogue Agents

    Authors: Seongbo Jang, Minjin Jeon, Jaehoon Lee, Seonghyeon Lee, Dongha Lee, Hwanjo Yu

    Abstract: While research on dialogue response generation has primarily focused on generating coherent responses conditioning on textual context, the critical question of when to respond grounded on the temporal context remains underexplored. To bridge this gap, we propose a novel task called timely dialogue response generation and introduce the TimelyChat benchmark, which evaluates the capabilities of langu…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Work in progress

  44. arXiv:2506.14107  [pdf, ps, other]

    cs.DC cs.CV

    Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse

    Authors: Jinwoo Hwang, Daeun Kim, Sangyeop Lee, Yoonsung Kim, Guseul Heo, Hojoon Kim, Yunseok Jeong, Tadiwos Meaza, Eunhyeok Park, Jeongseob Ahn, Jongse Park

    Abstract: Recently, Video-Language Models (VideoLMs) have demonstrated remarkable capabilities, offering significant potential for flexible and powerful video query systems. These models typically rely on Vision Transformers (ViTs), which process video frames individually to extract visual embeddings. However, generating embeddings for large-scale videos requires ViT inferencing across numerous frames, posi…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted to 2025 VLDB

  45. arXiv:2506.13508  [pdf, ps, other]

    cs.CV

    Multiview Geometric Regularization of Gaussian Splatting for Accurate Radiance Fields

    Authors: Jungeon Kim, Geonsoo Park, Seungyong Lee

    Abstract: Recent methods, such as 2D Gaussian Splatting and Gaussian Opacity Fields, have aimed to address the geometric inaccuracies of 3D Gaussian Splatting while retaining its superior rendering quality. However, these approaches still struggle to reconstruct smooth and reliable geometry, particularly in scenes with significant color variation across viewpoints, due to their per-point appearance modeling…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted to Computer Graphics Forum (EGSR 2025)

  46. arXiv:2506.13342  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers

    Authors: Wooseok Seo, Seungju Han, Jaehun Jung, Benjamin Newman, Seungwon Lim, Seungbeen Lee, Ximing Lu, Yejin Choi, Youngjae Yu

    Abstract: Fact verification is essential for ensuring the reliability of LLM applications. In this study, we evaluate 12 pre-trained LLMs and one specialized fact-verifier, including frontier LLMs and open-weight reasoning LLMs, using a collection of examples from 14 fact-checking benchmarks. We share three findings intended to guide future development of more robust fact verifiers. First, we highlight the…

    Submitted 16 June, 2025; originally announced June 2025.

  47. arXiv:2506.13015  [pdf, ps, other]

    cs.LG cs.AI

    Geometric Embedding Alignment via Curvature Matching in Transfer Learning

    Authors: Sung Moon Ko, Jaewan Lee, Sumin Lee, Soorin Yim, Kyunghoon Bae, Sehui Han

    Abstract: Geometrical interpretations of deep learning models offer insightful perspectives into their underlying mathematical structures. In this work, we introduce a novel approach that leverages differential geometry, particularly concepts from Riemannian geometry, to integrate multiple models into a unified transfer learning framework. By aligning the Ricci curvature of latent space of individual models…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 13+19 pages, 7 figures, 8 tables, 1 pseudo code

  48. arXiv:2506.12790  [pdf, ps, other]

    cs.LG math.NA physics.comp-ph

    PDEfuncta: Spectrally-Aware Neural Representation for PDE Solution Modeling

    Authors: Minju Jo, Woojin Cho, Uvini Balasuriya Mudiyanselage, Seungjun Lee, Noseong Park, Kookjin Lee

    Abstract: Scientific machine learning often involves representing complex solution fields that exhibit high-frequency features such as sharp transitions, fine-scale oscillations, and localized structures. While implicit neural representations (INRs) have shown promise for continuous function modeling, capturing such high-frequency behavior remains a challenge, especially when modeling multiple solution field…

    Submitted 15 June, 2025; originally announced June 2025.

  49. arXiv:2506.11772  [pdf, ps, other]

    cs.CV cs.LG

    CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection

    Authors: Byeongchan Lee, John Won, Seunghyun Lee, Jinwoo Shin

    Abstract: Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types (e.g., local and global defect), and the scarcity of training data. As such, it necessitates a comprehensive model capable of capturing both low-level and high-level features, even with limited data. To address this, we propose CLIPFUSION, a method that leverages both discriminative an…

    Submitted 13 June, 2025; originally announced June 2025.

  50. arXiv:2506.11499  [pdf, ps, other]

    cs.CL

    On the Effectiveness of Integration Methods for Multimodal Dialogue Response Retrieval

    Authors: Seongbo Jang, Seonghyeon Lee, Dongha Lee, Hwanjo Yu

    Abstract: Multimodal chatbots have become one of the major topics for dialogue systems in both research community and industry. Recently, researchers have shed light on the multimodality of responses as well as dialogue contexts. This work explores how a dialogue system can output responses in various modalities such as text and image. To this end, we first formulate a multimodal dialogue response retrieval…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 9 pages, 1 figure