Showing 1–50 of 1,024 results for author: Lee, M

Searching in archive cs.
  1. arXiv:2510.00777

    cs.LG

    In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning

    Authors: Youngbin Choi, Minjong Lee, Saemi Moon, Seunghyuk Cho, Chaehyeon Chung, MoonJeong Park, Dongwoo Kim

    Abstract: Large language models (LLMs) are increasingly studied in the context of multi-turn reasoning, where models iteratively refine their outputs based on user-provided feedback. Such settings are crucial for tasks that require complex reasoning, yet existing feedback paradigms often rely on issuing new messages. LLMs struggle to integrate these reliably, leading to inconsistent improvements. In this wo…

    Submitted 1 October, 2025; originally announced October 2025.

    Comments: 28 pages, 23 figures

  2. arXiv:2510.00309

    cs.LG stat.ML

    Lipschitz Bandits with Stochastic Delayed Feedback

    Authors: Zhongxuan Liu, Yue Kang, Thomas C. M. Lee

    Abstract: The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem of Lipschitz bandit in the presence of stochastic delayed feedback, where the rewards are not observed immediately but after a random delay. We consider both bounded and unbounded…

    Submitted 30 September, 2025; originally announced October 2025.

  3. arXiv:2509.25851

    cs.CV

    MuSLR: Multimodal Symbolic Logical Reasoning

    Authors: Jundong Xu, Hao Fei, Yuhui Zhang, Liangming Pan, Qijun Huang, Qian Liu, Preslav Nakov, Min-Yen Kan, William Yang Wang, Mong-Li Lee, Wynne Hsu

    Abstract: Multimodal symbolic logical reasoning, which aims to deduce new facts from multimodal input via formal logic, is critical in high-stakes applications such as autonomous driving and medical diagnosis, as its rigorous, deterministic reasoning helps prevent serious consequences. To evaluate such capabilities of current state-of-the-art vision language models (VLMs), we introduce the first benchmark M…

    Submitted 30 September, 2025; originally announced September 2025.

    Comments: Accepted by NeurIPS 2025

  4. arXiv:2509.24935

    cs.CV cs.AI cs.LG

    Scalable GANs with Transformers

    Authors: Sangeek Hyun, MinKyu Lee, Jae-Pil Heo

    Abstract: Scalability has driven recent advances in generative modeling, yet its principles remain underexplored for adversarial learning. We investigate the scalability of Generative Adversarial Networks (GANs) through two design choices that have proven to be effective in other types of generative models: training in a compact Variational Autoencoder latent space and adopting purely transformer-based gene…

    Submitted 29 September, 2025; originally announced September 2025.

  5. arXiv:2509.23098

    cs.CV cs.AI

    CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP

    Authors: Na Min An, Inha Kang, Minhyun Lee, Hyunjung Shim

    Abstract: Spatial grounding is crucial for referring image segmentation (RIS), where the goal of the task is to localize an object described by language. Current foundational vision-language models (VLMs), such as CLIP, excel at aligning images and text but struggle with understanding spatial relationships. Within the language stream, most existing methods often focus on the primary noun phrase when extract…

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: 28 pages, 22 figures, 11 tables

  6. arXiv:2509.21251

    cs.CV cs.AI

    Instruction-tuned Self-Questioning Framework for Multimodal Reasoning

    Authors: You-Won Jang, Yu-Jung Heo, Jaeseok Kim, Minsu Lee, Du-Seong Chang, Byoung-Tak Zhang

    Abstract: The field of vision-language understanding has been actively researched in recent years, thanks to the development of Large Language Models (LLMs). However, it still struggles with problems requiring multi-step reasoning, even for very simple questions. Recent studies adopt LLMs to tackle this problem by iteratively generating sub-questions and answers. However, there are disadvantages such as 1)…

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: This paper was accepted to the "CLVL: 5th Workshop on Closing the Loop Between Vision and Language (ICCV 2023 CLVL workshop)."

  7. arXiv:2509.20750

    cs.CL cs.AI

    Confidence-guided Refinement Reasoning for Zero-shot Question Answering

    Authors: Youwon Jang, Woo Suk Choi, Minjoon Jung, Minsu Lee, Byoung-Tak Zhang

    Abstract: We propose Confidence-guided Refinement Reasoning (C2R), a novel training-free framework applicable to question-answering (QA) tasks across text, image, and video domains. C2R strategically constructs and refines sub-questions and their answers (sub-QAs), deriving a better confidence score for the target answer. C2R first curates a subset of sub-QAs to explore diverse reasoning paths, then compare…

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: 18 pages (including references and appendix)

  8. arXiv:2509.18683

    cs.CV cs.AI cs.MM

    LEAF-Mamba: Local Emphatic and Adaptive Fusion State Space Model for RGB-D Salient Object Detection

    Authors: Lanhu Wu, Zilin Gao, Hao Fei, Mong-Li Lee, Wynne Hsu

    Abstract: RGB-D salient object detection (SOD) aims to identify the most conspicuous objects in a scene with the incorporation of depth cues. Existing methods mainly rely on CNNs, limited by the local receptive fields, or Vision Transformers that suffer from the cost of quadratic complexity, posing a challenge in balancing performance and computational efficiency. Recently, state space models (SSM), Mamba,…

    Submitted 23 September, 2025; originally announced September 2025.

    Comments: Accepted to ACM MM 2025

  9. arXiv:2509.18447

    cs.RO cs.AI

    PrioriTouch: Adapting to User Contact Preferences for Whole-Arm Physical Human-Robot Interaction

    Authors: Rishabh Madan, Jiawei Lin, Mahika Goel, Angchen Xie, Xiaoyu Liang, Marcus Lee, Justin Guo, Pranav N. Thakkar, Rohan Banerjee, Jose Barreiros, Kate Tsui, Tom Silver, Tapomayukh Bhattacharjee

    Abstract: Physical human-robot interaction (pHRI) requires robots to adapt to individual contact preferences, such as where and how much force is applied. Identifying preferences is difficult for a single contact; with whole-arm interaction involving multiple simultaneous contacts between the robot and human, the challenge is greater because different body parts can impose incompatible force requirements. I…

    Submitted 22 September, 2025; originally announced September 2025.

    Comments: Conference on Robot Learning (CoRL)

  10. arXiv:2509.18085

    cs.LG cs.AI cs.CL

    Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding

    Authors: Sudhanshu Agrawal, Risheek Garrepalli, Raghavv Goel, Mingu Lee, Christopher Lott, Fatih Porikli

    Abstract: Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs) with the potential to operate at significantly higher token generation rates. However, currently available open-source dLLMs often generate at much lower rates, typically decoding only a single token at every denoising timestep in order to maximize output quality. We present Spiffy, a speculativ…

    Submitted 22 September, 2025; originally announced September 2025.

  11. arXiv:2509.15493

    cs.LG

    FRAUDGUESS: Spotting and Explaining New Types of Fraud in Million-Scale Financial Data

    Authors: Robson L. F. Cordeiro, Meng-Chieh Lee, Christos Faloutsos

    Abstract: Given a set of financial transactions (who buys from whom, when, and for how much), as well as prior information from buyers and sellers, how can we find fraudulent transactions? If we have labels for some transactions for known types of fraud, we can build a classifier. However, we also want to find new types of fraud, still unknown to the domain experts ('Detection'). Moreover, we also want to p…

    Submitted 18 September, 2025; originally announced September 2025.

  12. arXiv:2509.14589

    cs.CR cs.AI

    ATLANTIS: AI-driven Threat Localization, Analysis, and Triage Intelligence System

    Authors: Taesoo Kim, HyungSeok Han, Soyeon Park, Dae R. Jeong, Dohyeok Kim, Dongkwan Kim, Eunsoo Kim, Jiho Kim, Joshua Wang, Kangsu Kim, Sangwoo Ji, Woosun Song, Hanqing Zhao, Andrew Chin, Gyejin Lee, Kevin Stevens, Mansour Alharthi, Yizhuo Zhai, Cen Zhang, Joonun Jang, Yeongjin Jang, Ammar Askar, Dongju Kim, Fabian Fleischer, Jeongin Cho, et al. (21 additional authors not shown)

    Abstract: We present ATLANTIS, the cyber reasoning system developed by Team Atlanta that won 1st place in the Final Competition of DARPA's AI Cyber Challenge (AIxCC) at DEF CON 33 (August 2025). AIxCC (2023-2025) challenged teams to build autonomous cyber reasoning systems capable of discovering and patching vulnerabilities at the speed and scale of modern software. ATLANTIS integrates large language models…

    Submitted 17 September, 2025; originally announced September 2025.

    Comments: Version 1.0 (September 17, 2025). Technical Report. Team Atlanta: 1st place in DARPA AIxCC Final Competition. Project page: https://team-atlanta.github.io/

  13. arXiv:2509.14142

    cs.CV

    MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

    Authors: Peng Xu, Shengwu Xiong, Jiajun Zhang, Yaxiong Chen, Bowen Zhou, Chen Change Loy, David A. Clifton, Kyoung Mu Lee, Luc Van Gool, Ruiming He, Ruilin Yao, Xinwei Long, Jirui Huang, Kai Tian, Sa Yang, Yihua Shao, Jin Feng, Yue Zhong, Jiakai Zhou, Cheng Tang, Tianyu Zou, Yifang Zhang, Junming Liang, Guoyou Li, Zhaoxiang Wang, et al. (103 additional authors not shown)

    Abstract: This paper reviews the MARS2 2025 Challenge on Multimodal Reasoning. We aim to bring together different approaches in multimodal machine learning and LLMs via a large benchmark. We hope it better allows researchers to follow the state-of-the-art in this very dynamic area. Meanwhile, a growing number of testbeds have boosted the evolution of general-purpose large language models. Thus, this year's…

    Submitted 17 September, 2025; originally announced September 2025.

    Comments: ICCV 2025 MARS2 Workshop and Challenge "Multimodal Reasoning and Slow Thinking in the Large Model Era: Towards System 2 and Beyond"

  14. arXiv:2509.13907

    cs.CV

    White Aggregation and Restoration for Few-shot 3D Point Cloud Semantic Segmentation

    Authors: Jiyun Im, SuBeen Lee, Miso Lee, Jae-Pil Heo

    Abstract: Few-Shot 3D Point Cloud Segmentation (FS-PCS) aims to predict per-point labels for an unlabeled point cloud, given only a few labeled examples. To extract discriminative representations from the limited support set, existing methods have constructed prototypes using conventional algorithms such as farthest point sampling. However, we point out that its initial randomness significantly affects FS-P…

    Submitted 17 September, 2025; originally announced September 2025.

    Comments: 9 pages, 5 figures

  15. arXiv:2509.13200

    cs.RO

    StageACT: Stage-Conditioned Imitation for Robust Humanoid Door Opening

    Authors: Moonyoung Lee, Dong Ki Kim, Jai Krishna Bandi, Max Smith, Aileen Liao, Ali-akbar Agha-mohammadi, Shayegan Omidshafiei

    Abstract: Humanoid robots promise to operate in everyday human environments without requiring modifications to the surroundings. Among the many skills needed, opening doors is essential, as doors are the most common gateways in built spaces and often limit where a robot can go. Door opening, however, poses unique challenges as it is a long-horizon task under partial observability, such as reasoning about th…

    Submitted 18 September, 2025; v1 submitted 16 September, 2025; originally announced September 2025.

    Comments: 7 pages

  16. arXiv:2509.12650

    cs.LG cs.AI

    Leveraging Intermediate Representations of Time Series Foundation Models for Anomaly Detection

    Authors: Chan Sik Han, Keon Myung Lee

    Abstract: Detecting anomalies in time series data is essential for the reliable operation of many real-world systems. Recently, time series foundation models (TSFMs) have emerged as a powerful tool for anomaly detection. However, existing methods typically rely on the final layer's representations of TSFMs, computing the anomaly score as a reconstruction or forecasting error via a task-specific head. Instea…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: 10 pages, 8 figures

  17. arXiv:2509.11866

    cs.CV

    Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding

    Authors: Meng Luo, Shengqiong Wu, Liqiang Jing, Tianjie Ju, Li Zheng, Jinxiang Lai, Tianlong Wu, Xinya Du, Jian Li, Siyuan Yan, Jiebo Luo, William Yang Wang, Hao Fei, Mong-Li Lee, Wynne Hsu

    Abstract: Recent advancements in large video models (LVMs) have significantly enhanced video understanding. However, these models continue to suffer from hallucinations, producing content that conflicts with input videos. To address this issue, we propose Dr.V, a hierarchical framework covering perceptive, temporal, and cognitive levels to diagnose video hallucination by fine-grained spatial-temporal groundi…

    Submitted 15 September, 2025; originally announced September 2025.

    Comments: 25 pages, 16 figures

  18. arXiv:2509.08163

    cs.LG q-fin.RM stat.AP stat.ML

    Machine Learning with Multitype Protected Attributes: Intersectional Fairness through Regularisation

    Authors: Ho Ming Lee, Katrien Antonio, Benjamin Avanzi, Lorenzo Marchi, Rui Zhou

    Abstract: Ensuring equitable treatment (fairness) across protected attributes (such as gender or ethnicity) is a critical issue in machine learning. Most existing literature focuses on binary classification, but achieving fairness in regression tasks, such as insurance pricing or hiring score assessments, is equally important. Moreover, anti-discrimination laws also apply to continuous attributes, such as age…

    Submitted 9 September, 2025; originally announced September 2025.

    MSC Class: 90B50; 62P05; 62H20; 68T07

  19. arXiv:2509.04809

    cs.AI cs.HC

    TalkToAgent: A Human-centric Explanation of Reinforcement Learning Agents with Large Language Models

    Authors: Haechang Kim, Hao Chen, Can Li, Jong Min Lee

    Abstract: Explainable Reinforcement Learning (XRL) has emerged as a promising approach in improving the transparency of Reinforcement Learning (RL) agents. However, there remains a gap between complex RL policies and domain experts, due to the limited comprehensibility of XRL results and isolated coverage of current XRL approaches that leave users uncertain about which tools to employ. To address these chal…

    Submitted 7 September, 2025; v1 submitted 5 September, 2025; originally announced September 2025.

    Comments: 31 pages total

  20. arXiv:2509.04448

    cs.CV cs.MM

    TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection

    Authors: Zehong Yan, Peng Qi, Wynne Hsu, Mong Li Lee

    Abstract: Multimodal misinformation, encompassing textual, visual, and cross-modal distortions, poses an increasing societal threat that is amplified by generative AI. Existing methods typically focus on a single type of distortion and struggle to generalize to unseen scenarios. In this work, we observe that different distortion types share common reasoning capabilities while also requiring task-specific sk…

    Submitted 4 September, 2025; originally announced September 2025.

    Comments: EMNLP 2025; Project Homepage: https://yanzehong.github.io/trust-vl/

  21. arXiv:2509.02589

    eess.IV cs.AI cs.CV

    Normal and Atypical Mitosis Image Classifier using Efficient Vision Transformer

    Authors: Xuan Qi, Dominic Labella, Thomas Sanford, Maxwell Lee

    Abstract: We tackle atypical versus normal mitosis classification in the MIDOG 2025 challenge using EfficientViT-L2, a hybrid CNN-ViT architecture optimized for accuracy and efficiency. A unified dataset of 13,938 nuclei from seven cancer types (MIDOG++ and AMi-Br) was used, with atypical mitoses comprising ~15%. To assess domain generalization, we applied leave-one-cancer-type-out cross-validation with 5-f…

    Submitted 28 August, 2025; originally announced September 2025.

    Comments: Abstract for the MIDOG 2025 Grand Challenge, Track 2

  22. arXiv:2509.02447

    cs.DC

    An Efficient and Adaptive Watermark Detection System with Tile-based Error Correction

    Authors: Xinrui Zhong, Xinze Feng, Jingwei Zuo, Fanjiang Ye, Yi Mu, Junfeng Guo, Heng Huang, Myungjin Lee, Yuke Wang

    Abstract: Efficient and reliable detection of generated images is critical for the responsible deployment of generative models. Existing approaches primarily focus on improving detection accuracy and robustness under various image transformations and adversarial manipulations, yet they largely overlook the efficiency challenges of watermark detection across large-scale image collections. To address this gap…

    Submitted 2 September, 2025; originally announced September 2025.

  23. arXiv:2509.01147

    cs.CL

    Zero-shot Cross-lingual NER via Mitigating Language Difference: An Entity-aligned Translation Perspective

    Authors: Zhihao Zhang, Sophia Yat Mei Lee, Dong Zhang, Shoushan Li, Guodong Zhou

    Abstract: Cross-lingual Named Entity Recognition (CL-NER) aims to transfer knowledge from high-resource languages to low-resource languages. However, existing zero-shot CL-NER (ZCL-NER) approaches primarily focus on Latin script language (LSL), where shared linguistic features facilitate effective knowledge transfer. In contrast, for non-Latin script language (NSL), such as Chinese and Japanese, performance…

    Submitted 1 September, 2025; originally announced September 2025.

    Comments: EMNLP 2025

  24. arXiv:2508.21085

    cs.CL cs.IR

    Granite Embedding R2 Models

    Authors: Parul Awasthy, Aashka Trivedi, Yulong Li, Meet Doshi, Riyaz Bhat, Vignesh P, Vishwajeet Kumar, Yushu Yang, Bhavani Iyer, Abraham Daniels, Rudra Murthy, Ken Barker, Martin Franz, Madison Lee, Todd Ward, Salim Roukos, David Cox, Luis Lastras, Jaydeep Sen, Radu Florian

    Abstract: We introduce the Granite Embedding R2 models, a comprehensive family of high-performance English encoder-based embedding models engineered for enterprise-scale dense retrieval applications. Building upon our first-generation release, these models deliver substantial improvements, including 16x expanded context length (8,192 tokens), state-of-the-art performance across diverse retrieval domains - t…

    Submitted 26 August, 2025; originally announced August 2025.

  25. arXiv:2508.21036

    cs.HC cs.AI

    Understanding, Protecting, and Augmenting Human Cognition with Generative AI: A Synthesis of the CHI 2025 Tools for Thought Workshop

    Authors: Lev Tankelevitch, Elena L. Glassman, Jessica He, Aniket Kittur, Mina Lee, Srishti Palani, Advait Sarkar, Gonzalo Ramos, Yvonne Rogers, Hari Subramonyam

    Abstract: Generative AI (GenAI) radically expands the scope and capability of automation for work, education, and everyday tasks, a transformation posing both risks and opportunities for human cognition. How will human cognition change, and what opportunities are there for GenAI to augment it? Which theories, metrics, and other tools are needed to address these questions? The CHI 2025 workshop on Tools for…

    Submitted 28 August, 2025; originally announced August 2025.

  26. arXiv:2508.20201

    cs.CL

    Social Bias in Multilingual Language Models: A Survey

    Authors: Lance Calvin Lim Gamboa, Yue Feng, Mark Lee

    Abstract: Pretrained multilingual models exhibit the same social bias as models processing English texts. This systematic review analyzes emerging research that extends bias evaluation and mitigation approaches into multilingual and non-English contexts. We examine these studies with respect to linguistic diversity, cultural awareness, and their choice of evaluation metrics and mitigation techniques. Our su…

    Submitted 5 September, 2025; v1 submitted 27 August, 2025; originally announced August 2025.

    Comments: Accepted into EMNLP 2025 Main Conference

  27. arXiv:2508.19113

    cs.AI

    Hybrid Deep Searcher: Integrating Parallel and Sequential Search Reasoning

    Authors: Dayoon Ko, Jihyuk Kim, Haeju Park, Sohyeon Kim, Dahyun Lee, Yongrae Jo, Gunhee Kim, Moontae Lee, Kyungjae Lee

    Abstract: Large reasoning models (LRMs) have demonstrated strong performance in complex, multi-step reasoning tasks. Existing methods enhance LRMs by sequentially integrating external knowledge retrieval; models iteratively generate queries, retrieve external information, and progressively reason over this information. However, purely sequential querying increases inference latency and context length, dimin…

    Submitted 26 August, 2025; originally announced August 2025.

  28. arXiv:2508.17756

    cs.LG eess.SY

    SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling

    Authors: Fanjiang Ye, Zepeng Zhao, Yi Mu, Jucheng Shen, Renjie Li, Kaijian Wang, Desen Sun, Saurabh Agarwal, Myungjin Lee, Triston Cao, Aditya Akella, Arvind Krishnamurthy, T. S. Eugene Ng, Zhengzhong Tu, Yuke Wang

    Abstract: Diffusion models have recently achieved remarkable success in generative tasks (e.g., image and video generation), and the demand for high-quality content (e.g., 2K/4K videos) is rapidly increasing across various domains. However, generating ultra-high-resolution videos on existing standard-resolution (e.g., 720p) platforms remains challenging due to the excessive re-training requirements and proh…

    Submitted 25 August, 2025; originally announced August 2025.

  29. arXiv:2508.17661

    cs.AI cs.LG cs.NE

    Spacer: Towards Engineered Scientific Inspiration

    Authors: Minhyeong Lee, Suyoung Hwang, Seunghyun Moon, Geonho Nah, Donghyun Koh, Youngjun Cho, Johyun Park, Hojin Yoo, Jiho Park, Haneul Choi, Sungbin Moon, Taehoon Hwang, Seungwon Kim, Jaeyeong Kim, Seongjun Kim, Juneau Jung

    Abstract: Recent advances in LLMs have made automated scientific research the next frontline in the path to artificial superintelligence. However, these systems are bound either to tasks of narrow scope or the limited creative capabilities of LLMs. We propose Spacer, a scientific discovery system that develops creative and factually grounded concepts without external intervention. Spacer attempts to achieve…

    Submitted 25 August, 2025; originally announced August 2025.

  30. arXiv:2508.13744

    cs.CV cs.AI

    Mitigating Cross-Image Information Leakage in LVLMs for Multi-Image Tasks

    Authors: Yeji Park, Minyoung Lee, Sanghyuk Chun, Junsuk Choe

    Abstract: Large Vision-Language Models (LVLMs) demonstrate strong performance on single-image tasks. However, we observe that their performance degrades significantly when handling multi-image inputs. This occurs because visual cues from different images become entangled in the model's output. We refer to this phenomenon as cross-image information leakage. To address this issue, we propose FOCUS, a training…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: Source code is available at https://github.com/yejipark-m/FOCUS

  31. arXiv:2508.10395

    cs.LG

    XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

    Authors: Aditya Tomar, Coleman Hooper, Minjae Lee, Haocheng Xi, Rishabh Tiwari, Wonjun Kang, Luca Manolache, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

    Abstract: Although LLM inference has emerged as a critical workload for many downstream applications, efficiently inferring LLMs is challenging due to the substantial memory footprint and bandwidth requirements. In parallel, compute capabilities have steadily outpaced both memory capacity and bandwidth over the last few decades, a trend that remains evident in modern GPU hardware and exacerbates the challen…

    Submitted 14 August, 2025; originally announced August 2025.

    Comments: 24 pages

  32. arXiv:2508.08504

    cs.CY cs.AI cs.LG

    When the Domain Expert Has No Time and the LLM Developer Has No Clinical Expertise: Real-World Lessons from LLM Co-Design in a Safety-Net Hospital

    Authors: Avni Kothari, Patrick Vossler, Jean Digitale, Mohammad Forouzannia, Elise Rosenberg, Michele Lee, Jennee Bryant, Melanie Molina, James Marks, Lucas Zier, Jean Feng

    Abstract: Large language models (LLMs) have the potential to address social and behavioral determinants of health by transforming labor-intensive workflows in resource-constrained settings. Creating LLM-based applications that serve the needs of underserved communities requires a deep understanding of their local context, but it is often the case that neither LLMs nor their developers possess this local exp…

    Submitted 11 August, 2025; originally announced August 2025.

  33. arXiv:2508.08457

    cs.AR cs.ET

    Architecting Long-Context LLM Acceleration with Packing-Prefetch Scheduler and Ultra-Large Capacity On-Chip Memories

    Authors: Ming-Yen Lee, Faaiq Waqar, Hanchen Yang, Muhammed Ahosan Ul Karim, Harsono Simka, Shimeng Yu

    Abstract: Long-context Large Language Model (LLM) inference faces increasing compute bottlenecks as attention calculations scale with context length, primarily due to the growing KV-cache transfer overhead that saturates High Bandwidth Memory (HBM). While prefetching techniques mitigate cache misses by fetching KV data in advance, their spatial and temporal benefits present new opportunities to exploit. Thi…

    Submitted 11 August, 2025; originally announced August 2025.

    Comments: 7 pages, 8 figures, 2 tables

    ACM Class: C.1.3; B.3.1

  34. arXiv:2508.07179

    cs.CL cs.AI cs.DB

    Schema Lineage Extraction at Scale: Multilingual Pipelines, Composite Evaluation, and Language-Model Benchmarks

    Authors: Jiaqi Yin, Yi-Wei Chen, Meng-Lung Lee, Xiya Liu

    Abstract: Enterprise data pipelines, characterized by complex transformations across multiple programming languages, often cause a semantic disconnect between original metadata and downstream data. This "semantic drift" compromises data reproducibility and governance, and impairs the utility of services like retrieval-augmented generation (RAG) and text-to-SQL systems. To address this, a novel framework is…

    Submitted 10 August, 2025; originally announced August 2025.

  35. arXiv:2508.05399

    cs.CV cs.AI cs.LG

    UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation

    Authors: Wonjun Kang, Byeongkeun Ahn, Minjae Lee, Kevin Galim, Seunghyuk Oh, Hyung Il Koo, Nam Ik Cho

    Abstract: Text-to-image (T2I) generation has been actively studied using Diffusion Models and Autoregressive Models. Recently, Masked Generative Transformers have gained attention as an alternative to Autoregressive Models to overcome the inherent limitations of causal attention and autoregressive decoding through bidirectional attention and parallel decoding, enabling efficient and high-quality image gener…

    Submitted 7 August, 2025; originally announced August 2025.

    Comments: Code is available at https://github.com/furiosa-ai/uncage

  36. arXiv:2508.03483

    cs.CV cs.AI

    When Cars Have Stereotypes: Auditing Demographic Bias in Objects from Text-to-Image Models

    Authors: Dasol Choi, Jihwan Lee, Minjae Lee, Minsuk Kahng

    Abstract: While prior research on text-to-image generation has predominantly focused on biases in human depictions, we investigate a more subtle yet pervasive phenomenon: demographic bias in generated objects (e.g., cars). We introduce SODA (Stereotyped Object Diagnostic Audit), a novel framework for systematically measuring such biases. Our approach compares visual attributes of objects generated with demo…

    Submitted 10 August, 2025; v1 submitted 5 August, 2025; originally announced August 2025.

  37. arXiv:2508.00418

    cs.CV eess.IV

    IN2OUT: Fine-Tuning Video Inpainting Model for Video Outpainting Using Hierarchical Discriminator

    Authors: Sangwoo Youn, Minji Lee, Nokap Tony Park, Yeonggyoo Jeon, Taeyoung Na

    Abstract: Video outpainting presents a unique challenge of extending the borders while maintaining consistency with the given content. In this paper, we suggest the use of video inpainting models that excel in object flow learning and reconstruction in outpainting rather than solely generating the background as in existing methods. However, directly applying or fine-tuning inpainting models to outpainting h…

    Submitted 1 August, 2025; originally announced August 2025.

    Comments: ICIP 2025. Code: https://github.com/sang-w00/IN2OUT

  38. arXiv:2507.22459

    cs.CV

    Exploiting Diffusion Prior for Task-driven Image Restoration

    Authors: Jaeha Kim, Junghun Oh, Kyoung Mu Lee

    Abstract: Task-driven image restoration (TDIR) has recently emerged to address performance drops in high-level vision tasks caused by low-quality (LQ) inputs. Previous TDIR methods struggle to handle practical scenarios in which images are degraded by multiple complex factors, leaving minimal clues for restoration. This motivates us to leverage the diffusion prior, one of the most powerful natural image pri…

    Submitted 1 September, 2025; v1 submitted 30 July, 2025; originally announced July 2025.

    Comments: Accepted to ICCV 2025. Code is available at https://github.com/JaehaKim97/EDTR

  39. arXiv:2507.19790

    cs.CV

    DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object Segmentation

    Authors: Suhwan Cho, Minhyeok Lee, Jungho Lee, Donghyeong Kim, Sangyoun Lee

    Abstract: Unsupervised video object segmentation (VOS) aims to detect the most prominent object in a video. Recently, two-stream approaches that leverage both RGB images and optical flow have gained significant attention, but their performance is fundamentally constrained by the scarcity of training data. To address this, we propose DepthFlow, a novel data generation method that synthesizes optical flow fro…

    Submitted 26 July, 2025; originally announced July 2025.

    Comments: ICCVW 2025

  40. arXiv:2507.19789

    cs.CV

    TransFlow: Motion Knowledge Transfer from Video Diffusion Models to Video Salient Object Detection

    Authors: Suhwan Cho, Minhyeok Lee, Jungho Lee, Sunghun Yang, Sangyoun Lee

    Abstract: Video salient object detection (SOD) relies on motion cues to distinguish salient objects from backgrounds, but training such models is limited by scarce video datasets compared to abundant image datasets. Existing approaches that use spatial transformations to create video sequences from static images fail for motion-guided tasks, as these transformations produce unrealistic optical flows that la…

    Submitted 26 July, 2025; originally announced July 2025.

    Comments: ICCVW 2025

  41. arXiv:2507.18668

    cs.LG cs.AI

    Efficient Knowledge Tracing Leveraging Higher-Order Information in Integrated Graphs

    Authors: Donghee Han, Daehee Kim, Minjun Lee, Daeyoung Roh, Keejun Han, Mun Yong Yi

    Abstract: The rise of online learning has led to the development of various knowledge tracing (KT) methods. However, existing methods have overlooked the problem of increasing computational cost when utilizing large graphs and long learning sequences. To address this issue, we introduce Dual Graph Attention-based Knowledge Tracing (DGAKT), a graph neural network model designed to leverage high-order informa…

    Submitted 24 July, 2025; originally announced July 2025.

  42. arXiv:2507.17332

    cs.CV

    PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image

    Authors: Hyeongjin Nam, Donghwan Kim, Gyeongsik Moon, Kyoung Mu Lee

    Abstract: The misaligned human texture across different human parts is one of the main limitations of existing 3D human reconstruction methods. Each human part, such as a jacket or pants, should maintain a distinct texture without blending into others. The structural coherence of human parts serves as a crucial cue to infer human textures in the invisible regions of a single image. However, most existing 3D…

    Submitted 30 July, 2025; v1 submitted 23 July, 2025; originally announced July 2025.

    Comments: Published at ICCV 2025, 22 pages including the supplementary material

  43. arXiv:2507.13575

    cs.LG cs.AI

    Apple Intelligence Foundation Language Models: Tech Report 2025

    Authors: Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang, Xiyou Zhou, Jun Qin, Dian Ang Yap, Narendran Raghavan, Xuankai Chang, Margit Bowler, Eray Yildiz, John Peebles, Hannah Gillis Coleman, Matteo Ronchi, Peter Gray, Keen You, Anthony Spalvieri-Kruse, Ruoming Pang, Reed Li, Yuli Yang, Emad Soroush, Zhiyun Lu, Crystal Xiao, Rong Situ, Jordan Huffaker, David Griffiths, et al. (373 additional authors not shown)

    Abstract: We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transform…

    Submitted 27 August, 2025; v1 submitted 17 July, 2025; originally announced July 2025.

  44. arXiv:2507.13490

    cs.CL

    Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?

    Authors: Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea

    Abstract: There has been extensive research on assessing the value orientation of Large Language Models (LLMs) as it can shape user experiences across demographic groups. However, several challenges remain. First, while the Multiple Choice Question (MCQ) setting has been shown to be vulnerable to perturbations, there is no systematic comparison of probing methods for value probing. Second, it is unclear to…

    Submitted 17 July, 2025; originally announced July 2025.

  45. arXiv:2507.12985

    eess.IV cs.CV

    From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation

    Authors: Jinseo An, Min Jin Lee, Kyu Won Shim, Helen Hong

    Abstract: Accurate segmentation of orbital bones in facial computed tomography (CT) images is essential for creating customized implants to reconstruct defective orbital bones, and it is particularly challenging due to the ambiguous boundaries and thin structures such as the orbital medial wall and orbital floor. In these ambiguous regions, existing segmentation approaches often output disconnected or un…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: Early accepted at MICCAI 2025

  46. arXiv:2507.12771

    cs.CV cs.AI

    Local Representative Token Guided Merging for Text-to-Image Generation

    Authors: Min-Jeong Lee, Hee-Dong Kim, Seong-Whan Lee

    Abstract: Stable diffusion is an outstanding image generation model for text-to-image, but its time-consuming generation process remains a challenge due to the quadratic complexity of attention operations. Recent token merging methods improve efficiency by reducing the number of tokens during attention operations, but often overlook the characteristics of attention-based image generation models, limiting th…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: 6 pages

  47. arXiv:2507.11814

    math.CO cs.DM

    Unavoidable butterfly minors in digraphs of large cycle rank

    Authors: Meike Hatzel, O-joung Kwon, Myounghwan Lee, Sebastian Wiederrecht

    Abstract: Cycle rank is one of the depth parameters for digraphs introduced by Eggan in 1963. We show that there exists a function $f:\mathbb{N}\to \mathbb{N}$ such that every digraph of cycle rank at least $f(k)$ contains a directed cycle chain, a directed ladder, or a directed tree chain of order $k$ as a butterfly minor. We also investigate a new connection between cycle rank and a directed analogue of t…

    Submitted 15 July, 2025; originally announced July 2025.

    Comments: 53 pages, 19 figures

  48. arXiv:2507.11590

    cs.LG

    Synthetic Tabular Data Generation: A Comparative Survey for Modern Techniques

    Authors: Raju Challagundla, Mohsen Dorodchi, Pu Wang, Minwoo Lee

    Abstract: As privacy regulations become more stringent and access to real-world data becomes increasingly constrained, synthetic data generation has emerged as a vital solution, especially for tabular datasets, which are central to domains like finance, healthcare and the social sciences. This survey presents a comprehensive and focused review of recent advances in synthetic tabular data generation, emphasi…

    Submitted 15 July, 2025; originally announced July 2025.

  49. arXiv:2507.06996

    cs.LG cs.AI

    Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing

    Authors: Eunbyeol Cho, Jiyoun Kim, Minjae Lee, Sungjin Park, Edward Choi

    Abstract: Electronic Health Records (EHR) are time-series relational databases that record patient interactions and medical events over time, serving as a critical resource for healthcare research and applications. However, privacy concerns and regulatory restrictions limit the sharing and utilization of such sensitive data, necessitating the generation of synthetic EHR datasets. Unlike previous EHR synthes…

    Submitted 9 July, 2025; originally announced July 2025.

  50. arXiv:2507.06838

    cs.CL cs.IR

    Shifting from Ranking to Set Selection for Retrieval Augmented Generation

    Authors: Dahyun Lee, Yongrae Jo, Haeju Park, Moontae Lee

    Abstract: Retrieval in Retrieval-Augmented Generation (RAG) must ensure that retrieved passages are not only individually relevant but also collectively form a comprehensive set. Existing approaches primarily rerank top-k passages based on their individual relevance, often failing to meet the information needs of complex queries in multi-hop question answering. In this work, we propose a set-wise passage sel…

    Submitted 9 July, 2025; v1 submitted 9 July, 2025; originally announced July 2025.

    Comments: Accepted to ACL 2025 main (Oral Presentation)