Showing 1–50 of 2,821 results for author: Li, B

  1. arXiv:2505.09990 [pdf, ps, other]

    cs.CV

    PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

    Authors: Long Cheng, Jiafei Duan, Yi Ru Wang, Haoquan Fang, Boyang Li, Yushan Huang, Elvis Wang, Ainaz Eftekhar, Jason Lee, Wentao Yuan, Rose Hendrix, Noah A. Smith, Fei Xia, Dieter Fox, Ranjay Krishna

    Abstract: Pointing serves as a fundamental and intuitive mechanism for grounding language within visual contexts, with applications spanning robotics, assistive technologies, and interactive AI systems. While recent multimodal models have started to support pointing capabilities, existing benchmarks typically focus only on referential object localization tasks. We introduce PointArena, a comprehensive platf…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 10 pages, Dataset and code: https://pointarena.github.io/

  2. arXiv:2505.09358 [pdf, ps, other]

    cs.CV cs.LG

    Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis

    Authors: Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, Konrad Schindler

    Abstract: The success of deep learning in computer vision over the past decade has hinged on large labeled datasets and strong pretrained models. In data-scarce settings, the quality of these pretrained models becomes crucial for effective transfer learning. Image classification and self-supervised learning have traditionally been the primary methods for pretraining CNNs and transformer-based architectures.…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: Journal extension of our CVPR 2024 paper, featuring new tasks, improved efficiency, high-resolution capabilities, and enhanced accessibility

  3. arXiv:2505.09113 [pdf, other]

    cs.LG stat.ME

    Sequential Treatment Effect Estimation with Unmeasured Confounders

    Authors: Yingrong Wang, Anpeng Wu, Baohong Li, Ziyang Xiao, Ruoxuan Xiong, Qing Han, Kun Kuang

    Abstract: This paper studies the cumulative causal effects of sequential treatments in the presence of unmeasured confounders. It is a critical issue in sequential decision-making scenarios where treatment decisions and outcomes dynamically evolve over time. Advanced causal methods apply transformer as a backbone to model such time sequences, which shows superiority in capturing long time dependence and per…

    Submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.09067 [pdf, ps, other]

    math.OC cs.RO eess.SY

    Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability

    Authors: Boyang Li, Zheng Gong, Sylvia Herbert

    Abstract: In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address th…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 10 pages, 2 figures

  5. arXiv:2505.09058 [pdf, ps, other]

    cs.RO eess.SY

    Reach-Avoid-Stabilize Using Admissible Control Sets

    Authors: Zheng Gong, Boyang Li, Sylvia Herbert

    Abstract: Hamilton-Jacobi Reachability (HJR) analysis has been successfully used in many robotics and control tasks, and is especially effective in computing reach-avoid sets and control laws that enable an agent to reach a goal while satisfying state constraints. However, the original HJR formulation provides no guarantees of safety after a) the prescribed time horizon, or b) goal satisfaction. The reach-a…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 7 pages, 5 figures, submitted to 64th IEEE Conference on Decision and Control

  6. arXiv:2505.08919 [pdf, ps, other]

    cs.GR cs.AI cs.CV

    Template-Guided Reconstruction of Pulmonary Segments with Neural Implicit Functions

    Authors: Kangxian Xie, Yufei Zhu, Kaiming Kuang, Li Zhang, Hongwei Bran Li, Mingchen Gao, Jiancheng Yang

    Abstract: High-quality 3D reconstruction of pulmonary segments plays a crucial role in segmentectomy and surgical treatment planning for lung cancer. Due to the resolution requirement of the target reconstruction, conventional deep learning-based methods often suffer from computational resource constraints or limited granularity. Conversely, implicit modeling is favored due to its computational efficiency a…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: In revision process

  7. arXiv:2505.08446 [pdf, ps, other]

    cs.AI

    Agent-as-a-Service based on Agent Network

    Authors: Yuhan Zhu, Haojie Liu, Jian Wang, Bing Li, Zikang Yin, Yefei Liao

    Abstract: The rise of large model-based AI agents has spurred interest in Multi-Agent Systems (MAS) for their capabilities in decision-making, collaboration, and adaptability. While the Model Context Protocol (MCP) addresses tool invocation and data exchange challenges via a unified protocol, it lacks support for organizing agent-level collaboration. To bridge this gap, we propose Agent-as-a-Service based o…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: work in progress

  8. arXiv:2505.07688 [pdf, ps, other]

    cs.GT cs.LG

    Heterogeneous Data Game: Characterizing the Model Competition Across Multiple Data Sources

    Authors: Renzhe Xu, Kang Wang, Bo Li

    Abstract: Data heterogeneity across multiple sources is common in real-world machine learning (ML) settings. Although many methods focus on enabling a single model to handle diverse data, real-world markets often comprise multiple competing ML providers. In this paper, we propose a game-theoretic framework -- the Heterogeneous Data Game -- to analyze how such providers compete across heterogeneous data sour…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  9. arXiv:2505.06977 [pdf, other]

    cs.AI cs.LG

    CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging

    Authors: Wenju Sun, Qingyong Li, Yangli-ao Geng, Boyang Li

    Abstract: Multi-task model merging offers a promising paradigm for integrating multiple expert models into a unified model without additional training. Existing state-of-the-art techniques, such as Task Arithmetic and its variants, merge models by accumulating task vectors -- the parameter differences between pretrained and finetuned models. However, task vector accumulation is often hindered by knowledge c…

    Submitted 14 May, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

  10. arXiv:2505.06892 [pdf, other]

    cs.LG

    Learning Soft Sparse Shapes for Efficient Time-Series Classification

    Authors: Zhen Liu, Yicheng Luo, Boyuan Li, Emadeldeen Eldele, Min Wu, Qianli Ma

    Abstract: Shapelets are discriminative subsequences (or shapes) with high interpretability in time series classification. Due to the time-intensive nature of shapelet discovery, existing shapelet-based methods mainly focus on selecting discriminative shapes while discarding others to achieve candidate subsequence sparsification. However, this approach may exclude beneficial shapes and overlook the varying c…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted in ICML 2025

  11. arXiv:2505.06814 [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge

    Authors: Bin Li, Shenxi Liu, Yixuan Weng, Yue Du, Yuhang Tian, Shoujun Zhou

    Abstract: Following the successful hosting of the 1st (NLPCC 2023 Foshan) CMIVQA and the 2nd (NLPCC 2024 Hangzhou) MMIVQA challenges, this year, a new task has been introduced to further advance research in multi-modal, multilingual, and multi-hop medical instructional question answering (M4IVQA) systems, with a specific focus on medical instructional videos. The M4IVQA challenge focuses on evaluating model…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 12 pages, 5 figures, 4 tables

  12. arXiv:2505.06685 [pdf, ps, other]

    cs.MM cs.CV

    Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding

    Authors: Dawei Huang, Qing Li, Chuan Yan, Zebang Cheng, Yurong Huang, Xiang Li, Bin Li, Xiaohui Wang, Zheng Lian, Xiaojiang Peng

    Abstract: Emotion understanding in videos aims to accurately recognize and interpret individuals' emotional states by integrating contextual, visual, textual, and auditory cues. While Large Multimodal Models (LMMs) have demonstrated significant progress in general vision-language (VL) tasks, their performance in emotion-specific scenarios remains limited. Moreover, fine-tuning LMMs on emotion-related tasks…

    Submitted 10 May, 2025; originally announced May 2025.

  13. arXiv:2505.05853 [pdf, other]

    cs.CV

    PICD: Versatile Perceptual Image Compression with Diffusion Rendering

    Authors: Tongda Xu, Jiahao Li, Bin Li, Yan Wang, Ya-Qin Zhang, Yan Lu

    Abstract: Recently, perceptual image compression has achieved significant advancements, delivering high visual quality at low bitrates for natural images. However, for screen content, existing methods often produce noticeable artifacts when compressing text. To tackle this challenge, we propose versatile perceptual screen image compression with diffusion rendering (PICD), a codec that works well for both sc…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  14. arXiv:2505.05589 [pdf, ps, other]

    cs.CV cs.AI cs.LG

    ReactDance: Progressive-Granular Representation for Long-Term Coherent Reactive Dance Generation

    Authors: Jingzhong Lin, Yuanyuan Qi, Xinru Li, Wenxuan Huang, Xiangfeng Xu, Bangyan Li, Xuejiao Wang, Gaoqi He

    Abstract: Reactive dance generation (RDG) produces follower movements conditioned on a guiding dancer and music while ensuring spatial coordination and temporal coherence. However, existing methods overemphasize global constraints and optimization, overlooking local information, such as fine-grained spatial interactions and localized temporal context. Therefore, we present ReactDance, a novel diffusion-based…

    Submitted 8 May, 2025; originally announced May 2025.

  15. arXiv:2505.04620 [pdf, other]

    cs.CV

    On Path to Multimodal Generalist: General-Level and General-Bench

    Authors: Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, et al. (7 additional authors not shown)

    Abstract: The Multimodal Large Language Model (MLLM) is currently experiencing rapid growth, driven by the advanced capabilities of LLMs. Unlike earlier specialists, existing MLLMs are evolving towards a Multimodal Generalist paradigm. Initially limited to understanding multiple modalities, these models have advanced to not only comprehend but also generate across modalities. Their capabilities have expande…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: ICML'25, 305 pages, 115 tables, 177 figures, project page: https://generalist.top/

  16. arXiv:2505.04519 [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r…

    Submitted 7 May, 2025; originally announced May 2025.

  17. arXiv:2505.04209 [pdf, other]

    cs.IR cs.AI cs.LG

    To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay

    Authors: Soumik Dey, Hansi Wu, Binbin Li

    Abstract: E-commerce sellers are recommended keyphrases based on their inventory on which they advertise to increase buyer engagement (clicks/sales). The relevance of advertiser keyphrases plays an important role in preventing the inundation of search systems with numerous irrelevant items that compete for attention in auctions, in addition to maintaining a healthy seller perception. In this work, we descri…

    Submitted 7 May, 2025; originally announced May 2025.

  18. arXiv:2505.03814 [pdf, other]

    stat.ML cs.AI cs.CL cs.LG

    Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs

    Authors: Ganghua Wang, Zhaorun Chen, Bo Li, Haifeng Xu

    Abstract: As foundation models continue to scale, the size of trained models grows exponentially, presenting significant challenges for their evaluation. Current evaluation practices involve curating increasingly large datasets to assess the performance of large language models (LLMs). However, there is a lack of systematic analysis and guidance on determining the sufficiency of test data or selecting infor…

    Submitted 2 May, 2025; originally announced May 2025.

  19. arXiv:2505.03654 [pdf, other]

    cs.CV cs.AI

    ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant

    Authors: Yifan Xiang, Zhenxi Zhang, Bin Li, Yixuan Weng, Shoujun Zhou, Yangfan He, Keqin Li

    Abstract: Recent advances in personalized MLLMs enable effective capture of user-specific concepts, supporting both recognition of personalized concepts and contextual captioning. However, humans typically explore and reason over relations among objects and individuals, transcending surface-level information to achieve more personalized and contextual understanding. To this end, existing methods may face th…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Work in progress

  20. arXiv:2505.03380 [pdf, other]

    cs.CV cs.AI eess.IV

    Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant

    Authors: Haonan Wang, Jiaji Mao, Lehan Wang, Qixiang Zhang, Marawan Elbatel, Yi Qin, Huijun Hu, Baoxun Li, Wenhui Deng, Weifeng Qin, Hongrui Li, Jialin Liang, Jun Shen, Xiaomeng Li

    Abstract: Medical AI assistants support doctors in disease diagnosis, medical image analysis, and report generation. However, they still face significant challenges in clinical use, including limited accuracy with multimodal content and insufficient validation in real-world settings. We propose RCMed, a full-stack AI assistant that improves multimodal alignment in both input and output, enabling precise ana…

    Submitted 6 May, 2025; originally announced May 2025.

  21. arXiv:2505.03293 [pdf, other]

    cs.CL

    Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback

    Authors: Shijing Zhu, Zhuang Chen, Guanqun Bi, Binghang Li, Yaxi Deng, Dazhen Wan, Libiao Peng, Xiyao Xiao, Rongsheng Zhang, Tangjie Lv, Zhipeng Hu, FangFang Li, Minlie Huang

    Abstract: Large language models (LLMs) have shown promise in providing scalable mental health support, while evaluating their counseling capability remains crucial to ensure both efficacy and safety. Existing evaluations are limited by the static assessment that focuses on knowledge tests, the single perspective that centers on user experience, and the open-loop framework that lacks actionable feedback. To…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: in progress

  22. arXiv:2505.02784 [pdf, other]

    cs.CV

    Advances in Automated Fetal Brain MRI Segmentation and Biometry: Insights from the FeTA 2024 Challenge

    Authors: Vladyslav Zalevskyi, Thomas Sanchez, Misha Kaandorp, Margaux Roulet, Diego Fajardo-Rojas, Liu Li, Jana Hutter, Hongwei Bran Li, Matthew Barkovich, Hui Ji, Luca Wilhelmi, Aline Dändliker, Céline Steger, Mériam Koob, Yvan Gomez, Anton Jakovčić, Melita Klaić, Ana Adžić, Pavel Marković, Gracia Grabarić, Milan Rados, Jordina Aviles Verdera, Gregor Kasprian, Gregor Dovjak, Raphael Gaubert-Rachmühl, et al. (45 additional authors not shown)

    Abstract: Accurate fetal brain tissue segmentation and biometric analysis are essential for studying brain development in utero. The FeTA Challenge 2024 advanced automated fetal brain MRI analysis by introducing biometry prediction as a new task alongside tissue segmentation. For the first time, our diverse multi-centric test set included data from a new low-field (0.55T) MRI dataset. Evaluation metrics wer…

    Submitted 8 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  23. arXiv:2505.01744 [pdf, ps, other]

    cs.LG

    Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients

    Authors: Yezhen Wang, Zhouhao Yang, Brian K Chen, Fanyi Pu, Bo Li, Tianyu Gao, Kenji Kawaguchi

    Abstract: Building upon the success of low-rank adapter (LoRA), low-rank gradient projection (LoRP) has emerged as a promising solution for memory-efficient fine-tuning. However, existing LoRP methods typically treat each row of the gradient matrix as the default projection unit, leaving the role of projection granularity underexplored. In this work, we propose a novel framework, VLoRP, that extends low-ran…

    Submitted 3 May, 2025; originally announced May 2025.

  24. arXiv:2505.01629 [pdf, ps, other]

    cs.GT

    When is Truthfully Allocating Chores no Harder than Goods?

    Authors: Bo Li, Biaoshuai Tao, Fangxiao Wang, Xiaowei Wu, Mingwei Yang, Shengwei Zhou

    Abstract: We study the problem of fairly and efficiently allocating a set of items among strategic agents with additive valuations, where items are either all indivisible or all divisible. When items are goods, numerous positive and negative results are known regarding the fairness and efficiency guarantees achievable by truthful mechanisms, whereas our understanding of truthful mechanisms for…

    Submitted 2 May, 2025; originally announced May 2025.

  25. arXiv:2505.00738 [pdf]

    eess.IV cs.LG

    XeMap: Contextual Referring in Large-Scale Remote Sensing Environments

    Authors: Yuxi Li, Lu Si, Yujie Hou, Chengaung Liu, Bin Li, Hongjian Fang, Jun Zhang

    Abstract: Advancements in remote sensing (RS) imagery have provided high-resolution detail and vast coverage, yet existing methods, such as image-level captioning/retrieval and object-level detection/segmentation, often fail to capture mid-scale semantic entities essential for interpreting large-scale scenes. To address this, we propose the conteXtual referring Map (XeMap) task, which focuses on contextual,…

    Submitted 29 April, 2025; originally announced May 2025.

    Comments: 14 pages, 8 figures

  26. arXiv:2505.00212 [pdf, ps, other]

    cs.MA cs.CL

    Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems

    Authors: Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, Qingyun Wu

    Abstract: Failure attribution in LLM multi-agent systems-identifying the agent and step responsible for task failures-provides crucial clues for systems debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. To support this initiative, we introduce the Who&When dataset, comprising extens…

    Submitted 30 April, 2025; originally announced May 2025.

  27. arXiv:2504.21042 [pdf, other]

    cs.CR cs.AI cs.LG

    What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift

    Authors: Jiamin Chang, Haoyang Li, Hammond Pearce, Ruoxi Sun, Bo Li, Minhui Xue

    Abstract: The growing adoption of artificial intelligence (AI) has amplified concerns about trustworthiness, including integrity, privacy, robustness, and bias. To assess and attribute these threats, we propose ConceptLens, a generic framework that leverages pre-trained multimodal models to identify the root causes of integrity threats by analyzing Concept Shift in probing samples. ConceptLens demonstrates…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Accepted by the ACM Conference on Computer and Communications Security (CCS) 2025

  28. arXiv:2504.20666 [pdf, other]

    cs.LG

    SFi-Former: Sparse Flow Induced Attention for Graph Transformer

    Authors: Zhonghao Li, Ji Shi, Xinming Zhang, Miao Zhang, Bo Li

    Abstract: Graph Transformers (GTs) have demonstrated superior performance compared to traditional message-passing graph neural networks in many studies, especially in processing graph data with long-range dependencies. However, GTs tend to suffer from weak inductive bias, overfitting and over-globalizing problems due to the dense attention. In this paper, we introduce SFi-attention, a novel attention mechan…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: ICMR 2025

  29. arXiv:2504.19654 [pdf, other]

    cs.RO cs.AI cs.LG

    Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM

    Authors: Leon Davies, Baihua Li, Mohamad Saada, Simon Sølvsten, Qinggang Meng

    Abstract: SLAM (Simultaneous Localisation and Mapping) is a crucial component for robotic systems, providing a map of an environment, the current location and previous trajectory of a robot. While 3D LiDAR SLAM has received notable improvements in recent years, 2D SLAM lags behind. Gradual drifts in odometry and pose estimation inaccuracies hinder modern 2D LiDAR-odometry algorithms in large complex environ…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 12 pages, preprint, submitted to Robotics And Autonomous Systems

  30. arXiv:2504.19653 [pdf, other]

    cs.RO cs.AI cs.LG

    GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM

    Authors: Leon Davies, Baihua Li, Mohamad Saada, Simon Sølvsten, Qinggang Meng

    Abstract: SLAM is a fundamental component of modern autonomous systems, providing robots and their operators with a deeper understanding of their environment. SLAM systems often encounter challenges due to the dynamic nature of robotic motion, leading to inaccuracies in mapping quality, particularly in 2D representations such as Occupancy Grid Maps. These errors can significantly degrade map quality, hinder…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 10 pages, preprint conference submission

  31. arXiv:2504.19276 [pdf, other]

    cs.LG cs.AI cs.CL

    Anyprefer: An Agentic Framework for Preference Data Synthesis

    Authors: Yiyang Zhou, Zhaoyang Wang, Tianle Wang, Shangyu Xing, Peng Xia, Bo Li, Kaiyuan Zheng, Zijian Zhang, Zhaorun Chen, Wenhao Zheng, Xuchao Zhang, Chetan Bansal, Weitong Zhang, Ying Wei, Mohit Bansal, Huaxiu Yao

    Abstract: High-quality preference data is essential for aligning foundation models with human values through preference learning. However, manual annotation of such data is often time-consuming and costly. Recent methods often adopt a self-rewarding approach, where the target model generates and annotates its own preference data, but this can lead to inaccuracies since the reward model shares weights with t…

    Submitted 27 April, 2025; originally announced April 2025.

  32. arXiv:2504.19163 [pdf, other]

    cs.GR

    Bernstein Bounds for Caustics

    Authors: Zhimin Fan, Chen Wang, Yiming Wang, Boxuan Li, Yuxuan Guo, Ling-Qi Yan, Yanwen Guo, Jie Guo

    Abstract: Systematically simulating specular light transport requires an exhaustive search for primitive tuples containing admissible paths. Given the extreme inefficiency of enumerating all combinations, we propose to significantly reduce the search domain by sampling such tuples. The challenge is to design proper sampling probabilities that keep the noise level controllable. Our key insight is that by bou…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: ACM Transactions on Graphics (Proceedings of SIGGRAPH 2025)

  33. arXiv:2504.19101 [pdf, other]

    cs.CL

    Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation

    Authors: Qianren Mao, Qili Zhang, Hanwen Hao, Zhentao Han, Runhua Xu, Weifeng Jiang, Qi Hu, Zhijun Chen, Tyler Zhou, Bo Li, Yangqiu Song, Jin Dong, Jianxin Li, Philip S. Yu

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution for enhancing the accuracy and credibility of Large Language Models (LLMs), particularly in Question & Answer tasks. This is achieved by incorporating proprietary and private data from integrated databases. However, private RAG systems face significant challenges due to the scarcity of private domain data and critica…

    Submitted 27 April, 2025; originally announced April 2025.

  34. arXiv:2504.18569 [pdf, other]

    cs.CR cs.AI cs.LG

    Large Language Model Empowered Privacy-Protected Framework for PHI Annotation in Clinical Notes

    Authors: Guanchen Wu, Linzhi Zheng, Han Xie, Zhen Xiang, Jiaying Lu, Darren Liu, Delgersuren Bold, Bo Li, Xiao Hu, Carl Yang

    Abstract: The de-identification of private information in medical data is a crucial process to mitigate the risk of confidentiality breaches, particularly when patient personal details are not adequately removed before the release of medical records. Although rule-based and learning-based methods have been proposed, they often struggle with limited generalizability and require substantial amounts of annotat…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Shorter version published in MedInfo 2025

  35. arXiv:2504.18349 [pdf, other]

    cs.CV cs.CR

    Revisiting Data Auditing in Large Vision-Language Models

    Authors: Hongyu Zhu, Sichu Liang, Wenwen Wang, Boheng Li, Tongxin Yuan, Fangqi Li, ShiLin Wang, Zhuosheng Zhang

    Abstract: With the surge of large language models (LLMs), Large Vision-Language Models (VLMs)--which integrate vision encoders with LLMs for accurate visual grounding--have shown great potential in tasks like generalist agents and robotic control. However, VLMs are typically trained on massive web-scraped images, raising concerns over copyright infringement and privacy violations, and making data auditing i…

    Submitted 25 April, 2025; originally announced April 2025.

  36. arXiv:2504.17828 [pdf, other]

    cs.CV cs.AI

    VEU-Bench: Towards Comprehensive Understanding of Video Editing

    Authors: Bozheng Li, Yongliang Wu, Yi Lu, Jiashuo Yu, Licheng Tang, Jiawang Cao, Wenqing Zhu, Yuyang Sun, Jay Wu, Wenbo Zhu

    Abstract: Widely shared videos on the internet are often edited. Recently, although Video Large Language Models (Vid-LLMs) have made great progress in general video understanding tasks, their capabilities in video editing understanding (VEU) tasks remain unexplored. To address this gap, in this paper, we introduce VEU-Bench (Video Editing Understanding Benchmark), a comprehensive benchmark that categorizes…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025

  37. arXiv:2504.17704 [pdf, other]

    cs.CL

    Safety in Large Reasoning Models: A Survey

    Authors: Cheng Wang, Yue Liu, Baolong Li, Duzhen Zhang, Zhongzhi Li, Junfeng Fang

    Abstract: Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks like mathematics and coding, leveraging their advanced reasoning capabilities. Nevertheless, as these capabilities progress, significant concerns regarding their vulnerabilities and safety have arisen, which can pose challenges to their deployment and application in real-world settings. This paper presents a comprehensive…

    Submitted 24 April, 2025; originally announced April 2025.

  38. arXiv:2504.17490 [pdf, ps, other]

    cs.LG cs.AI

    Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning

    Authors: Mingqi Yuan, Qi Wang, Guozheng Ma, Bo Li, Xin Jin, Yunbo Wang, Xiaokang Yang, Wenjun Zeng, Dacheng Tao

    Abstract: Developing lifelong learning agents is crucial for artificial general intelligence. However, deep reinforcement learning (RL) systems often suffer from plasticity loss, where neural networks gradually lose their ability to adapt during training. Despite its significance, this field lacks unified benchmarks and evaluation protocols. We introduce Plasticine, the first open-source framework for bench…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 23 pages

  39. arXiv:2504.17365 [pdf, other]

    cs.CV cs.CL

    TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation

    Authors: Ling You, Wenxuan Huang, Xinni Xie, Xiangyi Wei, Bangyan Li, Shaohui Lin, Yang Li, Changbo Wang

    Abstract: Soccer is a globally popular sporting event, typically characterized by long matches and distinctive highlight moments. Recent advances in Multimodal Large Language Models (MLLMs) offer promising capabilities in temporal grounding and video understanding; soccer commentary generation often requires precise temporal localization and semantically rich descriptions over long-form video. However, exis…

    Submitted 28 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  40. arXiv:2504.16516 [pdf, other]

    cs.CV cs.AI

    Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation

    Authors: Junrong Yue, Yifan Zhang, Chuan Qin, Bo Li, Xiaomin Lie, Xinlei Yu, Wenxin Zhang, Zhendong Zhao

    Abstract: Vision-and-Language Navigation (VLN) aims to enable embodied agents to follow natural language instructions and reach target locations in real-world environments. While prior methods often rely on either global scene representations or object-level features, these approaches are insufficient for capturing the complex interactions across modalities required for accurate navigation. In this paper, w…

    Submitted 24 April, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

    Comments: 11 pages, 4 figures, Submitted to ACM MM 2025

  41. arXiv:2504.16072 [pdf, ps, other]

    cs.CV cs.AI

    Describe Anything: Detailed Localized Image and Video Captioning

    Authors: Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao, Boyi Li, Marco Pavone, Ming-Yu Liu, Trevor Darrell, Adam Yala, Yin Cui

    Abstract: Generating detailed and accurate descriptions for specific regions in images and videos remains a fundamental challenge for vision-language models. We introduce the Describe Anything Model (DAM), a model designed for detailed localized captioning (DLC). DAM preserves both local details and global context through two key innovations: a focal prompt, which ensures high-resolution encoding of targete…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Project page: https://describe-anything.github.io/

  42. arXiv:2504.16016  [pdf, ps, other]

    cs.CV

    Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework

    Authors: Xinyuan Song, Yangfan He, Sida Li, Jianhui Wang, Hongyang He, Xinhang Yuan, Ruoyu Wang, Jiaqi Chen, Keqin Li, Kuan Lu, Menghao Huo, Binxu Li, Pei Liu

    Abstract: Adapter-based methods are commonly used to enhance model performance with minimal additional complexity, especially in video editing tasks that require frame-to-frame consistency. By inserting small, learnable modules into pretrained diffusion models, these adapters can maintain temporal coherence without extensive retraining. Approaches that incorporate prompt learning with both shared and frame-…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2501.04606

  43. arXiv:2504.15918  [pdf, other]

    cs.CV cs.AI cs.HC

    Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions

    Authors: Chang Zong, Bin Li, Shoujun Zhou, Jian Wan, Lei Zhang

    Abstract: Locating specific segments within an instructional video is an efficient way to acquire guiding knowledge. Generally, the task of obtaining video segments for both verbal explanations and visual demonstrations is known as visual answer localization (VAL). However, users often need multiple interactions to obtain answers that align with their expectations when using the system. During these interac…

    Submitted 22 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 16 pages, 8 figures

    MSC Class: 68T45; 68T20

  44. arXiv:2504.15814  [pdf, other]

    cs.CE cs.MS gr-qc

    Fast Higher-Order Interpolation and Restriction in ExaHyPE Avoiding Non-physical Reflections

    Authors: Timothy Stokes, Tobias Weinzierl, Han Zhang, Baojiu Li

    Abstract: Wave equations help us to understand phenomena ranging from earthquakes to tsunamis. These phenomena materialise over very large scales. It would be computationally infeasible to track them over a regular mesh. Yet, since the phenomena are localised, adaptive mesh refinement (AMR) can be used to construct meshes with a higher resolution close to the regions of interest. ExaHyPE is a software engin…

    Submitted 23 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  45. arXiv:2504.15585  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

    Authors: Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Chengwei Liu, Yifan Zhang, Qiankun Li, et al. (57 additional authors not shown)

    Abstract: The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concer…

    Submitted 22 April, 2025; originally announced April 2025.

  46. arXiv:2504.15003  [pdf, other]

    cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: KwaiSR Dataset and Study

    Authors: Xin Li, Xijun Wang, Bingchen Li, Kun Yuan, Yizhen Shao, Suhang Yao, Ming Sun, Chao Zhou, Radu Timofte, Zhibo Chen

    Abstract: In this work, we build the first benchmark dataset for short-form UGC Image Super-resolution in the wild, termed KwaiSR, intending to advance the research on developing image super-resolution algorithms for short-form UGC platforms. This dataset is collected from the Kwai Platform, which is composed of two parts, i.e., synthetic and wild parts. Among them, the synthetic dataset, including 1,900 im…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: KwaiSR dataset, a new dataset for image super-resolution, used for CVPR NTIRE 2025 Challenge; CVPR 2025 workshop paper

  47. arXiv:2504.14994  [pdf, other]

    cs.LG

    Learning Compositional Transferability of Time Series for Source-Free Domain Adaptation

    Authors: Hankang Sun, Guiming Li, Su Yang, Baoqi Li

    Abstract: Domain adaptation is challenging for time series classification due to the highly dynamic nature. This study tackles the most difficult subtask when both target labels and source data are inaccessible, namely, source-free domain adaptation. To reuse the classification backbone pre-trained on source data, time series reconstruction is a sound solution that aligns target and source time series by mi…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Corresponding author: Su Yang

  48. arXiv:2504.14641  [pdf, other]

    cs.SE eess.SY

    HLSTester: Efficient Testing of Behavioral Discrepancies with LLMs for High-Level Synthesis

    Authors: Kangwei Xu, Bing Li, Grace Li Zhang, Ulf Schlichtmann

    Abstract: In high-level synthesis (HLS), C/C++ programs with synthesis directives are used to generate circuits for FPGA implementations. However, hardware-specific and platform-dependent characteristics in these implementations can introduce behavioral discrepancies between the original C/C++ programs and the circuits after high-level synthesis. Existing methods for testing behavioral discrepancies in HLS…

    Submitted 20 April, 2025; originally announced April 2025.

  49. arXiv:2504.14225  [pdf, other]

    cs.CL

    Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

    Authors: Bowen Jiang, Zhuoqun Hao, Young-Min Cho, Bryan Li, Yuan Yuan, Sihao Chen, Lyle Ungar, Camillo J. Taylor, Dan Roth

    Abstract: Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks -- from offering writing support to delivering tailored recommendations or consultations. Over time, the interaction history between a user and an LLM can provide extensive information about an individual's traits and preferences. However, open questions remain on how well LLMs today can eff…

    Submitted 19 April, 2025; originally announced April 2025.

  50. arXiv:2504.13424  [pdf, other]

    cs.NI

    Decentralized Handover Parameter Optimization with MARL for Load Balancing in 5G Networks

    Authors: Yang Shen, Shuqi Chai, Bing Li, Xiaodong Luo, Qingjiang Shi, Rongqing Zhang

    Abstract: In cellular networks, cell handover refers to the process where a device switches from one base station to another, and this mechanism is crucial for balancing the load among different cells. Traditionally, engineers would manually adjust parameters based on experience. However, the explosive growth in the number of cells has rendered manual tuning impractical. Existing research tends to overlook…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 12 pages, 11 figures

    ACM Class: C.2.3