Showing 1–50 of 3,548 results for author: Li, C

Searching in archive cs.
  1. arXiv:2505.09928  [pdf, ps, other]

    cs.CR

    DeFeed: Secure Decentralized Cross-Contract Data Feed in Web 3.0 for Connected Autonomous Vehicles

    Authors: Xingchen Sun, Runhua Xu, Wei Ni, Li Duan, Chao Li

    Abstract: Smart contracts have been a topic of interest in blockchain research and are a key enabling technology for Connected Autonomous Vehicles (CAVs) in the era of Web 3.0. These contracts enable trustless interactions without the need for intermediaries, as they operate based on predefined rules encoded on the blockchain. However, smart contracts face significant challenges in cross-contract communicati…

    Submitted 14 May, 2025; originally announced May 2025.

  2. arXiv:2505.09466  [pdf, other]

    cs.CV cs.AI

    A 2D Semantic-Aware Position Encoding for Vision Transformers

    Authors: Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi

    Abstract: Vision transformers have demonstrated significant advantages in computer vision tasks due to their ability to capture long-range dependencies and contextual relationships through self-attention. However, existing position encoding techniques, which are largely borrowed from natural language processing, fail to effectively capture semantic-aware positional relationships between image patches. Tradi…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 14 pages, 4 figures, 3 tables

  3. arXiv:2505.08260  [pdf, ps, other]

    cs.CV

    Few-shot Novel Category Discovery

    Authors: Chunming Li, Shidong Wang, Haofeng Zhang

    Abstract: The recently proposed Novel Category Discovery (NCD) adopts a transductive learning paradigm, which hinders its application in more real-world scenarios. In fact, a few labeled samples for some of the new categories can substantially alleviate this burden, which is consistent with the ease with which people can label a small amount of new-category data. Therefore, this paper presents a new setting in which a trained agent is able to flexibly s…

    Submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.07782  [pdf, ps, other]

    cs.LG

    MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

    Authors: Rushi Qiang, Yuchen Zhuang, Yinghao Li, Dingu Sagar V K, Rongzhi Zhang, Changhao Li, Ian Shu-Hei Wong, Sherry Yang, Percy Liang, Chao Zhang, Bo Dai

    Abstract: We introduce MLE-Dojo, a Gym-style framework for systematically training with reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Unlike existing benchmarks that primarily rely on static datasets or single-attempt evaluations, MLE-Dojo provides an interactive environment enabling agents to iteratively experimen…

    Submitted 12 May, 2025; originally announced May 2025.

  5. arXiv:2505.07387  [pdf, ps, other]

    cs.CV

    Feature Visualization in 3D Convolutional Neural Networks

    Authors: Chunpeng Li, Ya-tang Li

    Abstract: Understanding the computations of convolutional neural networks requires effective visualization of their kernels. While maximal activation methods have proven successful in highlighting the preferred features of 2D convolutional kernels, directly applying these techniques to 3D convolutions often leads to uninterpretable results due to the higher dimensionality and complexity of 3D features. To a…

    Submitted 12 May, 2025; originally announced May 2025.

  6. arXiv:2505.07313  [pdf, ps, other]

    cs.CL cs.AI

    Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study

    Authors: Baixuan Xu, Chunyang Li, Weiqi Wang, Wei Fan, Tianshi Zheng, Haochen Shi, Tao Fan, Yangqiu Song, Qiang Yang

    Abstract: Designing effective collaboration structures for multi-agent LLM systems to enhance collective reasoning is crucial yet remains under-explored. In this paper, we systematically investigate how collaborative reasoning performance is affected by three key design dimensions: (1) Expertise-Domain Alignment, (2) Collaboration Paradigm (structured workflow vs. diversity-driven integration), and (3) Syste…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 18 pages

  7. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  8. arXiv:2505.06948  [pdf, other]

    cs.CV cs.LG

    Unsupervised Learning for Class Distribution Mismatch

    Authors: Pan Du, Wangbo Zhao, Xinai Lu, Nian Liu, Zhikai Li, Chaoyu Gong, Suyun Zhao, Hong Chen, Cuiping Li, Kai Wang, Yang You

    Abstract: Class distribution mismatch (CDM) refers to the discrepancy between class distributions in training data and target tasks. Previous methods address this by designing classifiers to categorize classes known during training, while grouping unknown or new classes into an "other" category. However, they focus on semi-supervised scenarios and heavily rely on labeled data, limiting their applicability a…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  9. arXiv:2505.06831  [pdf, other]

    cs.CV

    Fine-Grained Bias Exploration and Mitigation for Group-Robust Classification

    Authors: Miaoyun Zhao, Qiang Zhang, Chenrong Li

    Abstract: Achieving group-robust generalization in the presence of spurious correlations remains a significant challenge, particularly when bias annotations are unavailable. Recent studies on Class-Conditional Distribution Balancing (CCDB) reveal that spurious correlations often stem from mismatches between the class-conditional and marginal distributions of bias attributes. They achieve promising results b…

    Submitted 11 May, 2025; originally announced May 2025.

  10. arXiv:2505.06307  [pdf, ps, other]

    cs.CR cs.AI

    Large Language Model-driven Security Assistant for Internet of Things via Chain-of-Thought

    Authors: Mingfei Zeng, Ming Xie, Xixi Zheng, Chunhai Li, Chuan Zhang, Liehuang Zhu

    Abstract: The rapid development of Internet of Things (IoT) technology has transformed people's way of life and has a profound impact on both production and daily activities. However, with the rapid advancement of IoT technology, the security of IoT devices has become an unavoidable issue in both research and applications. Although some efforts have been made to detect or mitigate IoT security vulnerabiliti…

    Submitted 8 May, 2025; originally announced May 2025.

  11. arXiv:2505.05913  [pdf, ps, other]

    cs.CV

    DFEN: Dual Feature Equalization Network for Medical Image Segmentation

    Authors: Jianjian Yin, Yi Chen, Chengyu Li, Zhichao Zheng, Yanhui Gu, Junsheng Zhou

    Abstract: Current methods for medical image segmentation primarily focus on extracting contextual feature information from the perspective of the whole image. While these methods have shown effective performance, none of them take into account the fact that pixels at the boundary and regions with a low number of class pixels capture more contextual feature information from other classes, leading to misclass…

    Submitted 9 May, 2025; originally announced May 2025.

  12. arXiv:2505.05367  [pdf, other]

    cs.CV eess.IV

    Joint Super-Resolution and Segmentation for 1-m Impervious Surface Area Mapping in China's Yangtze River Economic Belt

    Authors: Jie Deng, Danfeng Hong, Chenyu Li, Naoto Yokoya

    Abstract: We propose a novel joint framework by integrating super-resolution and segmentation, called JointSeg, which enables the generation of 1-meter ISA maps directly from freely available Sentinel-2 imagery. JointSeg was trained on multimodal cross-resolution inputs, offering a scalable and affordable alternative to traditional approaches. This synergistic design enables gradual resolution enhancement f…

    Submitted 8 May, 2025; originally announced May 2025.

  13. arXiv:2505.05062  [pdf, other]

    cs.CV

    ULFine: Unbiased Lightweight Fine-tuning for Foundation-Model-Assisted Long-Tailed Semi-Supervised Learning

    Authors: Enhao Zhang, Chaohua Li, Chuanxing Geng, Songcan Chen

    Abstract: Based on the success of large-scale visual foundation models like CLIP in various downstream tasks, this paper initially attempts to explore their impact on Long-Tailed Semi-Supervised Learning (LTSSL) by employing the foundation model with three strategies: Linear Probing (LP), Lightweight Fine-Tuning (LFT), and Full Fine-Tuning (FFT). Our analysis presents the following insights: i) Compared to…

    Submitted 8 May, 2025; originally announced May 2025.

  14. arXiv:2505.04638  [pdf, ps, other]

    cs.AI cs.CL cs.IR

    Towards Artificial Intelligence Research Assistant for Expert-Involved Learning

    Authors: Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao

    Abstract: Large Language Models (LLMs) and Large Multi-Modal Models (LMMs) have emerged as transformative tools in scientific research, yet their reliability and specific contributions to biomedical applications remain insufficiently characterized. In this study, we present ARtificial Intelligence research assistant for Expert-involved Learning (ARIEL), a multimodal datas…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 36 pages, 7 figures

  15. arXiv:2505.04302  [pdf, other]

    cs.GT

    PPO-ACT: Proximal Policy Optimization with Adversarial Curriculum Transfer for Spatial Public Goods Games

    Authors: Zhaoqilin Yang, Chanchan Li, Xin Wang, Youliang Tian

    Abstract: This study investigates cooperation evolution mechanisms in the spatial public goods game. A novel deep reinforcement learning framework, Proximal Policy Optimization with Adversarial Curriculum Transfer (PPO-ACT), is proposed to model agent strategy optimization in dynamic environments. Traditional evolutionary game models frequently exhibit limitations in modeling long-term decision-making proce…

    Submitted 7 May, 2025; originally announced May 2025.

  16. arXiv:2505.03195  [pdf, other]

    cs.AR

    QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies

    Authors: Shuyao Cheng, Rui Zhang, Wenkai He, Pengwei Jin, Chongxiao Li, Zidong Du, Xing Hu, Yifan Hao, Guanglin Xu, Yuanbo Wen, Ling Li, Qi Guo, Yunji Chen

    Abstract: Automated processor design, which can significantly reduce human efforts and accelerate design cycles, has received considerable attention. While recent advancements have automatically designed single-cycle processors that execute one instruction per cycle, their performance cannot compete with modern superscalar processors that execute multiple instructions per cycle. Previous methods fail on sup…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 8 pages, 3 figures

  17. arXiv:2505.03049  [pdf, ps, other]

    cs.LG cond-mat.mtrl-sci

    34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery

    Authors: Yoel Zimmermann, Adib Bazgir, Alexander Al-Feghali, Mehrad Ansari, L. Catherine Brinson, Yuan Chiang, Defne Circi, Min-Hsueh Chiu, Nathan Daelman, Matthew L. Evans, Abhijeet S. Gangan, Janine George, Hassan Harb, Ghazal Khalighinejad, Sartaaj Takrim Khan, Sascha Klawohn, Magdalena Lederbauer, Soroush Mahjoubi, Bernadette Mohr, Seyed Mohamad Moosavi, Aakash Naik, Aleyna Beste Ozhan, Dieter Plessers, Aritra Roy, Fabian Schöppach, et al. (8 additional authors not shown)

    Abstract: Large Language Models (LLMs) are reshaping many aspects of materials science and chemistry research, enabling advances in molecular property prediction, materials design, scientific automation, knowledge extraction, and more. Recent developments demonstrate that the latest class of models are able to integrate structured and unstructured data, assist in hypothesis generation, and streamline resear…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2411.15221

  18. DPNet: Dynamic Pooling Network for Tiny Object Detection

    Authors: Luqi Gong, Haotian Chen, Yikun Chen, Tianliang Yao, Chao Li, Shuai Zhao, Guangjie Han

    Abstract: In unmanned aerial systems, especially in complex environments, accurately detecting tiny objects is crucial. Resizing images is a common strategy to improve detection accuracy, particularly for small objects. However, simply enlarging images significantly increases computational costs and the number of negative samples, severely degrading detection performance and limiting its applicability. This…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 15 pages, 12 figures. Haotian Chen and Luqi Gong contributed equally to this work.

  19. arXiv:2505.02690  [pdf, other]

    cs.CV

    Dance of Fireworks: An Interactive Broadcast Gymnastics Training System Based on Pose Estimation

    Authors: Haotian Chen, Ziyu Liu, Xi Cheng, Chuangqi Li

    Abstract: This study introduces Dance of Fireworks, an interactive system designed to combat sedentary health risks by enhancing engagement in radio calisthenics. Leveraging mobile device cameras and lightweight pose estimation (PoseNet/TensorFlow Lite), the system extracts body keypoints, computes joint angles, and compares them with standardized motions to deliver real-time corrective feedback. To incenti…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 21 pages, 13 figures

  20. arXiv:2505.02448  [pdf, other]

    cs.CV

    Recent Advances in Out-of-Distribution Detection with CLIP-Like Models: A Survey

    Authors: Chaohua Li, Enhao Zhang, Chuanxing Geng, Songcan Chen

    Abstract: Out-of-distribution detection (OOD) is a pivotal task for real-world applications that trains models to identify samples that are distributionally different from the in-distribution (ID) data during testing. Recent advances in AI, particularly Vision-Language Models (VLMs) like CLIP, have revolutionized OOD detection by shifting from traditional unimodal image detectors to multimodal image-text de…

    Submitted 5 May, 2025; originally announced May 2025.

  21. arXiv:2505.01838  [pdf, other]

    cs.CV

    MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization

    Authors: Chenghong Li, Hongjie Liao, Yihao Zhi, Xihe Yang, Zhengwentai Sun, Jiahao Chang, Shuguang Cui, Xiaoguang Han

    Abstract: In this era, the success of large language models and text-to-image models can be attributed to the driving force of large-scale datasets. However, in the realm of 3D vision, while significant progress has been achieved in object-centric tasks through large-scale datasets like Objaverse and MVImgNet, human-centric tasks have seen limited advancement, largely due to the absence of a comparable larg…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: project page: https://kevinlee09.github.io/research/MVHumanNet++/. arXiv admin note: substantial text overlap with arXiv:2312.02963

  22. arXiv:2505.01454  [pdf, ps, other]

    cs.CR cs.LG

    Sparsification Under Siege: Defending Against Poisoning Attacks in Communication-Efficient Federated Learning

    Authors: Zhiyong Jin, Runhua Xu, Chao Li, Yizhong Liu, Jianxin Li

    Abstract: Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy, yet it faces significant challenges in communication efficiency and vulnerability to poisoning attacks. While sparsification techniques mitigate communication overhead by transmitting only critical model parameters, they inadvertently amplify security risks: adversarial clients ca…

    Submitted 9 May, 2025; v1 submitted 30 April, 2025; originally announced May 2025.

  23. arXiv:2505.01322  [pdf, other]

    cs.CV

    FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors

    Authors: Chenxi Li, Weijie Wang, Qiang Li, Bruno Lepri, Nicu Sebe, Weizhi Nie

    Abstract: Text-driven object insertion in 3D scenes is an emerging task that enables intuitive scene editing through natural language. However, existing 2D editing-based methods often rely on spatial priors such as 2D masks or 3D bounding boxes, and they struggle to ensure consistency of the inserted object. These limitations hinder flexibility and scalability in real-world applications. In this paper, we p…

    Submitted 2 May, 2025; originally announced May 2025.

  24. arXiv:2505.01022  [pdf, other]

    cs.SE

    Detecting the Root Cause Code Lines in Bug-Fixing Commits by Heterogeneous Graph Learning

    Authors: Liguo Ji, Chenchen Li, Shenglin Wang, Furui Zhan

    Abstract: With the continuous growth in the scale and complexity of software systems, defect remediation has become increasingly difficult and costly. Automated defect prediction tools can proactively identify software changes prone to defects within software projects, thereby enhancing software development efficiency. However, existing work in heterogeneous and complex software projects continues to face c…

    Submitted 13 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

  25. arXiv:2505.00991  [pdf, other]

    cs.RO eess.SY

    DexCtrl: Towards Sim-to-Real Dexterity with Adaptive Controller Learning

    Authors: Shuqi Zhao, Ke Yang, Yuxin Chen, Chenran Li, Yichen Xie, Xiang Zhang, Changhao Wang, Masayoshi Tomizuka

    Abstract: Dexterous manipulation has seen remarkable progress in recent years, with policies capable of executing many complex and contact-rich tasks in simulation. However, transferring these policies from simulation to real world remains a significant challenge. One important issue is the mismatch in low-level controller dynamics, where identical trajectories can lead to vastly different contact forces an…

    Submitted 2 May, 2025; originally announced May 2025.

  26. arXiv:2505.00990  [pdf, other]

    cs.SE

    Identifying Root Cause of bugs by Capturing Changed Code Lines with Relational Graph Neural Networks

    Authors: Jiaqi Zhang, Shikai Guo, Hui Li, Chenchen Li, Yu Chai, Rong Chen

    Abstract: The Just-In-Time defect prediction model helps development teams improve software quality and efficiency by assessing whether code changes submitted by developers are likely to introduce defects in real-time, allowing timely identification of potential issues during the commit stage. However, two main challenges exist in current work due to the reality that all deleted and added lines in bug-fixin…

    Submitted 2 May, 2025; originally announced May 2025.

  27. arXiv:2505.00389  [pdf, other]

    cs.CL

    CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass

    Authors: Bowen Zhang, Zixin Song, Chunping Li

    Abstract: As a fundamental task in Information Retrieval and Computational Linguistics, sentence representation has profound implications for a wide range of practical applications such as text clustering, content analysis, question-answering systems, and web search. Recent advances in pre-trained language models (PLMs) have driven remarkable progress in this field, particularly through unsupervised embeddi…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted by SIGIR 2025 (Full)

  28. arXiv:2505.00259  [pdf, ps, other]

    cs.CV cs.AI

    Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction

    Authors: Changjun Li, Runqing Jiang, Zhuo Song, Pengpeng Yu, Ye Zhang, Yulan Guo

    Abstract: Post-training quantization (PTQ) has evolved as a prominent solution for compressing complex models, which advocates a small calibration dataset and avoids end-to-end retraining. However, most existing PTQ methods employ block-wise reconstruction, which neglects cross-block dependency and exhibits a notable accuracy drop in low-bit cases. To address these limitations, this paper presents a novel P…

    Submitted 30 April, 2025; originally announced May 2025.

  29. arXiv:2505.00144  [pdf, other]

    cs.SE

    When Deep Learning Meets Information Retrieval-based Bug Localization: A Survey

    Authors: Feifei Niu, Chuanyi Li, Kui Liu, Xin Xia, David Lo

    Abstract: Bug localization is a crucial aspect of software maintenance, running through the entire software lifecycle. Information retrieval-based bug localization (IRBL) identifies buggy code based on bug reports, expediting the bug resolution process for developers. Recent years have witnessed significant achievements in IRBL, propelled by the widespread adoption of deep learning (DL). To provide a compre…

    Submitted 30 April, 2025; originally announced May 2025.

  30. arXiv:2504.21604  [pdf, other]

    cs.CL cs.CY

    Robust Misinformation Detection by Visiting Potential Commonsense Conflict

    Authors: Bing Wang, Ximing Li, Changchun Li, Bingrui Zhao, Bo Fu, Renchu Guan, Shengsheng Wang

    Abstract: The development of Internet technology has led to an increased prevalence of misinformation, causing severe negative effects across diverse domains. To mitigate this challenge, Misinformation Detection (MD), aiming to detect online misinformation automatically, emerges as a rapidly growing research topic in the community. In this paper, we propose a novel plug-and-play augmentation method for the…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 11 pages, 2 figures. Accepted by IJCAI 2025. Code: https://github.com/wangbing1416/MD-PCC

  31. arXiv:2504.20504  [pdf, other]

    eess.IV cs.LG physics.comp-ph

    Quality-factor inspired deep neural network solver for solving inverse scattering problems

    Authors: Yutong Du, Zicheng Liu, Miao Cao, Zupeng Liang, Yali Zong, Changyou Li

    Abstract: Deep neural networks have been applied to electromagnetic inverse scattering problems (ISPs) and have shown superior imaging performance, which can be affected by the training dataset, the network architecture, and the applied loss function. Here, the quality of data samples is considered and quantified by a defined quality factor. Based on the quality factor, the composition of the training dataset i…

    Submitted 29 April, 2025; originally announced April 2025.

  32. arXiv:2504.20482  [pdf, other]

    cs.LG cs.AI

    Group Relative Knowledge Distillation: Learning from Teacher's Relational Inductive Bias

    Authors: Chao Li, Changhua Zhou, Jia Chen

    Abstract: Knowledge distillation typically transfers knowledge from a teacher model to a student model by minimizing differences between their output distributions. However, existing distillation approaches largely focus on mimicking absolute probabilities and neglect the valuable relational inductive biases embedded in the teacher's relative predictions, leading to exposure bias. In this paper, we propose…

    Submitted 29 April, 2025; originally announced April 2025.

  33. arXiv:2504.20215  [pdf]

    cs.CY cs.HC econ.GN

    Exploring AI-powered Digital Innovations from A Transnational Governance Perspective: Implications for Market Acceptance and Digital Accountability

    Authors: Claire Li, David Peter Wallis Freeborn

    Abstract: This study explores the application of the Technology Acceptance Model (TAM) to AI-powered digital innovations within a transnational governance framework. By integrating Latourian actor-network theory (ANT), this study examines how institutional motivations, regulatory compliance, and ethical and cultural acceptance drive organisations to develop and adopt AI innovations, enhancing their market a…

    Submitted 28 April, 2025; originally announced April 2025.

    Journal ref: Proceedings of the UK Academy for Information Systems Conference 2025, Newcastle, UK. UKAIS

  34. arXiv:2504.19854  [pdf, other]

    cs.RO cs.AI cs.CV

    NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks

    Authors: Chia-Yu Hung, Qi Sun, Pengfei Hong, Amir Zadeh, Chuan Li, U-Xuan Tan, Navonil Majumder, Soujanya Poria

    Abstract: Existing Visual-Language-Action (VLA) models have shown promising performance in zero-shot scenarios, demonstrating impressive task execution and reasoning capabilities. However, a significant challenge arises from the limitations of visual encoding, which can result in failures during tasks such as object grasping. Moreover, these models typically suffer from high computational overhead due to th…

    Submitted 28 April, 2025; originally announced April 2025.

  35. arXiv:2504.19300  [pdf]

    cs.CV

    Myocardial Region-guided Feature Aggregation Net for Automatic Coronary artery Segmentation and Stenosis Assessment using Coronary Computed Tomography Angiography

    Authors: Ni Yao, Xiangyu Liu, Danyang Sun, Chuang Han, Yanting Li, Jiaofen Nan, Chengyang Li, Fubao Zhu, Weihua Zhou, Chen Zhao

    Abstract: Coronary artery disease (CAD) remains a leading cause of mortality worldwide, requiring accurate segmentation and stenosis detection using Coronary Computed Tomography Angiography (CCTA). Existing methods struggle with challenges such as low contrast, morphological variability and small vessel segmentation. To address these limitations, we propose the Myocardial Region-guided Feature Aggregation N…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 31 pages, 12 figures

  36. arXiv:2504.19189  [pdf, other]

    cs.GR cs.CV

    Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation

    Authors: Lei Zhong, Chuan Guo, Yiming Xie, Jiawei Wang, Changjian Li

    Abstract: Storyboarding is widely used for creating 3D animations. Animators use the 2D sketches in storyboards as references to craft the desired 3D animations through a trial-and-error process. The traditional approach requires exceptional expertise and is both labor-intensive and time-consuming. Consequently, there is a high demand for automated methods that can directly translate 2D storyboard sketches…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: Project page: https://zhongleilz.github.io/Sketch2Anim/

  37. arXiv:2504.19188  [pdf, other]

    cs.LG cs.AI cs.CL cs.LO

    Hierarchical Attention Generates Better Proofs

    Authors: Jianlong Chen, Chao Li, Yang Yuan, Andrew C Yao

    Abstract: Large language models (LLMs) have shown promise in formal theorem proving, but their token-level processing often fails to capture the inherent hierarchical nature of mathematical proofs. We introduce Hierarchical Attention, a regularization method that aligns LLMs' attention mechanisms with mathematical reasoning structures. Our approach establishes a five-level hierarchy from foundation…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 15 pages with 3 figures

  38. arXiv:2504.19002  [pdf]

    cs.LG cs.CV cs.RO

    Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation

    Authors: Delun Lai, Yeyubei Zhang, Yunchong Liu, Chaojie Li, Huadong Mo

    Abstract: This paper introduces a novel deep learning-based multimodal fusion architecture aimed at enhancing the perception capabilities of autonomous navigation robots in complex environments. By utilizing innovative feature extraction modules, adaptive fusion strategies, and time-series modeling mechanisms, the system effectively integrates RGB images and LiDAR data. The key contributions of this work ar…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: 6 pages, 4 figures

  39. arXiv:2504.18242  [pdf, ps, other]

    cs.IT

    Demand Private Coded Caching: Small Cache Size

    Authors: Qinyi Lu, Nan Liu, Wei Kang, Chunguo Li

    Abstract: We investigate the demand private coded caching problem, which is an $(N,K)$ coded caching problem with $N$ files, $K$ users, each equipped with a cache of size $M$, and an additional privacy constraint on user demands, i.e., each user cannot gain any information about the demands of other users. We focus on scenarios where the size of users' caches is small, aiming to further characterize the fu…

    Submitted 25 April, 2025; originally announced April 2025.

  40. arXiv:2504.18096  [pdf, other]

    cs.AI cs.LG

    Combating the Bucket Effect: Multi-Knowledge Alignment for Medication Recommendation

    Authors: Xiang Li, Haixu Ma, Guanyong Wu, Shi Mu, Chen Li, Shunpan Liang

    Abstract: Medication recommendation is crucial in healthcare, offering effective treatments based on patients' electronic health records (EHR). Previous studies show that integrating more medication-related knowledge improves medication representation accuracy. However, not all medications encompass multiple types of knowledge data simultaneously. For instance, some medications provide only textual descript…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: 18 pages, 5 figures

  41. arXiv:2504.18020  [pdf, other]

    cs.CV

    Federated Client-tailored Adapter for Medical Image Segmentation

    Authors: Guyue Hu, Siyuan Song, Yukun Kang, Zhu Yin, Gangming Zhao, Chenglong Li, Jin Tang

    Abstract: Medical image segmentation in X-ray images is beneficial for computer-aided diagnosis and lesion localization. Existing methods mainly fall into a centralized learning paradigm, which is inapplicable in the practical medical scenario that only has access to distributed data islands. Federated Learning has the potential to offer a distributed solution but struggles with heavy training instability d…

    Submitted 24 April, 2025; originally announced April 2025.

  42. arXiv:2504.17315  [pdf, other]

    cs.CV cs.AI

    DIMT25@ICDAR2025: HW-TSC's End-to-End Document Image Machine Translation System Leveraging Large Vision-Language Model

    Authors: Zhanglin Wu, Tengfei Song, Ning Xie, Weidong Zhang, Pengfei Li, Shuang Wu, Chong Li, Junhao Zhu, Hao Yang

    Abstract: This paper presents the technical solution proposed by Huawei Translation Service Center (HW-TSC) for the "End-to-End Document Image Machine Translation for Complex Layouts" competition at the 19th International Conference on Document Analysis and Recognition (DIMT25@ICDAR2025). Leveraging state-of-the-art open-source large vision-language model (LVLM), we introduce a training framework that combi…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 7 pages, 1 figure, 2 tables

  43. arXiv:2504.17314  [pdf, other]

    cs.LG cs.CV

    Class-Conditional Distribution Balancing for Group Robust Classification

    Authors: Miaoyun Zhao, Qiang Zhang, Chenrong Li

    Abstract: Spurious correlations that lead models to correct predictions for the wrong reasons pose a critical challenge for robust real-world generalization. Existing research attributes this issue to group imbalance and addresses it by maximizing group-balanced or worst-group accuracy, which heavily relies on expensive bias annotations. A compromise approach involves predicting bias information using exten…

    Submitted 24 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  44. arXiv:2504.16680  [pdf, other]

    cs.RO cs.AI cs.LG

    Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator

    Authors: Chenhao Li, Andreas Krause, Marco Hutter

    Abstract: Reinforcement Learning (RL) has demonstrated impressive capabilities in robotic control but remains challenging due to high sample complexity, safety concerns, and the sim-to-real gap. While offline RL eliminates the need for risky real-world exploration by learning from pre-collected data, it suffers from distributional shift, limiting policy generalization. Model-Based RL (MBRL) addresses this b…

    Submitted 23 April, 2025; originally announced April 2025.

  45. arXiv:2504.15934  [pdf, other]

    cs.ET q-bio.QM

    Real-time raw signal genomic analysis using fully integrated memristor hardware

    Authors: Peiyi He, Shengbo Wang, Ruibin Mao, Sebastian Siegel, Giacomo Pedretti, Jim Ignowski, John Paul Strachan, Ruibang Luo, Can Li

    Abstract: Advances in third-generation sequencing have enabled portable and real-time genomic sequencing, but real-time data processing remains a bottleneck, hampering on-site genomic analysis due to prohibitive time and energy costs. These technologies generate a massive amount of noisy analog signals that traditionally require basecalling and digital mapping, both demanding frequent and costly data moveme…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 16 pages, 6 figures

  46. arXiv:2504.15037  [pdf, other]

    cs.LG

    A Call for New Recipes to Enhance Spatial Reasoning in MLLMs

    Authors: Huanyu Zhang, Chengzu Li, Wenshan Wu, Shaoguang Mao, Yan Xia, Ivan Vulić, Zhang Zhang, Liang Wang, Tieniu Tan, Furu Wei

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks. However, recent studies have exposed critical limitations in their spatial reasoning capabilities. This deficiency in spatial reasoning significantly constrains MLLMs' ability to interact effectively with the physical world, thereby limiting their broader applications. We argue that…

    Submitted 21 April, 2025; originally announced April 2025.

  47. arXiv:2504.14877  [pdf, other]

    cs.CV

    Collaborative Enhancement Network for Low-quality Multi-spectral Vehicle Re-identification

    Authors: Aihua Zheng, Yongqi Sun, Zi Wang, Chenglong Li, Jin Tang

    Abstract: The performance of multi-spectral vehicle Re-identification (ReID) is significantly degraded when some important discriminative cues in visible, near infrared and thermal infrared spectra are lost. Existing methods generate or enhance missing details in low-quality spectra data using the high-quality one, generally called the primary spectrum, but how to justify the primary spectrum is a challengi…

    Submitted 21 April, 2025; originally announced April 2025.

  48. arXiv:2504.14872  [pdf, other]

    cs.SE

    Efficient Function Orchestration for Large Language Models

    Authors: Xiaoxia Liu, Peng Di, Cong Li, Jun Sun, Jingyi Wang

    Abstract: Function calling is a fundamental capability of today's large language models, but sequential function calling poses efficiency problems. Recent studies have proposed to request function calls with parallelism support in order to alleviate this issue. However, they either delegate the concurrent function calls to users for execution which are conversely executed sequentially, or overlook the relat…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Submitted to TSE

  49. arXiv:2504.14214  [pdf, other]

    cs.IR

    Teach Me How to Denoise: A Universal Framework for Denoising Multi-modal Recommender Systems via Guided Calibration

    Authors: Hongji Li, Hanwen Du, Youhua Li, Junchen Fu, Chunxiao Li, Ziyi Zhuang, Jiakang Li, Yongxin Ni

    Abstract: The surge in multimedia content has led to the development of Multi-Modal Recommender Systems (MMRecs), which use diverse modalities such as text, images, videos, and audio for more personalized recommendations. However, MMRecs struggle with noisy data caused by misalignment among modal content and the gap between modal semantics and recommendation semantics. Traditional denoising methods are inad…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Accepted to ACM Web Search and Data Mining (WSDM) 2025

  50. arXiv:2504.13945  [pdf, other]

    cs.LG cs.AI

    Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models

    Authors: Zhanglin Wu, Tengfei Song, Ning Xie, Mengli Zhu, Weidong Zhang, Shuang Wu, Pengfei Li, Chong Li, Junhao Zhu, Hao Yang, Shiliang Sun

    Abstract: The rapid advancement of large vision-language models (LVLMs) has significantly propelled applications in document understanding, particularly in optical character recognition (OCR) and multilingual translation. However, current evaluations of LVLMs, like the widely used OCRBench, mainly focus on verifying the correctness of their short-text responses and long-text responses with simple layout, wh…

    Submitted 23 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: 12 pages, 5 figures, 5 tables