Showing 1–50 of 4,254 results for author: li, s

  1. arXiv:2507.06057  [pdf, ps, other]

    cs.AI cs.LG

    FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models

    Authors: Bo Pang, Yalu Ouyang, Hangfei Xu, Ziqi Jia, Panpan Li, Shengzhao Wen, Lu Wang, Shiyong Li, Yanpeng Wang

    Abstract: Advancements in reasoning for large language models (LLMs) have led to significant performance improvements for LLMs in various fields such as mathematics and programming. However, research applying these advances to the financial domain, where considerable domain-specific knowledge is necessary to complete tasks, remains limited. To address this gap, we introduce FEVO (Financial Evolution), a mu…

    Submitted 8 July, 2025; originally announced July 2025.

  2. arXiv:2507.06044  [pdf, ps, other]

    cs.IR

    Hierarchical Interaction Summarization and Contrastive Prompting for Explainable Recommendations

    Authors: Yibin Liu, Ang Li, Shijian Li

    Abstract: Explainable recommendations, which use the information of user and item with interaction to generate an explanation for why the user would interact with the item, are crucial for improving user trust and decision transparency in the recommender system. Existing methods primarily rely on encoding features of users and items to embeddings, which often leads to information loss due to dimensionality r…

    Submitted 8 July, 2025; originally announced July 2025.

  3. arXiv:2507.05687  [pdf, ps, other]

    cs.LG cs.CL

    AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

    Authors: Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun

    Abstract: Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific languages like Triton simplify GPU programming by abstracting low-level details, developers must still manually tune critical parameters such as tile sizes and mem… ▽ More

    Submitted 8 July, 2025; originally announced July 2025.

  4. arXiv:2507.05588  [pdf]

    cs.CV

    Semi-Supervised Defect Detection via Conditional Diffusion and CLIP-Guided Noise Filtering

    Authors: Shuai Li, Shihan Chen, Wanru Geng, Zhaohua Xu, Xiaolu Liu, Can Dong, Zhen Tian, Changlin Chen

    Abstract: In the realm of industrial quality inspection, defect detection stands as a critical component, particularly in high-precision, safety-critical sectors such as automotive components, aerospace, and medical devices. Traditional methods, reliant on manual inspection or early image processing algorithms, suffer from inefficiencies, high costs, and limited robustness. This paper introduces a semi-super…

    Submitted 7 July, 2025; originally announced July 2025.

  5. arXiv:2507.05567  [pdf, ps, other]

    cs.IT

    Lower Bounds for Error Coefficients of Griesmer Optimal Linear Codes via Iteration

    Authors: Chaofeng Guan, Shitao Li, Gaojun Luo, Zhi Ma, Hong Wang

    Abstract: The error coefficient of a linear code is defined as the number of minimum-weight codewords. In an additive white Gaussian noise channel, optimal linear codes with the smallest error coefficients achieve the best possible asymptotic frame error rate (AFER) among all optimal linear codes under maximum likelihood decoding. Such codes are referred to as AFER-optimal linear codes. The Griesmer bound… ▽ More

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 15 pages, 4 tables

  6. arXiv:2507.05101  [pdf, ps, other]

    cs.LG cs.AI q-bio.BM q-bio.MN

    PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs

    Authors: Xinzhe Zheng, Hao Du, Fanding Xu, Jinzhe Li, Zhiyuan Liu, Wenkang Wang, Tao Chen, Wanli Ouyang, Stan Z. Li, Yan Lu, Nanqing Dong, Yang Zhang

    Abstract: Deep learning-based computational methods have achieved promising results in predicting protein-protein interactions (PPIs). However, existing benchmarks predominantly focus on isolated pairwise evaluations, overlooking a model's capability to reconstruct biologically meaningful PPI networks, which is crucial for biology research. To address this gap, we introduce PRING, the first comprehensive be… ▽ More

    Submitted 7 July, 2025; originally announced July 2025.

  7. arXiv:2507.04503  [pdf, ps, other]

    cs.CV cs.RO

    U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration

    Authors: Xiaofan Li, Zhihao Xu, Chenming Wu, Zhao Yang, Yumeng Zhang, Jiang-Jiang Liu, Haibao Yu, Fan Duan, Xiaoqing Ye, Yuan Wang, Shirui Li, Xun Sun, Ji Wan, Jun Wang

    Abstract: Accurate localization using visual information is a critical yet challenging task, especially in urban environments where nearby buildings and construction sites significantly degrade GNSS (Global Navigation Satellite System) signal quality. This issue underscores the importance of visual localization techniques in scenarios where GNSS signals are unreliable. This paper proposes U-ViLAR, a novel u… ▽ More

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Vision Localization, Autonomous Driving, Bird's-Eye-View

  8. arXiv:2507.04009  [pdf, ps, other]

    cs.CL cs.HC cs.LG

    Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

    Authors: Ziyang Miao, Qiyu Sun, Jingyuan Wang, Yuchen Gong, Yaowei Zheng, Shiqi Li, Richong Zhang

    Abstract: Large language models (LLMs) have shown impressive performance on general-purpose tasks, yet adapting them to specific domains remains challenging due to the scarcity of high-quality domain data. Existing data synthesis tools often struggle to extract reliable fine-tuning data from heterogeneous documents effectively. To address this limitation, we propose Easy Dataset, a unified framework for syn… ▽ More

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: preprint

  9. arXiv:2507.04002  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models

    Authors: Siyu Li, Fei Teng, Yihong Cao, Kailun Yang, Zhiyong Li, Yaonan Wang

    Abstract: Birds' Eye View (BEV) semantic segmentation is an indispensable perception task in end-to-end autonomous driving systems. Unsupervised and semi-supervised learning for BEV tasks, as pivotal for real-world applications, underperform due to the homogeneous distribution of the labeled data. In this work, we explore the potential of synthetic data from driving world models to enhance the diversity of… ▽ More

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: The source code will be made publicly available at https://github.com/lynn-yu/NRSeg

  10. arXiv:2507.03839  [pdf, ps, other]

    cs.AI cs.GR

    Participatory Evolution of Artificial Life Systems via Semantic Feedback

    Authors: Shuowen Li, Kexin Wang, Minglu Fang, Danqi Huang, Ali Asadipour, Haipeng Mi, Yitong Sun

    Abstract: We present a semantic feedback framework that enables natural language to guide the evolution of artificial life systems. Integrating a prompt-to-parameter encoder, a CMA-ES optimizer, and CLIP-based evaluation, the system allows user intent to modulate both visual outcomes and underlying behavioral rules. Implemented in an interactive ecosystem simulation, the framework supports prompt refinement… ▽ More

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: 10 pages

  11. arXiv:2507.03043  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function

    Authors: Shuhe Li, Chenxu Guo, Jiachen Lian, Cheol Jun Cho, Wenshuo Zhao, Xuanru Zhou, Dingkun Zhou, Sam Wang, Grace Wang, Jingze Yang, Jingyi Xu, Ruohan Bao, Elise Brenner, Brandon In, Francesca Pei, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli

    Abstract: Early evaluation of children's language is frustrated by the high pitch, long phones, and sparse data that derail automatic speech recognisers. We introduce K-Function, a unified framework that combines accurate sub-word transcription, objective scoring, and actionable feedback. Its core, Kids-WFST, merges a Wav2Vec2 phoneme encoder with a phoneme-similarity Dysfluent-WFST to capture child-specifi… ▽ More

    Submitted 3 July, 2025; originally announced July 2025.

  12. arXiv:2507.03002  [pdf, ps, other]

    eess.SY cs.AI

    Game-Theoretic Modeling of Vehicle Unprotected Left Turns Considering Drivers' Bounded Rationality

    Authors: Yuansheng Lian, Ke Zhang, Meng Li, Shen Li

    Abstract: Modeling the decision-making behavior of vehicles presents unique challenges, particularly during unprotected left turns at intersections, where the uncertainty of human drivers is especially pronounced. In this context, connected autonomous vehicle (CAV) technology emerges as a promising avenue for effectively managing such interactions while ensuring safety and efficiency. Traditional approaches… ▽ More

    Submitted 1 July, 2025; originally announced July 2025.

  13. arXiv:2507.02927  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations

    Authors: Phurich Saengthong, Boonnithi Jiaramaneepinit, Sheng Li, Manabu Okumura, Takahiro Shinozaki

    Abstract: Speech Large Language Models (Speech LLMs) have emerged as a crucial paradigm in recent years, extending the capabilities of traditional LLMs to speech tasks such as automatic speech recognition (ASR) and spoken dialogue modeling. However, their effectiveness in real-world multilingual conversations remains limited by the scarcity of data that captures natural conversational phenomena. To address… ▽ More

    Submitted 25 June, 2025; originally announced July 2025.

  14. arXiv:2507.02834  [pdf, ps, other]

    cs.LG cs.CL

    ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning

    Authors: Ruiyang Zhou, Shuozhe Li, Amy Zhang, Liu Leqi

    Abstract: Recent advances in large language models have been driven by reinforcement learning (RL)-style post-training, which improves reasoning by optimizing model outputs based on reward or preference signals. GRPO-style approaches implement this by using self-generated samples labeled by an outcome-based verifier. However, these methods depend heavily on the model's initial ability to produce positive sa… ▽ More

    Submitted 3 July, 2025; originally announced July 2025.

  15. arXiv:2507.02827  [pdf, ps, other]

    cs.CV cs.AI

    USAD: An Unsupervised Data Augmentation Spatio-Temporal Attention Diffusion Network

    Authors: Ying Yu, Hang Xiao, Siyao Li, Jiarui Li, Haotian Tang, Hanyu Liu, Chao Li

    Abstract: The primary objective of human activity recognition (HAR) is to infer ongoing human actions from sensor data, a task that finds broad applications in health monitoring, safety protection, and sports analysis. Despite proliferating research, HAR still faces key challenges, including the scarcity of labeled samples for rare activities, insufficient extraction of high-level features, and suboptimal m… ▽ More

    Submitted 3 July, 2025; originally announced July 2025.

  16. arXiv:2507.02138  [pdf]

    cs.HC

    A Theory-driven and AI-enhanced Simulation Platform for Cultivating Nutrition Literacy

    Authors: Shan Li, Guozhu Ding

    Abstract: This study introduces and evaluates Healthy Choice, an innovative theory-driven and AI-enhanced simulation platform designed to cultivate nutrition literacy through interactive scenario-based learning experiences. We collected feedback from 114 university students with diverse backgrounds who completed simulated product selection scenarios. Quantitative ratings of usefulness and ease of use demons… ▽ More

    Submitted 2 July, 2025; originally announced July 2025.

  17. arXiv:2507.01921  [pdf, ps, other]

    cs.CL

    NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks

    Authors: Yang Li, Youssef Emad, Karthik Padthe, Jack Lanchantin, Weizhe Yuan, Thao Nguyen, Jason Weston, Shang-Wen Li, Dong Wang, Ilia Kulikov, Xian Li

    Abstract: Recent work has shown that distilling reasoning traces from a larger teacher model via supervised finetuning outperforms reinforcement learning with the smaller student model alone (Guo et al. 2025). However, there has not been a systematic study of what kind of reasoning demonstrations from the teacher are most effective in improving the student model's reasoning capabilities. In this work we cur… ▽ More

    Submitted 2 July, 2025; originally announced July 2025.

  18. arXiv:2507.01381  [pdf, ps, other]

    cs.LG cs.AI

    Distributional Soft Actor-Critic with Diffusion Policy

    Authors: Tong Liu, Yinuo Wang, Xujie Song, Wenjun Zou, Liangfa Chen, Likun Wang, Bin Shuai, Jingliang Duan, Shengbo Eben Li

    Abstract: Reinforcement learning has been proven to be highly effective in handling complex control tasks. Traditional methods typically use unimodal distributions, such as Gaussian distributions, to model the output of value distributions. However, a unimodal distribution often causes bias in value function estimation, leading to poor algorithm performance. This paper proposes a distributional rei…

    Submitted 3 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted IEEE ITSC 2025

  19. arXiv:2507.01378  [pdf, ps, other]

    cs.MA cs.AI cs.RO

    RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms

    Authors: Ziyao Wang, Rongpeng Li, Sizhao Li, Yuming Xiang, Haiping Wang, Zhifeng Zhao, Honggang Zhang

    Abstract: Intelligent control of Unmanned Aerial Vehicle (UAV) swarms has emerged as a critical research focus, and it typically requires the swarm to navigate effectively while avoiding obstacles and achieving continuous coverage over multiple mission targets. Although traditional Multi-Agent Reinforcement Learning (MARL) approaches offer dynamic adaptability, they are hindered by the semantic gap in num…

    Submitted 2 July, 2025; originally announced July 2025.

  20. arXiv:2507.01243  [pdf, ps, other]

    cs.RO cs.LG

    Jump-Start Reinforcement Learning with Self-Evolving Priors for Extreme Monopedal Locomotion

    Authors: Ziang Zheng, Guojian Zhan, Shiqi Liu, Yao Lyu, Tao Zhang, Shengbo Eben Li

    Abstract: Reinforcement learning (RL) has shown great potential in enabling quadruped robots to perform agile locomotion. However, directly training policies to simultaneously handle dual extreme challenges, i.e., extreme underactuation and extreme terrains, as in monopedal hopping tasks, remains highly challenging due to unstable early-stage interactions and unreliable reward feedback. To address this, we… ▽ More

    Submitted 1 July, 2025; originally announced July 2025.

  21. arXiv:2507.01216  [pdf, ps, other]

    cs.LG cs.CR

    PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning

    Authors: Xingke Yang, Liang Li, Zhiyi Wan, Sicong Li, Hao Wang, Xiaoqi Qi, Jiang Liu, Tomoaki Ohtsuki, Xin Fu, Miao Pan

    Abstract: There is a huge gap between numerous intriguing applications fostered by on-device large language model (LLM) fine-tuning (FT) from fresh mobile data and the limited resources of a mobile device. While existing server-assisted methods (e.g., split learning or side-tuning) may enable LLM FT on the local mobile device, they suffer from heavy communication burdens of activation transmissions, and may… ▽ More

    Submitted 1 July, 2025; originally announced July 2025.

  22. arXiv:2507.01099  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.RO

    Geometry-aware 4D Video Generation for Robot Manipulation

    Authors: Zeyi Liu, Shuang Li, Eric Cousineau, Siyuan Feng, Benjamin Burchfiel, Shuran Song

    Abstract: Understanding and predicting the dynamics of the physical world can enhance a robot's ability to plan and interact effectively in complex environments. While recent video generation models have shown strong potential in modeling dynamic scenes, generating videos that are both temporally coherent and geometrically consistent across camera views remains a significant challenge. To address this, we p… ▽ More

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Project website: https://robot4dgen.github.io

  23. arXiv:2507.01037  [pdf, ps, other]

    cs.LG cs.AI cs.RO

    Learning to Segment for Vehicle Routing Problems

    Authors: Wenbin Ouyang, Sirui Li, Yining Ma, Cathy Wu

    Abstract: Iterative search heuristics are widely recognized as state-of-the-art for solving Vehicle Routing Problems (VRPs). In this work, we identify and exploit a critical observation: within these solvers, a large portion of the solution remains stable, i.e., unchanged across search iterations, causing redundant computations, especially for large-scale VRPs with long subtours. To address this, we pioneer… ▽ More

    Submitted 22 June, 2025; originally announced July 2025.

  24. arXiv:2507.00992  [pdf, ps, other]

    cs.CV

    UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

    Authors: Yuanrui Wang, Cong Han, Yafei Li, Zhipeng Jin, Xiawei Li, SiNan Du, Wen Tao, Yi Yang, Shuanglong Li, Chun Yuan, Liu Lin

    Abstract: Text-to-image generation has greatly advanced content creation, yet accurately rendering visual text remains a key challenge due to blurred glyphs, semantic drift, and limited style control. Existing methods often rely on pre-rendered glyph images as conditions, but these struggle to retain original font styles and color cues, necessitating complex multi-branch designs that increase model overhead… ▽ More

    Submitted 2 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  25. arXiv:2507.00665  [pdf, ps, other]

    cs.CL cs.AI

    SAFER: Probing Safety in Reward Models with Sparse Autoencoder

    Authors: Sihang Li, Wei Shi, Ziyuan Xie, Tao Liang, Guojun Ma, Xiang Wang

    Abstract: Reinforcement learning from human feedback (RLHF) is a key paradigm for aligning large language models (LLMs) with human values, yet the reward models at its core remain largely opaque. In this work, we present sparse Autoencoder For Enhanced Reward model (SAFER), a novel framework for interpreting and improving reward models through mechanistic analysis. Leveraging Sparse Autoencoders (S…

    Submitted 1 July, 2025; originally announced July 2025.

  26. arXiv:2507.00387  [pdf, ps, other]

    cs.NE

    A Review on Zeroing Neural Networks

    Authors: Chengze Jiang, Jie Gui, Long Jin, Shuai Li

    Abstract: Zeroing neural networks (ZNNs) have demonstrated outstanding performance on time-varying optimization and control problems. Nonetheless, few studies are committed to illustrating the relationship among different ZNNs and the derivation of them. Therefore, reviewing the advances for a systematical understanding of this field is desirable. This paper provides a survey of ZNNs' progress regarding imp… ▽ More

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: This is what we submitted to IJCAI 2023. Maybe we will update this paper in the future

  27. arXiv:2507.00316  [pdf, ps, other]

    cs.LG cs.CL eess.IV

    $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation

    Authors: Siyou Li, Pengyao Qin, Huanan Wu, Dong Nie, Arun J. Thirunavukarasu, Juntao Yu, Le Zhang

    Abstract: Automated radiology report generation (RRG) aims to produce detailed textual reports from clinical imaging, such as computed tomography (CT) scans, to improve the accuracy and efficiency of diagnosis and provision of management advice. RRG is complicated by two key challenges: (1) inherent complexity in extracting relevant information from imaging data under resource constraints, and (2) difficult… ▽ More

    Submitted 1 July, 2025; v1 submitted 30 June, 2025; originally announced July 2025.

    Comments: Accepted by MICCAI 2025

  28. arXiv:2506.23680  [pdf, ps, other]

    cs.IT

    Asymptotically Optimal Secure Aggregation for Wireless Federated Learning with Multiple Servers

    Authors: Zhenhao Huang, Kai Liang, Yuanming Shi, Songze Li, Youlong Wu

    Abstract: In this paper, we investigate the transmission latency of the secure aggregation problem in a wireless federated learning system with multiple curious servers. We propose a privacy-preserving coded aggregation scheme where the servers cannot infer any information about the distributed users' local gradients, nor the aggregation value. In our scheme, each user encodes its local gradient int…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: This work was in part presented at the IEEE International Symposium on Information Theory (ISIT), 2023

  29. arXiv:2506.23393  [pdf, ps, other]

    cs.CL cs.AI

    Hierarchical Memory Organization for Wikipedia Generation

    Authors: Eugene J. Yu, Dawei Zhu, Yifan Song, Xiangyu Wong, Jiebin Zhang, Wenxuan Shi, Xiaoguang Li, Qun Liu, Sujian Li

    Abstract: Generating Wikipedia articles autonomously is a challenging task requiring the integration of accurate, comprehensive, and well-structured information from diverse sources. This paper introduces the Memory Organization-based Generation (MOG) framework, a novel approach to address these challenges by leveraging a hierarchical memory architecture. MOG extracts fine-grained memory units from web docu… ▽ More

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Main Conference

  30. arXiv:2506.23347  [pdf, ps, other]

    cs.CV

    CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation

    Authors: Yi Liu, Shengqian Li, Zuzeng Lin, Feng Wang, Si Liu

    Abstract: The current conditional autoregressive image generation methods have shown promising results, yet their potential remains largely unexplored in the practical unsupervised image translation domain, which operates without explicit cross-domain correspondences. A critical limitation stems from the discrete quantization inherent in traditional Vector Quantization-based frameworks, which disrupts gradi… ▽ More

    Submitted 7 July, 2025; v1 submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025. Code available at: https://github.com/IamCreateAI/CycleVAR

  31. arXiv:2506.23325  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

    Authors: Yitian Gong, Luozhijie Jin, Ruifan Deng, Dong Zhang, Xin Zhang, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu

    Abstract: Speech codecs serve as bridges between speech signals and large language models. An ideal codec for speech language models should not only preserve acoustic information but also capture rich semantic information. However, existing speech codecs struggle to balance high-quality audio reconstruction with ease of modeling by language models. In this study, we analyze the limitations of previous codec… ▽ More

    Submitted 29 June, 2025; originally announced June 2025.

  32. arXiv:2506.23287  [pdf, ps, other]

    cs.LG q-bio.QM

    Hierarchical Quantized Diffusion Based Tree Generation Method for Hierarchical Representation and Lineage Analysis

    Authors: Zelin Zang, WenZhe Li, Fei Chen, Yongjie Xu, Chang Yu, Zhen Lei, Stan Z. Li

    Abstract: In single-cell research, tracing and analyzing high-throughput single-cell differentiation trajectories is crucial for understanding complex biological processes. Key to this is the modeling and generation of hierarchical data that represents the intrinsic structure within datasets. Traditional methods face limitations in terms of computational cost, performance, generative capacity, and stability… ▽ More

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 9 pages, 6 figures, under review

  33. arXiv:2506.23141  [pdf, ps, other]

    cs.AI

    Context-Driven Knowledge Graph Completion with Semantic-Aware Relational Message Passing

    Authors: Siyuan Li, Ruitong Liu, Yan Wen, Te Sun

    Abstract: Semantic context surrounding a triplet $(h, r, t)$ is crucial for Knowledge Graph Completion (KGC), providing vital cues for prediction. However, traditional node-based message passing mechanisms, when applied to knowledge graphs, often introduce noise and suffer from information dilution or over-smoothing by indiscriminately aggregating information from all neighboring edges. To address this chal… ▽ More

    Submitted 29 June, 2025; originally announced June 2025.

  34. arXiv:2506.23137  [pdf, ps, other]

    cs.CL cs.AI

    Flow-Modulated Scoring for Semantic-Aware Knowledge Graph Completion

    Authors: Siyuan Li, Ruitong Liu, Yan Wen, Te Sun

    Abstract: Effective modeling of multifaceted relations is pivotal for Knowledge Graph Completion (KGC). However, a majority of existing approaches are predicated on static, embedding-based scoring, exhibiting inherent limitations in capturing contextual dependencies and relational dynamics. Addressing this gap, we propose the Flow-Modulated Scoring (FMS) framework. FMS comprises two principal components: (1… ▽ More

    Submitted 1 July, 2025; v1 submitted 29 June, 2025; originally announced June 2025.

    Comments: 10 pages

  35. arXiv:2506.22714  [pdf, ps, other]

    cs.DC cs.LG cs.PF

    Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication

    Authors: Jinliang Shi, Shigang Li, Youxuan Xu, Xueying Wang, Rongtian Fu, Zhi Ma, Tong Wu

    Abstract: Sparse matrix multiplication operators (i.e., SpMM and SDDMM) are widely used in deep learning and scientific computing. Modern accelerators are commonly equipped with Tensor cores and CUDA cores to accelerate sparse operators. The former brings superior computing power but only for structured matrix multiplication, while the latter has relatively lower performance but with higher programming flex… ▽ More

    Submitted 27 June, 2025; originally announced June 2025.

    ACM Class: C.1.4; I.2.11

  36. arXiv:2506.22500  [pdf, ps, other]

    cs.CV cs.AI

    Visual-Semantic Knowledge Conflicts in Operating Rooms: Synthetic Data Curation for Surgical Risk Perception in Multimodal Large Language Models

    Authors: Weiyi Zhao, Xiaoyu Tan, Liang Liu, Sijia Li, Youwei Song, Xihe Qiu

    Abstract: Surgical risk identification is critical for patient safety and reducing preventable medical errors. While multimodal large language models (MLLMs) show promise for automated operating room (OR) risk detection, they often exhibit visual-semantic knowledge conflicts (VS-KC), failing to identify visual safety violations despite understanding textual rules. To address this, we introduce a dataset com… ▽ More

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 13 pages, 5 figures. The dataset and appendix are available at https://github.com/zgg2577/VS-KC

    MSC Class: 68T07; 68U10; 92C55

    ACM Class: I.2.10; I.2.7; J.3; I.2.6

  37. arXiv:2506.22027  [pdf, ps, other]

    cs.CV

    Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method

    Authors: Han Wang, Shengyang Li, Jian Yang, Yuxuan Liu, Yixuan Lv, Zhuang Zhou

    Abstract: Detecting and tracking ground objects using earth observation imagery remains a significant challenge in the field of remote sensing. Continuous maritime ship tracking is crucial for applications such as maritime search and rescue, law enforcement, and shipping analysis. However, most current ship tracking methods rely on geostationary satellites or video satellites. The former offer low resolutio… ▽ More

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025

  38. arXiv:2506.21967  [pdf, ps, other]

    cs.CL cs.LG

    More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents

    Authors: Weimin Xiong, Ke Wang, Yifan Song, Hanchao Liu, Sai Zhou, Wei Peng, Sujian Li

    Abstract: Current evaluations of tool-integrated LLM agents typically focus on end-to-end tool-usage evaluation while neglecting their stability. This limits their real-world applicability, as various internal or external factors can cause agents to crash or behave abnormally. Our research addresses this by investigating whether agents are vulnerable to errors throughout the entire tool invocation process,… ▽ More

    Submitted 27 June, 2025; originally announced June 2025.

  39. arXiv:2506.21580  [pdf]

    cs.CL cs.AI cs.CY

    From General Reasoning to Domain Expertise: Uncovering the Limits of Generalization in Large Language Models

    Authors: Dana Alsagheer, Yang Lu, Abdulrahman Kamal, Omar Kamal, Mohammad Kamal, Nada Mansour, Cosmo Yang Wu, Rambiba Karanjai, Sen Li, Weidong Shi

    Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains. However, effective decision-making relies heavily on strong reasoning abilities. Reasoning is the foundation for decision-making, providing the analytical and logical framework to make sound choices. Reasoning involves analyzing information, drawing inferences, and reaching conclusions… ▽ More

    Submitted 16 June, 2025; originally announced June 2025.

  40. arXiv:2506.21577  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR

    Authors: Hongli Yang, Sheng Li, Hao Huang, Ayiduosi Tuohan, Yizhou Peng

    Abstract: Recent advancements in multilingual automatic speech recognition (ASR) have been driven by large-scale end-to-end models like Whisper. However, challenges such as language interference and expanding to unseen languages (language expansion) without degrading performance persist. This paper addresses these with three contributions: 1) Entire Soft Prompt Tuning (Entire SPT), which applies soft prompt… ▽ More

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  41. arXiv:2506.21576  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning

    Authors: Hongli Yang, Yizhou Peng, Hao Huang, Sheng Li

    Abstract: Large-scale multilingual ASR models like Whisper excel in high-resource settings but face challenges in low-resource scenarios, such as rare languages and code-switching (CS), due to computational costs and catastrophic forgetting. We explore Soft Prompt Tuning (SPT), a parameter-efficient method to enhance CS ASR while preserving prior knowledge. We evaluate two strategies: (1) full fine-tuning (… ▽ More

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  42. arXiv:2506.21559  [pdf, ps, other]

    cs.CL

    GraphLAMA: Enabling Efficient Adaptation of Graph Language Models with Limited Annotations

    Authors: Junze Chen, Cheng Yang, Shujie Li, Zhiqiang Zhang, Yawen Li, Junping Du, Chuan Shi

    Abstract: Large language models (LLMs) have demonstrated their strong capabilities in various domains, and have been recently integrated for graph analysis as graph language models (GLMs). With LLMs as the predictor, some GLMs can interpret unseen tasks described by natural language, and learn from a few examples in the prompts without parameter tuning, known as in-context learning (ICL). Another subset of… ▽ More

    Submitted 11 June, 2025; originally announced June 2025.

  43. arXiv:2506.21545  [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.PF

    Data Efficacy for Language Model Training

    Authors: Yalun Dai, Yangyu Huang, Xin Zhang, Wenshan Wu, Chong Li, Wenhui Lu, Shijie Cao, Li Dong, Scarlett Li

    Abstract: Data is fundamental to the training of language models (LM). Recent research has been dedicated to data efficiency, which aims to maximize performance by selecting a minimal or optimal subset of training data. Techniques such as data filtering, sampling, and selection play a crucial role in this area. To complement it, we define Data Efficacy, which focuses on maximizing performance by optimizing…

    Submitted 26 June, 2025; originally announced June 2025.

  44. arXiv:2506.21513  [pdf, ps, other]

    cs.CV

    GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

    Authors: Wentao Hu, Shunkai Li, Ziqiao Peng, Haoxian Zhang, Fan Shi, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Hui Tian

    Abstract: Creating high-quality, generalizable speech-driven 3D talking heads remains a persistent challenge. Previous methods achieve satisfactory results for fixed viewpoints and small-scale audio variations, but they struggle with large head rotations and out-of-distribution (OOD) audio. Moreover, they are constrained by the need for time-consuming, identity-specific training. We believe the core issue l…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: ICCV 2025, Project page: https://vincenthu19.github.io/GGTalker/

  45. arXiv:2506.21140  [pdf, ps, other]

    cs.LG cs.AI

    DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding

    Authors: Ziwei Wang, Hongbin Wang, Tianwang Jia, Xingyi He, Siyang Li, Dongrui Wu

    Abstract: Electroencephalography (EEG)-based brain-computer interfaces (BCIs) transform spontaneous/evoked neural activity into control commands for external communication. While convolutional neural networks (CNNs) remain the mainstream backbone for EEG decoding, their inherently short receptive field makes it difficult to capture long-range temporal dependencies and global inter-channel relationships. Rec…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 12 pages, 6 figures

  46. arXiv:2506.21107  [pdf, ps, other]

    cs.LG q-bio.MN

    Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges

    Authors: Changxi Chi, Jun Xia, Yufei Huang, Jingbo Zhou, Siyuan Li, Yunfan Liu, Chang Yu, Stan Z. Li

    Abstract: Estimating single-cell responses across various perturbations facilitates the identification of key genes and enhances drug screening, significantly boosting experimental efficiency. However, single-cell sequencing is a destructive process, making it impossible to capture the same cell's phenotype before and after perturbation. Consequently, data collected under perturbed and unperturbed condition…

    Submitted 26 June, 2025; originally announced June 2025.

  47. arXiv:2506.21076  [pdf, ps, other]

    cs.CV

    PoseMaster: Generating 3D Characters in Arbitrary Poses from a Single Image

    Authors: Hongyu Yan, Kunming Luo, Weiyu Li, Yixun Liang, Shengming Li, Jingwei Huang, Chunchao Guo, Ping Tan

    Abstract: 3D characters play a crucial role in our daily entertainment. To improve the efficiency of 3D character modeling, recent image-based methods use two separate models to achieve pose standardization and 3D reconstruction of the A-pose character. However, these methods are prone to generating distorted and degraded images in the pose standardization stage due to self-occlusion and viewpoints, which f…

    Submitted 26 June, 2025; originally announced June 2025.

  48. arXiv:2506.21017  [pdf, ps, other]

    cs.CV cs.AI

    Multimodal Prompt Alignment for Facial Expression Recognition

    Authors: Fuyan Ma, Yiran He, Bin Sun, Shutao Li

    Abstract: Prompt learning has been widely adopted to efficiently adapt vision-language models (VLMs) like CLIP for various downstream tasks. Despite their success, current VLM-based facial expression recognition (FER) methods struggle to capture fine-grained textual-visual relationships, which are essential for distinguishing subtle differences between facial expressions. To address this challenge, we propo…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: To appear in ICCV2025

  49. arXiv:2506.20977  [pdf, ps, other]

    cs.CV cs.AI

    From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging

    Authors: Tao Liu, Dafeng Zhang, Gengchen Li, Shizhuo Liu, Yongqi Song, Senmao Li, Shiqi Yang, Boqian Li, Kai Wang, Yaxing Wang

    Abstract: Face aging has become a crucial task in computer vision, with applications ranging from entertainment to healthcare. However, existing methods struggle with achieving a realistic and seamless transformation across the entire lifespan, especially when handling large age gaps or extreme head poses. The core challenge lies in balancing age accuracy and identity preservation--what we refer to as the A…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 30 pages, 12 figures

  50. arXiv:2506.20850  [pdf, ps, other]

    cs.CV

    Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision

    Authors: Yuting He, Shuo Li

    Abstract: Contrastive learning (CL) has become a cornerstone of self-supervised pretraining (SSP) in foundation models; however, extending CL to pixel-wise representation, crucial for medical vision, remains an open problem. Standard CL formulates SSP as a binary optimization problem (binary CL) where the excessive pursuit of feature dispersion leads to an over-dispersion problem, breaking pixel-wise featur…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025