Showing 1–50 of 7,302 results for author: Li, J

Searching in archive cs.
  1. arXiv:2505.10554  [pdf, ps, other]

    cs.CL

    Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

    Authors: Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li

    Abstract: Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. Prior work has shown that outcome-based reinforcement learning (RL) can incidentally elicit advanced reasoning behaviors such as self-correction, backtracking, and verification, phenomena often referred to as the model's "aha moment". However, the timing and consistency of these emergent behaviors r…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: In Progress

  2. arXiv:2505.10352  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity

    Authors: Shihao Zou, Qingfeng Li, Wei Ji, Jingjing Li, Yongkui Yang, Guoqi Li, Chao Dong

    Abstract: Spiking Neural Networks (SNNs) have shown competitive performance to Artificial Neural Networks (ANNs) in various vision tasks, while offering superior energy efficiency. However, existing SNN-based Transformers primarily focus on single-image tasks, emphasizing spatial features while not effectively leveraging SNNs' efficiency in video-based vision tasks. In this paper, we introduce SpikeVideoFor…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  3. arXiv:2505.10282  [pdf, other]

    cs.CL

    From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making

    Authors: Dubai Li, Nan Jiang, Kangping Huang, Ruiqi Tu, Shuyu Ouyang, Huayu Yu, Lin Qiao, Chen Yu, Tianshu Zhou, Danyang Tong, Qian Wang, Mengtao Li, Xiaofeng Zeng, Yu Tian, Xinping Tian, Jingsong Li

    Abstract: Clinical evidence, derived from rigorous research and data analysis, provides healthcare professionals with reliable scientific foundations for informed decision-making. Integrating clinical evidence into real-time practice is challenging due to the enormous workload, complex professional processes, and time constraints. This highlights the need for tools that automate evidence synthesis to suppor…

    Submitted 15 May, 2025; originally announced May 2025.

  4. arXiv:2505.10244  [pdf, ps, other]

    cs.DS

    Simpler and Faster Directed Low-Diameter Decompositions

    Authors: Jason Li

    Abstract: We present a simpler and faster algorithm for low-diameter decompositions on directed graphs, matching the $O(\log m\log\log m)$ loss factor from Bringmann, Fischer, Haeupler, and Latypov (ICALP 2025) and improving the running time to $O((m+n\log\log n)\log^2m\log\log m)$.

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 9 pages

  5. arXiv:2505.10040  [pdf, ps, other]

    cs.LG

    Instance-Prototype Affinity Learning for Non-Exemplar Continual Graph Learning

    Authors: Lei Song, Jiaxing Li, Shihan Guan, Youyong Kong

    Abstract: Graph Neural Networks (GNN) endure catastrophic forgetting, undermining their capacity to preserve previously acquired knowledge amid the assimilation of novel information. Rehearsal-based techniques revisit historical examples, adopted as a principal strategy to alleviate this phenomenon. However, memory explosion and privacy infringements impose significant constraints on their utility. Non-Exem…

    Submitted 15 May, 2025; originally announced May 2025.

  6. arXiv:2505.09943  [pdf, ps, other]

    cs.CV

    CSPENet: Contour-Aware and Saliency Priors Embedding Network for Infrared Small Target Detection

    Authors: Jiakun Deng, Kexuan Li, Xingye Cui, Jiaxuan Li, Chang Long, Tian Pu, Zhenming Peng

    Abstract: Infrared small target detection (ISTD) plays a critical role in a wide range of civilian and military applications. Existing methods suffer from deficiencies in the localization of dim targets and the perception of contour information under dense clutter environments, severely limiting their detection performance. To tackle these issues, we propose a contour-aware and saliency priors embedding net…

    Submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2505.09925  [pdf, ps, other]

    cs.LG cs.AI

    Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback

    Authors: Yutao Yang, Jie Zhou, Junsong Li, Qianjun Pan, Bihao Zhan, Qin Chen, Xipeng Qiu, Liang He

    Abstract: This paper introduces an interactive continual learning paradigm where AI models dynamically learn new skills from real-time human feedback while retaining prior knowledge. This paradigm distinctively addresses two major limitations of traditional continual learning: (1) dynamic model updates using streaming, real-time human-annotated data, rather than static datasets with fixed labels, and (2) th…

    Submitted 14 May, 2025; originally announced May 2025.

  8. arXiv:2505.09783  [pdf]

    stat.AP cs.LG

    Pure Component Property Estimation Framework Using Explainable Machine Learning Methods

    Authors: Jianfeng Jiao, Xi Gao, Jie Li

    Abstract: Accurate prediction of pure component physicochemical properties is crucial for process integration, multiscale modeling, and optimization. In this work, an enhanced framework for pure component property prediction by using explainable machine learning methods is proposed. In this framework, the molecular representation method based on the connectivity matrix effectively considers atomic bonding re…

    Submitted 14 May, 2025; originally announced May 2025.

  9. arXiv:2505.09343  [pdf, ps, other]

    cs.DC cs.AI cs.AR

    Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

    Authors: Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei

    Abstract: The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inferen…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version will appear as part of the Industry Track in Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA '25)

  10. arXiv:2505.09193  [pdf, other]

    eess.IV cs.CV

    BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression

    Authors: Wei Jiang, Junru Li, Kai Zhang, Li Zhang

    Abstract: Recent forward prediction-based learned video compression (LVC) methods have achieved impressive results, even surpassing VVC reference software VTM under the Low Delay B (LDB) configuration. In contrast, learned bidirectional video compression (BVC) remains underexplored and still lags behind its forward-only counterparts. This performance gap is mainly due to the limited ability to extract diver…

    Submitted 14 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

    Comments: The first learned video codec that surpasses VTM 13.2 RA across all standard test datasets. Code will be available at https://github.com/JiangWeibeta/ECVC

  11. arXiv:2505.09074  [pdf, other]

    cs.RO

    Deployable and Generalizable Motion Prediction: Taxonomy, Open Challenges and Future Directions

    Authors: Letian Wang, Marc-Antoine Lavoie, Sandro Papais, Barza Nisar, Yuxiao Chen, Wenhao Ding, Boris Ivanovic, Hao Shao, Abulikemu Abuduweili, Evan Cook, Yang Zhou, Peter Karkus, Jiachen Li, Changliu Liu, Marco Pavone, Steven Waslander

    Abstract: Motion prediction, the anticipation of future agent states or scene evolution, is rooted in human cognition, bridging perception and decision-making. It enables intelligent systems, such as robots and self-driving cars, to act safely in dynamic, human-involved environments, and informs broader time-series reasoning challenges. With advances in methods, representations, and datasets, the field has…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Initial draft, 162 pages, 40 figures, 13 tables

  12. arXiv:2505.08854  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    Generative AI for Autonomous Driving: Frontiers and Opportunities

    Authors: Yuping Wang, Shuo Xing, Cui Can, Renjie Li, Hongyuan Hua, Kexin Tian, Zhaobin Mo, Xiangbo Gao, Keshu Wu, Sulong Zhou, Hengxu You, Juntong Peng, Junge Zhang, Zehao Wang, Rui Song, Mingxuan Yan, Walter Zimmer, Xingcheng Zhou, Peiran Li, Zhaohan Lu, Chia-Ju Chen, Yue Huang, Ryan A. Rossi, Lichao Sun, Hongkai Yu, et al. (22 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This revolutionary force offers the most promising path yet toward solving one of engineering's grandest challenges: achieving reliable, fully autonomous driving, partic…

    Submitted 13 May, 2025; originally announced May 2025.

  13. arXiv:2505.08617  [pdf, ps, other]

    cs.CV

    OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

    Authors: Zhaochen Su, Linjie Li, Mingyang Song, Yunzhuo Hao, Zhengyuan Yang, Jun Zhang, Guanjie Chen, Jiawei Gu, Juntao Li, Xiaoye Qu, Yu Cheng

    Abstract: While humans can flexibly leverage interactive visual cognition for complex problem-solving, enabling Large Vision-Language Models (LVLMs) to learn similarly adaptive behaviors with visual tools remains challenging. A significant hurdle is the current lack of standardized infrastructure, which hinders integrating diverse tools, generating rich interaction data, and training robust agents effective…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Work in progress

  14. arXiv:2505.08601  [pdf, other]

    cs.CV cond-mat.mtrl-sci

    Rejoining fragmented ancient bamboo slips with physics-driven deep learning

    Authors: Jinchi Zhu, Zhou Zhao, Hailong Lei, Xiaoguang Wang, Jialiang Lu, Jing Li, Qianqian Tang, Jiachen Shen, Gui-Song Xia, Bo Du, Yongchao Xu

    Abstract: Bamboo slips are a crucial medium for recording ancient civilizations in East Asia, and offer invaluable archaeological insights for reconstructing the Silk Road, studying material culture exchanges, and global history. However, many excavated bamboo slips have been fragmented into thousands of irregular pieces, making their rejoining a vital yet challenging step for understanding their content.…

    Submitted 13 May, 2025; originally announced May 2025.

  15. arXiv:2505.08167  [pdf]

    cs.CL cs.AI

    Fusing Bidirectional Chains of Thought and Reward Mechanisms: A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage

    Authors: Ruilin Liu, Zhixiao Zhao, Jieqiong Li, Chang Liu, Dongbo Wang

    Abstract: The rapid development of large language models (LLMs) has provided significant support and opportunities for the advancement of domain-specific LLMs. However, fine-tuning these large models using Intangible Cultural Heritage (ICH) data inevitably faces challenges such as bias, incorrect knowledge inheritance, and catastrophic forgetting. To address these issues, we propose a novel training method…

    Submitted 13 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: 22 pages, 5 figures

  16. arXiv:2505.08162  [pdf, ps, other]

    cs.CR

    GDNTT: an Area-Efficient Parallel NTT Accelerator Using Glitch-Driven Near-Memory Computing and Reconfigurable 10T SRAM

    Authors: Hengyu Ding, Houran Ji, Jia Li, Jinhang Chen, Chin-Wing Sham, Yao Wang

    Abstract: With the rapid advancement of quantum computing technology, post-quantum cryptography (PQC) has emerged as a pivotal direction for next-generation encryption standards. Among these, lattice-based cryptographic schemes rely heavily on the fast Number Theoretic Transform (NTT) over polynomial rings, whose performance directly determines encryption/decryption throughput and energy efficiency. However…

    Submitted 12 May, 2025; originally announced May 2025.

  17. arXiv:2505.08159  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Enhancing the Efficiency of Complex Systems Crystal Structure Prediction by Active Learning Guided Machine Learning Potential

    Authors: Jiaxiang Li, Junwei Feng, Jie Luo, Bowen Jiang, Xiangyu Zheng, Jian Lv, Keith Butler, Hanyu Liu, Congwei Xie, Yu Xie, Yanming Ma

    Abstract: Understanding multicomponent complex material systems is essential for the design of advanced materials for a wide range of technological applications. While state-of-the-art crystal structure prediction (CSP) methods effectively identify new structures and assess phase stability, they face fundamental limitations when applied to complex systems. This challenge stems from the combinatorial explosion o…

    Submitted 12 May, 2025; originally announced May 2025.

  18. arXiv:2505.07895  [pdf, ps, other]

    cs.LG cs.AI

    Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks

    Authors: Jiafan Li, Jiaqi Zhu, Liang Chang, Yilin Li, Miaomiao Li, Yang Wang, Hongan Wang

    Abstract: Nowadays, numerous online platforms can be described as multi-modal heterogeneous networks (MMHNs), such as Douban's movie networks and Amazon's product review networks. Accurately categorizing nodes within these networks is crucial for analyzing the corresponding entities, which requires effective representation learning on nodes. However, existing multi-modal fusion methods often adopt either ea…

    Submitted 11 May, 2025; originally announced May 2025.

  19. arXiv:2505.07850  [pdf, other]

    cs.CL cs.AI cs.CY

    A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas

    Authors: Pranav Narayanan Venkit, Jiayi Li, Yingfan Zhou, Sarah Rajtmajer, Shomir Wilson

    Abstract: As LLMs (large language models) are increasingly used to generate synthetic personas, particularly in data-limited domains such as health, privacy, and HCI, it becomes necessary to understand how these narratives represent identity, especially that of minority communities. In this paper, we audit synthetic personas generated by 3 LLMs (GPT4o, Gemini 1.5 Pro, Deepseek 2.5) through the lens of repres…

    Submitted 7 May, 2025; originally announced May 2025.

  20. arXiv:2505.07845  [pdf, other]

    cs.RO

    PierGuard: A Planning Framework for Underwater Robotic Inspection of Coastal Piers

    Authors: Pengyu Wang, Hin Wang Lin, Jialu Li, Jiankun Wang, Ling Shi, Max Q.-H. Meng

    Abstract: Using underwater robots instead of humans for the inspection of coastal piers can enhance efficiency while reducing risks. A key challenge in performing these tasks lies in achieving efficient and rapid path planning within complex environments. Sampling-based path planning methods, such as Rapidly-exploring Random Tree* (RRT*), have demonstrated notable performance in high-dimensional spaces. In…

    Submitted 6 May, 2025; originally announced May 2025.

  21. arXiv:2505.07779  [pdf, other]

    cs.RO

    Multi-Agent Path Finding via Finite-Horizon Hierarchical Factorization

    Authors: Jiarui Li, Alessandro Zanardi, Gioele Zardini

    Abstract: We present a novel algorithm for large-scale Multi-Agent Path Finding (MAPF) that enables fast, scalable planning in dynamic environments such as automated warehouses. Our approach introduces finite-horizon hierarchical factorization, a framework that plans one step at a time in a receding-horizon fashion. Robots first compute individual plans in parallel, and then dynamically group based on spati…

    Submitted 12 May, 2025; originally announced May 2025.

  22. arXiv:2505.07690  [pdf, ps, other]

    cs.CV

    Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models

    Authors: Songlin Dong, Chenhao Ding, Jiangyang Li, Jizhou Han, Qiang Wang, Yuhang He, Yihong Gong

    Abstract: This study aims to address the problem of multi-domain task incremental learning (MTIL), which requires that vision-language models (VLMs) continuously acquire new knowledge while maintaining their inherent zero-shot recognition capability. Existing paradigms delegate the testing of unseen-domain samples to the original CLIP, which only prevents the degradation of the model's zero-shot capability…

    Submitted 12 May, 2025; originally announced May 2025.

  23. arXiv:2505.07431  [pdf, ps, other]

    cs.IR

    Diffusion-driven SpatioTemporal Graph KANsformer for Medical Examination Recommendation

    Authors: Jianan Li, Yangtao Zhou, Zhifu Zhao, Qinglan Huang, Jian Qi, Xiao He, Hua Chu, Fu Li

    Abstract: Recommendation systems in AI-based medical diagnostics and treatment constitute a critical component of AI in healthcare. Although some studies have explored this area and made notable progress, healthcare recommendation systems remain in their nascent stage. These studies mainly target the treatment process, such as drug or disease recommendations. In addition to the treatment process, the…

    Submitted 12 May, 2025; originally announced May 2025.

  24. arXiv:2505.07294  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    HuB: Learning Extreme Humanoid Balance

    Authors: Tong Zhang, Boyuan Zheng, Ruiqian Nai, Yingdong Hu, Yen-Jen Wang, Geng Chen, Fanqi Lin, Jiongye Li, Chuye Hong, Koushil Sreenath, Yang Gao

    Abstract: The human body demonstrates exceptional motor capabilities, such as standing steadily on one foot or performing a high kick with the leg raised over 1.5 meters, both requiring precise balance control. While recent research on humanoid control has leveraged reinforcement learning to track human motions for skill acquisition, applying this paradigm to balance-intensive tasks remains challenging. In th…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project website: https://hub-robot.github.io

  25. arXiv:2505.07260  [pdf, ps, other]

    cs.LG cs.AI

    UMoE: Unifying Attention and FFN with Shared Experts

    Authors: Yuanhang Yang, Chaozheng Wang, Jing Li

    Abstract: Sparse Mixture of Experts (MoE) architectures have emerged as a promising approach for scaling Transformer models. While initial works primarily incorporated MoE into feed-forward network (FFN) layers, recent studies have explored extending the MoE paradigm to attention layers to enhance model performance. However, existing attention-based MoE layers require specialized implementations and demonst…

    Submitted 12 May, 2025; originally announced May 2025.

  26. Generalizable Pancreas Segmentation via a Dual Self-Supervised Learning Framework

    Authors: Jun Li, Hongzhang Zhu, Tao Chen, Xiaohua Qian

    Abstract: Recently, numerous pancreas segmentation methods have achieved promising performance on local single-source datasets. However, these methods don't adequately account for generalizability issues, and hence typically show limited performance and low stability on test data from other sources. Considering the limited availability of distinct data sources, we seek to improve the generalization performa…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE JBHI. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  27. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  28. arXiv:2505.07045  [pdf, ps, other]

    cs.LG cs.AI physics.ao-ph

    Reinforcement Learning (RL) Meets Urban Climate Modeling: Investigating the Efficacy and Impacts of RL-Based HVAC Control

    Authors: Junjie Yu, John S. Schreck, David John Gagne, Keith W. Oleson, Jie Li, Yongtu Liang, Qi Liao, Mingfei Sun, David O. Topping, Zhonghua Zheng

    Abstract: Reinforcement learning (RL)-based heating, ventilation, and air conditioning (HVAC) control has emerged as a promising technology for reducing building energy consumption while maintaining indoor thermal comfort. However, the efficacy of such strategies is influenced by the background climate and their implementation may potentially alter both the indoor climate and local urban climate. This study…

    Submitted 11 May, 2025; originally announced May 2025.

  29. arXiv:2505.06861  [pdf, other]

    cs.RO cs.AI cs.CV

    Efficient Robotic Policy Learning via Latent Space Backward Planning

    Authors: Dongxiu Liu, Haoyi Niu, Zhihao Wang, Jinliang Zheng, Yinan Zheng, Zhonghong Ou, Jianming Hu, Jianxiong Li, Xianyuan Zhan

    Abstract: Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grai…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  30. arXiv:2505.06858  [pdf, other]

    cs.LG

    FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers

    Authors: Tianyu Chen, Haoyi Zhou, Ying Li, Hao Wang, Zhenzhe Zhang, Tianchen Zhu, Shanghang Zhang, Jianxin Li

    Abstract: Fourier Neural Operators (FNO) have emerged as promising solutions for efficiently solving partial differential equations (PDEs) by learning infinite-dimensional function mappings through frequency domain transformations. However, the sparsity of high-frequency signals limits computational efficiency for high-dimensional inputs, and fixed-pattern truncation often causes high-frequency signal loss,…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  31. arXiv:2505.06684  [pdf, other]

    cs.CV cs.AI

    FNBench: Benchmarking Robust Federated Learning against Noisy Labels

    Authors: Xuefeng Jiang, Jia Li, Nannan Wu, Zhiyuan Wu, Xujing Li, Sheng Sun, Gang Xu, Yuwei Wang, Qi Li, Min Liu

    Abstract: Robustness to label noise within data is a significant challenge in federated learning (FL). From the data-centric perspective, the data quality of distributed datasets cannot be guaranteed since annotations of different clients contain complicated label noise of varying degrees, which causes performance degradation. There have been some early attempts to tackle noisy labels in FL. However, t…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: Submitted to IEEE TDSC, currently under major revision

  32. arXiv:2505.06482  [pdf, ps, other]

    cs.LG cs.AI cs.RO

    Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach

    Authors: Minting Pan, Yitao Zheng, Jiajian Li, Yunbo Wang, Xiaokang Yang

    Abstract: Offline reinforcement learning (RL) enables policy optimization in static datasets, avoiding the risks and costs of real-world exploration. However, it struggles with suboptimal behavior learning and inaccurate value estimation due to the lack of environmental interaction. In this paper, we present Video-Enhanced Offline RL (VeoRL), a model-based approach that constructs an interactive world model…

    Submitted 9 May, 2025; originally announced May 2025.

  33. arXiv:2505.06291  [pdf, ps, other]

    eess.SP cs.CE cs.HC cs.LG

    ALFEE: Adaptive Large Foundation Model for EEG Representation

    Authors: Wei Xiong, Junming Lin, Jiangtong Li, Jie Li, Changjun Jiang

    Abstract: While foundation models excel in text, image, and video domains, the critical biological signals, particularly electroencephalography (EEG), remain underexplored. EEG benefits neurological research with its high temporal resolution, operational practicality, and safety profile. However, low signal-to-noise ratio, inter-subject variability, and cross-paradigm differences hinder the generalization of…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 17 pages, 17 figures

  34. arXiv:2505.05869  [pdf]

    cs.LG cs.AI physics.comp-ph

    Generative Discovery of Partial Differential Equations by Learning from Math Handbooks

    Authors: Hao Xu, Yuntian Chen, Rui Cao, Tianning Tang, Mengge Du, Jian Li, Adrian H. Callaghan, Dongxiao Zhang

    Abstract: Data-driven discovery of partial differential equations (PDEs) is a promising approach for uncovering the underlying laws governing complex systems. However, purely data-driven techniques face the dilemma of balancing search space with optimization efficiency. This study introduces a knowledge-guided approach that incorporates existing PDEs documented in a mathematical handbook to facilitate the d…

    Submitted 9 May, 2025; originally announced May 2025.

  35. arXiv:2505.05853  [pdf, other]

    cs.CV

    PICD: Versatile Perceptual Image Compression with Diffusion Rendering

    Authors: Tongda Xu, Jiahao Li, Bin Li, Yan Wang, Ya-Qin Zhang, Yan Lu

    Abstract: Recently, perceptual image compression has achieved significant advancements, delivering high visual quality at low bitrates for natural images. However, for screen content, existing methods often produce noticeable artifacts when compressing text. To tackle this challenge, we propose versatile perceptual screen image compression with diffusion rendering (PICD), a codec that works well for both sc…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  36. arXiv:2505.05713  [pdf, ps, other]

    cs.DC cs.LG

    Understanding Stragglers in Large Model Training Using What-if Analysis

    Authors: Jinkun Lin, Ziheng Jiang, Zuquan Song, Sida Zhao, Menghan Yu, Zhanghan Wang, Chenyuan Wang, Zuocheng Shi, Xiang Shi, Wei Jia, Zherui Liu, Shuguang Wang, Haibin Lin, Xin Liu, Aurojit Panda, Jinyang Li

    Abstract: Large language model (LLM) training is one of the most demanding distributed computations today, often requiring thousands of GPUs with frequent synchronization across machines. Such a workload pattern makes it susceptible to stragglers, where the training can be stalled by a few slow workers. At ByteDance we find stragglers are not trivially always caused by hardware failures, but can arise from mu…

    Submitted 12 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  37. arXiv:2505.05315  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Scalable Chain of Thoughts via Elastic Reasoning

    Authors: Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong

    Abstract: Large reasoning models (LRMs) have achieved remarkable progress on complex tasks by generating extended chains of thought (CoT). However, their uncontrolled output lengths pose significant challenges for real-world deployment, where inference-time budgets on tokens, latency, or compute are strictly constrained. We propose Elastic Reasoning, a novel framework for scalable chain of thoughts that exp…

    Submitted 8 May, 2025; originally announced May 2025.

  38. arXiv:2505.05071  [pdf, other]

    cs.CV cs.AI

    FG-CLIP: Fine-Grained Visual and Textual Alignment

    Authors: Chunyu Xie, Bin Wang, Fanjing Kong, Jincheng Li, Dawei Liang, Gengshen Zhang, Dawei Leng, Yuhui Yin

    Abstract: Contrastive Language-Image Pre-training (CLIP) excels in multimodal tasks such as image-text retrieval and zero-shot classification but struggles with fine-grained understanding due to its focus on coarse-grained short captions. To address this, we propose Fine-Grained CLIP (FG-CLIP), which enhances fine-grained understanding through three key innovations. First, we leverage large multimodal model…

    Submitted 13 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted at ICML 2025

  39. arXiv:2505.05041  [pdf, other]

    eess.IV cs.CV

    ADNP-15: An Open-Source Histopathological Dataset for Neuritic Plaque Segmentation in Human Brain Whole Slide Images with Frequency Domain Image Enhancement for Stain Normalization

    Authors: Chenxi Zhao, Jianqiang Li, Qing Zhao, Jing Bai, Susana Boluda, Benoit Delatour, Lev Stimmer, Daniel Racoceanu, Gabriel Jimenez, Guanghui Fu

    Abstract: Alzheimer's Disease (AD) is a neurodegenerative disorder characterized by amyloid-beta plaques and tau neurofibrillary tangles, which serve as key histopathological features. The identification and segmentation of these lesions are crucial for understanding AD progression but remain challenging due to the lack of large-scale annotated datasets and the impact of staining variations on automated ima…

    Submitted 8 May, 2025; originally announced May 2025.

  40. arXiv:2505.05034  [pdf, other]

    cs.LG stat.ML

    Dequantified Diffusion Schrödinger Bridge for Density Ratio Estimation

    Authors: Wei Chen, Shigui Li, Jiacheng Li, Junmei Yang, John Paisley, Delu Zeng

    Abstract: Density ratio estimation is fundamental to tasks involving $f$-divergences, yet existing methods often fail under significantly different distributions or inadequately overlapping supports, suffering from the \textit{density-chasm} and the \textit{support-chasm} problems. Additionally, prior approaches yield divergent time scores near boundaries, leading to instability. We propose…

    Submitted 8 May, 2025; originally announced May 2025.

    Journal ref: ICML 2025: Proceedings of the 42nd International Conference on Machine Learning, 2025

  41. arXiv:2505.04996  [pdf, other]

    cs.GR cs.CV cs.SD eess.AS

    Inter-Diffusion Generation Model of Speakers and Listeners for Effective Communication

    Authors: Jinhe Huang, Yongkang Cheng, Yuming Hang, Gaoge Han, Jinewei Li, Jing Zhang, Xingjian Gu

    Abstract: Full-body gestures play a pivotal role in natural interactions and are crucial for achieving effective communication. Nevertheless, most existing studies primarily focus on the gesture generation of speakers, overlooking the vital role of listeners in the interaction process and failing to fully explore the dynamic interaction between them. This paper innovatively proposes an Inter-Diffusion Gener…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by ICMR 2025

  42. arXiv:2505.04941  [pdf, other]

    cs.CV

    Building-Guided Pseudo-Label Learning for Cross-Modal Building Damage Mapping

    Authors: Jiepan Li, He Huang, Yu Sheng, Yujun Guo, Wei He

    Abstract: Accurate building damage assessment using bi-temporal multi-modal remote sensing images is essential for effective disaster response and recovery planning. This study proposes a novel Building-Guided Pseudo-Label Learning Framework to address the challenges of mapping building damage from pre-disaster optical and post-disaster SAR images. First, we train a series of building extraction models usin…

    Submitted 8 May, 2025; originally announced May 2025.

  43. arXiv:2505.04620  [pdf, other]

    cs.CV

    On Path to Multimodal Generalist: General-Level and General-Bench

    Authors: Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, et al. (7 additional authors not shown)

    Abstract: The Multimodal Large Language Model (MLLM) is currently experiencing rapid growth, driven by the advanced capabilities of LLMs. Unlike earlier specialists, existing MLLMs are evolving towards a Multimodal Generalist paradigm. Initially limited to understanding multiple modalities, these models have advanced to not only comprehend but also generate across modalities. Their capabilities have expande…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: ICML'25, 305 pages, 115 tables, 177 figures, project page: https://generalist.top/

  44. arXiv:2505.04612  [pdf, other]

    cs.CV

    FastMap: Revisiting Dense and Scalable Structure from Motion

    Authors: Jiahao Li, Haochen Wang, Muhammad Zubair Irshad, Igor Vasiljevic, Matthew R. Walter, Vitor Campagnolo Guizilini, Greg Shakhnarovich

    Abstract: We propose FastMap, a new global structure from motion method focused on speed and simplicity. Previous methods like COLMAP and GLOMAP are able to estimate high-precision camera poses, but suffer from poor scalability when the number of matched keypoint pairs becomes large. We identify two key factors leading to this problem: poor parallelization and computationally expensive optimization steps. T…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: Project webpage: https://jiahao.ai/fastmap

  45. arXiv:2505.04519  [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r…

    Submitted 7 May, 2025; originally announced May 2025.

  46. arXiv:2505.04481  [pdf, other]

    cs.CV

    CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation

    Authors: Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, Xiangdong Zhou

    Abstract: Recently, Large Language Models (LLMs) have achieved significant success, prompting increased interest in expanding their generative capabilities beyond general text into domain-specific areas. This study investigates the generation of parametric sequences for computer-aided design (CAD) models using LLMs. This endeavor represents an initial step towards creating parametric 3D shapes with LLMs, as…

    Submitted 7 May, 2025; originally announced May 2025.

  47. arXiv:2505.04480  [pdf, ps, other]

    cs.AI cs.NE cs.RO

    TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution

    Authors: Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park

    Abstract: Trajectory prediction is a crucial task in modeling human behavior, especially in fields such as social robotics and autonomous vehicle navigation. Traditional heuristics based on handcrafted rules often lack accuracy, while recently proposed deep learning approaches suffer from computational cost, lack of explainability, and generalization issues that limit their practical adoption. In this paper, we…

    Submitted 7 May, 2025; originally announced May 2025.

  48. arXiv:2505.04306  [pdf, ps, other]

    cs.CV

    MoDE: Mixture of Diffusion Experts for Any Occluded Face Recognition

    Authors: Qiannan Fan, Zhuoyang Li, Jitong Li, Chenyang Cao

    Abstract: With the continuous impact of epidemics, people have become accustomed to wearing masks. However, most current occluded face recognition (OFR) algorithms lack prior knowledge of occlusions, resulting in poor performance when dealing with occluded faces of varying types and severity in reality. Recognizing occluded faces is still a significant challenge, which greatly affects the convenience of peo…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 8 pages, 7 figures

    ACM Class: I.4.8; I.5.4; I.2.10

  49. arXiv:2505.04281  [pdf, other]

    cs.CV eess.IV

    TS-Diff: Two-Stage Diffusion Model for Low-Light RAW Image Enhancement

    Authors: Yi Li, Zhiyuan Zhang, Jiangnan Xia, Jianghan Cheng, Qilong Wu, Junwei Li, Yibin Tian, Hui Kong

    Abstract: This paper presents a novel Two-Stage Diffusion Model (TS-Diff) for enhancing extremely low-light RAW images. In the pre-training stage, TS-Diff synthesizes noisy images by constructing multiple virtual cameras based on a noise space. Camera Feature Integration (CFI) modules are then designed to enable the model to learn generalizable features across diverse virtual cameras. During the aligning st…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: International Joint Conference on Neural Networks (IJCNN)

  50. arXiv:2505.04276  [pdf, ps, other]

    cs.CV cs.MM

    HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation

    Authors: Yajie Fu, Chaorui Huang, Junwei Li, Hui Kong, Yibin Tian, Huakang Li, Zhiyuan Zhang

    Abstract: We propose HDiffTG, a novel 3D Human Pose Estimation (3DHPE) method that integrates Transformer, Graph Convolutional Network (GCN), and diffusion model into a unified framework. HDiffTG leverages the strengths of these techniques to significantly improve pose estimation accuracy and robustness while maintaining a lightweight design. The Transformer captures global spatiotemporal dependencies, the…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 8 pages, 4 figures, International Joint Conference on Neural Networks (IJCNN)