Showing 1–50 of 1,156 results for author: Zhenyu

Searching in archive cs.
  1. arXiv:2506.21812 [pdf, ps, other]

    cs.CL cs.CV

    Towards Transparent AI: A Survey on Explainable Large Language Models

    Authors: Avash Palikhe, Zhenyu Yu, Zichong Wang, Wenbin Zhang

    Abstract: Large Language Models (LLMs) have played a pivotal role in advancing Artificial Intelligence (AI). However, despite their achievements, LLMs often struggle to explain their decision-making processes, making them a 'black box' and presenting a substantial challenge to explainability. This lack of transparency poses a significant obstacle to the adoption of LLMs in high-stakes domain applications, w…

    Submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2506.20923 [pdf, ps, other]

    cs.CL

    KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model

    Authors: Xinping Zhao, Xinshuo Hu, Zifei Shan, Shouzheng Huang, Yao Zhou, Zetian Sun, Zhenyu Liu, Dongfang Li, Xinyuan Wei, Qian Chen, Youcheng Pan, Yang Xiang, Meishan Zhang, Haofen Wang, Jun Yu, Baotian Hu, Min Zhang

    Abstract: In this paper, we propose KaLM-Embedding-V2, a versatile and compact embedding model, which achieves impressive performance in general-purpose text embedding tasks by leveraging superior training techniques and data. Our key innovations include: (1) To better align the architecture with representation learning, we remove the causal attention mask and adopt a fully bidirectional transformer with si…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Technical Report; 26 pages 12 tables 1 figure. arXiv admin note: substantial text overlap with arXiv:2501.01028

  3. arXiv:2506.18678 [pdf, ps, other]

    cs.CV cs.RO

    MCN-SLAM: Multi-Agent Collaborative Neural SLAM with Hybrid Implicit Neural Scene Representation

    Authors: Tianchen Deng, Guole Shen, Xun Chen, Shenghai Yuan, Hongming Shen, Guohao Peng, Zhenyu Wu, Jingchuan Wang, Lihua Xie, Danwei Wang, Hesheng Wang, Weidong Chen

    Abstract: Neural implicit scene representations have recently shown promising results in dense visual SLAM. However, existing implicit SLAM algorithms are constrained to single-agent scenarios, and face difficulties in large-scale scenes and long sequences. Existing NeRF-based multi-agent SLAM frameworks cannot meet the constraints of communication bandwidth. To this end, we propose the first distributed mu…

    Submitted 23 June, 2025; originally announced June 2025.

  4. arXiv:2506.18671 [pdf, ps, other]

    cs.SD cs.CV cs.GR eess.AS

    TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography

    Authors: Yuqin Dai, Wanlu Zhu, Ronghui Li, Xiu Li, Zhenyu Zhang, Jun Li, Jian Yang

    Abstract: Music-driven dance generation has garnered significant attention due to its wide range of industrial applications, particularly in the creation of group choreography. During the group dance generation process, however, most existing methods still face three primary issues: multi-dancer collisions, single-dancer foot sliding and abrupt swapping in the generation of long group dance. In this paper,…

    Submitted 26 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  5. arXiv:2506.18656 [pdf, ps, other]

    stat.ML cs.LG math.ST

    A Random Matrix Analysis of In-context Memorization for Nonlinear Attention

    Authors: Zhenyu Liao, Jiaqing Liu, TianQi Hou, Difan Zou, Zenan Ling

    Abstract: Attention mechanisms have revolutionized machine learning (ML) by enabling efficient modeling of global dependencies across inputs. Their inherently parallelizable structures allow for efficient scaling with the exponentially increasing size of both pretrained data and model parameters. Yet, despite their central role as the computational backbone of modern large language models (LLMs), the theore…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 40 pages, 7 pages

  6. arXiv:2506.18308 [pdf]

    cs.HC

    Supporting Car-Following Behavior through V2V-Based Beyond-Visual-Range Information Display

    Authors: Feiqi Gu, Zhixiong Wang, Zhenyu Wang, Dengbo He

    Abstract: Rear-end collisions constituted a large portion of crashes on the road, despite efforts to mitigate rear-end collisions, such as forward collision warnings. The chance of rear-end collisions is closely related to drivers' car-following (CF) behaviors in the traffic flow. Given that drivers may rely on more than the information of the direct lead vehicle (DLV) when making CF decisions, expanding dr…

    Submitted 23 June, 2025; originally announced June 2025.

  7. arXiv:2506.18234 [pdf, ps, other]

    cs.CV cs.RO

    Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning

    Authors: Yue Li, Meng Tian, Dechang Zhu, Jiangtong Zhu, Zhenyu Lin, Zhiwei Xiong, Xinhai Zhao

    Abstract: Large vision-language models (VLMs) for autonomous driving (AD) are evolving beyond perception and cognition tasks toward motion planning. However, we identify two critical challenges in this direction: (1) VLMs tend to learn shortcuts by relying heavily on history input information, achieving seemingly strong planning results without genuinely understanding the visual inputs; and (2) the chain-of…

    Submitted 22 June, 2025; originally announced June 2025.

  8. arXiv:2506.18088 [pdf, ps, other]

    cs.RO cs.AI cs.CL cs.CV cs.MA

    RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

    Authors: Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, et al. (1 additional authors not shown)

    Abstract: Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation due to two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Project Page: https://robotwin-platform.github.io/

  9. arXiv:2506.17642 [pdf, ps, other]

    cs.SE

    May the Feedback Be with You! Unlocking the Power of Feedback-Driven Deep Learning Framework Fuzzing via LLMs

    Authors: Shaoyu Yang, Chunrong Fang, Haifeng Lin, Xiang Chen, Zhenyu Chen

    Abstract: Artificial Intelligence (AI) Infrastructures, represented by Deep Learning (DL) frameworks, have served as fundamental DL systems over the last decade. However, the bugs in DL frameworks could lead to catastrophic consequences in some critical scenarios (e.g., healthcare and autonomous driving). A simple yet effective way to find bugs in DL frameworks is fuzz testing (Fuzzing). Unfortunately, exis…

    Submitted 21 June, 2025; originally announced June 2025.

  10. arXiv:2506.17638 [pdf, ps, other]

    cs.SE

    Deep Learning Framework Testing via Model Mutation: How Far Are We?

    Authors: Yanzhou Mu, Rong Wang, Juan Zhai, Chunrong Fang, Xiang Chen, Zhiyuan Peng, Peiran Yang, Ruixiang Qian, Shaoyu Yang, Zhenyu Chen

    Abstract: Deep Learning (DL) frameworks are a fundamental component of DL development. Therefore, the detection of DL framework defects is important and challenging. As one of the most widely adopted DL testing techniques, model mutation has recently gained significant attention. In this study, we revisit the defect detection ability of existing mutation-based testing methods and investigate the factors tha…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 27 pages, 9 figures

  11. arXiv:2506.17357 [pdf, ps, other]

    cs.DC cs.AI

    Speeding up Local Optimization in Vehicle Routing with Tensor-based GPU Acceleration

    Authors: Zhenyu Lei, Jin-Kao Hao, Qinghua Wu

    Abstract: Local search plays a central role in many effective heuristic algorithms for the vehicle routing problem (VRP) and its variants. However, neighborhood exploration is known to be computationally expensive and time consuming, especially for large instances or problems with complex constraints. In this study, we explore a promising direction to address this challenge by introducing an original tensor…

    Submitted 20 June, 2025; originally announced June 2025.

  12. arXiv:2506.16981 [pdf, ps, other]

    cs.CR

    SmartGuard: Leveraging Large Language Models for Network Attack Detection through Audit Log Analysis and Summarization

    Authors: Hao Zhang, Shuo Shao, Song Li, Zhenyu Zhong, Yan Liu, Zhan Qin, Kui Ren

    Abstract: End-point monitoring solutions are widely deployed in today's enterprise environments to support advanced attack detection and investigation. These monitors continuously record system-level activities as audit logs and provide deep visibility into security events. Unfortunately, existing methods of semantic analysis based on audit logs have low granularity, only reaching the system call level, mak…

    Submitted 20 June, 2025; originally announced June 2025.

  13. arXiv:2506.15227 [pdf, ps, other]

    cs.SE

    Large Language Models for Unit Testing: A Systematic Literature Review

    Authors: Quanjun Zhang, Chunrong Fang, Siqi Gu, Ye Shang, Zhenyu Chen, Liang Xiao

    Abstract: Unit testing is a fundamental practice in modern software engineering, with the aim of ensuring the correctness, maintainability, and reliability of individual software components. Very recently, with the advances in Large Language Models (LLMs), a rapidly growing body of research has leveraged LLMs to automate various unit testing tasks, demonstrating remarkable performance and significantly redu…

    Submitted 18 June, 2025; originally announced June 2025.

  14. arXiv:2506.14854 [pdf, ps, other]

    cs.CV cs.AI cs.HC cs.LG

    Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis

    Authors: Varun Mannam, Zhenyu Shi

    Abstract: Accurate video annotation plays a vital role in modern retail applications, including customer behavior analysis, product interaction detection, and in-store activity recognition. However, conventional annotation methods heavily rely on time-consuming manual labeling by human annotators, introducing non-robust frame selection and increasing operational costs. To address these challenges in the ret…

    Submitted 19 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    Comments: Submitting to ICCV 2025 workshop: https://retailvisionworkshop.github.io/

  15. arXiv:2506.14731 [pdf, ps, other]

    cs.CL cs.AI

    Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

    Authors: Ling Team, Bin Hu, Cai Chen, Deng Zhao, Ding Liu, Dingnan Jin, Feng Zhu, Hao Dai, Hongzhi Luan, Jia Guo, Jiaming Liu, Jiewei Wu, Jun Mei, Jun Zhou, Junbo Zhao, Junwu Xiong, Kaihong Zhang, Kuan Xu, Lei Liang, Liang Jiang, Liangcheng Fu, Longfei Zheng, Qiang Gao, Qing Cui, Quan Wan, et al. (21 additional authors not shown)

    Abstract: We present Ring-lite, a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL) to achieve efficient and robust reasoning capabilities. Built upon the publicly available Ling-lite model, a 16.8 billion parameter model with 2.75 billion activated parameters, our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challeng…

    Submitted 17 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    Comments: Technical Report

  16. arXiv:2506.13222 [pdf, ps, other]

    cs.AI cs.LG

    NeuroPhysNet: A FitzHugh-Nagumo-Based Physics-Informed Neural Network Framework for Electroencephalograph (EEG) Analysis and Motor Imagery Classification

    Authors: Zhenyu Xia, Xinlei Huang, Suvash C. Saha

    Abstract: Electroencephalography (EEG) is extensively employed in medical diagnostics and brain-computer interface (BCI) applications due to its non-invasive nature and high temporal resolution. However, EEG analysis faces significant challenges, including noise, nonstationarity, and inter-subject variability, which hinder its clinical utility. Traditional neural networks often lack integration with biophys…

    Submitted 16 June, 2025; originally announced June 2025.

  17. arXiv:2506.13139 [pdf, ps, other]

    stat.ML cs.LG

    Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models

    Authors: Zhenyu Liao, Michael W. Mahoney

    Abstract: Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data and rely on overparameterized models, where classical low-dimensional intuitions break down. In particular, the proportional regime where the data dimension, sample size, and number of model parameters are all large and comparable, gives rise to novel and sometimes counterintuitive behaviors. This p…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 30 pages, 6 figures

  18. arXiv:2506.13114 [pdf, ps, other]

    cs.SE

    Designing Deep Learning Frameworks for LLMs: Challenges, Expectations, and Opportunities

    Authors: Yanzhou Mu, Rong Wang, Juan Zhai, Chunrong Fang, Xiang Chen, Jiacong Wu, An Guo, Jiawei Shen, Bingzhuo Li, Zhenyu Chen

    Abstract: Large language models (LLMs) drive significant advancements in real industry applications. LLMs rely on DL frameworks for efficient model construction, distributed execution, and optimized deployment. Their large parameter scale and long execution cycles place extreme demands on DL frameworks in terms of scalability, stability, and efficiency. Therefore, poor usability, limited functionality, and…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 12 pages, 2 figures

  19. arXiv:2506.11902 [pdf, ps, other]

    cs.LG cs.CL

    TreeRL: LLM Reinforcement Learning with On-Policy Tree Search

    Authors: Zhenyu Hou, Ziniu Hu, Yujiang Li, Rui Lu, Jie Tang, Yuxiao Dong

    Abstract: Reinforcement learning (RL) with tree search has demonstrated superior performance in traditional reasoning tasks. Compared to conventional independent chain sampling strategies with outcome supervision, tree search enables better exploration of the reasoning space and provides dense, on-policy process rewards during RL training but remains under-explored in On-Policy LLM RL. We propose TreeRL, a…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 main conference

  20. arXiv:2506.11039 [pdf, other]

    cs.LG cs.AI

    Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation

    Authors: Cheng Jin, Zhenyu Xiao, Chutao Liu, Yuantao Gu

    Abstract: Classifier-free guidance (CFG) has emerged as a pivotal advancement in text-to-image latent diffusion models, establishing itself as a cornerstone technique for achieving high-quality image synthesis. However, under high guidance weights, where text-image alignment is significantly enhanced, CFG also leads to pronounced color distortions in the generated images. We identify that these distortions…

    Submitted 20 May, 2025; originally announced June 2025.

    Comments: Accepted at ICML 2025

  21. arXiv:2506.11038 [pdf, ps, other]

    cs.LG

    MoTE: Mixture of Task-specific Experts for Pre-Trained Model-Based Class-incremental Learning

    Authors: Linjie Li, Zhenyu Wu, Yang Ji

    Abstract: Class-incremental learning (CIL) requires deep learning models to continuously acquire new knowledge from streaming data while preserving previously learned information. Recently, CIL based on pre-trained models (PTMs) has achieved remarkable success. However, prompt-based approaches suffer from prompt overwriting, while adapter-based methods face challenges such as dimensional misalignment betwee…

    Submitted 20 May, 2025; originally announced June 2025.

    Comments: Accepted to KBS

  22. arXiv:2506.10972 [pdf, ps, other]

    cs.LG cs.AI

    Farseer: A Refined Scaling Law in Large Language Models

    Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

    Abstract: Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing…

    Submitted 14 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 34

    ACM Class: I.2

  23. arXiv:2506.10177 [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models

    Authors: Defang Chen, Zhenyu Zhou, Can Wang, Siwei Lyu

    Abstract: Diffusion-based generative models employ stochastic differential equations (SDEs) and their equivalent probability flow ordinary differential equations (ODEs) to establish a smooth transformation between complex high-dimensional data distributions and tractable prior distributions. In this paper, we reveal a striking geometric regularity in the deterministic sampling dynamics: each simulated sampl…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 50 pages. The short version appeared in ICML 2024. arXiv admin note: substantial text overlap with arXiv:2405.11326

  24. arXiv:2506.09316 [pdf, ps, other]

    cs.LG

    On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention

    Authors: Yeonju Ro, Zhenyu Zhang, Souvik Kundu, Zhangyang Wang, Aditya Akella

    Abstract: Large language models (LLMs) excel at capturing global token dependencies via self-attention but face prohibitive compute and memory costs on lengthy inputs. While sub-quadratic methods (e.g., linear attention) can reduce these costs, they often degrade accuracy due to overemphasizing recent tokens. In this work, we first propose dual-state linear attention (DSLA), a novel design that maintains tw…

    Submitted 17 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  25. arXiv:2506.09280 [pdf, ps, other]

    cs.DC cs.LG math.NA

    TTrace: Lightweight Error Checking and Diagnosis for Distributed Training

    Authors: Haitian Jiang, Shaowei Zhu, Zhen Zhang, Zhenyu Song, Xinwei Fu, Zhen Jia, Yida Wang, Jinyang Li

    Abstract: Distributed training is essential for scaling the training of large neural network models, such as large language models (LLMs), across thousands of GPUs. However, the complexity of distributed training programs makes them particularly prone to silent bugs, which do not produce explicit error signal but lead to incorrect training outcome. Effectively detecting and localizing such silent bugs in di…

    Submitted 10 June, 2025; originally announced June 2025.

  26. arXiv:2506.08326 [pdf, other]

    cs.LG cs.AI

    Graph Prompting for Graph Learning Models: Recent Advances and Future Directions

    Authors: Xingbo Fu, Zehong Wang, Zihan Chen, Jiazheng Li, Yaochen Zhu, Zhenyu Lei, Cong Shen, Yanfang Ye, Chuxu Zhang, Jundong Li

    Abstract: Graph learning models have demonstrated great prowess in learning expressive representations from large-scale graph data in a wide variety of real-world scenarios. As a prevalent strategy for training powerful graph learning models, the "pre-training, adaptation" scheme first pre-trains graph learning models on unlabeled graph data in a self-supervised manner and then adapts them to specific downs…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted by KDD 2025 Tutorial/Survey Track

  27. arXiv:2506.07900 [pdf, ps, other]

    cs.CL cs.AI

    MiniCPM4: Ultra-Efficient LLMs on End Devices

    Authors: MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Cunliang Kong, Qiuzuo Li, Siyuan Li, Wenhao Li, Yanghao Li, et al. (50 additional authors not shown)

    Abstract: This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelera…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: MiniCPM4 Technical Report

  28. arXiv:2506.07636 [pdf, ps, other]

    cs.AI

    SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling

    Authors: Haoran Wang, Zhenyu Hou, Yao Wei, Jie Tang, Yuxiao Dong

    Abstract: Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use, such as software engineering (SWE). Recent LLM-powered toolkits, such as OpenAI Codex and Cursor, have offered end-to-end automation of the software development process. However, building effective SWE agents remains challenging due to the lack of high-quality t…

    Submitted 22 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted to Findings of ACL'25

  29. arXiv:2506.07419 [pdf, ps, other]

    cs.SE

    Generate Realistic Test Scenes for V2X Communication Systems

    Authors: An Guo, Xinyu Gao, Chunrong Fang, Haoxiang Tian, Weisong Sun, Yanzhou Mu, Shuncheng Tang, Lei Ma, Zhenyu Chen

    Abstract: Accurately perceiving complex driving environments is essential for ensuring the safe operation of autonomous vehicles. With the tremendous progress in deep learning and communication technologies, cooperative perception with Vehicle-to-Everything (V2X) technologies has emerged as a solution to overcome the limitations of single-agent perception systems in perceiving distant objects and occlusions…

    Submitted 9 June, 2025; originally announced June 2025.

  30. arXiv:2506.07056 [pdf, ps, other]

    cs.CV cs.CR cs.LG

    D2R: dual regularization loss with collaborative adversarial generation for model robustness

    Authors: Zhenyu Liu, Huizhi Liang, Rajiv Ranjan, Zhanxing Zhu, Vaclav Snasel, Varun Ojha

    Abstract: The robustness of Deep Neural Network models is crucial for defending models against adversarial attacks. Recent defense methods have employed collaborative learning frameworks to enhance model robustness. Two key limitations of existing methods are (i) insufficient guidance of the target model via loss functions and (ii) non-collaborative adversarial generation. We, therefore, propose a dual regu…

    Submitted 8 June, 2025; originally announced June 2025.

    Journal ref: The 34th International Conference on Artificial Neural Networks ICANN 2025

  31. arXiv:2506.07055 [pdf, ps, other]

    cs.CV

    A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge

    Authors: Tarique Dahri, Zulfiqar Ali Memon, Zhenyu Yu, Mohd. Yamani Idna Idris, Sheheryar Khan, Sadiq Ahmad, Maged Shoman, Saddam Aziz, Rizwan Qureshi

    Abstract: We introduce Layered Self-Supervised Knowledge Distillation (LSSKD) framework for training compact deep learning models. Unlike traditional methods that rely on pre-trained teacher networks, our approach appends auxiliary classifiers to intermediate feature maps, generating diverse self-supervised knowledge and enabling one-to-one transfer across different network stages. Our method achieves an av…

    Submitted 8 June, 2025; originally announced June 2025.

  32. arXiv:2506.06988 [pdf, ps, other]

    cs.CV

    Hybrid Mesh-Gaussian Representation for Efficient Indoor Scene Reconstruction

    Authors: Binxiao Huang, Zhihao Li, Shiyong Liu, Xiao Tang, Jiajun Tang, Jiaqi Lin, Yuxin Cheng, Zhenyu Chen, Xiaofei Wu, Ngai Wong

    Abstract: 3D Gaussian splatting (3DGS) has demonstrated exceptional performance in image-based 3D reconstruction and real-time rendering. However, regions with complex textures require numerous Gaussians to capture significant color variations accurately, leading to inefficiencies in rendering speed. To address this challenge, we introduce a hybrid representation for indoor scenes that combines 3DGS with te…

    Submitted 8 June, 2025; originally announced June 2025.

    Journal ref: IJCAI-2025

  33. arXiv:2506.06176 [pdf, ps, other]

    cs.CV

    SatelliteFormula: Multi-Modal Symbolic Regression from Remote Sensing Imagery for Physics Discovery

    Authors: Zhenyu Yu, Mohd. Yamani Idna Idris, Pei Wang, Yuelong Xia, Fei Ma, Rizwan Qureshi

    Abstract: We propose SatelliteFormula, a novel symbolic regression framework that derives physically interpretable expressions directly from multi-spectral remote sensing imagery. Unlike traditional empirical indices or black-box learning models, SatelliteFormula combines a Vision Transformer-based encoder for spatial-spectral feature extraction with physics-guided constraints to ensure consistency and inte…

    Submitted 6 June, 2025; originally announced June 2025.

  34. arXiv:2506.05318 [pdf, ps, other]

    cs.CV

    Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs

    Authors: Haoyuan Li, Yanpeng Zhou, Yufei Gao, Tao Tang, Jianhua Han, Yujie Yuan, Dave Zhenyu Chen, Jiawang Bian, Hang Xu, Xiaodan Liang

    Abstract: Remarkable progress in 2D Vision-Language Models (VLMs) has spurred interest in extending them to 3D settings for tasks like 3D Question Answering, Dense Captioning, and Visual Grounding. Unlike 2D VLMs that typically process images through an image encoder, 3D scenes, with their intricate spatial structures, allow for diverse model architectures. Based on their encoder design, this paper categori…

    Submitted 6 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  35. arXiv:2506.05079 [pdf, ps, other]

    cs.SE

    LLM-Guided Scenario-based GUI Testing

    Authors: Shengcheng Yu, Yuchen Ling, Chunrong Fang, Quan Zhou, Chunyang Chen, Shaomin Zhu, Zhenyu Chen

    Abstract: The assurance of mobile app GUI is more and more significant. Automated GUI testing approaches of different strategies have been developed, while there are still huge gaps between the approaches and the app business logic, not taking the completion of specific testing scenarios as the exploration target, leading to the exploration missing of critical app functionalities. Learning from the manual t…

    Submitted 5 June, 2025; originally announced June 2025.

  36. arXiv:2506.04739 [pdf, ps, other]

    cs.CL cs.AI

    Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection

    Authors: Ziyi Zhou, Xiaoming Zhang, Litian Zhang, Yibo Zhang, Zhenyu Guan, Chaozhuo Li, Philip S. Yu

    Abstract: The widespread dissemination of fake news on social media has significantly impacted society, resulting in serious consequences. Conventional deep learning methodologies employing small language models (SLMs) suffer from extensive supervised training requirements and difficulties adapting to evolving news environments due to data scarcity and distribution shifts. Large language models (LLMs), desp…

    Submitted 5 June, 2025; originally announced June 2025.

  37. arXiv:2505.24241 [pdf, ps, other]

    cs.CL

    Advantageous Parameter Expansion Training Makes Better Large Language Models

    Authors: Naibin Gu, Yilong Chen, Zhenyu Zhang, Peng Fu, Zheng Lin, Shuohuan Wang, Yu Sun, Hua Wu, Weiping Wang, Haifeng Wang

    Abstract: Although scaling up the number of trainable parameters in both pre-training and fine-tuning can effectively improve the performance of large language models, it also leads to increased computational overhead. When delving into the parameter difference, we find that a subset of parameters, termed advantageous parameters, plays a crucial role in determining model performance. Further analysis reveal…

    Submitted 30 May, 2025; originally announced May 2025.

  38. arXiv:2505.23054 [pdf, ps, other]

    cs.CV

    Zero-P-to-3: Zero-Shot Partial-View Images to 3D Object

    Authors: Yuxuan Lin, Ruihang Chu, Zhenyu Chen, Xiao Tang, Lei Ke, Haoling Li, Yingji Zhong, Zhihao Li, Shiyong Liu, Xiaofei Wu, Jianzhuang Liu, Yujiu Yang

    Abstract: Generative 3D reconstruction shows strong potential in incomplete observations. While sparse-view and single-image reconstruction are well-researched, partial observation remains underexplored. In this context, dense views are accessible only from a specific angular range, with other perspectives remaining inaccessible. This task presents two main challenges: (i) limited view range: observations c…

    Submitted 28 May, 2025; originally announced May 2025.

  39. arXiv:2505.22154 [pdf, ps, other]

    cs.CV

    Learning A Robust RGB-Thermal Detector for Extreme Modality Imbalance

    Authors: Chao Tian, Chao Yang, Guoqing Zhu, Qiang Wang, Zhenyu He

    Abstract: RGB-Thermal (RGB-T) object detection utilizes thermal infrared (TIR) images to complement RGB data, improving robustness in challenging conditions. Traditional RGB-T detectors assume balanced training data, where both modalities contribute equally. However, in real-world scenarios, modality degradation, due to environmental factors or technical issues, can lead to extreme modality imbalance, causing…

    Submitted 28 May, 2025; originally announced May 2025.

  40. arXiv:2505.21805 [pdf, ps, other]

    cs.SD eess.AS

    An Investigation on Speaker Augmentation for End-to-End Speaker Extraction

    Authors: Zhenghai You, Zhenyu Zhou, Lantian Li, Dong Wang

    Abstract: Target confusion, defined as occasional switching to non-target speakers, poses a key challenge for end-to-end speaker extraction (E2E-SE) systems. We argue that this problem is largely caused by the lack of generalizability and discrimination of the speaker embeddings, and introduce a simple yet effective speaker augmentation strategy to tackle the problem. Specifically, we propose a time-domain…

    Submitted 27 May, 2025; originally announced May 2025.

  41. arXiv:2505.20835 [pdf, other]

    cs.DC

    ECC-SNN: Cost-Effective Edge-Cloud Collaboration for Spiking Neural Networks

    Authors: Di Yu, Changze Lv, Xin Du, Linshan Jiang, Wentao Tong, Zhenyu Liao, Xiaoqing Zheng, Shuiguang Deng

    Abstract: Most edge-cloud collaboration frameworks rely on the substantial computational and storage capabilities of cloud-based artificial neural networks (ANNs). However, this reliance results in significant communication overhead between edge devices and the cloud and high computational energy consumption, especially when applied to resource-constrained edge devices. To address these challenges, we propo…

    Submitted 27 May, 2025; originally announced May 2025.

  42. arXiv:2505.19970

    cs.CL

    CP-Router: An Uncertainty-Aware Router Between LLM and LRM

    Authors: Jiayuan Su, Fulin Lin, Zhaopeng Feng, Han Zheng, Teng Wang, Zhenyu Xiao, Xinlong Zhao, Zuozhu Liu, Lu Cheng, Hongwei Wang

    Abstract: Recent advances in Large Reasoning Models (LRMs) have significantly improved long-chain reasoning capabilities over Large Language Models (LLMs). However, LRMs often produce unnecessarily lengthy outputs even for simple queries, leading to inefficiencies or even accuracy degradation compared to LLMs. To overcome this, we propose CP-Router, a training-free and model-agnostic routing framework that…

    Submitted 26 May, 2025; originally announced May 2025.

  43. arXiv:2505.19897

    cs.AI cs.CL cs.CV cs.HC

    ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

    Authors: Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu

    Abstract: Large Language Models (LLMs) have extended their impact beyond Natural Language Processing, substantially fostering the development of interdisciplinary research. Recently, various LLM-based agents have been developed to assist scientific discovery progress across multiple aspects and domains. Among these, computer-using agents, capable of interacting with operating systems as humans do, are pavin…

    Submitted 27 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: work in progress

  44. arXiv:2505.19152

    cs.IT eess.SP

    RIS-Assisted Survivable Fronthaul Design in Cell-Free Massive MIMO System

    Authors: Zhenyu Li, Özlem Tuğfe Demir, Emil Björnson, Cicek Cavdar

    Abstract: This paper investigates the application of reconfigurable intelligent surfaces (RISs) to improve fronthaul link survivability in cell-free massive MIMO (CF mMIMO) systems. To enhance the fronthaul survivability, two complementary mechanisms are considered. Firstly, RIS is set to provide reliable line-of-sight (LOS) connectivity and enhance the mmWave backup link. Secondly, a resource-sharing schem…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: 6 pages, 4 figures, submitted to IEEE Globecom 2025

  45. arXiv:2505.19000

    cs.CL cs.CV

    VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization

    Authors: Yunxin Li, Xinyu Chen, Zitao Li, Zhenyu Liu, Longyue Wang, Wenhan Luo, Baotian Hu, Min Zhang

    Abstract: Applying Reinforcement Learning (RL) to Video Large Language Models (Video-LLMs) shows significant promise for complex video reasoning. However, popular Reinforcement Fine-Tuning (RFT) methods, such as outcome-based Group Relative Policy Optimization (GRPO), are limited by data preparation bottlenecks (e.g., noise or high cost) and exhibit unstable improvements in the quality of long chain-of-thou…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: 19 pages, 9 figures, Project Link: https://github.com/HITsz-TMG/VerIPO

  46. arXiv:2505.18691

    cs.RO cs.MA

    Coordinated guidance and control for multiple parafoil system landing

    Authors: Zhenyu Wei, Zhijiang Shao, Lorenz T. Biegler

    Abstract: Multiple parafoil landing is an enabling technology for massive supply delivery missions. However, it is still an open question to design a collision-free, computation-efficient guidance and control method for unpowered parafoils. To address this issue, this paper proposes a coordinated guidance and control method for multiple parafoil landing. First, the multiple parafoil landing process is formu…

    Submitted 24 May, 2025; originally announced May 2025.

  47. arXiv:2505.17779

    cs.CV cs.LG

    U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding

    Authors: Anjie Le, Henan Liu, Yue Wang, Zhenyu Liu, Rongkun Zhu, Taohan Weng, Jinze Yu, Boyang Wang, Yalun Wu, Kaiwen Yan, Quanlin Sun, Meirui Jiang, Jialun Pei, Siya Liu, Haoyun Zheng, Zhoujun Li, Alison Noble, Jacques Souquet, Xiaoqing Guo, Manxi Lin, Hongcheng Guo

    Abstract: Ultrasound is a widely used imaging modality critical to global healthcare, yet its interpretation remains challenging because image quality varies with the operator, noise, and anatomical structures. Although large vision-language models (LVLMs) have demonstrated impressive multimodal capabilities across natural and medical domains, their performance on ultrasound remains largely unexplored. We i…

    Submitted 30 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  48. arXiv:2505.15734

    cs.CL cs.AI cs.LG

    DEBATE, TRAIN, EVOLVE: Self Evolution of Language Model Reasoning

    Authors: Gaurav Srivastava, Zhenyu Bi, Meng Lu, Xuan Wang

    Abstract: Large language models (LLMs) have improved significantly in their reasoning through extensive training on massive datasets. However, relying solely on additional data for improvement is becoming increasingly impractical, highlighting the need for models to autonomously enhance their reasoning without external supervision. In this paper, we propose Debate, Train, Evolve (DTE), a novel ground truth-…

    Submitted 21 May, 2025; originally announced May 2025.

  49. arXiv:2505.15431

    cs.CL

    Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

    Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, et al. (230 additional authors not shown)

    Abstract: As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid response…

    Submitted 22 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  50. arXiv:2505.15287

    cs.CV

    GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation

    Authors: Yuchen Li, Chaoran Feng, Zhenyu Tang, Kaiyuan Deng, Wangbo Yu, Yonghong Tian, Li Yuan

    Abstract: We introduce GS2E (Gaussian Splatting to Event), a large-scale synthetic event dataset for high-fidelity event vision tasks, captured from real-world sparse multi-view RGB images. Existing event datasets are often synthesized from dense RGB videos, which typically lack viewpoint diversity and geometric consistency, or depend on expensive, difficult-to-scale hardware setups. GS2E overcomes these li…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 21 pages, 7 figures. More details at http://intothemild.github.io/GS2E.github.io