Showing 1–50 of 2,345 results for author: wang, D

  1. arXiv:2507.05932  [pdf, ps, other]

    cs.SE cs.CV

    TigAug: Data Augmentation for Testing Traffic Light Detection in Autonomous Driving Systems

    Authors: You Lu, Dingji Wang, Kaifeng Huang, Bihuan Chen, Xin Peng

    Abstract: Autonomous vehicle technology has been developed in the last decades with recent advances in sensing and computing technology. There is an urgent need to ensure the reliability and robustness of autonomous driving systems (ADSs). Despite the recent achievements in testing various ADS modules, little attention has been paid to the automated testing of traffic light detection models in ADSs. A commo…

    Submitted 8 July, 2025; originally announced July 2025.

  2. arXiv:2507.04678  [pdf, ps, other]

    cs.CV

    ChangeBridge: Spatiotemporal Image Generation with Multimodal Controls for Remote Sensing

    Authors: Zhenghui Zhao, Chen Wu, Di Wang, Hongruixuan Chen, Zhuo Zheng

    Abstract: Recent advancements in generative methods, especially diffusion models, have made great progress in remote sensing image synthesis. Despite these advancements, existing methods have not explored the simulation of future scenarios based on given scenario images. This simulation capability has wide applications for urban planning, land management…

    Submitted 7 July, 2025; originally announced July 2025.

  3. arXiv:2507.04059  [pdf, ps, other]

    cs.LG cs.AI cs.CV stat.ML

    Attributing Data for Sharpness-Aware Minimization

    Authors: Chenyang Ren, Yifan Jia, Huanyi Xie, Zhaobin Xu, Tianxing Wei, Liangyu Wang, Lijie Hu, Di Wang

    Abstract: Sharpness-aware Minimization (SAM) improves generalization in large-scale model training by linking loss landscape geometry to generalization. However, challenges such as mislabeled noisy data and privacy concerns have emerged as significant issues. Data attribution, which identifies the contributions of specific training samples, offers a promising solution. However, directly rendering existing d…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: 25 pages

  4. arXiv:2507.03657  [pdf, ps, other]

    cs.CV

    Dynamic Multimodal Prototype Learning in Vision-Language Models

    Authors: Xingyu Zhu, Shuo Wang, Beier Zhu, Miaoge Li, Yunfan Li, Junfeng Fang, Zhicai Wang, Dongsheng Wang, Hanwang Zhang

    Abstract: With the increasing attention to pre-trained vision-language models (VLMs), e.g., CLIP, substantial efforts have been devoted to many downstream tasks, especially in test-time adaptation (TTA). However, previous works focus on learning prototypes only in the textual modality while overlooking the ambiguous semantics in class names. These ambiguities lead to textual prototypes that are insufficient…

    Submitted 4 July, 2025; originally announced July 2025.

  5. arXiv:2507.03262  [pdf, ps, other]

    cs.CV cs.AI

    Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders

    Authors: Song Mao, Yang Chen, Pinglong Cai, Ding Wang, Guohang Yan, Zhi Yu, Botian Shi

    Abstract: Multimodal Large Language Models (MLLMs) increasingly adopt multiple vision encoders to capture diverse visual information, ranging from coarse semantics to fine grained details. While this approach is intended to enhance visual understanding capability, we observe that the performance gains from adding encoders often diminish and can even lead to performance degradation, a phenomenon we term enco…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Work in Process

  6. arXiv:2507.03211  [pdf, ps, other]

    cs.LG cs.PF

    DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing

    Authors: Liangyu Wang, Huanyi Xie, Di Wang

    Abstract: Fine-tuning large language models (LLMs) remains resource-intensive due to their sheer scale. While zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating backward passes, its application to multi-hundred-billion-parameter models is constrained by GPU memory and compute throughput. The ZO2 framework addresses the memory bottleneck by offloading model parameters to CP…

    Submitted 3 July, 2025; originally announced July 2025.

  7. arXiv:2507.01921  [pdf, ps, other]

    cs.CL

    NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks

    Authors: Yang Li, Youssef Emad, Karthik Padthe, Jack Lanchantin, Weizhe Yuan, Thao Nguyen, Jason Weston, Shang-Wen Li, Dong Wang, Ilia Kulikov, Xian Li

    Abstract: Recent work has shown that distilling reasoning traces from a larger teacher model via supervised finetuning outperforms reinforcement learning with the smaller student model alone (Guo et al. 2025). However, there has not been a systematic study of what kind of reasoning demonstrations from the teacher are most effective in improving the student model's reasoning capabilities. In this work we cur…

    Submitted 2 July, 2025; originally announced July 2025.

  8. arXiv:2507.01154  [pdf, ps, other]

    cs.LG cs.CR

    FlashDP: Private Training Large Language Models with Efficient DP-SGD

    Authors: Liangyu Wang, Junxiao Wang, Jie Ren, Zihang Xiang, David E. Keyes, Di Wang

    Abstract: As large language models (LLMs) increasingly underpin technological advancements, the privacy of their training data emerges as a critical concern. Differential Privacy (DP) serves as a rigorous mechanism to protect this data, yet its integration via Differentially Private Stochastic Gradient Descent (DP-SGD) introduces substantial challenges, primarily due to the complexities of per-sample gradie…

    Submitted 1 July, 2025; originally announced July 2025.

  9. arXiv:2507.00917  [pdf, ps, other]

    cs.RO

    A Survey: Learning Embodied Intelligence from Physical Simulators and World Models

    Authors: Xiaoxiao Long, Qingrui Zhao, Kaiwen Zhang, Zihao Zhang, Dingrui Wang, Yumeng Liu, Zhengjie Shu, Yi Lu, Shouzheng Wang, Xinzhe Wei, Wei Li, Wei Yin, Yao Yao, Jia Pan, Qiu Shen, Ruigang Yang, Xun Cao, Qionghai Dai

    Abstract: The pursuit of artificial general intelligence (AGI) has placed embodied intelligence at the forefront of robotics research. Embodied intelligence focuses on agents capable of perceiving, reasoning, and acting within the physical world. Achieving robust embodied intelligence requires not only advanced perception and control, but also the ability to ground abstract cognition in real-world interacti…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey

  10. arXiv:2506.23351  [pdf, ps, other]

    cs.RO cs.AI cs.LG cs.MA

    Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

    Authors: Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, et al. (74 additional authors not shown)

    Abstract: Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To ad…

    Submitted 2 July, 2025; v1 submitted 29 June, 2025; originally announced June 2025.

    Comments: Challenge Webpage: https://robotwin-benchmark.github.io/cvpr-2025-challenge/

  11. arXiv:2506.23149  [pdf, ps, other]

    cs.CL

    V-SYNTHESIS: Task-Agnostic Synthesis of Consistent and Diverse In-Context Demonstrations from Scratch via V-Entropy

    Authors: Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng

    Abstract: High labeling cost for in-context learning (ICL) demonstrations motivates using large language models (LLMs) for synthesis to reduce overhead. However, existing synthesis methods are mainly task-specific or rely on pre-existing demonstrations. So this paper focuses on synthesizing demonstrations from scratch for arbitrary tasks. A major challenge in synthesizing from scratch is ensuring consistenc…

    Submitted 29 June, 2025; originally announced June 2025.

  12. arXiv:2506.23146  [pdf, ps, other]

    cs.CL

    Learning-to-Context Slope: Evaluating In-Context Learning Effectiveness Beyond Performance Illusions

    Authors: Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng

    Abstract: In-context learning (ICL) has emerged as an effective approach to enhance the performance of large language models (LLMs). However, its effectiveness varies significantly across models and tasks, posing challenges for practitioners to determine when ICL reliably improves performance. Current evaluation approaches, reliant on performance change after applying ICL, suffer from low reliability, poor…

    Submitted 1 July, 2025; v1 submitted 29 June, 2025; originally announced June 2025.

  13. arXiv:2506.23133  [pdf, ps, other]

    cs.CL

    Format-Adapter: Improving Reasoning Capability of LLMs by Adapting Suitable Format

    Authors: Dingzirui Wang, Xuanliang Zhang, Rongyu Cao, Longxu Dou, Xianzhen Luo, Yingwei Ma, Qingfu Zhu, Wanxiang Che, Binhua Li, Fei Huang, Yongbin Li

    Abstract: Generating and voting multiple answers is an effective method to mitigate reasoning inconsistencies of large language models (LLMs). Prior works have shown that multiple reasoning formats outperform a single format when generating multiple answers. However, previous works using multiple formats rely on formats labeled by humans, which could be unsuitable for all tasks and have high labeling costs.…

    Submitted 29 June, 2025; originally announced June 2025.

  14. arXiv:2506.23079  [pdf]

    cs.CY cs.MA

    Research on Comprehensive Classroom Evaluation System Based on Multiple AI Models

    Authors: Cong Xie, Li Yang, Daben Wang, Jing Xiao

    Abstract: The promotion of the national education digitalization strategy has facilitated the development of teaching quality evaluation towards all-round, process-oriented, precise, and intelligent directions, inspiring explorations into new methods and technologies for educational quality assurance. Classroom teaching evaluation methods dominated by teaching supervision and student teaching evaluation suf…

    Submitted 29 June, 2025; originally announced June 2025.

  15. arXiv:2506.22950  [pdf, ps, other]

    cs.LG

    Infinite Sampling: Efficient and Stable Grouped RL Training for Large Language Models

    Authors: Liangyu Wang, Huanyi Xie, Xinhai Wang, Tianjin Huang, Mengdi Li, Di Wang

    Abstract: Group-based reinforcement learning algorithms such as Group Reward Policy Optimization (GRPO) have proven effective for fine-tuning large language models (LLMs) with human feedback. However, generating and storing multiple responses per prompt incurs substantial memory overhead, especially as the sample group size increases, limiting scalability under constrained hardware. We propose Infinite Sa…

    Submitted 28 June, 2025; originally announced June 2025.

  16. arXiv:2506.21853  [pdf, ps, other]

    cs.RO

    Skill-Nav: Enhanced Navigation with Versatile Quadrupedal Locomotion via Waypoint Interface

    Authors: Dewei Wang, Chenjia Bai, Chenhui Li, Jiyuan Shi, Yan Ding, Chi Zhang, Bin Zhao

    Abstract: Quadrupedal robots have demonstrated exceptional locomotion capabilities through Reinforcement Learning (RL), including extreme parkour maneuvers. However, integrating locomotion skills with navigation in quadrupedal robots has not been fully investigated, which holds promise for enhancing long-distance movement capabilities. In this paper, we propose Skill-Nav, a method that incorporates quadrupe…

    Submitted 30 June, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

    Comments: 17 pages, 6 figures

  17. arXiv:2506.21414  [pdf, ps, other]

    cs.AR

    Accelerating GNN Training through Locality-aware Dropout and Merge

    Authors: Gongjian Sun, Mingyu Yan, Dengke Han, Runzhen Xue, Duo Wang, Xiaochun Ye, Dongrui Fan

    Abstract: Graph Neural Networks (GNNs) have demonstrated significant success in graph learning and are widely adopted across various critical domains. However, the irregular connectivity between vertices leads to inefficient neighbor aggregation, resulting in substantial irregular and coarse-grained DRAM accesses. This lack of data locality presents significant challenges for execution platforms, ultimately…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Under review at TPDS. Extended version of the DATE 2025 paper.

  18. arXiv:2506.20445  [pdf, ps, other]

    cs.RO

    Learn to Position -- A Novel Meta Method for Robotic Positioning

    Authors: Dongkun Wang, Junkai Zhao, Yunfei Teng, Jieyang Peng, Wenjing Xue, Xiaoming Tao

    Abstract: Absolute positioning accuracy is a vital specification for robots. Achieving high position precision can be challenging due to the presence of various sources of errors. Meanwhile, accurately depicting these errors is difficult due to their stochastic nature. Vision-based methods are commonly integrated to guide robotic positioning, but their performance can be highly impacted by inevitable occlus…

    Submitted 25 June, 2025; originally announced June 2025.

  19. arXiv:2506.20381  [pdf, ps, other]

    cs.CV cs.LG

    Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking

    Authors: Ben Kang, Xin Chen, Jie Zhao, Chunjuan Bo, Dong Wang, Huchuan Lu

    Abstract: Transformer-based visual trackers have demonstrated significant advancements due to their powerful modeling capabilities. However, their practicality is limited on resource-constrained devices because of their slow processing speeds. To address this challenge, we present HiT, a novel family of efficient tracking models that achieve high performance while maintaining fast operation across various d…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: This paper was accepted by International Journal of Computer Vision (IJCV)

  20. arXiv:2506.19492  [pdf, ps, other]

    cs.CL

    Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs

    Authors: Shu Yang, Junchao Wu, Xuansheng Wu, Derek Wong, Ninhao Liu, Di Wang

    Abstract: Large Reasoning Models (LRMs) have achieved remarkable performance on complex tasks by engaging in extended reasoning before producing final answers, yet this strength introduces the risk of overthinking, where excessive token generation occurs even for simple tasks. While recent work in efficient reasoning seeks to reduce reasoning length while preserving accuracy, it remains unclear whether such…

    Submitted 24 June, 2025; originally announced June 2025.

  21. arXiv:2506.18678  [pdf, ps, other]

    cs.CV cs.RO

    MCN-SLAM: Multi-Agent Collaborative Neural SLAM with Hybrid Implicit Neural Scene Representation

    Authors: Tianchen Deng, Guole Shen, Xun Chen, Shenghai Yuan, Hongming Shen, Guohao Peng, Zhenyu Wu, Jingchuan Wang, Lihua Xie, Danwei Wang, Hesheng Wang, Weidong Chen

    Abstract: Neural implicit scene representations have recently shown promising results in dense visual SLAM. However, existing implicit SLAM algorithms are constrained to single-agent scenarios, and face difficulties in large-scale scenes and long sequences. Existing NeRF-based multi-agent SLAM frameworks cannot meet the constraints of communication bandwidth. To this end, we propose the first distributed mu…

    Submitted 23 June, 2025; originally announced June 2025.

  22. arXiv:2506.17578  [pdf, ps, other]

    cs.CL

    AgriCHN: A Comprehensive Cross-domain Resource for Chinese Agricultural Named Entity Recognition

    Authors: Lingxiao Zeng, Yiqi Tong, Wei Guo, Huarui Wu, Lihao Ge, Yijun Ye, Fuzhen Zhuang, Deqing Wang, Wei Guo, Cheng Chen

    Abstract: Agricultural named entity recognition is a specialized task focusing on identifying distinct agricultural entities within vast bodies of text, including crops, diseases, pests, and fertilizers. It plays a crucial role in enhancing information extraction from extensive agricultural text resources. However, the scarcity of high-quality agricultural datasets, particularly in Chinese, has resulted in…

    Submitted 21 June, 2025; originally announced June 2025.

  23. arXiv:2506.17252  [pdf, ps, other]

    cs.LG cs.AI

    Adaptive Sample Scheduling for Direct Preference Optimization

    Authors: Zixuan Huang, Yikun Ban, Lean Fu, Xiaojie Li, Zhongxiang Dai, Jianxin Li, Deqing Wang

    Abstract: Direct Preference Optimization (DPO) has emerged as an effective approach for aligning large language models (LLMs) with human preferences. However, its performance is highly dependent on the quality of the underlying human preference data. To address this bottleneck, prior work has explored various data selection strategies, but these methods often overlook the impact of the evolving states of th…

    Submitted 8 June, 2025; originally announced June 2025.

  24. arXiv:2506.15617  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    The Compositional Architecture of Regret in Large Language Models

    Authors: Xiangxiang Cui, Shu Yang, Tianjin Huang, Wanyu Lin, Lijie Hu, Di Wang

    Abstract: Regret in Large Language Models refers to their explicit regret expression when presented with evidence contradicting their previously generated misinformation. Studying the regret mechanism is crucial for enhancing model reliability and helps in revealing how cognition is coded in neural networks. To understand this mechanism, we need to first identify regret expressions in model outputs, then an…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 23 pages

  25. arXiv:2506.15577  [pdf, ps, other]

    cs.CV

    A Unified Graph-based Framework for Scalable 3D Tree Reconstruction and Non-Destructive Biomass Estimation from Point Clouds

    Authors: Di Wang, Shi Li

    Abstract: Estimating forest above-ground biomass (AGB) is crucial for assessing carbon storage and supporting sustainable forest management. Quantitative Structural Model (QSM) offers a non-destructive approach to AGB estimation through 3D tree structural reconstruction. However, current QSM methods face significant limitations, as they are primarily designed for individual trees, depend on high-quality poin…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 17 pages, 19 figures

  26. arXiv:2506.15442  [pdf, ps, other]

    cs.CV cs.AI

    Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

    Authors: Team Hunyuan3D, Shuhui Yang, Mingxin Yang, Yifei Feng, Xin Huang, Sheng Zhang, Zebin He, Di Luo, Haolin Liu, Yunfei Zhao, Qingxiang Lin, Zeqiang Lai, Xianghui Yang, Huiwen Shi, Zibo Zhao, Bowen Zhang, Hongyu Yan, Lifu Wang, Sicong Liu, Jihong Zhang, Meng Chen, Liang Dong, Yiwen Jia, Yulin Cai, Jiaao Yu, et al. (28 additional authors not shown)

    Abstract: 3D AI-generated content (AIGC) is a passionate field that has significantly accelerated the creation of 3D models in gaming, film, and design. Despite the development of several groundbreaking models that have revolutionized 3D generation, the field remains largely accessible only to researchers, developers, and designers due to the complexities involved in collecting, processing, and training 3D…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Github link: https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1

  27. arXiv:2506.15253  [pdf, ps, other]

    cs.CR cs.AI

    RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments

    Authors: Yuchuan Fu, Xiaohan Yuan, Dongxia Wang

    Abstract: The rapid deployment of Large language model (LLM) agents in critical domains like healthcare and finance necessitates robust security frameworks. To address the absence of standardized evaluation benchmarks for these agents in dynamic environments, we introduce RAS-Eval, a comprehensive security benchmark supporting both simulated and real-world tool execution. RAS-Eval comprises 80 test cases an…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 12 pages, 8 figures

  28. arXiv:2506.13832  [pdf, ps, other]

    cs.SE cs.AI

    FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation

    Authors: Hongda Zhu, Yiwen Zhang, Bing Zhao, Jingzhe Ding, Siyao Liu, Tong Liu, Dandan Wang, Yanan Liu, Zhaojian Li

    Abstract: Large Language Models (LLMs) have made significant strides in front-end code generation. However, existing benchmarks exhibit several critical limitations: many tasks are overly simplistic, test cases often lack rigor, and end-to-end validation is absent. These issues hinder the accurate assessment of model performance. To address these challenges, we present FrontendBench, a benchmark co-develope…

    Submitted 18 June, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

  29. arXiv:2506.13725  [pdf, ps, other]

    cs.RO

    CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding

    Authors: Wenxuan Song, Jiayi Chen, Pengxiang Ding, Yuxin Huang, Han Zhao, Donglin Wang, Haoang Li

    Abstract: In recent years, Vision-Language-Action (VLA) models have become a vital research direction in robotics due to their impressive multimodal understanding and generalization capabilities. Despite the progress, their practical deployment is severely constrained by inference speed bottlenecks, particularly in high-frequency and dexterous manipulation tasks. While recent studies have explored Jacobi de…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 16 pages

  30. arXiv:2506.13695  [pdf, ps, other]

    cs.IR

    OneRec Technical Report

    Authors: Guorui Zhou, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rui Huang, Shiyao Wang, Weifeng Ding, Wuchao Li, Xinchen Luo, Xingmei Wang, Zexuan Cheng, Zixing Zhang, Bin Zhang, Boxuan Wang, Chaoyi Ma, Chengru Song, Chenhui Wang, Di Wang, Dongxue Meng, Fan Yang, Fangyu Zhang, et al. (40 additional authors not shown)

    Abstract: Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimizat…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Authors are listed alphabetically by their first name

  31. arXiv:2506.13045  [pdf, ps, other]

    cs.LG cs.CV

    A Comprehensive Survey on Continual Learning in Generative Models

    Authors: Haiyang Guo, Fanhu Zeng, Fei Zhu, Jiayi Wang, Xukai Wang, Jingang Zhou, Hongbo Zhao, Wenzhuo Liu, Shijie Ma, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu

    Abstract: The rapid advancement of generative models has enabled modern AI systems to comprehend and produce highly sophisticated content, even achieving human-level performance in specific domains. However, these models remain fundamentally constrained by catastrophic forgetting - a persistent challenge where adapting to new tasks typically leads to significant degradation in performance on previously lear…

    Submitted 18 June, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

    Comments: Preprint

  32. arXiv:2506.11526  [pdf, ps, other]

    cs.RO cs.AI

    Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis

    Authors: Yuan Gao, Mattia Piccinini, Yuchen Zhang, Dingrui Wang, Korbinian Moller, Roberto Brusnicki, Baha Zarrouki, Alessio Gambi, Jan Frederik Totz, Kai Storms, Steven Peters, Andrea Stocco, Bassam Alrifaee, Marco Pavone, Johannes Betz

    Abstract: For autonomous vehicles, safe navigation in complex environments depends on handling a broad range of diverse and rare driving scenarios. Simulation- and scenario-based testing have emerged as key approaches to development and validation of autonomous driving systems. Traditional scenario generation relies on rule-based systems, knowledge-driven models, and data-driven synthesis, often producing l…

    Submitted 13 June, 2025; originally announced June 2025.

  33. arXiv:2506.11421  [pdf]

    cs.IR cs.AI cs.LG

    Deep Learning Model Acceleration and Optimization Strategies for Real-Time Recommendation Systems

    Authors: Junli Shao, Jing Dong, Dingzhou Wang, Kowei Shih, Dannier Li, Chengrui Zhou

    Abstract: With the rapid growth of Internet services, recommendation systems play a central role in delivering personalized content. Faced with massive user requests and complex model architectures, the key challenge for real-time recommendation systems is how to reduce inference latency and increase system throughput without sacrificing recommendation quality. This paper addresses the high computational co…

    Submitted 17 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  34. arXiv:2506.10853  [pdf, ps, other]

    cs.AI cs.CY

    A Study on Individual Spatiotemporal Activity Generation Method Using MCP-Enhanced Chain-of-Thought Large Language Models

    Authors: Yu Zhang, Yang Hu, De Wang

    Abstract: Human spatiotemporal behavior simulation is critical for urban planning research, yet traditional rule-based and statistical approaches suffer from high computational costs, limited generalizability, and poor scalability. While large language models (LLMs) show promise as "world simulators," they face challenges in spatiotemporal reasoning including limited spatial cognition, lack of physical cons…

    Submitted 12 June, 2025; originally announced June 2025.

  35. arXiv:2506.10826  [pdf, ps, other]

    cs.RO

    RationalVLA: A Rational Vision-Language-Action Model with Dual System

    Authors: Wenxuan Song, Jiayi Chen, Wenxue Li, Xu He, Han Zhao, Can Cui, Pengxiang Ding, Shiyan Su, Feilong Tang, Xuelian Cheng, Donglin Wang, Zongyuan Ge, Xinhu Zheng, Zhe Liu, Hesheng Wang, Haoang Li

    Abstract: A fundamental requirement for real-world robotic deployment is the ability to understand and respond to natural language instructions. Existing language-conditioned manipulation tasks typically assume that instructions are perfectly aligned with the environment. This assumption limits robustness and generalization in realistic scenarios where instructions may be ambiguous, irrelevant, or infeasibl…

    Submitted 13 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 14 pages

  36. arXiv:2506.10630  [pdf, ps, other]

    cs.LG cs.AI

    Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs

    Authors: Yucong Luo, Yitong Zhou, Mingyue Cheng, Jiahao Wang, Daoyu Wang, Tingyue Pan, Jintao Zhang

    Abstract: To advance time series forecasting (TSF), various methods have been proposed to improve prediction accuracy, evolving from statistical techniques to data-driven deep learning architectures. Despite their effectiveness, most existing methods still adhere to a fast-thinking paradigm, relying on extracting historical patterns and mapping them to future values as their core modeling philosophy, lacking…

    Submitted 12 June, 2025; originally announced June 2025.

  37. arXiv:2506.10600  [pdf, ps, other]

    cs.RO cs.CV

    EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

    Authors: Xinjie Wang, Liu Liu, Yu Cao, Ruiqi Wu, Wenkang Qin, Dehui Wang, Wei Sui, Zhizhong Su

    Abstract: Constructing a physically realistic and accurately scaled simulated 3D world is crucial for the training and evaluation of embodied intelligence tasks. The diversity, realism, low cost accessibility and affordability of 3D data assets are critical for achieving generalization and scalability in embodied AI. However, most current embodied intelligence tasks still rely heavily on traditional 3D comp…

    Submitted 16 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  38. LightKG: Efficient Knowledge-Aware Recommendations with Simplified GNN Architecture

    Authors: Yanhui Li, Dongxia Wang, Zhu Sun, Haonan Zhang, Huizhong Guo

    Abstract: Recently, Graph Neural Networks (GNNs) have become the dominant approach for Knowledge Graph-aware Recommender Systems (KGRSs) due to their proven effectiveness. Building upon GNN-based KGRSs, Self-Supervised Learning (SSL) has been incorporated to address the sparsity issue, leading to longer training time. However, through extensive experiments, we reveal that: (1) compared to other KGRSs, the exi…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  39. arXiv:2506.10334  [pdf, ps, other]

    cs.CV cs.AI

    Using Vision Language Models to Detect Students' Academic Emotion through Facial Expressions

    Authors: Deliang Wang, Chao Yang, Gaowei Chen

    Abstract: Students' academic emotions significantly influence their social behavior and learning performance. Traditional approaches to automatically and accurately analyze these emotions have predominantly relied on supervised machine learning algorithms. However, these models often struggle to generalize across different contexts, necessitating repeated cycles of data collection, annotation, and training.…

    Submitted 12 June, 2025; originally announced June 2025.

  40. arXiv:2506.09512  [pdf, ps, other]

    eess.SY cs.LG

    A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications

    Authors: Donglin Wang, Anjie Qiu, Qiuheng Zhou, Hans D. Schotten

    Abstract: The rapid advancement of Vehicle-to-Everything (V2X) communication is transforming Intelligent Transportation Systems (ITS), with 6G networks expected to provide ultra-reliable, low-latency, and high-capacity connectivity for Connected and Autonomous Vehicles (CAVs). Artificial Intelligence (AI) and Machine Learning (ML) have emerged as key enablers in optimizing V2X communication by enhancing net…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 7 pages, 1 figure

  41. arXiv:2506.09096  [pdf, ps, other]

    cs.LG cs.AI

    Intra-Trajectory Consistency for Reward Modeling

    Authors: Chaoyang Zhou, Shunyu Liu, Zengmao Wang, Di Wang, Rong-Cheng Tu, Bo Du, Dacheng Tao

    Abstract: Reward models are critical for improving large language models (LLMs), particularly in reinforcement learning from human feedback (RLHF) or inference-time verification. Current reward modeling typically relies on scores of overall responses to learn the outcome rewards for the responses. However, since the response-level scores are coarse-grained supervision signals, the reward model struggles to…

    Submitted 16 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: Under review

  42. arXiv:2506.09093  [pdf, ps, other]

    cs.LG cs.AI

    Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data

    Authors: Bingjie Zhang, Hongkang Li, Changlong Shi, Guowei Rong, He Zhao, Dongsheng Wang, Dandan Guo, Meng Wang

    Abstract: Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features, thereby improving overall performance across the tasks. Recent studies have dedicated efforts to merging multiple independent model parameters into a unified model for MTL, thus circumventing the need for training data and expanding the scope of applicable scenarios of MTL. However, current a…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: Minor formatting adjustments; no changes to content

  43. arXiv:2506.09040  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better

    Authors: Dianyi Wang, Wei Song, Yikun Wang, Siyuan Wang, Kaicheng Yu, Zhongyu Wei, Jiaqi Wang

    Abstract: Typical large vision-language models (LVLMs) apply autoregressive supervision solely to textual sequences, without fully incorporating the visual modality into the learning process. This results in three key limitations: (1) an inability to utilize images without accompanying captions, (2) the risk that captions omit critical visual details, and (3) the challenge that certain vision-centric conten…

    Submitted 10 June, 2025; originally announced June 2025.

  44. arXiv:2506.08840  [pdf, ps, other]

    cs.RO

    MoRE: Mixture of Residual Experts for Humanoid Lifelike Gaits Learning on Complex Terrains

    Authors: Dewei Wang, Xinmiao Wang, Xinzhe Liu, Jiyuan Shi, Yingnan Zhao, Chenjia Bai, Xuelong Li

    Abstract: Humanoid robots have demonstrated robust locomotion capabilities using Reinforcement Learning (RL)-based approaches. Further, to obtain human-like behaviors, existing methods integrate human motion-tracking or motion prior in the RL framework. However, these methods are limited to flat terrains with proprioception only, restricting their abilities to traverse challenging terrains with human-like g…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 9 pages, 5 figures

  45. arXiv:2506.08552  [pdf, ps, other]

    cs.CL cs.AI

    Efficient Post-Training Refinement of Latent Reasoning in Large Language Models

    Authors: Xinyuan Wang, Dongjie Wang, Wangyang Ying, Haoyue Bai, Nanxu Gong, Sixun Dong, Kunpeng Liu, Yanjie Fu

    Abstract: Reasoning is a key component of language understanding in Large Language Models. While Chain-of-Thought prompting enhances performance via explicit intermediate steps, it suffers from substantial token overhead and a fixed reasoning trajectory, preventing step-wise refinement. Recent advances in latent reasoning address these limitations by refining internal reasoning processes directly in the mode…

    Submitted 10 June, 2025; originally announced June 2025.

  46. arXiv:2506.08038  [pdf, ps, other]

    eess.SY cs.MA

    Joint Routing and Control Optimization in VANET

    Authors: Chen Huang, Dingxuan Wang, Ronghui Hou

    Abstract: In this paper, we introduce DynaRoute, an adaptive joint optimization framework for dynamic vehicular networks that simultaneously addresses platoon control and data transmission through trajectory-aware routing and safety-constrained vehicle coordination. DynaRoute guarantees continuous vehicle movement via platoon safety control with optimizing transmission paths through real-time trajectory pre…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 11 pages; 10 figures

  47. arXiv:2506.07503  [pdf]

    cs.SE

    Large Language Models for Multilingual Vulnerability Detection: How Far Are We?

    Authors: Honglin Shu, Michael Fu, Junji Yu, Dong Wang, Chakkrit Tantithamthavorn, Junjie Chen, Yasutaka Kamei

    Abstract: Various deep learning-based approaches utilizing pre-trained language models (PLMs) have been proposed for automated vulnerability detection. With recent advancements in large language models (LLMs), several studies have begun exploring their application to vulnerability detection tasks. However, existing studies primarily focus on specific programming languages (e.g., C/C++) and function-level de…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 33 pages, 9 figures

  48. arXiv:2506.07184  [pdf, ps, other]

    cs.AI cs.CL cs.CV

    Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images

    Authors: Liangliang You, Junchi Yao, Shu Yang, Guimin Hu, Lijie Hu, Di Wang

    Abstract: While multimodal large language models excel at various tasks, they still suffer from hallucinations, which limit their reliability and scalability for broader domain applications. To address this issue, recent research mainly focuses on objective hallucination. However, for sequential images, besides objective hallucination, there is also behavioral hallucination, which is less studied. This work…

    Submitted 8 June, 2025; originally announced June 2025.

  49. arXiv:2506.07180  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs

    Authors: Wenrui Zhou, Shu Yang, Qingsong Yang, Zikun Guo, Lijie Hu, Di Wang

    Abstract: As video large language models (Video-LLMs) become increasingly integrated into real-world applications that demand grounded multimodal reasoning, ensuring their factual consistency and reliability is of critical importance. However, sycophancy, the tendency of these models to align with user input even when it contradicts the visual evidence, undermines their trustworthiness in such contexts. Cur…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 24 pages

  50. arXiv:2506.07168  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Efficient Text-Attributed Graph Learning through Selective Annotation and Graph Alignment

    Authors: Huanyi Xie, Lijie Hu, Lu Yu, Tianhao Huang, Longfei Li, Meng Li, Jun Zhou, Huan Wang, Di Wang

    Abstract: In the realm of Text-attributed Graphs (TAGs), traditional graph neural networks (GNNs) often fall short due to the complex textual information associated with each node. Recent methods have improved node representations by leveraging large language models (LLMs) to enhance node text features, but these approaches typically require extensive annotations or fine-tuning across all nodes, which is bo…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 23 pages