Showing 1–50 of 2,705 results for author: zhang, Q

Searching in archive cs.
  1. arXiv:2509.04415  [pdf]

    cs.LG

    Interpretable Clustering with Adaptive Heterogeneous Causal Structure Learning in Mixed Observational Data

    Authors: Wenrui Li, Qinghao Zhang, Xiaowo Wang

    Abstract: Understanding causal heterogeneity is essential for scientific discovery in domains such as biology and medicine. However, existing methods lack causal awareness, with insufficient modeling of heterogeneity, confounding, and observational constraints, leading to poor interpretability and difficulty distinguishing true causal heterogeneity from spurious associations. We propose an unsupervised fram… ▽ More

    Submitted 4 September, 2025; originally announced September 2025.

  2. arXiv:2509.04292  [pdf, ps, other]

    cs.CL

    Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

    Authors: Qinyan Zhang, Xinping Lei, Ruijie Miao, Yu Fu, Haojie Fan, Le Chang, Jiafan Hou, Dingling Zhang, Zhongfei Hou, Ziqiang Yang, Changxin Pu, Fei Hu, Jingkai Liu, Mengyun Liu, Yang Liu, Xiang Gao, Jiaheng Liu, Tong Yang, Zaiyuan Wang, Ge Zhang, Wenhao Huang

    Abstract: Large Language Models (LLMs) achieve strong performance on diverse tasks but often exhibit cognitive inertia, struggling to follow instructions that conflict with the standardized patterns learned during supervised fine-tuning (SFT). To evaluate this limitation, we propose Inverse IFEval, a benchmark that measures models' Counter-intuitive Ability: their capacity to override training-induced biases a… ▽ More

    Submitted 4 September, 2025; originally announced September 2025.

  3. arXiv:2509.04093  [pdf, ps, other]

    cs.SD

    Open-Source Full-Duplex Conversational Datasets for Natural and Interactive Speech Synthesis

    Authors: Zhitong Zhou, Qingqing Zhang, Lei Luo, Jiechen Liu, Ruohua Zhou

    Abstract: Full-duplex, spontaneous conversational data are essential for enhancing the naturalness and interactivity of synthesized speech in conversational TTS systems. We present two open-source dual-track conversational speech datasets, one in Chinese and one in English, designed to enhance the naturalness of synthesized speech by providing more realistic conversational data. The two datasets contain a t… ▽ More

    Submitted 4 September, 2025; originally announced September 2025.

  4. arXiv:2509.03887  [pdf, ps, other]

    cs.CV

    OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction

    Authors: Bu Jin, Songen Gu, Xiaotao Hu, Yupeng Zheng, Xiaoyang Guo, Qian Zhang, Xiaoxiao Long, Wei Yin

    Abstract: In this paper, we propose OccTENS, a generative occupancy world model that enables controllable, high-fidelity long-term occupancy generation while maintaining computational efficiency. Different from visual generation, the occupancy world model must capture the fine-grained 3D geometry and dynamic evolution of the 3D scenes, posing great challenges for the generative models. Recent approaches bas… ▽ More

    Submitted 4 September, 2025; originally announced September 2025.

  5. arXiv:2509.03244  [pdf, ps, other]

    cs.LG cs.AI

    FoMEMO: Towards Foundation Models for Expensive Multi-objective Optimization

    Authors: Yiming Yao, Fei Liu, Liang Zhao, Xi Lin, Qingfu Zhang

    Abstract: Expensive multi-objective optimization is a prevalent and crucial concern in many real-world scenarios, where sample-efficiency is vital due to the limited evaluations to recover the true Pareto front for decision making. Existing works either involve rebuilding Gaussian process surrogates from scratch for each objective in each new problem encountered, or rely on extensive past domain experiments… ▽ More

    Submitted 3 September, 2025; originally announced September 2025.

  6. arXiv:2509.01875  [pdf, ps, other]

    eess.SY cs.LG

    RadioDiff-Loc: Diffusion Model Enhanced Scattering Congnition for NLoS Localization with Sparse Radio Map Estimation

    Authors: Xiucheng Wang, Qiming Zhang, Nan Cheng

    Abstract: Accurate localization of non-cooperative signal sources in non-line-of-sight (NLoS) environments remains a critical challenge with a wide range of applications, including autonomous navigation, industrial automation, and emergency response. In such settings, traditional positioning techniques relying on line-of-sight (LoS) or cooperative signaling fail due to severe multipath propagation and unkno… ▽ More

    Submitted 4 September, 2025; v1 submitted 1 September, 2025; originally announced September 2025.

  7. arXiv:2509.01364  [pdf, ps, other]

    cs.RO

    TopoNav: Topological Graphs as a Key Enabler for Advanced Object Navigation

    Authors: Peiran Liu, Qiang Zhang, Daojie Peng, Lingfeng Zhang, Yihao Qin, Hang Zhou, Jun Ma, Renjing Xu, Yiding Ji

    Abstract: Object Navigation (ObjectNav) has made great progress with large language models (LLMs), but still faces challenges in memory management, especially in long-horizon tasks and dynamic scenes. To address this, we propose TopoNav, a new framework that leverages topological structures as spatial memory. By building and updating a topological graph that captures scene connections, adjacency, and semant… ▽ More

    Submitted 1 September, 2025; originally announced September 2025.

  8. arXiv:2509.01106  [pdf, ps, other]

    cs.AI cs.CV cs.RO

    Robix: A Unified Model for Robot Interaction, Reasoning and Planning

    Authors: Huang Fang, Mengxi Zhang, Heng Dong, Wei Li, Zixuan Wang, Qifeng Zhang, Xueyun Tian, Yucheng Hu, Hang Li

    Abstract: We introduce Robix, a unified model that integrates robot reasoning, task planning, and natural language interaction within a single vision-language architecture. Acting as the high-level cognitive layer in a hierarchical robot system, Robix dynamically generates atomic commands for the low-level controller and verbal responses for human interaction, enabling robots to follow complex instructions,… ▽ More

    Submitted 31 August, 2025; originally announced September 2025.

    Comments: Tech report. Project page: https://robix-seed.github.io/robix/

  9. arXiv:2508.21444  [pdf, ps, other]

    cs.CV

    Scale-GS: Efficient Scalable Gaussian Splatting via Redundancy-filtering Training on Streaming Content

    Authors: Jiayu Yang, Weijian Su, Songqian Zhang, Yuqi Han, Jinli Suo, Qiang Zhang

    Abstract: 3D Gaussian Splatting (3DGS) enables high-fidelity real-time rendering, a key requirement for immersive applications. However, the extension of 3DGS to dynamic scenes remains limited by the substantial data volume of dense Gaussians and the prolonged training time required for each frame. This paper presents Scale-GS, a scalable Gaussian Splatting framework designed for efficient training in streami… ▽ More

    Submitted 29 August, 2025; originally announced August 2025.

  10. arXiv:2508.21044  [pdf, ps, other]

    cs.CV

    MMG-Vid: Maximizing Marginal Gains at Segment-level and Token-level for Efficient Video LLMs

    Authors: Junpeng Ma, Qizhe Zhang, Ming Lu, Zhibin Wang, Qiang Zhou, Jun Song, Shanghang Zhang

    Abstract: Video Large Language Models (VLLMs) excel in video understanding, but their excessive visual tokens pose a significant computational challenge for real-world applications. Current methods aim to enhance inference efficiency by visual token pruning. However, they do not consider the dynamic characteristics and temporal dependencies of video frames, as they perceive video understanding as a multi-fr… ▽ More

    Submitted 28 August, 2025; originally announced August 2025.

    Comments: 10 pages, 3 figures

  11. arXiv:2508.20594  [pdf, ps, other]

    cs.CV

    UTA-Sign: Unsupervised Thermal Video Augmentation via Event-Assisted Traffic Signage Sketching

    Authors: Yuqi Han, Songqian Zhang, Weijian Su, Ke Li, Jiayu Yang, Jinli Suo, Qiang Zhang

    Abstract: The thermal camera excels at perceiving outdoor environments under low-light conditions, making it ideal for applications such as nighttime autonomous driving and unmanned navigation. However, thermal cameras encounter challenges when capturing signage from objects made of similar materials, which can pose safety risks for accurately understanding semantics in autonomous driving systems. In contra… ▽ More

    Submitted 28 August, 2025; originally announced August 2025.

  12. arXiv:2508.20373  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Graph-R1: Unleashing LLM Reasoning with NP-Hard Graph Problems

    Authors: Yuyao Wang, Bowen Liu, Jianheng Tang, Nuo Chen, Yuhan Li, Qifan Zhang, Jia Li

    Abstract: Reasoning Large Language Models (RLLMs) have recently achieved remarkable progress on complex reasoning tasks, largely enabled by their long chain-of-thought (Long CoT) capabilities. However, developing these Long CoT behaviors relies heavily on post-training with high-quality datasets, which are typically costly and human-curated (e.g., mathematics and code), leaving scalable alternatives unexplo… ▽ More

    Submitted 27 August, 2025; originally announced August 2025.

  13. arXiv:2508.20133  [pdf, ps, other]

    cs.CY

    Proactive HIV Care: AI-Based Comorbidity Prediction from Routine EHR Data

    Authors: Solomon Russom, Dimitrios Kollias, Qianni Zhang

    Abstract: People living with HIV face a high burden of comorbidities, yet early detection is often limited by symptom-driven screening. We evaluate the potential of AI to predict multiple comorbidities from routinely collected Electronic Health Records. Using data from 2,200 HIV-positive patients in South East London, comprising 30 laboratory markers and 7 demographic/social attributes, we compare demograph… ▽ More

    Submitted 29 August, 2025; v1 submitted 26 August, 2025; originally announced August 2025.

    Comments: accepted at ICCV 2025

  14. arXiv:2508.19957  [pdf, ps, other]

    cs.CE

    Multi-field decomposed hyper-reduced order modeling of damage-plasticity simulations

    Authors: Jannick Kehls, Stephan Ritzert, Lars Breuer, Qinghua Zhang, Stefanie Reese, Tim Brepols

    Abstract: This paper presents a multi-field decomposed approach for hyper-reduced order modeling to overcome the limitations of traditional model reduction techniques for gradient-extended damage-plasticity simulations. The discrete empirical interpolation method (DEIM) and the energy-conserving sampling and weighting method (ECSW) are extended to account for the multi-field nature of the problem. Both meth… ▽ More

    Submitted 27 August, 2025; originally announced August 2025.

  15. arXiv:2508.19855  [pdf, ps, other]

    cs.IR

    Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning

    Authors: Junnan Dong, Siyu An, Yifei Yu, Qian-Wen Zhang, Linhao Luo, Xiao Huang, Yunsheng Wu, Di Yin, Xing Sun

    Abstract: Graph retrieval-augmented generation (GraphRAG) has effectively enhanced large language models in complex reasoning by organizing fragmented knowledge into explicitly structured graphs. Prior efforts have been made to improve either graph construction or graph retrieval in isolation, yielding suboptimal performance, especially when domain shifts occur. In this paper, we propose a vertically unifie… ▽ More

    Submitted 2 September, 2025; v1 submitted 27 August, 2025; originally announced August 2025.

    Comments: 19 pages, 7 figures, 6 tables

  16. arXiv:2508.18616  [pdf, ps, other]

    cs.DB

    Optimal $(α,β)$-Dense Subgraph Search in Bipartite Graphs

    Authors: Yalong Zhang, Rong-Hua Li, Qi Zhang, Guoren Wang

    Abstract: Dense subgraph search in bipartite graphs is a fundamental problem in graph analysis, with wide-ranging applications in fraud detection, recommendation systems, and social network analysis. The recently proposed $(α, β)$-dense subgraph model has demonstrated superior capability in capturing the intrinsic density structure of bipartite graphs compared to existing alternatives. However, despite its… ▽ More

    Submitted 25 August, 2025; originally announced August 2025.

  17. Multi-Resolution Codebook Design and Multiuser Interference Management for Discrete XL-RIS-Aided Near-Field MIMO Systems

    Authors: Qian Zhang, Zheng Dong, Zheng Dong, Yao Ge, Yong Liang Guan, Ju Liu, Chau Yuen

    Abstract: Extremely large-scale reconfigurable intelligent surface (XL-RIS) can effectively overcome severe fading and provide higher communication performance. However, current research on XL-RIS overlooks the discrete phase-shift characteristics of RIS in practical systems, which will result in significant performance degradation. In this paper, we investigate near-field communication schemes assisted by X… ▽ More

    Submitted 25 August, 2025; originally announced August 2025.

    Journal ref: IEEE Transactions on Wireless Communications, 2025

  18. arXiv:2508.18506  [pdf, ps, other]

    cs.CV

    DoGFlow: Self-Supervised LiDAR Scene Flow via Cross-Modal Doppler Guidance

    Authors: Ajinkya Khoche, Qingwen Zhang, Yixi Cai, Sina Sharif Mansouri, Patric Jensfelt

    Abstract: Accurate 3D scene flow estimation is critical for autonomous systems to navigate dynamic environments safely, but creating the necessary large-scale, manually annotated datasets remains a significant bottleneck for developing robust perception models. Current self-supervised methods struggle to match the performance of fully supervised approaches, especially in challenging long-range and adverse w… ▽ More

    Submitted 25 August, 2025; originally announced August 2025.

    Comments: Under Review

  19. arXiv:2508.18486  [pdf, ps, other]

    physics.ao-ph cs.LG

    Huracan: A skillful end-to-end data-driven system for ensemble data assimilation and weather prediction

    Authors: Zekun Ni, Jonathan Weyn, Hang Zhang, Yanfei Xiang, Jiang Bian, Weixin Jin, Kit Thambiratnam, Qi Zhang, Haiyu Dong, Hongyu Sun

    Abstract: Over the past few years, machine learning-based data-driven weather prediction has been transforming operational weather forecasting by providing more accurate forecasts while using a mere fraction of computing power compared to traditional numerical weather prediction (NWP). However, those models still rely on initial conditions from NWP, putting an upper limit on their forecast abilities. A few… ▽ More

    Submitted 25 August, 2025; originally announced August 2025.

  20. arXiv:2508.18040  [pdf, ps, other]

    cs.AI

    PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration

    Authors: Xin Wang, Zhiyao Cui, Hao Li, Ya Zeng, Chenxu Wang, Ruiqi Song, Yihang Chen, Kun Shao, Qiaosheng Zhang, Jinzhuo Liu, Siyue Ren, Shuyue Hu, Zhen Wang

    Abstract: Vision language model (VLM)-based mobile agents show great potential for assisting users in performing instruction-driven tasks. However, these agents typically struggle with personalized instructions -- those containing ambiguous, user-specific context -- a challenge that has been largely overlooked in previous research. In this paper, we define personalized instructions and introduce PerInstruct… ▽ More

    Submitted 25 August, 2025; originally announced August 2025.

  21. arXiv:2508.17972  [pdf, ps, other]

    cs.CV

    SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization

    Authors: Junyuan Deng, Heng Li, Tao Xie, Weiqiang Ren, Qian Zhang, Ping Tan, Xiaoyang Guo

    Abstract: Scene regression methods, such as VGGT, solve the Structure-from-Motion (SfM) problem by directly regressing camera poses and 3D scene structures from input images. They demonstrate impressive performance in handling images under extreme viewpoint changes. However, these methods struggle to handle a large number of input images. To address this problem, we introduce SAIL-Recon, a feed-forward Tran… ▽ More

    Submitted 25 August, 2025; originally announced August 2025.

  22. arXiv:2508.17771  [pdf, ps, other]

    cs.CL

    Speculating LLMs' Chinese Training Data Pollution from Their Tokens

    Authors: Qingjie Zhang, Di Wang, Haoting Qian, Liu Yan, Tianwei Zhang, Ke Xu, Qi Li, Minlie Huang, Hewu Li, Han Qiu

    Abstract: Tokens are basic elements in the datasets for LLM training. It is well known that many tokens representing Chinese phrases in the vocabulary of GPT (4o/4o-mini/o1/o3/4.5/4.1/o4-mini) indicate content such as pornography or online gambling. Based on this observation, our goal is to locate Polluted Chinese (PoC) tokens in LLMs and study the relationship between PoC tokens' existence and training… ▽ More

    Submitted 25 August, 2025; originally announced August 2025.

  23. arXiv:2508.17556  [pdf, ps, other]

    cs.DB

    SEFRQO: A Self-Evolving Fine-Tuned RAG-Based Query Optimizer

    Authors: Hanwen Liu, Qihan Zhang, Ryan Marcus, Ibrahim Sabek

    Abstract: Query optimization is a crucial problem in database systems that has been studied for decades. Learned query optimizers (LQOs) can improve performance over time by incorporating feedback; however, they suffer from cold-start issues and often require retraining when workloads shift or schemas change. Recent LLM-based query optimizers leverage pre-trained and fine-tuned LLMs to mitigate these challe… ▽ More

    Submitted 24 August, 2025; originally announced August 2025.

    Comments: To appear at SIGMOD 2026 (https://2026.sigmod.org/)

  24. arXiv:2508.17436  [pdf, ps, other]

    cs.CV

    Disentangled Geometry and Appearance for Efficient Multi-View Surface Reconstruction and Rendering

    Authors: Qitong Zhang, Jieqing Feng

    Abstract: This paper addresses the limitations of neural rendering-based multi-view surface reconstruction methods, which require an additional mesh extraction step that is inconvenient and would produce poor-quality surfaces with mesh aliasing, restricting downstream applications. Building on the explicit mesh representation and differentiable rasterization framework, this work proposes an efficient soluti… ▽ More

    Submitted 24 August, 2025; originally announced August 2025.

  25. arXiv:2508.17434  [pdf, ps, other]

    cs.CV

    TinySR: Pruning Diffusion for Real-World Image Super-Resolution

    Authors: Linwei Dong, Qingnan Fan, Yuhang Yu, Qi Zhang, Jinwei Chen, Yawei Luo, Changqing Zou

    Abstract: Real-world image super-resolution (Real-ISR) focuses on recovering high-quality images from low-resolution inputs that suffer from complex degradations like noise, blur, and compression. Recently, diffusion models (DMs) have shown great potential in this area by leveraging strong generative priors to restore fine details. However, their iterative denoising process incurs high computational overhea… ▽ More

    Submitted 24 August, 2025; originally announced August 2025.

  26. arXiv:2508.17213  [pdf, ps, other]

    cs.CV

    Multi-modal Knowledge Decomposition based Online Distillation for Biomarker Prediction in Breast Cancer Histopathology

    Authors: Qibin Zhang, Xinyu Hao, Qiao Chen, Rui Xu, Fengyu Cong, Cheng Lu, Hongming Xu

    Abstract: Immunohistochemical (IHC) biomarker prediction benefits from multi-modal data fusion analysis. However, the simultaneous acquisition of multi-modal data, such as genomic and pathological information, is often challenging due to cost or technical limitations. To address this challenge, we propose an online distillation approach based on Multi-modal Knowledge Decomposition (MKD) to enhance IHC bioma… ▽ More

    Submitted 24 August, 2025; originally announced August 2025.

    Comments: Accepted at MICCAI 2025

  27. arXiv:2508.17054  [pdf, ps, other]

    cs.CV cs.RO

    DeltaFlow: An Efficient Multi-frame Scene Flow Estimation Method

    Authors: Qingwen Zhang, Xiaomeng Zhu, Yushan Zhang, Yixi Cai, Olov Andersson, Patric Jensfelt

    Abstract: Previous dominant methods for scene flow estimation focus mainly on input from two consecutive frames, neglecting valuable information in the temporal domain. While recent trends shift towards multi-frame reasoning, they suffer from rapidly escalating computational costs as the number of frames grows. To leverage temporal information more efficiently, we propose DeltaFlow ($Δ$Flow), a lightweight… ▽ More

    Submitted 23 August, 2025; originally announced August 2025.

    Comments: 17 pages (9 main pages + 8 supp material), 11 figures, code at https://github.com/Kin-Zhang/DeltaFlow

  28. arXiv:2508.16943  [pdf, ps, other]

    cs.RO cs.AI

    HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement

    Authors: Haozhuo Zhang, Jingkai Sun, Michele Caprio, Jian Tang, Shanghang Zhang, Qiang Zhang, Wei Pan

    Abstract: We introduce HumanoidVerse, a novel framework for vision-language guided humanoid control that enables a single physically simulated robot to perform long-horizon, multi-object rearrangement tasks across diverse scenes. Unlike prior methods that operate in fixed settings with single-object interactions, our approach supports consecutive manipulation of multiple objects, guided only by natural lang… ▽ More

    Submitted 23 August, 2025; originally announced August 2025.

    Comments: Project Page: https://haozhuo-zhang.github.io/HumanoidVerse-project-page/

  29. arXiv:2508.16917  [pdf, ps, other]

    cs.CV

    Structural Energy-Guided Sampling for View-Consistent Text-to-3D

    Authors: Qing Zhang, Jinguang Tong, Jie Hong, Jing Zhang, Xuesong Li

    Abstract: Text-to-3D generation often suffers from the Janus problem, where objects look correct from the front but collapse into duplicated or distorted geometry from other angles. We attribute this failure to viewpoint bias in 2D diffusion priors, which propagates into 3D optimization. To address this, we propose Structural Energy-Guided Sampling (SEGS), a training-free, plug-and-play framework that enfor… ▽ More

    Submitted 23 August, 2025; originally announced August 2025.

  30. Ethical Considerations of Large Language Models in Game Playing

    Authors: Qingquan Zhang, Yuchen Li, Bo Yuan, Julian Togelius, Georgios N. Yannakakis, Jialin Liu

    Abstract: Large language models (LLMs) have demonstrated tremendous potential in game playing, while little attention has been paid to their ethical implications in those contexts. This work investigates and analyses the ethical considerations of applying LLMs in game playing, using Werewolf, also known as Mafia, as a case study. Gender bias, which affects game fairness and player experience, has been obser… ▽ More

    Submitted 21 August, 2025; originally announced August 2025.

    Comments: 19 pages

    Journal ref: Frontiers of Computer Science (2025)

  31. arXiv:2508.15919  [pdf, ps, other]

    cs.DC cs.AI

    HyperFlexis: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling

    Authors: Zahra Yousefijamarani, Xinglu Wang, Qian Wang, Morgan Lindsay Heisler, Taha Shabani, Niloofar Gholipour, Parham Yassini, Hong Chang, Kan Chen, Qiantao Zhang, Xiaolong Bai, Jiannan Wang, Ying Xiong, Yong Zhang, Zhenan Fan

    Abstract: Modern large language model (LLM) serving systems face challenges from highly variable requests with diverse lengths, priorities, and stage-specific service-level objectives (SLOs). Meeting these requires real-time scheduling, rapid and cost-effective scaling, and support for both collocated and disaggregated Prefill/Decode (P/D) architectures. We present HyperFlexis, a unified LLM serv… ▽ More

    Submitted 21 August, 2025; originally announced August 2025.

  32. arXiv:2508.15763  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    Intern-S1: A Scientific Multimodal Foundation Model

    Authors: Lei Bai, Zhongrui Cai, Yuhang Cao, Maosong Cao, Weihan Cao, Chiyu Chen, Haojiong Chen, Kai Chen, Pengcheng Chen, Ying Chen, Yongkang Chen, Yu Cheng, Pei Chu, Tao Chu, Erfei Cui, Ganqu Cui, Long Cui, Ziyun Cui, Nianchen Deng, Ning Ding, Nanqing Dong, Peijie Dong, Shihan Dou, Sinan Du, Haodong Duan, et al. (152 additional authors not shown)

    Abstract: In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared… ▽ More

    Submitted 24 August, 2025; v1 submitted 21 August, 2025; originally announced August 2025.

  33. arXiv:2508.15305  [pdf, ps, other]

    cs.AI

    Coarse-to-Fine Grounded Memory for LLM Agent Planning

    Authors: Wei Yang, Jinwei Xiao, Hongming Zhang, Qingyang Zhang, Yanna Wang, Bo Xu

    Abstract: Recent advancements in Large Language Models (LLMs) have driven growing interest in LLM-based agents for complex planning tasks. To avoid costly agent training, many studies have adopted memory mechanisms that enhance LLMs with offline experiences or online trajectory analysis. However, existing works focus on single-granularity memory derived from dynamic environmental interactions, which are inherentl… ▽ More

    Submitted 21 August, 2025; originally announced August 2025.

    Comments: Accepted to EMNLP 2025 Main Conference; 27 pages, 15 figures

  34. arXiv:2508.15214  [pdf, ps, other]

    cs.CL

    Self-Guided Function Calling in Large Language Models via Stepwise Experience Recall

    Authors: Sijia Cui, Aiyao He, Shuai Xu, Hongming Zhang, Yanna Wang, Qingyang Zhang, Yajing Wang, Bo Xu

    Abstract: Function calling enables large language models (LLMs) to interact with external systems by leveraging tools and APIs. When faced with multi-step tool usage, LLMs still struggle with tool selection, parameter generation, and tool-chain planning. Existing methods typically rely on manually designing task-specific demonstrations, or retrieving from a curated library. These approaches demand substanti… ▽ More

    Submitted 20 August, 2025; originally announced August 2025.

    Comments: Accepted to EMNLP 2025

  35. arXiv:2508.14896  [pdf, ps, other]

    cs.CL cs.AI

    Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

    Authors: Haokun Lin, Haobo Xu, Yichen Wu, Ziyu Guo, Renrui Zhang, Zhichao Lu, Ying Wei, Qingfu Zhang, Zhenan Sun

    Abstract: Recent advances in diffusion large language models (dLLMs) have introduced a promising alternative to autoregressive (AR) LLMs for natural language generation tasks, leveraging full attention and denoising-based decoding strategies. However, the deployment of these models on edge devices remains challenging due to their massive parameter scale and high resource demands. While post-training quantiz… ▽ More

    Submitted 20 August, 2025; originally announced August 2025.

    Comments: Technical Report, Work in Progress

  36. arXiv:2508.14848  [pdf, ps, other]

    cs.DC

    Leveraging Hardware-Aware Computation in Mixed-Precision Matrix Multiply: A Tile-Centric Approach

    Authors: Qiao Zhang, Rabab Alomairy, Dali Wang, Zhuowei Gu, Qinglei Cao

    Abstract: General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic necessitates a reevaluation of numerical algorithms to leverage mixed-precision computations, achieving improved performance and energy efficiency. This research… ▽ More

    Submitted 20 August, 2025; originally announced August 2025.

  37. arXiv:2508.14415  [pdf, ps, other]

    cs.AI

    The Agent Behavior: Model, Governance and Challenges in the AI Digital Age

    Authors: Qiang Zhang, Pei Yan, Yijia Xu, Chuanpo Fu, Yong Fang, Yang Liu

    Abstract: Advancements in AI have led to agents in networked environments increasingly mirroring human behavior, thereby blurring the boundary between artificial and human actors in specific contexts. This shift brings about significant challenges in trust, responsibility, ethics, security, etc. The difficulty of supervising agent behaviors may lead to issues such as data contamination and unclear acc… ▽ More

    Submitted 20 August, 2025; originally announced August 2025.

  38. arXiv:2508.14313  [pdf, ps, other]

    cs.LG cs.AI

    Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS

    Authors: Can Jin, Yang Zhou, Qixin Zhang, Hongwu Peng, Di Zhang, Marco Pavone, Ligong Han, Zhang-Wei Hong, Tong Che, Dimitris N. Metaxas

    Abstract: Test-time scaling (TTS) for large language models (LLMs) has thus far fallen into two largely separate paradigms: (1) reinforcement learning (RL) methods that optimize sparse outcome-based rewards, yet suffer from instability and low sample efficiency; and (2) search-based techniques guided by independently trained, static process reward models (PRMs), which require expensive human- or LLM-generat… ▽ More

    Submitted 22 August, 2025; v1 submitted 19 August, 2025; originally announced August 2025.

  39. arXiv:2508.14111  [pdf, ps, other]

    cs.LG

    From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

    Authors: Jiaqi Wei, Yuejin Yang, Xiang Zhang, Yuhan Chen, Xiang Zhuang, Zhangyang Gao, Dongzhan Zhou, Guangshuai Wang, Zhiqiang Gao, Juntai Cao, Zijie Qiu, Xuming He, Qiang Zhang, Chenyu You, Shuangjia Zheng, Ning Ding, Wanli Ouyang, Nanqing Dong, Yu Cheng, Siqi Sun, Lei Bai, Bowen Zhou

    Abstract: Artificial intelligence (AI) is reshaping scientific discovery, evolving from specialized computational tools into autonomous research partners. We position Agentic Science as a pivotal stage within the broader AI for Science paradigm, where AI systems progress from partial assistance to full scientific agency. Enabled by large language models (LLMs), multimodal systems, and integrated research pl… ▽ More

    Submitted 18 August, 2025; originally announced August 2025.

  40. arXiv:2508.14074  [pdf, ps, other]

    cs.LG cs.AI

    GEPD:GAN-Enhanced Generalizable Model for EEG-Based Detection of Parkinson's Disease

    Authors: Qian Zhang, Ruilin Zhang, Biaokai Zhu, Xun Han, Jun Xiao, Yifan Liu, Zhe Wang

    Abstract: Electroencephalography has been established as an effective method for detecting Parkinson's disease, typically diagnosed early. Current Parkinson's disease detection methods have shown significant success within individual datasets; however, the variability in detection methods across different EEG datasets and the small size of each dataset pose challenges for training a generalizable model for c… ▽ More

    Submitted 12 August, 2025; originally announced August 2025.

    Comments: Accepted by International Conference on Intelligent Computing (ICIC 2025)

  41. arXiv:2508.14073  [pdf, ps, other]

    cs.LG cs.AI

    MCLPD:Multi-view Contrastive Learning for EEG-based PD Detection Across Datasets

    Authors: Qian Zhang, Ruilin Zhang, Jun Xiao, Yifan Liu, Zhe Wang

    Abstract: Electroencephalography has been validated as an effective technique for detecting Parkinson's disease, particularly in its early stages. However, the high cost of EEG data annotation often results in limited dataset size and considerable discrepancies across datasets, including differences in acquisition protocols and subject demographics, significantly hinder the robustness and generalizability of mod… ▽ More

    Submitted 21 August, 2025; v1 submitted 12 August, 2025; originally announced August 2025.

    Comments: Accepted by European Conference on Artificial Intelligence (ECAI 2025)

  42. arXiv:2508.13979  [pdf, ps, other]

    cs.LG

    AutoScale: Linear Scalarization Guided by Multi-Task Optimization Metrics

    Authors: Yi Yang, Kei Ikemura, Qingwen Zhang, Xiaomeng Zhu, Ci Li, Nazre Batool, Sina Sharif Mansouri, John Folkesson

    Abstract: Recent multi-task learning studies suggest that linear scalarization, when using well-chosen fixed task weights, can achieve performance comparable to or even better than that of complex multi-task optimization (MTO) methods. It remains unclear why certain weights yield optimal performance and how to determine these weights without relying on exhaustive hyperparameter search. This paper establishes a dire…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: The first two authors hold equal contribution. 10 pages, 6 figures

  43. arXiv:2508.13515  [pdf, ps, other]

    cs.CV

    2D Gaussians Meet Visual Tokenizer

    Authors: Yiang Shi, Xiaoyang Guo, Wei Yin, Mingkai Jia, Qian Zhang, Xiaolin Hu, Wenyu Liu, Xinggang Wang

    Abstract: The image tokenizer is a critical component in AR image generation, as it determines how rich and structured visual content is encoded into compact representations. Existing quantization-based tokenizers such as VQ-GAN primarily focus on appearance features like texture and color, often neglecting geometric structures due to their patch-based design. In this work, we explored how to incorporate mo…

    Submitted 19 August, 2025; v1 submitted 19 August, 2025; originally announced August 2025.

  44. arXiv:2508.13156  [pdf, ps, other]

    cs.AR cs.AI

    EvoVerilog: Large Language Model Assisted Evolution of Verilog Code

    Authors: Ping Guo, Yiting Wang, Wanghao Ye, Yexiao He, Ziyao Wang, Xiaopeng Dai, Ang Li, Qingfu Zhang

    Abstract: Large Language Models (LLMs) have demonstrated great potential in automating the generation of Verilog hardware description language code for hardware design. This automation is critical to reducing human effort in the complex and error-prone process of hardware design. However, existing approaches predominantly rely on human intervention and fine-tuning using curated datasets, limiting their sc…

    Submitted 26 June, 2025; originally announced August 2025.

  45. arXiv:2508.12433  [pdf, ps, other]

    cs.AR

    ATLAS: A Self-Supervised and Cross-Stage Netlist Power Model for Fine-Grained Time-Based Layout Power Analysis

    Authors: Wenkai Li, Yao Lu, Wenji Fang, Jing Wang, Qijun Zhang, Zhiyao Xie

    Abstract: Accurate power prediction in VLSI design is crucial for effective power optimization, especially as designs get transformed from gate-level netlist to layout stages. However, traditional accurate power simulation requires time-consuming back-end processing and simulation steps, which significantly impede design optimization. To address this, we propose ATLAS, which can predict the ultimate time-ba…

    Submitted 17 August, 2025; originally announced August 2025.

    Comments: Accepted by Design Automation Conference (DAC), 2025

  46. arXiv:2508.12294  [pdf, ps, other]

    cs.AR

    AutoPower: Automated Few-Shot Architecture-Level Power Modeling by Power Group Decoupling

    Authors: Qijun Zhang, Yao Lu, Mengming Li, Zhiyao Xie

    Abstract: Power efficiency is a critical design objective in modern CPU design. Architects need a fast yet accurate architecture-level power evaluation tool to perform early-stage power estimation. However, traditional analytical architecture-level power models are inaccurate. The recently proposed machine learning (ML)-based architecture-level power model requires sufficient data from known configurations…

    Submitted 17 August, 2025; originally announced August 2025.

    Comments: Published in DAC'25

  47. arXiv:2508.12250  [pdf, ps, other]

    cs.CV

    WXSOD: A Benchmark for Robust Salient Object Detection in Adverse Weather Conditions

    Authors: Quan Chen, Xiong Yang, Rongfeng Lu, Qianyu Zhang, Yu Liu, Xiaofei Zhou, Bolun Zheng

    Abstract: Salient object detection (SOD) in complex environments remains a challenging research topic. Most existing methods perform well in natural scenes with negligible noise, and tend to leverage multi-modal information (e.g., depth and infrared) to enhance accuracy. However, few studies address the damage that weather noise does to SOD performance, due to the lack of a dataset with pixel-wise annotati…

    Submitted 17 August, 2025; originally announced August 2025.

    Comments: Under review

  48. arXiv:2508.11442  [pdf, ps, other]

    cs.CL

    CoDiEmb: A Collaborative yet Distinct Framework for Unified Representation Learning in Information Retrieval and Semantic Textual Similarity

    Authors: Bowen Zhang, Zixin Song, Chunquan Chen, Qian-Wen Zhang, Di Yin, Xing Sun

    Abstract: Learning unified text embeddings that excel across diverse downstream tasks is a central goal in representation learning, yet negative transfer remains a persistent obstacle. This challenge is particularly pronounced when jointly training a single encoder for Information Retrieval (IR) and Semantic Textual Similarity (STS), two essential but fundamentally disparate tasks for which naive co-trainin…

    Submitted 15 August, 2025; originally announced August 2025.

  49. arXiv:2508.11289  [pdf, ps, other]

    cs.RO

    A Recursive Total Least Squares Solution for Bearing-Only Target Motion Analysis and Circumnavigation

    Authors: Lin Li, Xueming Liu, Zhoujingzi Qiu, Tianjiang Hu, Qingrui Zhang

    Abstract: Bearing-only Target Motion Analysis (TMA) is a promising technique for passive tracking in various applications as a bearing angle is easy to measure. Despite its advantages, bearing-only TMA is challenging due to the nonlinearity of the bearing measurement model and the lack of range information, which impairs observability and estimator convergence. This paper addresses these issues by proposing…

    Submitted 15 August, 2025; originally announced August 2025.

    Comments: Accepted by 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 6 pages

  50. arXiv:2508.11279  [pdf, ps, other]

    cs.LG cs.CV

    Boosting the Robustness-Accuracy Trade-off of SNNs by Robust Temporal Self-Ensemble

    Authors: Jihang Wang, Dongcheng Zhao, Ruolin Chen, Qian Zhang, Yi Zeng

    Abstract: Spiking Neural Networks (SNNs) offer a promising direction for energy-efficient and brain-inspired computing, yet their vulnerability to adversarial perturbations remains poorly understood. In this work, we revisit the adversarial robustness of SNNs through the lens of temporal ensembling, treating the network as a collection of evolving sub-networks across discrete timesteps. This formulation unc…

    Submitted 15 August, 2025; originally announced August 2025.