Showing 1–50 of 668 results for author: Xiao, L

Searching in archive cs.
  1. arXiv:2509.24460  [pdf, ps, other]

    cs.AI

    ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling

    Authors: Haotian Zhang, Liu Liu, Baosheng Yu, Jiayan Qiu, Likang Xiao, Yanwei Ren, Quan Chen, Xianglong Liu

    Abstract: Process reward models (PRMs) have demonstrated significant efficacy in enhancing the mathematical reasoning capabilities of large language models (LLMs) by leveraging test-time scaling (TTS). However, while most PRMs exhibit substantial gains in mathematical domains, the scarcity of domain-specific training data and knowledge-based learning patterns limits their generalization ability when faced w…

    Submitted 29 September, 2025; originally announced September 2025.

  2. arXiv:2509.24257  [pdf, ps, other]

    cs.CR cs.LG

    VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference

    Authors: Ke Wang, Felix Qu, Libin Xia, Zishuo Zhao, Chris Tong, Lynn Ai, Eric Yang

    Abstract: Decentralized inference is an appealing paradigm for serving large language models (LLMs), offering strong security, high efficiency, and lower operating costs. Yet the permissionless setting admits no a priori trust in participating nodes, making output verifiability a prerequisite for secure deployment. We present VeriLLM, a publicly verifiable protocol for decentralized LLM inference that (i) a…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 13 pages, 4 figures, 2 tables

    ACM Class: C.2.1

  3. arXiv:2509.23938  [pdf, ps, other]

    cs.CL cs.AI

    Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems

    Authors: Guojian Li, Chengyou Wang, Hongfei Xue, Shuiyuan Wang, Dehui Gao, Zihan Zhang, Yuke Lin, Wenjie Li, Longshuai Xiao, Zhonghua Fu, Lei Xie

    Abstract: Full-duplex interaction is crucial for natural human-machine communication, yet remains challenging as it requires robust turn-taking detection to decide when the system should speak, listen, or remain silent. Existing solutions either rely on dedicated turn-taking models, most of which are not open-sourced. The few available ones are limited by their large parameter size or by supporting only a s…

    Submitted 28 September, 2025; originally announced September 2025.

  4. arXiv:2509.23646  [pdf, ps, other]

    cs.CV

    Sparse-Up: Learnable Sparse Upsampling for 3D Generation with High-Fidelity Textures

    Authors: Lu Xiao, Jiale Zhang, Yang Liu, Taicheng Huang, Xin Tian

    Abstract: The creation of high-fidelity 3D assets is often hindered by a 'pixel-level pain point': the loss of high-frequency details. Existing methods often trade off one aspect for another: either sacrificing cross-view consistency, resulting in torn or drifting textures, or remaining trapped by the resolution ceiling of explicit voxels, forfeiting fine texture detail. In this work, we propose Sparse-Up,…

    Submitted 28 September, 2025; originally announced September 2025.

  5. arXiv:2509.23344  [pdf, ps, other]

    cs.CV cs.AI

    DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice

    Authors: Zijie Meng, Jin Hao, Xiwei Dai, Yang Feng, Jiaxiang Liu, Bin Feng, Huikai Wu, Xiaotang Gai, Hengchuan Zhu, Tianxiang Hu, Yangyang Wu, Hongxia Xu, Jin Li, Jun Xiao, Xiaoqiang Liu, Joey Tianyi Zhou, Fudong Zhu, Zhihe Zhao, Lunguo Xia, Bing Fang, Jimeng Sun, Jian Wu, Zuozhu Liu

    Abstract: Diagnosing and managing oral diseases necessitate advanced visual interpretation across diverse imaging modalities and integrated information synthesis. While current AI models excel at isolated tasks, they often fall short in addressing the complex, multimodal requirements of comprehensive clinical dental practice. Here we introduce DentVLM, a multimodal vision-language model engineered for exper…

    Submitted 27 September, 2025; originally announced September 2025.

  6. arXiv:2509.23299  [pdf, ps, other]

    cs.SD eess.AS

    MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow

    Authors: Yike Zhu, Boyi Kang, Ziqian Wang, Xingchen Li, Zihan Zhang, Wenjie Li, Longshuai Xiao, Wei Xue, Lei Xie

    Abstract: Speech enhancement (SE) recovers clean speech from noisy signals and is vital for applications such as telecommunications and automatic speech recognition (ASR). While generative approaches achieve strong perceptual quality, they often rely on multi-step sampling (diffusion/flow-matching) or large language models, limiting real-time deployment. To mitigate these constraints, we present MeanFlowSE,…

    Submitted 30 September, 2025; v1 submitted 27 September, 2025; originally announced September 2025.

    Comments: Submitted to ICASSP 2026

  7. arXiv:2509.22082  [pdf, ps, other]

    cs.LG cs.CR

    Non-Linear Trajectory Modeling for Multi-Step Gradient Inversion Attacks in Federated Learning

    Authors: Li Xia, Zheng Liu, Sili Huang, Wei Tang, Xuan Liu

    Abstract: Federated Learning (FL) preserves privacy by keeping raw data local, yet Gradient Inversion Attacks (GIAs) pose significant threats. In FedAVG multi-step scenarios, attackers observe only aggregated gradients, making data reconstruction challenging. Existing surrogate model methods like SME assume linear parameter trajectories, but we demonstrate this severely underestimates SGD's nonlinear comple…

    Submitted 26 September, 2025; originally announced September 2025.

    ACM Class: K.6.5

  8. arXiv:2509.21322  [pdf, ps, other]

    cs.LG math.PR stat.AP

    Discovering and Analyzing Stochastic Processes to Reduce Waste in Food Retail

    Authors: Anna Kalenkova, Lu Xia, Dirk Neumann

    Abstract: This paper proposes a novel method for analyzing food retail processes with a focus on reducing food waste. The approach integrates object-centric process mining (OCPM) with stochastic process discovery and analysis. First, a stochastic process in the form of a continuous-time Markov chain is discovered from grocery store sales data. This model is then extended with supply activities. Finally, a w…

    Submitted 16 August, 2025; originally announced September 2025.

  9. arXiv:2509.18613  [pdf, ps, other]

    cs.CV

    MLF-4DRCNet: Multi-Level Fusion with 4D Radar and Camera for 3D Object Detection in Autonomous Driving

    Authors: Yuzhi Wu, Li Xiao, Jun Liu, Guangfeng Jiang, XiangGen Xia

    Abstract: The emerging 4D millimeter-wave radar, measuring the range, azimuth, elevation, and Doppler velocity of objects, is recognized for its cost-effectiveness and robustness in autonomous driving. Nevertheless, its point clouds exhibit significant sparsity and noise, restricting its standalone application in 3D object detection. Recent 4D radar-camera fusion methods have provided effective perception.…

    Submitted 23 September, 2025; originally announced September 2025.

  10. arXiv:2509.15437  [pdf, ps, other]

    cs.SD cs.AI cs.CR eess.AS

    Impact of Phonetics on Speaker Identity in Adversarial Voice Attack

    Authors: Daniyal Kabir Dar, Qiben Yan, Li Xiao, Arun Ross

    Abstract: Adversarial perturbations in speech pose a serious threat to automatic speech recognition (ASR) and speaker verification by introducing subtle waveform modifications that remain imperceptible to humans but can significantly alter system outputs. While targeted attacks on end-to-end ASR models have been widely studied, the phonetic basis of these perturbations and their effect on speaker identity r…

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: Additional figures for extended visualization: https://daniyalkabir.github.io/icassp-2025-results/

    ACM Class: I.2.0; I.2.7; I.5.4; K.6.5

  11. arXiv:2509.13785  [pdf, ps, other]

    eess.AS cs.SD

    Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods

    Authors: Bingshen Mu, Pengcheng Guo, Zhaokai Sun, Shuai Wang, Hexin Liu, Mingchen Shao, Lei Xie, Eng Siong Chng, Longshuai Xiao, Qiangze Feng, Daliang Wang

    Abstract: This paper summarizes the Interspeech 2025 Multilingual Conversational Speech Language Model (MLC-SLM) challenge, which aims to advance the exploration of building effective multilingual conversational speech LLMs (SLLMs). We provide a detailed description of the task settings for the MLC-SLM challenge, the released real-world multilingual conversational speech dataset totaling approximately 1,604…

    Submitted 17 September, 2025; originally announced September 2025.

  12. arXiv:2509.10240  [pdf, ps, other]

    cs.IT

    Cooperative Base Station Assignment and Resource Allocation for 6G ISAC Network

    Authors: Jiajia Liao, Luping Xiang, Shida Zhong, Lixia Xiao, Haochen Liu, Kun Yang

    Abstract: In the upcoming 6G networks, integrated sensing and communications (ISAC) will be able to provide a performance boost in both perception and wireless connectivity. This paper considers a multiple base station (BS) architecture to support the comprehensive services of data transmission and multi-target sensing. In this context, a cooperative BS assignment and resource allocation (CBARA) strategy is…

    Submitted 12 September, 2025; originally announced September 2025.

    Comments: 13 pages, 10 figures

  13. arXiv:2509.08739  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Bregman Douglas-Rachford Splitting Method

    Authors: Shiqian Ma, Lin Xiao, Renbo Zhao

    Abstract: In this paper, we propose the Bregman Douglas-Rachford splitting (BDRS) method and its variant Bregman Peaceman-Rachford splitting method for solving maximal monotone inclusion problem. We show that BDRS is equivalent to a Bregman alternating direction method of multipliers (ADMM) when applied to the dual of the problem. A special case of the Bregman ADMM is an alternating direction version of the…

    Submitted 10 September, 2025; originally announced September 2025.

  14. Understanding the Video Content Creation Journey of Creators with Sensory Impairment in Kenya

    Authors: Lan Xiao, Maryam Bandukda, Franklin Mingzhe Li, Mark Colley, Catherine Holloway

    Abstract: Video content creation offers vital opportunities for expression and participation, yet remains largely inaccessible to creators with sensory impairments, especially in low-resource settings. We conducted interviews with 20 video creators with visual and hearing impairments in Kenya to examine their tools, challenges, and collaborative practices. Our findings show that accessibility barriers and i…

    Submitted 9 September, 2025; originally announced September 2025.

  15. arXiv:2509.00793  [pdf, ps, other]

    cs.AI

    Sharpe Ratio Optimization in Markov Decision Processes

    Authors: Shuai Ma, Guangwu Liu, Li Xia

    Abstract: Sharpe ratio (also known as reward-to-variability ratio) is a widely-used metric in finance, which measures the additional return at the cost of per unit of increased risk (standard deviation of return). However, the optimization of Sharpe ratio in Markov decision processes (MDPs) is challenging, because there exist two difficulties hindering the application of dynamic programming. One is that dyn…

    Submitted 31 August, 2025; originally announced September 2025.

  16. arXiv:2508.21476  [pdf, ps, other]

    cs.CL cs.AI

    Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards

    Authors: Xiaolong Wei, Bo Lu, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin

    Abstract: Large Language Models (LLMs) have demonstrated remarkable creative writing capabilities, yet their substantial computational demands hinder widespread use. Enhancing Small Language Models (SLMs) offers a promising alternative, but current methods like Supervised Fine-Tuning (SFT) struggle with novelty, and Reinforcement Learning from Human Feedback (RLHF) is costly. This paper explores two distinc…

    Submitted 29 August, 2025; originally announced August 2025.

    Comments: EMNLP 2025 Main

  17. arXiv:2508.19449  [pdf, ps, other]

    cs.SE cs.LG

    Stack Trace-Based Crash Deduplication with Transformer Adaptation

    Authors: Md Afif Al Mamun, Gias Uddin, Lan Xia, Longyu Zhang

    Abstract: Automated crash reporting systems generate large volumes of duplicate reports, overwhelming issue-tracking systems and increasing developer workload. Traditional stack trace-based deduplication methods, relying on string similarity, rule-based heuristics, or deep learning (DL) models, often fail to capture the contextual and structural relationships within stack traces. We propose dedupT, a transf…

    Submitted 26 August, 2025; originally announced August 2025.

    Comments: This work is currently under review at IEEE Transactions on Software Engineering. The replication package will be made publicly available upon acceptance

  18. arXiv:2508.16860  [pdf, ps, other]

    cs.SE cs.AI cs.LG

    TriagerX: Dual Transformers for Bug Triaging Tasks with Content and Interaction Based Rankings

    Authors: Md Afif Al Mamun, Gias Uddin, Lan Xia, Longyu Zhang

    Abstract: Pretrained Language Models or PLMs are transformer-based architectures that can be used in bug triaging tasks. PLMs can better capture token semantics than traditional Machine Learning (ML) models that rely on statistical features (e.g., TF-IDF, bag of words). However, PLMs may still attend to less relevant tokens in a bug report, which can impact their effectiveness. In addition, the model can be…

    Submitted 22 August, 2025; originally announced August 2025.

    Comments: This work is currently under review at IEEE Transactions on Software Engineering. The replication package will be made publicly available upon acceptance

  19. arXiv:2508.16647  [pdf, ps, other]

    cs.LG

    AdapSNE: Adaptive Fireworks-Optimized and Entropy-Guided Dataset Sampling for Edge DNN Training

    Authors: Boran Zhao, Hetian Liu, Zihang Yuan, Li Zhu, Fan Yang, Lina Xie, Tian Xia, Wenzhe Zhao, Pengju Ren

    Abstract: Training deep neural networks (DNNs) directly on edge devices has attracted increasing attention, as it offers promising solutions to challenges such as domain adaptation and privacy preservation. However, conventional DNN training typically requires large-scale datasets, which imposes prohibitive overhead on edge devices, particularly for emerging large language model (LLM) tasks. To address this…

    Submitted 19 August, 2025; originally announced August 2025.

  20. arXiv:2508.16069  [pdf, ps, other]

    cs.CV

    A Unified Voxel Diffusion Module for Point Cloud 3D Object Detection

    Authors: Qifeng Liu, Dawei Zhao, Yabo Dong, Linzhi Shang, Liang Xiao, Juan Wang, Kunkong Zhao, Dongming Lu, Qi Zhu

    Abstract: Recent advances in point cloud object detection have increasingly adopted Transformer-based and State Space Models (SSMs), demonstrating strong performance. However, voxel-based representations in these models require strict consistency in input and output dimensions due to their serialized processing, which limits the spatial diffusion capability typically offered by convolutional operations. This…

    Submitted 21 August, 2025; originally announced August 2025.

    Comments: Submitted to AAAI 2026

  21. arXiv:2508.15763  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    Intern-S1: A Scientific Multimodal Foundation Model

    Authors: Lei Bai, Zhongrui Cai, Yuhang Cao, Maosong Cao, Weihan Cao, Chiyu Chen, Haojiong Chen, Kai Chen, Pengcheng Chen, Ying Chen, Yongkang Chen, Yu Cheng, Pei Chu, Tao Chu, Erfei Cui, Ganqu Cui, Long Cui, Ziyun Cui, Nianchen Deng, Ning Ding, Nanqing Dong, Peijie Dong, Shihan Dou, Sinan Du, Haodong Duan, et al. (152 additional authors not shown)

    Abstract: In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared…

    Submitted 24 August, 2025; v1 submitted 21 August, 2025; originally announced August 2025.

  22. arXiv:2508.14948  [pdf, ps, other]

    cs.LG

    Large Foundation Model for Ads Recommendation

    Authors: Shangyu Zhang, Shijie Quan, Zhongren Wang, Junwei Pan, Tianqu Zhuang, Bo Fu, Yilong Sun, Jieying Lin, Jushuo Chen, Xiaotian Li, Zhixiang Feng, Xian Hu, Huiting Deng, Hua Lu, Jinpeng Wang, Boqi Dai, Xiaoyu Chen, Bin Hu, Lili Huang, Yanwen Wu, Yeshou Cai, Qi Zhou, Huang Tang, Chunfeng Yang, Chengguo Yin, et al. (8 additional authors not shown)

    Abstract: Online advertising relies on accurate recommendation models, with recent advances using pre-trained large-scale foundation models (LFMs) to capture users' general interests across multiple scenarios and tasks. However, existing methods have critical limitations: they extract and transfer only user representations (URs), ignoring valuable item representations (IRs) and user-item cross representatio…

    Submitted 20 August, 2025; originally announced August 2025.

  23. arXiv:2508.14765  [pdf, ps, other]

    cs.LG cs.AI

    PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

    Authors: Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang

    Abstract: Designing therapeutic peptides with tailored properties is hindered by the vastness of sequence space, limited experimental data, and poor interpretability of current generative models. To address these challenges, we introduce PepThink-R1, a generative framework that integrates large language models (LLMs) with chain-of-thought (CoT) supervised fine-tuning and reinforcement learning (RL). Unlike…

    Submitted 20 August, 2025; originally announced August 2025.

  24. arXiv:2508.11728  [pdf, ps, other]

    cs.CV cs.AI

    UniDCF: A Foundation Model for Comprehensive Dentocraniofacial Hard Tissue Reconstruction

    Authors: Chunxia Ren, Ning Zhu, Yue Lai, Gui Chen, Ruijie Wang, Yangyi Hu, Suyao Liu, Shuwen Mao, Hong Su, Yu Zhang, Li Xiao

    Abstract: Dentocraniofacial hard tissue defects profoundly affect patients' physiological functions, facial aesthetics, and psychological well-being, posing significant challenges for precise reconstruction. Current deep learning models are limited to single-tissue scenarios and modality-specific imaging inputs, resulting in poor generalizability and trade-offs between anatomical fidelity, computational eff…

    Submitted 15 August, 2025; originally announced August 2025.

    Comments: 23 pages, 6 figures

  25. arXiv:2508.11112  [pdf, ps, other]

    cs.LG cs.AI math.OC stat.ML

    Quantization through Piecewise-Affine Regularization: Optimization and Statistical Guarantees

    Authors: Jianhao Ma, Lin Xiao

    Abstract: Optimization problems over discrete or quantized variables are very challenging in general due to the combinatorial nature of their search space. Piecewise-affine regularization (PAR) provides a flexible modeling and computational framework for quantization based on continuous optimization. In this work, we focus on the setting of supervised learning and investigate the theoretical foundations of…

    Submitted 14 August, 2025; originally announced August 2025.

  26. arXiv:2508.11085  [pdf, ps, other]

    cs.AI cs.LG

    A learning-driven automatic planning framework for proton PBS treatments of H&N cancers

    Authors: Qingqing Wang, Liqiang Xiao, Chang Chang

    Abstract: Proton pencil beam scanning (PBS) treatment planning for head & neck (H&N) cancers involves numerous conflicting objectives, requiring iterative objective parameter adjustments to balance multiple clinical goals. We propose a learning-driven inverse optimizer and integrate it into a proximal policy optimization (PPO)-based planning framework to automatically generate high-quality plans for patient…

    Submitted 15 September, 2025; v1 submitted 14 August, 2025; originally announced August 2025.

    Comments: 27 pages, 4 figures

  27. arXiv:2508.08891  [pdf, ps, other]

    cs.CV

    Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos

    Authors: Chaoyi Wang, Yifan Yang, Jun Pei, Lijie Xia, Jianpo Liu, Xiaobing Yuan, Xinhan Di

    Abstract: Creating realistic, fully animatable whole-body avatars from a single portrait is challenging due to limitations in capturing subtle expressions, body movements, and dynamic backgrounds. Current evaluation datasets and metrics fall short in addressing these complexities. To bridge this gap, we introduce the Whole-Body Benchmark Dataset (WB-DH), an open-source, multi-modal benchmark designed for ev…

    Submitted 12 August, 2025; originally announced August 2025.

    Comments: This paper has been accepted by ICCV 2025 Workshop MMFM4

  28. arXiv:2508.07863  [pdf, ps, other]

    cs.CV cs.LG

    Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model

    Authors: Bin Cao, Sipeng Zheng, Ye Wang, Lujie Xia, Qianshan Wei, Qin Jin, Jing Liu, Zongqing Lu

    Abstract: Human motion generation has emerged as a critical technology with transformative potential for real-world applications. However, existing vision-language-motion models (VLMMs) face significant limitations that hinder their practical deployment. We identify controllability as a main bottleneck, manifesting in five key aspects: inadequate response to diverse human commands, limited pose initializati…

    Submitted 11 August, 2025; originally announced August 2025.

    Comments: 16 pages

  29. arXiv:2508.06755  [pdf, ps, other]

    cs.CL cs.AI

    Many-Turn Jailbreaking

    Authors: Xianjun Yang, Liqiang Xiao, Shiyang Li, Faisal Ladhak, Hyokun Yun, Linda Ruth Petzold, Yi Xu, William Yang Wang

    Abstract: Current jailbreaking work on large language models (LLMs) aims to elicit unsafe outputs from given prompts. However, it only focuses on single-turn jailbreaking targeting one specific query. On the contrary, the advanced LLMs are designed to handle extremely long contexts and can thus conduct multi-turn conversations. So, we propose exploring multi-turn jailbreaking, in which the jailbroken LLMs a…

    Submitted 8 August, 2025; originally announced August 2025.

  30. arXiv:2508.04604  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    TURA: Tool-Augmented Unified Retrieval Agent for AI Search

    Authors: Zhejun Zhao, Yuehu Dong, Alley Liu, Lixue Zheng, Pingsheng Liu, Dongdong Shen, Long Xia, Jiashu Zhao, Dawei Yin

    Abstract: The advent of Large Language Models (LLMs) is transforming search engines into conversational AI search products, primarily using Retrieval-Augmented Generation (RAG) on web corpora. However, this paradigm has significant industrial limitations. Traditional RAG approaches struggle with real-time needs and structured queries that require accessing dynamically generated content like ticket availabil…

    Submitted 6 August, 2025; originally announced August 2025.

  31. arXiv:2508.03686  [pdf, ps, other]

    cs.CL cs.AI

    CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

    Authors: Shudong Liu, Hongwei Liu, Junnan Liu, Linchen Xiao, Songyang Gao, Chengqi Lyu, Yuzhe Gu, Wenwei Zhang, Derek F. Wong, Songyang Zhang, Kai Chen

    Abstract: Answer verification is crucial not only for evaluating large language models (LLMs) by matching their unstructured outputs against standard answers, but also serves as the reward model to guide LLM optimization. Most evaluation frameworks rely on regularized matching or employ general LLMs for answer verification, which demands extensive, repetitive customization for regex rules or evaluation prom…

    Submitted 5 August, 2025; originally announced August 2025.

    Comments: Technical Report; 31 Pages

  32. arXiv:2508.03252  [pdf, ps, other]

    cs.CV

    Robust Single-Stage Fully Sparse 3D Object Detection via Detachable Latent Diffusion

    Authors: Wentao Qu, Guofeng Mei, Jing Wang, Yujiao Wu, Xiaoshui Huang, Liang Xiao

    Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have shown success in robust 3D object detection tasks. Existing methods often rely on the score matching from 3D boxes or pre-trained diffusion priors. However, they typically require multi-step iterations in inference, which limits efficiency. To address this, we propose a Robust single-stage fully Sparse 3D object Detection Network with a Detacha…

    Submitted 27 August, 2025; v1 submitted 5 August, 2025; originally announced August 2025.

  33. arXiv:2508.02003  [pdf, ps, other]

    cs.CV

    Fast and Memory-efficient Non-line-of-sight Imaging with Quasi-Fresnel Transform

    Authors: Yijun Wei, Jianyu Wang, Leping Xiao, Zuoqiang Shi, Xing Fu, Lingyun Qiu

    Abstract: Non-line-of-sight (NLOS) imaging seeks to reconstruct hidden objects by analyzing reflections from intermediary surfaces. Existing methods typically model both the measurement data and the hidden scene in three dimensions, overlooking the inherently two-dimensional nature of most hidden objects. This oversight leads to high computational costs and substantial memory consumption, limiting practical…

    Submitted 3 August, 2025; originally announced August 2025.

  34. arXiv:2507.23134  [pdf, ps, other]

    cs.CV

    Details Matter for Indoor Open-vocabulary 3D Instance Segmentation

    Authors: Sanghun Jung, Jingjing Zheng, Ke Zhang, Nan Qiao, Albert Y. C. Chen, Lu Xia, Chi Liu, Yuyin Sun, Xiao Zeng, Hsiang-Wei Huang, Byron Boots, Min Sun, Cheng-Hao Kuo

    Abstract: Unlike closed-vocabulary 3D instance segmentation that is often trained end-to-end, open-vocabulary 3D instance segmentation (OV-3DIS) often leverages vision-language models (VLMs) to generate 3D instance proposals and classify them. While various concepts have been proposed from existing research, we observe that these individual concepts are not mutually exclusive but complementary. In this pape…

    Submitted 30 July, 2025; originally announced July 2025.

    Comments: ICCV 2025

  35. arXiv:2507.22327  [pdf, ps, other]

    math.OC cs.CE

    Mean-Variance Optimization and Algorithm for Finite-Horizon Markov Decision Processes

    Authors: Li Xia, Zhihui Yu

    Abstract: Multi-period mean-variance optimization is a long-standing problem, caused by the failure of dynamic programming principle. This paper studies the mean-variance optimization in a setting of finite-horizon discrete-time Markov decision processes (MDPs), where the objective is to maximize the combined metrics of mean and variance of the accumulated rewards at terminal stage. By introducing the conce…

    Submitted 29 July, 2025; originally announced July 2025.

    Comments: 55 pages, 10+ figures, a thorough study of mean-variance optimization for finite-horizon MDPs, also applicable to develop reinforcement learning algorithms

  36. arXiv:2507.15864  [pdf, ps, other]

    cs.CL cs.LG

    Adversarial Demonstration Learning for Low-resource NER Using Dual Similarity

    Authors: Guowen Yuan, Tien-Hsuan Wu, Lianghao Xia, Ben Kao

    Abstract: We study the problem of named entity recognition (NER) based on demonstration learning in low-resource scenarios. We identify two issues in demonstration construction and model training. Firstly, existing methods for selecting demonstration examples primarily rely on semantic similarity; we show that feature similarity can provide significant performance improvement. Secondly, we show that the NER…

    Submitted 13 July, 2025; originally announced July 2025.

  37. arXiv:2507.08963  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Stochastic Approximation with Block Coordinate Optimal Stepsizes

    Authors: Tao Jiang, Lin Xiao

    Abstract: We consider stochastic approximation with block-coordinate stepsizes and propose adaptive stepsize rules that aim to minimize the expected distance from the next iterate to an optimal point. These stepsize rules employ online estimates of the second moment of the search direction along each block coordinate. The popular Adam algorithm can be interpreted as a particular heuristic for such estimatio…

    Submitted 11 July, 2025; originally announced July 2025.

  38. arXiv:2507.08513  [pdf, ps, other]

    cs.GR cs.CV

    Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation

    Authors: Liu He, Xiao Zeng, Yizhi Song, Albert Y. C. Chen, Lu Xia, Shashwat Verma, Sankalp Dayal, Min Sun, Cheng-Hao Kuo, Daniel Aliaga

    Abstract: Multimodal Large Language Models (MLLMs) struggle with accurately capturing camera-object relations, especially for object orientation, camera viewpoint, and camera shots. This stems from the fact that existing MLLMs are trained on images with limited diverse camera-object relations and corresponding textual descriptions. To address this, we propose a synthetic generation pipeline to create large-…

    Submitted 23 July, 2025; v1 submitted 11 July, 2025; originally announced July 2025.

  39. arXiv:2507.07095  [pdf, ps, other]

    cs.CV

    Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

    Authors: Ke Fan, Shunlin Lu, Minyue Dai, Runyi Yu, Lixing Xiao, Zhiyang Dou, Junting Dong, Lizhuang Ma, Jingbo Wang

    Abstract: Generating diverse and natural human motion sequences based on textual descriptions constitutes a fundamental and challenging research area within the domains of computer vision, graphics, and robotics. Despite significant advancements in this field, current methodologies often face challenges regarding zero-shot generalization capabilities, largely attributable to the limited size of training dat…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: Project Page: https://vankouf.github.io/MotionMillion/

  40. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  41. arXiv:2507.05816  [pdf, ps, other]

    cs.AI cs.CE cs.CL

    Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity

    Authors: Shuai Zhao, Yulin Zhang, Luwei Xiao, Xinyi Wu, Yanhao Jia, Zhongliang Guo, Xiaobao Wu, Cong-Duy Nguyen, Guoming Zhang, Anh Tuan Luu

    Abstract: Despite the remarkable progress of large language models (LLMs) across various domains, their capacity to predict retinopathy of prematurity (ROP) risk remains largely unexplored. To address this gap, we introduce a novel Chinese benchmark dataset, termed CROP, comprising 993 admission records annotated with low, medium, and high-risk labels. To systematically examine the predictive capabilities a…

    Submitted 8 July, 2025; originally announced July 2025.

  42. arXiv:2507.04263  [pdf, ps, other]

    cs.RO

    SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement

    Authors: Liwen Xiao, Zhiyu Pan, Zhicheng Wang, Zhiguo Cao, Wei Li

    Abstract: Accurate prediction of multi-agent future trajectories is crucial for autonomous driving systems to make safe and efficient decisions. Trajectory refinement has emerged as a key strategy to enhance prediction accuracy. However, existing refinement methods often overlook the topological relationships between trajectories, which are vital for improving prediction precision. Inspired by braid theory,…

    Submitted 6 July, 2025; originally announced July 2025.

  43. arXiv:2507.03868  [pdf, ps, other]

    cs.AI cs.CE cs.CY cs.MM

    From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM

    Authors: Xinyi Wu, Yanhao Jia, Luwei Xiao, Shuai Zhao, Fengkuang Chiang, Erik Cambria

    Abstract: In AI-facilitated teaching, leveraging various query styles to interpret abstract educational content is crucial for delivering effective and accessible learning experiences. However, existing retrieval systems predominantly focus on natural text-image matching and lack the capacity to address the diversity and ambiguity inherent in real-world educational scenarios. To address this limitation, we…

    Submitted 4 July, 2025; originally announced July 2025.

  44. arXiv:2507.02119  [pdf, ps, other]

    cs.LG

    Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

    Authors: Shikai Qiu, Lechao Xiao, Andrew Gordon Wilson, Jeffrey Pennington, Atish Agarwala

    Abstract: What scaling limits govern neural network training dynamics when model size and training time grow in tandem? We show that despite the complex interactions between architecture, training algorithms, and data, compute-optimally trained models exhibit a remarkably precise universality. Specifically, loss curves from models of varying sizes collapse onto a single universal curve when training compute…

    Submitted 7 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

    Comments: ICML 25. Code available at https://github.com/shikaiqiu/supercollapse

  45. arXiv:2507.01040  [pdf, ps, other]

    cs.LG cs.AI cs.NE cs.PF

    Fast Clifford Neural Layers

    Authors: Tianxiang Xia, Max Neuwinger, Lin Xiao

    Abstract: Clifford Neural Layers improve PDE modeling by introducing Clifford Algebra into neural networks. In this project we focus on optimizing the inference of 2/3D Clifford convolutional layers and multivector activation layers for one core CPU performance. Overall, by testing on a real network block involving Clifford convolutional layers and multivector activation layers, we observe that our implem…

    Submitted 22 June, 2025; originally announced July 2025.

    Comments: 7 pages content-wise

  46. arXiv:2506.23643  [pdf, ps, other]

    cs.IR

    Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation

    Authors: Yifan Wang, Weinan Gan, Longtao Xiao, Jieming Zhu, Heng Chang, Haozhao Wang, Rui Zhang, Zhenhua Dong, Ruiming Tang, Ruixuan Li

    Abstract: Generative recommendation (GR) typically encodes behavioral or semantic aspects of item information into discrete tokens, leveraging the standard autoregressive (AR) generation paradigm to make predictions. However, existing methods tend to overlook their intrinsic relationship, that is, the semantic usually provides some reasonable explainability "$\textbf{why}$" for the behavior "…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 9 pages, 2 figures

  47. arXiv:2506.22401  [pdf, ps, other]

    cs.LG math.OC

    Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL

    Authors: Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi

    Abstract: Online reinforcement learning (RL) with complex function approximations such as transformers and deep neural networks plays a significant role in the modern practice of artificial intelligence. Despite its popularity and importance, balancing the fundamental trade-off between exploration and exploitation remains a long-standing challenge; in particular, we are still in lack of efficient and practi…

    Submitted 27 June, 2025; originally announced June 2025.

  48. arXiv:2506.17627  [pdf, ps, other]

    cs.SE

    CodeMorph: Mitigating Data Leakage in Large Language Model Assessment

    Authors: Hongzhou Rao, Yanjie Zhao, Wenjie Zhu, Ling Xiao, Meizhen Wang, Haoyu Wang

    Abstract: Concerns about benchmark leakage in large language models for code (Code LLMs) have raised issues of data contamination and inflated evaluation metrics. The diversity and inaccessibility of many training datasets make it difficult to prevent data leakage entirely, even with time lag strategies. Consequently, generating new datasets through code perturbation has become essential. However, existing…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: Accepted by ICSE 2025 (Industry Challenge Track)

  49. arXiv:2506.17188  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Towards AI Search Paradigm

    Authors: Yuchen Li, Hengyi Cai, Rui Kong, Xinran Chen, Jiamin Chen, Jun Yang, Haojie Zhang, Jiayi Li, Jiayi Wu, Yiqun Chen, Changle Qu, Keyi Kong, Wenwen Ye, Lixin Su, Xinyu Ma, Long Xia, Daiting Shi, Jiashu Zhao, Haoyi Xiong, Shuaiqiang Wang, Dawei Yin

    Abstract: In this paper, we introduce the AI Search Paradigm, a comprehensive blueprint for next-generation search systems capable of emulating human information processing and decision-making. The paradigm employs a modular architecture of four LLM-powered agents (Master, Planner, Executor and Writer) that dynamically adapt to the full spectrum of information needs, from simple factual queries to complex m…

    Submitted 20 June, 2025; originally announced June 2025.

  50. arXiv:2506.15227  [pdf, ps, other]

    cs.SE

    Large Language Models for Unit Testing: A Systematic Literature Review

    Authors: Quanjun Zhang, Chunrong Fang, Siqi Gu, Ye Shang, Zhenyu Chen, Liang Xiao

    Abstract: Unit testing is a fundamental practice in modern software engineering, with the aim of ensuring the correctness, maintainability, and reliability of individual software components. Very recently, with the advances in Large Language Models (LLMs), a rapidly growing body of research has leveraged LLMs to automate various unit testing tasks, demonstrating remarkable performance and significantly redu…

    Submitted 18 June, 2025; originally announced June 2025.