Showing 1–50 of 4,744 results for author: WU, Y

Searching in archive cs.
  1. arXiv:2509.26473  [pdf, ps, other

    cs.AI

    STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models

    Authors: Shaoxiong Guo, Tianyi Du, Lijun Li, Yuyao Wu, Jie Li, Jing Shao

    Abstract: Unified Multimodal understanding and generation Models (UMMs) have demonstrated remarkable capabilities in both understanding and generation tasks. However, we identify a vulnerability arising from the generation-understanding coupling in UMMs. The attackers can use the generative function to craft an information-rich adversarial image and then leverage the understanding function to absorb it in a… ▽ More

    Submitted 30 September, 2025; originally announced September 2025.

  2. arXiv:2509.26391  [pdf, ps, other

    cs.CV

    MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

    Authors: Chenhui Zhu, Yilu Wu, Shuai Wang, Gangshan Wu, Limin Wang

    Abstract: Image-to-video generation has made remarkable progress with the advancements in diffusion models, yet generating videos with realistic motion remains highly challenging. This difficulty arises from the complexity of accurately modeling motion, which involves capturing physical constraints, object interactions, and domain-specific dynamics that are not easily generalized across diverse scenarios. T… ▽ More

    Submitted 30 September, 2025; originally announced September 2025.

  3. arXiv:2509.25788  [pdf, ps, other

    cs.LG

    From Cheap Geometry to Expensive Physics: Elevating Neural Operators via Latent Shape Pretraining

    Authors: Zhizhou Zhang, Youjia Wu, Kaixuan Zhang, Yanjia Wang

    Abstract: Industrial design evaluation often relies on high-fidelity simulations of governing partial differential equations (PDEs). While accurate, these simulations are computationally expensive, making dense exploration of design spaces impractical. Operator learning has emerged as a promising approach to accelerate PDE solution prediction; however, its effectiveness is often limited by the scarcity of l… ▽ More

    Submitted 30 September, 2025; originally announced September 2025.

  4. arXiv:2509.25748  [pdf, ps, other

    cs.CV cs.AI

    Dolphin v1.0 Technical Report

    Authors: Taohan Weng, Chi Zhang, Chaoran Yan, Siya Liu, Xiaoyang Liu, Yalun Wu, Boyang Wang, Boyan Wang, Jiren Ren, Kaiwen Yan, Jinze Yu, Kaibing Hu, Henan Liu, Haoyun Zheng, Anjie Le, Hongcheng Guo

    Abstract: Ultrasound is crucial in modern medicine but faces challenges like operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound's complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultras… ▽ More

    Submitted 30 September, 2025; originally announced September 2025.

  5. arXiv:2509.25148  [pdf, ps, other

    cs.AI

    UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following

    Authors: FaQiang Qian, WeiKun Zhang, Ziliang Wang, Kang An, Xuhui Zheng, Liangjian Wen, Mengya Gao, Yong Dai, Yichao Wu

    Abstract: Shaping powerful LLMs to be beneficial and safe is central to AI alignment. We argue that post-training alignment is fundamentally a unified Preference Learning problem, involving two modalities: demonstrated preferences (e.g., Supervised Fine-Tuning, SFT) and comparative preferences (e.g., Reinforcement Learning, RL). The standard sequential pipeline, SFT followed by RL, is flawed due to a critical… ▽ More

    Submitted 29 September, 2025; originally announced September 2025.

  6. arXiv:2509.25052  [pdf, ps, other

    cs.AI cs.LG

    Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning

    Authors: Sai Wang, Yu Wu, Zhongwen Xu

    Abstract: The pursuit of artificial agents that can learn to master complex environments has led to remarkable successes, yet prevailing deep reinforcement learning methods often rely on immense experience, encoding their knowledge opaquely within neural network weights. We propose a different paradigm, one in which an agent learns to play by reasoning and planning. We introduce Cogito, ergo ludo (CEL), a n… ▽ More

    Submitted 29 September, 2025; originally announced September 2025.

  7. arXiv:2509.24773  [pdf, ps, other

    eess.AS cs.AI cs.CL cs.CV cs.SD

    VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning

    Authors: Xin Cheng, Yuyue Wang, Xihua Wang, Yihan Wu, Kaisi Guan, Yijing Chen, Peng Zhang, Xiaojiang Liu, Meng Cao, Ruihua Song

    Abstract: Video-conditioned sound and speech generation, encompassing video-to-sound (V2S) and visual text-to-speech (VisualTTS) tasks, are conventionally addressed as separate tasks, with limited exploration to unify them within a single framework. Recent attempts to unify V2S and VisualTTS face challenges in handling distinct condition types (e.g., heterogeneous video and transcript conditions) and requir… ▽ More

    Submitted 30 September, 2025; v1 submitted 29 September, 2025; originally announced September 2025.

    Comments: Paper Under Review

  8. arXiv:2509.24391  [pdf, ps, other

    cs.SD

    UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities

    Authors: Xuenan Xu, Jiahao Mei, Zihao Zheng, Ye Tao, Zeyu Xie, Yaoyun Zhang, Haohe Liu, Yuning Wu, Ming Yan, Wen Wu, Chao Zhang, Mengyue Wu

    Abstract: Audio generation, including speech, music and sound effects, has advanced rapidly in recent years. These tasks can be divided into two categories: time-aligned (TA) tasks, where each input unit corresponds to a specific segment of the output audio (e.g., phonemes aligned with frames in speech synthesis); and non-time-aligned (NTA) tasks, where such alignment is not available. Since modeling paradi… ▽ More

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: Project page: https://wsntxxn.github.io/uniflow_audio

  9. arXiv:2509.24364  [pdf, ps, other

    cs.SE

    United We Stand: Towards End-to-End Log-based Fault Diagnosis via Interactive Multi-Task Learning

    Authors: Minghua He, Chiming Duan, Pei Xiao, Tong Jia, Siyu Yu, Lingzhe Zhang, Weijie Hong, Jin Han, Yifan Wu, Ying Li, Gang Huang

    Abstract: Log-based fault diagnosis is essential for maintaining software system availability. However, existing fault diagnosis methods are built in a task-independent manner, which fails to bridge the gap between anomaly detection and root cause localization in terms of data form and diagnostic objectives, resulting in three major issues: 1) Diagnostic bias accumulates in the system; 2) System deployme… ▽ More

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: ASE 2025 (Research Track)

  10. arXiv:2509.24352  [pdf, ps, other

    cs.SE

    Walk the Talk: Is Your Log-based Software Reliability Maintenance System Really Reliable?

    Authors: Minghua He, Tong Jia, Chiming Duan, Pei Xiao, Lingzhe Zhang, Kangjin Wang, Yifan Wu, Ying Li, Gang Huang

    Abstract: Log-based software reliability maintenance systems are crucial for sustaining stable customer experience. However, existing deep learning-based methods represent a black box for service providers, making it impossible for providers to understand how these methods detect anomalies, thereby hindering trust and deployment in real production environments. To address this issue, this paper defines a tr… ▽ More

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: Accepted by ASE 2025 (NIER Track)

  11. arXiv:2509.24325  [pdf, ps, other

    eess.IV cs.CV cs.MM

    ReCon-GS: Continuum-Preserved Gaussian Streaming for Fast and Compact Reconstruction of Dynamic Scenes

    Authors: Jiaye Fu, Qiankun Gao, Chengxiang Wen, Yanmin Wu, Siwei Ma, Jiaqi Zhang, Jian Zhang

    Abstract: Online free-viewpoint video (FVV) reconstruction is challenged by slow per-frame optimization, inconsistent motion estimation, and unsustainable storage demands. To address these challenges, we propose the Reconfigurable Continuum Gaussian Stream, dubbed ReCon-GS, a novel storage-aware framework that enables high fidelity online dynamic scene reconstruction and real-time rendering. Specifically, w… ▽ More

    Submitted 29 September, 2025; originally announced September 2025.

  12. arXiv:2509.24297  [pdf, ps, other

    cs.CL cs.AI

    Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs

    Authors: Junying Wang, Zicheng Zhang, Ye Shen, Yalun Wu, Yingji Liang, Yijin Guo, Farong Wen, Wenzhe Li, Xuezhi Zhao, Qi Jia, Guangtao Zhai

    Abstract: High-quality, multi-modal benchmarks are crucial for advancing scientific reasoning in large models yet their manual creation is costly and unscalable. To address this bottleneck, we explore the potential for transforming Text-Only QA Pairs (TQAs) into high-quality Multi-Modal QA Pairs (MMQAs), which include three parts: 1) Task Definition & Evaluation Rubric: We develop a TQA-to-MMQA framework a… ▽ More

    Submitted 30 September, 2025; v1 submitted 29 September, 2025; originally announced September 2025.

    Comments: 25 pages

  13. arXiv:2509.24254  [pdf, ps, other

    q-fin.CP cs.CE cs.CL cs.LG

    Extracting the Structure of Press Releases for Predicting Earnings Announcement Returns

    Authors: Yuntao Wu, Ege Mert Akin, Charles Martineau, Vincent Grégoire, Andreas Veneris

    Abstract: We examine how textual features in earnings press releases predict stock returns on earnings announcement days. Using over 138,000 press releases from 2005 to 2023, we compare traditional bag-of-words and BERT-based embeddings. We find that press release content (soft information) is as informative as earnings surprise (hard information), with FinBERT yielding the highest predictive power. Combini… ▽ More

    Submitted 28 September, 2025; originally announced September 2025.

    Comments: 9 pages, 4 figures, 6 tables, Accepted by The 6th ACM International Conference on AI in Finance

    ACM Class: J.4; I.2.7

  14. arXiv:2509.24215  [pdf, ps, other

    cs.SE cs.AI cs.CL cs.MM

    Metamorphic Testing for Audio Content Moderation Software

    Authors: Wenxuan Wang, Yongjiang Wu, Junyuan Zhang, Shuqing Li, Yun Peng, Wenting Chen, Shuai Wang, Michael R. Lyu

    Abstract: The rapid growth of audio-centric platforms and applications such as WhatsApp and Twitter has transformed the way people communicate and share audio content in modern society. However, these platforms are increasingly misused to disseminate harmful audio content, such as hate speech, deceptive advertisements, and explicit material, which can have significant negative consequences (e.g., detrimenta… ▽ More

    Submitted 28 September, 2025; originally announced September 2025.

    Comments: Accepted by ASE 2025

  15. arXiv:2509.24171  [pdf, ps, other

    cs.LG

    Model Correlation Detection via Random Selection Probing

    Authors: Ruibo Chen, Sheng Zhang, Yihan Wu, Tong Zheng, Peihua Mai, Heng Huang

    Abstract: The growing prevalence of large language models (LLMs) and vision-language models (VLMs) has heightened the need for reliable techniques to determine whether a model has been fine-tuned from or is even identical to another. Existing similarity-based methods often require access to model parameters or produce heuristic scores without principled thresholds, limiting their applicability. We introduce… ▽ More

    Submitted 28 September, 2025; originally announced September 2025.

  16. arXiv:2509.24148  [pdf, ps, other

    cs.SE cs.AI

    TENET: Leveraging Tests Beyond Validation for Code Generation

    Authors: Yiran Hu, Nan Jiang, Shanchao Liang, Yi Wu, Lin Tan

    Abstract: Test-Driven Development (TDD) is a widely adopted software engineering practice that requires developers to create and execute tests alongside code implementation, ensuring that software behavior is continuously validated and refined. In the era of vibe coding, where developers increasingly delegate code writing to large language models (LLMs) by specifying high-level intentions, TDD becomes even… ▽ More

    Submitted 30 September, 2025; v1 submitted 28 September, 2025; originally announced September 2025.

  17. arXiv:2509.24048  [pdf, ps, other

    cs.CR

    Analyzing and Evaluating Unbiased Language Model Watermark

    Authors: Yihan Wu, Xuehao Cui, Ruibo Chen, Heng Huang

    Abstract: Verifying the authenticity of AI-generated text has become increasingly important with the rapid advancement of large language models, and unbiased watermarking has emerged as a promising approach due to its ability to preserve output distribution without degrading quality. However, recent work reveals that unbiased watermarks can accumulate distributional bias over multiple generations and that e… ▽ More

    Submitted 28 September, 2025; originally announced September 2025.

  18. arXiv:2509.24043  [pdf, ps, other

    cs.CR

    An Ensemble Framework for Unbiased Language Model Watermarking

    Authors: Yihan Wu, Ruibo Chen, Georgios Milis, Heng Huang

    Abstract: As large language models become increasingly capable and widely deployed, verifying the provenance of machine-generated content is critical to ensuring trust, safety, and accountability. Watermarking techniques have emerged as a promising solution by embedding imperceptible statistical signals into the generation process. Among them, unbiased watermarking is particularly attractive due to its theo… ▽ More

    Submitted 28 September, 2025; originally announced September 2025.

  19. arXiv:2509.23967  [pdf, ps, other

    cs.CL

    HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs

    Authors: Ken Deng, Zizheng Zhan, Wen Xiang, Wenqiang Zhu, Tianhao Peng, Xinping Lei, Weihao Li, Jingxuan Xu, Kun Wu, Yifan Yao, Haoyang Huang, Huaixi Tang, Kepeng Lei, Zhiyi Lai, Songwei Yu, Zongxian Feng, Zuchen Gao, Weihao Xie, Chenchen Zhang, Yanan Wu, Yuanxing Zhang, Lecheng Huang, Yuqun Zhang, Jie Liu, Zhaoxiang Zhang, et al. (3 additional authors not shown)

    Abstract: Large Language Models (LLMs) increasingly rely on chain-of-thought (CoT) reasoning to improve accuracy on complex tasks. However, always generating lengthy reasoning traces is inefficient, leading to excessive token usage and higher inference costs. This paper introduces the Hybrid Policy Optimization (i.e., HiPO), a framework for adaptive reasoning control that enables LLMs to selectively decide… ▽ More

    Submitted 28 September, 2025; originally announced September 2025.

  20. arXiv:2509.23951  [pdf, ps, other

    cs.CV

    HunyuanImage 3.0 Technical Report

    Authors: Siyu Cao, Hangting Chen, Peng Chen, Yiji Cheng, Yutao Cui, Xinchi Deng, Ying Dong, Kipper Gong, Tianpeng Gu, Xiusen Gu, Tiankai Hang, Duojun Huang, Jie Jiang, Zhengkai Jiang, Weijie Kong, Changlin Li, Donghao Li, Junzhe Li, Xin Li, Yang Li, Zhenxi Li, Zhimin Li, Jiaxin Lin, Linus, Lucaz Liu, et al. (49 additional authors not shown)

    Abstract: We present HunyuanImage 3.0, a native multimodal model that unifies multimodal understanding and generation within an autoregressive framework, with its image generation module publicly available. The achievement of HunyuanImage 3.0 relies on several key components, including meticulous data curation, advanced architecture design, a native Chain-of-Thoughts schema, progressive model pre-training,… ▽ More

    Submitted 28 September, 2025; originally announced September 2025.

  21. arXiv:2509.23866  [pdf, ps, other

    cs.LG cs.AI cs.CV

    Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation

    Authors: Pengxiang Li, Zechen Hu, Zirui Shang, Jingrong Wu, Yang Liu, Hui Liu, Zhi Gao, Chenrui Shi, Bofei Zhang, Zihao Zhang, Xiaochuan Shi, Zedong Yu, Yuwei Wu, Xinxiao Wu, Yunde Jia, Liuyu Xiang, Zhaofeng He, Qing Li

    Abstract: Vision-language model (VLM) based GUI agents show promise for automating complex desktop and mobile tasks, but face significant challenges in applying reinforcement learning (RL): (1) slow multi-turn interactions with GUI environments for policy rollout, and (2) insufficient high-quality agent-environment interactions for policy learning. To address these challenges, we propose DART, a Decoupled A… ▽ More

    Submitted 28 September, 2025; originally announced September 2025.

  22. arXiv:2509.23638  [pdf, ps, other

    cs.LG

    PreScope: Unleashing the Power of Prefetching for Resource-Constrained MoE Inference

    Authors: Enda Yu, Zhaoning Zhang, Dezun Dong, Yongwei Wu, Xiangke Liao

    Abstract: Mixture-of-Experts (MoE) models face memory and PCIe latency bottlenecks when deployed on commodity hardware. Offloading expert weights to CPU memory results in PCIe transfer latency that exceeds GPU computation severalfold. We present PreScope, a prediction-driven expert scheduling system that addresses three key challenges: inaccurate activation prediction, PCIe bandwidth competition, and c… ▽ More

    Submitted 28 September, 2025; originally announced September 2025.

  23. arXiv:2509.23614  [pdf, ps, other

    cs.AI

    PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents

    Authors: Yaozu Wu, Jizhou Guo, Dongyuan Li, Henry Peng Zou, Wei-Chieh Huang, Yankai Chen, Zhen Wang, Weizhi Zhang, Yangning Li, Meng Zhang, Renhe Jiang, Philip S. Yu

    Abstract: Effective guardrails are essential for safely deploying LLM-based agents in critical applications. Despite recent advances, existing guardrails suffer from two fundamental limitations: (i) they apply uniform guardrail policies to all users, ignoring that the same agent behavior can harm some users while being safe for others; (ii) they check each response in isolation, missing how risks evolve and… ▽ More

    Submitted 27 September, 2025; originally announced September 2025.

  24. arXiv:2509.23577  [pdf, ps, other

    cs.DB cs.AI cs.IR

    ML-Asset Management: Curation, Discovery, and Utilization

    Authors: Mengying Wang, Moming Duan, Yicong Huang, Chen Li, Bingsheng He, Yinghui Wu

    Abstract: Machine learning (ML) assets, such as models, datasets, and metadata, are central to modern ML workflows. Despite their explosive growth in practice, these assets are often underutilized due to fragmented documentation, siloed storage, inconsistent licensing, and lack of unified discovery mechanisms, making ML-asset management an urgent challenge. This tutorial offers a comprehensive overview of M… ▽ More

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: Tutorial, VLDB 2025. Project page: https://ml-assets-management.github.io/

    Journal ref: PVLDB, 18(12): 5493 - 5498, 2025

  25. arXiv:2509.23344  [pdf, ps, other

    cs.CV cs.AI

    DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice

    Authors: Zijie Meng, Jin Hao, Xiwei Dai, Yang Feng, Jiaxiang Liu, Bin Feng, Huikai Wu, Xiaotang Gai, Hengchuan Zhu, Tianxiang Hu, Yangyang Wu, Hongxia Xu, Jin Li, Jun Xiao, Xiaoqiang Liu, Joey Tianyi Zhou, Fudong Zhu, Zhihe Zhao, Lunguo Xia, Bing Fang, Jimeng Sun, Jian Wu, Zuozhu Liu

    Abstract: Diagnosing and managing oral diseases necessitate advanced visual interpretation across diverse imaging modalities and integrated information synthesis. While current AI models excel at isolated tasks, they often fall short in addressing the complex, multimodal requirements of comprehensive clinical dental practice. Here we introduce DentVLM, a multimodal vision-language model engineered for exper… ▽ More

    Submitted 27 September, 2025; originally announced September 2025.

  26. arXiv:2509.23336  [pdf, ps, other

    cs.GR cs.CV

    DiffTex: Differentiable Texturing for Architectural Proxy Models

    Authors: Weidan Xiong, Yongli Wu, Bochuan Zeng, Jianwei Guo, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: Simplified proxy models are commonly used to represent architectural structures, reducing storage requirements and enabling real-time rendering. However, the geometric simplifications inherent in proxies result in a loss of fine color and geometric details, making it essential for textures to compensate for the loss. Preserving the rich texture information from the original dense architectural rec… ▽ More

    Submitted 30 September, 2025; v1 submitted 27 September, 2025; originally announced September 2025.

    Comments: ACM TOG and SIGGRAPH Asia 2025 (Patent Protected); Project page: https://vcc.tech/research/2025/DiffTex

  27. arXiv:2509.23242  [pdf, ps, other

    cs.CV

    TATTOO: Training-free AesTheTic-aware Outfit recOmmendation

    Authors: Yuntian Wu, Xiaonan Hu, Ziqi Zhou, Hao Lu

    Abstract: The global fashion e-commerce market relies significantly on intelligent and aesthetic-aware outfit-completion tools to promote sales. While previous studies have approached the problem of fashion outfit-completion and compatible-item retrieval, most of them require expensive, task-specific training on large-scale labeled data, and no effort is made to guide outfit recommendation with explicit hum… ▽ More

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: 4 figures, 4 tables

  28. arXiv:2509.23071  [pdf, ps, other

    cs.CL cs.AI

    From Evidence to Trajectory: Abductive Reasoning Path Synthesis for Training Retrieval-Augmented Generation Agents

    Authors: Muzhi Li, Jinhu Qi, Yihong Wu, Minghao Zhao, Liheng Ma, Yifan Li, Xinyu Wang, Yingxue Zhang, Ho-fung Leung, Irwin King

    Abstract: The development of retrieval-augmented generation agents is hindered by the lack of process-level supervision to effectively guide agentic capabilities like task decomposition, retriever invocation, and stepwise decision-making. While reinforcement learning offers a potential solution, it suffers from sparse rewards and the limited reasoning capabilities of large language models (LLMs). Meanwhile, existi… ▽ More

    Submitted 26 September, 2025; originally announced September 2025.

  29. arXiv:2509.22984  [pdf, ps, other

    cs.AI cs.CL

    Not only a helper, but also a teacher: Interactive LLM Cascade

    Authors: Yu Wu, Shuo Wu, Ye Tao, Yansong Li, Anand D. Sarwate

    Abstract: Large Language Models (LLMs) vary widely in their capabilities, with larger models often having better performance but higher cost: choosing an LLM model often involves trading off performance and cost. The LLM Cascade is a paradigm that defers difficult queries from weak/cheap to strong/expensive models. This approach is nonadaptive: the deferral decision is trained offline. When confronted with… ▽ More

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 29 pages, 4 figures, under review

  30. arXiv:2509.22796  [pdf, ps, other

    cs.CR cs.LG

    What Do They Fix? LLM-Aided Categorization of Security Patches for Critical Memory Bugs

    Authors: Xingyu Li, Juefei Pu, Yifan Wu, Xiaochen Zou, Shitong Zhu, Qiushi Wu, Zheng Zhang, Joshua Hsu, Yue Dong, Zhiyun Qian, Kangjie Lu, Trent Jaeger, Michael De Lucia, Srikanth V. Krishnamurthy

    Abstract: Open-source software projects are foundational to modern software ecosystems, with the Linux kernel standing out as a critical exemplar due to its ubiquity and complexity. Although security patches are continuously integrated into the Linux mainline kernel, downstream maintainers often delay their adoption, creating windows of vulnerability. A key reason for this lag is the difficulty in identifyi… ▽ More

    Submitted 26 September, 2025; originally announced September 2025.

  31. arXiv:2509.22761  [pdf, ps, other

    cs.CV cs.AI

    MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning

    Authors: Yapeng Mi, Hengli Li, Yanpeng Zhao, Chenxi Li, Huimin Wu, Xiaojian Ma, Song-Chun Zhu, Ying Nian Wu, Qing Li

    Abstract: Reasoning-augmented machine learning systems have shown improved performance in various domains, including image generation. However, existing reasoning-based methods for image generation either restrict reasoning to a single modality (image or text) or rely on high-quality reasoning data for fine-tuning. To tackle these limitations, we propose MILR, a test-time method that jointly reasons over im… ▽ More

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 21 pages, 13 figures, 7 tables

  32. Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM

    Authors: Xiao Chi, Wenlin Zhong, Yiquan Wu, Wei Wang, Kun Kuang, Fei Wu, Minghui Xiong

    Abstract: Legal Article Prediction (LAP) is a critical task in legal text classification, leveraging natural language processing (NLP) techniques to automatically predict relevant legal articles based on the fact descriptions of cases. As a foundational step in legal decision-making, LAP plays a pivotal role in determining subsequent judgments, such as charges and penalties. Despite its importance, existing… ▽ More

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 10 pages, 6 figures, Accepted to ICAIL 2025 (International Conference on Artificial Intelligence and Law)

  33. arXiv:2509.21887  [pdf, ps, other

    cs.CV cs.MM

    StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing

    Authors: Liyang Chen, Tianze Zhou, Xu He, Boshi Tang, Zhiyong Wu, Yang Huang, Yang Wu, Zhongqian Sun, Wei Yang, Helen Meng

    Abstract: The visual dubbing task aims to generate mouth movements synchronized with the driving audio, which has seen significant progress in recent years. However, two critical deficiencies hinder their wide application: (1) Audio-only driving paradigms inadequately capture speaker-specific lip habits, which fail to generate lip movements similar to the target avatar; (2) Conventional blind-inpainting app… ▽ More

    Submitted 26 September, 2025; originally announced September 2025.

  34. arXiv:2509.21778  [pdf, ps, other

    cond-mat.mtrl-sci cs.AI

    Beyond Structure: Invariant Crystal Property Prediction with Pseudo-Particle Ray Diffraction

    Authors: Bin Cao, Yang Liu, Longhan Zhang, Yifan Wu, Zhixun Li, Yuyu Luo, Hong Cheng, Yang Ren, Tong-Yi Zhang

    Abstract: Crystal property prediction, governed by quantum mechanical principles, is computationally prohibitive to solve exactly for large many-body systems using traditional density functional theory. While machine learning models have emerged as efficient approximations for large-scale applications, their performance is strongly influenced by the choice of atomic representation. Although modern graph-bas… ▽ More

    Submitted 25 September, 2025; originally announced September 2025.

  35. arXiv:2509.21523  [pdf, ps, other

    cs.RO

    DroneFL: Federated Learning for Multi-UAV Visual Target Tracking

    Authors: Xiaofan Yu, Yuwei Wu, Katherine Mao, Ye Tian, Vijay Kumar, Tajana Rosing

    Abstract: Multi-robot target tracking is a fundamental problem that requires coordinated monitoring of dynamic entities in applications such as precision agriculture, environmental monitoring, disaster response, and security surveillance. While Federated Learning (FL) has the potential to enhance learning across multiple robots without centralized data aggregation, its use in multi-Unmanned Aerial Vehicle (… ▽ More

    Submitted 25 September, 2025; originally announced September 2025.

  36. arXiv:2509.21237  [pdf, ps, other

    cs.CL cs.IR

    Query-Centric Graph Retrieval Augmented Generation

    Authors: Yaxiong Wu, Jianyuan Bo, Yongyue Zhang, Sheng Liang, Yong Liu

    Abstract: Graph-based retrieval-augmented generation (RAG) enriches large language models (LLMs) with external knowledge for long-context understanding and multi-hop reasoning, but existing methods face a granularity dilemma: fine-grained entity-level graphs incur high token costs and lose context, while coarse document-level graphs fail to capture nuanced relations. We introduce QCG-RAG, a query-centric gr… ▽ More

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: 25 pages, 6 figures, 1 table

    ACM Class: I.2.7; H.3.3

  37. arXiv:2509.21212  [pdf, ps, other

    cs.CL cs.IR

    SGMem: Sentence Graph Memory for Long-Term Conversational Agents

    Authors: Yaxiong Wu, Yongyue Zhang, Sheng Liang, Yong Liu

    Abstract: Long-term conversational agents require effective memory management to handle dialogue histories that exceed the context window of large language models (LLMs). Existing methods based on fact extraction or summarization reduce redundancy but struggle to organize and retrieve relevant information across different granularities of dialogue and generated memory. We introduce SGMem (Sentence Graph Mem… ▽ More

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: 19 pages, 6 figures, 1 table

    ACM Class: I.2.7; H.3.3

  38. arXiv:2509.20830  [pdf, ps, other

    cs.NI cs.AI

    Trustworthy Semantic Communication for Vehicular Networks: Challenges and Solutions

    Authors: Yanghe Pan, Yuntao Wang, Shaolong Guo, Chengyu Yin, Ruidong Li, Zhou Su, Yuan Wu

    Abstract: Semantic communication (SemCom) has the potential to significantly reduce communication delay in vehicle-to-everything (V2X) communications within vehicular networks (VNs). However, the deployment of vehicular SemCom networks (VN-SemComNets) faces critical trust challenges in information transmission, semantic encoding, and communication entity reliability. This paper proposes an innovative three-… ▽ More

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: 8 pages, 8 figures, accepted by IEEE Vehicular Technology Magazine

  39. arXiv:2509.20696  [pdf, ps, other

    cs.RO

    RuN: Residual Policy for Natural Humanoid Locomotion

    Authors: Qingpeng Li, Chengrui Zhu, Yanming Wu, Xin Yuan, Zhen Zhang, Jian Yang, Yong Liu

    Abstract: Enabling humanoid robots to achieve natural and dynamic locomotion across a wide range of speeds, including smooth transitions from walking to running, presents a significant challenge. Existing deep reinforcement learning methods typically require the policy to directly track a reference motion, forcing a single policy to simultaneously learn motion imitation, velocity tracking, and stability mai… ▽ More

    Submitted 24 September, 2025; originally announced September 2025.

  40. arXiv:2509.20427  [pdf, ps, other

    cs.CV

    Seedream 4.0: Toward Next-generation Multimodal Image Generation

    Authors: Team Seedream: Yunpeng Chen, Yu Gao, Lixue Gong, Meng Guo, Qiushan Guo, Zhiyao Guo, Xiaoxia Hou, Weilin Huang, Yixuan Huang, Xiaowen Jian, Huafeng Kuang, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yanzuo Lu, Zhengxiong Luo, Tongtong Ou, Guang Shi, Yichun Shi, et al. (26 additional authors not shown)

    Abstract: We introduce Seedream 4.0, an efficient and high-performance multimodal image generation system that unifies text-to-image (T2I) synthesis, image editing, and multi-image composition within a single framework. We develop a highly efficient diffusion transformer with a powerful VAE which also can reduce the number of image tokens considerably. This allows for efficient training of our model, and en… ▽ More

    Submitted 28 September, 2025; v1 submitted 24 September, 2025; originally announced September 2025.

    Comments: Seedream 4.0 Technical Report

  41. arXiv:2509.20376  [pdf, ps, other]

    cs.CL cs.AI

    ConceptViz: A Visual Analytics Approach for Exploring Concepts in Large Language Models

    Authors: Haoxuan Li, Zhen Wen, Qiqi Jiang, Chenxiao Li, Yuwei Wu, Yuchen Yang, Yiyao Wang, Xiuqi Huang, Minfeng Zhu, Wei Chen

    Abstract: Large language models (LLMs) have achieved remarkable performance across a wide range of natural language tasks. Understanding how LLMs internally represent knowledge remains a significant challenge. Although Sparse Autoencoders (SAEs) have emerged as a promising technique for extracting interpretable features from LLMs, SAE features do not inherently align with human-understandable concepts, makin…

    Submitted 20 September, 2025; originally announced September 2025.

  42. arXiv:2509.20010  [pdf, ps, other]

    cs.SE

    Demystifying the Evolution of Neural Networks with BOM Analysis: Insights from a Large-Scale Study of 55,997 GitHub Repositories

    Authors: Xiaoning Ren, Yuhang Ye, Xiongfei Wu, Yueming Wu, Yinxing Xue

    Abstract: Neural networks have become integral to many fields due to their exceptional performance. The open-source community has witnessed a rapid influx of neural network (NN) repositories with fast-paced iterations, making it crucial for practitioners to analyze their evolution to guide development and stay ahead of trends. While extensive research has explored traditional software evolution using Softwa…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: 11 pages, 8 figures

  43. arXiv:2509.19819  [pdf, ps, other]

    cs.CV

    Adaptive Model Ensemble for Continual Learning

    Authors: Yuchuan Mao, Zhi Gao, Xiaomeng Fan, Yuwei Wu, Yunde Jia, Chenchen Jing

    Abstract: Model ensemble is an effective strategy in continual learning, which alleviates catastrophic forgetting by interpolating model parameters, achieving knowledge fusion learned from different tasks. However, existing model ensemble methods usually encounter the knowledge conflict issue at task and layer levels, causing compromised learning performance in both old and new tasks. To solve this issue, w…

    Submitted 24 September, 2025; originally announced September 2025.

  44. arXiv:2509.19541  [pdf, ps, other]

    cs.RO

    Autonomous Elemental Characterization Enabled by a Low Cost Robotic Platform Built Upon a Generalized Software Architecture

    Authors: Xuan Cao, Yuxin Wu, Michael L. Whittaker

    Abstract: Despite the rapidly growing applications of robots in industry, the use of robots to automate tasks in scientific laboratories is less prolific due to lack of generalized methodologies and high cost of hardware. This paper focuses on the automation of characterization tasks necessary for reducing cost while maintaining generalization, and proposes a software architecture for building robotic syste…

    Submitted 23 September, 2025; originally announced September 2025.

  45. arXiv:2509.19352  [pdf, ps, other]

    cs.CL cs.AI

    TriSPrompt: A Hierarchical Soft Prompt Model for Multimodal Rumor Detection with Incomplete Modalities

    Authors: Jiajun Chen, Yangyang Wu, Xiaoye Miao, Mengying Zhu, Meng Xi

    Abstract: The widespread presence of incomplete modalities in multimodal data poses a significant challenge to achieving accurate rumor detection. Existing multimodal rumor detection methods primarily focus on learning joint modality representations from complete multimodal training data, rendering them ineffective in addressing the common occurrence of missing modalities in real-world scenari…

    Submitted 17 September, 2025; originally announced September 2025.

  46. arXiv:2509.19199  [pdf, ps, other]

    cs.CL

    Agentic Reinforcement Learning with Implicit Step Rewards

    Authors: Xiaoqian Liu, Ke Wang, Yuchuan Wu, Fei Huang, Yongbin Li, Junge Zhang, Jianbin Jiao

    Abstract: Large language models (LLMs) are increasingly developed as autonomous agents using reinforcement learning (agentic RL) that reason and act in interactive environments. However, sparse and sometimes unverifiable rewards make it extremely challenging to assign credit when training LLM agents that serve as a policy. Recent work attempts to integrate process supervision into RL but suffers from biased…

    Submitted 28 September, 2025; v1 submitted 23 September, 2025; originally announced September 2025.

    Comments: 18 pages, 8 figures

  47. arXiv:2509.18970  [pdf, ps, other]

    cs.AI

    LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions

    Authors: Xixun Lin, Yucheng Ning, Jingwen Zhang, Yan Dong, Yilong Liu, Yongxuan Wu, Xiaohua Qi, Nan Sun, Yanmin Shang, Pengfei Cao, Lixin Zou, Xu Chen, Chuan Zhou, Jia Wu, Shirui Pan, Bin Wang, Yanan Cao, Kai Chen, Songlin Hu, Li Guo

    Abstract: Driven by the rapid advancements of Large Language Models (LLMs), LLM-based agents have emerged as powerful intelligent systems capable of human-like cognition, reasoning, and interaction. These agents are increasingly being deployed across diverse real-world applications, including student education, scientific research, and financial analysis. However, despite their remarkable potential, LLM-bas…

    Submitted 23 September, 2025; originally announced September 2025.

  48. arXiv:2509.18905  [pdf, ps, other]

    cs.AI

    How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective

    Authors: Songsong Yu, Yuxin Chen, Hao Ju, Lianjie Jia, Fuxi Zhang, Shaofei Huang, Yuhan Wu, Rundi Cui, Binghao Ran, Zaibin Zhang, Zhedong Zheng, Zhipeng Zhang, Yifan Wang, Lin Song, Lijun Wang, Yanwei Li, Ying Shan, Huchuan Lu

    Abstract: Visual Spatial Reasoning (VSR) is a core human cognitive ability and a critical requirement for advancing embodied intelligence and autonomous systems. Despite recent progress in Vision-Language Models (VLMs), achieving human-level VSR remains highly challenging due to the complexity of representing and reasoning over three-dimensional space. In this paper, we present a systematic investigation of…

    Submitted 23 September, 2025; originally announced September 2025.

    Comments: a comprehensive visual spatial reasoning evaluation tool, 25 pages, 16 figures

  49. arXiv:2509.18891  [pdf, ps, other]

    cs.CV

    Attack for Defense: Adversarial Agents for Point Prompt Optimization Empowering Segment Anything Model

    Authors: Xueyu Liu, Xiaoyi Zhang, Guangze Shi, Meilin Liu, Yexin Lai, Yongfei Wu, Mingqiang Wei

    Abstract: Prompt quality plays a critical role in the performance of the Segment Anything Model (SAM), yet existing approaches often rely on heuristic or manually crafted prompts, limiting scalability and generalization. In this paper, we propose Point Prompt Defender, an adversarial reinforcement learning framework that adopts an attack-for-defense paradigm to automatically optimize point prompts. We const…

    Submitted 23 September, 2025; originally announced September 2025.

  50. arXiv:2509.18661  [pdf, ps, other]

    cs.IR cs.CL cs.HC

    Agentic AutoSurvey: Let LLMs Survey LLMs

    Authors: Yixin Liu, Yonghui Wu, Denghui Zhang, Lichao Sun

    Abstract: The exponential growth of scientific literature poses unprecedented challenges for researchers attempting to synthesize knowledge across rapidly evolving fields. We present Agentic AutoSurvey, a multi-agent framework for automated survey generation that addresses fundamental limitations in existing approaches. Our system employs four specialized agents (Paper Search Specialist, Topic Mini…

    Submitted 23 September, 2025; originally announced September 2025.

    Comments: 29 pages, 7 figures