Showing 1–50 of 651 results for author: Lu, W

Searching in archive cs.
  1. arXiv:2507.04240  [pdf, ps, other]

    cs.RO

    Optimal Scheduling of a Dual-Arm Robot for Efficient Strawberry Harvesting in Plant Factories

    Authors: Yuankai Zhu, Wenwu Lu, Guoqiang Ren, Yibin Ying, Stavros Vougioukas, Chen Peng

    Abstract: Plant factory cultivation is widely recognized for its ability to optimize resource use and boost crop yields. To further increase the efficiency in these environments, we propose a mixed-integer linear programming (MILP) framework that systematically schedules and coordinates dual-arm harvesting tasks, minimizing the overall harvesting makespan based on pre-mapped fruit locations. Specifically, w…

    Submitted 6 July, 2025; originally announced July 2025.

  2. OpenSN: An Open Source Library for Emulating LEO Satellite Networks

    Authors: Wenhao Lu, Zhiyuan Wang, Hefan Zhang, Shan Zhang, Hongbin Luo

    Abstract: Low-earth-orbit (LEO) satellite constellations (e.g., Starlink) are becoming a necessary component of the future Internet. A growing number of studies address LEO satellite networking, and evaluating them in a systematic and reproducible manner is a crucial problem. In this paper, we present OpenSN, an open source library for emulating large-scale satellite networks (SNs). Different…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 17 pages

    Journal ref: IEEE Transactions on Parallel and Distributed Systems (TPDS), 2025

  3. arXiv:2507.02199  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer

    Authors: Wenquan Lu, Yuechuan Yang, Kyle Lee, Yanshu Li, Enqi Liu

    Abstract: Chain-of-thought (CoT) reasoning has enabled transformer-based language models to excel at complex mathematics and multi-step planning. However, in standard decoder-only architectures, these reasoning steps are externalized in natural language, improving interpretability at the cost of efficiency. To capture reasoning that is not easily represented in words, many works have explored recurrent arch…

    Submitted 2 July, 2025; originally announced July 2025.

  4. arXiv:2507.00388  [pdf, ps, other]

    cs.IT eess.SP

    Accuracy and Security-Guaranteed Participant Selection and Beamforming Design for RIS-Assisted Federated Learning

    Authors: Mengru Wu, Yu Gao, Weidang Lu, Huimei Han, Lei Sun, Wanli Ni

    Abstract: Federated learning (FL) has emerged as an effective approach for training neural network models without requiring the sharing of participants' raw data, thereby addressing data privacy concerns. In this paper, we propose a reconfigurable intelligent surface (RIS)-assisted FL framework in the presence of eavesdropping, where partial edge devices are selected to participate in the FL training proces…

    Submitted 30 June, 2025; originally announced July 2025.

  5. Reconciling Attribute and Structural Anomalies for Improved Graph Anomaly Detection

    Authors: Chunjing Xiao, Jiahui Lu, Xovee Xu, Fan Zhou, Tianshu Xie, Wei Lu, Lifeng Xu

    Abstract: Graph anomaly detection is critical in domains such as healthcare and economics, where identifying deviations can prevent substantial losses. Existing unsupervised approaches strive to learn a single model capable of detecting both attribute and structural anomalies. However, they confront the tug-of-war problem between two distinct types of anomalies, resulting in suboptimal performance. This wor…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS); DOI: https://doi.org/10.1109/TNNLS.2025.3561172

  6. arXiv:2506.23075  [pdf, ps, other]

    cs.HC cs.LG eess.SP q-bio.NC

    CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding

    Authors: Yuchen Zhou, Jiamin Wu, Zichen Ren, Zhouheng Yao, Weiheng Lu, Kunyu Peng, Qihao Zheng, Chunfeng Song, Wanli Ouyang, Chao Gou

    Abstract: Understanding and decoding brain activity from electroencephalography (EEG) signals is a fundamental challenge in neuroscience and AI, with applications in cognition, emotion recognition, diagnosis, and brain-computer interfaces. While recent EEG foundation models advance generalized decoding via unified architectures and large-scale pretraining, they adopt a scale-agnostic dense modeling paradigm…

    Submitted 28 June, 2025; originally announced June 2025.

  7. arXiv:2506.21545  [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.PF

    Data Efficacy for Language Model Training

    Authors: Yalun Dai, Yangyu Huang, Xin Zhang, Wenshan Wu, Chong Li, Wenhui Lu, Shijie Cao, Li Dong, Scarlett Li

    Abstract: Data is fundamental to the training of language models (LM). Recent research has been dedicated to data efficiency, which aims to maximize performance by selecting a minimal or optimal subset of training data. Techniques such as data filtering, sampling, and selection play a crucial role in this area. To complement it, we define Data Efficacy, which focuses on maximizing performance by optimizing…

    Submitted 26 June, 2025; originally announced June 2025.

  8. arXiv:2506.20981  [pdf, ps, other]

    cs.CR

    PrivacyGo: Privacy-Preserving Ad Measurement with Multidimensional Intersection

    Authors: Jian Du, Haohao Qian, Shikun Zhang, Wen-jie Lu, Donghang Lu, Yongchuan Niu, Bo Jiang, Yongjun Zhao, Qiang Yan

    Abstract: This paper tackles the challenging and practical problem of multi-identifier private user profile matching for privacy-preserving ad measurement, a cornerstone of modern advertising analytics. We introduce a comprehensive cryptographic framework leveraging reversed Oblivious Pseudorandom Functions (OPRF) and novel blind key rotation techniques to support secure matching across multiple identifiers…

    Submitted 25 June, 2025; originally announced June 2025.

  9. arXiv:2506.20980  [pdf, ps, other]

    cs.SI cs.AI

    Enhancing Homophily-Heterophily Separation: Relation-Aware Learning in Heterogeneous Graphs

    Authors: Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao, Weigang Lu

    Abstract: Real-world networks usually have a property of node heterophily, that is, the connected nodes usually have different features or different labels. This heterophily issue has been extensively studied in homogeneous graphs but remains under-explored in heterogeneous graphs, where there are multiple types of nodes and edges. Capturing node heterophily in heterogeneous graphs is very challenging since…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: accepted by KDD 2025

  10. arXiv:2506.19343  [pdf, ps, other]

    cs.LG cs.AI

    Discrepancy-Aware Graph Mask Auto-Encoder

    Authors: Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao, Weigang Lu

    Abstract: Masked Graph Auto-Encoder, a powerful graph self-supervised training paradigm, has recently shown superior performance in graph representation learning. Existing works typically rely on node contextual information to recover the masked information. However, they fail to generalize well to heterophilic graphs where connected nodes may not be similar, because they focus only on capturing the neighbo…

    Submitted 24 June, 2025; originally announced June 2025.

  11. arXiv:2506.18962  [pdf, ps, other]

    cs.HC

    UniMind: Unleashing the Power of LLMs for Unified Multi-Task Brain Decoding

    Authors: Weiheng Lu, Chunfeng Song, Jiamin Wu, Pengyu Zhu, Yuchen Zhou, Weijian Mai, Qihao Zheng, Wanli Ouyang

    Abstract: Decoding human brain activity from electroencephalography (EEG) signals is a central challenge at the intersection of neuroscience and artificial intelligence, enabling diverse applications in mental state assessment, clinical monitoring, and human-machine interaction. Recent efforts have extensively explored EEG-based brain foundation models for generalized brain decoding, employing large-scale t…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 19 pages, 4 figures

  12. arXiv:2506.15115  [pdf, ps, other]

    cs.LG

    Towards Reliable Forgetting: A Survey on Machine Unlearning Verification, Challenges, and Future Directions

    Authors: Lulu Xue, Shengshan Hu, Wei Lu, Yan Shen, Dongxu Li, Peijin Guo, Ziqi Zhou, Minghui Li, Yanjun Zhang, Leo Yu Zhang

    Abstract: With growing demands for privacy protection, security, and legal compliance (e.g., GDPR), machine unlearning has emerged as a critical technique for ensuring the controllability and regulatory alignment of machine learning models. However, a fundamental challenge in this field lies in effectively verifying whether unlearning operations have been successfully and thoroughly executed. Despite a grow…

    Submitted 17 June, 2025; originally announced June 2025.

  13. arXiv:2506.15084  [pdf, ps, other]

    cs.SE cs.CV cs.HC

    An Empirical Study of Bugs in Data Visualization Libraries

    Authors: Weiqi Lu, Yongqiang Tian, Xiaohan Zhong, Haoyang Ma, Zhenyang Xu, Shing-Chi Cheung, Chengnian Sun

    Abstract: Data visualization (DataViz) libraries play a crucial role in presentation, data analysis, and application development, underscoring the importance of their accuracy in transforming data into visual representations. Incorrect visualizations can adversely impact user experience, distort information conveyance, and influence user perception and decision-making processes. Visual bugs in these librari…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Proc. ACM Softw. Eng. 2, FSE

  14. arXiv:2506.11549  [pdf, ps, other]

    cs.CV eess.IV

    EyeSim-VQA: A Free-Energy-Guided Eye Simulation Framework for Video Quality Assessment

    Authors: Zhaoyang Wang, Wen Lu, Jie Li, Lihuo He, Maoguo Gong, Xinbo Gao

    Abstract: Free-energy-guided self-repair mechanisms have shown promising results in image quality assessment (IQA), but remain under-explored in video quality assessment (VQA), where temporal dynamics and model constraints pose unique challenges. Unlike static images, video content exhibits richer spatiotemporal complexity, making perceptual restoration more difficult. Moreover, VQA systems often rely on pr…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE TCSVT for possible publication

  15. arXiv:2506.11545  [pdf, ps, other]

    eess.IV cs.CV

    FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution

    Authors: Zhaoyang Wang, Jie Li, Wen Lu, Lihuo He, Maoguo Gong, Xinbo Gao

    Abstract: State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information. As video frame rates continue to increase, the diminishing inter-frame differences further expose the limitations of traditional frame-to-frame information exploitation methods, which are inadequat…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE TMM for possible publication

  16. arXiv:2506.08795  [pdf, other]

    cs.RO cs.AI

    Towards Biosignals-Free Autonomous Prosthetic Hand Control via Imitation Learning

    Authors: Kaijie Shi, Wanglong Lu, Hanli Zhao, Vinicius Prado da Fonseca, Ting Zou, Xianta Jiang

    Abstract: Limb loss affects millions globally, impairing physical function and reducing quality of life. Most traditional surface electromyographic (sEMG) and semi-autonomous methods require users to generate myoelectric signals for each control, imposing physically and mentally taxing demands. This study aims to develop a fully autonomous control system that enables a prosthetic hand to automatically grasp…

    Submitted 10 June, 2025; originally announced June 2025.

  17. arXiv:2506.08493  [pdf, ps, other]

    cs.CV cs.MM

    Context-aware TFL: A Universal Context-aware Contrastive Learning Framework for Temporal Forgery Localization

    Authors: Qilin Yin, Wei Lu, Xiangyang Luo, Xiaochun Cao

    Abstract: Most research in the multimedia forensics domain has focused on detecting forged audio-visual content and has achieved sound results. However, these works only consider deepfake detection as a classification task and ignore the case where partial segments of the video are tampered with. Temporal forgery localization (TFL) of small fake audio-visual clips embedded in real videos is still…

    Submitted 10 June, 2025; originally announced June 2025.

  18. arXiv:2506.07712  [pdf, other]

    cs.CL

    Through the Valley: Path to Effective Long CoT Training for Small Language Models

    Authors: Renjie Luo, Jiaxi Li, Chen Huang, Wei Lu

    Abstract: Long chain-of-thought (CoT) supervision has become a common strategy to enhance reasoning in language models. While effective for large models, we identify a phenomenon we call Long CoT Degradation, in which small language models (SLMs; <=3B parameters) trained on limited long CoT data experience significant performance deterioration. Through extensive experiments on the Qwen2.5, LLaMA3 and Gemma3…

    Submitted 9 June, 2025; originally announced June 2025.

  19. arXiv:2506.07126  [pdf]

    cs.AR cs.AI

    MAGNet: A Multi-Scale Attention-Guided Graph Fusion Network for DRC Violation Detection

    Authors: Weihan Lu, Hong Cai Chen

    Abstract: Design rule checking (DRC) is of great significance for cost reduction and design efficiency improvement in integrated circuit (IC) designs. Machine-learning-based DRC has become an important approach in computer-aided design (CAD). In this paper, we propose MAGNet, a hybrid deep learning model that integrates an improved U-Net with a graph neural network for DRC violation prediction. The U-Net ba…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 9 pages, 12 figures, 2 tables

  20. arXiv:2506.03570  [pdf, ps, other]

    cs.CL

    FreePRM: Training Process Reward Models Without Ground Truth Process Labels

    Authors: Lin Sun, Chuang Liu, Xiaofeng Ma, Tao Yang, Weijia Lu, Ning Wu

    Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated that Process Reward Models (PRMs) play a crucial role in enhancing model performance. However, training PRMs typically requires step-level labels, either manually annotated or automatically generated, which can be costly and difficult to obtain at scale. To address this challenge, we introduce FreePRM, a weakly supervised framew…

    Submitted 4 June, 2025; originally announced June 2025.

  21. arXiv:2506.03557  [pdf, ps, other]

    cs.CL

    BPO: Revisiting Preference Modeling in Direct Preference Optimization

    Authors: Lin Sun, Chuang Liu, Peng Liu, Bingyang Li, Weijia Lu, Ning Wu

    Abstract: Direct Preference Optimization (DPO) has emerged as a popular method for aligning Large Language Models (LLMs) with human preferences. While DPO effectively preserves the relative ordering between chosen and rejected responses through pairwise ranking losses, it often neglects absolute reward magnitudes. This oversight can decrease the likelihood of chosen responses and increase the risk of gener…

    Submitted 4 June, 2025; originally announced June 2025.

  22. arXiv:2506.03139  [pdf, ps, other]

    cs.CV cs.AI

    SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation

    Authors: Siqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang, Guiyang Hou, Yongliang Shen, Weiming Lu, Yueting Zhuang

    Abstract: Large Language Models (LLMs) and Multimodal LLMs have shown promising capabilities for SVG processing, yet existing benchmarks suffer from limited real-world coverage, lack of complexity stratification, and fragmented evaluation paradigms. We introduce SVGenius, a comprehensive benchmark comprising 2,377 queries across three progressive dimensions: understanding, editing, and generation. Built on…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 19 pages, 4 figures, Project page: https://zju-real.github.io/SVGenius, Code: https://github.com/ZJU-REAL/SVGenius-Bench

  23. arXiv:2506.02683  [pdf, ps, other]

    cs.CL

    Decompose, Plan in Parallel, and Merge: A Novel Paradigm for Large Language Models based Planning with Multiple Constraints

    Authors: Zhengdong Lu, Weikai Lu, Yiling Tao, Yun Dai, ZiXuan Chen, Huiping Zhuang, Cen Chen, Hao Peng, Ziqian Zeng

    Abstract: Despite significant advances in Large Language Models (LLMs), planning tasks still present challenges for LLM-based agents. Existing planning methods face two key limitations: heavy constraints and cascading errors. To address these limitations, we propose a novel parallel planning paradigm, which Decomposes, Plans for subtasks in Parallel, and Merges subplans into a final plan (DPPM). Specificall…

    Submitted 3 June, 2025; originally announced June 2025.

  24. arXiv:2506.00030  [pdf, ps, other]

    cs.LG

    Modality Equilibrium Matters: Minor-Modality-Aware Adaptive Alternating for Cross-Modal Memory Enhancement

    Authors: Xiang Shi, Rui Zhang, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu

    Abstract: Multimodal fusion is susceptible to modality imbalance, where dominant modalities overshadow weak ones, easily leading to biased learning and suboptimal fusion, especially for incomplete modality conditions. To address this problem, we propose a Shapley-guided alternating training framework that adaptively prioritizes minor modalities to balance and thus enhance the fusion. Our method leverages Sh…

    Submitted 25 May, 2025; originally announced June 2025.

    Comments: work in progress

  25. arXiv:2505.24837  [pdf, ps, other]

    cs.CV

    Zero-Shot Chinese Character Recognition with Hierarchical Multi-Granularity Image-Text Aligning

    Authors: Yinglian Zhu, Haiyang Yu, Qizao Wang, Wei Lu, Xiangyang Xue, Bin Li

    Abstract: Chinese Character Recognition (CCR) is a fundamental technology for intelligent document processing. Unlike Latin characters, Chinese characters exhibit unique spatial structures and compositional rules, allowing for the use of fine-grained semantic information in representation. However, existing approaches are usually based on auto-regressive as well as edit distance post-process and typically r…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: The first three authors contributed equally

  26. arXiv:2505.24500  [pdf, other]

    cs.CL cs.AI

    TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

    Authors: Guiyang Hou, Xing Gao, Yuchuan Wu, Xiang Huang, Wenqi Zhang, Zhe Zheng, Yongliang Shen, Jialu Du, Fei Huang, Yongbin Li, Weiming Lu

    Abstract: Recently, Large Language Models (LLMs) have made significant progress in IQ-related domains that require careful thinking, such as mathematics and coding. However, enhancing LLMs' cognitive development in social domains, particularly from a post-training perspective, remains underexplored. Recognizing that the social world follows a distinct timeline and requires a richer blend of cognitive modes…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 22 pages, 12 figures

  27. arXiv:2505.23604  [pdf, ps, other]

    cs.CL cs.AI cs.SE

    Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering

    Authors: Guangtao Zeng, Maohao Shen, Delin Chen, Zhenting Qi, Subhro Das, Dan Gutfreund, David Cox, Gregory Wornell, Wei Lu, Zhang-Wei Hong, Chuang Gan

    Abstract: Language models (LMs) perform well on standardized coding benchmarks but struggle with real-world software engineering tasks such as resolving GitHub issues in SWE-Bench, especially when model parameters are less than 100B. While smaller models are preferable in practice due to their lower computational cost, improving their performance remains challenging. Existing approaches primarily rely on su…

    Submitted 29 May, 2025; originally announced May 2025.

  28. arXiv:2505.23486  [pdf, ps, other]

    cs.AI

    Autoformalization in the Era of Large Language Models: A Survey

    Authors: Ke Weng, Lun Du, Sirui Li, Wangyue Lu, Haozhe Sun, Hengyu Liu, Tiancheng Zhang

    Abstract: Autoformalization, the process of transforming informal mathematical propositions into verifiable formal representations, is a foundational task in automated theorem proving, offering a new perspective on the use of mathematics in both theoretical and applied domains. Driven by the rapid progress in artificial intelligence, particularly large language models (LLMs), this field has witnessed substa…

    Submitted 3 July, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  29. arXiv:2505.23177  [pdf, other]

    cs.CL

    Infinite-Instruct: Synthesizing Scaling Code instruction Data with Bidirectional Synthesis and Static Verification

    Authors: Wenjing Xing, Wenke Lu, Yeheng Duan, Bing Zhao, Zhenghui kang, Yaolong Wang, Kai Gao, Lei Qiao

    Abstract: Traditional code instruction data synthesis methods suffer from limited diversity and poor logic. We introduce Infinite-Instruct, an automated framework for synthesizing high-quality question-answer pairs, designed to enhance the code generation capabilities of large language models (LLMs). The framework focuses on improving the internal logic of synthesized problems and the quality of synthesized…

    Submitted 29 May, 2025; originally announced May 2025.

  30. arXiv:2505.22299  [pdf, ps, other]

    cs.IR

    Logical Consistency is Vital: Neural-Symbolic Information Retrieval for Negative-Constraint Queries

    Authors: Ganlin Xu, Zhoujia Zhang, Wangyi Mei, Jiaqing Liang, Weijia Lu, Xiaodong Zhang, Zhifei Yang, Xiaofeng Ma, Yanghua Xiao, Deqing Yang

    Abstract: Information retrieval plays a crucial role in resource localization. Current dense retrievers retrieve the relevant documents within a corpus via embedding similarities, which compute similarities between dense vectors mainly depending on word co-occurrence between queries and documents, but overlook the real query intents. Thus, they often retrieve numerous irrelevant documents. Particularly in…

    Submitted 29 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL 2025

  31. arXiv:2505.22279  [pdf, ps, other]

    cs.CV

    Learning Fine-Grained Geometry for Sparse-View Splatting via Cascade Depth Loss

    Authors: Wenjun Lu, Haodong Chen, Anqi Yi, Yuk Ying Chung, Zhiyong Wang, Kun Hu

    Abstract: Novel view synthesis is a fundamental task in 3D computer vision that aims to reconstruct realistic images from a set of posed input views. However, reconstruction quality degrades significantly under sparse-view conditions due to limited geometric cues. Existing methods, such as Neural Radiance Fields (NeRF) and the more recent 3D Gaussian Splatting (3DGS), often suffer from blurred details and s…

    Submitted 28 May, 2025; originally announced May 2025.

  32. arXiv:2505.21500  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models

    Authors: Dingming Li, Hongxing Li, Zixuan Wang, Yuchen Yan, Hang Zhang, Siqi Chen, Guiyang Hou, Shengpei Jiang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang

    Abstract: Vision-language models (VLMs) have demonstrated remarkable capabilities in understanding and reasoning about visual content, but significant challenges persist in tasks requiring cross-viewpoint understanding and spatial reasoning. We identify a critical limitation: current VLMs excel primarily at egocentric spatial reasoning (from the camera's perspective) but fail to generalize to allocentric vi…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Project: https://zju-real.github.io/ViewSpatial-Page/

  33. arXiv:2505.20600  [pdf, ps, other]

    cs.DC cs.AI cs.LG

    InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling

    Authors: Xiaoxiao Jiang, Suyi Li, Lingyun Yang, Tianyu Feng, Zhipeng Di, Weiyi Lu, Guoxuan Zhu, Xiu Lin, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang

    Abstract: Generative image editing using diffusion models has become a prevalent application in today's AI cloud services. In production environments, image editing typically involves a mask that specifies the regions of an image template to be edited. The use of masks provides direct control over the editing process and introduces sparsity in the model inference. In this paper, we present InstGenIE, a syst…

    Submitted 26 May, 2025; originally announced May 2025.

  34. arXiv:2505.20075  [pdf, other]

    cs.AI

    Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback

    Authors: Mengdi Li, Jiaye Lin, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, Di Wang

    Abstract: Reward models trained with conventional Reinforcement Learning from AI Feedback (RLAIF) methods suffer from limited generalizability, which hinders the alignment performance of the policy model during reinforcement learning (RL). This challenge stems from various issues, including distribution shift, preference label noise, and mismatches between overly challenging samples and model capacity. In t…

    Submitted 26 May, 2025; originally announced May 2025.

  35. arXiv:2505.19108  [pdf, ps, other]

    cs.CL cs.AI

    CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models

    Authors: Yongheng Zhang, Xu Liu, Ruoxi Zhou, Qiguang Chen, Hao Fei, Wenpeng Lu, Libo Qin

    Abstract: Investigating hallucination issues in large language models (LLMs) within cross-lingual and cross-modal scenarios can greatly advance the large-scale deployment in real-world applications. Nevertheless, the current studies are limited to a single scenario, either cross-lingual or cross-modal, leaving a gap in the exploration of hallucinations in the joint cross-lingual and cross-modal scenarios. M…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Accepted at ACL 2025 Main Conference

  36. arXiv:2505.16972  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition

    Authors: Tianduo Wang, Lu Xu, Wei Lu, Shanbo Cheng

    Abstract: Recent advances in Automatic Speech Recognition (ASR) have been largely fueled by massive speech corpora. However, extending coverage to diverse languages with limited resources remains a formidable challenge. This paper introduces Speech Back-Translation, a scalable pipeline that improves multilingual ASR models by converting large-scale text corpora into synthetic speech via off-the-shelf text-t…

    Submitted 22 May, 2025; originally announced May 2025.

  37. arXiv:2505.16649  [pdf, ps, other]

    cs.LG cs.NE

    Stochastic Forward-Forward Learning through Representational Dimensionality Compression

    Authors: Zhichao Zhu, Yang Qi, Hengyuan Ma, Wenlian Lu, Jianfeng Feng

    Abstract: The Forward-Forward (FF) algorithm provides a bottom-up alternative to backpropagation (BP) for training neural networks, relying on a layer-wise "goodness" function to guide learning. Existing goodness functions, inspired by energy-based learning (EBL), are typically defined as the sum of squared post-synaptic activations, neglecting the correlations between neurons. In this work, we propose a no…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 14 pages, 9 figures, 2 tables

  38. arXiv:2505.14684  [pdf, ps, other]

    cs.CL cs.AI

    Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning

    Authors: Haolei Xu, Yuchen Yan, Yongliang Shen, Wenqi Zhang, Guiyang Hou, Shengpei Jiang, Kaitao Song, Weiming Lu, Jun Xiao, Yueting Zhuang

    Abstract: Large language models (LLMs) have achieved remarkable progress on mathematical tasks through Chain-of-Thought (CoT) reasoning. However, existing mathematical CoT datasets often suffer from Thought Leaps due to experts omitting intermediate steps, which negatively impacts model learning and generalization. We propose the CoT Thought Leap Bridge Task, which aims to automatically detect leaps and gen…

    Submitted 21 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Project: https://zju-real.github.io/CoT-Bridge/

  39. arXiv:2505.14668  [pdf, ps, other]

    cs.AI cs.CL cs.HC

    ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions

    Authors: Bufang Yang, Lilin Xu, Liekang Zeng, Kaiwei Liu, Siyang Jiang, Wenrui Lu, Hongkai Chen, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan

    Abstract: Recent advances in Large Language Models (LLMs) have propelled intelligent agents from reactive responses to proactive support. While promising, existing proactive agents either rely exclusively on observations from enclosed environments (e.g., desktop UIs) with direct LLM inference or employ rule-based proactive notifications, leading to suboptimal user intent understanding and limited functional…

    Submitted 20 May, 2025; originally announced May 2025.

  40. arXiv:2505.14604  [pdf, ps, other]

    cs.CL cs.AI

    Let LLMs Break Free from Overthinking via Self-Braking Tuning

    Authors: Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang

    Abstract: Large reasoning models (LRMs), such as OpenAI o1 and DeepSeek-R1, have significantly enhanced their reasoning capabilities by generating longer chains of thought, demonstrating outstanding performance across a variety of tasks. However, this performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process, leading to high computational overhead and…

    Submitted 21 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: GitHub: https://github.com/ZJU-REAL/Self-Braking-Tuning, Project page: https://ZJU-REAL.github.io/SBT

  41. arXiv:2505.13212  [pdf, ps, other]

    cs.CV

    RB-SCD: A New Benchmark for Semantic Change Detection of Roads and Bridges in Traffic Scenes

    Authors: Qingling Shu, Sibao Chen, Zhihui You, Wei Lu, Jin Tang, Bin Luo

    Abstract: With the rapid modernization of urban transportation, accurately detecting changes such as road and bridge construction, renovation, and demolition is crucial for urban planning and traffic management. However, existing methods often struggle to extract fine-grained semantic changes in complex traffic scenes, largely due to the lack of high-quality annotated change detection (CD) datasets. To addr…

    Submitted 6 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  42. arXiv:2505.13101  [pdf, ps, other]

    cs.CV cs.AI

    ARIW-Framework: Adaptive Robust Iterative Watermarking Framework

    Authors: Shaowu Wu, Liting Zeng, Wei Lu, Xiangyang Luo

    Abstract: With the rapid rise of large models, copyright protection for generated image content has become a critical security challenge. Although deep learning watermarking techniques offer an effective solution for digital image copyright protection, they still face limitations in terms of visual quality, robustness and generalization. To address these issues, this paper proposes an adaptive robust iterat…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 10 pages, 4 figures

  43. arXiv:2505.13023  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    Anti-Inpainting: A Proactive Defense against Malicious Diffusion-based Inpainters under Unknown Conditions

    Authors: Yimao Guo, Zuomin Qu, Wei Lu, Xiangyang Luo

    Abstract: As diffusion-based malicious image manipulation becomes increasingly prevalent, multiple proactive defense methods have been developed to safeguard images against unauthorized tampering. However, most proactive defense methods can only safeguard images against manipulation under known conditions, and fail to protect images from manipulations guided by tampering conditions crafted by malicious users. To…

    Submitted 19 May, 2025; originally announced May 2025.

  44. arXiv:2505.12339  [pdf, other]

    cs.CV cs.AI

    Towards Open-world Generalized Deepfake Detection: General Feature Extraction via Unsupervised Domain Adaptation

    Authors: Midou Guo, Qilin Yin, Wei Lu, Xiangyang Luo

    Abstract: With the development of generative artificial intelligence, new forgery methods are rapidly emerging. Social platforms are flooded with vast amounts of unlabeled synthetic data and authentic data, making it increasingly challenging to distinguish real from fake. Due to the lack of labels, existing supervised detection methods struggle to effectively address the detection of unknown deepfake method…

    Submitted 18 May, 2025; originally announced May 2025.

  45. arXiv:2505.12332  [pdf, other]

    cs.SD cs.AI cs.CV cs.MM eess.AS

    VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning

    Authors: Qianyue Hu, Junyan Wu, Wei Lu, Xiangyang Luo

    Abstract: Diffusion Models (DMs) have achieved remarkable success in realistic voice cloning (VC), while they also increase the risk of malicious misuse. Existing proactive defenses designed for traditional VC models aim to disrupt the forgery process, but they have been proven incompatible with DMs due to the intricate generative mechanisms of diffusion. To bridge this gap, we introduce VoiceCloak, a multi…

    Submitted 20 May, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

  46. arXiv:2505.12224  [pdf, other]

    cs.RO cs.AI

    RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction

    Authors: Weifeng Lu, Minghao Ye, Zewei Ye, Ruihan Tao, Shuo Yang, Bo Zhao

    Abstract: Vision-Language-Action (VLA) models have recently advanced robotic manipulation by translating natural-language instructions and image information into sequential control actions. However, these models often underperform in open-world scenarios, as they are predominantly trained on successful expert demonstrations and exhibit a limited capacity for failure recovery. In this work, we present a Robo…

    Submitted 25 May, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

  47. arXiv:2505.12191  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum

    Authors: Wenquan Lu, Jiaqi Zhang, Hugues Van Assel, Randall Balestriero

    Abstract: Self-Supervised Learning (SSL) has become a powerful solution to extract rich representations from unlabeled data. Yet, SSL research is mostly focused on clean, curated and high-quality datasets. As a result, applying SSL on noisy data remains a challenge, despite being crucial to applications such as astrophysics, medical imaging, geophysics or finance. In this work, we present a fully self-super…

    Submitted 17 May, 2025; originally announced May 2025.

  48. arXiv:2505.10931  [pdf, ps, other]

    cs.CV

    M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection

    Authors: Chao Wang, Wei Lu, Xiang Li, Jian Yang, Lei Luo

    Abstract: Single-source remote sensing object detection using optical or SAR images struggles in complex environments. Optical images offer rich textural details but are often affected by low-light, cloud-obscured, or low-resolution conditions, reducing the detection performance. SAR images are robust to weather, but suffer from speckle noise and limited semantic expressiveness. Optical and SAR images provi…

    Submitted 16 May, 2025; originally announced May 2025.

  49. arXiv:2505.08330  [pdf, ps, other]

    cs.LG cs.SI

    Structural-Temporal Coupling Anomaly Detection with Dynamic Graph Transformer

    Authors: Chang Zong, Yueting Zhuang, Jian Shao, Weiming Lu

    Abstract: Detecting anomalous edges in dynamic graphs is an important task in many applications over evolving triple-based data, such as social networks, transaction management, and epidemiology. A major challenge with this task is the absence of structural-temporal coupling information, which decreases the ability of the representation to distinguish anomalies from normal instances. Existing methods focus…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 20 pages, 6 figures

    MSC Class: 68T07; 68T09

  50. arXiv:2505.06997  [pdf, ps, other]

    cs.AI

    A Multi-Agent Reinforcement Learning Approach for Cooperative Air-Ground-Human Crowdsensing in Emergency Rescue

    Authors: Wenhao Lu, Zhengqiu Zhu, Yong Zhao, Yonglin Tian, Junjie Zeng, Jun Zhang, Zhong Liu, Fei-Yue Wang

    Abstract: Mobile crowdsensing is evolving beyond traditional human-centric models by integrating heterogeneous entities like unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs). Optimizing task allocation among these diverse agents is critical, particularly in challenging emergency rescue scenarios characterized by complex environments, limited communication, and partial observability. This…

    Submitted 11 May, 2025; originally announced May 2025.