Showing 1–50 of 4,587 results for author: Zhang, L

Searching in archive cs.
  1. arXiv:2507.06146

    cs.CV

    Prompt-Free Conditional Diffusion for Multi-object Image Augmentation

    Authors: Haoyu Wang, Lei Zhang, Wei Wei, Chen Ding, Yanning Zhang

    Abstract: Diffusion models have underpinned many recent advances in dataset augmentation for various computer vision tasks. However, when generating multi-object images that resemble real scenarios, most existing methods either rely entirely on text conditions, resulting in a deviation between the generated objects and the original data, or rely too much on the original images, resulting in a lack of diversity…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: Accepted at IJCAI 2025

  2. arXiv:2507.05962

    cs.HC

    Evaluation of Large Language Model-Driven AutoML in Data and Model Management from Human-Centered Perspective

    Authors: Jiapeng Yao, Lantian Zhang, Jiping Huang

    Abstract: As organizations increasingly seek to leverage machine learning (ML) capabilities, the technical complexity of implementing ML solutions creates significant barriers to adoption and impacts operational efficiency. This research examines how Large Language Models (LLMs) can transform the accessibility of ML technologies within organizations through a human-centered Automated Machine Learning (AutoM…

    Submitted 8 July, 2025; originally announced July 2025.

  3. arXiv:2507.05638

    cs.AI cs.SI

    LLMs are Introvert

    Authors: Litian Zhang, Xiaoming Zhang, Bingyu Yan, Ziyi Zhou, Bo Zhang, Zhenyu Guan, Xi Zhang, Chaozhuo Li

    Abstract: The exponential growth of social media and generative AI has transformed information dissemination, fostering connectivity but also accelerating the spread of misinformation. Understanding information propagation dynamics and developing effective control strategies is essential to mitigate harmful content. Traditional models, such as SIR, provide basic insights but inadequately capture the complex…

    Submitted 7 July, 2025; originally announced July 2025.

  4. arXiv:2507.05601

    cs.CV

    Rethinking Layered Graphic Design Generation with a Top-Down Approach

    Authors: Jingye Chen, Zhaowen Wang, Nanxuan Zhao, Li Zhang, Difan Liu, Jimei Yang, Qifeng Chen

    Abstract: Graphic design is crucial for conveying ideas and messages. Designers usually organize their work into objects, backgrounds, and vectorized text layers to simplify editing. However, this workflow demands considerable expertise. With the rise of GenAI methods, an endless supply of high-quality graphic designs in pixel format has become more accessible, though these designs often lack editability. D…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: ICCV 2025

  5. arXiv:2507.05241

    cs.AI cs.CL

    SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?

    Authors: Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xinyu Zhu, Mengcheng Zhou, Yanfeng Wang, Weinan E, Yuzhi Zhang, Linfeng Zhang, Siheng Chen

    Abstract: The rapid advancements of AI agents have ignited the long-held ambition of leveraging them to accelerate scientific discovery. Achieving this goal requires a deep understanding of the frontiers of human knowledge. As such, Humanity's Last Exam (HLE) provides an exceptionally challenging touchstone for evaluating scientific AI agents. In this work, we aim to construct the foundational architecture…

    Submitted 8 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 15 pages, 10 figures

  6. arXiv:2507.04999

    cs.CV

    Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport

    Authors: Qinkai Yu, Jianyang Xie, Yitian Zhao, Cheng Chen, Lijun Zhang, Liming Chen, Jun Cheng, Lu Liu, Yalin Zheng, Yanda Meng

    Abstract: Multimodal ophthalmic imaging-based diagnosis integrates color fundus image with optical coherence tomography (OCT) to provide a comprehensive view of ocular pathologies. However, the uneven global distribution of healthcare resources often results in real-world clinical scenarios encountering incomplete multimodal data, which significantly compromises diagnostic accuracy. Existing commonly used p…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: MICCAI 2025

  7. arXiv:2507.04952

    cs.CL cs.SE

    ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation

    Authors: Chenchen Zhang, Yuhang Li, Can Xu, Jiaheng Liu, Ao Liu, Shihui Hu, Dengpeng Wu, Guanhua Huang, Kejiao Li, Qi Yi, Ruibin Xiong, Haotian Zhu, Yuanxing Zhang, Yuhao Jiang, Yue Zhang, Zenan Xu, Bohui Zhai, Guoxiang He, Hebin Li, Jie Zhao, Le Zhang, Lingyun Tan, Pengyu Guo, Xianshu Pang, Yang Ruan, et al. (7 additional authors not shown)

    Abstract: The generative capabilities of Large Language Models (LLMs) are rapidly expanding from static code to dynamic, interactive visual artifacts. This progress is bottlenecked by a critical evaluation gap: established benchmarks focus on algorithmic correctness and are blind to the visual fidelity and interactive integrity that define modern user experiences. To bridge this gap, we introduce ArtifactsB…

    Submitted 7 July, 2025; originally announced July 2025.

  8. arXiv:2507.04749

    cs.CV

    MatDecompSDF: High-Fidelity 3D Shape and PBR Material Decomposition from Multi-View Images

    Authors: Chengyu Wang, Isabella Bennett, Henry Scott, Liang Zhang, Mei Chen, Hao Li, Rui Zhao

    Abstract: We present MatDecompSDF, a novel framework for recovering high-fidelity 3D shapes and decomposing their physically-based material properties from multi-view images. The core challenge of inverse rendering lies in the ill-posed disentanglement of geometry, materials, and illumination from 2D observations. Our method addresses this by jointly optimizing three neural components: a neural Signed Dista…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 12 pages, 4 figures

    MSC Class: 68U05

    ACM Class: I.3.7; I.3.3; I.4.1

  9. arXiv:2507.04294

    cs.IR

    BiFair: A Fairness-aware Training Framework for LLM-enhanced Recommender Systems via Bi-level Optimization

    Authors: Jiaming Zhang, Yuyuan Li, Yiqun Xu, Li Zhang, Xiaohua Feng, Zhifei Ren, Chaochao Chen

    Abstract: Large Language Model-enhanced Recommender Systems (LLM-enhanced RSs) have emerged as a powerful approach to improving recommendation quality by leveraging LLMs to generate item representations. Despite these advancements, the integration of LLMs raises severe fairness concerns. Existing studies reveal that LLM-based RSs exhibit greater unfairness than traditional RSs, yet fairness issues in LLM-en…

    Submitted 6 July, 2025; originally announced July 2025.

  10. arXiv:2507.04060

    cs.CV cs.AI

    Temporal Continual Learning with Prior Compensation for Human Motion Prediction

    Authors: Jianwei Tang, Jiangxin Sun, Xiaotong Lin, Lifang Zhang, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: Human Motion Prediction (HMP) aims to predict future poses at different moments according to past motion sequences. Previous approaches have treated the prediction of various moments equally, resulting in two main limitations: the learning of short-term predictions is hindered by the focus on long-term predictions, and the incorporation of prior information from past predictions into subsequent pr…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: Advances in Neural Information Processing Systems 2023

    Journal ref: Advances in Neural Information Processing Systems, 2023, 36: 65837-65849

  11. arXiv:2507.03898

    cs.CV

    Deconfounding Causal Inference through Two-Branch Framework with Early-Forking for Sensor-Based Cross-Domain Activity Recognition

    Authors: Di Xiong, Lei Zhang, Shuoyuan Wang, Dongzhou Cheng, Wenbo Huang

    Abstract: Recently, domain generalization (DG) has emerged as a promising solution to mitigate the distribution-shift issue in sensor-based human activity recognition (HAR) scenarios. However, most existing DG-based works have merely focused on modeling statistical dependence between sensor data and activity labels, neglecting the importance of the intrinsic causal mechanism. Intuitively, every sensor input can be v…

    Submitted 5 July, 2025; originally announced July 2025.

    Comments: Accepted by Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)

    Journal ref: Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 9, 2, Article 56 (June 2025)

  12. arXiv:2507.03872

    eess.IV cs.CV

    PLUS: Plug-and-Play Enhanced Liver Lesion Diagnosis Model on Non-Contrast CT Scans

    Authors: Jiacheng Hao, Xiaoming Zhang, Wei Liu, Xiaoli Yin, Yuan Gao, Chunli Li, Ling Zhang, Le Lu, Yu Shi, Xu Han, Ke Yan

    Abstract: Focal liver lesions (FLL) are common clinical findings during physical examination. Early diagnosis and intervention of liver malignancies are crucial to improving patient survival. Although the current 3D segmentation paradigm can accurately detect lesions, it faces limitations in distinguishing between malignant and benign liver lesions, primarily due to its inability to differentiate subtle var…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: MICCAI 2025 (Early Accepted)

  13. arXiv:2507.03724

    cs.CL

    MemOS: A Memory OS for AI System

    Authors: Zhiyu Li, Shichao Song, Chenyang Xi, Hanyu Wang, Chen Tang, Simin Niu, Ding Chen, Jiawei Yang, Chunyu Li, Qingchen Yu, Jihao Zhao, Yezhaohui Wang, Peng Liu, Zehao Lin, Pengyuan Wang, Jiahao Huo, Tianyi Chen, Kai Chen, Kehang Li, Zhen Tao, Junpeng Ren, Huayi Lai, Hao Wu, Bo Tang, Zhenren Wang, et al. (14 additional authors not shown)

    Abstract: Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI), yet their lack of well-defined memory management systems hinders the development of long-context reasoning, continual personalization, and knowledge consistency. Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user prefer…

    Submitted 8 July, 2025; v1 submitted 4 July, 2025; originally announced July 2025.

    Comments: 36 pages, 10 figures, 5 tables

  14. arXiv:2507.03427

    cs.CV

    Rectifying Adversarial Sample with Low Entropy Prior for Test-Time Defense

    Authors: Lina Ma, Xiaowei Fu, Fuxiang Huang, Xinbo Gao, Lei Zhang

    Abstract: Existing defense methods fail to defend against unknown attacks and thus raise the generalization issue of adversarial robustness. To remedy this problem, we attempt to delve into some underlying common characteristics among various attacks for generality. In this work, we reveal the commonly overlooked low entropy prior (LE) implied in various adversarial samples, and shed light on the universal robu…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: To appear in IEEE Transactions on Multimedia

  15. arXiv:2507.03315

    eess.IV cs.CV

    Towards Interpretable PolSAR Image Classification: Polarimetric Scattering Mechanism Informed Concept Bottleneck and Kolmogorov-Arnold Network

    Authors: Jinqi Zhang, Fangzhou Han, Di Zhuang, Lamei Zhang, Bin Zou, Li Yuan

    Abstract: In recent years, Deep Learning (DL) based methods have received extensive attention in the field of PolSAR image classification and show excellent performance. However, due to the "black-box" nature of DL methods, the interpretation of the high-dimensional features extracted and the backtracking of the decision-making process based on the features are still unresolved problems.…

    Submitted 4 July, 2025; originally announced July 2025.

  16. arXiv:2507.03041

    cs.LG cs.AI

    Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

    Authors: Shirley Wu, Parth Sarthi, Shiyu Zhao, Aaron Lee, Herumb Shandilya, Adrian Mladenic Grobelnik, Nurendra Choudhary, Eddie Huang, Karthik Subbian, Linjun Zhang, Diyi Yang, James Zou, Jure Leskovec

    Abstract: Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and mode…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 20 pages

  17. arXiv:2507.03038

    cs.CL cs.AI cs.LG

    Cautious Next Token Prediction

    Authors: Yizhou Wang, Lingzhi Zhang, Yue Bai, Mang Tik Chiu, Zhengmian Hu, Mingyuan Zhang, Qihua Dong, Yu Yin, Sohrab Amirghodsi, Yun Fu

    Abstract: The next-token prediction paradigm has prevailed for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling to balance diversity and coherence. Nevertheless, such an approach leads to inferior performance in various NLP tasks when the model is not certain about testing questions. To this end, we propose a…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Findings of ACL 2025

  18. arXiv:2507.02870

    cs.CL

    Loki's Dance of Illusions: A Comprehensive Survey of Hallucination in Large Language Models

    Authors: Chaozhuo Li, Pengbo Wang, Chenxu Wang, Litian Zhang, Zheng Liu, Qiwei Ye, Yuanbo Xu, Feiran Huang, Xi Zhang, Philip S. Yu

    Abstract: Edgar Allan Poe noted, "Truth often lurks in the shadow of error," highlighting the deep complexity intrinsic to the interplay between truth and falsehood, notably under conditions of cognitive and informational asymmetry. This dynamic is strikingly evident in large language models (LLMs). Despite their impressive linguistic generation capabilities, LLMs sometimes produce information that appears…

    Submitted 6 June, 2025; originally announced July 2025.

  19. arXiv:2507.02792

    cs.CV

    RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation

    Authors: Liheng Zhang, Lexi Pang, Hang Ye, Xiaoxuan Ma, Yizhou Wang

    Abstract: Text-to-image (T2I) diffusion models have shown remarkable success in generating high-quality images from text prompts. Recent efforts extend these models to incorporate conditional images (e.g., depth or pose maps) for fine-grained spatial control. Among them, feature injection methods have emerged as a training-free alternative to traditional fine-tuning approaches. However, they often suffer fr…

    Submitted 8 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

    Comments: arXiv admin note: text overlap with arXiv:2406.07540 by other authors

  20. arXiv:2507.02598

    cs.AR cs.AI

    AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models

    Authors: Chenhao Xue, Kezhi Li, Jiaxing Zhang, Yi Ren, Zhengyuan Shi, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, Guangyu Sun

    Abstract: Arithmetic circuits, such as adders and multipliers, are fundamental components of digital systems, directly impacting the performance, power efficiency, and area footprint. However, optimizing these circuits remains challenging due to the vast design space and complex physical constraints. While recent deep learning-based approaches have shown promise, they struggle to consistently explore high-p…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 8 pages, 12 figures

  21. arXiv:2507.02592

    cs.CL cs.AI

    WebSailor: Navigating Super-human Reasoning for Web Agent

    Authors: Kuan Li, Zhongwang Zhang, Huifeng Yin, Liwen Zhang, Litu Ou, Jialong Wu, Wenbiao Yin, Baixuan Li, Zhengwei Tao, Xinyu Wang, Weizhou Shen, Junkai Zhang, Dingchu Zhang, Xixi Wu, Yong Jiang, Ming Yan, Pengjun Xie, Fei Huang, Jingren Zhou

    Abstract: Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as BrowseComp, a feat previously unattainable. We posit that their success hinges on a sophisticated reasoning pattern absent in open-source models: the ability to sy…

    Submitted 3 July, 2025; originally announced July 2025.

  22. arXiv:2507.02029

    cs.RO

    RoboBrain 2.0 Technical Report

    Authors: BAAI RoboBrain Team, Mingyu Cao, Huajie Tan, Yuheng Ji, Minglan Lin, Zhiyu Li, Zhou Cao, Pengwei Wang, Enshen Zhou, Yi Han, Yingbo Tang, Xiangqi Xu, Wei Guo, Yaoxu Lyu, Yijie Xu, Jiayu Shi, Mengfei Du, Cheng Chi, Mengdi Zhao, Xiaoshuai Hao, Junkai Zhao, Xiaojie Zhang, Shanyu Rong, Huaihai Lyu, Zhengliang Cai, et al. (26 additional authors not shown)

    Abstract: We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain…

    Submitted 5 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

  23. arXiv:2507.01838

    cs.CV

    MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices

    Authors: Hailong Yan, Ao Li, Xiangtao Zhang, Zhe Liu, Zenglin Shi, Ce Zhu, Le Zhang

    Abstract: Recent advancements in deep neural networks have driven significant progress in image enhancement (IE). However, deploying deep learning models on resource-constrained platforms, such as mobile devices, remains challenging due to high computation and memory demands. To address these challenges and facilitate real-time IE on mobile, we introduce an extremely lightweight Convolutional Neural Network…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  24. arXiv:2507.01795

    math.NA cs.LG math-ph

    Neural Entropy-stable conservative flux form neural networks for learning hyperbolic conservation laws

    Authors: Lizuo Liu, Lu Zhang, Anne Gelb

    Abstract: We propose a neural entropy-stable conservative flux form neural network (NESCFN) for learning hyperbolic conservation laws and their associated entropy functions directly from solution trajectories, without requiring any predefined numerical discretization. While recent neural network architectures have successfully integrated classical numerical principles into learned models, most rely on prior…

    Submitted 2 July, 2025; originally announced July 2025.

    MSC Class: 65M08; 68T07; 65M22; 65M32; 65D25

  25. arXiv:2507.00577

    cs.CR cs.AI cs.CV

    BadViM: Backdoor Attack against Vision Mamba

    Authors: Yinghao Wu, Liyan Zhang

    Abstract: Vision State Space Models (SSMs), particularly architectures like Vision Mamba (ViM), have emerged as promising alternatives to Vision Transformers (ViTs). However, the security implications of this novel architecture, especially their vulnerability to backdoor attacks, remain critically underexplored. Backdoor attacks aim to embed hidden triggers into victim models, causing the model to misclassi…

    Submitted 1 July, 2025; originally announced July 2025.

  26. arXiv:2507.00316

    cs.LG cs.CL eess.IV

    $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation

    Authors: Siyou Li, Pengyao Qin, Huanan Wu, Dong Nie, Arun J. Thirunavukarasu, Juntao Yu, Le Zhang

    Abstract: Automated radiology report generation (RRG) aims to produce detailed textual reports from clinical imaging, such as computed tomography (CT) scans, to improve the accuracy and efficiency of diagnosis and provision of management advice. RRG is complicated by two key challenges: (1) inherent complexity in extracting relevant information from imaging data under resource constraints, and (2) difficult…

    Submitted 1 July, 2025; v1 submitted 30 June, 2025; originally announced July 2025.

    Comments: Accepted by MICCAI 2025

  27. arXiv:2507.00286

    cs.HC cs.AI cs.ET

    Visual Privacy Management with Generative AI for Blind and Low-Vision People

    Authors: Tanusree Sharma, Yu-Yun Tseng, Lotus Zhang, Ayae Ide, Kelly Avery Mack, Leah Findlater, Danna Gurari, Yang Wang

    Abstract: Blind and low vision (BLV) individuals use Generative AI (GenAI) tools to interpret and manage visual content in their daily lives. While such tools can enhance the accessibility of visual content and so enable greater user independence, they also introduce complex challenges around visual privacy. In this paper, we investigate the current practices and future design preferences of blind and low v…

    Submitted 30 June, 2025; originally announced July 2025.

  28. arXiv:2507.00015

    cs.LG cs.AI cs.CR

    Vision Transformer with Adversarial Indicator Token against Adversarial Attacks in Radio Signal Classifications

    Authors: Lu Zhang, Sangarapillai Lambotharan, Gan Zheng, Guisheng Liao, Xuekang Liu, Fabio Roli, Carsten Maple

    Abstract: The remarkable success of transformers across various fields such as natural language processing and computer vision has paved the way for their applications in automatic modulation classification, a critical component in the communication systems of Internet of Things (IoT) devices. However, it has been observed that transformer-based classification of radio signals is susceptible to subtle yet s…

    Submitted 13 June, 2025; originally announced July 2025.

  29. arXiv:2506.23749

    cs.SE

    A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications

    Authors: Boyang Yang, Zijian Cai, Fengling Liu, Bach Le, Lingming Zhang, Tegawendé F. Bissyandé, Yang Liu, Haoye Tian

    Abstract: Large language models (LLMs) are reshaping automated program repair (APR). We categorize the recent 63 LLM-based APR systems published from January 2022 to June 2025 into four paradigms, and show how retrieval- or analysis-augmented contexts strengthen any of them. This taxonomy clarifies key trade-offs: fine-tuning delivers strong task alignment at high training cost; prompting enables rapid depl…

    Submitted 30 June, 2025; originally announced June 2025.

  30. arXiv:2506.23707

    cs.MM

    Efficient and Accurate Image Provenance Analysis: A Scalable Pipeline for Large-scale Images

    Authors: Jiewei Lai, Lan Zhang, Chen Tang, Pengcheng Sun

    Abstract: The rapid proliferation of modified images on social networks that are driven by widely accessible editing tools demands robust forensic tools for digital governance. Image provenance analysis, which filters various query image variants and constructs a directed graph to trace their phylogeny history, has emerged as a critical solution. However, existing methods face two fundamental limitations: F…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 25 pages, 6 figures

  31. arXiv:2506.23701

    eess.IV cs.CV

    MDPG: Multi-domain Diffusion Prior Guidance for MRI Reconstruction

    Authors: Lingtong Zhang, Mengdie Song, Xiaohan Hao, Huayu Mai, Bensheng Qiu

    Abstract: Magnetic Resonance Imaging (MRI) reconstruction is essential in medical diagnostics. As the latest generative models, diffusion models (DMs) have struggled to produce high-fidelity images due to their stochastic nature in image domains. Latent diffusion models (LDMs) yield both compact and detailed prior knowledge in latent domains, which could effectively guide the model towards more effective le…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  32. arXiv:2506.23543

    cs.CV

    Pyramidal Patchification Flow for Visual Generation

    Authors: Hui Li, Baoyou Chen, Liwei Zhang, Jiaye Li, Jingdong Wang, Siyu Zhu

    Abstract: Diffusion transformers (DiTs) adopt Patchify, mapping patch representations to token representations through linear projections, to adjust the number of tokens input to DiT blocks and thus the computation cost. Instead of a single patch size for all the timesteps, we introduce a Pyramidal Patchification Flow (PPFlow) approach: Large patch sizes are used for high noise timesteps and small patch siz…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 10 pages, 9 figures

  33. arXiv:2506.23488

    cs.NI

    Generative AI-enhanced Low-Altitude UAV-Mounted Stacked Intelligent Metasurfaces

    Authors: Geng Sun, Mingzhe Fan, Lei Zhang, Hongyang Pan, Jiahui Li, Chuang Zhang, Linyao Li, Changyuan Zhao, Chau Yuen

    Abstract: Wireless communication systems face significant challenges in meeting the increasing demands for higher data rates and more reliable connectivity in complex environments. Stacked intelligent metasurfaces (SIMs) have emerged as a promising technology for realizing wave-domain signal processing, with mobile SIMs offering superior communication performance compared to their fixed counterparts. In thi…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This paper has already been submitted to TCCN

  34. arXiv:2506.23150

    cs.CV

    AlignCVC: Aligning Cross-View Consistency for Single-Image-to-3D Generation

    Authors: Xinyue Liang, Zhiyuan Ma, Lingchen Sun, Yanjun Guo, Lei Zhang

    Abstract: Single-image-to-3D models typically follow a sequential generation and reconstruction workflow. However, intermediate multi-view images synthesized by pre-trained generation models often lack cross-view consistency (CVC), significantly degrading 3D reconstruction performance. While recent methods attempt to refine CVC by feeding reconstruction results back into the multi-view generator, these appr…

    Submitted 29 June, 2025; originally announced June 2025.

  35. arXiv:2506.22803

    cs.CV cs.HC cs.LG

    Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding

    Authors: Nuoye Xiong, Anqi Dong, Ning Wang, Cong Hua, Guangming Zhu, Mei Lin, Peiyi Shen, Liang Zhang

    Abstract: Recent advances in deep learning have led to increasingly complex models with deeper layers and more parameters, reducing interpretability and making their decisions harder to understand. While many methods explain black-box reasoning, most lack effective interventions or only operate at sample-level without modifying the model itself. To address this, we propose the Concept Bottleneck Model for E…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  36. arXiv:2506.22799

    cs.GR cs.CV cs.LG

    VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

    Authors: Minchao Jiang, Shunyu Jia, Jiaming Gu, Xiaoyuan Lu, Guangming Zhu, Anqi Dong, Liang Zhang

    Abstract: 3D Gaussian Splatting (3DGS) has become a workhorse for high-quality, real-time rendering in novel view synthesis of 3D scenes. However, existing methods focus primarily on geometric and appearance modeling, lacking deeper scene understanding while also incurring high training costs that complicate the originally streamlined differentiable rendering pipeline. To this end, we propose VoteSplat, a no…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025

  37. arXiv:2506.22756

    cs.CV cs.RO

    RoboPearls: Editable Video Simulation for Robot Manipulation

    Authors: Tao Tang, Likui Zhang, Youpeng Wen, Kaidong Zhang, Jia-Wang Bian, Xia Zhou, Tianyi Yan, Kun Zhan, Peng Jia, Hefeng Wu, Liang Lin, Xiaodan Liang

    Abstract: The development of generalist robot manipulation policies has seen significant progress, driven by large-scale demonstration data across diverse environments. However, the high cost and inefficiency of collecting real-world demonstrations hinder the scalability of data acquisition. While existing simulation platforms enable controlled environments for robotic learning, the challenge of bridging th…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: ICCV 2025

  38. arXiv:2506.22242

    cs.CV

    4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration

    Authors: Jiahui Zhang, Yurui Chen, Yueming Xu, Ze Huang, Yanpeng Zhou, Yu-Jie Yuan, Xinyue Cai, Guowei Huang, Xingyue Quan, Hang Xu, Li Zhang

    Abstract: Leveraging diverse robotic data for pretraining remains a critical challenge. Existing methods typically model the dataset's action distribution using simple observations as inputs. However, these inputs are often incomplete, resulting in a dispersed conditional action distribution, an issue we refer to as coordinate system chaos and state chaos. This inconsistency significantly hampers pretraining…

    Submitted 27 June, 2025; originally announced June 2025.

  39. arXiv:2506.22157

    cs.CL

    Training Language Model to Critique for Better Refinement

    Authors: Tianshu Yu, Chao Xiang, Mingchuan Yang, Pei Ke, Bosi Wen, Cunxiang Wang, Jiale Cheng, Li Zhang, Xinyu Mu, Chuxiong Sun, Minlie Huang

    Abstract: Large language models (LLMs) have demonstrated remarkable evaluation and critique capabilities, providing insightful feedback and identifying flaws in various tasks. However, limited research has explored which types of critiques are most effective for improving model responses or how to generate such critiques. To address this gap, we introduce Refinement-oriented Critique \text…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 Findings

  40. arXiv:2506.22099

    cs.CV

    BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting

    Authors: Zipei Ma, Junzhe Jiang, Yurui Chen, Li Zhang

    Abstract: The realistic reconstruction of street scenes is critical for developing real-world simulators in autonomous driving. Most existing methods rely on object pose annotations, using these poses to reconstruct dynamic objects and move them during the rendering process. This dependence on high-precision object annotations limits large-scale and extensive scene reconstruction. To address this challenge,…

    Submitted 8 July, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted at ICCV 2025, Project Page: https://github.com/fudan-zvg/BezierGS

  41. arXiv:2506.21875

    cs.CL

    WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation

    Authors: Jian Zhang, Linhao Zhang, Bokai Lei, Chuhan Wu, Wei Jia, Xiao Zhou

    Abstract: Recent multi-modal Large Language Models (LLMs) such as GPT-4o have demonstrated strong capabilities of direct speech interaction. However, the lack of specialized and comprehensive benchmarks for end-to-end speech LLM evaluation hinders optimizing the user experience of Audio LLMs in real-world applications. Existing evaluation methods often adapt text-based benchmarks, overlooking speech's uniqu…

    Submitted 26 June, 2025; originally announced June 2025.

  42. arXiv:2506.21602  [pdf, ps, other]

    cs.CL cs.AI

    BiMark: Unbiased Multilayer Watermarking for Large Language Models

    Authors: Xiaoyan Feng, He Zhang, Yanjun Zhang, Leo Yu Zhang, Shirui Pan

    Abstract: Recent advances in Large Language Models (LLMs) have raised urgent concerns about LLM-generated text authenticity, prompting regulatory demands for reliable identification mechanisms. Although watermarking offers a promising solution, existing approaches struggle to simultaneously achieve three critical requirements: text quality preservation, model-agnostic detection, and message embedding capaci…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: This paper is accepted by International Conference on Machine Learning (ICML) 2025

  43. arXiv:2506.21591   

    cs.CL

    FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning

    Authors: Shaoyu Dou, Yutian Shen, Mofan Chen, Zixuan Wang, Jiajie Xu, Qi Guo, Kailai Shao, Chao Chen, Haixiang Hu, Haibo Shi, Min Min, Liwen Zhang

    Abstract: Large Language Models (LLMs) demonstrate significant potential but face challenges in complex financial reasoning tasks requiring both domain knowledge and sophisticated reasoning. Current evaluation benchmarks often fall short by not decoupling these capabilities indicators from single task performance and lack root cause analysis for task failure. To address this, we introduce FinEval-KR, a nove…

    Submitted 29 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

    Comments: The statistics included in the paper are incomplete (e.g., Tables 2 and 5 report only the results of a single run), which may lead readers to misunderstand

  44. arXiv:2506.21121  [pdf, ps, other]

    cs.CV cs.RO

    GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction

    Authors: Muleilan Pei, Shaoshuai Shi, Lu Zhang, Peiliang Li, Shaojie Shen

    Abstract: Trajectory prediction for surrounding agents is a challenging task in autonomous driving due to its inherent uncertainty and underlying multimodality. Unlike prevailing data-driven methods that primarily rely on supervised learning, in this paper, we introduce a novel Graph-oriented Inverse Reinforcement Learning (GoIRL) framework, which is an IRL-based predictor equipped with vectorized context r…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025

  45. arXiv:2506.19852  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

    Authors: Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han

    Abstract: Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as spatial and temporal d…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Code: https://github.com/mit-han-lab/radial-attention

  46. arXiv:2506.18658  [pdf, ps, other]

    cs.CV cs.AI

    Historical Report Guided Bi-modal Concurrent Learning for Pathology Report Generation

    Authors: Ling Zhang, Boxiang Yun, Qingli Li, Yan Wang

    Abstract: Automated pathology report generation from Whole Slide Images (WSIs) faces two key challenges: (1) lack of semantic content in visual features and (2) inherent information redundancy in WSIs. To address these issues, we propose a novel Historical Report Guided Bi-modal Concurrent Learning Framework for Pathology Report Generation (BiGen) emulating pathologists' diagnostic reasoni…

    Submitted 23 June, 2025; originally announced June 2025.

  47. arXiv:2506.18564  [pdf, ps, other]

    cs.CV

    VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning

    Authors: Xuanyu Zhang, Weiqi Li, Shijie Zhao, Junlin Li, Li Zhang, Jian Zhang

    Abstract: Recent advances in AI-generated content (AIGC) have led to the emergence of powerful text-to-video generation models. Despite these successes, evaluating the quality of AIGC-generated videos remains challenging due to limited generalization, lack of temporal awareness, heavy reliance on large-scale annotated datasets, and the lack of effective interaction with generation models. Most current appro…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Technical Report

  48. arXiv:2506.18325  [pdf, ps, other]

    cs.CV

    NSFW-Classifier Guided Prompt Sanitization for Safe Text-to-Image Generation

    Authors: Yu Xie, Chengjie Zeng, Lingyun Zhang, Yanwei Fu

    Abstract: The rapid advancement of text-to-image (T2I) models, such as Stable Diffusion, has enhanced their capability to synthesize images from textual prompts. However, this progress also raises significant risks of misuse, including the generation of harmful content (e.g., pornography, violence, discrimination), which contradicts the ethical goals of T2I technology and hinders its sustainable development…

    Submitted 23 June, 2025; originally announced June 2025.

  49. arXiv:2506.18050  [pdf, ps, other]

    cs.SE

    VFArchē: A Dual-Mode Framework for Locating Vulnerable Functions in Open-Source Software

    Authors: Lyuye Zhang, Jian Zhang, Kaixuan Li, Chong Wang, Chengwei Liu, Jiahui Wu, Sen Chen, Yaowen Zheng, Yang Liu

    Abstract: Software Composition Analysis (SCA) has become pivotal in addressing vulnerabilities inherent in software project dependencies. In particular, reachability analysis is increasingly used in Open-Source Software (OSS) projects to identify reachable vulnerabilities (e.g., CVEs) through call graphs, enabling a focus on exploitable risks. Performing reachability analysis typically requires the vulnerab…

    Submitted 24 June, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

    Comments: 15 pages

  50. arXiv:2506.17807  [pdf, ps, other]

    cs.LG cs.AI

    Reimagining Parameter Space Exploration with Diffusion Models

    Authors: Lijun Zhang, Xiao Liu, Hui Guan

    Abstract: Adapting neural networks to new tasks typically requires task-specific fine-tuning, which is time-consuming and reliant on labeled data. We explore a generative alternative that produces task-specific parameters directly from task identity, eliminating the need for task-specific training. To this end, we propose using diffusion models to learn the underlying structure of effective task-specific pa…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: Accepted at ICML 2025 EXAIT Workshop