Showing 1–50 of 2,616 results for author: Zhu, Y

Searching in archive cs.
  1. arXiv:2507.07831  [pdf, ps, other]

    cs.CV

    Rethinking Query-based Transformer for Continual Image Segmentation

    Authors: Yuchen Zhu, Cheng Shi, Dingyou Wang, Jiajin Tang, Zhengxuan Wei, Yu Wu, Guanbin Li, Sibei Yang

    Abstract: Class-incremental/Continual image segmentation (CIS) aims to train an image segmenter in stages, where the set of available categories differs at each stage. To leverage the built-in objectness of query-based transformers, which mitigates catastrophic forgetting of mask proposals, current methods often decouple mask generation from the continual learning process. This study, however, identifies tw…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: This work is accepted by CVPR 2025

  2. arXiv:2507.07318  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    SonicMotion: Dynamic Spatial Audio Soundscapes with Latent Diffusion Models

    Authors: Christian Templin, Yanda Zhu, Hao Wang

    Abstract: Spatial audio is an integral part of immersive entertainment, such as VR/AR, and has seen increasing popularity in cinema and music as well. The most common format of spatial audio is described as first-order Ambisonics (FOA). We seek to extend recent advancements in FOA generative AI models to enable the generation of 3D scenes with dynamic sound sources. Our proposed end-to-end model, SonicMotio…

    Submitted 9 July, 2025; originally announced July 2025.

  3. arXiv:2507.07148  [pdf, ps, other]

    cs.CV cs.LG

    Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey

    Authors: Getamesay Haile Dagnaw, Yanming Zhu, Muhammad Hassan Maqsood, Wencheng Yang, Xingshuai Dong, Xuefei Yin, Alan Wee-Chung Liew

    Abstract: Explainable artificial intelligence (XAI) has become increasingly important in biomedical image analysis to promote transparency, trust, and clinical adoption of DL models. While several surveys have reviewed XAI techniques, they often lack a modality-aware perspective, overlook recent advances in multimodal and vision-language paradigms, and provide limited practical guidance. This survey address…

    Submitted 9 July, 2025; originally announced July 2025.

  4. arXiv:2507.07016  [pdf, ps, other]

    cs.LG eess.SP

    On-Device Training of PV Power Forecasting Models in a Smart Meter for Grid Edge Intelligence

    Authors: Jian Huang, Yongli Zhu, Linna Xu, Zhe Zheng, Wenpeng Cui, Mingyang Sun

    Abstract: In this paper, an edge-side model training study is conducted on a resource-limited smart meter. The motivation of grid-edge intelligence and the concept of on-device training are introduced. Then, the technical preparation steps for on-device training are described. A case study on the task of photovoltaic power forecasting is presented, where two representative machine learning models are invest…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: This paper is currently under review at an IEEE publication; it may be subject to minor changes in response to review comments

  5. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3278 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  6. arXiv:2507.05281  [pdf, ps, other]

    cs.SE cs.CL

    CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark

    Authors: Lingyue Fu, Hao Guan, Bolun Zhang, Haowei Yuan, Yaoming Zhu, Jun Xu, Zongyu Wang, Lin Qiu, Xunliang Cai, Xuezhi Cao, Weiwen Liu, Weinan Zhang, Yong Yu

    Abstract: As Large Language Models (LLMs) demonstrate increasingly sophisticated code processing capabilities, evaluating their performance on engineering-level code remains challenging. Existing repository-level benchmarks primarily focus on single scenarios, such as code generation or bug fixing, without adequately capturing the diversity and complexity of real-world software or project engineering workfl…

    Submitted 4 July, 2025; originally announced July 2025.

  7. arXiv:2507.05261  [pdf, ps, other]

    cs.CL cs.LG

    TokenShapley: Token Level Context Attribution with Shapley Value

    Authors: Yingtai Xiao, Yuqing Zhu, Sirat Samyoun, Wanrong Zhang, Jiachen T. Wang, Jian Du

    Abstract: Large language models (LLMs) demonstrate strong capabilities in in-context learning, but verifying the correctness of their generated responses remains a challenge. Prior work has explored attribution at the sentence level, but these methods fall short when users seek attribution for specific keywords within the response, such as numbers, years, or names. To address this limitation, we propose Tok…

    Submitted 9 July, 2025; v1 submitted 18 June, 2025; originally announced July 2025.

  8. arXiv:2507.04793  [pdf, ps, other]

    cs.CL cs.AI

    A Survey of Pun Generation: Datasets, Evaluations and Methodologies

    Authors: Yuchen Su, Yonghua Zhu, Ruofan Wang, Zijian Huang, Diana Benavides-Prado, Michael Witbrock

    Abstract: Pun generation seeks to creatively modify linguistic elements in text to produce humour or evoke double meanings. It also aims to preserve coherence and contextual appropriateness, making it useful in creative writing and entertainment across various media and contexts. Although pun generation has received considerable attention in computational linguistics, there is currently no dedicated survey…

    Submitted 7 July, 2025; originally announced July 2025.

  9. arXiv:2507.04769  [pdf, ps, other]

    cs.CV cs.AI

    From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection

    Authors: Zexi Jia, Chuanwei Huang, Yeshuang Zhu, Hongyan Fei, Ying Deng, Zhiqiang Yuan, Jiapei Zhang, Jinchao Zhang, Jie Zhou

    Abstract: Current legal frameworks consider AI-generated works eligible for copyright protection when they meet originality requirements and involve substantial human intellectual input. However, systematic legal standards and reliable evaluation methods for AI art copyrights are lacking. Through comprehensive analysis of legal precedents, we establish three essential criteria for determining distinctive ar…

    Submitted 7 July, 2025; originally announced July 2025.

  10. arXiv:2507.04701  [pdf, ps, other]

    cs.CL

    XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL

    Authors: Yifu Liu, Yin Zhu, Yingqi Gao, Zhiling Luo, Xiaoxia Li, Xiaorong Shi, Yuntao Hong, Jinyang Gao, Yu Li, Bolin Ding, Jingren Zhou

    Abstract: To leverage the advantages of LLMs in addressing challenges in the Text-to-SQL task, we present XiYan-SQL, an innovative framework effectively generating and utilizing multiple SQL candidates. It consists of three components: 1) a Schema Filter module filtering and obtaining multiple relevant schemas; 2) a multi-generator ensemble approach generating multiple high-quality and diverse SQL queries; 3)…

    Submitted 7 July, 2025; originally announced July 2025.

  11. arXiv:2507.04699  [pdf, ps, other]

    cs.CV

    A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets

    Authors: Zexi Jia, Chuanwei Huang, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Ying Deng, Jiapei Zhang, Jinchao Zhang, Jie Zhou

    Abstract: Vision-language models (VLMs) often struggle with compositional reasoning due to insufficient high-quality image-text data. To tackle this challenge, we propose a novel block-based diffusion approach that automatically generates counterfactual datasets without manual annotation. Our method utilizes large language models to identify entities and their spatial relationships. It then independently ge…

    Submitted 7 July, 2025; originally announced July 2025.

  12. arXiv:2507.04635  [pdf, ps, other]

    cs.CV

    MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding

    Authors: Zhicheng Zhang, Wuyou Xia, Chenxi Zhao, Zhou Yan, Xiaoqiang Liu, Yongjie Zhu, Wenyu Qin, Pengfei Wan, Di Zhang, Jufeng Yang

    Abstract: Multimodal large language models (MLLMs) recently showed strong capacity in integrating data among multiple modalities, empowered by a generalizable attention architecture. Advanced methods predominantly focus on language-centric tuning while under-exploring how multimodal tokens are mixed through attention, posing challenges in high-level tasks that require fine-grained cognition and emotion understanding…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: ICML 2025 (Spotlight, Top 2.6%)

  13. arXiv:2507.04276  [pdf, ps, other]

    cs.AR

    FIXME: Towards End-to-End Benchmarking of LLM-Aided Design Verification

    Authors: Gwok-Waa Wan, Shengchu Su, Ruihu Wang, Qixiang Chen, Sam-Zaak Wong, Mengnv Xing, Hefei Feng, Yubo Wang, Yinan Zhu, Jingyi Zhang, Jianmin Ye, Xinlai Wan, Tao Ni, Qiang Xu, Nan Guan, Zhe Jiang, Xi Wang, Yang Jun

    Abstract: Despite the transformative potential of Large Language Models (LLMs) in hardware design, a comprehensive evaluation of their capabilities in design verification remains underexplored. Current efforts predominantly focus on RTL generation and basic debugging, overlooking the critical domain of functional verification, which is the primary bottleneck in modern design methodologies due to the rapid e…

    Submitted 6 July, 2025; originally announced July 2025.

  14. arXiv:2507.04256  [pdf, ps, other]

    cs.DB

    OneDB: A Distributed Multi-Metric Data Similarity Search System

    Authors: Tang Qian, Yifan Zhu, Lu Chen, Xiangyu Ke, Jingwen Zhao, Tianyi Li, Yunjun Gao, Christian S. Jensen

    Abstract: Increasingly massive volumes of multi-modal data are being accumulated in many real-world settings, including in health care and e-commerce. This development calls for effective general-purpose data management solutions for multi-modal data. Such a solution must facilitate user-friendly and accurate retrieval of any multi-modal data according to diverse application requirements. Further, such a…

    Submitted 6 July, 2025; originally announced July 2025.

  15. arXiv:2507.04240  [pdf, ps, other]

    cs.RO

    Optimal Scheduling of a Dual-Arm Robot for Efficient Strawberry Harvesting in Plant Factories

    Authors: Yuankai Zhu, Wenwu Lu, Guoqiang Ren, Yibin Ying, Stavros Vougioukas, Chen Peng

    Abstract: Plant factory cultivation is widely recognized for its ability to optimize resource use and boost crop yields. To further increase the efficiency in these environments, we propose a mixed-integer linear programming (MILP) framework that systematically schedules and coordinates dual-arm harvesting tasks, minimizing the overall harvesting makespan based on pre-mapped fruit locations. Specifically, w…

    Submitted 6 July, 2025; originally announced July 2025.

  16. arXiv:2507.03867  [pdf, ps, other]

    cs.PL

    Semantically Separating Nominal Wyvern for Usability and Decidability

    Authors: Yu Xiang Zhu, Amos Robinson, Sophia Roshal, Timothy Mou, Julian Mackay, Jonathan Aldrich, Alex Potanin

    Abstract: The Dependent Object Types (DOT) calculus incorporates concepts from functional languages (e.g. modules) with traditional object-oriented features (e.g. objects, subtyping) to achieve greater expressivity (e.g. F-bounded polymorphism). However, this merger of paradigms comes at the cost of subtype decidability. Recent work on bringing decidability to DOT has either sacrificed expressiveness or eas…

    Submitted 4 July, 2025; originally announced July 2025.

  17. arXiv:2507.02825  [pdf, ps, other]

    cs.AI

    Establishing Best Practices for Building Rigorous Agentic Benchmarks

    Authors: Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun, Andy Zhang, Shu Liu, Sasha Cui, Sayash Kapoor, Shayne Longpre, Kevin Meng, Rebecca Weiss, Fazl Barez, Rahul Gupta, Jwala Dhamala, Jacob Merizian, Mario Giulianelli, Harry Coppock, Cozmin Ududec, Jasjeet Sekhon, Jacob Steinhardt, Antony Kellerman, Sarah Schwettmann, Matei Zaharia, Ion Stoica, Percy Liang, Daniel Kang

    Abstract: Benchmarks are essential for quantitatively tracking progress in AI. As AI agents become increasingly capable, researchers and practitioners have introduced agentic benchmarks to evaluate agents on complex, real-world tasks. These benchmarks typically measure agent capabilities by evaluating task outcomes via specific reward designs. However, we show that many agentic benchmarks have issues in tas…

    Submitted 10 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

    Comments: 39 pages, 15 tables, 6 figures

    ACM Class: A.1; I.2.m

  18. arXiv:2507.02691  [pdf, ps, other]

    cs.CV

    CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation

    Authors: Xiangyang Luo, Ye Zhu, Yunfei Liu, Lijian Lin, Cong Wan, Zijian Cai, Shao-Lun Huang, Yu Li

    Abstract: Video face swapping aims to address two primary challenges: effectively transferring the source identity to the target video and accurately preserving the dynamic attributes of the target face, such as head poses, facial expressions, lip-sync, etc. Existing methods mainly focus on achieving high-quality identity transfer but often fall short in maintaining the dynamic attributes of the target fac…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: ICCV Accepted

  19. arXiv:2507.02652  [pdf, ps, other]

    cs.AI cs.CL cs.IR

    Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search

    Authors: Jiajie Jin, Xiaoxi Li, Guanting Dong, Yuyao Zhang, Yutao Zhu, Yang Zhao, Hongjin Qian, Zhicheng Dou

    Abstract: Complex information needs in real-world search scenarios demand deep reasoning and knowledge synthesis across diverse sources, which traditional retrieval-augmented generation (RAG) pipelines struggle to address effectively. Current reasoning-based approaches suffer from a fundamental limitation: they use a single model to handle both high-level planning and detailed execution, leading to ineffici…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 9 pages

  20. arXiv:2507.01327  [pdf, ps, other]

    cs.LG cs.AI

    Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy

    Authors: Xiaoyun Zhang, Jingqing Ruan, Xing Ma, Yawen Zhu, Jiansong Chen, Ke Zeng, Xunliang Cai

    Abstract: Detecting abnormal events in real-world customer service dialogues is highly challenging due to the complexity of business data and the dynamic nature of customer interactions. Moreover, models must demonstrate strong out-of-domain (OOD) generalization to enable rapid adaptation across different business scenarios and maximize commercial value. In this work, we propose a novel Adaptive Perplexity-…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 15 pages, 6 figures, submitted to EMNLP

  21. arXiv:2507.00377  [pdf, ps, other]

    cs.CV

    MedDiff-FT: Data-Efficient Diffusion Model Fine-tuning with Structural Guidance for Controllable Medical Image Synthesis

    Authors: Jianhao Xie, Ziang Zhang, Zhenyu Weng, Yuesheng Zhu, Guibo Luo

    Abstract: Recent advancements in deep learning for medical image segmentation are often limited by the scarcity of high-quality training data. While diffusion models provide a potential solution by generating synthetic images, their effectiveness in medical imaging remains constrained due to their reliance on large-scale medical datasets and the need for higher image quality. To address these challenges, we…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: 11 pages, 3 figures

  22. arXiv:2506.23581  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection

    Authors: Xiao Li, Yiming Zhu, Yifan Huang, Wei Zhang, Yingzhe He, Jie Shi, Xiaolin Hu

    Abstract: Object detection plays a crucial role in many security-sensitive applications. However, several recent studies have shown that object detectors can be easily fooled by physically realizable attacks, e.g., adversarial patches and recent adversarial textures, which pose realistic and urgent threats. Adversarial Training (AT) has been recognized as the most effective defense against adversarial attack…

    Submitted 9 July, 2025; v1 submitted 30 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  23. arXiv:2506.22523  [pdf]

    cs.CY cs.AI

    Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center

    Authors: James Wen, Sahil Nalawade, Zhiwei Liang, Catherine Bielick, Marisa Ferrara Boston, Alexander Chowdhury, Adele Collin, Luigi De Angelis, Jacob Ellen, Heather Frase, Rodrigo R. Gameiro, Juan Manuel Gutierrez, Pooja Kadam, Murat Keceli, Srikanth Krishnamurthy, Anne Kwok, Yanan Lance Lu, Heather Mattie, Liam G. McCoy, Katherine Miller, Allison C. Morgan, Marlene Louisa Moerig, Trang Nguyen, Alexander Owen-Post, Alex D. Ruiz, et al. (16 additional authors not shown)

    Abstract: Background: Generative artificial intelligence (AI) deployment in academic medical settings raises copyright compliance concerns. Dana-Farber Cancer Institute implemented GPT4DFCI, an internal generative AI tool utilizing OpenAI models, that is approved for enterprise use in research and operations. Given (1) the exceptionally broad adoption of the tool in our organization, (2) our research missio…

    Submitted 2 July, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

  24. arXiv:2506.21722  [pdf, ps, other]

    cs.CV cs.AI

    Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration

    Authors: Xin Lu, Xueyang Fu, Jie Xiao, Zihao Fan, Yurui Zhu, Zheng-Jun Zha

    Abstract: While diffusion models demonstrate strong generative capabilities in image restoration (IR) tasks, their complex architectures and iterative processes limit their practical application compared to mainstream reconstruction-based general ordinary IR networks. Existing approaches primarily focus on optimizing network architecture and diffusion paths but overlook the integration of the diffusion trai…

    Submitted 26 June, 2025; originally announced June 2025.

  25. arXiv:2506.20353  [pdf, ps, other]

    cs.LG cs.AI

    DipSVD: Dual-importance Protected SVD for Efficient LLM Compression

    Authors: Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Chuanlong Xie, Yao Zhu

    Abstract: The ever-increasing computational demands and deployment costs of large language models (LLMs) have spurred numerous compressing methods. Compared to quantization and unstructured pruning, SVD compression offers superior hardware compatibility and theoretical guarantees. However, existing SVD-based methods focus on the overall discrepancy between the original and compressed matrices while overlook…

    Submitted 25 June, 2025; originally announced June 2025.

  26. arXiv:2506.19794  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG cs.MA

    Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

    Authors: Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang

    Abstract: Large Language Models (LLMs) hold promise in automating data analysis tasks, yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data analysis capabilities of open-source LLMs. By curating a seed dataset of diverse, realistic scenarios, we evaluate models across three dimensions: data understand…

    Submitted 7 July, 2025; v1 submitted 24 June, 2025; originally announced June 2025.

    Comments: Work in progress

  27. arXiv:2506.19767  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

    Authors: Yuqian Fu, Tinghong Chen, Jiajun Chai, Xihuai Wang, Songjun Tu, Guojun Yin, Wei Lin, Qichao Zhang, Yuanheng Zhu, Dongbin Zhao

    Abstract: Large language models (LLMs) have achieved remarkable progress in reasoning tasks, yet the optimal integration of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) remains a fundamental challenge. Through comprehensive analysis of token distributions, learning dynamics, and integration mechanisms from entropy-based perspectives, we reveal key differences between these paradigms: SFT ind…

    Submitted 24 June, 2025; originally announced June 2025.

  28. arXiv:2506.19733  [pdf, ps, other]

    cs.CL

    Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains?

    Authors: Chuxuan Hu, Yuxuan Zhu, Antony Kellermann, Caleb Biddulph, Suppakit Waiwitlikhit, Jason Benn, Daniel Kang

    Abstract: Reinforcement post training (RPT) has recently shown promise in improving the reasoning abilities of large language models (LLMs). However, it remains unclear how well these improvements generalize to new domains, as prior work evaluates RPT models on data from the same domains used for fine-tuning. To understand the generalizability of RPT, we conduct two studies. (1) Observational: We compare a…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 9 pages, 4 figures, 2 tables

  29. arXiv:2506.19262  [pdf, ps, other]

    cs.CL cs.LG

    What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning

    Authors: Yuchang Zhu, Huazhen Zhong, Qunshu Lin, Haotong Wei, Xiaolong Sun, Zixuan Yu, Minghao Liu, Zibin Zheng, Liang Chen

    Abstract: With the remarkable generative capabilities of large language models (LLMs), using LLM-generated data to train downstream models has emerged as a promising approach to mitigate data scarcity in specific domains and reduce time-consuming annotations. However, recent studies have highlighted a critical issue: iterative training on self-generated data results in model collapse, where model performanc…

    Submitted 24 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: Ongoing work

  30. arXiv:2506.18960  [pdf, ps, other]

    cs.RO

    FORTE: Tactile Force and Slip Sensing on Compliant Fingers for Delicate Manipulation

    Authors: Siqi Shang, Mingyo Seo, Yuke Zhu, Lillian Chin

    Abstract: Handling delicate and fragile objects remains a major challenge for robotic manipulation, especially for rigid parallel grippers. While the simplicity and versatility of parallel grippers have led to widespread adoption, these grippers are limited by their heavy reliance on visual feedback. Tactile sensing and soft robotics can add responsiveness and compliance. However, existing methods typically…

    Submitted 25 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  31. arXiv:2506.18786  [pdf, ps, other]

    cs.HC

    Flow-Aware Diffusion for Real-Time VR Restoration: Enhancing Spatiotemporal Coherence and Efficiency

    Authors: Yitong Zhu, Guanxuan Jiang, Zhuowen Liang, Yuyang Wang

    Abstract: Cybersickness remains a critical barrier to the widespread adoption of Virtual Reality (VR), particularly in scenarios involving intense or artificial motion cues. Among the key contributors is excessive optical flow (perceived visual motion) that, when unmatched by vestibular input, leads to sensory conflict and discomfort. While previous efforts have explored geometric or hardware-based mitigation…

    Submitted 23 June, 2025; originally announced June 2025.

  32. arXiv:2506.18696  [pdf, ps, other]

    cs.LG

    SaGIF: Improving Individual Fairness in Graph Neural Networks via Similarity Encoding

    Authors: Yuchang Zhu, Jintang Li, Huizhe Zhang, Liang Chen, Zibin Zheng

    Abstract: Individual fairness (IF) in graph neural networks (GNNs), which emphasizes that similar individuals should receive similar outcomes from GNNs, has been a critical issue. Despite its importance, research in this area has been largely unexplored in terms of (1) a clear understanding of what induces individual unfairness in GNNs and (2) a comprehensive consideration of identifying similar ind…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Under review

  33. arXiv:2506.18586  [pdf]

    cs.AI cs.CE cs.CL

    Airalogy: AI-empowered universal data digitization for research automation

    Authors: Zijie Yang, Qiji Zhou, Fang Guo, Sijie Zhang, Yexun Xi, Jinglei Nie, Yudian Zhu, Liping Huang, Chou Wu, Yonghe Xia, Xiaoyu Ma, Yingming Pu, Panzhong Lu, Junshu Pan, Mingtao Chen, Tiannan Guo, Yanmei Dou, Hongyu Chen, Anping Zeng, Jiaxing Huang, Tian Xu, Yue Zhang

    Abstract: Research data are the foundation of Artificial Intelligence (AI)-driven science, yet current AI applications remain limited to a few fields with readily available, well-structured, digitized datasets. Achieving comprehensive AI empowerment across multiple disciplines is still out of reach. Present-day research data collection is often fragmented, lacking unified standards, inefficiently managed, a…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 146 pages, 6 figures, 49 supplementary figures

  34. arXiv:2506.18292  [pdf]

    cs.CV

    Rapeseed population point cloud completion network (RP-PCN) with dynamic graph convolution for 3D reconstruction of crop canopy occlusion architecture

    Authors: Ziyue Guo, Xin Yang, Yutao Shen, Yang Zhu, Lixi Jiang, Haiyan Cen

    Abstract: Quantitative descriptions of complete canopy architecture are crucial for evaluating crop photosynthesis and yield to guide ideotype design. Although three-dimensional (3D) sensing technologies have been developed for plant and canopy reconstruction, severe occlusion and complex architectures hinder accurate canopy descriptions. In this study, we propose a point cloud completion model for 3D recon…

    Submitted 23 June, 2025; originally announced June 2025.

  35. arXiv:2506.17844  [pdf, ps, other]

    cs.CL cs.AI

    THCM-CAL: Temporal-Hierarchical Causal Modelling with Conformal Calibration for Clinical Risk Prediction

    Authors: Xin Zhang, Qiyu Wei, Yingjie Zhu, Fanyi Wu, Sophia Ananiadou

    Abstract: Automated clinical risk prediction from electronic health records (EHRs) demands modeling both structured diagnostic codes and unstructured narrative notes. However, most prior approaches either handle these modalities separately or rely on simplistic fusion strategies that ignore the directional, hierarchical causal interactions by which narrative observations precipitate diagnoses and propagate…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 13 pages, 4 figures

  36. arXiv:2506.16806  [pdf, ps, other]

    cs.CV

    FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation

    Authors: Fan Yang, Yousong Zhu, Xin Li, Yufei Zhan, Hongyin Zhao, Shurong Zheng, Yaowei Wang, Ming Tang, Jinqiao Wang

    Abstract: Recent Large Vision Language Models (LVLMs) demonstrate promising capabilities in unifying visual understanding and generative modeling, enabling both accurate content understanding and flexible editing. However, current approaches treat "what to see" and "how to edit" separately: they either perform isolated object segmentation or utilize segmentation masks merely as conditional prompts for local…

    Submitted 20 June, 2025; originally announced June 2025.

  37. arXiv:2506.16764  [pdf, ps, other]

    cs.AI

    Reinforcement learning for hybrid charging stations planning and operation considering fixed and mobile chargers

    Authors: Yanchen Zhu, Honghui Zou, Chufan Liu, Yuyu Luo, Yuankai Wu, Yuxuan Liang

    Abstract: The success of vehicle electrification, which brings significant societal and environmental benefits, is contingent upon the availability of efficient and adaptable charging infrastructure. Traditional fixed-location charging stations often face issues like underutilization or congestion due to the dynamic nature of charging demand. Mobile chargers have emerged as a flexible solution, capable of r…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 11 pages

  38. arXiv:2506.16258  [pdf, ps, other]

    cs.MM

    ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing

    Authors: Yisu Wang, Yixiang Zhu, Xinjiao Li, Yulong Zhang, Ruilong Wu, Dirk Kutscher

    Abstract: Large-scale video feature indexing in datacenters is critically dependent on efficient data transfer. Although in-network computation has emerged as a compelling strategy for accelerating feature extraction and reducing overhead in distributed multimedia systems, harnessing advanced networking resources at both the switch and host levels remains a formidable challenge. These difficulties are compo…

    Submitted 19 June, 2025; originally announced June 2025.

  39. arXiv:2506.15961  [pdf]

    cs.DC cs.AI cs.LG

    TrainVerify: Equivalence-Based Verification for Distributed LLM Training

    Authors: Yunchi Lu, Youshan Miao, Cheng Tan, Peng Huang, Yi Zhu, Xian Zhang, Fan Yang

    Abstract: Training large language models (LLMs) at scale requires parallel execution across thousands of devices, incurring enormous computational costs. Yet, these costly distributed trainings are rarely verified, leaving them prone to silent errors and potentially wasting millions of GPU hours. We introduce TrainVerify, a system for verifiable distributed training of LLMs. Given a deep learning model's lo…

    Submitted 24 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

  40. arXiv:2506.15412  [pdf, ps, other]

    cs.IT

    Golden Partition Zone: Rethinking Neural Network Partitioning Under Inversion Threats in Collaborative Inference

    Authors: Rongke Liu, Youwen Zhu

    Abstract: In collaborative inference, intermediate features transmitted from edge devices can be exploited by adversaries to reconstruct original inputs via model inversion attacks (MIA). While existing defenses focus on shallow layer protection, they often incur significant utility loss. A key open question is how to partition the edge-cloud model to maximize resistance to MIA while minimizing accuracy deg…

    Submitted 19 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

    Comments: 8 pages, 11 figures, 5 tables

  41. arXiv:2506.14851  [pdf, ps, other]

    cs.DC cs.AI cs.LG

    Efficient Serving of LLM Applications with Probabilistic Demand Modeling

    Authors: Yifei Liu, Zuo Gan, Zhenghao Gan, Weiye Wang, Chen Chen, Yizhou Shan, Xusheng Chen, Zhenhua Han, Yifei Zhu, Shixuan Sun, Minyi Guo

    Abstract: Applications based on Large Language Models (LLMs) contains a series of tasks to address real-world problems with boosted capability, which have dynamic demand volumes on diverse backends. Existing serving systems treat the resource demands of LLM applications as a blackbox, compromising end-to-end efficiency due to improper queuing order and backend warm up latency. We find that the resource dema…

    Submitted 16 June, 2025; originally announced June 2025.

  42. arXiv:2506.14727  [pdf, ps, other]

    cs.RO cs.AI

    Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models

    Authors: Huihan Liu, Rutav Shah, Shuijing Liu, Jack Pittenger, Mingyo Seo, Yuchen Cui, Yonatan Bisk, Roberto Martín-Martín, Yuke Zhu

    Abstract: Assistive teleoperation, where control is shared between a human and a robot, enables efficient and intuitive human-robot collaboration in diverse and unstructured environments. A central challenge in real-world assistive teleoperation is for the robot to infer a wide range of human intentions from user control inputs and to assist users with correct actions. Existing methods are either confined t…

    Submitted 4 July, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

  43. arXiv:2506.14391  [pdf, ps, other]

    cs.LG cs.AI

    HiLight: A Hierarchical Reinforcement Learning Framework with Global Adversarial Guidance for Large-Scale Traffic Signal Control

    Authors: Yaqiao Zhu, Hongkai Wen, Geyong Min, Man Luo

    Abstract: Efficient traffic signal control (TSC) is essential for mitigating urban congestion, yet existing reinforcement learning (RL) methods face challenges in scaling to large networks while maintaining global coordination. Centralized RL suffers from scalability issues, while decentralized approaches often lack unified objectives, resulting in limited network-level efficiency. In this paper, we propose…

    Submitted 17 June, 2025; originally announced June 2025.

  44. arXiv:2506.14302  [pdf, ps, other]

    cs.CL

    Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent

    Authors: Xueyang Feng, Jingsen Zhang, Jiakai Tang, Wei Li, Guohao Cai, Xu Chen, Quanyu Dai, Yue Zhu, Zhenhua Dong

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly propelled the development of Conversational Recommendation Agents (CRAs). However, these agents often generate short-sighted responses that fail to sustain user guidance and meet expectations. Although preference optimization has proven effective in aligning LLMs with user expectations, it remains costly and performs poorly in…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to Findings of ACL 2025

  45. arXiv:2506.13322  [pdf, ps, other]

    cs.CV cs.AI

    Active Multimodal Distillation for Few-shot Action Recognition

    Authors: Weijia Feng, Yichen Zhu, Ruojia Zhang, Chenyang Wang, Fei Ma, Xiaobao Wang, Xiaobai Li

    Abstract: Owing to its rapid progress and broad application prospects, few-shot action recognition has attracted considerable interest. However, current methods are predominantly based on limited single-modal data, which does not fully exploit the potential of multimodal information. This paper presents a novel framework that actively identifies reliable modalities for each sample using task-specific contex…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: IJCAI 2025, the 34th International Joint Conference on Artificial Intelligence

  46. arXiv:2506.13055  [pdf, ps, other]

    cs.CL

    CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model

    Authors: Jiangtong Li, Yiyun Zhu, Dawei Cheng, Zhijun Ding, Changjun Jiang

    Abstract: Multimodal Large Language Models (MLLMs) have rapidly evolved with the growth of Large Language Models (LLMs) and are now applied in various fields. In finance, the integration of diverse modalities such as text, charts, and tables is crucial for accurate and efficient decision-making. Therefore, an effective evaluation system that incorporates these data types is essential for advancing financial…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 22 pages, 9 figures

  47. arXiv:2506.12821  [pdf]

    cs.LG q-bio.BM

    PDCNet: a benchmark and general deep learning framework for activity prediction of peptide-drug conjugates

    Authors: Yun Liu, Jintu Huang, Yingying Zhu, Congrui Wen, Yu Pang, Ji-Quan Zhang, Ling Wang

    Abstract: Peptide-drug conjugates (PDCs) represent a promising therapeutic avenue for human diseases, particularly in cancer treatment. Systematic elucidation of structure-activity relationships (SARs) and accurate prediction of the activity of PDCs are critical for the rational design and optimization of these conjugates. To this end, we carefully design and construct a benchmark PDCs dataset compiled from…

    Submitted 15 June, 2025; originally announced June 2025.

  48. arXiv:2506.12721  [pdf, ps, other]

    cs.AI cs.CL cs.LG stat.ML

    Strategic Scaling of Test-Time Compute: A Bandit Learning Approach

    Authors: Bowen Zuo, Yinglun Zhu

    Abstract: Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models. However, existing methods typically allocate compute uniformly across all queries, overlooking variation in query difficulty. To address this inefficiency, we formulate test-time compute allocation as a novel bandit learning problem and propose adaptive algorithms that estimate qu…

    Submitted 15 June, 2025; originally announced June 2025.

  49. arXiv:2506.12492  [pdf, ps, other]

    cs.CV cs.AI

    Comparative Analysis of Deep Learning Strategies for Hypertensive Retinopathy Detection from Fundus Images: From Scratch and Pre-trained Models

    Authors: Yanqiao Zhu

    Abstract: This paper presents a comparative analysis of deep learning strategies for detecting hypertensive retinopathy from fundus images, a central task in the HRDC challenge~\cite{qian2025hrdc}. We investigate three distinct approaches: a custom CNN, a suite of pre-trained transformer-based models, and an AutoML solution. Our findings reveal a stark, architecture-dependent response to data augmentation.…

    Submitted 14 June, 2025; originally announced June 2025.

  50. arXiv:2506.12409  [pdf, ps, other]

    cs.CV

    Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

    Authors: Ziwei Liu, Borui Kang, Wei Li, Hangjie Yuan, Yanbing Yang, Wenbin Li, Jun Luo, Yifan Zhu, Tao Feng

    Abstract: Continual learning in vision-language models (VLMs) faces critical challenges in balancing parameter efficiency, memory consumption, and optimization stability. While First-Order (FO) optimization (e.g., SGD) dominate current approaches, their deterministic gradients often trap models in suboptimal local minima and incur substantial memory overhead. This paper pioneers a systematic exploration of…

    Submitted 14 June, 2025; originally announced June 2025.