Showing 1–50 of 5,545 results for author: Zhang, H

  1. arXiv:2507.01915  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

    Authors: Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant challenge, particularly when they conflict. To address this issue, we frame human value alignment as a multi-objective optimization problem, aiming to maxim…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 19 pages, 3 figures. Accepted by ACL 2025 (main)

  2. arXiv:2507.01827  [pdf, ps, other]

    cs.SE

    APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search

    Authors: Haichuan Hu, Congqing He, Hao Zhang, Xiaochen Xie, Quanjun Zhang

    Abstract: Automated Program Repair (APR) attempts to fix software bugs without human intervention, which plays a crucial role in software development and maintenance. Recently, with the advances in Large Language Models (LLMs), a rapidly increasing number of APR techniques have been proposed with remarkable performance. However, existing LLM-based APR techniques typically adopt trial-and-error strategies, w…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2507.01628  [pdf, ps, other]

    cs.SE

    DaiFu: In-Situ Crash Recovery for Deep Learning Systems

    Authors: Zilong He, Pengfei Chen, Hongyu Zhang, Xiaoyun Li, Guangba Yu, Hongyang Chen, Zibin Zheng

    Abstract: Deep learning (DL) systems have been widely adopted in many areas, and are becoming even more popular with the emergence of large language models. However, due to the complex software stacks involved in their development and execution, crashes are unavoidable and common. Crashes severely waste computing resources and hinder development productivity, so efficient crash recovery is crucial. Existing…

    Submitted 2 July, 2025; originally announced July 2025.

  4. arXiv:2507.01535  [pdf, ps, other]

    cs.CV

    TrackingMiM: Efficient Mamba-in-Mamba Serialization for Real-time UAV Object Tracking

    Authors: Bingxi Liu, Calvin Chen, Junhao Li, Guyang Yu, Haoqian Song, Xuchen Liu, Jinqiang Cui, Hong Zhang

    Abstract: The Vision Transformer (ViT) model has long struggled with the challenge of quadratic complexity, a limitation that becomes especially critical in unmanned aerial vehicle (UAV) tracking systems, where data must be processed in real time. In this study, we explore the recently proposed State-Space Model, Mamba, leveraging its computational efficiency and capability for long-sequence modeling to eff…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 12 pages

  5. arXiv:2507.01464  [pdf, ps, other]

    cs.IT

    Coding for Quasi-Static Fading Channel with Imperfect CSI at the Transmitter and Quantized Feedback

    Authors: Yuhan Yang, Mei Han, Haonan Zhang, Haoheng Yuan, Fan Cheng, Bin Dai

    Abstract: The classical Schalkwijk-Kailath (SK) scheme for the additive Gaussian noise channel with noiseless feedback is highly efficient since its coding complexity is extremely low and the decoding error decays doubly exponentially as the coding blocklength tends to infinity. However, its application to the fading channel with imperfect CSI at the transmitter (I-CSIT) is challenging since the SK scheme i…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 7 pages, 6 figures, conference, this paper will be presented at the 2025 IEEE ITW

  6. arXiv:2507.01453  [pdf, ps, other]

    cs.GT cs.CR cs.DC

    Rational Censorship Attack: Breaking Blockchain with a Blackboard

    Authors: Michelle Yeo, Haoqian Zhang

    Abstract: Censorship resilience is a fundamental assumption underlying the security of blockchain protocols. Additionally, the analysis of blockchain security from an economic and game theoretic perspective has been growing in popularity in recent years. In this work, we present a surprising rational censorship attack on blockchain censorship resilience when we adopt the analysis of blockchain security from…

    Submitted 2 July, 2025; originally announced July 2025.

  7. arXiv:2507.01378  [pdf, ps, other]

    cs.MA cs.AI cs.RO

    RALLY: Role-Adaptive LLM-Driven Yoked Navigation for Agentic UAV Swarms

    Authors: Ziyao Wang, Rongpeng Li, Sizhao Li, Yuming Xiang, Haiping Wang, Zhifeng Zhao, Honggang Zhang

    Abstract: Intelligent control of Unmanned Aerial Vehicle (UAV) swarms has emerged as a critical research focus, and it typically requires the swarm to navigate effectively while avoiding obstacles and achieving continuous coverage over multiple mission targets. Although traditional Multi-Agent Reinforcement Learning (MARL) approaches offer dynamic adaptability, they are hindered by the semantic gap in num…

    Submitted 2 July, 2025; originally announced July 2025.

  8. arXiv:2507.01059  [pdf, ps, other]

    cs.MA cs.AI cs.CL cs.CV cs.RO

    Automated Vehicles Should be Connected with Natural Language

    Authors: Xiangbo Gao, Keshu Wu, Hao Zhang, Kexin Tian, Yang Zhou, Zhengzhong Tu

    Abstract: Multi-agent collaborative driving promises improvements in traffic safety and efficiency through collective perception and decision making. However, existing communication media -- including raw sensor data, neural network features, and perception results -- suffer limitations in bandwidth efficiency, information completeness, and agent interoperability. Moreover, traditional approaches have large…

    Submitted 29 June, 2025; originally announced July 2025.

  9. arXiv:2507.00942  [pdf, ps, other]

    cs.IT

    Optimal Feedback Schemes for Dirty Paper Channels With State Estimation at the Receiver

    Authors: Dengfeng Xia, Han Deng, Haonan Zhang, Fan Cheng, Bin Dai, Liuguo Yin

    Abstract: In the literature, it has been shown that feedback does not increase the optimal rate-distortion region of the dirty paper channel with state estimation at the receiver (SE-R). On the other hand, it is well-known that feedback helps to construct low-complexity coding schemes in Gaussian channels, such as the elegant Schalkwijk-Kailath (SK) feedback scheme. This motivates us to explore capacity-ach…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: This paper will be presented at the 2025 IEEE Information Theory Workshop (ITW)

  10. arXiv:2507.00880  [pdf, ps, other]

    cs.LG cs.AI

    NN-Former: Rethinking Graph Structure in Neural Architecture Representation

    Authors: Ruihan Xu, Haokui Zhang, Yaowei Wang, Wei Zeng, Shiliang Zhang

    Abstract: The growing use of deep learning necessitates efficient network design and deployment, making neural predictors vital for estimating attributes such as accuracy and latency. Recently, Graph Neural Networks (GNNs) and transformers have shown promising performance in representing neural architectures. However, each method has its disadvantages. GNNs lack the capabilities to represent compli…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted to CVPR 2025. Code is available at https://github.com/XuRuihan/NNFormer

  11. arXiv:2507.00839  [pdf, ps, other]

    cs.DB

    RapidStore: An Efficient Dynamic Graph Storage System for Concurrent Queries

    Authors: Chiyu Hao, Jixian Su, Shixuan Sun, Hao Zhang, Sen Gao, Jianwen Zhao, Chenyi Zhang, Jieru Zhao, Chen Chen, Minyi Guo

    Abstract: Dynamic graph storage systems are essential for real-time applications such as social networks and recommendation, where graph data continuously evolves. However, they face significant challenges in efficiently handling concurrent read and write operations. We find that existing methods suffer from write queries interfering with read efficiency, substantial time and space overhead due to per-edge…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 17 pages, 18 figures

  12. arXiv:2507.00537  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Not All Attention Heads Are What You Need: Refining CLIP's Image Representation with Attention Ablation

    Authors: Feng Lin, Marco Chen, Haokui Zhang, Xiaotian Yu, Guangming Lu, Rong Xiao

    Abstract: This paper studies the role of attention heads in CLIP's image encoder. While CLIP has exhibited robust performance across diverse applications, we hypothesize that certain attention heads negatively affect final representations and that ablating them can improve performance in downstream tasks. To capitalize on this insight, we propose a simple yet effective method, called Attention Ablation Tech…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 21 pages, 7 figures

  13. arXiv:2507.00506  [pdf, ps, other]

    cs.CV

    SCING:Towards More Efficient and Robust Person Re-Identification through Selective Cross-modal Prompt Tuning

    Authors: Yunfei Xie, Yuxuan Cheng, Juncheng Wu, Haoyu Zhang, Yuyin Zhou, Shoudong Han

    Abstract: Recent advancements in adapting vision-language pre-training models like CLIP for person re-identification (ReID) tasks often rely on complex adapter design or modality-specific tuning while neglecting cross-modal interaction, leading to high computational costs or suboptimal alignment. To address these limitations, we propose a simple yet effective framework named Selective Cross-modal Prompt Tun…

    Submitted 1 July, 2025; originally announced July 2025.

  14. arXiv:2506.24019  [pdf, ps, other]

    cs.CV cs.CL

    Ella: Embodied Social Agents with Lifelong Memory

    Authors: Hongxin Zhang, Zheyuan Zhang, Zeyuan Wang, Zunzhe Zhang, Lixing Fang, Qinhong Zhou, Chuang Gan

    Abstract: We introduce Ella, an embodied social agent capable of lifelong learning within a community in a 3D open world, where agents accumulate experiences and acquire knowledge through everyday visual observations and social interactions. At the core of Ella's capabilities is a structured, long-term multimodal memory system that stores, updates, and retrieves information effectively. It consists of a nam…

    Submitted 30 June, 2025; originally announced June 2025.

  15. arXiv:2506.23825  [pdf, ps, other]

    cs.CV

    Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

    Authors: Haoji Zhang, Yiqin Wang, Yansong Tang, Yong Liu, Jiashi Feng, Xiaojie Jin

    Abstract: Benefiting from the advances in large language models and cross-modal alignment, existing multimodal large language models have achieved prominent performance in image and short video understanding. However, the understanding of long videos is still challenging, as their long-context nature results in significant computational and memory overhead. Most existing work treats long videos in the same…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  16. arXiv:2506.23361  [pdf, ps, other]

    cs.CV

    OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions

    Authors: Yuanhao Cai, He Zhang, Xi Chen, Jinbo Xing, Yiwei Hu, Yuqian Zhou, Kai Zhang, Zhifei Zhang, Soo Ye Kim, Tianyu Wang, Yulun Zhang, Xiaokang Yang, Zhe Lin, Alan Yuille

    Abstract: Existing feedforward subject-driven video customization methods mainly study single-subject scenarios due to the difficulty of constructing multi-subject training data pairs. Another challenging problem, how to use signals such as depth, mask, camera, and text prompts to control and edit the subject in the customized video, remains underexplored. In this paper, we first propose a data cons…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: A data construction pipeline and a diffusion Transformer framework for controllable subject-driven video customization

  17. arXiv:2506.23351  [pdf, ps, other]

    cs.RO cs.AI cs.LG cs.MA

    Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

    Authors: Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, et al. (74 additional authors not shown)

    Abstract: Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To ad…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Challenge Webpage: https://robotwin-benchmark.github.io/cvpr-2025-challenge/

  18. arXiv:2506.23282  [pdf, ps, other]

    cs.CV

    Autoregressive Denoising Score Matching is a Good Video Anomaly Detector

    Authors: Hanwen Zhang, Congqi Cao, Qinyi Lv, Lingtong Min, Yanning Zhang

    Abstract: Video anomaly detection (VAD) is an important computer vision problem. Thanks to the mode coverage capabilities of generative models, the likelihood-based paradigm is attracting growing interest, as it can model the normal distribution and detect out-of-distribution anomalies. However, these likelihood-based methods are blind to the anomalies located in local modes near the learned distribution. To hand…

    Submitted 29 June, 2025; originally announced June 2025.

  19. arXiv:2506.23202  [pdf, ps, other]

    cs.CV

    Transformer-Based Person Search with High-Frequency Augmentation and Multi-Wave Mixing

    Authors: Qilin Shu, Qixian Zhang, Qi Zhang, Hongyun Zhang, Duoqian Miao, Cairong Zhao

    Abstract: The person search task aims to locate a target person within a set of scene images. In recent years, transformer-based models in this field have made some progress. However, they still face three primary challenges: 1) the self-attention mechanism tends to suppress high-frequency components in the features, which severely impacts model performance; 2) the computational cost of transformers is rela…

    Submitted 29 June, 2025; originally announced June 2025.

  20. arXiv:2506.23086  [pdf, ps, other]

    cs.CV

    Frequency-enhanced Multi-granularity Context Network for Efficient Vertebrae Segmentation

    Authors: Jian Shi, Tianqi You, Pingping Zhang, Hongli Zhang, Rui Xu, Haojie Li

    Abstract: Automated and accurate segmentation of individual vertebrae in 3D CT and MRI images is essential for various clinical applications. Due to the limitations of current imaging techniques and the complexity of spinal structures, existing methods still struggle with reducing the impact of image blurring and distinguishing similar vertebrae. To alleviate these issues, we introduce a Frequency-enhanced M…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025. More modifications may be performed

  21. arXiv:2506.23068  [pdf, ps, other]

    cs.LG cs.AI stat.AP

    Curious Causality-Seeking Agents Learn Meta Causal World

    Authors: Zhiyu Zhao, Haoxuan Li, Haifeng Zhang, Jun Wang, Francesco Faccio, Jürgen Schmidhuber, Mengyue Yang

    Abstract: When building a world model, a common assumption is that the environment has a single, unchanging underlying causal rule, like applying Newton's laws to every situation. In reality, what appears as a drifting causal mechanism is often the manifestation of a fixed underlying mechanism seen through a narrow observational window. This brings about a problem that, when building a world model, even sub…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: 33 pages

  22. arXiv:2506.22694  [pdf, ps, other]

    cs.CL

    VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs

    Authors: Raghavv Goel, Sudhanshu Agrawal, Mukul Gagrani, Junyoung Park, Yifan Zao, He Zhang, Tian Liu, Yiping Yang, Xin Yuan, Jiuyan Lu, Chris Lott, Mingu Lee

    Abstract: In this paper, we introduce a simple training-free technique to improve the performance of drafter-based speculative decoding (SpD) methods that incorporate a language modeling head (LM head) during the drafting process. Drafter-based speculative decoding leverages one or more smaller language models, a.k.a. drafters or draft models, to sample a draft sequence or tree consisting of multiple tokens, f…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 7 pages, 4 figures, 5 tables, accepted at ICML 2025 workshop on Efficient Systems for Foundational Models

  23. arXiv:2506.22023  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy

    Authors: Bohan Li, Zhihan Li, Haoran Wang, Hanglei Zhang, Yiwei Guo, Hankun Wang, Xie Chen, Kai Yu

    Abstract: Recently, autoregressive (AR) language models have emerged as a dominant approach in speech synthesis, offering expressive generation and scalable training. However, conventional AR speech synthesis models relying on the next-token prediction paradigm often encounter significant challenges when handling long speech sequences. These models often struggle to construct stable frame-to-frame attention…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 17 pages, 8 figures, 5 tables

  24. arXiv:2506.21697  [pdf, ps, other]

    eess.SY cs.RO

    Stochastic Neural Control Barrier Functions

    Authors: Hongchao Zhang, Manan Tayal, Jackson Cox, Pushpak Jagtap, Shishir Kolathaya, Andrew Clark

    Abstract: Control Barrier Functions (CBFs) are utilized to ensure the safety of control systems. CBFs act as safety filters in order to provide safety guarantees without compromising system performance. These safety guarantees rely on the construction of valid CBFs. Due to their complexity, CBFs can be represented by neural networks, known as neural CBFs (NCBFs). Existing works on the verification of the NC…

    Submitted 26 June, 2025; originally announced June 2025.

  25. arXiv:2506.21616  [pdf, ps, other]

    cs.CL cs.CY

    TIM: A Large-Scale Dataset and large Timeline Intelligence Model for Open-domain Timeline Summarization

    Authors: Chuanrui Hu, Wei Hu, Penghang Yu, Hua Zhang, Bing-Kun Bao

    Abstract: Open-domain Timeline Summarization (TLS) is crucial for monitoring the evolution of news topics. To identify changes in news topics, existing methods typically employ general Large Language Models (LLMs) to summarize relevant timestamps from retrieved news. While general LLMs demonstrate capabilities in zero-shot news summarization and timestamp localization, they struggle with assessing topic rel…

    Submitted 22 June, 2025; originally announced June 2025.

  26. arXiv:2506.21615  [pdf]

    cs.CL cs.AI cs.IR

    Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines

    Authors: Wenhao Li, Hongkuan Zhang, Hongwei Zhang, Zhengxu Li, Zengjie Dong, Yafan Chen, Niranjan Bidargaddi, Hong Liu

    Abstract: Current medical language models, adapted from large language models (LLMs), typically predict ICD code-based diagnosis from electronic health records (EHRs) because these labels are readily available. However, ICD codes do not capture the nuanced, context-rich reasoning clinicians use for diagnosis. Clinicians synthesize diverse patient data and reference clinical practice guidelines (CPGs) to mak…

    Submitted 22 June, 2025; originally announced June 2025.

  27. arXiv:2506.21602  [pdf, ps, other]

    cs.CL cs.AI

    BiMark: Unbiased Multilayer Watermarking for Large Language Models

    Authors: Xiaoyan Feng, He Zhang, Yanjun Zhang, Leo Yu Zhang, Shirui Pan

    Abstract: Recent advances in Large Language Models (LLMs) have raised urgent concerns about LLM-generated text authenticity, prompting regulatory demands for reliable identification mechanisms. Although watermarking offers a promising solution, existing approaches struggle to simultaneously achieve three critical requirements: text quality preservation, model-agnostic detection, and message embedding capaci…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: This paper is accepted by International Conference on Machine Learning (ICML) 2025

  28. arXiv:2506.21563  [pdf, ps, other]

    cs.CL

    FormosanBench: Benchmarking Low-Resource Austronesian Languages in the Era of Large Language Models

    Authors: Kaiying Kevin Lin, Hsiyu Chen, Haopeng Zhang

    Abstract: While large language models (LLMs) have demonstrated impressive performance across a wide range of natural language processing (NLP) tasks in high-resource languages, their capabilities in low-resource and minority languages remain significantly underexplored. Formosan languages -- a subgroup of Austronesian languages spoken in Taiwan -- are both linguistically rich and endangered, largely due to…

    Submitted 12 June, 2025; originally announced June 2025.

  29. arXiv:2506.21513  [pdf, ps, other]

    cs.CV

    GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

    Authors: Wentao Hu, Shunkai Li, Ziqiao Peng, Haoxian Zhang, Fan Shi, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Hui Tian

    Abstract: Creating high-quality, generalizable speech-driven 3D talking heads remains a persistent challenge. Previous methods achieve satisfactory results for fixed viewpoints and small-scale audio variations, but they struggle with large head rotations and out-of-distribution (OOD) audio. Moreover, they are constrained by the need for time-consuming, identity-specific training. We believe the core issue l…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: ICCV 2025, Project page: https://vincenthu19.github.io/GGTalker/

  30. arXiv:2506.21198  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation

    Authors: Yihong Cao, Jiaming Zhang, Xu Zheng, Hao Shi, Kunyu Peng, Hang Liu, Kailun Yang, Hui Zhang

    Abstract: Panoramic image processing is essential for omni-context perception, yet faces constraints like distortions, perspective occlusions, and limited annotations. Previous unsupervised domain adaptation methods transfer knowledge from labeled pinhole data to unlabeled panoramic images, but they require access to source pinhole data. To address these, we introduce a more practical task, i.e., Source-Fre…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025. All data and code will be made publicly available at https://github.com/yihong-97/UNLOCK

  31. arXiv:2506.21044  [pdf, ps, other]

    cs.LG cs.AI

    Efficient Skill Discovery via Regret-Aware Optimization

    Authors: He Zhang, Ming Zhou, Shaopeng Zhai, Ying Sun, Hui Xiong

    Abstract: Unsupervised skill discovery aims to learn diverse and distinguishable behaviors in open-ended reinforcement learning. Existing methods focus on improving diversity through pure exploration, mutual information optimization, and learning temporal representations. Although they perform well on exploration, they remain limited in terms of efficiency, especially for the high-dimensional s…

    Submitted 26 June, 2025; originally announced June 2025.

  32. arXiv:2506.21033  [pdf, ps, other]

    cs.DC

    BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services

    Authors: Zhaojiacheng Zhou, Hongze Liu, Shijing Yuan, Hanning Zhang, Jiong Lou, Chentao Wu, Jie Li

    Abstract: The hallucination problem of Large Language Models (LLMs) has increasingly drawn attention. Augmenting LLMs with external knowledge is a promising solution to address this issue. However, due to privacy and security concerns, a vast amount of downstream task-related knowledge remains dispersed and isolated across various "silos," making it difficult to access. To bridge this knowledge gap, we prop…

    Submitted 26 June, 2025; originally announced June 2025.

  33. arXiv:2506.20979  [pdf, ps, other]

    cs.CV

    3D Scene-Camera Representation with Joint Camera Photometric Optimization

    Authors: Weichen Dai, Kangcheng Ma, Jiaxin Wang, Kecen Pan, Yuhang Ming, Hua Zhang, Wanzeng Kong

    Abstract: Representing scenes from multi-view images is a crucial task in computer vision with extensive applications. However, inherent photometric distortions in camera imaging can significantly degrade image quality. Without accounting for these distortions, the 3D scene representation may inadvertently incorporate erroneous information unrelated to the scene, diminishing the quality of the represent…

    Submitted 25 June, 2025; originally announced June 2025.

  34. arXiv:2506.20971  [pdf, ps, other]

    cs.SI

    Where is AIED Headed? Key Topics and Emerging Frontiers (2020-2024)

    Authors: Shihui Feng, Huilin Zhang, Dragan Gašević

    Abstract: In this study, we analyze 2,398 research articles published between 2020 and 2024 across eight core venues related to the field of Artificial Intelligence in Education (AIED). Using a three-step knowledge co-occurrence network analysis, we analyze the knowledge structure of the field, the evolving knowledge clusters, and the emerging frontiers. Our findings reveal that AIED research remains strong…

    Submitted 25 June, 2025; originally announced June 2025.

  35. arXiv:2506.20936  [pdf, ps, other]

    cs.CV

    PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling

    Authors: Hao Zhang, Haolan Xu, Chun Feng, Varun Jampani, Narendra Ahuja

    Abstract: Skinning and rigging are fundamental components in animation, articulated object reconstruction, motion transfer, and 4D generation. Existing approaches predominantly rely on Linear Blend Skinning (LBS), due to its simplicity and differentiability. However, LBS introduces artifacts such as volume loss and unnatural deformations, and it fails to model elastic materials like soft tissues, fur, and f…

    Submitted 27 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025. Page: https://physrig.github.io/

  36. arXiv:2506.20702  [pdf]

    cs.AI cs.CY

    The Singapore Consensus on Global AI Safety Research Priorities

    Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai, et al. (63 additional authors not shown)

    Abstract: Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on…

    Submitted 30 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Final report from the "2025 Singapore Conference on AI (SCAI)" held April 26: https://www.scai.gov.sg/2025/scai2025-report

  37. arXiv:2506.20608  [pdf, ps, other]

    cs.AI math.NA

    AI Assistants to Enhance and Exploit the PETSc Knowledge Base

    Authors: Barry Smith, Junchao Zhang, Hong Zhang, Lois Curfman McInnes, Murat Keceli, Archit Vasan, Satish Balay, Toby Isaac, Le Chen, Venkatram Vishwanath

    Abstract: Generative AI, especially through large language models (LLMs), is transforming how technical knowledge can be accessed, reused, and extended. PETSc, a widely used numerical library for high-performance scientific computing, has accumulated a rich but fragmented knowledge base over its three decades of development, spanning source code, documentation, mailing lists, GitLab issues, Discord conversa…

    Submitted 25 June, 2025; originally announced June 2025.

  38. Semantic-enhanced Modality-asymmetric Retrieval for Online E-commerce Search

    Authors: Zhigong Zhou, Ning Ding, Xiaochuan Fan, Yue Shang, Yiming Qiu, Jingwei Zhuo, Zhiwei Ge, Songlin Wang, Lin Liu, Sulong Xu, Han Zhang

    Abstract: Semantic retrieval, which retrieves semantically matched items given a textual query, has been an essential component to enhance system effectiveness in e-commerce search. In this paper, we study the multimodal retrieval problem, where the visual information (e.g., image) of an item is leveraged as a supplement to textual information to enrich item representation and further improve retrieval perform…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Published in SIGIR 2023

  39. arXiv:2506.20291  [pdf, ps, other]

    cs.HC cs.IR

    A Literature Review on Simulation in Conversational Recommender Systems

    Authors: Haoran Zhang, Xin Zhao, Jinze Chen, Junpeng Guo

    Abstract: Conversational Recommender Systems (CRSs) have garnered attention as a novel approach to delivering personalized recommendations through multi-turn dialogues. This review developed a taxonomy framework to systematically categorize relevant publications into four groups: dataset construction, algorithm design, system evaluation, and empirical studies, providing a comprehensive analysis of simulatio…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 6 pages, 1 figure, accepted as a poster for CSWIM 2025

  40. arXiv:2506.20151  [pdf, ps, other]

    cs.CV cs.AI

    EAR: Erasing Concepts from Unified Autoregressive Models

    Authors: Haipeng Fan, Shiyuan Zhang, Baohunesitu, Zihang Guo, Huaiwen Zhang

    Abstract: Autoregressive (AR) models have achieved unified and strong performance across both visual understanding and image generation tasks. However, removing undesired concepts from AR models while maintaining overall generation quality remains an open challenge. In this paper, we propose Erasure Autoregressive Model (EAR), a fine-tuning method for effective and utility-preserving concept erasure in AR m…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures, 1 table

  41. arXiv:2506.19993  [pdf, ps, other]

    cs.IR cs.LG

    CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems

    Authors: Haochen Zhang, Tianyi Zhang, Junze Yin, Oren Gal, Anshumali Shrivastava, Vladimir Braverman

    Abstract: Recommender systems play a pivotal role in providing relevant content to users. With the rapid development of large language models (LLMs), researchers have begun utilizing LLMs to build more powerful recommender systems. However, existing approaches that focus on aligning LLMs with recommendation tasks do not fully leverage their sequential information processing capabilities, leading to suboptim…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025 Findings

  42. arXiv:2506.19975  [pdf, ps, other]

    eess.IV cs.AI cs.CV eess.SP

    VoxelOpt: Voxel-Adaptive Message Passing for Discrete Optimization in Deformable Abdominal CT Registration

    Authors: Hang Zhang, Yuxi Zhang, Jiazheng Wang, Xiang Chen, Renjiu Hu, Xin Tian, Gaolei Li, Min Liu

    Abstract: Recent developments in neural networks have improved deformable image registration (DIR) by amortizing iterative optimization, enabling fast and accurate DIR results. However, learning-based methods often face challenges with limited training data, large deformations, and tend to underperform compared to iterative approaches when label supervision is unavailable. While iterative methods can achiev…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Accepted for publication at MICCAI 2025

  43. arXiv:2506.19884  [pdf, ps, other]

    cs.OS cs.AI cs.PF cs.SE

    MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection

    Authors: Zhengxiang Huang, Chaoyue Niu, Zhaode Wang, Jiarui Xue, Hanming Zhang, Yugang Wang, Zewei Xin, Xiaotang Jiang, Chengfei Lv, Fan Wu, Guihai Chen

    Abstract: As the demand for on-device Large Language Model (LLM) inference grows, energy efficiency has become a major concern, especially for battery-limited mobile devices. Our analysis shows that the memory-bound LLM decode phase dominates energy use, and yet most existing works focus on accelerating the prefill phase, neglecting energy concerns. We introduce Adaptive Energy-Centric Core Selection (AECS)…

    Submitted 24 June, 2025; originally announced June 2025.

  44. arXiv:2506.19830  [pdf, ps, other]

    cs.LG cs.CL

    Scaling Speculative Decoding with Lookahead Reasoning

    Authors: Yichao Fu, Rui Ge, Zelei Shao, Zhijie Deng, Hao Zhang

    Abstract: Reasoning models excel by generating long chain-of-thoughts, but decoding the resulting thousands of tokens is slow. Token-level speculative decoding (SD) helps, but its benefit is capped, because the chance that an entire $γ$-token guess is correct falls exponentially as $γ$ grows. This means allocating more compute for longer token drafts faces an algorithmic ceiling -- making the speedup modest…

    Submitted 24 June, 2025; originally announced June 2025.

  45. arXiv:2506.19643  [pdf, ps, other]

    cs.LG

    Unsupervised Data Generation for Offline Reinforcement Learning: A Perspective from Model

    Authors: Shuncheng He, Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Xiangyang Ji

    Abstract: Offline reinforcement learning (RL) has recently gained growing interest from RL researchers. However, the performance of offline RL suffers from the out-of-distribution problem, which can be corrected by feedback in online RL. Previous offline RL research focuses on restricting the offline algorithm to in-distribution or even in-sample action sampling. In contrast, less work pays attention to the influ…

    Submitted 24 June, 2025; originally announced June 2025.

  46. arXiv:2506.19385  [pdf, ps, other]

    cs.AI

    Conversational Intent-Driven GraphRAG: Enhancing Multi-Turn Dialogue Systems through Adaptive Dual-Retrieval of Flow Patterns and Context Semantics

    Authors: Ziqi Zhu, Tao Hu, Honglong Zhang, Dan Yang, HanGeng Chen, Mengran Zhang, Xilun Chen

    Abstract: We present CID-GraphRAG (Conversational Intent-Driven Graph Retrieval Augmented Generation), a novel framework that addresses the limitations of existing dialogue systems in maintaining both contextual coherence and goal-oriented progression in multi-turn customer service conversations. Unlike traditional RAG systems that rely solely on semantic similarity (Conversation RAG) or standard knowledge…

    Submitted 24 June, 2025; originally announced June 2025.

  47. arXiv:2506.18901  [pdf, ps, other]

    cs.CV

    From Virtual Games to Real-World Play

    Authors: Wenqiang Sun, Fangyun Wei, Jinjing Zhao, Xi Chen, Zilong Chen, Hongyang Zhang, Jun Zhang, Yan Lu

    Abstract: We introduce RealPlay, a neural network-based real-world game engine that enables interactive video generation from user control signals. Unlike prior works focused on game-style visuals, RealPlay aims to produce photorealistic, temporally consistent video sequences that resemble real-world footage. It operates in an interactive loop: users observe a generated scene, issue a control command, and r…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Project page: https://wenqsun.github.io/RealPlay/

  48. arXiv:2506.18696  [pdf, ps, other]

    cs.LG

    SaGIF: Improving Individual Fairness in Graph Neural Networks via Similarity Encoding

    Authors: Yuchang Zhu, Jintang Li, Huizhe Zhang, Liang Chen, Zibin Zheng

    Abstract: Individual fairness (IF) in graph neural networks (GNNs), which emphasizes that similar individuals should receive similar outcomes from GNNs, has been a critical issue. Despite its importance, research in this area has been largely unexplored in terms of (1) a clear understanding of what induces individual unfairness in GNNs and (2) a comprehensive consideration of identifying similar ind…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Under review

  49. arXiv:2506.18385  [pdf, ps, other]

    cs.CV

    InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models

    Authors: Nianchen Deng, Lixin Gu, Shenglong Ye, Yinan He, Zhe Chen, Songze Li, Haomin Wang, Xingguang Wei, Tianshuo Yang, Min Dou, Tong He, Wenqi Shao, Kaipeng Zhang, Yi Wang, Botian Shi, Yanting Zhang, Jifeng Dai, Yu Qiao, Hongjie Zhang, Wenhai Wang

    Abstract: Recent benchmarks and datasets have been proposed to improve spatial reasoning in vision-language models (VLMs), yet existing open resources remain limited in scale, visual diversity, and instruction expressiveness. In this work, we introduce InternSpatial, the largest open-source dataset for spatial reasoning in VLMs, along with InternSpatial-Bench, a corresponding evaluation benchmark designed t…

    Submitted 23 June, 2025; originally announced June 2025.

  50. arXiv:2506.18290  [pdf, ps, other]

    cs.LG

    Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction

    Authors: Han Zhang, Jinghong Mao, Shangwen Zhu, Zhantao Yang, Lianghua Huang, Yu Liu, Deli Zhao, Ruili Feng, Fan Cheng

    Abstract: Diffusion reconstruction plays a critical role in various applications such as image editing, restoration, and style transfer. In theory, the reconstruction should be simple - it just inverts and regenerates images by numerically solving the Probability Flow-Ordinary Differential Equation (PF-ODE). Yet in practice, noticeable reconstruction errors have been observed, which cannot be well explained…

    Submitted 23 June, 2025; originally announced June 2025.