
Showing 1–50 of 2,413 results for author: Zhang, Q

Searching in archive cs.
  1. arXiv:2505.10527  [pdf, other]

    cs.CL

    WorldPM: Scaling Human Preference Modeling

    Authors: Binghai Wang, Runji Lin, Keming Lu, Le Yu, Zhenru Zhang, Fei Huang, Chujie Zheng, Kai Dang, Yang Fan, Xingzhang Ren, An Yang, Binyuan Hui, Dayiheng Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Bowen Yu, Jingren Zhou, Junyang Lin

    Abstract: Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling. We propose World Preference Modeling (WorldPM) to emphasize this scaling potential, where World Preference embodies a unified representation of human preferences. In this paper, we collect preference data from pub…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10483  [pdf, ps, other]

    cs.CV cs.AI

    UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation

    Authors: Yi Li, Haonan Wang, Qixiang Zhang, Boyu Xiao, Chenchang Hu, Hualiang Wang, Xiaomeng Li

    Abstract: The emergence of unified multimodal understanding and generation models is rapidly attracting attention because of their ability to enhance instruction-following capabilities while minimizing model redundancy. However, there is a lack of a unified evaluation framework for these models, which would enable an elegant, simplified, and overall evaluation. Current models conduct evaluations on multiple…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: UniEval is the first evaluation framework designed for unified multimodal models, including a holistic benchmark UniBench and the UniScore metric

  3. arXiv:2505.10034  [pdf, ps, other]

    cs.AI

    The First MPDD Challenge: Multimodal Personality-aware Depression Detection

    Authors: Changzeng Fu, Zelin Fu, Xinhe Kuang, Jiacheng Dong, Qi Zhang, Kaifeng Su, Yikai Su, Wenbo Shi, Junfeng Yao, Yuliang Zhao, Shiqi Zhao, Jiadong Wang, Siyang Song, Chaoran Liu, Yuichiro Yoshikawa, Björn Schuller, Hiroshi Ishiguro

    Abstract: Depression is a widespread mental health issue affecting diverse age groups, with notable prevalence among college students and the elderly. However, existing datasets and detection methods primarily focus on young adults, neglecting the broader age spectrum and individual differences that influence depression manifestation. Current approaches often establish a direct mapping between multimodal da…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted as part of the MPDD Challenge in the ACMMM 2025 Grand Challenge

    MSC Class: 68T07

    ACM Class: I.2.0; H.5.1

  4. arXiv:2505.09205  [pdf, other]

    cs.IR

    HMamba: Hyperbolic Mamba for Sequential Recommendation

    Authors: Qianru Zhang, Honggang Wen, Wei Yuan, Crystal Chen, Menglin Yang, Siu-Ming Yiu, Hongzhi Yin

    Abstract: Sequential recommendation systems have become a cornerstone of personalized services, adept at modeling the temporal evolution of user preferences by capturing dynamic interaction sequences. Existing approaches predominantly rely on traditional models, including RNNs and Transformers. Despite their success in local pattern recognition, Transformer-based methods suffer from quadratic computational…

    Submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.08695  [pdf, other]

    cs.CV

    SPAST: Arbitrary Style Transfer with Style Priors via Pre-trained Large-scale Model

    Authors: Zhanjie Zhang, Quanwei Zhang, Junsheng Luan, Mengyuan Yang, Yun Wang, Lei Zhao

    Abstract: Given an arbitrary content and style image, arbitrary style transfer aims to render a new stylized image which preserves the content image's structure and possesses the style image's style. Existing arbitrary style transfer methods are based on either small models or pre-trained large-scale models. The small model-based methods fail to generate high-quality stylized images, bringing artifact…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted by Neural Networks

  6. arXiv:2505.07921  [pdf, ps, other]

    cs.LG cs.AI

    Self-cross Feature based Spiking Neural Networks for Efficient Few-shot Learning

    Authors: Qi Xu, Junyang Zhu, Dongdong Zhou, Hao Chen, Yang Liu, Jiangrong Shen, Qiang Zhang

    Abstract: Deep neural networks (DNNs) excel in computer vision tasks, especially, few-shot learning (FSL), which is increasingly important for generalizing from limited examples. However, DNNs are computationally expensive with scalability issues in real world. Spiking Neural Networks (SNNs), with their event-driven nature and low energy consumption, are particularly efficient in processing sparse and dynam…

    Submitted 14 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  7. arXiv:2505.07855  [pdf, other]

    cs.RO

    A Physics-informed End-to-End Occupancy Framework for Motion Planning of Autonomous Vehicles

    Authors: Shuqi Shen, Junjie Yang, Hongliang Lu, Hui Zhong, Qiming Zhang, Xinhu Zheng

    Abstract: Accurate and interpretable motion planning is essential for autonomous vehicles (AVs) navigating complex and uncertain environments. While recent end-to-end occupancy prediction methods have improved environmental understanding, they typically lack explicit physical constraints, limiting safety and generalization. In this paper, we propose a unified end-to-end framework that integrates verifiable…

    Submitted 8 May, 2025; originally announced May 2025.

  8. arXiv:2505.07591  [pdf, ps, other]

    cs.CL cs.AI

    A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models

    Authors: Junjie Ye, Caishuang Huang, Zhuohan Chen, Wenjie Fu, Chenyuan Yang, Leyi Yang, Yilong Wu, Peng Wang, Meng Zhou, Xiaolong Yang, Tao Gui, Qi Zhang, Zhongchao Shi, Jianping Fan, Xuanjing Huang

    Abstract: Instruction following evaluates large language models (LLMs) on their ability to generate outputs that adhere to user-defined constraints. However, existing benchmarks often rely on templated constraint prompts, which lack the diversity of real-world usage and limit fine-grained performance assessment. To fill this gap, we propose a multi-dimensional constraint framework encompassing three constra…

    Submitted 12 May, 2025; originally announced May 2025.

  9. arXiv:2505.07396  [pdf, ps, other]

    cs.CV cs.LG

    TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset

    Authors: Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath, Michael Greza, Qilin Zhang, Jingwei Zhu, Thomas Froech, Medhini Heeramaglore, Ihab Hijazi, Khaoula Kanna, Mathias Pechinger, Zhaiyu Chen, Yao Sun, Alejandro Rueda Segura, Ziyang Xu, Omar AbdelGafar, Mansour Mehranfar, Chandan Yeshwanth, Yueh-Cheng Liu, Hadi Yazdi, Jiapan Wang, Stefan Auer, Katharina Anders, Klaus Bogenberger, Andre Borrmann, et al. (9 additional authors not shown)

    Abstract: Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, maintaining models' updates, and ensuring seamless interoperability to downstream tasks. Current datasets are usually…

    Submitted 13 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing

  10. arXiv:2505.06993  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Towards the Three-Phase Dynamics of Generalization Power of a DNN

    Authors: Yuxuan He, Junpeng Zhang, Hongyuan Zhang, Quanshi Zhang

    Abstract: This paper proposes a new perspective for analyzing the generalization power of deep neural networks (DNNs), i.e., directly disentangling and analyzing the dynamics of generalizable and non-generalizable interaction encoded by a DNN through the training process. Specifically, this work builds upon the recent theoretical achievement in explainable AI, which proves that the detailed inference logic o…

    Submitted 11 May, 2025; originally announced May 2025.

  11. arXiv:2505.06899  [pdf, ps, other]

    cs.NI

    ContribChain: A Stress-Balanced Blockchain Sharding Protocol with Node Contribution Awareness

    Authors: Xinpeng Huang, Wanqing Jie, Shiwen Zhang, Haofu Yang, Wangjie Qiu, Qinnan Zhang, Huawei Huang, Zehui Xiong, Shaoting Tang, Hongwei Zheng, Zhiming Zheng

    Abstract: Existing blockchain sharding protocols have focused on eliminating imbalanced workload distributions. However, even with workload balance, disparities in processing capabilities can lead to differential stress among shards, resulting in transaction backlogs in certain shards. Therefore, achieving stress balance among shards in the dynamic and heterogeneous environment presents a significant challe…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by INFOCOM 2025

  12. arXiv:2505.06898  [pdf, ps, other]

    cs.CV cs.CL

    Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration

    Authors: Honglong Yang, Shanshan Song, Yi Qin, Lehan Wang, Haonan Wang, Xinpeng Ding, Qixiang Zhang, Bodong Du, Xiaomeng Li

    Abstract: Generalist Medical AI (GMAI) systems have demonstrated expert-level performance in biomedical perception tasks, yet their clinical utility remains limited by inadequate multi-modal explainability and suboptimal prognostic capabilities. Here, we present XMedGPT, a clinician-centric, multi-modal AI assistant that integrates textual and visual interpretability to support transparent and trustworthy m…

    Submitted 11 May, 2025; originally announced May 2025.

  13. arXiv:2505.06831  [pdf, other]

    cs.CV

    Fine-Grained Bias Exploration and Mitigation for Group-Robust Classification

    Authors: Miaoyun Zhao, Qiang Zhang, Chenrong Li

    Abstract: Achieving group-robust generalization in the presence of spurious correlations remains a significant challenge, particularly when bias annotations are unavailable. Recent studies on Class-Conditional Distribution Balancing (CCDB) reveal that spurious correlations often stem from mismatches between the class-conditional and marginal distributions of bias attributes. They achieve promising results b…

    Submitted 11 May, 2025; originally announced May 2025.

  14. arXiv:2505.05622  [pdf, ps, other]

    cs.RO cs.AI

    CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory

    Authors: Weichen Zhang, Chen Gao, Shiquan Yu, Ruiying Peng, Baining Zhao, Qian Zhang, Jinqiang Cui, Xinlei Chen, Yong Li

    Abstract: Aerial vision-and-language navigation (VLN), requiring drones to interpret natural language instructions and navigate complex urban environments, emerges as a critical embodied AI challenge that bridges human-robot interaction, 3D spatial reasoning, and real-world deployment. Although existing ground VLN agents achieved notable results in indoor and outdoor settings, they struggle in aerial VLN du…

    Submitted 8 May, 2025; originally announced May 2025.

  15. arXiv:2505.05512  [pdf, other]

    cs.CV cs.RO

    Occupancy World Model for Robots

    Authors: Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Jingkai Sun, Jiahang Cao, Jiaxu Wang, Hao Cheng, Xiaozhu Ju, Zhengping Che, Renjing Xu, Jian Tang

    Abstract: Understanding and forecasting the scene evolutions deeply affect the exploration and decision of embodied agents. While traditional methods simulate scene evolutions through trajectory prediction of potential instances, current works use the occupancy world model as a generative framework for describing fine-grained overall scene dynamics. However, existing methods cluster on the outdoor structure…

    Submitted 7 May, 2025; originally announced May 2025.

  16. arXiv:2505.05422  [pdf, other]

    cs.CV cs.AI cs.CL

    TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

    Authors: Haokun Lin, Teng Wang, Yixiao Ge, Yuying Ge, Zhichao Lu, Ying Wei, Qingfu Zhang, Zhenan Sun, Ying Shan

    Abstract: Pioneering token-based works such as Chameleon and Emu3 have established a foundation for multimodal unification but face challenges of high training computational overhead and limited comprehension performance due to a lack of high-level semantics. In this paper, we introduce TokLIP, a visual tokenizer that enhances comprehension by semanticizing vector-quantized (VQ) tokens and incorporating CLI…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Technical Report

  17. arXiv:2505.04445  [pdf, ps, other]

    cs.IR

    M2Rec: Multi-scale Mamba for Efficient Sequential Recommendation

    Authors: Qianru Zhang, Liang Qu, Honggang Wen, Dong Huang, Siu-Ming Yiu, Nguyen Quoc Viet Hung, Hongzhi Yin

    Abstract: Sequential recommendation systems aim to predict users' next preferences based on their interaction histories, but existing approaches face critical limitations in efficiency and multi-scale pattern recognition. While Transformer-based methods struggle with quadratic computational complexity, recent Mamba-based models improve efficiency but fail to capture periodic user behaviors, leverage rich se…

    Submitted 7 May, 2025; originally announced May 2025.

  18. arXiv:2505.03380  [pdf, other]

    cs.CV cs.AI eess.IV

    Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant

    Authors: Haonan Wang, Jiaji Mao, Lehan Wang, Qixiang Zhang, Marawan Elbatel, Yi Qin, Huijun Hu, Baoxun Li, Wenhui Deng, Weifeng Qin, Hongrui Li, Jialin Liang, Jun Shen, Xiaomeng Li

    Abstract: Medical AI assistants support doctors in disease diagnosis, medical image analysis, and report generation. However, they still face significant challenges in clinical use, including limited accuracy with multimodal content and insufficient validation in real-world settings. We propose RCMed, a full-stack AI assistant that improves multimodal alignment in both input and output, enabling precise ana…

    Submitted 6 May, 2025; originally announced May 2025.

  19. arXiv:2505.02922  [pdf, ps, other]

    cs.LG

    RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

    Authors: Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Yuqing Yang, Fan Yang, Mao Yang

    Abstract: The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints. We present RetroInfer, a novel system that reconceptualizes the key-value (KV) cache as a vector storage system which exploits the inherent attention sparsity to accelerate long-context LLM inference. At its core is the wave index,…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages

  20. arXiv:2505.02351  [pdf]

    cs.DC

    Opt-GPTQ: An Optimized GPTQ Combining Sparse Attention and Quantization Techniques

    Authors: Jie Kong, Junxiang Zhang, Jiheng Xu, Yalong Li, Shouhua Zhang, Jiehan Zhou, Yuhai Liu, Peng Liang, Quan Zhang, Luohan Jiang

    Abstract: In the field of deep learning, traditional attention mechanisms face significant challenges related to high computational complexity and large memory consumption when processing long sequence data. To address these limitations, we propose Opt-GPTQ, an optimized Gradient-based Post Training Quantization (GPTQ) combining the Grouped Query Attention (GQA) mechanism with paging memory management, opti…

    Submitted 5 May, 2025; originally announced May 2025.

  21. arXiv:2505.01572  [pdf, other]

    cs.AI cs.DC

    PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding

    Authors: Bradley McDanel, Sai Qian Zhang, Yunhai Hu, Zining Liu

    Abstract: Speculative decoding accelerates large language model inference by using smaller draft models to generate candidate tokens for parallel verification. However, current approaches are limited by sequential stage dependencies that prevent full hardware utilization. We present PipeSpec, a framework that generalizes speculative decoding to $k$ models arranged in a hierarchical pipeline, enabling asynch…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures, 2 tables

  22. arXiv:2505.01007  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Towards the Resistance of Neural Network Watermarking to Fine-tuning

    Authors: Ling Tang, Yuefeng Chen, Hui Xue, Quanshi Zhang

    Abstract: This paper proves a new watermarking method to embed the ownership information into a deep neural network (DNN), which is robust to fine-tuning. Specifically, we prove that when the input feature of a convolutional layer only contains low-frequency components, specific frequency components of the convolutional filter will not be changed by gradient descent during the fine-tuning process, where we…

    Submitted 2 May, 2025; originally announced May 2025.

  23. arXiv:2505.00998  [pdf, other]

    cs.CV

    Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis

    Authors: Yu Hua, Weiming Liu, Gui Xu, Yaqing Hou, Yew-Soon Ong, Qiang Zhang

    Abstract: Human motion synthesis aims to generate plausible human motion sequences, which has raised widespread attention in computer animation. Recent score-based generative models (SGMs) have demonstrated impressive results on this task. However, their training process involves complex curvature trajectories, leading to unstable training process. In this paper, we propose a Deterministic-to-Stochastic Div…

    Submitted 2 May, 2025; originally announced May 2025.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025

  24. arXiv:2505.00742  [pdf, other]

    cs.CV cs.AI eess.IV

    Zoomer: Adaptive Image Focus Optimization for Black-box MLLM

    Authors: Jiaxu Qian, Chendong Wang, Yifan Yang, Chaoyun Zhang, Huiqiang Jiang, Xufang Luo, Yu Kang, Qingwei Lin, Anlan Zhang, Shiqi Jiang, Ting Cao, Tianjun Mao, Suman Banerjee, Guyue Liu, Saravan Rajmohan, Dongmei Zhang, Yuqing Yang, Qi Zhang, Lili Qiu

    Abstract: Recent advancements in multimodal large language models (MLLMs) have broadened the scope of vision-language tasks, excelling in applications like image captioning and interactive question-answering. However, these models struggle with accurately processing visual data, particularly in tasks requiring precise object recognition and fine visual details. Stringent token limits often result in the omi…

    Submitted 29 April, 2025; originally announced May 2025.

  25. arXiv:2505.00551  [pdf, other]

    cs.CL

    100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models

    Authors: Chong Zhang, Yue Deng, Xiang Lin, Bin Wang, Dianwen Ng, Hai Ye, Xingxuan Li, Yao Xiao, Zhanfeng Mo, Qi Zhang, Lidong Bing

    Abstract: The recent development of reasoning language models (RLMs) represents a novel evolution in large language models. In particular, the recent release of DeepSeek-R1 has generated widespread social impact and sparked enthusiasm in the research community for exploring the explicit reasoning paradigm of language models. However, the implementation details of the released models have not been fully open…

    Submitted 15 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  26. arXiv:2505.00257  [pdf, other]

    cs.LG cs.CR

    Graph Privacy: A Heterogeneous Federated GNN for Trans-Border Financial Data Circulation

    Authors: Zhizhong Tan, Jiexin Zheng, Kevin Qi Zhang, Wenyong Wang

    Abstract: The sharing of external data has become a strong demand of financial institutions, but the privacy issue has led to the difficulty of interconnecting different platforms and the low degree of data openness. To effectively solve the privacy problem of financial data in trans-border flow and sharing, to ensure that the data is available but not visible, to realize the joint portrait of all kinds of…

    Submitted 30 April, 2025; originally announced May 2025.

  27. arXiv:2504.21043  [pdf, other]

    cs.CR cs.AI

    CodeBC: A More Secure Large Language Model for Smart Contract Code Generation in Blockchain

    Authors: Lingxiang Wang, Hainan Zhang, Qinnan Zhang, Ziwei Wang, Hongwei Zheng, Jin Dong, Zhiming Zheng

    Abstract: Large language models (LLMs) excel at generating code from natural language instructions, yet they often lack an understanding of security vulnerabilities. This limitation makes it difficult for LLMs to avoid security risks in generated code, particularly in high-security programming tasks such as smart contract development for blockchain. Researchers have attempted to enhance the vulnerability aw…

    Submitted 6 May, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  28. arXiv:2504.20861  [pdf, other]

    cs.CE cond-mat.dis-nn cond-mat.mtrl-sci

    Simulating Heterogeneity within Elastic and Inelastic Discrete Mechanical Models

    Authors: Jan Raisinger, Qiwei Zhang, John E. Bolander, Jan Eliáš

    Abstract: The study investigates the elastic and fracture behaviors of discrete, elastically homogeneous models of heterogeneous media. The homogeneity is accomplished either by volumetric-deviatoric decomposition of constitutive function or by an auxiliary stress homogenization method. The elastic parameters of the homogenized material models are randomly varied in space to introduce heterogeneity independ…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 24 pages, 11 figures

  29. arXiv:2504.19636  [pdf, other]

    cs.AI cs.NE

    Fitness Landscape of Large Language Model-Assisted Automated Algorithm Search

    Authors: Fei Liu, Qingfu Zhang, Xialiang Tong, Kun Mao, Mingxuan Yuan

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in algorithm design. However, when integrated into search frameworks for iterative algorithm search, the underlying fitness landscape--critical for understanding search behaviour--remains underexplored. In this paper, we illustrate and analyze the fitness landscape of LLM-assisted Algorithm Search (LAS) using a graph-based approac…

    Submitted 1 May, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  30. arXiv:2504.19399  [pdf, other]

    cs.RO

    Follow Everything: A Leader-Following and Obstacle Avoidance Framework with Goal-Aware Adaptation

    Authors: Qianyi Zhang, Shijian Ma, Boyi Liu, Jianhao Jiao, Dimitrios Kanoulas

    Abstract: Robust and flexible leader-following is a critical capability for robots to integrate into human society. While existing methods struggle to generalize to leaders of arbitrary form and often fail when the leader temporarily leaves the robot's field of view, this work introduces a unified framework addressing both challenges. First, traditional detection models are replaced with a segmentation mode…

    Submitted 12 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

  31. arXiv:2504.19373  [pdf, other]

    cs.CR cs.AI

    Doxing via the Lens: Revealing Privacy Leakage in Image Geolocation for Agentic Multi-Modal Large Reasoning Model

    Authors: Weidi Luo, Qiming Zhang, Tianyu Lu, Xiaogeng Liu, Yue Zhao, Zhen Xiang, Chaowei Xiao

    Abstract: The increasing capabilities of agentic multi-modal large reasoning models, such as ChatGPT o3, have raised critical concerns regarding privacy leakage through inadvertent image geolocation. In this paper, we conduct the first systematic and controlled study on the potential privacy risks associated with visual reasoning abilities of ChatGPT o3. We manually collect and construct a dataset comprisin…

    Submitted 29 April, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

  32. arXiv:2504.19210  [pdf, other]

    cs.CV

    FlexPara: Flexible Neural Surface Parameterization

    Authors: Yuming Zhao, Qijian Zhang, Junhui Hou, Jiazhi Xia, Wenping Wang, Ying He

    Abstract: Surface parameterization is a fundamental geometry processing task, laying the foundations for the visual presentation of 3D assets and numerous downstream shape analysis scenarios. Conventional parameterization approaches demand high-quality mesh triangulation and are restricted to certain simple topologies unless additional surface cutting and decomposition are provided. In practice, the optimal…

    Submitted 27 April, 2025; originally announced April 2025.

  33. arXiv:2504.19101  [pdf, other]

    cs.CL

    Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation

    Authors: Qianren Mao, Qili Zhang, Hanwen Hao, Zhentao Han, Runhua Xu, Weifeng Jiang, Qi Hu, Zhijun Chen, Tyler Zhou, Bo Li, Yangqiu Song, Jin Dong, Jianxin Li, Philip S. Yu

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution for enhancing the accuracy and credibility of Large Language Models (LLMs), particularly in Question & Answer tasks. This is achieved by incorporating proprietary and private data from integrated databases. However, private RAG systems face significant challenges due to the scarcity of private domain data and critica…

    Submitted 27 April, 2025; originally announced April 2025.

  34. arXiv:2504.18857  [pdf, other]

    cs.CL cs.AI

    Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation

    Authors: Yi Lu, Wanxu Zhao, Xin Zhou, Chenxin An, Chenglong Wang, Shuo Li, Yuming Yang, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Large Language Models (LLMs) often struggle to process and generate coherent context when the number of input tokens exceeds the pre-trained length. Recent advancements in long-context extension have significantly expanded the context window of LLMs but require expensive overhead to train the large-scale models with longer context. In this work, we propose Dimension-Wise Positional Embeddings Mani…

    Submitted 26 April, 2025; originally announced April 2025.

  35. arXiv:2504.18838  [pdf, other]

    cs.CL

    Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

    Authors: Yixin Cao, Shibo Hong, Xinze Li, Jiahao Ying, Yubo Ma, Haiyuan Liang, Yantao Liu, Zijun Yao, Xiaozhi Wang, Dan Huang, Wenxuan Zhang, Lifu Huang, Muhao Chen, Lei Hou, Qianru Sun, Xingjun Ma, Zuxuan Wu, Min-Yen Kan, David Lo, Qi Zhang, Heng Ji, Jing Jiang, Juanzi Li, Aixin Sun, Xuanjing Huang, et al. (2 additional authors not shown)

    Abstract: Large Language Models (LLMs) are advancing at an amazing speed and have become indispensable across academia, industry, and daily applications. To keep pace with the status quo, this survey probes the core challenges that the rise of LLMs poses for evaluation. We identify and analyze two pivotal transitions: (i) from task-specific to capability-based evaluation, which reorganizes benchmarks around…

    Submitted 26 April, 2025; originally announced April 2025.

  36. arXiv:2504.17314  [pdf, other]

    cs.LG cs.CV

    Class-Conditional Distribution Balancing for Group Robust Classification

    Authors: Miaoyun Zhao, Qiang Zhang, Chenrong Li

    Abstract: Spurious correlations that lead models to correct predictions for the wrong reasons pose a critical challenge for robust real-world generalization. Existing research attributes this issue to group imbalance and addresses it by maximizing group-balanced or worst-group accuracy, which heavily relies on expensive bias annotations. A compromise approach involves predicting bias information using exten…

    Submitted 24 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  37. arXiv:2504.17224  [pdf, other]

    cs.CV

    Visual and textual prompts for enhancing emotion recognition in video

    Authors: Zhifeng Wang, Qixuan Zhang, Peter Zhang, Wenjia Niu, Kaihao Zhang, Ramesh Sankaranarayana, Sabrina Caldwell, Tom Gedeon

    Abstract: Vision Large Language Models (VLLMs) exhibit promising potential for multi-modal understanding, yet their application to video-based emotion recognition remains limited by insufficient spatial and contextual awareness. Traditional approaches, which prioritize isolated facial features, often neglect critical non-verbal cues such as body language, environmental context, and social interactions, lead…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 12 pages, 10 figures

  38. arXiv:2504.16922  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light

    Authors: Ali Hassani, Fengzhe Zhou, Aditya Kane, Jiannan Huang, Chieh-Yun Chen, Min Shi, Steven Walton, Markus Hoehnerbach, Vijay Thakkar, Michael Isaev, Qinsheng Zhang, Bing Xu, Haicheng Wu, Wen-mei Hwu, Ming-Yu Liu, Humphrey Shi

    Abstract: Many sparse attention mechanisms such as Neighborhood Attention have typically failed to consistently deliver speedup over the self attention baseline. This is largely due to the level of complexity in attention infrastructure, and the rapid evolution of AI hardware architecture. At the same time, many state-of-the-art foundational models, particularly in computer vision, are heavily bound by atte…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: https://github.com/SHI-Labs/NATTEN/

  39. arXiv:2504.15867  [pdf, other]

    cs.SE

    Inducing Vulnerable Code Generation in LLM Coding Assistants

    Authors: Binqi Zeng, Quan Zhang, Chijin Zhou, Gwihwan Go, Yu Jiang, Heyuan Shi

    Abstract: Due to insufficient domain knowledge, LLM coding assistants often reference related solutions from the Internet to address programming problems. However, incorporating external information into LLMs' code generation process introduces new security risks. In this paper, we reveal a real-world threat, named HACKODE, where attackers exploit referenced external information to embed attack sequences, c…

    Submitted 22 April, 2025; originally announced April 2025.

  40. arXiv:2504.15623  [pdf, other]

    cs.LG eess.SY

    RadioDiff-$k^2$: Helmholtz Equation Informed Generative Diffusion Model for Multi-Path Aware Radio Map Construction

    Authors: Xiucheng Wang, Qiming Zhang, Nan Cheng, Ruijin Sun, Zan Li, Shuguang Cui, Xuemin Shen

    Abstract: In this paper, we propose a novel physics-informed generative learning approach, termed RadioDiff-$k^2$, for accurate and efficient multipath-aware radio map (RM) construction. As wireless communication evolves towards environment-aware paradigms, driven by the increasing demand for intelligent and proactive optimization in sixth-generation (6G) networks, accurate construction of RMs becomes…

    Submitted 22 April, 2025; originally announced April 2025.

  41. arXiv:2504.15131  [pdf, other]

    cs.SI

    Beyond Binary Opinions: A Deep Reinforcement Learning-Based Approach to Uncertainty-Aware Competitive Influence Maximization

    Authors: Qi Zhang, Dian Chen, Lance M. Kaplan, Audun Jøsang, Dong Hyun Jeong, Feng Chen, Jin-Hee Cho

    Abstract: The Competitive Influence Maximization (CIM) problem involves multiple entities competing for influence in online social networks (OSNs). While Deep Reinforcement Learning (DRL) has shown promise, existing methods often assume users' opinions are binary and ignore their behavior and prior knowledge. We propose DRIM, a multi-dimensional uncertainty-aware DRL-based CIM framework that leverages Subje…

    Submitted 21 April, 2025; originally announced April 2025.

  42. arXiv:2504.14604  [pdf, other]

    cs.RO

    RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots

    Authors: Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Hengle Ren, Renjing Xu, Jian Tang

    Abstract: 3D occupancy prediction enables the robots to obtain spatial fine-grained geometry and semantics of the surrounding scene, and has become an essential task for embodied perception. Existing methods based on 3D Gaussians instead of dense voxels do not effectively exploit the geometry and opacity properties of Gaussians, which limits the network's estimation of complex environments and also limits t…

    Submitted 20 April, 2025; originally announced April 2025.

  43. arXiv:2504.14363  [pdf, other]

    cs.LG cs.CL

    Improving RL Exploration for LLM Reasoning through Retrospective Replay

    Authors: Shihan Dou, Muling Wu, Jingwen Xu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Reinforcement learning (RL) has increasingly become a pivotal technique in the post-training of large language models (LLMs). The effective exploration of the output space is essential for the success of RL. We observe that for complex problems, during the early stages of training, the model exhibits strong exploratory capabilities and can identify promising solution ideas. However, its limited ca…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: 13 pages, 3 figures

  44. arXiv:2504.13392  [pdf, ps, other]

    cs.CV cs.HC

    POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation

    Authors: Evans Xu Han, Alice Qian Zhang, Hong Shen, Haiyi Zhu, Paul Pu Liang, Jane Hsieh

    Abstract: State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks -- offering the ability to generate (rather than search for) novel and unprecedented (instead of existing) images of considerable quality that also adhere to boundless combinations of user specifications. However, many large-scale text-to-image systems are designed for…

    Submitted 17 April, 2025; originally announced April 2025.

  45. arXiv:2504.12913  [pdf, other]

    cs.CL

    MAIN: Mutual Alignment Is Necessary for instruction tuning

    Authors: Fanyi Yang, Jianfeng Liu, Xin Zhang, Haoyu Liu, Xixin Cao, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Qi Zhang

    Abstract: Instruction tuning has enabled large language models (LLMs) to achieve remarkable performance, but its success heavily depends on the availability of large-scale, high-quality instruction-response pairs. However, current methods for scaling up data generation often overlook a crucial aspect: the alignment between instructions and responses. We hypothesize that high-quality instruction-response pai…

    Submitted 17 April, 2025; originally announced April 2025.

  46. arXiv:2504.12826  [pdf, other]

    cs.RO cs.CV

    UncAD: Towards Safe End-to-end Autonomous Driving via Online Map Uncertainty

    Authors: Pengxuan Yang, Yupeng Zheng, Qichao Zhang, Kefei Zhu, Zebin Xing, Qiao Lin, Yun-Fu Liu, Zhiguo Su, Dongbin Zhao

    Abstract: End-to-end autonomous driving aims to produce planning trajectories from raw sensors directly. Currently, most approaches integrate perception, prediction, and planning modules into a fully differentiable network, promising great scalability. However, these methods typically rely on deterministic modeling of online maps in the perception module for guiding or constraining vehicle planning, which m…

    Submitted 17 April, 2025; originally announced April 2025.

  47. arXiv:2504.12764  [pdf, other]

    cs.LG cs.DM

    GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks

    Authors: Hao Xu, Xiangru Jian, Xinjian Zhao, Wei Pang, Chao Zhang, Suyuchen Wang, Qixin Zhang, Joao Monteiro, Qiuzhuang Sun, Tianshu Yu

    Abstract: In this paper, we presented GraphOmni, a comprehensive benchmark framework for systematically evaluating the graph reasoning capabilities of LLMs. By analyzing critical dimensions, including graph types, serialization formats, and prompt schemes, we provided extensive insights into the strengths and limitations of current LLMs. Our empirical findings emphasize that no single serialization or promp…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 82 pages

  48. arXiv:2504.11467  [pdf, other]

    cs.CV eess.IV

    MultiCore+TPU Accelerated Multi-Modal TinyML for Livestock Behaviour Recognition

    Authors: Qianxue Zhang, Eiman Kanjo

    Abstract: The advancement of technology has revolutionised the agricultural industry, transitioning it from labour-intensive farming practices to automated, AI-powered management systems. In recent years, more intelligent livestock monitoring solutions have been proposed to enhance farming efficiency and productivity. This work presents a novel approach to animal activity recognition and movement tracking,…

    Submitted 18 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: 11 pages, 10 figures

  49. arXiv:2504.11346  [pdf, other]

    cs.CV

    Seedream 3.0 Technical Report

    Authors: Yu Gao, Lixue Gong, Qiushan Guo, Xiaoxia Hou, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Rui Wang, Xuanda Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Zhonghua Zhai, et al. (6 additional authors not shown)

    Abstract: We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0, including alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolutions. Specifically, the advancements of Seedream 3.0 st…

    Submitted 16 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: Seedream 3.0 Technical Report

  50. arXiv:2504.11310  [pdf]

    cs.CV

    Intelligent driving vehicle front multi-target tracking and detection based on YOLOv5 and point cloud 3D projection

    Authors: Dayong Liu, Qingrui Zhang, Zeyang Meng

    Abstract: In multi-target tracking and detection tasks, it is necessary to continuously track multiple targets, such as vehicles, pedestrians, etc. To achieve this goal, the system must be able to continuously acquire and process image frames containing these targets. These consecutive frame images enable the algorithm to update the position and state of the target in real-time in each frame of the image. H…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: In Chinese