
Showing 1–50 of 566 results for author: Guo, Q

Searching in archive cs.
  1. arXiv:2507.05683  [pdf, ps, other]

    cs.CR cs.IT

    Polyadic encryption

    Authors: Steven Duplij, Qiang Guo

    Abstract: A novel procedure of encryption/decryption based on polyadic algebraic structures and on signal processing methods is proposed. First, we use signals with integer amplitudes to send information. Then we use polyadic techniques to transfer the plaintext into a series of special integers. The receiver restores the plaintext using special rules and systems of equations.

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: revtex 4.2, 9 pages

  2. arXiv:2507.05197  [pdf, ps, other]

    cs.CL cs.LG

    Pre-Trained Policy Discriminators are General Reward Models

    Authors: Shihan Dou, Shichun Liu, Yuming Yang, Yicheng Zou, Yunhua Zhou, Shuhao Xing, Chenhao Huang, Qiming Ge, Demin Song, Haijun Lv, Songyang Gao, Chengqi Lv, Enyu Zhou, Honglin Guo, Zhiheng Xi, Wenwei Zhang, Qipeng Guo, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Tao Gui, Kai Chen

    Abstract: We offer a novel perspective on reward modeling by formulating it as a policy discriminator, which quantifies the difference between two policies to generate a reward signal, guiding the training policy towards a target policy with desired behaviors. Based on this conceptual insight, we propose a scalable pre-training method named Policy Discriminative Learning (POLAR), which trains a reward model…

    Submitted 7 July, 2025; originally announced July 2025.

  3. arXiv:2507.02713  [pdf, ps, other]

    cs.CV

    UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation

    Authors: Qin Guo, Ailing Zeng, Dongxu Yue, Ceyuan Yang, Yang Cao, Hanzhong Guo, Fei Shen, Wei Liu, Xihui Liu, Dan Xu

    Abstract: Although significant advancements have been achieved in keypoint-guided Text-to-Image diffusion models, existing mainstream keypoint-guided models encounter challenges in controlling the generation of more general non-rigid objects beyond humans (e.g., animals). Moreover, it is difficult to generate multiple overlapping humans and animals based solely on keypoint controls. These ch…

    Submitted 4 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

  4. arXiv:2507.00817  [pdf, ps, other]

    cs.CV cs.AI

    CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs

    Authors: Jiaming Zhang, Rui Hu, Qing Guo, Wei Yang Bryan Lim

    Abstract: Video Multimodal Large Language Models (V-MLLMs) have shown impressive capabilities in temporal reasoning and cross-modal understanding, yet their vulnerability to adversarial attacks remains underexplored due to unique challenges: complex cross-modal reasoning mechanisms, temporal dependencies, and computational constraints. We present CAVALRY-V (Cross-modal Language-Vision Adversarial Yielding f…

    Submitted 1 July, 2025; originally announced July 2025.

  5. arXiv:2507.00018  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

    Authors: Bo Wang, Qinyuan Cheng, Runyu Peng, Rong Bao, Peiji Li, Qipeng Guo, Linyang Li, Zhiyuan Zeng, Yunhua Zhou, Xipeng Qiu

    Abstract: Post-training processes are essential phases in grounding pre-trained language models to real-world tasks, with learning from demonstrations or preference signals playing a crucial role in this adaptation. We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. Through rigorous mathematical derivation, we…

    Submitted 4 July, 2025; v1 submitted 15 June, 2025; originally announced July 2025.

  6. arXiv:2506.23461  [pdf, ps, other]

    cs.CV cs.AI

    Time-variant Image Inpainting via Interactive Distribution Transition Estimation

    Authors: Yun Xing, Qing Guo, Xiaoguang Li, Yihao Huang, Xiaofeng Cao, Di Lin, Ivor Tsang, Lei Ma

    Abstract: In this work, we focus on a novel and practical task, i.e., Time-vAriant iMage inPainting (TAMP). The aim of TAMP is to restore a damaged target image by leveraging the complementary information from a reference image, where both images captured the same scene but with a significant time gap in between, i.e., time-variant images. Different from conventional reference-guided image inpainting, the r…

    Submitted 29 June, 2025; originally announced June 2025.

  7. arXiv:2506.21591   

    cs.CL

    FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning

    Authors: Shaoyu Dou, Yutian Shen, Mofan Chen, Zixuan Wang, Jiajie Xu, Qi Guo, Kailai Shao, Chao Chen, Haixiang Hu, Haibo Shi, Min Min, Liwen Zhang

    Abstract: Large Language Models (LLMs) demonstrate significant potential but face challenges in complex financial reasoning tasks requiring both domain knowledge and sophisticated reasoning. Current evaluation benchmarks often fall short by not decoupling these capabilities indicators from single task performance and lack root cause analysis for task failure. To address this, we introduce FinEval-KR, a nove…

    Submitted 29 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

    Comments: The statistics included in the paper are incomplete (e.g., Tables 2 and 5 report only the results of a single run), which may lead readers to misunderstand

  8. arXiv:2506.21230  [pdf, ps, other]

    cs.AI cs.RO

    World-aware Planning Narratives Enhance Large Vision-Language Model Planner

    Authors: Junhao Shi, Zhaoye Fei, Siyin Wang, Qipeng Guo, Jingjing Gong, Xipeng Qiu

    Abstract: Large Vision-Language Models (LVLMs) show promise for embodied planning tasks but struggle with complex scenarios involving unfamiliar environments and multi-step goals. Current approaches rely on environment-agnostic imitation learning that disconnects instructions from environmental contexts, causing models to struggle with context-sensitive instructions and rely on supplementary cues rather tha…

    Submitted 2 July, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

  9. arXiv:2506.20963  [pdf, ps, other]

    cs.IR cs.LG

    EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora

    Authors: Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qintian Guo, Zhixun Li, Wensheng Luo, Di Jiang, Yixiang Fang, Xiaofang Zhou

    Abstract: Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus. However, existing approaches typically assume a static corpus, requiring expensive full-graph reconstruction whenever new documents arrive, limiting their scalability in dynamic, evolving environments. To address these limitations, we introduce EraRAG, a no…

    Submitted 3 July, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Under review

  10. arXiv:2506.18084  [pdf, ps, other]

    cs.CV

    TEM^3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving

    Authors: Wenzhuo Liu, Yicheng Qiao, Zhen Wang, Qiannan Guo, Zilong Chen, Meihua Zhou, Xinran Li, Letian Wang, Zhiwei Li, Huaping Liu, Wenshuo Wang

    Abstract: Multi-task learning (MTL) can advance assistive driving by exploring inter-task correlations through shared representations. However, existing methods face two critical limitations: single-modality constraints limiting comprehensive scene understanding and inefficient architectures impeding real-time deployment. This paper proposes TEM^3-Learning (Time-Efficient Multimodal Multi-task Learning), a…

    Submitted 22 June, 2025; originally announced June 2025.

  11. arXiv:2506.16690  [pdf, ps, other]

    cs.CV

    DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

    Authors: Yun Xing, Yue Cao, Nhat Chung, Jie Zhang, Ivor Tsang, Ming-Ming Cheng, Yang Liu, Lei Ma, Qing Guo

    Abstract: Stereo Depth estimation is a critical task in autonomous driving and robotics, where inaccuracies (such as misidentifying nearby objects as distant) can lead to dangerous situations. Adversarial attacks against stereo depth estimation can help reveal vulnerabilities before deployment. Previous work has shown that repeating optimized textures can effectively mislead stereo depth estimation in digit…

    Submitted 19 June, 2025; originally announced June 2025.

  12. arXiv:2506.14429  [pdf, ps, other]

    cs.CL

    LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

    Authors: Xiaoran Liu, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu

    Abstract: Large Language Diffusion Models, or diffusion LLMs, have emerged as a significant focus in NLP research, with substantial effort directed toward understanding their scalability and downstream task performance. However, their long-context capabilities remain unexplored, lacking systematic analysis or methods for context extension. In this work, we present the first systematic investigation comparin…

    Submitted 22 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    Comments: 16 pages, 12 figures, work in progress

  13. arXiv:2506.13216  [pdf, ps, other]

    cs.CL

    Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law

    Authors: Qiming Ge, Shuhao Xing, Songyang Gao, Yunhua Zhou, Yicheng Zou, Songyang Zhang, Zhi Chen, Hang Yan, Qi Zhang, Qipeng Guo, Kai Chen

    Abstract: Scaling law builds the relationship between training computation and validation loss, enabling researchers to effectively predict the loss trend of models across different levels of computation. However, a gap still remains between validation loss and the model's downstream capabilities, making it nontrivial to apply scaling law to direct performance prediction for downstream tasks. The loss typ…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 9 pages, 9 figures, ACL2025

  14. arXiv:2506.12430  [pdf, ps, other]

    cs.CR cs.CV

    Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025

    Authors: Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, Jiwei Wei, Shiyuan He, Yang Yang, Xiaohai Xu, Ke Ma, Qianqian Xu, Qingming Huang, Shi Lin, Xun Wang, Changting Lin, Meng Han, Yilei Jiang, Siqi Lai, Yaozhi Zheng, Yifei Song, et al. (22 additional authors not shown)

    Abstract: Multimodal Large Language Models (MLLMs) have enabled transformative advancements across diverse applications but remain susceptible to safety threats, especially jailbreak attacks that induce harmful outputs. To systematically evaluate and improve their safety, we organized the Adversarial Testing & Large-model Alignment Safety Grand Challenge (ATLAS) 2025. This technical report presents finding…

    Submitted 14 June, 2025; originally announced June 2025.

  15. arXiv:2506.12355  [pdf, ps, other]

    cs.LG cs.CL

    QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm

    Authors: Qirui Zhou, Shaohui Peng, Weiqiang Xiong, Haixin Chen, Yuanbo Wen, Haochen Li, Ling Li, Qi Guo, Yongwei Zhao, Ke Gao, Ruizhi Chen, Yanjun Wu, Chen Zhao, Yunji Chen

    Abstract: The attention operator remains a critical performance bottleneck in large language models (LLMs), particularly for long-context scenarios. While FlashAttention is the most widely used and effective GPU-aware acceleration algorithm, it requires time-consuming and hardware-specific manual implementation, limiting adaptability across GPU architectures. Existing LLMs have shown a lot of promise in…

    Submitted 14 June, 2025; originally announced June 2025.

    ACM Class: I.2.7

  16. arXiv:2506.11886  [pdf, ps, other]

    cs.CL

    Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache

    Authors: Xiaoran Liu, Siyang He, Qiqi Wang, Ruixiao Li, Yuerong Song, Zhigeng Liu, Linlin Li, Qun Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu

    Abstract: Large Language Models struggle with memory demands from the growing Key-Value (KV) cache as context lengths increase. Existing compression methods homogenize head dimensions or rely on attention-guided token pruning, often sacrificing accuracy or introducing computational overhead. We propose FourierAttention, a training-free framework that exploits the heterogeneous roles of transformer head dime…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 10 pages, 7 figures, work in progress

  17. arXiv:2506.11153  [pdf, ps, other]

    cs.SE cs.LG

    Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

    Authors: Changxin Ke, Rui Zhang, Shuo Wang, Li Ding, Guangli Li, Yuanbo Wen, Shuoming Zhang, Ruiyuan Xu, Jin Qin, Jiaming Guo, Chenxi Wang, Ling Li, Qi Guo, Yunji Chen

    Abstract: The rise of GPU-based high-performance computing (HPC) has driven the widespread adoption of parallel programming models such as CUDA. Yet, the inherent complexity of parallel programming creates a demand for automated sequential-to-parallel approaches. However, data scarcity poses a significant challenge for machine learning-based sequential-to-parallel code translation. Although recent back-…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 28 pages

  18. arXiv:2506.09538  [pdf, ps, other]

    cs.CV

    AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant T2I Adversarial Patches

    Authors: Wenjun Ji, Yuxiang Fu, Luyang Ying, Deng-Ping Fan, Yuyi Wang, Ming-Ming Cheng, Ivor Tsang, Qing Guo

    Abstract: Cutting-edge works have demonstrated that text-to-image (T2I) diffusion models can generate adversarial patches that mislead state-of-the-art object detectors in the physical world, revealing detectors' vulnerabilities and risks. However, these methods neglect the T2I patches' attack effectiveness when observed from different views in the physical world (i.e., angle robustness of the T2I adversari…

    Submitted 11 June, 2025; originally announced June 2025.

  19. arXiv:2506.09344  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG cs.SD eess.AS

    Ming-Omni: A Unified Multimodal Model for Perception and Generation

    Authors: Inclusion AI, Biao Gong, Cheng Zou, Chuanyang Zheng, Chunluan Zhou, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Dandan Zheng, Fudong Wang, Furong Xu, GuangMing Yao, Jun Zhou, Jingdong Chen, Jianxin Sun, Jiajia Liu, Jianjiang Zhu, Jun Peng, Kaixiang Ji, Kaiyou Song, Kaimeng Ren, Libin Wang, Lixiang Ru, Lele Xie, Longhua Tan, et al. (33 additional authors not shown)

    Abstract: We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 18 pages, 8 figures

  20. arXiv:2506.07160  [pdf, ps, other]

    cs.CL

    GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization

    Authors: Yikun Wang, Yibin Wang, Dianyi Wang, Zimian Peng, Qipeng Guo, Dacheng Tao, Jiaqi Wang

    Abstract: Recent advances in large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, particularly in mathematical reasoning, amid which geometry problem solving remains a challenging area where auxiliary construction plays an essential role. Existing approaches either achieve suboptimal performance or rely on massive LLMs (e.g., GPT-4o), incurring massive computational…

    Submitted 30 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

  21. arXiv:2506.05007  [pdf, ps, other]

    cs.AR cs.LG

    QiMeng: Fully Automated Hardware and Software Design for Processor Chip

    Authors: Rui Zhang, Yuanbo Wen, Shuyao Cheng, Di Huang, Shaohui Peng, Jiaming Guo, Pengwei Jin, Jiacheng Zhao, Tianrui Ma, Yaoyu Zhu, Yifan Hao, Yongwei Zhao, Shengwen Liang, Ying Wang, Xing Hu, Zidong Du, Huimin Cui, Ling Li, Qi Guo, Yunji Chen

    Abstract: Processor chip design technology serves as a key frontier driving breakthroughs in computer science and related fields. With the rapid advancement of information technology, conventional design paradigms face three major challenges: the physical constraints of fabrication technologies, the escalating demands for design resources, and the increasing diversity of ecosystems. Automated processor chip…

    Submitted 5 June, 2025; originally announced June 2025.

  22. arXiv:2505.24227  [pdf, ps, other]

    cs.CV cs.CR

    Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models

    Authors: Ying Yang, Jie Zhang, Xiao Lv, Di Lin, Tao Xiang, Qing Guo

    Abstract: While adversarial attacks on vision-and-language pretraining (VLP) models have been explored, generating natural adversarial samples crafted through realistic and semantically meaningful perturbations remains an open challenge. Existing methods, primarily designed for classification tasks, struggle when adapted to VLP models due to their restricted optimization spaces, leading to ineffective attac…

    Submitted 30 May, 2025; originally announced May 2025.

  23. arXiv:2505.24183  [pdf, ps, other]

    cs.LG cs.AR cs.PL

    CodeV-R1: Reasoning-Enhanced Verilog Generation

    Authors: Yaoyu Zhu, Di Huang, Hanqi Lyu, Xiaoyun Zhang, Chongxiao Li, Wenxuan Shi, Yutong Wu, Jianan Mu, Jinghua Wang, Yang Zhao, Pengwei Jin, Shuyao Cheng, Shengwen Liang, Xishan Zhang, Rui Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problems. Extending RLVR to electronic design automation (EDA), especially automatically generating hardware description languages (HDLs) like Verilog from natural-language (NL) spec…

    Submitted 20 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  24. arXiv:2505.23830  [pdf, ps, other]

    cs.CL

    EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models

    Authors: Linglin Jing, Yuting Gao, Zhigang Wang, Wang Lan, Yiwen Tang, Wenhai Wang, Kaipeng Zhang, Qingpei Guo

    Abstract: Recent advancements have shown that the Mixture of Experts (MoE) approach significantly enhances the capacity of large language models (LLMs) and improves performance on downstream tasks. Building on these promising results, multi-modal large language models (MLLMs) have increasingly adopted MoE techniques. However, existing multi-modal MoE tuning methods typically face two key challenges: expert…

    Submitted 28 May, 2025; originally announced May 2025.

  25. arXiv:2505.22226  [pdf, ps, other]

    cs.CV

    Hadaptive-Net: Efficient Vision Models via Adaptive Cross-Hadamard Synergy

    Authors: Xuyang Zhang, Xi Zhang, Liang Chen, Hao Shi, Qingshan Guo

    Abstract: Recent studies have revealed the immense potential of the Hadamard product in enhancing network representational capacity and dimensional compression. However, despite its theoretical promise, this technique has not been systematically explored or effectively applied in practice, leaving its full capabilities underdeveloped. In this work, we first analyze and identify the advantages of Hadamard produc…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures

  26. arXiv:2505.20945  [pdf, ps, other]

    cs.CR

    IRCopilot: Automated Incident Response with Large Language Models

    Authors: Xihuan Lin, Jie Zhang, Gelei Deng, Tianzhe Liu, Xiaolong Liu, Changcai Yang, Tianwei Zhang, Qing Guo, Riqing Chen

    Abstract: Incident response plays a pivotal role in mitigating the impact of cyber attacks. In recent years, the intensity and complexity of global cyber threats have grown significantly, making it increasingly challenging for traditional threat detection and incident response methods to operate effectively in complex network environments. While Large Language Models (LLMs) have shown great potential in ear…

    Submitted 27 May, 2025; originally announced May 2025.

  27. arXiv:2505.19225  [pdf, ps, other]

    eess.IV cs.CV

    MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

    Authors: Chenglong Ma, Yuanfeng Ji, Jin Ye, Zilong Li, Chenhui Wang, Junzhi Ning, Wei Li, Lihao Liu, Qiushan Guo, Tianbin Li, Junjun He, Hongming Shan

    Abstract: Advanced autoregressive models have reshaped multimodal AI. However, their transformative potential in medical imaging remains largely untapped due to the absence of a unified visual tokenizer, one capable of capturing fine-grained visual structures for faithful image reconstruction and realistic image synthesis, as well as rich semantics for accurate diagnosis and image interpretation. To this…

    Submitted 25 May, 2025; originally announced May 2025.

  28. arXiv:2505.17652  [pdf, ps, other]

    cs.LG cs.AI

    Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective

    Authors: Deyang Kong, Qi Guo, Xiangyu Xi, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye

    Abstract: Reinforcement learning exhibits potential in enhancing the reasoning abilities of large language models, yet it is hard to scale due to the low sample efficiency during the rollout phase. Existing methods attempt to improve efficiency by scheduling problems based on problem difficulties. However, these approaches suffer from unstable and biased estimations of problem difficulty and fail to capture th…

    Submitted 29 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  29. arXiv:2505.17509  [pdf, other]

    cs.CV

    Enhancing Adversarial Robustness of Vision Language Models via Adversarial Mixture Prompt Tuning

    Authors: Shiji Zhao, Qihui Zhu, Shukun Xiong, Shouwei Ruan, Yize Fan, Ranjie Duan, Qing Guo, Xingxing Wei

    Abstract: Large pre-trained Vision Language Models (VLMs) have excellent generalization capabilities but are highly susceptible to adversarial examples, presenting potential security risks. To improve the robustness of VLMs against adversarial examples, adversarial prompt tuning methods are proposed to align the text feature with the adversarial image feature without changing model parameters. However, when…

    Submitted 23 May, 2025; originally announced May 2025.

  30. arXiv:2505.16335  [pdf, ps, other]

    cs.CV cs.AI

    FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design

    Authors: Renjie Wei, Songqiang Xu, Qingyu Guo, Meng Li

    Abstract: Visual autoregressive (VAR) modeling has marked a paradigm shift in image generation from next-token prediction to next-scale prediction. VAR predicts a set of tokens at each step from coarse to fine scale, leading to better image quality and faster inference speed compared to existing diffusion models. However, the large parameter size and computation cost hinder its deployment on edge devices. T…

    Submitted 22 May, 2025; originally announced May 2025.

  31. arXiv:2505.12236  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Bridging Generative and Discriminative Learning: Few-Shot Relation Extraction via Two-Stage Knowledge-Guided Pre-training

    Authors: Quanjiang Guo, Jinchuan Zhang, Sijie Wang, Ling Tian, Zhao Kang, Bin Yan, Weidong Xiao

    Abstract: Few-Shot Relation Extraction (FSRE) remains a challenging task due to the scarcity of annotated data and the limited generalization capabilities of existing models. Although large language models (LLMs) have demonstrated potential in FSRE through in-context learning (ICL), their general-purpose training objectives often result in suboptimal performance for task-specific relation extraction. To ove…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: 13 pages, 6 figures, to appear at IJCAI 2025

  32. arXiv:2505.11861  [pdf, ps, other]

    cs.AI cs.CL

    Fair-PP: A Synthetic Dataset for Aligning LLM with Personalized Preferences of Social Equity

    Authors: Qi Zhou, Jie Zhang, Dongxia Wang, Qiang Liu, Tianlin Li, Jin Song Dong, Wenhai Wang, Qing Guo

    Abstract: Human preference plays a crucial role in the refinement of large language models (LLMs). However, collecting human preference feedback is costly and most existing datasets neglect the correlation between personalization and preferences. To address this issue, we introduce Fair-PP, a synthetic dataset of personalized preferences targeting social equity, derived from real-world social survey data, w…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: under review

    MSC Class: 91C99 ACM Class: I.2.7; J.4

  33. arXiv:2505.10784  [pdf, ps, other]

    cs.CV

    SynRailObs: A Synthetic Dataset for Obstacle Detection in Railway Scenarios

    Authors: Qiushi Guo, Jason Rambach

    Abstract: Detecting potential obstacles in railway environments is critical for preventing serious accidents. Identifying a broad range of obstacle categories under complex conditions requires large-scale datasets with precisely annotated, high-quality images. However, existing publicly available datasets fail to meet these requirements, thereby hindering progress in railway safety research. To address this…

    Submitted 15 May, 2025; originally announced May 2025.

  34. arXiv:2505.07818  [pdf, other]

    cs.CV

    DanceGRPO: Unleashing GRPO on Visual Generation

    Authors: Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, Ping Luo

    Abstract: Recent breakthroughs in generative models, particularly diffusion models and rectified flows, have revolutionized visual content creation, yet aligning model outputs with human preferences remains a critical challenge. Existing reinforcement learning (RL)-based methods for visual generation face critical limitations: incompatibility with modern Ordinary Differential Equations (ODEs)-based sampling p…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project Page: https://dancegrpo.github.io/

  35. arXiv:2505.06302  [pdf, other]

    cs.LG cs.AI

    QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives

    Authors: Xuzhi Zhang, Shaohui Peng, Qirui Zhou, Yuanbo Wen, Qi Guo, Ruizhi Chen, Xinguo Zhu, Weiqiang Xiong, Haixin Chen, Congying Ma, Ke Gao, Chen Zhao, Yanjun Wu, Yunji Chen, Ling Li

    Abstract: Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures like RISC-V, ARM, and GPUs, as manually optimized implementation takes at least months and lacks po…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures

    ACM Class: I.2.2

  36. arXiv:2505.05375  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.NE

    Threshold Modulation for Online Test-Time Adaptation of Spiking Neural Networks

    Authors: Kejie Zhao, Wenjia Hua, Aiersi Tuerhong, Luziwei Leng, Yuxin Ma, Qinghai Guo

    Abstract: Recently, spiking neural networks (SNNs), deployed on neuromorphic chips, provide highly efficient solutions on edge devices in different scenarios. However, their ability to adapt to distribution shifts after deployment has become a crucial challenge. Online test-time adaptation (OTTA) offers a promising solution by enabling models to dynamically adjust to new data distributions without requiring…

    Submitted 9 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCNN 2025. © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  37. arXiv:2505.03195  [pdf, other]

    cs.AR

    QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies

    Authors: Shuyao Cheng, Rui Zhang, Wenkai He, Pengwei Jin, Chongxiao Li, Zidong Du, Xing Hu, Yifan Hao, Guanglin Xu, Yuanbo Wen, Ling Li, Qi Guo, Yunji Chen

    Abstract: Automated processor design, which can significantly reduce human efforts and accelerate design cycles, has received considerable attention. While recent advancements have automatically designed single-cycle processors that execute one instruction per cycle, their performance cannot compete with modern superscalar processors that execute multiple instructions per cycle. Previous methods fail on sup…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 8 pages, 3 figures

  38. arXiv:2505.02471  [pdf, ps, other]

    cs.CV

    Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

    Authors: Inclusion AI, Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang

    Abstract: We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing the novel multi-scale learnable tokens and multi-scale repr…

    Submitted 12 June, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: https://github.com/inclusionAI/Ming/tree/Ming-Lite-Omni-Preview/Ming-unify

  39. arXiv:2505.02146  [pdf, other]

    cs.CL cs.LG cs.PL

    QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach

    Authors: Shouyang Dong, Yuanbo Wen, Jun Bi, Di Huang, Jiaming Guo, Jianxing Xu, Ruibai Xu, Xinkai Song, Yifan Hao, Xuehai Zhou, Tianshi Chen, Qi Guo, Yunji Chen

    Abstract: Heterogeneous deep learning systems (DLS) such as GPUs and ASICs have been widely deployed in industrial data centers, which requires developing multiple low-level tensor programs for different platforms. An attractive solution to relieve the programming burden is to transcompile the legacy code of one platform to others. However, current transcompilation techniques struggle with either tremendous…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: Accepted to OSDI 2025

  40. arXiv:2505.01077  [pdf]

    cs.NE

    Zero-Shot Document-Level Biomedical Relation Extraction via Scenario-based Prompt Design in Two-Stage with LLM

    Authors: Lei Zhao, Ling Kang, Quan Guo

    Abstract: With the advent of artificial intelligence (AI), many researchers are attempting to extract structured information from document-level biomedical literature by fine-tuning large language models (LLMs). However, they face significant challenges such as the need for expensive hardware, like high-performance GPUs and the high labor costs associated with annotating training datasets, especially in bio…

    Submitted 2 May, 2025; originally announced May 2025.

  41. arXiv:2504.19456  [pdf, other]

    cs.CR cs.SE

    FCGHunter: Towards Evaluating Robustness of Graph-Based Android Malware Detection

    Authors: Shiwen Song, Xiaofei Xie, Ruitao Feng, Qi Guo, Sen Chen

    Abstract: Graph-based detection methods leveraging Function Call Graphs (FCGs) have shown promise for Android malware detection (AMD) due to their semantic insights. However, the deployment of malware detectors in dynamic and hostile environments raises significant concerns about their robustness. While recent approaches evaluate the robustness of FCG-based detectors using adversarial attacks, their effecti…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 14 pages, 5 figures

  42. arXiv:2504.19398  [pdf]

    cs.CV

    Dynamic Arthroscopic Navigation System for Anterior Cruciate Ligament Reconstruction Based on Multi-level Memory Architecture

    Authors: Shuo Wang, Weili Shi, Shuai Yang, Jiahao Cui, Qinwei Guo

    Abstract: This paper presents a dynamic arthroscopic navigation system based on multi-level memory architecture for anterior cruciate ligament (ACL) reconstruction surgery. The system extends our previously proposed markerless navigation method from static image matching to dynamic video sequence tracking. By integrating the Atkinson-Shiffrin memory model's three-level architecture (sensory memory, working…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 28 pages, 13 figures

    ACM Class: I.4.9; I.2.10; J.3; I.4.8; I.5.4

  43. arXiv:2504.18448  [pdf, other]

    cs.CV

    NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration

    Authors: Haotian Dong, Xin Wang, Di Lin, Yipeng Wu, Qin Chen, Ruonan Liu, Kairui Yang, Ping Li, Qing Guo

    Abstract: High-quality video generation is crucial for many fields, including the film industry and autonomous driving. However, generating videos with spatiotemporal consistencies remains challenging. Current methods typically utilize attention mechanisms or modify noise to achieve consistent videos, neglecting global spatiotemporal information that could help ensure spatial and temporal consistency during…

    Submitted 25 April, 2025; originally announced April 2025.

  44. arXiv:2504.17990  [pdf, other]

    cs.CV

    From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval

    Authors: Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang

    Abstract: Composed Image Retrieval (CIR) is a challenging multimodal task that retrieves a target image based on a reference image and accompanying modification text. Due to the high cost of annotating CIR triplet datasets, zero-shot (ZS) CIR has gained traction as a promising alternative. Existing studies mainly focus on projection-based methods, which map an image to a single pseudo-word token. However, t…

    Submitted 24 April, 2025; originally announced April 2025.

  45. arXiv:2504.17815  [pdf, other]

    cs.CV

    Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning

    Authors: Mingxuan Cui, Qing Guo, Yuyi Wang, Hongkai Yu, Di Lin, Qin Zou, Ming-Ming Cheng, Xi Li

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful and efficient 3D representation for novel view synthesis. This paper extends 3DGS capabilities to inpainting, where masked objects in a scene are replaced with new contents that blend seamlessly with the surroundings. Unlike 2D image inpainting, 3D Gaussian inpainting (3DGI) is challenging in effectively leveraging complementary visual and sem…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 14 pages, 12 figures, ICCV

  46. arXiv:2504.15585  [pdf, ps, other]

    cs.CR cs.AI cs.CL cs.LG

    A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

    Authors: Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Shicheng Xu, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, et al. (78 additional authors not shown)

    Abstract: The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concer…

    Submitted 8 June, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  47. arXiv:2504.12739  [pdf, other]

    cs.CV

    Mask Image Watermarking

    Authors: Runyi Hu, Jie Zhang, Shiqian Zhao, Nils Lukas, Jiwei Li, Qing Guo, Han Qiu, Tianwei Zhang

    Abstract: We present MaskMark, a simple, efficient, and flexible framework for image watermarking. MaskMark has two variants: (1) MaskMark-D, which supports global watermark embedding, watermark localization, and local watermark extraction for applications such as tamper detection; (2) MaskMark-ED, which focuses on local watermark embedding and extraction, offering enhanced robustness in small regions to su…

    Submitted 20 May, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: 26 pages, 20 figures

  48. arXiv:2504.12132  [pdf, other]

    cs.CV

    Weakly Semi-supervised Whole Slide Image Classification by Two-level Cross Consistency Supervision

    Authors: Linhao Qu, Shiman Li, Xiaoyuan Luo, Shaolei Liu, Qinhao Guo, Manning Wang, Zhijian Song

    Abstract: Computer-aided Whole Slide Image (WSI) classification has the potential to enhance the accuracy and efficiency of clinical pathological diagnosis. It is commonly formulated as a Multiple Instance Learning (MIL) problem, where each WSI is treated as a bag and the small patches extracted from the WSI are considered instances within that bag. However, obtaining labels for a large number of bags is a…

    Submitted 16 April, 2025; originally announced April 2025.

  49. arXiv:2504.11346  [pdf, ps, other]

    cs.CV

    Seedream 3.0 Technical Report

    Authors: Yu Gao, Lixue Gong, Qiushan Guo, Xiaoxia Hou, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Rui Wang, Xuanda Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Zhonghua Zhai, et al. (6 additional authors not shown)

    Abstract: We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0, including alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolutions. Specifically, the advancements of Seedream 3.0 st…

    Submitted 28 June, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: Seedream 3.0 Technical Report

  50. arXiv:2504.11202  [pdf, other]

    cs.CV eess.IV eess.SP

    Focal Split: Untethered Snapshot Depth from Differential Defocus

    Authors: Junjie Luo, John Mamish, Alan Fu, Thomas Concannon, Josiah Hester, Emma Alexander, Qi Guo

    Abstract: We introduce Focal Split, a handheld, snapshot depth camera with fully onboard power and computing based on depth-from-differential-defocus (DfDD). Focal Split is passive, avoiding power consumption of light sources. Its achromatic optical system simultaneously forms two differentially defocused images of the scene, which can be independently captured using two photosensors in a snapshot. The data…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: CVPR 2025, 8 pages, 7 figures

    MSC Class: 68U10 ACM Class: I.4.8