Showing 1–50 of 533 results for author: Guo, Q

Searching in archive cs.
  1. arXiv:2505.07818  [pdf, other]

    cs.CV

    DanceGRPO: Unleashing GRPO on Visual Generation

    Authors: Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, Ping Luo

    Abstract: Recent breakthroughs in generative models, particularly diffusion models and rectified flows, have revolutionized visual content creation, yet aligning model outputs with human preferences remains a critical challenge. Existing reinforcement learning (RL)-based methods for visual generation face critical limitations: incompatibility with modern Ordinary Differential Equations (ODEs)-based sampling p…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project Page: https://dancegrpo.github.io/

  2. arXiv:2505.06302  [pdf, other]

    cs.LG cs.AI

    QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives

    Authors: Xuzhi Zhang, Shaohui Peng, Qirui Zhou, Yuanbo Wen, Qi Guo, Ruizhi Chen, Xinguo Zhu, Weiqiang Xiong, Haixin Chen, Congying Ma, Ke Gao, Chen Zhao, Yanjun Wu, Yunji Chen, Ling Li

    Abstract: Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures like RISC-V, ARM, and GPUs, as manually optimized implementation takes at least months and lacks po…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures

    ACM Class: I.2.2

  3. arXiv:2505.05375  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.NE

    Threshold Modulation for Online Test-Time Adaptation of Spiking Neural Networks

    Authors: Kejie Zhao, Wenjia Hua, Aiersi Tuerhong, Luziwei Leng, Yuxin Ma, Qinghai Guo

    Abstract: Recently, spiking neural networks (SNNs), deployed on neuromorphic chips, provide highly efficient solutions on edge devices in different scenarios. However, their ability to adapt to distribution shifts after deployment has become a crucial challenge. Online test-time adaptation (OTTA) offers a promising solution by enabling models to dynamically adjust to new data distributions without requiring…

    Submitted 9 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCNN 2025. © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  4. arXiv:2505.03195  [pdf, other]

    cs.AR

    QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies

    Authors: Shuyao Cheng, Rui Zhang, Wenkai He, Pengwei Jin, Chongxiao Li, Zidong Du, Xing Hu, Yifan Hao, Guanglin Xu, Yuanbo Wen, Ling Li, Qi Guo, Yunji Chen

    Abstract: Automated processor design, which can significantly reduce human efforts and accelerate design cycles, has received considerable attention. While recent advancements have automatically designed single-cycle processors that execute one instruction per cycle, their performance cannot compete with modern superscalar processors that execute multiple instructions per cycle. Previous methods fail on sup…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 8 pages, 3 figures

  5. arXiv:2505.02471  [pdf, other]

    cs.CV

    Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

    Authors: Inclusion AI, Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang

    Abstract: We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing the novel multi-scale learnable tokens and multi-scale repr…

    Submitted 7 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: https://github.com/inclusionAI/Ming/tree/main/Ming-unify

  6. arXiv:2505.02146  [pdf, other]

    cs.CL cs.LG cs.PL

    QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach

    Authors: Shouyang Dong, Yuanbo Wen, Jun Bi, Di Huang, Jiaming Guo, Jianxing Xu, Ruibai Xu, Xinkai Song, Yifan Hao, Xuehai Zhou, Tianshi Chen, Qi Guo, Yunji Chen

    Abstract: Heterogeneous deep learning systems (DLS) such as GPUs and ASICs have been widely deployed in industrial data centers, which requires developing multiple low-level tensor programs for different platforms. An attractive solution to relieve the programming burden is to transcompile the legacy code of one platform to others. However, current transcompilation techniques struggle with either tremendous…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: Accepted to OSDI 2025

  7. arXiv:2505.01077  [pdf]

    cs.NE

    Zero-Shot Document-Level Biomedical Relation Extraction via Scenario-based Prompt Design in Two-Stage with LLM

    Authors: Lei Zhao, Ling Kang, Quan Guo

    Abstract: With the advent of artificial intelligence (AI), many researchers are attempting to extract structured information from document-level biomedical literature by fine-tuning large language models (LLMs). However, they face significant challenges such as the need for expensive hardware, like high-performance GPUs, and the high labor costs associated with annotating training datasets, especially in bio…

    Submitted 2 May, 2025; originally announced May 2025.

  8. arXiv:2504.19456  [pdf, other]

    cs.CR cs.SE

    FCGHunter: Towards Evaluating Robustness of Graph-Based Android Malware Detection

    Authors: Shiwen Song, Xiaofei Xie, Ruitao Feng, Qi Guo, Sen Chen

    Abstract: Graph-based detection methods leveraging Function Call Graphs (FCGs) have shown promise for Android malware detection (AMD) due to their semantic insights. However, the deployment of malware detectors in dynamic and hostile environments raises significant concerns about their robustness. While recent approaches evaluate the robustness of FCG-based detectors using adversarial attacks, their effecti…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 14 pages, 5 figures

  9. arXiv:2504.19398  [pdf]

    cs.CV

    Dynamic Arthroscopic Navigation System for Anterior Cruciate Ligament Reconstruction Based on Multi-level Memory Architecture

    Authors: Shuo Wang, Weili Shi, Shuai Yang, Jiahao Cui, Qinwei Guo

    Abstract: This paper presents a dynamic arthroscopic navigation system based on multi-level memory architecture for anterior cruciate ligament (ACL) reconstruction surgery. The system extends our previously proposed markerless navigation method from static image matching to dynamic video sequence tracking. By integrating the Atkinson-Shiffrin memory model's three-level architecture (sensory memory, working…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 28 pages, 13 figures

    ACM Class: I.4.9; I.2.10; J.3; I.4.8; I.5.4

  10. arXiv:2504.18448  [pdf, other]

    cs.CV

    NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration

    Authors: Haotian Dong, Xin Wang, Di Lin, Yipeng Wu, Qin Chen, Ruonan Liu, Kairui Yang, Ping Li, Qing Guo

    Abstract: High-quality video generation is crucial for many fields, including the film industry and autonomous driving. However, generating videos with spatiotemporal consistencies remains challenging. Current methods typically utilize attention mechanisms or modify noise to achieve consistent videos, neglecting global spatiotemporal information that could help ensure spatial and temporal consistency during…

    Submitted 25 April, 2025; originally announced April 2025.

  11. arXiv:2504.17990  [pdf, other]

    cs.CV

    From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval

    Authors: Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang

    Abstract: Composed Image Retrieval (CIR) is a challenging multimodal task that retrieves a target image based on a reference image and accompanying modification text. Due to the high cost of annotating CIR triplet datasets, zero-shot (ZS) CIR has gained traction as a promising alternative. Existing studies mainly focus on projection-based methods, which map an image to a single pseudo-word token. However, t…

    Submitted 24 April, 2025; originally announced April 2025.

  12. arXiv:2504.17815  [pdf, other]

    cs.CV

    Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning

    Authors: Mingxuan Cui, Qing Guo, Yuyi Wang, Hongkai Yu, Di Lin, Qin Zou, Ming-Ming Cheng, Xi Li

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful and efficient 3D representation for novel view synthesis. This paper extends 3DGS capabilities to inpainting, where masked objects in a scene are replaced with new contents that blend seamlessly with the surroundings. Unlike 2D image inpainting, 3D Gaussian inpainting (3DGI) is challenging in effectively leveraging complementary visual and sem…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 14 pages, 12 figures, ICCV

  13. arXiv:2504.15585  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

    Authors: Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Chengwei Liu, Yifan Zhang, Qiankun Li, et al. (57 additional authors not shown)

    Abstract: The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concer…

    Submitted 22 April, 2025; originally announced April 2025.

  14. arXiv:2504.12739  [pdf, other]

    cs.CV

    Mask Image Watermarking

    Authors: Runyi Hu, Jie Zhang, Shiqian Zhao, Nils Lukas, Jiwei Li, Qing Guo, Han Qiu, Tianwei Zhang

    Abstract: We present MaskMark, a simple, efficient and flexible framework for image watermarking. MaskMark has two variants: MaskMark-D, which supports global watermark embedding, watermark localization, and local watermark extraction for applications such as tamper detection, and MaskMark-ED, which focuses on local watermark embedding and extraction with enhanced robustness in small regions, enabling local…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 23 pages, 18 figures

  15. arXiv:2504.12132  [pdf, other]

    cs.CV

    Weakly Semi-supervised Whole Slide Image Classification by Two-level Cross Consistency Supervision

    Authors: Linhao Qu, Shiman Li, Xiaoyuan Luo, Shaolei Liu, Qinhao Guo, Manning Wang, Zhijian Song

    Abstract: Computer-aided Whole Slide Image (WSI) classification has the potential to enhance the accuracy and efficiency of clinical pathological diagnosis. It is commonly formulated as a Multiple Instance Learning (MIL) problem, where each WSI is treated as a bag and the small patches extracted from the WSI are considered instances within that bag. However, obtaining labels for a large number of bags is a…

    Submitted 16 April, 2025; originally announced April 2025.

  16. arXiv:2504.11346  [pdf, other]

    cs.CV

    Seedream 3.0 Technical Report

    Authors: Yu Gao, Lixue Gong, Qiushan Guo, Xiaoxia Hou, Zhichao Lai, Fanshi Li, Liang Li, Xiaochen Lian, Chao Liao, Liyang Liu, Wei Liu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Rui Wang, Xuanda Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Zhonghua Zhai, et al. (6 additional authors not shown)

    Abstract: We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0, including alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolutions. Specifically, the advancements of Seedream 3.0 st…

    Submitted 16 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: Seedream 3.0 Technical Report

  17. arXiv:2504.11202  [pdf, other]

    cs.CV eess.IV eess.SP

    Focal Split: Untethered Snapshot Depth from Differential Defocus

    Authors: Junjie Luo, John Mamish, Alan Fu, Thomas Concannon, Josiah Hester, Emma Alexander, Qi Guo

    Abstract: We introduce Focal Split, a handheld, snapshot depth camera with fully onboard power and computing based on depth-from-differential-defocus (DfDD). Focal Split is passive, avoiding power consumption of light sources. Its achromatic optical system simultaneously forms two differentially defocused images of the scene, which can be independently captured using two photosensors in a snapshot. The data…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: CVPR 2025, 8 pages, 7 figures

    MSC Class: 68U10; ACM Class: I.4.8

  18. arXiv:2504.09474  [pdf, other]

    cs.SE cs.AI cs.OS

    MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions

    Authors: Pucheng Dang, Di Huang, Dong Li, Kang Chen, Yuanbo Wen, Qi Guo, Xing Hu, Ninghui Sun

    Abstract: Out-of-tree kernel patches are essential for adapting the Linux kernel to new hardware or enabling specific functionalities. Maintaining and updating these patches across different kernel versions demands significant effort from experienced engineers. Large language models (LLMs) have shown remarkable progress across various domains, suggesting their potential for automating out-of-tree kernel pat…

    Submitted 13 April, 2025; originally announced April 2025.

  19. arXiv:2504.09130  [pdf, other]

    cs.CL

    VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search

    Authors: Yikun Wang, Siyin Wang, Qinyuan Cheng, Zhaoye Fei, Liang Ding, Qipeng Guo, Dacheng Tao, Xipeng Qiu

    Abstract: Recent advancements in Large Vision-Language Models have showcased remarkable capabilities. However, they often falter when confronted with complex reasoning tasks that humans typically address through visual aids and deliberate, step-by-step thinking. While existing methods have explored text-based slow thinking or rudimentary visual assistance, they fall short of capturing the intricate, interle…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: 12 pages

  20. arXiv:2504.04827  [pdf, other]

    cs.CV cs.AI

    From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes

    Authors: Long Ma, Zhiyuan Yan, Yize Chen, Jin Xu, Qinglang Guo, Hu Huang, Yong Liao, Hui Lin

    Abstract: Detecting deepfakes has been an increasingly important topic, especially given the rapid development of AI generation techniques. In this paper, we ask: How can we build a universal detection framework that is effective for most facial deepfakes? One significant challenge is the wide variety of deepfake generators available, resulting in varying forgery artifacts (e.g., lighting inconsistency, col…

    Submitted 7 April, 2025; originally announced April 2025.

  21. arXiv:2504.02264  [pdf, other]

    cs.CV

    MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

    Authors: Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao, Qiannan Guo, Jiayin Zhu, Pengfei Li, Zilong Chen, Huiming Yang, Zhiwei Li, Lening Wang, Tiao Tan, Huaping Liu

    Abstract: Advanced driver assistance systems require a comprehensive understanding of the driver's mental/physical state and traffic context, but existing works often neglect the potential benefits of joint learning between these tasks. This paper proposes MMTL-UniAD, a unified multi-modal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emo…

    Submitted 3 April, 2025; originally announced April 2025.

  22. arXiv:2503.23708  [pdf, other]

    cs.RO cs.AI

    Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios

    Authors: Jingzheng Li, Xianglong Liu, Shikui Wei, Zhijun Chen, Bing Li, Qing Guo, Xianqi Yang, Yanjun Pu, Jiakai Wang

    Abstract: Autonomous driving has made significant progress in both academia and industry, including performance improvements in perception tasks and the development of end-to-end autonomous driving systems. However, the safety and robustness assessment of autonomous driving has not received sufficient attention. Current evaluations of autonomous driving are typically conducted in natural driving scenarios. H…

    Submitted 7 April, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  23. arXiv:2503.23606  [pdf, other]

    cs.CV

    Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries

    Authors: Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo

    Abstract: Extracting depth information from photon-limited, defocused images is challenging because depth from defocus (DfD) relies on accurate estimation of defocus blur, which is fundamentally sensitive to image noise. We present a novel approach to robustly measure object depths from photon-limited images along the defocused boundaries. It is based on a new image patch representation, Blurry-Edges, that…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025. Project page: https://blurry-edges.qiguo.org/

  24. arXiv:2503.23496  [pdf, other]

    cs.AR

    FlexMem: High-Parallel Near-Memory Architecture for Flexible Dataflow in Fully Homomorphic Encryption

    Authors: Shangyi Shi, Husheng Han, Jianan Mu, Xinyao Zheng, Ling Liang, Hang Lu, Zidong Du, Xiaowei Li, Xing Hu, Qi Guo

    Abstract: Fully Homomorphic Encryption (FHE) imposes substantial memory bandwidth demands, presenting significant challenges for efficient hardware acceleration. Near-memory Processing (NMP) has emerged as a promising architectural solution to alleviate the memory bottleneck. However, the irregular memory access patterns and flexible dataflows inherent to FHE limit the effectiveness of existing NMP accelera…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 9 pages, ICCAD

  25. arXiv:2503.21227  [pdf, other]

    cs.CL

    LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models

    Authors: Hengyuan Zhao, Ziqin Wang, Qixin Sun, Kaiyou Song, Yilin Li, Xiaolin Hu, Qingpei Guo, Si Liu

    Abstract: Although applying Mixture of Experts to large language models for learning new tasks is widely regarded as an effective strategy for continuous learning, there still remain two major challenges: (1) As the number of tasks grows, simple parameter expansion strategies can lead to excessively large models. (2) Modifying the parameters of the existing router results in the erosion of previously acquir…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Preprint

  26. arXiv:2503.20184  [pdf, other]

    cs.CV eess.IV

    Spectrum from Defocus: Fast Spectral Imaging with Chromatic Focal Stack

    Authors: M. Kerem Aydin, Yi-Chun Hung, Jaclyn Pytlarz, Qi Guo, Emma Alexander

    Abstract: Hyperspectral cameras face harsh trade-offs between spatial, spectral, and temporal resolution in an inherently low-photon regime. Computational imaging systems break through these trade-offs with compressive sensing, but require complex optics and/or extensive compute. We present Spectrum from Defocus (SfD), a chromatic focal sweep method that recovers state-of-the-art hyperspectral images with a…

    Submitted 25 March, 2025; originally announced March 2025.

  27. arXiv:2503.17029  [pdf, other]

    cs.CV

    AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process

    Authors: Junjie Hu, Shuyong Gao, Qianyu Guo, Yan Wang, Qishan Wang, Yuang Feng, Wenqiang Zhang

    Abstract: Humans can intuitively decompose an image into a sequence of strokes to create a painting, yet existing methods for generating drawing processes are limited to specific data types and often rely on expensive human-annotated datasets. We propose a novel self-supervised framework for generating drawing processes from any type of image, treating the task as a video generation problem. Our approach re…

    Submitted 21 March, 2025; originally announced March 2025.

  28. arXiv:2503.13109  [pdf, other]

    cs.CL

    Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences

    Authors: Kedi Chen, Zhikai Lei, Fan Zhang, Yinqi Zhang, Qin Chen, Jie Zhou, Liang He, Qipeng Guo, Kai Chen, Wei Zhang

    Abstract: Large language models have made remarkable progress in reasoning capabilities. Existing works focus mainly on deductive reasoning tasks (e.g., code and math), while another type of reasoning mode that better aligns with human learning, inductive reasoning, is not well studied. We attribute the reason to the fact that obtaining high-quality process supervision data is challenging for inductive reasoning…

    Submitted 17 March, 2025; originally announced March 2025.

  29. arXiv:2503.08625  [pdf, other]

    cs.CV

    SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

    Authors: Muzhi Zhu, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo, Yang Liu, Ming Yang, Chunhua Shen

    Abstract: While MLLMs have demonstrated adequate image understanding capabilities, they still struggle with pixel-level comprehension, limiting their practical applications. Current evaluation tasks like VQA and visual grounding remain too coarse to assess fine-grained pixel comprehension accurately. Though segmentation is foundational for pixel-level understanding, existing methods often require MLLMs to g…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: CVPR 2025; code will be released at https://github.com/aim-uofa/SegAgent

  30. arXiv:2503.04385  [pdf, other]

    cs.CV

    Scale-Invariant Adversarial Attack against Arbitrary-scale Super-resolution

    Authors: Yihao Huang, Xin Luo, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Weikai Miao, Geguang Pu, Yang Liu

    Abstract: The advent of local continuous image function (LIIF) has garnered significant attention for arbitrary-scale super-resolution (SR) techniques. However, while the vulnerabilities of fixed-scale SR have been assessed, the robustness of continuous representation-based arbitrary-scale SR against adversarial attacks remains an area warranting further exploration. The elaborately designed adversarial att…

    Submitted 12 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: 17 pages, accepted by TIFS 2025

  31. arXiv:2503.04249  [pdf, other]

    cs.LG cs.AI

    How to Mitigate Overfitting in Weak-to-strong Generalization?

    Authors: Junhao Shi, Qinyuan Cheng, Zhaoye Fei, Yining Zheng, Qipeng Guo, Xipeng Qiu

    Abstract: Aligning powerful AI models on tasks that surpass human evaluation capabilities is the central problem of superalignment. To address this problem, weak-to-strong generalization aims to elicit the capabilities of strong models through weak supervisors and ensure that the behavior of strong models aligns with the intentions of weak supervisors without unsafe behaviors such as deception. Alt…

    Submitted 6 March, 2025; originally announced March 2025.

  32. arXiv:2503.03528  [pdf, other]

    cs.CV cs.AI

    AdaSin: Enhancing Hard Sample Metrics with Dual Adaptive Penalty for Face Recognition

    Authors: Qiqi Guo, Zhuowen Zheng, Guanghua Yang, Zhiquan Liu, Xiaofan Li, Jianqing Li, Jinyu Tian, Xueyuan Gong

    Abstract: In recent years, the emergence of deep convolutional neural networks has positioned face recognition as a prominent research focus in computer vision. Traditional loss functions, such as margin-based, hard-sample mining-based, and hybrid approaches, have achieved notable performance improvements, with some leveraging curriculum learning to optimize training. However, these methods often fall short…

    Submitted 5 March, 2025; originally announced March 2025.

  33. arXiv:2503.02242  [pdf, other]

    cs.CV eess.IV

    Φ-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data

    Authors: Xidan Zhang, Yihan Zhuang, Qian Guo, Haodong Yang, Xuelin Qian, Gong Cheng, Junwei Han, Zhongling Huang

    Abstract: Approaches for improving generative adversarial networks (GANs) training under a few samples have been explored for natural images. However, these methods have limited effectiveness for synthetic aperture radar (SAR) images, as they do not account for the unique electromagnetic scattering properties of SAR. To remedy this, we propose a physics-inspired regularization method dubbed Φ-GAN, which i…

    Submitted 3 March, 2025; originally announced March 2025.

  34. arXiv:2503.00957  [pdf, other]

    cs.SD cs.AI cs.CR eess.AS

    Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks

    Authors: Chang Liu, Haolin Wu, Xi Yang, Kui Zhang, Cong Wu, Weiming Zhang, Nenghai Yu, Tianwei Zhang, Qing Guo, Jie Zhang

    Abstract: As speech translation (ST) systems become increasingly prevalent, understanding their vulnerabilities is crucial for ensuring robust and reliable communication. However, limited work has explored this issue in depth. This paper explores methods of compromising these systems through imperceptible audio manipulations. Specifically, we present two innovative approaches: (1) the injection of perturbat…

    Submitted 4 March, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

    Comments: Preprint, 17 pages, 17 figures

  35. arXiv:2503.00784  [pdf, other]

    cs.CL

    DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting

    Authors: Kai Lv, Honglin Guo, Qipeng Guo, Xipeng Qiu

    Abstract: Large language models (LLMs) exhibit exceptional performance across a wide range of tasks; however, their token-by-token autoregressive generation process significantly hinders inference speed. Speculative decoding presents a promising draft-then-verify framework that reduces generation latency while maintaining output distribution fidelity. Nevertheless, the draft model introduces additional comp…

    Submitted 2 March, 2025; originally announced March 2025.

  36. arXiv:2502.20154  [pdf, other]

    cs.CV

    Cutting-edge 3D reconstruction solutions for underwater coral reef images: A review and comparison

    Authors: Jiageng Zhong, Ming Li, Armin Gruen, Konrad Schindler, Xuan Liao, Qinghua Guo

    Abstract: Corals serve as the foundational habitat-building organisms within reef ecosystems, constructing extensive structures that extend over vast distances. However, their inherent fragility and vulnerability to various threats render them susceptible to significant damage and destruction. The application of advanced 3D reconstruction technologies for high-quality modeling is crucial for preserving them…

    Submitted 27 February, 2025; originally announced February 2025.

  37. arXiv:2502.19279  [pdf, other]

    cs.CL

    CritiQ: Mining Data Quality Criteria from Human Preferences

    Authors: Honglin Guo, Kai Lv, Qipeng Guo, Tianyi Liang, Zhiheng Xi, Demin Song, Qiuyinzhe Zhang, Yu Sun, Kai Chen, Xipeng Qiu, Tao Gui

    Abstract: Language models depend heavily on high-quality data for optimal performance. Existing approaches rely on manually designed heuristics, the perplexity of existing models, training classifiers, or careful prompt engineering, which require significant expert experience and human annotation effort while introducing biases. We introduce CritiQ, a novel data selection method that automatically mines crite…

    Submitted 26 February, 2025; originally announced February 2025.

  38. arXiv:2502.19125  [pdf, other]

    cs.CV

    The NeRF Signature: Codebook-Aided Watermarking for Neural Radiance Fields

    Authors: Ziyuan Luo, Anderson Rocha, Boxin Shi, Qing Guo, Haoliang Li, Renjie Wan

    Abstract: Neural Radiance Fields (NeRF) have been gaining attention as a significant form of 3D content representation. With the proliferation of NeRF-based creations, the need for copyright protection has emerged as a critical issue. Although some approaches have been proposed to embed digital watermarks into NeRF, they often neglect essential model-level considerations and incur substantial time overheads…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 16 pages, accepted by TPAMI

  39. arXiv:2502.19108  [pdf, other]

    cs.IR cs.MM

    A 106K Multi-Topic Multilingual Conversational User Dataset with Emoticons

    Authors: Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Qinglang Guo, Min Zhang

    Abstract: Instant messaging has become a predominant form of communication, with texts and emoticons enabling users to express emotions and ideas efficiently. Emoticons, in particular, have gained significant traction as a medium for conveying sentiments and information, leading to the growing importance of emoticon retrieval and recommendation systems. However, one of the key challenges in this area has be…

    Submitted 26 February, 2025; originally announced February 2025.

  40. arXiv:2502.18778  [pdf, other]

    cs.LG cs.AI cs.CL

    M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance

    Authors: Qingpei Guo, Kaiyou Song, Zipeng Feng, Ziping Ma, Qinglong Zhang, Sirui Gao, Xuzheng Yu, Yunxiao Sun, Tai-Wei Chang, Jingdong Chen, Ming Yang, Jun Zhou

    Abstract: We present M2-omni, a cutting-edge, open-source omni-MLLM that achieves competitive performance to GPT-4o. M2-omni employs a unified multimodal sequence modeling framework, which empowers Large Language Models (LLMs) to acquire comprehensive cross-modal understanding and generation capabilities. Specifically, M2-omni can process arbitrary combinations of audio, video, image, and text modalities as…

    Submitted 7 April, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  41. arXiv:2502.17940  [pdf, ps, other]

    cs.DS

    Optimal Approximate Matrix Multiplication over Sliding Window

    Authors: Haoming Xian, Qintian Guo, Jun Zhang, Sibo Wang

    Abstract: Matrix multiplication is a core operation in numerous applications, yet its exact computation becomes prohibitively expensive as data scales, especially in streaming environments where timeliness is critical. In many real-world scenarios, data arrives continuously, making it essential to focus on recent information via sliding windows. While existing approaches offer approximate solutions, they of…

    Submitted 25 February, 2025; originally announced February 2025.

  42. arXiv:2502.17129  [pdf, other]

    cs.CL

    Thus Spake Long-Context Large Language Model

    Authors: Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu

    Abstract: Long context is an important topic in Natural Language Processing (NLP), running through the development of NLP architectures, and offers immense opportunities for Large Language Models (LLMs), giving LLMs the lifelong learning potential akin to humans. Unfortunately, the pursuit of a long context is accompanied by numerous obstacles. Nevertheless, long context remains a core competitive advantage…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: a global picture of the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation

  43. arXiv:2502.15260  [pdf, other]

    cs.CL

    LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design

    Authors: Renjie Wei, Songqiang Xu, Linfeng Zhong, Zebin Yang, Qingyu Guo, Yuan Wang, Runsheng Wang, Meng Li

    Abstract: State space models (SSMs) like Mamba have recently attracted much attention. Compared to Transformer-based large language models (LLMs), Mamba achieves linear computation complexity with the sequence length and demonstrates superior performance. However, Mamba is hard to accelerate due to the scattered activation outliers and the complex computation dependency, rendering existing LLM accelerators…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: Accepted by DATE 2025

  44. arXiv:2502.14837  [pdf, other]

    cs.CL cs.AI

    Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

    Authors: Tao Ji, Bin Guo, Yuanbin Wu, Qipeng Guo, Lixing Shen, Zhan Chen, Xipeng Qiu, Qi Zhang, Tao Gui

    Abstract: Multi-head Latent Attention (MLA) is an innovative architecture proposed by DeepSeek, designed to ensure efficient and economical inference by significantly compressing the Key-Value (KV) cache into a latent vector. Compared to MLA, standard LLMs employing Multi-Head Attention (MHA) and its variants such as Grouped-Query Attention (GQA) exhibit significant cost disadvantages. Enabling well-trained…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 16 pages, 8 figures

  45. arXiv:2502.14529  [pdf, other]

    cs.CL cs.AI

    CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models

    Authors: Zhenhong Zhou, Zherui Li, Jie Zhang, Yuanhe Zhang, Kun Wang, Yang Liu, Qing Guo

    Abstract: Large Language Model-based Multi-Agent Systems (LLM-MASs) have demonstrated remarkable real-world capabilities, effectively collaborating to complete complex tasks. While these systems are designed with safety mechanisms, such as rejecting harmful instructions through alignment, their security remains largely unexplored. This gap leaves LLM-MASs vulnerable to targeted disruptions. In this paper, w…

    Submitted 20 February, 2025; originally announced February 2025.

  46. arXiv:2502.11476  [pdf, other]

    cs.CL

    FastMCTS: A Simple Sampling Strategy for Data Synthesis

    Authors: Peiji Li, Kai Lv, Yunfan Shao, Yichuan Ma, Linyang Li, Xiaoqing Zheng, Xipeng Qiu, Qipeng Guo

    Abstract: Synthetic high-quality multi-step reasoning data can significantly enhance the performance of large language models on various tasks. However, most existing methods rely on rejection sampling, which generates trajectories independently and suffers from inefficiency and imbalanced sampling across problems of varying difficulty. In this work, we introduce FastMCTS, an innovative data synthesis strat…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: work in progress

  47. arXiv:2502.11460  [pdf, other]

    cs.CL cs.SE

    UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance

    Authors: Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale pre-training data and (ii) synthesizing instruction data through prompt engineering with powerful models. While pre-training data faces quality consistency issues…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: work in progress

  48. arXiv:2502.09533  [pdf, other]

    cs.CV

    Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model

    Authors: Fei Shen, Cong Wang, Junyao Gao, Qin Guo, Jisheng Dang, Jinhui Tang, Tat-Seng Chua

    Abstract: Recent advances in conditional diffusion models have shown promise for generating realistic TalkingFace videos, yet challenges persist in achieving consistent head movement, synchronized facial expressions, and accurate lip synchronization over extended generations. To address these, we introduce the Motion-priors Conditional Diffusion Model (MCDM), whi…

    Submitted 13 February, 2025; originally announced February 2025.

  49. arXiv:2502.08172  [pdf, other]

    cs.SE

    Intention is All You Need: Refining Your Code from Your Intention

    Authors: Qi Guo, Xiaofei Xie, Shangqing Liu, Ming Hu, Xiaohong Li, Lei Bu

    Abstract: Code refinement aims to enhance existing code by addressing issues, refactoring, and optimizing to improve quality and meet specific requirements. As software projects scale in size and complexity, the traditional iterative exchange between reviewers and developers becomes increasingly burdensome. While recent deep learning techniques have been explored to accelerate this process, their performanc…

    Submitted 12 February, 2025; originally announced February 2025.

  50. arXiv:2502.07787  [pdf]

    cs.CY

    A Simulation-Based Framework for Leveraging Shared Autonomous Vehicles to Enhance Disaster Evacuations in Rural Regions with a Focus on Vulnerable Populations

    Authors: Alican Sevim, Qian-wen Guo, Eren Erman Ozguven

    Abstract: Rapid advancements in autonomous vehicles (AVs) are poised to revolutionize transportation and communities, including disaster evacuations, particularly through the deployment of Shared Autonomous Vehicles (SAVs). Despite the potential, the use of SAVs in rural disaster evacuations remains an underexplored area. To address this gap, this study proposes a simulation-based framework that integrates…

    Submitted 14 February, 2025; v1 submitted 19 January, 2025; originally announced February 2025.