Showing 1–50 of 459 results for author: Yu, Q

Searching in archive cs.
  1. arXiv:2507.01055  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Prompt Mechanisms in Medical Imaging: A Comprehensive Survey

    Authors: Hao Yang, Xinlong Liang, Zhang Li, Yue Sun, Zheyu Hu, Xinghe Xie, Behdad Dashtbozorg, Jincheng Huang, Shiwei Zhu, Luyi Han, Jiong Zhang, Shanshan Wang, Ritse Mann, Qifeng Yu, Tao Tan

    Abstract: Deep learning offers transformative potential in medical imaging, yet its clinical adoption is frequently hampered by challenges such as data scarcity, distribution shifts, and the need for robust task generalization. Prompt-based methodologies have emerged as a pivotal strategy to guide deep learning models, providing flexible, domain-specific adaptations that significantly enhance model performa…

    Submitted 27 June, 2025; originally announced July 2025.

  2. arXiv:2506.22720  [pdf, ps, other]

    cs.CV

    Deterministic Object Pose Confidence Region Estimation

    Authors: Jinghao Wang, Zhang Li, Zi Wang, Banglei Guan, Yang Shang, Qifeng Yu

    Abstract: 6D pose confidence region estimation has emerged as a critical direction, aiming to perform uncertainty quantification for assessing the reliability of estimated poses. However, current sampling-based approaches suffer from critical limitations that severely impede their practical deployment: 1) the sampling speed significantly decreases as the number of samples increases. 2) the derived confidence…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  3. arXiv:2506.18919  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection

    Authors: Hexiang Gu, Qifan Yu, Saihui Hou, Zhiqin Fang, Huijia Wu, Zhaofeng He

    Abstract: The rapid development of social media has intensified the spread of harmful content. Harmful memes, which integrate both images and text, pose significant challenges for automated detection due to their implicit semantics and complex multimodal interactions. Although existing research has made progress in detection accuracy and interpretability, the lack of a systematic, large-scale, diverse, and…

    Submitted 15 June, 2025; originally announced June 2025.

  4. arXiv:2506.18559  [pdf, ps, other]

    cs.AI cs.LO

    T-CPDL: A Temporal Causal Probabilistic Description Logic for Developing Logic-RAG Agent

    Authors: Hong Qing Yu

    Abstract: Large language models excel at generating fluent text but frequently struggle with structured reasoning involving temporal constraints, causal relationships, and probabilistic reasoning. To address these limitations, we propose Temporal Causal Probabilistic Description Logic (T-CPDL), an integrated framework that extends traditional Description Logic with temporal interval operators, explicit caus…

    Submitted 23 June, 2025; originally announced June 2025.

    ACM Class: I.2.7; F.4.1

  5. arXiv:2506.16050  [pdf, ps, other]

    cs.RO cs.CV

    Noise Fusion-based Distillation Learning for Anomaly Detection in Complex Industrial Environments

    Authors: Jiawen Yu, Jieji Ren, Yang Chang, Qiaojun Yu, Xuan Tong, Boyang Wang, Yan Song, You Li, Xinji Mai, Wenqiang Zhang

    Abstract: Anomaly detection and localization in automated industrial manufacturing can significantly enhance production efficiency and product quality. Existing methods are capable of detecting surface defects in pre-defined or controlled imaging environments. However, accurately detecting workpiece defects in complex and unstructured industrial environments with varying views, poses and illumination remain…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: IROS 2025 Oral

  6. arXiv:2506.15050  [pdf, ps, other]

    cs.AI

    Truncated Proximal Policy Optimization

    Authors: Tiantian Fan, Lingjun Liu, Yu Yue, Jiaze Chen, Chengyi Wang, Qiying Yu, Chi Zhang, Zhiqi Lin, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Bole Ma, Mofan Zhang, Gaohong Liu, Ru Zhang, Haotian Zhou, Cong Xie, Ruidong Zhu, Zhi Zhang, Xin Liu, Mingxuan Wang, Lin Yan, Yonghui Wu

    Abstract: Recently, test-time scaling Large Language Models (LLMs) have demonstrated exceptional reasoning capabilities across scientific and professional tasks by generating long chains-of-thought (CoT). As a crucial component for developing these reasoning models, reinforcement learning (RL), exemplified by Proximal Policy Optimization (PPO) and its variants, allows models to learn through trial and error…

    Submitted 17 June, 2025; originally announced June 2025.

  7. arXiv:2506.13224  [pdf, ps, other]

    cs.CV

    SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds

    Authors: Jinfeng Xu, Xianzhi Li, Yuan Tang, Xu Han, Qiao Yu, Yixue Hao, Long Hu, Min Chen

    Abstract: Recent advancements in deep learning have greatly enhanced 3D object recognition, but most models are limited to closed-set scenarios, unable to handle unknown samples in real-world applications. Open-set recognition (OSR) addresses this limitation by enabling models to both classify known classes and identify novel classes. However, current OSR methods rely on global features to differentiate kno…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 10 pages, conference

  8. arXiv:2506.09740  [pdf, ps, other]

    cs.CV cs.AI

    ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models

    Authors: Qin Zhou, Zhiyang Zhang, Jinglong Wang, Xiaobin Li, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu

    Abstract: Diffusion models excel at image generation. Recent studies have shown that these models not only generate high-quality images but also encode text-image alignment information through attention maps or loss functions. This information is valuable for various downstream tasks, including segmentation, text-guided image editing, and compositional image generation. However, current methods heavily rely…

    Submitted 11 June, 2025; originally announced June 2025.

  9. arXiv:2506.08933  [pdf, ps, other]

    cs.CV

    What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities

    Authors: Wendong Bu, Yang Wu, Qifan Yu, Minghe Gao, Bingchen Miao, Zhenkui Zhang, Kaihang Pan, Yunfei Li, Mengze Li, Wei Ji, Juncheng Li, Siliang Tang, Yueting Zhuang

    Abstract: As multimodal large language models (MLLMs) advance, MLLM-based virtual agents have demonstrated remarkable performance. However, existing benchmarks face significant limitations, including uncontrollable task complexity, extensive manual annotation with limited scenarios, and a lack of multidimensional evaluation. In response to these challenges, we introduce OmniBench, a self-generating, cross-p…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025 (Oral)

  10. arXiv:2506.07616  [pdf, other]

    cs.LG

    FuXi-Air: Urban Air Quality Forecasting Based on Emission-Meteorology-Pollutant multimodal Machine Learning

    Authors: Zhixin Geng, Xu Fan, Xiqiao Lu, Yan Zhang, Guangyuan Yu, Cheng Huang, Qian Wang, Yuewu Li, Weichun Ma, Qi Yu, Libo Wu, Hao Li

    Abstract: Air pollution has emerged as a major public health challenge in megacities. Numerical simulations and single-site machine learning approaches have been widely applied in air quality forecasting tasks. However, these methods face multiple limitations, including high computational costs, low operational efficiency, and limited integration with observational data. With the rapid advancement of artifi…

    Submitted 9 June, 2025; originally announced June 2025.

  11. arXiv:2506.06796  [pdf, ps, other]

    cs.IT

    Polarized Element-pair Code Based FFMA over a Gaussian Multiple-access Channel

    Authors: Zhang-li-han Liu, Qi-yue Yu

    Abstract: This paper presents polarized element-pair (EP) codes for polarization-adjusted finite-field multiple-access (PA-FFMA) systems. The core innovation of FFMA systems lies in their unique processing order that exchanges the conventional sequence of channel coding and multiplexing operations, effectively solving the multiuser finite-blocklength (FBL) problem while enhancing error performance. In this…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: 13 pages 8 figures

  12. Event-based multi-view photogrammetry for high-dynamic, high-velocity target measurement

    Authors: Taihang Lei, Banglei Guan, Minzu Liang, Xiangyu Li, Jianbing Liu, Jing Tao, Yang Shang, Qifeng Yu

    Abstract: The characterization of mechanical properties for high-dynamic, high-velocity target motion is essential in industries. It provides crucial data for validating weapon systems, precision manufacturing processes, etc. However, existing measurement methods face challenges such as limited dynamic range, discontinuous observations, and high costs. This paper presents a new approach leveraging an even…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: 9 pages, 9 figures, 1 table. This paper was accepted by Acta Mechanica Sinica (Date: 30 May 2025)

  13. arXiv:2506.00541  [pdf, ps, other]

    cs.CV

    3D Trajectory Reconstruction of Moving Points Based on Asynchronous Cameras

    Authors: Huayu Huang, Banglei Guan, Yang Shang, Qifeng Yu

    Abstract: Photomechanics is a crucial branch of solid mechanics. The localization of point targets constitutes a fundamental problem in optical experimental mechanics, with extensive applications in various missions of UAVs. Localizing moving targets is crucial for analyzing their motion characteristics and dynamic properties. Reconstructing the trajectories of points from asynchronous cameras is a signific…

    Submitted 2 June, 2025; v1 submitted 31 May, 2025; originally announced June 2025.

    Comments: This paper has been accepted by Acta Mechanica Sinica

  14. arXiv:2505.24567  [pdf, ps, other]

    cs.CV

    Unleashing the Power of Intermediate Domains for Mixed Domain Semi-Supervised Medical Image Segmentation

    Authors: Qinghe Ma, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

    Abstract: Both limited annotation and domain shift are prevalent challenges in medical image segmentation. Traditional semi-supervised segmentation and unsupervised domain adaptation methods address one of these issues separately. However, the coexistence of limited annotation and domain shift is quite common, which motivates us to introduce a novel and challenging scenario: Mixed Domain Semi-supervised med…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE TMI 2025. arXiv admin note: text overlap with arXiv:2404.08951

  15. arXiv:2505.24499  [pdf, ps, other]

    cs.CV

    Reason-SVG: Hybrid Reward RL for Aha-Moments in Vector Graphics Generation

    Authors: Ximing Xing, Yandong Guan, Jing Zhang, Dong Xu, Qian Yu

    Abstract: Generating high-quality Scalable Vector Graphics (SVGs) is challenging for Large Language Models (LLMs), as it requires advanced reasoning for structural validity, semantic faithfulness, and visual coherence -- capabilities in which current LLMs often fall short. In this work, we introduce Reason-SVG, a novel framework designed to enhance LLM reasoning for SVG generation. Reason-SVG pioneers the "…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 17 pages, 5 figures

  16. arXiv:2505.22661  [pdf, ps, other]

    cs.CL

    GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning

    Authors: Qingchen Yu, Zifan Zheng, Ding Chen, Simin Niu, Bo Tang, Feiyu Xiong, Zhiyu Li

    Abstract: The evaluation of large language models (LLMs) has traditionally relied on static benchmarks, a paradigm that poses two major limitations: (1) predefined test sets lack adaptability to diverse application domains, and (2) standardized evaluation protocols often fail to capture fine-grained assessments of domain-specific knowledge and contextual reasoning abilities. To overcome these challenges, we…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL 2025

  17. arXiv:2505.22461  [pdf, ps, other]

    cs.CV

    SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels

    Authors: Qiucheng Yu, Yuan Xie, Xin Tan

    Abstract: 3D occupancy prediction has attracted much attention in the field of autonomous driving due to its powerful geometric perception and object recognition capabilities. However, existing methods have not explored the most essential distribution patterns of voxels, resulting in unsatisfactory results. This paper first explores the inter-class distribution and geometric distribution of voxels, thereby…

    Submitted 29 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  18. arXiv:2505.22159  [pdf, ps, other]

    cs.RO cs.CV

    ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation

    Authors: Jiawen Yu, Hairuo Liu, Qiaojun Yu, Jieji Ren, Ce Hao, Haitong Ding, Guangyu Huang, Guofan Huang, Yan Song, Panpan Cai, Cewu Lu, Wenqiang Zhang

    Abstract: Vision-Language-Action (VLA) models have advanced general-purpose robotic manipulation by leveraging pretrained visual and linguistic representations. However, they struggle with contact-rich tasks that require fine-grained control involving force, especially under visual occlusion or dynamic uncertainty. To address these limitations, we propose ForceVLA, a novel end-to-end manipulation f…

    Submitted 28 May, 2025; originally announced May 2025.

  19. arXiv:2505.22101  [pdf, other]

    cs.CL

    MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

    Authors: Zhiyu Li, Shichao Song, Hanyu Wang, Simin Niu, Ding Chen, Jiawei Yang, Chenyang Xi, Huayi Lai, Jihao Zhao, Yezhaohui Wang, Junpeng Ren, Zehao Lin, Jiahao Huo, Tianyi Chen, Kai Chen, Kehang Li, Zhiqiang Yin, Qingchen Yu, Bo Tang, Hongkang Yang, Zhi-Qin John Xu, Feiyu Xiong

    Abstract: Large Language Models (LLMs) have emerged as foundational infrastructure in the pursuit of Artificial General Intelligence (AGI). Despite their remarkable capabilities in language perception and generation, current LLMs fundamentally lack a unified and structured architecture for handling memory. They primarily rely on parametric memory (knowledge encoded in model weights) and ephemeral activation…

    Submitted 28 May, 2025; originally announced May 2025.

  20. arXiv:2505.21366  [pdf, ps, other]

    cs.LG

    PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment

    Authors: Qi Yu, Zhichen Zeng, Yuchen Yan, Zhining Liu, Baoyu Jing, Ruizhong Qiu, Ariful Azad, Hanghang Tong

    Abstract: Network alignment (NA) aims to identify node correspondence across different networks and serves as a critical cornerstone behind various downstream multi-network learning tasks. Despite growing research in NA, a comprehensive library that facilitates the systematic development and benchmarking of NA methods is still lacking. In this work, we introduce PLANETALIGN, a comprehensive Python library for ne…

    Submitted 27 May, 2025; originally announced May 2025.

  21. arXiv:2505.20809  [pdf, ps, other]

    cs.CL

    Improved Representation Steering for Language Models

    Authors: Zhengxuan Wu, Qinan Yu, Aryaman Arora, Christopher D. Manning, Christopher Potts

    Abstract: Steering methods for language models (LMs) seek to provide fine-grained and interpretable control over model generations by variously changing model inputs, weights, or representations to adjust behavior. Recent work has shown that adjusting weights or representations is often less effective than steering by prompting, for instance when wanting to introduce or suppress a particular concept. We dem…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 46 pages, 23 figures, preprint

  22. arXiv:2505.19914  [pdf, ps, other]

    cs.CL cs.AI

    Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

    Authors: Jiangjie Chen, Qianyu He, Siyu Yuan, Aili Chen, Zhicheng Cai, Weinan Dai, Hongli Yu, Qiying Yu, Xuefeng Li, Jiaze Chen, Hao Zhou, Mingxuan Wang

    Abstract: Large Language Models (LLMs), such as OpenAI's o1 and DeepSeek's R1, excel at advanced reasoning tasks like math and coding via Reinforcement Learning with Verifiable Rewards (RLVR), but still struggle with puzzles solvable by humans without domain knowledge. We introduce Enigmata, the first comprehensive suite tailored for improving LLMs with puzzle reasoning skills. It includes 36 tasks across s…

    Submitted 9 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  23. arXiv:2505.19713  [pdf, ps, other]

    cs.GR

    CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward

    Authors: Yandong Guan, Xilin Wang, Xingxi Ming, Jing Zhang, Dong Xu, Qian Yu

    Abstract: In this work, we introduce CAD-Coder, a novel framework that reformulates text-to-CAD as the generation of CadQuery scripts - a Python-based, parametric CAD language. This representation enables direct geometric validation, a richer modeling vocabulary, and seamless integration with existing LLMs. To further enhance code validity and geometric fidelity, we propose a two-stage learning pipeline: (1…

    Submitted 30 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  24. arXiv:2505.19492  [pdf, ps, other]

    cs.CV

    ViewCraft3D: High-Fidelity and View-Consistent 3D Vector Graphics Synthesis

    Authors: Chuang Wang, Haitao Zhou, Ling Luo, Qian Yu

    Abstract: 3D vector graphics play a crucial role in various applications including 3D shape retrieval, conceptual design, and virtual reality interactions due to their ability to capture essential structural information with minimal representation. While recent approaches have shown promise in generating 3D vector graphics, they often suffer from lengthy processing times and struggle to maintain view consis…

    Submitted 26 May, 2025; originally announced May 2025.

  25. arXiv:2505.17629  [pdf, ps, other]

    cs.HC cs.AI

    TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments

    Authors: Yuheng Lu, Qian Yu, Hongru Wang, Zeming Liu, Wei Su, Yanping Liu, Yuhang Guo, Maocheng Liang, Yunhong Wang, Haifeng Wang

    Abstract: Graphical User Interface (GUI) agents, which autonomously operate on digital interfaces through natural language instructions, hold transformative potential for accessibility, automation, and user experience. A critical aspect of their functionality is grounding - the ability to map linguistic intents to visual and structural interface elements. However, existing GUI agents often struggle to adapt…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL 2025 Findings

  26. arXiv:2505.15380  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding

    Authors: Zijian Lin, Yang Zhang, Yougen Yuan, Yuming Yan, Jinjiang Liu, Zhiyong Wu, Pengfei Hu, Qun Yu

    Abstract: Modern autoregressive speech synthesis models leveraging language models have demonstrated remarkable performance. However, the sequential nature of next token prediction in these models leads to significant latency, hindering their deployment in scenarios where inference speed is critical. In this work, we propose Speech Speculative Decoding (SSD), a novel framework for autoregressive speech synt…

    Submitted 2 June, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted by INTERSPEECH 2025

  27. arXiv:2505.15091  [pdf, other]

    cs.IR cs.AI

    ThinkRec: Thinking-based recommendation via LLM

    Authors: Qihang Yu, Kairui Fu, Shengyu Zhang, Zheqi Lv, Fan Wu, Fei Wu

    Abstract: Recent advances in large language models (LLMs) have enabled more semantic-aware recommendations through natural language generation. Existing LLM for recommendation (LLM4Rec) methods mostly operate in a System 1-like manner, relying on superficial features to match similar items based on click history, rather than reasoning through deeper behavioral logic. This often leads to superficial and erro…

    Submitted 24 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  28. arXiv:2505.14687  [pdf, ps, other]

    cs.CV

    Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers

    Authors: Sucheng Ren, Qihang Yu, Ju He, Alan Yuille, Liang-Chieh Chen

    Abstract: Diffusion-based Transformers have demonstrated impressive generative capabilities, but their high computational costs hinder practical deployment; for example, generating an $8192\times 8192$ image can take over an hour on an A100 GPU. In this work, we propose GRAT (GRouping first, ATtending smartly), a training-free attention acceleration strategy for fast image and video genera…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Project website at oliverrensu.github.io/project/GRAT

  29. arXiv:2505.13073  [pdf, other]

    cs.SE cs.AI

    Structure-Aware Corpus Construction and User-Perception-Aligned Metrics for Large-Language-Model Code Completion

    Authors: Dengfeng Liu, Jucai Zhai, Xiaoguang Jiang, Ziqun Li, Qianjin Yu, Feng Liu, Rui Ye, Huang Liu, Zhiguo Yang, Yongsheng Du, Fang Tan

    Abstract: Code completion technology based on large language models has significantly improved the development efficiency of programmers. However, in practical applications, there remains a gap between current commonly used code completion evaluation metrics and users' actual perception. To address this issue, we propose two evaluation metrics for code completion tasks--LCP and ROUGE-LCP, from the perspectiv…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 14 pages, 8 figures

  30. arXiv:2505.10940  [pdf, ps, other]

    cs.IR cs.AI

    Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation

    Authors: Qing Yu, Xiaobei Wang, Shuchang Liu, Yandong Bai, Xiaoyu Yang, Xueliang Wang, Chang Meng, Shanshan Wu, Hailan Yang, Huihui Xiao, Xiang Li, Fan Yang, Xiaoqiang Feng, Lantao Hu, Han Li, Kun Gai, Lixin Zou

    Abstract: Recommender systems filter contents/items valuable to users by inferring preferences from user features and historical behaviors. Mainstream approaches follow the learning-to-rank paradigm, which focuses on discovering and modeling item topics (e.g., categories), and capturing user preferences on these topics based on historical interactions. However, this paradigm often neglects the modeling of use…

    Submitted 20 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  31. arXiv:2505.10218  [pdf, ps, other]

    cs.CL

    RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward

    Authors: Zongsheng Wang, Kaili Sun, Bowen Wu, Qun Yu, Ying Li, Baoxun Wang

    Abstract: Role-playing conversational agents (RPCAs) face persistent challenges in maintaining role consistency. To address this, we propose RAIDEN-R1, a novel reinforcement learning framework that integrates Verifiable Role-Awareness Reward (VRAR). The method introduces both singular and multi-term mining strategies to generate quantifiable rewards by assessing role-specific keys. Additionally, we construc…

    Submitted 15 May, 2025; originally announced May 2025.

  32. arXiv:2505.07896  [pdf, ps, other]

    q-bio.GN cs.AI

    Bridging Large Language Models and Single-Cell Transcriptomics in Dissecting Selective Motor Neuron Vulnerability

    Authors: Douglas Jiang, Zilin Dai, Luxuan Zhang, Qiyi Yu, Haoqi Sun, Feng Tian

    Abstract: Understanding cell identity and function through single-cell level sequencing data remains a key challenge in computational biology. We present a novel framework that leverages gene-specific textual annotations from the NCBI Gene database to generate biologically contextualized cell embeddings. For each cell in a single-cell RNA sequencing (scRNA-seq) dataset, we rank genes by expression level, re…

    Submitted 11 May, 2025; originally announced May 2025.

  33. arXiv:2505.07198  [pdf, other]

    cs.CV

    Ranking-aware Continual Learning for LiDAR Place Recognition

    Authors: Xufei Wang, Gengxuan Tian, Junqiao Zhao, Siyue Tao, Qiwen Gu, Qiankun Yu, Tiantian Feng

    Abstract: Place recognition plays a significant role in SLAM, robot navigation, and autonomous driving applications. Benefiting from deep learning, the performance of LiDAR place recognition (LPR) has been greatly improved. However, many existing learning-based LPR methods suffer from catastrophic forgetting, which severely harms the performance of LPR on previously trained places after training on a new en…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 8 pages, 4 figures

  34. arXiv:2505.02761  [pdf, other]

    cs.DC

    Optimistic, Signature-Free Reliable Broadcast and Its Applications

    Authors: Nibesh Shrestha, Qianyu Yu, Aniket Kate, Giuliano Losa, Kartik Nayak, Xuechao Wang

    Abstract: Reliable broadcast (RBC) is a key primitive in fault-tolerant distributed systems, and improving its efficiency can benefit a wide range of applications. This work focuses on signature-free RBC protocols, which are particularly attractive due to their computational efficiency. Existing protocols in this setting incur an optimal 3 steps to reach a decision while tolerating up to $f < n/3$ Byzantine…

    Submitted 5 May, 2025; originally announced May 2025.

  35. arXiv:2504.21855  [pdf, other]

    cs.CV

    ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction

    Authors: Qihao Liu, Ju He, Qihang Yu, Liang-Chieh Chen, Alan Yuille

    Abstract: In recent years, video generation has seen significant advancements. However, challenges still persist in generating complex motions and interactions. To address these challenges, we introduce ReVision, a plug-and-play framework that explicitly integrates parameterized 3D physical knowledge into a pretrained conditional video generation model, significantly enhancing its ability to generate high-q…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: Project Page: https://revision-video.github.io/

  36. arXiv:2504.21294  [pdf, other]

    cs.CV

    Learning Multi-view Multi-class Anomaly Detection

    Authors: Qianzi Yu, Yang Cao, Yu Kang

    Abstract: The latest trend in anomaly detection is to train a unified model instead of training a separate model for each category. However, existing multi-class anomaly detection (MCAD) models perform poorly in multi-view scenarios because they often fail to effectively model the relationships and complementary information among different views. In this paper, we introduce a Multi-View Multi-Class Anomaly…

    Submitted 29 April, 2025; originally announced April 2025.

  37. arXiv:2504.20103  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Heterogeneous network drug-target interaction prediction model based on graph wavelet transform and multi-level contrastive learning

    Authors: Wenfeng Dai, Yanhong Wang, Shuai Yan, Qingzhi Yu, Xiang Cheng

    Abstract: Drug-target interaction (DTI) prediction is a core task in drug development and precision medicine in the biomedical field. However, traditional machine learning methods generally have the black box problem, which makes it difficult to reveal the deep correlation between the model decision mechanism and the interaction pattern between biological molecules. This study proposes a heterogeneous netwo…

    Submitted 27 April, 2025; originally announced April 2025.

  38. arXiv:2504.20102  [pdf, other]

    cs.LG cs.AI q-bio.QM

    HyboWaveNet: Hyperbolic Graph Neural Networks with Multi-Scale Wavelet Transform for Protein-Protein Interaction Prediction

    Authors: Qingzhi Yu, Shuai Yan, Wenfeng Dai, Xiang Cheng

    Abstract: Protein-protein interactions (PPIs) are fundamental for deciphering cellular functions, disease pathways, and drug discovery. Although existing neural networks and machine learning methods have achieved high accuracy in PPI prediction, their black-box nature leads to a lack of causal interpretation of the prediction results and difficulty in capturing hierarchical geometries and multi-scale dynamic in…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 9 pages

  39. arXiv:2504.15928  [pdf, other]

    cs.CV cs.AI

    A Clinician-Friendly Platform for Ophthalmic Image Analysis Without Technical Barriers

    Authors: Meng Wang, Tian Lin, Qingshan Hou, Aidi Lin, Jingcheng Wang, Qingsheng Peng, Truong X. Nguyen, Danqi Fang, Ke Zou, Ting Xu, Cancan Xue, Ten Cheer Quek, Qinkai Yu, Minxin Liu, Hui Zhou, Zixuan Xiao, Guiqin He, Huiyu Liang, Tingkun Shi, Man Chen, Linna Liu, Yuanyuan Peng, Lianyu Wang, Qiuming Hu, Junhong Chen, et al. (15 additional authors not shown)

    Abstract: Artificial intelligence (AI) shows remarkable potential in medical imaging diagnostics, yet most current models require retraining when applied across different clinical settings, limiting their scalability. We introduce GlobeReady, a clinician-friendly AI platform that enables fundus disease diagnosis that operates without retraining, fine-tuning, or the need for technical expertise. GlobeReady…

    Submitted 23 May, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  40. arXiv:2504.15823  [pdf, other]

    cs.CV cs.AI

    Human-Imperceptible Physical Adversarial Attack for NIR Face Recognition Models

    Authors: Songyan Xie, Jinghang Wen, Encheng Su, Qiucheng Yu

    Abstract: Near-infrared (NIR) face recognition systems, which can operate effectively in low-light conditions or in the presence of makeup, exhibit vulnerabilities when subjected to physical adversarial attacks. To further demonstrate the potential risks in real-world applications, we design a novel, stealthy, and practical adversarial patch to attack NIR face recognition systems in a black-box setting. We…

    Submitted 22 April, 2025; originally announced April 2025.

  41. arXiv:2504.14838  [pdf, other]

    cs.AI

    Establishing Reliability Metrics for Reward Models in Large Language Models

    Authors: Yizhou Chen, Yawen Liu, Xuesi Wang, Qingtao Yu, Guangda Huzhang, Anxiang Zeng, Han Yu, Zhiming Zhou

    Abstract: The reward model (RM) that represents human preferences plays a crucial role in optimizing the outputs of large language models (LLMs), e.g., through reinforcement learning from human feedback (RLHF) or rejection sampling. However, a long-standing challenge for RM is its uncertain reliability, i.e., LLM outputs with higher rewards may not align with actual human preferences. Currently, there is a lack of a…

    Submitted 20 April, 2025; originally announced April 2025.

  42. arXiv:2504.14704  [pdf, other]

    cs.LG cs.AI eess.SY

    Can We Ignore Labels In Out of Distribution Detection?

    Authors: Hong Yang, Qi Yu, Travis Desel

    Abstract: Out-of-distribution (OOD) detection methods have recently become more prominent, serving as a core element in safety-critical autonomous systems. One major purpose of OOD detection is to reject invalid inputs that could lead to unpredictable errors and compromise safety. Due to the cost of labeled data, recent works have investigated the feasibility of self-supervised learning (SSL) OOD detection,…

    Submitted 20 April, 2025; originally announced April 2025.

  43. arXiv:2504.13914  [pdf, other]

    cs.CL

    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

    Authors: ByteDance Seed, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, et al. (249 additional authors not shown)

    Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…

    Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  44. arXiv:2504.13592  [pdf, ps, other]

    cs.CL

    Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling

    Authors: Zihao Feng, Xiaoxue Wang, Ziwei Bai, Donghang Su, Bowen Wu, Qun Yu, Baoxun Wang

    Abstract: Intent detection, a critical component in task-oriented dialogue (TOD) systems, faces significant challenges in adapting to the rapid influx of integrable tools with complex interrelationships. Existing approaches, such as zero-shot reformulations and LLM-based dynamic recognition, struggle with performance degradation when encountering unseen intents, leading to erroneous task routing. To enhance…

    Submitted 20 April, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

  45. arXiv:2504.13059  [pdf, other]

    cs.RO cs.AI cs.CL

    RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins

    Authors: Yao Mu, Tianxing Chen, Zanxin Chen, Shijia Peng, Zhiqian Lan, Zeyu Gao, Zhixuan Liang, Qiaojun Yu, Yude Zou, Mingkun Xu, Lunkai Lin, Zhiqiang Xie, Mingyu Ding, Ping Luo

    Abstract: In the rapidly advancing field of robotics, dual-arm coordination and complex object manipulation are essential capabilities for developing advanced autonomous systems. However, the scarcity of diverse, high-quality demonstration data and real-world-aligned evaluation benchmarks severely limits such development. To address this, we introduce RoboTwin, a generative digital twin framework that uses…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: CVPR 2025 Highlight. 22 pages. Project page: https://robotwin-benchmark.github.io/

  46. arXiv:2504.11919  [pdf, other]

    cs.AI

    Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading

    Authors: Qianjin Yu, Keyu Wu, Zihan Chen, Chushu Zhang, Manlin Mei, Lingjun Huang, Fang Tan, Yongsheng Du, Kunlin Liu, Yurui Zhu

    Abstract: Recently, DeepSeek-R1 (671B) (DeepSeek-AI et al., 2025) has demonstrated its excellent reasoning ability in complex tasks and has publicly shared its methodology. This provides potentially high-quality chain-of-thought (CoT) data for stimulating the reasoning abilities of small-sized large language models (LLMs). To generate high-quality CoT data for different LLMs, we seek an efficient method for g…

    Submitted 16 April, 2025; originally announced April 2025.

  47. arXiv:2504.10481  [pdf, other]

    cs.CL

    xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

    Authors: Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, Zhiyu Li

    Abstract: With the release of the o1 model by OpenAI, reasoning models adopting slow thinking strategies have gradually emerged. As the responses generated by such models often include complex reasoning, intermediate steps, and self-reflection, existing evaluation methods are often inadequate. They struggle to determine whether the LLM output is truly equivalent to the reference answer, and also have diffic…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 32 pages

  48. arXiv:2504.06937  [pdf, ps, other]

    cs.IT

    Finite Field Multiple Access III: from 2-ary to p-ary

    Authors: Qi-yue Yu

    Abstract: This paper extends finite-field multiple-access (FFMA) techniques from binary to general $p$-ary source transmission. We introduce element-assemblage (EA) codes over GF($p^m$), which generalize element-pair (EP) codes, and define two specific types for ternary transmission: orthogonal EA codes and double codeword EA (D-CWEA) codes. We propose a unique sum-pattern mapping (USPM) constraint for the…

    Submitted 17 June, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: 50 pages, 5 figures

  49. arXiv:2504.06675  [pdf, other]

    cs.CV

    Probability Density Geodesics in Image Diffusion Latent Space

    Authors: Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang, Peter Henry Tu, Jing Zhang, Hongdong Li, Richard Hartley, Dylan Campbell

    Abstract: Diffusion models indirectly estimate the probability density over a data space, which can be used to study its structure. In this work, we show that geodesics can be computed in diffusion latent space, where the norm induced by the spatially-varying inner product is inversely proportional to the probability density. In this formulation, a path that traverses a high density (that is, probable) regi…

    Submitted 6 May, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: CVPR2025

  50. arXiv:2504.05118  [pdf, other]

    cs.AI

    VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

    Authors: Yu Yue, Yufeng Yuan, Qiying Yu, Xiaochen Zuo, Ruofei Zhu, Wenyuan Xu, Jiaze Chen, Chengyi Wang, TianTian Fan, Zhengyin Du, Xiangpeng Wei, Xiangyu Yu, Gaohong Liu, Juncai Liu, Lingjun Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Ru Zhang, Xin Liu, Mingxuan Wang, et al. (2 additional authors not shown)

    Abstract: We present VAPO (Value-based Augmented Proximal Policy Optimization), a novel framework tailored for reasoning models within the value-based paradigm. Benchmarked on the AIME 2024 dataset, VAPO, built on the Qwen 32B pre-trained model, attains a state-of-the-art score of $\mathbf{60.4}$. In direct comparison under identical experimental settings, VAPO outperforms the pr…

    Submitted 10 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.