Showing 1–50 of 686 results for author: Xu, Q

Searching in archive cs.
  1. arXiv:2507.04276  [pdf, ps, other]

    cs.AR

    FIXME: Towards End-to-End Benchmarking of LLM-Aided Design Verification

    Authors: Gwok-Waa Wan, Shengchu Su, Ruihu Wang, Qixiang Chen, Sam-Zaak Wong, Mengnv Xing, Hefei Feng, Yubo Wang, Yinan Zhu, Jingyi Zhang, Jianmin Ye, Xinlai Wan, Tao Ni, Qiang Xu, Nan Guan, Zhe Jiang, Xi Wang, Yang Jun

    Abstract: Despite the transformative potential of Large Language Models (LLMs) in hardware design, a comprehensive evaluation of their capabilities in design verification remains underexplored. Current efforts predominantly focus on RTL generation and basic debugging, overlooking the critical domain of functional verification, which is the primary bottleneck in modern design methodologies due to the rapid e…

    Submitted 6 July, 2025; originally announced July 2025.

  2. arXiv:2507.03255  [pdf, ps, other]

    cs.AR cs.AI

    ForgeHLS: A Large-Scale, Open-Source Dataset for High-Level Synthesis

    Authors: Zedong Peng, Zeju Li, Mingzhe Gao, Qiang Xu, Chen Zhang, Jieru Zhao

    Abstract: We introduce ForgeEDA, an open-source comprehensive circuit dataset across various categories. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, Post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development. We demonstrate ForgeEDA's utility by benchmarking state-of-the-art EDA algorithms on…

    Submitted 3 July, 2025; originally announced July 2025.

  3. arXiv:2507.02822  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model

    Authors: Wencheng Zhang, Shiqin Qiao, Lingjie Luo, Yinfeng Li, Chuanyang Zheng, Qian Xu, Meng Li, Yong Gui, Yijun He, Jianing Qiu, Jindong Hong, Jiankai Sun

    Abstract: With the widespread adoption of large language models (LLMs) in practical applications, selecting an appropriate model requires balancing not only performance but also operational cost. The emergence of reasoning-capable models has further widened the cost gap between "thinking" (high reasoning) and "non-thinking" (fast, low-cost) modes. In this work, we reveal that approximately 58% of medical qu…

    Submitted 3 July, 2025; originally announced July 2025.

  4. arXiv:2507.02705  [pdf, ps, other]

    cs.CV

    SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment

    Authors: Qi Xu, Dongxu Wei, Lingzhe Zhao, Wenpu Li, Zhangchi Huang, Shunping Ji, Peidong Liu

    Abstract: Simultaneous understanding and 3D reconstruction plays an important role in developing end-to-end embodied intelligent systems. To achieve this, recent approaches resort to 2D-to-3D feature alignment paradigm, which leads to limited 3D understanding capability and potential semantic information loss. In light of this, we propose SIU3R, the first alignment-free framework for generalizable simultane…

    Submitted 3 July, 2025; originally announced July 2025.

  5. arXiv:2507.02598  [pdf, ps, other]

    cs.AR cs.AI

    AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models

    Authors: Chenhao Xue, Kezhi Li, Jiaxing Zhang, Yi Ren, Zhengyuan Shi, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, Guangyu Sun

    Abstract: Arithmetic circuits, such as adders and multipliers, are fundamental components of digital systems, directly impacting the performance, power efficiency, and area footprint. However, optimizing these circuits remains challenging due to the vast design space and complex physical constraints. While recent deep learning-based approaches have shown promise, they struggle to consistently explore high-p…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 8 pages, 12 figures

  6. arXiv:2507.02135  [pdf, ps, other]

    cs.OS cs.CL

    Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency

    Authors: Zongpu Zhang, Pranab Dash, Y. Charlie Hu, Qiang Xu, Jian Li, Haibing Guan

    Abstract: Large Language Models (LLMs) are increasingly being integrated into various applications and services running on billions of mobile devices. However, deploying LLMs on resource-limited mobile devices faces a significant challenge due to their high demand for computation, memory, and ultimately energy. While current LLM frameworks for mobile use three power-hungry components-CPU, GPU, and Memory-ev…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: equal contribution between Zhang and Dash

  7. arXiv:2507.01735  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

    Authors: Kai Chen, Ruiyuan Gao, Lanqing Hong, Hang Xu, Xu Jia, Holger Caesar, Dengxin Dai, Bingbing Liu, Dzmitry Tsishkou, Songcen Xu, Chunjing Xu, Qiang Xu, Huchuan Lu, Dit-Yan Yeung

    Abstract: In this paper, we present details of the 1st W-CODA workshop, held in conjunction with the ECCV 2024. W-CODA aims to explore next-generation solutions for autonomous driving corner cases, empowered by state-of-the-art multimodal perception and comprehension techniques. 5 Speakers from both academia and industry are invited to share their latest progress and opinions. We collect research papers and…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: ECCV 2024. Workshop page: https://coda-dataset.github.io/w-coda2024/

  8. arXiv:2507.00810  [pdf, ps, other]

    cs.AI math.OC

    A Robust Algorithm for Non-IID Machine Learning Problems with Convergence Analysis

    Authors: Qing Xu, Xiaohua Xuan

    Abstract: In this paper, we propose an improved numerical algorithm for solving minimax problems based on nonsmooth optimization, quadratic programming and iterative process. We also provide a rigorous proof of convergence for our algorithm under some mild assumptions, such as gradient continuity and boundedness. Such an algorithm can be widely applied in various fields such as robust optimization, imbalanc…

    Submitted 1 July, 2025; originally announced July 2025.

  9. arXiv:2507.00642  [pdf, ps, other]

    cs.AR

    ChatHLS: Towards Systematic Design Automation and Optimization for High-Level Synthesis

    Authors: Runkai Li, Jia Xiong, Xiuyuan He, Jieru Zhao, Qiang Xu, Xi Wang

    Abstract: The increasing complexity of computational demands has accelerated the adoption of domain-specific accelerators, yet traditional hardware design methodologies remain constrained by prolonged development and verification cycles. High-Level Synthesis (HLS) bridges the gap between software and hardware by enabling hardware design from high-level programming languages. However, its widespread adoption…

    Submitted 1 July, 2025; originally announced July 2025.

  10. arXiv:2506.23999  [pdf, ps, other]

    cs.RO

    Predictive Risk Analysis and Safe Trajectory Planning for Intelligent and Connected Vehicles

    Authors: Zeyu Han, Mengchi Cai, Chaoyi Chen, Qingwen Meng, Guangwei Wang, Ying Liu, Qing Xu, Jianqiang Wang, Keqiang Li

    Abstract: The safe trajectory planning of intelligent and connected vehicles is a key component in autonomous driving technology. Modeling the environment risk information by field is a promising and effective approach for safe trajectory planning. However, existing risk assessment theories only analyze the risk by current information, ignoring future prediction. This paper proposes a predictive risk analys…

    Submitted 30 June, 2025; originally announced June 2025.

  11. arXiv:2506.23676  [pdf, ps, other]

    cs.CV

    A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement

    Authors: Gaozheng Pei, Ke Ma, Dongpeng Zhang, Chengzhi Sun, Qianqian Xu, Qingming Huang

    Abstract: Due to their powerful image generation capabilities, diffusion-based adversarial example generation methods through image editing are rapidly gaining popularity. However, due to reliance on the discriminative capability of the diffusion model, these diffusion-based methods often struggle to generalize beyond conventional image classification tasks, such as in Deepfake detection. Moreover, traditio…

    Submitted 30 June, 2025; originally announced June 2025.

  12. arXiv:2506.19384  [pdf, ps, other]

    cs.LG eess.SP physics.comp-ph

    Deep Electromagnetic Structure Design Under Limited Evaluation Budgets

    Authors: Shijian Zheng, Fangxiao Jin, Shuhai Zhang, Quan Xue, Mingkui Tan

    Abstract: Electromagnetic structure (EMS) design plays a critical role in developing advanced antennas and materials, but remains challenging due to high-dimensional design spaces and expensive evaluations. While existing methods commonly employ high-quality predictors or generators to alleviate evaluations, they are often data-intensive and struggle with real-world scale and budget constraints. To address…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: ICML 2025 (accepted)

  13. arXiv:2506.17629  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning

    Authors: Kailing Li, Qi'ao Xu, Tianwen Qian, Yuqian Fu, Yang Jiao, Xiaoling Wang

    Abstract: Embodied Visual Reasoning (EVR) seeks to follow complex, free-form instructions based on egocentric video, enabling semantic understanding and spatiotemporal reasoning in dynamic environments. Despite its promising potential, EVR encounters significant challenges stemming from the diversity of complex instructions and the intricate spatiotemporal dynamics in long-term egocentric videos. Prior solu…

    Submitted 21 June, 2025; originally announced June 2025.

  14. arXiv:2506.17159  [pdf, ps, other]

    cs.CV

    Co-Seg++: Mutual Prompt-Guided Collaborative Learning for Versatile Medical Segmentation

    Authors: Qing Xu, Yuxiang Luo, Wenting Duan, Zhen Chen

    Abstract: Medical image analysis is critical yet challenged by the need of jointly segmenting organs or tissues, and numerous instances for anatomical structures and tumor microenvironment analysis. Existing studies typically formulated different segmentation tasks in isolation, which overlooks the fundamental interdependencies between these tasks, leading to suboptimal segmentation performance and insuffic…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Under Review

  15. arXiv:2506.15896  [pdf]

    cs.LG cs.AI

    KG-FGNN: Knowledge-guided GNN Foundation Model for Fertilisation-oriented Soil GHG Flux Prediction

    Authors: Yu Zhang, Gaoshan Bi, Simon Jeffery, Max Davis, Yang Li, Qing Xue, Po Yang

    Abstract: Precision soil greenhouse gas (GHG) flux prediction is essential in agricultural systems for assessing environmental impacts, developing emission mitigation strategies and promoting sustainable agriculture. Due to the lack of advanced sensor and network technologies on majority of farms, there are challenges in obtaining comprehensive and diverse agricultural data. As a result, the scarcity of agr…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 8 pages, 4 figures

  16. arXiv:2506.15697  [pdf, ps, other]

    cs.AR cs.CL cs.LG

    DeepRTL2: A Versatile Model for RTL-Related Tasks

    Authors: Yi Liu, Hongji Zhang, Yunhao Zhou, Zhengyuan Shi, Changran Xu, Qiang Xu

    Abstract: The integration of large language models (LLMs) into electronic design automation (EDA) has significantly advanced the field, offering transformative benefits, particularly in register transfer level (RTL) code generation and understanding. While previous studies have demonstrated the efficacy of fine-tuning LLMs for these generation-based tasks, embedding-based tasks, which are equally critical t…

    Submitted 28 May, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings

  17. arXiv:2506.15524  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Image Shadow Removal Challenge Report

    Authors: Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailian Chen, Zongwei Wu, Radu Timofte, Mingjia Li, Jin Hu, Hainuo Wang, Hengxing Liu, Jiarui Wang, Qiming Hu, Xiaojie Guo, Xin Lu, Jiarong Yang, Yuanfei Bao, Anya Hu, Zihao Fan, Kunyu Wang, Jie Xiao, Xi Wang, Xueyang Fu, Zheng-Jun Zha, Yu-Fan Lin, Chia-Ming Lee , et al. (57 additional authors not shown)

    Abstract: This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were e…

    Submitted 18 June, 2025; originally announced June 2025.

  18. arXiv:2506.14707  [pdf, ps, other]

    cs.DB

    HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search

    Authors: Qian Xu, Feng Zhang, Chengxi Li, Lei Cao, Zheng Chen, Jidong Zhai, Xiaoyong Du

    Abstract: Approximate Nearest Neighbor Search (ANNS) is essential for various data-intensive applications, including recommendation systems, image retrieval, and machine learning. Scaling ANNS to handle billions of high-dimensional vectors on a single machine presents significant challenges in memory capacity and processing efficiency. To address these challenges, distributed vector databases leverage multi…

    Submitted 17 June, 2025; originally announced June 2025.

  19. arXiv:2506.14265  [pdf, ps, other]

    cs.CV

    Exploring Non-contrastive Self-supervised Representation Learning for Image-based Profiling

    Authors: Siran Dai, Qianqian Xu, Peisong Wen, Yang Liu, Qingming Huang

    Abstract: Image-based cell profiling aims to create informative representations of cell images. This technique is critical in drug discovery and has greatly advanced with recent improvements in computer vision. Inspired by recent developments in non-contrastive Self-Supervised Learning (SSL), this paper provides an initial exploration into training a generalizable feature extractor for cell images using suc…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 Computer Vision for Drug Discovery

  20. arXiv:2506.13723  [pdf, ps, other]

    cs.CV

    OTFusion: Bridging Vision-only and Vision-Language Models via Optimal Transport for Transductive Zero-Shot Learning

    Authors: Qiyu Xu, Wenyang Chen, Zhanxuan Hu, Huafeng Li, Yonghang Tai

    Abstract: Transductive zero-shot learning (ZSL) aims to classify unseen categories by leveraging both semantic class descriptions and the distribution of unlabeled test data. While Vision-Language Models (VLMs) such as CLIP excel at aligning visual inputs with textual semantics, they often rely too heavily on class-level priors and fail to capture fine-grained visual cues. In contrast, Vision-only Foundatio…

    Submitted 16 June, 2025; originally announced June 2025.

  21. arXiv:2506.13585  [pdf, ps, other]

    cs.CL cs.LG

    MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

    Authors: MiniMax, Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, Haigang Zhou, et al. (103 additional authors not shown)

    Abstract: We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-source our MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1

  22. arXiv:2506.12430  [pdf, ps, other]

    cs.CR cs.CV

    Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025

    Authors: Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, Jiwei Wei, Shiyuan He, Yang Yang, Xiaohai Xu, Ke Ma, Qianqian Xu, Qingming Huang, Shi Lin, Xun Wang, Changting Lin, Meng Han, Yilei Jiang, Siqi Lai, Yaozhi Zheng, Yifei Song , et al. (22 additional authors not shown)

    Abstract: Multimodal Large Language Models (MLLMs) have enabled transformative advancements across diverse applications but remain susceptible to safety threats, especially jailbreak attacks that induce harmful outputs. To systematically evaluate and improve their safety, we organized the Adversarial Testing & Large-model Alignment Safety Grand Challenge (ATLAS) 2025. This technical report presents finding…

    Submitted 14 June, 2025; originally announced June 2025.

  23. arXiv:2506.11080  [pdf, ps, other]

    cs.CL

    MANBench: Is Your Multimodal Model Smarter than Human?

    Authors: Han Zhou, Qitong Xu, Yiheng Dong, Xin Yang

    Abstract: The rapid advancement of Multimodal Large Language Models (MLLMs) has ignited discussions regarding their potential to surpass human performance in multimodal tasks. In response, we introduce MANBench (Multimodal Ability Norms Benchmark), a bilingual benchmark (English and Chinese) comprising 1,314 questions across nine tasks, spanning knowledge-based and non-knowledge-based domains. MANBench emph…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Multimodal Benchmark, Project Url: https://github.com/micdz/MANBench, ACL2025 Findings

  24. arXiv:2506.11070  [pdf, ps, other]

    cs.CL

    Targeted control of fast prototyping through domain-specific interface

    Authors: Yu-Zhe Shi, Mingchen Liu, Hanlu Ma, Qiao Xu, Huamin Qu, Kun He, Lecheng Ruan, Qining Wang

    Abstract: Industrial designers have long sought a natural and intuitive way to achieve the targeted control of prototype models -- using simple natural language instructions to configure and adjust the models seamlessly according to their intentions, without relying on complex modeling commands. While Large Language Models have shown promise in this area, their potential for controlling prototype models thr…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: In International Conference on Machine Learning (ICML'25)

  25. arXiv:2506.09344  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG cs.SD eess.AS

    Ming-Omni: A Unified Multimodal Model for Perception and Generation

    Authors: Inclusion AI, Biao Gong, Cheng Zou, Chuanyang Zheng, Chunluan Zhou, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Dandan Zheng, Fudong Wang, Furong Xu, GuangMing Yao, Jun Zhou, Jingdong Chen, Jianxin Sun, Jiajia Liu, Jianjiang Zhu, Jun Peng, Kaixiang Ji, Kaiyou Song, Kaimeng Ren, Libin Wang, Lixiang Ru, Lele Xie, Longhua Tan , et al. (33 additional authors not shown)

    Abstract: We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 18 pages, 8 figures

  26. arXiv:2506.08507  [pdf, ps, other]

    cs.MA cs.AI cs.LG

    MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

    Authors: Kuo Yang, Xingjie Yang, Linhui Yu, Qing Xu, Yan Fang, Xu Wang, Zhengyang Zhou, Yang Wang

    Abstract: Large Language Model (LLM)-driven Multi-agent systems (Mas) have recently emerged as a powerful paradigm for tackling complex real-world tasks. However, existing Mas construction methods typically rely on manually crafted interaction mechanisms or heuristic rules, introducing human biases and constraining the autonomous ability. Even with recent advances in adaptive Mas construction, existing syst…

    Submitted 12 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  27. arXiv:2506.07611  [pdf, ps, other]

    cs.CV

    DragNeXt: Rethinking Drag-Based Image Editing

    Authors: Yuan Zhou, Junbao Zhou, Qingshan Xu, Kesen Zhao, Yuxuan Wang, Hao Fei, Richang Hong, Hanwang Zhang

    Abstract: Drag-Based Image Editing (DBIE), which allows users to manipulate images by directly dragging objects within them, has recently attracted much attention from the community. However, it faces two key challenges: (i) point-based drag is often highly ambiguous and difficult to align with users' intentions; (ii) current DBIE methods primarily rel…

    Submitted 9 June, 2025; originally announced June 2025.

  28. arXiv:2506.07576  [pdf, other]

    cs.CV

    Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding

    Authors: Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang

    Abstract: Video understanding has been considered as one critical step towards world modeling, which is an important long-term problem in AI research. Recently, multi-modal foundation models have shown such potential via large-scale pretraining. However, these models simply align encoders of different modalities via contrastive learning, while lacking deeper multi-modal interactions, which is critical for u…

    Submitted 9 June, 2025; originally announced June 2025.

  29. arXiv:2506.07047  [pdf, ps, other]

    cs.AI

    Mathesis: Towards Formal Theorem Proving from Natural Languages

    Authors: Yu Xuejun, Jianyuan Zhong, Zijin Feng, Pengyi Zhai, Roozbeh Yousefzadeh, Wei Chong Ng, Haoxiong Liu, Ziyi Shou, Jing Xiong, Yudong Zhou, Claudia Beth Ong, Austen Jeremy Sugiarto, Yaoxi Zhang, Wai Ming Tai, Huan Cao, Dongcai Lu, Jiacheng Sun, Qiang Xu, Shen Xin, Zhenguo Li

    Abstract: Recent advances in large language models show strong promise for formal reasoning. However, most LLM-based theorem provers have long been constrained by the need for expert-written formal statements as inputs, limiting their applicability to real-world problems expressed in natural language. We tackle this gap with Mathesis, the first end-to-end theorem proving pipeline processing informal problem…

    Submitted 8 June, 2025; originally announced June 2025.

  30. arXiv:2506.06331  [pdf, other]

    cs.CL cs.AI cs.IR

    How Significant Are the Real Performance Gains? An Unbiased Evaluation Framework for GraphRAG

    Authors: Qiming Zeng, Xiao Yan, Hao Luo, Yuhao Lin, Yuxiang Wang, Fangcheng Fu, Bo Du, Quanqing Xu, Jiawei Jiang

    Abstract: By retrieving contexts from knowledge graphs, graph-based retrieval-augmented generation (GraphRAG) enhances large language models (LLMs) to generate quality answers for user questions. Many GraphRAG methods have been proposed and reported inspiring performance in answer quality. However, we observe that the current answer evaluation framework for GraphRAG has two critical flaws, i.e., unrelated q…

    Submitted 30 May, 2025; originally announced June 2025.

  31. arXiv:2506.05815  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces

    Authors: Pierluigi Zama Ramirez, Fabio Tosi, Luigi Di Stefano, Radu Timofte, Alex Costanzino, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Zhe Zhang, Yang Yang, Wu Chen, Anlong Ming, Mingshuai Zhao, Mengying Yu, Shida Gao, Xiangfeng Wang, Feng Xue, Jun Shi, Yong Yang, Yong A, Yixiang Jin, Dingzhe Li, Aryan Shukla, Liam Frija-Altarac, Matthew Toews , et al. (14 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 challenge on HR Depth From images of Specular and Transparent surfaces, held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2025. This challenge aims to advance the research on depth estimation, specifically to address two of the main open issues in the field: high-resolution and non-Lambertian surfaces. The cha…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: NTIRE Workshop Challenge Report, CVPR 2025

  32. arXiv:2506.04672  [pdf, other]

    cs.LG

    FedAPM: Federated Learning via ADMM with Partial Model Personalization

    Authors: Shengkun Zhu, Feiteng Nie, Jinshan Zeng, Sheng Wang, Yuan Sun, Yuan Yao, Shangfeng Chen, Quanqing Xu, Chuanhui Yang

    Abstract: In federated learning (FL), the assumption that datasets from different devices are independent and identically distributed (i.i.d.) often does not hold due to user differences, and the presence of various data modalities across clients makes using a single model impractical. Personalizing certain parts of the model can effectively address these issues by allowing those parts to differ across clie…

    Submitted 5 June, 2025; originally announced June 2025.

  33. arXiv:2506.03144  [pdf, ps, other]

    cs.CV cs.CL cs.MM

    MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query

    Authors: Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, Minghui Qiu, Xiangtai Li, Tianshu Yang, Siliang Tang, Juncheng Li

    Abstract: Semantic retrieval is crucial for modern applications yet remains underexplored in current research. Existing datasets are limited to single languages, single images, or singular retrieval conditions, often failing to fully exploit the expressive capacity of visual information as evidenced by maintained performance when images are replaced with captions. However, practical retrieval scenarios freq…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Preprint; Project Page, Code, and Dataset at: https://merit-2025.github.io/

  34. arXiv:2506.02954  [pdf, ps, other]

    cs.SE

    On Mutation-Guided Unit Test Generation

    Authors: Guancheng Wang, Qinghua Xu, Lionel C. Briand, Kui Liu

    Abstract: Unit tests play a vital role in uncovering potential faults in software. While tools like EvoSuite focus on maximizing code coverage, recent advances in large language models (LLMs) have shifted attention toward LLM-based test generation. However, code coverage metrics -- such as line and branch coverage -- remain overly emphasized in reported research, despite being weak indicators of a test suit…

    Submitted 12 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  35. arXiv:2506.02943  [pdf, ps, other]

    cs.SE

    Hallucination to Consensus: Multi-Agent LLMs for End-to-End Test Generation with Accurate Oracles

    Authors: Qinghua Xu, Guancheng Wang, Lionel Briand, Kui Liu

    Abstract: Unit testing plays a critical role in ensuring software correctness. However, writing unit tests manually is laborious, especially for strong typed languages like Java, motivating the need for automated approaches. Traditional methods primarily rely on search-based or randomized algorithms to generate tests that achieve high code coverage and produce regression oracles, which are derived from the…

    Submitted 15 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  36. arXiv:2506.01968  [pdf, ps, other]

    cs.LG cs.AI cs.NE

    Efficient ANN-SNN Conversion with Error Compensation Learning

    Authors: Chang Liu, Jiangrong Shen, Xuming Ran, Mingkun Xu, Qi Xu, Yi Xu, Gang Pan

    Abstract: Artificial neural networks (ANNs) have demonstrated outstanding performance in numerous tasks, but deployment in resource-constrained environments remains a challenge due to their high computational and memory requirements. Spiking neural networks (SNNs) operate through discrete spike events and offer superior energy efficiency, providing a bio-inspired alternative. However, current ANN-to-SNN con…

    Submitted 12 May, 2025; originally announced June 2025.

  37. arXiv:2505.24413  [pdf, ps, other]

    cs.LG stat.CO

    Multi-task Learning for Heterogeneous Multi-source Block-Wise Missing Data

    Authors: Yang Sui, Qi Xu, Yang Bai, Annie Qu

    Abstract: Multi-task learning (MTL) has emerged as an imperative machine learning tool to solve multiple learning tasks simultaneously and has been successfully applied to healthcare, marketing, and biomedical fields. However, in order to borrow information across different tasks effectively, it is essential to utilize both homogeneous and heterogeneous information. Among the extensive literature on MTL, va…

    Submitted 30 May, 2025; originally announced May 2025.

  38. arXiv:2505.24281  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Multi-task Learning for Heterogeneous Data via Integrating Shared and Task-Specific Encodings

    Authors: Yang Sui, Qi Xu, Yang Bai, Annie Qu

    Abstract: Multi-task learning (MTL) has become an essential machine learning tool for addressing multiple learning tasks simultaneously and has been effectively applied across fields such as healthcare, marketing, and biomedical research. However, to enable efficient information sharing across tasks, it is crucial to leverage both shared and heterogeneous information. Despite extensive research on MTL, vari…

    Submitted 30 May, 2025; originally announced May 2025.

  39. arXiv:2505.24164  [pdf, ps, other]

    cs.CL cs.CV

    Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

    Authors: Shilin Xu, Yanwei Li, Rui Yang, Tao Zhang, Yueyi Sun, Wei Chow, Linfeng Li, Hang Song, Qi Xu, Yunhai Tong, Xiangtai Li, Hao Fei

    Abstract: Recent works on large language models (LLMs) have successfully demonstrated the emergence of reasoning capabilities via reinforcement learning (RL). Although recent efforts leverage group relative policy optimization (GRPO) for MLLMs post-training, they constantly explore one specific aspect, such as grounding tasks, math problems, or chart analysis. There are no works that can leverage multi-sour…

    Submitted 29 May, 2025; originally announced May 2025.

    Report number: arxiv:2505.24164

  40. arXiv:2505.21988  [pdf, ps, other]

    cs.AI

    Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism

    Authors: Ziyang Zheng, Kezhi Li, Zhengyuan Shi, Qiang Xu

    Abstract: Subgraph matching in logic circuits is foundational for numerous Electronic Design Automation (EDA) applications, including datapath optimization, arithmetic verification, and hardware trojan detection. However, existing techniques rely primarily on structural graph isomorphism and thus fail to identify function-related subgraphs when synthesis transformations substantially alter circuit topology.…

    Submitted 28 May, 2025; originally announced May 2025.

  41. arXiv:2505.21522  [pdf, ps, other]

    cs.CV cs.AI cs.LG eess.IV

    CIM-NET: A Video Denoising Deep Neural Network Model Optimized for Computing-in-Memory Architectures

    Authors: Shan Gao, Zhiqiang Wu, Yawen Niu, Xiaotao Li, Qingqing Xu

    Abstract: While deep neural network (DNN)-based video denoising has demonstrated significant performance, deploying state-of-the-art models on edge devices remains challenging due to stringent real-time and energy efficiency requirements. Computing-in-Memory (CIM) chips offer a promising solution by integrating computation within memory cells, enabling rapid matrix-vector multiplication (MVM). However, exis…

    Submitted 22 May, 2025; originally announced May 2025.

  42. arXiv:2505.20429  [pdf, other]

    cs.CL cs.CV

    PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy

    Authors: Shuhao Guan, Moule Lin, Cheng Xu, Xinyi Liu, Jinman Zhao, Jiexin Fan, Qi Xu, Derek Greene

    Abstract: This paper introduces PreP-OCR, a two-stage pipeline that combines document image restoration with semantic-aware post-OCR correction to enhance both visual clarity and textual consistency, thereby improving text extraction from degraded historical documents. First, we synthesize document-image pairs from plaintext, rendering them with diverse fonts and layouts and then applying a randomly ordered…

    Submitted 28 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: ACL 2025 main

  43. arXiv:2505.19785  [pdf, other]

    cs.LG cs.AI

    MedDreamer: Model-Based Reinforcement Learning with Latent Imagination on Complex EHRs for Clinical Decision Support

    Authors: Qianyi Xu, Gousia Habib, Dilruk Perera, Mengling Feng

    Abstract: Timely and personalized treatment decisions are essential across a wide range of healthcare settings where patient responses vary significantly and evolve over time. Clinical data used to support these decisions are often irregularly sampled, sparse, and noisy. Existing decision support systems commonly rely on discretization and imputation, which can distort critical temporal dynamics and degrade…

    Submitted 26 May, 2025; originally announced May 2025.

  44. arXiv:2505.19347  [pdf, ps, other]

    cs.AI

    PatentMind: A Multi-Aspect Reasoning Graph for Patent Similarity Evaluation

    Authors: Yongmin Yoo, Qiongkai Xu, Longbing Cao

    Abstract: Patent similarity evaluation plays a critical role in intellectual property analysis. However, existing methods often overlook the intricate structure of patent documents, which integrate technical specifications, legal boundaries, and application contexts. We introduce PatentMind, a novel framework for patent similarity assessment based on a Multi-Aspect Reasoning Graph (MARG). PatentMind decompo…

    Submitted 25 May, 2025; originally announced May 2025.

  45. arXiv:2505.19345  [pdf, ps, other]

    cs.CL cs.AI

    PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims

    Authors: Yongmin Yoo, Qiongkai Xu, Longbing Cao

    Abstract: Natural language generation (NLG) metrics play a central role in evaluating generated texts, but are not well suited for the structural and legal characteristics of patent documents. Large language models (LLMs) offer strong potential in automating patent generation, yet research on evaluating LLM-generated patents remains limited, especially in evaluating the generation quality of patent claims,…

    Submitted 25 May, 2025; originally announced May 2025.

  46. arXiv:2505.17663  [pdf, ps, other]

    cs.CL cs.CY

    Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States

    Authors: Yang Xiao, Jiashuo Wang, Qiancheng Xu, Changhe Song, Chunpu Xu, Yi Cheng, Wenjie Li, Pengfei Liu

    Abstract: As Large Language Models (LLMs) increasingly participate in human-AI interactions, evaluating their Theory of Mind (ToM) capabilities - particularly their ability to track dynamic mental states - becomes crucial. While existing benchmarks assess basic ToM abilities, they predominantly focus on static snapshots of mental states, overlooking the temporal evolution that characterizes real-world socia…

    Submitted 8 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL 2025 Main Conference

  47. arXiv:2505.16314  [pdf, ps, other]

    cs.CV cs.AI

    NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

    Authors: Shuhao Han, Haotian Fan, Fangyuan Kong, Wenjie Liao, Chunle Guo, Chongyi Li, Radu Timofte, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Jianhui Sun, Xinli Yue, Tianyi Wang, Huan Hou, Junda Lu, Xinyang Huang, Zitang Zhou, Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao, et al. (90 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 challenge on Text to Image (T2I) generation model quality assessment, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. The aim of this challenge is to address the fine-grained quality assessment of text-to-image generation models. This challenge evaluates text-to-image models from two aspe…

    Submitted 22 May, 2025; originally announced May 2025.

  48. arXiv:2505.16162  [pdf, other]

    cs.CL

    KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization

    Authors: Mingbo Song, Heming Xia, Jun Zhang, Chak Tou Leong, Qiancheng Xu, Wenjie Li, Sujian Li

    Abstract: Speculative Decoding (SD) has emerged as a widely used paradigm to accelerate the inference of large language models (LLMs) without compromising generation quality. It works by efficiently drafting multiple tokens using a compact model and then verifying them in parallel using the target LLM. Notably, Self-Speculative Decoding proposes skipping certain layers to construct the draft model, which el…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 8 pages

  49. arXiv:2505.16027  [pdf]

    eess.IV cs.AI cs.CV

    Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets

    Authors: Qinmei Xu, Yiheng Li, Xianghao Zhan, Ahmet Gorkem Er, Brittany Dashevsky, Chuanjun Xu, Mohammed Alawad, Mengya Yang, Liu Ya, Changsheng Zhou, Xiao Li, Haruka Itakura, Olivier Gevaert

    Abstract: Foundation models leveraging vision-language pretraining have shown promise in chest X-ray (CXR) interpretation, yet their real-world performance across diverse populations and diagnostic tasks remains insufficiently evaluated. This study benchmarks the diagnostic performance and generalizability of foundation models versus traditional convolutional neural networks (CNNs) on multinational CXR data…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 78 pages, 7 figures, 2 tables

    MSC Class: I.2; ACM Class: I.2

  50. arXiv:2505.15322  [pdf, other]

    cs.CV

    CEBSNet: Change-Excited and Background-Suppressed Network with Temporal Dependency Modeling for Bitemporal Change Detection

    Authors: Qi'ao Xu, Yan Xing, Jiali Hu, Yunan Jia, Rui Huang

    Abstract: Change detection, a critical task in remote sensing and computer vision, aims to identify pixel-level differences between image pairs captured at the same geographic area but different times. It faces numerous challenges such as illumination variation, seasonal changes, background interference, and shooting angles, especially with a large time gap between images. While current methods have advance…

    Submitted 21 May, 2025; originally announced May 2025.