Showing 1–50 of 1,450 results for author: He, S

  1. arXiv:2506.07796  [pdf, ps, other]

    hep-th

    Bootstrapping form factor squared in ${\cal N}=4$ super-Yang-Mills

    Authors: Song He, Xiang Li, Jingwen Lin, Jiahao Liu, Kai Yan

    Abstract: We propose a bootstrap program for the {\it form factor squared} with operator ${\rm tr}(φ^2)$ in maximally supersymmetric Yang-Mills theory in the planar limit, which plays a central role for perturbative calculations of important physical observables such as energy correlators. The tree-level $N$-point form factor (FF) squared can be obtained by cutting $N$ propagators of a collection of two-poi…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 32 pages + appendix and refs, 2 tables, many figures and attached with ancillary files

  2. arXiv:2506.06970  [pdf, other]

    cs.CV

    Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

    Authors: Pengfei Zhao, Rongbo Luan, Wei Zhang, Peng Wu, Sifeng He

    Abstract: Despite Contrastive Language-Image Pretraining (CLIP)'s remarkable capability to retrieve content across modalities, a substantial modality gap persists in its feature space. Intriguingly, we discover that off-the-shelf MLLMs (Multimodal Large Language Models) demonstrate powerful inherent modality alignment properties. While recent MLLM-based retrievers with unified architectures partially mitiga…

    Submitted 7 June, 2025; originally announced June 2025.

  3. arXiv:2506.06825  [pdf, ps, other]

    cs.HC cs.CR

    Identity Deepfake Threats to Biometric Authentication Systems: Public and Expert Perspectives

    Authors: Shijing He, Yaxiong Lei, Zihan Zhang, Yuzhou Sun, Shujun Li, Chi Zhang, Juan Ye

    Abstract: Generative AI (Gen-AI) deepfakes pose a rapidly evolving threat to biometric authentication, yet a significant gap exists between expert understanding of these risks and public perception. This disconnection creates critical vulnerabilities in systems trusted by millions. To bridge this gap, we conducted a comprehensive mixed-method study, surveying 408 professionals across key sectors and conduct…

    Submitted 7 June, 2025; originally announced June 2025.

    MSC Class: 68T10; 68T45; 68M25 ACM Class: I.4.9; I.5.4; K.4.1; K.6.5

  4. arXiv:2506.06591  [pdf, ps, other]

    cs.CY cs.HC

    Privacy Perspectives and Practices of Chinese Smart Home Product Teams

    Authors: Shijing He, Yaxiong Lei, Xiao Zhan, Chi Zhang, Juan Ye, Ruba Abu-Salma, Jose Such

    Abstract: Previous research has explored the privacy needs and concerns of device owners, primary users, and different bystander groups with regard to smart home devices like security cameras, smart speakers, and hubs, but little is known about the privacy views and practices of smart home product teams, particularly those in non-Western contexts. This paper presents findings from 27 semi-structured intervi…

    Submitted 6 June, 2025; originally announced June 2025.

  5. arXiv:2506.04957  [pdf, ps, other]

    math.DG

    The asymptotics of the $\mathrm{SL}_2(\mathbb{C})$-Hitchin metric on the singular locus: subintegrable systems

    Authors: Siqi He, Johannes Horn, Nianzi Li

    Abstract: We study the asymptotic hyperkähler geometry of the $\mathrm{SL}_2(\mathbb{C})$-Hitchin moduli space over the singular fibers of the Hitchin fibration. We extend the previously known exponential convergence results for solutions to the Hitchin equation to the class of locally fiducial Higgs bundles defined by a special local description at the singularities of the spectral curve. This condition is…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 41 pages

    MSC Class: 53C26; 53C07

  6. arXiv:2506.03922  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

    Authors: Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, Siqi He, Shannan Yan, Junzhe Chen, Xiaomin He, Chaoya Jiang, Wei Ye, Kaidong Yu, Xuelong Li

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated significant potential to advance a broad range of domains. However, current benchmarks for evaluating MLLMs primarily emphasize general knowledge and vertical step-by-step reasoning typical of STEM disciplines, while overlooking the distinct needs and potential of the Humanities and Social Sciences (HSS). Tasks in the HSS domain require mo…

    Submitted 4 June, 2025; originally announced June 2025.

  7. arXiv:2506.03543  [pdf, ps, other]

    cs.AI cs.CY cs.MA

    CogniPair: From LLM Chatbots to Conscious AI Agents -- GNWT-Based Multi-Agent Digital Twins for Social Pairing -- Dating & Hiring Applications

    Authors: Wanghao Ye, Sihan Chen, Yiting Wang, Shwai He, Bowei Tian, Guoheng Sun, Ziyi Wang, Ziyao Wang, Yexiao He, Zheyu Shen, Meng Liu, Yuning Zhang, Meng Feng, Yang Wang, Siyuan Peng, Yilong Dai, Zhenle Duan, Hanzhang Qin, Ang Li

    Abstract: Current large language model (LLM) agents lack authentic human psychological processes necessary for genuine digital twins and social AI applications. To address this limitation, we present a computational implementation of Global Workspace Theory (GNWT) that integrates human cognitive architecture principles into LLM agents, creating specialized sub-agents for emotion, memory, social norms, plann…

    Submitted 3 June, 2025; originally announced June 2025.

  8. arXiv:2506.03530  [pdf, ps, other]

    cs.MM cs.CL cs.CV

    How Far Are We from Predicting Missing Modalities with Foundation Models?

    Authors: Guanzhou Ke, Yi Xie, Xiaoli Wang, Guoqing Chao, Bo Wang, Shengfeng He

    Abstract: Multimodal foundation models have demonstrated impressive capabilities across diverse tasks. However, their potential as plug-and-play solutions for missing modality prediction remains underexplored. To investigate this, we categorize existing approaches into three representative paradigms, encompassing a total of 42 model variants, and conduct a comprehensive evaluation in terms of prediction acc…

    Submitted 3 June, 2025; originally announced June 2025.

  9. arXiv:2506.03129  [pdf, other]

    hep-th gr-qc math-ph

    Self-Dual Electrodynamics via the Characteristic Method: Relativistic and Carrollian Perspectives

    Authors: Bin Chen, Song He, Jue Hou

    Abstract: Electric-magnetic duality plays a pivotal role in understanding the structure of nonlinear electrodynamics (NED). The Gaillard-Zumino (GZ) criterion provides a powerful constraint for identifying self-dual theories. In this work, we systematically explore solutions to the GZ self-duality condition by applying the method of characteristics, a robust tool for solving nonlinear partial differential e…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 26 pages, 1 figure

  10. arXiv:2506.02875  [pdf, ps, other]

    cs.CV

    NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results

    Authors: Xiaohong Liu, Xiongkuo Min, Qiang Hu, Xiaoyun Zhang, Jie Guo, Guangtao Zhai, Shushi Wang, Yingjie Zhou, Lu Liu, Jingxin Li, Liu Yang, Farong Wen, Li Xu, Yanwei Jiang, Xilei Zhu, Chunyi Li, Zicheng Zhang, Huiyu Duan, Xiele Wu, Yixuan Gao, Yuqin Cao, Jun Jia, Wei Sun, Jiezhang Cao, Radu Timofte, et al. (70 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 XGC Quality Assessment Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. This challenge is to address a major challenge in the field of video and talking head processing. The challenge is divided into three tracks, including user generated video, AI generated video and talking he…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: NTIRE 2025 XGC Quality Assessment Challenge Report. arXiv admin note: text overlap with arXiv:2404.16687

  11. arXiv:2506.02692  [pdf, ps, other]

    cs.CV

    Large-scale Self-supervised Video Foundation Model for Intelligent Surgery

    Authors: Shu Yang, Fengtao Zhou, Leon Mayer, Fuxiang Huang, Yiliang Chen, Yihui Wang, Sunan He, Yuxiang Nie, Xi Wang, Ömer Sümer, Yueming Jin, Huihui Sun, Shuchang Xu, Alex Qinyang Liu, Zheng Li, Jing Qin, Jeremy YuenChun Teoh, Lena Maier-Hein, Hao Chen

    Abstract: Computer-Assisted Intervention (CAI) has the potential to revolutionize modern surgery, with surgical scene understanding serving as a critical component in supporting decision-making, improving procedural efficacy, and ensuring intraoperative safety. While existing AI-driven approaches alleviate annotation burdens via self-supervised spatial representation learning, their lack of explicit tempora…

    Submitted 3 June, 2025; originally announced June 2025.

  12. arXiv:2506.02553  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective

    Authors: Shenghua He, Tian Xia, Xuan Zhou, Hui Wei

    Abstract: We study a common challenge in reinforcement learning for large language models (LLMs): the Zero-Reward Assumption, where non-terminal actions (i.e., intermediate token generations) receive zero task-specific immediate reward, while only the final token receives a reward for the entire response. This assumption arises frequently in practice, as precise token-level rewards are often difficult or in…

    Submitted 3 June, 2025; originally announced June 2025.

  13. arXiv:2506.01710  [pdf, other]

    cs.CL

    Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning

    Authors: Fangyu Lei, Jinxiang Meng, Yiming Huang, Tinghong Chen, Yun Zhang, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Table reasoning, encompassing tasks such as table question answering, fact verification, and text-to-SQL, requires precise understanding of structured tabular data, coupled with numerical computation and code manipulation for effective inference. Supervised fine-tuning (SFT) approaches have achieved notable success but often struggle with generalization and robustness due to biases inherent in imi…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Work in progress

  14. arXiv:2506.01637  [pdf, ps, other]

    eess.SP

    Local Ambiguity Shaping for Doppler-Resilient Sequences Under Spectral and PAPR Constraints

    Authors: Shi He, Lingsheng Meng, Yao Ge, Yong Liang Guan, David González G., Zilong Liu

    Abstract: This paper focuses on designing Doppler-resilient sequences with low local Ambiguity Function (AF) sidelobes, subject to certain spectral and Peak-to-Average Power Ratio (PAPR) constraints. To achieve this, we propose two distinct optimization algorithms: (i) an Alternating Minimization (AM) algorithm for superior Weighted Peak Sidelobe Level (WPSL) minimization, and (ii) a low-complexity Augmented…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: This work has been accepted to IEEE VTC2025-Fall

  15. arXiv:2506.01130  [pdf, ps, other]

    cs.CV

    ProstaTD: A Large-scale Multi-source Dataset for Structured Surgical Triplet Detection

    Authors: Yiliang Chen, Zhixi Li, Cheng Xu, Alex Qinyang Liu, Xuemiao Xu, Jeremy Yuen-Chun Teoh, Shengfeng He, Jing Qin

    Abstract: Surgical triplet detection has emerged as a pivotal task in surgical video analysis, with significant implications for performance assessment and the training of novice surgeons. However, existing datasets such as CholecT50 exhibit critical limitations: they lack precise spatial bounding box annotations, provide inconsistent and clinically ungrounded temporal labels, and rely on a single data sour…

    Submitted 1 June, 2025; originally announced June 2025.

  16. arXiv:2506.00855  [pdf, other]

    cs.AI

    MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book

    Authors: Sau Lai Yip, Sunan He, Yuxiang Nie, Shu Pui Chan, Yilin Ye, Sum Ying Lam, Hao Chen

    Abstract: The accelerating development of general medical artificial intelligence (GMAI), powered by multimodal large language models (MLLMs), offers transformative potential for addressing persistent healthcare challenges, including workforce deficits and escalating costs. The parallel development of systematic evaluation benchmarks emerges as a critical imperative to enable performance assessment and prov…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: For data and code, see: https://huggingface.co/datasets/slyipae1/MedBookVQA and https://github.com/slyipae1/MedBookVQA

  17. arXiv:2506.00475  [pdf, ps, other]

    cs.CV

    BAGNet: A Boundary-Aware Graph Attention Network for 3D Point Cloud Semantic Segmentation

    Authors: Wei Tao, Xiaoyang Qu, Kai Lu, Jiguang Wan, Shenglin He, Jianzong Wang

    Abstract: Since the point cloud data is inherently irregular and unstructured, point cloud semantic segmentation has always been a challenging task. The graph-based method attempts to model the irregular point cloud by representing it as a graph; however, this approach incurs substantial computational cost due to the necessity of constructing a graph for every point within a large-scale point cloud. In this…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted by the 2025 International Joint Conference on Neural Networks (IJCNN 2025)

  18. arXiv:2506.00433  [pdf, ps, other]

    cs.CV cs.LG eess.IV

    Latent Wavelet Diffusion: Enabling 4K Image Synthesis for Free

    Authors: Luigi Sigillo, Shengfeng He, Danilo Comminiello

    Abstract: High-resolution image synthesis remains a core challenge in generative modeling, particularly in balancing computational efficiency with the preservation of fine-grained visual detail. We present Latent Wavelet Diffusion (LWD), a lightweight framework that enables any latent diffusion model to scale to ultra-high-resolution image generation (2K to 4K) for free. LWD introduces three key components:…

    Submitted 3 June, 2025; v1 submitted 31 May, 2025; originally announced June 2025.

  19. arXiv:2505.24626  [pdf, ps, other]

    quant-ph

    Co-designed Quantum Discrete Adiabatic Linear System Solver Via Dynamic Circuits

    Authors: Boxuan Ai, Shuo He, Xiang Zhao, Lin Yang, Guozhen Liu, Pengfei Gao, Hongbao Liu, Tao Tang, Jiecheng Yang, Jie Wu

    Abstract: Existing quantum discrete adiabatic approaches are hindered by circuit depth that increases linearly with the number of evolution steps, a significant challenge for current quantum hardware with limited coherence times. To address this, we propose a co-designed framework that synergistically integrates dynamic circuit capabilities with real-time classical processing. This framework reformulates th…

    Submitted 30 May, 2025; originally announced May 2025.

  20. arXiv:2505.24171  [pdf, ps, other]

    econ.TH

    A note on the Diversity Owen values

    Authors: Songtao He, Erfang Shan, Xinyu Sun

    Abstract: Béal et al. (Int J Game Theory 54, 2025) introduce the Diversity Owen value for TU-games with diversity constraints, and provide axiomatic characterizations using the axioms of fairness and balanced contributions. However, there exist logical flaws in the proofs of the uniqueness of these characterizations. In this note we provide the corrected proofs of the characterizations by introducing the nu…

    Submitted 5 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    MSC Class: 91A12

  21. arXiv:2505.23474  [pdf, ps, other]

    cs.AI cs.CL

    Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns

    Authors: Xiang Li, Haiyang Yu, Xinghua Zhang, Ziyang Huang, Shizhu He, Kang Liu, Jun Zhao, Fei Huang, Yongbin Li

    Abstract: Process Reward Models (PRMs) are crucial in complex reasoning and problem-solving tasks (e.g., LLM agents with long-horizon decision-making) by verifying the correctness of each intermediate reasoning step. In real-world scenarios, LLMs may apply various reasoning patterns (e.g., decomposition) to solve a problem, potentially suffering from errors under various reasoning patterns. Therefore, PRMs…

    Submitted 29 May, 2025; originally announced May 2025.

  22. arXiv:2505.23419  [pdf, ps, other]

    cs.SE cs.AI cs.CL

    SWE-bench Goes Live!

    Authors: Linghao Zhang, Shilin He, Chaoyun Zhang, Yu Kang, Bowen Li, Chengxing Xie, Junhao Wang, Maoquan Wang, Yufan Huang, Shengyu Fu, Elsie Nallipogu, Qingwei Lin, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang

    Abstract: The issue-resolving task, where a model generates patches to fix real-world bugs, has emerged as a critical benchmark for evaluating the capabilities of large language models (LLMs). While SWE-bench and its variants have become standard in this domain, they suffer from key limitations: they have not been updated since their initial releases, cover a narrow set of repositories, and depend heavily o…

    Submitted 1 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: Homepage: https://swe-bench-live.github.io/, Code: https://github.com/SWE-bench-Live, Dataset: https://huggingface.co/SWE-bench-Live

  23. arXiv:2505.22769  [pdf, ps, other]

    cs.HC cs.CV

    MAC-Gaze: Motion-Aware Continual Calibration for Mobile Gaze Tracking

    Authors: Yaxiong Lei, Mingyue Zhao, Yuheng Wang, Shijing He, Yusuke Sugano, Mohamed Khamis, Juan Ye

    Abstract: Mobile gaze tracking faces a fundamental challenge: maintaining accuracy as users naturally change their postures and device orientations. Traditional calibration approaches, like one-off, fail to adapt to these dynamic conditions, leading to degraded performance over time. We present MAC-Gaze, a Motion-Aware continual Calibration approach that leverages smartphone Inertial measurement unit (IMU)…

    Submitted 5 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: 24 pages, 7 figures

    MSC Class: 68T10; 68U35 ACM Class: H.5.2; H.1.2; C.2.4; I.5.4

  24. arXiv:2505.21124  [pdf, ps, other]

    physics.flu-dyn physics.data-an

    UniFoil: A Universal Dataset of Airfoils in Transitional and Turbulent Regimes for Subsonic and Transonic Flows

    Authors: Rohit Sunil Kanchi, Benjamin Melanson, Nithin Somasekharan, Shaowu Pan, Sicheng He

    Abstract: We present UniFoil, a large publicly available universal airfoil dataset based on Reynolds-averaged Navier-Stokes (RANS) simulations. It contains over 500,000 samples spanning a wide range of Reynolds and Mach numbers, capturing both transitional and fully turbulent flows across incompressible to compressible regimes. UniFoil is designed to support machine learning research in fluid dynamics, part…

    Submitted 3 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  25. arXiv:2505.19490  [pdf, other]

    cs.AI

    Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models

    Authors: Jianxing Liao, Junyan Xu, Yatao Sun, Maowen Tang, Sicheng He, Jingxian Liao, Shui Yu, Yun Li, Hongguan Xiao

    Abstract: Designing complex computer-aided design (CAD) models is often time-consuming due to challenges such as computational inefficiency and the difficulty of generating precise models. We propose a novel language-guided framework for industrial design automation to address these issues, integrating large language models (LLMs) with computer-automated design (CAutoD). Through this framework, CAD models ar…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL 2025 Main Conference

    ACM Class: I.2.7; I.2.6

  26. arXiv:2505.19480  [pdf, other]

    cs.SD eess.AS

    Room Impulse Response as a Prompt for Acoustic Echo Cancellation

    Authors: Fei Zhao, Shulin He, Xueliang Zhang

    Abstract: Data-driven acoustic echo cancellation (AEC) methods, predominantly trained on synthetic or constrained real-world datasets, encounter performance declines in unseen echo scenarios, especially in real environments where echo paths are not directly observable. Our proposed method counters this limitation by integrating room impulse response (RIR) as a pivotal training prompt, aiming to improve the…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  27. arXiv:2505.18594  [pdf, other]

    cs.CV cs.IR

    EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models

    Authors: GuangHao Meng, Sunan He, Jinpeng Wang, Tao Dai, Letian Zhang, Jieming Zhu, Qing Li, Gang Wang, Rui Zhang, Yong Jiang

    Abstract: Vision-language retrieval (VLR) has attracted significant attention in both academia and industry, which involves using text (or images) as queries to retrieve corresponding images (or text). However, existing methods often neglect the rich visual semantics knowledge of entities, thus leading to incorrect retrieval results. To address this problem, we propose the Entity Visual Description enhanced…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 9 pages, 6 figures

  28. arXiv:2505.18447  [pdf, ps, other]

    cs.LG

    Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning

    Authors: Chi Zhang, Ziying Jia, George K. Atia, Sihong He, Yue Wang

    Abstract: Transfer reinforcement learning aims to derive a near-optimal policy for a target environment with limited data by leveraging abundant data from related source domains. However, it faces two key challenges: the lack of performance guarantees for the transferred policy, which can lead to undesired actions, and the risk of negative transfer when multiple source domains are involved. We propose a nov…

    Submitted 29 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML 2025

  29. arXiv:2505.17649  [pdf, ps, other]

    cs.CV

    Instruct2See: Learning to Remove Any Obstructions Across Distributions

    Authors: Junhang Li, Yu Guo, Chuhua Xian, Shengfeng He

    Abstract: Images are often obstructed by various obstacles due to capture limitations, hindering the observation of objects of interest. Most existing methods address occlusions from specific elements like fences or raindrops, but are constrained by the wide range of real-world obstructions, making comprehensive data collection impractical. To overcome these challenges, we propose Instruct2See, a novel zero…

    Submitted 23 May, 2025; originally announced May 2025.

  30. arXiv:2505.17153  [pdf, other]

    cs.CL cs.AI

    Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN

    Authors: Yao Xu, Mingyu Xu, Fangyu Lei, Wangtao Sun, Xiangrong Zeng, Bingning Wang, Guang Liu, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Recently, models such as OpenAI-o1 and DeepSeek-R1 have demonstrated remarkable performance on complex reasoning tasks through Long Chain-of-Thought (Long-CoT) reasoning. Although distilling this capability into student models significantly enhances their performance, this paper finds that fine-tuning LLMs with full parameters or LoRA with a low rank on long CoT data often leads to Cyclical Reason…

    Submitted 22 May, 2025; originally announced May 2025.

  31. arXiv:2505.16314  [pdf, ps, other]

    cs.CV cs.AI

    NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

    Authors: Shuhao Han, Haotian Fan, Fangyuan Kong, Wenjie Liao, Chunle Guo, Chongyi Li, Radu Timofte, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Jianhui Sun, Xinli Yue, Tianyi Wang, Huan Hou, Junda Lu, Xinyang Huang, Zitang Zhou, Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao, et al. (90 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 challenge on Text to Image (T2I) generation model quality assessment, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. The aim of this challenge is to address the fine-grained quality assessment of text-to-image generation models. This challenge evaluates text-to-image models from two aspe…

    Submitted 22 May, 2025; originally announced May 2025.

  32. arXiv:2505.16184  [pdf]

    cond-mat.supr-con cond-mat.str-el

    Pure nematic transition inside the superconducting dome of iron chalcogenide superconductor FeSe$_{1-x}$Te$_x$

    Authors: K. Y. Liang, R. Z. Zhang, Z. F. Lin, Z. J. Li, B. R. Chen, P. H. Zhang, K. Z. Yao, Q. S. He, Q. Z. Zhou, H. X. Yao, K. Jin, Y. H. Wang

    Abstract: Nematicity and magnetism are prevalent orders in high transition temperature (Tc) superconductors, coexisting in the parent compound of most material families. Quantum fluctuations of nematicity or spin orders are both plausible candidates for mediating unconventional Cooper pairing. Identifying the sole effect of a nematic quantum critical point (QCP) on the emergence of superconducting dome with…

    Submitted 21 May, 2025; originally announced May 2025.

  33. arXiv:2505.15774  [pdf, ps, other]

    cs.CL cs.LG

    Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention

    Authors: Huanxuan Liao, Wen Hu, Yao Xu, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Large Language Models (LLMs) encounter significant challenges in long-sequence inference due to computational inefficiency and redundant processing, driving interest in context compression techniques. Existing methods often rely on token importance to perform hard local compression or encode context into latent representations for soft global compression. However, the uneven distribution of textua…

    Submitted 21 May, 2025; originally announced May 2025.

  34. arXiv:2505.15398  [pdf, ps, other]

    cs.CV

    Expanding Zero-Shot Object Counting with Rich Prompts

    Authors: Huilin Zhu, Senyao Li, Jingling Yuan, Zhengwei Yang, Yu Guo, Wenxuan Liu, Xian Zhong, Shengfeng He

    Abstract: Expanding pre-trained zero-shot counting models to handle unseen categories requires more than simply adding new prompts, as this approach does not achieve the necessary alignment between text and visual features for accurate counting. We introduce RichCount, the first framework to address these limitations, employing a two-stage training strategy that enhances text encoding and strengthens the mo…

    Submitted 26 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  35. arXiv:2505.14436  [pdf, other]

    cs.CL cs.AI

    Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models

    Authors: Yuqiao Tan, Shizhu He, Kang Liu, Jun Zhao

    Abstract: Large Language Models (LLMs) offer a transparent brain with accessible parameters that encode extensive knowledge, which can be analyzed, located and transferred. Consequently, a key research challenge is to transcend traditional knowledge transfer paradigms rooted in symbolic language and achieve genuine Parametric Knowledge Transfer (PKT). Significantly, exploring effective methods for transferr…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL'25 Main. Code link: https://github.com/Trae1ounG/Neural_Incompatibility

  36. arXiv:2505.13778  [pdf, other]

    cs.AI

    CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs

    Authors: Guoheng Sun, Ziyao Wang, Bowei Tian, Meng Liu, Zheyu Shen, Shwai He, Yexiao He, Wanghao Ye, Yiting Wang, Ang Li

    Abstract: As post-training techniques evolve, large language models (LLMs) are increasingly augmented with structured multi-step reasoning abilities, often optimized through reinforcement learning. These reasoning-enhanced models outperform standard LLMs on complex tasks and now underpin many commercial LLM APIs. However, to protect proprietary behavior and reduce verbosity, providers typically conceal the…

    Submitted 19 May, 2025; originally announced May 2025.

  37. arXiv:2505.12667  [pdf, other]

    cs.CV

    Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking

    Authors: Zihan Su, Xuerui Qiu, Hongbin Xu, Tangyu Jiang, Junhao Zhuang, Chun Yuan, Ming Li, Shengfeng He, Fei Richard Yu

    Abstract: The explosive growth of generative video models has amplified the demand for reliable copyright preservation of AI-generated content. Despite its popularity in image synthesis, invisible generative watermarking remains largely underexplored in video generation. To address this gap, we propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process. M…

    Submitted 18 May, 2025; originally announced May 2025.

  38. arXiv:2505.10807  [pdf, ps, other]

    math.OC

    Convergence analysis of the Halpern iteration with adaptive anchoring parameters

    Authors: Songnian He, Hong-Kun Xu, Qiao-Li Dong, Na Mei

    Abstract: We propose an adaptive way to choose the anchoring parameters for the Halpern iteration to find a fixed point of a nonexpansive mapping in a real Hilbert space. We prove strong convergence of this adaptive Halpern iteration and obtain the rate of asymptotic regularity at least O(1/k), where k is the number of iterations. Numerical experiments are also provided to show advantages and outperformance…

    Submitted 15 May, 2025; originally announced May 2025.

  39. arXiv:2505.10800  [pdf, ps, other]

    math.OC

    Contractive difference-of-convex algorithms

    Authors: Songnian He, Qiao-Li Dong, Michael Th. Rassias

    Abstract: The difference-of-convex algorithm (DCA) and its variants are the most popular methods to solve the difference-of-convex optimization problem. Each iteration of them is reduced to a convex optimization problem, which generally needs to be solved by iterative methods such as proximal gradient algorithm. However, these algorithms essentially belong to some iterative methods of fixed point problems o…

    Submitted 15 May, 2025; originally announced May 2025.

  40. arXiv:2505.09808  [pdf, ps, other]

    hep-th

    Leading singularities and chambers of Correlahedron

    Authors: Song He, Yu-tin Huang, Chia-Kai Kuo

    Abstract: In this paper, we explore the chamber dissection of the loop-geometry of Correlahedron, which encodes the loop integrand of four-point stress-energy correlators in planar $\mathcal{N}=4$ super Yang-Mills. We demonstrate that at four loops, continuing the pattern of lower loops, the integrand of four-point correlation function can be written as a sum over products of chamber-forms and local loop in…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 48 pages, 4 figures

    Report number: MPP-2025-80

  41. arXiv:2505.05114  [pdf, other]

    eess.AS cs.SD

    Listen to Extract: Onset-Prompted Target Speaker Extraction

    Authors: Pengjie Shen, Kangrui Chen, Shulin He, Pengru Chen, Shuqi Yuan, He Kong, Xueliang Zhang, Zhong-Qiu Wang

    Abstract: We propose $\textit{listen to extract}$ (LExt), a highly-effective while extremely-simple algorithm for monaural target speaker extraction (TSE). Given an enrollment utterance of a target speaker, LExt aims at extracting the target speaker from the speaker's mixed speech with other speakers. For each mixture, LExt concatenates an enrollment utterance of the target speaker to the mixture signal at…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: in submission

  42. arXiv:2505.04942  [pdf, other]

    math.PR math.OC

    Randomized Routing to Remote Queues

    Authors: Shuangchi He, Yunfang Yang, Yao Yu

    Abstract: We study load balancing for a queueing system where parallel stations are distant from customers. In the presence of traveling delays, the join-the-shortest-queue (JSQ) policy induces queue length oscillations and prolongs the mean waiting time. A variant of the JSQ policy, dubbed the randomized join-the-shortest-queue (RJSQ) policy, is devised to mitigate the oscillation phenomenon. By the RJSQ p…

    Submitted 8 May, 2025; originally announced May 2025.

  43. arXiv:2505.04908  [pdf]

    physics.optics

    Order within disorder: spectral key generation and distribution in random lasers

    Authors: Zhijia Hu, Shilong He, Lianghao Qi, Yalan Li, Siqi Li, Bin Chen, Wenyu Du, Yan Kuai, Zhigang Cao, Min Wang, Kaiming Zhou, Lin Zhang, Qingchuan Guo, Weimin Ding, Chao Li, Kang Xie, Anderson S. L. Gomes, Benli Yu

    Abstract: In secure communication, highly random entropy sources are essential for information security. Random lasers (RLs), which arise from multiple scattering in disordered structures, are potentially ideal entropy sources. Traditionally, RLs are viewed as disordered and unpredictable. However, in this work, we present novel evidence that orderly patterns exist beneath the seemingly disordered outputs o…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 26 pages, 8 figures

  44. arXiv:2505.04098  [pdf, other]

    cs.NI eess.SP

    Satellite-Assisted Low-Altitude Economy Networking: Concepts, Applications, and Opportunities

    Authors: Shizhao He, Jiacheng Wang, Ying-Chang Liang, Geng Sun, Dusit Niyato

    Abstract: The low-altitude economy (LAE) is a new economic paradigm that leverages low-altitude vehicles (LAVs) to perform diverse missions across diverse areas. To support the operations of LAE, it is essential to establish LAE networks that enable LAV management and communications. Existing studies mainly reuse terrestrial networks to construct LAE networks. However, the limited coverage of terrestrial net…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 9 pages, 4 figures

  45. arXiv:2505.03093  [pdf, other]

    cs.CV

    Estimating the Diameter at Breast Height of Trees in a Forest With a Single 360 Camera

    Authors: Siming He, Zachary Osman, Fernando Cladera, Dexter Ong, Nitant Rai, Patrick Corey Green, Vijay Kumar, Pratik Chaudhari

    Abstract: Forest inventories rely on accurate measurements of the diameter at breast height (DBH) for ecological monitoring, resource management, and carbon accounting. While LiDAR-based techniques can achieve centimeter-level precision, they are cost-prohibitive and operationally complex. We present a low-cost alternative that only needs a consumer-grade 360 video camera. Our semi-automated pipeline compri…

    Submitted 15 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  46. Whleaper: A 10-DOF Flexible Bipedal Wheeled Robot

    Authors: Yinglei Zhu, Sixiao He, Zhenghao Qi, Zhuoyuan Yong, Yihua Qin, Jianyu Chen

    Abstract: Wheel-legged robots combine the advantages of both wheeled robots and legged robots, offering versatile locomotion capabilities with excellent stability on challenging terrains and high efficiency on flat surfaces. However, existing wheel-legged robots typically have limited hip joint mobility compared to humans, while the hip joint plays a crucial role in locomotion. In this paper, we introduce Whlea…

    Submitted 30 April, 2025; originally announced April 2025.

    Journal ref: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 2024, pp. 11272-11277

  47. arXiv:2504.21676  [pdf, other]

    hep-th

    Superstring amplitudes meet surfaceology

    Authors: Qu Cao, Jin Dong, Song He, Fan Zhu

    Abstract: We reformulate tree-level amplitudes in open superstring theory (type-I) in terms of stringy Tr$(φ^3)$ amplitudes with various kinematical shifts in the "curve-integral" formulation: while the bosonic-string amplitude with $n$ pairs of "scaffolding" scalars comes from a particularly simple shift of the Tr$(φ^3)$ one (corresponding to $n$ length-$2$ cycles), the analogous superstring amplitude requ…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 34 pages, 3 figures

  48. arXiv:2504.21336  [pdf, ps, other]

    cs.CV

    UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

    Authors: Linshan Wu, Yuxiang Nie, Sunan He, Jiaxin Zhuang, Luyang Luo, Neeraj Mahboobani, Varut Vardhanabhuti, Ronald Cheong Kin Chan, Yifan Peng, Pranav Rajpurkar, Hao Chen

    Abstract: The integration of AI-assisted biomedical image analysis into clinical practice demands AI-generated findings that are not only accurate but also interpretable to clinicians. However, existing biomedical AI models generally lack the ability to simultaneously generate diagnostic findings and localize corresponding biomedical objects. This limitation makes it challenging for clinicians to correlate…

    Submitted 29 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

    Comments: The first universal foundation model for grounded biomedical image interpretation

  49. arXiv:2504.19117  [pdf]

    math.OC

    Rotation excursion algorithm with learning

    Authors: Sheng-Xue He

    Abstract: We introduce a novel heuristic algorithm named the Rotation Excursion Algorithm with Learning (REAL) designed for general-purpose optimization. REAL draws inspiration from the construction mechanism inherent in CEC optimization suites, integrating three fundamental operations with a natural growth rule to address optimization tasks. The initial operation involves rotating the current feasible solu…

    Submitted 27 April, 2025; originally announced April 2025.

    MSC Class: 90; ACM Class: I.2.0

  50. arXiv:2504.19114  [pdf]

    math.OC cs.RO

    Snake locomotion learning search

    Authors: Sheng-Xue He

    Abstract: This research introduces a novel heuristic algorithm known as the Snake Locomotion Learning Search algorithm (SLLS) designed to address optimization problems. The SLLS draws inspiration from the locomotion patterns observed in snakes, particularly serpentine and caterpillar locomotion. We leverage these two modes of snake locomotion to devise two distinct search mechanisms within the SLLS. In our…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 43 pages, 13 figures

    MSC Class: 90; ACM Class: I.2.0