Showing 51–100 of 3,625 results for author: Hang

  1. arXiv:2506.02496  [pdf, ps, other]

    cond-mat.str-el cond-mat.stat-mech hep-th math-ph

    Identification of gapless phases by squaring a twist operator

    Authors: Hang Su, Yuan Yao, Akira Furusaki

    Abstract: We propose a general necessary condition for a spin chain with SO(3) spin-rotation symmetry to be gapped. Specifically, we prove that the ground state(s) of an SO(3)-symmetric gapped spin chain must be spin singlet(s), and the expectation value of the square of a twist operator asymptotically approaches unity in the thermodynamic limit, where finite-size corrections are inversely proportional to t…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 6 pages, 3 figures

  2. arXiv:2506.02126  [pdf, ps, other]

    cs.CL

    Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains

    Authors: Juncheng Wu, Sheng Liu, Haoqin Tu, Hang Yu, Xiaoke Huang, James Zou, Cihang Xie, Yuyin Zhou

    Abstract: Recent advances in reasoning-enhanced Large Language Models such as OpenAI-o1/3 and DeepSeek-R1 have significantly improved performance on complex tasks. However, the quality and transparency of their internal reasoning processes remain underexplored. This work moves beyond the final-answer accuracy and investigates step-by-step reasoning in the medical and mathematical domains by explicitly decom…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 17 pages, preprint

  3. arXiv:2506.02096  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

    Authors: Zijian Wu, Jinjie Ni, Xiangyan Liu, Zichen Liu, Hang Yan, Michael Qizhe Shieh

    Abstract: Vision-language models (VLMs) trained via reinforcement learning with verifiable reward (RLVR) have shown notable progress in scaling test-time compute effectively. In this work, we investigate how synthesized RL data can further improve RLVR. To this end, we propose SynthRL, a scalable and guaranteed pipeline for automatic data scaling in reasoning-oriented RL training. SynthRL comprises…

    Submitted 2 June, 2025; originally announced June 2025.

  4. arXiv:2506.01927  [pdf, ps, other]

    cs.GT cs.AI cs.MA cs.RO

    Online Competitive Information Gathering for Partially Observable Trajectory Games

    Authors: Mel Krusniak, Hang Xu, Parker Palermo, Forrest Laine

    Abstract: Game-theoretic agents must make plans that optimally gather information about their opponents. These problems are modeled by partially observable stochastic games (POSGs), but planning in fully continuous POSGs is intractable without heavy offline computation or assumptions on the order of belief maintained by each player. We formulate a finite history/horizon refinement of POSGs which admits comp…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted at RSS 2025

  5. arXiv:2506.01616  [pdf, ps, other]

    cs.AI

    MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments

    Authors: Xiao Yang, Jiawei Chen, Jun Luo, Zhengwei Fang, Yinpeng Dong, Hang Su, Jun Zhu

    Abstract: The emergence of multimodal LLM-based agents (MLAs) has transformed interaction paradigms by seamlessly integrating vision, language, action and dynamic environments, enabling unprecedented autonomous capabilities across GUI applications ranging from web automation to mobile systems. However, MLAs introduce critical trustworthiness challenges that extend far beyond traditional language models' lim…

    Submitted 2 June, 2025; originally announced June 2025.

  6. arXiv:2506.01480  [pdf, ps, other]

    cs.CV

    Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation

    Authors: Kaihang Pan, Yang Wu, Wendong Bu, Kai Shen, Juncheng Li, Yingting Wang, Yunfei Li, Siliang Tang, Jun Xiao, Fei Wu, Hang Zhao, Yueting Zhuang

    Abstract: Recent endeavors in Multimodal Large Language Models (MLLMs) aim to unify visual comprehension and generation. However, these two capabilities remain largely independent, as if they are two separate functions encapsulated within the same model. Consequently, visual comprehension does not enhance visual generation, and the reasoning mechanisms of LLMs have not been fully integrated to revolutionize…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 21 pages, 7 figures

  7. arXiv:2506.00671  [pdf, ps, other]

    cs.CL

    DeepRAG: Integrating Hierarchical Reasoning and Process Supervision for Biomedical Multi-Hop QA

    Authors: Yuelyu Ji, Hang Zhang, Shiven Verma, Hui Ji, Chun Li, Yushui Han, Yanshan Wang

    Abstract: We propose DeepRAG, a novel framework that integrates DeepSeek hierarchical question decomposition capabilities with RAG Gym unified retrieval-augmented generation optimization using process level supervision. Targeting the challenging MedHopQA biomedical question answering task, DeepRAG systematically decomposes complex queries into precise sub-queries and employs concept level reward signals inf…

    Submitted 31 May, 2025; originally announced June 2025.

  8. arXiv:2506.00549  [pdf, ps, other]

    cs.CL cs.AI

    Towards Multi-dimensional Evaluation of LLM Summarization across Domains and Languages

    Authors: Hyangsuk Min, Yuho Lee, Minjeong Ban, Jiaqi Deng, Nicole Hee-Yeon Kim, Taewon Yun, Hang Su, Jason Cai, Hwanjun Song

    Abstract: Evaluation frameworks for text summarization have evolved in terms of both domain coverage and metrics. However, existing benchmarks still lack domain-specific assessment criteria, remain predominantly English-centric, and face challenges with human annotation due to the complexity of reasoning. To address these, we introduce MSumBench, which provides a multi-dimensional, multi-domain evaluation o…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: 34 pages, 6 figures

  9. arXiv:2506.00531  [pdf, ps, other]

    cs.LG cs.AI

    M2WLLM: Multi-Modal Multi-Task Ultra-Short-term Wind Power Prediction Algorithm Based on Large Language Model

    Authors: Hang Fan, Mingxuan Li, Zuhan Zhang, Long Cheng, Yujian Ye, Dunnan Liu

    Abstract: The integration of wind energy into power grids necessitates accurate ultra-short-term wind power forecasting to ensure grid stability and optimize resource allocation. This study introduces M2WLLM, an innovative model that leverages the capabilities of Large Language Models (LLMs) for predicting wind power output at granular time intervals. M2WLLM overcomes the limitations of traditional and deep…

    Submitted 31 May, 2025; originally announced June 2025.

  10. arXiv:2505.24283  [pdf, ps, other]

    math.PR math-ph

    Characterizing the limiting critical Potts measures on locally regular-tree-like expander graphs

    Authors: Hang Du, Yanxin Zhou

    Abstract: For any integers $d,q\ge 3$, we consider the $q$-state ferromagnetic Potts model with an external field on a sequence of expander graphs that converges to the $d$-regular tree $\mathtt{T}_d$ in the Benjamini-Schramm sense. We show that along the critical line, any subsequential local weak limit of the Potts measures is a mixture of the free and wired Potts Gibbs measures on $\mathtt{T}_d$. Further…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 52 pages, 1 figure

    MSC Class: 60K35; 82B20; 82B27

  11. arXiv:2505.24164  [pdf, ps, other]

    cs.CL cs.CV

    Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

    Authors: Shilin Xu, Yanwei Li, Rui Yang, Tao Zhang, Yueyi Sun, Wei Chow, Linfeng Li, Hang Song, Qi Xu, Yunhai Tong, Xiangtai Li, Hao Fei

    Abstract: Recent works on large language models (LLMs) have successfully demonstrated the emergence of reasoning capabilities via reinforcement learning (RL). Although recent efforts leverage group relative policy optimization (GRPO) for MLLMs post-training, they constantly explore one specific aspect, such as grounding tasks, math problems, or chart analysis. There are no works that can leverage multi-sour…

    Submitted 29 May, 2025; originally announced May 2025.

    Report number: arxiv:2505.24164

  12. arXiv:2505.24160  [pdf, ps, other]

    eess.IV cs.CV

    Beyond the LUMIR challenge: The pathway to foundational registration models

    Authors: Junyu Chen, Shuwen Wei, Joel Honkamaa, Pekka Marttinen, Hang Zhang, Min Liu, Yichao Zhou, Zuopeng Tan, Zhuoyuan Wang, Yi Wang, Hongchao Zhou, Shunbo Hu, Yi Zhang, Qian Tao, Lukas Förner, Thomas Wendler, Bailiang Jian, Benedikt Wiestler, Tim Hable, Jin Kim, Dan Ruan, Frederic Madesta, Thilo Sentker, Wiebke Heyer, Lianrui Zuo, et al. (11 additional authors not shown)

    Abstract: Medical image challenges have played a transformative role in advancing the field, catalyzing algorithmic innovation and establishing new performance standards across diverse clinical applications. Image registration, a foundational task in neuroimaging pipelines, has similarly benefited from the Learn2Reg initiative. Building on this foundation, we introduce the Large-scale Unsupervised Brain MRI…

    Submitted 29 May, 2025; originally announced May 2025.

  13. arXiv:2505.23757  [pdf, ps, other]

    cs.CV

    Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models

    Authors: Haohan Chi, Huan-ang Gao, Ziming Liu, Jianing Liu, Chenyu Liu, Jinwei Li, Kaisen Yang, Yangcheng Yu, Zeda Wang, Wenyi Li, Leichen Wang, Xingtao Hu, Hao Sun, Hang Zhao, Hao Zhao

    Abstract: Vision-Language-Action (VLA) models for autonomous driving show promise but falter in unstructured corner case scenarios, largely due to a scarcity of targeted benchmarks. To address this, we introduce Impromptu VLA. Our core contribution is the Impromptu VLA Dataset: over 80,000 meticulously curated video clips, distilled from over 2M source clips sourced from 8 open-source large-scale datasets.…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project page: https://github.com/ahydchh/Impromptu-VLA

  14. arXiv:2505.23340  [pdf, ps, other]

    math.AG math-ph math.RT math.SG

    Quantum cohomology, shift operators, and Coulomb branches

    Authors: Ki Fung Chan, Kwokwai Chan, Chin Hang Eddie Lam

    Abstract: Given a complex reductive group $G$ and a $G$-representation $\mathbf{N}$, there is an associated quantized Coulomb branch algebra $\mathcal{A}_{G,\mathbf{N}}^\hbar$ defined by Braverman, Finkelberg and Nakajima. In this paper, we give a new interpretation of $\mathcal{A}_{G,\mathbf{N}}^\hbar$ as the largest subalgebra of the equivariant Borel--Moore homology of the affine Grassmannian on which sh…

    Submitted 29 May, 2025; originally announced May 2025.

  15. arXiv:2505.23115  [pdf, ps, other]

    cs.CV

    Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving

    Authors: Yunshen Wang, Yicheng Liu, Tianyuan Yuan, Yucheng Mao, Yingshi Liang, Xiuyu Yang, Honggang Zhang, Hang Zhao

    Abstract: Accurately predicting 3D occupancy grids from visual inputs is critical for autonomous driving, but current discriminative methods struggle with noisy data, incomplete observations, and the complex structures inherent in 3D scenes. In this work, we reframe 3D occupancy prediction as a generative modeling task using diffusion models, which learn the underlying data distribution and incorporate 3D s…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: ICRA 2025

  16. arXiv:2505.23013  [pdf, other]

    cs.LG

    Scalable Complexity Control Facilitates Reasoning Ability of LLMs

    Authors: Liangkai Hang, Junjie Yao, Zhiwei Bai, Tianyi Chen, Yang Chen, Rongjie Diao, Hezhou Li, Pengxiao Lin, Zhiwei Wang, Cheng Xu, Zhongwang Zhang, Zhangchen Zhou, Zhiyu Li, Zehao Lin, Kai Chen, Feiyu Xiong, Yaoyu Zhang, Weinan E, Hongkang Yang, Zhi-Qin John Xu

    Abstract: The reasoning ability of large language models (LLMs) has been rapidly advancing in recent years, attracting interest in more fundamental approaches that can reliably enhance their generalizability. This work demonstrates that model complexity control, conveniently implementable by adjusting the initialization rate and weight decay coefficient, improves the scaling law of LLMs consistently over va…

    Submitted 28 May, 2025; originally announced May 2025.

  17. arXiv:2505.22105  [pdf, ps, other]

    cs.CV

    Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation

    Authors: Hang Chen, Maoyuan Ye, Peng Yang, Haibin He, Juhua Liu, Bo Du

    Abstract: Power transmission corridor hazard segmentation (PTCHS) aims to separate transmission equipment and surrounding hazards from complex background, conveying great significance to maintaining electric power transmission safety. Recently, the Segment Anything Model (SAM) has emerged as a foundational vision model and pushed the boundaries of segmentation tasks. However, SAM struggles to deal with the…

    Submitted 28 May, 2025; originally announced May 2025.

  18. arXiv:2505.22058  [pdf]

    cond-mat.mtrl-sci

    The experimental determination of exchange mass terms in surface states on both terminations of MnBi4Te7

    Authors: Dezhi Song, Fuyang Hang, Gang Yao, Jun Zhang, Ye-Ping Jiang, Jin-Feng Jia

    Abstract: The intrinsic antiferromagnetic topological insulators in the Mn-Bi-Te family, composed of superlattice-like MnBi2Te4/(Bi2Te3)n (n = 0, 1, 2, 3...) layered structure, present intriguing states of matter such as quantum anomalous Hall effect and the axion insulator. However, the surface state gap, which is the prerequisite for the observation of these states, remains elusive. Here by molecular beam…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 19 pages, 9 figures, including supporting materials

  19. arXiv:2505.22008  [pdf, ps, other]

    physics.ao-ph cs.LG

    Align-DA: Align Score-based Atmospheric Data Assimilation with Multiple Preferences

    Authors: Jing-An Sun, Hang Fan, Junchao Gong, Ben Fei, Kun Chen, Fenghua Ling, Wenlong Zhang, Wanghan Xu, Li Yan, Pierre Gentine, Lei Bai

    Abstract: Data assimilation (DA) aims to estimate the full state of a dynamical system by combining partial and noisy observations with a prior model forecast, commonly referred to as the background. In atmospheric applications, this problem is fundamentally ill-posed due to the sparsity of observations relative to the high-dimensional state space. Traditional methods address this challenge by simplifying b…

    Submitted 28 May, 2025; originally announced May 2025.

  20. arXiv:2505.21541  [pdf, ps, other]

    cs.CV cs.AI

    DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

    Authors: Zitong Wang, Hang Zhao, Qianyu Zhou, Xuequan Lu, Xiangtai Li, Yiren Song

    Abstract: Diffusion models have recently motivated great success in many generation tasks like object removal. Nevertheless, existing image decomposition methods struggle to disentangle semi-transparent or transparent layer occlusions due to mask prior dependencies, static object assumptions, and the lack of datasets. In this paper, we delve into a novel task: Layer-Wise Decomposition of Alpha-Composited Im…

    Submitted 30 May, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

  21. arXiv:2505.21500  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models

    Authors: Dingming Li, Hongxing Li, Zixuan Wang, Yuchen Yan, Hang Zhang, Siqi Chen, Guiyang Hou, Shengpei Jiang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang

    Abstract: Vision-language models (VLMs) have demonstrated remarkable capabilities in understanding and reasoning about visual content, but significant challenges persist in tasks requiring cross-viewpoint understanding and spatial reasoning. We identify a critical limitation: current VLMs excel primarily at egocentric spatial reasoning (from the camera's perspective) but fail to generalize to allocentric vi…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Project: https://zju-real.github.io/ViewSpatial-Page/

  22. arXiv:2505.21411  [pdf, ps, other]

    cs.CL

    Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity

    Authors: Yehui Tang, Xiaosong Li, Fangcheng Liu, Wei Guo, Hang Zhou, Yaoyuan Wang, Kai Han, Xianzhi Yu, Jinpeng Li, Hui Zang, Fei Mi, Xiaojun Meng, Zhicheng Liu, Hanting Chen, Binfan Zheng, Can Chen, Youliang Yan, Ruiming Tang, Peifeng Qin, Xinghao Chen, Dacheng Tao, Yunhe Wang

    Abstract: The surge of Mixture of Experts (MoE) in Large Language Models promises a small price of execution cost for a much larger model parameter count and learning capacity, because only a small fraction of parameters are activated for each input token. However, it is commonly observed that some experts are activated far more often than others, leading to system inefficiency when running the experts o…

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  23. arXiv:2505.21138  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis

    Authors: Tianyi Xu, Hongjie Chen, Wang Qing, Lv Hang, Jian Kang, Li Jie, Zhennan Lin, Yongxiang Li, Xie Lei

    Abstract: Large-scale training corpora have significantly improved the performance of ASR models. Unfortunately, due to the relative scarcity of data, Chinese accents and dialects remain a challenge for most ASR models. Recent advancements in self-supervised learning have shown that self-supervised pre-training, combined with large language models (LLM), can effectively enhance ASR performance in low-resour…

    Submitted 16 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  24. arXiv:2505.20941  [pdf, ps, other]

    cs.CV

    PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter

    Authors: Yaohua Zha, Yanzi Wang, Hang Guo, Jinpeng Wang, Tao Dai, Bin Chen, Zhihao Ouyang, Xue Yuerong, Ke Chen, Shu-Tao Xia

    Abstract: Applying pre-trained models to assist point cloud understanding has recently become a mainstream paradigm in 3D perception. However, existing application strategies are straightforward, utilizing only the final output of the pre-trained model for various task heads. It neglects the rich complementary information in the intermediate layer, thereby failing to fully unlock the potential of pre-traine…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Accepted to CVPR 2025

  25. arXiv:2505.20910  [pdf, other]

    cs.CL

    Automated Privacy Information Annotation in Large Language Model Interactions

    Authors: Hang Zeng, Xiangyu Liu, Yong Hu, Chaoyue Niu, Fan Wu, Shaojie Tang, Guihai Chen

    Abstract: Users interacting with large language models (LLMs) under their real identifiers often unknowingly risk disclosing private information. Automatically notifying users whether their queries leak privacy and which phrases leak what private information has therefore become a practical need. Existing privacy detection methods, however, were designed for different objectives and application scenarios, t…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 9 content pages

  26. arXiv:2505.20426  [pdf, other]

    cs.CV

    MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

    Authors: Yunlong Tang, Pinxin Liu, Mingqian Feng, Zhangyun Tan, Rui Mao, Chao Huang, Jing Bi, Yunzhong Xiao, Susan Liang, Hang Hua, Ali Vosoughi, Luchuan Song, Zeliang Zhang, Chenliang Xu

    Abstract: Understanding perspective is fundamental to human visual perception, yet the extent to which multimodal large language models (MLLMs) internalize perspective geometry remains unclear. We introduce MMPerspective, the first benchmark specifically designed to systematically evaluate MLLMs' understanding of perspective through 10 carefully crafted tasks across three complementary dimensions: Perspecti…

    Submitted 26 May, 2025; originally announced May 2025.

  27. arXiv:2505.20375  [pdf, ps, other]

    physics.chem-ph

    Breaking the Quadrillion Determinant Barrier in Numerically Exact Configuration Interaction

    Authors: Agam Shayit, Can Liao, Shiv Upadhyay, Hang Hu, Tianyuan Zhang, Eugene DePrince III, Chao Yang, Xiaosong Li

    Abstract: The combinatorial scaling of configuration interaction (CI) has long restricted its applicability to only the simplest molecular systems. Here, we report the first numerically exact CI calculation exceeding one quadrillion ($10^{15}$) determinants, enabled by lossless categorical compression within the small-tensor-product distributed active space (STP-DAS) framework. As a demonstration, we conver…

    Submitted 26 May, 2025; originally announced May 2025.

  28. arXiv:2505.19750  [pdf, other]

    cs.CV

    SuperAD: A Training-free Anomaly Classification and Segmentation Method for CVPR 2025 VAND 3.0 Workshop Challenge Track 1: Adapt & Detect

    Authors: Huaiyuan Zhang, Hang Chen, Yu Cheng, Shunyi Wu, Linghao Sun, Linao Han, Zeyu Shi, Lei Qi

    Abstract: In this technical report, we present our solution to the CVPR 2025 Visual Anomaly and Novelty Detection (VAND) 3.0 Workshop Challenge Track 1: Adapt & Detect: Robust Anomaly Detection in Real-World Applications. In real-world industrial anomaly detection, it is crucial to accurately identify anomalies with physical complexity, such as transparent or reflective surfaces, occlusions, and low-contras…

    Submitted 27 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  29. arXiv:2505.19655  [pdf, ps, other]

    math.AP

    Flow approach on Riesz type nonlocal energies

    Authors: Jiaxin He, Qinfeng Li, Juncheng Wei, Hang Yang

    Abstract: Via continuous deformations based on natural flow evolutions, we prove several novel monotonicity results for Riesz-type nonlocal energies on triangles and quadrilaterals. Some of these results imply new and simpler proofs for known theorems without relying on any symmetrization arguments.

    Submitted 26 May, 2025; originally announced May 2025.

  30. arXiv:2505.19627  [pdf, ps, other]

    cond-mat.mtrl-sci

    In-depth Investigation of Conduction Mechanism on Defect-induced Proton-conducting Electrolytes BaHfO$_3$

    Authors: Peng Feng, Hang Ma, Kuan Yang, Yingjie Lv, Ying Liang, Tianxing Ma, Jiajun Linghu, Zhi-Peng Li

    Abstract: This study utilizes first-principles computational methods to comprehensively analyze the impact of A-site doping on the proton conduction properties of BaHfO$_3$. The goal is to offer theoretical support for the advancement of electrolyte materials for solid oxide fuel cells. Our research has uncovered that BaHfO$_3$ demonstrates promising potential for proton conduction, with a low proton migrat…

    Submitted 26 May, 2025; originally announced May 2025.

  31. arXiv:2505.19415  [pdf, ps, other]

    cs.CV

    MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

    Authors: Hang Hua, Ziyun Zeng, Yizhi Song, Yunlong Tang, Liu He, Daniel Aliaga, Wei Xiong, Jiebo Luo

    Abstract: Recent multimodal image generators such as GPT-4o, Gemini 2.0 Flash, and Gemini 2.5 Pro excel at following complex instructions, editing images and maintaining concept consistency. However, they are still evaluated by disjoint toolkits: text-to-image (T2I) benchmarks that lack multi-modal conditioning, and customized image generation benchmarks that overlook compositional semantics and common kno…

    Submitted 27 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

  32. arXiv:2505.19099  [pdf, ps, other]

    cs.AI physics.ed-ph physics.pop-ph

    SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

    Authors: Kun Xiang, Heng Li, Terry Jingchen Zhang, Yinya Huang, Zirong Liu, Peixin Qu, Jixi He, Jiaqi Chen, Yu-Jie Yuan, Jianhua Han, Hang Xu, Hanhui Li, Mrinmaya Sachan, Xiaodan Liang

    Abstract: We present SeePhys, a large-scale multimodal benchmark for LLM reasoning grounded in physics questions ranging from middle school to PhD qualifying exams. The benchmark covers 7 fundamental domains spanning the physics discipline, incorporating 21 categories of highly heterogeneous diagrams. In contrast to prior works where visual elements mainly serve auxiliary purposes, our benchmark features a…

    Submitted 17 June, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

    Comments: 46 pages

  33. arXiv:2505.18994  [pdf, ps, other]

    cs.RO

    Designing Pin-pression Gripper and Learning its Dexterous Grasping with Online In-hand Adjustment

    Authors: Hewen Xiao, Xiuping Liu, Hang Zhao, Jian Liu, Kai Xu

    Abstract: We introduce a novel design of parallel-jaw grippers drawing inspiration from pin-pression toys. The proposed pin-pression gripper features a distinctive mechanism in which each finger integrates a 2D array of pins capable of independent extension and retraction. This unique design allows the gripper to instantaneously customize its finger's shape to conform to the object being grasped by dynamica…

    Submitted 25 May, 2025; originally announced May 2025.

  34. arXiv:2505.18993  [pdf]

    cond-mat.mes-hall

    A high-efficiency neuroevolution potential for tobermorite and calcium silicate hydrate systems with ab initio accuracy

    Authors: Xiao Xu, Shijie Wang, Haifeng Qin, Zhiqiang Zhao, Zheyong Fan, Zhuhua Zhang, Hang Yin

    Abstract: Tobermorite and Calcium Silicate Hydrate (C-S-H) systems are indispensable cement materials but still lack a satisfactory interatomic potential with both high accuracy and high computational efficiency for better understanding their mechanical performance. Here, we develop a Neuroevolution Machine Learning Potential (NEP) with Ziegler-Biersack-Littmark hybrid framework for tobermorite and C-S-H sy…

    Submitted 25 May, 2025; originally announced May 2025.

  35. arXiv:2505.18829  [pdf, ps, other]

    cs.AI cs.HC cs.OS

    LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS

    Authors: Kai Mei, Xi Zhu, Hang Gao, Shuhang Lin, Yongfeng Zhang

    Abstract: We present AIOS 1.0, a novel platform designed to advance computer-use agent (CUA) capabilities through environmental contextualization. While existing approaches primarily focus on building more powerful agent frameworks or enhancing agent models, we identify a fundamental limitation: the semantic disconnect between how language models understand the world and how computer interfaces are structur…

    Submitted 24 May, 2025; originally announced May 2025.

  36. arXiv:2505.18334  [pdf, ps, other]

    cs.RO cs.AI cs.MA

    Towards Natural Language Communication for Cooperative Autonomous Driving via Self-Play

    Authors: Jiaxun Cui, Chen Tang, Jarrett Holtz, Janice Nguyen, Alessandro G. Allievi, Hang Qiu, Peter Stone

    Abstract: Past work has demonstrated that autonomous vehicles can drive more safely if they communicate with one another than if they do not. However, their communication has often not been human-understandable. Using natural language as a vehicle-to-vehicle (V2V) communication protocol offers the potential for autonomous vehicles to drive cooperatively not only with each other but also with human drivers.…

    Submitted 23 May, 2025; originally announced May 2025.

  37. arXiv:2505.18190  [pdf, other]

    eess.SP cs.AI cs.LG

    PhySense: Sensor Placement Optimization for Accurate Physics Sensing

    Authors: Yuezhou Ma, Haixu Wu, Hang Zhou, Huikun Weng, Jianmin Wang, Mingsheng Long

    Abstract: Physics sensing plays a central role in many scientific and engineering domains, which inherently involves two coupled tasks: reconstructing dense physical fields from sparse observations and optimizing scattered sensor placements to observe maximum information. While deep learning has made rapid advances in sparse-data reconstruction, existing methods generally omit optimization of sensor placeme…

    Submitted 26 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  38. arXiv:2505.17827  [pdf, ps, other]

    cs.CL

    Not All Tokens Are What You Need In Thinking

    Authors: Hang Yuan, Bin Yu, Haotian Li, Shijun Yang, Christina Dan Wang, Zhou Yu, Xueyin Xu, Weizhen Qi, Kai Chen

    Abstract: Modern reasoning models, such as OpenAI's o1 and DeepSeek-R1, exhibit impressive problem-solving capabilities but suffer from critical inefficiencies: high inference latency, excessive computational resource consumption, and a tendency toward overthinking -- generating verbose chains of thought (CoT) laden with redundant tokens that contribute minimally to the final answer. To address these issues…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 11 pages, 7 figures and 3 tables

  39. arXiv:2505.17806  [pdf, ps, other]

    math.GN

    d-Boolean algebras and their bitopological representation

    Authors: Hang Yang, Dexue Zhang

    Abstract: We present a Stone duality for bitopological spaces in analogy to the duality between Stone spaces and Boolean algebras, in the same vein as the duality between d-sober bitopological spaces and spatial d-frames established by Jung and Moshier. Precisely, we introduce the notion of d-Boolean algebras and prove that the category of such algebras is dually equivalent to the category of Stone bitopolo…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 26 pages

    MSC Class: 54E55; 18F70; 06E75

  40. arXiv:2505.17646  [pdf, ps, other]

    cs.LG

    Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives

    Authors: Huanran Chen, Yinpeng Dong, Zeming Wei, Yao Huang, Yichi Zhang, Hang Su, Jun Zhu

    Abstract: Recent studies have revealed that the loss landscape of large language models resembles a basin, within which the models perform nearly identically, and outside of which they lose all their capabilities. In this work, we conduct further studies on the loss landscape of large language models. We discover that pre-training creates a "basic capability" basin, and subsequent fine-tuning creates "speci…

    Submitted 23 May, 2025; originally announced May 2025.

  41. arXiv:2505.16901  [pdf, ps, other]

    cs.SE cs.LG

    Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks

    Authors: Hongyuan Tao, Ying Zhang, Zhenhao Tang, Hongen Peng, Xukun Zhu, Bingchang Liu, Yingguang Yang, Ziyin Zhang, Zhaogui Xu, Haipeng Zhang, Linchao Zhu, Rui Wang, Hang Yu, Jianguo Li, Peng Di

    Abstract: Recent advances in Large Language Models (LLMs) have shown promise in function-level code generation, yet repository-level software engineering tasks remain challenging. Current solutions predominantly rely on proprietary LLM agents, which introduce unpredictability and limit accessibility, raising concerns about data privacy and model customization. This paper investigates whether open-source LLM…

    Submitted 19 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 35 pages, 10 figures

  42. arXiv:2505.16633  [pdf, ps, other]

    cs.CV

    Towards Texture- And Shape-Independent 3D Keypoint Estimation in Birds

    Authors: Valentin Schmuker, Alex Hoi Hang Chan, Bastian Goldluecke, Urs Waldmann

    Abstract: In this paper, we present a texture-independent approach to estimate and track 3D joint positions of multiple pigeons. For this purpose, we build upon the existing 3D-MuPPET framework, which estimates and tracks the 3D poses of up to 10 pigeons using a multi-view camera setup. We extend this framework by using a segmentation method that generates silhouettes of the individuals, which are then used…

    Submitted 22 May, 2025; originally announced May 2025.

  43. arXiv:2505.16059  [pdf, ps, other]

    cs.LO

    Monitoring in the Dark: Privacy-Preserving Runtime Verification of Cyber-Physical Systems

    Authors: Charles Koll, Preston Tan Hang, Mike Rosulek, Houssam Abbas

    Abstract: In distributed Cyber-Physical Systems and Internet-of-Things applications, the nodes of the system send measurements to a monitor that checks whether these measurements satisfy given formal specifications. For instance in Urban Air Mobility, a local traffic authority will be monitoring drone traffic to evaluate its flow and detect emerging problematic patterns. Certain applications require both th…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 18 pages, 9 figures

    ACM Class: B.5.0; I.2.4

  44. arXiv:2505.15880  [pdf, other]

    cs.CV

    Challenger: Affordable Adversarial Driving Video Generation

    Authors: Zhiyuan Xu, Bohan Li, Huan-ang Gao, Mingju Gao, Yong Chen, Ming Liu, Chenxu Yan, Hang Zhao, Shuo Feng, Hao Zhao

    Abstract: Generating photorealistic driving videos has seen significant progress recently, but current methods largely focus on ordinary, non-adversarial scenarios. Meanwhile, efforts to generate adversarial driving scenarios often operate on abstract trajectory or BEV representations, falling short of delivering realistic sensor data that can truly stress-test autonomous driving (AD) systems. In this work,…

    Submitted 22 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: Project page: https://pixtella.github.io/Challenger/

  45. arXiv:2505.14163  [pdf, ps, other]

    cs.AI

    DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation

    Authors: He Wang, Alexander Hanbo Li, Yiqun Hu, Sheng Zhang, Hideo Kobayashi, Jiani Zhang, Henry Zhu, Chung-Wei Hang, Patrick Ng

    Abstract: Large language model (LLM) agents have shown promising performance in generating code for solving complex data science problems. Recent studies primarily focus on enhancing in-context learning through improved search, sampling, and planning techniques, while overlooking the importance of the order in which problems are tackled during inference. In this work, we develop a novel inference-time optim…

    Submitted 20 May, 2025; originally announced May 2025.

  46. arXiv:2505.13971  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition

    Authors: Ming Gao, Shilong Wu, Hang Chen, Jun Du, Chin-Hui Lee, Shinji Watanabe, Jingdong Chen, Siniscalchi Sabato Marco, Odette Scharenborg

    Abstract: Meetings are a valuable yet challenging scenario for speech applications due to complex acoustic conditions. This paper summarizes the outcomes of the MISP 2025 Challenge, hosted at Interspeech 2025, which focuses on multi-modal, multi-device meeting transcription by incorporating video modality alongside audio. The tasks include Audio-Visual Speaker Diarization (AVSD), Audio-Visual Speech Recogni…

    Submitted 27 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025. Camera-ready version

  47. arXiv:2505.12428  [pdf, ps, other]

    cs.RO

    Depth Transfer: Learning to See Like a Simulator for Real-World Drone Navigation

    Authors: Hang Yu, Christophe De Wagter, Guido C. H. E de Croon

    Abstract: Sim-to-real transfer is a fundamental challenge in robot reinforcement learning. Discrepancies between simulation and reality can significantly impair policy performance, especially if it receives high-dimensional inputs such as dense depth estimates from vision. We propose a novel depth transfer method based on domain adaptation to bridge the visual gap between simulated and real-world depth data…

    Submitted 18 May, 2025; originally announced May 2025.

  48. arXiv:2505.12320  [pdf, ps, other]

    physics.acc-ph

    First Lasing and Stable Operation of a Direct-Amplification Enabled Harmonic Generation Free-Electron laser

    Authors: Zheng Qi, Junhao Liu, Lanpeng Ni, Tao Liu, Zhen Wang, Kaiqing Zhang, Hanxiang Yang, Zhangfeng Gao, Nanshun Huang, Si Chen, Hang Luo, Yaozong Xiao, Cheng Yu, Yongmei Wen, Fei Gao, Yangyang Lei, Huan Zhao, Yanyan Zhu, Liping Sun, Weiyi Yin, Xingtao Wang, Taihe Lan, Xiaoqing Liu, Lie Feng, Wenyan Zhang, et al. (5 additional authors not shown)

    Abstract: Seeded free-electron lasers (FELs) capable of operating at repetition rates up to the MHz level are in high demand for advanced time-resolved spectroscopies, which require both full longitudinal coherence and high average photon flux in the extreme ultraviolet (EUV) and x-ray regimes. However, conventional external-seed laser systems cannot sustain MHz operation with sufficient hundreds of megawat…

    Submitted 18 May, 2025; originally announced May 2025.

  49. arXiv:2505.11216  [pdf, ps, other]

    cs.CV

    GeoMM: On Geodesic Perspective for Multi-modal Learning

    Authors: Shibin Mei, Hang Wang, Bingbing Ni

    Abstract: Geodesic distance serves as a reliable means of measuring distance in nonlinear spaces, and such nonlinear manifolds are prevalent in the current multimodal learning. In these scenarios, some samples may exhibit high similarity, yet they convey different semantics, making traditional distance metrics inadequate for distinguishing between positive and negative samples. This paper introduces geodesi…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 15 pages, 3 figures, accepted by CVPR2025

  50. arXiv:2505.11123  [pdf, ps, other]

    cs.RO cs.AI

    Conditioning Matters: Training Diffusion Policies is Faster Than You Think

    Authors: Zibin Dong, Yicheng Liu, Yinchuan Li, Hang Zhao, Jianye Hao

    Abstract: Diffusion policies have emerged as a mainstream paradigm for building vision-language-action (VLA) models. Although they demonstrate strong robot control capabilities, their training efficiency remains suboptimal. In this work, we identify a fundamental challenge in conditional diffusion policy training: when generative conditions are hard to distinguish, the training objective degenerates into mo…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2505.10105