Showing 101–150 of 6,717 results for author: Mao, Y

  1. arXiv:2506.13549  [pdf, ps, other]

    hep-ph hep-lat nucl-th

    Connecting dilaton thermal fluctuation with the Polyakov loop at finite temperature

    Authors: Bing-Kai Sheng, Yong-Liang Ma

    Abstract: Understanding the character of the deconfinement phase transition is one of the fundamental challenges in particle physics. In this work, we derive a formula for the expectation value of the Polyakov loop -- the order parameter of the deconfinement phase transition -- in pure $\mathrm{SU(N_{\mathrm{c}})}$ gauge systems at finite temperatures starting from the Coleman–Weinberg-type effec…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 28 pages. Comments are welcome

  2. arXiv:2506.13138  [pdf, ps, other]

    cs.CV

    STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation

    Authors: Jiamin Wang, Yichen Yao, Xiang Feng, Hang Wu, Yaming Wang, Qingqiu Huang, Yuexin Ma, Xinge Zhu

    Abstract: The generation of temporally consistent, high-fidelity driving videos over extended horizons presents a fundamental challenge in autonomous driving world modeling. Existing approaches often suffer from error accumulation and feature misalignment due to inadequate decoupling of spatio-temporal dynamics and limited cross-frame feature propagation mechanisms. To address these limitations, we present…

    Submitted 21 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  3. arXiv:2506.13096  [pdf, ps, other]

    cond-mat.str-el quant-ph

    Diagnosing 2D symmetry protected topological states via mixed state anomaly

    Authors: Chao Xu, Yunlong Zang, Yixin Ma, Yingfei Gu, Shenghan Jiang

    Abstract: Symmetry-protected topological (SPT) phases are short-range entangled quantum states characterized by anomalous edge behavior, a manifestation of the bulk-boundary correspondence for topological phases. Moreover, the Li-Haldane conjecture posits that the entanglement spectrum exhibits the same anomaly as the physical edge spectrum, thereby serving as an entanglement-based fingerprint for identifyi…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 8+14 pages, 1 captioned figure, 1 table

  4. arXiv:2506.13085  [pdf, ps, other]

    quant-ph gr-qc physics.ins-det

    Testing the quantum nature of gravity through interferometry

    Authors: Yubao Liu, Yanbei Chen, Kentaro Somiya, Yiqiu Ma

    Abstract: We propose a Michelson-type interferometric protocol for testing the quantum nature of gravity through testing the phenomenology of semi-classical gravity theory, which predicts a state-dependent Schrodinger-Newton (SN) evolution of the test mass. The protocol's feature lies in utilizing the asymmetry of two interferometric arms induced by SN self-gravity to create cross-talk between the common an…

    Submitted 17 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: 30 pages, 15 figures

  5. arXiv:2506.12797  [pdf, ps, other]

    quant-ph gr-qc

    Distinguishing Quantum and Classical Gravity via Non-Stationary Test Mass Dynamics

    Authors: Wenjie Zhong, Yubao Liu, Yiqiu Ma

    Abstract: Classical gravity theory predicts a state-dependent gravitational potential for a quantum test mass, leading to nonlinear Schrodinger-Newton (SN) state evolution that contrasts with quantum gravity. Testing the effect of SN evolution can provide evidence for distinguishing quantum gravity and classical gravity, which is challenging to realize in the stationary optomechanical systems as analyzed in…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 20 pages, 15 figures

  6. arXiv:2506.12285  [pdf, ps, other]

    eess.AS cs.AI cs.LG cs.SD

    CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following

    Authors: Yinghao Ma, Siyou Li, Juntao Yu, Emmanouil Benetos, Akira Maezawa

    Abstract: Recent advances in audio-text large language models (LLMs) have opened new possibilities for music understanding and generation. However, existing benchmarks are limited in scope, often relying on simplified tasks or multi-choice evaluations that fail to reflect the complexity of real-world music analysis. We reinterpret a broad range of traditional MIR annotations as instruction-following formats…

    Submitted 27 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted by ISMIR 2025

  7. arXiv:2506.12055  [pdf]

    q-bio.NC cs.AI

    Towards Unified Neural Decoding with Brain Functional Network Modeling

    Authors: Di Wu, Linghao Bu, Yifei Jia, Lu Cao, Siyuan Li, Siyu Chen, Yueqian Zhou, Sheng Fan, Wenjie Ren, Dengchang Wu, Kang Wang, Yue Zhang, Yuehui Ma, Jie Yang, Mohamad Sawan

    Abstract: Recent achievements in implantable brain-computer interfaces (iBCIs) have demonstrated the potential to decode cognitive and motor behaviors with intracranial brain recordings; however, individual physiological and electrode implantation heterogeneities have constrained current approaches to neural decoding within single individuals, rendering interindividual neural decoding elusive. Here, we pres…

    Submitted 30 May, 2025; originally announced June 2025.

  8. arXiv:2506.11425  [pdf, ps, other]

    cs.CL cs.AI

    Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards

    Authors: Jeff Da, Clinton Wang, Xiang Deng, Yuntao Ma, Nikhil Barhate, Sean Hendryx

    Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has been widely adopted as the de facto method for enhancing the reasoning capabilities of large language models and has demonstrated notable success in verifiable domains like math and competitive programming tasks. However, the efficacy of RLVR diminishes significantly when applied to agentic environments. These settings, characterized by mul…

    Submitted 20 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  9. arXiv:2506.11041  [pdf, ps, other]

    cs.LG

    ChemHGNN: A Hierarchical Hypergraph Neural Network for Reaction Virtual Screening and Discovery

    Authors: Xiaobao Huang, Yihong Ma, Anjali Gurajapu, Jules Schleinitz, Zhichun Guo, Sarah E. Reisman, Nitesh V. Chawla

    Abstract: Reaction virtual screening and discovery are fundamental challenges in chemistry and materials science, where traditional graph neural networks (GNNs) struggle to model multi-reactant interactions. In this work, we propose ChemHGNN, a hypergraph neural network (HGNN) framework that effectively captures high-order relationships in reaction networks. Unlike GNNs, which require constructing complete…

    Submitted 21 May, 2025; originally announced June 2025.

  10. arXiv:2506.10914  [pdf, ps, other]

    cs.LG

    Foundation Models for Causal Inference via Prior-Data Fitted Networks

    Authors: Yuchen Ma, Dennis Frauen, Emil Javurek, Stefan Feuerriegel

    Abstract: Prior-data fitted networks (PFNs) have recently been proposed as a promising way to train tabular foundation models. PFNs are transformers that are pre-trained on synthetic data generated from a prespecified prior distribution and that enable Bayesian inference through in-context learning. In this paper, we introduce CausalFM, a comprehensive framework for training PFN-based foundation models in v…

    Submitted 12 June, 2025; originally announced June 2025.

  11. arXiv:2506.10762  [pdf, ps, other]

    cs.HC

    Integrating Large Language Models into Text Animation: An Intelligent Editing System with Inline and Chat Interaction

    Authors: Bao Zhang, Zihan Li, Zhenglei Liu, Huanchen Wang, Yuxin Ma

    Abstract: Text animation, a foundational element in video creation, enables efficient and cost-effective communication, thriving in advertisements, journalism, and social media. However, traditional animation workflows present significant usability barriers for non-professionals, with intricate operational procedures severely hindering creative productivity. To address this, we propose a Large Language Mode…

    Submitted 12 June, 2025; originally announced June 2025.

  12. arXiv:2506.10601  [pdf, ps, other]

    cs.CV

    Semantic-decoupled Spatial Partition Guided Point-supervised Oriented Object Detection

    Authors: Xinyuan Liu, Hang Xu, Yike Ma, Yucheng Zhang, Feng Dai

    Abstract: Recent remote sensing tech advancements drive imagery growth, making oriented object detection rapid development, yet hindered by labor-intensive annotation for high-density scenes. Oriented object detection with point supervision offers a cost-effective solution for densely packed scenes in remote sensing, yet existing methods suffer from inadequate sample assignment and instance confusion due to…

    Submitted 12 June, 2025; originally announced June 2025.

  13. arXiv:2506.10316  [pdf, ps, other]

    hep-ex

    Search for sub-GeV invisible particles in inclusive decays of $J/ψ$ to $φ$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (704 additional authors not shown)

    Abstract: A search for an invisible particle, $X$, with a mass between 0 and 0.96 $\textrm{GeV}/\textit{c}^{2}$, is performed in the process $J/ψ\rightarrowφ+ X$ using $(8774.0\pm39.4)\times10^{6}$ $J/ψ$ events collected with the BESIII detector from 2017 to 2019. The $φ$ meson is fully reconstructed and an efficient veto of photons, neutral and charged hadrons up to twice the $K_L^0$ mass is applied to the…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 10 pages, 3 figures

  14. arXiv:2506.09713  [pdf, ps, other]

    cs.SE

    A First Look at Bugs in LLM Inference Engines

    Authors: Mugeng Liu, Siqi Zhong, Weichen Bi, Yixuan Zhang, Zhiyang Chen, Zhenpeng Chen, Xuanzhe Liu, Yun Ma

    Abstract: Large language model-specific inference engines (LLM inference engines for short) have become a fundamental component of modern AI infrastructure, enabling the deployment of LLM-powered applications (LLM apps) across cloud and local devices. Despite their critical role, LLM inference engines are prone to bugs due to the immense resource demands of LLMs and the complexities of cross-platfo…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Under review

  15. arXiv:2506.09386  [pdf, ps, other]

    hep-ex

    Search for the charmonium weak decays $J/ψ\to D_{s}^{-}ρ^{+}+c.c.$ and $J/ψ\to D_{s}^{-}π^{+}+c.c.$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (705 additional authors not shown)

    Abstract: Based on $(10087\pm44)\times 10^6$ $J/ψ$ events recorded with the BESIII detector, we search for the rare charmonium weak decays $J/ψ\to D_{s}^{-}ρ^{+}+c.c.$ and $J/ψ\to D_{s}^{-}π^{+}+c.c.$ No signal is observed, and upper limits on the branching fractions at the $90\%$ confidence level are set as $\mathcal{B}(J/ψ\to D_{s}^{-}ρ^{+}+c.c.)<8.0\times10^{-7}$ and…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 18 pages, 3 figures

  16. arXiv:2506.09373  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

    Authors: Jiaqi Tang, Yu Xia, Yi-Feng Wu, Yuwei Hu, Yuhui Chen, Qing-Guo Chen, Xiaogang Xu, Xiangyu Wu, Hao Lu, Yanqing Ma, Shiyin Lu, Qifeng Chen

    Abstract: The advent of autonomous agents is transforming interactions with Graphical User Interfaces (GUIs) by employing natural language as a powerful intermediary. Despite the predominance of Supervised Fine-Tuning (SFT) methods in current GUI agents for achieving spatial localization, these methods face substantial challenges due to their limited capacity to accurately perceive positional data. Existing…

    Submitted 15 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  17. arXiv:2506.09349  [pdf, ps, other]

    cs.CL

    OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment

    Authors: Chao-Hong Tan, Qian Chen, Wen Wang, Chong Deng, Qinglin Zhang, Luyao Cheng, Hai Yu, Xin Zhang, Xiang Lv, Tianyu Zhao, Chong Zhang, Yukun Ma, Yafeng Chen, Hui Wang, Jiaqing Liu, Jieping Ye

    Abstract: Recent studies on end-to-end speech generation with large language models (LLMs) have attracted significant community attention, with multiple works extending text-based LLMs to generate discrete speech tokens. Existing approaches primarily fall into two categories: (1) Methods that generate discrete speech tokens independently without incorporating them into the LLM's autoregressive process, resu…

    Submitted 10 June, 2025; originally announced June 2025.

  18. arXiv:2506.09169  [pdf, ps, other]

    cs.RO

    Hearing the Slide: Acoustic-Guided Constraint Learning for Fast Non-Prehensile Transport

    Authors: Yuemin Mao, Bardienus P. Duisterhof, Moonyoung Lee, Jeffrey Ichnowski

    Abstract: Object transport tasks are fundamental in robotic automation, emphasizing the importance of efficient and secure methods for moving objects. Non-prehensile transport can significantly improve transport efficiency, as it enables handling multiple objects simultaneously and accommodating objects unsuitable for parallel-jaw or suction grasps. Existing approaches incorporate constraints based on the C…

    Submitted 10 June, 2025; originally announced June 2025.

  19. arXiv:2506.09092  [pdf, ps, other]

    cs.LG cs.AI

    CUDA-LLM: LLMs Can Write Efficient CUDA Kernels

    Authors: Wentao Chen, Jiace Zhu, Qi Fan, Yehan Ma, An Zou

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in general-purpose code generation. However, generating the code which is deeply hardware-specific, architecture-aware, and performance-critical, especially for massively parallel GPUs, remains a complex challenge. In this work, we explore the use of LLMs for the automated generation and optimization of CUDA programs, with the goal…

    Submitted 10 June, 2025; originally announced June 2025.

  20. arXiv:2506.09046  [pdf, ps, other]

    cs.LG cs.AI cs.MA

    Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

    Authors: Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma

    Abstract: Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design,…

    Submitted 10 June, 2025; originally announced June 2025.

  21. arXiv:2506.09025  [pdf, ps, other]

    cond-mat.stat-mech math.DS

    Mixed phases in feedback Ising models

    Authors: Yi-Ping Ma, Ivan Sudakow, P. L. Krapivsky

    Abstract: We study mean-field Ising models whose coupling depends on the magnetization via a feedback function. We identify mixed phases (MPs) and show that they can be stable at zero temperature for sufficiently strong feedback. Moreover, stable MPs are always super-stable with perturbation decaying linearly in time. We argue that such feedback Ising models (FIMs) provide a useful framework for phase trans…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 7 pages, 3 figures

  22. arXiv:2506.08908  [pdf, ps, other]

    cs.CV

    SkipVAR: Accelerating Visual Autoregressive Modeling via Adaptive Frequency-Aware Skipping

    Authors: Jiajun Li, Yue Ma, Xinyu Zhang, Qingyan Wei, Songhua Liu, Linfeng Zhang

    Abstract: Recent studies on Visual Autoregressive (VAR) models have highlighted that high-frequency components, or later steps, in the generation process contribute disproportionately to inference latency. However, the underlying computational redundancy involved in these steps has yet to be thoroughly investigated. In this paper, we conduct an in-depth analysis of the VAR inference process and identify two…

    Submitted 10 July, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  23. arXiv:2506.08797  [pdf, ps, other]

    cs.CV

    HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation

    Authors: Ziyao Huang, Zixiang Zhou, Juan Cao, Yifeng Ma, Yi Chen, Zejing Rao, Zhiyong Xu, Hongmei Wang, Qin Lin, Yuan Zhou, Qinglin Lu, Fan Tang

    Abstract: To address key limitations in human-object interaction (HOI) video generation -- specifically the reliance on curated motion data, limited generalization to novel objects/scenarios, and restricted accessibility -- we introduce HunyuanVideo-HOMA, a weakly conditioned multimodal-driven framework. HunyuanVideo-HOMA enhances controllability and reduces dependency on precise inputs through sparse, deco…

    Submitted 10 June, 2025; originally announced June 2025.

  24. arXiv:2506.08594  [pdf, ps, other]

    quant-ph cond-mat.dis-nn cs.AI cs.LG

    Solving excited states for long-range interacting trapped ions with neural networks

    Authors: Yixuan Ma, Chang Liu, Weikang Li, Shun-Yao Zhang, L. -M. Duan, Yukai Wu, Dong-Ling Deng

    Abstract: The computation of excited states in strongly interacting quantum many-body systems is of fundamental importance. Yet, it is notoriously challenging due to the exponential scaling of the Hilbert space dimension with the system size. Here, we introduce a neural network-based algorithm that can simultaneously output multiple low-lying excited states of a quantum many-body spin system in an accurate…

    Submitted 10 June, 2025; originally announced June 2025.

  25. arXiv:2506.08576  [pdf, ps, other]

    hep-ex

    Measurement of the $η$ transition form factor through $η' \rightarrow π^+π^-η$ decay

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (680 additional authors not shown)

    Abstract: Based on a sample of $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events collected at BESIII, the transition form factor of the $η$ meson is extracted by analyzing $J/ψ\toγη',~η'\toπ^+π^-η,~η\toγl^+l^-$ ($l$=$e$, $μ$) events. The measured slope of the transition form factor is $Λ^{-2}=1.645\pm0.093_{\rm stat.}\pm {0.024_{\rm sys.}}$ (GeV/$c^2$)$^{-2}$ for the di-electron channel and…

    Submitted 10 June, 2025; originally announced June 2025.

  26. arXiv:2506.08516  [pdf, ps, other]

    cs.LG

    NeurIPS 2024 ML4CFD Competition: Results and Retrospective Analysis

    Authors: Mouadh Yagoubi, David Danan, Milad Leyli-Abadi, Ahmed Mazari, Jean-Patrick Brunet, Abbas Kabalan, Fabien Casenave, Yuxin Ma, Giovanni Catalani, Jean Fesquet, Jacob Helwig, Xuan Zhang, Haiyang Yu, Xavier Bertrand, Frederic Tost, Michael Baurheim, Joseph Morlier, Shuiwang Ji

    Abstract: The integration of machine learning (ML) into the physical sciences is reshaping computational paradigms, offering the potential to accelerate demanding simulations such as computational fluid dynamics (CFD). Yet, persistent challenges in accuracy, generalization, and physical consistency hinder the practical deployment of ML models in scientific domains. To address these limitations and systemati…

    Submitted 10 June, 2025; originally announced June 2025.

  27. arXiv:2506.08502  [pdf, ps, other]

    physics.atom-ph cond-mat.quant-gas quant-ph

    Topological Invariants in Nonlinear Thouless Pumping of Solitons

    Authors: Fei-Fei Wu, Xian-Da Zuo, Qing-Qing Zhu, Tao Yuan, Yi-Yi Mao, Chao Zeng, Yi Jiang, Yu-Ao Chen, Jian-Wei Pan, Wei Zheng, Han-Ning Dai

    Abstract: Recent explorations of quantized soliton transport in optical waveguides have thrust nonlinear topological pumping into the spotlight. In this work, we introduce a unified topological invariant applicable across both weakly and strongly nonlinear regimes. In the weak nonlinearity regime, where the nonlinear bands are well-separated, the invariant reduces to the Abelian Chern number of the occupied…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 9 pages, 8 figures

  28. arXiv:2506.08466  [pdf, ps, other]

    gr-qc hep-th

    Universality on thermodynamic relation with corrections in Einstein-Bel-Robinson gravity Black hole

    Authors: Hai-Long Zhen, Huai-Fan Li, Yu-Bo Ma

    Abstract: The generalized thermodynamic extremum relation, as proposed by Goon and Penco, establishes a novel theoretical framework for the study of spacetime thermodynamics. However, extant investigations generally assume that the black hole state parameter is solely a first-order function of the perturbation parameter when exploring the Goon-Penco relation in diverse spacetime contexts. An analytic expres…

    Submitted 10 June, 2025; originally announced June 2025.

  29. arXiv:2506.08424  [pdf, ps, other]

    cs.AI

    SHIELD: Multi-task Multi-distribution Vehicle Routing Solver with Sparsity and Hierarchy

    Authors: Yong Liang Goh, Zhiguang Cao, Yining Ma, Jianan Zhou, Mohammed Haroon Dupty, Wee Sun Lee

    Abstract: Recent advances toward foundation models for routing problems have shown great potential of a unified deep model for various VRP variants. However, they overlook the complex real-world customer distributions. In this work, we advance the Multi-Task VRP (MTVRP) setting to the more realistic yet challenging Multi-Task Multi-Distribution VRP (MTMDVRP) setting, and introduce SHIELD, a novel model that…

    Submitted 11 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted at the 42nd International Conference on Machine Learning (ICML)

  30. arXiv:2506.08029  [pdf, ps, other]

    eess.SY cs.AI cs.LG

    Inverse Design in Distributed Circuits Using Single-Step Reinforcement Learning

    Authors: Jiayu Li, Masood Mortazavi, Ning Yan, Yihong Ma, Reza Zafarani

    Abstract: The goal of inverse design in distributed circuits is to generate near-optimal designs that meet a desirable transfer function specification. Existing design exploration methods use some combination of strategies involving artificial grids, differentiable evaluation procedures, and specific template topologies. However, real-world design practices often require non-differentiable evaluation proced…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: A briefer version of this paper was accepted as a Work-in-Progress (WIP) at the Design Automation Conference (DAC) 2024

  31. arXiv:2506.08011  [pdf, ps, other]

    cs.CV cs.CL

    Play to Generalize: Learning to Reason Through Game Play

    Authors: Yunfei Xie, Yinsong Ma, Shiyi Lan, Alan Yuille, Junfei Xiao, Chen Wei

    Abstract: Developing generalizable reasoning capabilities in multimodal large language models (MLLMs) remains challenging. Motivated by cognitive science literature suggesting that gameplay promotes transferable cognitive skills, we propose a novel post-training paradigm, Visual Game Learning, or ViGaL, where MLLMs develop out-of-domain generalization of multimodal reasoning through playing arcade-like game…

    Submitted 4 July, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: Project Page: https://yunfeixie233.github.io/ViGaL/

  32. arXiv:2506.07964  [pdf, ps, other]

    cs.CV cs.AI

    SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design

    Authors: Wenxin Tang, Jingyu Xiao, Wenxuan Jiang, Xi Xiao, Yuhang Wang, Xuxin Tang, Qing Li, Yuehe Ma, Junliang Liu, Shisong Tang, Michael R. Lyu

    Abstract: Manual slide creation is labor-intensive and requires expert prior knowledge. Existing natural language-based LLM generation methods struggle to capture the visual and structural nuances of slide designs. To address this, we formalize the Reference Image to Slide Generation task and propose Slide2Code, the first benchmark with difficulty-tiered samples based on a novel Slide Complexity Metric. We…

    Submitted 9 June, 2025; originally announced June 2025.

  33. arXiv:2506.07907  [pdf, ps, other]

    hep-ex

    A novel measurement of the strong-phase difference between $D^0\to K^-π^+$ and $\bar{D}^0\to K^-π^+$ decays using $C$-even and $C$-odd quantum-correlated $D\bar{D}$ pairs

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (707 additional authors not shown)

    Abstract: A novel measurement technique of strong-phase differences between the decay amplitudes of $D^0$ and $\bar{D}^0$ mesons is introduced which exploits quantum-correlated $D\bar{D}$ pairs produced by $e^+e^-$ collisions at energies above the $ψ(3770)$ production threshold, where $D\bar{D}$ pairs are produced in both even and odd eigenstates of the charge-conjugation symmetry. Employing this technique,…

    Submitted 10 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  34. arXiv:2506.07906  [pdf, ps, other]

    hep-ex

    First observation of quantum correlations in $e^+e^-\to XD\bar{D}$ and $C$-even constrained $D\bar{D}$ pairs

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (707 additional authors not shown)

    Abstract: The study of meson pairs produced with quantum correlations gives direct access to parameters that are challenging to measure in other systems. In this Letter, the existence of quantum correlations due to charge-conjugation symmetry $C$ is demonstrated in $D\bar{D}$ pairs produced through the processes $e^+e^-\to D\bar{D}$, $e^+e^- \to D^{*}\bar{D}$, and $e^+e^- \to D^{*} \bar{D}^*$, where the la…

    Submitted 10 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  35. arXiv:2506.07879  [pdf, ps, other]

    hep-ex

    Measurement of the CP asymmetry in $D^+ \to π^+ π^0$ decays at Belle II

    Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, K. Amos, M. Angelsmark, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, N. K. Baghel, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett , et al. (380 additional authors not shown)

    Abstract: We measure the CP asymmetry in $D^+ \to π^+ π^0$ decays reconstructed in $e^+ e^-$ collisions at the Belle II experiment using a data set corresponding to an integrated luminosity of 428 fb$^{-1}$. A control sample of $D^+ \to π^+ K_{S}$ decays is used to correct for detection and production asymmetries. The result, $A_{CP}(D^+ \to π^+π^0) =(-1.8 \pm 0.9 \pm 0.1)\%$, where the first uncertainty is…

    Submitted 9 June, 2025; originally announced June 2025.

    Report number: Belle II Preprint 2025-012, KEK Preprint 2025-10

  36. arXiv:2506.07818  [pdf, ps, other]

    cs.CL

    WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code

    Authors: Zhiyu Lin, Zhengda Zhou, Zhiyuan Zhao, Tianrui Wan, Yilun Ma, Junyu Gao, Xuelong Li

    Abstract: With the rapid advancement of Generative AI technology, Multimodal Large Language Models (MLLMs) have the potential to act as AI software engineers capable of executing complex web application development. Considering that the model requires a confluence of multidimensional sub-capabilities to address the challenges of various development phases, constructing a multi-view evaluation framework is cr…

    Submitted 9 June, 2025; originally announced June 2025.

  37. arXiv:2506.07491  [pdf, ps, other]

    cs.CV

    SpatialLM: Training Large Language Models for Structured Indoor Modeling

    Authors: Yongsen Mao, Junhao Zhong, Chuan Fang, Jia Zheng, Rui Tang, Hao Zhu, Ping Tan, Zihan Zhou

    Abstract: SpatialLM is a large language model designed to process 3D point cloud data and generate structured 3D scene understanding outputs. These outputs include architectural elements like walls, doors, windows, and oriented object boxes with their semantic categories. Unlike previous methods which exploit task-specific network designs, our model adheres to the standard multimodal LLM architecture and is…

    Submitted 9 June, 2025; originally announced June 2025.

  38. arXiv:2506.07335  [pdf, ps, other]

    cs.CL cs.AI

    Improving LLM Reasoning through Interpretable Role-Playing Steering

    Authors: Anyi Wang, Dong Shu, Yifan Wang, Yunpu Ma, Mengnan Du

    Abstract: Role-playing has emerged as an effective technique for enhancing the reasoning capabilities of large language models (LLMs). However, existing methods primarily rely on prompt engineering, which often lacks stability and interpretability. In this paper, we introduce Sparse Autoencoder Role-Playing Steering (SRPS), a novel framework that identifies and manipulates internal model features associated…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 21 pages, 8 figures, 8 tables

  39. arXiv:2506.06366  [pdf, ps, other]

    q-bio.NC cs.CY cs.MA

    AI Agent Behavioral Science

    Authors: Lin Chen, Yunke Zhang, Jie Feng, Haoye Chai, Honglin Zhang, Bingbing Fan, Yibo Ma, Shiyuan Zhang, Nian Li, Tianhui Liu, Nicholas Sukiennik, Keyu Zhao, Yu Li, Ziyi Liu, Fengli Xu, Yong Li

    Abstract: Recent advances in large language models (LLMs) have enabled the development of AI agents that exhibit increasingly human-like behaviors, including planning, adaptation, and social dynamics across diverse, interactive, and open-ended scenarios. These behaviors are not solely the product of the internal architectures of the underlying models, but emerge from their integration into agentic systems o…

    Submitted 12 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

  40. arXiv:2506.06155  [pdf, ps, other]

    cs.CV cs.LG

    Fine-grained Hierarchical Crop Type Classification from Integrated Hyperspectral EnMAP Data and Multispectral Sentinel-2 Time Series: A Large-scale Dataset and Dual-stream Transformer Method

    Authors: Wenyuan Li, Shunlin Liang, Yuxiang Zhang, Liqin Liu, Keyan Chen, Yongzhe Chen, Han Ma, Jianglei Xu, Yichuan Ma, Shikang Guan, Zhenwei Shi

    Abstract: Fine-grained crop type classification serves as the fundamental basis for large-scale crop mapping and plays a vital role in ensuring food security. It requires simultaneous capture of both phenological dynamics (obtained from multi-temporal satellite data like Sentinel-2) and subtle spectral variations (demanding nanometer-scale spectral resolution from hyperspectral imagery). Research combining…

    Submitted 9 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

    Comments: 27 pages, 12 figures

  41. arXiv:2506.05761  [pdf, ps, other]

    hep-ex

    Observation of $D^+\to K^0_Sπ^0μ^+ν_μ$, Test of Lepton Flavor Universality and First Angular Analysis of $D^+\to \bar{K}^\ast(892)^0\ell^+ν_\ell$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (696 additional authors not shown)

    Abstract: We report a study of the semileptonic decays $D^+\to K_S^0π^0\ell^+ν_\ell$ ($\ell = e, μ$) based on $20.3\,\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The $D^+\to K_S^0π^0μ^+ν_μ$ decay is observed for the first time, with a branching fraction of $(0.896\pm0.017_{\rm stat}\pm0.008_{\rm syst})\%$, and the branching frac…

    Submitted 6 June, 2025; originally announced June 2025.

  42. arXiv:2506.05554  [pdf, ps, other]

    cs.CV

    EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh

    Authors: Tao Hu, Haoyang Peng, Xiao Liu, Yuewen Ma

    Abstract: Generating high-quality camera-controllable videos from monocular input is a challenging task, particularly under extreme viewpoints. Existing methods often struggle with geometric inconsistencies and occlusion artifacts at boundaries, leading to degraded visual quality. In this paper, we introduce EX-4D, a novel framework that addresses these challenges through a Depth Watertight Mesh representati…

    Submitted 5 June, 2025; originally announced June 2025.

  43. arXiv:2506.05507  [pdf, other]

    hep-ex hep-th nucl-ex quant-ph

    Challenging Spontaneous Quantum Collapse with XENONnT

    Authors: E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, D. Antón Martin, S. R. Armbruster, F. Arneodo, L. Baudis, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, K. Boese, A. Brown, G. Bruno, R. Budnik, C. Cai, C. Capelli, J. M. R. Cardoso, A. P. Cimental Chávez, A. P. Colijn, J. Conrad , et al. (152 additional authors not shown)

    Abstract: We report on the search for X-ray radiation as predicted from dynamical quantum collapse with low-energy electronic recoil data in the energy range of 1-140 keV from the first science run of the XENONnT dark matter detector. Spontaneous radiation is an unavoidable effect of dynamical collapse models, which were introduced as a possible solution to the long-standing measurement problem in quantum m…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 7 pages, 3 figures

  44. arXiv:2506.05207  [pdf, ps, other]

    cs.CV

    Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning

    Authors: Yue Ma, Yulong Liu, Qiyuan Zhu, Ayden Yang, Kunyu Feng, Xinhua Zhang, Zhifeng Li, Sirui Han, Chenyang Qi, Qifeng Chen

    Abstract: Recently, breakthroughs in the video diffusion transformer have shown remarkable capabilities in diverse motion generation. For the motion-transfer task, current methods mainly use two-stage Low-Rank Adaptation (LoRA) finetuning to obtain better performance. However, existing adaptation-based motion transfer still suffers from motion inconsistency and tuning inefficiency when applied to larg…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: project page: https://follow-your-motion.github.io/

  45. arXiv:2506.05055  [pdf, ps, other]

    hep-ex

    Study of $f_1(1420)$ and $η(1405)$ in the decay $J/ψ\to γπ^{0}π^{0}π^{0}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (650 additional authors not shown)

    Abstract: A partial-wave analysis is performed on the decay $J/ψ\toγπ^{0}π^{0}π^{0}$ within the $π^{0}π^{0}π^{0}$ invariant-mass region below 1.6 GeV$/c^{2}$, using $(10.09~\pm~0.04)\times10^{9} ~J/ψ$ events collected with the BESIII detector. Significant isospin-violating decays of $η(1405)$ and $f_1(1420)$ into $f_0(980)π^{0}$ are observed. For the first time, three axial-vectors, $f_1(1285)$,…

    Submitted 7 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  46. arXiv:2506.05019  [pdf, ps, other]

    cs.CE

    FinMultiTime: A Four-Modal Bilingual Dataset for Financial Time-Series Analysis

    Authors: Wenyan Xu, Dawei Xiang, Yue Liu, Xiyu Wang, Yanxiang Ma, Liang Zhang, Chang Xu, Jiaheng Zhang

    Abstract: Pure time series forecasting tasks typically focus exclusively on numerical features; however, real-world financial decision-making demands the comparison and analysis of heterogeneous sources of information. Recent advances in deep learning and large language models (LLMs) have made significant strides in capturing sentiment and other qualitative signals, thereby enhancing the accuracy of f…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Under review

  47. arXiv:2506.04997  [pdf, ps, other]

    cs.IR cs.CL

    Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings

    Authors: Yubo Ma, Jinsong Li, Yuhang Zang, Xiaobao Wu, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Haodong Duan, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Despite the strong performance of ColPali/ColQwen2 in Visualized Document Retrieval (VDR), it encodes each page into multiple patch-level embeddings, which leads to excessive memory usage. This empirical study investigates methods to reduce the number of patch embeddings per page with minimal performance degradation. We evaluate two token-reduction strategies: token pruning and token merging. Regarding token pruning…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025 findings

  48. arXiv:2506.04590  [pdf, ps, other]

    cs.CV

    Follow-Your-Creation: Empowering 4D Creation through Video Inpainting

    Authors: Yue Ma, Kunyu Feng, Xinhua Zhang, Hongyu Liu, David Junhao Zhang, Jinbo Xing, Yinhan Zhang, Ayden Yang, Zeyu Wang, Qifeng Chen

    Abstract: We introduce Follow-Your-Creation, a novel 4D video creation framework capable of both generating and editing 4D content from a single monocular video input. By leveraging a powerful video inpainting foundation model as a generative prior, we reformulate 4D video creation as a video inpainting task, enabling the model to fill in missing content caused by camera trajectory changes or user edits. To…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Project Page: https://follow-your-creation.github.io/

  49. arXiv:2506.04560  [pdf, ps, other]

    math.PR

    Universality of convergence rate of rightmost eigenvalue of complex IID random matrices

    Authors: Yutao Ma, Xinchen Hu

    Abstract: Let $X$ be an $n\times n$ matrix with independent and identically distributed (i.i.d.) entries $x_{ij} \stackrel{\text { d }}{=} n^{-1 / 2} ξ$ with $ξ$ being a complex random variable of mean zero and variance one. Let $\{σ_i\}_{1\le i\le n}$ be the eigenvalues of $X,$ and $R_n:=\max_i \Re σ_i$ and $Z_n$ be some rescaled version of $R_n.$ It was proved that $Z_n$ converges weakly to the Gumbel dis…

    Submitted 8 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    MSC Class: 60G70; 60B20; 60B10

  50. arXiv:2506.04355  [pdf, ps, other]

    hep-ex

    Charged-hadron identification at Belle II

    Authors: Belle II Collaboration, I. Adachi, H. Ahmed, Y. Ahn, H. Aihara, N. Akopov, A. Albert, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, M. Angelsmark, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, N. K. Baghel, S. Bahinipati , et al. (386 additional authors not shown)

    Abstract: The Belle II experiment's ability to identify particles critically affects the sensitivity of its measurements. We describe Belle II's algorithms for identifying charged particles and evaluate their performance in separating pions, kaons, and protons using 426 fb$^{-1}$ of data collected at the energy-asymmetric $e^+e^-$ collider SuperKEKB in 2019--2022 at center-of-mass energies at and near the m…

    Submitted 10 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    Comments: 29 pages, 14 figures

    Report number: Belle II Preprint 2025-016, KEK Preprint 2025-15