Search | arXiv e-print repository

Heterodyne detection of low-frequency fields via Rydberg EIT with phase demodulation

Authors: Shenchao Jin, Xiayang Fan, Xin Wang, Yi Song, Yuan Sun

Abstract: Recent rapid progress in quantum sensing research shows that Rydberg atoms have great potential as high-precision, centimeter-scale antennas for low-frequency fields. To facilitate efficient and reliable detection of low-frequency fields via Rydberg atoms, we design, implement and analyze a specialized yet low-cost and scalable method based on heterodyning under electromagnetically induced transparency (EIT) in a typical two-photon ground-Rydberg transition. Instead of observing changes in the absorption of light by Rydberg atoms, our method focuses on the phase modulation that the low-frequency fields impose on the probe laser via the Rydberg EIT mechanism, and uses a demodulation process to accurately retrieve the signal. The general principles of our method apply to both electric and magnetic fields, and both functionalities can even be combined in the same apparatus. We experimentally demonstrate the full cycle of operations for both cases. In the measurement of low-frequency electric fields, we find that the Rydberg dipole-dipole interaction among atoms induces a linear superposition of Rydberg states with different angular momenta, which generates a first-order response corresponding to the signature of the linear Stark effect. Since Rydberg atoms couple strongly to electric fields, our results indicate that the method may reach high-precision performance in practical tasks in the future.

Submitted 30 May, 2025; originally announced May 2025.

Comments: 5 figures
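The abstract above describes recovering the field signal by demodulating the phase modulation imprinted on the probe light. As a rough illustration only, the Python sketch below performs generic digital I/Q demodulation of a beat note. It assumes (our assumption, not the authors' setup) that the heterodyne beat note has been digitized at a known intermediate frequency `f_if` and sample rate `f_s`, and that the phase is roughly constant within each demodulation window; the function name `iq_phase_demodulate` and all numbers are hypothetical.

```python
import math


def iq_phase_demodulate(samples, f_if, f_s, periods_per_window):
    """Estimate phi in s(t) = cos(2*pi*f_if*t + phi), one value per window.

    Assumes f_s / f_if is an integer and each window spans a whole number of
    carrier periods, so the terms at twice the carrier frequency average out.
    """
    n_win = int(round(f_s / f_if * periods_per_window))
    phases = []
    for start in range(0, len(samples) - n_win + 1, n_win):
        i_sum = 0.0
        q_sum = 0.0
        for k in range(start, start + n_win):
            wt = 2.0 * math.pi * f_if * k / f_s
            i_sum += samples[k] * math.cos(wt)  # mix with in-phase reference
            q_sum += samples[k] * math.sin(wt)  # mix with quadrature reference
        # After averaging, I ~ cos(phi)/2 and Q ~ -sin(phi)/2.
        phases.append(math.atan2(-q_sum, i_sum))
    return phases


# Hypothetical test signal: 1 kHz beat note sampled at 50 kHz whose phase
# steps between three values, one per 10-period (500-sample) window.
f_if, f_s = 1_000.0, 50_000.0
true_phases = [0.1, -0.5, 1.2]
n_win = 500
samples = [
    math.cos(2.0 * math.pi * f_if * k / f_s + phi)
    for w, phi in enumerate(true_phases)
    for k in range(w * n_win, (w + 1) * n_win)
]
recovered = iq_phase_demodulate(samples, f_if, f_s, periods_per_window=10)
print(recovered)
```

Making each window span a whole number of carrier periods lets a plain average act as the low-pass filter; a practical receiver would more likely use a dedicated low-pass filter and track a continuously varying phase.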

arXiv:2505.24241 [pdf, ps, other]

Advantageous Parameter Expansion Training Makes Better Large Language Models

Authors: Naibin Gu, Yilong Chen, Zhenyu Zhang, Peng Fu, Zheng Lin, Shuohuan Wang, Yu Sun, Hua Wu, Weiping Wang, Haifeng Wang

Abstract: Although scaling up the number of trainable parameters in both pre-training and fine-tuning can effectively improve the performance of large language models, it also leads to increased computational overhead. Examining parameter differences, we find that a subset of parameters, termed advantageous parameters, plays a crucial role in determining model performance. Further analysis reveals that stronger models tend to possess more such parameters. In this paper, we propose Advantageous Parameter EXpansion Training (APEX), a method that progressively expands advantageous parameters into the space of disadvantageous ones, thereby increasing their proportion and enhancing training effectiveness. Further theoretical analysis from the perspective of matrix effective rank explains the performance gains of APEX. Extensive experiments on both instruction tuning and continued pre-training demonstrate that, in instruction tuning, APEX outperforms full-parameter tuning while using only 52% of the trainable parameters. In continued pre-training, APEX achieves the same perplexity level as conventional training with just 33% of the training data, and yields significant improvements on downstream tasks.

Submitted 30 May, 2025; originally announced May 2025.
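The abstract attributes APEX's gains to an analysis based on matrix effective rank but does not state which definition is used. Assuming the common entropy-based definition of Roy and Vetterli (the exponential of the Shannon entropy of the normalized singular values), a minimal sketch looks like this; the singular values of a weight matrix `W` could be obtained with, for example, `numpy.linalg.svd(W, compute_uv=False)`. The function name is hypothetical.

```python
import math


def effective_rank(singular_values):
    """Entropy-based effective rank (Roy and Vetterli):
    exp(-sum p_i log p_i) with p_i = s_i / sum_j s_j."""
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("need at least one positive singular value")
    entropy = 0.0
    for s in singular_values:
        p = s / total
        if p > 0:  # treat 0 * log(0) as 0
            entropy -= p * math.log(p)
    return math.exp(entropy)


print(effective_rank([1.0, 1.0, 1.0, 1.0]))  # flat spectrum: close to 4
print(effective_rank([5.0, 0.0, 0.0]))       # rank-one spectrum: 1
print(effective_rank([3.0, 1.0]))            # somewhere between 1 and 2
```

Unlike the integer rank, this quantity varies smoothly with how evenly energy is spread across singular directions, which is why it is a natural lens for comparing weight matrices.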

arXiv:2505.24164 [pdf, ps, other]

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

Authors: Shilin Xu, Yanwei Li, Rui Yang, Tao Zhang, Yueyi Sun, Wei Chow, Linfeng Li, Hang Song, Qi Xu, Yunhai Tong, Xiangtai Li, Hao Fei

Abstract: Recent work on large language models (LLMs) has demonstrated that reasoning capabilities can emerge via reinforcement learning (RL). Although recent efforts apply group relative policy optimization (GRPO) to post-train multimodal LLMs (MLLMs), each typically focuses on a single aspect, such as grounding tasks, math problems, or chart analysis. No existing work leverages multi-source MLLM tasks for stable reinforcement learning. In this work, we address this problem from a unified perspective. We present Mixed-R1, a unified yet straightforward framework that comprises a mixed reward function design (Mixed-Reward) and a mixed post-training dataset (Mixed-45K). We first design a data engine that selects high-quality examples to build the Mixed-45K post-training dataset. We then present the Mixed-Reward design, which contains different reward functions for different MLLM tasks. Specifically, it has four reward functions: a matching reward for binary-answer or multiple-choice problems, a chart reward for chart-aware datasets, an IoU reward for grounding problems, and an open-ended reward for long-form text responses such as caption datasets. To handle diverse long-form text content, we propose a new open-ended reward named Bidirectional Max-Average Similarity (BMAS), which matches tokenizer embeddings between the generated response and the ground truth. Extensive experiments show the effectiveness of our method on various MLLMs, including Qwen2.5-VL and Intern-VL at various sizes. Our dataset and model are available at https://github.com/xushilin1/mixed-r1.

Submitted 29 May, 2025; originally announced May 2025.

Report number: arxiv:2505.24164
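Two of the four Mixed-Reward components are simple enough to sketch. The snippet below is a hypothetical illustration, not the released Mixed-R1 code: a matching reward that assumes case-insensitive exact matching of the final answer, and an IoU reward that assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates and uses the raw IoU as the reward. The chart and BMAS rewards are not sketched because the abstract gives too little detail.

```python
def matching_reward(prediction, answer):
    """Hypothetical matching reward: 1.0 for a case-insensitive exact match."""
    return 1.0 if prediction.strip().lower() == answer.strip().lower() else 0.0


def box_iou(a, b):
    """IoU of axis-aligned boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_reward(pred_box, gt_box):
    """Hypothetical IoU reward: the raw IoU in [0, 1] (no thresholding)."""
    return box_iou(pred_box, gt_box)


print(matching_reward(" B ", "b"))
print(iou_reward((0, 0, 2, 2), (1, 0, 3, 2)))
```

Using the continuous IoU rather than a hit/miss threshold gives the policy a graded signal for partially correct boxes; whether Mixed-R1 thresholds or rescales it is not stated in the abstract.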

arXiv:2505.22140 [pdf, other]

Search for a dark baryon in the $Ξ^-\rightarrowπ^-+{\rm invisible}$ decay

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (697 additional authors not shown)

Abstract: A search for a dark baryon is performed for the first time in the two-body decay $Ξ^-\rightarrowπ^-+{\rm invisible}$ using $(10.087\pm0.044)\times10^{9}$ $J/ψ$ events collected at a center-of-mass energy of $\sqrt{s}=3.097\,\mbox{GeV}$ with the BESIII detector at the BEPCII collider. No significant signal is observed, and the 90% (95%) confidence level upper limits on the branching fraction $B(Ξ^-\rightarrowπ^-+{\rm invisible})$ are determined to be $4.2\times10^{-5}$ ($5.2\times10^{-5}$), $6.9\times10^{-5}$ ($8.4\times10^{-5}$), $6.5\times10^{-4}$ ($7.6\times10^{-4}$), $1.1\times10^{-4}$ ($1.3\times10^{-4}$) and $4.5\times10^{-5}$ ($5.5\times10^{-5}$), under the dark baryon mass hypotheses of 1.07$\,\mbox{GeV}/c^2$, 1.10$\,\mbox{GeV}/c^2$, $m_Λ$ (1.116$\,\mbox{GeV}/c^2$), 1.13$\,\mbox{GeV}/c^2$, and 1.16$\,\mbox{GeV}/c^2$, respectively. The constraints obtained on the Wilson coefficients $C_{u s, s}^L$ and $C_{u s, s}^R$ are more stringent than the previous limits derived from the LHC searches for the colored mediators.

Submitted 28 May, 2025; originally announced May 2025.

Comments: 11 pages, 4 figures, 1 table

arXiv:2505.21946 [pdf, ps, other]

doi 10.1145/3731198

Fluid Simulation on Vortex Particle Flow Maps

Authors: Sinan Wang, Junwei Zhou, Fan Feng, Zhiqi Li, Yuchen Sun, Duowen Chen, Greg Turk, Bo Zhu

Abstract: We propose the Vortex Particle Flow Map (VPFM) method to simulate incompressible flow with complex vortical evolution in the presence of dynamic solid boundaries. The core insight of our approach is that vorticity is an ideal quantity for evolution on particle flow maps, enabling significantly longer flow map distances compared to other fluid quantities like velocity or impulse. To achieve this goal, we developed a hybrid Eulerian-Lagrangian representation that evolves vorticity and flow map quantities on vortex particles, while reconstructing velocity on a background grid. The method integrates three key components: (1) a vorticity-based particle flow map framework, (2) an accurate Hessian evolution scheme on particles, and (3) a solid boundary treatment for no-through and no-slip conditions in VPFM. These components collectively allow a substantially longer flow map length (3-12 times longer) than the state-of-the-art, enhancing vorticity preservation over extended spatiotemporal domains. We validated the performance of VPFM through diverse simulations, demonstrating its effectiveness in capturing complex vortex dynamics and turbulence phenomena.

Submitted 27 May, 2025; originally announced May 2025.

Comments: ACM Transactions on Graphics (SIGGRAPH 2025), 24 pages

arXiv:2505.21502 [pdf, ps, other]

Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis

Authors: Yipengjing Sun, Chenyang Wang, Shunyuan Zheng, Zonglin Li, Shengping Zhang, Xiangyang Ji

Abstract: We propose GRGS, a generalizable and relightable 3D Gaussian framework for high-fidelity human novel view synthesis under diverse lighting conditions. Unlike existing methods that rely on per-character optimization or ignore physical constraints, GRGS adopts a feed-forward, fully supervised strategy that projects geometry, material, and illumination cues from multi-view 2D observations into 3D Gaussian representations. Specifically, to reconstruct lighting-invariant geometry, we introduce a Lighting-aware Geometry Refinement (LGR) module trained on synthetically relit data to predict accurate depth and surface normals. Based on the high-quality geometry, a Physically Grounded Neural Rendering (PGNR) module is further proposed to integrate neural prediction with physics-based shading, supporting editable relighting with shadows and indirect illumination. In addition, we design a 2D-to-3D projection training scheme that leverages differentiable supervision from ambient occlusion, direct, and indirect lighting maps, which alleviates the computational cost of explicit ray tracing. Extensive experiments demonstrate that GRGS achieves superior visual quality, geometric consistency, and generalization across characters and lighting conditions.

Submitted 27 May, 2025; originally announced May 2025.

Comments: Project Webpage: https://sypj-98.github.io/grgs/

arXiv:2505.21277 [pdf, ps, other]

Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space

Authors: Yao Huang, Yitong Sun, Shouwei Ruan, Yichi Zhang, Yinpeng Dong, Xingxing Wei

Abstract: Large Language Models (LLMs), despite advanced general capabilities, still suffer from numerous safety risks, especially jailbreak attacks that bypass safety protocols. Understanding these vulnerabilities through black-box jailbreak attacks, which better reflect real-world scenarios, offers critical insights into model robustness. While existing methods have shown improvements through various prompt engineering techniques, their success against safety-aligned models remains limited, and they overlook a more fundamental problem: their effectiveness is inherently bounded by predefined strategy spaces. However, expanding this space presents significant challenges in both systematically capturing essential attack patterns and efficiently navigating the increased complexity. To better explore the potential of expanding the strategy space, we address these challenges through a novel framework that decomposes jailbreak strategies into essential components based on the Elaboration Likelihood Model (ELM) theory and develops genetic-based optimization with intention evaluation mechanisms. Strikingly, our experiments reveal unprecedented jailbreak capabilities by expanding the strategy space: we achieve over 90% success rate on Claude-3.5 where prior methods completely fail, while demonstrating strong cross-model transferability and surpassing specialized safeguard models in evaluation accuracy. The code is open-sourced at: https://github.com/Aries-iai/CL-GSO.

Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

Comments: 19 pages, 20 figures, accepted by ACL 2025, Findings

arXiv:2505.20771 [pdf, ps, other]

Bridging the Gap: Self-Optimized Fine-Tuning for LLM-based Recommender Systems

Authors: Heng Tang, Feng Liu, Xinbo Chen, Jiawei Chen, Bohao Wang, Changwang Zhang, Jun Wang, Yuegang Sun, Bingde Hu, Can Wang

Abstract: Recent years have witnessed extensive exploration of Large Language Models (LLMs) in the field of Recommender Systems (RS). Two strategies are commonly used to give LLMs recommendation capabilities: 1) the "Guidance-Only" strategy uses in-context learning to exploit and amplify the inherent semantic understanding and item recommendation capabilities of LLMs; 2) the "Tuning-Only" strategy uses supervised fine-tuning (SFT) to fit LLMs to real recommendation data. However, neither strategy effectively bridges the gap between the knowledge space of LLMs and recommendation, and their performance falls short of expectations. To better enable LLMs to learn recommendation knowledge, we combine the advantages of the two strategies and propose a novel "Guidance+Tuning" method called Self-Optimized Fine-Tuning (SOFT), which adopts the idea of curriculum learning. It first employs self-distillation to construct an auxiliary easy-to-learn but meaningful dataset from a fine-tuned LLM. It then uses a self-adaptive curriculum scheduler to let LLMs gradually learn from simpler data (self-distilled data) to more challenging data (real RS data). Extensive experiments demonstrate that SOFT significantly enhances the recommendation accuracy (37.59% on average) of LLM-based methods. The code is available via https://anonymous.4open.science/r/Self-Optimized-Fine-Tuning-264E

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.20602 [pdf, ps, other]

Connecting randomized iterative methods with Krylov subspaces

Authors: Yonghan Sun, Deren Han, Jiaxin Xie

Abstract: Randomized iterative methods, such as the randomized Kaczmarz method, have gained significant attention for solving large-scale linear systems due to their simplicity and efficiency. Meanwhile, Krylov subspace methods have emerged as a powerful class of algorithms, known for their robust theoretical foundations and rapid convergence properties. Despite the individual successes of these two paradigms, their underlying connection has remained largely unexplored. In this paper, we develop a unified framework that bridges randomized iterative methods and Krylov subspace techniques, supported by both rigorous theoretical analysis and practical implementation. The core idea is to formulate each iteration as an adaptively weighted linear combination of the sketched normal vector and previous iterates, with the weights optimally determined via a projection-based mechanism. This formulation not only reveals how subspace techniques can enhance the efficiency of randomized iterative methods, but also enables the design of a new class of iterative-sketching-based Krylov subspace algorithms. We prove that our method converges linearly in expectation and validate our findings with numerical experiments.

Submitted 26 May, 2025; originally announced May 2025.
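The abstract takes the randomized Kaczmarz method as its representative randomized iterative method. For reference, here is a minimal sketch of the classical variant with row sampling proportional to squared row norms (the Strohmer-Vershynin scheme), assuming a consistent system. It is not the paper's Krylov-enhanced framework, which additionally combines the sketched normal vector with previous iterates using projection-determined weights; the function name and test system are illustrative.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Classical randomized Kaczmarz for a consistent system A x = b.

    Row i is sampled with probability ||a_i||^2 / ||A||_F^2, and the iterate
    is projected onto the hyperplane {z : a_i . z = b_i}.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    row_norms_sq = [sum(v * v for v in row) for row in A]
    rows = list(range(len(A)))
    for _ in range(iters):
        i = rng.choices(rows, weights=row_norms_sq, k=1)[0]
        a = A[i]
        residual = b[i] - sum(aj * xj for aj, xj in zip(a, x))
        step = residual / row_norms_sq[i]
        x = [xj + step * aj for xj, aj in zip(x, a)]  # orthogonal projection
    return x


# Small overdetermined but consistent system with solution x* = (1, 2).
A = [[4.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
b = [6.0, 7.0, 6.0]
x = randomized_kaczmarz(A, b)
print(x)
```

Each step touches a single row, which is what makes the method cheap for very large systems; the expected squared error contracts by a factor that depends on the smallest singular value of `A` relative to its Frobenius norm.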

arXiv:2505.20513 [pdf, other]

MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning

Authors: Wenhao Gu, Li Gu, Ching Yee Suen, Yang Wang

Abstract: Recent advancements in handwritten text recognition (HTR) have enabled the effective conversion of handwritten text to digital formats. However, achieving robust recognition across diverse writing styles remains challenging. Traditional HTR methods lack writer-specific personalization at test time due to limitations in model architecture and training strategies. Existing attempts to bridge this gap through gradient-based meta-learning still require labeled examples and suffer from parameter-inefficient fine-tuning, leading to substantial computational and memory overhead. To overcome these challenges, we propose an efficient framework that formulates personalization as prompt tuning, incorporating an auxiliary image reconstruction task with a self-supervised loss to guide prompt adaptation with unlabeled test-time examples. To ensure that the self-supervised loss effectively reduces text recognition error, we leverage meta-learning to learn the optimal initialization of the prompts. As a result, our method allows the model to efficiently capture unique writing styles by updating less than 1% of its parameters and eliminating the need for time-intensive annotation processes. We validate our approach on the RIMES and IAM Handwriting Database benchmarks, where it consistently outperforms previous state-of-the-art methods while using 20x fewer parameters. We believe this represents a significant advancement in personalized handwritten text recognition, paving the way for more reliable and practical deployment in resource-constrained scenarios.

Submitted 26 May, 2025; originally announced May 2025.

Comments: CVPR 2025

arXiv:2505.20510 [pdf, other]

CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic

Authors: Yuxuan Sun, Yixuan Si, Chenglu Zhu, Kai Zhang, Zhongyi Shui, Bowen Ding, Tao Lin, Lin Yang

Abstract: Recent advances in computational pathology have led to the emergence of numerous foundation models. However, these approaches fail to replicate the diagnostic process of pathologists, as they either simply rely on general-purpose encoders with multi-instance learning for classification or directly apply multimodal models to generate reports from images. A significant limitation is their inability to emulate the diagnostic logic employed by pathologists, who systematically examine slides at low magnification for an overview before progressively zooming in on suspicious regions to formulate comprehensive diagnoses. To address this gap, we introduce CPathAgent, an innovative agent-based model that mimics pathologists' reasoning processes by autonomously executing zoom-in/out and navigation operations across pathology images based on observed visual features. To achieve this, we develop a multi-stage training strategy unifying patch-level, region-level, and whole-slide capabilities within a single model, which is essential for mimicking pathologists, who require understanding and reasoning capabilities across all three scales. This approach generates substantially more detailed and interpretable diagnostic reports than existing methods, particularly for understanding huge regions. Additionally, we construct the expert-validated PathMMU-HR$^{2}$, the first benchmark for huge region analysis, a critical intermediate scale between patches and whole slides, as diagnosticians typically examine several key regions rather than entire slides at once. Extensive experiments demonstrate that CPathAgent consistently outperforms existing approaches across three scales of benchmarks, validating the effectiveness of our agent-based diagnostic approach and highlighting a promising direction for the future development of computational pathology.

Submitted 26 May, 2025; originally announced May 2025.

Comments: 49 pages, 33 figures

arXiv:2505.20437 [pdf, ps, other]

Rough backward SDEs with discontinuous Young drivers

Authors: Dirk Becherer, Yuchen Sun

Abstract: We study solutions to backward differential equations that are driven hybridly by a deterministic discontinuous rough path $W$ of finite $q$-variation for $q \in [1, 2)$ and by Brownian motion $B$. To distinguish between integrating jumps in the forward sense and in the Marcus sense, we refer to these equations as forward-type and Marcus-type rough backward stochastic differential equations (RBSDEs), respectively. We establish global well-posedness by proving global a priori bounds for solutions and employing fixed-point arguments locally. Furthermore, we lift the RBSDE solution and the driving rough noise to the space of decorated paths endowed with a Skorokhod-type metric and show stability of solutions with respect to perturbations of the rough noise. Finally, we prove well-posedness for a new class of backward doubly stochastic differential equations (BDSDEs), which are jointly driven by a Brownian martingale $B$ and an independent discontinuous stochastic process $L$ of finite $q$-variation. We explain how our RBSDEs can be understood as conditional solutions to such BDSDEs, conditioned on the information generated by the path of $L$.

Submitted 26 May, 2025; originally announced May 2025.

MSC Class: 60L90; 60J76; 60H20; 60H15; 37H30
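The driving path $W$ above is assumed to have finite $q$-variation for $q \in [1, 2)$. To make that notion concrete, the sketch below computes the $q$-variation $\sup_{\pi} \sum_k |x_{t_{k+1}} - x_{t_k}|^q$ of a real-valued path observed at finitely many times, restricting partitions to the sample times and using an $O(n^2)$ dynamic program. This is a generic illustration with hypothetical function names, not code from the paper; vector-valued paths would replace `abs` with a norm.

```python
def q_variation(path, q):
    """q-variation of a sampled real path over partitions of the sample indices:
    max over 0 = t_0 < t_1 < ... < t_m = n-1 of sum |x[t_{k+1}] - x[t_k]|**q.

    best[j] is the largest such sum over partitions of path[0..j] ending at j.
    """
    if len(path) < 2:
        return 0.0
    best = [0.0] * len(path)
    for j in range(1, len(path)):
        best[j] = max(best[i] + abs(path[j] - path[i]) ** q for i in range(j))
    return best[-1]


def q_variation_norm(path, q):
    """The q-variation seminorm, i.e. the q-th root of q_variation."""
    return q_variation(path, q) ** (1.0 / q)


# Hypothetical piecewise-constant path with two jumps.
jump_path = [0.0, 0.0, 1.0, 1.0, -0.5, -0.5]
for q in (1.0, 1.5, 2.0):
    print(q, q_variation(jump_path, q), q_variation_norm(jump_path, q))
```

For $q > 1$ the supremum can prefer coarse partitions: a monotone path is best covered by a single large increment, which is why $q$-variation is finite for rougher paths than bounded variation ($q = 1$) allows.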

arXiv:2505.20349 [pdf, ps, other]

FD-Bench: A Modular and Fair Benchmark for Data-driven Fluid Simulation

Authors: Haixin Wang, Ruoyan Li, Fred Xu, Fang Sun, Kaiqiao Han, Zijie Huang, Guancheng Wan, Ching Chang, Xiao Luo, Wei Wang, Yizhou Sun

Abstract: Data-driven modeling of fluid dynamics has advanced rapidly with neural PDE solvers, yet fair and strong benchmarking remains fragmented due to the absence of unified PDE datasets and standardized evaluation protocols. Although architectural innovations are abundant, fair assessment is further impeded by the lack of clear disentanglement between spatial, temporal and loss modules. In this paper, we introduce FD-Bench, the first fair, modular, comprehensive and reproducible benchmark for data-driven fluid simulation. FD-Bench systematically evaluates 85 baseline models across 10 representative flow scenarios under a unified experimental setup. It provides four key contributions: (1) a modular design enabling fair comparisons across spatial, temporal, and loss function modules; (2) the first systematic framework for direct comparison with traditional numerical solvers; (3) fine-grained generalization analysis across resolutions, initial conditions, and temporal windows; and (4) a user-friendly, extensible codebase to support future research. Through rigorous empirical studies, FD-Bench establishes the most comprehensive leaderboard to date, resolving long-standing issues in reproducibility and comparability, and laying a foundation for robust evaluation of future data-driven fluid models. The code is open-sourced at https://anonymous.4open.science/r/FD-Bench-15BC.

Submitted 25 May, 2025; originally announced May 2025.

Comments: 31 pages, 18 figures, paper under review

arXiv:2505.20293 [pdf, ps, other]

Enhancing the Comprehensibility of Text Explanations via Unsupervised Concept Discovery

Authors: Yifan Sun, Danding Wang, Qiang Sheng, Juan Cao, Jintao Li

Abstract: Concept-based explainable approaches have emerged as a promising method in explainable AI because they can interpret models in a way that aligns with human reasoning. However, their adaptation to the text domain remains limited. Most existing methods rely on predefined concept annotations and cannot discover unseen concepts, while other methods that extract concepts without supervision often produce explanations that are not intuitively comprehensible to humans, potentially diminishing user trust. These methods fall short of discovering comprehensible concepts automatically. To address this issue, we propose ECO-Concept, an intrinsically interpretable framework to discover comprehensible concepts with no concept annotations. ECO-Concept first utilizes an object-centric architecture to extract semantic concepts automatically. Then the comprehensibility of the extracted concepts is evaluated by large language models. Finally, the evaluation result guides the subsequent model fine-tuning to obtain more understandable explanations. Experiments show that our method achieves superior performance across diverse tasks. Further concept evaluations validate that the concepts learned by ECO-Concept surpass current counterparts in comprehensibility.

Submitted 26 May, 2025; originally announced May 2025.

Comments: ACL 2025 Findings

arXiv:2505.20131 [pdf, ps, other]

MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning

Authors: Yuanxin Zhuang, Dazhong Shen, Ying Sun

Abstract: Molecular editing aims to modify a given molecule to optimize desired chemical properties while preserving structural similarity. However, current approaches typically rely on string-based or continuous representations, which fail to adequately capture the discrete, graph-structured nature of molecules, resulting in limited structural fidelity and poor controllability. In this paper, we propose MolEditRL, a molecular editing framework that explicitly integrates structural constraints with precise property optimization. Specifically, MolEditRL consists of two stages: (1) a discrete graph diffusion model pretrained to reconstruct target molecules conditioned on source structures and natural language instructions; (2) an editing-aware reinforcement learning fine-tuning stage that further enhances property alignment and structural preservation by explicitly optimizing editing decisions under graph constraints. For comprehensive evaluation, we construct MolEdit-Instruct, the largest and most property-rich molecular editing dataset, comprising 3 million diverse examples spanning single- and multi-property tasks across 10 chemical attributes. Experimental results demonstrate that MolEditRL significantly outperforms state-of-the-art methods in both property optimization accuracy and structural fidelity, achieving a 74% improvement in editing success rate while using 98% fewer parameters.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.19907 [pdf, ps, other]

First measurement of $Σ^{+}n\rightarrowΛp$ and $Σ^{+}n\rightarrowΣ^{0}p$ cross-sections via $Σ^+$-nucleus scattering at an electron-positron collider

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (680 additional authors not shown)

Abstract: Using $(1.0087\pm0.0044)\times10^{10}$ $J/\psi$ events collected with the BESIII detector at the BEPCII storage ring, the reactions $\Sigma^{+}n\rightarrow\Lambda p$ and $\Sigma^{+}n\rightarrow\Sigma^{0}p$ are studied, where the $\Sigma^{+}$ baryon is produced in the process $J/\psi\rightarrow\Sigma^{+}\bar{\Sigma}^{-}$ and the neutron is a component of the $^{9}\rm{Be}$, $^{12}\rm{C}$ and $^{197}\rm{Au}$ nuclei in the beam pipe. Clear signals of these two reactions are observed for the first time. Their cross-sections are measured to be $\sigma(\Sigma^{+}+{^{9}\rm{Be}}\rightarrow\Lambda+p+{^{8}\rm{Be}})=(45.2\pm12.1_{\rm{stat}}\pm7.2_{\rm{sys}})$ mb and $\sigma(\Sigma^{+}+{^{9}\rm{Be}}\rightarrow\Sigma^{0}+p+{^{8}\rm{Be}})=(29.8\pm9.7_{\rm{stat}}\pm6.9_{\rm{sys}})$ mb for a $\Sigma^{+}$ average momentum of $0.992$ GeV/$c$, within a range of $\pm0.015$ GeV/$c$. This is the first study of $\Sigma^{+}$-nucleon scattering at an electron-positron collider.

Submitted 26 May, 2025; originally announced May 2025.

Comments: 9 pages, 2 figures

arXiv:2505.19699

Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

Authors: Junming Liu, Yanting Gao, Siyuan Meng, Yifei Sun, Aoqi Wu, Yufei Jin, Yirong Chen, Ding Wang, Guosun Zeng

Abstract: Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy. However, the coexistence of model and data heterogeneity gives rise to inconsistent representations and divergent optimization dynamics across clients, ultimately hindering robust global performance. To transcend these challenges, we propose Mosaic, a novel data-free knowledge distillation framework tailored for heterogeneous distributed environments. Mosaic first trains local generative models to approximate each client's personalized distribution, enabling synthetic data generation that safeguards privacy through strict separation from real data. Subsequently, Mosaic forms a Mixture-of-Experts (MoE) from client models based on their specialized knowledge, and distills it into a global model using the generated data. To further enhance the MoE architecture, Mosaic integrates expert predictions via a lightweight meta model trained on a few representative prototypes. Extensive experiments on standard image classification benchmarks demonstrate that Mosaic consistently outperforms state-of-the-art approaches under both model and data heterogeneity. The source code has been published at https://github.com/Wings-Of-Disaster/Mosaic.

Submitted 26 May, 2025; originally announced May 2025.

Comments: 43 pages, 23 figures, 15 tables; the last dance

arXiv:2505.19640

Interleaved Reasoning for Large Language Models via Reinforcement Learning

Authors: Roy Xie, David Qiu, Deepak Gopinath, Dong Lin, Yanchao Sun, Chong Wang, Saloni Potdar, Bhuwan Dhingra

Abstract: Long chain-of-thought (CoT) reasoning significantly enhances the reasoning capabilities of large language models (LLMs). However, the extensive reasoning traces lead to inefficiencies and an increased time-to-first-token (TTFT). We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions. We observe that models inherently possess the ability to perform interleaved reasoning, which can be further enhanced through RL. We introduce a simple yet effective rule-based reward to incentivize correct intermediate steps, which guides the policy model toward correct reasoning paths by leveraging intermediate signals generated during interleaved reasoning. Extensive experiments conducted across five diverse datasets and three RL algorithms (PPO, GRPO, and REINFORCE++) demonstrate consistent improvements over traditional think-answer reasoning, without requiring external tools. Specifically, our approach reduces TTFT by over 80% on average and improves Pass@1 accuracy by up to 19.3%. Furthermore, our method, trained solely on question answering and logical reasoning datasets, exhibits strong generalization to complex reasoning datasets such as MATH, GPQA, and MMLU. Additionally, we conduct an in-depth analysis that reveals several valuable insights into conditional reward modeling.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.19612

Optimal Intervention for Self-triggering Spatial Networks with Application to Urban Crime Analytics

Authors: Pramit Das, Moulinath Banerjee, Yuekai Sun

Abstract: In many network systems, events at one node trigger further activity at other nodes, e.g., social media users reacting to each other's posts or the clustering of criminal activity in urban environments. These systems are typically referred to as self-exciting networks. In such systems, targeted intervention at critical nodes can be an effective strategy for mitigating undesirable consequences such as further propagation of criminal activity or the spreading of misinformation on social media. In our work, we develop an optimal network intervention model to explore how targeted interventions at critical nodes can mitigate cascading effects throughout a spatiotemporal Hawkes network. Similar models have been studied previously in purely temporal Hawkes networks; in our work, we extend them to a spatiotemporal setting and demonstrate the efficacy of our methods by comparing the post-intervention reduction in intensity with that of other heuristic strategies in simulated networks. Subsequently, we apply our method to crime data from the Los Angeles Police Department (LAPD) database to identify neighborhoods for strategic intervention, demonstrating an application in predictive policing.

Submitted 26 May, 2025; originally announced May 2025.
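For readers unfamiliar with the model class, a spatiotemporal Hawkes process is commonly specified through a conditional intensity of the form $\lambda(s,t) = \mu + \sum_{t_i < t} g(t - t_i)\,h(s - s_i)$, where each past event raises the rate of future events nearby in space and time. The sketch below evaluates such an intensity under assumptions not taken from the paper: a constant background rate, an exponential temporal kernel, and an isotropic Gaussian spatial kernel. The parameter names (`mu`, `alpha`, `beta`, `sigma`) and the `damp` argument, a hypothetical stand-in for an intervention that suppresses an event's offspring, are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    t: float  # event time
    x: float  # spatial coordinates
    y: float


def intensity(x, y, t, events, mu=0.1, alpha=0.5, beta=1.0, sigma=1.0, damp=None):
    """Conditional intensity lambda(x, y, t) of an assumed spatiotemporal Hawkes process.

    Each strictly earlier event i contributes
    alpha * beta * exp(-beta * (t - t_i)) * N((x, y); (x_i, y_i), sigma^2 I),
    so alpha is the expected number of offspring per event.
    `damp` (hypothetical) maps an event index to a factor in [0, 1]
    that scales that event's triggering contribution.
    """
    lam = mu
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)  # 2-D Gaussian normalizing constant
    for i, ev in enumerate(events):
        if ev.t >= t:
            continue  # only strictly earlier events can trigger
        temporal = alpha * beta * math.exp(-beta * (t - ev.t))
        d2 = (x - ev.x) ** 2 + (y - ev.y) ** 2
        spatial = norm * math.exp(-d2 / (2.0 * sigma ** 2))
        factor = 1.0 if damp is None else damp.get(i, 1.0)
        lam += factor * temporal * spatial
    return lam


events = [Event(0.0, 0.0, 0.0), Event(1.0, 0.5, 0.0)]
before = intensity(0.0, 0.0, 2.0, events)
after = intensity(0.0, 0.0, 2.0, events, damp={1: 0.0})  # suppress event 1's offspring
print(before, after)
```

Comparing `before` and `after` mirrors, in miniature, the idea of scoring an intervention by the reduction in intensity it produces; the paper's actual optimization over which nodes to target is not reproduced here.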

arXiv:2505.19597

A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions

Authors: Zheng Wang, Xiaobin Rong, Yu Sun, Tianchi Sun, Zhibin Lin, Jing Lu

Abstract: Although deep-learning-based multi-channel speech enhancement has achieved significant advancements, its practical deployment is often limited by constrained computational resources, particularly in low signal-to-noise ratio (SNR) conditions. In this paper, we propose a lightweight hybrid dual-channel speech enhancement system that combines independent vector analysis (IVA) with a modified version of the dual-channel grouped temporal convolutional recurrent network (GTCRN). IVA functions as a coarse estimator, providing auxiliary information for both speech and noise, while the modified GTCRN further refines the speech quality. We investigate several modifications to ensure the comprehensive utilization of both original and auxiliary information. Experimental results demonstrate the effectiveness of the proposed system, achieving enhanced speech with minimal parameters and low computational complexity.

Submitted 26 May, 2025; originally announced May 2025.

Comments: Accepted by Interspeech 2025

arXiv:2505.19490

Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models

Authors: Jianxing Liao, Junyan Xu, Yatao Sun, Maowen Tang, Sicheng He, Jingxian Liao, Shui Yu, Yun Li, Hongguan Xiao

Abstract: Designing complex computer-aided design (CAD) models is often time-consuming due to challenges such as computational inefficiency and the difficulty of generating precise models. We propose a novel language-guided framework for industrial design automation to address these issues, integrating large language models (LLMs) with computer-automated design (CAutoD). Through this framework, CAD models are automatically generated from parameters and appearance descriptions, supporting the automation of design tasks during the detailed CAD design phase. Our approach introduces three key innovations: (1) a semi-automated data annotation pipeline that leverages LLMs and vision-language large models (VLLMs) to generate high-quality parameters and appearance descriptions; (2) a Transformer-based CAD generator (TCADGen) that predicts modeling sequences via dual-channel feature aggregation; (3) an enhanced CAD modeling generation model, called CADLLM, that is designed to refine the generated sequences by incorporating the confidence scores from TCADGen. Experimental results demonstrate that the proposed approach outperforms traditional methods in both accuracy and efficiency, providing a powerful tool for automating industrial workflows and generating complex CAD models from textual prompts. The code is available at https://jianxliao.github.io/cadllm-page/

Submitted 26 May, 2025; originally announced May 2025.

Comments: Accepted by ACL 2025 Main Conference

ACM Class: I.2.7; I.2.6

arXiv:2505.19473

Improving Recommendation Fairness without Sensitive Attributes Using Multi-Persona LLMs

Authors: Haoran Xin, Ying Sun, Chao Wang, Yanke Yu, Weijia Zhang, Hui Xiong

Abstract: Despite the success of recommender systems in alleviating information overload, fairness issues have raised concerns in recent years, as they can lead to unequal treatment of certain user groups. While efforts have been made to improve recommendation fairness, they often assume that users' sensitive attributes are available during model training. However, collecting sensitive information can be difficult, especially on platforms that involve no personal information disclosure. Therefore, we aim to improve recommendation fairness without any access to sensitive attributes. This is a non-trivial task, because uncovering latent sensitive patterns from complicated user behaviors without explicit sensitive attributes is difficult, and suboptimal estimates of sensitive distributions can hinder the fairness training process. To address these challenges, leveraging the remarkable reasoning abilities of Large Language Models (LLMs), we propose a novel LLM-enhanced framework for Fair recommendation withOut Sensitive Attributes (LLMFOSA). A Multi-Persona Sensitive Information Inference module employs LLMs with distinct personas that mimic diverse human perceptions to infer and distill sensitive information. Furthermore, a Confusion-Aware Sensitive Representation Learning module incorporates inference results and rationales to develop robust sensitive representations, considering the mislabeling confusion and collective consensus among agents. The model is then optimized by a formulated mutual information objective.
Extensive experiments on two public datasets validate the effectiveness of LLMFOSA in improving fairness.

Submitted 25 May, 2025; originally announced May 2025.

Comments: 18 pages, 9 figures

arXiv:2505.19464

LLMs as Better Recommenders with Natural Language Collaborative Signals: A Self-Assessing Retrieval Approach

Authors: Haoran Xin, Ying Sun, Chao Wang, Weijia Zhang, Hui Xiong

Abstract: Incorporating collaborative information (CI) effectively is crucial for leveraging LLMs in recommendation tasks. Existing approaches often encode CI using soft tokens or abstract identifiers, which introduces a semantic misalignment with the LLM's natural language pretraining and hampers knowledge integration. To address this, we propose expressing CI directly in natural language to better align with LLMs' semantic space. We achieve this by retrieving a curated set of the most relevant user behaviors in natural language form. However, identifying informative CI is challenging due to the complexity of similarity and utility assessment. To tackle this, we introduce a Self-assessing COllaborative REtrieval framework (SCORE) following the retrieve-rerank paradigm. First, a Collaborative Retriever (CAR) is developed to consider both collaborative patterns and semantic similarity. Then, a Self-assessing Reranker (SARE) leverages LLMs' own reasoning to assess and prioritize retrieved behaviors. Finally, the selected behaviors are prepended to the LLM prompt as natural-language CI to guide recommendation. Extensive experiments on two public datasets validate the effectiveness of SCORE in improving LLM-based recommendation.

Submitted 25 May, 2025; originally announced May 2025.

Comments: 13 pages, 6 figures

arXiv:2505.19432

Advanced long-term earth system forecasting by learning the small-scale nature

Authors: Hao Wu, Yuan Gao, Ruiqi Shu, Kun Wang, Ruijian Gou, Chuhan Wu, Xinliang Liu, Juncai He, Shuhao Cao, Junfeng Fang, Xingjian Shi, Feng Tao, Qi Song, Shengxuan Ji, Yanfei Xiang, Yuze Sun, Jiahao Li, Fan Xu, Huanshuo Dong, Haixin Wang, Fan Zhang, Penghao Zhao, Xian Wu, Qingsong Wen, Deliang Chen, et al. (1 additional authors not shown)

Abstract: Reliable long-term forecasting of Earth system dynamics is heavily hampered by instabilities in current AI models during extended autoregressive simulations. These failures often originate from inherent spectral bias, leading to inadequate representation of critical high-frequency, small-scale processes and subsequent uncontrolled error amplification. We present Triton, an AI framework designed to address this fundamental challenge. Inspired by the practice of increasing grid resolution to explicitly resolve small scales in numerical models, Triton employs a hierarchical architecture that processes information across multiple resolutions to mitigate spectral bias and explicitly model cross-scale dynamics. We demonstrate Triton's superior performance on challenging forecast tasks, achieving stable year-long global temperature forecasts, skillful Kuroshio eddy predictions up to 120 days ahead, and high-fidelity turbulence simulations that preserve fine-scale structures, all without external forcing, significantly surpassing baseline AI models in long-term stability and accuracy. By effectively suppressing high-frequency error accumulation, Triton offers a promising pathway towards trustworthy AI-driven simulation for climate and Earth system science.

Submitted 25 May, 2025; originally announced May 2025.

arXiv:2505.19371

Foundations of Top-$k$ Decoding For Language Models

Authors: Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, Edgar Dobriban

Abstract: Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider \emph{Bregman decoders} obtained by minimizing a separable Bregman divergence (for both the \emph{primal} and \emph{dual} cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and further that the loss function is discretely convex in $k$, so that binary search provably and efficiently finds the optimal $k$. We show that top-$k$ decoding arises as a special case for the KL divergence, and identify new decoding strategies that have distinct behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).

Submitted 25 May, 2025; originally announced May 2025.
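The two mechanisms named in this abstract, top-$k$ truncation with renormalization and binary search over a loss that is discretely convex in $k$, can be illustrated with a short NumPy sketch. The helper names are hypothetical, and the objective `f` passed to the search is a placeholder: the paper's Bregman-decoder loss is not reproduced here, only the generic search that discrete convexity permits.

```python
import numpy as np


def top_k_distribution(probs, k):
    """Keep the k largest next-token probabilities and renormalize them to sum to one."""
    probs = np.asarray(probs, dtype=float)
    if not 1 <= k <= probs.size:
        raise ValueError("k must be between 1 and the vocabulary size")
    keep = np.argsort(probs)[::-1][:k]  # indices of the k largest entries
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()


def sample_top_k(probs, k, rng):
    """Sample a token index from the top-k renormalized distribution."""
    q = top_k_distribution(probs, k)
    return int(rng.choice(q.size, p=q))


def argmin_discretely_convex(f, lo, hi):
    """Binary search for a minimizer of f over the integers lo..hi.

    Assumes f is discretely convex on that range, i.e. the forward difference
    f(k + 1) - f(k) is non-decreasing in k, so the first k whose forward
    difference is non-negative is a minimizer.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid + 1) - f(mid) >= 0:
            hi = mid  # f stops decreasing here: a minimizer is at mid or to its left
        else:
            lo = mid + 1  # f still decreasing: a minimizer lies to the right
    return lo


p = [0.5, 0.3, 0.15, 0.05]
print(top_k_distribution(p, 2))  # approximately [0.625, 0.375, 0, 0]
print(sample_top_k(p, 2, np.random.default_rng(0)))
print(argmin_discretely_convex(lambda k: (k - 3) ** 2, 1, 10))
```

The search evaluates $O(\log n)$ pairs of neighbouring values instead of all $n$ candidates, which is the practical payoff of the convexity result the abstract states.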

arXiv:2505.19009

Capturing Aperiodic Temporal Dynamics of EEG Signals through Stochastic Fluctuation Modeling

Authors: Yuhao Sun, Zhiyuan Ma, Xinke Shen, Jinhao Li, Guan Wang, Sen Song

Abstract: Electrophysiological brain signals, such as electroencephalography (EEG), exhibit both periodic and aperiodic components, with the latter often modeled as 1/f noise and considered critical to cognitive and neurological processes. Although various theoretical frameworks have been proposed to account for aperiodic activity, its scale invariance and long-range temporal dependence remain insufficiently explained. Drawing on neural fluctuation theory, we propose a novel framework that parameterizes intrinsic stochastic neural fluctuations to account for aperiodic dynamics. Within this framework, we introduce two key parameters, self-similarity and scale factor, to characterize these fluctuations. Our findings reveal that EEG fluctuations exhibit self-similar and non-stable statistical properties, challenging the assumptions of conventional stochastic models in neural dynamical modeling. Furthermore, the proposed parameters enable the reconstruction of EEG-like signals that faithfully replicate the aperiodic spectrum, including the characteristic 1/f spectral profile, and long-range dependence. By linking structured neural fluctuations to empirically observed aperiodic EEG activity, this work offers deeper mechanistic insights into brain dynamics, yields a more robust biomarker candidate than the traditional 1/f slope, and provides a computational methodology for generating biologically plausible neurophysiological signals.

Submitted 25 May, 2025; originally announced May 2025.
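Self-similarity in a time series is commonly quantified by the Hurst exponent $H$. As a point of reference only, the sketch below uses the classical aggregated-variance estimator: for a stationary series, the variance of non-overlapping block means of size $m$ scales as $m^{2H-2}$, so $H$ follows from the slope of $\log$ variance against $\log m$. This is a swapped-in textbook technique, not the authors' fluctuation parameterization; it does not model their scale factor or the non-stable (heavy-tailed) behaviour they report, and the function name and block sizes are hypothetical choices.

```python
import numpy as np


def aggregated_variance_hurst(x, block_sizes):
    """Estimate the Hurst exponent H with the aggregated-variance method.

    Var(block means of size m) ~ m**(2H - 2), so H = 1 + slope / 2, where slope
    is the least-squares slope of log(variance) against log(m).
    """
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = x.size // m
        if n_blocks < 2:
            continue  # at least two blocks are needed to form a variance
        means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(means.var()))
    slope, _intercept = np.polyfit(log_m, log_var, 1)
    return 1.0 + slope / 2.0


rng = np.random.default_rng(0)
white = rng.standard_normal(2 ** 15)
sizes = [2 ** j for j in range(2, 12)]  # block sizes 4 .. 2048
print(aggregated_variance_hurst(white, sizes))  # should be near 0.5 for white noise
```

Uncorrelated noise gives $H \approx 0.5$; values above 0.5 indicate the kind of long-range dependence the abstract associates with aperiodic EEG activity.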

arXiv:2505.18812

SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models

Authors: Ye Sun, Hao Zhang, Henghui Ding, Tiehua Zhang, Xingjun Ma, Yu-Gang Jiang

Abstract: Achieving fine-grained spatio-temporal understanding in videos remains a major challenge for current Video Large Multimodal Models (Video LMMs). Addressing this challenge requires mastering two core capabilities: video referring understanding, which captures the semantics of video regions, and video grounding, which segments object regions based on natural language descriptions. However, most existing approaches tackle these tasks in isolation, limiting progress toward unified, referentially grounded video interaction. We identify a key bottleneck in the lack of high-quality, unified video instruction data and a comprehensive benchmark for evaluating referentially grounded video chat. To address these challenges, we contribute in three core aspects: dataset, model, and benchmark. First, we introduce SAMA-239K, a large-scale dataset comprising 15K videos specifically curated to enable joint learning of video referring understanding, grounding, and multi-turn video chat. Second, we propose the SAMA model, which incorporates a versatile spatio-temporal context aggregator and a Segment Anything Model to jointly enhance fine-grained video comprehension and precise grounding capabilities. Finally, we establish SAMA-Bench, a meticulously designed benchmark consisting of 5,067 questions from 522 videos, to comprehensively evaluate the integrated capabilities of Video LMMs in multi-turn, spatio-temporal referring understanding and grounded dialogue.
Extensive experiments and benchmarking results show that SAMA not only achieves strong performance on SAMA-Bench but also sets a new state-of-the-art on general grounding benchmarks, while maintaining highly competitive performance on standard visual understanding benchmarks.

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2505.18355

X-MethaneWet: A Cross-scale Global Wetland Methane Emission Benchmark Dataset for Advancing Science Discovery with AI

Authors: Yiming Sun, Shuo Chen, Shengyu Chen, Chonghao Qiu, Licheng Liu, Youmi Oh, Sparkle L. Malone, Gavin McNicol, Qianlai Zhuang, Chris Smith, Yiqun Xie, Xiaowei Jia

Abstract: Methane (CH$_4$) is the second most important anthropogenic greenhouse gas after carbon dioxide and plays a crucial role in climate change due to its high global warming potential. Accurately modeling CH$_4$ fluxes across the globe and at fine temporal scales is essential for understanding their spatial and temporal variability and developing effective mitigation strategies. In this work, we introduce the first-of-its-kind cross-scale global wetland methane benchmark dataset (X-MethaneWet), which synthesizes physics-based model simulation data from TEM-MDM and real-world observation data from FLUXNET-CH$_4$. This dataset can offer opportunities for improving global wetland CH$_4$ modeling and science discovery with new AI algorithms. To set up AI model baselines for methane flux prediction, we evaluate the performance of various sequential deep learning models on X-MethaneWet. Furthermore, we explore four different transfer learning techniques to leverage simulated data from TEM-MDM to improve the generalization of deep learning models on real-world FLUXNET-CH$_4$ observations. Our extensive experiments demonstrate the effectiveness of these approaches, highlighting their potential for advancing methane emission modeling and contributing to the development of more accurate and scalable AI-driven climate models.

Submitted 23 May, 2025; originally announced May 2025.

Comments: 8 pages, 8 figures, 3 tables

arXiv:2505.18302

Sampling Strategies for Efficient Training of Deep Learning Object Detection Algorithms

Authors: Gefei Shen, Yung-Hong Sun, Yu Hen Hu, Hongrui Jiang

Abstract: Two sampling strategies are investigated to improve the efficiency of training a deep learning object detection model. Both strategies rest on the assumption that deep learning models are Lipschitz continuous. The first strategy, uniform sampling, seeks to obtain samples evenly yet randomly across the state space of the object dynamics. The second strategy, frame difference sampling, is developed to exploit the temporal redundancy among successive frames in a video. Experimental results indicate that the proposed sampling strategies yield a dataset that gives good training performance while requiring relatively few manually labelled samples.

Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.
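The abstract does not say how frame differences are measured. The sketch below assumes a mean-absolute-pixel-difference criterion against the most recently kept frame and a user-chosen threshold; both are hypothetical choices meant only to show how temporal redundancy between successive frames can shrink the set of frames sent for manual labelling, not the authors' procedure.

```python
import numpy as np


def frame_difference_sample(frames, threshold):
    """Return indices of frames to label, skipping near-duplicates of the last kept frame.

    A frame is kept when its mean absolute pixel difference from the most
    recently kept frame exceeds `threshold`; the first frame is always kept.
    """
    kept = []
    last = None
    for i, frame in enumerate(frames):
        f = np.asarray(frame, dtype=float)
        if last is None or np.mean(np.abs(f - last)) > threshold:
            kept.append(i)
            last = f
    return kept


# Synthetic 8x8 grey-scale clip: frames 0-2 identical, 3-4 brighter, 5 brighter still.
clip = [np.zeros((8, 8))] * 3 + [np.full((8, 8), 10.0)] * 2 + [np.full((8, 8), 30.0)]
print(frame_difference_sample(clip, threshold=5.0))  # [0, 3, 5]
```

Comparing against the last kept frame rather than the immediately preceding one prevents a slow drift from going unnoticed, since small per-frame changes still accumulate until they cross the threshold.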

arXiv:2505.18154

The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas

Authors: Ya Wu, Qiang Sheng, Danding Wang, Guang Yang, Yifan Sun, Zhengjia Wang, Yuyan Bu, Juan Cao

Abstract: Ethical decision-making is a critical aspect of human judgment, and the growing use of LLMs in decision-support systems necessitates a rigorous evaluation of their moral reasoning capabilities. However, existing assessments primarily rely on single-step evaluations, failing to capture how models adapt to evolving ethical challenges. Addressing this gap, we introduce the Multi-step Moral Dilemmas (MMDs), the first dataset specifically constructed to evaluate the evolving moral judgments of LLMs across 3,302 five-stage dilemmas. This framework enables a fine-grained, dynamic analysis of how LLMs adjust their moral reasoning across escalating dilemmas. Our evaluation of nine widely used LLMs reveals that their value preferences shift significantly as dilemmas progress, indicating that models recalibrate moral judgments based on scenario complexity. Furthermore, pairwise value comparisons demonstrate that while LLMs often prioritize the value of care, this value can sometimes be superseded by fairness in certain contexts, highlighting the dynamic and context-dependent nature of LLM ethical reasoning. Our findings call for a shift toward dynamic, context-aware evaluation paradigms, paving the way for more human-aligned and value-sensitive development of LLMs.

Submitted 23 May, 2025; originally announced May 2025.

Comments: 25 pages, 8 figures

arXiv:2505.18021

Building Floor Number Estimation from Crowdsourced Street-Level Images: Munich Dataset and Baseline Method

Authors: Yao Sun, Sining Chen, Yifan Tian, Xiao Xiang Zhu

Abstract: Accurate information on the number of building floors, or above-ground storeys, is essential for household estimation, utility provision, risk assessment, evacuation planning, and energy modeling. Yet large-scale floor-count data are rarely available in cadastral and 3D city databases. This study proposes an end-to-end deep learning framework that infers floor numbers directly from unrestricted, crowdsourced street-level imagery, avoiding hand-crafted features and generalizing across diverse facade styles. To enable benchmarking, we release the Munich Building Floor Dataset, a public set of over 6800 geo-tagged images collected from Mapillary and targeted field photography, each paired with a verified storey label. On this dataset, the proposed classification-regression network attains 81.2% exact accuracy and predicts 97.9% of buildings within +/-1 floor. The method and dataset together offer a scalable route to enrich 3D city models with vertical information and lay a foundation for future work in urban informatics, remote sensing, and geographic information science. Source code and data will be released under an open license at https://github.com/ya0-sun/Munich-SVI-Floor-Benchmark.

Submitted 23 May, 2025; originally announced May 2025.

Comments: Code and data: https://github.com/ya0-sun/Munich-SVI-Floor-Benchmark
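The two figures reported in this abstract (81.2% exact accuracy, 97.9% of buildings within +/-1 floor) are simple metrics over integer storey labels. Assuming they are computed in the usual way, as the fraction of buildings with zero error and with an absolute error of at most one floor, they can be sketched as follows; the function name is hypothetical and the numbers in the example are toy data, not results from the Munich dataset.

```python
import numpy as np


def floor_accuracy(pred, true):
    """Return (exact accuracy, within-one-floor accuracy) for integer storey predictions."""
    pred = np.asarray(pred)
    true = np.asarray(true)
    err = np.abs(pred - true)  # absolute storey error per building
    return float(np.mean(err == 0)), float(np.mean(err <= 1))


# Toy example: errors are 0, 1, 0, 2, 0 floors.
exact, within_one = floor_accuracy([3, 4, 2, 6, 5], [3, 5, 2, 4, 5])
print(exact, within_one)  # 0.6 0.8
```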

arXiv:2505.18018

ExoGait-MS: Learning Periodic Dynamics with Multi-Scale Graph Network for Exoskeleton Gait Recognition

Authors: Lijiang Liu, Junyu Shi, Yong Sun, Zhiyuan Zhang, Jinni Zhou, Shugen Ma, Qiang Nie

Abstract: Current exoskeleton control methods often face challenges in delivering personalized treatment. Standardized walking gaits can lead to patient discomfort or even injury. Therefore, personalized gait is essential for the effectiveness of exoskeleton robots, as it directly impacts their adaptability, comfort, and rehabilitation outcomes for individual users. To enable personalized treatment in exoskeleton-assisted therapy and related applications, accurate recognition of personal gait is crucial for implementing tailored gait control. The key challenge in gait recognition lies in effectively capturing individual differences in subtle gait features caused by joint synergy, such as step frequency and step length. To tackle this issue, we propose a novel approach, which uses Multi-Scale Global Dense Graph Convolutional Networks (GCN) in the spatial domain to identify latent joint synergy patterns. Moreover, we propose a Gait Non-linear Periodic Dynamics Learning module to effectively capture the periodic characteristics of gait in the temporal domain. To support our individual gait recognition task, we have constructed a comprehensive gait dataset that ensures both completeness and reliability. Our experimental results demonstrate that our method achieves an impressive accuracy of 94.34% on this dataset, surpassing the current state-of-the-art (SOTA) by 3.77%. This advancement underscores the potential of our approach to enhance personalized gait control in exoskeleton-assisted therapy.

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.18004

Measurement of branching fractions of $Λ_{c}^{+}$ decays to $Σ^{+} η$ and $Σ^{+} η'$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

Abstract: By analyzing $e^+e^-$ collision data taken at center-of-mass energies $\sqrt{s} = 4.600 \sim 4.699$ $\mbox{GeV}$ with the BESIII detector at the BEPCII collider, corresponding to an integrated luminosity of $\rm 4.5~fb^{-1}$, we study the hadronic decays $Λ_{c}^{+} \rightarrow Σ^{+} η$ and $Λ_{c}^{+} \rightarrow Σ^{+} η^{\prime}$ using the single-tag method. The branching fraction ratio of $Λ_{c}^+ \rightarrow Σ^+ η$ relative to $Λ_{c}^+ \rightarrow Σ^+ π^0$ is determined to be $0.305 \pm 0.046_{\rm stat.} \pm 0.007_{\rm sys.}$, and that of $Λ_{c}^+ \rightarrow Σ^+ η'$ relative to $Λ_{c}^+ \rightarrow Σ^+ ω$ is $0.336 \pm 0.094_{\rm stat.} \pm 0.037_{\rm sys.}$. The ratio of $\frac{\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} η'\right)}{\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} η\right)} $ is determined to be $1.50\pm 0.48 \pm 0.17 \pm 0.21$, where the uncertainties are statistical, systematic, and from $\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} π^0\right) $ or $\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} ω\right) $, respectively. These results enrich our knowledge of charmed baryon decays.

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.17826

Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models

Authors: Xuchen Pan, Yanxi Chen, Yushuo Chen, Yuchang Sun, Daoyuan Chen, Wenhao Zhang, Yuexiang Xie, Yilun Huang, Yilei Zhang, Dawei Gao, Yaliang Li, Bolin Ding, Jingren Zhou

Abstract: Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a decoupled design, consisting of (1) an RFT-core that unifies and generalizes synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT, (2) seamless integration for agent-environment interaction with high efficiency and robustness, and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted for diverse application scenarios, and serves as a unified platform for exploring advanced reinforcement learning paradigms. This technical report outlines the vision, features, design and implementations of Trinity-RFT, accompanied by extensive examples demonstrating the utility and user-friendliness of the proposed framework.

Submitted 23 May, 2025; originally announced May 2025.

Comments: This technical report will be continuously updated as the codebase evolves. GitHub: https://github.com/modelscope/Trinity-RFT

arXiv:2505.17288

Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation

Authors: Seamus Somerstep, Vinod Raman, Unique Subedi, Yuekai Sun

Abstract: Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new next-token predictor on good generations. The second method, Best-of-N (BoN), trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy either a better rate of convergence in n or a rate of convergence with better dependence on the response length.

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2505.17249

Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning

Authors: Yuran Sun, Susu Xu, Chenguang Wang, Xilei Zhao

Abstract: Big trajectory data hold great promise for human mobility analysis, but their utility is often constrained by the absence of critical traveler attributes, particularly sociodemographic information. While prior studies have explored predicting such attributes from mobility patterns, they often overlooked underlying cognitive mechanisms and exhibited low predictive accuracy. This study introduces SILIC, short for Sociodemographic Inference with LLM-guided Inverse Reinforcement Learning (IRL) and Cognitive Chain Reasoning (CCR), a theoretically grounded framework that leverages LLMs to infer sociodemographic attributes from observed mobility patterns by capturing latent behavioral intentions and reasoning through psychological constructs. In particular, our approach explicitly follows the Theory of Planned Behavior (TPB), a foundational behavioral framework in transportation research, to model individuals' latent cognitive processes underlying travel decision-making. The LLMs further provide heuristic guidance to improve IRL reward function initialization and update by addressing its ill-posedness and optimization challenges arising from the vast and unstructured reward space. Evaluated on the 2017 Puget Sound Regional Council Household Travel Survey, our method substantially outperforms state-of-the-art baselines and shows great promise for enriching big trajectory data to support more behaviorally grounded applications in transportation planning and beyond.

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2505.17133

Learning Probabilities of Causation from Finite Population Data

Authors: Shuai Wang, Song Jiang, Yizhou Sun, Judea Pearl, Ang Li

Abstract: Probabilities of causation play a crucial role in modern decision-making. This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data using machine learning models. Tian and Pearl first defined and derived tight bounds for three fundamental probabilities of causation: the probability of necessity and sufficiency (PNS), the probability of sufficiency (PS), and the probability of necessity (PN). However, estimating these probabilities requires both experimental and observational distributions specific to each subpopulation, which are often unavailable or impractical to obtain with limited population-level data. As a result, for most subgroups, the available data are not enough to guarantee accurate estimates of their probabilities. Hence, to estimate these probabilities for subpopulations with insufficient data, we propose using machine learning models that draw insights from subpopulations with sufficient data. Our evaluation of multiple machine learning models indicates that, given the population-level data and an appropriate choice of machine learning model and activation function, PNS can be effectively predicted. Through simulation studies on multiple Structural Causal Models (SCMs), we show that our multilayer perceptron (MLP) model with the Mish activation function achieves a mean absolute error (MAE) of approximately $0.02$ in predicting PNS for $32,768$ subpopulations across most SCMs using data from only $2,000$ subpopulations with known PNS values.

Submitted 21 May, 2025; originally announced May 2025.

Comments: arXiv admin note: text overlap with arXiv:2502.08858

arXiv:2505.16643

From Evaluation to Defense: Advancing Safety in Video Large Language Models

Authors: Yiwei Sun, Peiqi Jiang, Chuanbin Liu, Luohao Lin, Zhiying Lu, Hongtao Xie

Abstract: While the safety risks of image-based large language models have been extensively studied, their video-based counterparts (Video LLMs) remain critically under-examined. To systematically study this problem, we introduce VideoSafetyBench (VSB-77k), the first large-scale, culturally diverse benchmark for Video LLM safety, which comprises 77,646 video-query pairs and spans 19 principal risk categories across 10 language communities. We reveal that integrating the video modality degrades safety performance by an average of 42.3%, exposing systemic risks in multimodal attack exploitation. To address this vulnerability, we propose VideoSafety-R1, a dual-stage framework achieving unprecedented safety gains through two innovations: (1) Alarm Token-Guided Safety Fine-Tuning (AT-SFT) injects learnable alarm tokens into visual and textual sequences, enabling explicit harm perception across modalities via multitask objectives. (2) Then, Safety-Guided GRPO enhances defensive reasoning through dynamic policy optimization with rule-based rewards derived from dual-modality verification. These components synergize to shift safety alignment from passive harm recognition to active reasoning. The resulting framework achieves a 65.1% improvement on VSB-Eval-HH, and improves by 59.1%, 44.3%, and 15.0% on the image safety datasets MMBench, VLGuard, and FigStep, respectively.
Our code is available in the supplementary materials. Warning: This paper contains examples of harmful language and videos, and reader discretion is recommended.

Submitted 22 May, 2025; originally announced May 2025.

Comments: 49 pages, 12 figures, 17 tables

arXiv:2505.15656

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

Authors: Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Hongning Wang, Minlie Huang

Abstract: Fine-tuning open-source Large Language Models (LLMs) on proprietary data is now a standard practice for downstream developers to obtain task-specific LLMs. Surprisingly, we reveal a new and concerning risk that accompanies this practice: the creator of the open-source LLMs can later extract the private downstream fine-tuning data through simple backdoor training, only requiring black-box access to the fine-tuned downstream model. Our comprehensive experiments, across 4 popularly used open-source models with 3B to 32B parameters and 2 downstream datasets, suggest that the extraction performance can be strikingly high: in practical settings, as much as 76.3% of the downstream fine-tuning data (queries) out of a total of 5,000 samples can be perfectly extracted, and the success rate can increase to 94.9% in more ideal settings. We also explore a detection-based defense strategy but find it can be bypassed with an improved attack. Overall, we highlight the urgency of this newly identified data breaching risk in fine-tuning, and we hope that more follow-up research could push the progress of addressing this concerning risk. The code and data used in our experiments are released at https://github.com/thu-coai/Backdoor-Data-Extraction.

Submitted 21 May, 2025; originally announced May 2025.

Comments: 19 pages

arXiv:2505.15620

Observation of $χ_{cJ}\to 3K_S^0K^\pmπ^\mp$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (678 additional authors not shown)

Abstract: By analyzing $(2712.4\pm14.3)\times10^6$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, the decays $χ_{c0,1,2} \to 3K_S^0K^\pmπ^\mp$ are observed for the first time with statistical significances greater than $10σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\to 3K_S^0K^\pmπ^\mp )=(7.95\pm0.50\pm0.65)\times10^{-5},$ $\mathcal{B}(χ_{c1}\to 3K_S^0K^\pmπ^\mp)=(2.62\pm0.08\pm0.19)\times10^{-4},$ and $\mathcal{B}(χ_{c2}\to 3K_S^0K^\pmπ^\mp)=(1.72\pm0.07\pm0.15)\times10^{-4},$ where the first uncertainties are statistical and the second systematic.

Submitted 21 May, 2025; originally announced May 2025.

Comments: 11 pages, 6 figures

arXiv:2505.15151

Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines

Authors: Xiaohou Shi, Ke Li, Aobo Liang, Yan Sun

Abstract: In the past few years, time series foundation models have achieved superior prediction accuracy. However, real-world time series often exhibit significant diversity in their temporal patterns across different time spans and domains, making it challenging for a single model architecture to fit all complex scenarios. In addition, time series data may have multiple variables exhibiting complex correlations with each other. Recent mainstream works have focused on modeling time series in a channel-independent manner in both the pretraining and finetuning stages, overlooking the valuable inter-series dependencies. To this end, we propose Time Tracker for better predictions on multivariate time series data. First, we leverage sparse mixture of experts (MoE) within Transformers to handle the modeling of diverse time series patterns, thereby alleviating the learning difficulties of a single model while improving its generalization. We also propose Any-variate Attention, enabling a unified model structure to seamlessly handle both univariate and multivariate time series, thereby supporting channel-independent modeling during pretraining and channel-mixed modeling for finetuning. Furthermore, we design a graph learning module that constructs relations among sequences from frequency-domain features, providing more precise guidance to capture inter-series dependencies in channel-mixed modeling. Based on these advancements, Time Tracker achieves state-of-the-art performance in prediction accuracy, model generalization and adaptability.

Submitted 21 May, 2025; originally announced May 2025.

arXiv:2505.14988

doi 10.1038/s41467-025-59498-4

Test of local realism via entangled $Λ\barΛ$ system

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (597 additional authors not shown)

Abstract: The non-locality of quantum correlations is a fundamental feature of quantum theory. The Bell inequality serves as a benchmark for distinguishing between predictions made by quantum theory and local hidden variable theory (LHVT). Recent advancements in photon-entanglement experiments have addressed potential loopholes and have observed significant violations of variants of the Bell inequality. However, examples of Bell inequality violations in high energy physics are scarce. In this study, we utilize $(10.087\pm0.044)\times10^{9}$ $J/ψ$ events collected with the BESIII detector at the BEPCII collider, performing non-local correlation tests using the entangled hyperon pairs. The massive-entangled $Λ\barΛ$ systems are formed and decay through strong and weak interactions, respectively. Through measurements of the angular distribution of $p\bar{p}$ in $J/ψ\to γη_c$ and subsequent $η_c\toΛ(pπ^-)\barΛ(\bar{p}π^{+})$ cascade decays, a significant violation of LHVT predictions is observed. The exclusion of LHVT is found to be statistically significant at a level exceeding $5.2σ$ in the testing of three Bell-like inequalities.

Submitted 20 May, 2025; originally announced May 2025.

Journal ref: Nat Commun 16, 4948 (2025)

arXiv:2505.14974

Full spectral response of grating-induced loss in photonic crystal microrings

Authors: Daniel Pimbi, Yi Sun, Roy Zektzer, Xiyuan Lu, Kartik Srinivasan

Abstract: Photonic crystal microrings (PhCRs) have emerged as powerful and versatile platforms for integrated nonlinear photonics, offering precise control over frequency and phase matching while maintaining high optical quality factors. Through grating-mediated mode coupling, PhCRs enable advanced dispersion engineering, which is critical for wideband nonlinear processes such as optical parametric oscillation, Kerr frequency comb generation, and dual-pump spontaneous and Bragg scattering four-wave mixing. Beyond dispersion control, PhCRs also facilitate the manipulation of orbital angular momentum (OAM) emission, a key functionality for encoding high-dimensional quantum states in emerging quantum photonic platforms. Despite these advances, the broadband spectral behavior of grating-induced losses in PhCRs remains largely unexplored, with most studies focusing on grating periods near the modal wavelength or its half. Such losses can significantly impact broadband nonlinear processes, where excess loss at unintended wavelengths can degrade device performance. In this work, we experimentally characterize grating-induced losses in PhCRs and reveal their full spectral response as a function of the ratio between modal wavelength and grating period. We identify distinct loss channels arising from either radiation or mode conversion, including a broad excess-loss region attributed to vertical out-coupling into OAM-carrying states.
These observations are supported by three-dimensional finite-difference time-domain simulations and further analyzed through OAM radiation angle and phase-mismatch analysis. The resulting broadband loss spectrum highlights critical design trade-offs and provides practical guidelines for optimizing PhCR-based devices for nonlinear photonic applications involving widely separated frequencies.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14916

Super-Resolution Optical Coherence Tomography Using Diffusion Model-Based Plug-and-Play Priors

Authors: Yaning Wang, Jinglun Yu, Wenhan Guo, Yu Sun, Jin U. Kang

Abstract: We propose an OCT super-resolution framework based on a plug-and-play diffusion model (PnP-DM) to reconstruct high-quality images from sparse measurements (OCT B-mode corneal images). Our method formulates reconstruction as an inverse problem, combining a diffusion prior with Markov chain Monte Carlo sampling for efficient posterior inference. We collect high-speed under-sampled B-mode corneal images and apply a deep learning-based up-sampling pipeline to build realistic training pairs. Evaluations on in vivo and ex vivo fish-eye corneal models show that PnP-DM outperforms conventional 2D-UNet baselines, producing sharper structures and better noise suppression. This approach advances high-fidelity OCT imaging in high-speed acquisition for clinical applications.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14560

Neural Inverse Scattering with Score-based Regularization

Authors: Yuan Gao, Wenhan Guo, Yu Sun

Abstract: Inverse scattering is a fundamental challenge in many imaging applications, ranging from microscopy to remote sensing. Solving this problem often requires jointly estimating two unknowns, the image and the scattering field inside the object, which necessitates an effective image prior to regularize the inference. In this paper, we propose a regularized neural field (NF) approach that integrates the denoising score function used in score-based generative models. The neural field formulation offers convenient flexibility for performing joint estimation, while the denoising score function imposes the rich structural prior of images. Our results on three high-contrast simulated objects show that the proposed approach yields better imaging quality than the state-of-the-art NF approach, where regularization is based on total variation.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14161

Personalized Bayesian Federated Learning with Wasserstein Barycenter Aggregation

Authors: Ting Wei, Biao Mei, Junliang Lyu, Renquan Zhang, Feng Zhou, Yifan Sun

Abstract: Personalized Bayesian federated learning (PBFL) handles non-i.i.d. client data and quantifies uncertainty by combining personalization with Bayesian inference. However, existing PBFL methods face two limitations: restrictive parametric assumptions in client posterior inference and naive parameter averaging for server aggregation. To overcome these issues, we propose FedWBA, a novel PBFL method that enhances both local inference and global aggregation. At the client level, we use particle-based variational inference for nonparametric posterior representation. At the server level, we introduce particle-based Wasserstein barycenter aggregation, offering a more geometrically meaningful approach. Theoretically, we provide local and global convergence guarantees for FedWBA. Locally, we prove a KL divergence decrease lower bound per iteration for variational inference convergence. Globally, we show that the Wasserstein barycenter converges to the true parameter as the client data size increases. Empirically, experiments show that FedWBA outperforms baselines in prediction accuracy, uncertainty calibration, and convergence rate, with ablation studies confirming its robustness.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14135

Hunyuan-Game: Industrial-grade Intelligent Game Creation Model

Authors: Ruihuang Li, Caijin Zhou, Shoujian Zheng, Jianxiang Lu, Jiabin Huang, Comi Chen, Junshu Tang, Guangzheng Xu, Jiale Tao, Hongmei Wang, Donghao Li, Wenqing Yu, Senbo Wang, Zhimin Li, Yetshuan Shi, Haoyu Yang, Yukun Wang, Wenxun Dai, Jiaqi Li, Linqing Wang, Qixun Wang, Zhiyong Xu, Yingfang Zhang, Jiangfeng Xiong, Weijie Kong , et al. (33 additional authors not shown)

Abstract: Intelligent game creation represents a transformative advancement in game development, utilizing generative artificial intelligence to dynamically generate and enhance game content. Despite notable progress in generative models, the comprehensive synthesis of high-quality game assets, including both images and videos, remains a challenging frontier. To create high-fidelity game content that simultaneously aligns with player preferences and significantly boosts designer efficiency, we present Hunyuan-Game, an innovative project designed to revolutionize intelligent game production. Hunyuan-Game encompasses two primary branches: image generation and video generation. The image generation component is built upon a vast dataset comprising billions of game images, leading to the development of a group of customized image generation models tailored for game scenarios: (1) General Text-to-Image Generation. (2) Game Visual Effects Generation, involving text-to-effect and reference image-based game visual effect generation. (3) Transparent Image Generation for characters, scenes, and game visual effects. (4) Game Character Generation based on sketches, black-and-white images, and white models. The video generation component is built upon a comprehensive dataset of millions of game and anime videos, leading to the development of five core algorithmic models, each targeting critical pain points in game development and having robust adaptation to diverse game video scenarios: (1) Image-to-Video Generation. (2) 360 A/T Pose Avatar Video Synthesis. (3) Dynamic Illustration Generation. (4) Generative Video Super-Resolution. (5) Interactive Game Video Generation. These image and video generation models not only exhibit high-level aesthetic expression but also deeply integrate domain-specific knowledge, establishing a systematic understanding of diverse game and anime art styles.

Submitted 28 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14057

Field Matters: A lightweight LLM-enhanced Method for CTR Prediction

Authors: Yu Cui, Feng Liu, Jiawei Chen, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Xiaohu Yang, Can Wang

Abstract: Click-through rate (CTR) prediction is a fundamental task in modern recommender systems. In recent years, the integration of large language models (LLMs) has been shown to effectively enhance the performance of traditional CTR methods. However, existing LLM-enhanced methods often require extensive processing of detailed textual descriptions for large-scale instances or user/item entities, leading to substantial computational overhead. To address this challenge, this work introduces LLaCTR, a novel and lightweight LLM-enhanced CTR method that employs a field-level enhancement paradigm. Specifically, LLaCTR first utilizes LLMs to distill crucial and lightweight semantic knowledge from small-scale feature fields through self-supervised field-feature fine-tuning. Subsequently, it leverages this field-level semantic knowledge to enhance both feature representation and feature interactions. In our experiments, we integrate LLaCTR with six representative CTR models across four datasets, demonstrating its superior performance in terms of both effectiveness and efficiency compared to existing LLM-enhanced methods. Our code is available at https://anonymous.4open.science/r/LLaCTR-EC46.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.13756

doi 10.1093/mnras/staf665

OGHReS: Star formation in the Outer Galaxy II ($\ell = 180^\circ$-$280^\circ$)

Authors: J. S. Urquhart, C. Koenig, D. Colombo, A. Karska, A. Giannetti, T. J. T. Moore, A. Y. Yang, F. Wyrowski, Y. Sun, Z. Jiang, K. R. Neralwar, D. Eden, I. Grozdanova, S. Neupane, M. Figueira, E. Dann, V. S. Veena, W. -J. Kim, S. Leurini, J. Brand, M. -Y. Lee

Abstract: The Outer Galaxy High-Resolution Survey (OGHReS) covers 100 square degrees ($180^\circ < \ell < 280^\circ$) in the (2--1) transitions of three CO isotopologues. We use the spectra to refine the velocities and physical properties of 6706 Hi-GAL clumps located in the OGHReS region. In a previous paper, we analysed 3584 clumps between $\ell = 250^\circ$ and $280^\circ$. Here, we cover a further 3122 clumps ($180^\circ < \ell < 250^\circ$) and determine reliable velocities for \withVLSR\ of these, finding good agreement with the previously assigned velocities ($\sim$80 percent within 5 km s$^{-1}$). We update velocities for 288 clumps and provide new values for an additional 411. Combining these with the previous results, we have velocities and physical properties for 6193 clumps (92.3 percent). The \allnonDetections\ non-detections are low surface density clumps or likely contamination by evolved stars and galaxies. Key findings: i) an improved correlation between clumps and spiral-arm loci, with the discovery of clumps beyond the outer arm supporting the existence of a new spiral structure; ii) a decreasing trend in the $L/M$ ratio, consistent with less high-mass star formation in the outer Galaxy; iii) an increase in the star formation fraction (SFF) in the outer Galaxy, suggesting that more clumps are forming stars despite their lower mass; iv) discrepancies in velocity assignments across different surveys that could affect $\sim$10000 clumps, especially in the fourth quadrant.

Submitted 19 May, 2025; originally announced May 2025.

Comments: 18 pages, 14 figures. Full versions of Tables 1 and 2 are only available in electronic form via CDS. arXiv admin note: text overlap with arXiv:2401.00808

arXiv:2505.13633

IPENS: Interactive Unsupervised Framework for Rapid Plant Phenotyping Extraction via NeRF-SAM2 Fusion

Authors: Wentao Song, He Huang, Youqiang Sun, Fang Qu, Jiaqi Zhang, Longhui Fang, Yuwei Hao, Chenyang Peng

Abstract: Advanced plant phenotyping technologies play a crucial role in targeted trait improvement and in accelerating intelligent breeding. Because of the diversity of plant species, existing methods rely heavily on large-scale, high-precision, manually annotated data. For self-occluded objects at the grain level, unsupervised methods often prove ineffective. This study proposes IPENS, an interactive unsupervised multi-target point cloud extraction method. The method utilizes radiance field information to lift 2D masks, which are segmented by SAM2 (Segment Anything Model 2), into 3D space for target point cloud extraction. A multi-target collaborative optimization strategy is designed to effectively resolve the single-interaction multi-target segmentation challenge. Experimental validation demonstrates that IPENS achieves a grain-level segmentation accuracy (mIoU) of 63.72% on a rice dataset, with strong phenotypic estimation capabilities: grain volume prediction yields R² = 0.7697 (RMSE = 0.0025), leaf surface area R² = 0.84 (RMSE = 18.93), and leaf length and width predictions achieve R² = 0.97 and 0.87 (RMSE = 1.49 and 0.21). On a wheat dataset, IPENS further improves segmentation accuracy to 89.68% (mIoU), with equally strong phenotypic estimation performance: spike volume prediction achieves R² = 0.9956 (RMSE = 0.0055), leaf surface area R² = 1.00 (RMSE = 0.67), and leaf length and width predictions reach R² = 0.99 and 0.92 (RMSE = 0.23 and 0.15). This method provides a non-invasive, high-quality phenotyping extraction solution for rice and wheat. Without requiring annotated data, it rapidly extracts grain-level point clouds for multiple targets within 3 minutes through simple single-round interactions on images, demonstrating significant potential to accelerate intelligent breeding.

Submitted 19 May, 2025; originally announced May 2025.
