Search | arXiv e-print repository

Observation of $ψ(3686) \to Ξ^- K^0_S \bar{Ω}^+$ + c.c.

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (680 additional authors not shown)

Abstract: Using a sample of $(2.712\pm0.014) \times 10^{9}$ $ψ(3686)$ events collected with the BESIII detector at the electron-positron collider BEPCII, the decay $ψ(3686) \to Ξ^- K^0_S \bar{Ω}^+ +c.c.$ is observed for the first time, with a significance of 5.9 standard deviations. The branching fraction of this decay is measured to be $(2.91\pm0.47\pm0.33)\times 10^{-6}$, where the first and second uncertainties are statistical and systematic, respectively. The ratio between $\mathcal{B}_{ψ(3686) \to Ξ^- K^0_S \bar{Ω}^+ +c.c.}$ and $\mathcal{B}_{ψ(3686) \to Ω^- K^+ \bar{Ξ}^0 +c.c.}$ is determined to be $1.05\pm0.23\pm0.14$, which deviates from the value of 0.5 predicted by isospin symmetry conservation by $2.1σ$.

Submitted 13 June, 2025; v1 submitted 6 April, 2025; originally announced April 2025.
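
The deviation quoted in the abstract above can be roughly cross-checked with a few lines of arithmetic. The sketch below assumes the statistical and systematic uncertainties are uncorrelated and can be added in quadrature; that treatment, and the helper name `deviation_in_sigma`, are assumptions for illustration and not necessarily what the collaboration did.

```python
import math


def deviation_in_sigma(value, stat, syst, expected):
    """Distance between a measurement and an expectation, in units of the
    total uncertainty (stat and syst combined in quadrature; an assumption)."""
    total = math.hypot(stat, syst)
    return (value - expected) / total


# Numbers taken from the abstract: ratio 1.05 +/- 0.23 (stat) +/- 0.14 (syst),
# compared with the isospin-symmetry expectation of 0.5.
n_sigma = deviation_in_sigma(1.05, 0.23, 0.14, 0.5)
print(f"{n_sigma:.2f}")
```

With the rounded inputs this gives about 2.0, close to the quoted $2.1σ$; the small difference may come from unrounded inputs or a different uncertainty treatment in the paper.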

arXiv:2504.04096 [pdf, ps, other]

Observation of a Three-Resonance Structure in the Cross Section of $e^+e^-\toπ^+π^- h_c$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (697 additional authors not shown)

Abstract: Using $e^+e^-$ collision data collected with the BESIII detector operating at the Beijing Electron Positron Collider, the cross section of $e^+e^-\to π^+π^- h_c$ is measured at 59 points with center-of-mass energy $\sqrt{s}$ ranging from $4.009$ to $4.950~\mathrm{GeV}$ with a total integrated luminosity of $22.2~\mathrm{fb}^{-1}$. The cross section between $4.3$ and $4.45~\mathrm{GeV}$ exhibits a plateau-like shape and drops sharply around $4.5~\mathrm{GeV}$, which cannot be described by two resonances only. Three coherent Breit-Wigner functions are used to parameterize the $\sqrt{s}$-dependent cross section line shape. The masses and widths are determined to be $M_1=(4223.6_{-3.7-2.9}^{+3.6+2.6})~\mathrm{MeV}/c^2$, $Γ_1=(58.5_{-11.4-6.5}^{+10.8+6.7})~\mathrm{MeV}$, $M_2=(4327.4_{-18.8-9.3}^{+20.1+10.7})~\mathrm{MeV}/c^2$, $Γ_2=(244.1_{-27.1-18.0}^{+34.0+23.9})~\mathrm{MeV}$, and $M_3=(4467.4_{-5.4-2.7}^{+7.2+3.2})~\mathrm{MeV}/c^2$, $Γ_3=(62.8_{-14.4-6.6}^{+19.2+9.8})~\mathrm{MeV}$. The first uncertainties are statistical and the second are systematic. The statistical significance of the three Breit-Wigner assumption over the two Breit-Wigner assumption is greater than $5σ$.

Submitted 5 April, 2025; originally announced April 2025.

arXiv:2504.02272 [pdf, other]

Generative Classifier for Domain Generalization

Authors: Shaocong Long, Qianyu Zhou, Xiangtai Li, Chenhao Ying, Yunhai Tong, Lizhuang Ma, Yuan Luo, Dacheng Tao

Abstract: Domain generalization (DG) aims to improve the generalizability of computer vision models toward distribution shifts. The mainstream DG methods focus on learning domain invariance; however, such methods overlook the potential inherent in domain-specific information. While the prevailing practice of using a discriminative linear classifier has been tailored to domain-invariant features, it struggles when confronted with diverse domain-specific information, e.g., intra-class shifts, that exhibits multi-modality. To address these issues, we explore the theoretical implications of relying on domain invariance, revealing the crucial role of domain-specific information in mitigating the target risk for DG. Drawing from these insights, we propose Generative Classifier-driven Domain Generalization (GCDG), introducing a generative paradigm for the DG classifier based on Gaussian Mixture Models (GMMs) for each class across domains. GCDG consists of three key modules: Heterogeneity Learning Classifier (HLC), Spurious Correlation Blocking (SCB), and Diverse Component Balancing (DCB). Concretely, HLC attempts to model the feature distributions and thereby capture valuable domain-specific information via GMMs. SCB identifies the neural units containing spurious correlations and perturbs them, mitigating the risk of HLC learning spurious patterns. Meanwhile, DCB ensures a balanced contribution of components in HLC, preventing the underestimation or neglect of critical components.
In this way, GCDG excels in capturing the nuances of domain-specific information characterized by diverse distributions. GCDG demonstrates the potential to reduce the target risk and encourage flat minima, improving the generalizability. Extensive experiments show GCDG's comparable performance on five DG benchmarks and one face anti-spoofing dataset, seamlessly integrating into existing DG methods with consistent improvements.

Submitted 3 April, 2025; originally announced April 2025.

Comments: Code will be available at https://github.com/longshaocong/GCDG

arXiv:2504.02242 [pdf, other]

Measurement of the transverse energy density in Au+Au collisions at $\sqrt{s_{NN}} = 200$ GeV with the sPHENIX detector

Authors: sPHENIX Collaboration, M. I. Abdulhamid, U. Acharya, E. R. Adams, G. Adawi, C. A. Aidala, Y. Akiba, M. Alfred, S. Ali, A. Alsayegh, S. Altaf, H. Amedi, D. M. Anderson, V. V. Andrieux, A. Angerami, N. Applegate, H. Aso, S. Aune, B. Azmoun, V. R. Bailey, D. Baranyai, S. Bathe, A. Bazilevsky, S. Bela, R. Belmont , et al. (281 additional authors not shown)

Abstract: This paper reports measurements of the transverse energy per unit pseudorapidity ($dE_{T}/dη$) produced in Au+Au collisions at $\sqrt{s_{NN}} = 200$ GeV, performed with the sPHENIX detector at the Relativistic Heavy Ion Collider (RHIC). The results cover the pseudorapidity range $\left|η\right| < 1.1$ and constitute the first such measurement performed using a hadronic calorimeter at RHIC. Measurements of $dE_{T}/dη$ are presented for a range of centrality intervals and the average $dE_{T}/dη$ as a function of the number of participating nucleons, $N_{\mathrm{part}}$, is compared to a variety of Monte Carlo heavy-ion event generators. The results are in agreement with previous measurements at RHIC, and feature an improved granularity in $η$ and improved precision in low-$N_{\mathrm{part}}$ events.

Submitted 2 April, 2025; originally announced April 2025.

Comments: 18 pages total, 6 figures, 2 tables. All figures and tables can be found at https://www.sphenix.bnl.gov/PublicResults/sPH-BULK-2025-02

arXiv:2504.02240 [pdf, other]

Measurement of charged hadron multiplicity in Au+Au collisions at $\sqrt{\text{s}_{\text{NN}}} = 200$ GeV with the sPHENIX detector

Authors: sPHENIX Collaboration, M. I. Abdulhamid, U. Acharya, E. R. Adams, G. Adawi, C. A. Aidala, Y. Akiba, M. Alfred, S. Ali, A. Alsayegh, S. Altaf, H. Amedi, D. M. Anderson, V. V. Andrieux, A. Angerami, N. Applegate, H. Aso, S. Aune, B. Azmoun, V. R. Bailey, D. Baranyai, S. Bathe, A. Bazilevsky, S. Bela, R. Belmont , et al. (281 additional authors not shown)

Abstract: The pseudorapidity distribution of charged hadrons produced in Au+Au collisions at a center-of-mass energy of $\sqrt{s_\mathrm{NN}} = 200$ GeV is measured using data collected by the sPHENIX detector. Charged hadron yields are extracted by counting cluster pairs in the inner and outer layers of the Intermediate Silicon Tracker, with corrections applied for detector acceptance, reconstruction efficiency, combinatorial pairs, and contributions from secondary decays. The measured distributions cover $|η| < 1.1$ across various centralities, and the average pseudorapidity density of charged hadrons at mid-rapidity is compared to predictions from Monte Carlo heavy-ion event generators. This result, featuring full azimuthal coverage at mid-rapidity, is consistent with previous experimental measurements at the Relativistic Heavy Ion Collider, thereby supporting the broader sPHENIX physics program.

Submitted 2 April, 2025; originally announced April 2025.

Comments: 22 pages total, 6 figures, 4 tables. All figures and tables can be found at https://www.sphenix.bnl.gov/PublicResults/sPH-BULK-2025-01

arXiv:2504.02154 [pdf, ps, other]

FreSca: Scaling in Frequency Space Enhances Diffusion Models

Authors: Chao Huang, Susan Liang, Yunlong Tang, Jing Bi, Li Ma, Yapeng Tian, Chenliang Xu

Abstract: Latent diffusion models (LDMs) have achieved remarkable success in a variety of image tasks, yet achieving fine-grained, disentangled control over global structures versus fine details remains challenging. This paper explores frequency-based control within latent diffusion models. We first systematically analyze frequency characteristics across pixel space, VAE latent space, and internal LDM representations. This reveals that the "noise difference" term, derived from classifier-free guidance at each step t, is a uniquely effective and semantically rich target for manipulation. Building on this insight, we introduce FreSca, a novel and plug-and-play framework that decomposes noise difference into low- and high-frequency components and applies independent scaling factors to them via spatial or energy-based cutoffs. Essentially, FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control. We demonstrate its versatility and effectiveness in improving generation quality and structural emphasis on multiple architectures (e.g., SD3, SDXL) and across applications including image generation, editing, depth estimation, and video synthesis, thereby unlocking a new dimension of expressive control within LDMs.

Submitted 29 May, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

Comments: Project page: https://wikichao.github.io/FreSca/
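
The idea described in the abstract above, splitting a map into low- and high-frequency parts and scaling each independently, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `frequency_scale`, its parameters, and the choice of a centered box as the spatial cutoff are all assumptions made for illustration.

```python
import numpy as np


def frequency_scale(noise_diff, low_scale=1.0, high_scale=1.5, cutoff_ratio=0.25):
    """Scale low- and high-frequency content of the last two axes independently.

    Hypothetical helper: a centered box in the shifted spectrum is treated as
    "low frequency" and everything else as "high frequency".
    """
    h, w = noise_diff.shape[-2:]
    # 2D FFT over the spatial axes, shifted so the zero frequency sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(noise_diff, axes=(-2, -1)), axes=(-2, -1))

    # Symmetric low-pass box around the center (always contains the DC term).
    ch, cw = h // 2, w // 2
    rh = max(1, int(h * cutoff_ratio / 2))
    rw = max(1, int(w * cutoff_ratio / 2))
    low_mask = np.zeros((h, w), dtype=bool)
    low_mask[ch - rh:ch + rh + 1, cw - rw:cw + rw + 1] = True

    # Per-frequency gain: low_scale inside the box, high_scale outside.
    gain = np.where(low_mask, low_scale, high_scale)
    scaled = spectrum * gain

    # Back to the spatial domain; the gain is symmetric, so the result is real
    # up to floating-point error for real input.
    out = np.fft.ifft2(np.fft.ifftshift(scaled, axes=(-2, -1)), axes=(-2, -1))
    return out.real
```

In a sampler, such a function would be applied to the difference between conditional and unconditional noise predictions before it is added back; how that difference is obtained depends on the model and is not shown here.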

arXiv:2504.01823 [pdf, other]

Evidence of doubly OZI-suppressed decay $η_{c} \to ωφ$ in the radiative decay $J/ψ\to γη_{c}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (680 additional authors not shown)

Abstract: Using a sample of $(10087\pm44) \times 10^{6}$ $J/ψ$ events collected with the BESIII detector at the BEPCII collider, the first evidence for the doubly OZI-suppressed decay $η_{c} \to ωφ$ is reported with a significance of 4.0$σ$. The branching fraction of $η_{c} \to ωφ$ is measured to be $\mathcal{B}(η_{c} \to ωφ) = (3.86 \pm 0.92 \pm 0.62) \times 10^{-5}$, where the first uncertainty is statistical and the second is systematic. This result provides valuable insights into the underlying mechanisms of charmonium decays, particularly for processes such as $η_{c} \to VV$ (where $V$ represents a vector meson).

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.01792 [pdf, ps, other]

UniViTAR: Unified Vision Transformer with Native Resolution

Authors: Limeng Qiao, Yiyang Gan, Bairui Wang, Jie Qin, Shuang Xu, Siqi Yang, Lin Ma

Abstract: The conventional Vision Transformer simplifies visual modeling by standardizing input resolutions, often disregarding the variability of natural visual data and compromising spatial-contextual fidelity. While preliminary explorations have superficially investigated native resolution modeling, existing approaches still lack systematic analysis from a visual representation perspective. To bridge this gap, we introduce UniViTAR, a family of homogeneous vision foundation models tailored for unified visual modality and native-resolution scenarios in the multimodal era. Our framework first conducts architectural upgrades to the vanilla paradigm by integrating multiple advanced components. Building upon these improvements, a progressive training paradigm is introduced, which strategically combines two core mechanisms: (1) resolution curriculum learning, transitioning from fixed-resolution pretraining to native resolution tuning, thereby leveraging ViT's inherent adaptability to variable-length sequences, and (2) visual modality adaptation via inter-batch image-video switching, which balances computational efficiency with enhanced temporal reasoning. In parallel, a hybrid training framework further synergizes sigmoid-based contrastive loss with feature distillation from a frozen teacher model, thereby accelerating early-stage convergence. Finally, with models trained exclusively on public datasets, extensive experiments across multiple model scales from 0.3B to 1B demonstrate the effectiveness of UniViTAR.

Submitted 29 May, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.01603 [pdf, other]

A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting

Authors: Yizhe Tang, Zhimin Sun, Yuzhen Du, Ran Yi, Guangben Lu, Teng Hu, Luying Li, Lizhuang Ma, Fangyuan Zou

Abstract: Image inpainting aims to fill the missing region of an image. Recently, there has been a surge of interest in foreground-conditioned background inpainting, a sub-task that fills the background of an image while the foreground subject and associated text prompt are provided. Existing background inpainting methods typically strictly preserve the subject's original position from the source image, resulting in inconsistencies between the subject and the generated background. To address this challenge, we propose a new task, the "Text-Guided Subject-Position Variable Background Inpainting", which aims to dynamically adjust the subject position to achieve a harmonious relationship between the subject and the inpainted background, and propose the Adaptive Transformation Agent (A$^\text{T}$A) for this task. Firstly, we design a PosAgent Block that adaptively predicts an appropriate displacement based on given features to achieve variable subject-position. Secondly, we design the Reverse Displacement Transform (RDT) module, which arranges multiple PosAgent blocks in a reverse structure, to transform hierarchical feature maps from deep to shallow based on semantic information. Thirdly, we equip A$^\text{T}$A with a Position Switch Embedding to control whether the subject's position in the generated image is adaptively predicted or fixed.
Extensive comparative experiments validate the effectiveness of our A$^\text{T}$A approach, which not only demonstrates superior inpainting capabilities in subject-position variable inpainting, but also ensures good performance on subject-position fixed inpainting.

Submitted 2 April, 2025; originally announced April 2025.

Comments: Accepted by CVPR 2025

arXiv:2504.00911 [pdf, other]

Foundation Models for Autonomous Driving System: An Initial Roadmap

Authors: Xiongfei Wu, Mingfei Cheng, Qiang Hu, Jianlang Chen, Yuheng Huang, Manabu Okada, Michio Hayashi, Tomoyuki Tsuchiya, Xiaofei Xie, Lei Ma

Abstract: Recent advancements in Foundation Models (FMs), such as Large Language Models (LLMs), have significantly enhanced Autonomous Driving Systems (ADSs) by improving perception, reasoning, and decision-making in dynamic and uncertain environments. However, ADSs are highly complex cyber-physical systems that demand rigorous software engineering practices to ensure reliability and safety. Integrating FMs into ADSs introduces new challenges in system design and evaluation, requiring a systematic review to establish a clear research roadmap. To address these challenges, we present a structured roadmap for integrating FMs into autonomous driving, covering three key aspects: the infrastructure of FMs, their application in autonomous driving systems, and their current applications in practice. For each aspect, we review the current research progress, identify existing challenges, and highlight research gaps that need to be addressed by the community.

Submitted 1 April, 2025; originally announced April 2025.

arXiv:2504.00394 [pdf, other]

AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline

Authors: Lei Wang, Yujie Zhong, Xiaopeng Sun, Jingchun Cheng, Chengjian Feng, Qiong Cao, Lin Ma, Zhaoxin Fan

Abstract: The task of 2D animal pose estimation plays a crucial role in advancing deep learning applications in animal behavior analysis and ecological research. Despite notable progress in some existing approaches, our study reveals that the scarcity of high-quality datasets remains a significant bottleneck, limiting the full potential of current methods. To address this challenge, we propose a novel Controllable Image Generation Pipeline for synthesizing animal pose estimation data, termed AP-CAP. Within this pipeline, we introduce a Multi-Modal Animal Image Generation Model capable of producing images with expected poses. To enhance the quality and diversity of the generated data, we further propose three innovative strategies: (1) Modality-Fusion-Based Animal Image Synthesis Strategy to integrate multi-source appearance representations, (2) Pose-Adjustment-Based Animal Image Synthesis Strategy to dynamically capture diverse pose variations, and (3) Caption-Enhancement-Based Animal Image Synthesis Strategy to enrich visual semantic understanding. Leveraging the proposed model and strategies, we create the MPCH Dataset (Modality-Pose-Caption Hybrid), the first hybrid dataset that innovatively combines synthetic and real data, establishing the largest-scale multi-source heterogeneous benchmark repository for animal pose estimation to date. Extensive experiments demonstrate the superiority of our method in improving both the performance and generalization capability of animal pose estimators.

Submitted 31 March, 2025; originally announced April 2025.

arXiv:2503.24345 [pdf, other]

PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks

Authors: Fang Yan, Jianfeng Wu, Jiawen Li, Wei Wang, Jiaxuan Lu, Wen Chen, Zizhao Gao, Jianan Li, Hong Yan, Jiabo Ma, Minda Chen, Yang Lu, Qing Chen, Yizhi Wang, Xitong Ling, Xuenian Wang, Zihan Wang, Qiang Huang, Shengyi Hua, Mianxin Liu, Lei Ma, Tian Shen, Xiaofan Zhang, Yonghong He, Hao Chen , et al. (2 additional authors not shown)

Abstract: The complexity and variability inherent in high-resolution pathological images present significant challenges in computational pathology. While pathology foundation models leveraging AI have catalyzed transformative advancements, their development demands large-scale datasets, considerable storage capacity, and substantial computational resources. Furthermore, ensuring their clinical applicability and generalizability requires rigorous validation across a broad spectrum of clinical tasks. Here, we present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on a dataset comprising 300K pathological slides from 20 tissue and organ types across multiple centers. The model was rigorously evaluated on 112 clinical tasks using a combination of 61 private and 51 public datasets. These tasks encompass digital slide preprocessing, pan-cancer classification, lesion identification, multi-cancer subtype classification, biomarker assessment, gene expression prediction, and the generation of structured reports. PathOrchestra demonstrated exceptional performance across 27,755 WSIs and 9,415,729 ROIs, achieving over 0.950 accuracy in 47 tasks, including pan-cancer classification across various organs, lymphoma subtype diagnosis, and bladder cancer screening. Notably, it is the first model to generate structured reports for high-incidence colorectal cancer and diagnostically complex lymphoma, areas that are infrequently addressed by foundational models but hold immense clinical potential.
Overall, PathOrchestra exemplifies the feasibility and efficacy of a large-scale, self-supervised pathology foundation model, validated across a broad range of clinical-grade tasks. Its high accuracy and reduced reliance on extensive data annotation underline its potential for clinical integration, offering a pathway toward more efficient and high-quality medical services.

Submitted 31 March, 2025; originally announced March 2025.

arXiv:2503.24137 [pdf, other]

Half-life and precision shape measurement of 2νββ decay of $^{130}$Te

Authors: D. Q. Adams, C. Alduino, K. Alfonso, F. T. Avignone III, O. Azzolini, G. Bari, F. Bellini, G. Benato, M. Beretta, M. Biassoni, A. Branca, C. Brofferio, C. Bucci, J. Camilleri, A. Caminata, A. Campani, J. Cao, C. Capelli, S. Capelli, L. Cappelli, L. Cardani, P. Carniti, N. Casali, E. Celi, D. Chiesa , et al. (97 additional authors not shown)

Abstract: We present a new measurement of the $2νββ$ half-life of $^{130}$Te ($T_{1/2}$) using the first complete model of the CUORE data, based on 1038 kg yr of collected exposure. Thanks to optimized data selection, we achieve a factor of two improvement in precision, obtaining $T_{1/2} = \left(9.32\,^{+0.05}_{-0.04}\,(\mathrm{stat.})\,^{+0.07}_{-0.07}\,(\mathrm{syst.})\right) \times 10^{20}$ yr. The signal-to-background ratio is increased by 70% compared to our previous results, enabling the first application of the improved $2νββ$ formalism to $^{130}$Te. Within this framework, we determine a credibility interval for the effective axial coupling in the nuclear medium as a function of nuclear matrix elements. We also extract values for the higher-order nuclear matrix element ratios: second-to-first and third-to-first. The second-to-first ratio agrees with nuclear model predictions, while the third-to-first ratio deviates from theoretical expectations. These findings provide essential tests of nuclear models and key inputs for future $0νββ$ searches.

Submitted 31 March, 2025; originally announced March 2025.

arXiv:2503.23368 [pdf, other]

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Authors: Xindi Yang, Baolu Li, Yiming Zhang, Zhenfei Yin, Lei Bai, Liqian Ma, Zhiyong Wang, Jianfei Cai, Tien-Tsin Wong, Huchuan Lu, Xu Jia

Abstract: Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos and drawing the attention of the community to their potential as world simulators. However, despite their capabilities, VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics, resulting in incorrect dynamics and event sequences. To address this limitation, we propose a novel two-stage image-to-video generation framework that explicitly incorporates physics through a vision- and language-informed physical prior. In the first stage, we employ a Vision Language Model (VLM) as a coarse-grained motion planner, integrating chain-of-thought and physics-aware reasoning to predict rough motion trajectories/changes that approximate real-world physical dynamics while ensuring inter-frame consistency. In the second stage, we use the predicted motion trajectories/changes to guide the video generation of a VDM. As the predicted motion trajectories/changes are rough, noise is added during inference to give the VDM freedom to generate motion with finer details. Extensive experimental results demonstrate that our framework can produce physically plausible motion, and comparative evaluations highlight the notable superiority of our approach over existing methods. More video results are available on our Project Page: https://madaoer.github.io/projects/physically_plausible_video_generation.

Submitted 4 April, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

Comments: 18 pages, 11 figures

arXiv:2503.22913 [pdf, other]

Resona: Improving Context Copying in Linear Recurrence Models with Retrieval

Authors: Xinyu Wang, Linrui Ma, Jerry Huang, Peng Lu, Prasanna Parthasarathi, Xiao-Wen Chang, Boxing Chen, Yufei Cui

Abstract: Recent shifts in the space of large language model (LLM) research have shown an increasing focus on novel architectures to compete with prototypical Transformer-based models that have long dominated this space. Linear recurrent models have proven to be a viable competitor due to their computational efficiency. However, such models still demonstrate a sizable gap compared to Transformers in terms of in-context learning among other tasks that require recalling information from a context. In this work, we introduce Resona, a simple and scalable framework for augmenting linear recurrent models with retrieval. Resona augments models with the ability to integrate retrieved information from the provided input context, enabling tailored behavior to diverse task requirements. Experiments on a variety of linear recurrent models demonstrate that Resona-augmented models achieve significant performance gains on a variety of synthetic as well as real-world natural language tasks, highlighting its ability to act as a general purpose method to improve the in-context learning and language modeling abilities of linear recurrent LLMs.

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2503.22223 [pdf, other]

DREMnet: An Interpretable Denoising Framework for Semi-Airborne Transient Electromagnetic Signal

Authors: Shuang Wang, Ming Guo, Xuben Wang, Fei Deng, Lifeng Mao, Bin Wang, Wenlong Gao

Abstract: The semi-airborne transient electromagnetic method (SATEM) is capable of conducting rapid surveys over large-scale and hard-to-reach areas. However, the acquired signals are often contaminated by complex noise, which can compromise the accuracy of subsequent inversion interpretations. Traditional denoising techniques primarily rely on parameter selection strategies, which are insufficient for processing field data in noisy environments. With the advent of deep learning, various neural networks have been employed for SATEM signal denoising. However, existing deep learning methods typically use single-mapping learning approaches that struggle to effectively separate signal from noise. These methods capture only partial information and lack interpretability. To overcome these limitations, we propose an interpretable decoupled representation learning framework, termed DREMnet, that disentangles data into content and context factors, enabling robust and interpretable denoising in complex conditions. To address the limitations of CNN and Transformer architectures, we utilize the RWKV architecture for data processing and introduce the Contextual-WKV mechanism, which allows unidirectional WKV to perform bidirectional signal modeling. Our proposed Covering Embedding technique retains the strong local perception of convolutional networks through stacked embedding.
Experimental results on test datasets demonstrate that the DREMnet method outperforms existing techniques, with processed field data that more accurately reflects the theoretical signal, offering improved identification of subsurface electrical structures.

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2503.22214 [pdf, other]

Interpretable Deep Learning Paradigm for Airborne Transient Electromagnetic Inversion

Authors: Shuang Wang, Xuben Wang, Fei Deng, Xiaodong Yu, Peifan Jiang, Lifeng Mao

Abstract: The extraction of geoelectric structural information from airborne transient electromagnetic (ATEM) data primarily involves data processing and inversion. Conventional methods rely on empirical parameter selection, making it difficult to process complex field data with high noise levels. Additionally, inversion computations are time consuming and often suffer from multiple local minima. Existing deep learning-based approaches separate the data processing steps, where independently trained denoising networks struggle to ensure the reliability of subsequent inversions. Moreover, end to end networks lack interpretability. To address these issues, we propose a unified and interpretable deep learning inversion paradigm based on disentangled representation learning. The network explicitly decomposes noisy data into noise and signal factors, completing the entire data processing workflow based on the signal factors while incorporating physical information for guidance. This approach enhances the network's reliability and interpretability. The inversion results on field data demonstrate that our method can directly use noisy data to accurately reconstruct the subsurface electrical structure. Furthermore, it effectively processes data severely affected by environmental noise, which traditional methods struggle with, yielding improved lateral structural resolution.

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2503.22126 [pdf, other]

Updated model-independent measurement of the strong-phase differences between $D^0$ and $\bar{D}^0 \to K^{0}_{S/L}π^+π^-$ decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (696 additional authors not shown)

Abstract: The strong-phase differences between $D^0\to K_{S/L}^0π^+π^-$ and $\bar{D}^0\to K_{S/L}^0π^+π^-$ decays are one of the most important inputs in measuring the $C\!P$ violating angle $γ$ via $B^- \to D K^-$ decays. They also play a key role in studies of charm mixing and indirect $C\!P$ violation. In this paper, the strong-phase differences are determined in a model-independent way with quantum-correlated $D^0$-$\bar{D}^0$ decays from 7.93 fb$^{-1}$ of $e^+e^-$ annihilation data at $\sqrt{s}$=3.773 GeV by the BESIII experiment. These results are the most precise to date and are expected to significantly reduce associated uncertainties in determining the $C\!P$ violating angle $γ$ and related charm mixing parameters.

Submitted 18 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

arXiv:2503.21413 [pdf, other]

First observation of $Λ_{c}(2595)^{+} \to Λ^{+}_{c}π^0π^0$ and $Λ_{c}(2625)^{+}\to Λ^{+}_{c}π^0π^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (657 additional authors not shown)

Abstract: By analysing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 368.48~pb$^{-1}$ collected at the centre-of-mass energies of $\sqrt{s} = 4.918$ and $4.951$~GeV with the BESIII detector, we report the first observation of $Λ_{c}(2595)^{+}$ and $Λ_{c}(2625)^{+}\to Λ^{+}_{c}π^0π^0$ with statistical significances of 7.9$σ$ and 11.8$σ$, respectively. The branching fractions of $Λ_{c}(2595)^{+}$ and $Λ_{c}(2625)^{+}\to Λ^{+}_{c}π^0π^0$ are measured to be $(59.5 \pm 11.1_{\rm stat.} \pm 7.9_{\rm syst.}) \%$ and $(41.0 \pm 5.2_{\rm stat.} \pm 3.3_{\rm syst.}) \%$, respectively. The absolute branching fraction of $Λ_{c}(2595)^{+}$ is consistent with the expectation of the mechanism referred to as the threshold effect, proposed for the strong decays of $Λ_{c}(2595)^{+}$ within uncertainty.

Submitted 27 March, 2025; originally announced March 2025.

Comments: 20 pages, 4 figures

arXiv:2503.20209 [pdf, other]

BEAR: A Video Dataset For Fine-grained Behaviors Recognition Oriented with Action and Environment Factors

Authors: Chengyang Hu, Yuduo Chen, Lizhuang Ma

Abstract: Behavior recognition is an important task in video representation learning. An essential aspect pertains to effective feature learning conducive to behavior recognition. Recently, researchers have started to study fine-grained behavior recognition, which provides similar behaviors and encourages the model to concern with more details of behaviors with effective features for distinction. However, previous fine-grained behaviors limited themselves to controlling partial information to be similar, leading to an unfair and not comprehensive evaluation of existing works. In this work, we develop a new video fine-grained behavior dataset, named BEAR, which provides fine-grained (i.e. similar) behaviors that uniquely focus on two primary factors defining behavior: Environment and Action. It includes two fine-grained behavior protocols including Fine-grained Behavior with Similar Environments and Fine-grained Behavior with Similar Actions as well as multiple sub-protocols as different scenarios. Furthermore, with this new dataset, we conduct multiple experiments with different behavior recognition models. Our research primarily explores the impact of input modality, a critical element in studying the environmental and action-based aspects of behavior recognition. Our experimental results yield intriguing insights that have substantial implications for further research endeavors.

Submitted 26 March, 2025; originally announced March 2025.

Comments: Accepted by ICME 2025

arXiv:2503.19725 [pdf]

Nitrogen-Vacancy Engineering for Controlled Phase Transitions in CrN(111) Epitaxial Films

Authors: XiaoXu Zhang, Yang Li, Yu Shang, MingYue Zhao, GuoKe Li, Li Ma, DeWei Zhao, CongMian Zhen, DengLu Hou

Abstract: The phase transition in CrN epitaxial films is substantially suppressed by epitaxial constraint. Here, we propose that nitrogen (N) vacancies can be taken as a knob to regulate the phase transition of CrN(111) epitaxial films. To validate this concept, a series of CrN(111) films with controlled N concentrations (approximately from 0.0 to 5.0 at.%) were epitaxially grown on Al2O3(0001) substrates. Experimental characterization reveals that higher N vacancy concentrations significantly facilitate the out-of-plane contraction of the films at 273 K (0.8%), reaching up to 60% of the contraction magnitude of CrN powders (1.2%) without compromising the stability and reproducibility of the phase transition. Reducing N vacancy concentrations diminishes the lattice contraction, lowers the phase transition temperature to 193 K, and triggers a metallic to insulator transition in electrical behavior. First-principles calculations corroborate these findings, showing that N vacancies decrease the internal tensile stress within triangular Cr atomic layers, which enhances the out-of-plane contraction, elevates phase transition temperatures, and promotes bandgap closure. These results establish N vacancies as a critical factor governing phase transition dynamics in CrN systems and provide a practical strategy for successively engineering thermally responsive phase transitions in CrN films, advancing their potential for functional device applications.

Submitted 25 March, 2025; originally announced March 2025.

arXiv:2503.19542 [pdf, other]

doi 10.1007/JHEP06(2025)220

Measurement of the branching fractions of doubly Cabibbo-suppressed $D$ decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (648 additional authors not shown)

Abstract: By analyzing $e^+e^-$ collision data collected at the center-of-mass energy of 3.773~GeV with the BESIII detector, corresponding to an integrated luminosity of 20.3~fb$^{-1}$, we measure the branching fractions of the doubly Cabibbo-suppressed (DCS) decays $D^0\to K^+π^-$, $D^0\to K^+π^-π^-π^+$, $D^0\to K^+π^-π^0$, $D^0\to K^+π^-π^0π^0$, $D^+\to K^+π^+π^-$, and $D^+\to K^+K^+K^-$. We also perform the first searches for $D^0\to K^+π^-η$, $D^0\to K^+π^-π^0η$, $D^+\to K^+π^+π^-η$, $D^{+} \to K^{+} \left(π^{+} π^{-} η\right)_{{\rm non}-η^{\prime}}$, and $D^+\to K^+ηη$ and report the first observations and evidence for some of these final states. Combining the measurements with the world averages of the corresponding Cabibbo-favored (CF) decays, the ratios of the DCS/CF branching fractions are obtained. For the $D^{+} \to K^{+} \left(π^{+} π^{-} η\right)_{{\rm non}-η^{\prime}}$ decay, the ratio is significantly larger than the corresponding ratios of the other DCS decays.

Submitted 25 March, 2025; originally announced March 2025.

Comments: 16 pages, 5 figures

arXiv:2503.19516 [pdf, other]

DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data

Authors: Liming Zheng, Feng Yan, Fanfan Liu, Chengjian Feng, Yufeng Zhong, Yiyang Huang, Lin Ma

Abstract: The growing adoption of Vision-Language-Action (VLA) models in embodied AI intensifies the demand for diverse manipulation demonstrations. However, high costs associated with data collection often result in insufficient data coverage across all scenarios, which limits the performance of the models. It is observed that the spatial reasoning phase (SRP) in large workspace dominates the failure cases. Fortunately, this data can be collected with low cost, underscoring the potential of leveraging inexpensive data to improve model performance. In this paper, we introduce the DataPlatter method, a framework that decouples training trajectories into distinct task stages and leverages abundant easily collectible SRP data to enhance VLA model's generalization. Through analysis we demonstrate that sub-task-specific training with additional SRP data with proper proportion can act as a performance catalyst for robot manipulation, maximizing the utilization of costly physical interaction phase (PIP) data. Experiments show that through introducing large proportion of cost-effective SRP trajectories into a limited set of PIP data, we can achieve a maximum improvement of 41\% on success rate in zero-shot scenes, while with the ability to transfer manipulation skill to novel targets.

Submitted 25 March, 2025; originally announced March 2025.

arXiv:2503.18869 [pdf, other]

Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design

Authors: Rui Xie, Asad Ul Haq, Linsen Ma, Yunhua Fang, Zirak Burzin Engineer, Liu Liu, Tong Zhang

Abstract: The efficiency of Large Language Model~(LLM) inference is often constrained by substantial memory bandwidth and capacity demands. Existing techniques, such as pruning, quantization, and mixture of experts/depth, reduce memory capacity and/or bandwidth consumption at the cost of slight degradation in inference quality. This paper introduces a design solution that further alleviates memory bottlenecks by enhancing the on-chip memory controller in AI accelerators to achieve two main objectives: (1) significantly reducing memory capacity and bandwidth usage through lossless block compression~(e.g., LZ4 and ZSTD) of model weights and key-value (KV) cache without compromising inference quality, and (2) enabling memory bandwidth and energy consumption to scale proportionally with context-dependent dynamic quantization. These goals are accomplished by equipping the on-chip memory controller with mechanisms to improve fine-grained bit-level accessibility and compressibility of weights and KV cache through LLM-aware configuration of in-memory placement and representation. Experimental results on publicly available LLMs demonstrate the effectiveness of this approach, showing memory footprint reductions of 25.2\% for model weights and 46.9\% for KV cache. In addition, our hardware prototype at 4\,GHz and 32 lanes (7\,nm) achieves 8\,TB/s throughput with a modest area overhead (under 3.8\,mm$^2$), which underscores the viability of LLM-aware memory control as a key to efficient large-scale inference.

Submitted 21 April, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

Comments: 9 pages, 11 figures

arXiv:2503.18620 [pdf, ps, other]

doi 10.1103/PhysRevD.111.092007

Observation of the decay $ψ(3686)\rightarrow Σ^{0}\barΣ^{0}ω$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (695 additional authors not shown)

Abstract: Using a dataset of $(27.12\pm 0.14)\times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of the decay $ψ(3686)\toΣ^{0}\barΣ^{0}ω$ with a statistical significance of 8.9$σ$. The measured branching fraction is $(1.24 \pm 0.16_{\textrm{stat}} \pm 0.11_{\textrm{sys}}) \times 10^{-5}$, where the first uncertainty is statistical and the second is systematic. Additionally, we investigate potential intermediate states in the invariant mass distributions of $Σ^{0}ω$, $\barΣ^{0}ω$ and $Σ^{0}\barΣ^{0}$. A hint of a resonance is observed in the invariant mass distribution of $M_{Σ^{0}(\barΣ^{0})ω}$, located around 2.06 GeV/$c^2$, with a significance of 2.5$σ$.

Submitted 24 March, 2025; originally announced March 2025.

arXiv:2503.18525 [pdf, other]

P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction

Authors: Yufeng Zhong, Chengjian Feng, Feng Yan, Fanfan Liu, Liming Zheng, Lin Ma

Abstract: In language-guided visual navigation, agents locate target objects in unseen environments using natural language instructions. For reliable navigation in unfamiliar scenes, agents must possess strong perception, planning, and prediction capabilities. Additionally, when agents revisit previously explored areas during long-term navigation, they may retain irrelevant and redundant historical perceptions, leading to suboptimal results. In this work, we introduce P3Nav, a unified framework that integrates Perception, Planning, and Prediction capabilities through Multitask Collaboration on navigation and embodied question answering (EQA) tasks, thereby enhancing navigation performance. Furthermore, P3Nav employs an Adaptive 3D-aware History Sampling strategy to effectively and efficiently utilize historical observations. By leveraging the large language models (LLM), P3Nav comprehends diverse commands and complex visual scenes, resulting in appropriate navigation actions. P3Nav achieves a 75\% success rate in object goal navigation on the $\mathrm{CHORES}$-$\mathbb{S}$ benchmark, setting a new state-of-the-art performance.

Submitted 24 March, 2025; originally announced March 2025.

Comments: 14 pages, 7 figures

arXiv:2503.17673 [pdf, other]

DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion

Authors: Jinyuan Liu, Bowei Zhang, Qingyun Mei, Xingyuan Li, Yang Zou, Zhiying Jiang, Long Ma, Risheng Liu, Xin Fan

Abstract: Infrared and visible image fusion integrates information from distinct spectral bands to enhance image quality by leveraging the strengths and mitigating the limitations of each modality. Existing approaches typically treat image fusion and subsequent high-level tasks as separate processes, resulting in fused images that offer only marginal gains in task performance and fail to provide constructive feedback for optimizing the fusion process. To overcome these limitations, we propose a Discriminative Cross-Dimension Evolutionary Learning Framework, termed DCEvo, which simultaneously enhances visual quality and perception accuracy. Leveraging the robust search capabilities of Evolutionary Learning, our approach formulates the optimization of dual tasks as a multi-objective problem by employing an Evolutionary Algorithm (EA) to dynamically balance loss function parameters. Inspired by visual neuroscience, we integrate a Discriminative Enhancer (DE) within both the encoder and decoder, enabling the effective learning of complementary features from different modalities. Additionally, our Cross-Dimensional Embedding (CDE) block facilitates mutual enhancement between high-dimensional task features and low-dimensional fusion features, ensuring a cohesive and efficient feature integration process. Experimental results on three benchmarks demonstrate that our method significantly outperforms state-of-the-art approaches, achieving an average improvement of 9.32% in visual quality while also enhancing subsequent high-level tasks. The code is available at https://github.com/Beate-Suy-Zhang/DCEvo.

Submitted 22 March, 2025; originally announced March 2025.

Comments: Accepted by CVPR 2025

MSC Class: 68T45; ACM Class: I.4.3

arXiv:2503.17641 [pdf, other]

InstructVEdit: A Holistic Approach for Instructional Video Editing

Authors: Chi Zhang, Chengjian Feng, Feng Yan, Qiming Zhang, Mingjin Zhang, Yujie Zhong, Jing Zhang, Lin Ma

Abstract: Video editing according to instructions is a highly challenging task due to the difficulty in collecting large-scale, high-quality edited video pair data. This scarcity not only limits the availability of training data but also hinders the systematic exploration of model architectures and training strategies. While prior work has improved specific aspects of video editing (e.g., synthesizing a video dataset using image editing techniques or decomposed video editing training), a holistic framework addressing the above challenges remains underexplored. In this study, we introduce InstructVEdit, a full-cycle instructional video editing approach that: (1) establishes a reliable dataset curation workflow to initialize training, (2) incorporates two model architectural improvements to enhance edit quality while preserving temporal consistency, and (3) proposes an iterative refinement strategy leveraging real-world data to enhance generalization and minimize train-test discrepancies. Extensive experiments show that InstructVEdit achieves state-of-the-art performance in instruction-based video editing, demonstrating robust adaptability to diverse real-world scenarios. Project page: https://o937-blip.github.io/InstructVEdit.

Submitted 22 March, 2025; originally announced March 2025.

Comments: https://o937-blip.github.io/InstructVEdit

arXiv:2503.17165 [pdf, other]

Stringent test of $CP$ symmetry in $Σ^+$ hyperon decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (680 additional authors not shown)

Abstract: The non-leptonic two-body weak decays $Σ^{+} \to p π^{0}$ and $\barΣ^{-} \to \bar{p} π^{0}$ are investigated, utilizing $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events and $(2.7124\pm0.0143)\times10^{9}$ $ψ(3686)$ events collected by BESIII experiment. The precision of the weak-decay parameters for the decays $Σ^{+} \to p π^{0}$ ($α_{0}$) and $\barΣ^{-} \to \bar{p} π^{0}$ ($\barα_{0}$) is improved by a factor of three compared to the previous world average. Furthermore, the quantum-entangled $Σ^{+}\barΣ^{-}$ system enables the most precise test of $CP$ symmetry for the decay $Σ^+\to pπ^0$, through the asymmetry observable $A_{CP}=(α_{0}+\barα_{0})/(α_{0}-\barα_{0})$ that is measured to be $-0.0118\pm0.0083_{\rm stat}\pm0.0028_{\rm syst}$. Assuming $CP$ conservation, the average decay parameter is determined to be ${\left< α_{\rm 0}\right>} = (α_0-\barα_0)/2=-0.9869\pm0.0011_{\rm stat}\pm0.0016_{\rm syst}$, which is the most precise measurement of the asymmetry decay parameters in baryon sectors. The angular dependence of the ratio of the polarization of the $Σ^+$ in both $J/ψ$ and $ψ(3686)$ decays is studied for the first time.

Submitted 21 March, 2025; originally announced March 2025.

arXiv:2503.17014 [pdf]

Behavioral Conflict Avoidance Between Humans and Quadruped Robots in Shared Environments

Authors: Shuang Wei, Muhua Zhang, Yun Gan, Deqing Huang, Lei Ma, Chenguang Yang

Abstract: Nowadays, robots are increasingly operated in environments shared with humans, where conflicts between human and robot behaviors may compromise safety. This paper presents a proactive behavioral conflict avoidance framework based on the principle of adaptation to trends for quadruped robots that not only ensures the robot's safety but also minimizes interference with human activities. It can proactively avoid potential conflicts with approaching humans or other dynamic objects, whether the robot is stationary or in motion, then swiftly resume its tasks once the conflict subsides. An enhanced approach is proposed to achieve precise human detection and tracking on a vibratory robot platform equipped with low-cost hybrid solid-state LiDAR. When a potential conflict is detected, the robot selects an avoidance point and executes an evasion maneuver before resuming its task. This approach contrasts with conventional methods that remain goal-driven, often resulting in aggressive behaviors, such as forcibly bypassing obstacles and causing conflicts or becoming stuck in deadlock scenarios. The selection of avoidance points is achieved by integrating static and dynamic obstacles to generate a potential field map. The robot then searches for feasible regions within this map and determines the optimal avoidance point using an evaluation function. Experimental results demonstrate that the framework significantly reduces interference with human activities and enhances the safety of both robots and persons.

Submitted 21 March, 2025; originally announced March 2025.

Comments: 7 pages, 9 figures. This work has been submitted to the IEEE for possible publication

arXiv:2503.17005 [pdf]

Autonomous Exploration-Based Precise Mapping for Mobile Robots through Stepwise and Consistent Motions

Authors: Muhua Zhang, Lei Ma, Ying Wu, Kai Shen, Yongkui Sun, Henry Leung

Abstract: This paper presents an autonomous exploration framework. It is designed for indoor ground mobile robots that utilize laser Simultaneous Localization and Mapping (SLAM), ensuring process completeness and precise mapping results. For frontier search, the local-global sampling architecture based on multiple Rapidly Exploring Random Trees (RRTs) is employed. Traversability checks during RRT expansion and global RRT pruning upon map updates eliminate unreachable frontiers, reducing potential collisions and deadlocks. Adaptive sampling density adjustments, informed by obstacle distribution, enhance exploration coverage potential. For frontier point navigation, a stepwise consistent motion strategy is adopted, wherein the robot strictly drives straight on approximately equidistant line segments in the polyline path and rotates in place at segment junctions. This simplified, decoupled motion pattern improves scan-matching stability and mitigates map drift. For process control, the framework serializes frontier point selection and navigation, avoiding oscillation caused by frequent goal changes in conventional parallelized processes. The waypoint retracing mechanism is introduced to generate repeated observations, triggering loop closure detection and backend optimization in graph-based SLAM, thereby improving map consistency and precision. Experiments in both simulation and real-world scenarios validate the effectiveness of the framework. It achieves improved mapping coverage and precision in more challenging environments compared to baseline 2D exploration algorithms. It also shows robustness in supporting resource-constrained robot platforms and maintaining mapping consistency across various LiDAR field-of-view (FoV) configurations.

Submitted 21 March, 2025; originally announced March 2025.

Comments: 8 pages, 11 figures. This work has been submitted to the IEEE for possible publication

arXiv:2503.16070 [pdf, other]

Search for the radiative leptonic decay $D^+\toγe^+ν_e$ with Deep Learning

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (680 additional authors not shown)

Abstract: Using 20.3$~\rm fb^{-1}$ of $e^+e^-$ annihilation data collected at a center-of-mass energy of 3.773$~\rm GeV$ with the BESIII detector, we report an improved search for the radiative leptonic decay $D^+\toγe^+ν_e$. An upper limit on its partial branching fraction for photon energies $E_γ>10~\rm MeV$ is determined to be $1.2\times10^{-5}$ at 90\% confidence level, which excludes most current theoretical predictions. A sophisticated deep learning approach with thorough validation, based on the Transformer architecture, is implemented to efficiently distinguish the signal from massive backgrounds.

Submitted 20 March, 2025; originally announced March 2025.

Comments: 15 pages, 6 figures

arXiv:2503.15898 [pdf, other]

Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions

Authors: Boran Wen, Dingbang Huang, Zichen Zhang, Jiahong Zhou, Jianbin Deng, Jingyu Gong, Yulong Chen, Lizhuang Ma, Yong-Lu Li

Abstract: Reconstructing human-object interactions (HOI) from single images is fundamental in computer vision. Existing methods are primarily trained and tested on indoor scenes due to the lack of 3D data, particularly constrained by the object variety, making it challenging to generalize to real-world scenes with a wide range of objects. The limitations of previous 3D HOI datasets were primarily due to the difficulty in acquiring 3D object assets. However, with the development of 3D reconstruction from single images, recently it has become possible to reconstruct various objects from 2D HOI images. We therefore propose a pipeline for annotating fine-grained 3D humans, objects, and their interactions from single images. We annotated 2.5k+ 3D HOI assets from existing 2D HOI datasets and built the first open-vocabulary in-the-wild 3D HOI dataset Open3DHOI, to serve as a future test set. Moreover, we design a novel Gaussian-HOI optimizer, which efficiently reconstructs the spatial interactions between humans and objects while learning the contact regions. Besides the 3D HOI reconstruction, we also propose several new tasks for 3D HOI understanding to pave the way for future work. Data and code will be publicly available at https://wenboran2002.github.io/3dhoi.

Submitted 20 March, 2025; originally announced March 2025.

Comments: Accepted to CVPR 2025

arXiv:2503.15082 [pdf, other]

StyleLoco: Generative Adversarial Distillation for Natural Humanoid Robot Locomotion

Authors: Le Ma, Ziyu Meng, Tengyu Liu, Yuhan Li, Ran Song, Wei Zhang, Siyuan Huang

Abstract: Humanoid robots are anticipated to acquire a wide range of locomotion capabilities while ensuring natural movement across varying speeds and terrains. Existing methods encounter a fundamental dilemma in learning humanoid locomotion: reinforcement learning with handcrafted rewards can achieve agile locomotion but produces unnatural gaits, while Generative Adversarial Imitation Learning (GAIL) with motion capture data yields natural movements but suffers from unstable training processes and restricted agility. Integrating these approaches proves challenging due to the inherent heterogeneity between expert policies and human motion datasets. To address this, we introduce StyleLoco, a novel two-stage framework that bridges this gap through a Generative Adversarial Distillation (GAD) process. Our framework begins by training a teacher policy using reinforcement learning to achieve agile and dynamic locomotion. It then employs a multi-discriminator architecture, where distinct discriminators concurrently extract skills from both the teacher policy and motion capture data. This approach effectively combines the agility of reinforcement learning with the natural fluidity of human-like movements while mitigating the instability issues commonly associated with adversarial training. Through extensive simulation and real-world experiments, we demonstrate that StyleLoco enables humanoid robots to perform diverse locomotion tasks with the precision of expertly trained policies and the natural aesthetics of human motion, successfully transferring styles across different movement types while maintaining stable locomotion across a broad spectrum of command inputs.

Submitted 19 March, 2025; originally announced March 2025.

Comments: 9 pages, 4 figures

arXiv:2503.14843 [pdf]

High-rate continuous-variable quantum key distribution over 100 km fiber with composable security

Authors: Heng Wang, Yang Li, Ting Ye, Li Ma, Yan Pan, Mingze Wu, Junhui Li, Yiming Bian, Yaodi Pi, Yun Shao, Jie Yang, Jinlu Liu, Ao Sun, Wei Huang, Stefano Pirandola, Yichen Zhang, Bingjie Xu

Abstract: Quantum key distribution (QKD), providing a way to generate secret keys with information-theoretic security, is arguably one of the most significant achievements in quantum information. The continuous-variable QKD (CV-QKD) offers the potential advantage of achieving a higher secret key rate (SKR) within a metro area, as well as being compatible with the mature telecom industry. However, the SKR and transmission distance of state-of-the-art CV-QKD systems are currently limited. Here, based on the newly proposed orthogonal-frequency-division-multiplexing (OFDM) CV-QKD protocol, we demonstrate for the first time a high-rate multi-carrier (MC) CV-QKD with a 10 GHz symbol rate that achieves Gbps SKR within 10 km and Mbps SKR over 100 km in the finite-size regime under composable security against collective attacks. The record-breaking results are achieved by suitable optimization of subcarrier number and modulation variance, well-controlled excess noise induced by both OFDM mechanism and efficient DSP scheme, and high-performance post-processing capacity realized by heterogeneous computing scheme. The composable finite-size SKR reaches 1779.45 Mbps@5km, 1025.49 Mbps@10km, 370.50 Mbps@25km, 99.93 Mbps@50km, 25.70 Mbps@75km, and 2.25 Mbps@100km, which improves the SKR by two orders of magnitude and quintuples the maximal transmission distance compared to most recently reported CV-QKD results [Nature Communications, 13, 4740 (2022)]. Interestingly, it is experimentally verified that the SKR of the proposed MC CV-QKD can approach five times that of the single-carrier CV-QKD with the same symbol rate without additional hardware costs. Our work constitutes a critical step towards future high-speed quantum metropolitan and access networks.

Submitted 18 March, 2025; originally announced March 2025.

arXiv:2503.14485 [pdf, other]

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

Authors: Yiqun Mei, Mingming He, Li Ma, Julien Philip, Wenqi Xian, David M George, Xueming Yu, Gabriel Dedic, Ahmet Levent Taşel, Ning Yu, Vishal M. Patel, Paul Debevec

Abstract: Video portrait relighting remains challenging because the results need to be both photorealistic and temporally stable. This typically requires a strong model design that can capture complex facial reflections as well as intensive training on a high-quality paired video dataset, such as dynamic one-light-at-a-time (OLAT). In this work, we introduce Lux Post Facto, a novel portrait video relighting method that produces both photorealistic and temporally consistent lighting effects. From the model side, we design a new conditional video diffusion model built upon a state-of-the-art pre-trained video diffusion model, alongside a new lighting injection mechanism to enable precise control. This way we leverage strong spatial and temporal generative capability to generate plausible solutions to the ill-posed relighting problem. Our technique uses a hybrid dataset consisting of static expression OLAT data and in-the-wild portrait performance videos to jointly learn relighting and temporal modeling. This avoids the need to acquire paired video data in different lighting conditions. Our extensive experiments show that our model produces state-of-the-art results both in terms of photorealism and temporal consistency.

Submitted 1 April, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

Comments: CVPR 2025

arXiv:2503.14478 [pdf, other]

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

Authors: Xinyu Fang, Zhijian Chen, Kai Lan, Lixin Ma, Shengyuan Ding, Yingji Liang, Xiangyu Zhao, Farong Wen, Zicheng Zhang, Guofeng Zhang, Haodong Duan, Kai Chen, Dahua Lin

Abstract: Creativity is a fundamental aspect of intelligence, involving the ability to generate novel and appropriate solutions across diverse contexts. While Large Language Models (LLMs) have been extensively evaluated for their creative capabilities, the assessment of Multimodal Large Language Models (MLLMs) in this domain remains largely unexplored. To address this gap, we introduce Creation-MMBench, a multimodal benchmark specifically designed to evaluate the creative capabilities of MLLMs in real-world, image-based tasks. The benchmark comprises 765 test cases spanning 51 fine-grained tasks. To ensure rigorous evaluation, we define instance-specific evaluation criteria for each test case, guiding the assessment of both general response quality and factual consistency with visual inputs. Experimental results reveal that current open-source MLLMs significantly underperform compared to proprietary models in creative tasks. Furthermore, our analysis demonstrates that visual fine-tuning can negatively impact the base LLM's creative abilities. Creation-MMBench provides valuable insights for advancing MLLM creativity and establishes a foundation for future improvements in multimodal generative intelligence. Full data and evaluation code is released on https://github.com/open-compass/Creation-MMBench.

Submitted 19 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

Comments: Evaluation Code and dataset see https://github.com/open-compass/Creation-MMBench

arXiv:2503.14168 [pdf]

What elements should we focus when designing immersive virtual nature? A preliminary user study

Authors: Lin Ma, Qiyuan An, Jing Chen, Xinggang Hou, Yuan Feng, Dengkai Chen

Abstract: Extensive research has confirmed the positive relationship between exposure to natural environments and human cognitive, behavioral, physical, and mental health. However, only some have easy access to nature. With electronic information and simulation technology advancements, digital nature experiences are widely used across various devices and scenarios. It is essential to explore how to effectively select and utilize natural elements to guide the design of digital nature scenes. This paper examines critical elements in immersive virtual nature (IVN) and their impact on user perception. Through online surveys and design experiments, we identified specific natural elements that promote relaxation and proposed design strategies for virtual environments. We developed several immersive virtual nature scenes for further validation. Finally, we outline our future experimental plans and research directions in digital nature. Our research aims to provide HCI designers insights into creating restorative, immersive virtual scenes.

Submitted 28 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

arXiv:2503.14058 [pdf, other]

Strongly regular generalized partial geometries and associated LDPC codes

Authors: Lijun Ma, Changli Ma, Zihong Tian

Abstract: In this paper, we introduce strongly regular generalized partial geometries of grade $r$, which generalise partial geometries and strongly regular $(α,β)$-geometries. By the properties of quadrics in PG$(2,q)$ and PG$(3,q)$, we construct two classes of strongly regular generalized partial geometries of grade $3$. Besides, we define low-density parity-check (LDPC) codes by considering the combinatorial structures of strongly regular generalized partial geometries and derive bounds on minimum distance, dimension and girth for the LDPC codes.

Submitted 18 March, 2025; originally announced March 2025.

arXiv:2503.14017 [pdf]

Flexible manipulation of chiral spin state by chemical bond in Mn triangular lattice magnet

Authors: Jiyuan Xu, Xin Liu, Li Ma, Guoke Li, Dewei Zhao, Congmian Zhen, Denglu Hou

Abstract: This study investigates the influence of chemical bonds on the magnetic structure of materials, a less explored area compared to their effect on crystal stability. By analyzing the strength and directionality of chemical bonds using the electron localization function (ELF) and charge density difference (CDD) methods, we examine their impact on magnetic exchange interactions and magnetocrystalline anisotropy under specific interstitial conditions in Mn4X compounds. Our findings indicate that these properties can effectively modulate the magnetic ground state. This work not only elucidates the varied magnetism observed in Mn triangular lattice magnets but also proposes an approach for engineering chiral spin states through chemical bonding manipulation.

Submitted 18 March, 2025; originally announced March 2025.

arXiv:2503.12311 [pdf]

Subwavelength plasmonic antennas based on asymmetric split-ring-resonators for high near-field enhancements

Authors: Yue You, Xiao-Jing Du, Lin Ma, Hua Qiu, Jun He, Zhong-Jian Yang

Abstract: As for plasmonic antenna structures that generate localized near-field enhancement, the most effective current implementations are based on electric dipole resonance modes, but this approach also imposes limitations on their further optimization. Here we introduce an ASRR structure whose ASR mode enables differential charge distribution across both sides of the split. Through asymmetric regulation, charges at one end can become highly localized, thereby achieving efficient near-field enhancement. The formation of this structure was initially driven by a hybrid computational framework integrating evolutionary optimization with residual neural networks, and subsequently simplified into an ASRR prototype using the Occam's Razor principle. The ASRR dimer structure can achieve an electric field intensity enhancement over 6.5 times larger than a traditional nanorod dimer, while maintaining a compact size (<1/3 the working wavelength). The ASRR configuration also demonstrates superior Purcell factor and fluorescence enhancement. These results can find applications in surface-enhanced spectroscopy, nonlinear optics, and quantum light-matter interactions.

Submitted 15 March, 2025; originally announced March 2025.

arXiv:2503.12035 [pdf, other]

MOS: Modeling Object-Scene Associations in Generalized Category Discovery

Authors: Zhengyuan Peng, Jinpeng Ma, Zhimin Sun, Ran Yi, Haichuan Song, Xin Tan, Lizhuang Ma

Abstract: Generalized Category Discovery (GCD) is a classification task that aims to classify both base and novel classes in unlabeled images, using knowledge from a labeled dataset. In GCD, previous research overlooks scene information or treats it as noise, reducing its impact during model training. However, in this paper, we argue that scene information should be viewed as a strong prior for inferring novel classes. We attribute the misinterpretation of scene information to a key factor: the Ambiguity Challenge inherent in GCD. Specifically, novel objects in base scenes might be wrongly classified into base categories, while base objects in novel scenes might be mistakenly recognized as novel categories. Once the ambiguity challenge is addressed, scene information can reach its full potential, significantly enhancing the performance of GCD models. To more effectively leverage scene information, we propose the Modeling Object-Scene Associations (MOS) framework, which utilizes a simple MLP-based scene-awareness module to enhance GCD performance. It achieves an exceptional average accuracy improvement of 4% on the challenging fine-grained datasets compared to state-of-the-art methods, emphasizing its superior performance in fine-grained GCD. The code is publicly available at https://github.com/JethroPeng/MOS

Submitted 17 March, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

Comments: Accepted to CVPR 2025. The code is available at https://github.com/JethroPeng/MOS

arXiv:2503.11399 [pdf]

Nonreciprocal quantum photon-pair source with chiral ferroelectric nematics

Authors: Jin-Tao Pan, Yun-Kun Wu, Ling-Ling Ma, Ning Wang, Xin-Yu Tao, Bo-Han Zhu, Shu Wang, Fang-Wen Sun, Guang-Can Guo, Hui Jing, Xi-Feng Ren, Yan-Qing Lu

Abstract: Quantum nonreciprocity, a fundamental phenomenon enabling directional control of quantum states and photon correlations, has long been recognized as pivotal for quantum technologies. However, the experimental realization of nonreciprocal quantum photon-pair generation, a critical prerequisite for advancing quantum systems, remains an outstanding challenge. Here, we experimentally implement a highly-efficient nonreciprocal quantum photon source in a micro/nano-scale helical structured nonlinear optical fluid. Intriguing helical quasi-phase matching is achieved by deliberately engineering the pitch of the chiral ferroelectric structure, thus enabling spontaneous parametric down-conversion with record-high brightness (5,801.6 Hz mW$^{-1}$, 10,071% enhancement over phase-mismatched systems) and high coincidence-to-accidental ratio, rivaling state-of-the-art centimeter-scale nonlinear crystals. In particular, by tailoring the ferroelectric helix structure with orthogonally aligned head and tail polarization vectors, we demonstrate up to 22.6 dB isolation in biphoton generation coupled with nonreciprocal quantum polarization states, while maintaining classical optical reciprocity. This quantum liquid-crystal-based platform, combining flexible tunability and superior performance of purely quantum nonreciprocity at micro/nano scales, builds a bridge between a wide range of soft-matter systems, nonreciprocal physics, and emerging quantum photonic technologies.

Submitted 14 March, 2025; originally announced March 2025.

Comments: 13 pages, 4 figures

arXiv:2503.11383 [pdf, other]

Study of $φ\to K\bar{K}$ and $K_{S}^{0}-K_{L}^{0}$ asymmetry in the amplitude analysis of $D_{s}^{+} \to K_{S}^{0}K_{L}^{0}π^{+}$ decay

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (701 additional authors not shown)

Abstract: Using $e^+e^-$ annihilation data corresponding to a total integrated luminosity of 7.33 $\rm fb^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we provide the first amplitude analysis and absolute branching fraction measurement of the hadronic decay $D_{s}^{+} \to K_{S}^{0}K_{L}^{0}π^{+}$. The branching fraction of $D_{s}^{+} \to K_{S}^{0}K_{L}^{0}π^{+}$ is determined to be $(1.86\pm0.06_{\rm stat}\pm0.03_{\rm syst})\%$. Combining the $\mathcal{B}(D_{s}^{+} \to φ(\to K_{S}^0K_{L}^0) π^+)$ obtained in this work and the world average of $\mathcal{B}(D_{s}^{+} \to φ(\to K^+K^-) π^+)$, we measure the relative branching fraction $\mathcal{B}(φ\to K_S^0K_L^0)/\mathcal{B}(φ\to K^+K^-)$=($0.597 \pm 0.023_{\rm stat} \pm 0.018_{\rm syst} \pm 0.016_{\rm PDG}$), which deviates from the PDG value by more than 3$σ$. Furthermore, the asymmetry of the branching fractions of $D^+_s\to K_{S}^0K^{*}(892)^{+}$ and $D^+_s\to K_{L}^0K^{*}(892)^{+}$, $\frac{\mathcal{B}(D_{s}^{+} \to K_{S}^0K^{*}(892)^{+})-\mathcal{B}(D_{s}^{+} \to K_{L}^0K^{*}(892)^{+})}{\mathcal{B}(D_{s}^{+} \to K_{S}^0K^{*}(892)^{+})+\mathcal{B}(D_{s}^{+} \to K_{L}^0K^{*}(892)^{+})}$, is determined to be $(-13.4\pm5.0_{\rm stat}\pm3.4_{\rm syst})\%$.

Submitted 23 March, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

Comments: 11 pages, 4 figures

arXiv:2503.11067 [pdf, other]

Variational Bayesian Personalized Ranking

Authors: Bin Liu, Xiaohong Liu, Qin Luo, Ziqiao Shang, Jielei Chu, Lin Ma, Zhaoyu Li, Fei Teng, Guangtao Zhai, Tianrui Li

Abstract: Recommendation systems have found extensive applications across diverse domains. However, the training data available typically comprises implicit feedback, manifested as user clicks and purchase behaviors, rather than explicit declarations of user preferences. This type of training data presents three main challenges for accurate ranking prediction: First, the unobservable nature of user preferences makes likelihood function modeling inherently difficult. Second, the resulting false positives (FP) and false negatives (FN) introduce noise into the learning process, disrupting parameter learning. Third, data bias arises as observed interactions tend to concentrate on a few popular items, exacerbating the feedback loop of popularity bias. To address these issues, we propose Variational BPR, a novel and easily implementable learning objective that integrates key components for enhancing collaborative filtering: likelihood optimization, noise reduction, and popularity debiasing. Our approach involves decomposing the pairwise loss under the ELBO-KL framework and deriving its variational lower bound to establish a manageable learning objective for approximate inference. Within this bound, we introduce an attention-based latent interest prototype contrastive mechanism, replacing instance-level contrastive learning, to effectively reduce noise from problematic samples. The process of deriving interest prototypes implicitly incorporates a flexible hard sample mining strategy, capable of simultaneously identifying hard positive and hard negative samples. Furthermore, we demonstrate that this hard sample mining strategy promotes feature distribution uniformity, thereby alleviating popularity bias. Empirically, we demonstrate the effectiveness of Variational BPR on popular backbone recommendation models. The code and data are available at: https://github.com/liubin06/VariationalBPR

Submitted 14 March, 2025; originally announced March 2025.

Comments: 15 pages

arXiv:2503.11015 [pdf, other]

Search for a $1^{-+}$ molecular state via $e^{+}e^{-} \to γD^{+}_{s} D_{s1}^{-}(2536) +c.c.$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (649 additional authors not shown)

Abstract: We search, for the first time, for an exotic molecular state with quantum numbers $J^{PC}=1^{-+}$, called $X$, via the process $e^{+}e^{-} \to γD^{+}_{s} D_{s1}^{-}(2536) +c.c.$ using data samples corresponding to a luminosity of $5.8~\mathrm{fb^{-1}}$ across center-of-mass energies from 4.612 to 4.951~GeV, collected with the BESIII detector operating at the BEPCII collider. No statistically significant signal is observed. The upper limits on the product of cross-section and branching fraction $σ({e^{+}e^{-} \to γX}) \times \mathcal{B}(X \to D^{+}_{s} D_{s1}^{-}(2536) +c.c.)$ at 90\% confidence level are reported for each energy point, assuming the $X$ mass to be 4.503~GeV/$c^{2}$ and the width 25, 50, 75, and 100~MeV, respectively.

Submitted 13 March, 2025; originally announced March 2025.

Comments: 13 pages, 5 figures

arXiv:2503.10763 [pdf, other]

Symmetry classification correspondence between quadratic Lindbladians and their steady states

Authors: Liang Mao, Fan Yang

Abstract: Symmetry classification is crucial in understanding universal properties of quantum matter. Recently, the scope of symmetry classification has been extended to open quantum systems governed by the Lindblad master equation. However, the classification of Lindbladians and steady states remains largely separate, because the former requires the non-Hermitian classification framework, while the latter relies on the classification scheme for Hermitian matrices. In this paper we build connections between the symmetry classes of a quadratic Lindbladian and its steady state, despite their different classification frameworks. We classify the full matrix representation of generic quadratic Lindbladians with particle conservation, showing they fall into 27 non-Hermitian symmetry classes. Among these, 22 classes lead to an infinite-temperature steady state. The remaining five classes have one-to-one correspondence with five steady-state Hermitian symmetry classes. Numerical simulations of random Lindbladian dynamics confirm the convergence to the correct steady-state symmetry classes at long times.

Submitted 13 March, 2025; originally announced March 2025.

Comments: 8 pages, 2 figures

arXiv:2503.10497 [pdf, other]

MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation

Authors: Weihao Xuan, Rui Yang, Heli Qi, Qingcheng Zeng, Yunze Xiao, Aosong Feng, Dairui Liu, Yun Xing, Junjue Wang, Fan Gao, Jinghui Lu, Yuang Jiang, Huitao Li, Xin Li, Kunyu Yu, Ruihai Dong, Shangding Gu, Yuekang Li, Xiaofei Xie, Felix Juefei-Xu, Foutse Khomh, Osamu Yoshie, Qingyu Chen, Douglas Teodoro, Nan Liu , et al. (7 additional authors not shown)

Abstract: Existing large language model (LLM) evaluation benchmarks primarily focus on English, while current multilingual tasks lack parallel questions that specifically assess cross-linguistic reasoning abilities. This dual limitation makes it challenging to comprehensively assess LLMs' performance in the multilingual setting. To fill this gap, we introduce MMLU-ProX, a comprehensive benchmark covering 29 languages, built on an English benchmark. Each language version consists of 11,829 identical questions, enabling direct cross-linguistic comparisons. Additionally, to meet efficient evaluation needs, we provide a lite version containing 658 questions per language. To ensure the high quality of MMLU-ProX, we employ a rigorous development process that involves multiple powerful LLMs for translation, followed by expert review to ensure accurate expression, consistent terminology, and cultural relevance. Building on this, we systematically evaluate 36 state-of-the-art LLMs, including reasoning-enhanced and multilingual-optimized LLMs. The results reveal significant disparities in the multilingual capabilities of LLMs: While they perform well in high-resource languages, their performance declines markedly in low-resource languages, with gaps of up to 24.3%. Through MMLU-ProX, we aim to advance the development of more inclusive AI systems and promote equitable access to technology across global contexts.

Submitted 26 May, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

arXiv:2503.09278 [pdf, other]

Graph-Dynamics correspondence in metallic glass-forming liquids

Authors: Xin-Jia Zhou, Feng Yang, Xiao-Dong Yang, Lin Ma, Zhen-Wei Wu

Abstract: Theoretical challenges in understanding the nature of glass and the glass transition remain significant open questions in statistical and condensed matter physics. As a prototypical example of complex physical systems, glasses and the vitrification process have been central research topics, consistently attracting broad scientific interest. This focus has driven extensive studies on phenomena such as aging, non-exponential relaxation, dynamic anomalies, glass-forming ability, and the mechanical response of glasses under stress. Recent advances in computational and experimental techniques have enabled rigorous testing of theoretical models, shedding new light on glassy behavior. However, the intrinsic complexity of glass and the glass transition, whose physics spans multiple length and time scales, makes the system challenging to characterize. In this review, we emphasize the need to move beyond conventional approaches and propose a topological perspective as a promising alternative to address these challenges. Specifically, our findings reveal that the diversity in particle relaxation behavior is statistically linked to a global topological feature of the transient network structures formed by the particles in a given liquid. This direction offers opportunities to uncover novel phenomena that could fundamentally reshape our understanding of glassy materials.

Submitted 12 March, 2025; originally announced March 2025.

Comments: 16 pages, 15 figures, Topical Review, Commun. Theor. Phys. accepted

arXiv:2503.09263 [pdf, other]

COLA: A Scalable Multi-Agent Framework For Windows UI Task Automation

Authors: Di Zhao, Longhui Ma, Siwei Wang, Miao Wang, Zhao Lv

Abstract: With the rapid advancements in Large Language Models (LLMs), an increasing number of studies have leveraged LLMs as the cognitive core of agents to address complex task decision-making challenges. Specifically, recent research has demonstrated the potential of LLM-based agents in automating Windows GUI operations. However, existing methodologies exhibit two critical challenges: (1) static agent architectures fail to dynamically adapt to the heterogeneous requirements of OS-level tasks, leading to inadequate scenario generalization; (2) the agent workflows lack fault-tolerance mechanisms, necessitating complete process re-execution after a UI agent decision error. To address these limitations, we introduce COLA, a collaborative multi-agent framework for automating Windows UI operations. In this framework, a scenario-aware agent, the Task Scheduler, decomposes task requirements into atomic capability units, dynamically selects the optimal agent from a decision agent pool, and effectively responds to the capability requirements of diverse scenarios. The decision agent pool supports plug-and-play expansion for enhanced flexibility. In addition, we equip all agents with a memory unit for their self-evolution. Furthermore, we develop an interactive backtracking mechanism that enables humans to intervene and trigger state rollbacks for non-destructive process repair.
Our experimental results on the GAIA benchmark demonstrate that the COLA framework achieves state-of-the-art performance with an average score of 31.89%, significantly outperforming baseline approaches without web API integration. Ablation studies further validate the individual contributions of our dynamic scheduling. The code is available at https://github.com/Alokia/COLA-demo.

Submitted 12 March, 2025; originally announced March 2025.

Showing 151–200 of 3,096 results for author: Ma, L