Search | arXiv e-print repository

arXiv:2507.05094 [pdf, ps, other]

Observation of the decays $B^{+} \to Σ_{c}(2455)^{++} \overline{Ξ}_{c}^{-}$ and $B^{0} \to Σ_{c}(2455)^{0} \overline{Ξ}_{c}^{0}$

Authors: Belle, Belle II Collaborations: M. Abumusabh, I. Adachi, L. Aggarwal, H. Ahmed, Y. Ahn, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, R. Ayad, V. Babu, H. Bae, N. K. Baghel, S. Bahinipati, et al. (364 additional authors not shown)

Abstract: We report the first observation of the two-body baryonic decays $B^{+} \to Σ_{c}(2455)^{++} \overline{Ξ}_{c}^{-}$ and $B^{0} \to Σ_{c}(2455)^{0} \overline{Ξ}_{c}^{0}$ with significances of $7.3\,σ$ and $6.2\,σ$, respectively, including statistical and systematic uncertainties. The branching fractions are measured to be $\mathcal{B}(B^{+} \to Σ_{c}(2455)^{++} \overline{Ξ}_{c}^{-}) = (5.74 \pm 1.11 \pm 0.42\,{}_{-1.53}^{+2.47}) \times 10^{-4}$ and $\mathcal{B}(B^{0} \to Σ_{c}(2455)^{0} \overline{Ξ}_{c}^{0}) = (4.83 \pm 1.12 \pm 0.37\,{}_{-0.60}^{+0.72}) \times 10^{-4}$. The first and second uncertainties are statistical and systematic, respectively, while the third ones arise from the absolute branching fractions of $\overline{Ξ}_{c}^{-}$ or $\overline{Ξ}_{c}^{0}$ decays. The data samples used for this analysis have integrated luminosities of 711 $\mathrm{fb}^{-1}$ and 365 $\mathrm{fb}^{-1}$, and were collected at the $Υ(4S)$ resonance by the Belle and Belle II detectors operating at the KEKB and SuperKEKB asymmetric-energy $e^{+}e^{-}$ colliders, respectively.

Submitted 7 July, 2025; originally announced July 2025.

Report number: Belle II Preprint 2025-019, KEK Preprint 2025-18

arXiv:2507.05050 [pdf, ps, other]

Measurement of the $ D^{0}\rightarrow K^{-}π^{+}e^{+}e^{-} $ branching fraction and search for $ D^{0}\rightarrow π^{+}π^{-}e^{+}e^{-} $ and $D^{0}\rightarrow K^{+}K^{-}e^{+}e^{-} $ decays at Belle

Authors: Belle, Belle II Collaborations: I. Adachi, L. Aggarwal, H. Ahmed, Y. Ahn, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, M. Angelsmark, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, et al. (458 additional authors not shown)

Abstract: We present a study of the rare charm meson decays $ D^{0}\rightarrow K^{+}K^{-}e^{+}e^{-} $, $ π^{+}π^{-}e^{+}e^{-} $, and $ K^{-}π^{+}e^{+}e^{-} $ using a 942 fb$^{-1}$ data set collected by the Belle detector at the KEKB asymmetric-energy $ e^{+}e^{-} $ collider. We use $ D^{0} $ candidates identified by the charge of the pion in $ D^{*} \rightarrow D^{0} π$ decays and normalize the branching fractions to $ D^{0} \rightarrow K^{-}π^{+}π^{-}π^{+} $ decays. The branching fraction for decay $ D^{0} \rightarrow K^{-}π^{+}e^{+}e^{-} $ is measured to be (39.6 $\pm$ 4.5 (stat) $\pm$ 2.9 (syst)) $\times$ $10^{-7}$, with the dielectron mass in the $ ρ/ω$ mass region $ 675 < m_{ee} < 875 $ MeV$/c^{2}$. We also search for $ D^{0}\rightarrow h^{-} h^{(\prime)+}e^{+}e^{-} $ ($ h^{(\prime)}=K,\,π$) decays with the dielectron mass near the $η$ and $φ$ resonances, and away from these resonances for the $ K^{+}K^{-}e^{+}e^{-} $ and $ π^{+}π^{-}e^{+}e^{-} $ modes. For these modes, we find no significant signals and set 90$\%$ confidence level upper limits on their branching fractions at the $\mathcal{O}$(10$^{-7}$) level.

Submitted 7 July, 2025; originally announced July 2025.

Report number: Belle II Preprint 2025-020; KEK Preprint 2025-19

arXiv:2507.04896 [pdf, ps, other]

Cross sections of $η$ mesons in $p$$+$$p$ collisions at forward rapidity at $\sqrt{s}=500$ GeV and central rapidity at $\sqrt{s}=510$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, H. Al-Ta'ani, J. Alexander, M. Alfred, D. Anderson, K. R. Andrews, A. Angerami, S. Antsupov, K. Aoki, N. Apadula, E. Appelt, Y. Aramaki, R. Armendariz, H. Asano, E. C. Aschenauer, E. T. Atomssa, T. C. Awes, B. Azmoun, et al. (476 additional authors not shown)

Abstract: We present the first measurements of the forward and midrapidity $η$-meson cross sections from $p$$+$$p$ collisions at $\sqrt{s}=500$ and $510$ GeV, respectively. We also report the midrapidity $η/π^0$ ratio at 510 GeV. The forward cross section is measured differentially in $η$-meson transverse momentum ($p_T$) from 1.0 to 6.5 GeV/$c$ for pseudorapidity $3.0<|η|<3.8$. The midrapidity cross section is measured from 3.5 to 44 GeV/$c$ for pseudorapidity $|η|<0.35$. Both cross sections serve as critical inputs to an updated global analysis of the $η$-meson fragmentation functions.

Submitted 7 July, 2025; originally announced July 2025.

Comments: 500 authors from 81 institutions, 14 pages, 7 figures, 3 tables. v1 is version submitted to Physical Review D. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2507.04630 [pdf, ps, other]

Learn 3D VQA Better with Active Selection and Reannotation

Authors: Shengli Zhou, Yang Liu, Feng Zheng

Abstract: 3D Visual Question Answering (3D VQA) is crucial for enabling models to perceive the physical world and perform spatial reasoning. In 3D VQA, the free-form nature of answers often leads to improper annotations that can confuse or mislead models when training on the entire dataset. While other text generation tasks can mitigate this issue by learning on large-scale datasets, the scarcity of 3D scene data enlarges the negative effect of misleading annotations. Although active learning strategies can select valuable instances for training, they fail to identify and resolve misleading labels, which the oracle inevitably provides in practice. To address this issue, we propose a multi-turn interactive active learning strategy. This strategy selects data based on models' semantic uncertainty to form a solid knowledge foundation more effectively and actively requests reannotation from an oracle to resolve potentially misleading labels. For uncertainty assessment, we utilize a variance-based metric that takes semantic relationships between terms into consideration, thus avoiding the uniform inter-class similarity assumption of previous assessment metrics. Extensive experiments exhibit better model performance and a substantial reduction in training costs, with a halving of training costs for achieving relatively high accuracy. The code is available at https://github.com/fz-zsl/AQuA.

Submitted 6 July, 2025; originally announced July 2025.

Comments: Accepted by ACM MM 2025

arXiv:2507.04463 [pdf, ps, other]

Low-mass vector-meson production at forward rapidity in $p$$+$$p$ and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, M. Alfred, D. Anderson, V. Andrieux, S. Antsupov, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, M. Bai, N. S. Bandara, B. Bannier, E. Bannikov, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, S. Beckman, R. Belmont, et al. (331 additional authors not shown)

Abstract: The PHENIX experiment at the Relativistic Heavy Ion Collider has measured low-mass vector-meson ($ω+ρ$ and $φ$) production through the dimuon decay channel at forward rapidity $(1.2<|\mbox{y}|<2.2)$ in $p$$+$$p$ and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. The low-mass vector-meson yield and nuclear-modification factor were measured as a function of the average number of participating nucleons, $\langle N_{\rm part}\rangle$, and the transverse momentum $p_T$. These results were compared with those obtained via the kaon decay channel in a similar $p_T$ range at midrapidity. The nuclear-modification factors in both rapidity regions are consistent within the uncertainties. A comparison of the $ω+ρ$ and $J/ψ$ mesons reveals that the light and heavy flavors are consistently suppressed across both $p_T$ and ${\langle}N_{\rm part}\rangle$. In contrast, the $φ$ meson displays a nuclear-modification factor consistent with unity, suggesting strangeness enhancement in the medium formed.

Submitted 6 July, 2025; originally announced July 2025.

Comments: 356 authors from 71 institutions, 14 pages, 14 figures, 1 table. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2507.04289 [pdf, ps, other]

M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding

Authors: Shenxi Liu, Kan Li, Mingyang Zhao, Yuhang Tian, Bin Li, Shoujun Zhou, Hongliang Li, Fuxia Yang

Abstract: With the rapid progress of artificial intelligence (AI) in multi-modal understanding, there is increasing potential for video comprehension technologies to support professional domains such as medical education. However, existing benchmarks suffer from two primary limitations: (1) Linguistic Singularity: they are largely confined to English, neglecting the need for multilingual resources; and (2) Shallow Reasoning: their questions are often designed for surface-level information retrieval, failing to properly assess deep multi-modal integration. To address these limitations, we present M3-Med, the first benchmark for Multi-lingual, Multi-modal, and Multi-hop reasoning in Medical instructional video understanding. M3-Med consists of medical questions paired with corresponding video segments, annotated by a team of medical experts. A key innovation of M3-Med is its multi-hop reasoning task, which requires a model to first locate a key entity in the text, then find corresponding visual evidence in the video, and finally synthesize information across both modalities to derive the answer. This design moves beyond simple text matching and poses a substantial challenge to a model's deep cross-modal understanding capabilities. We define two tasks: Temporal Answer Grounding in Single Video (TAGSV) and Temporal Answer Grounding in Video Corpus (TAGVC). We evaluated several state-of-the-art models and Large Language Models (LLMs) on M3-Med. The results reveal a significant performance gap between all models and human experts, especially on the complex multi-hop questions where model performance drops sharply. M3-Med effectively highlights the current limitations of AI models in deep cross-modal reasoning within specialized domains and provides a new direction for future research.

Submitted 6 July, 2025; originally announced July 2025.

Comments: 19 pages, 8 figures, 7 tables

arXiv:2507.03604 [pdf, ps, other]

Entanglement Purification by Integrated Silicon Photonics

Authors: Yonghe Yu, Siyan Zhou, Mujtaba Zahidy, Caterina Vigliar, Karsten Rottwitt, Leif K. Oxenlowe, Yunhong Ding

Abstract: We demonstrate the first on-chip deterministic entanglement purification based on silicon photonics. To evaluate the purification performance, we simulate the bit-flip and phase-flip errors by reconfigurable circuits on chip. The state fidelity improves from 0.71 to 0.82 under a 20% bit-flip error rate.

Submitted 4 July, 2025; originally announced July 2025.

arXiv:2507.03304 [pdf, ps, other]

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

Authors: Hai Huang, Yan Xia, Sashuai Zhou, Hanting Wang, Shulei Wang, Zhou Zhao

Abstract: Domain Generalization (DG) aims to enhance model robustness in unseen or distributionally shifted target domains through training exclusively on source domains. Although existing DG techniques, such as data manipulation, learning strategies, and representation learning, have shown significant progress, they predominantly address single-modal data. With the emergence of numerous multi-modal datasets and increasing demand for multi-modal tasks, a key challenge in Multi-modal Domain Generalization (MMDG) has emerged: enabling models trained on multi-modal sources to generalize to unseen target distributions within the same modality set. Due to the inherent differences between modalities, directly transferring methods from single-modal DG to MMDG typically yields sub-optimal results. These methods often exhibit randomness during generalization due to the invisibility of target domains and fail to consider inter-modal consistency. Applying these methods independently to each modality in the MMDG setting before combining them can lead to divergent generalization directions across different modalities, resulting in degraded generalization capabilities. To address these challenges, we propose a novel approach that leverages Unified Representations to map different paired modalities together, effectively adapting DG methods to MMDG by enabling synchronized multi-modal improvements within the unified space. Additionally, we introduce a supervised disentanglement framework that separates modal-general and modal-specific information, further enhancing the alignment of unified representations. Extensive experiments on benchmark datasets, including EPIC-Kitchens and Human-Animal-Cartoon, demonstrate the effectiveness and superiority of our method in enhancing multi-modal domain generalization.

Submitted 4 July, 2025; originally announced July 2025.

Comments: Accepted by ICCV 2025

arXiv:2507.02665 [pdf, ps, other]

Do Research Software Engineers and Software Engineering Researchers Speak the Same Language?

Authors: Timo Kehrer, Robert Haines, Guido Juckeland, Shurui Zhou, David E. Bernholdt

Abstract: Anecdotal evidence suggests that Research Software Engineers (RSEs) and Software Engineering Researchers (SERs) often use different terminologies for similar concepts, creating communication challenges. To better understand these divergences, we have started investigating how SE fundamentals from the SER community are interpreted within the RSE community, identifying aligned concepts, knowledge gaps, and areas for potential adaptation. Our preliminary findings reveal opportunities for mutual learning and collaboration, and our systematic methodology for terminology mapping provides a foundation for a crowd-sourced extension and validation in the future.

Submitted 3 July, 2025; originally announced July 2025.

Comments: Early access journal version: T. Kehrer, R. Haines, G. Juckeland, S. Zhou and D. E. Bernholdt, "Do Research Software Engineers and Software Engineering Researchers Speak the Same Language?," in Computing in Science & Engineering, doi: 10.1109/MCSE.2025.3557236

arXiv:2507.01800 [pdf, ps, other]

HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision

Authors: Shengli Zhou, Jianuo Zhu, Qilin Huang, Fangjing Wang, Yanfu Zhang, Feng Zheng

Abstract: 3D Visual Question-Answering (3D VQA) is pivotal for models to perceive the physical world and perform spatial reasoning. Answer-centric supervision is a commonly used training method for 3D VQA models. Many models that utilize this strategy have achieved promising results in 3D VQA tasks. However, the answer-centric approach only supervises the final output of models and allows models to develop reasoning pathways freely. The absence of supervision on the reasoning pathway enables the potential for developing superficial shortcuts through common patterns in question-answer pairs. Moreover, although slow-thinking methods advance large language models, they suffer from underthinking. To address these issues, we propose HCNQA, a 3D VQA model leveraging a hierarchical concentration narrowing supervision method. By mimicking the human process of gradually focusing from a broad area to specific objects while searching for answers, our method guides the model to perform three phases of concentration narrowing through hierarchical supervision. By supervising key checkpoints on a general reasoning pathway, our method can ensure the development of a rational and effective reasoning pathway. Extensive experimental results demonstrate that our method can effectively ensure that the model develops a rational reasoning pathway and performs better. The code is available at https://github.com/JianuoZhu/HCNQA.

Submitted 2 July, 2025; originally announced July 2025.

Comments: ICANN 2025

arXiv:2507.01392 [pdf]

Two-Dimensional Superconductivity at the CaZrO3/KTaO3 (001) Heterointerfaces

Authors: Lu Chen, Siyi Zhou, Daming Tian, Yinan Xiao, Qixuan Gao, Yongchao Wang, Yuansha Chen, Fengxia Hu, Baogen Shen, Jirong Sun, Weisheng Zhao, Jinsong Zhang, Hui Zhang

Abstract: We investigated the superconducting transport properties of two-dimensional electron gases (2DEGs) at (001)-oriented CaZrO3/KTaO3 (CZO/KTO) heterointerfaces. Our results unambiguously demonstrate the emergence of two-dimensional superconductivity, with a superconducting transition temperature TC up to ~0.25 K. The two-dimensional nature of the superconducting state is corroborated by the Berezinskii-Kosterlitz-Thouless (BKT) transition and pronounced anisotropy of the upper critical field. The estimated superconducting layer thickness and coherence length are 10.1 nm and 146.4 nm, respectively, for the sample with nS=7.7*10^13 cm^-2. Furthermore, we demonstrate that the two-dimensional superconductivity at the CZO/KTO(001) interface can be effectively tuned by applying a back gate voltage. These findings conclusively establish two-dimensional superconductivity at the CZO/KTO(001) interfaces, providing a new platform for exploring emergent superconductivity in complex oxide heterostructures.

Submitted 2 July, 2025; originally announced July 2025.

Comments: 7 pages, 4 figures

arXiv:2507.01249 [pdf, ps, other]

Search for an Axion-Like Particle in $B\rightarrow K^{(*)} a (\rightarrow γγ)$ Decays at Belle

Authors: Belle, Belle II Collaborations: I. Adachi, L. Aggarwal, H. Ahmed, Y. Ahn, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, M. Angelsmark, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, et al. (400 additional authors not shown)

Abstract: We report a search for an axion-like particle $a$ in $B\rightarrow K^{(*)} a (\rightarrow γγ)$ decays using data collected with the Belle detector at the KEKB asymmetric energy electron-positron collider. The search is based on a $711\ \mathrm{fb^{-1}}$ data sample collected at the $Υ(4S)$ resonance energy, corresponding to a sample of $772\times10^6$ $Υ(4S)$ events. In this study, we search for the decay of the axion-like particle into a pair of photons, $a \rightarrow γγ$. We scan the two-photon invariant mass in the range $0.16\ \mathrm{GeV/}c^2-4.50\ \mathrm{GeV}/c^2$ for the $K$ modes and $0.16\ \mathrm{GeV/}c^2-4.20\ \mathrm{GeV}/c^2$ for the $K^{*}$ modes. No significant signal is observed in any of the modes, and 90\% confidence level upper limits are established on the coupling to the $W$ boson, $g_{aW}$, as a function of $a$ mass. The limits range from $3 \times 10^{-6}\ \mathrm{GeV}^{-1}$ to $3 \times 10^{-5}\ \mathrm{GeV}^{-1}$, improving the current constraints on $g_{aW}$ by a factor of two over the most stringent previous experimental results.

Submitted 3 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

Comments: 26 pages, 15 Figures

Report number: Belle II Preprint: 2025-017 KEK Preprint: 2025-16

arXiv:2507.00837 [pdf, ps, other]

Spontaneous emergence of altermagnetism in the single-orbital extended Hubbard model

Authors: Jin-Wei Dong, Yu-Han Lin, Ruiqing Fu, Xianxin Wu, Gang Su, Ziqiang Wang, Sen Zhou

Abstract: Altermagnetism (AM), the recently discovered third class of collinear magnetic order, is characterized by non-relativistic momentum-dependent spin-split electronic structure with compensated zero net magnetization. It can arise from the conventional antiferromagnetism by introducing local anisotropy on the two opposite-spin sublattices, either through structural changes in local crystallographic symmetry or spontaneous emergence of local staggered orbital order from electron correlations in multi-orbital systems. Here, we demonstrate on the two-dimensional square lattice that a $d$-wave AM can emerge spontaneously in the single-orbital extended Hubbard model, without invoking the spin-orbital coupling and multi-orbital physics. We carry out mean-field studies on the concrete single-orbital $t$-$U$-$V$ model with $U$ and $V$ the onsite and nearest-neighbor Coulomb interactions, obtaining the ground states, analyzing their properties, and determining the phase diagram in the $U$-$V$ plane. The $d$-wave AM with novel spin-transport behavior is found to be stabilized in a wide region of the phase diagram when the system is doped away from half-filling, actualized by the coexistence of onsite antiferromagnetic order and complex $d$-wave nearest-neighbor spin bond orders. Our findings provide an alternative route to achieve AM and substantially expand the range of candidate AM materials.

Submitted 1 July, 2025; originally announced July 2025.

Comments: 7 pages, 3 figures

arXiv:2506.23301 [pdf, ps, other]

Parallax QAMA: Novel Downlink Multiple Access for MISO Systems with Simple Receivers

Authors: Jie Huang, Ming Zhao, Shengli Zhou, Ling Qiu, Jinkang Zhu

Abstract: In this paper, we propose a novel downlink multiple access system with a multi-antenna transmitter and two single-antenna receivers, inspired by the underlying principles of hierarchical quadrature amplitude modulation (H-QAM) based multiple access (QAMA) and space-division multiple access (SDMA). In the proposed scheme, coded bits from two users are split and assigned to one shared symbol and two private symbols carried by different beams. Based on joint symbol mapping of H-QAM constellations and phase-aligned precoding at the transmitter, each receiver observes a different H-QAM constellation with Gray mapping, a unique parallax feature not shared by existing schemes. In addition to avoiding successive interference cancellation (SIC), each user independently demodulates its own bits on separate I and Q branches with calculations based on closed-form expressions. Hence the receiver complexity is on par with that of orthogonal multiple access (OMA), which is much lower than that in other competing alternatives such as non-orthogonal multiple access (NOMA) and rate-splitting multiple access (RSMA). We carry out system optimization and determine the achievable rate region. Numerical results show that the proposed system has a larger rate region relative to other benchmark schemes with receivers not using SIC, and even achieves a comparable rate region to those benchmark schemes with SIC receivers.

Submitted 29 June, 2025; originally announced June 2025.

arXiv:2506.23132 [pdf, ps, other]

Dare to Plagiarize? Plagiarized Painting Recognition and Retrieval

Authors: Sophie Zhou, Shu Kong

Abstract: Art plagiarism detection plays a crucial role in protecting artists' copyrights and intellectual property, yet it remains a challenging problem in forensic analysis. In this paper, we address the task of recognizing plagiarized paintings and explaining the detected plagiarisms by retrieving visually similar authentic artworks. To support this study, we construct a dataset by collecting painting photos and synthesizing plagiarized versions using generative AI, tailored to specific artists' styles. We first establish a baseline approach using off-the-shelf features from the visual foundation model DINOv2 to retrieve the most similar images in the database and classify plagiarism based on a similarity threshold. Surprisingly, this non-learned method achieves a high recognition accuracy of 97.2\% but suffers from low retrieval precision (29.0\% average precision, AP). To improve retrieval quality, we finetune DINOv2 with a metric learning loss using positive and negative sample pairs sampled in the database. The finetuned model greatly improves retrieval performance by 12\% AP over the baseline, though it unexpectedly results in a lower recognition accuracy (92.7\%). We conclude with insightful discussions and outline directions for future research.

Submitted 29 June, 2025; originally announced June 2025.

Comments: to appear at AVSS'25

arXiv:2506.22608 [pdf, ps, other]

On Fine-Grained Distinct Element Estimation

Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Thanasis Pittas, David P. Woodruff, Samson Zhou

Abstract: We study the problem of distributed distinct element estimation, where $α$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound of $Θ\left(α\log n+\frac{α}{\varepsilon^2}\right)$ bits, these results rely on assumptions that may not hold in practice. We introduce a new parameterization based on the number $C = \frac{β}{\varepsilon^2}$ of pairwise collisions, i.e., instances where the same element appears on multiple servers, and design a protocol that uses only $\mathcal{O}\left(α\log n+\frac{\sqrt{β}}{\varepsilon^2} \log n\right)$ bits, breaking previous lower bounds when $C$ is small. We further improve our algorithm under assumptions on the number of distinct elements or collisions and provide matching lower bounds in all regimes, establishing $C$ as a tight complexity measure for the problem. Finally, we consider streaming algorithms for distinct element estimation parameterized by the number of items with frequency larger than $1$. Overall, our results offer insight into why statistical problems with known hardness results can be efficiently solved in practice.

Submitted 27 June, 2025; originally announced June 2025.

Comments: ICML 2025

arXiv:2506.22546 [pdf, ps, other]

Primal S-matrix bootstrap with dispersion relations

Authors: Claudia de Rham, Andrew J. Tolley, Zhuo-Hui Wang, Shuang-Yong Zhou

Abstract: We propose a new method for constructing the consistent space of scattering amplitudes by parameterizing the imaginary parts of partial waves and utilizing dispersion relations, crossing symmetry, and full unitarity. Using this framework, we explicitly compute bounds on the leading couplings and examine the Regge behaviors of the constructed amplitudes. The method also readily accommodates spinning bound states, which we use to constrain glueball couplings. By incorporating dispersion relations, our approach inherently satisfies the Froissart-Martin/Jin-Martin bounds or softer high-energy behaviors by construction. This, in turn, allows us to formulate a new class of fractionally subtracted dispersion relations, through which we investigate the sensitivity of coupling bounds to the asymptotic growth rate.

Submitted 27 June, 2025; originally announced June 2025.

Comments: 41 pages, 14 figures

Report number: Imperial/TP/2025/cdr/3, USTC-ICTS/PCFT-25-24

arXiv:2506.21967 [pdf, ps, other]

More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents

Authors: Weimin Xiong, Ke Wang, Yifan Song, Hanchao Liu, Sai Zhou, Wei Peng, Sujian Li

Abstract: Current evaluations of tool-integrated LLM agents typically focus on end-to-end tool-usage evaluation while neglecting their stability. This limits their real-world applicability, as various internal or external factors can cause agents to crash or behave abnormally. Our research addresses this by investigating whether agents are vulnerable to errors throughout the entire tool invocation process, including reading tool documentation, selecting tools and generating parameters, and processing the tool's response. Through extensive experiments, we observe that agents are highly susceptible to errors at each stage and agents based on open-source models are more vulnerable than those based on proprietary models. We also find that increasing the model size does not significantly improve tool invocation reasoning and may make agents more vulnerable to attacks resembling normal user instructions. This highlights the importance of evaluating agent stability and offers valuable insights for future LLM development and evaluation.

Submitted 27 June, 2025; originally announced June 2025.

arXiv:2506.21915 [pdf]

An Effective Two-Phase Genetic Algorithm for Solving the Resource Constrained Project Scheduling Problem (RCPSP)

Authors: D. Sun, S. Zhou

Abstract: This note presents a simple and effective variation of the genetic algorithm (GA) for solving the RCPSP, denoted the 2-Phase Genetic Algorithm (2PGA). The 2PGA implements GA parent selection in two phases: Phase-1 includes the best current solutions in the parent pool, and Phase-2 excludes the best current solutions from the parent pool. The 2PGA carries out the GA evolution by alternating the two phases iteratively. In exploring a solution space, Phase-1 emphasizes intensification in the current neighborhood, while Phase-2 emphasizes diversification to escape local traps. The 2PGA was tested on the standard benchmark problems in PSPLIB; the results show that the algorithm is effective and has improved some of the best heuristic solutions.

Submitted 27 June, 2025; originally announced June 2025.

Comments: 12 pages

MSC Class: 90-08

arXiv:2506.21619 [pdf, other]

IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech

Authors: Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, Jingchen Shu

Abstract: Large-scale text-to-speech (TTS) models are typically categorized into autoregressive and non-autoregressive systems. Although autoregressive systems exhibit certain advantages in speech naturalness, their token-by-token generation mechanism makes it difficult to precisely control the duration of synthesized speech. This is a key limitation in applications such as video dubbing that require strict audio-visual synchronization. This paper introduces IndexTTS2, which proposes a novel and autoregressive-model-friendly method for speech duration control. The method supports two generation modes: one allows explicit specification of the number of generated tokens for precise duration control; the other does not require manual input and lets the model freely generate speech while preserving prosodic characteristics from the input prompt. Furthermore, IndexTTS2 achieves disentanglement between emotional expression and speaker identity, enabling independent control of timbre and emotion. In the zero-shot setting, the model can perfectly reproduce the emotional characteristics of the input prompt. Users may also provide a separate emotion prompt, even from a different speaker, allowing the model to reconstruct the target timbre while conveying the desired emotion. To enhance clarity during strong emotional expressions, we incorporate GPT latent representations to improve speech stability. Meanwhile, to lower the barrier for emotion control, we design a soft instruction mechanism based on textual descriptions by fine-tuning Qwen3. This enables effective guidance of speech generation with desired emotional tendencies using natural language input. Experimental results demonstrate that IndexTTS2 outperforms existing state-of-the-art zero-shot TTS models in word error rate, speaker similarity, and emotional fidelity.

Submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.21409 [pdf]

Observation of Cavity-Mediated Nonlinear Landau Fan and Modified Landau Level Degeneracy in Graphene Quantum Transport

Authors: Hongxia Xue, Hsun-Chi Chan, Zuzhang Lin, Dalin Boriçi, Shaobo Zhou, Yanan Wang, Kenji Watanabe, Takashi Taniguchi, Cristiano Ciuti, Wang Yao, Dong-Keun Ki, Shuang Zhang

Abstract: Recent studies on cavity-coupled two-dimensional electron gas demonstrate that vacuum-field engineering can tailor electronic transport properties of materials. By achieving ultra-strong coupling between a terahertz resonator and mesoscopic graphene, we demonstrate that cavity vacuum fields can alter the effective degeneracies of Landau levels, resulting in a nonlinear Landau fan diagram for massless Dirac fermions while preserving quantum-Hall quantization. Specifically, by leveraging graphene's gate-tunability, we observe that quantum-Hall features, minimum longitudinal and quantized Hall conductance for a given filling factor, occur at carrier densities reduced by more than 20 percent compared to systems without cavity. Theoretical analysis attributes this effect to the virtual cavity photon mediated transitions between the non-equidistant Landau levels in graphene, significantly reducing their effective degeneracy. This study paves the way for investigating cavity quantum electrodynamics in highly tunable, atomically thin two-dimensional crystals.

Submitted 26 June, 2025; originally announced June 2025.

Comments: 11 pages, 4 figures

arXiv:2506.21001 [pdf, ps, other]

Style-Aligned Image Composition for Robust Detection of Abnormal Cells in Cytopathology

Authors: Qiuyi Qi, Xin Li, Ming Kong, Zikang Xu, Bingdi Chen, Qiang Zhu, S Kevin Zhou

Abstract: Challenges such as the lack of high-quality annotations, long-tailed data distributions, and inconsistent staining styles pose significant obstacles to training neural networks to detect abnormal cells in cytopathology robustly. This paper proposes a style-aligned image composition (SAIC) method that composes high-fidelity and style-preserved pathological images to enhance the effectiveness and robustness of detection models. Without additional training, SAIC first selects an appropriate candidate from the abnormal cell bank based on attribute guidance. Then, it employs a high-frequency feature reconstruction to achieve a style-aligned and high-fidelity composition of abnormal cells and pathological backgrounds. Finally, it introduces a large vision-language model to filter high-quality synthesis images. Experimental results demonstrate that incorporating SAIC-synthesized images effectively enhances the performance and robustness of abnormal cell detection for tail categories and styles, thereby improving overall detection performance. The comprehensive quality evaluation further confirms the generalizability and practicality of SAIC in clinical application scenarios. Our code will be released at https://github.com/Joey-Qi/SAIC.

Submitted 26 June, 2025; originally announced June 2025.

Comments: MIDL 2025 Oral

arXiv:2506.20273 [pdf, ps, other]

Adjacency spectral radius and H-factors in 1-binding graphs

Authors: Sizhong Zhou, Tao Zhang, Zhiren Sun

Abstract: Let $G$ be a graph, and let $H:V(G)\longrightarrow\{\{1\},\{0,2\}\}$ be a set-valued function. Hence, $H(v)$ equals $\{1\}$ or $\{0,2\}$ for any $v\in V(G)$. We let $$ H^{-1}(1)=\{v: v\in V(G) \ \mbox{and} \ H(v)=1\}. $$ An $H$-factor of $G$ is a spanning subgraph $F$ of $G$ such that $d_F(v)\in H(v)$ for each $v\in V(G)$. Lu and Kano showed a characterization for the existence of an $H$-factor in a graph [Characterization of 1-tough graphs using factors, Discrete Math. 343 (2020) 111901]. Let $A(G)$ and $ρ(G)$ denote the adjacency matrix and the adjacency spectral radius of $G$, respectively. By using Lu and Kano's result, we pose a sufficient condition with respect to the adjacency spectral radius to guarantee the existence of an $H$-factor in a 1-binding graph. In this paper, we prove that if a connected 1-binding graph $G$ of order $n\geq11$ satisfies $ρ(G)\geqρ(K_1\vee(K_{n-4}\cup K_2\cup K_1))$, then $G$ has an $H$-factor for each $H:V(G)\longrightarrow\{\{1\},\{0,2\}\}$ with $H^{-1}(1)$ even, unless $G=K_1\vee(K_{n-4}\cup K_2\cup K_1)$.

Submitted 25 June, 2025; originally announced June 2025.

Comments: 9 pages

MSC Class: 05C50; 05C70

arXiv:2506.19742 [pdf, ps, other]

NeRF-based CBCT Reconstruction needs Normalization and Initialization

Authors: Zhuowei Xu, Han Li, Dai Sun, Zhicheng Li, Yujia Li, Qingpeng Kong, Zhiwei Cheng, Nassir Navab, S. Kevin Zhou

Abstract: Cone Beam Computed Tomography (CBCT) is widely used in medical imaging. However, the limited number and intensity of X-ray projections make reconstruction an ill-posed problem with severe artifacts. NeRF-based methods have achieved great success in this task. However, they suffer from a local-global training mismatch between their two key components: the hash encoder and the neural network. Specifically, in each training step, only a subset of the hash encoder's parameters is used (local sparse), whereas all parameters in the neural network participate (global dense). Consequently, hash features generated in each step are highly misaligned, as they come from different subsets of the hash encoder. These misalignments from different training steps are then fed into the neural network, causing repeated inconsistent global updates in training, which leads to unstable training, slower convergence, and degraded reconstruction quality. Aiming to alleviate the impact of this local-global optimization mismatch, we introduce a Normalized Hash Encoder, which enhances feature consistency and mitigates the mismatch. Additionally, we propose a Mapping Consistency Initialization (MCI) strategy that initializes the neural network before training by leveraging the global mapping property from a well-trained model. The initialized neural network exhibits improved stability during early training, enabling faster convergence and enhanced reconstruction performance. Our method is simple yet effective, requiring only a few lines of code while substantially improving training efficiency on 128 CT cases collected from 4 different datasets, covering 7 distinct anatomical regions.

Submitted 24 June, 2025; originally announced June 2025.

arXiv:2506.19651 [pdf, ps, other]

PEVLM: Parallel Encoding for Vision-Language Models

Authors: Letian Kang, Shixian Luo, Yiqiang Li, Xiaoyang Yu, Shenxuan Zhou, Yong Wu

Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities in multimodal understanding and generation tasks. However, their application to long video understanding remains hindered by the quadratic complexity of standard attention mechanisms. In this work, we introduce PEVLM, a fine-tuning-free parallel encoding method designed to enhance the prefilling efficiency of VLMs in long video scenarios. PEVLM partitions the input video into context blocks with a shared sink block, while preserving sequential position embeddings to align the attention weight distribution with that of Full-Attention. This design reduces attention complexity from $O((T \times N)^2)$ to $O(T \times N)$ where $T$ is the number of frames and $N$ the number of tokens per frame, without sacrificing accuracy. Extensive experiments across multiple state-of-the-art models and benchmarks demonstrate that PEVLM consistently outperforms existing parallel encoding approaches, achieving up to 7.47x speedup in attention computation and reducing end-to-end latency by 40\%. Remarkably, PEVLM not only maintains high accuracy, but in some settings even surpasses Full-Attention performance. Under strict latency constraints, it achieves substantial gains, improving accuracy from 23.26\% to 61.03\%. These results underscore the effectiveness of PEVLM for low-latency, long-context video understanding, making it a promising solution for real-world applications.

Submitted 7 July, 2025; v1 submitted 24 June, 2025; originally announced June 2025.

arXiv:2506.19476 [pdf, ps, other]

Neural Collapse based Deep Supervised Federated Learning for Signal Detection in OFDM Systems

Authors: Kaidi Xu, Shenglong Zhou, Geoffrey Ye Li

Abstract: Future wireless networks are expected to be AI-empowered, making their performance highly dependent on the quality of training datasets. However, physical-layer entities often observe only partial wireless environments characterized by different power delay profiles. Federated learning is capable of addressing this limited observability, but often struggles with data heterogeneity. To tackle this challenge, we propose a neural collapse (NC) inspired deep supervised federated learning (NCDSFL) algorithm.

Submitted 24 June, 2025; originally announced June 2025.

arXiv:2506.19180 [pdf, ps, other]

Precise Measurement of the $Λ$ Electric Dipole Moment through the Entangled Strange Baryon-Antibaryon System

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (696 additional authors not shown)

Abstract: The dominance of matter over antimatter in the universe has consistently driven the pursuit of new physics beyond the Standard Model that violates charge-parity symmetry. Unlike the well-constrained electrons and neutrons, strange baryons (hyperons) remain a largely unexplored territory, in which interactions between hyperons and particles from new physics could induce a non-trivial electric dipole moment (EDM). However, direct measurements of hyperon EDMs through spin precession are highly challenging due to their short lifetimes. In this paper, we present a novel method to extract the EDM of the lightest hyperon, $Λ$, using the entangled $Λ\overline{Λ}$ system. Our result is consistent with zero, achieving a three-order-of-magnitude improvement over the previous upper limit established in the 1980s with comparable statistics, providing stringent constraints on potential new physics.

Submitted 28 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.18997 [pdf, ps, other]

From simulations to observations. Methodology and data release of mock TNG50 galaxies at 0.3 < z < 0.7 for WEAVE-StePS

Authors: A. Ikhsanova, L. Costantin, A. Pizzella, E. M. Corsini, L. Morelli, F. R. Ditrani, A. Ferré-Mateu, L. Gabarra, M. Gullieuszik, C. P. Haines, A. Iovino, M. Longhetti, A. Mercurio, R. Ragusa, P. Sánchez-Blázquez, C. Tortora, B. Vulcani, S. Zhou, E. Gafton, F. Pistis

Abstract: The new generation of optical spectrographs (i.e., WEAVE, 4MOST, DESI, and WST) offers unprecedented opportunities for statistically studying the star formation histories of galaxies. However, these observations are not easily comparable to predictions from cosmological simulations. Our goal is to build a reference framework for comparing spectroscopic observations with simulations and test tools for deriving stellar population properties of galaxies. We focus on the observational strategy of the Stellar Population at Intermediate Redshift Survey (StePS) with the WEAVE instrument. We generate mock datasets of ~750 galaxies at redshifts z = 0.3, 0.5, and 0.7 using the TNG50 simulation, perform radiative transfer with SKIRT, and analyze the spectra with pPXF as if they were real observations. We present the methodology to generate these datasets and provide an initial exploration of stellar population parameters (i.e., mass-weighted ages and metallicities) and star formation histories for three galaxies at z = 0.7 and their descendants at z = 0.5 and 0.3. We find good agreement between the mock spectra and intrinsic ages in TNG50 (average difference $0.2\pm0.3$ Gyr) and successfully recover their star formation histories, especially for galaxies that form the bulk of their stars on short timescales and at early epochs. We release these datasets, including multi-wavelength imaging and spectra, to support forthcoming WEAVE observations.

Submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.18930 [pdf, ps, other]

Reinforcement Learning-Based Dynamic Grouping for Tubular Structure Tracking

Authors: Chong Di, Shuwang Zhou, Da Chen, Jean-Marie Mirebeau, Minglei Shu, Laurent D. Cohen

Abstract: The computation of minimal paths for the applications in tracking tubular structures such as blood vessels and roads is challenged by complex morphologies and environmental variations. Existing approaches can be roughly categorized into two research lines: the point-wise based models and the segment-wise based models. Although segment-wise approaches have obtained promising results in many scenarios, they often suffer from computational inefficiency and heavily rely on a prescribed prior to fit the target elongated shapes. We propose a novel framework that casts segment-wise tracking as a Markov Decision Process (MDP), enabling a reinforcement learning approach. Our method leverages Q-Learning to dynamically explore a graph of segments, computing edge weights on-demand and adaptively expanding the search space. This strategy avoids the high cost of a pre-computed graph and proves robust to incomplete initial information. Experimental results on typical tubular structure datasets demonstrate that our method significantly outperforms state-of-the-art point-wise and segment-wise approaches. The proposed method effectively handles complex topologies and maintains global path coherence without depending on extensive prior structural knowledge.

Submitted 21 June, 2025; originally announced June 2025.

arXiv:2506.18897 [pdf, ps, other]

MinD: Unified Visual Imagination and Control via Hierarchical World Models

Authors: Xiaowei Chi, Kuangzhi Ge, Jiaming Liu, Siyuan Zhou, Peidong Jia, Zichen He, Yuzhen Liu, Tingguang Li, Lei Han, Sirui Han, Shanghang Zhang, Yike Guo

Abstract: Video generation models (VGMs) offer a promising pathway for unified world modeling in robotics by integrating simulation, prediction, and manipulation. However, their practical application remains limited due to (1) slow generation speed, which limits real-time interaction, and (2) poor consistency between imagined videos and executable actions. To address these challenges, we propose Manipulate in Dream (MinD), a hierarchical diffusion-based world model framework that employs a dual-system design for vision-language manipulation. MinD executes VGM at low frequencies to extract video prediction features, while leveraging a high-frequency diffusion policy for real-time interaction. This architecture enables low-latency, closed-loop control in manipulation with coherent visual guidance. To better coordinate the two systems, we introduce a video-action diffusion matching module (DiffMatcher), with a novel co-training strategy that uses separate schedulers for each diffusion model. Specifically, we introduce a diffusion-forcing mechanism to DiffMatcher that aligns their intermediate representations during training, helping the fast action model better understand video-based predictions. Beyond manipulation, MinD also functions as a world simulator, reliably predicting task success or failure in latent space before execution. Trustworthy analysis further shows that VGMs can preemptively evaluate task feasibility and mitigate risks. Extensive experiments across multiple benchmarks demonstrate that MinD achieves state-of-the-art manipulation (63%+) in RL-Bench, advancing the frontier of unified world modeling in robotics.

Submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.18851 [pdf, ps, other]

Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset

Authors: Zhuowei Chen, Bingchuan Li, Tianxiang Ma, Lijie Liu, Mingcong Liu, Yi Zhang, Gen Li, Xinghui Li, Siyu Zhou, Qian He, Xinglong Wu

Abstract: Subject-to-video generation has witnessed substantial progress in recent years. However, existing models still face significant challenges in faithfully following textual instructions. This limitation, commonly known as the copy-paste problem, arises from the widely used in-pair training paradigm. This approach inherently entangles subject identity with background and contextual attributes by sampling reference images from the same scene as the target video. To address this issue, we introduce Phantom-Data, the first general-purpose cross-pair subject-to-video consistency dataset, containing approximately one million identity-consistent pairs across diverse categories. Our dataset is constructed via a three-stage pipeline: (1) a general and input-aligned subject detection module, (2) large-scale cross-context subject retrieval from more than 53 million videos and 3 billion images, and (3) prior-guided identity verification to ensure visual consistency under contextual variation. Comprehensive experiments show that training with Phantom-Data significantly improves prompt alignment and visual quality while preserving identity consistency on par with in-pair baselines.

Submitted 23 June, 2025; originally announced June 2025.

Comments: Project page: https://phantom-video.github.io/Phantom-Data/

arXiv:2506.18034 [pdf, ps, other]

Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster

Authors: Fenghe Tang, Wenxin Ma, Zhiyang He, Xiaodong Tao, Zihang Jiang, S. Kevin Zhou

Abstract: With the advancement of Large Language Model (LLM) for natural language processing, this paper presents an intriguing finding: a frozen pre-trained LLM layer can process visual tokens for medical image segmentation tasks. Specifically, we propose a simple hybrid structure that integrates a pre-trained, frozen LLM layer within the CNN encoder-decoder segmentation framework (LLM4Seg). Surprisingly, this design improves segmentation performance with a minimal increase in trainable parameters across various modalities, including ultrasound, dermoscopy, polypscopy, and CT scans. Our in-depth analysis reveals the potential of transferring LLM's semantic awareness to enhance segmentation tasks, offering both improved global understanding and better local modeling capabilities. The improvement proves robust across different LLMs, validated using LLaMA and DeepSeek.

Submitted 22 June, 2025; originally announced June 2025.

Comments: Accepted by MICCAI 2025. Code: https://github.com/FengheTan9/LLM4Seg

arXiv:2506.18019 [pdf, ps, other]

Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities

Authors: Yuanchen Bei, Weizhi Zhang, Siwen Wang, Weizhi Chen, Sheng Zhou, Hao Chen, Yong Li, Jiajun Bu, Shirui Pan, Yizhou Yu, Irwin King, Fakhri Karray, Philip S. Yu

Abstract: AI agents have experienced a paradigm shift, from early dominance by reinforcement learning (RL) to the rise of agents powered by large language models (LLMs), and now further advancing towards a synergistic fusion of RL and LLM capabilities. This progression has endowed AI agents with increasingly strong abilities. Despite these advances, to accomplish complex real-world tasks, agents are require… ▽ More AI agents have experienced a paradigm shift, from early dominance by reinforcement learning (RL) to the rise of agents powered by large language models (LLMs), and now further advancing towards a synergistic fusion of RL and LLM capabilities. This progression has endowed AI agents with increasingly strong abilities. Despite these advances, to accomplish complex real-world tasks, agents are required to plan and execute effectively, maintain reliable memory, and coordinate smoothly with other agents. Achieving these capabilities involves contending with ever-present intricate information, operations, and interactions. In light of this challenge, data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms that agents can more effectively understand and process. In this context, graphs, with their natural advantage in organizing, managing, and harnessing intricate data relationships, present a powerful data paradigm for structurization to support the capabilities demanded by advanced AI agents. To this end, this survey presents a first systematic review of how graphs can empower AI agents. Specifically, we explore the integration of graph techniques with core agent functionalities, highlight notable applications, and identify prospective avenues for future research. By comprehensively surveying this burgeoning intersection, we hope to inspire the development of next-generation AI agents equipped to tackle increasingly sophisticated challenges with graphs. 
Related resources are collected and continuously updated for the community in the GitHub link.

Submitted 4 July, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

Comments: 20 pages, 7 figures

arXiv:2506.17784 [pdf, ps, other]

AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction

Authors: Song Wang, Zhen Tan, Zihan Chen, Shuang Zhou, Tianlong Chen, Jundong Li

Abstract: Recent progress in large language model (LLM)-based multi-agent collaboration highlights the power of structured communication in enabling collective intelligence. However, existing methods largely rely on static or graph-based inter-agent topologies, lacking the potential adaptability and flexibility in communication. In this work, we propose a new framework that rethinks multi-agent coordination through a sequential structure rather than a graph structure, offering a significantly larger topology space for multi-agent communication. Our method focuses on two key directions: (1) Next-Agent Prediction, which selects the most suitable agent role at each step, and (2) Next-Context Selection (NCS), which enables each agent to selectively access relevant information from any previous step. Together, these components construct task-adaptive communication pipelines that support both role flexibility and global information flow. Extensive evaluations across multiple benchmarks demonstrate that our approach achieves superior performance while substantially reducing communication overhead.

Submitted 21 June, 2025; originally announced June 2025.

arXiv:2506.16716 [pdf, ps, other]

V-CASS: Vision-context-aware Expressive Speech Synthesis for Enhancing User Understanding of Videos

Authors: Qixin Wang, Songtao Zhou, Zeyu Jin, Chenglin Guo, Shikun Sun, Xiaoyu Qin

Abstract: Automatic video commentary systems are widely used on multimedia social media platforms to extract factual information about video content. However, current systems may overlook essential para-linguistic cues, including emotion and attitude, which are critical for fully conveying the meaning of visual content. The absence of these cues can limit user understanding or, in some cases, distort the video's original intent. Expressive speech effectively conveys these cues and enhances the user's comprehension of videos. Building on these insights, this paper explores the usage of vision-context-aware expressive speech in enhancing users' understanding of videos in video commentary systems. Firstly, our formatting study indicates that semantic-only speech can lead to ambiguity, and misaligned emotions between speech and visuals may distort content interpretation. To address this, we propose a method called vision-context-aware speech synthesis (V-CASS). It analyzes para-linguistic cues from visuals using a vision-language model and leverages a knowledge-infused language model to guide the expressive speech model in generating context-aligned speech. User studies show that V-CASS enhances emotional and attitudinal resonance, as well as user audio-visual understanding and engagement, with 74.68% of participants preferring the system. Finally, we explore the potential of our method in helping blind and low-vision users navigate web videos, improving universal accessibility.