-
Observation of the decays $B^{+} \to Σ_{c}(2455)^{++} \overline{Ξ}_{c}^{-}$ and $B^{0} \to Σ_{c}(2455)^{0} \overline{Ξ}_{c}^{0}$
Authors:
Belle,
Belle II Collaborations,
M. Abumusabh,
I. Adachi,
L. Aggarwal,
H. Ahmed,
Y. Ahn,
H. Aihara,
N. Akopov,
S. Alghamdi,
M. Alhakami,
A. Aloisio,
N. Althubiti,
K. Amos,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
R. Ayad,
V. Babu,
H. Bae,
N. K. Baghel,
S. Bahinipati
, et al. (364 additional authors not shown)
Abstract:
We report the first observation of the two-body baryonic decays $B^{+} \to Σ_{c}(2455)^{++} \overline{Ξ}_{c}^{-}$ and $B^{0} \to Σ_{c}(2455)^{0} \overline{Ξ}_{c}^{0}$ with significances of $7.3\,σ$ and $6.2\,σ$, respectively, including statistical and systematic uncertainties. The branching fractions are measured to be $\mathcal{B}(B^{+} \to Σ_{c}(2455)^{++} \overline{Ξ}_{c}^{-}) = (5.74 \pm 1.11 \pm 0.42\,{}_{-1.53}^{+2.47}) \times 10^{-4}$ and $\mathcal{B}(B^{0} \to Σ_{c}(2455)^{0} \overline{Ξ}_{c}^{0}) = (4.83 \pm 1.12 \pm 0.37\,{}_{-0.60}^{+0.72}) \times 10^{-4}$. The first and second uncertainties are statistical and systematic, respectively, while the third ones arise from the absolute branching fractions of $\overline{Ξ}_{c}^{-}$ or $\overline{Ξ}_{c}^{0}$ decays. The data samples used for this analysis have integrated luminosities of 711 $\mathrm{fb}^{-1}$ and 365 $\mathrm{fb}^{-1}$, and were collected at the $Υ(4S)$ resonance by the Belle and Belle II detectors operating at the KEKB and SuperKEKB asymmetric-energy $e^{+}e^{-}$ colliders, respectively.
Submitted 7 July, 2025;
originally announced July 2025.
-
Measurement of the $ D^{0}\rightarrow K^{-}π^{+}e^{+}e^{-} $ branching fraction and search for $ D^{0}\rightarrow π^{+}π^{-}e^{+}e^{-} $ and $D^{0}\rightarrow K^{+}K^{-}e^{+}e^{-} $ decays at Belle
Authors:
Belle,
Belle II Collaborations,
I. Adachi,
L. Aggarwal,
H. Ahmed,
Y. Ahn,
H. Aihara,
N. Akopov,
S. Alghamdi,
M. Alhakami,
A. Aloisio,
N. Althubiti,
K. Amos,
M. Angelsmark,
N. Anh Ky,
C. Antonioli,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae
, et al. (458 additional authors not shown)
Abstract:
We present a study of the rare charm meson decays $ D^{0}\rightarrow K^{+}K^{-}e^{+}e^{-} $, $ π^{+}π^{-}e^{+}e^{-} $, and $ K^{-}π^{+}e^{+}e^{-} $ using a 942 fb$^{-1}$ data set collected by the Belle detector at the KEKB asymmetric-energy $ e^{+}e^{-} $ collider. We use $ D^{0} $ candidates identified by the charge of the pion in $ D^{*} \rightarrow D^{0} π$ decays and normalize the branching fractions to $ D^{0} \rightarrow K^{-}π^{+}π^{-}π^{+} $ decays. The branching fraction for the decay $ D^{0} \rightarrow K^{-}π^{+}e^{+}e^{-} $ is measured to be (39.6 $\pm$ 4.5 (stat) $\pm$ 2.9 (syst)) $\times$ $10^{-7}$, with the dielectron mass in the $ ρ/ω$ mass region $ 675 < m_{ee} < 875 $ MeV$/c^{2}$. We also search for $ D^{0}\rightarrow h^{-} h^{(\prime)+}e^{+}e^{-} $ ($ h^{(\prime)}=K,\,π$) decays with the dielectron mass near the $η$ and $φ$ resonances, and away from these resonances for the $ K^{+}K^{-}e^{+}e^{-} $ and $ π^{+}π^{-}e^{+}e^{-} $ modes. For these modes, we find no significant signals and set 90$\%$ confidence level upper limits on their branching fractions at the $\mathcal{O}$(10$^{-7}$) level.
Submitted 7 July, 2025;
originally announced July 2025.
-
Cross sections of $η$ mesons in $p$$+$$p$ collisions at forward rapidity at $\sqrt{s}=500$ GeV and central rapidity at $\sqrt{s}=510$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
H. Al-Ta'ani,
J. Alexander,
M. Alfred,
D. Anderson,
K. R. Andrews,
A. Angerami,
S. Antsupov,
K. Aoki,
N. Apadula,
E. Appelt,
Y. Aramaki,
R. Armendariz,
H. Asano,
E. C. Aschenauer,
E. T. Atomssa,
T. C. Awes,
B. Azmoun
, et al. (476 additional authors not shown)
Abstract:
We present the first measurements of the forward and midrapidity $η$-meson cross sections from $p$$+$$p$ collisions at $\sqrt{s}=500$ and $510$~GeV, respectively. We also report the midrapidity $η/π^0$ ratio at 510 GeV. The forward cross section is measured differentially in $η$-meson transverse momentum ($p_T$) from 1.0 to 6.5~GeV/$c$ for pseudorapidity $3.0<|η|<3.8$. The midrapidity cross section is measured from 3.5 to 44 GeV/$c$ for pseudorapidity $|η|<0.35$. Both cross sections serve as critical inputs to an updated global analysis of the $η$-meson fragmentation functions.
Submitted 7 July, 2025;
originally announced July 2025.
-
Learn 3D VQA Better with Active Selection and Reannotation
Authors:
Shengli Zhou,
Yang Liu,
Feng Zheng
Abstract:
3D Visual Question Answering (3D VQA) is crucial for enabling models to perceive the physical world and perform spatial reasoning. In 3D VQA, the free-form nature of answers often leads to improper annotations that can confuse or mislead models when training on the entire dataset. While other text generation tasks can mitigate this issue by learning on large-scale datasets, the scarcity of 3D scene data enlarges the negative effect of misleading annotations. Although active learning strategies can select valuable instances for training, they fail to identify and resolve misleading labels, which the oracle inevitably provides in practice. To address this issue, we propose a multi-turn interactive active learning strategy. This strategy selects data based on models' semantic uncertainty to form a solid knowledge foundation more effectively and actively requests reannotation from an oracle to resolve potentially misleading labels. For uncertainty assessment, we utilize a variance-based metric that takes semantic relationships between terms into consideration, thus avoiding the uniform inter-class similarity assumption of previous assessment metrics. Extensive experiments exhibit better model performance and a substantial reduction in training costs, with a halving of training costs for achieving relatively high accuracy. The code is available at https://github.com/fz-zsl/AQuA.
Submitted 6 July, 2025;
originally announced July 2025.
-
Low-mass vector-meson production at forward rapidity in $p$$+$$p$ and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
M. Alfred,
D. Anderson,
V. Andrieux,
S. Antsupov,
N. Apadula,
H. Asano,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
E. Bannikov,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont
, et al. (331 additional authors not shown)
Abstract:
The PHENIX experiment at the Relativistic Heavy Ion Collider has measured low-mass vector-meson ($ω+ρ$ and $φ$) production through the dimuon decay channel at forward rapidity $(1.2<|\mbox{y}|<2.2)$ in $p$$+$$p$ and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. The low-mass vector-meson yield and nuclear-modification factor were measured as a function of the average number of participating nucleons, $\langle N_{\rm part}\rangle$, and the transverse momentum $p_T$. These results were compared with those obtained via the kaon decay channel in a similar $p_T$ range at midrapidity. The nuclear-modification factors in both rapidity regions are consistent within the uncertainties. A comparison of the $ω+ρ$ and $J/ψ$ mesons reveals that the light and heavy flavors are consistently suppressed across both $p_T$ and ${\langle}N_{\rm part}\rangle$. In contrast, the $φ$ meson displays a nuclear-modification factor consistent with unity, suggesting strangeness enhancement in the medium formed.
Submitted 6 July, 2025;
originally announced July 2025.
-
M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding
Authors:
Shenxi Liu,
Kan Li,
Mingyang Zhao,
Yuhang Tian,
Bin Li,
Shoujun Zhou,
Hongliang Li,
Fuxia Yang
Abstract:
With the rapid progress of artificial intelligence (AI) in multi-modal understanding, there is increasing potential for video comprehension technologies to support professional domains such as medical education. However, existing benchmarks suffer from two primary limitations: (1) Linguistic Singularity: they are largely confined to English, neglecting the need for multilingual resources; and (2) Shallow Reasoning: their questions are often designed for surface-level information retrieval, failing to properly assess deep multi-modal integration. To address these limitations, we present M3-Med, the first benchmark for Multi-lingual, Multi-modal, and Multi-hop reasoning in Medical instructional video understanding. M3-Med consists of medical questions paired with corresponding video segments, annotated by a team of medical experts. A key innovation of M3-Med is its multi-hop reasoning task, which requires a model to first locate a key entity in the text, then find corresponding visual evidence in the video, and finally synthesize information across both modalities to derive the answer. This design moves beyond simple text matching and poses a substantial challenge to a model's deep cross-modal understanding capabilities. We define two tasks: Temporal Answer Grounding in Single Video (TAGSV) and Temporal Answer Grounding in Video Corpus (TAGVC). We evaluated several state-of-the-art models and Large Language Models (LLMs) on M3-Med. The results reveal a significant performance gap between all models and human experts, especially on the complex multi-hop questions where model performance drops sharply. M3-Med effectively highlights the current limitations of AI models in deep cross-modal reasoning within specialized domains and provides a new direction for future research.
Submitted 6 July, 2025;
originally announced July 2025.
-
Entanglement Purification by Integrated Silicon Photonics
Authors:
Yonghe Yu,
Siyan Zhou,
Mujtaba Zahidy,
Caterina Vigliar,
Karsten Rottwitt,
Leif K. Oxenlowe,
Yunhong Ding
Abstract:
We demonstrate the first on-chip deterministic entanglement purification based on silicon photonics. To evaluate the purification performance, we simulate the bit-flip and phase-flip errors by reconfigurable circuits on chip. The state fidelity improves from 0.71 to 0.82 under a 20% bit-flip error rate.
Submitted 4 July, 2025;
originally announced July 2025.
-
Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
Authors:
Hai Huang,
Yan Xia,
Sashuai Zhou,
Hanting Wang,
Shulei Wang,
Zhou Zhao
Abstract:
Domain Generalization (DG) aims to enhance model robustness in unseen or distributionally shifted target domains through training exclusively on source domains. Although existing DG techniques, such as data manipulation, learning strategies, and representation learning, have shown significant progress, they predominantly address single-modal data. With the emergence of numerous multi-modal datasets and increasing demand for multi-modal tasks, a key challenge in Multi-modal Domain Generalization (MMDG) has emerged: enabling models trained on multi-modal sources to generalize to unseen target distributions within the same modality set. Due to the inherent differences between modalities, directly transferring methods from single-modal DG to MMDG typically yields sub-optimal results. These methods often exhibit randomness during generalization due to the invisibility of target domains and fail to consider inter-modal consistency. Applying these methods independently to each modality in the MMDG setting before combining them can lead to divergent generalization directions across different modalities, resulting in degraded generalization capabilities. To address these challenges, we propose a novel approach that leverages Unified Representations to map different paired modalities together, effectively adapting DG methods to MMDG by enabling synchronized multi-modal improvements within the unified space. Additionally, we introduce a supervised disentanglement framework that separates modal-general and modal-specific information, further enhancing the alignment of unified representations. Extensive experiments on benchmark datasets, including EPIC-Kitchens and Human-Animal-Cartoon, demonstrate the effectiveness and superiority of our method in enhancing multi-modal domain generalization.
Submitted 4 July, 2025;
originally announced July 2025.
-
Do Research Software Engineers and Software Engineering Researchers Speak the Same Language?
Authors:
Timo Kehrer,
Robert Haines,
Guido Juckeland,
Shurui Zhou,
David E. Bernholdt
Abstract:
Anecdotal evidence suggests that Research Software Engineers (RSEs) and Software Engineering Researchers (SERs) often use different terminologies for similar concepts, creating communication challenges. To better understand these divergences, we have started investigating how SE fundamentals from the SER community are interpreted within the RSE community, identifying aligned concepts, knowledge gaps, and areas for potential adaptation. Our preliminary findings reveal opportunities for mutual learning and collaboration, and our systematic methodology for terminology mapping provides a foundation for a crowd-sourced extension and validation in the future.
Submitted 3 July, 2025;
originally announced July 2025.
-
HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision
Authors:
Shengli Zhou,
Jianuo Zhu,
Qilin Huang,
Fangjing Wang,
Yanfu Zhang,
Feng Zheng
Abstract:
3D Visual Question-Answering (3D VQA) is pivotal for models to perceive the physical world and perform spatial reasoning. Answer-centric supervision is a commonly used training method for 3D VQA models. Many models that utilize this strategy have achieved promising results in 3D VQA tasks. However, the answer-centric approach only supervises the final output of models and allows models to develop reasoning pathways freely. The absence of supervision on the reasoning pathway enables the potential for developing superficial shortcuts through common patterns in question-answer pairs. Moreover, although slow-thinking methods advance large language models, they suffer from underthinking. To address these issues, we propose HCNQA, a 3D VQA model leveraging a hierarchical concentration narrowing supervision method. By mimicking the human process of gradually focusing from a broad area to specific objects while searching for answers, our method guides the model to perform three phases of concentration narrowing through hierarchical supervision. By supervising key checkpoints on a general reasoning pathway, our method can ensure the development of a rational and effective reasoning pathway. Extensive experimental results demonstrate that our method can effectively ensure that the model develops a rational reasoning pathway and performs better. The code is available at https://github.com/JianuoZhu/HCNQA.
Submitted 2 July, 2025;
originally announced July 2025.
-
Two-Dimensional Superconductivity at the CaZrO3/KTaO3 (001) Heterointerfaces
Authors:
Lu Chen,
Siyi Zhou,
Daming Tian,
Yinan Xiao,
Qixuan Gao,
Yongchao Wang,
Yuansha Chen,
Fengxia Hu,
Baogen Shen,
Jirong Sun,
Weisheng Zhao,
Jinsong Zhang,
Hui Zhang
Abstract:
We investigated the superconducting transport properties of two-dimensional electron gases (2DEGs) at (001)-oriented CaZrO3/KTaO3 (CZO/KTO) heterointerfaces. Our results unambiguously demonstrate the emergence of two-dimensional superconductivity, with a superconducting transition temperature TC up to ~0.25 K. The two-dimensional nature of the superconducting state is corroborated by the Berezinskii-Kosterlitz-Thouless (BKT) transition and pronounced anisotropy of the upper critical field. The estimated superconducting layer thickness and coherence length are 10.1 nm and 146.4 nm, respectively, for the sample with nS = 7.7 × 10^13 cm^-2. Furthermore, we demonstrate that the two-dimensional superconductivity at the CZO/KTO(001) interface can be effectively tuned by applying a back gate voltage. These findings conclusively establish two-dimensional superconductivity at the CZO/KTO(001) interfaces, providing a new platform for exploring emergent superconductivity in complex oxide heterostructures.
Submitted 2 July, 2025;
originally announced July 2025.
-
Search for an Axion-Like Particle in $B\rightarrow K^{(*)} a (\rightarrow γγ)$ Decays at Belle
Authors:
Belle,
Belle II Collaborations,
I. Adachi,
L. Aggarwal,
H. Ahmed,
Y. Ahn,
H. Aihara,
N. Akopov,
S. Alghamdi,
M. Alhakami,
A. Aloisio,
N. Althubiti,
K. Amos,
M. Angelsmark,
N. Anh Ky,
C. Antonioli,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae
, et al. (400 additional authors not shown)
Abstract:
We report a search for an axion-like particle $a$ in $B\rightarrow K^{(*)} a (\rightarrow γγ)$ decays using data collected with the Belle detector at the KEKB asymmetric-energy electron-positron collider. The search is based on a $711\ \mathrm{fb^{-1}}$ data sample collected at the $Υ(4S)$ resonance energy, corresponding to a sample of $772\times10^6$ $Υ(4S)$ events. In this study, we search for the decay of the axion-like particle into a pair of photons, $a \rightarrow γγ$. We scan the two-photon invariant mass in the range $0.16\ \mathrm{GeV}/c^2-4.50\ \mathrm{GeV}/c^2$ for the $K$ modes and $0.16\ \mathrm{GeV}/c^2-4.20\ \mathrm{GeV}/c^2$ for the $K^{*}$ modes. No significant signal is observed in any of the modes, and 90\% confidence level upper limits are established on the coupling to the $W$ boson, $g_{aW}$, as a function of $a$ mass. The limits range from $3 \times 10^{-6}\ \mathrm{GeV}^{-1}$ to $3 \times 10^{-5}\ \mathrm{GeV}^{-1}$, improving the current constraints on $g_{aW}$ by a factor of two over the most stringent previous experimental results.
Submitted 3 July, 2025; v1 submitted 1 July, 2025;
originally announced July 2025.
-
Spontaneous emergence of altermagnetism in the single-orbital extended Hubbard model
Authors:
Jin-Wei Dong,
Yu-Han Lin,
Ruiqing Fu,
Xianxin Wu,
Gang Su,
Ziqiang Wang,
Sen Zhou
Abstract:
Altermagnetism (AM), the recently discovered third class of collinear magnetic order, is characterized by non-relativistic momentum-dependent spin-split electronic structure with compensated zero net magnetization. It can arise from the conventional antiferromagnetism by introducing local anisotropy on the two opposite-spin sublattices, either through structural changes in local crystallographic symmetry or spontaneous emergence of local staggered orbital order from electron correlations in multi-orbital systems. Here, we demonstrate on the two-dimensional square lattice that a $d$-wave AM can emerge spontaneously in the single-orbital extended Hubbard model, without invoking the spin-orbital coupling and multi-orbital physics. We carry out mean-field studies on the concrete single-orbital $t$-$U$-$V$ model with $U$ and $V$ the onsite and nearest-neighbor Coulomb interactions, obtaining the ground states, analyzing their properties, and determining the phase diagram in the $U$-$V$ plane. The $d$-wave AM with novel spin-transport behavior is found to be stabilized in a wide region of the phase diagram when the system is doped away from half-filling, actualized by the coexistence of onsite antiferromagnetic order and complex $d$-wave nearest-neighbor spin bond orders. Our findings provide an alternative route to achieve AM and substantially expand the range of candidate AM materials.
Submitted 1 July, 2025;
originally announced July 2025.
-
Parallax QAMA: Novel Downlink Multiple Access for MISO Systems with Simple Receivers
Authors:
Jie Huang,
Ming Zhao,
Shengli Zhou,
Ling Qiu,
Jinkang Zhu
Abstract:
In this paper, we propose a novel downlink multiple access system with a multi-antenna transmitter and two single-antenna receivers, inspired by the underlying principles of hierarchical quadrature amplitude modulation (H-QAM) based multiple access (QAMA) and space-division multiple access (SDMA). In the proposed scheme, coded bits from two users are split and assigned to one shared symbol and two private symbols carried by different beams. Based on joint symbol mapping of H-QAM constellations and phase-aligned precoding at the transmitter, each receiver observes a different H-QAM constellation with Gray mapping, a unique parallax feature not shared by existing schemes. In addition to avoiding successive interference cancellation (SIC), each user independently demodulates its own bits on separate I and Q branches with calculations based on closed-form expressions. Hence the receiver complexity is on par with that of orthogonal multiple access (OMA), which is much lower than that in other competing alternatives such as non-orthogonal multiple access (NOMA) and rate-splitting multiple access (RSMA). We carry out system optimization and determine the achievable rate region. Numerical results show that the proposed system has a larger rate region relative to other benchmark schemes with receivers not using SIC, and even achieves a comparable rate region to those benchmark schemes with SIC receivers.
Submitted 29 June, 2025;
originally announced June 2025.
-
Dare to Plagiarize? Plagiarized Painting Recognition and Retrieval
Authors:
Sophie Zhou,
Shu Kong
Abstract:
Art plagiarism detection plays a crucial role in protecting artists' copyrights and intellectual property, yet it remains a challenging problem in forensic analysis. In this paper, we address the task of recognizing plagiarized paintings and explaining the detected plagiarisms by retrieving visually similar authentic artworks. To support this study, we construct a dataset by collecting painting photos and synthesizing plagiarized versions using generative AI, tailored to specific artists' styles. We first establish a baseline approach using off-the-shelf features from the visual foundation model DINOv2 to retrieve the most similar images in the database and classify plagiarism based on a similarity threshold. Surprisingly, this non-learned method achieves a high recognition accuracy of 97.2\% but suffers from low retrieval precision (29.0\% average precision, AP). To improve retrieval quality, we finetune DINOv2 with a metric learning loss using positive and negative sample pairs sampled in the database. The finetuned model greatly improves retrieval performance by 12\% AP over the baseline, though it unexpectedly results in a lower recognition accuracy (92.7\%). We conclude with insightful discussions and outline directions for future research.
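The retrieve-then-threshold baseline described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `retrieve_and_classify`, the threshold of 0.8, and the toy 4-dimensional vectors are all assumptions made for illustration, and the vectors merely stand in for image features such as those a model like DINOv2 would produce.

```python
import numpy as np

def retrieve_and_classify(query, database, threshold=0.8, top_k=3):
    """Rank database features by cosine similarity to the query and flag the
    query as plagiarized when the best match reaches the threshold.

    The threshold and top_k values are illustrative assumptions.
    """
    # L2-normalize so that dot products equal cosine similarities.
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    # Indices of the top_k most similar database entries, best first.
    order = np.argsort(-sims)[:top_k]
    is_plagiarized = bool(sims[order[0]] >= threshold)
    return is_plagiarized, order.tolist(), sims[order].tolist()

# Toy stand-ins for image features (hypothetical data).
database = np.eye(4)
near_copy = np.array([0.1, 1.0, 0.0, 0.0])   # close to database entry 1
unrelated = np.array([1.0, 1.0, 1.0, 1.0])   # equally far from every entry

print(retrieve_and_classify(near_copy, database))
print(retrieve_and_classify(unrelated, database))
```

The same similarity scores serve both purposes: the top-ranked entries are the retrieved "explanations", and the single best score drives the binary decision.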
Submitted 29 June, 2025;
originally announced June 2025.
-
On Fine-Grained Distinct Element Estimation
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Jasper C. H. Lee,
Thanasis Pittas,
David P. Woodruff,
Samson Zhou
Abstract:
We study the problem of distributed distinct element estimation, where $α$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound of $Θ\left(α\log n+\frac{α}{\varepsilon^2}\right)$ bits, these results rely on assumptions that may not hold in practice. We introduce a new parameterization based on the number $C = \frac{β}{\varepsilon^2}$ of pairwise collisions, i.e., instances where the same element appears on multiple servers, and design a protocol that uses only $\mathcal{O}\left(α\log n+\frac{\sqrt{β}}{\varepsilon^2} \log n\right)$ bits, breaking previous lower bounds when $C$ is small. We further improve our algorithm under assumptions on the number of distinct elements or collisions and provide matching lower bounds in all regimes, establishing $C$ as a tight complexity measure for the problem. Finally, we consider streaming algorithms for distinct element estimation parameterized by the number of items with frequency larger than $1$. Overall, our results offer insight into why statistical problems with known hardness results can be efficiently solved in practice.
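For readers unfamiliar with the problem setting, the following Python sketch shows a textbook mergeable estimator (a k-minimum-values sketch) in which each server sends a small summary and a coordinator combines them. It is not the protocol proposed in this paper and does not achieve its communication bounds; the helper names `hash01`, `kmv_sketch`, `merge`, and `estimate`, and the toy server contents, are assumptions for illustration only.

```python
import hashlib

def hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1) via SHA-256."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def kmv_sketch(items, k):
    """Per-server summary: the k smallest distinct hash values."""
    return sorted({hash01(x) for x in items})[:k]

def merge(sketches, k):
    """Coordinator step: the k smallest distinct hashes over all servers."""
    merged = set()
    for sketch in sketches:
        merged.update(sketch)
    return sorted(merged)[:k]

def estimate(sketch, k):
    """Estimate the distinct count from a (merged) sketch."""
    if len(sketch) < k:
        # Fewer than k distinct hashes were seen, so the count is exact
        # (up to negligible hash collisions).
        return len(sketch)
    # Standard KMV estimator based on the k-th smallest hash value.
    return (k - 1) / sketch[-1]

# Three hypothetical servers with overlapping subsets of the universe.
servers = [range(0, 60), range(40, 100), range(90, 120)]
k = 256
merged = merge([kmv_sketch(s, k) for s in servers], k)
print(estimate(merged, k))
```

Elements that appear on several servers hash to the same value, so collisions across servers are deduplicated during the merge rather than double counted.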
Submitted 27 June, 2025;
originally announced June 2025.
-
Primal S-matrix bootstrap with dispersion relations
Authors:
Claudia de Rham,
Andrew J. Tolley,
Zhuo-Hui Wang,
Shuang-Yong Zhou
Abstract:
We propose a new method for constructing the consistent space of scattering amplitudes by parameterizing the imaginary parts of partial waves and utilizing dispersion relations, crossing symmetry, and full unitarity. Using this framework, we explicitly compute bounds on the leading couplings and examine the Regge behaviors of the constructed amplitudes. The method also readily accommodates spinning bound states, which we use to constrain glueball couplings. By incorporating dispersion relations, our approach inherently satisfies the Froissart-Martin/Jin-Martin bounds or softer high-energy behaviors by construction. This, in turn, allows us to formulate a new class of fractionally subtracted dispersion relations, through which we investigate the sensitivity of coupling bounds to the asymptotic growth rate.
Submitted 27 June, 2025;
originally announced June 2025.
-
More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents
Authors:
Weimin Xiong,
Ke Wang,
Yifan Song,
Hanchao Liu,
Sai Zhou,
Wei Peng,
Sujian Li
Abstract:
Current evaluations of tool-integrated LLM agents typically focus on end-to-end tool-usage evaluation while neglecting their stability. This limits their real-world applicability, as various internal or external factors can cause agents to crash or behave abnormally. Our research addresses this by investigating whether agents are vulnerable to errors throughout the entire tool invocation process, including reading tool documentation, selecting tools and generating parameters, and processing the tool's response. Through extensive experiments, we observe that agents are highly susceptible to errors at each stage and agents based on open-source models are more vulnerable than those based on proprietary models. We also find that increasing the model size does not significantly improve tool invocation reasoning and may make agents more vulnerable to attacks resembling normal user instructions. This highlights the importance of evaluating agent stability and offers valuable insights for future LLM development and evaluation.
Submitted 27 June, 2025;
originally announced June 2025.
-
An Effective Two-Phase Genetic Algorithm for Solving the Resource Constrained Project Scheduling Problem (RCPSP)
Authors:
D. Sun,
S. Zhou
Abstract:
This note presents a simple and effective variation of the genetic algorithm (GA) for solving RCPSP, denoted the 2-Phase Genetic Algorithm (2PGA). The 2PGA implements GA parent selection in two phases: Phase-1 includes the best current solutions in the parent pool, and Phase-2 excludes the best current solutions from the parent pool. The 2PGA carries out the GA evolution by alternating the two phases iteratively. In exploring a solution space, Phase-1 emphasizes intensification in the current neighborhood, while Phase-2 emphasizes diversification to escape local traps. The 2PGA was tested on the standard benchmark problems in PSPLIB; the results show that the algorithm is effective and has improved some of the best heuristic solutions.
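The two-phase parent selection described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `elite_frac` parameter, the lower-is-better fitness convention, and the rule of switching phase every `period` generations are all assumptions made for illustration.

```python
def parent_pool(population, fitness, phase, elite_frac=0.2):
    """Return the candidate parents for one generation.

    Assumptions (not from the abstract): lower fitness is better
    (e.g. project makespan), and "the best current solutions" means
    the top `elite_frac` share of the ranked population.
    """
    ranked = sorted(population, key=fitness)
    n_elite = max(1, int(len(ranked) * elite_frac))
    if phase == 1:
        # Phase-1: best solutions stay in the pool (intensification).
        return ranked
    # Phase-2: best solutions are left out (diversification).
    return ranked[n_elite:]


def phase_for_generation(generation, period=1):
    """Alternate Phase-1 and Phase-2 every `period` generations (hypothetical schedule)."""
    return 1 if (generation // period) % 2 == 0 else 2


if __name__ == "__main__":
    # Toy population: each "solution" is just its own makespan.
    population = [5, 3, 9, 1, 7, 8, 2, 6, 4, 10]
    for gen in range(4):
        phase = phase_for_generation(gen)
        pool = parent_pool(population, lambda s: s, phase)
        print(gen, phase, pool)
```

With a very small population, the Phase-2 pool can be empty, so a real implementation would need a lower bound on the pool size.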
Submitted 27 June, 2025;
originally announced June 2025.
-
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
Authors:
Siyi Zhou,
Yiquan Zhou,
Yi He,
Xun Zhou,
Jinchao Wang,
Wei Deng,
Jingchen Shu
Abstract:
Large-scale text-to-speech (TTS) models are typically categorized into autoregressive and non-autoregressive systems. Although autoregressive systems exhibit certain advantages in speech naturalness, their token-by-token generation mechanism makes it difficult to precisely control the duration of synthesized speech. This is a key limitation in applications such as video dubbing that require strict audio-visual synchronization. This paper introduces IndexTTS2, which proposes a novel and autoregressive-model-friendly method for speech duration control. The method supports two generation modes: one allows explicit specification of the number of generated tokens for precise duration control; the other does not require manual input and lets the model freely generate speech while preserving prosodic characteristics from the input prompt. Furthermore, IndexTTS2 achieves disentanglement between emotional expression and speaker identity, enabling independent control of timbre and emotion. In the zero-shot setting, the model can perfectly reproduce the emotional characteristics of the input prompt. Users may also provide a separate emotion prompt, even from a different speaker, allowing the model to reconstruct the target timbre while conveying the desired emotion. To enhance clarity during strong emotional expressions, we incorporate GPT latent representations to improve speech stability. Meanwhile, to lower the barrier for emotion control, we design a soft instruction mechanism based on textual descriptions by fine-tuning Qwen3. This enables effective guidance of speech generation with desired emotional tendencies using natural language input. Experimental results demonstrate that IndexTTS2 outperforms existing state-of-the-art zero-shot TTS models in word error rate, speaker similarity, and emotional fidelity.
Submitted 23 June, 2025;
originally announced June 2025.
-
Observation of Cavity-Mediated Nonlinear Landau Fan and Modified Landau Level Degeneracy in Graphene Quantum Transport
Authors:
Hongxia Xue,
Hsun-Chi Chan,
Zuzhang Lin,
Dalin Boriçi,
Shaobo Zhou,
Yanan Wang,
Kenji Watanabe,
Takashi Taniguchi,
Cristiano Ciuti,
Wang Yao,
Dong-Keun Ki,
Shuang Zhang
Abstract:
Recent studies on cavity-coupled two-dimensional electron gas demonstrate that vacuum-field engineering can tailor electronic transport properties of materials. By achieving ultra-strong coupling between a terahertz resonator and mesoscopic graphene, we demonstrate that cavity vacuum fields can alter the effective degeneracies of Landau levels, resulting in a nonlinear Landau fan diagram for massless Dirac fermions while preserving quantum-Hall quantization. Specifically, by leveraging graphene's gate-tunability, we observe that quantum-Hall features, minimum longitudinal and quantized Hall conductance for a given filling factor, occur at carrier densities reduced by more than 20 percent compared to systems without cavity. Theoretical analysis attributes this effect to the virtual cavity photon mediated transitions between the non-equidistant Landau levels in graphene, significantly reducing their effective degeneracy. This study paves the way for investigating cavity quantum electrodynamics in highly tunable, atomically thin two-dimensional crystals.
Submitted 26 June, 2025;
originally announced June 2025.
-
Style-Aligned Image Composition for Robust Detection of Abnormal Cells in Cytopathology
Authors:
Qiuyi Qi,
Xin Li,
Ming Kong,
Zikang Xu,
Bingdi Chen,
Qiang Zhu,
S Kevin Zhou
Abstract:
Challenges such as the lack of high-quality annotations, long-tailed data distributions, and inconsistent staining styles pose significant obstacles to training neural networks to detect abnormal cells in cytopathology robustly. This paper proposes a style-aligned image composition (SAIC) method that composes high-fidelity and style-preserved pathological images to enhance the effectiveness and robustness of detection models. Without additional training, SAIC first selects an appropriate candidate from the abnormal cell bank based on attribute guidance. Then, it employs a high-frequency feature reconstruction to achieve a style-aligned and high-fidelity composition of abnormal cells and pathological backgrounds. Finally, it introduces a large vision-language model to filter high-quality synthesis images. Experimental results demonstrate that incorporating SAIC-synthesized images effectively enhances the performance and robustness of abnormal cell detection for tail categories and styles, thereby improving overall detection performance. The comprehensive quality evaluation further confirms the generalizability and practicality of SAIC in clinical application scenarios. Our code will be released at https://github.com/Joey-Qi/SAIC.
Submitted 26 June, 2025;
originally announced June 2025.
-
Adjacency spectral radius and H-factors in 1-binding graphs
Authors:
Sizhong Zhou,
Tao Zhang,
Zhiren Sun
Abstract:
Let $G$ be a graph, and let $H:V(G)\longrightarrow\{\{1\},\{0,2\}\}$ be a set-valued function. Hence, $H(v)$ equals $\{1\}$ or $\{0,2\}$ for any $v\in V(G)$. We let $$ H^{-1}(1)=\{v: v\in V(G) \ \mbox{and} \ H(v)=1\}. $$ An $H$-factor of $G$ is a spanning subgraph $F$ of $G$ such that $d_F(v)\in H(v)$ for each $v\in V(G)$. Lu and Kano showed a characterization for the existence of an $H$-factor in a graph [Characterization of 1-tough graphs using factors, Discrete Math. 343 (2020) 111901]. Let $A(G)$ and $ρ(G)$ denote the adjacency matrix and the adjacency spectral radius of $G$, respectively. By using Lu and Kano's result, we pose a sufficient condition with respect to the adjacency spectral radius to guarantee the existence of an $H$-factor in a 1-binding graph. In this paper, we prove that if a connected 1-binding graph $G$ of order $n\geq11$ satisfies $ρ(G)\geqρ(K_1\vee(K_{n-4}\cup K_2\cup K_1))$, then $G$ has an $H$-factor for each $H:V(G)\longrightarrow\{\{1\},\{0,2\}\}$ with $H^{-1}(1)$ even, unless $G=K_1\vee(K_{n-4}\cup K_2\cup K_1)$.
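The definition of an $H$-factor given in this abstract can be checked mechanically. The sketch below is only an illustration of that definition for simple graphs; the function name `is_h_factor` and the edge-list representation are assumptions, and it tests a given candidate subgraph rather than searching for one.

```python
from collections import Counter


def is_h_factor(vertices, graph_edges, factor_edges, H):
    """Check whether `factor_edges` is an H-factor of the graph.

    H maps each vertex to the set {1} or {0, 2}. F is an H-factor if
    it uses only edges of G and d_F(v) is in H(v) for every vertex v.
    Edges are undirected pairs; self-loops are not supported.
    """
    g = {frozenset(e) for e in graph_edges}
    f = {frozenset(e) for e in factor_edges}
    if not f <= g:
        return False  # F must be a subgraph of G
    degree = Counter(v for e in f for v in e)  # missing vertices count as 0
    return all(degree[v] in H[v] for v in vertices)


if __name__ == "__main__":
    vertices = [0, 1, 2, 3]
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # triangle with a pendant vertex
    all_ones = {v: {1} for v in vertices}
    print(is_h_factor(vertices, edges, [(0, 1), (2, 3)], all_ones))
```

When every vertex has $H(v)=\{1\}$, an $H$-factor is a perfect matching; when every vertex has $H(v)=\{0,2\}$, it is a set of vertex-disjoint cycles.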
Submitted 25 June, 2025;
originally announced June 2025.
-
NeRF-based CBCT Reconstruction needs Normalization and Initialization
Authors:
Zhuowei Xu,
Han Li,
Dai Sun,
Zhicheng Li,
Yujia Li,
Qingpeng Kong,
Zhiwei Cheng,
Nassir Navab,
S. Kevin Zhou
Abstract:
Cone Beam Computed Tomography (CBCT) is widely used in medical imaging. However, the limited number and intensity of X-ray projections make reconstruction an ill-posed problem with severe artifacts. NeRF-based methods have achieved great success in this task. However, they suffer from a local-global training mismatch between their two key components: the hash encoder and the neural network. Specifically, in each training step, only a subset of the hash encoder's parameters is used (local sparse), whereas all parameters in the neural network participate (global dense). Consequently, hash features generated in each step are highly misaligned, as they come from different subsets of the hash encoder. These misalignments from different training steps are then fed into the neural network, causing repeated inconsistent global updates in training, which leads to unstable training, slower convergence, and degraded reconstruction quality. Aiming to alleviate the impact of this local-global optimization mismatch, we introduce a Normalized Hash Encoder, which enhances feature consistency and mitigates the mismatch. Additionally, we propose a Mapping Consistency Initialization (MCI) strategy that initializes the neural network before training by leveraging the global mapping property from a well-trained model. The initialized neural network exhibits improved stability during early training, enabling faster convergence and enhanced reconstruction performance. Our method is simple yet effective, requiring only a few lines of code while substantially improving training efficiency on 128 CT cases collected from 4 different datasets, covering 7 distinct anatomical regions.
Submitted 24 June, 2025;
originally announced June 2025.
-
PEVLM: Parallel Encoding for Vision-Language Models
Authors:
Letian Kang,
Shixian Luo,
Yiqiang Li,
Xiaoyang Yu,
Shenxuan Zhou,
Yong Wu
Abstract:
Vision-Language Models (VLMs) have demonstrated strong capabilities in multimodal understanding and generation tasks. However, their application to long video understanding remains hindered by the quadratic complexity of standard attention mechanisms. In this work, we introduce PEVLM, a fine-tuning-free parallel encoding method designed to enhance the prefilling efficiency of VLMs in long video scenarios. PEVLM partitions the input video into context blocks with a shared sink block, while preserving sequential position embeddings to align the attention weight distribution with that of Full-Attention. This design reduces attention complexity from $O((T \times N)^2)$ to $O(T \times N)$, where $T$ is the number of frames and $N$ the number of tokens per frame, without sacrificing accuracy. Extensive experiments across multiple state-of-the-art models and benchmarks demonstrate that PEVLM consistently outperforms existing parallel encoding approaches, achieving up to 7.47x speedup in attention computation and reducing end-to-end latency by 40%. Remarkably, PEVLM not only maintains high accuracy, but in some settings even surpasses Full-Attention performance. Under strict latency constraints, it achieves substantial gains, improving accuracy from 23.26% to 61.03%. These results underscore the effectiveness of PEVLM for low-latency, long-context video understanding, making it a promising solution for real-world applications.
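The drop from quadratic to linear cost stated in this abstract can be made concrete by counting query-key pairs. The sketch below uses a simplified cost model that is an assumption, not the paper's: each context block of `frames_per_block` frames attends only to its own tokens plus a shared sink block of `sink_tokens` tokens, and causal masking is ignored. All function and parameter names are hypothetical.

```python
def full_attention_pairs(num_frames, tokens_per_frame):
    """Query-key pairs when every token attends to every token: (T * N) ** 2."""
    total = num_frames * tokens_per_frame
    return total * total


def block_attention_pairs(num_frames, tokens_per_frame, frames_per_block, sink_tokens):
    """Query-key pairs under the assumed block-plus-sink model.

    For a fixed block size the count grows linearly in T * N. If the
    frame count is not a multiple of the block size, the last block is
    counted as full, so the result is an upper bound.
    """
    num_blocks = -(-num_frames // frames_per_block)  # ceiling division
    block_tokens = frames_per_block * tokens_per_frame
    return num_blocks * block_tokens * (block_tokens + sink_tokens)


if __name__ == "__main__":
    for frames in (100, 200, 400):
        full = full_attention_pairs(frames, 10)
        blocked = block_attention_pairs(frames, 10, frames_per_block=10, sink_tokens=10)
        print(frames, full, blocked)
```

Under this model, doubling the number of frames quadruples the full-attention count but only doubles the blocked count.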
Submitted 7 July, 2025; v1 submitted 24 June, 2025;
originally announced June 2025.
-
Neural Collapse based Deep Supervised Federated Learning for Signal Detection in OFDM Systems
Authors:
Kaidi Xu,
Shenglong Zhou,
Geoffrey Ye Li
Abstract:
Future wireless networks are expected to be AI-empowered, making their performance highly dependent on the quality of training datasets. However, physical-layer entities often observe only partial wireless environments characterized by different power delay profiles. Federated learning is capable of addressing this limited observability, but often struggles with data heterogeneity. To tackle this challenge, we propose a neural collapse (NC) inspired deep supervised federated learning (NCDSFL) algorithm.
Submitted 24 June, 2025;
originally announced June 2025.
-
Precise Measurement of the $Λ$ Electric Dipole Moment through the Entangled Strange Baryon-Antibaryon System
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (696 additional authors not shown)
Abstract:
The dominance of matter over antimatter in the universe has consistently driven the pursuit of new physics beyond the Standard Model that violates charge-parity symmetry. Unlike the well-constrained electrons and neutrons, strange baryons (hyperons) remain a largely unexplored territory, in which interactions between hyperons and particles from new physics could induce a non-trivial electric dipole moment (EDM). However, direct measurements of hyperon EDMs through spin precession are highly challenging due to their short lifetimes. In this paper, we present a novel method to extract the EDM of the lightest hyperon, $Λ$, using the entangled $Λ$$\overlineΛ$ system. Our result is consistent with zero, achieving a three-order-of-magnitude improvement over the previous upper limit established in the 1980s with comparable statistics, providing stringent constraints on potential new physics.
Submitted 28 June, 2025; v1 submitted 23 June, 2025;
originally announced June 2025.
-
From simulations to observations. Methodology and data release of mock TNG50 galaxies at 0.3 < z < 0.7 for WEAVE-StePS
Authors:
A. Ikhsanova,
L. Costantin,
A. Pizzella,
E. M. Corsini,
L. Morelli,
F. R. Ditrani,
A. Ferré-Mateu,
L. Gabarra,
M. Gullieuszik,
C. P. Haines,
A. Iovino,
M. Longhetti,
A. Mercurio,
R. Ragusa,
P. Sánchez-Blázquez,
C. Tortora,
B. Vulcani,
S. Zhou,
E. Gafton,
F. Pistis
Abstract:
The new generation of optical spectrographs (i.e., WEAVE, 4MOST, DESI, and WST) offers unprecedented opportunities for statistically studying the star formation histories of galaxies. However, these observations are not easily comparable to predictions from cosmological simulations. Our goal is to build a reference framework for comparing spectroscopic observations with simulations and test tools for deriving stellar population properties of galaxies. We focus on the observational strategy of the Stellar Population at Intermediate Redshift Survey (StePS) with the WEAVE instrument. We generate mock datasets of ~750 galaxies at redshifts z = 0.3, 0.5, and 0.7 using the TNG50 simulation, perform radiative transfer with SKIRT, and analyze the spectra with pPXF as if they were real observations. We present the methodology to generate these datasets and provide an initial exploration of stellar population parameters (i.e., mass-weighted ages and metallicities) and star formation histories for three galaxies at z = 0.7 and their descendants at z = 0.5 and 0.3. We find good agreement between the mock spectra and intrinsic ages in TNG50 (average difference $0.2\pm0.3$ Gyr) and successfully recover their star formation histories, especially for galaxies that form the bulk of their stars on short timescales and at early epochs. We release these datasets, including multi-wavelength imaging and spectra, to support forthcoming WEAVE observations.
Submitted 23 June, 2025;
originally announced June 2025.
-
Reinforcement Learning-Based Dynamic Grouping for Tubular Structure Tracking
Authors:
Chong Di,
Shuwang Zhou,
Da Chen,
Jean-Marie Mirebeau,
Minglei Shu,
Laurent D. Cohen
Abstract:
The computation of minimal paths for applications in tracking tubular structures such as blood vessels and roads is challenged by complex morphologies and environmental variations. Existing approaches can be roughly categorized into two research lines: the point-wise based models and the segment-wise based models. Although segment-wise approaches have obtained promising results in many scenarios, they often suffer from computational inefficiency and heavily rely on a prescribed prior to fit the target elongated shapes. We propose a novel framework that casts segment-wise tracking as a Markov Decision Process (MDP), enabling a reinforcement learning approach. Our method leverages Q-Learning to dynamically explore a graph of segments, computing edge weights on-demand and adaptively expanding the search space. This strategy avoids the high cost of a pre-computed graph and proves robust to incomplete initial information. Experimental results on typical tubular structure datasets demonstrate that our method significantly outperforms state-of-the-art point-wise and segment-wise approaches. The proposed method effectively handles complex topologies and maintains global path coherence without depending on extensive prior structural knowledge.
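For readers unfamiliar with Q-Learning, the standard tabular update rule that this abstract builds on can be sketched as follows. This is the textbook update only, not the authors' tracking method: treating states as segments and actions as moves to neighbouring segments is an assumed reading of the abstract, and the names `q_update`, `alpha`, and `gamma` are illustrative.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-Learning update and return the new value.

    Q[(s, a)] <- Q[(s, a)] + alpha * (r + gamma * max_a' Q[(s', a')] - Q[(s, a)])

    `next_actions` lists the actions available in `next_state`; if it
    is empty (terminal state), the bootstrap term is taken as 0.
    """
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


if __name__ == "__main__":
    # Unseen state-action pairs default to 0.0, so the table grows
    # only as new segments are actually visited.
    Q = defaultdict(float)
    print(q_update(Q, "seg0", "to_seg1", 1.0, "seg1", ["to_seg2"]))
```

Storing values in a `defaultdict` mirrors the on-demand flavour described in the abstract, in that nothing is computed for parts of the graph that are never explored.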
Submitted 21 June, 2025;
originally announced June 2025.
-
MinD: Unified Visual Imagination and Control via Hierarchical World Models
Authors:
Xiaowei Chi,
Kuangzhi Ge,
Jiaming Liu,
Siyuan Zhou,
Peidong Jia,
Zichen He,
Yuzhen Liu,
Tingguang Li,
Lei Han,
Sirui Han,
Shanghang Zhang,
Yike Guo
Abstract:
Video generation models (VGMs) offer a promising pathway for unified world modeling in robotics by integrating simulation, prediction, and manipulation. However, their practical application remains limited due to (1) slow generation speed, which limits real-time interaction, and (2) poor consistency between imagined videos and executable actions. To address these challenges, we propose Manipulate in Dream (MinD), a hierarchical diffusion-based world model framework that employs a dual-system design for vision-language manipulation. MinD executes VGM at low frequencies to extract video prediction features, while leveraging a high-frequency diffusion policy for real-time interaction. This architecture enables low-latency, closed-loop control in manipulation with coherent visual guidance. To better coordinate the two systems, we introduce a video-action diffusion matching module (DiffMatcher), with a novel co-training strategy that uses separate schedulers for each diffusion model. Specifically, we introduce a diffusion-forcing mechanism to DiffMatcher that aligns their intermediate representations during training, helping the fast action model better understand video-based predictions. Beyond manipulation, MinD also functions as a world simulator, reliably predicting task success or failure in latent space before execution. Trustworthy analysis further shows that VGMs can preemptively evaluate task feasibility and mitigate risks. Extensive experiments across multiple benchmarks demonstrate that MinD achieves state-of-the-art manipulation (63%+) in RL-Bench, advancing the frontier of unified world modeling in robotics.
Submitted 23 June, 2025;
originally announced June 2025.
-
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
Authors:
Zhuowei Chen,
Bingchuan Li,
Tianxiang Ma,
Lijie Liu,
Mingcong Liu,
Yi Zhang,
Gen Li,
Xinghui Li,
Siyu Zhou,
Qian He,
Xinglong Wu
Abstract:
Subject-to-video generation has witnessed substantial progress in recent years. However, existing models still face significant challenges in faithfully following textual instructions. This limitation, commonly known as the copy-paste problem, arises from the widely used in-pair training paradigm. This approach inherently entangles subject identity with background and contextual attributes by sampling reference images from the same scene as the target video. To address this issue, we introduce Phantom-Data, the first general-purpose cross-pair subject-to-video consistency dataset, containing approximately one million identity-consistent pairs across diverse categories. Our dataset is constructed via a three-stage pipeline: (1) a general and input-aligned subject detection module, (2) large-scale cross-context subject retrieval from more than 53 million videos and 3 billion images, and (3) prior-guided identity verification to ensure visual consistency under contextual variation. Comprehensive experiments show that training with Phantom-Data significantly improves prompt alignment and visual quality while preserving identity consistency on par with in-pair baselines.
Submitted 23 June, 2025;
originally announced June 2025.
-
Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster
Authors:
Fenghe Tang,
Wenxin Ma,
Zhiyang He,
Xiaodong Tao,
Zihang Jiang,
S. Kevin Zhou
Abstract:
With the advancement of Large Language Model (LLM) for natural language processing, this paper presents an intriguing finding: a frozen pre-trained LLM layer can process visual tokens for medical image segmentation tasks. Specifically, we propose a simple hybrid structure that integrates a pre-trained, frozen LLM layer within the CNN encoder-decoder segmentation framework (LLM4Seg). Surprisingly, this design improves segmentation performance with a minimal increase in trainable parameters across various modalities, including ultrasound, dermoscopy, polypscopy, and CT scans. Our in-depth analysis reveals the potential of transferring LLM's semantic awareness to enhance segmentation tasks, offering both improved global understanding and better local modeling capabilities. The improvement proves robust across different LLMs, validated using LLaMA and DeepSeek.
Submitted 22 June, 2025;
originally announced June 2025.
-
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities
Authors:
Yuanchen Bei,
Weizhi Zhang,
Siwen Wang,
Weizhi Chen,
Sheng Zhou,
Hao Chen,
Yong Li,
Jiajun Bu,
Shirui Pan,
Yizhou Yu,
Irwin King,
Fakhri Karray,
Philip S. Yu
Abstract:
AI agents have experienced a paradigm shift, from early dominance by reinforcement learning (RL) to the rise of agents powered by large language models (LLMs), and now further advancing towards a synergistic fusion of RL and LLM capabilities. This progression has endowed AI agents with increasingly strong abilities. Despite these advances, to accomplish complex real-world tasks, agents are required to plan and execute effectively, maintain reliable memory, and coordinate smoothly with other agents. Achieving these capabilities involves contending with ever-present intricate information, operations, and interactions. In light of this challenge, data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms that agents can more effectively understand and process. In this context, graphs, with their natural advantage in organizing, managing, and harnessing intricate data relationships, present a powerful data paradigm for structurization to support the capabilities demanded by advanced AI agents. To this end, this survey presents a first systematic review of how graphs can empower AI agents. Specifically, we explore the integration of graph techniques with core agent functionalities, highlight notable applications, and identify prospective avenues for future research. By comprehensively surveying this burgeoning intersection, we hope to inspire the development of next-generation AI agents equipped to tackle increasingly sophisticated challenges with graphs. Related resources are collected and continuously updated for the community in the Github link.
Submitted 4 July, 2025; v1 submitted 22 June, 2025;
originally announced June 2025.
-
AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction
Authors:
Song Wang,
Zhen Tan,
Zihan Chen,
Shuang Zhou,
Tianlong Chen,
Jundong Li
Abstract:
Recent progress in large language model (LLM)-based multi-agent collaboration highlights the power of structured communication in enabling collective intelligence. However, existing methods largely rely on static or graph-based inter-agent topologies, lacking the potential adaptability and flexibility in communication. In this work, we propose a new framework that rethinks multi-agent coordination through a sequential structure rather than a graph structure, offering a significantly larger topology space for multi-agent communication. Our method focuses on two key directions: (1) Next-Agent Prediction, which selects the most suitable agent role at each step, and (2) Next-Context Selection (NCS), which enables each agent to selectively access relevant information from any previous step. Together, these components construct task-adaptive communication pipelines that support both role flexibility and global information flow. Extensive evaluations across multiple benchmarks demonstrate that our approach achieves superior performance while substantially reducing communication overhead.
Submitted 21 June, 2025;
originally announced June 2025.
-
V-CASS: Vision-context-aware Expressive Speech Synthesis for Enhancing User Understanding of Videos
Authors:
Qixin Wang,
Songtao Zhou,
Zeyu Jin,
Chenglin Guo,
Shikun Sun,
Xiaoyu Qin
Abstract:
Automatic video commentary systems are widely used on multimedia social media platforms to extract factual information about video content. However, current systems may overlook essential para-linguistic cues, including emotion and attitude, which are critical for fully conveying the meaning of visual content. The absence of these cues can limit user understanding or, in some cases, distort the video's original intent. Expressive speech effectively conveys these cues and enhances the user's comprehension of videos. Building on these insights, this paper explores the use of vision-context-aware expressive speech in enhancing users' understanding of videos in video commentary systems. Firstly, our formative study indicates that semantic-only speech can lead to ambiguity, and misaligned emotions between speech and visuals may distort content interpretation. To address this, we propose a method called vision-context-aware speech synthesis (V-CASS). It analyzes para-linguistic cues from visuals using a vision-language model and leverages a knowledge-infused language model to guide the expressive speech model in generating context-aligned speech. User studies show that V-CASS enhances emotional and attitudinal resonance, as well as user audio-visual understanding and engagement, with 74.68% of participants preferring the system. Finally, we explore the potential of our method in helping blind and low-vision users navigate web videos, improving universal accessibility.
Submitted 19 June, 2025;
originally announced June 2025.
-
Private Training & Data Generation by Clustering Embeddings
Authors:
Felix Zhou,
Samson Zhou,
Vahab Mirrokni,
Alessandro Epasto,
Vincent Cohen-Addad
Abstract:
Deep neural networks often use large, high-quality datasets to achieve high performance on many machine learning tasks. When training involves potentially sensitive data, this process can raise privacy concerns, as large models have been shown to unintentionally memorize and reveal sensitive information, including reconstructing entire training samples. Differential privacy (DP) provides a robust framework for protecting individual data. In particular, a new approach to privately training deep neural networks is to approximate the input dataset with a privately generated synthetic dataset before running any subsequent training algorithm. We introduce a novel principled method for DP synthetic image embedding generation, based on fitting a Gaussian Mixture Model (GMM) in an appropriate embedding space using DP clustering. Our method provably learns a GMM under separation conditions. Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy on standard benchmark datasets. Additionally, we demonstrate that our method can generate realistic synthetic images that achieve downstream classification accuracy comparable to SOTA methods. Our method is quite general, as the encoder and decoder modules can be freely substituted to suit different tasks. It is also highly scalable, consisting only of subroutines that scale linearly with the number of samples and/or can be implemented efficiently in distributed systems.
Submitted 19 June, 2025;
originally announced June 2025.
-
FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
Authors:
Sen Wang,
Le Wang,
Sanping Zhou,
Jingyi Tian,
Jiayi Li,
Haowen Sun,
Wei Tang
Abstract:
Robotic manipulation in high-precision tasks is essential for numerous industrial and real-world applications where accuracy and speed are required. Yet current diffusion-based policy learning methods generally suffer from low computational efficiency due to the iterative denoising process during inference. Moreover, these methods do not fully explore the potential of generative models for enhancing information exploration in 3D environments. In response, we propose FlowRAM, a novel framework that leverages generative models to achieve region-aware perception, enabling efficient multimodal information processing. Specifically, we devise a Dynamic Radius Schedule, which allows adaptive perception, facilitating transitions from global scene comprehension to fine-grained geometric details. Furthermore, we use state space models to integrate multimodal information while preserving linear computational complexity. In addition, we employ conditional flow matching to learn action poses by regressing deterministic vector fields, simplifying the learning process while maintaining performance. We verify the effectiveness of FlowRAM on RLBench, an established manipulation benchmark, and achieve state-of-the-art performance. The results demonstrate that FlowRAM achieves a remarkable improvement, particularly in high-precision tasks, where it outperforms previous methods by 12.0% in average success rate. Additionally, FlowRAM is able to generate physically plausible actions for a variety of real-world tasks in fewer than 4 time steps, significantly increasing inference speed.
Submitted 19 June, 2025;
originally announced June 2025.
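As a rough illustration of the conditional flow matching idea mentioned in the FlowRAM abstract above, the sketch below uses the common straight-line path between a noise sample and a target sample, for which the regression target is a constant velocity. This is a generic textbook-style formulation and not FlowRAM's implementation; the function names (`cfm_target`, `cfm_loss`, `euler_sample`) and the toy 2-D "action" vectors are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Model = Callable[[Sequence[float], float], Sequence[float]]


def cfm_target(x0: Sequence[float], x1: Sequence[float], t: float) -> Tuple[Vector, Vector]:
    """Point x_t on the straight path from x0 (noise) to x1 (data), plus the
    target velocity u_t = x1 - x0, which is constant along that path."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(model: Model, x0: Sequence[float], x1: Sequence[float], t: float) -> float:
    """Mean squared error between the predicted and the target velocity."""
    x_t, u_t = cfm_target(x0, x1, t)
    v = model(x_t, t)
    return sum((vi - ui) ** 2 for vi, ui in zip(v, u_t)) / len(u_t)


def euler_sample(model: Model, x0: Sequence[float], steps: int) -> Vector:
    """Integrate dx/dt = model(x, t) from t = 0 to t = 1 with a fixed step size."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = model(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check with an "oracle" model that already outputs the true velocity.
noise, action = [0.0, 0.0], [1.0, -2.0]
oracle = lambda x, t: [1.0, -2.0]
loss_value = cfm_loss(oracle, noise, action, t=0.3)
sampled = euler_sample(oracle, noise, steps=4)
```

Because the target field along a straight path is deterministic and constant, a well-fitted model can in principle be integrated with only a handful of Euler steps, which is one plausible reading of why flow-matching policies can be sampled in few time steps.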
-
GFlowGR: Fine-tuning Generative Recommendation Frameworks with Generative Flow Networks
Authors:
Yejing Wang,
Shengyu Zhou,
Jinyu Lu,
Qidong Liu,
Xinhang Li,
Wenlin Zhang,
Feng Li,
Pengjie Wang,
Jian Xu,
Bo Zheng,
Xiangyu Zhao
Abstract:
Generative recommendations (GR), which usually include item tokenizers and generative Large Language Models (LLMs), have demonstrated remarkable success across a wide range of scenarios. The majority of existing research efforts primarily concentrate on developing powerful item tokenizers or advancing LLM decoding strategies to attain superior performance. However, the critical fine-tuning step in GR frameworks, which is essential for adapting LLMs to recommendation data, remains largely unexplored. Current approaches predominantly rely on either the next-token prediction loss of supervised fine-tuning (SFT) or recommendation-specific direct preference optimization (DPO) strategies. Both methods ignore the exploration of possible positive unobserved samples, which is commonly referred to as the exposure bias problem. To mitigate this problem, this paper treats GR as a multi-step generation task and constructs a GFlowNets-based fine-tuning framework (GFlowGR). The proposed framework integrates collaborative knowledge from traditional recommender systems to create an adaptive trajectory sampler and a comprehensive reward model. Leveraging the diverse generation property of GFlowNets, along with sampling and heuristic weighting techniques, GFlowGR emerges as a promising approach to mitigate the exposure bias problem. Extensive empirical results on two real-world datasets and with two different GR backbones highlight the effectiveness and robustness of GFlowGR.
Submitted 19 June, 2025;
originally announced June 2025.
-
PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning
Authors:
Yizhe Li,
Sanping Zhou,
Zheng Qin,
Le Wang
Abstract:
Dense video captioning is a challenging task that aims to localize and caption multiple events in an untrimmed video. Recent studies mainly follow the transformer-based architecture to jointly perform the two sub-tasks, i.e., event localization and caption generation, in an end-to-end manner. Based on the general philosophy of detection transformer, these methods implicitly learn the event locations and event semantics, which requires a large amount of training data and limits the model's performance in practice. In this paper, we propose a novel dense video captioning framework, named PR-DETR, which injects the explicit position and relation prior into the detection transformer to improve the localization accuracy and caption quality, simultaneously. On the one hand, we first generate a set of position-anchored queries to provide the scene-specific position and semantic information about potential events as position prior, which serves as the initial event search regions to eliminate the implausible event proposals. On the other hand, we further design an event relation encoder to explicitly calculate the relationship between event boundaries as relation prior to guide the event interaction to improve the semantic coherence of the captions. Extensive ablation studies are conducted to verify the effectiveness of the position and relation prior. Experimental results also show the competitive performance of our method on ActivityNet Captions and YouCook2 datasets.
Submitted 19 June, 2025;
originally announced June 2025.
-
BatteryBERT for Realistic Battery Fault Detection Using Point-Masked Signal Modeling
Authors:
Songqi Zhou,
Ruixue Liu,
Yixing Wang,
Jia Lu,
Benben Jiang
Abstract:
Accurate fault detection in lithium-ion batteries is essential for the safe and reliable operation of electric vehicles and energy storage systems. However, existing methods often struggle to capture complex temporal dependencies and cannot fully leverage abundant unlabeled data. Although large language models (LLMs) exhibit strong representation capabilities, their architectures are not directly suited to the numerical time-series data common in industrial settings. To address these challenges, we propose a novel framework that adapts BERT-style pretraining for battery fault detection by extending the standard BERT architecture with a customized time-series-to-token representation module and a point-level Masked Signal Modeling (point-MSM) pretraining task tailored to battery applications. This approach enables self-supervised learning on sequential current, voltage, and other charge-discharge cycle data, yielding distributionally robust, context-aware temporal embeddings. We then concatenate these embeddings with battery metadata and feed them into a downstream classifier for accurate fault classification. Experimental results on a large-scale real-world dataset show that models initialized with our pretrained parameters significantly improve both representation quality and classification accuracy, achieving an AUROC of 0.945 and substantially outperforming existing approaches. These findings validate the effectiveness of BERT-style pretraining for time-series fault detection.
Submitted 31 May, 2025;
originally announced June 2025.
-
SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence
Authors:
Yao Zhang,
Chenyang Lin,
Shijie Tang,
Haokun Chen,
Shijie Zhou,
Yunpu Ma,
Volker Tresp
Abstract:
The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent system generation. Our code is publicly released at https://yaoz720.github.io/SwarmAgentic/.
Submitted 18 June, 2025;
originally announced June 2025.
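For readers unfamiliar with Particle Swarm Optimization, which the SwarmAgentic abstract above cites as its inspiration, here is a minimal sketch of the classical numeric PSO update. It is not SwarmAgentic's language-driven procedure: the function name `pso_minimize`, the hyperparameter values, and the toy sphere objective are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    dim: int,
    n_particles: int = 20,
    iters: int = 100,
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_global: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Classical PSO: each particle is pulled toward its own best position
    and toward the best position found by the whole swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    inertia * vel[i][d]
                    + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                    + c_global * r2 * (g_pos[d] - pos[i][d])
                )
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Toy objective: the sphere function, minimized at the origin.
sphere = lambda x: sum(v * v for v in x)
solution, value = pso_minimize(sphere, dim=2)
```

The population-plus-feedback structure is the part that carries over conceptually: candidates are kept in a population and each update mixes a candidate's own history with the best result seen so far.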
-
Measurements of the absolute branching fractions of the doubly Cabibbo-suppressed decays $D^+\to K^+π^0$, $D^+\to K^+η$ and $D^+\to K^+η^{\prime}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (697 additional authors not shown)
Abstract:
Using $20.3\,\rm fb^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of 3.773\,GeV with the BESIII detector, we present improved measurements of the absolute branching fractions of the doubly Cabibbo-suppressed decays $D^+\to K^+π^0$, $D^+\to K^+η$ and $ D^+ \to K^+ η^{\prime}$ with the double-tag method. The statistical significance of each signal decay exceeds $10σ$. The branching fractions are determined to be ${\mathcal B}(D^+\to K^+ π^0) = (1.45 \pm 0.06 \pm 0.06)\times 10^{-4}$, ${\mathcal B}(D^+\to K^+ η) = (1.17 \pm 0.10 \pm 0.03)\times 10^{-4}$ and ${\mathcal B}(D^+\to K^+ η^{\prime}) = (1.88 \pm 0.15 \pm 0.06)\times 10^{-4}$, where the first uncertainties are statistical and the second systematic. These results are consistent with the world average values but with significantly improved precision.
Submitted 18 June, 2025;
originally announced June 2025.
-
Determination of $|V_{cb}|$ using $B\to D\ellν_\ell$ Decays at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
K. Adamczyk,
L. Aggarwal,
H. Ahmed,
Y. Ahn,
H. Aihara,
N. Akopov,
S. Alghamdi,
M. Alhakami,
A. Aloisio,
K. Amos,
M. Angelsmark,
N. Anh Ky,
C. Antonioli,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
N. K. Baghel,
S. Bahinipati
, et al. (385 additional authors not shown)
Abstract:
We present a determination of the Cabibbo-Kobayashi-Maskawa matrix element $|V_{cb}|$ from the decay $B\to D\ellν_\ell$ using a $365~\mathrm{fb}^{-1}$ $e^+e^-\toΥ(4S)\to B\bar B$ data sample recorded by the Belle II experiment at the SuperKEKB collider. The semileptonic decay of one $B$ meson is reconstructed in the modes $B^0\to D^-(\to K^+π^-π^-)\ell^+ν_\ell$ and $B^+\to \bar D^0(\to K^+π^-)\ell^+ν_\ell$, where $\ell$ denotes either an electron or a muon. Charge conjugation is implied. The second $B$ meson in the $Υ(4S)$ event is not reconstructed explicitly. Using an inclusive reconstruction of the unobserved neutrino momentum, we determine the recoil variable $w=v_B\cdot v_D$, where $v_B$ and $v_D$ are the 4-velocities of the $B$ and $D$ mesons. We measure the total decay branching fractions to be $\mathcal{B}(B^0\to D^-\ell^+ν_\ell)=(2.06 \pm 0.05\,(\mathrm{stat.}) \pm 0.10\,(\mathrm{sys.}))\%$ and $\mathcal{B}(B^+\to\bar D^0\ell^+ν_\ell)=(2.31 \pm 0.04\,(\mathrm{stat.}) \pm 0.09\,(\mathrm{sys.}))\%$. We probe lepton flavor universality by measuring $\mathcal{B}(B\to Deν_e)/\mathcal{B}(B\to Dμν_μ)=1.020 \pm 0.020\,(\mathrm{stat.})\pm 0.022\,(\mathrm{sys.})$. Fitting the partial decay branching fraction as a function of $w$ and using the average of lattice QCD calculations of the $B\to D$ form factor, we obtain $|V_{cb}|=(39.2\pm 0.4\,(\mathrm{stat.}) \pm 0.6\,(\mathrm{sys.}) \pm 0.5\,(\mathrm{th.}))\times 10^{-3}$.
Submitted 18 June, 2025;
originally announced June 2025.
-
Advancing Loss Functions in Recommender Systems: A Comparative Study with a Rényi Divergence-Based Solution
Authors:
Shengjia Zhang,
Jiawei Chen,
Changdong Li,
Sheng Zhou,
Qihao Shi,
Yan Feng,
Chun Chen,
Can Wang
Abstract:
Loss functions play a pivotal role in optimizing recommendation models. Among various loss functions, Softmax Loss (SL) and Cosine Contrastive Loss (CCL) are particularly effective. Their theoretical connections and differences warrant in-depth exploration. This work conducts comprehensive analyses of these losses, yielding significant insights: 1) Common strengths -- both can be viewed as augmentations of traditional losses with Distributional Robust Optimization (DRO), enhancing robustness to distributional shifts; 2) Respective limitations -- stemming from their use of different distribution distance metrics in DRO optimization, SL exhibits high sensitivity to false negative instances, whereas CCL suffers from low data utilization. To address these limitations, this work proposes a new loss function, DrRL, which generalizes SL and CCL by leveraging Rényi-divergence in DRO optimization. DrRL incorporates the advantageous structures of both SL and CCL, and can be demonstrated to effectively mitigate their limitations. Extensive experiments have been conducted to validate the superiority of DrRL on both recommendation accuracy and robustness.
Submitted 17 June, 2025;
originally announced June 2025.
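To make the two losses compared in the abstract above concrete, here is a small sketch of one commonly used form of each for a single user with one positive item and a set of sampled negatives. These are assumed formulations for illustration only (for instance, some variants of Softmax Loss exclude the positive score from the normalizer), the function and parameter names are hypothetical, and the proposed DrRL loss is not reproduced here because the abstract does not give its formula.

```python
import math
from typing import Sequence


def softmax_loss(pos_score: float, neg_scores: Sequence[float], temperature: float = 1.0) -> float:
    """-log( exp(s_pos / tau) / sum_j exp(s_j / tau) ), with the positive included
    in the normalizer; computed with the log-sum-exp trick for stability."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[0]


def cosine_contrastive_loss(
    pos_cos: float,
    neg_cos: Sequence[float],
    margin: float = 0.5,
    neg_weight: float = 1.0,
) -> float:
    """(1 - cos_pos) + neg_weight * mean_j max(0, cos_neg_j - margin)."""
    pos_term = 1.0 - pos_cos
    neg_term = sum(max(0.0, c - margin) for c in neg_cos) / len(neg_cos)
    return pos_term + neg_weight * neg_term


sl = softmax_loss(1.0, [1.0])                   # positive and negative tie
ccl = cosine_contrastive_loss(1.0, [0.2, 0.9])  # only the 0.9 negative is above the margin
```

In this form the contrast is easy to see: every negative contributes to the softmax normalizer, with high-scoring ones dominating, whereas the hinge in the cosine contrastive loss gives zero contribution, and hence zero gradient, for any negative below the margin.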
-
Moment-enhanced shallow water equations for non-slip boundary conditions
Authors:
Shiping Zhou,
Juntao Huang,
Andrew J. Christlieb
Abstract:
The shallow water equations often assume a constant velocity profile along the vertical axis. However, this assumption does not hold in many practical applications. To better approximate the vertical velocity distribution, models such as the shallow water moment expansion models have been proposed. Nevertheless, under non-slip bottom boundary conditions, both the standard shallow water equation and its moment-enhanced models struggle to accurately capture the vertical velocity profile due to the stiff source terms. In this work, we propose modified shallow water equations and corresponding moment-enhanced models that perform well under both non-slip and slip boundary conditions. The primary difference between the modified and original models lies in the treatment of the source term, which allows our modified moment expansion models to be readily generalized, while maintaining compatibility with our previous analysis on the hyperbolicity of the model. To assess the performance of both the standard and modified moment expansion models, we conduct a comprehensive numerical comparison with the incompressible Navier--Stokes equations -- a comparison that is absent from existing literature.
Submitted 26 May, 2025;
originally announced June 2025.
-
Analysis of three-body charmed $B$ meson decays $B \to {D}(V^* \to){V P}$
Authors:
Jing Ou-Yang,
Run-Hui Li,
Si-Hong Zhou
Abstract:
We systematically analyze the decays $B_{(s)} \to D_{(s)} (V^* \to)\, V\, P$, where $V^*$ represents a vector resonance ($ρ, \, ω$ or $K^*$), and $V P$ denotes the final-state meson pairs $ ω\, π$, $ ρ\, π$ and $ ρ\, K$. The intermediate subprocesses $B_{(s)} \to D_{(s)} V^*$ are calculated in the factorization-assisted topological-amplitude approach, while the intermediate resonant states $V^*$ are modeled using a relativistic Breit-Wigner distribution, subsequently decaying into $VP$ through strong interactions. We predict the off-shell effects of the ground-state resonances ($ρ,\, ω, \, K^*$) in $B_{(s)} \to D_{(s)} (V^* \to )V P$. Our results show that the virtual contributions from $ρ\to ω\, π$, $ω\to ρ\, π$, and $K^* \to ρ\, K$ are crucial for these three-body decays, $B_{(s)} \to D_{(s)} V\, P$. In particular, the branching fractions arising from the $ρ$ and $ω$ virtual effects can be comparable to the total decay rates of $B_{(s)} \to D_{(s)} ω\, π$ and $B_{(s)} \to D_{(s)} ρ\, π$, respectively. Decays with branching fractions of order $10^{-6}-10^{-4}$ are expected to be measurable at Belle II and LHCb. Compared with previous perturbative QCD predictions for $B_{(s)} \to D_{(s)} (ρ\to)\, ω\, π$, our results are consistent but exhibit higher precision.
Submitted 17 June, 2025;
originally announced June 2025.
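The abstract above models the intermediate resonance with a relativistic Breit-Wigner distribution. As a hedged illustration of what such a propagator looks like, the sketch below uses the simplest constant-width form, $1/(s - m^2 + i\,m\,Γ)$; the paper may well use an energy-dependent width and a different sign convention, and the function name and the rounded $ρ$ parameters (in GeV) are assumptions made here for the example.

```python
def relativistic_breit_wigner(s: float, mass: float, width: float) -> complex:
    """Constant-width relativistic Breit-Wigner propagator 1 / (s - m^2 + i m Gamma).

    `s` is the squared invariant mass of the decay products (GeV^2)."""
    return 1.0 / complex(s - mass * mass, mass * width)


# Rounded, illustrative rho parameters in GeV (assumed values).
M_RHO, GAMMA_RHO = 0.775, 0.149

on_shell = abs(relativistic_breit_wigner(M_RHO**2, M_RHO, GAMMA_RHO))
# An invariant mass above the pole, e.g. near 0.92 GeV, probes only the tail.
off_shell = abs(relativistic_breit_wigner(0.92**2, M_RHO, GAMMA_RHO))
```

The off-shell case is the relevant one for a channel such as $ρ\to ω\,π$: the $ω\,π$ threshold lies above the nominal $ρ$ mass, so only the tail of the line shape contributes.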
-
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies
Authors:
Jingqi Yang,
Zhilong Song,
Jiawei Chen,
Mingli Song,
Sheng Zhou,
linjun sun,
Xiaogang Ouyang,
Chun Chen,
Can Wang
Abstract:
The development of high-quality datasets is crucial for benchmarking and advancing research in Graphical User Interface (GUI) agents. Despite their importance, existing datasets are often constructed under idealized conditions, overlooking the diverse anomalies frequently encountered in real-world deployments. To address this limitation, we introduce GUI-Robust, a novel dataset designed for comprehensive GUI agent evaluation, explicitly incorporating seven common types of anomalies observed in everyday GUI interactions. Furthermore, we propose a semi-automated dataset construction paradigm that collects user action sequences from natural interactions via RPA tools and then generates corresponding step and task descriptions for these actions with the assistance of MLLMs. This paradigm reduces annotation time cost by a factor of over 19. Finally, we assess state-of-the-art GUI agents using the GUI-Robust dataset, revealing their substantial performance degradation in abnormal scenarios. We anticipate that our work will highlight the importance of robustness in GUI agents and inspire more future research in this direction. The dataset and code are available at https://github.com/chessbean1/GUI-Robust.
Submitted 17 June, 2025;
originally announced June 2025.
-
AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing
Authors:
Biao Yang,
Muqi Huang,
Yuhui Zhang,
Yun Xiong,
Kun Zhou,
Xi Chen,
Shiyang Zhou,
Huishuai Bao,
Chuan Li,
Feng Shi,
Hualei Liu
Abstract:
Traditional point-based image editing methods rely on iterative latent optimization or geometric transformations, which are either inefficient in their processing or fail to capture the semantic relationships within the image. These methods often overlook the powerful yet underutilized image editing capabilities inherent in pre-trained diffusion models. In this work, we propose a novel one-step point-based image editing method, named AttentionDrag, which leverages the inherent latent knowledge and feature correlations within pre-trained diffusion models for image editing tasks. This framework enables semantic consistency and high-quality manipulation without the need for extensive re-optimization or retraining. Specifically, we reuse the latent correlation knowledge learned by the self-attention mechanism in the U-Net module during the DDIM inversion process to automatically identify and adjust relevant image regions, ensuring semantic validity and consistency. Additionally, AttentionDrag adaptively generates masks to guide the editing process, enabling precise and context-aware modifications with friendly interaction. Our results demonstrate a performance that surpasses most state-of-the-art methods with significantly faster speeds, showing a more efficient and semantically coherent solution for point-based image editing tasks.
Submitted 16 June, 2025;
originally announced June 2025.
-
On secure UAV-aided ISCC systems
Authors:
Hongjiang Lei,
Congke Jiang,
Ki-Hong Park,
Mohamed A. Aboulhassan,
Sen Zhou,
Gaofeng Pan
Abstract:
Integrated communication and sensing, which can make full use of the limited spectrum resources to perform communication and sensing tasks simultaneously, is an up-and-coming technology in wireless communication networks. In this work, we investigate the secrecy performance of an uncrewed aerial vehicle (UAV)-assisted secure integrated communication, sensing, and computing system, where the UAV sends radar signals to locate and disrupt potential eavesdroppers while providing offload services to ground users (GUs). Considering the constraints of UAV maximum speed, transmit power, and propulsion energy, as well as secure offloading, data transmission, and computation time, the total energy consumption of GUs is minimized by jointly optimizing user offloading ratio, user scheduling strategy, transmit beamforming, and UAV trajectory. An efficient iterative optimization algorithm is proposed to solve the non-convex optimization problem caused by tightly coupled dependent variables. In particular, the original optimization problem is decomposed into four sub-optimization problems, and the non-convex sub-problems are transformed into approximately convex forms via successive convex approximation. Then, all sub-problems are solved successively by using the block coordinate descent technique. Numerical results demonstrate the convergence and validate the effectiveness of the proposed algorithm.
Submitted 27 June, 2025; v1 submitted 16 June, 2025;
originally announced June 2025.
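The block coordinate descent scheme named in the abstract above alternates between sub-problems, optimizing one block of variables while the others are held fixed. The toy sketch below shows only that alternating structure on a small convex quadratic with closed-form block updates; the objective, the function names, and the starting point are assumptions for illustration, and the paper's actual sub-problems (offloading ratio, scheduling, beamforming, trajectory) are non-convex and are handled there via successive convex approximation, which is not shown.

```python
from typing import Tuple


def objective(x: float, y: float) -> float:
    """Toy coupled objective: f(x, y) = (x - 1)^2 + (y - 2)^2 + x * y."""
    return (x - 1.0) ** 2 + (y - 2.0) ** 2 + x * y


def block_coordinate_descent(x: float = 5.0, y: float = -3.0, iters: int = 50) -> Tuple[float, float]:
    """Alternate exact minimization over x (y fixed) and over y (x fixed)."""
    for _ in range(iters):
        x = 1.0 - y / 2.0  # solves df/dx = 2(x - 1) + y = 0
        y = 2.0 - x / 2.0  # solves df/dy = 2(y - 2) + x = 0
    return x, y


x_opt, y_opt = block_coordinate_descent()
```

For this toy problem each full sweep shrinks the error by a constant factor, so the iterates converge to the joint minimizer at (0, 2); with non-convex blocks, as in the paper, one would generally expect convergence only to a stationary point.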
-
Leveraging MIMIC Datasets for Better Digital Health: A Review on Open Problems, Progress Highlights, and Future Promises
Authors:
Afifa Khaled,
Mohammed Sabir,
Rizwan Qureshi,
Camillo Maria Caruso,
Valerio Guarrasi,
Suncheng Xiang,
S Kevin Zhou
Abstract:
The Medical Information Mart for Intensive Care (MIMIC) datasets have become the kernel of digital health research by providing freely accessible, deidentified records from tens of thousands of critical care admissions, enabling a broad spectrum of applications in clinical decision support, outcome prediction, and healthcare analytics. Although numerous studies and surveys have explored the predictive power and clinical utility of MIMIC-based models, critical challenges in data integration, representation, and interoperability remain underexplored. This paper presents a comprehensive survey that focuses uniquely on open problems. We identify persistent issues such as data granularity, cardinality limitations, heterogeneous coding schemes, and ethical constraints that hinder the generalizability and real-time implementation of machine learning models. We highlight key progress in dimensionality reduction, temporal modelling, causal inference, and privacy-preserving analytics, while also outlining promising directions including hybrid modelling, federated learning, and standardized preprocessing pipelines. By critically examining these structural limitations and their implications, this survey offers actionable insights to guide the next generation of MIMIC-powered digital health innovations.
Submitted 15 June, 2025;
originally announced June 2025.