LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction

Authors: Yixin Yan, Yang Li, Yuanfan Wang, Xiaozhou Zhou, Beihao Xia, Manjiang Hu, Hongmao Qin

Abstract: It has been challenging to model the complex temporal-spatial dependencies between agents for trajectory prediction. As each state of an agent is closely related to the states of adjacent time steps, capturing the local temporal dependency is beneficial for prediction, while most studies often overlook it. Besides, learning the high-order motion state attributes is expected to enhance spatial interaction modeling, but it is rarely seen in previous works. To address this, we propose a lightweight framework, LTMSformer, to extract temporal-spatial interaction features for multi-modal trajectory prediction. Specifically, we introduce a Local Trend-Aware Attention mechanism to capture the local temporal dependency by leveraging a convolutional attention mechanism with hierarchical local time boxes. Next, to model the spatial interaction dependency, we build a Motion State Encoder to incorporate high-order motion state attributes, such as acceleration, jerk, heading, etc. To further refine the trajectory prediction, we propose a Lightweight Proposal Refinement Module that leverages Multi-Layer Perceptrons for trajectory embedding and generates the refined trajectories with fewer model parameters. Experiment results on the Argoverse 1 dataset demonstrate that our method outperforms the baseline HiVT-64, reducing the minADE by approximately 4.35%, the minFDE by 8.74%, and the MR by 20%. We also achieve higher accuracy than HiVT-128 with a 68% reduction in model size.

Submitted 6 July, 2025; originally announced July 2025.

arXiv:2507.03770 [pdf, ps, other]

Efficient streaming dynamic mode decomposition

Authors: Aditya Kale, Marcos Netto, Xinyang Zhou

Abstract: We propose a reformulation of the streaming dynamic mode decomposition method that requires maintaining a single orthonormal basis, thereby reducing computational redundancy. The proposed efficient streaming dynamic mode decomposition method results in a constant-factor reduction in computational complexity and memory storage requirements. Numerical experiments on representative canonical dynamical systems show that the enhanced computational efficiency does not compromise the accuracy of the proposed method.

Submitted 4 July, 2025; originally announced July 2025.

arXiv:2507.03730 [pdf, ps, other]

Less is More: Empowering GUI Agent with Context-Aware Simplification

Authors: Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie

Abstract: The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history contextual modeling in GUI agents and summarize: 1) the high density and loose relation of element context highlight the existence of many unrelated elements and their negative influence; 2) the high redundancy of history context reveals the inefficient history modeling in current GUI agents. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI agent, termed SimpAgent. To mitigate potential interference from numerous unrelated elements, we introduce a masking-based element pruning method that circumvents the intractable relation modeling through an efficient masking mechanism. To reduce the redundancy in historical information, we devise a consistency-guided history compression module, which enhances implicit LLM-based compression through innovative explicit guidance, achieving an optimal balance between performance and efficiency. With the above components, SimpAgent reduces FLOPs by 27% and achieves superior GUI navigation performance. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.

Submitted 4 July, 2025; originally announced July 2025.

Comments: Accepted to ICCV 2025

arXiv:2507.03368 [pdf, ps, other]

doi 10.1103/jyq4-d4gs

Ferroelectric Antiferromagnetic Lifting of Spin-Valley Degeneracy

Authors: Jiaqi Feng, Xiaodong Zhou, Jingyan Chen, Meiling Xu, Xiuxian Yang, Yinwei Li

Abstract: The generation and control of spin- and valley-polarization in antiferromagnets (AFMs) have garnered increasing attention due to their potential for enabling faster and more stable multifunctional spintronic and valleytronic memory and logic devices. However, the two primary categories of AFMs, altermagnets and TP-symmetric AFMs, either lack intrinsic valley-polarization or net spin-polarization. Here, we propose an effective approach for achieving spontaneous spin-valley polarization in TP-broken layered ferroelectric antiferromagnets (FE-AFMs). The FE-AFMs exhibit lifted spin degeneracy across the entire Brillouin zone, along with uncompensated spin density of states. They combine the benefits of spin-polarization in altermagnets with valley-polarization in TP-symmetric AFMs. Furthermore, the FE-AFMs feature layer-dependent spin-polarization, rooted in their intrinsic ferroelectric property, allowing for the flexible control over spin-valley polarization by interlayer sliding. This tunability facilitates sign-reversible and size-tunable valley Hall and Nernst effects, along with other spin-valley-dependent transport properties. Our findings are demonstrated in a broad class of TP-broken bilayer antiferromagnets such as Nb3X8 (X = Cl, Br, I), VX2 (X = S, Se), and VSi2X4 (X = N, P), underscoring the potential of FE-AFMs for advancing next-generation spin- and valley-based information technologies.

Submitted 4 July, 2025; originally announced July 2025.

Comments: 11 pages

Journal ref: Physical Review B 111, 214446 (2025)

arXiv:2507.03043 [pdf, ps, other]

K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function

Authors: Shuhe Li, Chenxu Guo, Jiachen Lian, Cheol Jun Cho, Wenshuo Zhao, Xuanru Zhou, Dingkun Zhou, Sam Wang, Grace Wang, Jingze Yang, Jingyi Xu, Ruohan Bao, Elise Brenner, Brandon In, Francesca Pei, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli

Abstract: Early evaluation of children's language is frustrated by the high pitch, long phones, and sparse data that derail automatic speech recognisers. We introduce K-Function, a unified framework that combines accurate sub-word transcription, objective scoring, and actionable feedback. Its core, Kids-WFST, merges a Wav2Vec2 phoneme encoder with a phoneme-similarity Dysfluent-WFST to capture child-specific errors while remaining fully interpretable. Kids-WFST attains 1.39% phoneme error on MyST and 8.61% on Multitudes--absolute gains of 10.47 and 7.06 points over a greedy-search decoder. These high-fidelity transcripts power an LLM that grades verbal skills, milestones, reading, and comprehension, aligning with human proctors and supplying tongue-and-lip visualizations plus targeted advice. The results show that precise phoneme recognition cements a complete diagnostic-feedback loop, paving the way for scalable, clinician-ready language assessment.

Submitted 3 July, 2025; originally announced July 2025.

arXiv:2507.02860 [pdf, ps, other]

Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching

Authors: Xin Zhou, Dingkang Liang, Kaijin Chen, Tianrui Feng, Xiwu Chen, Hongkai Lin, Yikang Ding, Feiyang Tan, Hengshuang Zhao, Xiang Bai

Abstract: Video generation models have demonstrated remarkable performance, yet their broader adoption remains constrained by slow inference speeds and substantial computational costs, primarily due to the iterative nature of the denoising process. Addressing this bottleneck is essential for democratizing advanced video synthesis technologies and enabling their integration into real-world applications. This work proposes EasyCache, a training-free acceleration framework for video diffusion models. EasyCache introduces a lightweight, runtime-adaptive caching mechanism that dynamically reuses previously computed transformation vectors, avoiding redundant computations during inference. Unlike prior approaches, EasyCache requires no offline profiling, pre-computation, or extensive parameter tuning. We conduct comprehensive studies on various large-scale video generation models, including OpenSora, Wan2.1, and HunyuanVideo. Our method achieves leading acceleration performance, reducing inference time by up to 2.1-3.3$\times$ compared to the original baselines while maintaining high visual fidelity, with a significant PSNR improvement of up to 36% compared to the previous SOTA method. This improvement makes EasyCache an efficient and highly accessible solution for high-quality video generation in both research and practical applications. The code is available at https://github.com/H-EmbodVis/EasyCache.

Submitted 3 July, 2025; originally announced July 2025.

Comments: The code is made available at https://github.com/H-EmbodVis/EasyCache. Project page: https://h-embodvis.github.io/EasyCache/

arXiv:2507.02460 [pdf, ps, other]

Enhancing Photon Indistinguishability of Spectrally Mismatched Single Photons by Cavity Floquet Engineering

Authors: J. W. Yu, X. Q. Zhou, Z. B. Ni, X. T. Cheng, Y. Zhao, H. H. Zhu, C. H. Li, F. Liu, C. Y. Jin

Abstract: We theoretically propose a scheme to enhance the photon indistinguishability of spectrally mismatched single photons via Floquet-engineered optical frequency combs (OFCs) in cavity quantum electrodynamic systems. By periodically modulating two distinct single-photon states under a modulation frequency which is exactly equal to the spectral mismatch of two cavity modes, a pair of single-photon frequency-comb (SPFC) states is prepared energy-conservatively based on full unitary operations. The two states show high indistinguishability with an ideal $g^{(2)}_\mathrm{HOM}(0)$ down to zero due to the superposition of intensity-matched single-photon states coherently distributed across the teeth of the combs.

Submitted 3 July, 2025; originally announced July 2025.

arXiv:2507.02447 [pdf, ps, other]

HAC-LOCO: Learning Hierarchical Active Compliance Control for Quadruped Locomotion under Continuous External Disturbances

Authors: Xiang Zhou, Xinyu Zhang, Qingrui Zhang

Abstract: Despite recent remarkable achievements in quadruped control, it remains challenging to ensure robust and compliant locomotion in the presence of unforeseen external disturbances. Existing methods prioritize locomotion robustness over compliance, often leading to stiff, high-frequency motions and energy inefficiency. This paper, therefore, presents a two-stage hierarchical learning framework that can learn to take active reactions to external force disturbances based on force estimation. In the first stage, a velocity-tracking policy is trained alongside an auto-encoder to distill historical proprioceptive features. A neural network-based estimator is learned through supervised learning, which estimates body velocity and external forces based on proprioceptive measurements. In the second stage, a compliance action module, inspired by impedance control, is learned based on the pre-trained encoder and policy. This module is employed to actively adjust velocity commands in response to external forces based on real-time force estimates. With the compliance action module, a quadruped robot can robustly handle minor disturbances while appropriately yielding to significant forces, thus striking a balance between robustness and compliance. Simulations and real-world experiments have demonstrated that our method has superior performance in terms of robustness, energy efficiency, and safety. Experimental comparisons show that our method outperforms the state-of-the-art RL-based locomotion controllers. Ablation studies are given to show the critical role of the compliance action module.

Submitted 3 July, 2025; originally announced July 2025.

Comments: 8 pages, 7 Figures

arXiv:2507.02332 [pdf, ps, other]

PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage

Authors: Krishna Kanth Nakka, Xue Jiang, Xuebing Zhou

Abstract: This paper investigates privacy jailbreaking in LLMs via steering, focusing on whether manipulating activations can bypass LLM alignment and alter response behaviors to privacy related queries (e.g., a certain public figure's sexual orientation). We begin by identifying attention heads predictive of refusal behavior for private attributes (e.g., sexual orientation) using lightweight linear probes trained with privacy evaluator labels. Next, we steer the activations of a small subset of these attention heads guided by the trained probes to induce the model to generate non-refusal responses. Our experiments show that these steered responses often disclose sensitive attribute details, along with other private information about data subjects such as life events, relationships, and personal histories that the models would typically refuse to produce. Evaluations across four LLMs reveal jailbreaking disclosure rates of at least 95%, with more than 50% on average of these responses revealing true personal information. Our controlled study demonstrates that private information memorized in LLMs can be extracted through targeted manipulation of internal activations.

Submitted 3 July, 2025; originally announced July 2025.

Comments: Preprint

arXiv:2507.02256 [pdf, ps, other]

Uncertainty-aware Reward Design Process

Authors: Yang Yang, Xiaolu Zhou, Bosong Ding, Miao Xin

Abstract: Designing effective reward functions is a cornerstone of reinforcement learning (RL), yet it remains a challenging process due to the inefficiencies and inconsistencies inherent in conventional reward engineering methodologies. Recent advances have explored leveraging large language models (LLMs) to automate reward function design. However, their suboptimal performance in numerical optimization often yields unsatisfactory reward quality, while the evolutionary search paradigm demonstrates inefficient utilization of simulation resources, resulting in prohibitively lengthy design cycles with disproportionate computational overhead. To address these challenges, we propose the Uncertainty-aware Reward Design Process (URDP), a novel framework that integrates large language models to streamline reward function design and evaluation in RL environments. URDP quantifies candidate reward function uncertainty based on self-consistency analysis, enabling simulation-free identification of ineffective reward components while discovering novel reward components. Furthermore, we introduce uncertainty-aware Bayesian optimization (UABO), which incorporates uncertainty estimation to significantly enhance hyperparameter configuration efficiency. Finally, we construct a bi-level optimization architecture by decoupling the reward component optimization and the hyperparameter tuning. URDP orchestrates synergistic collaboration between the reward logic reasoning of the LLMs and the numerical optimization strengths of the Bayesian Optimization. We conduct a comprehensive evaluation of URDP across 35 diverse tasks spanning three benchmark environments. Our experimental results demonstrate that URDP not only generates higher-quality reward functions but also achieves significant improvements in the efficiency of automated reward design compared to existing approaches.

Submitted 2 July, 2025; originally announced July 2025.

Comments: 34 pages, 9 figures

arXiv:2507.02149 [pdf, ps, other]

Observation of orbitally excited $B_{c}^{+}$ states

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An, et al. (1154 additional authors not shown)

Abstract: The observation of a wide peaking structure in the $B_{c}^{+} γ$ mass spectrum is reported using proton-proton collision data collected by the LHCb detector at center-of-mass energies of $7$, $8$ and $13~\text{TeV}$, corresponding to a total integrated luminosity of $9~\text{fb}^{-1}$. The statistical significance over the background-only hypothesis exceeds seven standard deviations. The width of the observed structure is larger than the expectation from a single-peak hypothesis, and is well described by an effective minimal model consisting of two narrow peaks located at $6704.8 \pm 5.5 \pm 2.8 \pm 0.3~\mathrm{Me\kern -0.1em V\!/}c^2$ and $6752.4 \pm 9.5 \pm 3.1 \pm 0.3~\mathrm{Me\kern -0.1em V\!/}c^2$. The uncertainty terms are statistical, systematic, and associated to the knowledge of the $B_{c}^{+}$ mass, respectively. The measured peak locations are in line with theoretical predictions for lowest excited $P$-wave $B_{c}^{+}$ states, marking the first observation of orbitally excited beauty-charm mesons and providing important insights into the internal dynamics of hadrons containing two heavy quarks.

Submitted 4 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/5250 (LHCb public pages)

Report number: LHCb-PAPER-2025-014, CERN-EP-2025-123

arXiv:2507.02142 [pdf, ps, other]

Study of $B_{c}(1P)^{+}$ states in the $B_{c}^{+} γ$ mass spectrum

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An, et al. (1154 additional authors not shown)

Abstract: The study of a wide peaking structure in the $B_{c}^{+} γ$ mass spectrum is reported using a data sample of proton-proton collisions collected by the LHCb detector at center-of-mass energies of $7$, $8$ and $13~\text{TeV}$, corresponding to an integrated luminosity of $9~\text{fb}^{-1}$. The observed structure is consistent with the lowest excited $P$-wave $B_{c}^{+}$ states and exhibits a statistical significance exceeding seven standard deviations relative to the background-only hypothesis. A two-peak model serves as an effective description of the data, with various theory-constrained models further explored to provide physical interpretation. Based on the predictions for the $B_{c}(1P)^{+}$ spectrum, the relative production cross-section of the overall $B_{c}(1P)^{+}$ states with respect to the $B_{c}^{+}$ ground state with the transverse momentum $p_{\text{T}}$ and rapidity $y$ of $B_{c}^{+}$ mesons in the regions $p_{\text{T}}<20~\mathrm{Ge\kern -0.1em V\!/}c$ and $2.0<y<4.5$ at $\sqrt{s}=13~\text{TeV}$ is measured to be $0.20 \pm 0.03 \pm 0.02 \pm 0.03$, where the uncertainty terms represent statistical, systematic, and uncertainties related to the choice of theoretical models, respectively. The results provide a test of theoretical models and deepen our understanding of quantum chromodynamics.

Submitted 4 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/4137 (LHCb public pages)

Report number: LHCb-PAPER-2025-015, CERN-EP-2025-122

arXiv:2507.01616 [pdf, ps, other]

Enhanced Influence-aware Group Recommendation for Online Media Propagation

Authors: Chengkun He, Xiangmin Zhou, Chen Wang, Longbing Cao, Jie Shao, Xiaodong Li, Guang Xu, Carrie Jinqiu Hu, Zahir Tari

Abstract: Group recommendation over social media streams has attracted significant attention due to its wide applications in domains such as e-commerce, entertainment, and online news broadcasting. By leveraging social connections and group behaviours, group recommendation (GR) aims to provide more accurate and engaging content to a set of users rather than individuals. Recently, influence-aware GR has emerged as a promising direction, as it considers the impact of social influence on group decision-making. In earlier work, we proposed Influence-aware Group Recommendation (IGR) to solve this task. However, this task remains challenging due to three key factors: the large and ever-growing scale of social graphs, the inherently dynamic nature of influence propagation within user groups, and the high computational overhead of real-time group-item matching. To tackle these issues, we propose an Enhanced Influence-aware Group Recommendation (EIGR) framework. First, we introduce a Graph Extraction-based Sampling (GES) strategy to minimise redundancy across multiple temporal social graphs and effectively capture the evolving dynamics of both groups and items. Second, we design a novel DYnamic Independent Cascade (DYIC) model to predict how influence propagates over time across social items and user groups. Finally, we develop a two-level hash-based User Group Index (UG-Index) to efficiently organise user groups and enable real-time recommendation generation. Extensive experiments on real-world datasets demonstrate that our proposed framework, EIGR, consistently outperforms state-of-the-art baselines in both effectiveness and efficiency.

Submitted 2 July, 2025; originally announced July 2025.

arXiv:2507.01291 [pdf, ps, other]

PanTS: The Pancreatic Tumor Segmentation Dataset

Authors: Wenxuan Li, Xinze Zhou, Qi Chen, Tianyu Lin, Pedro R. A. S. Bassi, Szymon Plotka, Jaroslaw B. Cwikla, Xiaoxi Chen, Chen Ye, Zheren Zhu, Kai Ding, Heng Li, Kang Wang, Yang Yang, Yucheng Tang, Daguang Xu, Alan L. Yuille, Zongwei Zhou

Abstract: PanTS is a large-scale, multi-institutional dataset curated to advance research in pancreatic CT analysis. It contains 36,390 CT scans from 145 medical centers, with expert-validated, voxel-wise annotations of over 993,000 anatomical structures, covering pancreatic tumors, pancreas head, body, and tail, and 24 surrounding anatomical structures such as vascular/skeletal structures and abdominal/thoracic organs. Each scan includes metadata such as patient age, sex, diagnosis, contrast phase, in-plane spacing, slice thickness, etc. AI models trained on PanTS achieve significantly better performance in pancreatic tumor detection, localization, and segmentation compared to those trained on existing public datasets. Our analysis indicates that these gains are directly attributable to the 16x larger-scale tumor annotations and indirectly supported by the 24 additional surrounding anatomical structures. As the largest and most comprehensive resource of its kind, PanTS offers a new benchmark for developing and evaluating AI models in pancreatic CT analysis.

Submitted 1 July, 2025; originally announced July 2025.

arXiv:2507.00358 [pdf, ps, other]

Data-Driven Exploration for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems

Authors: Yilie Huang, Xun Yu Zhou

Abstract: We study reinforcement learning (RL) for the same class of continuous-time stochastic linear--quadratic (LQ) control problems as in \cite{huang2024sublinear}, where volatilities depend on both states and controls while states are scalar-valued and running control rewards are absent. We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization by the critic and policy variance by the actor. Unlike the constant or deterministic exploration schedules employed in \cite{huang2024sublinear}, which require extensive tuning for implementation and ignore learning progress during iterations, our adaptive exploratory approach boosts learning efficiency with minimal tuning. Despite its flexibility, our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems, which were previously derived only with fixed exploration schedules. Numerical experiments demonstrate that adaptive exploration accelerates convergence and improves regret performance compared to the non-adaptive model-free and model-based counterparts.

Submitted 30 June, 2025; originally announced July 2025.

Comments: 36 pages, 10 figures

arXiv:2506.23623 [pdf, ps, other]

Revisiting Audio-Visual Segmentation with Vision-Centric Transformer

Authors: Shaofei Huang, Rui Ling, Tianrui Hui, Hongyu Li, Xu Zhou, Shifeng Zhang, Si Liu, Richang Hong, Meng Wang

Abstract: Audio-Visual Segmentation (AVS) aims to segment sound-producing objects in video frames based on the associated audio signal. Prevailing AVS methods typically adopt an audio-centric Transformer architecture, where object queries are derived from audio features. However, audio-centric Transformers suffer from two limitations: perception ambiguity caused by the mixed nature of audio, and weakened dense prediction ability due to visual detail loss. To address these limitations, we propose a new Vision-Centric Transformer (VCT) framework that leverages vision-derived queries to iteratively fetch corresponding audio and visual information, enabling queries to better distinguish between different sounding objects from mixed audio and accurately delineate their contours. We also introduce a Prototype Prompted Query Generation (PPQG) module within our VCT framework to generate vision-derived queries that are both semantically aware and visually rich through audio prototype prompting and pixel context grouping, facilitating audio-visual information aggregation. Extensive experiments demonstrate that our VCT framework achieves new state-of-the-art performance on three subsets of the AVSBench dataset. The code is available at https://github.com/spyflying/VCT_AVS.

Submitted 30 June, 2025; originally announced June 2025.

Comments: Accepted by CVPR 2025; Code: https://github.com/spyflying/VCT_AVS; Models: https://huggingface.co/nowherespyfly/VCT_AVS

arXiv:2506.23601 [pdf, ps, other]

Semantic-guided Diverse Decoding for Large Language Model

Authors: Weijie Shi, Yue Cui, Yaguang Wu, Jingzhi Fang, Shibo Zhang, Mengze Li, Sirui Han, Jia Zhu, Jiajie Xu, Xiaofang Zhou

Abstract: Diverse decoding of large language models is crucial for applications requiring multiple semantically distinct responses, yet existing methods primarily achieve lexical rather than semantic diversity. This limitation significantly constrains Best-of-N strategies, group-based reinforcement learning, and data synthesis. While temperature sampling and diverse beam search modify token distributions or apply n-gram penalties, they fail to ensure meaningful semantic differentiation. We introduce Semantic-guided Diverse Decoding (SemDiD), which operates directly in embedding space and balances quality with diversity through three complementary mechanisms: orthogonal directional guidance, dynamic inter-group repulsion, and position-debiased probability assessment. SemDiD harmonizes these competing objectives using adaptive gain functions and constraint optimization, ensuring both quality thresholds and maximal semantic differentiation. Experiments show SemDiD consistently outperforms existing methods, improving Best-of-N coverage by 1.4-5.2% across diverse tasks and accelerating RLHF training convergence by 15% while increasing accuracy by up to 2.1%.

Submitted 30 June, 2025; originally announced June 2025.

arXiv:2506.23445 [pdf]

Topotactic phase transformation in correlated vanadium dioxide through oxygen vacancy ordering

Authors: Xuanchi Zhou, Xiaohui Yao, Xiaomei Qiao, Jiahui Ji, Guowei Zhou, Huihui Ji, Xiaohong Xu

Abstract: Controlling the insulator-metal transition (IMT) in correlated oxide systems through oxygen vacancy ordering opens up a new paradigm for exploring exotic structural transformation and physical functionality. Oxygen vacancy serves as a powerful tuning knob for adjusting the IMT property in VO2, though driving topochemical reduction to V2O3 remains challenging due to structural incompatibility and competing phase instability. Here we unveil a consecutive oxygen-vacancy-driven VO2-VO2-x-V2O3 topotactic phase transformation route with enticing facet-dependent anisotropy, engendering tunable IMT properties over an extended temperature range. Remarkably, topochemically reduced V2O3 inherits the crystallographic characteristics from parent VO2, enabling emergent lattice framework and IMT behavior inaccessible via direct epitaxial growth. Analogous electron doping arising from hydrogenation and oxygen vacancy contributes cooperatively to drive the Mott phase transition in VO2 through band-filling control. Our work not only unveils sequential topotactic phase transformations in VO2 through oxygen vacancy ordering but also provides fundamentally new insights for defect-mediated Mott transitions.

Submitted 29 June, 2025; originally announced June 2025.

arXiv:2506.23322 [pdf, ps, other]

GaussMaster: An LLM-based Database Copilot System

Authors: Wei Zhou, Ji Sun, Xuanhe Zhou, Guoliang Li, Luyang Liu, Hao Wu, Tianyuan Wang

Abstract: In the financial industry, data is the lifeblood of operations, and DBAs shoulder significant responsibilities for SQL tuning, database deployment, diagnosis, and service repair. In recent years, both database vendors and customers have increasingly turned to autonomous database platforms in an effort to alleviate the heavy workload of DBAs. However, existing autonomous database platforms are limited in their capabilities, primarily addressing single-point issues such as NL2SQL, anomaly detection, and SQL tuning. Manual intervention remains a necessity for comprehensive database maintenance. GaussMaster aims to revolutionize this landscape by introducing an LLM-based database copilot system. This innovative solution is designed not only to assist developers in writing efficient SQL queries but also to provide comprehensive care for database services. When database instances exhibit abnormal behavior, GaussMaster is capable of orchestrating the entire maintenance process automatically. It achieves this by analyzing hundreds of metrics and logs, employing a Tree-of-thought approach to identify root causes, and invoking appropriate tools to resolve issues. We have successfully implemented GaussMaster in real-world scenarios, such as the banking industry, where it has achieved zero human intervention for over 34 database maintenance scenarios. In this paper, we present significant improvements in these tasks with code at https://gitcode.com/opengauss/openGauss-GaussMaster.

Submitted 29 June, 2025; originally announced June 2025.

Comments: We welcome contributions from the community. For reference, please see the code at: https://gitcode.com/opengauss/openGauss-GaussMaster

arXiv:2506.23281 [pdf, ps, other]

On the Feasibility of Deduplicating Compiler Bugs with Bisection

Authors: Xintong Zhou, Zhenyang Xu, Chengnian Sun

Abstract: Random testing has proven to be an effective technique for compiler validation. However, the debugging of bugs identified through random testing presents a significant challenge due to the frequent occurrence of duplicate test programs that expose identical compiler bugs. The process to identify duplicates is a practical research problem known as bug deduplication. Prior methodologies for compiler bug deduplication primarily rely on program analysis to extract bug-related features for duplicate identification, which can result in substantial computational overhead and limited generalizability. This paper investigates the feasibility of employing bisection, a standard debugging procedure largely overlooked in prior research on compiler bug deduplication, for this purpose. Our study demonstrates that the utilization of bisection to locate failure-inducing commits provides a valuable criterion for deduplication, albeit one that requires supplementary techniques for more accurate identification. Building on these results, we introduce BugLens, a novel deduplication method that primarily uses bisection, enhanced by the identification of bug-triggering optimizations to minimize false negatives. Empirical evaluations conducted on four real-world datasets demonstrate that BugLens significantly outperforms the state-of-the-art analysis-based methodologies Tamer and D3 by saving an average of 26.98% and 9.64% human effort to identify the same number of distinct bugs. Given the inherent simplicity and generalizability of bisection, it presents a highly practical solution for compiler bug deduplication in real-world applications.

Submitted 29 June, 2025; originally announced June 2025.

arXiv:2506.23046 [pdf, ps, other]

SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions

Authors: Xianzhe Fan, Xuhui Zhou, Chuanyang Jin, Kolby Nottingham, Hao Zhu, Maarten Sap

Abstract: Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which have a significant gap compared to real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.

Submitted 28 June, 2025; originally announced June 2025.

Comments: 23 pages, 6 figures

arXiv:2506.22805 [pdf, ps, other]

The Flexible Accumulation Model for High Density Temporal Exposures

Authors: Xinkai Zhou, Lee Goeddel, Nauder Faraday, Ciprian M. Crainiceanu

Abstract: Emerging technologies enable continuous monitoring of temporal exposures to disease risk factors, leading to complex structures in the exposure process that consists of a subject-specific number and duration of exposure episodes. A key scientific question is how the number and duration of episodic exposures affect disease risk. We address this question by introducing the FLexible Accumulation ModEl (FLAME) and the associated inferential tools, whose finite sample performance is evaluated through comprehensive simulations. FLAME is motivated by and applied to quantifying the association between hypotensive exposure during cardiac surgery and acute kidney injury (AKI). Our results characterize the AKI risk accumulation pattern as a function of hypotensive duration and show that while 60 one-minute episodes are associated with an AKI probability of 0.23, a single sustained sixty-minute hypotensive episode raises that probability to 0.32, a 37% increase despite the same total duration. These results offer direct guidance for improving hemodynamics risk management strategies during intraoperative care. Our method is accompanied by the R package flame.

Submitted 28 June, 2025; originally announced June 2025.

arXiv:2506.22765 [pdf, ps, other]

Timing results of 22 years for PSR J0922+0638

Authors: Peng Liu, Mingyang Wang, Jianping Yuan, Zhonghao Tu, Ang Li, Xia Zhou, Na Wang

Abstract: We conducted a timing analysis of PSR J0922+0638 (B0919+06) using data from the Nanshan 26 m radio telescope and the MeerKAT telescope, spanning from January 2001 to March 2023. During this 22-year period, we discovered a previously unreported small glitch (glitch 1) before the well-known large glitch (glitch 2), occurring at ${\rm MJD} \sim 53325(3)$, with a frequency jump amplitude of $Δν/ν\sim 0.79(6) \times 10^{-9}$. We also identified ten slow glitch events, half of which were newly detected. These slow glitches occurred quasi-periodically, with an average interval of approximately 553(21) days, fractional frequency changes ranging from $Δν/ν\sim 1.13(1) \times 10^{-9}$ to $4.08(5) \times 10^{-9}$, and a maximum fractional change in the first derivative of the frequency of $Δ\dotν/\dotν \sim -4.6 \times 10^{-3}$. Additionally, our timing noise analysis reveals a change in the spectral index for noise power before and after glitch 2, with values of $-6.0$ and $-5.3$, respectively, likely due to this large glitch. Throughout the entire observation period, the first derivative of the spin frequency ($\dotν$) showed a periodic structure. The possible modulation period was estimated to be 537(24) days before the 700-day data gap at MJD 56716 and 600(58) days afterward. We discuss the periodic oscillations in pulsar rotation as a possible manifestation of spin-down noise and quasi-periodic slow glitches.

Submitted 28 June, 2025; originally announced June 2025.

Comments: 14 pages, 8 figures, 4 tables, ApJ (2025) accepted

arXiv:2506.22756 [pdf, ps, other]

RoboPearls: Editable Video Simulation for Robot Manipulation

Authors: Tao Tang, Likui Zhang, Youpeng Wen, Kaidong Zhang, Jia-Wang Bian, xia zhou, Tianyi Yan, Kun Zhan, Peng Jia, Hefeng Wu, Liang Lin, Xiaodan Liang

Abstract: The development of generalist robot manipulation policies has seen significant progress, driven by large-scale demonstration data across diverse environments. However, the high cost and inefficiency of collecting real-world demonstrations hinder the scalability of data acquisition. While existing simulation platforms enable controlled environments for robotic learning, the challenge of bridging the sim-to-real gap remains. To address these challenges, we propose RoboPearls, an editable video simulation framework for robotic manipulation. Built on 3D Gaussian Splatting (3DGS), RoboPearls enables the construction of photo-realistic, view-consistent simulations from demonstration videos, and supports a wide range of simulation operators, including various object manipulations, powered by advanced modules like Incremental Semantic Distillation (ISD) and 3D regularized NNFM Loss (3D-NNFM). Moreover, by incorporating large language models (LLMs), RoboPearls automates the simulation production process in a user-friendly manner through flexible command interpretation and execution. Furthermore, RoboPearls employs a vision-language model (VLM) to analyze robotic learning issues to close the simulation loop for performance enhancement. To demonstrate the effectiveness of RoboPearls, we conduct extensive experiments on multiple datasets and scenes, including RLBench, COLOSSEUM, Ego4D, Open X-Embodiment, and a real-world robot, which demonstrate our satisfactory simulation performance.

Submitted 28 June, 2025; originally announced June 2025.

Comments: ICCV 2025

arXiv:2506.22448 [pdf, ps, other]

Unsupervised Learning-Based Joint Resource Allocation and Beamforming Design for RIS-Assisted MISO-OFDMA Systems

Authors: Yu Ma, Xingyu Zhou, Xiao Li, Le Liang, Shi Jin

Abstract: Reconfigurable intelligent surfaces (RIS) are key enablers for 6G wireless systems. This paper studies downlink transmission in an RIS-assisted MISO-OFDMA system, addressing resource allocation challenges. A two-stage unsupervised learning-based framework is proposed to jointly design RIS phase shifts, BS beamforming, and resource block (RB) allocation. The framework includes BeamNet, which predicts RIS phase shifts from CSI, and AllocationNet, which allocates RBs using equivalent CSI derived from BeamNet outputs. Active beamforming is implemented via maximum ratio transmission and water-filling. To handle discrete constraints while ensuring differentiability, quantization and the Gumbel-softmax trick are adopted. A customized loss and phased training enhance performance under QoS constraints. Simulations show the method achieves 99.93% of the sum rate of the SCA baseline with only 0.036% of its runtime, and it remains robust across varying channel and user conditions.

Submitted 12 June, 2025; originally announced June 2025.

Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

arXiv:2506.22202 [pdf, ps, other]

Ultra-small Mode Volume Polariton Condensation via Precision $He^+$ Ion Implantation

Authors: Y. C. Balas, X. Zhou, E. Cherotchenko, I. Kuznetsov, S. K. Rajendran, G. G. Paschos, A. V. Trifonov, A. Nalitov, H. Ohadi, P. G. Savvidis

Abstract: We present a novel method for generating potential landscapes in GaAs microcavities through focused $He^{+}$ implantation. The ion beam imprints micron-scale patterns of non-radiative centers that deplete the exciton reservoir and form a loss-defined potential minimum. Under non-resonant pumping, the resulting traps have a lateral size $\le 1.2 ~\mathrm{μm}$ and a three-dimensional mode volume of only $\approx 0.6 ~ \mathrm{μm^3}$, small enough to support a single polariton condensate mode. The implantation process maintains strong coupling and provides lithographic ($ < 300 ~ \mathrm{nm}$) resolution. These loss-engineered traps effectively overcome the micrometer-scale limitations of conventional microcavity patterning techniques, opening new avenues for device development and polariton research within the quantum regime.

Submitted 1 July, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

arXiv:2506.22175 [pdf, ps, other]

doi 10.1109/IPDPS54959.2023.00026

MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism

Authors: Zheng Zhang, Donglin Yang, Yaqi Xia, Liang Ding, Dacheng Tao, Xiaobo Zhou, Dazhao Cheng

Abstract: Recently, Mixture-of-Experts (MoE) has become one of the most popular techniques to scale pre-trained models to extraordinarily large sizes. Dynamic activation of experts allows for conditional computation, increasing the number of parameters of neural networks, which is critical for absorbing the vast amounts of knowledge available in many deep learning areas. However, despite the existing system and algorithm optimizations, there are significant challenges to be tackled when it comes to the inefficiencies of communication and memory consumption. In this paper, we present the design and implementation of MPipeMoE, a high-performance library that accelerates MoE training with adaptive and memory-efficient pipeline parallelism. Inspired by the observation that the MoE training procedure can be divided into multiple independent sub-stages, we design adaptive pipeline parallelism with an online algorithm to configure the granularity of the pipelining. Further, we analyze the memory footprint breakdown of MoE training and identify that activations and temporary buffers are the primary contributors to the overall memory footprint. Toward memory efficiency, we propose memory reusing strategies to reduce memory requirements by eliminating memory redundancies, and develop an adaptive selection component to determine the optimal strategy that considers both hardware capacities and model characteristics at runtime. We implement MPipeMoE upon PyTorch and evaluate it with common MoE models in a physical cluster consisting of 8 NVIDIA DGX A100 servers. Compared with the state-of-the-art approach, MPipeMoE achieves up to 2.8x speedup and reduces memory footprint by up to 47% in training large models.

Submitted 27 June, 2025; originally announced June 2025.

Comments: 11 pages, accepted at IPDPS 2023

Journal ref: 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 167-177. IEEE, 2023

arXiv:2506.22169 [pdf, ps, other]

doi 10.1109/SC41406.2024.00040

MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators

Authors: Zheng Zhang, Donglin Yang, Xiaobo Zhou, Dazhao Cheng

Abstract: Operator fusion, a key technique to improve data locality and alleviate GPU memory bandwidth pressure, often fails to extend to the fusion of multiple compute-intensive operators due to saturated computation throughput. However, the dynamicity of tensor dimension sizes could potentially lead to these operators becoming memory-bound, necessitating the generation of fused kernels, a task hindered by limited search spaces for fusion strategies, redundant memory access, and prolonged tuning time, leading to sub-optimal performance and inefficient deployment. We introduce MCFuser, a pioneering framework designed to overcome these obstacles by generating high-performance fused kernels for what we define as memory-bound compute-intensive (MBCI) operator chains. Leveraging high-level tiling expressions to delineate a comprehensive search space, coupled with Directed Acyclic Graph (DAG) analysis to eliminate redundant memory accesses, MCFuser streamlines kernel optimization. By implementing guidelines to prune the search space and incorporating an analytical performance model with a heuristic search, MCFuser not only significantly accelerates the tuning process but also demonstrates superior performance. Benchmarked against leading compilers like Ansor on NVIDIA A100 and RTX3080 GPUs, MCFuser achieves up to a 5.9x speedup in kernel performance and outpaces other baselines while reducing tuning time by over 70-fold, showcasing its agility.

Submitted 27 June, 2025; originally announced June 2025.

Comments: 12 pages, accepted at SC 2024

Journal ref: SC24: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2024

arXiv:2506.22090 [pdf, ps, other]

Updated measurement of $CP$ violation and polarisation in $B^0_s \rightarrow J/ψ\overline{K}{}^{*}\kern-1pt(892)^{0}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, R. Aleksiejunas, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1168 additional authors not shown)

Abstract: A time-integrated angular analysis of the decay $B^0_s \rightarrow J/ψ\overline{K}{}^{*}\kern-1pt(892)^{0}$, with $J/ψ\rightarrow μ^{+} μ^{-}$ and $\overline{K}{}^{*}\kern-1pt(892)^{0} \rightarrow K^{-} π^{+}$, is presented. The analysis employs a sample of proton-proton collision data collected by the LHCb experiment during 2015-2018 at a centre-of-mass energy of $13 \text{TeV}$, corresponding to an integrated luminosity of $6 \text{fb}^{-1}$. A simultaneous maximum-likelihood fit is performed to the angular distributions in bins of the $K^{-} π^{+}$ mass. This fit yields measurements of the $CP$-averaged polarisation fractions and $CP$ asymmetries for the P-wave component of the $K^{-} π^{+}$ system. The longitudinal and parallel polarisation fractions are determined to be $f_{0} = 0.534 \pm 0.012 \pm 0.009$ and $f_{\parallel} = 0.211 \pm 0.014 \pm 0.005$, respectively, where the first uncertainty is statistical and the second is systematic. The $CP$ asymmetries are measured with $3$-$7\%$ precision and are found to be consistent with zero. These measurements, along with an updated determination of the branching fraction relative to the $B^0 \rightarrow J/ψK^{*0}$ decay, are combined with previous LHCb results, providing the most precise values for these observables to date.

Submitted 27 June, 2025; originally announced June 2025.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/4457/ (LHCb public pages)

Report number: LHCb-PAPER-2025-020, CERN-EP-2025-131

arXiv:2506.21875 [pdf, ps, other]

WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation

Authors: Jian Zhang, Linhao Zhang, Bokai Lei, Chuhan Wu, Wei Jia, Xiao Zhou

Abstract: Recent multi-modal Large Language Models (LLMs) such as GPT-4o have demonstrated strong capabilities of direct speech interaction. However, the lack of specialized and comprehensive benchmarks for end-to-end speech LLM evaluation hinders optimizing the user experience of Audio LLMs in real-world applications. Existing evaluation methods often adapt text-based benchmarks, overlooking speech's unique characteristics and challenges, including prosody, homophones, stuttering, and differing user expectations. Here, we present a novel approach to thoroughly evaluate LLMs in practical speech conversations. We systematically curate real-world chat data relevant to spoken scenarios, introduce diversity in speaker attributes and acoustic conditions, and augment the dataset with speech-specific phenomena. We further design a query-aware evaluation method to use customized evaluation checklists and prompts to enhance the accuracy of automatic evaluation. We conduct comprehensive testing and detailed analysis of various mainstream speech models, revealing significant differences in model performance across different speech scenarios. The use of query-aware evaluation further enables a finer-grained assessment under various speech-specific scenarios. Our benchmark can provide valuable insights for speech model development and evaluation.

Submitted 26 June, 2025; originally announced June 2025.

arXiv:2506.21619 [pdf, other]

IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech

Authors: Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, Jingchen Shu

Abstract: Large-scale text-to-speech (TTS) models are typically categorized into autoregressive and non-autoregressive systems. Although autoregressive systems exhibit certain advantages in speech naturalness, their token-by-token generation mechanism makes it difficult to precisely control the duration of synthesized speech. This is a key limitation in applications such as video dubbing that require strict audio-visual synchronization. This paper introduces IndexTTS2, which proposes a novel and autoregressive-model-friendly method for speech duration control. The method supports two generation modes: one allows explicit specification of the number of generated tokens for precise duration control; the other does not require manual input and lets the model freely generate speech while preserving prosodic characteristics from the input prompt. Furthermore, IndexTTS2 achieves disentanglement between emotional expression and speaker identity, enabling independent control of timbre and emotion. In the zero-shot setting, the model can perfectly reproduce the emotional characteristics of the input prompt. Users may also provide a separate emotion prompt, even from a different speaker, allowing the model to reconstruct the target timbre while conveying the desired emotion. To enhance clarity during strong emotional expressions, we incorporate GPT latent representations to improve speech stability. Meanwhile, to lower the barrier for emotion control, we design a soft instruction mechanism based on textual descriptions by fine-tuning Qwen3. This enables effective guidance of speech generation with desired emotional tendencies using natural language input. Experimental results demonstrate that IndexTTS2 outperforms existing state-of-the-art zero-shot TTS models in word error rate, speaker similarity, and emotional fidelity.

Submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.21550 [pdf, ps, other]

mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale

Authors: Xiaona Zhou, Constantin Brif, Ismini Lourentzou

Abstract: Multivariate time series anomaly detection (MTS-AD) is critical in domains like healthcare, cybersecurity, and industrial monitoring, yet remains challenging due to complex inter-variable dependencies, temporal dynamics, and sparse anomaly labels. We introduce mTSBench, the largest benchmark to date for MTS-AD and unsupervised model selection, spanning 344 labeled time series across 19 datasets and 12 diverse application domains. mTSBench evaluates 24 anomaly detection methods, including large language model (LLM)-based detectors for multivariate time series, and systematically benchmarks unsupervised model selection techniques under standardized conditions. Consistent with prior findings, our results confirm that no single detector excels across datasets, underscoring the importance of model selection. However, even state-of-the-art selection methods remain far from optimal, revealing critical gaps. mTSBench provides a unified evaluation suite to enable rigorous, reproducible comparisons and catalyze future advances in adaptive anomaly detection and robust model selection.

Submitted 26 June, 2025; originally announced June 2025.

arXiv:2506.21110 [pdf, ps, other]

Orthogonality conditions for convex regression

Authors: Sheng Dai, Timo Kuosmanen, Xun Zhou

Abstract: Econometric identification generally relies on orthogonality conditions, which usually state that the random error term is uncorrelated with the explanatory variables. In convex regression, the orthogonality conditions for identification are unknown. Applying Lagrangian duality theory, we establish the sample orthogonality conditions for convex regression, including additive and multiplicative formulations of the regression model, with and without monotonicity and homogeneity constraints. We then propose a hybrid instrumental variable control function approach to mitigate the impact of potential endogeneity in convex regression. The superiority of the proposed approach is shown in a Monte Carlo study and examined in an empirical application to Chilean manufacturing data.

Submitted 26 June, 2025; originally announced June 2025.

arXiv:2506.20966 [pdf, ps, other]

Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends

Authors: Tian-Yu Xiang, Ao-Qun Jin, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Sheng-Bin Duan, Fu-Chao Xie, Wen-Kai Wang, Si-Cheng Wang, Ling-Yun Li, Tian Tu, Zeng-Guang Hou

Abstract: Vision-language-action (VLA) models extend vision-language models (VLM) by integrating action generation modules for robotic manipulation. Leveraging strengths of VLM in vision perception and instruction understanding, VLA models exhibit promising generalization across diverse manipulation tasks. However, applications demanding high precision and accuracy reveal performance gaps without further adaptation. Evidence from multiple domains highlights the critical role of post-training to align foundational models with downstream applications, spurring extensive research on post-training VLA models. VLA model post-training aims to address the challenge of improving an embodiment's ability to interact with the environment for the given tasks, analogous to the process of human motor skill acquisition. Accordingly, this paper reviews post-training strategies for VLA models through the lens of human motor learning, focusing on three dimensions: environments, embodiments, and tasks. A structured taxonomy is introduced aligned with human learning mechanisms: (1) enhancing environmental perception, (2) improving embodiment awareness, (3) deepening task comprehension, and (4) multi-component integration. Finally, key challenges and trends in post-training VLA models are identified, establishing a conceptual framework to guide future research. This work delivers both a comprehensive overview of current VLA model post-training methods from a human motor learning perspective and practical insights for VLA model development. (Project website: https://github.com/AoqunJin/Awesome-VLA-Post-Training)

Submitted 25 June, 2025; originally announced June 2025.

arXiv:2506.20963 [pdf, ps, other]

EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora

Authors: Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qintian Guo, Zhixun Li, Wensheng Luo, Di Jiang, Yixiang Fang, Xiaofang Zhou

Abstract: Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus. However, existing approaches typically assume a static corpus, requiring expensive full-graph reconstruction whenever new documents arrive, limiting their scalability in dynamic, evolving environments. To address these limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates. Our method leverages hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the original corpus into hierarchical graph structures, enabling efficient and localized insertions of new data without disrupting the existing topology. The design eliminates the need for retraining or costly recomputation while preserving high retrieval accuracy and low latency. Experiments on large-scale benchmarks demonstrate that EraRAG achieves up to an order of magnitude reduction in update time and token consumption compared to existing Graph-RAG systems, while providing superior accuracy. This work offers a practical path forward for RAG systems that must operate over continually growing corpora, bridging the gap between retrieval efficiency and adaptability. Our code and data are available at https://github.com/EverM0re/EraRAG-Official.

Submitted 3 July, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

Comments: Under review

arXiv:2506.20954 [pdf, ps, other]

Cooperative Circumnavigation for Multi-Quadrotor Systems via Onboard Sensing

Authors: Xueming Liu, Lin Li, Xiang Zhou, Qingrui Zhang, Tianjiang Hu

Abstract: A cooperative circumnavigation framework is proposed for multi-quadrotor systems to enclose and track a moving target without reliance on external localization systems. The distinct relationships between quadrotor-quadrotor and quadrotor-target interactions are evaluated using a heterogeneous perception strategy and corresponding state estimation algorithms. A modified Kalman filter is developed to fuse visual-inertial odometry with range measurements to enhance the accuracy of inter-quadrotor relative localization. An event-triggered distributed Kalman filter is designed to achieve robust target state estimation under visual occlusion by incorporating neighbor measurements and estimated inter-quadrotor relative positions. Using the estimation results, a cooperative circumnavigation controller is constructed, leveraging an oscillator-based autonomous formation flight strategy. We conduct extensive indoor and outdoor experiments to validate the efficiency of the proposed circumnavigation framework in occluded environments. Furthermore, a quadrotor failure experiment highlights the inherent fault tolerance property of the proposed framework, underscoring its potential for deployment in search-and-rescue operations.

Submitted 25 June, 2025; originally announced June 2025.

Comments: 8 Pages, 7 figures. Accepted by RA-L

arXiv:2506.20430 [pdf, ps, other]

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

Authors: Weike Zhao, Chaoyi Wu, Yanjie Fan, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie

Abstract: Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM), capable of processing heterogeneous clinical inputs. The system generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning that links intermediate analytic steps to verifiable medical evidence. DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks integrating over 40 specialized tools and web-scale, up-to-date medical knowledge sources, ensuring access to the most current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. We evaluate DeepRare on eight datasets. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases. In HPO-based evaluations, DeepRare significantly outperforms 15 other methods, such as traditional bioinformatics diagnostic tools, LLMs, and other agentic systems, achieving an average Recall@1 score of 57.18% and surpassing the second-best method (Reasoning LLM) by a substantial margin of 23.79 percentage points. For multi-modal input scenarios, DeepRare achieves 70.60% at Recall@1 compared to Exomiser's 53.20% in 109 cases. Manual verification of reasoning chains by clinical experts achieves 95.40% agreement. Furthermore, the DeepRare system has been implemented as a user-friendly web application at http://raredx.cn/doctor.

Submitted 25 June, 2025; originally announced June 2025.

arXiv:2506.20219 [pdf, ps, other]

Integrated optomechanical ultrasonic sensors with nano-Pascal-level sensitivity

Authors: Xuening Cao, Hao Yang, Min Wang, Zhi-Gang Hu, Zu-Lei Wu, Yuanlei Wang, Jian-Fei Liu, Xin Zhou, Jincheng Li, Chenghao Lao, Qi-Fan Yang, Bei-Bei Li

Abstract: Ultrasonic sensors are widely used for object detection and localization in underwater and biological settings. The operational range and spatial resolution are inherently limited by sensor sensitivity, in which conventional piezoelectric transducers have been overwhelmed by advanced photonic sensors. Here, we demonstrate an optomechanical ultrasonic sensor integrated into a photonic platform, which comprises a suspended SiO2 membrane embedded with a high-Q Si3N4 microring resonator. By exploiting simultaneous optical and mechanical resonances, the sensor achieves a record low noise-equivalent pressure (NEP) of 218 nPa/Hz^1/2 at 289 kHz in air and 9.6 nPa/Hz^1/2 at 52 kHz in water. We demonstrate its versatility through photoacoustic gas spectroscopy in air and underwater ultrasound imaging, achieving a minimum detectable C2H2 concentration of 2.9 ppm (integration time 1 s) and an imaging resolution of 1.89 mm, respectively. Our work represents a significant advancement in compact CMOS-compatible ultrasound sensing, unlocking new possibilities in biomedical imaging, environmental monitoring, industrial testing, and underwater communications.

Submitted 25 June, 2025; originally announced June 2025.

arXiv:2506.19699 [pdf, ps, other]

UniTac-NV: A Unified Tactile Representation For Non-Vision-Based Tactile Sensors

Authors: Jian Hou, Xin Zhou, Qihan Yang, Adam J. Spiers

Abstract: Generalizable algorithms for tactile sensing remain underexplored, primarily due to the diversity of sensor modalities. Recently, many methods for cross-sensor transfer between optical (vision-based) tactile sensors have been investigated, yet little work focuses on non-optical tactile sensors. To address this gap, we propose an encoder-decoder architecture to unify tactile data across non-vision-based sensors. By leveraging sensor-specific encoders, the framework creates a latent space that is sensor-agnostic, enabling cross-sensor data transfer with low errors and direct use in downstream applications. We leverage this network to unify tactile data from two commercial tactile sensors: the Xela uSkin uSPa 46 and the Contactile PapillArray. Both were mounted on a UR5e robotic arm, performing force-controlled pressing sequences against distinct object shapes (circular, square, and hexagonal prisms) and two materials (rigid PLA and flexible TPU). Another more complex unseen object was also included to investigate the model's generalization capabilities. We show that alignment in latent space can be implicitly learned from joint autoencoder training with matching contacts collected via different sensors. We further demonstrate the practical utility of our approach through contact geometry estimation, where downstream models trained on one sensor's latent representation can be directly applied to another without retraining.

Submitted 24 June, 2025; originally announced June 2025.

Comments: 7 pages, 8 figures. Accepted version to appear in: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2506.19287 [pdf, ps, other]

Generating and Understanding Tests via Path-Aware Symbolic Execution with LLMs

Authors: Yaoxuan Wu, Xiaojie Zhou, Ahmad Humayun, Muhammad Ali Gulzar, Miryung Kim

Abstract: Symbolic execution is a widely used technique for test generation, offering systematic exploration of program paths through constraint solving. However, it is fundamentally constrained by the capability to model the target code, including library functions, in terms of symbolic constraints and by the capability of the underlying constraint solvers. As a result, many paths involving complex features remain unanalyzed or insufficiently modeled. Recent advances in large language models (LLMs) have shown promise in generating diverse and valid test inputs. Yet, LLMs lack mechanisms for systematically enumerating program paths and often fail to cover subtle corner cases. We observe that directly prompting an LLM with the full program leads to missed coverage of interesting paths. In this paper, we present PALM, a test generation system that combines symbolic path enumeration with LLM-assisted test generation. PALM statically enumerates possible paths through AST-level analysis and transforms each into an executable variant with embedded assertions that specify the target path. This avoids the need to translate path constraints into SMT formulae, by instead constructing program variants that an LLM can interpret. Importantly, PALM is the first to provide an interactive frontend that visualizes path coverage alongside generated tests, assembling tests based on the specific paths they exercise. A user study with 12 participants demonstrates that PALM's frontend helps users better understand path coverage and identify which paths are actually exercised by PALM-generated tests, through verification and visualization of their path profiles.

Submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.19220 [pdf, ps, other]

Private Model Personalization Revisited

Authors: Conor Snedeker, Xinyu Zhou, Raef Bassily

Abstract: We study model personalization under user-level differential privacy (DP) in the shared representation framework. In this problem, there are $n$ users whose data is statistically heterogeneous, and their optimal parameters share an unknown embedding $U^* \in\mathbb{R}^{d\times k}$ that maps the user parameters in $\mathbb{R}^d$ to low-dimensional representations in $\mathbb{R}^k$, where $k\ll d$. Our goal is to privately recover the shared embedding and the local low-dimensional representations with small excess risk in the federated setting. We propose a private, efficient federated learning algorithm to learn the shared embedding based on the FedRep algorithm in [CHM+21]. Unlike [CHM+21], our algorithm satisfies differential privacy, and our results hold for the case of noisy labels. In contrast to prior work on private model personalization [JRS+21], our utility guarantees hold under a larger class of users' distributions (sub-Gaussian instead of Gaussian distributions). Additionally, in natural parameter regimes, we improve the privacy error term in [JRS+21] by a factor of $\widetilde{O}(dk)$. Next, we consider the binary classification setting. We present an information-theoretic construction to privately learn the shared embedding and derive a margin-based accuracy guarantee that is independent of $d$. Our method utilizes the Johnson-Lindenstrauss transform to reduce the effective dimensions of the shared embedding and the users' data. This result shows that dimension-independent risk bounds are possible in this setting under a margin loss.

Submitted 23 June, 2025; originally announced June 2025.

Comments: ICML 2025

arXiv:2506.19180 [pdf, ps, other]

Precise Measurement of the $Λ$ Electric Dipole Moment through the Entangled Strange Baryon-Antibaryon System

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (696 additional authors not shown)

Abstract: The dominance of matter over antimatter in the universe has consistently driven the pursuit of new physics beyond the Standard Model that violates charge-parity symmetry. Unlike the well-constrained electrons and neutrons, strange baryons (hyperons) remain a largely unexplored territory, in which interactions between hyperons and particles from new physics could induce a non-trivial electric dipole moment (EDM). However, direct measurements of hyperon EDMs through spin precession are highly challenging due to their short lifetimes. In this paper, we present a novel method to extract the EDM of the lightest hyperon, $Λ$, using the entangled $Λ\overline{Λ}$ system. Our result is consistent with zero, achieving a three-order-of-magnitude improvement over the previous upper limit established in the 1980s with comparable statistics, providing stringent constraints on potential new physics.

Submitted 28 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.18544 [pdf, ps, other]

Normality Prior Guided Multi-Semantic Fusion Network for Unsupervised Image Anomaly Detection

Authors: Muhao Xu, Xueying Zhou, Xizhan Gao, Weiye Song, Guang Feng, Sijie Niu

Abstract: Recently, detecting logical anomalies has become a more challenging task than detecting structural ones. Existing encoder-decoder-based methods typically compress inputs into low-dimensional bottlenecks on the assumption that the compression process can effectively suppress the transmission of logical anomalies to the decoder. However, logical anomalies present a particular difficulty because, while their local features often resemble normal semantics, their global semantics deviate significantly from normal patterns. Thanks to the generalisation capabilities inherent in neural networks, these abnormal semantic features can propagate through low-dimensional bottlenecks. This ultimately allows the decoder to reconstruct anomalous images with misleading fidelity. To tackle the above challenge, we propose a novel normality prior guided multi-semantic fusion network for unsupervised anomaly detection. Instead of feeding the compressed bottlenecks to the decoder directly, we introduce the multi-semantic features of normal samples into the reconstruction process. To this end, we first extract abstract global semantics of normal cases by a pre-trained vision-language network, then the learnable semantic codebooks are constructed to store representative feature vectors of normal samples by vector quantisation. Finally, the above multi-semantic features are fused and employed as input to the decoder to guide the reconstruction of anomalies to approximate normality. Extensive experiments are conducted to validate the effectiveness of our proposed method, and it achieves the SOTA performance on the MVTec LOCO AD dataset with improvements of 5.7% in pixel-sPRO and 2.6% in image-AUROC. The source code is available at https://github.com/Xmh-L/NPGMF.

Submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.18502 [pdf, ps, other]

Electromagnetic Proximity Effect: Superconducting Magnonics and Beyond

Authors: Tao Yu, Xi-Han Zhou, Gerrit E. W. Bauer, Irina Bobkova

Abstract: The exchange interaction at interfaces between superconductors (SCs) and ferromagnets (FMs) has been a central topic in condensed matter physics for many decades, starting with the prediction of exotic phases such as the Fulde-Ferrell-Larkin-Ovchinnikov states and leading to the discovery of triplet superconductivity. This review focuses on new phenomena in SC$|$FM heterostructures caused by the \textit{non-contact dipolar interaction} between magnons, i.e., the quanta of spin wave excitations in the ferromagnet, and the superconducting order. A universal non-relativistic spin-orbit coupling locks the polarization and momentum of their evanescent stray magnetic fields and leads to chiral screening by proximate superconductors. The interaction-induced hybrid quasiparticles are magnon-Meissner collective modes, magnon-cooparon, Josephson plasmonic modes, and nodal magnon-photon polaritons. Superconducting and normal metallic gates modulate and control the magnetodipolar interaction and thereby magnetization and energy transport at interfaces and in thin films.

Submitted 23 June, 2025; originally announced June 2025.

Comments: 108 pages, 45 figures

arXiv:2506.18464 [pdf, ps, other]

Probing universal phase diagram of dimensional crossover with an atomic quantum simulator

Authors: Jinyuan Tian, Zhongcheng Yu, Jing Liu, Chi-Kin Lai, Lorenzo Pizzino, Chengyang Wu, Hongmian Shui, Thierry Giamarchi, Hepeng Yao, Xiaoji Zhou

Abstract: Dimensionality is a fundamental concept in physics, which plays a hidden but crucial role in various domains, including condensed matter physics, relativity and string theory, statistical physics, etc. In quantum physics, reducing dimensionality usually enhances fluctuations and leads to novel properties. Owing to these effects, quantum simulators in which dimensionality can be controlled have emerged as a new area of interest. However, such a platform has only been studied in specific regimes and a universal phase diagram is lacking. Here, we produce an interacting atomic quantum simulator with continuous tunability of anisotropy and temperature, and probe the universal phase diagram of dimensional crossover. At low temperatures, we identify the regimes from quantum three to zero dimensions. By increasing temperature, we observe the non-trivial emergence of a thermal regime situated between the quantum zero and integer dimensions. We show that the quantum-to-thermal transition falls into four different universality classes depending on the dimensionality. Surprisingly, we also detect a fifth type where the high-dimensional quantum system can reach the thermal phase by crossing a low-dimensional quantum regime. Our results provide a crucial foundation for understanding the projective condensed matter structures in unconventional dimensions.

Submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.18363 [pdf, ps, other]

Probing d-wave superconducting gap of high-$T_\mathrm{c}$ cuprate $\mathrm{Bi}_2\mathrm{Sr}_2\mathrm{Ca}_2\mathrm{Cu}_3\mathrm{O}_{10+δ}$ by resonant inelastic X-ray scattering

Authors: Kunhao Li, Qizhi Li, Changwei Zou, Jaewon Choi, Chaohui Yin, Mirian Garcia-Fernandez, Stefano Agrestini, Shilong Zhang, Chengtian Lin, Xingjiang Zhou, Ke-Jin Zhou, Yi Lu, Yingying Peng

Abstract: The superconducting gap is a characteristic feature of high-T$_c$ superconductors and provides crucial information on the pairing mechanism underlying high-temperature superconductivity. Here, we employ high-resolution resonant inelastic X-ray scattering (RIXS) at the Cu $L_3$-edge to investigate the superconducting gap in the overdoped cuprate $\mathrm{Bi}_2\mathrm{Sr}_2\mathrm{Ca}_2\mathrm{Cu}_3\mathrm{O}_{10+δ}$ ($T_\mathrm{c}$ = 107 K). By analyzing antisymmetrized, temperature-dependent RIXS spectra over a range of in-plane momentum transfers, we observe a clear suppression of low-energy spectral weight below T$_c$, indicative of superconducting gap formation. This suppression is most pronounced at small momentum transfers ($|\boldsymbol{q}_\parallel| \leq 0.18$ r.l.u.) and corresponds to a gap size of approximately 2$Δ_0 \sim$ 130 meV. Comparison with theoretical calculations of the momentum-dependent charge susceptibility supports a d-wave symmetry of the superconducting gap, while an isotropic s-wave gap fails to reproduce key experimental features. These findings establish RIXS as a powerful, bulk-sensitive probe of superconducting gap symmetry and highlight its utility for studying materials beyond the reach of surface-sensitive techniques such as ARPES and STM.

Submitted 23 June, 2025; originally announced June 2025.

Comments: 11 pages, 10 figures

arXiv:2506.18298 [pdf, ps, other]

Thermalization of Quantum Many-Body Scars in Kinetically Constrained Systems

Authors: Jia-wei Wang, Xiang-Fa Zhou, Guang-Can Guo, Zheng-Wei Zhou

Abstract: The phenomenon of quantum many-body scars (QMBS) has been studied both theoretically and experimentally, due to its unusual violation of the eigenstate thermalization hypothesis (ETH). In this paper, we extend the ETH to a new description based on the grand canonical ensemble to depict the thermal properties of QMBS models. For this purpose, we embed the dynamics of kinetically constrained systems within the Lindblad-like master equation, and demonstrate that the violation of the ETH by scar eigenstates is related to their slow decay in the corresponding dissipative process. Within this open system description, we reformulate the ETH to demonstrate that both scar eigenstates and thermal ones exhibit thermalization governed by grand canonical statistics. Consequently, our revised ETH unifies scars and thermal states under a cohesive thermodynamic rule. Our work resolves the fundamental tension between constraint-induced non-ergodicity and thermalization paradigms, establishing a unified route to generalized thermalization for quantum many-body systems.

Submitted 24 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

Comments: main manuscript with 7 pages containing 4 figures, complementary material with 8 pages containing 3 figures

arXiv:2506.17962 [pdf, ps, other]

Spin Polarization Control via Magnetic Field in Dissipative Bosonic Systems

Authors: Yaoyuan Fan, Shuoyu Shi, Lang Cao, Qiuxin Zhang, Dong Hu, Yu Wang, Xiaoji Zhou

Abstract: Engineering spin polarization in dissipative bosonic systems is crucial for advancing quantum technologies, especially for applications in quantum metrology and space-based quantum simulations. This work demonstrates precise magnetic moment control in multicomponent Bose gases during evaporative cooling via tailored magnetic fields. By adjusting the magnetic field gradients, null point position, and duration, we selectively tune evaporation rates of magnetic sublevels, achieving targeted spin polarization. Theoretical models, validated by numerical simulations and Stern-Gerlach experiments, reveal how magnetic fields reshape trapping potentials and spin-dependent dissipation. The results establish a dissipative spin-selection mechanism governing polarization evolution in evaporatively cooled Bose gases and provide a framework for engineering spin-polarized quantum states.

Submitted 22 June, 2025; originally announced June 2025.

Comments: 9 pages, 5 figures

arXiv:2506.17103 [pdf, ps, other]

TransDreamerV3: Implanting Transformer In DreamerV3

Authors: Shruti Sadanand Dongare, Amun Kharel, Jonathan Samuel, Xiaona Zhou

Abstract: This paper introduces TransDreamerV3, a reinforcement learning model that enhances the DreamerV3 architecture by integrating a transformer encoder. The model is designed to improve memory and decision-making capabilities in complex environments. We conducted experiments on Atari-Boxing, Atari-Freeway, Atari-Pong, and Crafter tasks, where TransDreamerV3 demonstrated improved performance over DreamerV3, particularly in the Atari-Freeway and Crafter tasks. While issues in the Minecraft task and limited training across all tasks were noted, TransDreamerV3 displays advancement in world model-based reinforcement learning, leveraging transformer architectures.

Submitted 20 June, 2025; originally announced June 2025.

arXiv:2506.16049 [pdf, ps, other]

Ultrafast dynamics of three-dimensional Kane plasmons in the narrow-bandgap Hg$_{0.8}$Cd$_{0.2}$Te

Authors: Xiaoyue Zhou, Yi Chan, Siyuan Zhu, Fu Deng, Wei Bai, Jingdi Zhang

Abstract: We report on an ultrafast terahertz spectroscopic study on the dynamics of free carriers and the pertinent bulk plasmons in Hg$_{0.8}$Cd$_{0.2}$Te (MCT) film, a narrowband semiconductor accommodating three dimensional massless Kane fermions. The ultrabroadband terahertz source enables the investigation of the lightly doped equilibrium state in the presence of plasmon-phonon hybridization through the heavily doped excited state, primarily dominated by plasmons. Without the recourse to the resource consuming cryogenic high magnetic field spectroscopy that hinges on observable related to the interband transition, we show that the massless band dispersion can instead be conveniently perceived by the room temperature study of the intraband transition through the determination of the plasmon carrier density relationship. We found the plasma frequency in MCT scales with the cube root of carrier density, in contrast with the square root scaling in the conventional massive fermion system of parabolic band dispersion. This work also answers the curious question of whether the MCT can maintain its massless Kane fermion character in case the strict gapless condition is deviated from. The method presented herein provides a convenient approach to identifying the landscape of both massless and massive band dispersion.

Submitted 19 June, 2025; originally announced June 2025.

Showing 1–50 of 5,336 results for author: zhou, X