Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines

Authors: Xiaohou Shi, Ke Li, Aobo Liang, Yan Sun

Abstract: In the past few years, time series foundation models have achieved superior predictive accuracy. However, real-world time series often exhibit significant diversity in their temporal patterns across different time spans and domains, making it challenging for a single model architecture to fit all complex scenarios. In addition, time series data may have multiple variables exhibiting complex correlations with one another. Recent mainstream works have focused on modeling time series in a channel-independent manner in both pretraining and finetuning stages, overlooking the valuable inter-series dependencies. To this end, we propose Time Tracker for better predictions on multivariate time series data. Firstly, we leverage sparse mixture of experts (MoE) within Transformers to handle the modeling of diverse time series patterns, thereby alleviating the learning difficulties of a single model while improving its generalization. Besides, we propose Any-variate Attention, enabling a unified model structure to seamlessly handle both univariate and multivariate time series, thereby supporting channel-independent modeling during pretraining and channel-mixed modeling for finetuning. Furthermore, we design a graph learning module that constructs relations among sequences from frequency-domain features, providing more precise guidance to capture inter-series dependencies in channel-mixed modeling. Based on these advancements, Time Tracker achieves state-of-the-art performance in predictive accuracy, model generalization and adaptability.

Submitted 21 May, 2025; originally announced May 2025.
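
The sparse mixture-of-experts routing mentioned in the abstract above can be illustrated with a minimal sketch. This is not the Time Tracker implementation: the gate logits, the choice of k = 2, and the toy scaling experts are all assumptions made for illustration. It only shows the general top-k idea, where each input is sent to a few experts and their outputs are mixed by renormalized gate weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest gate logits.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, gate_logits, experts, k=2):
    """Weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(x)
    for idx, w in top_k_route(gate_logits, k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Hypothetical experts: each one just scales its input by a constant.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical gate logits for one token; experts 1 and 3 score highest.
routed = top_k_route([0.1, 2.0, -1.0, 2.0], k=2)
mixed = moe_forward([1.0, 2.0], [0.1, 2.0, -1.0, 2.0], experts, k=2)
```

Only the selected experts are evaluated, which is what keeps the per-input cost roughly constant as the number of experts grows.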

arXiv:2505.14988 [pdf, ps, other]

doi 10.1038/s41467-025-59498-4

Test of local realism via entangled $Λ\barΛ$ system

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (597 additional authors not shown)

Abstract: The non-locality of quantum correlations is a fundamental feature of quantum theory. The Bell inequality serves as a benchmark for distinguishing between predictions made by quantum theory and local hidden variable theory (LHVT). Recent advancements in photon-entanglement experiments have addressed potential loopholes and have observed significant violations of variants of the Bell inequality. However, examples of Bell inequality violation in high energy physics are scarce. In this study, we utilize $(10.087\pm0.044)\times10^{9}$ $J/ψ$ events collected with the BESIII detector at the BEPCII collider, performing non-local correlation tests using the entangled hyperon pairs. The massive-entangled $Λ\barΛ$ systems are formed and decay through strong and weak interactions, respectively. Through measurements of the angular distribution of $p\bar{p}$ in $J/ψ\to γη_c$ and subsequent $η_c\toΛ(pπ^-)\barΛ(\bar{p}π^{+})$ cascade decays, a significant violation of LHVT predictions is observed. The exclusion of LHVT is found to be statistically significant at a level exceeding $5.2σ$ in the testing of three Bell-like inequalities.

Submitted 20 May, 2025; originally announced May 2025.

Journal ref: Nat Commun 16, 4948 (2025)

arXiv:2505.14974 [pdf, ps, other]

Full spectral response of grating-induced loss in photonic crystal microrings

Authors: Daniel Pimbi, Yi Sun, Roy Zektzer, Xiyuan Lu, Kartik Srinivasan

Abstract: Photonic crystal microrings (PhCRs) have emerged as powerful and versatile platforms for integrated nonlinear photonics, offering precise control over frequency and phase matching while maintaining high optical quality factors. Through grating-mediated mode coupling, PhCRs enable advanced dispersion engineering, which is critical for wideband nonlinear processes such as optical parametric oscillation, Kerr frequency comb generation, and dual-pump spontaneous and Bragg scattering four-wave mixing. Beyond dispersion control, PhCRs also facilitate the manipulation of orbital angular momentum (OAM) emission, a key functionality for encoding high-dimensional quantum states in emerging quantum photonic platforms. Despite these advances, the broadband spectral behavior of grating-induced losses in PhCRs remains largely unexplored, with most studies focusing on grating periods near the modal wavelength or its half. Such losses can significantly impact broadband nonlinear processes, where excess loss at unintended wavelengths can degrade device performance. In this work, we experimentally characterize grating-induced losses in PhCRs and reveal their full spectral response as a function of the ratio between modal wavelength and grating period. We identify distinct loss channels arising from either radiation or mode conversion, including a broad excess-loss region attributed to vertical out-coupling into OAM-carrying states.
These observations are supported by three-dimensional finite-difference time-domain simulations and further analyzed through OAM radiation angle and phase-mismatch analysis. The resulting broadband loss spectrum highlights critical design trade-offs and provides practical guidelines for optimizing PhCR-based devices for nonlinear photonic applications involving widely separated frequencies.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14916 [pdf]

Super-Resolution Optical Coherence Tomography Using Diffusion Model-Based Plug-and-Play Priors

Authors: Yaning Wang, Jinglun Yu, Wenhan Guo, Yu Sun, Jin U. Kang

Abstract: We propose an OCT super-resolution framework based on a plug-and-play diffusion model (PnP-DM) to reconstruct high-quality images from sparse measurements (OCT B-mode corneal images). Our method formulates reconstruction as an inverse problem, combining a diffusion prior with Markov chain Monte Carlo sampling for efficient posterior inference. We collect high-speed under-sampled B-mode corneal images and apply a deep learning-based up-sampling pipeline to build realistic training pairs. Evaluations on in vivo and ex vivo fish-eye corneal models show that PnP-DM outperforms conventional 2D-UNet baselines, producing sharper structures and better noise suppression. This approach advances high-fidelity OCT imaging in high-speed acquisition for clinical applications.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14560 [pdf, ps, other]

Neural Inverse Scattering with Score-based Regularization

Authors: Yuan Gao, Wenhan Guo, Yu Sun

Abstract: Inverse scattering is a fundamental challenge in many imaging applications, ranging from microscopy to remote sensing. Solving this problem often requires jointly estimating two unknowns -- the image and the scattering field inside the object -- necessitating an effective image prior to regularize the inference. In this paper, we propose a regularized neural field (NF) approach which integrates the denoising score function used in score-based generative models. The neural field formulation offers convenient flexibility for performing joint estimation, while the denoising score function imposes the rich structural prior of images. Our results on three high-contrast simulated objects show that the proposed approach yields better imaging quality than the state-of-the-art NF approach, where regularization is based on total variation.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14161 [pdf, other]

Personalized Bayesian Federated Learning with Wasserstein Barycenter Aggregation

Authors: Ting Wei, Biao Mei, Junliang Lyu, Renquan Zhang, Feng Zhou, Yifan Sun

Abstract: Personalized Bayesian federated learning (PBFL) handles non-i.i.d. client data and quantifies uncertainty by combining personalization with Bayesian inference. However, existing PBFL methods face two limitations: restrictive parametric assumptions in client posterior inference and naive parameter averaging for server aggregation. To overcome these issues, we propose FedWBA, a novel PBFL method that enhances both local inference and global aggregation. At the client level, we use particle-based variational inference for nonparametric posterior representation. At the server level, we introduce particle-based Wasserstein barycenter aggregation, offering a more geometrically meaningful approach. Theoretically, we provide local and global convergence guarantees for FedWBA. Locally, we prove a KL divergence decrease lower bound per iteration for variational inference convergence. Globally, we show that the Wasserstein barycenter converges to the true parameter as the client data size increases. Empirically, experiments show that FedWBA outperforms baselines in prediction accuracy, uncertainty calibration, and convergence rate, with ablation studies confirming its robustness.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14135 [pdf, other]

Hunyuan-Game: Industrial-grade Intelligent Game Creation Model

Authors: Ruihuang Li, Caijin Zhou, Shoujian Zheng, Jianxiang Lu, Jiabin Huang, Comi Chen, Junshu Tang, Guangzheng Xu, Jiale Tao, Hongmei Wang, Donghao Li, Wenqing Yu, Senbo Wang, Zhimin Li, Yetshuan Shi, Haoyu Yang, Yukun Wang, Wenxun Dai, Jiaqi Li, Linqing Wang, Qixun Wang, Zhiyong Xu, Yingfang Zhang, Jiangfeng Xiong, Weijie Kong, et al. (33 additional authors not shown)

Abstract: Intelligent game creation represents a transformative advancement in game development, utilizing generative artificial intelligence to dynamically generate and enhance game content. Despite notable progress in generative models, the comprehensive synthesis of high-quality game assets, including both images and videos, remains a challenging frontier. To create high-fidelity game content that simultaneously aligns with player preferences and significantly boosts designer efficiency, we present Hunyuan-Game, an innovative project designed to revolutionize intelligent game production. Hunyuan-Game encompasses two primary branches: image generation and video generation. The image generation component is built upon a vast dataset comprising billions of game images, leading to the development of a group of customized image generation models tailored for game scenarios: (1) General Text-to-Image Generation. (2) Game Visual Effects Generation, involving text-to-effect and reference image-based game visual effect generation. (3) Transparent Image Generation for characters, scenes, and game visual effects. (4) Game Character Generation based on sketches, black-and-white images, and white models. The video generation component is built upon a comprehensive dataset of millions of game and anime videos, leading to the development of five core algorithmic models, each targeting critical pain points in game development and having robust adaptation to diverse game video scenarios: (1) Image-to-Video Generation. (2) 360 A/T Pose Avatar Video Synthesis.
(3) Dynamic Illustration Generation. (4) Generative Video Super-Resolution. (5) Interactive Game Video Generation. These image and video generation models not only exhibit high-level aesthetic expression but also deeply integrate domain-specific knowledge, establishing a systematic understanding of diverse game and anime art styles.

Submitted 28 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14057 [pdf, ps, other]

Field Matters: A lightweight LLM-enhanced Method for CTR Prediction

Authors: Yu Cui, Feng Liu, Jiawei Chen, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Xiaohu Yang, Can Wang

Abstract: Click-through rate (CTR) prediction is a fundamental task in modern recommender systems. In recent years, the integration of large language models (LLMs) has been shown to effectively enhance the performance of traditional CTR methods. However, existing LLM-enhanced methods often require extensive processing of detailed textual descriptions for large-scale instances or user/item entities, leading to substantial computational overhead. To address this challenge, this work introduces LLaCTR, a novel and lightweight LLM-enhanced CTR method that employs a field-level enhancement paradigm. Specifically, LLaCTR first utilizes LLMs to distill crucial and lightweight semantic knowledge from small-scale feature fields through self-supervised field-feature fine-tuning. Subsequently, it leverages this field-level semantic knowledge to enhance both feature representation and feature interactions. In our experiments, we integrate LLaCTR with six representative CTR models across four datasets, demonstrating its superior performance in terms of both effectiveness and efficiency compared to existing LLM-enhanced methods. Our code is available at https://anonymous.4open.science/r/LLaCTR-EC46.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.13756 [pdf, other]

doi 10.1093/mnras/staf665

OGHReS: Star formation in the Outer Galaxy II ($\ell = 180^\circ$-$280^\circ$)

Authors: J. S. Urquhart, C. Koenig, D. Colombo, A. Karska, A. Giannetti, T. J. T. Moore, A. Y. Yang, F. Wyrowski, Y. Sun, Z. Jiang, K. R. Neralwar, D. Eden, I. Grozdanova, S. Neupane, M. Figueira, E. Dann, V. S. Veena, W.-J. Kim, S. Leurini, J. Brand, M.-Y. Lee

Abstract: The Outer Galaxy High-Resolution Survey (OGHReS) covers 100 square degrees ($180^\circ < \ell < 280^\circ$) in the (2--1) transitions of three CO-isotopologues. We use the spectra to refine the velocities and physical properties of 6706 Hi-GAL clumps located in the OGHReS region. In a previous paper, we analysed 3584 clumps between $\ell = 250^\circ$ and $280^\circ$. Here, we cover a further 3122 clumps ($180^\circ < \ell < 250^\circ$) and determine reliable velocities for \withVLSR\ of these, finding good agreement with the previously assigned velocities ($\sim$80 percent within 5 km s$^{-1}$). We update velocities for 288 clumps and provide new values for an additional 411. Combining these with the previous results, we have velocities and physical properties for 6193 clumps (92.3 percent). The \allnonDetections\ non-detections are low surface density clumps or likely contamination by evolved stars and galaxies. Key findings: i) improved correlation between clumps and spiral arm loci, and the discovery of clumps beyond the outer arm supports the existence of a new spiral structure; ii) decreasing trend in the $L/M$-ratio consistent with less high-mass star formation in the outer Galaxy; iii) increase in the star formation fraction (SFF) in the outer Galaxy, suggesting that more clumps are forming stars despite their lower mass; iv) discrepancies in velocity assignments across different surveys that could affect $\sim$10000 clumps, especially in the fourth quadrant.

Submitted 19 May, 2025; originally announced May 2025.

Comments: 18 pages, 14 figures. Full versions of Tables 1 and 2 are only available in electronic form via CDS. arXiv admin note: text overlap with arXiv:2401.00808
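
The clump counts quoted in the OGHReS abstract can be cross-checked with a few lines of arithmetic. The numbers below are taken directly from the abstract; the variable names are only illustrative.

```python
# Clump counts as stated in the abstract.
previous_clumps = 3584   # l = 250-280 deg, analysed in the earlier paper
new_clumps = 3122        # l = 180-250 deg, covered in this paper
with_velocity = 6193     # clumps with velocities and physical properties

total_clumps = previous_clumps + new_clumps
fraction = with_velocity / total_clumps
remaining = total_clumps - with_velocity

print(total_clumps)               # total number of clumps in the OGHReS region
print(f"{100 * fraction:.2f} %")  # share of clumps with velocities
print(remaining)                  # clumps left without a velocity
```

The two longitude ranges add up to the 6706 clumps stated for the full region, and the ratio comes out at roughly 92.35 percent, in line with the 92.3 percent quoted in the abstract.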

arXiv:2505.13633 [pdf, ps, other]

IPENS: Interactive Unsupervised Framework for Rapid Plant Phenotyping Extraction via NeRF-SAM2 Fusion

Authors: Wentao Song, He Huang, Youqiang Sun, Fang Qu, Jiaqi Zhang, Longhui Fang, Yuwei Hao, Chenyang Peng

Abstract: Advanced plant phenotyping technologies play a crucial role in targeted trait improvement and accelerating intelligent breeding. Due to the species diversity of plants, existing methods heavily rely on large-scale high-precision manually annotated data. For self-occluded objects at the grain level, unsupervised methods often prove ineffective. This study proposes IPENS, an interactive unsupervised multi-target point cloud extraction method. The method utilizes radiance field information to lift 2D masks, which are segmented by SAM2 (Segment Anything Model 2), into 3D space for target point cloud extraction. A multi-target collaborative optimization strategy is designed to effectively resolve the single-interaction multi-target segmentation challenge. Experimental validation demonstrates that IPENS achieves a grain-level segmentation accuracy (mIoU) of 63.72% on a rice dataset, with strong phenotypic estimation capabilities: grain volume prediction yields R2 = 0.7697 (RMSE = 0.0025), leaf surface area R2 = 0.84 (RMSE = 18.93), and leaf length and width predictions achieve R2 = 0.97 and 0.87 (RMSE = 1.49 and 0.21). On a wheat dataset, IPENS further improves segmentation accuracy to 89.68% (mIoU), with equally outstanding phenotypic estimation performance: spike volume prediction achieves R2 = 0.9956 (RMSE = 0.0055), leaf surface area R2 = 1.00 (RMSE = 0.67), and leaf length and width predictions reach R2 = 0.99 and 0.92 (RMSE = 0.23 and 0.15). This method provides a non-invasive, high-quality phenotyping extraction solution for rice and wheat.
Without requiring annotated data, it rapidly extracts grain-level point clouds within 3 minutes through simple single-round interactions on images for multiple targets, demonstrating significant potential to accelerate intelligent breeding efficiency.

Submitted 19 May, 2025; originally announced May 2025.
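
The IPENS abstract reports segmentation accuracy as mIoU. As a reminder of what that metric measures, here is a minimal sketch that computes mean intersection-over-union for point-cloud segments represented as sets of point indices. The representation and the toy data are assumptions made for illustration; the abstract does not describe the paper's evaluation code.

```python
def iou(pred, truth):
    """Intersection over union of two collections of point indices."""
    pred, truth = set(pred), set(truth)
    union = pred | truth
    if not union:
        return 1.0  # both empty: treat as a perfect match
    return len(pred & truth) / len(union)


def mean_iou(pairs):
    """Average IoU over (predicted, ground-truth) segment pairs."""
    scores = [iou(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


# Hypothetical example: two targets, one half-overlapping, one exact.
pairs = [
    ({0, 1, 2, 3}, {2, 3, 4, 5}),  # 2 shared points out of 6 in the union
    ({10, 11}, {10, 11}),          # identical segments
]
score = mean_iou(pairs)
```

Averaging per target, as done here, weights small and large segments equally; other conventions (for example averaging per class) would give different numbers for the same predictions.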

arXiv:2505.13579 [pdf, other]

Learning Wavelet-Sparse FDK for 3D Cone-Beam CT Reconstruction

Authors: Yipeng Sun, Linda-Sophie Schneider, Chengze Ye, Mingxuan Gu, Siyuan Mei, Siming Bayer, Andreas Maier

Abstract: Cone-Beam Computed Tomography (CBCT) is essential in medical imaging, and the Feldkamp-Davis-Kress (FDK) algorithm is a popular choice for reconstruction due to its efficiency. However, FDK is susceptible to noise and artifacts. While recent deep learning methods offer improved image quality, they often increase computational complexity and lack the interpretability of traditional methods. In this paper, we introduce an enhanced FDK-based neural network that maintains the classical algorithm's interpretability by selectively integrating trainable elements into the cosine weighting and filtering stages. Recognizing the challenge of a large parameter space inherent in 3D CBCT data, we leverage wavelet transformations to create sparse representations of the cosine weights and filters. This strategic sparsification reduces the parameter count by $93.75\%$ without compromising performance, accelerates convergence, and importantly, maintains the inference computational cost equivalent to the classical FDK algorithm. Our method not only ensures volumetric consistency and boosts robustness to noise, but is also designed for straightforward integration into existing CT reconstruction pipelines. This presents a pragmatic enhancement that can benefit clinical applications, particularly in environments with computational limitations.

Submitted 19 May, 2025; originally announced May 2025.

Comments: Accepted by Fully3D 2025

arXiv:2505.13222 [pdf, ps, other]

Partial Wave Analysis of $e^{+}e^{-} \rightarrow π^{+}π^{-}J/ψ$ and Cross Section Measurement of $e^{+}e^{-} \rightarrow π^{\pm}Z_{c}(3900)^{\mp}$ from 4.1271 to 4.3583 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: Based on 12.0 $\mathrm{fb^{-1}}$ of $e^{+}e^{-}$ collision data samples collected by the BESIII detector at center-of-mass energies from 4.1271 to 4.3583 GeV, a partial wave analysis is performed for the process $e^{+}e^{-} \rightarrow π^{+}π^{-}J/ψ$. The cross sections for the subprocesses ${e^{+}e^{-}\rightarrowπ^{+}Z_{c}(3900)^{-}+c.c.\rightarrowπ^{+}π^{-}J/ψ}$, $f_{0}(980)(\rightarrowπ^{+}π^{-})J/ψ$, and $(π^{+}π^{-})_{\rm{S\mbox{-}wave}} J/ψ$ are measured for the first time. The mass and width of the $Z_{c}(3900)^{\pm}$ are determined to be $3884.6\pm0.7\pm3.3$ MeV/$c^{2}$ and $37.2\pm1.3\pm6.6$ MeV, respectively. The first errors are statistical and the second systematic. The final state $(π^{+}π^{-})_{\rm{S\mbox{-}wave}} J/ψ$ dominates the process $e^{+}e^{-} \rightarrow π^{+}π^{-}J/ψ$. By analyzing the cross sections of $π^{\pm}Z_{c}(3900)^{\mp}$ and $f_{0}(980)J/ψ$, the $Y(4220)$ has been observed. Its mass and width are determined to be $4225.8\pm4.2\pm3.1$ MeV/$c^{2}$ and $55.3\pm9.5\pm11.1$ MeV, respectively.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.13211 [pdf, ps, other]

MAGI-1: Autoregressive Video Generation at Scale

Authors: Sand.ai, Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, W. Q. Zhang, Weifeng Luo, Xiaoyang Kang, Yuchen Sun, Yue Cao, Yunpeng Huang, Yutong Lin, Yuxin Fang, Zewei Tao, Zheng Zhang, Zhongshu Wang, Zixun Liu, Dai Shi, Guoli Su, Hanwen Sun, Hong Pan, et al. (14 additional authors not shown)

Abstract: We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.12884 [pdf, ps, other]

TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks

Authors: Yuanze Hu, Zhaoxin Fan, Xinyu Wang, Gen Li, Ye Qiu, Zhichao Yang, Wenjun Wu, Kejian Wu, Yifan Sun, Xiaotie Deng, Jin Dong

Abstract: Lightweight Vision-Language Models (VLMs) are indispensable for resource-constrained applications. The prevailing approach to aligning vision and language models involves freezing both the vision encoder and the language model while training small connector modules. However, this strategy heavily depends on the intrinsic capabilities of the language model, which can be suboptimal for lightweight models with limited representational capacity. In this work, we investigate this alignment bottleneck through the lens of mutual information, demonstrating that the constrained capacity of the language model inherently limits the Effective Mutual Information (EMI) between multimodal inputs and outputs, thereby compromising alignment quality. To address this challenge, we propose TinyAlign, a novel framework inspired by Retrieval-Augmented Generation, which strategically retrieves relevant context from a memory bank to enrich multimodal inputs and enhance their alignment. Extensive empirical evaluations reveal that TinyAlign significantly reduces training loss, accelerates convergence, and enhances task performance. Remarkably, it allows models to achieve baseline-level performance with only 40% of the fine-tuning data, highlighting exceptional data efficiency. Our work thus offers a practical pathway for developing more capable lightweight VLMs while introducing a fresh theoretical lens to better understand and address alignment bottlenecks in constrained multimodal systems.

Submitted 30 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.12629 [pdf, ps, other]

Enhancing Latent Computation in Transformers with Latent Tokens

Authors: Yuchang Sun, Yanxi Chen, Yaliang Li, Bolin Ding

Abstract: Augmenting large language models (LLMs) with auxiliary tokens has emerged as a promising strategy for enhancing model performance. In this work, we introduce a lightweight method termed latent tokens; these are dummy tokens that may be non-interpretable in natural language but steer the autoregressive decoding process of a Transformer-based LLM via the attention mechanism. The proposed latent tokens can be seamlessly integrated with a pre-trained Transformer, trained in a parameter-efficient manner, and applied flexibly at inference time, while adding minimal complexity overhead to the existing infrastructure of standard Transformers. We propose several hypotheses about the underlying mechanisms of latent tokens and design synthetic tasks accordingly to verify them. Numerical results confirm that the proposed method noticeably outperforms the baselines, particularly in the out-of-distribution generalization scenarios, highlighting its potential in improving the adaptability of LLMs.

Submitted 18 May, 2025; originally announced May 2025.

arXiv:2505.12539 [pdf, ps, other]

Penetration-free Solid-Fluid Interaction on Shells and Rods

Authors: Jinyuan Liu, Yuchen Sun, Yin Yang, Chenfanfu Jiang, Minchen Li, Bo Zhu

Abstract: We introduce a novel approach to simulate the interaction between fluids and thin elastic solids without any penetration. Our approach is centered around an optimization system augmented with barriers, which aims to find a configuration that ensures the absence of penetration while enforcing incompressibility for the fluids and minimizing elastic potentials for the solids. Unlike previous methods that primarily focus on velocity coherence at the fluid-solid interfaces, we demonstrate the effectiveness and flexibility of explicitly resolving positional constraints, including both explicit representation of solid positions and the implicit representation of fluid level-set interface. To preserve the volume of the fluid, we propose a simple yet efficient approach that adjusts the associated level-set values. Additionally, we develop a distance metric capable of measuring the separation between an implicitly represented surface and a Lagrangian object of arbitrary codimension. By integrating the inertia, solid elastic potential, damping, barrier potential, and fluid incompressibility within a unified system, we are able to robustly simulate a wide range of processes involving fluid interactions with lower-dimensional objects such as shells and rods. These processes include topology changes, bouncing, splashing, sliding, rolling, floating, and more.

Submitted 18 May, 2025; originally announced May 2025.

arXiv:2505.12380 [pdf, ps, other]

Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward

Authors: Han Weng, Puzhen Wu, Cui Longjie, Yi Zhan, Boyi Liu, Yuanfeng Song, Dun Zeng, Yingxiang Yang, Qianru Zhang, Dong Huang, Xiaoming Yin, Yang Sun, Xing Chen

Abstract: Reinforcement learning (RL) has been widely adopted to enhance the performance of large language models (LLMs) on Text-to-SQL tasks. However, existing methods often rely on execution-based or LLM-based Bradley-Terry reward models. The former suffers from high execution latency caused by repeated database calls, whereas the latter imposes substantial GPU memory overhead, both of which significantly hinder the efficiency and scalability of RL pipelines. To this end, we propose a novel Text-to-SQL RL fine-tuning framework named Graph-Reward-SQL, which employs the GMNScore outcome reward model. We leverage SQL graph representations to provide accurate reward signals while significantly reducing inference time and GPU memory usage. Building on this foundation, we further introduce StepRTM, a stepwise reward model that provides intermediate supervision over Common Table Expression (CTE) subqueries. This encourages both functional correctness and structural clarity of SQL. Extensive comparative and ablation experiments on standard benchmarks, including Spider and BIRD, demonstrate that our method consistently outperforms existing reward models.

Submitted 27 June, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

arXiv:2505.12234 [pdf, other]

Observation of $χ_{cJ}(J=0,1,2)\rightarrow p\bar{p}ηη$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (678 additional authors not shown)

Abstract: Using $(2712.4\pm14.3)\times10^6$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII storage ring, the decays $χ_{cJ}(J=0,1,2)\rightarrow p\bar{p}ηη$ are observed for the first time through the radiative transition $ψ(3686)\toγχ_{cJ}$. The statistical significances for $χ_{cJ}$ signals are all larger than 5$σ$. The branching fractions of $χ_{c0,1,2}\to p\bar{p} ηη$ are determined to be $({5.75 \pm 0.59 \pm 0.42}) \times 10^{-5}$, $({1.40 \pm 0.33 \pm 0.17}) \times 10^{-5}$, and $({2.64 \pm 0.40 \pm 0.27}) \times 10^{-5}$, respectively, where the first uncertainties are statistical and the second systematic. No evident resonant structures are found in the $p\bar{p}$ and $pη/\bar{p}η$ systems.

Submitted 18 May, 2025; originally announced May 2025.

Comments: 17 pages, 16 figures

arXiv:2505.12188 [pdf, ps, other]

LLM-DSE: Searching Accelerator Parameters with LLM Agents

Authors: Hanyu Wang, Xinrui Wu, Zijian Ding, Su Zheng, Chengyue Wang, Tony Nowatzki, Yizhou Sun, Jason Cong

Abstract: Even though high-level synthesis (HLS) tools mitigate the challenges of programming domain-specific accelerators (DSAs) by raising the abstraction level, optimizing hardware directive parameters remains a significant hurdle. Existing heuristic and learning-based methods struggle with adaptability and sample efficiency. We present LLM-DSE, a multi-agent framework designed specifically for optimizing HLS directives. Combining LLM with design space exploration (DSE), our explorer coordinates four agents: Router, Specialists, Arbitrator, and Critic. These multi-agent components interact with various tools to accelerate the optimization process. LLM-DSE leverages essential domain knowledge to identify efficient parameter combinations while maintaining adaptability through verbal learning from online interactions. Evaluations on the HLSyn dataset demonstrate that LLM-DSE achieves substantial $2.55\times$ performance gains over state-of-the-art methods, uncovering novel designs while reducing runtime. Ablation studies validate the effectiveness and necessity of the proposed agent interactions. Our code is open-sourced here: https://github.com/Nozidoali/LLM-DSE.

Submitted 20 May, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

arXiv:2505.12108 [pdf, ps, other]

EarthSynth: Generating Informative Earth Observation with Diffusion Models

Authors: Jiancheng Pan, Shiye Lei, Yuqian Fu, Jiahao Li, Yanxing Liu, Yuze Sun, Xiao He, Long Peng, Xiaomeng Huang, Bo Zhao

Abstract: Remote sensing image (RSI) interpretation typically faces challenges due to the scarcity of labeled data, which limits the performance of RSI interpretation tasks. To tackle this challenge, we propose EarthSynth, a diffusion-based generative foundation model that enables synthesizing multi-category, cross-satellite labeled Earth observation for downstream RSI interpretation tasks. To the best of our knowledge, EarthSynth is the first to explore multi-task generation for remote sensing. EarthSynth, trained on the EarthSynth-180K dataset, employs the Counterfactual Composition training strategy to improve training data diversity and enhance category control. Furthermore, a rule-based method of R-Filter is proposed to filter more informative synthetic data for downstream tasks. We evaluate our EarthSynth on scene classification, object detection, and semantic segmentation in open-world scenarios, offering a practical solution for advancing RSI interpretation.

Submitted 17 May, 2025; originally announced May 2025.

Comments: 23 pages

arXiv:2505.12086 [pdf, ps, other]

Observation of an Altered $a_{0}(980)$ Line-shape in $D^{+} \rightarrow π^{+}ηη$ due to the Triangle Loop Rescattering Effect

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (705 additional authors not shown)

Abstract: Using 20.3~${\rm fb}^{-1}$ of $e^{+}e^{-}$ collision data taken with the BESIII detector at the center-of-mass energy 3.773~GeV, we report the first amplitude analysis of the hadronic decay $D^{+} \rightarrow π^{+}ηη$. The intermediate process $D^{+} \to a_{0}(980)^{+}η, a_{0}(980)^{+} \to π^{+}η$ is observed and is found to be the only component and its branching fraction is measured to be $(3.67\pm0.12_{\mathrm{stat.}}\pm 0.06_{\mathrm{syst.}})\times 10^{-3}$. Unlike the $a_{0}(980)$ line-shape observed in the decays of charmed mesons to $a_{0}(980)π$ and in the decay $D^{0} \to a_{0}(980)^{-}e^{+}ν_{e}$, where the low-mass side of the $a_0(980)$ is wider than the high-mass side, the $a_{0}(980)$ line-shape in $D^{+} \to a_{0}(980)^{+}η$ is found to be significantly altered, with the high-mass side being wider than the low-mass side. We establish that the $a_0(980)$ line-shape arises from the triangle loop rescattering of $D^+ \to \bar{K}_0^*(1430)^0K^+ \to a_0(980)^+ η$ and $D^+ \to K_0^*(1430)^+\bar{K}^0 \to a_0(980)^+ η$ with a significance of 5.8$σ$. This is the first experimental confirmation of the triangle loop rescattering effect.

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2505.12044 [pdf, ps, other]

FlashBias: Fast Computation of Attention with Bias

Authors: Haixu Wu, Minghao Guo, Yuezhou Ma, Yuanxu Sun, Jianmin Wang, Wojciech Matusik, Mingsheng Long

Abstract: Attention mechanism has emerged as a foundation module of modern deep learning models and has also empowered many milestones in various domains. Moreover, FlashAttention with IO-aware speedup resolves the efficiency issue of standard attention, further promoting its practicality. Beyond canonical attention, attention with bias also widely exists, such as relative position bias in vision and language models and pair representation bias in AlphaFold. In these works, prior knowledge is introduced as an additive bias term of attention weights to guide the learning process, which has been proven essential for model performance. Surprisingly, despite the common usage of attention with bias, its targeted efficiency optimization is still absent, which seriously hinders its wide applications in complex tasks. Diving into the computation of FlashAttention, we prove that its optimal efficiency is determined by the rank of the attention weight matrix. Inspired by this theoretical result, this paper presents FlashBias based on the low-rank compressed sensing theory, which can provide fast-exact computation for many widely used attention biases and a fast-accurate approximation for biases in general formalization. FlashBias can fully take advantage of the extremely optimized matrix multiplication operation in modern GPUs, achieving 1.5$\times$ speedup for AlphaFold, and over 2$\times$ speedup for attention with bias in vision and language models without loss of accuracy.

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2505.12039 [pdf, ps, other]

AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research

Authors: Renqi Chen, Haoyang Su, Shixiang Tang, Zhenfei Yin, Qi Wu, Hui Li, Ye Sun, Nanqing Dong, Wanli Ouyang, Philip Torr

Abstract: The Science of Science (SoS) explores the mechanisms underlying scientific discovery, and offers valuable insights for enhancing scientific efficiency and fostering innovation. Traditional approaches often rely on simplistic assumptions and basic statistical tools, such as linear regression and rule-based simulations, which struggle to capture the complexity and scale of modern research ecosystems. The advent of artificial intelligence (AI) presents a transformative opportunity for the next generation of SoS, enabling the automation of large-scale pattern discovery and uncovering insights previously unattainable. This paper offers a forward-looking perspective on the integration of Science of Science with AI for automated research pattern discovery and highlights key open challenges that could greatly benefit from AI. We outline the advantages of AI over traditional methods, discuss potential limitations, and propose pathways to overcome them. Additionally, we present a preliminary multi-agent system as an illustrative example to simulate research societies, showcasing AI's ability to replicate real-world research patterns and accelerate progress in Science of Science research.

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2505.11823 [pdf, ps, other]

Variational Regularized Unbalanced Optimal Transport: Single Network, Least Action

Authors: Yuhao Sun, Zhenyi Zhang, Zihan Wang, Tiejun Li, Peijie Zhou

Abstract: Recovering the dynamics from a few snapshots of a high-dimensional system is a challenging task in statistical physics and machine learning, with important applications in computational biology. Many algorithms have been developed to tackle this problem, based on frameworks such as optimal transport and the Schrödinger bridge. A notable recent framework is Regularized Unbalanced Optimal Transport (RUOT), which integrates both stochastic dynamics and unnormalized distributions. However, since many existing methods do not explicitly enforce optimality conditions, their solutions often struggle to satisfy the principle of least action and meet challenges to converge in a stable and reliable way. To address these issues, we propose Variational RUOT (Var-RUOT), a new framework to solve the RUOT problem. By incorporating the optimal necessary conditions for the RUOT problem into both the parameterization of the search space and the loss function design, Var-RUOT only needs to learn a scalar field to solve the RUOT problem and can search for solutions with lower action. We also examined the challenge of selecting a growth penalty function in the widely used Wasserstein-Fisher-Rao metric and proposed a solution that better aligns with biological priors in Var-RUOT. We validated the effectiveness of Var-RUOT on both simulated data and real single-cell datasets. Compared with existing algorithms, Var-RUOT can find solutions with lower action while exhibiting faster convergence and improved training stability.

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2505.11197 [pdf, ps, other]

Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge

Authors: Zhenyi Zhang, Zihan Wang, Yuhao Sun, Tiejun Li, Peijie Zhou

Abstract: Modeling the dynamics from sparsely time-resolved snapshot data is crucial for understanding complex cellular processes and behavior. Existing methods leverage optimal transport, Schrödinger bridge theory, or their variants to simultaneously infer stochastic, unbalanced dynamics from snapshot data. However, these approaches remain limited in their ability to account for cell-cell interactions. This integration is essential in real-world scenarios since intercellular communications are fundamental life processes and can influence cell state-transition dynamics. To address this challenge, we formulate the Unbalanced Mean-Field Schrödinger Bridge (UMFSB) framework to model unbalanced stochastic interaction dynamics from snapshot data. Inspired by this framework, we further propose CytoBridge, a deep learning algorithm designed to approximate the UMFSB problem. By explicitly modeling cellular transitions, proliferation, and interactions through neural networks, CytoBridge offers the flexibility to learn these processes directly from data. The effectiveness of our method has been extensively validated using both synthetic gene regulatory data and real scRNA-seq datasets. Compared to existing methods, CytoBridge identifies growth, transition, and interaction patterns, eliminates false transitions, and reconstructs the developmental landscape with greater accuracy.

Submitted 1 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

arXiv:2505.11059 [pdf, other]

Upper bound of holographic entanglement entropy combinations

Authors: Xin-Xiang Ju, Ya-Wen Sun, Yang Zhao

Abstract: In this work, we develop a systematic formalism to evaluate the upper bound of a large family of holographic entanglement entropy combinations when fixing $n$ subsystems and fine-tuning one other subsystem. The upper bound configurations and values of these entropy combinations can be derived and classified. The upper bound of these entropy combinations reveals holographic $n+1$-partite entanglement that $n$ fixed subsystems participate in. In AdS$_3$/CFT$_2$, AdS$_4$/CFT$_3$, and even higher-dimensional holography, one can, in principle, find different formulas of upper bound values, reflecting the fundamental difference in entanglement structure in different dimensions.

Submitted 16 May, 2025; originally announced May 2025.

Comments: 52 pages, 15 figures

arXiv:2505.10996 [pdf, other]

Visual Anomaly Detection under Complex View-Illumination Interplay: A Large-Scale Benchmark

Authors: Yunkang Cao, Yuqi Cheng, Xiaohao Xu, Yiheng Zhang, Yihan Sun, Yuxiang Tan, Yuxin Zhang, Xiaonan Huang, Weiming Shen

Abstract: The practical deployment of Visual Anomaly Detection (VAD) systems is hindered by their sensitivity to real-world imaging variations, particularly the complex interplay between viewpoint and illumination which drastically alters defect visibility. Current benchmarks largely overlook this critical challenge. We introduce Multi-View Multi-Illumination Anomaly Detection (M2AD), a new large-scale benchmark comprising 119,880 high-resolution images designed explicitly to probe VAD robustness under such interacting conditions. By systematically capturing 999 specimens across 10 categories using 12 synchronized views and 10 illumination settings (120 configurations total), M2AD enables rigorous evaluation. We establish two evaluation protocols: M2AD-Synergy tests the ability to fuse information across diverse configurations, and M2AD-Invariant measures single-image robustness against realistic view-illumination effects. Our extensive benchmarking shows that state-of-the-art VAD methods struggle significantly on M2AD, demonstrating the profound challenge posed by view-illumination interplay. This benchmark serves as an essential tool for developing and validating VAD methods capable of overcoming real-world complexities. Our full dataset and test suite will be released at https://hustcyq.github.io/M2AD to facilitate the field.

Submitted 16 May, 2025; originally announced May 2025.

Comments: Homepage: https://hustcyq.github.io/M2AD/. Yunkang Cao and Yuqi Cheng contribute equally to this work

arXiv:2505.10311 [pdf, other]

Whitened Score Diffusion: A Structured Prior for Imaging Inverse Problems

Authors: Jeffrey Alido, Tongyu Li, Yu Sun, Lei Tian

Abstract: Conventional score-based diffusion models (DMs) may struggle with anisotropic Gaussian diffusion processes due to the required inversion of covariance matrices in the denoising score matching training objective \cite{vincent_connection_2011}. We propose Whitened Score (WS) diffusion models, a novel framework based on stochastic differential equations that learns the Whitened Score function instead of the standard score. This approach circumvents covariance inversion, extending score-based DMs by enabling stable training of DMs on arbitrary Gaussian forward noising processes. WS DMs establish equivalence with flow matching for arbitrary Gaussian noise, allow for tailored spectral inductive biases, and provide strong Bayesian priors for imaging inverse problems with structured noise. We experiment with a variety of computational imaging tasks using the CIFAR and CelebA ($64\times64$) datasets and demonstrate that WS diffusion priors trained on anisotropic Gaussian noising processes consistently outperform conventional diffusion priors based on isotropic Gaussian noise. Our code is open-sourced at https://github.com/jeffreyalido/wsdiffusion.

Submitted 20 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.10297 [pdf, other]

Defending the Edge: Representative-Attention for Mitigating Backdoor Attacks in Federated Learning

Authors: Chibueze Peace Obioma, Youcheng Sun, Mustafa A. Mustafa

Abstract: Federated learning (FL) enhances privacy and reduces communication cost for resource-constrained edge clients by supporting distributed model training at the edge. However, the heterogeneous nature of such devices produces diverse, non-independent, and identically distributed (non-IID) data, making the detection of backdoor attacks more challenging. In this paper, we propose a novel federated representative-attention-based defense mechanism, named FeRA, that leverages cross-client attention over internal feature representations to distinguish benign from malicious clients. FeRA computes an anomaly score based on representation reconstruction errors, effectively identifying clients whose internal activations significantly deviate from the group consensus. Our evaluation demonstrates FeRA's robustness across various FL scenarios, including challenging non-IID data distributions typical of edge devices. Experimental results show that it effectively reduces backdoor attack success rates while maintaining high accuracy on the main task. The method is model-agnostic, attack-agnostic, and does not require labeled reference data, making it well suited to heterogeneous and resource-limited edge deployments.

Submitted 15 May, 2025; originally announced May 2025.

Comments: Submitted to ESORICS 2025

arXiv:2505.10241 [pdf, ps, other]

Predicting Beyond Training Data via Extrapolation versus Translocation: AI Weather Models and Dubai's Unprecedented 2024 Rainfall

Authors: Y. Qiang Sun, Pedram Hassanzadeh, Tiffany Shaw, Hamid A. Pahlavan

Abstract: Artificial intelligence (AI) models have transformed weather forecasting, but their skill for gray swan extreme events is unclear. Here, we analyze GraphCast and FuXi forecasts of the unprecedented 2024 Dubai storm, which had twice the training set's highest rainfall in that region. Remarkably, GraphCast accurately forecasts this event 8 days ahead. FuXi forecasts the event, but underestimates the rainfall, especially at long lead times. GraphCast's success stems from "translocation": learning from comparable/stronger dynamically similar events in other regions during training via global effective receptive fields. Evidence of "extrapolation" (learning from training set's weaker events) is not found. Even events within the global distribution's tail are poorly forecasted, which is not just due to data imbalance (generalization error) but also spectral bias (optimization error). These findings demonstrate the potential of AI models to forecast regional gray swans and opportunity to improve them through understanding the mechanisms behind their successes and limitations.

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.09965 [pdf, ps, other]

MambaControl: Anatomy Graph-Enhanced Mamba ControlNet with Fourier Refinement for Diffusion-Based Disease Trajectory Prediction

Authors: Hao Yang, Tao Tan, Shuai Tan, Weiqin Yang, Kunyan Cai, Calvin Chen, Yue Sun

Abstract: Modelling disease progression in precision medicine requires capturing complex spatio-temporal dynamics while preserving anatomical integrity. Existing methods often struggle with longitudinal dependencies and structural consistency in progressive disorders. To address these limitations, we introduce MambaControl, a novel framework that integrates selective state-space modelling with diffusion processes for high-fidelity prediction of medical image trajectories. To better capture subtle structural changes over time while maintaining anatomical consistency, MambaControl combines Mamba-based long-range modelling with graph-guided anatomical control to more effectively represent anatomical correlations. Furthermore, we introduce Fourier-enhanced spectral graph representations to capture spatial coherence and multiscale detail, enabling MambaControl to achieve state-of-the-art performance in Alzheimer's disease prediction. Quantitative and regional evaluations demonstrate improved progression prediction quality and anatomical fidelity, highlighting its potential for personalised prognosis and clinical decision support.

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.09958 [pdf]

Ultrafast excitation of polar skyrons

Authors: Huaiyu Wang, Vladimir Stoica, Cheng Dai, Marek Paściak, Sujit Das, Tiannan Yang, Mauro A. P. Gonçalves, Jiri Kulda, Margaret R. McCarter, Anudeep Mangu, Yue Cao, Hari Padma, Utkarsh Saha, Diling Zhu, Takahiro Sato, Sanghoon Song, Mathias Hoffmann, Patrick Kramer, Silke Nelson, Yanwen Sun, Quynh Nguyen, Zhan Zhang, Ramamoorthy Ramesh, Lane Martin, Aaron M. Lindenberg, et al. (5 additional authors not shown)

Abstract: Unraveling collective modes arising from coupled degrees of freedom is crucial for understanding complex interactions in solids and developing new functionalities. Unique collective behaviors emerge when two degrees of freedom, ordered on distinct length scales, interact. Polar skyrmions, three-dimensional electric polarization textures in ferroelectric superlattices, disrupt the lattice continuity at the nanometer scale with nontrivial topology, leading to previously unexplored collective modes. Here, using terahertz-field excitation and femtosecond x-ray diffraction, we discovered subterahertz collective modes, dubbed 'skyrons', which appear as swirling patterns of atomic displacements functioning as atomic-scale gearsets. Momentum-resolved time-domain measurements of diffuse scattering revealed an avoided crossing in the dispersion relation of skyrons. We further demonstrated that the amplitude and dispersion of skyrons can be controlled by sample temperature and electric-field bias. Atomistic simulations and dynamical phase-field modeling provided microscopic insights into the three-dimensional crystallographic and polarization dynamics. The discovery of skyrons and their coupling with terahertz fields opens avenues for ultrafast control of topological polar structures.

Submitted 19 June, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.09916 [pdf, other]

Parasitic loss in microring-waveguide coupling and its impact on wideband nonlinear photonics

Authors: Yi Sun, Daniel Pimbi, Xiyuan Lu, Jordan Stone, Junyeob Song, Zhimin Shi, Kartik Srinivasan

Abstract: Microring resonators enable the enhancement of nonlinear frequency mixing processes, generating output fields at frequencies that widely differ from the inputs, in some cases by more than an octave. The efficiency of such devices depends on effective in- and out-coupling between access waveguides and the microrings at these widely separated frequencies. One successful approach is to separate the coupling task across multiple waveguides, with a cutoff waveguide (a waveguide that does not support guided modes above a certain wavelength) being judiciously used to prevent unwanted excessive overcoupling at low frequencies. Here, we examine how such a cutoff waveguide can still induce parasitic loss in the coupling region of a microring resonator, thereby impacting nonlinear device performance. We verified this parasitic loss channel through both experiment and simulation, showing that a waveguide optimized for 532 nm (visible) and 780 nm (near-infrared), while nominally cut off at 1550 nm, can still introduce significant parasitic loss at telecom wavelengths. This is studied in the context of visible-telecom optical parametric oscillation, where the excess parasitic loss can be strong enough to prevent threshold from being reached. Our finding elucidates a major challenge for wideband integrated nonlinear photonics processes when efficient coupling of widely-separated frequencies is needed.

Submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.09659 [pdf, ps, other]

LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models

Authors: Long Chen, Xiaotian Song, Yanan Sun

Abstract: Spiking Large Language Models (LLMs) have emerged as an energy-efficient alternative to conventional LLMs through their event-driven computation. To effectively obtain spiking LLMs, researchers develop different ANN-to-SNN conversion methods by leveraging pre-trained ANN parameters while inheriting the energy efficiency of SNN. However, existing conversion methods struggle with extreme activation outliers and incompatible nonlinear operations of ANN-based LLMs. To address this, we propose a loss-less ANN-SNN conversion for fully spike-driven LLMs, termed LAS. Specifically, LAS introduces two novel neurons to convert the activation outlier and nonlinear operation of ANN-based LLMs. Moreover, LAS tailors the spike-equivalent Transformer components for spiking LLMs, which can ensure full spiking conversion without any loss of performance. Experimental results on six language models and two vision-language models demonstrate that LAS achieves loss-less conversion. Notably, on OPT-66B, LAS even improves accuracy by 2\% on the WSC task. In addition, the parameter and ablation studies further verify the effectiveness of LAS. The source code is available at https://github.com/lc783/LAS

Submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.09284 [pdf, ps, other]

Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations

Authors: Panqi Chen, Yifan Sun, Lei Cheng, Yang Yang, Weichang Li, Yang Liu, Weiqing Liu, Jiang Bian, Shikai Fang

Abstract: Modeling and reconstructing multidimensional physical dynamics from sparse and off-grid observations presents a fundamental challenge in scientific research. Recently, diffusion-based generative modeling shows promising potential for physical simulation. However, current approaches typically operate on on-grid data with preset spatiotemporal resolution, but struggle with the sparsely observed and continuous nature of real-world physical dynamics. To fill the gaps, we present SDIFT, Sequential DIffusion in Functional Tucker space, a novel framework that generates full-field evolution of physical dynamics from irregular sparse observations. SDIFT leverages the functional Tucker model as the latent space representer with proven universal approximation property, and represents observations as latent functions and Tucker core sequences. We then construct a sequential diffusion model with temporally augmented UNet in the functional Tucker space, denoising noise drawn from a Gaussian process to generate the sequence of core tensors. At the posterior sampling stage, we propose a Message-Passing Posterior Sampling mechanism, enabling conditional generation of the entire sequence guided by observations at limited time steps.
We validate SDIFT on three physical systems spanning astronomical (supernova explosions, light-year scale), environmental (ocean sound speed fields, kilometer scale), and molecular (organic liquid, millimeter scale) domains, demonstrating significant improvements in both reconstruction accuracy and computational efficiency compared to state-of-the-art approaches.

Submitted 24 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.08915 [pdf, ps, other]

An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models

Authors: Jialin Mao, Itay Griniasty, Yan Sun, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari

Abstract: Recent experiments have shown that training trajectories of multiple deep neural networks with different architectures, optimization algorithms, hyper-parameter settings, and regularization methods evolve on a remarkably low-dimensional "hyper-ribbon-like" manifold in the space of probability distributions. Inspired by the similarities in the training trajectories of deep networks and linear networks, we analytically characterize this phenomenon for the latter. We show, using tools in dynamical systems theory, that the geometry of this low-dimensional manifold is controlled by (i) the decay rate of the eigenvalues of the input correlation matrix of the training data, (ii) the relative scale of the ground-truth output to the weights at the beginning of training, and (iii) the number of steps of gradient descent. By analytically computing and bounding the contributions of these quantities, we characterize phase boundaries of the region where hyper-ribbons are to be expected. We also extend our analysis to kernel machines and linear models that are trained with stochastic gradient descent.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2505.08838 [pdf, ps, other]

Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts

Authors: Peixuan Ge, Tongkun Su, Faqin Lv, Baoliang Zhao, Peng Zhang, Chi Hong Wong, Liang Yao, Yu Sun, Zenan Wang, Pak Kin Wong, Ying Hu

Abstract: Ultrasound (US) report generation is a challenging task due to the variability of US images, operator dependence, and the need for standardized text. Unlike X-ray and CT, US imaging lacks consistent datasets, making automation difficult. In this study, we propose a unified framework for multi-organ and multilingual US report generation, integrating fragment-based multilingual training and leveraging the standardized nature of US reports. By aligning modular text fragments with diverse imaging data and curating a bilingual English-Chinese dataset, the method achieves consistent and clinically accurate text generation across organ sites and languages. Fine-tuning with selective unfreezing of the vision transformer (ViT) further improves text-image alignment. Compared to the previous state-of-the-art KMVE method, our approach achieves relative gains of about 2\% in BLEU scores, approximately 3\% in ROUGE-L, and about 15\% in CIDEr, while significantly reducing errors such as missing or incorrect content. By unifying multi-organ and multi-language report generation into a single, scalable framework, this work demonstrates strong potential for real-world clinical workflows.

Submitted 19 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

arXiv:2505.08583 [pdf]

Strain Induced Robust Skyrmion lattice at Room Temperature in van der Waals Ferromagnet

Authors: Xinyi Zhou, Iftikhar Ahmed Malik, Ruihuan Duan, Hanqing Shi, Chen Liu, Yan Luo, Yue Sun, Ruixi Chen, Yilin Liu, Shian Xia, Vanessa Li Zhang, Sheng Liu, Chao Zhu, Xixiang Zhang, Yi Du, Zheng Liu, Ting Yu

Abstract: Manipulating topological magnetic orders of two-dimensional (2D) magnets by strain, once achieved, offers enormous potential for future low-power flexible spintronic applications. In this work, by placing Fe3GaTe2 (FGaT), a room-temperature 2D ferromagnet, on a flexible substrate, we demonstrate a field-free and robust formation of skyrmion lattice induced by strain. By applying a minimal strain of ~0.80% to pre-annealed FGaT flakes, the Magnetic Force Microscopy (MFM) tip directly triggers the transition from maze-like domains to an ordered skyrmion lattice while scanning the sample surface. The skyrmion lattice is rather stable against extensive cyclic mechanical testing (stretching, bending, and twisting over 2000 cycles each). It also exhibited stability across a wide range of magnetic fields (~2.9 kOe) and temperatures (~323 K), as well as long-term retention stability, highlighting its robustness and field-free stabilization. The strain effect reduces the lattice symmetry and enhances the Dzyaloshinskii-Moriya interaction (DMI) of FGaT, thus stabilizing the skyrmion lattice. Our findings highlight the potential of FGaT for integrating magnetic skyrmions into future low-power-consumption flexible spintronics devices.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2505.08534 [pdf, ps, other]

Holographic geometry/real-space entanglement correspondence and metric reconstruction

Authors: Xuanting Ji, Xin-Xiang Ju, Ya-Wen Sun, Yuan-Tai Wang, He-Lin Zhou

Abstract: In holography, the boundary entanglement structure is believed to be encoded in the bulk geometry. In this work, we investigate the precise correspondence between the boundary real-space entanglement and the bulk geometry. By the boundary real-space entanglement, we refer to the conditional mutual information (CMI) for two infinitesimal subsystems separated by a distance $l$, and the corresponding bulk geometry is at a radial position $z_*$, namely the turning point of the entanglement wedge for a boundary region with a length scale $l$. In a generic geometry described by a given coordinate system, $z_*$ can be determined locally by $l$, while the exact expression for $z_*(l)$ depends on the gauge choice, reflecting the inherent nonlocality of this seemingly local correspondence. We propose to specify the function $z_*(l)$ as the criterion for a gauge choice, and with the specified gauge function, we verify the exact correspondence between the boundary real-space entanglement and the bulk geometry. Inspired by this correspondence, we propose a new method of bulk metric reconstruction from boundary entanglement data, namely the CMI reconstruction. In this CMI proposal, with the gauge fixed a priori by specifying $z_*(l)$, the bulk metric can be reconstructed from the relation between the bulk geometry and the boundary CMI. The CMI reconstruction method establishes a connection between the differential entropy prescription and Bilson's general algorithm for metric reconstruction.

Submitted 13 May, 2025; originally announced May 2025.

Comments: 30 pages, 12 figures

arXiv:2505.08295 [pdf, ps, other]

A Practical Introduction to Deep Reinforcement Learning

Authors: Yinghan Sun, Hongxi Wang, Hua Chen, Wei Zhang

Abstract: Deep reinforcement learning (DRL) has emerged as a powerful framework for solving sequential decision-making problems, achieving remarkable success in a wide range of applications, including game AI, autonomous driving, biomedicine, and large language models. However, the diversity of algorithms and the complexity of theoretical foundations often pose significant challenges for beginners seeking to enter the field. This tutorial aims to provide a concise, intuitive, and practical introduction to DRL, with a particular focus on the Proximal Policy Optimization (PPO) algorithm, which is one of the most widely used and effective DRL methods. To facilitate learning, we organize all algorithms under the Generalized Policy Iteration (GPI) framework, offering readers a unified and systematic perspective. Instead of lengthy theoretical proofs, we emphasize intuitive explanations, illustrative examples, and practical engineering techniques. This work serves as an efficient and accessible guide, helping readers rapidly progress from basic concepts to the implementation of advanced DRL algorithms.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2505.08289 [pdf, ps, other]

doi 10.1021/acsnano.4c17371

Nonlinear optical response in kagome lattice with inversion symmetry breaking

Authors: Xiangyang Liu, Junwen Lai, Jie Zhan, Tianye Yu, Peitao Liu, Seiji Yunoki, Xing-Qiu Chen, Yan Sun

Abstract: The kagome lattice is a fundamental model structure in condensed matter physics and materials science featuring symmetry-protected flat bands, saddle points, and Dirac points. This structure has emerged as an ideal platform for exploring various quantum physics. By combining effective model analysis and first-principles calculations, we propose that the synergy among inversion symmetry breaking, flat bands, and saddle point-related van Hove singularities within the kagome lattice holds significant potential for generating strong second-order nonlinear optical response. This property provides an inspiring insight into the practical application of the kagome-like materials, which is helpful for a comprehensive understanding of kagome lattice-related physics. Moreover, this work offers an alternative approach for designing materials with a strong second-order nonlinear optical response.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2505.08119 [pdf, ps, other]

doi 10.1145/3698205.3729544

Will Your Next Pair Programming Partner Be Human? An Empirical Evaluation of Generative AI as a Collaborative Teammate in a Semester-Long Classroom Setting

Authors: Wenhan Lyu, Yimeng Wang, Yifan Sun, Yixuan Zhang

Abstract: Generative AI (GenAI), especially Large Language Models (LLMs), is rapidly reshaping both programming workflows and computer science education. Many programmers now incorporate GenAI tools into their workflows, including for collaborative coding tasks such as pair programming. While prior research has demonstrated the benefits of traditional pair programming and begun to explore GenAI-assisted coding, the role of LLM-based tools as collaborators in pair programming remains underexamined. In this work, we conducted a mixed-methods study with 39 undergraduate students to examine how GenAI influences collaboration, learning, and performance in pair programming. Specifically, students completed six in-class assignments under three conditions: Traditional Pair Programming (PP), Pair Programming with GenAI (PAI), and Solo Programming with GenAI (SAI). They used both LLM-based inline completion tools (e.g., GitHub Copilot) and LLM-based conversational tools (e.g., ChatGPT). Our results show that students in PAI achieved the highest assignment scores, whereas those in SAI attained the lowest. Additionally, students' attitudes toward LLMs' programming capabilities improved significantly after collaborating with LLM-based tools, and preferences were largely shaped by the perceived usefulness for completing assignments and learning programming skills, as well as the quality of collaboration.
Our qualitative findings further reveal that while students appreciated LLM-based tools as valuable pair programming partners, they also identified limitations and had different expectations compared to human teammates. Our study provides one of the first empirical evaluations of GenAI as a pair programming collaborator through a comparison of three conditions (PP, PAI, and SAI). We also discuss the design implications and pedagogical considerations for future GenAI-assisted pair programming approaches.

Submitted 12 May, 2025; originally announced May 2025.

Comments: Accepted by Learning @ Scale 2025

arXiv:2505.07763 [pdf, other]

Gravitationally Bound Gas Determines Star Formation in the Galaxy

Authors: Sihan Jiao, Jingwen Wu, Zhi-Yu Zhang, Neal J. Evans II, Chao-Wei Tsai, Di Li, Hauyu Baobab Liu, Yong Shi, Junzhi Wang, Qizhou Zhang, Yuxin Lin, Linjing Feng, Xing Lu, Yan Sun, Hao Ruan, Fangyuan Deng

Abstract: Stars form from molecular gas under complex conditions influenced by multiple competing physical mechanisms, such as gravity, turbulence, and magnetic fields. However, accurately identifying the fraction of gas actively involved in star formation remains challenging. Using dust continuum observations from the Herschel Space Observatory, we derived column density maps and their associated probability distribution functions (N-PDFs). Assuming the power-law component in the N-PDFs corresponds to gravitationally bound (and thus star-forming) gas, we analyzed a diverse sample of molecular clouds spanning a wide range of mass and turbulence conditions. This sample included 21 molecular clouds from the solar neighborhood ($d<$500 pc) and 16 high-mass star-forming molecular clouds. For these two groups, we employed the counts of young stellar objects (YSOs) and mid-/far-infrared luminosities as proxies for star formation rates (SFR), respectively. Both groups revealed a tight linear correlation between the mass of gravitationally bound gas and the SFR, suggesting a universally constant star formation efficiency in the gravitationally bound gas phase. The star-forming gas mass derived from threshold column densities ($N_{\mbox {threshold}}$) varies from cloud to cloud and is widely distributed over the range of $\sim$1--17$\times$10$^{21}$ cm$^{-2}$ based on N-PDF analysis. But in solar neighborhood clouds, it is in rough consistency with the traditional approach using $A_{\rm V}$ $\ge$ 8 mag.
In contrast, in highly turbulent regions (e.g., the Central Molecular Zone) where the classical approach fails, the gravitationally bound gas mass and SFR still follow the same correlation as other high-mass star-forming regions in the Milky Way. Our findings also strongly support the interpretation that gas in the power-law component of the N-PDF is undergoing self-gravitational collapse to form stars.

Submitted 12 May, 2025; originally announced May 2025.

Comments: 23 pages, 17 figures. Submitted to A&A

arXiv:2505.07674 [pdf]

Joint Graph Convolution and Sequential Modeling for Scalable Network Traffic Estimation

Authors: Nan Jiang, Wenxuan Zhu, Xu Han, Weiqiang Huang, Yumeng Sun

Abstract: This study focuses on the challenge of predicting network traffic within complex topological environments. It introduces a spatiotemporal modeling approach that integrates Graph Convolutional Networks (GCN) with Gated Recurrent Units (GRU). The GCN component captures spatial dependencies among network nodes, while the GRU component models the temporal evolution of traffic data. This combination allows for precise forecasting of future traffic patterns. The effectiveness of the proposed model is validated through comprehensive experiments on the real-world Abilene network traffic dataset. The model is benchmarked against several popular deep learning methods. Furthermore, a set of ablation experiments is conducted to examine the influence of various components on performance, including changes in the number of graph convolution layers, different temporal modeling strategies, and methods for constructing the adjacency matrix. Results indicate that the proposed approach achieves superior performance across multiple metrics, demonstrating robust stability and strong generalization capabilities in complex network traffic forecasting scenarios.

Submitted 12 May, 2025; originally announced May 2025.

arXiv:2505.07546 [pdf, ps, other]

GRADA: Graph-based Reranker against Adversarial Documents Attack

Authors: Jingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, Qiongkai Xu

Abstract: Retrieval Augmented Generation (RAG) frameworks improve the accuracy of large language models (LLMs) by integrating external knowledge from retrieved documents, thereby overcoming the limitations of models' static intrinsic knowledge. However, these systems are susceptible to adversarial attacks that manipulate the retrieval process by introducing documents that are adversarial yet semantically similar to the query. Notably, while these adversarial documents resemble the query, they exhibit weak similarity to benign documents in the retrieval set. Thus, we propose a simple yet effective Graph-based Reranking against Adversarial Document Attacks (GRADA) framework aiming at preserving retrieval quality while significantly reducing the success of adversaries. Our study evaluates the effectiveness of our approach through experiments conducted on five LLMs: GPT-3.5-Turbo, GPT-4o, Llama3.1-8b, Llama3.1-70b, and Qwen2.5-7b. We use three datasets to assess performance, with results from the Natural Questions dataset demonstrating up to an 80% reduction in attack success rates while maintaining minimal loss in accuracy.

Submitted 12 May, 2025; originally announced May 2025.

arXiv:2505.07396 [pdf, ps, other]

TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset

Authors: Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath, Michael Greza, Qilin Zhang, Jingwei Zhu, Thomas Froech, Medhini Heeramaglore, Ihab Hijazi, Khaoula Kanna, Mathias Pechinger, Zhaiyu Chen, Yao Sun, Alejandro Rueda Segura, Ziyang Xu, Omar AbdelGafar, Mansour Mehranfar, Chandan Yeshwanth, Yueh-Cheng Liu, Hadi Yazdi, Jiapan Wang, Stefan Auer, Katharina Anders, Klaus Bogenberger, Andre Borrmann, et al. (9 additional authors not shown)

Abstract: Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, maintaining models' updates, and ensuring seamless interoperability to downstream tasks. Current datasets are usually limited to one part of the processing chain, hampering comprehensive UDTs validation. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations boasting 32 data subsets over roughly 100,000 $m^2$ and currently 767 GB of data. By ensuring georeferenced indoor-outdoor acquisition, high accuracy, and multimodal data integration, the benchmark supports robust analysis of sensors and the development of advanced reconstruction methods. Additionally, we explore downstream tasks demonstrating the potential of TUM2TWIN, including novel view synthesis of NeRF and Gaussian Splatting, solar potential analysis, point cloud semantic segmentation, and LoD3 building reconstruction. We are convinced this contribution lays a foundation for overcoming current limitations in UDT creation, fostering new research directions and practical solutions for smarter, data-driven urban environments. The project is available under: https://tum2t.win

Submitted 13 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

Comments: Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing

arXiv:2505.07276 [pdf, ps, other]

FCPCA: Fuzzy clustering of high-dimensional time series based on common principal component analysis

Authors: Ziling Ma, Ángel López-Oriona, Hernando Ombao, Ying Sun

Abstract: Clustering multivariate time series data is a crucial task in many domains, as it enables the identification of meaningful patterns and groups in time-evolving data. Traditional approaches, such as crisp clustering, rely on the assumption that clusters are sufficiently separated with little overlap. However, real-world data often defy this assumption, exhibiting overlapping distributions or overlapping clouds of points and blurred boundaries between clusters. Fuzzy clustering offers a compelling alternative by allowing partial membership in multiple clusters, making it well-suited for these ambiguous scenarios. Despite its advantages, current fuzzy clustering methods primarily focus on univariate time series, and for multivariate cases, even datasets of moderate dimensionality become computationally prohibitive. This challenge is further exacerbated when dealing with time series of varying lengths, leaving a clear gap in addressing the complexities of modern datasets. This work introduces a novel fuzzy clustering approach based on common principal component analysis to address the aforementioned shortcomings. Our method has the advantage of efficiently handling high-dimensional multivariate time series by reducing dimensionality while preserving critical temporal features. Extensive numerical results show that our proposed clustering method outperforms several existing approaches in the literature.
An interesting application involving brain signals from different drivers recorded from a simulated driving experiment illustrates the potential of the approach.

Submitted 12 May, 2025; originally announced May 2025.

arXiv:2505.06924 [pdf]

Spontaneous Enhancement of Dzyaloshinskii-Moriya Interaction via Field-Cooling-Induced Interface Engineering in 2D van der Waals Ferromagnetic ternary Tellurides

Authors: Shian Xia, Yan Luo, Iftikhar Ahmed Malik, Xinyi Zhou, Keying Han, Yue Sun, Haoyun Lin, Hanqing Shi, Yingchun Cheng, Vanessa Li Zhang, Yi Du, Sheng Liu, Chao Zhu, Ting Yu

Abstract: The emergence of two-dimensional (2D) van der Waals (vdW) ferromagnets has opened new avenues for exploring topological spin textures and their applications in next-generation spintronics. Among these materials, Fe3GaTe2 (FGaT) emerges as a model system due to its room-temperature skyrmion phases, which are stabilized by strong Dzyaloshinskii-Moriya interaction (DMI). However, the atomistic origins of DMI in centrosymmetric vdW lattices remain elusive. Here, we report a spontaneous DMI enhancement mechanism driven by field cooling (FC) in FGaT and its analog Fe3GeTe2 (FGeT). Combining Raman spectroscopy and scanning transmission electron microscopy (STEM), we have observed the irreversible precipitation of FeTe2 in annealed FGaT. The resulting FeTe2/FGaT heterostructure is considered to break the symmetry and significantly enhance the DMI. Furthermore, a similar phenomenon has been observed in the family ferromagnetic material FGeT as well. Additionally, the precipitation of FeTe2 varies significantly with different thicknesses of FGaT, aligning closely with the reported behavior of skyrmions. This discovery provides new insights into the mechanisms behind the origin of the DMI in ternary tellurides, paving the way for advanced spintronic applications.

Submitted 17 May, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

arXiv:2505.06896 [pdf, ps, other]

RCOMPSs: A Scalable Runtime System for R Code Execution on Manycore Systems

Authors: Xiran Zhang, Javier Conejero, Sameh Abdulah, Jorge Ejarque, Ying Sun, Rosa M. Badia, David E. Keyes, Marc G. Genton

Abstract: R has become a cornerstone of scientific and statistical computing due to its extensive package ecosystem, expressive syntax, and strong support for reproducible analysis. However, as data sizes and computational demands grow, native R parallelism support remains limited. This paper presents RCOMPSs, a scalable runtime system that enables efficient parallel execution of R applications on multicore and manycore systems. RCOMPSs adopts a dynamic, task-based programming model, allowing users to write code in a sequential style, while the runtime automatically handles asynchronous task execution, dependency tracking, and scheduling across available resources. We present RCOMPSs using three representative data analysis algorithms, i.e., K-nearest neighbors (KNN) classification, K-means clustering, and linear regression, and evaluate their performance on two modern HPC systems: KAUST Shaheen-III and Barcelona Supercomputing Center (BSC) MareNostrum 5. Experimental results reveal that RCOMPSs demonstrates both strong and weak scalability on up to 128 cores per node and across 32 nodes. For KNN and K-means, parallel efficiency remains above 70% in most settings, while linear regression maintains acceptable performance under shared and distributed memory configurations despite its deeper task dependencies. Overall, RCOMPSs significantly enhances the parallel capabilities of R with minimal, automated, and runtime-aware user intervention, making it a practical solution for large-scale data analytics in high-performance environments.

Submitted 11 May, 2025; originally announced May 2025.

arXiv:2505.06865 [pdf, ps, other]

An ultrastable hard x-ray attosecond split-delay line

Authors: Yanwen Sun, Haoyuan Li, Yoshio Ichii, Diling Zhu

Abstract: We present a novel split-delay line design for generating hard x-ray attosecond pulse pairs. The design introduces an unconventional delay adjustment mechanism, in which rotation of an x-ray mirror pair adjusts the path length differential between two beam paths. The exit beam pointing stability is guaranteed by the mirror-pair self-compensating geometry, thereby enabling stable continuous delay adjustments. We present a parameter study for this concept covering 5-11 keV photon energies with high efficiency over a delay window of 20 femtoseconds with sub-20-attosecond scanning resolution. Wavefront simulations incorporating realistic mirror parameters demonstrate that the system achieves high throughput and is capable of delivering high-peak-intensity pulses. The attosecond x-ray pump x-ray probe capability enabled by such a delay line is poised to unlock a wide range of hard x-ray nonlinear spectroscopy measurements at sub-femtosecond timescales for the first time.

Submitted 11 May, 2025; originally announced May 2025.

Showing 201–250 of 6,538 results for author: Suen, Y