Search | arXiv e-print repository

Fast Transitions of X-ray Variability in the Neutron Star Low Mass X-ray Binary Cygnus X-2

Authors: Liang Zhang, Mariano Méndez, Hua Feng, Diego Altamirano, Zi-xu Yang, Qing-chang Zhao, Shuang-nan Zhang, Lian Tao, Yue Huang, Xiang Ma, Shu-mei Jia, Ming-yu Ge, Li-ming Song, Jin-lu Qu, Shu Zhang

Abstract: We present a spectral-timing analysis of two NICER observations of the weakly magnetized neutron star low-mass X-ray binary Cygnus X-2. During these observations, we detect a rapid transition from a narrow 50-Hz horizontal-branch oscillation (HBO) to a broad 5-Hz normal-branch oscillation (NBO), accompanied by an increase in source flux and a decrease in spectral hardness. Thanks to the large effective area of NICER, we are able to conduct a detailed comparison of the spectra associated with different types of quasi-periodic oscillations (QPOs) on short timescales. By fitting the spectra with a model that includes a disc and Comptonization components plus two emission lines, we find that the parameters of the disc component do not change significantly during the transition. However, assuming a fixed electron temperature, the optical depth of the Comptonization component decreases significantly. This drop in optical depth may be attributed to the expansion of the boundary layer or spreading layer. In addition, we find that the rms spectra for both the HBO and NBO are hard, suggesting that the boundary layer or spreading layer is driving the variability. We discuss the potential physical origin of the different types of QPOs.

Submitted 16 June, 2025; originally announced June 2025.

Comments: 12 pages, 7 figures, accepted for publication in ApJ

arXiv:2506.13415 [pdf, other]

Simple is what you need for efficient and accurate medical image segmentation

Authors: Xiang Yu, Yayan Chen, Guannan He, Qing Zeng, Yue Qin, Meiling Liang, Dandan Luo, Yimei Liao, Zeyu Ren, Cheng Kang, Delong Yang, Bocheng Liang, Bin Pu, Ying Yuan, Shengli Li

Abstract: While modern segmentation models often prioritize performance over practicality, we advocate a design philosophy that prioritizes simplicity and efficiency while still aiming for high segmentation performance. This paper presents SimpleUNet, a scalable ultra-lightweight medical image segmentation model with three key innovations: (1) a partial feature selection mechanism in skip connections for redundancy reduction while enhancing segmentation performance; (2) a fixed-width architecture that prevents exponential parameter growth across network stages; (3) an adaptive feature fusion module achieving enhanced representation with minimal computational overhead. With a record-breaking 16 KB parameter configuration, SimpleUNet outperforms LBUNet and other lightweight benchmarks across multiple public datasets. The 0.67 MB variant achieves superior efficiency (8.60 GFLOPs) and accuracy, attaining a mean DSC/IoU of 85.76%/75.60% on multi-center breast lesion datasets, surpassing both U-Net and TransUNet. Evaluations on skin lesion datasets (ISIC 2017/2018: mDice 84.86%/88.77%) and endoscopic polyp segmentation (KVASIR-SEG: 86.46%/76.48% mDice/mIoU) confirm consistent dominance over state-of-the-art models. This work demonstrates that extreme model compression need not compromise performance, providing new insights for efficient and accurate medical image segmentation. Code is available at https://github.com/Frankyu5666666/SimpleUNet.

Submitted 16 June, 2025; originally announced June 2025.

Comments: 15 pages, 11 figures

ACM Class: I.4.6
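
Note on the metrics quoted in the abstract above: DSC (Dice similarity coefficient) and IoU (intersection over union) are standard overlap scores between a predicted mask and a reference mask. The following is a minimal sketch of their textbook definitions, not code from the SimpleUNet repository; the function name `dice_and_iou` and the toy masks are made up for illustration.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks.

    Masks are flat sequences of 0/1 values (hypothetical toy format).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Convention chosen here: two empty masks agree perfectly.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example: two 6-pixel masks that overlap on 2 pixels.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single image the two scores are tied by Dice = 2 IoU / (1 + IoU), so Dice is never smaller than IoU. Averages over a dataset, such as the mean DSC/IoU pairs in the abstract, need not satisfy that identity exactly.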

arXiv:2506.13402 [pdf, ps, other]

A Dynamic Relaxation Framework for Global Solution of ACOPF

Authors: Yu-Yang Tang, Liang Chen, Sheng-Jie Chen, Yu-Hong Dai, Bo Zhou, Xiaomeng Ai

Abstract: Solving the Alternating Current Optimal Power Flow (AC OPF) problem to global optimality remains challenging due to its nonconvex quadratic constraints. In this paper, we present a unified framework that combines static piecewise relaxations with a dynamic cut-generation mechanism to systematically tighten the classic Second-Order Cone Programming (SOCP) relaxation to arbitrarily small conic violation, thus enabling the recovery of globally optimal solutions. Two static formulations, Pyramidal Relaxation (PR) and Quasi-Pyramidal Relaxation (QPR), are introduced to tighten each branch-flow second-order cone via a finite union of wedges, providing controllable accuracy. Their dynamic counterparts, Dynamic PR (DPR) and Dynamic QPR (DQPR), embed on-the-fly cut generation within a branch-and-cut solver to improve scalability. Convergence is further accelerated through warm starts and lightweight local-search post-processing. Extensive experiments on benchmarks demonstrate effective elimination of conic violations and flexible trade-offs between solution accuracy and runtime. Practical guidelines are derived for selecting appropriate variants based on network size and accuracy requirements.

Submitted 16 June, 2025; originally announced June 2025.

Comments: Full version of a submission to IEEE Transactions on Power Systems. Includes all proofs and algorithm pseudocode

arXiv:2506.13334 [pdf, ps, other]

Measurement of the $Ω_c^0$ and $Ξ_c^0$ baryon lifetimes using hadronic $b$-baryon decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1141 additional authors not shown)

Abstract: The lifetimes of the $Ω_c^0$ and $Ξ_c^0$ baryons are measured using a $pp$ collision dataset collected by the LHCb experiment, corresponding to an integrated luminosity of $9~\rm{fb^{-1}}$. The charm baryons are produced in the fully reconstructed decay chains $Ω_b^- \rightarrow Ω_c^0 (\rightarrow pK^-K^-π^+)~π^-$ and $Ξ_b^- \rightarrow Ξ_c^0 (\rightarrow pK^-K^-π^+)~π^-$. The measurement uses topologically and kinematically similar $B^- \rightarrow D^0(\rightarrow K^-K^+π^-π^+)~π^-$ decays for normalisation. The measured lifetimes are $τ_{Ω_c^0} = 276.3 \pm 19.4~\rm{(stat)} \pm 1.8~\rm{(syst)} \pm 0.7~(τ_{D^0})~\rm{fs}$ and $τ_{Ξ_c^0} = 149.2 \pm 2.5~\rm{(stat)} \pm 0.9~\rm{(syst)} \pm 0.4~(τ_{D^0})~\rm{fs}$, where the first uncertainty is statistical, the second systematic and the third due to the uncertainty of the $D^0$ lifetime. These results are consistent with previous measurements performed by the LHCb experiment.

Submitted 16 June, 2025; originally announced June 2025.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3875/ (LHCb public pages)

Report number: LHCb-PAPER-2025-013,CERN-EP-2025-117
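
Note on the three-part uncertainties quoted in the abstract above: readers often want a single total uncertainty per lifetime. A common convention is to add the components in quadrature. That is valid only if the components are uncorrelated, which is an assumption made here and not a statement from the paper. The helper name `total_uncertainty` is illustrative.

```python
import math


def total_uncertainty(*components):
    """Combine uncertainty components in quadrature.

    Assumes the components are independent (uncorrelated); this is an
    assumption of this sketch, not something stated in the abstract.
    """
    return math.sqrt(sum(c * c for c in components))


# Values in femtoseconds, copied from the abstract: (stat, syst, tau_D0).
omega_c_total = total_uncertainty(19.4, 1.8, 0.7)
xi_c_total = total_uncertainty(2.5, 0.9, 0.4)

print(f"Omega_c^0: 276.3 +/- {omega_c_total:.1f} fs (combined)")
print(f"Xi_c^0:    149.2 +/- {xi_c_total:.1f} fs (combined)")
```

Under that assumption the statistical term dominates both totals, so the combined values stay close to the statistical uncertainties alone.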

arXiv:2506.13274 [pdf, ps, other]

AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining

Authors: Hongyuan Dong, Dingkang Yang, Xiao Liang, Chao Feng, Jiao Ran

Abstract: Learning rate is widely regarded as crucial for effective foundation model pretraining. Recent research explores and demonstrates the transferability of learning rate configurations across varying model and dataset sizes, etc. Nevertheless, these approaches are constrained to specific training scenarios and typically necessitate extensive hyperparameter tuning on proxy models. In this work, we propose AdaLRS, a plug-and-play adaptive learning rate search algorithm that conducts online optimal learning rate search via optimizing loss descent velocities. We provide experimental results to show that the optimization of training loss and loss descent velocity in foundation model pretraining are both convex and share the same optimal learning rate. Relying solely on training loss dynamics, AdaLRS involves few extra computations to guide the search process, and its convergence is guaranteed via theoretical analysis. Experiments on both LLM and VLM pretraining show that AdaLRS adjusts suboptimal learning rates to the neighborhood of the optimum with marked efficiency and effectiveness, with model performance improved accordingly. We also show the robust generalizability of AdaLRS across varying training scenarios, such as different model sizes, training paradigms, and base learning rate scheduler choices.

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.13205 [pdf, ps, other]

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

Authors: Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang

Abstract: With the growing integration of vision-language models (VLMs), mobile agents are now widely used for tasks like UI automation and camera-based user assistance. These agents are often fine-tuned on limited user-generated datasets, leaving them vulnerable to covert threats during the training process. In this work we present GHOST, the first clean-label backdoor attack specifically designed for mobile agents built upon VLMs. Our method manipulates only the visual inputs of a portion of the training samples - without altering their corresponding labels or instructions - thereby injecting malicious behaviors into the model. Once fine-tuned with this tampered data, the agent will exhibit attacker-controlled responses when a specific visual trigger is introduced at inference time. The core of our approach lies in aligning the gradients of poisoned samples with those of a chosen target instance, embedding backdoor-relevant features into the poisoned training data. To maintain stealth and enhance robustness, we develop three realistic visual triggers: static visual patches, dynamic motion cues, and subtle low-opacity overlays. We evaluate our method across six real-world Android apps and three VLM architectures adapted for mobile use. Results show that our attack achieves high attack success rates (up to 94.67 percent) while maintaining high clean-task performance (FSR up to 95.85 percent). Additionally, ablation studies shed light on how various design choices affect the efficacy and concealment of the attack.
Overall, this work is the first to expose critical security flaws in VLM-based mobile agents, highlighting their susceptibility to clean-label backdoor attacks and the urgent need for effective defense mechanisms in their training pipelines.

Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

Comments: 12 pages

arXiv:2506.13201 [pdf, ps, other]

A Comprehensive Survey on Deep Learning Solutions for 3D Flood Mapping

Authors: Wenfeng Jia, Bin Liang, Yuxi Liu, Muhammad Arif Khan, Lihong Zheng

Abstract: Flooding remains a major global challenge, worsened by climate change and urbanization, demanding advanced solutions for effective disaster management. While traditional 2D flood mapping techniques provide limited insights, 3D flood mapping, powered by deep learning (DL), offers enhanced capabilities by integrating flood extent and depth. This paper presents a comprehensive survey of deep learning-based 3D flood mapping, emphasizing its advancements over 2D maps by integrating flood extent and depth for effective disaster management and urban planning. The survey categorizes deep learning techniques into task decomposition and end-to-end approaches, applicable to both static and dynamic flood features. We compare key DL architectures, highlighting their respective roles in enhancing prediction accuracy and computational efficiency. Additionally, this work explores diverse data sources such as digital elevation models, satellite imagery, rainfall, and simulated data, outlining their roles in 3D flood mapping. The applications reviewed range from real-time flood prediction to long-term urban planning and risk assessment. However, significant challenges persist, including data scarcity, model interpretability, and integration with traditional hydrodynamic models. This survey concludes by suggesting future directions to address these limitations, focusing on enhanced datasets, improved models, and policy implications for flood management.
This survey aims to guide researchers and practitioners in leveraging DL techniques for more robust and reliable 3D flood mapping, fostering improved flood management strategies.

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.13127 [pdf, ps, other]

I$^2$S-TFCKD: Intra-Inter Set Knowledge Distillation with Time-Frequency Calibration for Speech Enhancement

Authors: Jiaming Cheng, Ruiyu Liang, Chao Xu, Ye Ni, Wei Zhou, Björn W. Schuller, Xiaoshuai Hao

Abstract: In recent years, complexity compression of neural network (NN)-based speech enhancement (SE) models has gradually attracted the attention of researchers, especially in scenarios with limited hardware resources or strict latency requirements. The main difficulties and challenges lie in achieving a balance between complexity and performance according to the characteristics of the task. In this paper, we propose an intra-inter set knowledge distillation (KD) framework with time-frequency calibration (I$^2$S-TFCKD) for SE. Unlike previous distillation strategies for SE, the proposed framework fully utilizes the time-frequency differential information of speech while promoting global knowledge flow. Firstly, we propose a multi-layer interactive distillation based on dual-stream time-frequency cross-calibration, which calculates the teacher-student similarity calibration weights in the time and frequency domains respectively and performs cross-weighting, thus enabling refined allocation of distillation contributions across different layers according to speech characteristics. Secondly, we construct a collaborative distillation paradigm for intra-set and inter-set correlations. Within a correlated set, multi-layer teacher-student features are pairwise matched for calibrated distillation. Subsequently, we generate representative features from each correlated set through residual fusion to form the fused feature set that enables inter-set knowledge interaction.
The proposed distillation strategy is applied to the dual-path dilated convolutional recurrent network (DPDCRN) that ranked first in the SE track of the L3DAS23 challenge. Objective evaluations demonstrate that the proposed KD strategy consistently and effectively improves the performance of the low-complexity student model and outperforms other distillation schemes.

Submitted 16 June, 2025; originally announced June 2025.

Comments: submitted to IEEE Transactions on Neural Networks and Learning Systems

arXiv:2506.13094 [pdf, ps, other]

MorphSAM: Learning the Morphological Prompts from Atlases for Spine Image Segmentation

Authors: Dingwei Fan, Junyong Zhao, Chunlin Li, Xinlong Wang, Ronghan Zhang, Mingliang Wang, Qi Zhu, Haipeng Si, Daoqiang Zhang, Liang Sun

Abstract: Spine image segmentation is crucial for clinical diagnosis and treatment of spine diseases. The complex structure of the spine and the high morphological similarity between individual vertebrae and adjacent intervertebral discs make accurate spine segmentation a challenging task. Although the Segment Anything Model (SAM) has been developed, it still struggles to effectively capture and utilize morphological information, limiting its ability to enhance spine image segmentation performance. To address these challenges, in this paper, we propose MorphSAM, which explicitly learns morphological information from atlases, thereby strengthening the spine image segmentation performance of SAM. Specifically, MorphSAM includes two fully automatic prompt learning networks: 1) an anatomical prompt learning network that directly learns morphological information from anatomical atlases, and 2) a semantic prompt learning network that derives morphological information from text descriptions converted from the atlases. Then, the two learned morphological prompts are fed into the SAM model to boost the segmentation performance. We validate MorphSAM on two spine image segmentation tasks: a spine anatomical structure segmentation task with CT images and a lumbosacral plexus segmentation task with MR images. Experimental results demonstrate that MorphSAM achieves superior segmentation performance when compared to the state-of-the-art methods.

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.13093 [pdf, ps, other]

Acceleration and Collimation of the Two-Sided Jets in the Nearby Low-luminosity Active Galactic Nucleus NGC 4261 (3C 270)

Authors: Xi Yan, Lang Cui, Kazuhiro Hada, Sandor Frey, Ru-sen Lu, Liang Chen, Wancheng Xu, Elika P. Fariyanto, Luis C. Ho

Abstract: We study the acceleration and collimation of the two-sided jets in the nearby low-luminosity active galactic nucleus NGC 4261 (3C 270) using archival multifrequency, multi-epoch Very Long Baseline Array data. By applying multiple analysis methods and incorporating results from the literature, we robustly identify a parabolic-to-conical structural transition in both the jet and counterjet, with the transition occurring at $(1.23\pm0.24)\,$pc or $(8.1\pm1.6)\times10^3\,R_{\rm s}$ (Schwarzschild radii) for the jet and $(0.97\pm0.29)\,$pc or $(6.4\pm1.9)\times10^3\,R_{\rm s}$ for the counterjet. Assuming that the brightness asymmetry between the twin jets is primarily due to relativistic Doppler (de)boosting, we derive the jet velocity field at distances of $\sim (10^3-2\times10^4)\,R_{\rm s}$ based on the jet-to-counterjet brightness ratio and spectral index. Although local kinematic variations are present, the jet shows an overall acceleration to relativistic speeds from $\sim 10^3$ to $\sim8\times10^3\,R_{\rm s}$, with a maximum Lorentz factor of $Γ_{\rm max} \approx 2.6$. Beyond this region, the jet gradually decelerates to sub-relativistic speeds. These results support the existence of a (sub)parsec-scale ($\lesssim 1.5\,$pc) acceleration and collimation zone (ACZ) in NGC 4261, where the jet is accelerated via magnetic-to-kinetic energy conversion while being confined by external pressure. A brief comparison with M 87 suggests that the ACZ in NGC 4261 may represent a scaled-down analogue of that in M 87.

Submitted 16 June, 2025; originally announced June 2025.

Comments: A revision and a similar version will appear in ApJ. Comments are welcome
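
Note on the velocity estimate described in the abstract above: for intrinsically identical twin jets, a widely used textbook relation for a continuous jet is R = ((1 + β cos θ) / (1 − β cos θ))^(2 − α), with R the jet-to-counterjet brightness ratio, β = v/c, θ the viewing angle and α the spectral index (S ∝ ν^α). The sketch below inverts that relation. Whether the paper uses exactly this form is an assumption, and the function names and numerical inputs are illustrative only.

```python
import math


def beta_from_brightness_ratio(ratio, alpha, viewing_angle_deg):
    """Invert R = ((1 + b cos t) / (1 - b cos t))**(2 - alpha) for b = v/c.

    Assumes a continuous jet, identical intrinsic jet and counterjet,
    and the spectral-index convention S ~ nu**alpha.
    """
    x = ratio ** (1.0 / (2.0 - alpha))
    beta_cos_theta = (x - 1.0) / (x + 1.0)
    beta = beta_cos_theta / math.cos(math.radians(viewing_angle_deg))
    if not 0.0 <= beta < 1.0:
        raise ValueError("inputs are inconsistent with a speed below c")
    return beta


def lorentz_factor(beta):
    """Bulk Lorentz factor for a speed beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


# Illustrative inputs only; none of these numbers come from the paper.
beta = beta_from_brightness_ratio(ratio=10.0, alpha=-0.5, viewing_angle_deg=60.0)
print(f"beta = {beta:.3f}, Gamma = {lorentz_factor(beta):.2f}")
```

For scale, a Lorentz factor of 2.6, the maximum quoted in the abstract, corresponds to β = 12/13, or about 0.92.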

arXiv:2506.13050 [pdf, ps, other]

NeuVAS: Neural Implicit Surfaces for Variational Shape Modeling

Authors: Pengfei Wang, Qiujie Dong, Fangtian Liang, Hao Pan, Lei Yang, Congyi Zhang, Guying Lin, Caiming Zhang, Yuanfeng Zhou, Changhe Tu, Shiqing Xin, Alla Sheffer, Xin Li, Wenping Wang

Abstract: Neural implicit shape representation has drawn significant attention in recent years due to its smoothness, differentiability, and topological flexibility. However, directly modeling the shape of a neural implicit surface, especially as the zero-level set of a neural signed distance function (SDF), with sparse geometric control is still a challenging task. Sparse input shape control typically includes 3D curve networks or, more generally, 3D curve sketches, which are unstructured, cannot be connected to form a curve network, and are therefore more difficult to handle. While 3D curve networks or curve sketches provide intuitive shape control, their sparsity and varied topology pose challenges in generating high-quality surfaces to meet such curve constraints. In this paper, we propose NeuVAS, a variational approach to shape modeling using neural implicit surfaces constrained under sparse input shape control, including unstructured 3D curve sketches as well as connected 3D curve networks. Specifically, we introduce a smoothness term based on a functional of surface curvatures to minimize shape variation of the zero-level set surface of a neural SDF. We also develop a new technique to faithfully model G0 sharp feature curves as specified in the input curve sketches. Comprehensive comparisons with the state-of-the-art methods demonstrate the significant advantages of our method.

Submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.13035 [pdf, ps, other]

Probing Dark Matter's Gravitational Effects Locally with TianQin

Authors: Zheng-Cheng Liang, Fa-Peng Huang, Xuefeng Zhang, Yi-Ming Hu

Abstract: In this study, we explore the potential of using TianQin missions to probe the local gravitational effects of dark matter. The TianQin project plans to launch satellites at both low and high orbits. High-precision orbit determination is expected to assist in the detection of the Earth's gravity or of gravitational waves. By comparing the derived masses in low and high orbits, it is possible to constrain the amount of dark matter between the two spheres, hence placing a local constraint on dark matter's gravitational effect. Our results show the capability of TianQin in detecting the density of dark matter around Earth, with an ultimate sensitivity to a value of $10^{-8}\,\,{\rm kg\,\,m^{-3}}$. This detection limit surpasses the estimated bounds for the solar system and the observation results for our Galaxy by approximately 7 and 14 orders of magnitude, respectively.

Submitted 15 June, 2025; originally announced June 2025.

Comments: 5 pages, 1 figure
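
Note on the mass-comparison idea in the abstract above: if each orbit yields an enclosed mass through Kepler's third law, the difference between the two masses divided by the volume of the shell between the orbits gives a mean density. The sketch below illustrates that arithmetic only. It is not the paper's analysis; the function names are made up, and the two orbital radii are round illustrative numbers, not TianQin mission parameters.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def enclosed_mass(semi_major_axis_m, period_s):
    """Mass inside an orbit from Kepler's third law: M = 4 pi^2 a^3 / (G T^2)."""
    return 4.0 * math.pi ** 2 * semi_major_axis_m ** 3 / (G * period_s ** 2)


def shell_density(mass_low, mass_high, r_low, r_high):
    """Mean density of the spherical shell between radii r_low and r_high."""
    volume = 4.0 / 3.0 * math.pi * (r_high ** 3 - r_low ** 3)
    return (mass_high - mass_low) / volume


# Illustrative radii (assumptions, not mission values).
r_low, r_high = 7.0e6, 1.0e8  # metres
shell_volume = 4.0 / 3.0 * math.pi * (r_high ** 3 - r_low ** 3)

# Extra mass that a density of 1e-8 kg/m^3 would place in this shell.
extra_mass = 1.0e-8 * shell_volume
print(f"extra mass in shell: {extra_mass:.3e} kg")
```

With these illustrative radii, a density of 10^-8 kg m^-3 amounts to roughly 4 × 10^16 kg in the shell, a fraction of order 10^-8 of Earth's mass. That gives a sense of how precisely the two enclosed masses would need to be compared.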

arXiv:2506.13021 [pdf, ps, other]

C-TLSAN: Content-Enhanced Time-Aware Long- and Short-Term Attention Network for Personalized Recommendation

Authors: Siqi Liang, Yudi Zhang, Yubo Wang

Abstract: Sequential recommender systems aim to model users' evolving preferences by capturing patterns in their historical interactions. Recent advances in this area have leveraged deep neural networks and attention mechanisms to effectively represent sequential behaviors and time-sensitive interests. In this work, we propose C-TLSAN (Content-Enhanced Time-Aware Long- and Short-Term Attention Network), an extension of the TLSAN architecture that jointly models long- and short-term user preferences while incorporating semantic content associated with items, such as product descriptions. C-TLSAN enriches the recommendation pipeline by embedding textual content linked to users' historical interactions directly into both long-term and short-term attention layers. This allows the model to learn from both behavioral patterns and rich item content, enhancing user and item representations across temporal dimensions. By fusing sequential signals with textual semantics, our approach improves the expressiveness and personalization capacity of recommendation systems. We conduct extensive experiments on large-scale Amazon datasets, benchmarking C-TLSAN against state-of-the-art baselines, including recent sequential recommenders based on Large Language Models (LLMs), which represent interaction history and predictions in text form. Empirical results demonstrate that C-TLSAN consistently outperforms strong baselines in next-item prediction tasks.
Notably, it improves AUC by 1.66%, Recall@10 by 93.99%, and Precision@10 by 94.80% on average over the best-performing baseline (TLSAN) across 10 Amazon product categories. These results highlight the value of integrating content-aware enhancements into temporal modeling frameworks for sequential recommendation. Our code is available at https://github.com/booml247/cTLSAN.

Submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.12909 [pdf, ps, other]

SciDA: Scientific Dynamic Assessor of LLMs

Authors: Junting Zhou, Tingjia Miao, Yiyan Liao, Qichao Wang, Zhoufutu Wen, Yanqin Wang, Yunjie Huang, Ge Yan, Leqi Wang, Yucheng Xia, Hongwan Gao, Yuansong Zeng, Renjie Zheng, Chen Dun, Yitao Liang, Tong Yang, Wenhao Huang, Ge Zhang

Abstract: Advances in the reasoning capabilities of Large Language Models (LLMs) enable them to solve scientific problems with enhanced efficacy. A high-quality benchmark for comprehensive and appropriate assessment is therefore important, yet existing ones either risk data contamination or cover too few disciplines. Specifically, because the data sources used for LLM training overlap with static benchmarks, the answer keys or numerical patterns of answers can be inadvertently memorized (i.e., data contamination), leading to systematic overestimation of reasoning capabilities, especially numerical reasoning. We propose SciDA, a multidisciplinary benchmark that consists exclusively of over 1k Olympic-level numerical computation problems, allowing randomized numerical initializations for each inference round to avoid reliance on fixed numerical patterns. We conduct a series of experiments with both closed-source and open-source top-performing LLMs, and observe that the performance of LLMs drops significantly under random numerical initialization. Thus, we provide truthful and unbiased assessments of the numerical reasoning capabilities of LLMs. The data is available at https://huggingface.co/datasets/m-a-p/SciDA

Submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.12861 [pdf, ps, other]

Exceptional Point-enhanced Rydberg Atomic Electrometers

Authors: Chao Liang, Ce Yang, Wei Huang

Abstract: Rydberg atoms, with their large transition dipole moments and extreme sensitivity to electric fields, have attracted widespread attention as promising candidates for next-generation quantum precision electrometry. Meanwhile, exceptional points (EPs) in non-Hermitian systems have opened new avenues for ultrasensitive metrology. Despite increasing interest in non-Hermitian physics, EP-enhanced sensitivity has rarely been explored in Rydberg atomic platforms. Here, we provide a new theoretical understanding of Autler-Townes (AT)-based Rydberg electrometry under non-Hermitian conditions, showing that dissipation fundamentally modifies the spectral response and enables sensitivity enhancement via EP-induced nonlinearity. Experimentally, we realize a second-order EP in a passive thermal Rydberg system without requiring gain media or cryogenics, and demonstrate the first EP-enhanced atomic electrometer. The EP can be tuned in real time by adjusting laser and microwave parameters, forming a flexible and scalable platform. Near the EP, the system exhibits a square-root response, yielding a nearly 20-fold enhancement in responsivity. Using amplitude-based detection, we achieve a sensitivity of $22.68~\mathrm{nV cm^{-1} Hz^{-1/2}}$ under realistic conditions. Our work establishes a practical, tunable platform for EP-enhanced sensing and real-time control, with broad implications for quantum metrology in open systems.

Submitted 15 June, 2025; originally announced June 2025.

Comments: Any comments are welcome

arXiv:2506.12857 [pdf, ps, other]

Experimental Observation of Purity-Like Invariants of Multi-photon States in Linear Optics

Authors: Baichuan Yang, Hao Zhan, Minghao Mi, Aonan Zhang, Liang Xu, Lijian Zhang

Abstract: Linear optical networks (LONs) with multi-photon inputs offer a powerful platform for advanced quantum technologies. However, the number of degrees of freedom of a LON is far fewer than the dimensionality of the multi-photon multi-mode Fock space, therefore it cannot implement arbitrary unitary evolutions on multi-photon states. Understanding these intrinsic constraints is essential for the preparation, manipulation, and measurement of multi-photon states with LONs. Although several properties of the multi-photon state have been shown to be invariant under LON unitary evolution, their physical interpretation remains elusive. Here, we introduce a Hermitian transfer matrix approach to explore the multi-photon evolution, revealing that the overall state purity decomposes into three distinct invariants -- each arising from either single-photon dynamics or the multi-photon interference. We experimentally observe these purity-like invariants by preparing distinct initial states, applying LON unitaries, and measuring the resulting invariants. Our results not only confirm their conservation but also provide valuable insights into multi-photon state evolution in linear optics.

Submitted 15 June, 2025; originally announced June 2025.

Comments: 3 figures, supplementary material included

arXiv:2506.12815 [pdf, ps, other]

TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models

Authors: Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Xiaochun Cao, Shouling Ji, Jiaheng Zhang, Jincai Huang, Li Shen

Abstract: Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerabilities against backdoor attacks are poorly understood. We find that existing backdoor attacks in reinforcement learning are based on reward manipulation, which are largely ineffective against the TO model due to its inherent sequence modeling nature. Moreover, the complexities introduced by high-dimensional action spaces further compound the challenge of action manipulation. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO employs alternating training to enhance the connection between triggers and target actions for attack effectiveness. To improve attack stealth, it utilizes precise poisoning via trajectory filtering for normal performance and batch poisoning for trigger consistency. Extensive evaluations demonstrate that TrojanTO effectively implants backdoor attacks across diverse tasks and attack objectives with a low attack budget (0.3% of trajectories). Furthermore, TrojanTO exhibits broad applicability to DT, GDT, and DC, underscoring its scalability across diverse TO model architectures.

Submitted 15 June, 2025; originally announced June 2025.

Comments: 23 pages, 6 figures

arXiv:2506.12786 [pdf, ps, other]

Semantic-Aware Visual Information Transmission With Key Information Extraction Over Wireless Networks

Authors: Chen Zhu, Kang Liang, Jianrong Bao, Zhouxiang Zhao, Zhaohui Yang, Zhaoyang Zhang, Mohammad Shikh-Bahaei

Abstract: The advent of 6G networks demands unprecedented levels of intelligence, adaptability, and efficiency to address challenges such as ultra-high-speed data transmission, ultra-low latency, and massive connectivity in dynamic environments. Traditional wireless image transmission frameworks, reliant on static configurations and isolated source-channel coding, struggle to balance computational efficiency, robustness, and quality under fluctuating channel conditions. To bridge this gap, this paper proposes an AI-native deep joint source-channel coding (JSCC) framework tailored for resource-constrained 6G networks. Our approach integrates key information extraction and adaptive background synthesis to enable intelligent, semantic-aware transmission. Leveraging AI-driven tools, Mediapipe for human pose detection and Rembg for background removal, the model dynamically isolates foreground features and matches backgrounds from a pre-trained library, reducing data payloads while preserving visual fidelity. Experimental results demonstrate significant improvements in peak signal-to-noise ratio (PSNR) compared with the traditional JSCC method, especially under low-SNR conditions. This approach offers a practical solution for multimedia services in resource-constrained mobile communications.

Submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.12776 [pdf, ps, other]

Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models

Authors: Junbo Niu, Yuanhong Zheng, Ziyang Miao, Hejun Dong, Chunjiang Ge, Hao Liang, Ma Lu, Bohan Zeng, Qiahao Zheng, Conghui He, Wentao Zhang

Abstract: Vision-Language Models (VLMs) face significant challenges when dealing with the diverse resolutions and aspect ratios of real-world images, as most existing models rely on fixed, low-resolution inputs. While recent studies have explored integrating native resolution visual encoding to improve model performance, such efforts remain fragmented and lack a systematic framework within the open-source community. Moreover, existing benchmarks fall short in evaluating VLMs under varied visual conditions, often neglecting resolution as a critical factor. To address the "Resolution Dilemma" stemming from both model design and benchmark limitations, we introduce RC-Bench, a novel benchmark specifically designed to systematically evaluate VLM capabilities under extreme visual conditions, with an emphasis on resolution and aspect ratio variations. In conjunction, we propose NativeRes-LLaVA, an open-source training framework that empowers VLMs to effectively process images at their native resolutions and aspect ratios. Based on RC-Bench and NativeRes-LLaVA, we conduct comprehensive experiments on existing visual encoding strategies. The results show that Native Resolution Visual Encoding significantly improves the performance of VLMs on RC-Bench as well as other resolution-centric benchmarks. Code is available at https://github.com/Niujunbo2002/NativeRes-LLaVA.

Submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.12760 [pdf, ps, other]

IDOL: Improved Different Optimization Levels Testing for Solidity Compilers

Authors: Lantian Li, Yejian Liang, Zhongxing Yu

Abstract: As blockchain technology continues to evolve and mature, smart contracts have become a key driving force behind the digitization and automation of transactions. Smart contracts greatly simplify and refine the traditional business transaction processes, and thus have had a profound impact on various industries such as finance and supply chain management. However, because smart contracts cannot be modified once deployed, any vulnerabilities or design flaws within the contract cannot be easily fixed, potentially leading to significant financial losses or even legal issues. The compiler, as a critical component in the development process, directly affects the quality and security of smart contracts. This paper innovatively proposes a method, known as the Improved Different Optimization Levels (IDOL), for testing the Solidity compiler. The key idea behind IDOL is to perform reverse optimization transformations (i.e., change optimized form into unoptimized form) to generate semantically equivalent variants of the smart contracts under test, aiming to maximize the opportunities to trigger the optimization logic of compilers. We conducted a preliminary evaluation of IDOL and three confirmed compiler optimization bugs have been uncovered at the time of writing.

Submitted 15 June, 2025; originally announced June 2025.

Comments: Accepted by QRS 2025 (Fast Abstracts track)

arXiv:2506.12710 [pdf, ps, other]

Multimodal Large Language Models-Enabled UAV Swarm: Towards Efficient and Intelligent Autonomous Aerial Systems

Authors: Yuqi Ping, Tianhao Liang, Huahao Ding, Guangyu Lei, Junwei Wu, Xuan Zou, Kuan Shi, Rui Shao, Chiya Zhang, Weizheng Zhang, Weijie Yuan, Tingting Zhang

Abstract: Recent breakthroughs in multimodal large language models (MLLMs) have endowed AI systems with unified perception, reasoning and natural-language interaction across text, image and video streams. Meanwhile, Unmanned Aerial Vehicle (UAV) swarms are increasingly deployed in dynamic, safety-critical missions that demand rapid situational understanding and autonomous adaptation. This paper explores potential solutions for integrating MLLMs with UAV swarms to enhance the intelligence and adaptability across diverse tasks. Specifically, we first outline the fundamental architectures and functions of UAVs and MLLMs. Then, we analyze how MLLMs can enhance the UAV system performance in terms of target detection, autonomous navigation, and multi-agent coordination, while exploring solutions for integrating MLLMs into UAV systems. Next, we propose a practical case study focused on forest fire fighting. To fully reveal the capabilities of the proposed framework, human-machine interaction, swarm task planning, fire assessment, and task execution are investigated. Finally, we discuss the challenges and future research directions for the MLLMs-enabled UAV swarm. An experiment illustration video can be found online at https://youtu.be/zwnB9ZSa5A4.

Submitted 14 June, 2025; originally announced June 2025.

Comments: 8 pages, 5 figures, submitted to IEEE WCM

arXiv:2506.12708 [pdf, ps, other]

Serving Large Language Models on Huawei CloudMatrix384

Authors: Pengfei Zuo, Huimin Lin, Junbo Deng, Nan Zou, Xingkun Yang, Yingyu Diao, Weifeng Gao, Ke Xu, Zhangyu Chen, Shirui Lu, Zhao Qiu, Peiyang Li, Xianyu Chang, Zhengzhong Yu, Fangzheng Miao, Jia Zheng, Ying Li, Yuan Feng, Bei Wang, Zaijian Zong, Mosong Zhou, Wenli Zhou, Houjiang Chen, Xingyu Liao, Yipeng Li, et al. (21 additional authors not shown)

Abstract: The rapid evolution of large language models (LLMs), driven by growing parameter scales, adoption of mixture-of-experts (MoE) architectures, and expanding context lengths, imposes unprecedented demands on AI infrastructure. Traditional AI clusters face limitations in compute intensity, memory bandwidth, inter-chip communication, and latency, compounded by variable workloads and strict service-level objectives. Addressing these issues requires fundamentally redesigned hardware-software integration. This paper introduces Huawei CloudMatrix, a next-generation AI datacenter architecture, realized in the production-grade CloudMatrix384 supernode. It integrates 384 Ascend 910 NPUs and 192 Kunpeng CPUs interconnected via an ultra-high-bandwidth Unified Bus (UB) network, enabling direct all-to-all communication and dynamic pooling of resources. These features optimize performance for communication-intensive operations, such as large-scale MoE expert parallelism and distributed key-value cache access. To fully leverage CloudMatrix384, we propose CloudMatrix-Infer, an advanced LLM serving solution incorporating three core innovations: a peer-to-peer serving architecture that independently scales prefill, decode, and caching; a large-scale expert parallelism strategy supporting EP320 via efficient UB-based token dispatch; and hardware-aware optimizations including specialized operators, microbatch-based pipelining, and INT8 quantization. Evaluation with the DeepSeek-R1 model shows CloudMatrix-Infer achieves state-of-the-art efficiency: prefill throughput of 6,688 tokens/s per NPU and decode throughput of 1,943 tokens/s per NPU (<50 ms TPOT). It effectively balances throughput and latency, sustaining 538 tokens/s per NPU even under stringent 15 ms latency constraints, while INT8 quantization maintains model accuracy across benchmarks.

Submitted 19 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

Comments: 59 pages, 24 figures

arXiv:2506.12700 [pdf, ps, other]

Large Scalable Cross-Domain Graph Neural Networks for Personalized Notification at LinkedIn

Authors: Shihai He, Julie Choi, Tianqi Li, Zhiwei Ding, Peng Du, Priya Bannur, Franco Liang, Fedor Borisyuk, Padmini Jaikumar, Xiaobing Xue, Viral Gupta

Abstract: Notification recommendation systems are critical to driving user engagement on professional platforms like LinkedIn. Designing such systems involves integrating heterogeneous signals across domains, capturing temporal dynamics, and optimizing for multiple, often competing, objectives. Graph Neural Networks (GNNs) provide a powerful framework for modeling complex interactions in such environments. In this paper, we present a cross-domain GNN-based system deployed at LinkedIn that unifies user, content, and activity signals into a single, large-scale graph. By training on this cross-domain structure, our model significantly outperforms single-domain baselines on key tasks, including click-through rate (CTR) prediction and professional engagement. We introduce architectural innovations including temporal modeling and multi-task learning, which further enhance performance. Deployed in LinkedIn's notification system, our approach led to a 0.10% lift in weekly active users and a 0.62% improvement in CTR. We detail our graph construction process, model design, training pipeline, and both offline and online evaluations. Our work demonstrates the scalability and effectiveness of cross-domain GNNs in real-world, high-impact applications.

Submitted 14 June, 2025; originally announced June 2025.

MSC Class: 68R10

arXiv:2506.12479 [pdf, ps, other]

AI Flow: Perspectives, Scenarios, and Approaches

Authors: Hongjun An, Sida Huang, Siqi Huang, Ruanjun Li, Yuanzhi Liang, Jiawei Shao, Zihan Wang, Cheng Yuan, Chi Zhang, Hongyuan Zhang, Wenhao Zhuang, Xuelong Li

Abstract: Pioneered by the foundational information theory by Claude Shannon and the visionary framework of machine intelligence by Alan Turing, the convergent evolution of information and communication technologies (IT/CT) has created an unbroken wave of connectivity and computation. This synergy has sparked a technological revolution, now reaching its peak with large artificial intelligence (AI) models that are reshaping industries and redefining human-machine collaboration. However, the realization of ubiquitous intelligence faces considerable challenges due to substantial resource consumption in large models and high communication bandwidth demands. To address these challenges, AI Flow has been introduced as a multidisciplinary framework that integrates cutting-edge IT and CT advancements, with a particular emphasis on the following three key points. First, a device-edge-cloud framework serves as the foundation, which integrates end devices, edge servers, and cloud clusters to optimize scalability and efficiency for low-latency model inference. Second, we introduce the concept of familial models, which refers to a series of different-sized models with aligned hidden features, enabling effective collaboration and the flexibility to adapt to varying resource constraints and dynamic scenarios. Third, connectivity- and interaction-based intelligence emergence is a novel paradigm of AI Flow. By leveraging communication networks to enhance connectivity, the collaboration among AI models across heterogeneous nodes achieves emergent intelligence that surpasses the capability of any single model. The innovations of AI Flow provide enhanced intelligence, timely responsiveness, and ubiquitous accessibility to AI services, paving the way for the tighter fusion of AI techniques and communication systems.

Submitted 14 June, 2025; originally announced June 2025.

Comments: Authors are with Institute of Artificial Intelligence (TeleAI), China Telecom, China. Author names are listed alphabetically by surname. This work was conducted at TeleAI, facilitated by Dr. Jiawei Shao (e-mail: [email protected]) under the leadership of Prof. Xuelong Li. The corresponding author is Prof. Xuelong Li (e-mail: xuelong [email protected]), the CTO and Chief Scientist of China Telecom

arXiv:2506.12441 [pdf, ps, other]

MS-UMamba: An Improved Vision Mamba Unet for Fetal Abdominal Medical Image Segmentation

Authors: Caixu Xu, Junming Wei, Huizhen Chen, Pengchen Liang, Bocheng Liang, Ying Tan, Xintong Wei

Abstract: Recently, Mamba-based methods have become popular in medical image segmentation due to their lightweight design and long-range dependency modeling capabilities. However, current segmentation methods frequently encounter challenges in fetal ultrasound images, such as enclosed anatomical structures, blurred boundaries, and small anatomical structures. To address the need for balancing local feature extraction and global context modeling, we propose MS-UMamba, a novel hybrid convolutional-mamba model for fetal ultrasound image segmentation. Specifically, we design a visual state space block integrated with a CNN branch (SS-MCAT-SSM), which leverages Mamba's global modeling strengths and convolutional layers' local representation advantages to enhance feature learning. In addition, we propose an efficient multi-scale feature fusion module that integrates spatial attention mechanisms and combines feature information from different layers to enhance the feature representation ability of the model. Finally, we conduct extensive experiments on a non-public dataset; the results demonstrate that the MS-UMamba model achieves excellent segmentation performance.

Submitted 14 June, 2025; originally announced June 2025.

arXiv:2506.12430 [pdf, ps, other]

Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025

Authors: Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, Jiwei Wei, Shiyuan He, Yang Yang, Xiaohai Xu, Ke Ma, Qianqian Xu, Qingming Huang, Shi Lin, Xun Wang, Changting Lin, Meng Han, Yilei Jiang, Siqi Lai, Yaozhi Zheng, Yifei Song, et al. (22 additional authors not shown)

Abstract: Multimodal Large Language Models (MLLMs) have enabled transformative advancements across diverse applications but remain susceptible to safety threats, especially jailbreak attacks that induce harmful outputs. To systematically evaluate and improve their safety, we organized the Adversarial Testing & Large-model Alignment Safety Grand Challenge (ATLAS) 2025. This technical report presents findings from the competition, which involved 86 teams testing MLLM vulnerabilities via adversarial image-text attacks in two phases: white-box and black-box evaluations. The competition results highlight ongoing challenges in securing MLLMs and provide valuable guidance for developing stronger defense mechanisms. The challenge establishes new benchmarks for MLLM safety evaluation and lays groundwork for advancing safer multimodal AI systems. The code and data for this challenge are openly available at https://github.com/NY1024/ATLAS_Challenge_2025.

Submitted 14 June, 2025; originally announced June 2025.

arXiv:2506.12286 [pdf, ps, other]

The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason

Authors: Shanchao Liang, Spandan Garg, Roshanak Zilouchian Moghaddam

Abstract: As large language models (LLMs) become increasingly capable and widely adopted, benchmarks play a central role in assessing their practical utility. For example, SWE-Bench Verified has emerged as a critical benchmark for evaluating LLMs' software engineering abilities, particularly their aptitude for resolving real-world GitHub issues. Recent LLMs show impressive performance on SWE-Bench, leading to optimism about their capacity for complex coding tasks. However, current evaluation protocols may overstate these models' true capabilities. It is crucial to distinguish LLMs' generalizable problem-solving ability and other learned artifacts. In this work, we introduce a diagnostic task: file path identification from issue descriptions alone, to probe models' underlying knowledge. We present empirical evidence that performance gains on SWE-Bench-Verified may be partially driven by memorization rather than genuine problem-solving. We show that state-of-the-art models achieve up to 76% accuracy in identifying buggy file paths using only issue descriptions, without access to repository structure. This performance is merely up to 53% on tasks from repositories not included in SWE-Bench, pointing to possible data contamination or memorization. These findings raise concerns about the validity of existing results and underscore the need for more robust, contamination-resistant benchmarks to reliably evaluate LLMs' coding abilities.

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.12264 [pdf]

A Novel Thermal Network Model and Electro-Thermal Coupling Study for NSFETs and CFETs Considering Thermal Crosstalk

Authors: Tianci Miao, Qihang Zheng, Yangyang Hu, Xiaoyu Cheng, Jie Liang, Liang Chen, Aiying Guo, Jingjing Liu, Kailin Ren, Jianhua Zhang

Abstract: As the technology node continues to shrink, nanosheet field effect transistors (NSFETs) and complementary FETs (CFETs) become valid candidates for the 3nm and sub-nanometre nodes. However, due to the shrinking device size, self-heating and inter-device thermal crosstalk of NSFETs and CFETs become more severe. It is important to accurately calculate the self-heating and thermal crosstalk of devices and to study the electrical and thermal characteristics of logic gates, etc. In this work, a thermal network model considering the thermal crosstalk of neighboring devices is proposed, which can accurately calculate the self-heating and thermal crosstalk. The electrical and thermal characteristics of NSFETs and CFETs are compared, and it is found that CFETs have more severe self-heating and thermal crosstalk. The electro-thermal characteristics of inverters, logic gates and ring oscillators composed of NSFETs and CFETs are further investigated. Compared with NSFETs, logic gates and ring oscillators composed of CFETs are more seriously affected by self-heating and should be given extra attention. The thermal network model proposed in this paper can be further used to study the thermal optimization strategy of devices and circuits to enhance the electrical performance, achieving the design technology co-optimizations (DTCO).

Submitted 9 March, 2025; originally announced June 2025.

arXiv:2506.12103 [pdf, other]

The Amazon Nova Family of Models: Technical Report and Model Card

Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.

Submitted 17 March, 2025; originally announced June 2025.

Comments: 48 pages, 10 figures

Report number: 20250317

arXiv:2506.11991 [pdf, ps, other]

VGR: Visual Grounded Reasoning

Authors: Jiacong Wang, Zijian Kang, Haochen Wang, Haiyong Jiang, Jiawen Li, Bohong Wu, Ya Wang, Jiao Ran, Xiao Liang, Chao Feng, Jun Xiao

Abstract: In the field of multimodal chain-of-thought (CoT) reasoning, existing approaches predominantly rely on reasoning in pure language space, which inherently suffers from language bias and is largely confined to math or science domains. This narrow focus limits their ability to handle complex visual reasoning tasks that demand comprehensive understanding of image details. To address these limitations, this paper introduces VGR, a novel reasoning multimodal large language model (MLLM) with enhanced fine-grained visual perception capabilities. Unlike traditional MLLMs that answer questions or reason solely in the language space, our VGR first detects relevant regions that may help to solve problems, and then provides precise answers based on replayed image regions. To achieve this, we construct a large-scale SFT dataset called VGR-SFT that contains reasoning data with mixed vision grounding and language deduction. The inference pipeline of VGR allows the model to choose bounding boxes for visual reference, and a replay stage is introduced to integrate the corresponding regions into the reasoning process, enhancing multimodal comprehension. Experiments on the LLaVA-NeXT-7B baseline show that VGR achieves superior performance on multi-modal benchmarks requiring comprehensive image detail understanding. Compared to the baseline, VGR uses only 30% of the image token count while delivering scores of +4.1 on MMStar, +7.1 on AI2D, and a +12.9 improvement on ChartQA.

Submitted 16 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

Comments: 9 pages, 4 figures

arXiv:2506.11881 [pdf, ps, other]

Continuously trapped matter-wave interferometry in magic Floquet-Bloch band structures

Authors: Xiao Chai, Jeremy L. Tanlimco, Eber Nolasco-Martinez, Xuanwei Liang, E. Quinn Simmons, Eric Zhu, Roshan Sajjad, Hector Mas, S. Nicole Halawani, David M. Weld

Abstract: Trapped matter-wave interferometry offers the promise of compact high-precision local force sensing. However, the trap itself can introduce new systematic errors which are absent in traditional free-fall interferometers. We describe and demonstrate a novel Floquet-engineered platform for compact, continuously trapped atom interferometry which is intrinsically robust against trap noise and beamsplitter pulse duration. A non-interacting degenerate quantum gas undergoes position-space Bloch oscillations through an amplitude-modulated optical lattice, whose resulting Floquet-Bloch band structure includes Landau-Zener beamsplitters and Bragg mirrors, forming the components of a Mach-Zehnder interferometric force sensor. We identify, realize, and experimentally characterize magic band structures, analogous to the magic wavelengths employed in optical lattice clocks, for which the interferometric phase is insensitive to lattice intensity noise. We leverage the intrinsic programmability of the Floquet synthesis approach to demonstrate a variety of interferometer structures, highlighting the potential of this technique for quantum force sensors which are tunable, compact, simple, and robust.

Submitted 13 June, 2025; originally announced June 2025.

Comments: 20 pages, 11 figures

arXiv:2506.11870 [pdf, ps, other]

LLM-based Dynamic Differential Testing for Database Connectors with Reinforcement Learning-Guided Prompt Selection

Authors: Ce Lyu, Minghao Zhao, Yanhao Wang, Liang Jie

Abstract: Database connectors are critical components enabling applications to interact with underlying database management systems (DBMS), yet their security vulnerabilities often remain overlooked. Unlike traditional software defects, connector vulnerabilities exhibit subtle behavioral patterns and are inherently challenging to detect. In addition, nonstandardized implementations of connectors introduce potential risks (a.k.a. unsafe implementations) that are even more elusive. As a result, traditional fuzzing methods are incapable of finding such vulnerabilities. Even LLM-enabled test case generation, lacking domain knowledge, cannot generate test cases that invoke all interfaces and internal logic of connectors. In this paper, we propose reinforcement learning (RL)-guided LLM test-case generation for database connector testing. Specifically, to equip the LLM with sufficient and appropriate domain knowledge, we compose a parameterized prompt template from which numerous prompts can be generated. Test cases are generated by the LLM from a prompt and are dynamically evaluated through differential testing across multiple connectors. Testing is conducted iteratively: in each round, RL is used to select the optimal prompt based on behavioral feedback from the prior round, so as to maximize control flow coverage. We implement the aforementioned methodology in a practical tool and evaluate it on two widely used JDBC connectors: MySQL Connector/J and OceanBase Connector/J. In total, we reported 16 bugs, of which 10 are officially confirmed and the rest are acknowledged as unsafe implementations.

Submitted 13 June, 2025; originally announced June 2025.

Comments: 5 pages

MSC Class: 68N99 ACM Class: H.2.4; D.2.5

arXiv:2506.11784 [pdf, ps, other]

GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers

Authors: Guang Liang, Xinyao Liu, Jianxin Wu

Abstract: Vision Transformers (ViTs) are essential in computer vision but are also computationally intensive. Model quantization, particularly to low bit-widths like 4-bit, aims to alleviate this difficulty, yet existing Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) methods exhibit significant limitations. PTQ often incurs a substantial accuracy drop, while QAT achieves high accuracy but suffers from prohibitive computational costs, limited generalization to downstream tasks, training instability, and the lack of an open-source codebase. To address these challenges, this paper introduces General, Practical, and Lightning Quantization (GPLQ), a novel framework designed for efficient and effective ViT quantization. GPLQ is founded on two key empirical insights: the paramount importance of activation quantization and the necessity of preserving the model's original optimization "basin" to maintain generalization. Consequently, GPLQ employs a sequential "activation-first, weights-later" strategy. Stage 1 keeps weights in FP32 while quantizing activations with a feature mimicking loss in only 1 epoch, keeping the model in the same "basin" and thereby preserving generalization. Stage 2 quantizes weights using a PTQ method. As a result, GPLQ is 100x faster than existing QAT methods, lowers memory footprint to levels even below FP32 training, and achieves 4-bit model performance that is highly competitive with FP32 models in terms of both accuracy on ImageNet and generalization to diverse downstream tasks, including fine-grained visual classification and object detection. We will release an easy-to-use open-source toolkit supporting multiple vision tasks.

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.11783 [pdf, ps, other]

Holistic approach and Advanced Color Singlet Identification for physics measurements at high energy frontier

Authors: Yongfeng Zhu, Hao Liang, Yuexin Wang, Yuzhi Che, Hengyu Wang, Chen Zhou, Huilin Qu, Manqi Ruan

Abstract: To enhance the discovery power of high-energy colliders, we propose a holistic approach and Advanced Color Singlet Identification (ACSI), both of which utilize inclusive reconstructed information as input. The holistic approach is designed to simultaneously classify physics events, while ACSI focuses on associating final-state particles with their parent massive bosons. Implemented using state-of-the-art artificial intelligence architectures and applied to benchmark analyses with simulated data from a future Higgs factory, these new concepts significantly improve the accuracy of H->bb/cc/ss/gg measurements by up to a factor of two to six.

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.11612 [pdf, ps, other]

KEENHash: Hashing Programs into Function-Aware Embeddings for Large-Scale Binary Code Similarity Analysis

Authors: Zhijie Liu, Qiyi Tang, Sen Nie, Shi Wu, Liang Feng Zhang, Yutian Tang

Abstract: Binary code similarity analysis (BCSA) is a crucial research area in many fields such as cybersecurity. Specifically, function-level diffing tools are the most widely used in BCSA: they perform function matching one by one for evaluating the similarity between binary programs. However, such methods have high time complexity, making them unscalable in large-scale scenarios (e.g., 1/n-to-n search). Towards effective and efficient program-level BCSA, we propose KEENHash, a novel hashing approach that hashes binaries into program-level representations through large language model (LLM)-generated function embeddings. KEENHash condenses a binary into one compact and fixed-length program embedding using K-Means and Feature Hashing, allowing us to do effective and efficient large-scale program-level BCSA, surpassing the previous state-of-the-art methods. The experimental results show that KEENHash is at least 215 times faster than the state-of-the-art function matching tools while maintaining effectiveness. Furthermore, in a large-scale scenario with 5.3 billion similarity evaluations, KEENHash takes only 395.83 seconds while these tools would take at least 56 days. We also evaluate KEENHash on the program clone search of large-scale BCSA across extensive datasets in 202,305 binaries. Compared with 4 state-of-the-art methods, KEENHash outperforms all of them by at least 23.16%, and displays remarkable superiority over them in the large-scale BCSA security scenario of malware detection.

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.11512 [pdf, ps, other]

Prioritizing Alignment Paradigms over Task-Specific Model Customization in Time-Series LLMs

Authors: Wei Li, Yunyao Cheng, Xinli Hao, Chaohong Ma, Yuxuan Liang, Bin Yang, Christian S. Jensen, Xiaofeng Meng

Abstract: Recent advances in Large Language Models (LLMs) have enabled unprecedented capabilities for time-series reasoning in diverse real-world applications, including medical, financial, and spatio-temporal domains. However, existing approaches typically focus on task-specific model customization, such as forecasting and anomaly detection, while overlooking the data itself, referred to as time-series primitives, which are essential for in-depth reasoning. This position paper advocates a fundamental shift in approaching time-series reasoning with LLMs: prioritizing alignment paradigms grounded in the intrinsic primitives of time series data over task-specific model customization. This realignment addresses the core limitations of current time-series reasoning approaches, which are often costly, inflexible, and inefficient, by systematically accounting for the intrinsic structure of the data before task engineering. To this end, we propose three alignment paradigms: Injective Alignment, Bridging Alignment, and Internal Alignment, which prioritize different aspects of time-series primitives (domain, characteristic, and representation, respectively) to activate the time-series reasoning capabilities of LLMs and enable economical, flexible, and efficient reasoning. We further recommend that practitioners adopt an alignment-oriented method, using this guidance to select an appropriate alignment paradigm. Additionally, we categorize relevant literature into these alignment paradigms and outline promising research directions.

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.11498 [pdf, ps, other]

Lag-Relative Sparse Attention In Long Context Training

Authors: Manlai Liang, Wanyi Huang, Mandi Liu, Huaijun Li, Jinlong Li

Abstract: Large Language Models (LLMs) have made significant strides in natural language processing and generation, yet their ability to handle long-context input remains constrained by the quadratic complexity of attention computation and the linearly increasing key-value memory footprint. To reduce computational costs and memory, key-value cache compression techniques are commonly applied at inference time, but this often leads to severe performance degradation, as models are not trained to handle compressed context. Although there are more sophisticated compression methods, they are typically unsuitable for post-training because of their incompatibility with gradient-based optimization or high computation overhead. To fill this gap with no additional parameters and little computation overhead, we propose Lag-Relative Sparse Attention (LRSA), anchored by the LagKV compression method, for long context post-training. Our method performs chunk-by-chunk prefilling, which selects the top K most relevant key-value pairs in a fixed-size lagging window, allowing the model to focus on salient historical context while maintaining efficiency. Experimental results show that our approach significantly enhances the robustness of the LLM with key-value compression and achieves better fine-tuned results in the question-answer tuning task.

Submitted 13 June, 2025; originally announced June 2025.
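
The chunk-by-chunk top-K selection described in the abstract above can be illustrated with a minimal sketch. This is not the LagKV method itself: the relevance scores are taken as given inputs, and the function names `keep_top_k` and `chunked_prefill_keep`, as well as the choice to keep the newest chunk uncompressed, are assumptions made only for illustration.

```python
def keep_top_k(scores, k):
    """Return the positions of the k largest scores, in original order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def chunked_prefill_keep(scores, chunk, k):
    """Hypothetical chunk-wise compression of a key-value cache.

    scores: one relevance score per cached key-value pair (assumed given).
    chunk:  fixed window size.
    k:      number of pairs retained per older window.
    Returns the global indices of the key-value pairs that are kept.
    """
    kept = []
    n = len(scores)
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        if end == n:
            # Assumption: the newest chunk is kept whole.
            kept.extend(range(start, end))
        else:
            local = keep_top_k(scores[start:end], k)
            kept.extend(start + i for i in local)
    return kept
```

With ten scores, a window of 4, and k = 2, each of the two older windows shrinks to its two highest-scoring entries while the final two entries are retained unchanged.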

arXiv:2506.11400 [pdf, ps, other]

A Step-by-Step Guide to Creating a Robust Autonomous Drone Testing Pipeline

Authors: Yupeng Jiang, Yao Deng, Sebastian Schroder, Linfeng Liang, Suhaas Gambhir, Alice James, Avishkar Seth, James Pirrie, Yihao Zhang, Xi Zheng

Abstract: Autonomous drones are rapidly reshaping industries ranging from aerial delivery and infrastructure inspection to environmental monitoring and disaster response. Ensuring the safety, reliability, and efficiency of these systems is paramount as they transition from research prototypes to mission-critical platforms. This paper presents a step-by-step guide to establishing a robust autonomous drone testing pipeline, covering each critical stage: Software-in-the-Loop (SIL) Simulation Testing, Hardware-in-the-Loop (HIL) Testing, Controlled Real-World Testing, and In-Field Testing. Using practical examples, including the marker-based autonomous landing system, we demonstrate how to systematically verify drone system behaviors, identify integration issues, and optimize performance. Furthermore, we highlight emerging trends shaping the future of drone testing, including the integration of Neurosymbolic and LLMs, creating co-simulation environments, and Digital Twin-enabled simulation-based testing techniques. By following this pipeline, developers and researchers can achieve comprehensive validation, minimize deployment risks, and prepare autonomous drones for safe and reliable real-world operations.

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2506.11397 [pdf, ps, other]

On existence of a variational regularization parameter under Morozov's discrepancy principle

Authors: Liang Ding, Long Li, Weimin Han, Wei Wang

Abstract: Morozov's discrepancy principle is commonly adopted in Tikhonov regularization for choosing the regularization parameter. Nevertheless, for a general non-linear inverse problem, the discrepancy $\|F(x_α^δ)-y^δ\|_Y$ does not depend continuously on $α$ and it is questionable whether there exists a regularization parameter $α$ such that $τ_1δ\leq \|F(x_α^δ)-y^δ\|_Y\leq τ_2 δ$ $(1\le τ_1<τ_2)$. In this paper, we prove the existence of $α$ under Morozov's discrepancy principle if $τ_2\ge (3+2γ)τ_1$, where $γ>0$ is a parameter in a tangential cone condition for the nonlinear operator $F$. Furthermore, we present results on the convergence of the regularized solutions under Morozov's discrepancy principle. Numerical results are reported on the efficiency of the proposed approach.

Submitted 12 June, 2025; originally announced June 2025.

Comments: 24 pages, 10 figures

MSC Class: 47J06 ACM Class: G.1.6
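
For contrast with the nonlinear setting of the entry above, the following sketch shows the discrepancy principle in the simplest linear case, where the discrepancy does depend continuously (and monotonically) on $α$, so a suitable $α$ can be found by bisection. The diagonal operator with entries `sigma`, the function names, and the default values of `tau1` and `tau2` are assumptions chosen for illustration; the paper's nonlinear analysis is not reproduced here.

```python
import math


def discrepancy(alpha, sigma, y):
    """Residual norm of the Tikhonov solution for a diagonal linear operator.

    For x_i = s_i * y_i / (s_i**2 + alpha), the residual component is
    s_i * x_i - y_i = -alpha * y_i / (s_i**2 + alpha).
    """
    return math.sqrt(
        sum((alpha * yi / (s * s + alpha)) ** 2 for s, yi in zip(sigma, y))
    )


def morozov_alpha(sigma, y, delta, tau1=1.0, tau2=2.0,
                  lo=1e-12, hi=1e12, max_iter=200):
    """Find alpha with tau1*delta <= discrepancy <= tau2*delta by bisection.

    Bisection is done on a logarithmic scale because alpha spans many
    orders of magnitude. Assumes all sigma entries are positive.
    """
    for _ in range(max_iter):
        mid = math.sqrt(lo * hi)
        d = discrepancy(mid, sigma, y)
        if d < tau1 * delta:
            lo = mid      # discrepancy too small: increase alpha
        elif d > tau2 * delta:
            hi = mid      # discrepancy too large: decrease alpha
        else:
            return mid
    raise ValueError("no admissible alpha found in the bracket")
```

The search succeeds here because the discrepancy increases continuously from 0 towards $\|y^δ\|$ as $α$ grows; it is exactly this continuity that can fail for a nonlinear operator $F$.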

arXiv:2506.11379 [pdf, ps, other]

SVD method for sparse recovery

Authors: Long Li, Liang Ding

Abstract: Sparsity regularization has garnered significant interest across multiple disciplines, including statistics, imaging, and signal processing. Standard techniques for addressing sparsity regularization include iterative soft thresholding algorithms and their accelerated variants. However, these algorithms rely on Landweber iteration, which can be computationally intensive. Therefore, there is a pressing need to develop a more efficient algorithm for sparsity regularization. The Singular Value Decomposition (SVD) method serves as a regularization strategy that does not require Landweber iterations; however, it is confined to classical quadratic regularization. This paper introduces two inversion schemes tailored for situations where the operator $K$ is diagonal within a specific orthogonal basis, focusing on $\ell_{p}$ regularization when $p=1$ and $p=1/2$. Furthermore, we demonstrate that for a general linear compact operator $K$, the SVD method serves as an effective regularization strategy. To assess the efficacy of the proposed methods, we conduct several numerical experiments. The results indicate that our algorithms not only operate faster but also achieve a higher success rate than traditional iterative methods.

Submitted 12 June, 2025; originally announced June 2025.

Comments: 33 pages, 5 figures

MSC Class: 47A52 ACM Class: G.1.6
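
To illustrate why a diagonal operator removes the need for iteration in the $p=1$ case mentioned in the entry above, here is a minimal sketch of the standard closed-form solution: when $K$ is diagonal with positive entries $σ_i$, the $\ell_1$-regularized least-squares problem decouples into scalar problems, each solved by soft thresholding. This is the textbook identity, not necessarily the paper's exact scheme; the function names and the $\tfrac{1}{2}$ scaling of the data term are assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def diagonal_l1_solve(sigma, y, alpha):
    """Minimise 0.5 * (s * x - y)**2 + alpha * |x| for each coordinate.

    Setting the subgradient to zero gives
    x = soft_threshold(s * y, alpha) / s**2 for s > 0,
    so no Landweber-type iteration is required.
    """
    return [soft_threshold(s * yi, alpha) / (s * s) for s, yi in zip(sigma, y)]
```

Components whose weighted data $σ_i y_i$ fall below the threshold $α$ in magnitude are set exactly to zero, which is the source of sparsity in the solution.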

arXiv:2506.11372 [pdf, ps, other]

$\ell_{1}^{2}-η\ell_{2}^{2}$ regularization for sparse recovery

Authors: Long Li, Liang Ding

Abstract: This paper presents a regularization technique incorporating a non-convex and non-smooth term, $\ell_{1}^{2}-η\ell_{2}^{2}$, with parameter $0<η\leq 1$, designed to address ill-posed linear problems that yield sparse solutions. We explore the existence, stability, and convergence of the regularized solution, demonstrating that the $\ell_{1}^{2}-η\ell_{2}^{2}$ regularization is well-posed and results in sparse solutions. Under suitable source conditions, we establish a convergence rate of $\mathcal{O}\left(δ\right)$ in the $\ell_{2}$-norm for both a priori and a posteriori parameter choice rules. Additionally, we propose and analyze a numerical algorithm based on a half-variation iterative strategy combined with the proximal gradient method. We prove convergence despite the regularization term being non-smooth and non-convex. The algorithm features a straightforward structure, facilitating implementation. Furthermore, we propose a projected gradient iterative strategy based on a surrogate function approach to achieve faster solving. Experimentally, we demonstrate visible improvements of $\ell_{1}^{2}-η\ell_{2}^{2}$ over $\ell_{1}$, $\ell_{1}-η\ell_{2}$, and other nonconvex regularizations for compressive sensing and image deblurring problems. All the numerical results show the efficiency of our proposed approach.

Submitted 12 June, 2025; originally announced June 2025.

Comments: 40 pages, 9 figures

MSC Class: 47A52 ACM Class: G.1.6

arXiv:2506.11343 [pdf, ps, other]

From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review

Authors: Yaohui Zhang, Haijing Zhang, Wenlong Ji, Tianyu Hua, Nick Haber, Hancheng Cao, Weixin Liang

Abstract: The advent of large language models (LLMs) offers unprecedented opportunities to reimagine peer review beyond the constraints of traditional workflows. Despite these opportunities, prior efforts have largely focused on replicating traditional review workflows with LLMs serving as direct substitutes for human reviewers, while limited attention has been given to exploring new paradigms that fundamentally rethink how LLMs can participate in the academic review process. In this paper, we introduce and explore a novel mechanism that employs LLM agents to perform pairwise comparisons among manuscripts instead of individual scoring. By aggregating outcomes from substantial pairwise evaluations, this approach enables a more accurate and robust measure of relative manuscript quality. Our experiments demonstrate that this comparative approach significantly outperforms traditional rating-based methods in identifying high-impact papers. However, our analysis also reveals emergent biases in the selection process, notably a reduced novelty in research topics and an increased institutional imbalance. These findings highlight both the transformative potential of rethinking peer review with LLMs and critical challenges that future systems must address to ensure equity and diversity.

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2506.11337 [pdf]

Multiscale transform based seismic reflectivity inversion using convolutional neural network

Authors: John Castagna, Oleg Portniaguine, Gabriel Gil, Arnold Oyem, Chen Liang

Abstract: The Multiscale Fourier Transform of a seismic trace performs time-frequency analyses over a range of window lengths. The variation in window length captures local and global relative amplitudes between events, thereby allowing reflectivity inversion that is independent of the amplitude spectrum of the seismic wavelet. As the temporal and spatial variation of the actual seismic wavelet in seismic reflection data is poorly known, this approach has many advantages over conventional seismic reflectivity inversion. No wavelet extraction is performed. Thus, the inversion for reflectivity can be conducted without well control, seismic ties, or time-depth functions. The inversion is sparse, so no starting model is needed. Furthermore, as no wavelet is required, the inversion can be applied directly to depth migrated data. The phase of the wavelet is constrained by the assumption of sparse reflectivity and thus works best when earth impedance structure is blocky. Trace integration of the inverted reflectivity provides bandlimited impedance which compares very favorably to well-log bandlimited impedance for both synthetic and real data cases.

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2506.11144 [pdf, ps, other]

AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation

Authors: Chao Liang, Jianwen Jiang, Wang Liao, Jiaqi Yang, Zerong Zheng, Weihong Zeng, Han Liang

Abstract: Recent advancements in human video generation and animation tasks, driven by diffusion models, have achieved significant progress. However, expressive and realistic human animation remains challenging due to the trade-off between motion naturalness and visual fidelity. To address this, we propose AlignHuman, a framework that combines Preference Optimization as a post-training technique with a divide-and-conquer training strategy to jointly optimize these competing objectives. Our key insight stems from an analysis of the denoising process across timesteps: (1) early denoising timesteps primarily control motion dynamics, while (2) fidelity and human structure can be effectively managed by later timesteps, even if early steps are skipped. Building on this observation, we propose timestep-segment preference optimization (TPO) and introduce two specialized LoRAs as expert alignment modules, each targeting a specific dimension in its corresponding timestep interval. The LoRAs are trained using their respective preference data and activated in the corresponding intervals during inference to enhance motion naturalness and fidelity. Extensive experiments demonstrate that AlignHuman improves strong baselines and reduces NFEs during inference, achieving a 3.3$\times$ speedup (from 100 NFEs to 30 NFEs) with minimal impact on generation quality. Homepage: https://alignhuman.github.io/

Submitted 11 June, 2025; originally announced June 2025.

Comments: Homepage: https://alignhuman.github.io/

arXiv:2506.11017 [pdf, ps, other]

TeleEval-OS: Performance evaluations of large language models for operations scheduling

Authors: Yanyan Wang, Yingying Wang, Junli Liang, Yin Xu, Yunlong Liu, Yiming Xu, Zhengwang Jiang, Zhehe Li, Fei Li, Long Zhao, Kuang Xu, Qi Song, Xiangyang Li

Abstract: The rapid advancement of large language models (LLMs) has significantly propelled progress in artificial intelligence, demonstrating substantial application potential across multiple specialized domains. Telecommunications operation scheduling (OS) is a critical aspect of the telecommunications industry, involving the coordinated management of networks, services, risks, and human resources to optimize production scheduling and ensure unified service control. However, the inherent complexity and domain-specific nature of OS tasks, coupled with the absence of comprehensive evaluation benchmarks, have hindered thorough exploration of LLMs' application potential in this critical field. To address this research gap, we propose the first Telecommunications Operation Scheduling Evaluation Benchmark (TeleEval-OS). Specifically, this benchmark comprises 15 datasets across 13 subtasks, comprehensively simulating four key operational stages: intelligent ticket creation, intelligent ticket handling, intelligent ticket closure, and intelligent evaluation. To systematically assess the performance of LLMs on tasks of varying complexity, we categorize their capabilities in telecommunications operation scheduling into four hierarchical levels, arranged in ascending order of difficulty: basic NLP, knowledge Q&A, report generation, and report analysis. On TeleEval-OS, we leverage zero-shot and few-shot evaluation methods to comprehensively assess 10 open-source LLMs (e.g., DeepSeek-V3) and 4 closed-source LLMs (e.g., GPT-4o) across diverse scenarios. Experimental results demonstrate that open-source LLMs can outperform closed-source LLMs in specific scenarios, highlighting their significant potential and value in the field of telecommunications operation scheduling.

Submitted 5 May, 2025; originally announced June 2025.

arXiv:2506.10960 [pdf, other]

ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

Authors: Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng

Abstract: Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data. Our annotation process further yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection. In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs. Code and data are available at https://github.com/zjunlp/ChineseHarm-bench.

Submitted 12 June, 2025; originally announced June 2025.

Comments: Work in progress

arXiv:2506.10938 [pdf, ps, other]

From Fractionalization to Chiral Topological Superconductivity in Flat Chern Band

Authors: Daniele Guerci, Ahmed Abouelkomsan, Liang Fu

Abstract: We show that interacting electrons in a flat Chern band can form, in addition to fractional Chern insulators, a chiral $f$-wave topological superconductor that hosts neutral Majorana fermion edge modes. Superconductivity emerges from an interaction-induced metallic state that exhibits an anomalous Hall effect, as observed in rhombohedral graphene and near the $\nu=\frac{2}{3}$ fractional Chern insulator in twisted transition metal dichalcogenides.

Submitted 12 June, 2025; originally announced June 2025.

Comments: 7+9 pages, 5+6 figures

arXiv:2506.10931 [pdf, ps, other]

doi 10.1145/3721145.3730428

MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem

Authors: Melina Soysal, Konstantina Koliogeorgi, Can Firtina, Nika Mansouri Ghiasi, Rakesh Nadig, Haiyu Mayo, Geraldo F. Oliveira, Yu Liang, Klea Zambaku, Mohammad Sadrosadati, Onur Mutlu

Abstract: Raw signal genome analysis (RSGA) has emerged as a promising approach to enable real-time genome analysis by directly analyzing raw electrical signals. However, rapid advancements in sequencing technologies make it increasingly difficult for software-based RSGA to match the throughput of raw signal generation. This paper demonstrates that while hardware acceleration techniques can significantly accelerate RSGA, the high volume of genomic data shifts the performance and energy bottleneck from computation to I/O data movement. As sequencing throughput increases, I/O overhead becomes the main contributor to both runtime and energy consumption. Therefore, there is a need to design a high-performance, energy-efficient system for RSGA that can both alleviate the data movement bottleneck and provide large acceleration capabilities. We propose MARS, a storage-centric system that leverages the heterogeneous resources within modern storage systems (e.g., storage-internal DRAM, storage controller, flash chips) alongside their large storage capacity to tackle both data movement and computational overheads of RSGA in an area-efficient and low-cost manner. MARS accelerates RSGA through a novel hardware/software co-design approach. First, MARS modifies the RSGA pipeline via two filtering mechanisms and a quantization scheme, reducing hardware demands and optimizing for in-storage execution. Second, MARS accelerates the RSGA steps directly within the storage by leveraging both Processing-Near-Memory and Processing-Using-Memory paradigms. Third, MARS orchestrates the execution of all steps to fully exploit in-storage parallelism and minimize data movement. Our evaluation shows that MARS outperforms basecalling-based software and hardware-accelerated state-of-the-art read mapping pipelines by 93x and 40x, on average across different datasets, while reducing their energy consumption by 427x and 72x.

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2506.10877 [pdf, ps, other]

Enhancing Medical Dialogue Generation through Knowledge Refinement and Dynamic Prompt Adjustment

Authors: Hongda Sun, Jiaren Peng, Wenzhong Yang, Liang He, Bo Du, Rui Yan

Abstract: Medical dialogue systems (MDS) have emerged as crucial online platforms for enabling multi-turn, context-aware conversations with patients. However, existing MDS often struggle to (1) identify relevant medical knowledge and (2) generate personalized, medically accurate responses. To address these challenges, we propose MedRef, a novel MDS that incorporates knowledge refining and dynamic prompt adjustment. First, we employ a knowledge refining mechanism to filter out irrelevant medical data, improving predictions of critical medical entities in responses. Additionally, we design a comprehensive prompt structure that incorporates historical details and evident details. To enable real-time adaptability to diverse patient conditions, we implement two key modules, Triplet Filter and Demo Selector, which supply appropriate knowledge and demonstrations for the system prompt. Extensive experiments on MedDG and KaMed benchmarks show that MedRef outperforms state-of-the-art baselines in both generation quality and medical entity accuracy, underscoring its effectiveness and reliability for real-world healthcare applications.

Submitted 12 June, 2025; originally announced June 2025.

Comments: ACL 2025 Findings

arXiv:2506.10776 [pdf, ps, other]

ME: Trigger Element Combination Backdoor Attack on Copyright Infringement

Authors: Feiyu Yang, Siyuan Liang, Aishan Liu, Dacheng Tao

Abstract: The capability of generative diffusion models (DMs) like Stable Diffusion (SD) to replicate training data can be exploited by attackers to launch a Copyright Infringement Attack using duplicated poisoned image-text pairs. SilentBadDiffusion (SBD) is a recently proposed method that showed outstanding performance in attacking SD in text-to-image tasks. However, the feasible data resources in this area are still limited, and some of them are even constrained or prohibited due to issues such as copyright ownership or inappropriate content. Moreover, not all of the images in current datasets are suitable for the proposed attack methods. In addition, the state-of-the-art (SoTA) performance of SBD is far from ideal when only a few generated poisoning samples can be adopted for attacks. In this paper, we introduce new datasets accessible for research on attacks like SBD, and propose the Multi-Element (ME) attack method, which builds on SBD by increasing the number of poisonous visual-text elements per poisoned sample to strengthen the attack, while applying the Discrete Cosine Transform (DCT) to the poisoned samples to maintain stealthiness. The Copyright Infringement Rate (CIR) / First Attack Epoch (FAE) we obtained on the two new datasets were 16.78% / 39.50 and 51.20% / 23.60, respectively, close to or even better than the results on the benchmark Pokemon and Midjourney datasets. Under a low subsampling ratio (5%, 6 poisoned samples), MESI and DCT achieved CIR / FAE of 0.23% / 84.00 and 12.73% / 65.50, both better than the original SBD, which failed to attack at all.

Submitted 12 June, 2025; originally announced June 2025.
