Search | arXiv e-print repository

Shadow defense against gradient inversion attack in federated learning

Abstract: Federated learning (FL) has emerged as a transformative framework for privacy-preserving distributed training, allowing clients to collaboratively train a global model without sharing their local data. This is especially crucial in sensitive fields like healthcare, where protecting patient data is paramount. However, privacy leakage remains a critical challenge, as the communication of model updat… ▽ More Federated learning (FL) has emerged as a transformative framework for privacy-preserving distributed training, allowing clients to collaboratively train a global model without sharing their local data. This is especially crucial in sensitive fields like healthcare, where protecting patient data is paramount. However, privacy leakage remains a critical challenge, as the communication of model updates can be exploited by potential adversaries. Gradient inversion attacks (GIAs), for instance, allow adversaries to approximate the gradients used for training and reconstruct training images, thus stealing patient privacy. Existing defense mechanisms obscure gradients, yet lack a nuanced understanding of which gradients or types of image information are most vulnerable to such attacks. These indiscriminate calibrated perturbations result in either excessive privacy protection degrading model accuracy, or insufficient one failing to safeguard sensitive information. Therefore, we introduce a framework that addresses these challenges by leveraging a shadow model with interpretability for identifying sensitive areas. This enables a more targeted and sample-specific noise injection. Specially, our defensive strategy achieves discrepancies of 3.73 in PSNR and 0.2 in SSIM compared to the circumstance without defense on the ChestXRay dataset, and 2.78 in PSNR and 0.166 in the EyePACS dataset. Moreover, it minimizes adverse effects on model performance, with less than 1\% F1 reduction compared to SOTA methods. Our extensive experiments, conducted across diverse types of medical images, validate the generalization of the proposed framework. The stable defense improvements for FedAvg are consistently over 1.5\% times in LPIPS and SSIM. It also offers a universal defense against various GIA types, especially for these sensitive areas in images. △ Less

Submitted 30 May, 2025; originally announced June 2025.

arXiv:2506.15710 [pdf, ps, other]

RAST: Reasoning Activation in LLMs via Small-model Transfer

Authors: Siru Ouyang, Xinyu Zhu, Zilin Xiao, Minhao Jiang, Yu Meng, Jiawei Han

Abstract: Reinforcement learning (RL) has become a powerful approach for improving the reasoning capabilities of large language models (LLMs), as evidenced by recent successes such as OpenAI's o1 and Deepseek-R1. However, applying RL at scale remains intimidatingly resource-intensive, requiring multiple model copies and extensive GPU workloads. On the other hand, while being powerful, recent studies suggest… ▽ More Reinforcement learning (RL) has become a powerful approach for improving the reasoning capabilities of large language models (LLMs), as evidenced by recent successes such as OpenAI's o1 and Deepseek-R1. However, applying RL at scale remains intimidatingly resource-intensive, requiring multiple model copies and extensive GPU workloads. On the other hand, while being powerful, recent studies suggest that RL does not fundamentally endow models with new knowledge; rather, it primarily reshapes the model's output distribution to activate reasoning capabilities latent in the base model. Building on this insight, we hypothesize that the changes in output probabilities induced by RL are largely model-size invariant, opening the door to a more efficient paradigm: training a small model with RL and transferring its induced probability shifts to larger base models. To verify our hypothesis, we conduct a token-level analysis of decoding trajectories and find high alignment in RL-induced output distributions across model scales, validating our hypothesis. Motivated by this, we propose RAST, a simple yet effective method that transfers reasoning behaviors by injecting RL-induced probability adjustments from a small RL-trained model into larger models. Experiments across multiple mathematical reasoning benchmarks show that RAST substantially and consistently enhances the reasoning capabilities of base models while requiring significantly lower GPU memory than direct RL training, sometimes even yielding better performance than the RL-trained counterparts. Our findings offer new insights into the nature of RL-driven reasoning and practical strategies for scaling its benefits without incurring its full computational cost. The project page of RAST is available at https://ozyyshr.github.io/RAST/. △ Less

Submitted 30 May, 2025; originally announced June 2025.

arXiv:2506.15691 [pdf, other]

What Do Latent Action Models Actually Learn?

Authors: Chuheng Zhang, Tim Pearce, Pushi Zhang, Kaixin Wang, Xiaoyu Chen, Wei Shen, Li Zhao, Jiang Bian

Abstract: Latent action models (LAMs) aim to learn action-relevant changes from unlabeled videos by compressing changes between frames as latents. However, differences between video frames can be caused by controllable changes as well as exogenous noise, leading to an important concern -- do latents capture the changes caused by actions or irrelevant noise? This paper studies this issue analytically, presen… ▽ More Latent action models (LAMs) aim to learn action-relevant changes from unlabeled videos by compressing changes between frames as latents. However, differences between video frames can be caused by controllable changes as well as exogenous noise, leading to an important concern -- do latents capture the changes caused by actions or irrelevant noise? This paper studies this issue analytically, presenting a linear model that encapsulates the essence of LAM learning, while being tractable.This provides several insights, including connections between LAM and principal component analysis (PCA), desiderata of the data-generating policy, and justification of strategies to encourage learning controllable changes using data augmentation, data cleaning, and auxiliary action-prediction. We also provide illustrative results based on numerical simulation, shedding light on the specific structure of observations, actions, and noise in data that influence LAM learning. △ Less

Submitted 26 May, 2025; originally announced June 2025.

arXiv:2506.15560 [pdf, ps, other]

RaCalNet: Radar Calibration Network for Sparse-Supervised Metric Depth Estimation

Authors: Xingrui Qin, Wentao Zhao, Chuan Cao, Yihe Niu, Houcheng Jiang, Jingchuan Wang

Abstract: Dense metric depth estimation using millimeter-wave radar typically requires dense LiDAR supervision, generated via multi-frame projection and interpolation, to guide the learning of accurate depth from sparse radar measurements and RGB images. However, this paradigm is both costly and data-intensive. To address this, we propose RaCalNet, a novel framework that eliminates the need for dense superv… ▽ More Dense metric depth estimation using millimeter-wave radar typically requires dense LiDAR supervision, generated via multi-frame projection and interpolation, to guide the learning of accurate depth from sparse radar measurements and RGB images. However, this paradigm is both costly and data-intensive. To address this, we propose RaCalNet, a novel framework that eliminates the need for dense supervision by using sparse LiDAR to supervise the learning of refined radar measurements, resulting in a supervision density of merely around 1% compared to dense-supervised methods. Unlike previous approaches that associate radar points with broad image regions and rely heavily on dense labels, RaCalNet first recalibrates and refines sparse radar points to construct accurate depth priors. These priors then serve as reliable anchors to guide monocular depth prediction, enabling metric-scale estimation without resorting to dense supervision. This design improves structural consistency and preserves fine details. Despite relying solely on sparse supervision, RaCalNet surpasses state-of-the-art dense-supervised methods, producing depth maps with clear object contours and fine-grained textures. Extensive experiments on the ZJU-4DRadarCam dataset and real-world deployment scenarios demonstrate its effectiveness, reducing RMSE by 35.30% and 34.89%, respectively. △ Less

Submitted 18 June, 2025; originally announced June 2025.

Comments: 9 pages, 7 figures

arXiv:2506.15548 [pdf, other]

Versatile Symbolic Music-for-Music Modeling via Function Alignment

Authors: Junyan Jiang, Daniel Chin, Liwei Lin, Xuanjie Liu, Gus Xia

Abstract: Many music AI models learn a map between music content and human-defined labels. However, many annotations, such as chords, can be naturally expressed within the music modality itself, e.g., as sequences of symbolic notes. This observation enables both understanding tasks (e.g., chord recognition) and conditional generation tasks (e.g., chord-conditioned melody generation) to be unified under a mu… ▽ More Many music AI models learn a map between music content and human-defined labels. However, many annotations, such as chords, can be naturally expressed within the music modality itself, e.g., as sequences of symbolic notes. This observation enables both understanding tasks (e.g., chord recognition) and conditional generation tasks (e.g., chord-conditioned melody generation) to be unified under a music-for-music sequence modeling paradigm. In this work, we propose parameter-efficient solutions for a variety of symbolic music-for-music tasks. The high-level idea is that (1) we utilize a pretrained Language Model (LM) for both the reference and the target sequence and (2) we link these two LMs via a lightweight adapter. Experiments show that our method achieves superior performance among different tasks such as chord recognition, melody generation, and drum track generation. All demos, code and model weights are publicly available. △ Less

Submitted 18 June, 2025; originally announced June 2025.

Journal ref: The 26th conference of the International Society for Music Information Retrieval (ISMIR 2025)

arXiv:2506.15533 [pdf, ps, other]

Measurements of the absolute branching fractions of the doubly Cabibbo-suppressed decays $D^+\to K^+π^0$, $D^+\to K^+η$ and $D^+\to K^+η^{\prime}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (697 additional authors not shown)

Abstract: Using $20.3\,\rm fb^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of 3.773\,GeV with the BESIII detector, we present improved measurements of the absolute branching fractions of the doubly Cabibbo-suppressed decays $D^+\to K^+π^0$, $D^+\to K^+η$ and $ D^+ \to K^+ η^{\prime}$ with the double-tag method. The statistical significance of each signal decay exceeds $10σ$. The bra… ▽ More Using $20.3\,\rm fb^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of 3.773\,GeV with the BESIII detector, we present improved measurements of the absolute branching fractions of the doubly Cabibbo-suppressed decays $D^+\to K^+π^0$, $D^+\to K^+η$ and $ D^+ \to K^+ η^{\prime}$ with the double-tag method. The statistical significance of each signal decay exceeds $10σ$. The branching fractions are determined to be ${\mathcal B}(D^+\to K^+ π^0) = (1.45 \pm 0.06 \pm 0.06)\times 10^{-4}$, ${\mathcal B}(D^+\to K^+ η) = (1.17 \pm 0.10 \pm 0.03)\times 10^{-4}$ and ${\mathcal B}(D^+\to K^+ η^{\prime}) = (1.88 \pm 0.15 \pm 0.06)\times 10^{-4}$, where the first uncertainties are statistical and the second systematic. These results are consistent with the world average values but with significantly improved precision. △ Less

Submitted 18 June, 2025; originally announced June 2025.

Comments: 20 pages, 4 figures

arXiv:2506.15520 [pdf, ps, other]

Time-bin encoded quantum key distribution over 120 km with a telecom quantum dot source

Authors: Jipeng Wang, Joscha Hanel, Zenghui Jiang, Raphael Joos, Michael Jetter, Eddy Patrick Rugeramigabo, Simone Luca Portalupi, Peter Michler, Xiao-Yu Cao, Hua-Lei Yin, Shan Lei, Jingzhong Yang, Michael Zopf, Fei Ding

Abstract: Quantum key distribution (QKD) with deterministic single photon sources has been demonstrated over intercity fiber and free-space channels. The previous implementations relied mainly on polarization encoding schemes, which are susceptible to birefringence, polarization-mode dispersion and polarization-dependent loss in practical fiber networks. In contrast, time-bin encoding offers inherent robust… ▽ More Quantum key distribution (QKD) with deterministic single photon sources has been demonstrated over intercity fiber and free-space channels. The previous implementations relied mainly on polarization encoding schemes, which are susceptible to birefringence, polarization-mode dispersion and polarization-dependent loss in practical fiber networks. In contrast, time-bin encoding offers inherent robustness and has been widely adopted in mature QKD systems using weak coherent laser pulses. However, its feasibility in conjunction with a deterministic single-photon source has not yet been experimentally demonstrated. In this work, we construct a time-bin encoded QKD system employing a high-brightness quantum dot (QD) single-photon source operating at telecom wavelength. Our proof-of-concept experiment successfully demonstrates the possibility of secure key distribution over fiber link of 120 km, while maintaining extraordinary long-term stability over 6 hours of continuous operation. This work provides the first experimental validation of integrating a quantum dot single-photon source with time-bin encoding in a telecom-band QKD system. In addition, it demonstrates the highest secure key rate among the time-bin QKDs based on single-photon sources. This development signifies a substantial advancement in the establishment of a robust and scalable QKD network based on solid-state single-photon technology △ Less

Submitted 18 June, 2025; originally announced June 2025.

Comments: 14 pages, 4 figures

arXiv:2506.15495 [pdf]

Near-field optical mode engineering-enabled freeform nonlocal metasurfaces

Authors: Zhongjun Jiang, Tianxiang Dai, Shuwei Guo, Soyaib Sohag, Yixuan Shao, Chenkai Mao, Andrea Alù, Jonathan A. Fan, You Zhou

Abstract: Nanophotonic technologies inherently rely on tailoring light-matter interactions through the excitation and interference of deeply confined optical resonances. However, existing concepts in optical mode engineering remain heuristic and are challenging to extend towards complex and multi-functional resonant phenomena. Here, we introduce an inverse design framework that optimizes near-field distribu… ▽ More Nanophotonic technologies inherently rely on tailoring light-matter interactions through the excitation and interference of deeply confined optical resonances. However, existing concepts in optical mode engineering remain heuristic and are challenging to extend towards complex and multi-functional resonant phenomena. Here, we introduce an inverse design framework that optimizes near-field distributions, ideally suited to tailor Mie-type modes within dielectric nanophotonic structures, and we demonstrate its powerful opportunities to facilitate the discovery of new classes of nonlocal metasurfaces. We show that freeform nonlocal metasurfaces supporting accidental bound states in the continuum can be readily optimized to tackle tailored illumination conditions, modal properties and quality factors. We further extend our approach to multifunctional and multipolar mode engineering, and experimentally demonstrate freeform planar nonlocal multi-wavelength and chiral metasurfaces. Our versatile and robust framework for freeform mode engineering has applications in a broad range of high quality-factor metasurface platforms relevant to sensing, nonlinear optics, optomechanics and quantum information processing. △ Less

Submitted 18 June, 2025; originally announced June 2025.

Comments: 20 pages, 5 figures

arXiv:2506.15442 [pdf, ps, other]

Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

Authors: Team Hunyuan3D, Shuhui Yang, Mingxin Yang, Yifei Feng, Xin Huang, Sheng Zhang, Zebin He, Di Luo, Haolin Liu, Yunfei Zhao, Qingxiang Lin, Zeqiang Lai, Xianghui Yang, Huiwen Shi, Zibo Zhao, Bowen Zhang, Hongyu Yan, Lifu Wang, Sicong Liu, Jihong Zhang, Meng Chen, Liang Dong, Yiwen Jia, Yulin Cai, Jiaao Yu , et al. (28 additional authors not shown)

Abstract: 3D AI-generated content (AIGC) is a passionate field that has significantly accelerated the creation of 3D models in gaming, film, and design. Despite the development of several groundbreaking models that have revolutionized 3D generation, the field remains largely accessible only to researchers, developers, and designers due to the complexities involved in collecting, processing, and training 3D… ▽ More 3D AI-generated content (AIGC) is a passionate field that has significantly accelerated the creation of 3D models in gaming, film, and design. Despite the development of several groundbreaking models that have revolutionized 3D generation, the field remains largely accessible only to researchers, developers, and designers due to the complexities involved in collecting, processing, and training 3D models. To address these challenges, we introduce Hunyuan3D 2.1 as a case study in this tutorial. This tutorial offers a comprehensive, step-by-step guide on processing 3D data, training a 3D generative model, and evaluating its performance using Hunyuan3D 2.1, an advanced system for producing high-resolution, textured 3D assets. The system comprises two core components: the Hunyuan3D-DiT for shape generation and the Hunyuan3D-Paint for texture synthesis. We will explore the entire workflow, including data preparation, model architecture, training strategies, evaluation metrics, and deployment. By the conclusion of this tutorial, you will have the knowledge to finetune or develop a robust 3D generative model suitable for applications in gaming, virtual reality, and industrial design. △ Less

Submitted 18 June, 2025; originally announced June 2025.

Comments: Github link: https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1

arXiv:2506.15390 [pdf, ps, other]

doi 10.23731/CYRM-2025-005

ECFA Higgs, electroweak, and top Factory Study

Authors: H. Abidi, J. A. Aguilar-Saavedra, S. Airen, S. Ajmal, M. Al-Thakeel, G. L. Alberghi, J. Alcaraz Maestre, J. Alimena, S. Alshamaily, J. Altmann, W. Altmannshofer, Y. Amhis, A. Amiri, A. Andreazza, S. Antusch, O. Arnaez, K. A. Assamagan, S. Aumiller, K. Azizi, P. Azzi, P. Azzurri, E. Bagnaschi, Z. Baharyioon, H. Bahl, V. Balagura , et al. (346 additional authors not shown)

Abstract: The ECFA Higgs, electroweak, and top Factory Study ran between 2021 and 2025 as a broad effort across the experimental and theoretical particle physics communities, bringing together participants from many different proposed future collider projects. Activities across three main working groups advanced the joint development of tools and analysis techniques, fostered new considerations of detector… ▽ More The ECFA Higgs, electroweak, and top Factory Study ran between 2021 and 2025 as a broad effort across the experimental and theoretical particle physics communities, bringing together participants from many different proposed future collider projects. Activities across three main working groups advanced the joint development of tools and analysis techniques, fostered new considerations of detector design and optimisation, and led to a new set of studies resulting in improved projected sensitivities across a wide physics programme. This report demonstrates the significant expansion in the state-of-the-art understanding of the physics potential of future e+e- Higgs, electroweak, and top factories, and has been submitted as input to the 2025 European Strategy for Particle Physics Update. △ Less

Submitted 18 June, 2025; originally announced June 2025.

Report number: CERN-2025-005

arXiv:2506.15347 [pdf, ps, other]

Search for the lepton-flavour-violating decays $B^0 \to K^{*0} τ^\pm e^\mp$

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An , et al. (1128 additional authors not shown)

Abstract: A first search for the lepton-flavour-violating decays $B^0\to K^{*0}τ^\pm e^\mp$ is presented. The analysis is performed using a sample of proton-proton collision data, collected with the LHCb detector at a centre-of-mass energy of 13 TeV between 2016 and 2018, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. No significant signal is observed, and upper limits on the branching fraction… ▽ More A first search for the lepton-flavour-violating decays $B^0\to K^{*0}τ^\pm e^\mp$ is presented. The analysis is performed using a sample of proton-proton collision data, collected with the LHCb detector at a centre-of-mass energy of 13 TeV between 2016 and 2018, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. No significant signal is observed, and upper limits on the branching fractions are determined to be $\cal{B}$$(B^0 \to K^{*0}τ^-e^+)< 5.9$ $(7.1)\times 10^{-6}$ and $\cal{B}$$(B^0 \to K^{*0}τ^+e^-)< 4.9$ $(5.9)\times 10^{-6}$ at the 90\% (95\%) confidence level. △ Less

Submitted 18 June, 2025; originally announced June 2025.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3859/ (LHCb public pages)

Report number: LHCb-PAPER-2025-005, CERN-EP-2025-107

arXiv:2506.15290 [pdf, ps, other]

Human Motion Capture from Loose and Sparse Inertial Sensors with Garment-aware Diffusion Models

Authors: Andela Ilic, Jiaxi Jiang, Paul Streli, Xintong Liu, Christian Holz

Abstract: Motion capture using sparse inertial sensors has shown great promise due to its portability and lack of occlusion issues compared to camera-based tracking. Existing approaches typically assume that IMU sensors are tightly attached to the human body. However, this assumption often does not hold in real-world scenarios. In this paper, we present a new task of full-body human pose estimation using sp… ▽ More Motion capture using sparse inertial sensors has shown great promise due to its portability and lack of occlusion issues compared to camera-based tracking. Existing approaches typically assume that IMU sensors are tightly attached to the human body. However, this assumption often does not hold in real-world scenarios. In this paper, we present a new task of full-body human pose estimation using sparse, loosely attached IMU sensors. To solve this task, we simulate IMU recordings from an existing garment-aware human motion dataset. We developed transformer-based diffusion models to synthesize loose IMU data and estimate human poses based on this challenging loose IMU data. In addition, we show that incorporating garment-related parameters while training the model on simulated loose data effectively maintains expressiveness and enhances the ability to capture variations introduced by looser or tighter garments. Experiments show that our proposed diffusion methods trained on simulated and synthetic data outperformed the state-of-the-art methods quantitatively and qualitatively, opening up a promising direction for future research. △ Less

Submitted 18 June, 2025; originally announced June 2025.

Comments: Accepted by IJCAI 2025

MSC Class: 68T07; 68T45; 68U01 ACM Class: I.2; I.3; I.4; I.5

arXiv:2506.15276 [pdf, ps, other]

MSNeRV: Neural Video Representation with Multi-Scale Feature Fusion

Authors: Jun Zhu, Xinfeng Zhang, Lv Tang, JunHao Jiang

Abstract: Implicit Neural representations (INRs) have emerged as a promising approach for video compression, and have achieved comparable performance to the state-of-the-art codecs such as H.266/VVC. However, existing INR-based methods struggle to effectively represent detail-intensive and fast-changing video content. This limitation mainly stems from the underutilization of internal network features and th… ▽ More Implicit Neural representations (INRs) have emerged as a promising approach for video compression, and have achieved comparable performance to the state-of-the-art codecs such as H.266/VVC. However, existing INR-based methods struggle to effectively represent detail-intensive and fast-changing video content. This limitation mainly stems from the underutilization of internal network features and the absence of video-specific considerations in network design. To address these challenges, we propose a multi-scale feature fusion framework, MSNeRV, for neural video representation. In the encoding stage, we enhance temporal consistency by employing temporal windows, and divide the video into multiple Groups of Pictures (GoPs), where a GoP-level grid is used for background representation. Additionally, we design a multi-scale spatial decoder with a scale-adaptive loss function to integrate multi-resolution and multi-frequency information. To further improve feature extraction, we introduce a multi-scale feature block that fully leverages hidden features. We evaluate MSNeRV on HEVC ClassB and UVG datasets for video representation and compression. Experimental results demonstrate that our model exhibits superior representation capability among INR-based approaches and surpasses VTM-23.7 (Random Access) in dynamic scenarios in terms of compression efficiency. △ Less

Submitted 18 June, 2025; originally announced June 2025.

arXiv:2506.15270 [pdf, ps, other]

The invariant subspace problem and Rosenblum operators I

Authors: Junsheng Fang, Bingzhe Hou, Chunlan Jiang, Yuanhang Zhang

Abstract: Let $T\in B(\mathcal{H})$ be an invertible operator. From the 1940's, Gelfand, Hille and Wermer investigated the invariant subspaces of $T$ by analyzing the growth of $\|T^n\|$, where $n\in \mathbb{Z}$. In this paper, we study the invariant subspaces of $T$ by estimating the growth of $\|T^n+λT^{-n}\|$, where $n\in \mathbb{N}$ and $λ$ is a nonzero complex constant. The key ingredient of our approa… ▽ More Let $T\in B(\mathcal{H})$ be an invertible operator. From the 1940's, Gelfand, Hille and Wermer investigated the invariant subspaces of $T$ by analyzing the growth of $\|T^n\|$, where $n\in \mathbb{Z}$. In this paper, we study the invariant subspaces of $T$ by estimating the growth of $\|T^n+λT^{-n}\|$, where $n\in \mathbb{N}$ and $λ$ is a nonzero complex constant. The key ingredient of our approach is introducing the notion of shift representation operators, which is based on the Rosenblum operators. In addition, by employing shift representation operators, we provide an equivalent of the Invariant Subspace Problem via the injectivity of certain Hankel operators. △ Less

Submitted 18 June, 2025; originally announced June 2025.

MSC Class: Primary 47A15; 47B02; Secondary 47A16

arXiv:2506.15199 [pdf, ps, other]

Interpretability and Generalization Bounds for Learning Spatial Physics

Authors: Alejandro Francisco Queiruga, Theo Gutman-Solo, Shuai Jiang

Abstract: While there are many applications of ML to scientific problems that look promising, visuals can be deceiving. For scientific applications, actual quantitative accuracy is crucial. This work applies the rigor of numerical analysis for differential equations to machine learning by specifically quantifying the accuracy of applying different ML techniques to the elementary 1D Poisson differential equa… ▽ More While there are many applications of ML to scientific problems that look promising, visuals can be deceiving. For scientific applications, actual quantitative accuracy is crucial. This work applies the rigor of numerical analysis for differential equations to machine learning by specifically quantifying the accuracy of applying different ML techniques to the elementary 1D Poisson differential equation. Beyond the quantity and discretization of data, we identify that the function space of the data is critical to the generalization of the model. We prove generalization bounds and convergence rates under finite data discretizations and restricted training data subspaces by analyzing the training dynamics and deriving optimal parameters for both a white-box differential equation discovery method and a black-box linear model. The analytically derived generalization bounds are replicated empirically. Similar lack of generalization is empirically demonstrated for deep linear models, shallow neural networks, and physics-specific DeepONets and Neural Operators. We theoretically and empirically demonstrate that generalization to the true physical equation is not guaranteed in each explored case. Surprisingly, we find that different classes of models can exhibit opposing generalization behaviors. Based on our theoretical analysis, we also demonstrate a new mechanistic interpretability lens on scientific models whereby Green's function representations can be extracted from the weights of black-box models. Our results inform a new cross-validation technique for measuring generalization in physical systems. We propose applying it to the Poisson equation as an evaluation benchmark of future methods. △ Less

Submitted 18 June, 2025; originally announced June 2025.

arXiv:2506.15196 [pdf, ps, other]

HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges

Authors: Xianliang Yang, Ling Zhang, Haolong Qian, Lei Song, Jiang Bian

Abstract: Heuristic algorithms play a vital role in solving combinatorial optimization (CO) problems, yet traditional designs depend heavily on manual expertise and struggle to generalize across diverse instances. We introduce \textbf{HeurAgenix}, a two-stage hyper-heuristic framework powered by large language models (LLMs) that first evolves heuristics and then selects among them automatically. In the heur… ▽ More Heuristic algorithms play a vital role in solving combinatorial optimization (CO) problems, yet traditional designs depend heavily on manual expertise and struggle to generalize across diverse instances. We introduce \textbf{HeurAgenix}, a two-stage hyper-heuristic framework powered by large language models (LLMs) that first evolves heuristics and then selects among them automatically. In the heuristic evolution phase, HeurAgenix leverages an LLM to compare seed heuristic solutions with higher-quality solutions and extract reusable evolution strategies. During problem solving, it dynamically picks the most promising heuristic for each problem state, guided by the LLM's perception ability. For flexibility, this selector can be either a state-of-the-art LLM or a fine-tuned lightweight model with lower inference cost. To mitigate the scarcity of reliable supervision caused by CO complexity, we fine-tune the lightweight heuristic selector with a dual-reward mechanism that jointly exploits singals from selection preferences and state perception, enabling robust selection under noisy annotations. Extensive experiments on canonical benchmarks show that HeurAgenix not only outperforms existing LLM-based hyper-heuristics but also matches or exceeds specialized solvers. Code is available at https://github.com/microsoft/HeurAgenix. △ Less

Submitted 24 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

Comments: 27 pages,9 figures

arXiv:2506.15160 [pdf, ps, other]

Enhancing point cloud analysis via neighbor aggregation correction based on cross-stage structure correlation

Authors: Jiaqi Shi, Jin Xiao, Xiaoguang Hu, Boyang Song, Hao Jiang, Tianyou Chen, Baochang Zhang

Abstract: Point cloud analysis is the cornerstone of many downstream tasks, among which aggregating local structures is the basis for understanding point cloud data. While numerous works aggregate neighbor using three-dimensional relative coordinates, there are irrelevant point interference and feature hierarchy gap problems due to the limitation of local coordinates. Although some works address this limita… ▽ More Point cloud analysis is the cornerstone of many downstream tasks, among which aggregating local structures is the basis for understanding point cloud data. While numerous works aggregate neighbor using three-dimensional relative coordinates, there are irrelevant point interference and feature hierarchy gap problems due to the limitation of local coordinates. Although some works address this limitation by refining spatial description though explicit modeling of cross-stage structure, these enhancement methods based on direct geometric structure encoding have problems of high computational overhead and noise sensitivity. To overcome these problems, we propose the Point Distribution Set Abstraction module (PDSA) that utilizes the correlation in the high-dimensional space to correct the feature distribution during aggregation, which improves the computational efficiency and robustness. PDSA distinguishes the point correlation based on a lightweight cross-stage structural descriptor, and enhances structural homogeneity by reducing the variance of the neighbor feature matrix and increasing classes separability though long-distance modeling. Additionally, we introducing a key point mechanism to optimize the computational overhead. The experimental result on semantic segmentation and classification tasks based on different baselines verify the generalization of the method we proposed, and achieve significant performance improvement with less parameter cost. The corresponding ablation and visualization results demonstrate the effectiveness and rationality of our method. The code and training weight is available at: https://github.com/AGENT9717/PointDistribution △ Less

Submitted 18 June, 2025; originally announced June 2025.

Comments: 17 papes, 7 figures

arXiv:2506.15066 [pdf, ps, other]

ChatModel: Automating Reference Model Design and Verification with LLMs

Authors: Jianmin Ye, Tianyang Liu, Qi Tian, Shengchu Su, Zhe Jiang, Xi Wang

Abstract: As the complexity of integrated circuit designs continues to escalate, the functional verification becomes increasingly challenging. Reference models, critical for accelerating the verification process, are themselves becoming more intricate and time-consuming to develop. Despite the promise shown by large language models (LLMs) in code programming, effectively generating complex reference models… ▽ More As the complexity of integrated circuit designs continues to escalate, the functional verification becomes increasingly challenging. Reference models, critical for accelerating the verification process, are themselves becoming more intricate and time-consuming to develop. Despite the promise shown by large language models (LLMs) in code programming, effectively generating complex reference models remains a significant hurdle. To address these challenges, we introduce ChatModel, the first LLM-aided agile reference model generation and verification platform. ChatModel streamlines the transition from design specifications to fully functional reference models by integrating design standardization and hierarchical agile modeling. Employing a building-block generation strategy, it not only enhances the design capabilities of LLMs for reference models but also significantly boosts verification efficiency. We evaluated ChatModel on 300 designs of varying complexity, demonstrating substantial improvements in both efficiency and quality of reference model generation. ChatModel achieved a peak performance improvement of 55.02% compared to alternative methods, with notable enhancements in generation stability, and delivered a 9.18x increase in its capacity to produce reference model designs. Furthermore, it accelerated the iterative process of reference model design and validation by an average of 5.90x compared to traditional approaches. These results highlight the potential of ChatModel to significantly advance the automation of reference model generation and validation. △ Less

Submitted 24 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

arXiv:2506.15039 [pdf, ps, other]

Unveiling the Cosmic Dance of Repeated Nuclear Transient ASASSN-14ko: Insights from Multiwavelength Observations

Authors: Shifeng Huang, Tinggui Wang, Ning Jiang, Rong-Feng Shen, Zhaohao Chen, Yuanming Wang, Jiazheng Zhu, Yibo Wang, Yunguo Jiang, Xinwen Shu, Hucheng Ding, Xiongjun Fang, Yifan Wang, Jie Lin, Jingran Xu, Xu Chen, Zheyu Lin, Zhengfeng Sheng

Abstract: ASASSN-14ko is a periodically repeating nuclear transient. We conducted high-cadence, multiwavelength observations of this source, revealing several recurrent early bumps and rebrightenings in its UV/optical light curves. The energy released during these bumps and rebrightenings shows a diminishing trend in recent UV/optical outbursts, which we monitored through multiwavelength observations. These… ▽ More ASASSN-14ko is a periodically repeating nuclear transient. We conducted high-cadence, multiwavelength observations of this source, revealing several recurrent early bumps and rebrightenings in its UV/optical light curves. The energy released during these bumps and rebrightenings shows a diminishing trend in recent UV/optical outbursts, which we monitored through multiwavelength observations. These features can be ascribed to the interaction between stream debris and the expanded disk in the repeated partial tidal disruption event. The X-ray light curve exhibits an inverse pattern compared to the UV/optical bands, displaying sporadic outbursts. Furthermore, our observations demonstrate that the blackbody temperature and radius in each outburst increase with the UV/optical luminosity, and such evolution resembles that observed in X-ray quasiperiodic eruptions, whereas distinguishing it from typical tidal disruption events. △ Less

Submitted 17 June, 2025; originally announced June 2025.

Comments: Accepted for publication in ApJ, 16 pages, 10 figures

arXiv:2506.14827 [pdf, ps, other]

DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning

Authors: Yifeng Gao, Yifan Ding, Hongyu Su, Juncheng Li, Yunhan Zhao, Lin Luo, Zixing Chen, Li Wang, Xin Wang, Yixu Wang, Xingjun Ma, Yu-Gang Jiang

Abstract: As AI-generated video becomes increasingly pervasive across media platforms, the ability to reliably distinguish synthetic content from authentic footage has become both urgent and essential. Existing approaches have primarily treated this challenge as a binary classification task, offering limited insight into where or why a model identifies a video as AI-generated. However, the core challenge ex… ▽ More As AI-generated video becomes increasingly pervasive across media platforms, the ability to reliably distinguish synthetic content from authentic footage has become both urgent and essential. Existing approaches have primarily treated this challenge as a binary classification task, offering limited insight into where or why a model identifies a video as AI-generated. However, the core challenge extends beyond simply detecting subtle artifacts; it requires providing fine-grained, persuasive evidence that can convince auditors and end-users alike. To address this critical gap, we introduce DAVID-X, the first dataset to pair AI-generated videos with detailed defect-level, temporal-spatial annotations and written rationales. Leveraging these rich annotations, we present DAVID-XR1, a video-language model designed to deliver an interpretable chain of visual reasoning-including defect categorization, temporal-spatial localization, and natural language explanations. This approach fundamentally transforms AI-generated video detection from an opaque black-box decision into a transparent and verifiable diagnostic process. We demonstrate that a general-purpose backbone, fine-tuned on our compact dataset and enhanced with chain-of-thought distillation, achieves strong generalization across a variety of generators and generation modes. Our results highlight the promise of explainable detection methods for trustworthy identification of AI-generated video content. △ Less

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.14813 [pdf, ps, other]

Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks

Authors: Yuxuan Jiang, Ziming Zhou, Boyu Xu, Beijie Liu, Runhui Xu, Peng Huang

Abstract: Training deep learning (DL) models is a complex process, making it prone to silent errors that are challenging to detect and diagnose. This paper presents TRAINCHECK, a framework that takes a proactive checking approach to address silent training errors. TRAINCHECK automatically infers invariants tailored for DL training. It uses these invariants to proactively detect silent errors during the trai… ▽ More Training deep learning (DL) models is a complex process, making it prone to silent errors that are challenging to detect and diagnose. This paper presents TRAINCHECK, a framework that takes a proactive checking approach to address silent training errors. TRAINCHECK automatically infers invariants tailored for DL training. It uses these invariants to proactively detect silent errors during the training process while providing debugging help. To evaluate TRAINCHECK, we reproduce 20 real-world silent training errors with diverse root causes. TRAINCHECK successfully detects 18 errors within a single training iteration. It also uncovers 6 unknown bugs in popular training libraries that lead to silent errors. △ Less

Submitted 6 June, 2025; originally announced June 2025.

Comments: 19 pages, to appear in 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI '25)

arXiv:2506.14809 [pdf]

Impact of a Deployed LLM Survey Creation Tool through the IS Success Model

Authors: Peng Jiang, Vinicius Cezar Monteiro de Lira, Antonio Maiorino

Abstract: Surveys are a cornerstone of Information Systems (IS) research, yet creating high-quality surveys remains labor-intensive, requiring both domain expertise and methodological rigor. With the evolution of large language models (LLMs), new opportunities emerge to automate survey generation. This paper presents the real-world deployment of an LLM-powered system designed to accelerate data collection w… ▽ More Surveys are a cornerstone of Information Systems (IS) research, yet creating high-quality surveys remains labor-intensive, requiring both domain expertise and methodological rigor. With the evolution of large language models (LLMs), new opportunities emerge to automate survey generation. This paper presents the real-world deployment of an LLM-powered system designed to accelerate data collection while maintaining survey quality. Deploying such systems in production introduces real-world complexity, including diverse user needs and quality control. We evaluate the system using the DeLone and McLean IS Success Model to understand how generative AI can reshape a core IS method. This study makes three key contributions. To our knowledge, this is the first application of the IS Success Model to a generative AI system for survey creation. In addition, we propose a hybrid evaluation framework combining automated and human assessments. Finally, we implement safeguards that mitigate post-deployment risks and support responsible integration into IS workflows. △ Less

Submitted 3 June, 2025; originally announced June 2025.

ACM Class: I.2; H.4

arXiv:2506.14731 [pdf, ps, other]

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Authors: Ling Team, Bin Hu, Cai Chen, Deng Zhao, Ding Liu, Dingnan Jin, Feng Zhu, Hao Dai, Hongzhi Luan, Jia Guo, Jiaming Liu, Jiewei Wu, Jun Mei, Jun Zhou, Junbo Zhao, Junwu Xiong, Kaihong Zhang, Kuan Xu, Lei Liang, Liang Jiang, Liangcheng Fu, Longfei Zheng, Qiang Gao, Qing Cui, Quan Wan , et al. (21 additional authors not shown)

Abstract: We present Ring-lite, a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL) to achieve efficient and robust reasoning capabilities. Built upon the publicly available Ling-lite model, a 16.8 billion parameter model with 2.75 billion activated parameters, our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challeng… ▽ More We present Ring-lite, a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL) to achieve efficient and robust reasoning capabilities. Built upon the publicly available Ling-lite model, a 16.8 billion parameter model with 2.75 billion activated parameters, our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks (e.g., AIME, LiveCodeBench, GPQA-Diamond) while activating only one-third of the parameters required by comparable models. To accomplish this, we introduce a joint training pipeline integrating distillation with RL, revealing undocumented challenges in MoE RL training. First, we identify optimization instability during RL training, and we propose Constrained Contextual Computation Policy Optimization(C3PO), a novel approach that enhances training stability and improves computational throughput via algorithm-system co-design methodology. Second, we empirically demonstrate that selecting distillation checkpoints based on entropy loss for RL training, rather than validation metrics, yields superior performance-efficiency trade-offs in subsequent RL training. Finally, we develop a two-stage training paradigm to harmonize multi-domain data integration, addressing domain conflicts that arise in training with mixed dataset. We will release the model, dataset, and code. △ Less

Submitted 17 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

Comments: Technical Report

arXiv:2506.14728 [pdf, ps, other]

AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes

Authors: Jiahao Qiu, Xinzhe Juan, Yimin Wang, Ling Yang, Xuan Qi, Tongcheng Zhang, Jiacheng Guo, Yifu Lu, Zixin Yao, Hongru Wang, Shilong Liu, Xun Jiang, Liu Leqi, Mengdi Wang

Abstract: While knowledge distillation has become a mature field for compressing large language models (LLMs) into smaller ones by aligning their outputs or internal representations, the distillation of LLM-based agents, which involve planning, memory, and tool use, remains relatively underexplored. Existing agent distillation methods typically replay full teacher trajectories or imitate step-by-step teache… ▽ More While knowledge distillation has become a mature field for compressing large language models (LLMs) into smaller ones by aligning their outputs or internal representations, the distillation of LLM-based agents, which involve planning, memory, and tool use, remains relatively underexplored. Existing agent distillation methods typically replay full teacher trajectories or imitate step-by-step teacher tool usage, but they often struggle to train student agents to dynamically plan and act in novel environments. We propose AgentDistill, a novel, training-free agent distillation framework that enables efficient and scalable knowledge transfer via direct reuse of Model-Context-Protocols (MCPs), which are structured and reusable task-solving modules autonomously generated by teacher agents. The reuse of these distilled MCPs enables student agents to generalize their capabilities across domains and solve new problems with minimal supervision or human intervention. Experiments on biomedical and mathematical benchmarks demonstrate that our distilled student agents, built on small language models, can achieve performance comparable to advanced systems using large LLMs such as OctoTools (GPT-4o), highlighting the effectiveness of our framework in building scalable and cost-efficient intelligent agents. △ Less

Submitted 17 June, 2025; originally announced June 2025.

Comments: 10 pages, 5 figures

arXiv:2506.14716 [pdf, ps, other]

All-optical convolution utilizing processing in memory based on a cold atomic ensemble

Authors: Ying-Hao Ye, Jia-Qi Jiang, En-Ze Li, Wei Zhang, Da-Chuang Li, Zhi-Han Zhu, Dong-Sheng Ding, Bao-Sen Shi

Abstract: Processing in memory (PIM) has received significant attention due to its high efficiency, low latency, and parallelism. In optical computation, coherent memory is a crucial infrastructure for PIM frameworks. This study presents an all-optical convolution experiment conducted within computational storage based on a cold atomic ensemble. By exploiting the light-atom phase transfer facilitated by the… ▽ More Processing in memory (PIM) has received significant attention due to its high efficiency, low latency, and parallelism. In optical computation, coherent memory is a crucial infrastructure for PIM frameworks. This study presents an all-optical convolution experiment conducted within computational storage based on a cold atomic ensemble. By exploiting the light-atom phase transfer facilitated by the electromagnetically induced transparency, we demonstrated spiral phase contrast processing of photon images in memory, resulting in the edge enhancement of retrieved images recorded using time-correlated photon imaging. In particular, adopting state-of-the-art atomic techniques provides a coherent memory lifetime exceeding 320 us for PIM operations. Our results highlight the significant potential of cold atomic ensembles as computational storage for developing all-optical PIM systems. △ Less

Submitted 17 June, 2025; originally announced June 2025.

Comments: 6 pages, 3 figures, supplmental material appended

arXiv:2506.14683 [pdf, ps, other]

Unified Software Engineering agent as AI Software Engineer

Authors: Leonhard Applis, Yuntong Zhang, Shanchao Liang, Nan Jiang, Lin Tan, Abhik Roychoudhury

Abstract: The growth of Large Language Model (LLM) technology has raised expectations for automated coding. However, software engineering is more than coding and is concerned with activities including maintenance and evolution of a project. In this context, the concept of LLM agents has gained traction, which utilize LLMs as reasoning engines to invoke external tools autonomously. But is an LLM agent the sa… ▽ More The growth of Large Language Model (LLM) technology has raised expectations for automated coding. However, software engineering is more than coding and is concerned with activities including maintenance and evolution of a project. In this context, the concept of LLM agents has gained traction, which utilize LLMs as reasoning engines to invoke external tools autonomously. But is an LLM agent the same as an AI software engineer? In this paper, we seek to understand this question by developing a Unified Software Engineering agent or USEagent. Unlike existing work which builds specialized agents for specific software tasks such as testing, debugging, and repair, our goal is to build a unified agent which can orchestrate and handle multiple capabilities. This gives the agent the promise of handling complex scenarios in software development such as fixing an incomplete patch, adding new features, or taking over code written by others. We envision USEagent as the first draft of a future AI Software Engineer which can be a team member in future software development teams involving both AI and humans. To evaluate the efficacy of USEagent, we build a Unified Software Engineering bench (USEbench) comprising of myriad tasks such as coding, testing, and patching. USEbench is a judicious mixture of tasks from existing benchmarks such as SWE-bench, SWT-bench, and REPOCOD. In an evaluation on USEbench consisting of 1,271 repository-level software engineering tasks, USEagent shows improved efficacy compared to existing general agents such as OpenHands CodeActAgent. There exist gaps in the capabilities of USEagent for certain coding tasks, which provides hints on further developing the AI Software Engineer of the future. △ Less

Submitted 17 June, 2025; originally announced June 2025.

Comments: Leonhard Applis and Yuntong Zhang contributed equally to this work

arXiv:2506.14541 [pdf, ps, other]

Exploring Diffusion with Test-Time Training on Efficient Image Restoration

Authors: Rongchang Lu, Tianduo Luo, Yunzhi Jiang, Conghan Yue, Pei Yang, Guibao Liu, Changyang Gu

Abstract: Image restoration faces challenges including ineffective feature fusion, computational bottlenecks and inefficient diffusion processes. To address these, we propose DiffRWKVIR, a novel framework unifying Test-Time Training (TTT) with efficient diffusion. Our approach introduces three key innovations: (1) Omni-Scale 2D State Evolution extends RWKV's location-dependent parameterization to hierarchic… ▽ More Image restoration faces challenges including ineffective feature fusion, computational bottlenecks and inefficient diffusion processes. To address these, we propose DiffRWKVIR, a novel framework unifying Test-Time Training (TTT) with efficient diffusion. Our approach introduces three key innovations: (1) Omni-Scale 2D State Evolution extends RWKV's location-dependent parameterization to hierarchical multi-directional 2D scanning, enabling global contextual awareness with linear complexity O(L); (2) Chunk-Optimized Flash Processing accelerates intra-chunk parallelism by 3.2x via contiguous chunk processing (O(LCd) complexity), reducing sequential dependencies and computational overhead; (3) Prior-Guided Efficient Diffusion extracts a compact Image Prior Representation (IPR) in only 5-20 steps, proving 45% faster training/inference than DiffIR while solving computational inefficiency in denoising. Evaluated across super-resolution and inpainting benchmarks (Set5, Set14, BSD100, Urban100, Places365), DiffRWKVIR outperforms SwinIR, HAT, and MambaIR/v2 in PSNR, SSIM, LPIPS, and efficiency metrics. Our method establishes a new paradigm for adaptive, high-efficiency image restoration with optimized hardware utilization. △ Less

Submitted 22 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

ACM Class: I.4.9

arXiv:2506.14493 [pdf, ps, other]

LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops

Authors: Jiyuan Fu, Kaixun Jiang, Lingyi Hong, Jinglun Li, Haijing Guo, Dingkang Yang, Zhaoyu Chen, Wenqiang Zhang

Abstract: Multimodal Large Language Models (MLLMs) have shown great promise but require substantial computational resources during inference. Attackers can exploit this by inducing excessive output, leading to resource exhaustion and service degradation. Prior energy-latency attacks aim to increase generation time by broadly shifting the output token distribution away from the EOS token, but they neglect th… ▽ More Multimodal Large Language Models (MLLMs) have shown great promise but require substantial computational resources during inference. Attackers can exploit this by inducing excessive output, leading to resource exhaustion and service degradation. Prior energy-latency attacks aim to increase generation time by broadly shifting the output token distribution away from the EOS token, but they neglect the influence of token-level Part-of-Speech (POS) characteristics on EOS and sentence-level structural patterns on output counts, limiting their efficacy. To address this, we propose LingoLoop, an attack designed to induce MLLMs to generate excessively verbose and repetitive sequences. First, we find that the POS tag of a token strongly affects the likelihood of generating an EOS token. Based on this insight, we propose a POS-Aware Delay Mechanism to postpone EOS token generation by adjusting attention weights guided by POS information. Second, we identify that constraining output diversity to induce repetitive loops is effective for sustained generation. We introduce a Generative Path Pruning Mechanism that limits the magnitude of hidden states, encouraging the model to produce persistent loops. Extensive experiments demonstrate LingoLoop can increase generated tokens by up to 30 times and energy consumption by a comparable factor on models like Qwen2.5-VL-3B, consistently driving MLLMs towards their maximum generation limits. These findings expose significant MLLMs' vulnerabilities, posing challenges for their reliable deployment. The code will be released publicly following the paper's acceptance. △ Less

Submitted 17 June, 2025; originally announced June 2025.

arXiv:2506.14406 [pdf, ps, other]

Search for neutron decay into an antineutrino and a neutral kaon in 0.401 megaton-years exposure of Super-Kamiokande

Authors: Super-Kamiokande Collaboration, :, K. Yamauchi, K. Abe, S. Abe, Y. Asaoka, M. Harada, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, J. Kameda, Y. Kanemura, Y. Kataoka, S. Miki, S. Mine, M. Miura, S. Moriyama, M. Nakahata, S. Nakayama, Y. Noguchi, G. Pronost, K. Sato, H. Sekiya , et al. (240 additional authors not shown)

Abstract: We searched for bound neutron decay via $n\to\barν+K^0$ predicted by the Grand Unified Theories in 0.401 Mton$\cdot$years exposure of all pure water phases in the Super-Kamiokande detector. About 4.4 times more data than in the previous search have been analyzed by a new method including a spectrum fit to kaon invariant mass distributions. No significant data excess has been observed in the signal… ▽ More We searched for bound neutron decay via $n\to\barν+K^0$ predicted by the Grand Unified Theories in 0.401 Mton$\cdot$years exposure of all pure water phases in the Super-Kamiokande detector. About 4.4 times more data than in the previous search have been analyzed by a new method including a spectrum fit to kaon invariant mass distributions. No significant data excess has been observed in the signal regions. As a result of this analysis, we set a lower limit of $7.8\times10^{32}$ years on the neutron lifetime at a 90% confidence level. △ Less

Submitted 17 June, 2025; originally announced June 2025.

Comments: 12 pages, 5 figures

arXiv:2506.14381 [pdf, ps, other]

Compressed Video Super-Resolution based on Hierarchical Encoding

Authors: Yuxuan Jiang, Siyue Teng, Qiang Zhu, Chen Feng, Chengxi Zeng, Fan Zhang, Shuyuan Zhu, Bing Zeng, David Bull

Abstract: This paper presents a general-purpose video super-resolution (VSR) method, dubbed VSR-HE, specifically designed to enhance the perceptual quality of compressed content. Targeting scenarios characterized by heavy compression, the method upscales low-resolution videos by a ratio of four, from 180p to 720p or from 270p to 1080p. VSR-HE adopts hierarchical encoding transformer blocks and has been soph… ▽ More This paper presents a general-purpose video super-resolution (VSR) method, dubbed VSR-HE, specifically designed to enhance the perceptual quality of compressed content. Targeting scenarios characterized by heavy compression, the method upscales low-resolution videos by a ratio of four, from 180p to 720p or from 270p to 1080p. VSR-HE adopts hierarchical encoding transformer blocks and has been sophisticatedly optimized to eliminate a wide range of compression artifacts commonly introduced by H.265/HEVC encoding across various quantization parameter (QP) levels. To ensure robustness and generalization, the model is trained and evaluated under diverse compression settings, allowing it to effectively restore fine-grained details and preserve visual fidelity. The proposed VSR-HE has been officially submitted to the ICME 2025 Grand Challenge on VSR for Video Conferencing (Team BVI-VSR), under both the Track 1 (General-Purpose Real-World Video Content) and Track 2 (Talking Head Videos). △ Less

Submitted 17 June, 2025; originally announced June 2025.

arXiv:2506.14245 [pdf, ps, other]

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Authors: Xumeng Wen, Zihan Liu, Shun Zheng, Zhijian Xu, Shengyu Ye, Zhirong Wu, Xiao Liang, Yang Wang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for advancing the reasoning capabilities of Large Language Models (LLMs). However, a critical paradox clouds its efficacy: RLVR-tuned models often underperform their base models on the $Pass@K$ metric for solution-finding, leading to the hypothesis that RLVR merely re-weights existing reasoning paths at the c… ▽ More Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for advancing the reasoning capabilities of Large Language Models (LLMs). However, a critical paradox clouds its efficacy: RLVR-tuned models often underperform their base models on the $Pass@K$ metric for solution-finding, leading to the hypothesis that RLVR merely re-weights existing reasoning paths at the cost of reasoning diversity. In this work, we resolve this contradiction by identifying the source of the problem: the $Pass@K$ metric itself is a flawed measure of reasoning, as it credits correct final answers that probably arise from inaccurate or incomplete chains of thought (CoTs). To address this, we introduce a more precise evaluation metric, $CoT$-$Pass@K$, which mandates that both the reasoning path and the final answer be correct. We provide a new theoretical foundation that formalizes how RLVR, unlike traditional RL, is uniquely structured to incentivize logical integrity. Our empirical results are supportive: using $CoT$-$Pass@K$, we observe that RLVR can incentivize the generalization of correct reasoning for all values of $K$. Furthermore, by analyzing the training dynamics, we find that this enhanced reasoning capability emerges early in the training process and smoothly generalizes. Our work provides a clear perspective on the role of RLVR, offers a more reliable method for its evaluation, and confirms its potential to genuinely advance machine reasoning. △ Less

Submitted 17 June, 2025; originally announced June 2025.

Comments: Preprint

arXiv:2506.14243 [pdf, ps, other]

Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition

Authors: Xiaohui Jiang, Haijiang Zhu, Chade Li, Fulin Tang, Ning An

Abstract: LiDAR-based place recognition serves as a crucial enabler for long-term autonomy in robotics and autonomous driving systems. Yet, prevailing methodologies relying on handcrafted feature extraction face dual challenges: (1) Inconsistent point cloud density, induced by ego-motion dynamics and environmental disturbances during repeated traversals, leads to descriptor instability, and (2) Representati… ▽ More LiDAR-based place recognition serves as a crucial enabler for long-term autonomy in robotics and autonomous driving systems. Yet, prevailing methodologies relying on handcrafted feature extraction face dual challenges: (1) Inconsistent point cloud density, induced by ego-motion dynamics and environmental disturbances during repeated traversals, leads to descriptor instability, and (2) Representation fragility stems from reliance on single-level geometric abstractions that lack discriminative power in structurally complex scenarios. To address these limitations, we propose a novel framework that redefines 3D place recognition through density-agnostic geometric reasoning. Specifically, we introduce an implicit 3D representation based on elastic points, which is immune to the interference of original scene point cloud density and achieves the characteristic of uniform distribution. Subsequently, we derive the occupancy grid and normal vector information of the scene from this implicit representation. Finally, with the aid of these two types of information, we obtain descriptors that fuse geometric information from both bird's-eye view (capturing macro-level spatial layouts) and 3D segment (encoding micro-scale surface geometries) perspectives. We conducted extensive experiments on numerous datasets (KITTI, KITTI-360, MulRan, NCLT) across diverse environments. The experimental results demonstrate that our method achieves state-of-the-art performance. Moreover, our approach strikes an optimal balance between accuracy, runtime, and memory optimization for historical maps, showcasing excellent Resilient and scalability. Our code will be open-sourced in the future. △ Less

Submitted 19 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

arXiv:2506.14194 [pdf, ps, other]

A Variational Information Theoretic Approach to Out-of-Distribution Detection

Authors: Sudeepta Mondal, Zhuolin Jiang, Ganesh Sundaramoorthi

Abstract: We present a theory for the construction of out-of-distribution (OOD) detection features for neural networks. We introduce random features for OOD through a novel information-theoretic loss functional consisting of two terms, the first based on the KL divergence separates resulting in-distribution (ID) and OOD feature distributions and the second term is the Information Bottleneck, which favors co… ▽ More We present a theory for the construction of out-of-distribution (OOD) detection features for neural networks. We introduce random features for OOD through a novel information-theoretic loss functional consisting of two terms, the first based on the KL divergence separates resulting in-distribution (ID) and OOD feature distributions and the second term is the Information Bottleneck, which favors compressed features that retain the OOD information. We formulate a variational procedure to optimize the loss and obtain OOD features. Based on assumptions on OOD distributions, one can recover properties of existing OOD features, i.e., shaping functions. Furthermore, we show that our theory can predict a new shaping function that out-performs existing ones on OOD benchmarks. Our theory provides a general framework for constructing a variety of new features with clear explainability. △ Less

Submitted 17 June, 2025; originally announced June 2025.

arXiv:2506.14151 [pdf, ps, other]

TraGe: A Generic Packet Representation for Traffic Classification Based on Header-Payload Differences

Authors: Chungang Lin, Yilong Jiang, Weiyao Zhang, Xuying Meng, Tianyu Zuo, Yujun Zhang

Abstract: Traffic classification has a significant impact on maintaining the Quality of Service (QoS) of the network. Since traditional methods heavily rely on feature extraction and large scale labeled data, some recent pre-trained models manage to reduce the dependency by utilizing different pre-training tasks to train generic representations for network packets. However, existing pre-trained models typic… ▽ More Traffic classification has a significant impact on maintaining the Quality of Service (QoS) of the network. Since traditional methods heavily rely on feature extraction and large scale labeled data, some recent pre-trained models manage to reduce the dependency by utilizing different pre-training tasks to train generic representations for network packets. However, existing pre-trained models typically adopt pre-training tasks developed for image or text data, which are not tailored to traffic data. As a result, the obtained traffic representations fail to fully reflect the information contained in the traffic, and may even disrupt the protocol information. To address this, we propose TraGe, a novel generic packet representation model for traffic classification. Based on the differences between the header and payload-the two fundamental components of a network packet-we perform differentiated pre-training according to the byte sequence variations (continuous in the header vs. discontinuous in the payload). A dynamic masking strategy is further introduced to prevent overfitting to fixed byte positions. Once the generic packet representation is obtained, TraGe can be finetuned for diverse traffic classification tasks using limited labeled data. Experimental results demonstrate that TraGe significantly outperforms state-of-the-art methods on two traffic classification tasks, with up to a 6.97% performance improvement. Moreover, TraGe exhibits superior robustness under parameter fluctuations and variations in sampling configurations. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: This paper has been accepted by IWQoS 2025 as a short paper

arXiv:2506.14087 [pdf, ps, other]

Multi-Scale Finetuning for Encoder-based Time Series Foundation Models

Authors: Zhongzheng Qiao, Chenghao Liu, Yiming Zhang, Ming Jin, Quang Pham, Qingsong Wen, P. N. Suganthan, Xudong Jiang, Savitha Ramasamy

Abstract: Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. However, an important yet underexplored challenge is how to effectively finetune TSFMs on specific downstream tasks. While naive finetuning can yield performance gains, we argue that it falls short of fully leveraging TSFMs' capabilities, often resulting in overfitting and suboptimal per… ▽ More Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. However, an important yet underexplored challenge is how to effectively finetune TSFMs on specific downstream tasks. While naive finetuning can yield performance gains, we argue that it falls short of fully leveraging TSFMs' capabilities, often resulting in overfitting and suboptimal performance. Given the diverse temporal patterns across sampling scales and the inherent multi-scale forecasting capabilities of TSFMs, we adopt a causal perspective to analyze finetuning process, through which we highlight the critical importance of explicitly modeling multiple scales and reveal the shortcomings of naive approaches. Focusing on \textit{encoder-based} TSFMs, we propose \textbf{M}ulti\textbf{\textsc{s}}cale \textbf{\textsc{f}}ine\textbf{\textsc{t}}uning (\textbf{MSFT}), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process. Experimental results on three different backbones (\moirai, \moment\ and \units) demonstrate that TSFMs finetuned with MSFT not only outperform naive and typical parameter efficient finetuning methods but also surpass state-of-the-art deep learning methods. △ Less

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.14030 [pdf]

The Bones and Shapes of the Phillips Curve

Authors: Hanyuan Jiang

Abstract: The COVID-19 pandemic reignited debate on the U.S. Phillips curve. Using MSA-level panel data (2001-2024), we employ a Two-Stage Least Squares (2SLS) instrumental variable strategy with a shift-share instrument to estimate core non-tradable inflation's response to a v/u-based slack measure. We distinguish structural slope stability from state-dependent non-linearities via a threshold model. Our an… ▽ More The COVID-19 pandemic reignited debate on the U.S. Phillips curve. Using MSA-level panel data (2001-2024), we employ a Two-Stage Least Squares (2SLS) instrumental variable strategy with a shift-share instrument to estimate core non-tradable inflation's response to a v/u-based slack measure. We distinguish structural slope stability from state-dependent non-linearities via a threshold model. Our analysis addresses whether the slope of the Phillips Curve changed during and after the Pandemic in the United States by evaluating if recent inflation dynamics reflect an altered structural trade-off ("bones") or the activation of non-linear "shapes" in response to extreme labor market tightness. This distinction offers critical insights into the unemployment cost of disinflation. △ Less

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.14028 [pdf, ps, other]

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Authors: Xueqing Peng, Lingfei Qian, Yan Wang, Ruoyu Xiang, Yueru He, Yang Ren, Mingyang Jiang, Jeff Zhao, Huan He, Yi Han, Yun Feng, Yuechen Jiang, Yupeng Cao, Haohang Li, Yangyang Yu, Xiaoyu Wang, Penglei Gao, Shengyuan Lin, Keyi Wang, Shanshan Yang, Yilun Zhao, Zhiwei Liu, Peng Lu, Jerry Huang, Suyuchen Wang , et al. (19 additional authors not shown)

Abstract: Recent advances in large language models (LLMs) have accelerated progress in financial NLP and applications, yet existing benchmarks remain limited to monolingual and unimodal settings, often over-relying on simple tasks and failing to reflect the complexity of real-world financial communication. We introduce MultiFinBen, the first multilingual and multimodal benchmark tailored to the global finan… ▽ More Recent advances in large language models (LLMs) have accelerated progress in financial NLP and applications, yet existing benchmarks remain limited to monolingual and unimodal settings, often over-relying on simple tasks and failing to reflect the complexity of real-world financial communication. We introduce MultiFinBen, the first multilingual and multimodal benchmark tailored to the global financial domain, evaluating LLMs across modalities (text, vision, audio) and linguistic settings (monolingual, bilingual, multilingual) on domain-specific tasks. We introduce two novel tasks, including PolyFiQA-Easy and PolyFiQA-Expert, the first multilingual financial benchmarks requiring models to perform complex reasoning over mixed-language inputs; and EnglishOCR and SpanishOCR, the first OCR-embedded financial QA tasks challenging models to extract and reason over information from visual-text financial documents. Moreover, we propose a dynamic, difficulty-aware selection mechanism and curate a compact, balanced benchmark rather than simple aggregation existing datasets. Extensive evaluation of 22 state-of-the-art models reveals that even the strongest models, despite their general multimodal and multilingual capabilities, struggle dramatically when faced with complex cross-lingual and multimodal tasks in financial domain. MultiFinBen is publicly released to foster transparent, reproducible, and inclusive progress in financial studies and applications. △ Less

Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.14020 [pdf, ps, other]

Bures-Wasserstein Flow Matching for Graph Generation

Authors: Keyue Jiang, Jiahao Cui, Xiaowen Dong, Laura Toni

Abstract: Graph generation has emerged as a critical task in fields ranging from molecule design to drug discovery. Contemporary approaches, notably diffusion and flow-based models, have achieved solid graph generative performance through constructing a probability path that interpolates between a reference distribution and the data distribution. However, these methods typically model the evolution of indiv… ▽ More Graph generation has emerged as a critical task in fields ranging from molecule design to drug discovery. Contemporary approaches, notably diffusion and flow-based models, have achieved solid graph generative performance through constructing a probability path that interpolates between a reference distribution and the data distribution. However, these methods typically model the evolution of individual nodes and edges independently and use linear interpolations to build the path assuming that the data lie in Euclidean space. We show that this is suboptimal given the intrinsic non-Euclidean structure and interconnected patterns of graphs, and it poses risks to the sampling convergence. To build a better probability path, we model the joint evolution of the nodes and edges by representing graphs as connected systems parameterized by Markov random fields (MRF). We then leverage the optimal transport displacement between MRF objects to design the probability path for graph generation. Based on this, we introduce BWFlow, a flow-matching framework for graph generation that respects the underlying geometry of graphs and provides smooth velocities in the probability path. The novel framework can be adapted to both continuous and discrete flow-matching algorithms. Experimental evaluations in plain graph generation and 2D/3D molecule generation validate the effectiveness of BWFlow in graph generation with competitive performance, stable training, and guaranteed sampling convergence. △ Less

Submitted 23 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.13914 [pdf]

The Impact of Social Isolation on Subjective Cognitive Decline in Older Adults: A Study Based on Network Analysis and Longitudinal Model

Authors: Yingchen Liu, Haixin Jiang

Abstract: Social isolation (SI) in older adults has emerged as a critical mental health concern, with established links to cognitive decline. While depression is hypothesized to mediate this relationship, the longitudinal mechanisms remain unclear. This study employed network analysis and cross-lagged modeling to examine these relationships during pandemic-related social restrictions. We collected data from… ▽ More Social isolation (SI) in older adults has emerged as a critical mental health concern, with established links to cognitive decline. While depression is hypothesized to mediate this relationship, the longitudinal mechanisms remain unclear. This study employed network analysis and cross-lagged modeling to examine these relationships during pandemic-related social restrictions. We collected data from 1,230 older adults (Mage=64.49, SD=3.84) across three timepoints (during lockdown, immediately post-lockdown, and 6 months later). Network analysis identified depressive symptoms (particularly PHQ-9 item 9) as central nodes bridging SI and subjective cognitive decline (SCD). Longitudinal analyses revealed: 1) T1 SI predicted T2 depression; 2) T2 depression predicted T3 SCD. These patterns held for both online and offline SI, with excellent model fit. Our findings demonstrate depression's mediating role in the "SI-depression-SCD" pathway, highlighting how reduced social connections may gradually impair cognitive self-assessment through depressive symptoms. The results underscore the universal mental health impact of SI, regardless of interaction modality (virtual/in-person). These insights suggest that combined interventions targeting both social connection and depression management may be particularly effective for mitigating age-related cognitive decline. △ Less

Submitted 18 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

Comments: 29 pages, 5 figures, 2 tables

arXiv:2506.13807 [pdf, ps, other]

BraTS orchestrator : Democratizing and Disseminating state-of-the-art brain tumor image analysis

Authors: Florian Kofler, Marcel Rosier, Mehdi Astaraki, Ujjwal Baid, Hendrik Möller, Josef A. Buchner, Felix Steinbauer, Eva Oswald, Ezequiel de la Rosa, Ivan Ezhov, Constantin von See, Jan Kirschke, Anton Schmick, Sarthak Pati, Akis Linardos, Carla Pitarch, Sanyukta Adap, Jeffrey Rudie, Maria Correia de Verdier, Rachit Saluja, Evan Calabrese, Dominic LaBella, Mariam Aboian, Ahmed W. Moawad, Nazanin Maleki , et al. (12 additional authors not shown)

Abstract: The Brain Tumor Segmentation (BraTS) cluster of challenges has significantly advanced brain tumor image analysis by providing large, curated datasets and addressing clinically relevant tasks. However, despite its success and popularity, algorithms and models developed through BraTS have seen limited adoption in both scientific and clinical communities. To accelerate their dissemination, we introdu… ▽ More The Brain Tumor Segmentation (BraTS) cluster of challenges has significantly advanced brain tumor image analysis by providing large, curated datasets and addressing clinically relevant tasks. However, despite its success and popularity, algorithms and models developed through BraTS have seen limited adoption in both scientific and clinical communities. To accelerate their dissemination, we introduce BraTS orchestrator, an open-source Python package that provides seamless access to state-of-the-art segmentation and synthesis algorithms for diverse brain tumors from the BraTS challenge ecosystem. Available on GitHub (https://github.com/BrainLesion/BraTS), the package features intuitive tutorials designed for users with minimal programming experience, enabling both researchers and clinicians to easily deploy winning BraTS algorithms for inference. By abstracting the complexities of modern deep learning, BraTS orchestrator democratizes access to the specialized knowledge developed within the BraTS community, making these advances readily available to broader neuro-radiology and neuro-oncology audiences. △ Less

Submitted 13 June, 2025; originally announced June 2025.

Comments: 27p, 2figs, 3tabs

arXiv:2506.13709 [pdf, ps, other]

SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms

Authors: Sirui Li, Shuai Wang, Zhijun Liu, Zhongjie Jiang, Yannan Wang, Haizhou Li

Abstract: Speech pre-processing techniques such as denoising, de-reverberation, and separation, are commonly employed as front-ends for various downstream speech processing tasks. However, these methods can sometimes be inadequate, resulting in residual noise or the introduction of new artifacts. Such deficiencies are typically not captured by metrics like SI-SNR but are noticeable to human listeners. To ad… ▽ More Speech pre-processing techniques such as denoising, de-reverberation, and separation, are commonly employed as front-ends for various downstream speech processing tasks. However, these methods can sometimes be inadequate, resulting in residual noise or the introduction of new artifacts. Such deficiencies are typically not captured by metrics like SI-SNR but are noticeable to human listeners. To address this, we introduce SpeechRefiner, a post-processing tool that utilizes Conditional Flow Matching (CFM) to improve the perceptual quality of speech. In this study, we benchmark SpeechRefiner against recent task-specific refinement methods and evaluate its performance within our internal processing pipeline, which integrates multiple front-end algorithms. Experiments show that SpeechRefiner exhibits strong generalization across diverse impairment sources, significantly enhancing speech perceptual quality. Audio demos can be found at https://speechrefiner.github.io/SpeechRefiner/. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: Accepted by Interspeech 2025

arXiv:2506.13695 [pdf, ps, other]

OneRec Technical Report

Authors: Guorui Zhou, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rui Huang, Shiyao Wang, Weifeng Ding, Wuchao Li, Xinchen Luo, Xingmei Wang, Zexuan Cheng, Zixing Zhang, Bin Zhang, Boxuan Wang, Chaoyi Ma, Chengru Song, Chenhui Wang, Di Wang, Dongxue Meng, Fan Yang, Fangyu Zhang , et al. (40 additional authors not shown)

Abstract: Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimizat… ▽ More Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimization inconsistencies, and hindering the effective application of key breakthrough technologies from the AI community in recommendation scenarios. To address these issues, we propose OneRec, which reshapes the recommendation system through an end-to-end generative approach and achieves promising results. Firstly, we have enhanced the computational FLOPs of the current recommendation model by 10 $\times$ and have identified the scaling laws for recommendations within certain boundaries. Secondly, reinforcement learning techniques, previously difficult to apply for optimizing recommendations, show significant potential in this framework. Lastly, through infrastructure optimizations, we have achieved 23.7% and 28.8% Model FLOPs Utilization (MFU) on flagship GPUs during training and inference, respectively, aligning closely with the LLM community. This architecture significantly reduces communication and storage overhead, resulting in operating expense that is only 10.6% of traditional recommendation pipelines. Deployed in Kuaishou/Kuaishou Lite APP, it handles 25% of total queries per second, enhancing overall App Stay Time by 0.54% and 1.24%, respectively. Additionally, we have observed significant increases in metrics such as 7-day Lifetime, which is a crucial indicator of recommendation experience. We also provide practical lessons and insights derived from developing, optimizing, and maintaining a production-scale recommendation system with significant real-world impact. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: Authors are listed alphabetically by their first name

arXiv:2506.13656 [pdf, ps, other]

Generalized Frobenius Manifold Structures on the Orbit Spaces of Affine Weyl Groups I

Authors: Lingrui Jiang, Si-Qi Liu, Yingchao Tian, Youjin Zhang

Abstract: We present an approach to construct a class of generalized Frobenius manifold structures on the orbit spaces of affine Weyl groups, and prove that their monodromy groups are proper subgroups of the associated affine Weyl groups. We present an approach to construct a class of generalized Frobenius manifold structures on the orbit spaces of affine Weyl groups, and prove that their monodromy groups are proper subgroups of the associated affine Weyl groups. △ Less

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.13651 [pdf, ps, other]

xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

Authors: Kaiyuan Chen, Yixin Ren, Yang Liu, Xiaobo Hu, Haotong Tian, Tianbao Xie, Fangfu Liu, Haoye Zhang, Hongzhang Liu, Yuan Gong, Chen Sun, Han Hou, Hui Yang, James Pan, Jianan Lou, Jiayi Mao, Jizheng Liu, Jinpeng Li, Kangyi Liu, Kenkun Liu, Rui Wang, Run Li, Tong Niu, Wenlong Zhang, Wenqi Yan , et al. (8 additional authors not shown)

Abstract: We introduce xbench, a dynamic, profession-aligned evaluation suite designed to bridge the gap between AI agent capabilities and real-world productivity. While existing benchmarks often focus on isolated technical skills, they may not accurately reflect the economic value agents deliver in professional settings. To address this, xbench targets commercially significant domains with evaluation tasks… ▽ More We introduce xbench, a dynamic, profession-aligned evaluation suite designed to bridge the gap between AI agent capabilities and real-world productivity. While existing benchmarks often focus on isolated technical skills, they may not accurately reflect the economic value agents deliver in professional settings. To address this, xbench targets commercially significant domains with evaluation tasks defined by industry professionals. Our framework creates metrics that strongly correlate with productivity value, enables prediction of Technology-Market Fit (TMF), and facilitates tracking of product capabilities over time. As our initial implementations, we present two benchmarks: Recruitment and Marketing. For Recruitment, we collect 50 tasks from real-world headhunting business scenarios to evaluate agents' abilities in company mapping, information retrieval, and talent sourcing. For Marketing, we assess agents' ability to match influencers with advertiser needs, evaluating their performance across 50 advertiser requirements using a curated pool of 836 candidate influencers. We present initial evaluation results for leading contemporary agents, establishing a baseline for these professional domains. Our continuously updated evalsets and evaluations are available at https://xbench.org. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: Project page: https://xbench.org

arXiv:2506.13585 [pdf, ps, other]

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Authors: MiniMax, :, Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, Haigang Zhou , et al. (103 additional authors not shown)

Abstract: We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model… ▽ More We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively. MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems including sandbox-based, real-world software engineering environments. In addition to M1's inherent efficiency advantage for RL training, we propose CISPO, a novel RL algorithm to further enhance RL efficiency. CISPO clips importance sampling weights rather than token updates, outperforming other competitive RL variants. Combining hybrid-attention and CISPO enables MiniMax-M1's full RL training on 512 H800 GPUs to complete in only three weeks, with a rental cost of just $534,700. We release two versions of MiniMax-M1 models with 40K and 80K thinking budgets respectively, where the 40K model represents an intermediate phase of the 80K training. Experiments on standard benchmarks show that our models are comparable or superior to strong open-weight models such as the original DeepSeek-R1 and Qwen3-235B, with particular strengths in complex software engineering, tool utilization, and long-context tasks. We publicly release MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-source our MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1

arXiv:2506.13444 [pdf, ps, other]

Self-Supervised Enhancement for Depth from a Lightweight ToF Sensor with Monocular Images

Authors: Laiyan Ding, Hualie Jiang, Jiwei Chen, Rui Huang

Abstract: Depth map enhancement using paired high-resolution RGB images offers a cost-effective solution for improving low-resolution depth data from lightweight ToF sensors. Nevertheless, naively adopting a depth estimation pipeline to fuse the two modalities requires groundtruth depth maps for supervision. To address this, we propose a self-supervised learning framework, SelfToF, which generates detailed… ▽ More Depth map enhancement using paired high-resolution RGB images offers a cost-effective solution for improving low-resolution depth data from lightweight ToF sensors. Nevertheless, naively adopting a depth estimation pipeline to fuse the two modalities requires groundtruth depth maps for supervision. To address this, we propose a self-supervised learning framework, SelfToF, which generates detailed and scale-aware depth maps. Starting from an image-based self-supervised depth estimation pipeline, we add low-resolution depth as inputs, design a new depth consistency loss, propose a scale-recovery module, and finally obtain a large performance boost. Furthermore, since the ToF signal sparsity varies in real-world applications, we upgrade SelfToF to SelfToF* with submanifold convolution and guided feature fusion. Consequently, SelfToF* maintain robust performance across varying sparsity levels in ToF data. Overall, our proposed method is both efficient and effective, as verified by extensive experiments on the NYU and ScanNet datasets. The code is available at \href{https://github.com/denyingmxd/selftof}{https://github.com/denyingmxd/selftof}. △ Less

Submitted 17 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

Comments: accepted by IROS 2025

arXiv:2506.13428 [pdf, ps, other]

VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation

Authors: Jiaming Chen, Yiyu Jiang, Aoshen Huang, Yang Li, Wei Pan

Abstract: Dual-arm cooperative manipulation holds great promise for tackling complex real-world tasks that demand seamless coordination and adaptive dynamics. Despite substantial progress in learning-based motion planning, most approaches struggle to generalize across diverse manipulation tasks and adapt to dynamic, unstructured environments, particularly in scenarios involving interactions between two obje… ▽ More Dual-arm cooperative manipulation holds great promise for tackling complex real-world tasks that demand seamless coordination and adaptive dynamics. Despite substantial progress in learning-based motion planning, most approaches struggle to generalize across diverse manipulation tasks and adapt to dynamic, unstructured environments, particularly in scenarios involving interactions between two objects such as assembly, tool use, and bimanual grasping. To address these challenges, we introduce a novel VLM-Assisted Siamese Flow Diffusion (VLM-SFD) framework for efficient imitation learning in dual-arm cooperative manipulation. The proposed VLM-SFD framework exhibits outstanding adaptability, significantly enhancing the ability to rapidly adapt and generalize to diverse real-world tasks from only a minimal number of human demonstrations. Specifically, we propose a Siamese Flow Diffusion Network (SFDNet) employs a dual-encoder-decoder Siamese architecture to embed two target objects into a shared latent space, while a diffusion-based conditioning process-conditioned by task instructions-generates two-stream object-centric motion flows that guide dual-arm coordination. We further design a dynamic task assignment strategy that seamlessly maps the predicted 2D motion flows into 3D space and incorporates a pre-trained vision-language model (VLM) to adaptively assign the optimal motion to each robotic arm over time. Experiments validate the effectiveness of the proposed method, demonstrating its ability to generalize to diverse manipulation tasks while maintaining high efficiency and adaptability. The code and demo videos are publicly available on our project website https://sites.google.com/view/vlm-sfd/. △ Less

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.13420 [pdf, ps, other]

Observability-Aware Active Calibration of Multi-Sensor Extrinsics for Ground Robots via Online Trajectory Optimization

Authors: Jiang Wang, Yaozhong Kang, Linya Fu, Kazuhiro Nakadai, He Kong

Abstract: Accurate calibration of sensor extrinsic parameters for ground robotic systems (i.e., relative poses) is crucial for ensuring spatial alignment and achieving high-performance perception. However, existing calibration methods typically require complex and often human-operated processes to collect data. Moreover, most frameworks neglect acoustic sensors, thereby limiting the associated systems' audi… ▽ More Accurate calibration of sensor extrinsic parameters for ground robotic systems (i.e., relative poses) is crucial for ensuring spatial alignment and achieving high-performance perception. However, existing calibration methods typically require complex and often human-operated processes to collect data. Moreover, most frameworks neglect acoustic sensors, thereby limiting the associated systems' auditory perception capabilities. To alleviate these issues, we propose an observability-aware active calibration method for ground robots with multimodal sensors, including a microphone array, a LiDAR (exteroceptive sensors), and wheel encoders (proprioceptive sensors). Unlike traditional approaches, our method enables active trajectory optimization for online data collection and calibration, contributing to the development of more intelligent robotic systems. Specifically, we leverage the Fisher information matrix (FIM) to quantify parameter observability and adopt its minimum eigenvalue as an optimization metric for trajectory generation via B-spline curves. Through planning and replanning of robot trajectory online, the method enhances the observability of multi-sensor extrinsic parameters. The effectiveness and advantages of our method have been demonstrated through numerical simulations and real-world experiments. For the benefit of the community, we have also open-sourced our code and data at https://github.com/AISLAB-sustech/Multisensor-Calibration. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: Accepted and to appear in the IEEE Sensors Journal

arXiv:2506.13406 [pdf, ps, other]

CALM: Consensus-Aware Localized Merging for Multi-Task Learning

Authors: Kunda Yan, Min Zhang, Sen Cui, Zikun Qu, Bo Jiang, Feng Liu, Changshui Zhang

Abstract: Model merging aims to integrate the strengths of multiple fine-tuned models into a unified model while preserving task-specific capabilities. Existing methods, represented by task arithmetic, are typically classified into global- and local-aware methods. However, global-aware methods inevitably cause parameter interference, while local-aware methods struggle to maintain the effectiveness of task-s… ▽ More Model merging aims to integrate the strengths of multiple fine-tuned models into a unified model while preserving task-specific capabilities. Existing methods, represented by task arithmetic, are typically classified into global- and local-aware methods. However, global-aware methods inevitably cause parameter interference, while local-aware methods struggle to maintain the effectiveness of task-specific details in the merged model. To address these limitations, we propose a Consensus-Aware Localized Merging (CALM) method which incorporates localized information aligned with global task consensus, ensuring its effectiveness post-merging. CALM consists of three key components: (1) class-balanced entropy minimization sampling, providing a more flexible and reliable way to leverage unsupervised data; (2) an efficient-aware framework, selecting a small set of tasks for sequential merging with high scalability; (3) a consensus-aware mask optimization, aligning localized binary masks with global task consensus and merging them conflict-free. Experiments demonstrate the superiority and robustness of our CALM, significantly outperforming existing methods and achieving performance close to traditional MTL. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: Accepted by ICML2025

arXiv:2506.13334 [pdf, ps, other]

Measurement of the $Ω_c^0$ and $Ξ_c^0$ baryon lifetimes using hadronic $b$-baryon decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1141 additional authors not shown)

Abstract: The lifetimes of the $Ω_c^0$ and $Ξ_c^0$ baryons are measured using a $pp$ collision dataset collected by the LHCb experiment, corresponding to an integrated luminosity of $9~\rm{fb^{-1}}$. The charm baryons are produced in the fully reconstructed decay chains $Ω_b^- \rightarrow Ω_c^0 (\rightarrow pK^-K^-π^+)~π^-$ and $Ξ_b^- \rightarrow Ξ_c^0 (\rightarrow pK^-K^-π^+)~π^-$. The measurement uses top… ▽ More The lifetimes of the $Ω_c^0$ and $Ξ_c^0$ baryons are measured using a $pp$ collision dataset collected by the LHCb experiment, corresponding to an integrated luminosity of $9~\rm{fb^{-1}}$. The charm baryons are produced in the fully reconstructed decay chains $Ω_b^- \rightarrow Ω_c^0 (\rightarrow pK^-K^-π^+)~π^-$ and $Ξ_b^- \rightarrow Ξ_c^0 (\rightarrow pK^-K^-π^+)~π^-$. The measurement uses topologically and kinematically similar $B^- \rightarrow D^0(\rightarrow K^-K^+π^-π^+)~π^-$ decays for normalisation. The measured lifetimes are $τ_{Ω_c^0} = 276.3 \pm 19.4~\rm{(stat)} \pm 1.8~\rm{(syst)} \pm 0.7~(τ_{D^0})~\rm{fs}$, $τ_{Ξ_c^0} = 149.2 \pm ~\,2.5~\rm{(stat)} \pm 0.9~\rm{(syst)} \pm 0.4~(τ_{D^0})~\rm{fs}$, where the first uncertainty is statistical, the second systematic and the third due to the uncertainty of the $D^0$ lifetime. These results are consistent with previous measurements performed by the LHCb experiment. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3875/ (LHCb public pages)

Report number: LHCb-PAPER-2025-013,CERN-EP-2025-117

Showing 151–200 of 26,487 results for author: Jiang