-
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations
Authors:
Xiang Xu,
Lingdong Kong,
Song Wang,
Chuanwei Zhou,
Qingshan Liu
Abstract:
LiDAR representation learning aims to extract rich structural and semantic information from large-scale, readily available datasets, reducing reliance on costly human annotations. However, existing LiDAR representation strategies often overlook the inherent spatiotemporal cues in LiDAR sequences, limiting their effectiveness. In this work, we propose LiMA, a novel long-term image-to-LiDAR Memory Aggregation framework that explicitly captures longer range temporal correlations to enhance LiDAR representation learning. LiMA comprises three key components: 1) a Cross-View Aggregation module that aligns and fuses overlapping regions across neighboring camera views, constructing a more unified and redundancy-free memory bank; 2) a Long-Term Feature Propagation mechanism that efficiently aligns and integrates multi-frame image features, reinforcing temporal coherence during LiDAR representation learning; and 3) a Cross-Sequence Memory Alignment strategy that enforces consistency across driving sequences, improving generalization to unseen environments. LiMA maintains high pretraining efficiency and incurs no additional computational overhead during downstream tasks. Extensive experiments on mainstream LiDAR-based perception benchmarks demonstrate that LiMA significantly improves both LiDAR semantic segmentation and 3D object detection. We hope this work inspires more effective pretraining paradigms for autonomous driving. The code has been made publicly accessible for future research.
Submitted 7 July, 2025;
originally announced July 2025.
-
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
Authors:
Chenchen Zhang,
Yuhang Li,
Can Xu,
Jiaheng Liu,
Ao Liu,
Shihui Hu,
Dengpeng Wu,
Guanhua Huang,
Kejiao Li,
Qi Yi,
Ruibin Xiong,
Haotian Zhu,
Yuanxing Zhang,
Yuhao Jiang,
Yue Zhang,
Zenan Xu,
Bohui Zhai,
Guoxiang He,
Hebin Li,
Jie Zhao,
Le Zhang,
Lingyun Tan,
Pengyu Guo,
Xianshu Pang,
Yang Ruan
, et al. (7 additional authors not shown)
Abstract:
The generative capabilities of Large Language Models (LLMs) are rapidly expanding from static code to dynamic, interactive visual artifacts. This progress is bottlenecked by a critical evaluation gap: established benchmarks focus on algorithmic correctness and are blind to the visual fidelity and interactive integrity that define modern user experiences. To bridge this gap, we introduce ArtifactsBench, a new benchmark and paradigm for the automated, multimodal evaluation of visual code generation. Our framework programmatically renders each generated artifact and captures its dynamic behavior through temporal screenshots. This visual evidence, alongside the source code, is then assessed by a Multimodal LLM (MLLM)-as-Judge, which is rigorously guided by a fine-grained, per-task checklist to ensure holistic and reproducible scoring. We construct a new benchmark of 1,825 diverse tasks and evaluate over 30 leading LLMs. Our automated evaluation achieves a striking 94.4% ranking consistency with WebDev Arena, the gold-standard for human preference in web development, and over 90% pairwise agreement with human experts. This establishes ArtifactsBench as the first framework to reliably automate the assessment of human-perceived quality at scale. Our analysis provides a high-resolution map of the current SOTA, revealing that generalist models often outperform domain-specific ones. We open-source ArtifactsBench, including the benchmark, evaluation harness, and baseline results at https://artifactsbenchmark.github.io/, to provide the community with a scalable and accurate tool to accelerate the development of user-centric generative models.
Submitted 7 July, 2025;
originally announced July 2025.
-
On-Demand Multimedia Delivery in 6G: An Optimal-Cost Steiner Tree Approach
Authors:
Zien Wang,
Xiucheng Wang,
Nan Cheng,
Wenchao Xu,
Wei Quan,
Ruijin Sun,
Conghao Zhou
Abstract:
The exponential growth of multimedia data traffic in 6G networks poses unprecedented challenges for immersive communication, where ultra-high-definition, multi-quality streaming must be delivered on demand while minimizing network operational costs. Traditional routing approaches, such as shortest-path algorithms, fail to optimize flow multiplexing across multiple destinations, while conventional Steiner tree methods cannot accommodate heterogeneous quality-of-service (QoS) requirements, a critical need for 6G's personalized services. In this paper, we address a fundamental but unsolved challenge: the minimum flow problem (MFP) with multi-destination, heterogeneous outflow demands, which is pivotal for efficient multimedia distribution such as adaptive-resolution video streaming. To overcome the limitations of existing methods, we propose a two-stage dynamic programming-enhanced On-demand Steiner Tree (OST) algorithm, the first approach that jointly optimizes flow aggregation and QoS-aware path selection for arbitrary outflow requirements. We rigorously prove the optimality of OST using mathematical induction, demonstrating that it guarantees the minimum-cost multicast flow under differentiated service constraints. Extensive experiments in 6G-like multimedia transmission scenarios show that OST reduces total network flow by over 10% compared to state-of-the-art methods while ensuring on-demand QoS fulfillment. The complete code is available at https://github.com/UNIC-Lab/OST.
Submitted 6 July, 2025;
originally announced July 2025.
-
Exploring a Gamified Personality Assessment Method through Interaction with Multi-Personality LLM Agents
Authors:
Baiqiao Zhang,
Xiangxian Li,
Chao Zhou,
Xinyu Gai,
Zhifeng Liao,
Juan Liu,
Xue Yang,
Niqi Liu,
Xiaojuan Ma,
Yong-jin Liu,
Yulong Bian
Abstract:
The execution of effective and imperceptible personality assessments is receiving increasing attention in psychology and human-computer interaction fields. This study explores an interactive approach for personality assessment, focusing on the multiplicity of personality representation. We propose a framework of gamified personality assessment through multi-personality representations (Multi-PR GPA). The framework leverages Large Language Models to empower virtual agents with diverse personalities. These agents elicit multifaceted human personality representations through engaging in interactive games. Drawing upon the multi-type textual data generated throughout the interaction, it achieves two ways of personality assessments (i.e., Direct Assessment and Que-based Assessment) and provides interpretable insights. Grounded in the classic Big Five theory, we implemented a prototype system and conducted a user study to assess the efficacy of Multi-PR GPA. The results underscore the effectiveness of our approach in personality assessment and demonstrate that it achieves superior performance when considering the multiplicity of personality representation.
Submitted 5 July, 2025;
originally announced July 2025.
-
Revealing Secondary Particle Signatures in PoCA-Based Muography
Authors:
Rongfeng Zhang,
Zibo Qin,
Cheng-en Liu,
Qite Li,
Yong Ban,
Chen Zhou,
Qiang Li
Abstract:
This work reinterprets so-called 'noise' in cosmic ray imaging, demonstrating that reconstructed Points of Closest Approach (PoCA points) at the detector locations contain valuable physical information that has been traditionally disregarded. Through comprehensive analysis of data from the detection system of four resistive plate chambers (RPCs) and Monte Carlo simulations employing energy deposition weighting for coordinate determination, we establish that these points physically originate from secondary particles produced by cosmic ray interactions with materials of both detectors and surrounding structures. The research yields two principal findings: first, the generation of secondary particles significantly affects the measurement accuracy of cosmic ray positions; second, the roof structure significantly impacts the distribution of PoCA points at detector positions, where quantitative analysis demonstrates a strong correlation between roof thickness and the number of reconstructed PoCA points -- a relationship that can be precisely measured through z-coordinate distribution analysis in specific intervals. These discoveries demonstrate that the same detection system can extract information from a new dimension, enabling acquisition of more comprehensive physical results. More importantly, it suggests the necessity to revise standard analysis approaches to fully exploit this additional information channel in cosmic ray tomography.
Submitted 5 July, 2025;
originally announced July 2025.
-
CPKD: Clinical Prior Knowledge-Constrained Diffusion Models for Surgical Phase Recognition in Endoscopic Submucosal Dissection
Authors:
Xiangning Zhang,
Jinnan Chen,
Qingwei Zhang,
Yaqi Wang,
Chengfeng Zhou,
Xiaobo Li,
Dahong Qian
Abstract:
Gastrointestinal malignancies constitute a leading cause of cancer-related mortality worldwide, with advanced-stage prognosis remaining particularly dismal. Originating as a groundbreaking technique for early gastric cancer treatment, Endoscopic Submucosal Dissection has evolved into a versatile intervention for diverse gastrointestinal lesions. While computer-assisted systems significantly enhance procedural precision and safety in ESD, their clinical adoption faces a critical bottleneck: reliable surgical phase recognition within complex endoscopic workflows. Current state-of-the-art approaches predominantly rely on multi-stage refinement architectures that iteratively optimize temporal predictions. In this paper, we present Clinical Prior Knowledge-Constrained Diffusion (CPKD), a novel generative framework that reimagines phase recognition through denoising diffusion principles while preserving the core iterative refinement philosophy. This architecture progressively reconstructs phase sequences starting from random noise and conditioned on visual-temporal features. To better capture three domain-specific characteristics, including positional priors, boundary ambiguity, and relation dependency, we design a conditional masking strategy. Furthermore, we incorporate clinical prior knowledge into the model training to improve its ability to correct phase logical errors. Comprehensive evaluations on ESD820, Cholec80, and external multi-center datasets demonstrate that our proposed CPKD achieves superior or comparable performance to state-of-the-art approaches, validating the effectiveness of diffusion-based generative paradigms for surgical phase recognition.
Submitted 4 July, 2025;
originally announced July 2025.
-
From Photospheric Footpoint Motion to Plasmoid Ejection: A Two-Stage Reconnection Process in a Small-scale Chromospheric Jet
Authors:
Zehao Tang,
Yuandeng Shen,
Chengrui Zhou,
Surui Yao,
Dongxu Liu,
Xiaobo Li
Abstract:
Using high spatiotemporal resolution, multi-wavelength observations from the New Vacuum Solar Telescope (NVST) and the Solar Dynamics Observatory (SDO), we present a detailed analysis of a small-scale chromospheric jet driven by plasmoid-mediated magnetic reconnection. Our results reveal that the entire process is governed by the dynamic evolution of photospheric magnetic footpoints, which proceeds in two distinct stages. An initial separating motion of the footpoints corresponds to a mild reconnection phase, characterized by a short current sheet and the eruption of a cool H$α$ jet. Subsequently, a converging motion of the footpoints triggers an intense reconnection phase. During this intense stage, the current sheet rapidly elongates, and the resulting decrease in its aspect ratio initiates a tearing-mode instability, forming a plasmoid. The appearance of this plasmoid mediates the onset of fast magnetic reconnection, which produces a hot EUV jet and is concurrent with significant magnetic flux cancellation. We interpret this cancellation as the submergence of newly formed, post-reconnection loops. Furthermore, we identify a distinct, high-temperature plasma blob in the jet spire, significantly hotter than the surrounding jet plasma. We attribute this feature to a secondary heating process, likely caused by reconnection between the upward-propagating plasmoid and the overlying magnetic cusp structure. These observations provide a comprehensive, observationally driven picture (from the initial photospheric triggers to the multi-stage, plasmoid-mediated reconnection) that forms chromospheric jets, highlighting the critical role of footpoint motions in solar atmospheric dynamics.
Submitted 2 July, 2025;
originally announced July 2025.
-
Frequency-switching Array Enhanced Physical-Layer Security in Terahertz Bands: A Movable Antenna Perspective
Authors:
Cong Zhou,
Changsheng You,
Shuo Shi,
Weidong Mei
Abstract:
In this paper, we propose a new frequency-switching array (FSA) enhanced physical-layer security (PLS) system in terahertz bands, where the carrier frequency can be flexibly switched and small frequency offsets can be imposed on each antenna at Alice, so as to eliminate information wiretapping by undesired eavesdroppers. First, we analytically show that by flexibly controlling the carrier frequency parameters, FSAs can effectively form uniform/non-uniform sparse arrays, hence resembling movable antennas (MAs) in the control of inter-antenna spacing and providing additional degree-of-freedom (DoF) in the beam control. Although the proposed FSA experiences additional path-gain attenuation in the received signals, it can overcome several hardware and signal processing issues incurred by MAs, such as limited positioning accuracy, considerable response latency, and demanding hardware and energy cost. To shed useful insights, we first consider a secrecy-guaranteed problem with a null-steering constraint for which maximum ratio transmission (MRT) beamformer is considered at Alice and the frequency offsets are set as uniform frequency increment. Interestingly, it is shown that the proposed FSA can flexibly realize null-steering over Eve in both the angular domain (by tuning carrier frequency) and range domain (by controlling per-antenna frequency offset), thereby achieving improved PLS performance. Then, for the general case, we propose an efficient algorithm to solve the formulated non-convex problem by using the block coordinate descent (BCD) and projected gradient ascent (PGA) techniques. Finally, numerical results demonstrate the convergence of the proposed optimization algorithm and its superiority over fixed-position arrays (FPAs) in terms of secrecy-rate performance.
Submitted 2 July, 2025;
originally announced July 2025.
-
FreNBRDF: A Frequency-Rectified Neural Material Representation
Authors:
Chenliang Zhou,
Zheyuan Hu,
Cengiz Oztireli
Abstract:
Accurate material modeling is crucial for achieving photorealistic rendering, bridging the gap between computer-generated imagery and real-world photographs. While traditional approaches rely on tabulated BRDF data, recent work has shifted towards implicit neural representations, which offer compact and flexible frameworks for a range of tasks. However, their behavior in the frequency domain remains poorly understood. To address this, we introduce FreNBRDF, a frequency-rectified neural material representation. By leveraging spherical harmonics, we integrate frequency-domain considerations into neural BRDF modeling. We propose a novel frequency-rectified loss, derived from a frequency analysis of neural materials, and incorporate it into a generalizable and adaptive reconstruction and editing pipeline. This framework enhances fidelity, adaptability, and efficiency. Extensive experiments demonstrate that FreNBRDF improves the accuracy and robustness of material appearance reconstruction and editing compared to state-of-the-art baselines, enabling more structured and interpretable downstream tasks and applications.
Submitted 1 July, 2025;
originally announced July 2025.
-
Enhancing ferroelectric stability: Wide-range of adaptive control in epitaxial HfO2/ZrO2 superlattices
Authors:
Jingxuan Li,
Shiqing Deng,
Liyang Ma,
Yangyang Si,
Chao Zhou,
Kefan Wang,
Sizhe Huang,
Jiyuan Yang,
Yunlong Tang,
Yu-Chieh Ku,
Chang-Yang Kuo,
Yijie Li,
Sujit Das,
Shi Liu,
Zuhuang Chen
Abstract:
The metastability of the polar phase in HfO2, despite its excellent compatibility with the complementary metal-oxide-semiconductor process, remains a key obstacle for its industrial applications. Traditional stabilization approaches, such as doping, often induce crystal defects and impose constraints on the thickness of ferroelectric HfO2 thin films. These limitations render the ferroelectric properties vulnerable to degradation, particularly due to phase transitions under operational conditions. Here, we demonstrate robust ferroelectricity in high-quality epitaxial (HfO2)n/(ZrO2)n superlattices, which exhibit significantly enhanced ferroelectric stability across an extended thickness range. Optimized-period superlattices maintain stable ferroelectricity at thicknesses up to 100 nm, excellent fatigue resistance exceeding 10^9 switching cycles, and a low coercive field of ~0.85 MV/cm. First-principles calculations reveal that the kinetic energy barrier of phase transition and interfacial formation energy are crucial factors in suppressing the formation of non-polar phases. This work establishes a versatile platform for exploring high-performance fluorite-structured superlattices and advances the integration of HfO2-based ferroelectrics into a broader range of applications.
Submitted 30 June, 2025;
originally announced July 2025.
-
Learning Efficient Robotic Garment Manipulation with Standardization
Authors:
Changshi Zhou,
Feng Luan,
Jiarui Hu,
Shaoqiang Meng,
Zhipeng Wang,
Yanchao Dong,
Yanmin Zhou,
Bin He
Abstract:
Garment manipulation is a significant challenge for robots due to the complex dynamics and potential self-occlusion of garments. Most existing methods of efficient garment unfolding overlook the crucial role of standardization of flattened garments, which could significantly simplify downstream tasks like folding, ironing, and packing. This paper presents APS-Net, a novel approach to garment manipulation that combines unfolding and standardization in a unified framework. APS-Net employs a dual-arm, multi-primitive policy with dynamic fling to quickly unfold crumpled garments and pick-and-place (p and p) for precise alignment. The purpose of garment standardization during unfolding involves not only maximizing surface coverage but also aligning the garment's shape and orientation to predefined requirements. To guide effective robot learning, we introduce a novel factorized reward function for standardization, which incorporates garment coverage (Cov), keypoint distance (KD), and intersection-over-union (IoU) metrics. Additionally, we introduce a spatial action mask and an Action Optimized Module to improve unfolding efficiency by selecting actions and operation points effectively. In simulation, APS-Net outperforms state-of-the-art methods for long sleeves, achieving 3.9 percent better coverage, 5.2 percent higher IoU, and a 0.14 decrease in KD (7.09 percent relative reduction). Real-world folding tasks further demonstrate that standardization simplifies the folding process. Project page: see https://hellohaia.github.io/APS/
Submitted 28 June, 2025;
originally announced June 2025.
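To make the three reward terms named in the APS-Net abstract concrete, the following is a minimal sketch of coverage, IoU, and keypoint distance computed on binary garment masks and 2D keypoints. It is not the authors' implementation: the function names, the `max_area` normalizer, the weights `w_cov`, `w_iou`, `w_kd`, and the linear combination are all assumptions made for illustration, since the abstract does not state how the terms are combined.

```python
import numpy as np


def coverage(mask, max_area):
    """Fraction of the fully flattened garment area (max_area, assumed known) that is visible."""
    return float(np.asarray(mask).astype(bool).sum()) / float(max_area)


def iou(mask, target):
    """Intersection-over-union between the current mask and the standardized target mask."""
    mask = np.asarray(mask).astype(bool)
    target = np.asarray(target).astype(bool)
    union = np.logical_or(mask, target).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask, target).sum()) / float(union)


def keypoint_distance(kp, kp_target):
    """Mean Euclidean distance between matching keypoints (rows are (x, y) pairs)."""
    diff = np.asarray(kp, dtype=float) - np.asarray(kp_target, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())


def factorized_reward(mask, target, kp, kp_target, max_area,
                      w_cov=1.0, w_iou=1.0, w_kd=1.0):
    """Hypothetical combination: reward coverage and IoU, penalize keypoint distance."""
    return (w_cov * coverage(mask, max_area)
            + w_iou * iou(mask, target)
            - w_kd * keypoint_distance(kp, kp_target))
```

Keeping the terms as separate functions lets each one be logged and weighted independently, which is one plausible reading of "factorized" here.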
-
Pay Attention to Small Weights
Authors:
Chao Zhou,
Tom Jacobs,
Advait Gadhikar,
Rebekka Burkholz
Abstract:
Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, this criterion is gradient-free -- the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; third, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.
Submitted 26 June, 2025;
originally announced June 2025.
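The selection rule described in the NANOADAM abstract (update only small-magnitude weights, chosen without looking at gradients) can be sketched as a masked Adam step. This is an illustrative reading of the abstract, not the authors' code: the function names, the `fraction` parameter, and the choice to freeze optimizer state for unselected weights are assumptions.

```python
import numpy as np


def small_weight_mask(weights, fraction=0.1):
    """Boolean mask selecting roughly the `fraction` of entries with smallest |w|.

    The mask depends only on the weights, so no gradient is needed to build it.
    """
    magnitudes = np.abs(weights)
    flat = magnitudes.ravel()
    k = max(1, int(fraction * flat.size))
    # The k-th smallest magnitude serves as the selection threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    return magnitudes <= threshold


def masked_adam_step(w, grad, m, v, t, mask,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update applied only where `mask` is True; other entries stay untouched."""
    m = np.where(mask, beta1 * m + (1.0 - beta1) * grad, m)
    v = np.where(mask, beta2 * v + (1.0 - beta2) * grad ** 2, v)
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    step = lr * m_hat / (np.sqrt(v_hat) + eps)
    return np.where(mask, w - step, w), m, v
```

Because the abstract says the subset is chosen "dynamically", a training loop would presumably recompute `small_weight_mask` at some interval; how often is not stated in the abstract.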
-
Bridging Video Quality Scoring and Justification via Large Multimodal Models
Authors:
Qizhi Xie,
Kun Yuan,
Yunpeng Qu,
Jiachao Gong,
Mingda Wu,
Ming Sun,
Chao Zhou,
Jihong Zhu
Abstract:
Classical video quality assessment (VQA) methods generate a numerical score to judge a video's perceived visual fidelity and clarity. Yet, a score fails to describe the video's complex quality dimensions, restricting its applicability. Benefiting from the linguistic output, adapting video large multimodal models (LMMs) to VQA via instruction tuning has the potential to address this issue. The core of the approach lies in the video quality-centric instruction data. Previous explorations mainly focus on the image domain, and their data generation processes heavily rely on human quality annotations and proprietary systems, limiting data scalability and effectiveness. To address these challenges, we propose the Score-based Instruction Generation (SIG) pipeline. Specifically, SIG first scores multiple quality dimensions of an unlabeled video and maps scores to text-defined levels. It then explicitly incorporates a hierarchical Chain-of-Thought (CoT) to model the correlation between specific dimensions and overall quality, mimicking the human visual system's reasoning process. The automated pipeline eliminates the reliance on expert-written quality descriptions and proprietary systems, ensuring data scalability and generation efficiency. To this end, the resulting Score2Instruct (S2I) dataset contains over 320K diverse instruction-response pairs, laying the basis for instruction tuning. Moreover, to advance video LMMs' quality scoring and justification abilities simultaneously, we devise a progressive tuning strategy to fully unleash the power of S2I. Built upon SIG, we further curate a benchmark termed S2I-Bench with 400 open-ended questions to better evaluate the quality justification capacity of video LMMs. Experimental results on the S2I-Bench and existing benchmarks indicate that our method consistently improves quality scoring and justification capabilities across multiple video LMMs.
Submitted 26 June, 2025;
originally announced June 2025.
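The first SIG step in the abstract above, mapping per-dimension scores to text-defined levels, could look roughly like the sketch below. The level names, the 0-100 scale, the cut points, and the dimension names in the usage note are hypothetical; the abstract does not specify any of them.

```python
from bisect import bisect_right

# Hypothetical five-level scale and cut points; not taken from the paper.
LEVELS = ["bad", "poor", "fair", "good", "excellent"]
CUTS = [20.0, 40.0, 60.0, 80.0]


def score_to_level(score):
    """Map a numeric quality score to the text level of the interval it falls in."""
    return LEVELS[bisect_right(CUTS, score)]


def describe(dimension_scores):
    """Map every quality dimension's score to its text level."""
    return {dim: score_to_level(s) for dim, s in dimension_scores.items()}
```

For example, `describe({"sharpness": 85.0, "noise": 35.0})` yields one text level per dimension, which a later stage could turn into instruction-response text.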
-
Dual Synchronization Effects in Light Scattering by Spherical Particle Systems
Authors:
Guanglang Xu,
Bingqiang Sun,
Ping Zhu,
Huizeng Liu,
Ye Zhou,
Chen Zhou
Abstract:
We report the discovery of a novel and fundamental dual synchronization relationship between the scattering efficiency (Q$_{\text{sca}}$) and a specifically formulated angular distribution complexity parameter ($\widetilde{C}_{\text{p}}$) in spherical particle systems. Through extensive numerical simulations using the rigorous Multiple Sphere T-Matrix (MSTM) method, we found that Q$_{\text{sca}}$ exhibits a strong positive correlation with (1-$\widetilde{C}_{\text{p}}$) when the real part of the refractive index is varied, while it synchronizes strongly and positively with $\widetilde{C}_{\text{p}}$ when the imaginary part is varied. Our analysis reveals that this duality arises from the distinct ways the real and imaginary parts of the refractive index perturb vs. dampen electromagnetic resonances within the particles, leading to different coupled responses in the total scattered energy and the angular distribution. This discovery provides unprecedented insights into how phase contrast and absorption processes distinctly modulate scattering properties and the angular distribution of scattered light, particularly in regimes dominated by resonance. It establishes that the specific formulation of $\widetilde{C}_{\text{p}}$ used here is sensitive to the overall balance of multipole contributions, making it a valuable parameter for capturing refractive index-driven changes.
Submitted 25 June, 2025;
originally announced June 2025.
-
Drift-Adaptive Slicing-Based Resource Management for Cooperative ISAC Networks
Authors:
Shisheng Hu,
Jie Gao,
Xue Qin,
Conghao Zhou,
Xinyu Huang,
Mushu Li,
Mingcheng He,
Xuemin Shen
Abstract:
In this paper, we propose a novel drift-adaptive slicing-based resource management scheme for cooperative integrated sensing and communication (ISAC) networks. Particularly, we establish two network slices to provide sensing and communication services, respectively. In the large-timescale planning for the slices, we partition the sensing region of interest (RoI) of each mobile device and reserve network resources accordingly, facilitating low-complexity distance-based sensing target assignment in small timescales. To cope with the non-stationary spatial distributions of mobile devices and sensing targets, which can result in the drift in modeling the distributions and ineffective planning decisions, we construct digital twins (DTs) of the slices. In each DT, a drift-adaptive statistical model and an emulation function are developed for the spatial distributions in the corresponding slice, which facilitates closed-form decision-making and efficient validation of a planning decision, respectively. Numerical results show that the proposed drift-adaptive slicing-based resource management scheme can increase the service satisfaction ratio by up to 18% and reduce resource consumption by up to 13.1% when compared with benchmark schemes.
Submitted 25 June, 2025;
originally announced June 2025.
-
Clustering Tails in High Dimension
Authors:
Liujun Chen,
Marco Oesting,
Chen Zhou
Abstract:
One potential solution to combat the scarcity of tail observations in extreme value analysis is to integrate information from multiple datasets sharing similar tail properties, for instance, a common extreme value index. In other words, for a multivariate dataset, we intend to group dimensions into clusters first, before applying any pooling techniques. This paper addresses the clustering problem for a high dimensional dataset, according to their extreme value indices.
We propose an iterative clustering procedure that sequentially partitions the variables into groups, ordered from the heaviest-tailed to the lightest-tailed distributions. At each step, our method identifies and extracts a group of variables that share the highest extreme value index among the remaining ones. This approach differs fundamentally from conventional clustering methods, such as two-step methods that cluster on pre-estimated extreme value indices.
We show the consistency property of the proposed algorithm and demonstrate its finite-sample performance using a simulation study and a real data application.
Submitted 24 June, 2025;
originally announced June 2025.
-
Computational Discovery of Metastable NaMnO$_2$ Polymorphs as High-Performance Cathodes with Ultralow Na$^+$ Migration Barriers
Authors:
Fukuan Wang,
Chen Zhou,
Busheng Wang,
Yong Liu
Abstract:
Using an ab initio evolutionary algorithm combined with first-principles calculations, two metastable NaMnO$_2$ polymorphs, $I4_1/amd$ and Cmcm, are identified as promising cathode materials for sodium-ion batteries. Both phases exhibit excellent thermodynamic stability, lying within 35~meV/atom of the ground-state \textit{Pmmn} phase across 0--50~GPa, and are dynamically and thermally stable under ambient conditions following high-pressure synthesis, as confirmed by phonon and ab initio molecular dynamics simulations. During desodiation, a Jahn--Teller-induced magnetic transition enhances Mn--O hybridization, reduces the bandgap, and promotes robust charge compensation and oxygen retention. Remarkably, the Cmcm phase achieves record-low Na$^+$ migration barriers (0.39~eV at high Na concentration; 0.27~eV at low concentration), representing 47\% and 36\% reductions respectively compared to conventional $C2/m$, while delivering a higher average voltage (3.19~V vs 2.88~V). The $I4_1/amd$ phase exhibits concentration-dependent diffusion with a low-energy pathway (0.38~eV) and maintains competitive voltage (2.94~V). These findings suggest that metastable NaMnO$_2$ polymorphs may offer viable alternatives to conventional cathode materials, particularly where fast ionic conduction is required.
Submitted 21 June, 2025;
originally announced June 2025.
-
Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems
Authors:
Haowei Yang,
Yu Tian,
Zhongheng Yang,
Zhao Wang,
Chengrui Zhou,
Dannier Li
Abstract:
With the rapid adoption of large language models (LLMs) in recommendation systems, the computational and communication bottlenecks caused by their massive parameter sizes and large data volumes have become increasingly prominent. This paper systematically investigates two classes of optimization methods, model parallelism and data parallelism, for distributed training of LLMs in recommendation scenarios. For model parallelism, we implement both tensor parallelism and pipeline parallelism, and introduce an adaptive load-balancing mechanism to reduce cross-device communication overhead. For data parallelism, we compare synchronous and asynchronous modes, combining gradient compression and sparsification techniques with an efficient aggregation communication framework to significantly improve bandwidth utilization. Experiments conducted on a real-world recommendation dataset in a simulated service environment demonstrate that our proposed hybrid parallelism scheme increases training throughput by over 30% and improves resource utilization by approximately 20% compared to traditional single-mode parallelism, while maintaining strong scalability and robustness. Finally, we discuss trade-offs among different parallel strategies in online deployment and outline future directions involving heterogeneous hardware integration and automated scheduling technologies.
Submitted 23 June, 2025; v1 submitted 20 June, 2025;
originally announced June 2025.
-
Association between optically identified galaxy clusters and the underlying dark matter halos
Authors:
DES Collaboration,
Shulei Cao,
Hao-Yi Wu,
Matteo Costanzi,
Arya Farahi,
Sebastian Grandis,
David H. Weinberg,
August E. Evrard,
Eduardo Rozo,
Andrés N. Salcedo,
Chun-Hao To,
Lei Yang,
Conghao Zhou
Abstract:
Clusters of galaxies trace massive dark matter halos in the Universe, but they can include multiple halos projected along lines of sight. We study the halos contributing to clusters using the Cardinal simulation, which mimics the Dark Energy Survey data. We use the red-sequence-based cluster finding algorithm redMaPPer as a case study. For each cluster, we identify the halos hosting its member galaxies, and we define the main halo as the one contributing the most to the cluster's richness ($λ$, the estimated number of member galaxies). At $z=0.3$, for clusters with $λ> 60$, the main halo typically contributes to $92\%$ of the richness, and this fraction drops to $67\%$ for $λ\approx 20$. Defining "clean" clusters as those with $\geq50\%$ of the richness contributed by the main halo, we find that $100\%$ of the $λ> 60$ clusters are clean, while $73\%$ of the $λ\approx 20$ clusters are clean. Three halos can usually account for more than $80\%$ of the richness of a cluster. The main halos associated with redMaPPer clusters have a completeness ranging from $98\%$ at virial mass $10^{14.6}~h^{-1}M_{\odot}$ to $64\%$ at $10^{14}~h^{-1}M_{\odot}$. In addition, we compare the inferred cluster centers with true halo centers, finding that $30\%$ of the clusters are miscentered with a mean offset $40\%$ of the cluster radii, in agreement with recent X-ray studies. These systematics worsen as redshift increases, but we expect that upcoming surveys extending to longer wavelengths will improve the cluster finding at high redshifts. Our results affirm the robustness of the redMaPPer algorithm and provide a framework for benchmarking other cluster-finding strategies.
Submitted 20 June, 2025;
originally announced June 2025.
-
RiOT: Efficient Prompt Refinement with Residual Optimization Tree
Authors:
Chenyi Zhou,
Zhengyan Shi,
Yuan Yao,
Lei Liang,
Huajun Chen,
Qiang Zhang
Abstract:
Recent advancements in large language models (LLMs) have highlighted their potential across a variety of tasks, but their performance still relies heavily on the design of effective prompts. Existing methods for automatic prompt optimization face two challenges: lack of diversity, which limits the exploration of valuable and innovative directions, and semantic drift, where optimizations for one task can degrade performance in others. To address these issues, we propose Residual Optimization Tree (RiOT), a novel framework for automatic prompt optimization. RiOT iteratively refines prompts through text gradients, generating multiple semantically diverse candidates at each step, and selects the best prompt using perplexity. Additionally, RiOT incorporates a text residual connection to mitigate semantic drift by selectively retaining beneficial content across optimization iterations. A tree structure efficiently manages the optimization process, ensuring scalability and flexibility. Extensive experiments across five benchmarks, covering commonsense, mathematical, logical, temporal, and semantic reasoning, demonstrate that RiOT outperforms both previous prompt optimization methods and manual prompting.
Submitted 19 June, 2025;
originally announced June 2025.
-
Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy (Extended Version)
Authors:
Rúben Adão,
Zhongjie Wu,
Changjun Zhou,
Oana Balmau,
João Paulo,
Ricardo Macedo
Abstract:
We present Keigo, a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores (LSM KVS) when they are deployed on a hierarchy of storage devices. The key observation behind Keigo is that there is no one-size-fits-all placement of data across the storage hierarchy that optimizes for all workloads. Hence, to leverage the benefits of combining different storage devices, Keigo places files across different devices based on their parallelism, I/O bandwidth, and capacity. We introduce three techniques: concurrency-aware data placement, persistent read-only caching, and context-based I/O differentiation. Keigo is portable across different LSMs, is adaptable to dynamic workloads, and does not require extensive profiling. Our system enables established production KVS such as RocksDB, LevelDB, and Speedb to benefit from heterogeneous storage setups. We evaluate Keigo using synthetic and realistic workloads, showing that it improves the throughput of production-grade LSMs by up to 4x for write-heavy and 18x for read-heavy workloads when compared to general-purpose storage systems and specialized LSM KVS.
Submitted 17 June, 2025;
originally announced June 2025.
-
RelTopo: Enhancing Relational Modeling for Driving Scene Topology Reasoning
Authors:
Yueru Luo,
Changqing Zhou,
Yiming Yang,
Erlong Li,
Chao Zheng,
Shuqi Mei,
Shuguang Cui,
Zhen Li
Abstract:
Accurate road topology reasoning is critical for autonomous driving, enabling effective navigation and adherence to traffic regulations. Central to this task are lane perception and topology reasoning. However, existing methods typically focus on either lane detection or Lane-to-Lane (L2L) topology reasoning, often \textit{neglecting} Lane-to-Traffic-element (L2T) relationships or \textit{failing} to optimize these tasks jointly. Furthermore, most approaches either overlook relational modeling or apply it in a limited scope, despite the inherent spatial relationships among road elements. We argue that relational modeling is beneficial for both perception and reasoning, as humans naturally leverage contextual relationships for road element recognition and their connectivity inference. To this end, we introduce relational modeling into both perception and reasoning, \textit{jointly} enhancing structural understanding. Specifically, we propose: 1) a relation-aware lane detector, where our geometry-biased self-attention and \curve\ cross-attention refine lane representations by capturing relational dependencies; 2) relation-enhanced topology heads, including a geometry-enhanced L2L head and a cross-view L2T head, boosting reasoning with relational cues; and 3) a contrastive learning strategy with InfoNCE loss to regularize relationship embeddings. Extensive experiments on OpenLane-V2 demonstrate that our approach significantly improves both detection and topology reasoning metrics, achieving +3.1 in DET$_l$, +5.3 in TOP$_{ll}$, +4.9 in TOP$_{lt}$, and an overall +4.4 in OLS, setting a new state-of-the-art. Code will be released.
Submitted 16 June, 2025;
originally announced June 2025.
-
Feeling Machines: Ethics, Culture, and the Rise of Emotional AI
Authors:
Vivek Chavan,
Arsen Cenaj,
Shuyuan Shen,
Ariane Bar,
Srishti Binwani,
Tommaso Del Becaro,
Marius Funk,
Lynn Greschner,
Roberto Hung,
Stina Klein,
Romina Kleiner,
Stefanie Krause,
Sylwia Olbrych,
Vishvapalsinhji Parmar,
Jaleh Sarafraz,
Daria Soroko,
Daksitha Withanage Don,
Chang Zhou,
Hoang Thuy Duong Vu,
Parastoo Semnani,
Daniel Weinhardt,
Elisabeth Andre,
Jörg Krüger,
Xavier Fresquet
Abstract:
This paper explores the growing presence of emotionally responsive artificial intelligence through a critical and interdisciplinary lens. Bringing together the voices of early-career researchers from multiple fields, it explores how AI systems that simulate or interpret human emotions are reshaping our interactions in areas such as education, healthcare, mental health, caregiving, and digital life. The analysis is structured around four central themes: the ethical implications of emotional AI, the cultural dynamics of human-machine interaction, the risks and opportunities for vulnerable populations, and the emerging regulatory, design, and technical considerations. The authors highlight the potential of affective AI to support mental well-being, enhance learning, and reduce loneliness, as well as the risks of emotional manipulation, over-reliance, misrepresentation, and cultural bias. Key challenges include simulating empathy without genuine understanding, encoding dominant sociocultural norms into AI systems, and insufficient safeguards for individuals in sensitive or high-risk contexts. Special attention is given to children, elderly users, and individuals with mental health challenges, who may interact with AI in emotionally significant ways. However, there remains a lack of cognitive or legal protections which are necessary to navigate such engagements safely. The report concludes with ten recommendations, including the need for transparency, certification frameworks, region-specific fine-tuning, human oversight, and longitudinal research. A curated supplementary section provides practical tools, models, and datasets to support further work in this domain.
Submitted 14 June, 2025;
originally announced June 2025.
-
Holistic approach and Advanced Color Singlet Identification for physics measurements at high energy frontier
Authors:
Yongfeng Zhu,
Hao Liang,
Yuexin Wang,
Yuzhi Che,
Hengyu Wang,
Chen Zhou,
Huilin Qu,
Manqi Ruan
Abstract:
To enhance the discovery power of high-energy colliders, we propose a holistic approach and Advanced Color Singlet Identification (ACSI), both of which utilize inclusive reconstructed information as input. The holistic approach is designed to simultaneously classify physics events, while ACSI focuses on associating final-state particles with their parent massive bosons. Implemented using state-of-the-art artificial intelligence architectures and applied to benchmark analyses with simulated data from a future Higgs factory, these new concepts significantly improve the accuracy of H->bb/cc/ss/gg measurements by up to a factor of two to six.
Submitted 24 June, 2025; v1 submitted 13 June, 2025;
originally announced June 2025.
-
Deep Learning Model Acceleration and Optimization Strategies for Real-Time Recommendation Systems
Authors:
Junli Shao,
Jing Dong,
Dingzhou Wang,
Kowei Shih,
Dannier Li,
Chengrui Zhou
Abstract:
With the rapid growth of Internet services, recommendation systems play a central role in delivering personalized content. Faced with massive user requests and complex model architectures, the key challenge for real-time recommendation systems is how to reduce inference latency and increase system throughput without sacrificing recommendation quality. This paper addresses the high computational cost and resource bottlenecks of deep learning models in real-time settings by proposing a combined set of modeling- and system-level acceleration and optimization strategies. At the model level, we dramatically reduce parameter counts and compute requirements through lightweight network design, structured pruning, and weight quantization. At the system level, we integrate multiple heterogeneous compute platforms and high-performance inference libraries, and we design elastic inference scheduling and load-balancing mechanisms based on real-time load characteristics. Experiments show that, while maintaining the original recommendation accuracy, our methods cut latency to less than 30% of the baseline and more than double system throughput, offering a practical solution for deploying large-scale online recommendation services.
Submitted 17 June, 2025; v1 submitted 12 June, 2025;
originally announced June 2025.
-
Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs
Authors:
Yilin Xiao,
Chuang Zhou,
Qinggang Zhang,
Bo Li,
Qing Li,
Xiao Huang
Abstract:
Large language models (LLMs) often struggle with knowledge-intensive tasks due to a lack of background knowledge and a tendency to hallucinate. To address these limitations, integrating knowledge graphs (KGs) with LLMs has been intensively studied. Existing KG-enhanced LLMs focus on supplementary factual knowledge, but still struggle with solving complex questions. We argue that refining the relationships among facts and organizing them into a logically consistent reasoning path is as important as the factual knowledge itself. Despite their potential, extracting reliable reasoning paths from KGs poses two challenges: the complexity of graph structures, and the existence of multiple generated paths, which makes it difficult to distinguish useful paths from redundant ones. To tackle these challenges, we propose the RRP framework to mine the knowledge graph, which combines the semantic strengths of LLMs with structural information obtained through relation embedding and bidirectional distribution learning. Additionally, we introduce a rethinking module that evaluates and refines reasoning paths according to their significance. Experimental results on two public datasets show that RRP achieves state-of-the-art performance compared to existing baseline methods. Moreover, RRP can be easily integrated into various LLMs to enhance their reasoning abilities in a plug-and-play manner. By generating high-quality reasoning paths tailored to specific questions, RRP distills effective guidance for LLM reasoning.
Submitted 12 June, 2025;
originally announced June 2025.
-
Ming-Omni: A Unified Multimodal Model for Perception and Generation
Authors:
Inclusion AI,
Biao Gong,
Cheng Zou,
Chuanyang Zheng,
Chunluan Zhou,
Canxiang Yan,
Chunxiang Jin,
Chunjie Shen,
Dandan Zheng,
Fudong Wang,
Furong Xu,
GuangMing Yao,
Jun Zhou,
Jingdong Chen,
Jianxin Sun,
Jiajia Liu,
Jianjiang Zhu,
Jun Peng,
Kaixiang Ji,
Kaiyou Song,
Kaimeng Ren,
Libin Wang,
Lixiang Ru,
Lele Xie,
Longhua Tan
, et al. (33 additional authors not shown)
Abstract:
We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single model to efficiently process and fuse multimodal inputs within a unified framework, thereby facilitating diverse tasks without requiring separate models, task-specific fine-tuning, or structural redesign. Importantly, Ming-Omni extends beyond conventional multimodal models by supporting audio and image generation. This is achieved through the integration of an advanced audio decoder for natural-sounding speech and Ming-Lite-Uni for high-quality image generation, which also allow the model to engage in context-aware chatting, perform text-to-speech conversion, and conduct versatile image editing. Our experimental results show that Ming-Omni offers a powerful solution for unified perception and generation across all modalities. Notably, Ming-Omni is the first open-source model we are aware of to match GPT-4o in modality support, and we release all code and model weights to encourage further research and development in the community.
Submitted 10 June, 2025;
originally announced June 2025.
-
Intra-Trajectory Consistency for Reward Modeling
Authors:
Chaoyang Zhou,
Shunyu Liu,
Zengmao Wang,
Di Wang,
Rong-Cheng Tu,
Bo Du,
Dacheng Tao
Abstract:
Reward models are critical for improving large language models (LLMs), particularly in reinforcement learning from human feedback (RLHF) or inference-time verification. Current reward modeling typically relies on scores of overall responses to learn the outcome rewards for the responses. However, since the response-level scores are coarse-grained supervision signals, the reward model struggles to identify the specific components within a response trajectory that truly correlate with the scores, leading to poor generalization on unseen responses. In this paper, we propose to leverage generation probabilities to establish reward consistency between processes in the response trajectory, which allows the response-level supervisory signal to propagate across processes, thereby providing additional fine-grained signals for reward learning. Building on analysis under the Bayesian framework, we develop an intra-trajectory consistency regularization to enforce that adjacent processes with higher next-token generation probability maintain more consistent rewards. We apply the proposed regularization to the advanced outcome reward model, improving its performance on RewardBench. Besides, we show that the reward model trained with the proposed regularization induces better DPO-aligned policies and achieves better best-of-N (BON) inference-time verification results. Our code is provided in https://github.com/chaoyang101/ICRM.
Submitted 16 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Physics-Informed Neural Networks for Irregular Domain Mapping and Partial Differential Equations solving
Authors:
Cuizhi Zhou,
Kaien Zhu
Abstract:
The solution of partial differential equations (PDEs) on irregular domains has long been a subject of significant research interest. In this work, we present an approach utilizing physics-informed neural networks (PINNs) to achieve irregular-to-regular domain mapping. We can thus use the finite difference method and physics-informed convolutional neural networks to solve PDEs on rectangular grids instead of on the original irregular domain.
Structured grids on irregular domains are obtained by inverse mapping. We demonstrate the versatile capability of PINNs to produce customized structured grids tailored to diverse computational requirements, thereby significantly facilitating the solution of PDEs.
Submitted 10 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Measurement of the $η$ transition form factor through $η' \rightarrow π^+π^-η$ decay
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
Based on a sample of $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events collected at BESIII, the transition form factor of the $η$ meson is extracted by analyzing $J/ψ\toγη',~η'\toπ^+π^-η,~η\toγl^+l^-$ ($l$=$e$, $μ$) events. The measured slope of the transition form factor is $Λ^{-2}=1.645\pm0.093_{\rm stat.}\pm {0.024_{\rm sys.}}$ (GeV/$c^2$)$^{-2}$ for the di-electron channel and $Λ^{-2}=1.645\pm0.343_{\rm stat.}\pm0.017_{\rm sys.}$ (GeV/$c^2$)$^{-2}$ for the di-muon channel. The branching fractions for $η\rightarrowγe^+e^-$ and $η\rightarrowγμ^+μ^-$ are measured to be $\mathcal{B}(η\toγe^+e^-)=(6.79\pm0.04_{\rm stat.}\pm0.36_{\rm sys.})\times 10^{-3}$ and $\mathcal{B}(η\toγμ^+μ^-)=(2.97\pm0.11_{\rm stat.}\pm0.07_{\rm sys.})\times 10^{-4}$. By combining with the results based on the $J/ψ\toγη,~η\toγe^+e^-$ events from the previous BESIII measurement, we determine $Λ^{-2}=1.707\pm0.076_{\rm stat.}\pm0.029_{\rm sys.}$ (GeV/$c^2$)$^{-2}$ and $\mathcal{B}(η\toγe^+e^-)=(6.93\pm0.28_{\rm tot.})\times 10^{-3}$. In addition, we search for the dark photon ($A'$) using the combined events. No significant signal is observed, and the upper limits on $\mathcal{B}(η\toγA',~A'\to e^+e^-)$ are set at 90\% confidence level for different $A'$ mass hypotheses.
Submitted 10 June, 2025;
originally announced June 2025.
-
Fractional-order Jacobian Matrix Differentiation and Its Application in Artificial Neural Networks
Authors:
Xiaojun Zhou,
Chunna Zhao,
Yaqun Huang,
Chengli Zhou,
Junjie Ye,
Kemeng Xiang
Abstract:
Fractional-order differentiation has many characteristics different from integer-order differentiation. These characteristics can be applied to the optimization algorithms of artificial neural networks to obtain better results. However, due to insufficient theoretical research, at present, there is no fractional-order matrix differentiation method that is perfectly compatible with automatic differentiation (Autograd) technology. Therefore, we propose a fractional-order matrix differentiation calculation method. This method is introduced by the definition of the integer-order Jacobian matrix. We denote it as fractional-order Jacobian matrix differentiation (${\bf{J}^α}$). Through ${\bf{J}^α}$, we can carry out the matrix-based fractional-order chain rule. Based on the Linear module and the fractional-order differentiation, we design the fractional-order Autograd technology to enable the use of fractional-order differentiation in hidden layers, thereby enhancing the practicality of fractional-order differentiation in deep learning. In the experiment, according to the PyTorch framework, we design fractional-order Linear (FLinear) and replace nn.Linear in the multilayer perceptron with FLinear. Through the qualitative analysis of the training set and validation set $Loss$, the quantitative analysis of the test set indicators, and the analysis of time consumption and GPU memory usage during model training, we verify the superior performance of ${\bf{J}^α}$ and prove that it is an excellent fractional-order gradient descent method in the field of deep learning.
Submitted 9 June, 2025;
originally announced June 2025.
-
Research on E-Commerce Long-Tail Product Recommendation Mechanism Based on Large-Scale Language Models
Authors:
Qingyi Lu,
Haotian Lyu,
Jiayun Zheng,
Yang Wang,
Li Zhang,
Chengrui Zhou
Abstract:
As e-commerce platforms expand their product catalogs, accurately recommending long-tail items becomes increasingly important for enhancing both user experience and platform revenue. A key challenge is the long-tail problem, where extreme data sparsity and cold-start issues limit the performance of traditional recommendation methods. To address this, we propose a novel long-tail product recommendation mechanism that integrates product text descriptions and user behavior sequences using a large-scale language model (LLM). First, we introduce a semantic visor, which leverages a pre-trained LLM to convert multimodal textual content such as product titles, descriptions, and user reviews into meaningful embeddings. These embeddings help represent item-level semantics effectively. We then employ an attention-based user intent encoder that captures users' latent interests, especially toward long-tail items, by modeling collaborative behavior patterns. These components feed into a hybrid ranking model that fuses semantic similarity scores, collaborative filtering outputs, and LLM-generated recommendation candidates. Extensive experiments on a real-world e-commerce dataset show that our method outperforms baseline models in recall (+12%), hit rate (+9%), and user coverage (+15%). These improvements lead to better exposure and purchase rates for long-tail products. Our work highlights the potential of LLMs in interpreting product content and user intent, offering a promising direction for future e-commerce recommendation systems.
Submitted 31 May, 2025;
originally announced June 2025.
-
Research on Personalized Financial Product Recommendation by Integrating Large Language Models and Graph Neural Networks
Authors:
Yushang Zhao,
Yike Peng,
Dannier Li,
Yuxin Yang,
Chengrui Zhou,
Jing Dong
Abstract:
With the rapid growth of fintech, personalized financial product recommendations have become increasingly important. Traditional methods like collaborative filtering or content-based models often fail to capture users' latent preferences and complex relationships. We propose a hybrid framework integrating large language models (LLMs) and graph neural networks (GNNs). A pre-trained LLM encodes text data (e.g., user reviews) into rich feature vectors, while a heterogeneous user-product graph models interactions and social ties. Through a tailored message-passing mechanism, text and graph information are fused within the GNN to jointly optimize embeddings. Experiments on public and real-world financial datasets show our model outperforms standalone LLM or GNN in accuracy, recall, and NDCG, with strong interpretability. This work offers new insights for personalized financial recommendations and cross-modal fusion in broader recommendation tasks.
Submitted 6 June, 2025;
originally announced June 2025.
-
Measurement of the branching fractions of the Cabibbo-favored decays $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ and $Λ_{c}^{+}\toΞ^{0}K_{S}^{0}π^{+}$ and search for $Λ_{c}^{+}\toΣ^{0} K_{S}^{0}K^{+}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (660 additional authors not shown)
Abstract:
Based on $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of about 4.5 fb$^{-1}$ collected at center-of-mass energies between 4599.53 MeV and 4698.82 MeV with the BESIII detector, the absolute branching fraction of the Cabibbo-favored decay $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ is measured to be $(3.12\pm0.46\pm0.15)\times10^{-3}$. Combined with a previous measurement from the BESIII Collaboration, the branching fraction of the decay $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ is calculated to be $(3.07\pm0.26\pm0.13)\times10^{-3}$. The decay $Λ_{c}^{+}\toΞ^{0}K_{S}^{0}π^{+}$ is observed for the first time with a statistical significance of $6.6σ$, and its branching fraction is determined to be $(3.70\pm0.60\pm0.21)\times10^{-3}$. In addition, a search for the decay $Λ_{c}^{+}\toΣ^{0} K_{S}^{0}K^{+}$ is performed and its branching fraction is determined to be $(0.80^{+0.28}_{-0.24}\pm0.16)\times10^{-3}$, corresponding to an upper limit of $1.28\times10^{-3}$ at $90\%$ confidence level. These measurements provide new information that can be used to distinguish between theoretical models.
Submitted 3 June, 2025;
originally announced June 2025.
-
Improved Measurements of $D^+ \to ηe^+ν_e$ and $D^+ \to ημ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (682 additional authors not shown)
Abstract:
Using 20.3 fb$^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy of 3.773 GeV with the BESIII detector, we measure the branching fractions of $D^+\to ηe^+ν_e$ and $D^+\to ημ^+ν_μ$ to be $(9.75\pm0.29\pm0.28)\times10^{-4}$ and $(9.08\pm0.35\pm0.23)\times10^{-4}$, where the first and second uncertainties are statistical and systematic, respectively. From a simultaneous fit to their partial decay rates, we determine the product of the hadronic form factor $f^η_+(0)$ and the modulus of the $c\to d$ Cabibbo-Kobayashi-Maskawa matrix element $|V_{cd}|$ to be $f^η_+(0)|V_{cd}|=0.078\pm0.002\pm0.001$. Taking the $|V_{cd}|$ value from the Standard Model global fit as input, we obtain $f^η_+(0)=0.345\pm0.008\pm0.003$. The ratio between the measured branching fractions of $D^+\to ημ^+ν_μ$ and $D^+\to ηe^+ν_e$ is determined to be $0.93\pm0.05_{\rm stat.}\pm0.02_{\rm syst.}$, indicating no violation of lepton flavor universality.
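As a rough cross-check of the ratio quoted in this abstract, the sketch below divides the two central values and propagates only the statistical uncertainties, under the assumption that they are uncorrelated. The helper name `ratio_with_stat_error` is illustrative and not taken from the paper; the systematic term is not reproduced because it depends on correlations between the two channels that the abstract does not state.

```python
import math

def ratio_with_stat_error(num, num_err, den, den_err):
    """Return (ratio, stat_error), treating the two errors as uncorrelated."""
    ratio = num / den
    # Relative errors add in quadrature for a quotient of independent values.
    rel = math.hypot(num_err / num, den_err / den)
    return ratio, ratio * rel

# Branching fractions in units of 1e-4, statistical uncertainties only.
muon_mode = (9.08, 0.35)
electron_mode = (9.75, 0.29)

ratio, stat = ratio_with_stat_error(*muon_mode, *electron_mode)
print(f"R = {ratio:.2f} +/- {stat:.2f} (stat)")
```

Under these assumptions the central value and statistical uncertainty round to the figures given in the abstract.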
Submitted 3 June, 2025;
originally announced June 2025.
-
GraphRAG-Bench: Challenging Domain-Specific Reasoning for Evaluating Graph Retrieval-Augmented Generation
Authors:
Yilin Xiao,
Junnan Dong,
Chuang Zhou,
Su Dong,
Qian-wen Zhang,
Di Yin,
Xing Sun,
Xiao Huang
Abstract:
Graph Retrieval Augmented Generation (GraphRAG) has garnered increasing recognition for its potential to enhance large language models (LLMs) by structurally organizing domain-specific corpora and facilitating complex reasoning. However, current evaluations of GraphRAG models predominantly rely on traditional question-answering datasets. Their limited scope in questions and evaluation metrics fails to comprehensively assess the reasoning capacity improvements enabled by GraphRAG models. To address this gap, we introduce GraphRAG-Bench, a large-scale, domain-specific benchmark designed to rigorously evaluate GraphRAG models. Our benchmark offers three key strengths: \((i)\) Challenging question design. Featuring college-level, domain-specific questions that demand multi-hop reasoning, the benchmark ensures that simple content retrieval is insufficient for problem-solving. For example, some questions require mathematical reasoning or programming. \((ii)\) Diverse task coverage. The dataset includes a broad spectrum of reasoning tasks: multiple-choice, true/false, multi-select, open-ended, and fill-in-the-blank. It spans 16 disciplines across 20 core textbooks. \((iii)\) Holistic evaluation framework. GraphRAG-Bench provides comprehensive assessment across the entire GraphRAG pipeline, including graph construction, knowledge retrieval, and answer generation. Beyond final-answer correctness, it evaluates the logical coherence of the reasoning process. By applying nine contemporary GraphRAG methods to GraphRAG-Bench, we demonstrate its utility in quantifying how graph-based structuring improves model reasoning capabilities. Our analysis reveals critical insights about graph architectures, retrieval efficacy, and reasoning capabilities, offering actionable guidance for the research community.
Submitted 19 June, 2025; v1 submitted 2 June, 2025;
originally announced June 2025.
-
Hierarchical Intention-Aware Expressive Motion Generation for Humanoid Robots
Authors:
Lingfan Bao,
Yan Pan,
Tianhu Peng,
Dimitrios Kanoulas,
Chengxu Zhou
Abstract:
Effective human-robot interaction requires robots to identify human intentions and generate expressive, socially appropriate motions in real-time. Existing approaches often rely on fixed motion libraries or computationally expensive generative models. We propose a hierarchical framework that combines intention-aware reasoning via in-context learning (ICL) with real-time motion generation using diffusion models. Our system introduces structured prompting with confidence scoring, fallback behaviors, and social context awareness to enable intention refinement and adaptive response. Leveraging large-scale motion datasets and efficient latent-space denoising, the framework generates diverse, physically plausible gestures suitable for dynamic humanoid interactions. Experimental validation on a physical platform demonstrates the robustness and social alignment of our method in realistic scenarios.
Submitted 27 June, 2025; v1 submitted 2 June, 2025;
originally announced June 2025.
-
GenDMR: A dynamic multimodal role-swapping network for identifying risk gene phenotypes
Authors:
Lina Qin,
Cheng Zhu,
Chuqi Zhou,
Yukun Huang,
Jiayi Zhu,
Ping Liang,
Jinju Wang,
Yixing Huang,
Cheng Luo,
Dezhong Yao,
Ying Tan
Abstract:
Recent studies have shown that integrating multimodal data fusion techniques for imaging and genetic features is beneficial for the etiological analysis and predictive diagnosis of Alzheimer's disease (AD). However, there are several critical flaws in current deep learning methods. Firstly, there has been insufficient discussion and exploration regarding the selection and encoding of genetic information. Secondly, because AD imaging features have significantly greater classification value than genetic features, many multimodal fusion studies emphasize the strengths of imaging features and actively mitigate the influence of weaker features, thereby diminishing the learning of the unique value of genetic features. To address these issues, this study proposes the dynamic multimodal role-swapping network (GenDMR). In GenDMR, we develop a novel approach to encode the spatial organization of single nucleotide polymorphisms (SNPs), enhancing the representation of their genomic context. Additionally, to adaptively quantify the disease risk of SNPs and brain regions, we propose a multi-instance attention module to enhance model interpretability. Furthermore, we introduce a dominant modality selection module and a contrastive self-distillation module, combining them to achieve a dynamic teacher-student role exchange mechanism based on dominant and auxiliary modalities for bidirectional co-updating of different modal data. Finally, GenDMR achieves state-of-the-art performance on the ADNI public dataset and visualizes attention to different SNPs, focusing on confirming 12 potential high-risk genes related to AD, including the most classic APOE and recently highlighted significant risk genes. This demonstrates GenDMR's interpretable analytical capability in exploring AD genetic features, providing new insights and perspectives for the development of multimodal data fusion techniques.
Submitted 2 June, 2025;
originally announced June 2025.
-
Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
Authors:
Songtao Jiang,
Chenyi Zhou,
Yan Zhang,
Yeying Jin,
Zuozhu Liu
Abstract:
Multimodal large language models (MLLMs) still struggle with complex reasoning tasks in Visual Question Answering (VQA). While current methods have advanced by incorporating visual prompts, our study uncovers critical limitations: these approaches indiscriminately annotate all detected objects for every visual question, generating excessive visual markers that degrade task performance. This issue stems primarily from a lack of focus on key visual elements, raising two important questions: Are all objects equally important, and do all questions require visual prompts? Motivated by Dual Process Theory, which distinguishes between instinctive and deliberate cognitive modes in human reasoning, we propose FOCUS, a plug-and-play approach that dynamically adapts to the complexity of questions, combining fast intuitive judgments with deliberate analytical reasoning to enhance the vision-language reasoning capability of the MLLM. For straightforward questions, FOCUS supports efficient zero-shot reasoning. For more complex tasks, it employs the conceptualizing before observation strategy to highlight critical elements. Extensive experiments on four benchmarks, ScienceQA, TextQA, VizWiz, and MME, demonstrate that FOCUS consistently improves the performance of both open-source and black-box MLLMs, achieving significant gains across all datasets. Ablation studies further validate the importance of combining diverse cognitive strategies with refined visual information for superior performance. Code will be released.
Submitted 31 May, 2025;
originally announced June 2025.
-
Geospatial Foundation Models to Enable Progress on Sustainable Development Goals
Authors:
Pedram Ghamisi,
Weikang Yu,
Xiaokang Zhang,
Aldino Rizaldy,
Jian Wang,
Chufeng Zhou,
Richard Gloaguen,
Gustau Camps-Valls
Abstract:
Foundation Models (FMs) are large-scale, pre-trained AI systems that have revolutionized natural language processing and computer vision, and are now advancing geospatial analysis and Earth Observation (EO). They promise improved generalization across tasks, scalability, and efficient adaptation with minimal labeled data. However, despite the rapid proliferation of geospatial FMs, their real-world utility and alignment with global sustainability goals remain underexplored. We introduce SustainFM, a comprehensive benchmarking framework grounded in the 17 Sustainable Development Goals with extremely diverse tasks ranging from asset wealth prediction to environmental hazard detection. This study provides a rigorous, interdisciplinary assessment of geospatial FMs and offers critical insights into their role in attaining sustainability goals. Our findings show: (1) While not universally superior, FMs often outperform traditional approaches across diverse tasks and datasets. (2) Evaluating FMs should go beyond accuracy to include transferability, generalization, and energy efficiency as key criteria for their responsible use. (3) FMs enable scalable, SDG-grounded solutions, offering broad utility for tackling complex sustainability challenges. Critically, we advocate for a paradigm shift from model-centric development to impact-driven deployment, and emphasize metrics such as energy efficiency, robustness to domain shifts, and ethical considerations.
Submitted 30 May, 2025;
originally announced May 2025.
-
Quasi-Homogeneous Integrable Systems: Free Parameters, Kovalevskaya Exponents, and the Painlevé Property
Authors:
Changyu Zhou,
Hayato Chiba
Abstract:
This paper investigates quasi-homogeneous integrable systems by analyzing their Laurent series solutions near movable singularities, motivated by patterns observed in Kovalevskaya exponents of four-dimensional Painlevé-type equations. We introduce a parameter space encoding the free coefficients in these expansions and study its deformation under a commuting quasi-homogeneous vector field.
Within this framework, we derive lower indicial loci from the principal one and establish an arithmetic resonance condition on Kovalevskaya exponents that governs the emergence of fractional powers and the breakdown of the Painlevé property. Moreover, we construct a Frobenius manifold structure on the parameter space via the initial value map, which becomes conformal when all weights coincide.
In the Hamiltonian context, we demonstrate that the induced flow on the parameter space preserves a symplectic form and yields a natural pairing of Kovalevskaya exponents. These findings unify analytic and geometric aspects of quasi-homogeneous integrable systems and offer new insights into their deformation theory and singularity structures. Our results provide a comprehensive framework applicable to the classification and analysis of Painlevé-type equations and related integrable models.
Submitted 14 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability
Authors:
Chiwei Zhu,
Benfeng Xu,
An Yang,
Junyang Lin,
Quan Wang,
Chang Zhou,
Zhendong Mao
Abstract:
Training language models with rationale augmentation has been shown to be beneficial in many existing works. In this paper, we find that this prevailing view does not hold consistently. We conduct comprehensive investigations into the impact of rationales on model performance as well as on model reliability, a novel perspective. The results lead to several key findings that add new insights to the existing understanding: 1) Rationales can, at times, deteriorate model performance; 2) Rationales can, at times, improve model reliability, even outperforming their untrained counterparts; 3) A linear correspondence exists between the performance and reliability improvements, while both are driven by the intrinsic difficulty of the task. These findings provide informative guidance on the broad utilization of rationales and raise critical implications for the procedure of explicitly aligning language models with implicit human thoughts. Code can be found at https://github.com/Ignoramus0817/rationales.
Submitted 29 May, 2025;
originally announced May 2025.
-
Weak but influential: Nonlinear contributions of structural connectivity to human cognitive abilities and brain functions
Authors:
Rong Wang,
Zhao Chang,
Xuechun Liu,
Daniel Kristanto,
Étienne Gérard Guy Gartner,
Xinyang Liu,
Mianxin Liu,
Ying Wu,
Ming Lui,
Changsong Zhou
Abstract:
Diverse human cognitive abilities are rooted in brain structural connectivity, which has weights spanning several orders of magnitude. However, due to false-positive challenges in tractography, weak connectivity has often been treated as noise and ignored, despite its prevalence across mammalian brains. Here we show that weak connectivity significantly predicts human cognitive abilities and supports brain functions through amplification of its small weight in a nonlinear manner. Using the Human Connectome Project dataset (n=999) and multiple tractography algorithms, we constructed the whole-brain structural connectivity with heterogeneous weights of streamline numbers. We found that weak connectivity involves high individual variability and significantly predicts general cognitive ability and memory in individuals, and it is also critical for whole-brain dynamic simulation and structure-function coupling. Importantly, fusing two post-tractography filtering methods of streamlines potentially results in more reliable connectivity that preserves weak links and outperforms conventional thresholding in predicting cognitive abilities and functional connectivity. At the network level, weak connectivity expands the operational capacity of brain networks to enhance both global integration and fine-grained segregation, thereby supporting a functional balance essential for cognitive abilities. Finally, we identified a specific type of weak connectivity mainly linking visual/motor to limbic areas with negative gene co-expression, which has a disproportionately large impact on cognitive predictions and network dynamics.
Submitted 29 May, 2025;
originally announced May 2025.
-
Vision Transformers with Self-Distilled Registers
Authors:
Yinjie Chen,
Zipeng Yan,
Chong Zhou,
Bo Dai,
Andrew F. Luo
Abstract:
Vision Transformers (ViTs) have emerged as the dominant architecture for visual processing tasks, demonstrating excellent scalability with increased training data and model size. However, recent work has identified the emergence of artifact tokens in ViTs that are incongruous with the local semantics. These anomalous tokens degrade ViT performance in tasks that require fine-grained localization or structural coherence. An effective mitigation of this issue is the addition of register tokens to ViTs, which implicitly "absorb" the artifact term during training. Given the availability of various large-scale pre-trained ViTs, in this paper we aim to equip them with such register tokens without re-training them from scratch, which is infeasible considering their size. Specifically, we propose Post Hoc Registers (PH-Reg), an efficient self-distillation method that integrates registers into an existing ViT without requiring additional labeled data or full retraining. PH-Reg initializes both teacher and student networks from the same pre-trained ViT. The teacher remains frozen and unmodified, while the student is augmented with randomly initialized register tokens. By applying test-time augmentation to the teacher's inputs, we generate denoised dense embeddings free of artifacts, which are then used to optimize only a small subset of unlocked student weights. We show that our approach can effectively reduce the number of artifact tokens, improving the segmentation and depth prediction of the student ViT under zero-shot and linear probing.
Submitted 27 May, 2025;
originally announced May 2025.
-
All-optical discrete illumination-based compressed ultrafast photography
Authors:
Long Cheng,
Dalong Qi,
Jiali Yao,
Ning Xu,
Chengyu Zhou,
Wenzhang Lin,
Yu He,
Zhen Pan,
Yunhua Yao,
Lianzhong Deng,
Yuecheng Shen,
Zhenrong Sun,
Shian Zhang
Abstract:
Snapshot ultrafast optical imaging (SUOI) plays a vital role in capturing complex transient events in real time, with significant implications for both fundamental science and practical applications. As an outstanding SUOI technique, compressed ultrafast photography (CUP) has demonstrated remarkable frame rates reaching trillions of frames per second and sequence depths of hundreds of frames. Nevertheless, as CUP relies on streak cameras, the system's imaging fidelity suffers from an inevitable limitation induced by the charge coupling artifacts in a streak camera. Moreover, although advanced image reconstruction algorithms have improved the recovered scenes, the high compression ratio of CUP still compromises image quality. To address these challenges, we propose a novel approach termed all-optical discrete illumination compressed ultrafast photography (AOD-CUP), which employs a free-space angular-chirp-enhanced delay (FACED) technique to temporally stretch femtosecond pulses and achieve discrete illumination of dynamic scenes. With its distinctive system architecture, AOD-CUP features adjustable frame numbers and flexible inter-frame intervals ranging from picoseconds to nanoseconds, thereby achieving high-fidelity ultrafast imaging in a snapshot. Experimental results demonstrate the system's superior dynamic spatial resolution and its capability to visualize ultrafast phenomena with complex spatial details, such as stress wave propagation in LiF crystals and air plasma channel formation. These results highlight the potential of AOD-CUP for high-fidelity, real-time ultrafast imaging, which provides an unprecedented tool for advancing the frontiers of ultrafast science.
Submitted 27 May, 2025;
originally announced May 2025.
-
Gait-Conditioned Reinforcement Learning with Multi-Phase Curriculum for Humanoid Locomotion
Authors:
Tianhu Peng,
Lingfan Bao,
Chengxu Zhou
Abstract:
We present a unified gait-conditioned reinforcement learning framework that enables humanoid robots to perform standing, walking, running, and smooth transitions within a single recurrent policy. A compact reward routing mechanism dynamically activates gait-specific objectives based on a one-hot gait ID, mitigating reward interference and supporting stable multi-gait learning. Human-inspired reward terms promote biomechanically natural motions, such as straight-knee stance and coordinated arm-leg swing, without requiring motion capture data. A structured curriculum progressively introduces gait complexity and expands command space over multiple phases. In simulation, the policy successfully achieves robust standing, walking, running, and gait transitions. On the real Unitree G1 humanoid, we validate standing, walking, and walk-to-stand transitions, demonstrating stable and coordinated locomotion. This work provides a scalable, reference-free solution toward versatile and naturalistic humanoid control across diverse modes and environments.
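One way to picture the reward routing described in this abstract is a one-hot mask over per-gait reward terms. The sketch below is a minimal illustration under that reading, not the authors' implementation: the gait ordering, the function name `routed_reward`, and the split into a shared term plus gait-specific terms are all assumptions.

```python
# Hypothetical gait ordering; the abstract only states that a one-hot gait ID is used.
GAITS = ("stand", "walk", "run")

def routed_reward(gait_one_hot, shared_reward, gait_rewards):
    """Add only the active gait's reward to a shared reward term.

    gait_one_hot: sequence of 0/1 flags with exactly one 1, aligned with GAITS.
    gait_rewards: per-gait reward values, aligned with GAITS.
    """
    if sum(gait_one_hot) != 1 or len(gait_one_hot) != len(gait_rewards):
        raise ValueError("expected a one-hot gait ID matching the reward list")
    # The one-hot vector masks out the objectives of inactive gaits.
    active = sum(flag * value for flag, value in zip(gait_one_hot, gait_rewards))
    return shared_reward + active

# Walking is active, so only the walking term (1.0) is added to the shared term.
reward = routed_reward([0, 1, 0], shared_reward=0.5, gait_rewards=[0.2, 1.0, -0.3])
print(reward)
```

Masking in this way keeps the objectives of inactive gaits out of the training signal, which is one plausible reading of how reward interference is mitigated.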
Submitted 11 June, 2025; v1 submitted 26 May, 2025;
originally announced May 2025.
-
Graph Wave Networks
Authors:
Juwei Yue,
Haikuo Li,
Jiawei Sheng,
Yihan Guo,
Xinghua Zhang,
Chuan Zhou,
Tingwen Liu,
Li Guo
Abstract:
Dynamics modeling has been introduced as a novel paradigm in message passing (MP) of graph neural networks (GNNs). Existing methods consider MP between nodes as a heat diffusion process, and leverage the heat equation to model the temporal evolution of nodes in the embedding space. However, the heat equation can hardly depict the wave nature of graph signals in graph signal processing. Besides, the heat equation is essentially a partial differential equation (PDE) involving a first partial derivative of time, whose numerical solution usually has low stability and leads to inefficient model training. In this paper, we would like to depict more wave details in MP, since graph signals are essentially wave signals that can be seen as a superposition of a series of waves in the form of eigenvectors. This motivates us to consider MP as a wave propagation process to capture the temporal evolution of wave signals in the space. Based on the wave equation in physics, we develop a graph wave equation to leverage wave propagation on graphs. In detail, we demonstrate that the graph wave equation can be connected to traditional spectral GNNs, facilitating the design of graph wave networks (GWNs) based on various Laplacians and enhancing the performance of the spectral GNNs. Besides, the graph wave equation is a PDE involving a second partial derivative of time, which has stronger stability on graphs than the heat equation that involves a first partial derivative of time. Additionally, we theoretically prove that the numerical solution derived from the graph wave equation is constantly stable, which significantly enhances model efficiency while ensuring its performance. Extensive experiments show that GWNs achieve SOTA and efficient performance on benchmark datasets, and exhibit outstanding performance in addressing challenging graph problems, such as over-smoothing and heterophily.
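To make the first-order versus second-order distinction concrete, the sketch below contrasts one explicit Euler step of a graph heat equation, dX/dt = -LX, with one central-difference step of a graph wave equation, d²X/dt² = -LX, on a three-node path graph. This is a generic textbook discretization written for illustration; it is not the GWN layer from the paper, and the function names, step size, and choice of the combinatorial Laplacian are assumptions.

```python
def laplacian_apply(lap, x):
    """Multiply a dense Laplacian (list of rows) by a node-signal vector."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in lap]

def heat_step(lap, x_cur, dt):
    """Explicit Euler step for dX/dt = -L X (first derivative in time)."""
    lx = laplacian_apply(lap, x_cur)
    return [c - dt * l for c, l in zip(x_cur, lx)]

def wave_step(lap, x_prev, x_cur, dt):
    """Central-difference step for d2X/dt2 = -L X (second derivative in time)."""
    lx = laplacian_apply(lap, x_cur)
    return [2 * c - p - dt * dt * l for c, p, l in zip(x_cur, x_prev, lx)]

# Combinatorial Laplacian of the path graph 0 - 1 - 2.
LAP = [[1, -1, 0],
       [-1, 2, -1],
       [0, -1, 1]]

x0 = [1.0, 0.0, 0.0]
x_prev, x_cur = x0, x0  # zero initial velocity
for _ in range(10):
    x_prev, x_cur = x_cur, wave_step(LAP, x_prev, x_cur, dt=0.1)

x_heat = x0
for _ in range(10):
    x_heat = heat_step(LAP, x_heat, dt=0.1)

print(x_cur, x_heat)
```

Note that the wave update needs the two most recent states, whereas the heat update needs only one; both preserve the total signal over the nodes here because the rows and columns of the Laplacian sum to zero.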
Submitted 28 May, 2025; v1 submitted 26 May, 2025;
originally announced May 2025.
-
First measurement of $Σ^{+}n\rightarrowΛp$ and $Σ^{+}n\rightarrowΣ^{0}p$ cross-sections via $Σ^+$-nucleus scattering at an electron-positron collider
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
Using $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events collected with the BESIII detector at the BEPCII storage ring, the reactions $Σ^{+}n\rightarrowΛp$ and $Σ^{+}n\rightarrowΣ^{0}p$ are studied, where the $Σ^{+}$ baryon is produced in the process $J/ψ\rightarrowΣ^{+}\barΣ^-$ and the neutron is a component of the $^9\rm{Be}$, $^{12}\rm{C}$ and $^{197}\rm{Au}$ nuclei in the beam pipe. Clear signals of these two reactions are observed for the first time. Their cross-sections are measured to be $σ(Σ^{+}+{^9\rm{Be}}\rightarrowΛ+p+{^8\rm{Be}})=(45.2\pm12.1_{\rm{stat}}\pm7.2_{\rm{sys}})$ mb and $σ(Σ^{+}+{^9\rm{Be}}\rightarrowΣ^{0}+p+{^8\rm{Be}})=(29.8\pm9.7_{\rm{stat}}\pm6.9_{\rm{sys}})$ mb for a $Σ^{+}$ average momentum of $0.992$ GeV/$c$, within a range of $\pm0.015$ GeV/$c$. This is the first study of $Σ^{+}$-nucleon scattering at an electron-positron collider.
Submitted 26 May, 2025;
originally announced May 2025.
-
HIT Model: A Hierarchical Interaction-Enhanced Two-Tower Model for Pre-Ranking Systems
Authors:
Haoqiang Yang,
Congde Yuan,
Kun Bai,
Mengzhuo Guo,
Wei Yang,
Chao Zhou
Abstract:
Online display advertising platforms rely on pre-ranking systems to efficiently filter and prioritize candidate ads from large corpora, balancing relevance to users with strict computational constraints. The prevailing two-tower architecture, though highly efficient due to its decoupled design and pre-caching, suffers from limited cross-domain interaction and coarse similarity metrics, undermining its capacity to model complex user-ad relationships. In this study, we propose the Hierarchical Interaction-Enhanced Two-Tower (HIT) model, a new architecture that augments the two-tower paradigm with two key components: $\textit{generators}$ that pre-generate holistic vectors incorporating coarse-grained user-ad interactions through a dual-generator framework with a cosine-similarity-based generation loss as the training objective, and $\textit{multi-head representers}$ that project embeddings into multiple latent subspaces to capture fine-grained, multi-faceted user interests and multi-dimensional ad attributes. This design enhances modeling effectiveness without compromising inference efficiency. Extensive experiments on public datasets and large-scale online A/B testing on Tencent's advertising platform demonstrate that HIT significantly outperforms several baselines in relevance metrics, yielding a $1.66\%$ increase in Gross Merchandise Volume and a $1.55\%$ improvement in Return on Investment, with serving latency similar to that of vanilla two-tower models. The HIT model has been successfully deployed in Tencent's online display advertising system, serving billions of impressions daily. The code is available at https://anonymous.4open.science/r/HIT_model-5C23.
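The idea of projecting user and ad embeddings into several latent subspaces and comparing them per subspace can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the use of plain linear projections, and the averaging of per-head cosine similarities are hypothetical and are not taken from the HIT code or paper.

```python
import math
import random


def project(vec, matrix):
    """Multiply a length-d vector by a (d x k) matrix given as a list of rows."""
    k = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(k)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def multi_head_score(user_emb, ad_emb, user_heads, ad_heads):
    """Average the per-head cosine similarity of the projected embeddings.

    Hypothetical scoring rule: each head is one latent subspace, and the
    per-head similarities are averaged into a single relevance score.
    """
    sims = [
        cosine(project(user_emb, wu), project(ad_emb, wa))
        for wu, wa in zip(user_heads, ad_heads)
    ]
    return sum(sims) / len(sims)


def random_matrix(rng, rows, cols):
    """Random projection matrix standing in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, head_dim, num_heads = 4, 2, 3  # illustrative sizes only
    user = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    ad = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    user_heads = [random_matrix(rng, dim, head_dim) for _ in range(num_heads)]
    ad_heads = [random_matrix(rng, dim, head_dim) for _ in range(num_heads)]
    print(multi_head_score(user, ad, user_heads, ad_heads))
```

Because the ad-side projections depend only on the ad embedding, they could be computed ahead of time and cached, which is consistent with the decoupled two-tower design the abstract describes.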
Submitted 26 May, 2025;
originally announced May 2025.
-
Poison in the Well: Feature Embedding Disruption in Backdoor Attacks
Authors:
Zhou Feng,
Jiahao Chen,
Chunyi Zhou,
Yuwen Pu,
Qingming Li,
Shouling Ji
Abstract:
Backdoor attacks embed malicious triggers into training data, enabling attackers to manipulate neural network behavior during inference while maintaining high accuracy on benign inputs. However, existing backdoor attacks suffer from excessive reliance on training data, poor stealth, and instability, which hinder their effectiveness in real-world applications. Therefore, this paper introduces ShadowPrint, a versatile backdoor attack that targets feature embeddings within neural networks to achieve high attack success rates (ASRs) and stealthiness. Unlike traditional approaches, ShadowPrint reduces reliance on training data access and operates effectively with exceedingly low poison rates (as low as 0.01%). It leverages a clustering-based optimization strategy to align feature embeddings, ensuring robust performance across diverse scenarios while maintaining stability and stealth. Extensive evaluations demonstrate that ShadowPrint achieves superior ASR (up to 100%), steady clean accuracy (CA, with decay of no more than 1% in most cases), and low DDR (averaging below 5%) across both clean-label and dirty-label settings, with poison rates ranging from 0.01% to 0.05%, setting a new standard for backdoor attack capabilities and emphasizing the need for advanced defense strategies focused on feature space manipulations.
Submitted 26 May, 2025;
originally announced May 2025.