-
Parameterized Diffusion Optimization enabled Autoregressive Ordinal Regression for Diabetic Retinopathy Grading
Authors:
Qinkai Yu,
Wei Zhou,
Hantao Liu,
Yanyu Xu,
Meng Wang,
Yitian Zhao,
Huazhu Fu,
Xujiong Ye,
Yalin Zheng,
Yanda Meng
Abstract:
As a long-term complication of diabetes, diabetic retinopathy (DR) progresses slowly, potentially taking years to threaten vision. An accurate and robust evaluation of its severity is vital to ensure prompt management and care. Ordinal regression leverages the underlying inherent order between categories to achieve superior performance beyond traditional classification. However, there exist challenges leading to lower DR classification performance: 1) The uneven distribution of DR severity levels, characterized by a long-tailed pattern, adds complexity to the grading process. 2) The ambiguity in defining category boundaries introduces additional challenges, making the classification process more complex and prone to inconsistencies. This work proposes a novel autoregressive ordinal regression method called AOR-DR to address the above challenges by leveraging the clinical knowledge of inherent ordinal information in DR grading dataset settings. Specifically, we decompose the DR grading task into a series of ordered steps by fusing the prediction of the previous steps with extracted image features as conditions for the current prediction step. Additionally, we exploit the diffusion process to facilitate conditional probability modeling, enabling the direct use of continuous global image features for autoregression without relearning contextual information from patch-level features. This ensures the effectiveness of the autoregressive process and leverages the capabilities of pre-trained large-scale foundation models. Extensive experiments were conducted on four large-scale publicly available color fundus datasets, demonstrating our model's effectiveness and superior performance over six recent state-of-the-art ordinal regression methods. The implementation code is available at https://github.com/Qinkaiyu/AOR-DR.
Submitted 7 July, 2025;
originally announced July 2025.
-
Introduction to the China Space Station Telescope (CSST)
Authors:
CSST Collaboration,
Yan Gong,
Haitao Miao,
Hu Zhan,
Zhao-Yu Li,
Jinyi Shangguan,
Haining Li,
Chao Liu,
Xuefei Chen,
Haibo Yuan,
Jilin Zhou,
Hui-Gen Liu,
Cong Yu,
Jianghui Ji,
Zhaoxiang Qi,
Jiacheng Liu,
Zigao Dai,
Xiaofeng Wang,
Zhenya Zheng,
Lei Hao,
Jiangpei Dou,
Yiping Ao,
Zhenhui Lin,
Kun Zhang,
Wei Wang
, et al. (88 additional authors not shown)
Abstract:
The China Space Station Telescope (CSST) is a next-generation Stage-IV sky survey telescope, distinguished by its large field of view (FoV), high image quality, and multi-band observation capabilities. It can simultaneously conduct precise measurements of the Universe by performing multi-color photometric imaging and slitless spectroscopic surveys. The CSST is equipped with five scientific instruments, i.e. Multi-band Imaging and Slitless Spectroscopy Survey Camera (SC), Multi-Channel Imager (MCI), Integral Field Spectrograph (IFS), Cool Planet Imaging Coronagraph (CPI-C), and THz Spectrometer (TS). Using these instruments, the CSST is expected to make significant contributions and discoveries across various astronomical fields, including cosmology, galaxy and active galactic nuclei (AGN), the Milky Way and nearby galaxies, stars, exoplanets, Solar System objects, astrometry, and transients and variable sources. This review aims to provide a comprehensive overview of the CSST instruments, observational capabilities, data products, and scientific potential.
Submitted 6 July, 2025;
originally announced July 2025.
-
BiFair: A Fairness-aware Training Framework for LLM-enhanced Recommender Systems via Bi-level Optimization
Authors:
Jiaming Zhang,
Yuyuan Li,
Yiqun Xu,
Li Zhang,
Xiaohua Feng,
Zhifei Ren,
Chaochao Chen
Abstract:
Large Language Model-enhanced Recommender Systems (LLM-enhanced RSs) have emerged as a powerful approach to improving recommendation quality by leveraging LLMs to generate item representations. Despite these advancements, the integration of LLMs raises severe fairness concerns. Existing studies reveal that LLM-based RSs exhibit greater unfairness than traditional RSs, yet fairness issues in LLM-enhanced RSs remain largely unexplored. In this paper, our empirical study reveals that while LLM-enhanced RSs improve fairness across item groups, a significant fairness gap persists. Further enhancement remains challenging due to the architectural differences and varying sources of unfairness inherent in LLM-enhanced RSs. To bridge this gap, we first decompose unfairness into i) prior unfairness in LLM-generated representations and ii) training unfairness in recommendation models. Then, we propose BiFair, a bi-level optimization-based fairness-aware training framework designed to mitigate both prior and training unfairness simultaneously. BiFair optimizes two sets of learnable parameters: LLM-generated representations and a trainable projector in the recommendation model, using a two-level nested optimization process. Additionally, we introduce an adaptive inter-group balancing mechanism, leveraging multi-objective optimization principles to dynamically balance fairness across item groups. Extensive experiments on three real-world datasets demonstrate that BiFair significantly mitigates unfairness and outperforms previous state-of-the-art methods.
Submitted 6 July, 2025;
originally announced July 2025.
-
MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation
Authors:
Weilun Feng,
Chuanguang Yang,
Haotong Qin,
Yuqi Li,
Xiangqi Li,
Zhulin An,
Libo Huang,
Boyu Diao,
Fuzhen Zhuang,
Michele Magno,
Yongjun Xu,
Yingli Tian,
Tingwen Huang
Abstract:
Diffusion models have demonstrated remarkable performance on vision generation tasks. However, their high computational complexity hinders wide application on edge devices. Quantization has emerged as a promising technique for inference acceleration and memory reduction. However, existing quantization methods do not generalize well under extremely low-bit (2-4 bit) quantization. Directly applying these methods causes severe performance degradation. We identify that the existing quantization framework suffers from an outlier-unfriendly quantizer design and suboptimal initialization and optimization strategy. We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models. From the quantization perspective, the imbalanced distribution caused by salient outliers is quantization-unfriendly for a uniform quantizer. We propose Flexible Z-Order Residual Mixed Quantization, which utilizes an efficient binary residual branch for flexible quant steps to handle salient error. For the optimization framework, we theoretically analyze the convergence and optimality of the LoRA module and propose Object-Oriented Low-Rank Initialization to use prior quantization error for informative initialization. We then propose Memory-based Temporal Relation Distillation to construct an online time-aware pixel queue for long-term denoising temporal information distillation, which ensures the overall temporal consistency between the quantized and full-precision models. Comprehensive experiments on various generation tasks show that our MPQ-DMv2 surpasses current SOTA methods by a great margin on different architectures, especially under extremely low-bit widths.
Submitted 6 July, 2025;
originally announced July 2025.
-
Signature of gate tunable superconducting network in twisted bilayer graphene
Authors:
Yingbo Wang,
Yingzhuo Han,
Lu Cao,
Xun-Jiang Luo,
Yucheng Xue,
Jiefei Shi,
Xiaomeng Wang,
Xiangjia Bai,
Junnan Jiang,
Ziyi Tian,
Kenji Watanabe,
Takashi Taniguchi,
Fengcheng Wu,
Qing-feng Sun,
Hong-Jun Gao,
Yuhang Jiang,
Jinhai Mao
Abstract:
Twisted van der Waals materials provide a tunable platform for investigating two-dimensional superconductivity and quantum phases. Using spectra-imaging scanning tunneling microscopy, we study the superconducting states in twisted bilayer graphene and track their evolution from insulating phases. Gate-dependent spectroscopic measurements reveal two distinct regimes: under-doped (ν = -2.3) and optimally doped (ν = -2.6). In the under-doped regime, partial superconductivity arises, forming a network interspersed with non-gapped regions. At optimal doping, the entire unit cell demonstrates superconductivity, with gap size modulation showing an anti-correlation with the local density of states. This gate-dependent transition from an insulating phase to a modulated superconductor uncovers an unexpected spatial hierarchy in pairing behavior and offers direct microscopic insights to constrain theories of superconductivity in moiré systems.
Submitted 6 July, 2025;
originally announced July 2025.
-
Extriangulated factorization systems, $s$-torsion pairs and recollements
Authors:
Yan Xu,
Haicheng Zhang,
Zhiwei Zhu
Abstract:
We introduce extriangulated factorization systems in extriangulated categories and show that there exists a bijection between $s$-torsion pairs and extriangulated factorization systems. We also consider the gluing of $s$-torsion pairs and extriangulated factorization systems under recollements of extriangulated categories.
Submitted 5 July, 2025;
originally announced July 2025.
-
HAWK: A Hierarchical Workflow Framework for Multi-Agent Collaboration
Authors:
Yuyang Cheng,
Yumiao Xu,
Chaojia Yu,
Yong Zhao
Abstract:
Contemporary multi-agent systems encounter persistent challenges in cross-platform interoperability, dynamic task scheduling, and efficient resource sharing. Agents with heterogeneous implementations often lack standardized interfaces; collaboration frameworks remain brittle and hard to extend; scheduling policies are static; and inter-agent state synchronization is insufficient. We propose Hierarchical Agent Workflow (HAWK), a modular framework comprising five layers (User, Workflow, Operator, Agent, and Resource) and supported by sixteen standardized interfaces. HAWK delivers an end-to-end pipeline covering task parsing, workflow orchestration, intelligent scheduling, resource invocation, and data synchronization. At its core lies an adaptive scheduling and optimization module in the Workflow Layer, which harnesses real-time feedback and dynamic strategy adjustment to maximize utilization. The Resource Layer provides a unified abstraction over heterogeneous data sources, large models, physical devices, and third-party services and tools, simplifying cross-domain information retrieval. We demonstrate HAWK's scalability and effectiveness via CreAgentive, a multi-agent novel-generation prototype, which achieves marked gains in throughput, lowers invocation complexity, and improves system controllability. We also show how hybrid deployments of large language models integrate seamlessly within HAWK, highlighting its flexibility. Finally, we outline future research avenues (hallucination mitigation, real-time performance tuning, and enhanced cross-domain adaptability) and survey prospective applications in healthcare, government, finance, and education.
Submitted 5 July, 2025;
originally announced July 2025.
-
Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models
Authors:
Yifan Jiang,
Yibo Xue,
Yukun Kang,
Pin Zheng,
Jian Peng,
Feiran Wu,
Changliang Xu
Abstract:
Slide animations, such as fade-ins, fly-ins, and wipes, are critical for audience engagement, efficient information delivery, and vivid visual expression. However, most AI-driven slide-generation tools still lack native animation support, and existing vision-language models (VLMs) struggle with animation tasks due to the absence of public datasets and limited temporal-reasoning capabilities. To address this gap, we release the first public dataset for slide-animation modeling: 12,000 triplets of natural-language descriptions, animation JSON files, and rendered videos, collectively covering every built-in PowerPoint effect. Using this resource, we fine-tune Qwen-2.5-VL-7B with Low-Rank Adaptation (LoRA) and achieve consistent improvements over GPT-4.1 and Gemini-2.5-Pro in BLEU-4, ROUGE-L, SPICE, and our Coverage-Order-Detail Assessment (CODA) metric, which evaluates action coverage, temporal order, and detail fidelity. On a manually curated test set of slides, the LoRA model increases BLEU-4 by around 60%, ROUGE-L by 30%, and shows significant improvements in CODA-detail. This demonstrates that low-rank adaptation enables reliable temporal reasoning and generalization beyond synthetic data. Overall, our dataset, LoRA-enhanced model, and CODA metric provide a rigorous benchmark and foundation for future research on VLM-based dynamic slide generation.
Submitted 5 July, 2025;
originally announced July 2025.
-
Artificial intelligence in drug discovery: A comprehensive review with a case study on hyperuricemia, gout arthritis, and hyperuricemic nephropathy
Authors:
Junwei Su,
Cheng Xin,
Ao Shang,
Shan Wu,
Zhenzhen Xie,
Ruogu Xiong,
Xiaoyu Xu,
Cheng Zhang,
Guang Chen,
Yau-Tuen Chan,
Guoyi Tang,
Ning Wang,
Yong Xu,
Yibin Feng
Abstract:
This paper systematically reviews recent advances in artificial intelligence (AI), with a particular focus on machine learning (ML), across the entire drug discovery pipeline. Due to the inherent complexity, escalating costs, prolonged timelines, and high failure rates of traditional drug discovery methods, there is a critical need to comprehensively understand how AI/ML can be effectively integrated throughout the full process. Currently available literature reviews often narrowly focus on specific phases or methodologies, neglecting the dependence between key stages such as target identification, hit screening, and lead optimization. To bridge this gap, our review provides a detailed and holistic analysis of AI/ML applications across these core phases, highlighting significant methodological advances and their impacts at each stage. We further illustrate the practical impact of these techniques through an in-depth case study focused on hyperuricemia, gout arthritis, and hyperuricemic nephropathy, highlighting real-world successes in molecular target identification and therapeutic candidate discovery. Additionally, we discuss significant challenges facing AI/ML in drug discovery and outline promising future research directions. Ultimately, this review serves as an essential orientation for researchers aiming to leverage AI/ML to overcome existing bottlenecks and accelerate drug discovery.
Submitted 4 July, 2025;
originally announced July 2025.
-
Observation and research on cosmic ray muons and solar modulation effect based on plastic scintillator detector
Authors:
Wang Dexin,
Zhang Rui,
Yu Dekang,
Na Hui,
Yao Zhangha,
Wu Linghe,
Zhang Suyalatu,
Liang Tairan,
Huang Meirong,
Wang Zhilong,
Bai Yu,
Huang Yongshun,
Yang Xue,
Zhang Jiawen,
Liu Mengdi,
Ma Qiang,
Yu Jing,
Ji Xiuyan,
Yu Yiliqi,
Shao Xuepeng
Abstract:
Cosmic rays, originating from stars, supernovae, and other astrophysical sources, are composed of high-energy particles that enter Earth's atmosphere. Upon interaction with atmospheric nuclei, these primary cosmic rays generate secondary particles, including neutrons, electrons, and muons, with muons constituting a dominant component at ground level. Muons, due to their relative abundance, stability, and well-characterized energy loss mechanisms, serve as critical probes for investigating the fundamental properties of cosmic rays. Studies of muon energy distribution, diurnal anisotropy, and their modulation by solar activity provide critical insights into the mechanism of particle acceleration in cosmic ray sources and into solar and atmospheric effects. This study aims to characterize the counting spectra and anisotropic properties of cosmic ray muons by using a plastic scintillator detector system. The experiment was conducted over a three-month period, from December 2023 to February 2024, leveraging long-bar plastic scintillator detectors equipped with dual-end photomultiplier tubes (PMTs) and a high-resolution digital data acquisition system. A dual-end coincidence measurement technique was used to enhance the signal-to-noise ratio by suppressing thermal noise and other background interferences. Diurnal variations in muon count rates exhibit a pronounced pattern, with a systematic reduction occurring between 8:00 AM and 1:00 PM. This phenomenon is attributed to solar shielding effects, where enhanced solar activity during daytime hours modulates the flux of galactic cosmic rays reaching Earth's surface. The study further corroborates these findings through cross-comparisons with data from the Yangbajing Cosmic Ray Observatory. These observations underscore the robustness of the plastic scintillator detector system for capturing detailed muon spectra and anisotropic patterns.
Submitted 4 July, 2025;
originally announced July 2025.
-
User Location Disclosure Fails to Deter Overseas Criticism but Amplifies Regional Divisions on Chinese Social Media
Authors:
Leo Yang Yang,
Yiqing Xu
Abstract:
We examine the behavioral impact of a user location disclosure policy implemented on Sina Weibo, China's largest microblogging platform, using a high-frequency, real-time dataset of uncensored user engagement with 165 leading government and media accounts. Leveraging a natural experiment resulting from the platform's sudden rollout of location tagging on April 28, 2022, we compare millions of time-stamped observations of user behavior in the comment sections of these accounts before and after the policy change. Although the policy appeared intended to deter overseas users from spreading information deemed harmful by the regime, we find no reduction in their engagement. Instead, the policy sharply reduced domestic users' willingness to comment on posts about local issues outside their own provinces. This effect was especially pronounced among out-of-province commenters and disproportionately curtailed criticisms. Using large language models, we further show that location disclosure triggered a rise in regionally discriminatory replies, which in turn heightened the perceived risk of cross-provincial engagement and reshaped the norms of online participation. Our findings suggest that authoritarian regimes can reinforce censorship not only through top-down control, but also by mobilizing social cleavages, here regional divisions, to suppress dissent and fragment public discourse.
Submitted 3 July, 2025;
originally announced July 2025.
-
Hybrid Superscattering Driven by Toroidal Dipole
Authors:
D. Kislov,
D. Borovkov,
L. Huang,
A. Kuznetsov,
A. Canos Valero,
A. Ipatovs,
V. Bobrovs,
V. Fedotov,
L. Gao,
S. Xie,
Y. Xu,
J. Luo,
D. Baranov,
A. Arsenin,
A. Bolshakov,
A. S. Shalin
Abstract:
The dynamic toroidal dipole is a unique radiation source beyond standard multipoles. Since its first demonstration 15 years ago, it has attracted growing theoretical and experimental interest. Research mainly aims to enhance its weak electromagnetic coupling to free space. Here we report on a surprising finding that the toroidal dipole can, in fact, be engaged in the enhancement of electromagnetic scattering per se, driving so-called superscattering: the regime of anomalously strong light scattering where the total cross-section of the effect exceeds the fundamental single-channel limit. We introduce a new paradigm of hybrid superscattering enabled by the toroidal dipole, which we implement with a dielectric scatterer of a simple geometry, and demonstrate for the first time that two complementary mechanisms of superscattering, the Friedrich-Wintgen mechanism and resonance overlap, can act synergistically to yield a substantially enhanced effect. Using coupled-dipole theory, full-wave numerical modeling, and coupled-mode theory, we identify and quantify the dominant multipolar contributions and show that the normalized scattering cross-section exceeds the dipole limit due to a toroidal dipole-magnetic quadrupole interplay. These findings are supported by experimental measurements in the GHz frequency range using a dimer of ceramic cubes, which confirm both the spectral and spatial features of toroidal superscattering. Our results open a new powerful route to engineering strong light-matter interaction via peculiar toroidal modes (never observed before) with potential applications in toroidal superscattering metamaterials and metasurfaces, photonic devices, and sensors.
Submitted 3 July, 2025;
originally announced July 2025.
-
Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
Authors:
Yuqi Li,
Chuanguang Yang,
Hansheng Zeng,
Zeyu Dong,
Zhulin An,
Yongjun Xu,
Yingli Tian,
Hao Wu
Abstract:
Spatiotemporal forecasting tasks, such as traffic flow, combustion dynamics, and weather forecasting, often require complex models that suffer from low training efficiency and high memory consumption. This paper proposes a lightweight framework, Spectral Decoupled Knowledge Distillation (termed SDKD), which transfers the multi-scale spatiotemporal representations from a complex teacher model to a more efficient lightweight student network. The teacher model follows an encoder-latent evolution-decoder architecture, where its latent evolution module decouples high-frequency details and low-frequency trends using convolution and Transformer (global low-frequency modeler). However, the multi-layer convolution and deconvolution structures result in slow training and high memory usage. To address these issues, we propose a frequency-aligned knowledge distillation strategy, which extracts multi-scale spectral features from the teacher's latent space, including both high- and low-frequency components, to guide the lightweight student model in capturing both local fine-grained variations and global evolution patterns. Experimental results show that SDKD significantly improves performance, achieving reductions of up to 81.3% in MSE and 52.3% in MAE on the Navier-Stokes equation dataset. The framework effectively captures both high-frequency variations and long-term trends while reducing computational complexity. Our code is available at https://github.com/itsnotacie/SDKD
Submitted 27 June, 2025;
originally announced July 2025.
-
Loki's Dance of Illusions: A Comprehensive Survey of Hallucination in Large Language Models
Authors:
Chaozhuo Li,
Pengbo Wang,
Chenxu Wang,
Litian Zhang,
Zheng Liu,
Qiwei Ye,
Yuanbo Xu,
Feiran Huang,
Xi Zhang,
Philip S. Yu
Abstract:
Edgar Allan Poe noted, "Truth often lurks in the shadow of error," highlighting the deep complexity intrinsic to the interplay between truth and falsehood, notably under conditions of cognitive and informational asymmetry. This dynamic is strikingly evident in large language models (LLMs). Despite their impressive linguistic generation capabilities, LLMs sometimes produce information that appears factually accurate but is, in reality, fabricated, an issue often referred to as 'hallucinations'. The prevalence of these hallucinations can mislead users, affecting their judgments and decisions. In sectors such as finance, law, and healthcare, such misinformation risks causing substantial economic losses, legal disputes, and health risks, with wide-ranging consequences. In our research, we methodically categorize and analyze the causes, detection methods, and solutions related to LLM hallucinations. Our efforts have particularly focused on understanding the roots of hallucinations and evaluating the efficacy of current strategies in revealing the underlying logic, thereby paving the way for the development of innovative and potent approaches. By examining why certain measures are effective against hallucinations, our study aims to foster a comprehensive approach to tackling this issue within the domain of LLMs.
Submitted 6 June, 2025;
originally announced July 2025.
-
GRB 240825A: Early Reverse Shock and Its Physical Implications
Authors:
Chao Wu,
Yun Wang,
Hua-Li Li,
Li-Ping Xin,
Dong Xu,
Benjamin Schneider,
Antonio de Ugarte Postigo,
Gavin Lamb,
Andrea Reguitti,
Andrea Saccardi,
Xing Gao,
Xing-Ling Li,
Qiu-Li Wang,
Bing Zhang,
Jian-Yan Wei,
Shuang-Nan Zhang,
Frédéric Daigne,
Jean-Luc Atteia,
Maria-Grazia Bernardini,
Hong-bo Cai,
Arnaud Claret,
Bertrand Cordier,
Jin-Song Deng,
Olivier Godet,
Diego Götz
, et al. (62 additional authors not shown)
Abstract:
Early multi-wavelength observations offer crucial insights into the nature of the relativistic jets responsible for gamma-ray bursts and their interaction with the surrounding medium. We present data of GRB 240825A from 17 space- and ground-based telescopes/instruments, covering wavelengths from NIR/optical to X-ray and GeV, and spanning from the prompt emission to the afterglow phase triggered by Swift and Fermi. The early afterglow observations were carried out by SVOM/C-GFT, and spectroscopic observations of the afterglow by GTC, VLT, and TNG later determined the redshift of the burst ($z = 0.659$). A comprehensive analysis of the prompt emission spectrum observed by Swift-BAT and Fermi-GBM/LAT reveals a rare and significant high-energy cutoff at ~76 MeV. Assuming this cutoff is due to $γγ$ absorption allows us to place an upper limit on the initial Lorentz factor, $Γ_0 < 245$. The optical/NIR and GeV afterglow light curves can be described by the standard external shock model, with early-time emission dominated by a reverse shock (RS) and a subsequent transition to forward shock (FS) emission. Our afterglow modelling yields a consistent estimate of the initial Lorentz factor ($Γ_{\rm 0} \sim 234$). Furthermore, the RS-to-FS magnetic field ratio ($\mathcal{R}_B \sim 302$) indicates that the reverse shock region is significantly more magnetized than the FS region. An isotropic-equivalent kinetic energy of $E_{\text{k,iso}} = 5.25 \times 10^{54}$ erg is derived, and the corresponding $γ$-ray radiation efficiency is estimated to be $η_γ$ = 3.1%. On the other hand, the standard afterglow model cannot reproduce the X-ray light curve of GRB 240825A, calling for improved models to characterize all multi-wavelength data.
Submitted 3 July, 2025;
originally announced July 2025.
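As a rough consistency check on the two energetics figures quoted in the GRB 240825A abstract above, the implied isotropic-equivalent $γ$-ray energy can be backed out. This assumes the commonly used definition of radiation efficiency as the ratio of radiated energy to total (radiated plus kinetic) energy; the abstract does not state which definition the authors adopt, and the symbol $E_{γ,\text{iso}}$ is introduced here for illustration only.

$$
η_γ = \frac{E_{γ,\text{iso}}}{E_{γ,\text{iso}} + E_{\text{k,iso}}}
\quad\Longrightarrow\quad
E_{γ,\text{iso}} = \frac{η_γ}{1 - η_γ}\, E_{\text{k,iso}}
\approx \frac{0.031}{0.969} \times 5.25 \times 10^{54}\ \text{erg}
\approx 1.7 \times 10^{53}\ \text{erg}.
$$

Under that assumed definition, the radiated energy is about 3% of the energy budget, so nearly all of the energy remains as kinetic energy of the blast wave that powers the afterglow.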
-
Application of the microscopic optical potential of chiral effective field theory in astrophysical neutron-capture reactions
Authors:
Bing Wang,
Dong Bai,
Yi Xu
Abstract:
The microscopic global nucleon-nucleus optical potential proposed by Whitehead, Lim, and Holt (WLH) is a state-of-the-art potential developed within the framework of many-body perturbation theory using realistic nuclear interactions from chiral effective field theory. Given its potentially greater predictive power for reactions involving exotic isotopes, we apply it for the first time to the calculation of astrophysical neutron-capture reactions, which are particularly important to the nucleosynthesis of elements heavier than iron. It is found that this potential provides a good description of experimentally known neutron-capture cross sections and Maxwellian-averaged cross sections. For unstable neutron-rich nuclei, we comprehensively calculate the neutron-capture reaction rates for all nuclei with $26\leq Z\leq84$, located between the valley of stability and the neutron drip line, using the backward-forward Monte Carlo method with the $f_{rms}$ deviation as the $χ^2$ estimator. The results reveal a noticeable separation in the uncertainty of rates around an isospin asymmetry of 0.28 under the constraint $f_{rms} \leq 1.56$. This highlights the critical role of isospin dependence in optical potentials and suggests that future developments of the WLH potential may pay special attention to the isospin dependence.
Submitted 3 July, 2025;
originally announced July 2025.
-
Joint Radiation Power, Antenna Position, and Beamforming Optimization for Pinching-Antenna Systems with Motion Power Consumption
Authors:
Yiming Xu,
Dongfang Xu,
Xianghao Yu,
Shenghui Song,
Zhiguo Ding,
Robert Schober
Abstract:
Pinching-antenna systems (PASS) have been recently proposed to improve the performance of wireless networks by reconfiguring both the large-scale and small-scale channel conditions. However, existing studies ignore the physical constraints of antenna placement and assume fixed antenna radiation power. To fill this research gap, this paper investigates the design of PASS taking into account the motion power consumption of pinching-antennas (PAs) and the impact of adjustable antenna radiation power. To that end, we minimize the average power consumption for a given quality-of-service (QoS) requirement, by jointly optimizing the antenna positions, antenna radiation power ratios, and transmit beamforming. To the best of the authors' knowledge, this is the first work to consider radiation power optimization in PASS, which provides an additional degree of freedom (DoF) for system design. The cases with both continuous and discrete antenna placement are considered, where the main challenge lies in the fact that the antenna positions affect both the magnitude and phase of the channel coefficients of PASS. To tackle the resulting unique obstacles, an alternating direction method of multipliers (ADMM)-based framework is proposed to solve the problem for continuous antenna movement, while its discrete counterpart is formulated as a mixed integer nonlinear programming (MINLP) problem and solved by the block coordinate descent (BCD) method. Simulation results validate the performance enhancement achieved by incorporating PA movement power consumption and adjustable radiation power into PASS design, while also demonstrating the efficiency of the proposed optimization framework. The benefits of PASS over conventional multiple-input multiple-output (MIMO) systems in mitigating the large-scale path loss and inter-user interference are also revealed.
Submitted 3 July, 2025;
originally announced July 2025.
-
ViRefSAM: Visual Reference-Guided Segment Anything Model for Remote Sensing Segmentation
Authors:
Hanbo Bi,
Yulong Xu,
Ya Li,
Yongqiang Mao,
Boyuan Tong,
Chongyang Li,
Chunbo Lang,
Wenhui Diao,
Hongqi Wang,
Yingchao Feng,
Xian Sun
Abstract:
The Segment Anything Model (SAM), with its prompt-driven paradigm, exhibits strong generalization in generic segmentation tasks. However, applying SAM to remote sensing (RS) images still faces two major challenges. First, manually constructing precise prompts for each image (e.g., points or boxes) is labor-intensive and inefficient, especially in RS scenarios with dense small objects or spatially fragmented distributions. Second, SAM lacks domain adaptability, as it is pre-trained primarily on natural images and struggles to capture RS-specific semantics and spatial characteristics, especially when segmenting novel or unseen classes. To address these issues, inspired by few-shot learning, we propose ViRefSAM, a novel framework that guides SAM utilizing only a few annotated reference images that contain class-specific objects. Without requiring manual prompts, ViRefSAM enables automatic segmentation of class-consistent objects across RS images. Specifically, ViRefSAM introduces two key components while keeping SAM's original architecture intact: (1) a Visual Contextual Prompt Encoder that extracts class-specific semantic clues from reference images and generates object-aware prompts via contextual interaction with target images; and (2) a Dynamic Target Alignment Adapter, integrated into SAM's image encoder, which mitigates the domain gap by injecting class-specific semantics into target image features, enabling SAM to dynamically focus on task-relevant regions. Extensive experiments on three few-shot segmentation benchmarks, including iSAID-5$^i$, LoveDA-2$^i$, and COCO-20$^i$, demonstrate that ViRefSAM enables accurate and automatic segmentation of unseen classes by leveraging only a few reference images and consistently outperforms existing few-shot segmentation methods across diverse datasets.
Submitted 3 July, 2025;
originally announced July 2025.
-
Downregulation of aquaporin 3 promotes hyperosmolarity-induced apoptosis of nucleus pulposus cells through PI3K/Akt/mTOR pathway suppression
Authors:
Yuan Sang,
Huiqing Zhao,
Jiajun Wu,
Ting Zhang,
Wenbin Xu,
Hui Yao,
Kaihua Liu,
Chang Liu,
Junbin Zhang,
Ping Li,
Depeng Wu,
Yichun Xu,
Jianying Zhang,
Gang Hou
Abstract:
Hyperosmolarity is a key contributor to nucleus pulposus cell (NPC) apoptosis during intervertebral disc degeneration (IVDD). Aquaporin 3 (AQP3), a membrane channel protein, regulates cellular osmotic balance by transporting water and osmolytes. Although AQP3 downregulation is associated with disc degeneration, its role in apoptosis under hyperosmotic conditions remains unclear. Here, we demonstrate that hyperosmolarity induces AQP3 depletion, suppresses the PI3K/AKT/mTOR signaling pathway, and promotes mitochondrial dysfunction and ROS accumulation in NPCs. Lentiviral overexpression of AQP3 restores this pathway, attenuates oxidative damage, and reduces apoptosis, preserving disc structure in IVDD rat models. In contrast, pharmacological inhibition of AQP3 exacerbates ECM catabolism and NP tissue loss. Our findings reveal that AQP3 deficiency under hyperosmolarity contributes to NPC apoptosis via suppression of PI3K/AKT/mTOR signaling, potentially creating a pathological cycle of disc degeneration. These results highlight AQP3 as a promising therapeutic target for IVDD.
Submitted 2 July, 2025;
originally announced July 2025.
-
Hybrid least squares for learning functions from highly noisy data
Authors:
Ben Adcock,
Bernhard Hientzsch,
Akil Narayan,
Yiming Xu
Abstract:
Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are powerful in the small-noise regime are suboptimal when large noise is present. We propose a hybrid approach that combines Christoffel sampling with certain types of optimal experimental design to address this issue. We show that the proposed algorithm enjoys appropriate optimality properties for both sample point generation and noise mollification, leading to improved computational efficiency and sample complexity compared to existing methods. We also extend the algorithm to convex-constrained settings with similar theoretical guarantees. When the target function is defined as the expectation of a random field, we extend our approach to leverage adaptive random subspaces and establish results on the approximation capacity of the adaptive procedure. Our theoretical findings are supported by numerical studies on both synthetic data and a more challenging stochastic simulation problem in computational finance.
Submitted 2 July, 2025;
originally announced July 2025.
-
Quantum Geometry in the NbSe$_2$ Family I: Obstructed Compact Wannier Function and New Perturbation Theory
Authors:
Jiabin Yu,
Yi Jiang,
Yuanfeng Xu,
Dumitru Călugăru,
Haoyu Hu,
Haojie Guo,
Sandra Sajan,
Yongsong Wang,
Miguel M. Ugeda,
Fernando De Juan,
B. Andrei Bernevig
Abstract:
We revisit the electronic structure and band topology of monolayer 1H-NbSe$_2$, which hosts both superconductivity and a charge density wave, and its related compounds 1H-MoS$_2$, NbS$_2$, TaS$_2$, TaSe$_2$ and WS$_2$. We construct a 6-band, a 3-band, and, simplest of all, a single-band model for this material family by directly Wannierizing the ab initio bands. All host obstructed atomic isolated bands away from the atomic positions near the Fermi energy. We find that in the 3-band model, the obstructed atomic Wannier function can be well approximated by an optimally compact Wannier function with more than 90% accuracy for all the compounds, rising to a remarkable 94% accuracy in NbSe$_2$. Interestingly, the simplest single-band model has next-nearest-neighbor hopping larger than the nearest-neighbor hopping (by nearly an order of magnitude for MoS$_2$, NbSe$_2$, TaSe$_2$ and WS$_2$), which comes from the cancellation between the atomic onsite terms and the atomic nearest-neighbor hopping after projecting to the obstructed atomic Wannier functions. Furthermore, for NbSe$_2$, we employ a novel approximation scheme to obtain an effective Hamiltonian that captures the 3 bands originating mainly from the Nb atom. We also use conventional perturbation theory to derive the ab initio obstructed Wannier function with 95% accuracy. Our results pave the way for future study of the effect of quantum geometry on the correlated phases in this family of materials.
Submitted 2 July, 2025;
originally announced July 2025.
-
RoboBrain 2.0 Technical Report
Authors:
BAAI RoboBrain Team,
Mingyu Cao,
Huajie Tan,
Yuheng Ji,
Minglan Lin,
Zhiyu Li,
Zhou Cao,
Pengwei Wang,
Enshen Zhou,
Yi Han,
Yingbo Tang,
Xiangqi Xu,
Wei Guo,
Yaoxu Lyu,
Yijie Xu,
Jiayu Shi,
Mengfei Du,
Cheng Chi,
Mengdi Zhao,
Xiaoshuai Hao,
Junkai Zhao,
Xiaojie Zhang,
Shanyu Rong,
Huaihai Lyu,
Zhengliang Cai
, et al. (26 additional authors not shown)
Abstract:
We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain 2.0 achieves strong performance across a wide spectrum of embodied reasoning tasks. On both spatial and temporal benchmarks, the 32B variant achieves leading results, surpassing prior open-source and proprietary models. In particular, it supports key real-world embodied AI capabilities, including spatial understanding (e.g., affordance prediction, spatial referring, trajectory forecasting) and temporal decision-making (e.g., closed-loop interaction, multi-agent long-horizon planning, and scene graph updating). This report details the model architecture, data construction, multi-stage training strategies, infrastructure and practical applications. We hope RoboBrain 2.0 advances embodied AI research and serves as a practical step toward building generalist embodied agents. The code, checkpoint and benchmark are available at https://superrobobrain.github.io.
Submitted 5 July, 2025; v1 submitted 2 July, 2025;
originally announced July 2025.
-
Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
Authors:
Chengao Li,
Hanyu Zhang,
Yunkun Xu,
Hongyan Xue,
Xiang Ao,
Qing He
Abstract:
Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant challenge, particularly when those preferences conflict. To address this issue, we frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives. We introduce Gradient-Adaptive Policy Optimization (GAPO), a novel fine-tuning paradigm that employs multiple-gradient descent to align LLMs with diverse preference distributions. GAPO adaptively rescales the gradients for each objective to determine an update direction that optimally balances the trade-offs between objectives. Additionally, we introduce P-GAPO, which incorporates user preferences across different objectives and achieves Pareto solutions that better align with the user's specific needs. Our theoretical analysis demonstrates that GAPO converges towards a Pareto optimal solution for multiple objectives. Empirical results on Mistral-7B show that GAPO outperforms current state-of-the-art methods, achieving superior performance in both helpfulness and harmlessness.
Submitted 2 July, 2025;
originally announced July 2025.
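The GAPO abstract above refers to multiple-gradient descent. As a minimal sketch of that general idea (not the authors' GAPO update, whose adaptive rescaling is not specified in the abstract), the textbook two-objective case has a closed form: take the minimum-norm point on the segment between the two gradients. The function name `min_norm_direction` and the toy gradients are hypothetical and chosen only for illustration.

```python
def min_norm_direction(g1, g2):
    """Two-objective multiple-gradient descent step (generic, not GAPO itself).

    Returns (alpha, d) where d = alpha*g1 + (1-alpha)*g2 is the minimum-norm
    convex combination of the two gradients, with alpha clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        # Identical gradients: any convex combination gives the same vector.
        return 0.5, list(g1)
    # Unconstrained minimiser of ||g2 + alpha*(g1 - g2)||^2, then clipped.
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Two partially conflicting toy gradients (hypothetical values).
g_helpful = [1.0, 0.0]
g_harmless = [-0.5, 1.0]
alpha, direction = min_norm_direction(g_helpful, g_harmless)
```

When the minimiser is interior (0 < alpha < 1), the resulting direction has the same positive inner product with both gradients, so a small step along it improves both objectives to first order when they are being maximized (or along its negative when losses are being minimized).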
-
Breaking the $n^{1.5}$ Additive Error Barrier for Private and Efficient Graph Sparsification via Private Expander Decomposition
Authors:
Anders Aamand,
Justin Y. Chen,
Mina Dalirrooyfard,
Slobodan Mitrović,
Yuriy Nevmyvaka,
Sandeep Silwal,
Yinzhan Xu
Abstract:
We study differentially private algorithms for graph cut sparsification, a fundamental problem in algorithms, privacy, and machine learning. While significant progress has been made, the best-known private and efficient cut sparsifiers on $n$-node graphs approximate each cut within $\widetilde{O}(n^{1.5})$ additive error and $1+γ$ multiplicative error for any $γ > 0$ [Gupta, Roth, Ullman TCC'12]. In contrast, "inefficient" algorithms, i.e., those requiring exponential time, can achieve an $\widetilde{O}(n)$ additive error and $1+γ$ multiplicative error [Eliáš, Kapralov, Kulkarni, Lee SODA'20]. In this work, we break the $n^{1.5}$ additive error barrier for private and efficient cut sparsification. We present an $(\varepsilon,δ)$-DP polynomial time algorithm that, given a non-negative weighted graph, outputs a private synthetic graph approximating all cuts with multiplicative error $1+γ$ and additive error $n^{1.25 + o(1)}$ (ignoring dependencies on $\varepsilon, δ, γ$).
At the heart of our approach lies a private algorithm for expander decomposition, a popular and powerful technique in (non-private) graph algorithms.
Submitted 2 July, 2025;
originally announced July 2025.
-
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
Authors:
Zeyu Huang,
Tianhao Cheng,
Zihan Qiu,
Zili Wang,
Yinghui Xu,
Edoardo M. Ponti,
Ivan Titov
Abstract:
Existing post-training techniques for large language models are broadly categorized into Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT). Each paradigm presents a distinct trade-off: SFT excels at mimicking demonstration data but can lead to problematic generalization as a form of behavior cloning. Conversely, RFT can significantly enhance a model's performance but is prone to learn unexpected behaviors, and its performance is highly sensitive to the initial policy. In this paper, we propose a unified view of these methods and introduce Prefix-RFT, a hybrid approach that synergizes learning from both demonstration and exploration. Using mathematical reasoning problems as a testbed, we empirically demonstrate that Prefix-RFT is both simple and effective. It not only surpasses the performance of standalone SFT and RFT but also outperforms parallel mixed-policy RFT methods. A key advantage is its seamless integration into existing open-source frameworks, requiring only minimal modifications to the standard RFT pipeline. Our analysis highlights the complementary nature of SFT and RFT, and validates that Prefix-RFT effectively harmonizes these two learning paradigms. Furthermore, ablation studies confirm the method's robustness to variations in the quality and quantity of demonstration data. We hope this work offers a new perspective on LLM post-training, suggesting that a unified paradigm that judiciously integrates demonstration and exploration could be a promising direction for future research.
Submitted 2 July, 2025;
originally announced July 2025.
-
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Authors:
GLM-V Team,
Wenyi Hong,
Wenmeng Yu,
Xiaotao Gu,
Guo Wang,
Guobing Gan,
Haomiao Tang,
Jiale Cheng,
Ji Qi,
Junhui Ji,
Lihang Pan,
Shuaiqi Duan,
Weihan Wang,
Yan Wang,
Yean Cheng,
Zehai He,
Zhe Su,
Zhen Yang,
Ziyang Pan,
Aohan Zeng,
Baoxu Wang,
Boyan Shi,
Changyu Pang,
Chenhui Zhang
, et al. (54 additional authors not shown)
Abstract:
We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document understanding. We open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art performance among models of comparable size. In a comprehensive evaluation across 28 public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document understanding and STEM reasoning, further underscoring its strong capabilities. Code, models and more information are released at https://github.com/THUDM/GLM-4.1V-Thinking.
Submitted 2 July, 2025; v1 submitted 1 July, 2025;
originally announced July 2025.
-
Hilbert series of second order jets of determinantal varieties
Authors:
Yifan Chen,
Yongxin Xu,
Huaiqing Zuo
Abstract:
In this paper, we will investigate the jet schemes of determinantal varieties. It is quite often the case that the geometric information concerning the jet schemes of an algebraic variety can be described, but the more refined algebraic information is quite mysterious. For example, it is known that computing the Hilbert function associated to a natural grading on these jet schemes is a very hard problem. The present paper handles a few such computations. It succeeds in computing the Hilbert functions of the second order jet schemes in the case of maximal minors of a $2\times n$ matrix.
Submitted 1 July, 2025;
originally announced July 2025.
-
Anti-aliasing Algorithm Based on Three-dimensional Display Image
Authors:
Ziyang Liu,
Xingchen Xiao,
Yueyang Xu
Abstract:
3D-display technology is a promising emerging area with the potential to become the core of next-generation display technology. When unprocessed images and text are viewed directly through a naked-eye 3D display device, severe distortion and jaggedness appear, which substantially degrade the display quality. In this work, we address this degradation with spatial- and frequency-domain processing. Furthermore, we attempt to extract the degradation function of the columnar lens array, with the aim of eliminating the degradation at its source.
Submitted 1 July, 2025;
originally announced July 2025.
-
Pinching-Antenna Systems with In-Waveguide Attenuation: Performance Analysis and Algorithm Design
Authors:
Yanqing Xu,
Zhiguo Ding,
Robert Schober,
Tsung-Hui Chang
Abstract:
Pinching-antenna systems have emerged as a promising flexible-antenna architecture for next-generation wireless networks, enabling enhanced adaptability and user-centric connectivity through antenna repositioning along waveguides. However, existing studies often overlook in-waveguide signal attenuation, and the literature offers no comprehensive analysis of whether and under what conditions such an assumption is justified. This paper addresses this gap by explicitly incorporating in-waveguide attenuation into both the system model and algorithm design, and studying its impact on the downlink user data rates. We begin with a single-user scenario and derive a closed-form expression for the globally optimal antenna placement, which reveals how the attenuation coefficient and the user-to-waveguide distance jointly affect the optimal antenna position. Based on this analytical solution, we further provide a theoretical analysis identifying the system conditions under which the in-waveguide attenuation has an insignificant impact on the user achievable rate. The study is then extended to the multi-user multiple-input multiple-output setting, where two efficient algorithms are developed, based on the weighted minimum mean square error method and the maximum ratio combining method, to jointly optimize beamforming and antenna placement. Simulation results validate the efficacy of the proposed algorithms and demonstrate that pinching-antenna systems substantially outperform conventional fixed-antenna baselines, underscoring their potential for future flexible wireless communications.
Submitted 30 June, 2025;
originally announced June 2025.
-
Visual Textualization for Image Prompted Object Detection
Authors:
Yongjian Wu,
Yang Zhou,
Jiya Saiyin,
Bingzheng Wei,
Yan Xu
Abstract:
We propose VisTex-OVLM, a novel image-prompted object detection method that introduces visual textualization, a process that projects a few visual exemplars into the text feature space to enhance the capability of Object-level Vision-Language Models (OVLMs) in detecting rare categories that are difficult to describe textually and nearly absent from their pre-training data, while preserving their pre-trained object-text alignment. Specifically, VisTex-OVLM leverages multi-scale textualizing blocks and a multi-stage fusion strategy to integrate visual information from visual exemplars, generating textualized visual tokens that effectively guide OVLMs alongside text prompts. Unlike previous methods, our method keeps the original architecture of the OVLM, preserving its generalization capabilities while enhancing performance in few-shot settings. VisTex-OVLM demonstrates superior performance across open-set datasets that have minimal overlap with the OVLM's pre-training data and achieves state-of-the-art results on the few-shot benchmarks PASCAL VOC and MSCOCO. The code will be released at https://github.com/WitGotFlg/VisTex-OVLM.
Submitted 30 June, 2025;
originally announced June 2025.
-
Alleviating CoD in Renewable Energy Profile Clustering Using an Optical Quantum Computer
Authors:
Chengjun Liu,
Yijun Xu,
Wei Gu,
Bo Sun,
Kai Wen,
Shuai Lu,
Lamine Mili
Abstract:
The traditional clustering problem of renewable energy profiles is typically formulated as a combinatorial optimization problem that suffers from the Curse of Dimensionality (CoD) on classical computers. To address this issue, this paper proposes a kernel-based quantum clustering method. More specifically, the kernel-based similarity between profiles with minimal intra-group distance is encoded into the ground state of a Hamiltonian in the form of an Ising model. This NP-hard problem can then be reformulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem, which a Coherent Ising Machine (CIM) can naturally solve with significant improvement over classical computers. Test results from a real optical quantum computer verify the validity of the proposed method and demonstrate its ability to address CoD in an NP-hard clustering problem.
Submitted 30 June, 2025;
originally announced June 2025.
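The Ising-to-QUBO reformulation mentioned in the clustering abstract above can be illustrated on a toy two-cluster problem. This is a minimal sketch under stated assumptions, not the paper's method: it uses plain absolute distance instead of the kernel-based similarity, it minimizes total intra-cluster distance, and it replaces the Coherent Ising Machine with exhaustive search, which is only feasible for a handful of points. All function names and the sample points are hypothetical.

```python
from itertools import product


def build_qubo(dist):
    """Build QUBO coefficients for 2-way clustering by intra-cluster distance.

    With spins s_i in {-1, +1}, the Ising energy is
        E = sum_{i<j} d_ij * (1 + s_i * s_j) / 2,
    which counts d_ij only when i and j share a cluster. Substituting
    s_i = 2*x_i - 1 with x_i in {0, 1} gives
        E = sum_{i<j} d_ij * (2*x_i*x_j - x_i - x_j + 1).
    """
    n = len(dist)
    Q = [[0.0] * n for _ in range(n)]
    const = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = dist[i][j]
            Q[i][j] += 2.0 * d   # quadratic term x_i * x_j
            Q[i][i] -= d         # linear term -x_i (stored on the diagonal)
            Q[j][j] -= d         # linear term -x_j
            const += d           # constant offset
    return Q, const


def qubo_energy(Q, const, x):
    """Evaluate const + sum_{i<=j} Q[i][j] * x_i * x_j for binary x."""
    n = len(x)
    return const + sum(Q[i][j] * x[i] * x[j]
                       for i in range(n) for j in range(i, n))


def brute_force_min(Q, const):
    """Stand-in for the annealer: try every binary assignment (tiny n only)."""
    return min(product((0, 1), repeat=len(Q)),
               key=lambda x: qubo_energy(Q, const, x))


# Four hypothetical 1-D "profiles": two near 0 and two near 5.
points = [0.0, 0.1, 5.0, 5.1]
dist = [[abs(a - b) for b in points] for a in points]
Q, const = build_qubo(dist)
best = brute_force_min(Q, const)
best_energy = qubo_energy(Q, const, best)
```

The minimizing assignment places the two points near 0 in one cluster and the two near 5 in the other; the complementary bit string has the same energy, reflecting the global spin-flip symmetry of the Ising form.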
-
Realization of a functioning dual-type trapped-ion quantum network node
Authors:
Y. -Y. Huang,
L. Feng,
Y. -K. Wu,
Y. -L. Xu,
L. Zhang,
Z. -B. Cui,
C. -X. Huang,
C. Zhang,
S. -A. Guo,
Q. -X. Mei,
B. -X. Qi,
Y. Xu,
Y. -F. Pu,
Z. -C. Zhou,
L. -M. Duan
Abstract:
Trapped ions constitute a promising platform for implementation of a quantum network. Recently, a dual-type qubit scheme has been realized in a quantum network node where the communication qubits and the memory qubits are encoded in different energy levels of the same ion species, such that the generation of ion-photon entanglement on the communication qubits has negligible crosstalk error on the preloaded quantum information in the memory qubits. However, to achieve the versatile applications of a quantum network, a crucial component of the dual-type node, namely the entangling gate between the communication and the memory qubits, is still missing. Here we report a dual-type quantum network node equipped simultaneously with ion-photon entanglement generation, crosstalk-free quantum memory, and entangling gates between the dual-type qubits. We demonstrate its practical applications, including quantum state teleportation and the preparation of multipartite entangled states. Our work achieves the necessary components of a dual-type quantum network node and paves the way toward its applications in a large-scale quantum internet.
Submitted 30 June, 2025;
originally announced June 2025.
-
Uncertainty-aware Diffusion and Reinforcement Learning for Joint Plane Localization and Anomaly Diagnosis in 3D Ultrasound
Authors:
Yuhao Huang,
Yueyue Xu,
Haoran Dou,
Jiaxiao Deng,
Xin Yang,
Hongyu Zheng,
Dong Ni
Abstract:
Congenital uterine anomalies (CUAs) can lead to infertility, miscarriage, preterm birth, and an increased risk of pregnancy complications. Compared to traditional 2D ultrasound (US), 3D US can reconstruct the coronal plane, providing a clear visualization of the uterine morphology for assessing CUAs accurately. In this paper, we propose an intelligent system for simultaneous automated plane localization and CUA diagnosis. Our highlights are: 1) we develop a denoising diffusion model with local (plane) and global (volume/text) guidance, using an adaptive weighting strategy to optimize attention allocation to different conditions; 2) we introduce a reinforcement learning-based framework with unsupervised rewards to extract the key slice summary from redundant sequences, fully integrating information across multiple planes to reduce learning difficulty; 3) we provide text-driven uncertainty modeling for coarse prediction, and leverage it to adjust the classification probability for overall performance improvement. Extensive experiments on a large 3D uterine US dataset show the efficacy of our method, in terms of plane localization and CUA diagnosis. Code is available at https://github.com/yuhoo0302/CUA-US.
Submitted 30 June, 2025;
originally announced June 2025.
-
ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data
Authors:
Yu Zhang,
Ruijie Yu,
Jidong Tian,
Feng Zhu,
Jiapeng Liu,
Xiaokang Yang,
Yaohui Jin,
Yanyan Xu
Abstract:
With the increasing interest in robotic synthesis in the context of organic chemistry, the automated extraction of chemical procedures from literature is critical. However, this task remains challenging due to the inherent ambiguity of chemical language and the high cost of human annotation required for developing reliable computer-aided extraction protocols. Here, we present ChemActor, a fully fine-tuned large language model (LLM), as a chemical executor to convert between unstructured experimental procedures and structured action sequences. We propose a sequential LLM-generated data framework to address the challenges of insufficient and low-quality annotated data. This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input. Additionally, we introduce a novel multi-round LLMs circle review metric, which reflects the model's advanced understanding of chemical experimental procedures. Extensive experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor, augmented by LLM-generated data, achieves state-of-the-art performance, outperforming the baseline model by 10%. The code is available at: https://github.com/Zhanghahah/ChemActor.
Submitted 1 July, 2025; v1 submitted 30 June, 2025;
originally announced June 2025.
-
Hierarchical Quantized Diffusion Based Tree Generation Method for Hierarchical Representation and Lineage Analysis
Authors:
Zelin Zang,
WenZhe Li,
Fei Chen,
Yongjie Xu,
Chang Yu,
Zhen Lei,
Stan Z. Li
Abstract:
In single-cell research, tracing and analyzing high-throughput single-cell differentiation trajectories is crucial for understanding complex biological processes. Key to this is the modeling and generation of hierarchical data that represents the intrinsic structure within datasets. Traditional methods face limitations in terms of computational cost, performance, generative capacity, and stability. Recent VAE-based approaches have made strides in addressing these challenges but still require specialized network modules for each tree branch, limiting their stability and ability to capture deep hierarchical relationships. To overcome these challenges, we introduce a diffusion-based approach called HDTree. HDTree captures tree relationships within a hierarchical latent space using a unified hierarchical codebook and quantized diffusion processes to model tree node transitions. This method improves stability by eliminating branch-specific modules and enhances generative capacity through gradual hierarchical changes simulated by the diffusion process. HDTree's effectiveness is demonstrated through comparisons on both general-purpose and single-cell datasets, where it outperforms existing methods in terms of accuracy and performance. These contributions provide a new tool for hierarchical lineage analysis, enabling more accurate and efficient modeling of cellular differentiation paths and offering insights for downstream biological tasks. The code of HDTree is available at the anonymous link https://anonymous.4open.science/r/code_HDTree_review-A8DB.
Submitted 29 June, 2025;
originally announced June 2025.
-
From Individuals to Interactions: Benchmarking Gender Bias in Multimodal Large Language Models from the Lens of Social Relationship
Authors:
Yue Xu,
Wenjie Wang
Abstract:
Multimodal large language models (MLLMs) have shown impressive capabilities across tasks involving both visual and textual modalities. However, growing concerns remain about their potential to encode and amplify gender bias, particularly in socially sensitive applications. Existing benchmarks predominantly evaluate bias in isolated scenarios, overlooking how bias may emerge subtly through interpersonal interactions. We fill this gap by going beyond single-entity evaluation and instead focusing on a deeper examination of relational and contextual gender bias in dual-individual interactions. We introduce Genres, a novel benchmark designed to evaluate gender bias in MLLMs through the lens of social relationships in generated narratives. Genres assesses gender bias through a dual-character profile and narrative generation task that captures rich interpersonal dynamics and supports a fine-grained bias evaluation suite across multiple dimensions. Experiments on both open- and closed-source MLLMs reveal persistent, context-sensitive gender biases that are not evident in single-character settings. Our findings underscore the importance of relationship-aware benchmarks for diagnosing subtle, interaction-driven gender bias in MLLMs and provide actionable insights for future bias mitigation.
Submitted 29 June, 2025;
originally announced June 2025.
-
Best approximation by polynomials on the conic domains
Authors:
Yan Ge,
Yuan Xu
Abstract:
A new modulus of smoothness and its equivalent $K$-function are defined on the conic domains in $\mathbb{R}^d$, and used to characterize the weighted best approximation by polynomials. Both direct and weak inverse theorems of the characterization are established via the modulus of smoothness. For the conic surface $\mathbb{V}_0^{d+1} = \{(x,t): \|x\| = t\le 1\}$, the natural weight function is $t^{-1}(1-t)^\gamma$, which has a singularity at the apex. The rotational part of the modulus of smoothness is defined in terms of the difference operator in Euler angles with an increment $h/\sqrt{t}$, akin to the Ditzian-Totik modulus on the interval but with $\sqrt{t}$ in the denominator, which captures the singularity at the apex.
Submitted 28 June, 2025;
originally announced June 2025.
-
Point Cloud Compression and Objective Quality Assessment: A Survey
Authors:
Yiling Xu,
Yujie Zhang,
Shuting Xia,
Kaifa Yang,
He Huang,
Ziyu Shan,
Wenjie Huang,
Qi Yang,
Le Yang
Abstract:
The rapid growth of 3D point cloud data, driven by applications in autonomous driving, robotics, and immersive environments, has led to critical demand for efficient compression and quality assessment techniques. Unlike traditional 2D media, point clouds present unique challenges due to their irregular structure, high data volume, and complex attributes. This paper provides a comprehensive survey of recent advances in point cloud compression (PCC) and point cloud quality assessment (PCQA), emphasizing their significance for real-time and perceptually relevant applications. We analyze a wide range of handcrafted and learning-based PCC algorithms, along with objective PCQA metrics. By benchmarking representative methods on emerging datasets, we offer detailed comparisons and practical insights into their strengths and limitations. Despite notable progress, challenges such as enhancing visual fidelity, reducing latency, and supporting multimodal data remain. This survey outlines future directions, including hybrid compression frameworks and advanced feature extraction strategies, to enable more efficient, immersive, and intelligent 3D applications.
Submitted 28 June, 2025;
originally announced June 2025.
-
Neural Cellular Automata: From Cells to Pixels
Authors:
Ehsan Pajouheshgar,
Yitao Xu,
Ali Abbasi,
Alexander Mordvintsev,
Wenzel Jakob,
Sabine Süsstrunk
Abstract:
Neural Cellular Automata (NCAs) are bio-inspired systems in which identical cells self-organize to form complex and coherent patterns by repeatedly applying simple local rules. NCAs display striking emergent behaviors including self-regeneration, generalization and robustness to unseen situations, and spontaneous motion. Despite their success in texture synthesis and morphogenesis, NCAs remain largely confined to low-resolution grids. This limitation stems from (1) training time and memory requirements that grow quadratically with grid size, (2) the strictly local propagation of information which impedes long-range cell communication, and (3) the heavy compute demands of real-time inference at high resolution. In this work, we overcome this limitation by pairing NCA with a tiny, shared implicit decoder, inspired by recent advances in implicit neural representations. Following NCA evolution on a coarse grid, a lightweight decoder renders output images at arbitrary resolution. We also propose novel loss functions for both morphogenesis and texture synthesis tasks, specifically tailored for high-resolution output with minimal memory and computation overhead. Combining our proposed architecture and loss functions brings substantial improvement in quality, efficiency, and performance. NCAs equipped with our implicit decoder can generate full-HD outputs in real time while preserving their self-organizing, emergent properties. Moreover, because each MLP processes cell states independently, inference remains highly parallelizable and efficient. We demonstrate the applicability of our approach across multiple NCA variants (on 2D, 3D grids, and 3D meshes) and multiple tasks, including texture generation and morphogenesis (growing patterns from a seed), showing that with our proposed framework, NCAs seamlessly scale to high-resolution outputs with minimal computational overhead.
Submitted 28 June, 2025;
originally announced June 2025.
-
Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication
Authors:
Jinliang Shi,
Shigang Li,
Youxuan Xu,
Xueying Wang,
Rongtian Fu,
Zhi Ma,
Tong Wu
Abstract:
Sparse matrix multiplication operators (i.e., SpMM and SDDMM) are widely used in deep learning and scientific computing. Modern accelerators are commonly equipped with Tensor cores and CUDA cores to accelerate sparse operators. The former brings superior computing power but only for structured matrix multiplication, while the latter has relatively lower performance but higher programming flexibility. In this work, we discover that utilizing one resource alone leads to inferior performance for sparse matrix multiplication, due to their respective limitations. To this end, we propose Libra, a systematic approach that enables synergistic computation between CUDA and Tensor cores to achieve the best performance for sparse matrix multiplication. Specifically, we propose a 2D-aware workload distribution strategy to find the sweet spot of task mapping for different sparse operators, leveraging both the high performance of Tensor cores and the low computational redundancy on CUDA cores. In addition, Libra incorporates systematic optimizations for heterogeneous computing, including hybrid load-balancing, finely optimized kernel implementations, and GPU-accelerated preprocessing. Extensive experimental results on H100 and RTX 4090 GPUs show that Libra outperforms the state of the art by 3.1x on average (up to 9.23x) over DTC-SpMM and by 2.9x (up to 3.9x) for end-to-end GNN applications. Libra opens up a new perspective for sparse operator acceleration by fully exploiting the heterogeneous computing resources on GPUs.
Submitted 27 June, 2025;
originally announced June 2025.
-
A User-Centric, Privacy-Preserving, and Verifiable Ecosystem for Personal Data Management and Utilization
Authors:
Osama Zafar,
Mina Namazi,
Yuqiao Xu,
Youngjin Yoo,
Erman Ayday
Abstract:
In the current paradigm of digital personalized services, the centralized management of personal data raises significant privacy concerns, security vulnerabilities, and diminished individual autonomy over sensitive information. Despite their efficiency, traditional centralized architectures frequently fail to satisfy rigorous privacy requirements and expose users to data breaches and unauthorized access risks. This pressing challenge calls for a fundamental paradigm shift in methodologies for collecting, storing, and utilizing personal data across diverse sectors, including education, healthcare, and finance.
This paper introduces a novel decentralized, privacy-preserving architecture that handles heterogeneous personal information, ranging from educational credentials to health records and financial data. Unlike traditional models, our system grants users complete data ownership and control, allowing them to selectively share information without compromising privacy. The architecture's foundation comprises advanced privacy-enhancing technologies, including secure enclaves and federated learning, enabling secure computation, verification, and data sharing. The system supports diverse functionalities, including local computation, model training, and privacy-preserving data sharing, while ensuring data credibility and robust user privacy.
Submitted 27 June, 2025;
originally announced June 2025.
-
Preconditioned Conjugate Gradient for MIMO-AFDM System
Authors:
Jun Zhu,
Yin Xu,
Dazhi He,
Haoyang Li,
Yunfeng Guan,
Wenjun Zhang
Abstract:
Affine frequency division multiplexing (AFDM) is a promising chirp-assisted multicarrier waveform for future high-mobility communications. A significant challenge in MIMO-AFDM systems is multi-user interference (MUI), which can be effectively addressed by employing precoding techniques. However, the complexity introduced by AFDM makes the precoding process computationally expensive and challenging. To overcome this issue, we exploit the sparsity of the AFDM channel and use the Preconditioned Conjugate Gradient (PCG) method to compute the precoding iteratively, thereby reducing the complexity of the precoding design. Simulation results demonstrate that the proposed sparsification approach, coupled with the PCG method, achieves quite good precoding performance while significantly reducing computational complexity. This makes the application of AFDM more feasible and efficient for high-mobility communication scenarios, paving the way for its broader implementation in next-generation communication systems.
Submitted 18 June, 2025;
originally announced June 2025.
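For readers unfamiliar with the PCG method named in the abstract above, the following is a minimal, generic sketch of the iteration. It is not the authors' precoder: the real-valued dense system, the Jacobi (diagonal) preconditioner, and the function name `pcg` are assumptions chosen for brevity, whereas a precoding design would typically involve a complex-valued, sparse system.

```python
def pcg(a, b, tol=1e-10, max_iter=100):
    """Solve a x = b for a symmetric positive definite matrix a (list of rows)
    using conjugate gradient with a Jacobi preconditioner (assumed choice)."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    m_inv = [1.0 / a[i][i] for i in range(n)]  # inverse of diag(a)
    x = [0.0] * n
    r = list(b)                                # residual b - a x for x = 0
    z = [m_inv[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:             # stop once the residual is small
            break
        ap = matvec(p)
        alpha = rz / dot(p, ap)                # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

Each iteration needs only one matrix-vector product, so when the matrix is sparse the per-iteration cost scales with the number of nonzeros rather than with the square of the dimension.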
-
Score-Based Model for Low-Rank Tensor Recovery
Authors:
Zhengyun Cheng,
Changhao Wang,
Guanwen Zhang,
Yi Xu,
Wei Zhou,
Xiangyang Ji
Abstract:
Low-rank tensor decompositions (TDs) provide an effective framework for multiway data analysis. Traditional TD methods rely on predefined structural assumptions, such as CP or Tucker decompositions. From a probabilistic perspective, these can be viewed as using Dirac delta distributions to model the relationships between shared factors and the low-rank tensor. However, such prior knowledge is rarely available in practical scenarios, particularly regarding the optimal rank structure and contraction rules. The optimization procedures based on fixed contraction rules are complex, and approximations made during these processes often lead to accuracy loss. To address this issue, we propose a score-based model that eliminates the need for predefined structural or distributional assumptions, enabling the learning of compatibility between tensors and shared factors. Specifically, a neural network is designed to learn the energy function, which is optimized via score matching to capture the gradient of the joint log-probability of tensor entries and shared factors. Our method allows for modeling structures and distributions beyond the Dirac delta assumption. Moreover, integrating the block coordinate descent (BCD) algorithm with the proposed smooth regularization enables the model to perform both tensor completion and denoising. Experimental results demonstrate significant performance improvements across various tensor types, including sparse and continuous-time tensors, as well as visual data.
Submitted 27 June, 2025;
originally announced June 2025.
-
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration
Authors:
Yu-Cheng Lin,
Yu-Syuan Xu,
Hao-Wei Chen,
Hsien-Kai Kuo,
Chun-Yi Lee
Abstract:
Image restoration is a key task in low-level computer vision that aims to reconstruct high-quality images from degraded inputs. The emergence of Vision Mamba, which draws inspiration from the advanced state space model Mamba, marks a significant advancement in this field. Vision Mamba demonstrates excellence in modeling long-range dependencies with linear complexity, a crucial advantage for image restoration tasks. Despite its strengths, Vision Mamba encounters challenges in low-level vision tasks, including computational complexity that scales with the number of scanning sequences and local pixel forgetting. To address these limitations, this study introduces Efficient All-Around Mamba (EAMamba), an enhanced framework that incorporates a Multi-Head Selective Scan Module (MHSSM) with an all-around scanning mechanism. MHSSM efficiently aggregates multiple scanning sequences, which avoids increases in computational complexity and parameter count. The all-around scanning strategy implements multiple patterns to capture holistic information and resolves the local pixel forgetting issue. Our experimental evaluations validate these innovations across several restoration tasks, including super resolution, denoising, deblurring, and dehazing. The results validate that EAMamba achieves a significant 31-89% reduction in FLOPs while maintaining favorable performance compared to existing low-level Vision Mamba methods.
Submitted 27 June, 2025;
originally announced June 2025.
-
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
Authors:
Jiahui Zhang,
Yurui Chen,
Yueming Xu,
Ze Huang,
Yanpeng Zhou,
Yu-Jie Yuan,
Xinyue Cai,
Guowei Huang,
Xingyue Quan,
Hang Xu,
Li Zhang
Abstract:
Leveraging diverse robotic data for pretraining remains a critical challenge. Existing methods typically model the dataset's action distribution using simple observations as inputs. However, these inputs are often incomplete, resulting in a dispersed conditional action distribution, an issue we refer to as coordinate system chaos and state chaos. This inconsistency significantly hampers pretraining efficiency. To address this, we propose 4D-VLA, a novel approach that effectively integrates 4D information into the input to mitigate these sources of chaos. Our model introduces depth and temporal information into visual features with sequential RGB-D inputs, aligning the coordinate systems of the robot and the scene. This alignment endows the model with strong spatiotemporal reasoning capabilities while minimizing training overhead. Additionally, we introduce memory bank sampling, a frame sampling strategy designed to extract informative frames from historical images, further improving effectiveness and efficiency. Experimental results demonstrate that our pretraining method and architectural components substantially enhance model performance. In both simulated and real-world experiments, our model achieves a significant increase in success rate over OpenVLA. To further assess spatial perception and generalization to novel views, we introduce MV-Bench, a multi-view simulation benchmark. Our model consistently outperforms existing methods, demonstrating stronger spatial understanding and adaptability.
Submitted 27 June, 2025;
originally announced June 2025.
-
Function space induced by no arbitrage
Authors:
Kihun Nam,
Yunxi Xu
Abstract:
In this article, we show necessary and sufficient conditions for a function to transform a continuous Markov semimartingale to a semimartingale. As a result, the no-arbitrage principle guarantees the differentiability of asset prices with respect to the underlying noise, if the asset prices are continuous and the underlying noise is a continuous Markov semimartingale.
Submitted 27 June, 2025;
originally announced June 2025.
-
Low-Rank Implicit Neural Representation via Schatten-p Quasi-Norm and Jacobian Regularization
Authors:
Zhengyun Cheng,
Changhao Wang,
Guanwen Zhang,
Yi Xu,
Wei Zhou,
Xiangyang Ji
Abstract:
Higher-order tensors are well-suited for representing multi-dimensional data, such as color images and videos. Low-rank tensor representation has become essential in machine learning and computer vision, but existing methods like Tucker decomposition offer flexibility at the expense of interpretability. In contrast, while the CANDECOMP/PARAFAC (CP) decomposition provides a more natural and interpretable tensor structure, obtaining sparse solutions remains challenging. Leveraging the rich properties of CP decomposition, we propose a CP-based low-rank tensor function parameterized by neural networks for implicit neural representation (CP-INR). This approach enables continuous data representation beyond structured grids, fully exploiting the non-linearity of tensor data with theoretical guarantees on excess risk bounds. To achieve a sparse CP decomposition, we introduce a variational form of the Schatten-p quasi-norm and prove its relationship to multilinear rank minimization. For smoothness, we propose a regularization term based on the spectral norm of the Jacobian and Hutchinson's trace estimator. Our proposed smoothness regularization is SVD-free and avoids explicit chain rule derivations. It can serve as an alternative to Total Variation (TV) regularization in image denoising tasks and is naturally applicable to continuous data. Extensive experiments on multi-dimensional data recovery tasks, including image inpainting, denoising, and point cloud upsampling, demonstrate the superiority and versatility of our method compared to state-of-the-art approaches.
Submitted 27 June, 2025;
originally announced June 2025.
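The abstract above mentions Hutchinson's trace estimator as an ingredient of its smoothness regularizer. As background only, here is a generic sketch of that estimator, which approximates tr(A) by averaging v^T A v over random sign vectors v using nothing but matrix-vector products. The function names `hutchinson_trace` and `make_matvec` are hypothetical, and the sketch does not reproduce the paper's Jacobian-based regularization term.

```python
import random


def hutchinson_trace(matvec, dim, num_samples=1000, seed=0):
    """Estimate tr(A) as the mean of v^T (A v) over Rademacher vectors v.

    matvec(v) must return A v; A itself is never formed explicitly, which is
    what makes the estimator useful when only products with A are available.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]  # random signs
        av = matvec(v)
        total += sum(vi * ai for vi, ai in zip(v, av))
    return total / num_samples


def make_matvec(a):
    """Wrap a dense matrix (list of rows) as a matrix-vector product."""
    return lambda v: [sum(row[j] * v[j] for j in range(len(v))) for row in a]


if __name__ == "__main__":
    a = [[2.0, 1.0], [1.0, 3.0]]  # true trace is 5
    print(hutchinson_trace(make_matvec(a), dim=2, num_samples=10000))
```

With sign vectors the diagonal contribution is exact for every sample, so the estimator's variance comes only from the off-diagonal entries of A.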
-
Lost at the Beginning of Reasoning
Authors:
Baohao Liao,
Xinyi Chen,
Sara Rajaee,
Yuhui Xu,
Christian Herold,
Anders Søgaard,
Maarten de Rijke,
Christof Monz
Abstract:
Recent advancements in large language models (LLMs) have significantly improved complex reasoning capabilities, particularly through extended chain-of-thought (CoT) reasoning that incorporates mechanisms such as backtracking, self-reflection, and self-correction. Despite these developments, the self-correction abilities of LLMs during long CoT reasoning remain underexplored. Moreover, recent findings on overthinking suggest that such models often engage in unnecessarily redundant reasoning. In this work, we empirically show that the first reasoning step exerts a disproportionately large influence on the final prediction: errors introduced at this stage can substantially degrade subsequent reasoning quality. This phenomenon is consistently observed across two state-of-the-art open-source reasoning model families: DeepSeek-R1 and Qwen3. To address this, we propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps while discarding suboptimal ones, achieving up to a 70% reduction in inference cost without sacrificing accuracy. Finally, we introduce a new benchmark specifically constructed with deliberately flawed first reasoning steps to systematically evaluate model self-correction capabilities, offering a foundation for future research on robust reasoning in LLMs.
Submitted 27 June, 2025;
originally announced June 2025.
-
Joint Task Offloading and Resource Allocation in Low-Altitude MEC via Graph Attention Diffusion
Authors:
Yifan Xue,
Ruihuai Liang,
Bo Yang,
Xuelin Cao,
Zhiwen Yu,
Mérouane Debbah,
Chau Yuen
Abstract:
With the rapid development of the low-altitude economy, air-ground integrated multi-access edge computing (MEC) systems are facing increasing demands for real-time and intelligent task scheduling. In such systems, task offloading and resource allocation encounter multiple challenges, including node heterogeneity, unstable communication links, and dynamic task variations. To address these issues, this paper constructs a three-layer heterogeneous MEC system architecture for low-altitude economic networks, encompassing aerial and ground users as well as edge servers. The system is systematically modeled from the perspectives of communication channels, computational costs, and constraint conditions, and the joint optimization problem of offloading decisions and resource allocation is uniformly abstracted into a graph-structured modeling task. On this basis, we propose a graph attention diffusion-based solution generator (GADSG). This method integrates the contextual awareness of graph attention networks with the solution distribution learning capability of diffusion models, enabling joint modeling and optimization of discrete offloading variables and continuous resource allocation variables within a high-dimensional latent space. We construct multiple simulation datasets with varying scales and topologies. Extensive experiments demonstrate that the proposed GADSG model significantly outperforms existing baseline methods in terms of optimization performance, robustness, and generalization across task structures, showing strong potential for efficient task scheduling in dynamic and complex low-altitude economic network environments.
Submitted 27 June, 2025;
originally announced June 2025.
-
Lightweight Fingernail Haptic Device: Unobstructed Fingerpad Force and Vibration Feedback for Enhanced Virtual Dexterous Manipulation
Authors:
Yunxiu Xu,
Siyu Wang,
Shoichi Hasegawa
Abstract:
This study presents a lightweight, wearable fingertip haptic device that provides physics-based haptic feedback for dexterous manipulation in virtual environments without hindering real-world interactions. The device, designed with thin strings and actuators attached to the fingernails, ensures minimal weight (1.55 g per finger) and preserves finger flexibility. Integrating the software with a physics engine renders multiple types of haptic feedback (grip force, collision, and sliding vibration feedback). We evaluated the device's performance in pressure perception, slip feedback, typical dexterous manipulation tasks, and daily operations, and we gathered user experience through subjective assessments. Our results show that participants could perceive and respond to pressure and vibration feedback. Through dexterous manipulation experiments, we further demonstrated that these minimal haptic cues significantly improved virtual task efficiency, showcasing how lightweight haptic feedback can enhance manipulation performance without complex mechanisms. The device's ability to preserve tactile sensations and minimize hindrance to real-world operations is a key advantage over glove-type haptic devices. This research offers a potential solution for designing haptic interfaces that balance lightweight construction, haptic feedback for dexterous manipulation, and daily wearability.
Submitted 26 June, 2025;
originally announced June 2025.