Search | arXiv e-print repository

Pinching-Antenna-Assisted Index Modulation: Channel Modeling, Transceiver Design, and Performance Analysis

Authors: Shuaixin Yang, Yijia Li, Yue Xiao, Yong Liang Guan, Xianfu Lei, Zhiguo Ding

Abstract: In this paper, a novel pinching-antenna assisted index modulation (PA-IM) scheme is proposed for improving the spectral efficiency without increasing the hardware complexity, where the information bits are conveyed not only by the conventional M-ary quadrature amplitude modulation (QAM) symbols but also by the indices of pinching antenna (PA) position patterns. To realize the full potential of this scheme, this paper focuses on the comprehensive transceiver design, addressing key challenges in signal detection at the receiver and performance optimization at the transmitter. First, a comprehensive channel model is formulated for this architecture, which sophisticatedly integrates the deterministic in-waveguide propagation effects with the stochastic nature of wireless channels, including both large-scale path loss and small-scale fading. Next, to overcome the prohibitive complexity of optimal maximum likelihood (ML) detection, a low-complexity box-optimized sphere decoding (BOSD) algorithm is designed, which adaptively prunes the search space whilst preserving optimal ML performance. Furthermore, an analytical upper bound on the bit error rate (BER) is derived and validated by the simulations. Moreover, a new transmit precoding method is designed using manifold optimization, which minimizes the BER by jointly optimizing the complex-valued precoding coefficients across the waveguides for the sake of maximizing the minimum Euclidean distance of all received signal points. Finally, the simulation results demonstrate that the proposed PA-IM scheme attains a significant performance gain over its conventional counterparts and that the overall BER of the pinching-antenna system is substantially improved by the proposed precoding design.

Submitted 3 July, 2025; originally announced July 2025.

arXiv:2507.02348 [pdf, ps, other]

Joint Radiation Power, Antenna Position, and Beamforming Optimization for Pinching-Antenna Systems with Motion Power Consumption

Authors: Yiming Xu, Dongfang Xu, Xianghao Yu, Shenghui Song, Zhiguo Ding, Robert Schober

Abstract: Pinching-antenna systems (PASS) have been recently proposed to improve the performance of wireless networks by reconfiguring both the large-scale and small-scale channel conditions. However, existing studies ignore the physical constraints of antenna placement and assume fixed antenna radiation power. To fill this research gap, this paper investigates the design of PASS taking into account the motion power consumption of pinching-antennas (PAs) and the impact of adjustable antenna radiation power. To that end, we minimize the average power consumption for a given quality-of-service (QoS) requirement, by jointly optimizing the antenna positions, antenna radiation power ratios, and transmit beamforming. To the best of the authors' knowledge, this is the first work to consider radiation power optimization in PASS, which provides an additional degree of freedom (DoF) for system design. The cases with both continuous and discrete antenna placement are considered, where the main challenge lies in the fact that the antenna positions affect both the magnitude and phase of the channel coefficients of PASS, making system optimization very challenging. To tackle the resulting unique obstacles, an alternating direction method of multipliers (ADMM)-based framework is proposed to solve the problem for continuous antenna movement, while its discrete counterpart is formulated as a mixed integer nonlinear programming (MINLP) problem and solved by the block coordinate descent (BCD) method. Simulation results validate the performance enhancement achieved by incorporating PA movement power assumption and adjustable radiation power into PASS design, while also demonstrating the efficiency of the proposed optimization framework. The benefits of PASS over conventional multiple-input multiple-output (MIMO) systems in mitigating the large-scale path loss and inter-user interference are also revealed.

Submitted 3 July, 2025; originally announced July 2025.

Comments: 13 pages

arXiv:2507.01716 [pdf, ps, other]

Embedding a Praeger-Xu graph into a surface

Authors: Zhaochen Ding, Zheng Guo, Luyi Liu

Abstract: Rotary maps (orientably regular maps) are highly symmetric graph embeddings on orientable surfaces. This paper classifies all rotary maps whose underlying graphs are Praeger-Xu graphs, denoted $\operatorname{C}(p,r,s)$, for any odd prime $p$ that does not divide $r$. Our main result establishes a one-to-one correspondence between the isomorphism classes of these maps and the multiplicity-free representations of the dihedral group $\operatorname{D}_{2r}$ over the finite field $\mathbb{F}_p$. This work extends a recent classification for the case where $p=2$.

Submitted 2 July, 2025; originally announced July 2025.

Comments: 27 pages

MSC Class: 05C25; 20B05; 20C15

arXiv:2507.01593 [pdf, ps, other]

DESI DR2 reference mocks: clustering results from Uchuu-BGS and LRG

Authors: E. Fernández-García, F. Prada, A. Smith, J. DeRose, A. J. Ross, S. Bailey, M. S. Wang, Z. Ding, C. Guandalin, C. Lamman, R. Vaisakh, R. Kehoe, J. Lasker, T. Ishiyama, S. M. Moore, S. Cole, M. Siudek, A. Amalbert, A. Salcedo, A. Hearin, B. Joachimi, A. Rocher, S. Saito, A. Krolewski, Z. Slepian , et al. (42 additional authors not shown)

Abstract: The aim of this work is to construct mock galaxy catalogues that accurately reproduce the redshift evolution of galaxy number density, clustering statistics, and baryonic properties, such as stellar mass for luminous red galaxies (LRGs) and absolute magnitude in the $r$-band for the bright galaxy sample (BGS), based on the first three years of observations from the Dark Energy Spectroscopic Instrument (DESI). To achieve this, we applied the subhalo abundance matching (SHAM) technique to the Uchuu $N$-body simulation, which follows the evolution of 2.1 trillion particles within a volume of $8\,h^{-3}\,\mathrm{Gpc}^{3}$, assuming a Planck base-$Λ$CDM cosmology. Using SHAM, we populated Uchuu subhalos with LRGs and BGS-BRIGHT ($r<19.5$) galaxies up to redshift $z=1.1$, assigning stellar masses to LRGs and luminosities to BGS galaxies (up to $M_{\rm r}\leq 20$). Furthermore, we analyzed the clustering dependence on stellar mass and luminosity for each tracer. Our results show that the Uchuu BGS-BRIGHT and LRG mocks accurately reproduce the observed redshift evolution of clustering, with better than 5\% agreement for separations of $1<r<20\,h^{-1}\,\mathrm{Mpc}$ and below 10\% for $0.1<r<1\,h^{-1}\,\mathrm{Mpc}$. For the Uchuu-LRG mock, we successfully captured the stellar mass dependence of clustering, while for the Uchuu-BGS mock, we replicated the clustering for various volume-limited subsamples. We also find good agreement between the data and mocks in the dependence of large-scale bias on luminosity for BGS-BRIGHT galaxies and on stellar mass for LRGs. Altogether, these results equip DESI with robust tools for generating high-fidelity lightcones for the remainder of the survey, thereby enhancing our understanding of the galaxy--halo connection.

Submitted 2 July, 2025; originally announced July 2025.

Comments: 22 pages, 14 figures

arXiv:2507.00954 [pdf, ps, other]

Inverse Velocity Dispersion of Solar Energetic Protons Observed by Solar Orbiter and Its Shock Acceleration Explanation

Authors: Yuncong Li, Jingnan Guo, Daniel Pacheco, Yuming Wang, Manuela Temmer, Zheyi Ding, Robert F. Wimmer-Schweingruber

Abstract: The particle acceleration and transport process during solar eruptions is one of the critical and long-standing problems in space plasma physics. Through decades of research, it is well accepted that particles with higher energies released during a solar eruption arrive at observers earlier than the particles with lower energies, forming a well-known structure in the dynamic energy spectrum called particle velocity dispersion (VD), as frequently observed by space missions. However, this picture is challenged by new observations from NASA's Parker Solar Probe and ESA's Solar Orbiter which show an unexpected inverse velocity dispersion (IVD) phenomenon, where particles with higher energies arrive later at the observer. Facing this challenge, we here report the recent discovery of such IVD structures with 10 solar energetic proton events observed by Solar Orbiter, and then analyze the mechanisms causing this unusual phenomenon. We suggest that shock diffusive acceleration, with respect to magnetic reconnection, is probably a dominant mechanism to accelerate protons to tens of MeV in such events where particles need longer time to reach higher energies. And we determine, innovatively, the physical conditions and time scales during the actual shock acceleration process that cannot be observed directly.

Submitted 1 July, 2025; originally announced July 2025.

arXiv:2507.00042 [pdf, ps, other]

Catastrophic Forgetting Mitigation via Discrepancy-Weighted Experience Replay

Authors: Xinrun Xu, Jianwen Yang, Qiuhong Zhang, Zhanbiao Lian, Zhiming Ding, Shan Jiang

Abstract: Continually adapting edge models in cloud-edge collaborative object detection for traffic monitoring suffers from catastrophic forgetting, where models lose previously learned knowledge when adapting to new data distributions. This is especially problematic in dynamic traffic environments characterised by periodic variations (e.g., day/night, peak hours), where past knowledge remains valuable. Existing approaches like experience replay and visual prompts offer some mitigation, but struggle to effectively prioritize and leverage historical data for optimal knowledge retention and adaptation. Specifically, simply storing and replaying all historical data can be inefficient, while treating all historical experiences as equally important overlooks their varying relevance to the current domain. This paper proposes ER-EMU, an edge model update algorithm based on adaptive experience replay, to address these limitations. ER-EMU utilizes a limited-size experience buffer managed using a First-In-First-Out (FIFO) principle, and a novel Domain Distance Metric-based Experience Selection (DDM-ES) algorithm. DDM-ES employs the multi-kernel maximum mean discrepancy (MK-MMD) to quantify the dissimilarity between target domains, prioritizing the selection of historical data that is most dissimilar to the current target domain. This ensures training diversity and facilitates the retention of knowledge from a wider range of past experiences, while also preventing overfitting to the new domain. The experience buffer is also updated using a simple random sampling strategy to maintain a balanced representation of previous domains. Experiments on the Bellevue traffic video dataset, involving repeated day/night cycles, demonstrate that ER-EMU consistently improves the performance of several state-of-the-art cloud-edge collaborative object detection frameworks.

Submitted 23 June, 2025; originally announced July 2025.

Comments: ICANN 2025

arXiv:2506.24070 [pdf, ps, other]

Spectroscopy of drive-induced unwanted state transitions in superconducting circuits

Authors: W. Dai, S. Hazra, D. K. Weiss, P. D. Kurilovich, T. Connolly, H. K. Babla, S. Singh, V. R. Joshi, A. Z. Ding, P. D. Parakh, J. Venkatraman, X. Xiao, L. Frunzio, M. H. Devoret

Abstract: Microwave drives are essential for implementing control and readout operations in superconducting quantum circuits. However, increasing the drive strength eventually leads to unwanted state transitions which limit the speed and fidelity of such operations. In this work, we systematically investigate such transitions in a fixed-frequency qubit subjected to microwave drives spanning a 9 GHz frequency range. We identify the physical origins of these transitions and classify them into three categories. (1) Resonant energy exchange with parasitic two-level systems, activated by drive-induced ac-Stark shifts, (2) multi-photon transitions to non-computational states, intrinsic to the circuit Hamiltonian, and (3) inelastic scattering processes in which the drive causes a state transition in the superconducting circuit, while transferring excess energy to a spurious electromagnetic mode or two-level system (TLS) material defect. We show that the Floquet steady-state simulation, complemented by an electromagnetic simulation of the physical device, accurately predicts the observed transitions that do not involve TLS. Our results provide a comprehensive classification of these transitions and offer mitigation strategies through informed choices of drive frequency as well as improved circuit design.

Submitted 30 June, 2025; originally announced June 2025.

Comments: 16 figures

arXiv:2506.23966 [pdf, ps, other]

Pinching-Antenna Systems with In-Waveguide Attenuation: Performance Analysis and Algorithm Design

Authors: Yanqing Xu, Zhiguo Ding, Robert Schober, Tsung-Hui Chang

Abstract: Pinching-antenna systems have emerged as a promising flexible-antenna architecture for next-generation wireless networks, enabling enhanced adaptability and user-centric connectivity through antenna repositioning along waveguides. However, existing studies often overlook in-waveguide signal attenuation, and in the literature there is no comprehensive analysis on whether and under what conditions such an assumption is justified. This paper addresses this gap by explicitly incorporating in-waveguide attenuation into both the system model and algorithm design, and studying its impact on the downlink user data rates. We begin with a single-user scenario and derive a closed-form expression for the globally optimal antenna placement, which reveals how the attenuation coefficient and the user-to-waveguide distance jointly affect the optimal antenna position. Based on this analytical solution, we further provide a theoretical analysis identifying the system conditions under which the in-waveguide attenuation has an insignificant impact on the user achievable rate. The study is then extended to the multi-user multiple-input multiple-output setting, where two efficient algorithms are developed, based on the weighted minimum mean square error method and the maximum ratio combining method, to jointly optimize beamforming and antenna placement. Simulation results validate the efficacy of the proposed algorithms and demonstrate that pinching-antenna systems substantially outperform conventional fixed-antenna baselines, underscoring their potential for future flexible wireless communications.

Submitted 30 June, 2025; originally announced June 2025.

Comments: This paper aims to address a fundamental question in pinching-antenna systems: Can in-waveguide attenuation be safely ignored without causing significant performance degradation? Our analytical results provide a clear answer -- YES, provided that certain mild and practically realizable conditions on the system parameters are satisfied

arXiv:2506.23495 [pdf, ps, other]

Far-Field vs. Near-Field Propagation Channels: Key Differences and Impact on 6G XL-MIMO Performance Evaluation

Authors: Zihang Ding, Jianhua Zhang, Changsheng You, Pan Tang, Hongbo Xing, Zhiqiang Yuan, Jie Meng, Guangyi Liu

Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is regarded as a promising technology for next-generation communication systems. However, this will expand the near-field (NF) range, making more users likely to be located in the NF region. In this paper, we aim to answer two questions: What are the new characteristics of the NF channel? Is it necessary to develop new transceiver techniques to maintain system performance within the NF region? To this end, we first review current NF channel models and analyze the differences between the existing 3GPP TR 38.901 channel model and the NF channel model, including the spherical wavefront and spatial non-stationarity. Then, we provide examples on how these differences affect the XL-MIMO system performance in terms of beamforming gain and achievable rate. Simulation results demonstrate that, when using far-field (FF) technique under the NF channel, the maximum normalized beam gain loss is less than 3 dB for most users in the NF region defined by Rayleigh distance. Moreover, the achievable rate loss of beam training is less than 3% compared to that realized by NF technique. Finally, we demonstrate the necessity of employing NF transceiver techniques based on simulation results.

Submitted 29 June, 2025; originally announced June 2025.

Comments: 13 pages, 8 figures, 2 tables, 52 references. Note: This article has been submitted to China Communications and is currently under review

arXiv:2506.23490 [pdf, ps, other]

UltraTwin: Towards Cardiac Anatomical Twin Generation from Multi-view 2D Ultrasound

Authors: Junxuan Yu, Yaofei Duan, Yuhao Huang, Yu Wang, Rongbo Ling, Weihao Luo, Ang Zhang, Jingxian Xu, Qiongying Ni, Yongsong Zhou, Binghan Li, Haoran Dou, Liping Liu, Yanfen Chu, Feng Geng, Zhe Sheng, Zhifeng Ding, Dingxin Zhang, Rui Huang, Yuhang Zhang, Xiaowei Xu, Tao Tan, Dong Ni, Zhongshan Gou, Xin Yang

Abstract: Echocardiography is routine for cardiac examination. However, 2D ultrasound (US) struggles with accurate metric calculation and direct observation of 3D cardiac structures. Moreover, 3D US is limited by low resolution, small field of view and scarce availability in practice. Constructing the cardiac anatomical twin from 2D images is promising to provide precise treatment planning and clinical quantification. However, it remains challenging due to the rare paired data, complex structures, and US noises. In this study, we introduce a novel generative framework UltraTwin, to obtain cardiac anatomical twin from sparse multi-view 2D US. Our contribution is three-fold. First, we pioneered the construction of a real-world and high-quality dataset containing strictly paired multi-view 2D US and CT, and pseudo-paired data. Second, we propose a coarse-to-fine scheme to achieve hierarchical reconstruction optimization. Last, we introduce an implicit autoencoder for topology-aware constraints. Extensive experiments show that UltraTwin reconstructs high-quality anatomical twins versus strong competitors. We believe it advances anatomical twin modeling for potential applications in personalized cardiac care.

Submitted 29 June, 2025; originally announced June 2025.

Comments: Accepted by MICCAI 2025

arXiv:2506.22813 [pdf, ps, other]

Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models

Authors: Zhuojun Ding, Wei Wei, Chenghao Fan

Abstract: Supervised fine-tuning (SFT) is widely used to align large language models (LLMs) with information extraction (IE) tasks, such as named entity recognition (NER). However, annotating such fine-grained labels and training domain-specific models is costly. Existing works typically train a unified model across multiple domains, but such approaches lack adaptation and scalability since not all training data benefits target domains and scaling trained models remains challenging. We propose the SaM framework, which dynamically Selects and Merges expert models at inference time. Specifically, for a target domain, we select domain-specific experts pre-trained on existing domains based on (i) domain similarity to the target domain and (ii) performance on sampled instances, respectively. The experts are then merged to create task-specific models optimized for the target domain. By dynamically merging experts beneficial to target domains, we improve generalization across various domains without extra training. Additionally, experts can be added or removed conveniently, leading to great scalability. Extensive experiments on multiple benchmarks demonstrate our framework's effectiveness, which outperforms the unified model by an average of 10%. We further provide insights into potential improvements, practical experience, and extensions of our framework.

Submitted 28 June, 2025; originally announced June 2025.

arXiv:2506.21101 [pdf, ps, other]

OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography

Authors: Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, AndyPian Wu, Chaoyang Wang, Chengjie Wang, Taisong Jin, SevenShu, Yunsheng Wu, Yongge Liu, Rongrong Ji

Abstract: As one of the earliest ancient languages, Oracle Bone Script (OBS) encapsulates the cultural records and intellectual expressions of ancient civilizations. Despite the discovery of approximately 4,500 OBS characters, only about 1,600 have been deciphered. The remaining undeciphered ones, with their complex structure and abstract imagery, pose significant challenges for interpretation. To address these challenges, this paper proposes a novel two-stage semantic typography framework, named OracleFusion. In the first stage, this approach leverages the Multimodal Large Language Model (MLLM) with enhanced Spatial Awareness Reasoning (SAR) to analyze the glyph structure of the OBS character and perform visual localization of key components. In the second stage, we introduce Oracle Structural Vector Fusion (OSVF), incorporating glyph structure constraints and glyph maintenance constraints to ensure the accurate generation of semantically enriched vector fonts. This approach preserves the objective integrity of the glyph structure, offering visually enhanced representations that assist experts in deciphering OBS. Extensive qualitative and quantitative experiments demonstrate that OracleFusion outperforms state-of-the-art baseline models in terms of semantics, visual appeal, and glyph maintenance, significantly enhancing both readability and aesthetic quality. Furthermore, OracleFusion provides expert-like insights on unseen oracle characters, making it a valuable tool for advancing the decipherment of OBS.

Submitted 26 June, 2025; originally announced June 2025.

Comments: Accepted to ICCV 2025

arXiv:2506.19171 [pdf, ps, other]

Distilling Tool Knowledge into Language Models via Back-Translated Traces

Authors: Xingyue Huang, Xianglong Hu, Zifeng Ding, Yuan He, Rishabh, Waleed Alzarooni, Ziyu Ye, Wendong Fan, Bailan He, Haige Bo, Changran Hu, Guohao Li

Abstract: Large language models (LLMs) often struggle with mathematical problems that require exact computation or multi-step algebraic reasoning. Tool-integrated reasoning (TIR) offers a promising solution by leveraging external tools such as code interpreters to ensure correctness, but it introduces inference-time dependencies that hinder scalability and deployment. In this work, we propose a new paradigm for distilling tool knowledge into LLMs purely through natural language. We first construct a Solver Agent that solves math problems by interleaving planning, symbolic tool calls, and reflective reasoning. Then, using a back-translation pipeline powered by multiple LLM-based agents, we convert interleaved TIR traces into natural language reasoning traces. A Translator Agent generates explanations for individual tool calls, while a Rephrase Agent merges them into a fluent and globally coherent narrative. Empirically, we show that fine-tuning a small open-source model on these synthesized traces enables it to internalize both tool knowledge and structured reasoning patterns, yielding gains on competition-level math benchmarks without requiring tool access at inference.

Submitted 23 June, 2025; originally announced June 2025.

Comments: Accepted in Workshop in Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures, ICML 2025

arXiv:2506.17559 [pdf, ps, other]

Joint Transmission for Cellular Networks with Pinching Antennas: System Design and Analysis

Authors: Enzhi Zhou, Jingjing Cui, Ziyue Liu, Zhiguo Ding, Pingzhi Fan

Abstract: As an emerging flexible antenna technology for wireless communications, pinching-antenna systems offer distinct advantages in terms of cost efficiency and deployment flexibility. This paper investigates joint transmission strategies of the base station (BS) and pinching antennas (PAS), focusing specifically on how the BS and waveguide-mounted pinching antennas can cooperate efficiently to enhance the performance of the user equipment (UE). By jointly considering the performance, flexibility, and complexity, we propose three joint BS-PAS transmission schemes along with the best beamforming designs, namely standalone deployment (SD), semi-cooperative deployment (SCD) and full-cooperative deployment (FCD). More specifically, for each BS-PAS joint transmission scheme, we conduct a comprehensive performance analysis in terms of the power allocation strategy, beamforming design, and practical implementation considerations. We also derive closed-form expressions for the average received SNR across the proposed BS-PAS joint transmission schemes, which are verified through Monte Carlo simulations. Finally, numerical results demonstrate that deploying pinching antennas in cellular networks, particularly through cooperation between the BS and PAS, can achieve significant performance gains. We further identify and characterize the key network parameters that influence the performance, providing insights for deploying pinching antennas.

Submitted 20 June, 2025; originally announced June 2025.

arXiv:2506.14317 [pdf, ps, other]

ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes

Authors: Zeyuan Chen, Qiyang Yan, Yuanpei Chen, Tianhao Wu, Jiyao Zhang, Zihan Ding, Jinzhou Li, Yaodong Yang, Hao Dong

Abstract: Dexterous grasping in cluttered scenes presents significant challenges due to diverse object geometries, occlusions, and potential collisions. Existing methods primarily focus on single-object grasping or grasp-pose prediction without interaction, which are insufficient for complex, cluttered scenes. Recent vision-language-action models offer a potential solution but require extensive real-world demonstrations, making them costly and difficult to scale. To address these limitations, we revisit the sim-to-real transfer pipeline and develop key techniques that enable zero-shot deployment in reality while maintaining robust generalization. We propose ClutterDexGrasp, a two-stage teacher-student framework for closed-loop target-oriented dexterous grasping in cluttered scenes. The framework features a teacher policy trained in simulation using clutter density curriculum learning, incorporating both a geometry and spatially-embedded scene representation and a novel comprehensive safety curriculum, enabling general, dynamic, and safe grasping behaviors. Through imitation learning, we distill the teacher's knowledge into a student 3D diffusion policy (DP3) that operates on partial point cloud observations. To the best of our knowledge, this represents the first zero-shot sim-to-real closed-loop system for target-oriented dexterous grasping in cluttered scenes, demonstrating robust performance across diverse objects and layouts. More details and videos are available at https://clutterdexgrasp.github.io/.

Submitted 19 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

arXiv:2506.14298 [pdf, ps, other]

Capacity Characterization of Pinching-Antenna Systems

Authors: Chongjun Ouyang, Zhaolin Wang, Yuanwei Liu, Zhiguo Ding

Abstract: Unlike conventional systems using a fixed-location antenna, the channel capacity of the pinching-antenna system (PASS) is determined by the activated positions of pinching antennas. This article characterizes the capacity region of multiuser PASS, where a single pinched waveguide is deployed to enable both uplink and downlink communications. The capacity region of the uplink channel is first characterized. (i) For the single-pinch case, closed-form expressions are derived for the optimal antenna activation position, along with the corresponding capacity region and the achievable data rate regions under time-division multiple access (TDMA) and frequency-division multiple access (FDMA). It is proven that the capacity region of PASS encompasses that of conventional fixed-antenna systems, and that the FDMA rate region contains the TDMA rate region. (ii) For the multiple-pinch case, inner and outer bounds on the capacity region are derived using an element-wise alternating antenna position optimization technique and the Cauchy-Schwarz inequality, respectively. The achievable FDMA rate region is also derived using the same optimization framework, while the TDMA rate region is obtained through an antenna position refinement approach. The analysis is then extended to the downlink PASS using the uplink-downlink duality framework. It is proven that the relationships among the downlink capacity and rate regions are consistent with those in the uplink case. Numerical results demonstrate that: (i) the derived bounds closely approximate the exact capacity region, (ii) PASS yields a significantly enlarged capacity region compared to conventional fixed-antenna systems, and (iii) in the multiple-pinch case, TDMA and FDMA are capable of approaching the channel capacity limit.

Submitted 17 June, 2025; originally announced June 2025.

Comments: submit to possible IEEE journal

arXiv:2506.14142 [pdf, ps, other]

RadFabric: Agentic AI System with Reasoning Capability for Radiology

Authors: Wenting Chen, Yi Dong, Zhaojun Ding, Yucheng Shi, Yifan Zhou, Fang Zeng, Yijun Luo, Tianyu Lin, Yihang Su, Yichen Wu, Kai Zhang, Zhen Xiang, Tianming Liu, Ninghao Liu, Lichao Sun, Yixuan Yuan, Xiang Li

Abstract: Chest X-ray (CXR) imaging remains a critical diagnostic tool for thoracic conditions, but current automated systems face limitations in pathology coverage, diagnostic accuracy, and integration of visual and textual reasoning. To address these gaps, we propose RadFabric, a multi-agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation. RadFabric is built on the Model Context Protocol (MCP), enabling modularity, interoperability, and scalability for seamless integration of new diagnostic agents. The system employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent and evidence-based diagnoses. RadFabric achieves significant performance improvements, with near-perfect detection of challenging pathologies like fractures (1.000 accuracy) and superior overall diagnostic accuracy (0.799) compared to traditional systems (0.229 to 0.527). By integrating cross-modal feature alignment and preference-driven reasoning, RadFabric advances AI-driven radiology toward transparent, anatomically precise, and clinically actionable CXR analysis.

Submitted 16 June, 2025; originally announced June 2025.

Comments: 4 figures, 2 tables

arXiv:2506.13448 [pdf]

Electronic Correlations Control Interlayer Coupling and Magnetic Transition in MnBi$_2$Te$_4$/MnBr$_3$ Heterostructure

Authors: Yuanhao Zhu, Xixi Yuan, Ying Zhao, Jin Zhang, Zijing Ding, Huixia Fu

Abstract: Bulk MnBi$_2$Te$_4$ (MBT) is an intrinsic antiferromagnetic topological insulator. However, its low Néel temperature of $\sim 25\,\mathrm{K}$ severely restricts its practical applications. Here, we propose a van der Waals heterostructure composed of monolayer MBT (ML-MBT) and monolayer MnBr$_3$, an intrinsic Chern insulator possessing a high Curie temperature ($T_\mathrm{C} \sim 200\,\mathrm{K}$). By employing density functional theory calculations and Monte Carlo simulations, we demonstrate that interfacing ML-MBT with MnBr$_3$ significantly enhances the $T_\mathrm{C}$ of ML-MBT by a factor of four to five. Electronic correlations characterized by the Hubbard parameter $U_2$ for Mn-$d$ orbitals in MnBr$_3$ play a crucial role in governing magnetic coupling within the system. At a moderate correlation strength of $U_2 = 3.0\,\mathrm{eV}$, slight structural distortions in MnBr$_3$ break intralayer symmetry, enabling robust interlayer ferromagnetic coupling and yielding a single, unified magnetic transition. Increasing $U_2$ reduces these structural distortions, weakens interlayer coupling, and induces two distinct magnetic transitions, indicating interlayer magnetic decoupling. Thus, the MBT/MnBr$_3$ heterostructure offers a novel approach for controlling magnetic order and enhancing the performance of spintronic devices.

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.13055 [pdf, ps, other]

CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model

Authors: Jiangtong Li, Yiyun Zhu, Dawei Cheng, Zhijun Ding, Changjun Jiang

Abstract: Multimodal Large Language Models (MLLMs) have rapidly evolved with the growth of Large Language Models (LLMs) and are now applied in various fields. In finance, the integration of diverse modalities such as text, charts, and tables is crucial for accurate and efficient decision-making. Therefore, an effective evaluation system that incorporates these data types is essential for advancing financial applications. In this paper, we introduce CFBenchmark-MM, a Chinese multimodal financial benchmark with over 9,000 image-question pairs featuring tables, histogram charts, line charts, pie charts, and structural diagrams. Additionally, we develop a staged evaluation system to assess MLLMs in handling multimodal information by providing different visual content step by step. Although MLLMs have inherent financial knowledge, experimental results still show limited efficiency and robustness in handling multimodal financial context. Further analysis of incorrect responses reveals that the misinterpretation of visual content and the misunderstanding of financial concepts are the primary issues. Our research validates the significant, yet underexploited, potential of MLLMs in financial analysis, highlighting the need for further development and domain-specific optimization to encourage their enhanced use in the financial domain.

Submitted 15 June, 2025; originally announced June 2025.

Comments: 22 pages, 9 figures

arXiv:2506.12779 [pdf, ps, other]

From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots

Authors: Yuxuan Wang, Ming Yang, Weishuai Zeng, Yu Zhang, Xinrun Xu, Haobin Jiang, Ziluo Ding, Zongqing Lu

Abstract: Achieving general agile whole-body control on humanoid robots remains a major challenge due to diverse motion demands and data conflicts. While existing frameworks excel in training single motion-specific policies, they struggle to generalize across highly varied behaviors due to conflicting control requirements and mismatched data distributions. In this work, we propose BumbleBee (BB), an expert-generalist learning framework that combines motion clustering and sim-to-real adaptation to overcome these challenges. BB first leverages an autoencoder-based clustering method to group behaviorally similar motions using motion features and motion descriptions. Expert policies are then trained within each cluster and refined with real-world data through iterative delta action modeling to bridge the sim-to-real gap. Finally, these experts are distilled into a unified generalist controller that preserves agility and robustness across all motion types. Experiments on two simulations and a real humanoid robot demonstrate that BB achieves state-of-the-art general whole-body control, setting a new benchmark for agile, robust, and generalizable humanoid performance in the real world.

Submitted 19 June, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.12769 [pdf, ps, other]

RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control

Authors: Junpeng Yue, Zepeng Wang, Yuxuan Wang, Weishuai Zeng, Jiangxing Wang, Xinrun Xu, Yu Zhang, Sipeng Zheng, Ziluo Ding, Zongqing Lu

Abstract: This paper focuses on a critical challenge in robotics: translating text-driven human motions into executable actions for humanoid robots, enabling efficient and cost-effective learning of new behaviors. While existing text-to-motion generation methods achieve semantic alignment between language and motion, they often produce kinematically or physically infeasible motions unsuitable for real-world deployment. To bridge this sim-to-real gap, we propose Reinforcement Learning from Physical Feedback (RLPF), a novel framework that integrates physics-aware motion evaluation with text-conditioned motion generation. RLPF employs a motion tracking policy to assess feasibility in a physics simulator, generating rewards for fine-tuning the motion generator. Furthermore, RLPF introduces an alignment verification module to preserve semantic fidelity to text instructions. This joint optimization ensures both physical plausibility and instruction alignment. Extensive experiments show that RLPF greatly outperforms baseline methods in generating physically feasible motions while maintaining semantic correspondence with text instruction, enabling successful deployment on real humanoid robots.

Submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.12700 [pdf, ps, other]

Large Scalable Cross-Domain Graph Neural Networks for Personalized Notification at LinkedIn

Authors: Shihai He, Julie Choi, Tianqi Li, Zhiwei Ding, Peng Du, Priya Bannur, Franco Liang, Fedor Borisyuk, Padmini Jaikumar, Xiaobing Xue, Viral Gupta

Abstract: Notification recommendation systems are critical to driving user engagement on professional platforms like LinkedIn. Designing such systems involves integrating heterogeneous signals across domains, capturing temporal dynamics, and optimizing for multiple, often competing, objectives. Graph Neural Networks (GNNs) provide a powerful framework for modeling complex interactions in such environments. In this paper, we present a cross-domain GNN-based system deployed at LinkedIn that unifies user, content, and activity signals into a single, large-scale graph. By training on this cross-domain structure, our model significantly outperforms single-domain baselines on key tasks, including click-through rate (CTR) prediction and professional engagement. We introduce architectural innovations including temporal modeling and multi-task learning, which further enhance performance. Deployed in LinkedIn's notification system, our approach led to a 0.10% lift in weekly active users and a 0.62% improvement in CTR. We detail our graph construction process, model design, training pipeline, and both offline and online evaluations. Our work demonstrates the scalability and effectiveness of cross-domain GNNs in real-world, high-impact applications.

Submitted 14 June, 2025; originally announced June 2025.

MSC Class: 68R10

arXiv:2506.12583 [pdf, ps, other]

A Gradient Meta-Learning Joint Optimization for Beamforming and Antenna Position in Pinching-Antenna Systems

Authors: Kang Zhou, Weixi Zhou, Donghong Cai, Xianfu Lei, Yanqing Xu, Zhiguo Ding, Pingzhi Fan

Abstract: In this paper, we consider a novel optimization design for multi-waveguide pinching-antenna systems, aiming to maximize the weighted sum rate (WSR) by jointly optimizing beamforming coefficients and antenna position. To handle the formulated non-convex problem, a gradient-based meta-learning joint optimization (GML-JO) algorithm is proposed. Specifically, the original problem is initially decomposed into two sub-problems of beamforming optimization and antenna position optimization through equivalent substitution. Then, the convex approximation methods are used to deal with the nonconvex constraints of sub-problems, and two sub-neural networks are constructed to calculate the sub-problems separately. Different from alternating optimization (AO), where two sub-problems are solved alternately and the solutions are influenced by the initial values, two sub-neural networks of proposed GML-JO with fixed channel coefficients are considered as local sub-tasks and the computation results are used to calculate the loss function of joint optimization. Finally, the parameters of sub-networks are updated using the average loss function over different sub-tasks and the solution that is robust to the initial value is obtained. Simulation results demonstrate that the proposed GML-JO algorithm achieves 5.6 bits/s/Hz WSR within 100 iterations, yielding a 32.7% performance enhancement over conventional AO with substantially reduced computational complexity. Moreover, the proposed GML-JO algorithm is robust to different choices of initialization and yields better performance compared with the existing optimization methods.

Submitted 14 June, 2025; originally announced June 2025.

arXiv:2506.12323 [pdf, ps, other]

Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback

Authors: Janet Wang, Yunbei Zhang, Zhengming Ding, Jihun Hamm

Abstract: Paucity of medical data severely limits the generalizability of diagnostic ML models, as the full spectrum of disease variability cannot be represented by a small clinical dataset. To address this, diffusion models (DMs) have been considered as a promising avenue for synthetic image generation and augmentation. However, they frequently produce medically inaccurate images, deteriorating the model performance. Expert domain knowledge is critical for synthesizing images that correctly encode clinical information, especially when data is scarce and quality outweighs quantity. Existing approaches for incorporating human feedback, such as reinforcement learning (RL) and Direct Preference Optimization (DPO), rely on robust reward functions or demand labor-intensive expert evaluations. Recent progress in Multimodal Large Language Models (MLLMs) reveals their strong visual reasoning capabilities, making them adept candidates as evaluators. In this work, we propose a novel framework, coined MAGIC (Medically Accurate Generation of Images through AI-Expert Collaboration), that synthesizes clinically accurate skin disease images for data augmentation. Our method creatively translates expert-defined criteria into actionable feedback for image synthesis of DMs, significantly improving clinical accuracy while reducing the direct human workload. Experiments demonstrate that our method greatly improves the clinical quality of synthesized skin disease images, with outputs aligning with dermatologist assessments. Additionally, augmenting training data with these synthesized images improves diagnostic accuracy by +9.02% on a challenging 20-condition skin disease classification task, and by +13.89% in the few-shot setting.

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.10972 [pdf, ps, other]

Farseer: A Refined Scaling Law in Large Language Models

Authors: Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

Abstract: Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface $L(N,D)$, Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla's law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, improving upon Chinchilla's law by reducing extrapolation error by 433%. This allows for the reliable evaluation of competing training strategies across all $(N,D)$ settings, enabling conclusions from small-scale ablation studies to be confidently extrapolated to predict large-scale performance. Furthermore, Farseer provides new insights into optimal compute allocation, better reflecting the nuanced demands of modern LLM training. To validate our approach, we trained an extensive suite of approximately 1,000 LLMs across diverse scales and configurations, consuming roughly 3 million NVIDIA H100 GPU hours. We are comprehensively open-sourcing all models, data, results, and logs at https://github.com/Farseer-Scaling-Law/Farseer to foster further research.

Submitted 14 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

Comments: 34

ACM Class: I.2

arXiv:2506.08688 [pdf, ps, other]

Causality-aware Safety Testing for Autonomous Driving Systems

Authors: Wenbing Tang, Mingfei Cheng, Renzhi Wang, Yuan Zhou, Chengwei Liu, Yang Liu, Zuohua Ding

Abstract: Simulation-based testing is essential for evaluating the safety of Autonomous Driving Systems (ADSs). Comprehensive evaluation requires testing across diverse scenarios that can trigger various types of violations under different conditions. While existing methods typically focus on individual diversity metrics, such as input scenarios, ADS-generated motion commands, and system violations, they often fail to capture the complex interrelationships among these elements. This oversight leads to gaps in testing coverage, potentially missing critical issues in the ADS under evaluation. However, quantifying these interrelationships presents a significant challenge. In this paper, we propose a novel causality-aware fuzzing technique, Causal-Fuzzer, to enable efficient and comprehensive testing of ADSs by exploring causally diverse scenarios. The core of Causal-Fuzzer is constructing a causal graph to model the interrelationships among the diversities of input scenarios, ADS motion commands, and system violations. The causal graph then guides the process of critical scenario generation. Specifically, Causal-Fuzzer proposes (1) a causality-based feedback mechanism that quantifies the combined diversity of test scenarios by assessing whether they activate new causal relationships, and (2) a causality-driven mutation strategy that prioritizes mutations on input scenario elements with higher causal impact on ego action changes and violation occurrence, rather than treating all elements equally. We evaluated Causal-Fuzzer on an industry-grade ADS, Apollo, with a high-fidelity simulator. Our empirical results demonstrate that Causal-Fuzzer significantly outperforms existing methods in (1) identifying a greater diversity of violations, (2) providing enhanced testing sufficiency with improved coverage of causal relationships, and (3) achieving greater efficiency in detecting the first critical scenarios.

Submitted 14 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

arXiv:2506.07771 [pdf, ps, other]

Pinching-Antenna Systems For Indoor Immersive Communications: A 3D-Modeling Based Performance Analysis

Authors: Yulei Wang, Yalin Liu, Yaru Fu, Zhiguo Ding

Abstract: The emerging pinching antenna (PA) technology has high flexibility to reconfigure wireless channels and combat line-of-sight blockage, thus holding transformative potential for indoor immersive applications in 6G. This paper investigates pinching-antenna systems (PASS) for indoor immersive communications. Our contributions are threefold: (1) we construct a 3D model to characterize the distribution of users, waveguides, and PAs in the PASS; (2) we develop a general theoretical model of the downlink performance of PASS by capturing PA-user relationships and the impacts of system parameters; and (3) we present comprehensive numerical results for the theoretical model and provide implementation guidelines for PASS deployments.

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2506.06156 [pdf, ps, other]

Resource Allocation for Pinching-Antenna Systems: State-of-the-Art, Key Techniques and Open Issues

Authors: Ming Zeng, Ji Wang, Octavia A. Dobre, Zhiguo Ding, George K. Karagiannidis, Robert Schober, H. Vincent Poor

Abstract: Pinching antennas have emerged as a promising technology for reconfiguring wireless propagation environments, particularly in high-frequency communication systems operating in the millimeter-wave and terahertz bands. By enabling dynamic activation at arbitrary positions along a dielectric waveguide, pinching antennas offer unprecedented channel reconfigurability and the ability to provide line-of-sight (LoS) links in scenarios with severe LoS blockages. The performance of pinching-antenna systems is highly dependent on the optimized placement of the pinching antennas, which must be jointly considered with traditional resource allocation (RA) variables -- including transmission power, time slots, and subcarriers. The resulting joint RA problems are typically non-convex with complex variable coupling, necessitating sophisticated optimization techniques. This article provides a comprehensive survey of existing RA algorithms designed for pinching-antenna systems, supported by numerical case studies that demonstrate their potential performance gains. Key challenges and open research problems are also identified to guide future developments in this emerging field.

Submitted 6 June, 2025; originally announced June 2025.

Comments: submitted to IEEE WCM, 8 pages, 5 figures

arXiv:2506.05393 [pdf, ps, other]

Are Large Language Models Good Temporal Graph Learners?

Authors: Shenyang Huang, Ali Parviz, Emma Kondrup, Zachary Yang, Zifeng Ding, Michael Bronstein, Reihaneh Rabbany, Guillaume Rabusseau

Abstract: Large Language Models (LLMs) have recently driven significant advancements in Natural Language Processing and various other applications. While a broad range of literature has explored the graph-reasoning capabilities of LLMs, including their use of predictors on graphs, the application of LLMs to dynamic graphs -- real world evolving networks -- remains relatively unexplored. Recent work studies synthetic temporal graphs generated by random graph models, but applying LLMs to real-world temporal graphs remains an open question. To address this gap, we introduce Temporal Graph Talker (TGTalker), a novel temporal graph learning framework designed for LLMs. TGTalker utilizes the recency bias in temporal graphs to extract relevant structural information, converted to natural language for LLMs, while leveraging temporal neighbors as additional information for prediction. TGTalker demonstrates competitive link prediction capabilities compared to existing Temporal Graph Neural Network (TGNN) models. Across five real-world networks, TGTalker performs competitively with state-of-the-art temporal graph methods while consistently outperforming popular models such as TGN and HTGN. Furthermore, TGTalker generates textual explanations for each prediction, thus opening up exciting new directions in explainability and interpretability for temporal link prediction. The code is publicly available at https://github.com/shenyangHuang/TGTalker.

Submitted 3 June, 2025; originally announced June 2025.

Comments: 9 pages, 9 tables, 4 figures

arXiv:2506.05306 [pdf, ps, other]

Full characterization of measurement-induced transitions of a superconducting qubit

Authors: Thomas Connolly, Pavel D. Kurilovich, Vladislav D. Kurilovich, Charlotte G. L. Bøttcher, Sumeru Hazra, Wei Dai, Andy Z. Ding, Vidul R. Joshi, Heekun Nho, Spencer Diamond, Daniel K. Weiss, Valla Fatemi, Luigi Frunzio, Leonid I. Glazman, Michel H. Devoret

Abstract: Repeated quantum non-demolition measurement is a cornerstone of quantum error correction protocols. In superconducting qubits, the speed of dispersive state readout can be enhanced by increasing the power of the readout tone. However, such an increase has been found to result in additional qubit state transitions that violate the desired quantum non-demolition character of the measurement. Recently, the readout of a transmon superconducting qubit was improved by using a tone with frequency much larger than the qubit frequency. Here, we experimentally identify the mechanisms of readout-induced transitions in this regime. In the dominant mechanism, the energy of an incoming readout photon is partially absorbed by the transmon and partially returned to the transmission line as a photon with lower frequency. Other mechanisms involve the excitation of unwanted package modes, decay via material defects, and, at higher qubit frequencies, the activation of undesired resonances in the transmon spectrum. Our work provides a comprehensive characterization of superconducting qubit state transitions caused by a strong drive.

Submitted 5 June, 2025; originally announced June 2025.

Comments: 30 pages, 16 figures

arXiv:2506.03958 [pdf, ps, other]

A Tale of Two Shocks

Authors: Robert F. Wimmer-Schweingruber, Domenico Trotta, Rungployphan Kieokaew, Liu Yang, Alexander Kollhoff, Lars Berger, Patrick Kühl, Stephan I. Böttcher, Bernd Heber, Philippe Louarn, Andrey Fedorov, Javier Rodriguez-Pacheco, Raúl Gómez-Herrero, Francisco Espinosa Lara, Ignacio Cernuda, Yulia Kartavykh, Linghua Wang, George C. Ho, Robert C. Allen, Glenn M. Mason, Zheyi Ding, Andrea Larosa, G. Sindhuja, Sandra Eldrum, Sebastian Fleth, et al. (1 additional author not shown)

Abstract: It was the best of times, it was the worst of times, . . . - for the thermal/suprathermal particle populations in the vicinity of two traveling interplanetary shocks observed by Solar Orbiter on 2023-11-29 07:51:17 UTC and 2023-11-30 10:47:26 UTC at $\sim 0.83$ astronomical units from the Sun. We investigate these two very dissimilar shocks and elucidate their non-equilibrium features. We do not provide explanations of all observed features; our aim is to report them here for future reference. We use high-resolution data obtained with Solar Orbiter's Energetic Particle Detector (EPD), magnetometer (MAG), and Solar Wind Analyzer (SWA) to exhibit the very different natures of these two shocks and describe the detailed properties of suprathermal and energetic particles in their vicinity. We observe very different behavior of the energetic particle population because the two shocks are quite different. Solar wind protons and $\alpha$-particles are highly dynamic at the first shock; their beams appear to align well with rapid oscillations of the interplanetary magnetic field. Suprathermal particles associated with the second shock exhibit clear non-equilibrium and anisotropic features in their differential intensities at time scales comparable to the proton gyroperiod. The different geometries of the two shocks resulted in highly dissimilar populations of suprathermal and energetic particles in their vicinity.
The first shock was associated with very interesting microphysics of the bulk plasma velocity distribution; the second resulted in similarly interesting microphysics of the accelerated particles. Both showed strong temporal variability of the particle populations at scales comparable to the proton gyroperiod.

Submitted 4 June, 2025; originally announced June 2025.

Comments: 14 pages, 13 figures

arXiv:2506.03546 [pdf, ps, other]

From Virtual Agents to Robot Teams: A Multi-Robot Framework Evaluation in High-Stakes Healthcare Context

Authors: Yuanchen Bai, Zijian Ding, Angelique Taylor

Abstract: Advancements in generative models have enabled multi-agent systems (MAS) to perform complex virtual tasks such as writing and code generation, which do not generalize well to physical multi-agent robotic teams. Current frameworks often treat agents as conceptual task executors rather than physically embodied entities, and overlook critical real-world constraints such as spatial context and robotic capabilities (e.g., sensing and navigation). To probe this gap, we reconfigure and stress-test a hierarchical multi-agent robotic team built on the CrewAI framework in a simulated emergency department onboarding scenario. We identify five persistent failure modes: role misalignment; tool access violations; lack of in-time handling of failure reports; noncompliance with prescribed workflows; bypassing or false reporting of task completion. Based on this analysis, we propose three design guidelines emphasizing process transparency, proactive failure recovery, and contextual grounding. Our work informs the development of more resilient and robust multi-agent robotic systems (MARS), including opportunities to extend virtual multi-agent frameworks to the real world.

Submitted 4 June, 2025; originally announced June 2025.

arXiv:2506.02625 [pdf, ps, other]

Zero-Energy RIS-Assisted Communications With Noise Modulation and Interference-Based Energy Harvesting

Authors: Ahmad Massud Tota Khel, Aissa Ikhlef, Zhiguo Ding, Hongjian Sun

Abstract: To advance towards carbon-neutrality and improve the limited performance of conventional passive wireless communications, in this paper, we investigate the integration of noise modulation with zero-energy reconfigurable intelligent surfaces (RISs). In particular, the RIS reconfigurable elements (REs) are divided into two groups: one for beamforming the desired signals in reflection mode and another for harvesting energy from interference signals in absorption mode, providing the power required for RIS operation. Since the harvested energy is a random variable, a random number of REs can beamform the signals, while the remainder blindly reflects them. We present a closed-form solution and a search algorithm for RE allocation, jointly optimizing both the energy harvesting (EH) and communication performance. Considering the repetition coding technique and discrete phase shifts, we derive analytical expressions for the energy-constrained success rate, bit error rate, optimal threshold, mutual information, and energy efficiency. Numerical and simulation results confirm the effectiveness of the algorithm and expressions, demonstrating the superiority of the proposed integration over conventional noise-modulation systems. It is shown that by properly allocating the REs, both the EH and communication performance can be improved in low to moderate interference scenarios, while the latter is restricted in the high-interference regime.

Submitted 3 June, 2025; originally announced June 2025.

Comments: 32 pages, 12 figures, accepted by IEEE Transactions on Green Communications and Networking

arXiv:2506.02380 [pdf, ps, other]

EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR

Authors: Zihao Ding, Cheng-Tse Lee, Mufeng Zhu, Tao Guan, Yuan-Chun Sun, Cheng-Hsin Hsu, Yao Liu

Abstract: 3D Gaussian Splatting (3DGS) is an emerging media representation that reconstructs real-world 3D scenes in high fidelity, enabling 6-degrees-of-freedom (6-DoF) navigation in virtual reality (VR). However, developing and evaluating 3DGS-enabled applications and optimizing their rendering performance require realistic user navigation data. Such data is currently unavailable for photorealistic 3DGS reconstructions of real-world scenes. This paper introduces EyeNavGS, the first publicly available 6-DoF navigation dataset featuring traces from 46 participants exploring twelve diverse, real-world 3DGS scenes. The dataset was collected at two sites, using Meta Quest Pro headsets, recording the head pose and eye gaze data for each rendered frame during free world-standing 6-DoF navigation. For each of the twelve scenes, we performed careful scene initialization to correct for scene tilt and scale, ensuring a perceptually comfortable VR experience. We also release our open-source SIBR viewer software fork with record-and-replay functionalities and a suite of utility tools for data processing, conversion, and visualization. The EyeNavGS dataset and its accompanying software tools provide valuable resources for advancing research in 6-DoF viewport prediction, adaptive streaming, 3D saliency, and foveated rendering for 3DGS scenes. The EyeNavGS dataset is available at: https://symmru.github.io/EyeNavGS/.

Submitted 2 June, 2025; originally announced June 2025.

arXiv:2506.01038 [pdf, ps, other]

Self-Supervised-ISAR-Net Enables Fast Sparse ISAR Imaging

Authors: Ziwen Wang, Jianping Wang, Pucheng Li, Yifan Wu, Zegang Ding

Abstract: Numerous sparse inverse synthetic aperture radar (ISAR) imaging methods based on unfolded neural networks have been developed for high-quality image reconstruction with sparse measurements. However, their training typically requires paired ISAR images and echoes, which are often difficult to obtain. Meanwhile, the following property can be observed: for a given sparse measurement configuration of ISAR, when a target is rotated around its center of mass, only the image of the target undergoes the corresponding rotation after ISAR imaging, while the grating lobes do not follow this rotation and are solely determined by the sparse-sampling pattern. This property is mathematically termed the equivariant property. Taking advantage of this property, an unfolded neural network for sparse ISAR imaging with self-supervised learning, named SS-ISAR-Net, is proposed. It effectively mitigates grating lobes caused by sparse radar echo, allowing high-quality training to be achieved using only sparse radar echo data. The superiority of the proposed SS-ISAR-Net, compared to existing methods, is verified through experiments with both synthetic and real-world measurement data.

Submitted 1 June, 2025; originally announced June 2025.

arXiv:2506.00400 [pdf, ps, other]

Scaling Textual Gradients via Sampling-Based Momentum

Authors: Zixin Ding, Junyuan Hong, Jiachen T. Wang, Zinan Lin, Zhangyang Wang, Yuxin Chen

Abstract: As prompts play an increasingly critical role in large language models (LLMs), optimizing textual prompts has become a crucial challenge. The Textual Gradient Descent (TGD) framework has emerged as a promising data-driven approach that iteratively refines textual prompts using LLM-suggested updates (or textual gradients) over minibatches of training samples. In this paper, we empirically demonstrate that scaling the number of training examples initially improves but later degrades TGD's performance across multiple downstream NLP tasks. However, while data scaling improves results for most tasks, it also significantly increases the computational cost when leveraging LLMs. To address this, we draw inspiration from numerical gradient descent and propose Textual Stochastic Gradient Descent with Momentum (TSGD-M), a method that facilitates scalable in-context learning by reweighting prompt sampling based on past batch distributions. Across nine NLP tasks spanning three domains, including BIG-Bench Hard (BBH), natural language understanding tasks, and reasoning tasks, TSGD-M significantly outperforms TGD baselines that do not incorporate reweighted sampling, while also reducing variance in most tasks.

Submitted 31 May, 2025; originally announced June 2025.

arXiv:2505.23131 [pdf, ps, other]

DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs

Authors: Xinyu Yao, Daniel Bourgeois, Abhinav Jain, Yuxin Tang, Jiawen Yao, Zhimin Ding, Arlei Silva, Chris Jermaine

Abstract: We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to three key limitations: (1) reliance on bulk-synchronous systems like TensorFlow, which under-utilize devices due to barrier synchronization; (2) lack of awareness of the scheduling mechanism of underlying systems when designing learning-based methods; and (3) exclusive dependence on reinforcement learning, ignoring the structure of effective heuristics designed by experts. In this paper, we propose Doppler, a three-stage framework for training dual-policy networks consisting of (1) a $\mathsf{SEL}$ policy for selecting operations and (2) a $\mathsf{PLC}$ policy for placing chosen operations on devices. Our experiments show that Doppler outperforms all baseline methods across tasks by reducing system execution time and additionally demonstrates sampling efficiency by reducing per-episode training time.

Submitted 29 May, 2025; originally announced May 2025.

Comments: 32 pages, 19 figures

arXiv:2505.21996 [pdf, ps, other]

Learning World Models for Interactive Video Generation

Authors: Taiye Chen, Xun Hu, Zihan Ding, Chi Jin

Abstract: Foundational world models must be both interactive and preserve spatiotemporal coherence for effective future planning with action choices. However, present models for long video generation have limited inherent world modeling capabilities due to two main challenges: compounding errors and insufficient memory mechanisms. We enhance image-to-video models with interactive capabilities through additional action conditioning and an autoregressive framework, and reveal that compounding error is inherently irreducible in autoregressive video generation, while an insufficient memory mechanism leads to incoherence of world models. We propose video retrieval augmented generation (VRAG) with explicit global state conditioning, which significantly reduces long-term compounding errors and increases spatiotemporal consistency of world models. In contrast, naive autoregressive generation with extended context windows and retrieval-augmented generation prove less effective for video generation, primarily due to the limited in-context learning capabilities of current video models. Our work illuminates the fundamental challenges in video world models and establishes a comprehensive benchmark for improving video generation models with internal world modeling capabilities.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.21547 [pdf, ps, other]

Image Tokens Matter: Mitigating Hallucination in Discrete Tokenizer-based Large Vision-Language Models via Latent Editing

Authors: Weixing Wang, Zifeng Ding, Jindong Gu, Rui Cao, Christoph Meinel, Gerard de Melo, Haojin Yang

Abstract: Large Vision-Language Models (LVLMs) with discrete image tokenizers unify multimodal representations by encoding visual inputs into a finite set of tokens. Despite their effectiveness, we find that these models still hallucinate non-existent objects. We hypothesize that this may be due to visual priors induced during training: When certain image tokens frequently co-occur in the same spatial regions and represent shared objects, they become strongly associated with the verbalizations of those objects. As a result, the model may hallucinate by evoking visually absent tokens that often co-occur with present ones. To test this assumption, we construct a co-occurrence graph of image tokens using a segmentation dataset and employ a Graph Neural Network (GNN) with contrastive learning followed by a clustering method to group tokens that frequently co-occur in similar visual contexts. We find that hallucinations predominantly correspond to clusters whose tokens dominate the input, and more specifically, that the visually absent tokens in those clusters show much higher correlation with hallucinated objects compared to tokens present in the image. Based on this observation, we propose a hallucination mitigation method that suppresses the influence of visually absent tokens by modifying latent image embeddings during generation. Experiments show our method reduces hallucinations while preserving expressivity. Code is available at https://github.com/weixingW/CGC-VTD/tree/main

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2505.19927 [pdf, ps, other]

TCP: a Benchmark for Temporal Constraint-Based Planning

Authors: Zifeng Ding, Sikuan Yan, Zhangdie Yuan, Xianglong Hu, Fangru Lin, Andreas Vlachos

Abstract: Temporal reasoning and planning are essential capabilities for large language models (LLMs), yet most existing benchmarks evaluate them in isolation and under limited forms of complexity. To address this gap, we introduce the Temporal Constraint-based Planning (TCP) benchmark, which jointly assesses both capabilities. Each instance in TCP features a naturalistic dialogue around a collaborative project, where diverse and interdependent temporal constraints are explicitly or implicitly expressed, and models must infer an optimal schedule that satisfies all constraints. To construct TCP, we first generate abstract problem prototypes that are paired with realistic scenarios from various domains and enriched into dialogues using an LLM. A human quality check is performed on a sampled subset to confirm the reliability of our benchmark. We evaluate state-of-the-art LLMs and find that even the strongest models struggle with TCP, highlighting its difficulty and revealing limitations in LLMs' temporal constraint-based planning abilities. We analyze underlying failure cases, open-source our benchmark, and hope our findings can inspire future research.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.19897 [pdf, ps, other]

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Authors: Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu

Abstract: Large Language Models (LLMs) have extended their impact beyond Natural Language Processing, substantially fostering the development of interdisciplinary research. Recently, various LLM-based agents have been developed to assist scientific discovery progress across multiple aspects and domains. Among these, computer-using agents, capable of interacting with operating systems as humans do, are paving the way to automated scientific problem-solving and addressing routines in researchers' workflows. Recognizing the transformative potential of these agents, we introduce ScienceBoard, which encompasses two complementary contributions: (i) a realistic, multi-domain environment featuring dynamic and visually rich scientific workflows with integrated professional software, where agents can autonomously interact via different interfaces to accelerate complex research tasks and experiments; and (ii) a challenging benchmark of 169 high-quality, rigorously validated real-world tasks curated by humans, spanning scientific-discovery workflows in domains such as biochemistry, astronomy, and geoinformatics. Extensive evaluations of agents with state-of-the-art backbones (e.g., GPT-4o, Claude 3.7, UI-TARS) show that, despite some promising results, they still fall short of reliably assisting scientists in complex workflows, achieving only a 15% overall success rate. In-depth analysis further provides valuable insights for addressing current agent limitations and more effective design principles, paving the way to build more capable agents for scientific discovery.
Our code, environment, and benchmark are at https://qiushisun.github.io/ScienceBoard-Home/.

Submitted 27 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

Comments: work in progress

arXiv:2505.17978 [pdf, ps, other]

AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web

Authors: Rui Cao, Zifeng Ding, Zhijiang Guo, Michael Schlichtkrull, Andreas Vlachos

Abstract: Textual claims are often accompanied by images to enhance their credibility and spread on social media, but this also raises concerns about the spread of misinformation. Existing datasets for automated verification of image-text claims remain limited, as they often consist of synthetic claims and lack evidence annotations to capture the reasoning behind the verdict. In this work, we introduce AVerImaTeC, a dataset consisting of 1,297 real-world image-text claims. Each claim is annotated with question-answer (QA) pairs containing evidence from the web, reflecting decomposed reasoning regarding the verdict. We mitigate common challenges in fact-checking datasets such as contextual dependence, temporal leakage, and evidence insufficiency, via claim normalization, temporally constrained evidence annotation, and a two-stage sufficiency check. We assess the consistency of the annotation in AVerImaTeC via inter-annotator studies, achieving $\kappa=0.742$ on verdicts and $74.7\%$ consistency on QA pairs. We also propose a novel evaluation method for evidence retrieval and conduct extensive experiments to establish baselines for verifying image-text claims using open-web evidence.

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.17435 [pdf, ps, other]

Discretization-free Multicalibration through Loss Minimization over Tree Ensembles

Authors: Hongyi Henry Jin, Zijun Ding, Dung Daniel Ngo, Zhiwei Steven Wu

Abstract: In recent years, multicalibration has emerged as a desirable learning objective for ensuring that a predictor is calibrated across a rich collection of overlapping subpopulations. Existing approaches typically achieve multicalibration by discretizing the predictor's output space and iteratively adjusting its output values. However, this discretization approach departs from the standard empirical risk minimization (ERM) pipeline, introduces rounding error and an additional sensitive hyperparameter, and may distort the predictor's outputs in ways that hinder downstream decision-making. In this work, we propose a discretization-free multicalibration method that directly optimizes an empirical risk objective over an ensemble of depth-two decision trees. Our ERM approach can be implemented using off-the-shelf tree ensemble learning methods such as LightGBM. Our algorithm provably achieves multicalibration, provided that the data distribution satisfies a technical condition we term loss saturation. Across multiple datasets, our empirical evaluation shows that this condition is always met in practice. Our discretization-free algorithm consistently matches or outperforms existing multicalibration approaches, even when evaluated using a discretization-based multicalibration metric that shares its discretization granularity with the baselines.

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2505.16877 [pdf, ps, other]

Predicate-Conditional Conformalized Answer Sets for Knowledge Graph Embeddings

Authors: Yuqicheng Zhu, Daniel Hernández, Yuan He, Zifeng Ding, Bo Xiong, Evgeny Kharlamov, Steffen Staab

Abstract: Uncertainty quantification in Knowledge Graph Embedding (KGE) methods is crucial for ensuring the reliability of downstream applications. A recent work applies conformal prediction to KGE methods, providing uncertainty estimates by generating a set of answers that is guaranteed to include the true answer with a predefined confidence level. However, existing methods provide probabilistic guarantees averaged over a reference set of queries and answers (marginal coverage guarantee). In high-stakes applications such as medical diagnosis, a stronger guarantee is often required: the predicted sets must provide consistent coverage per query (conditional coverage guarantee). We propose CondKGCP, a novel method that approximates predicate-conditional coverage guarantees while maintaining compact prediction sets. CondKGCP merges predicates with similar vector representations and augments calibration with rank information. We prove the theoretical guarantees and demonstrate the empirical effectiveness of CondKGCP through comprehensive evaluations.

Submitted 22 May, 2025; originally announced May 2025.

Comments: Accepted to the Findings of ACL 2025

arXiv:2505.13949 [pdf, ps, other]

FlashThink: An Early Exit Method For Efficient Reasoning

Authors: Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, Zheng Hu

Abstract: Large Language Models (LLMs) have shown impressive performance in reasoning tasks. However, LLMs tend to generate excessively long reasoning content, leading to significant computational overhead. Our observations indicate that even on simple problems, LLMs tend to produce unnecessarily lengthy reasoning content, which is against intuitive expectations. Preliminary experiments show that at a certain point during the generation process, the model is already capable of producing the correct solution without completing the full reasoning content. Therefore, we consider that the reasoning process of the model can be exited early to achieve efficient reasoning. We introduce a verification model that identifies the exact moment when the model can stop reasoning and still provide the correct answer. Comprehensive experiments on four different benchmarks demonstrate that our proposed method, FlashThink, effectively shortens the reasoning content while preserving the model accuracy. For the DeepSeek-R1 and QwQ-32B models, we reduced the length of reasoning content by 77.04% and 77.47%, respectively, without reducing the accuracy.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.12877 [pdf, ps, other]

Exceptional extensions of local fields and the Carlitz--Wan conjecture

Authors: Zhiguo Ding, Wei Xiong, Qifan Zhang

Abstract: For any prime power $q$, a polynomial $f(X)\in\mathbb{F}_q[X]$ is "exceptional" if it induces bijections of $\mathbb{F}_{q^k}$ for infinitely many $k$; this condition is known to be equivalent to $f(X)$ inducing a bijection of $\mathbb{F}_{q^k}$ for at least one $k$ with $q^k\ge \deg(f)^4$. In this paper, we introduce the notion of an "exceptional" extension of local fields of any characteristic, and show that if $f(X)\in\mathbb{F}_q[X]$ is exceptional in the classical sense then the field extension $\mathbb{F}_q(X)/\mathbb{F}_q(f(X))$ yields an exceptional local field extension upon passing to the completion at a degree-$1$ place. We describe all exceptional local field extensions of degree coprime to the residue characteristic, determine the relationship between exceptionality of a local field extension and exceptionality of a subextension, and give various Galois-theoretic characterizations of exceptional local field extensions. As a consequence, we obtain three new proofs, using quite different tools, of a theorem of Guralnick and Müller about ramification indices in exceptional maps between curves over $\mathbb{F}_q$. This theorem generalizes a result of Lenstra which subsumes earlier conjectures of Carlitz and Wan.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.12188

LLM-DSE: Searching Accelerator Parameters with LLM Agents

Authors: Hanyu Wang, Xinrui Wu, Zijian Ding, Su Zheng, Chengyue Wang, Tony Nowatzki, Yizhou Sun, Jason Cong

Abstract: Even though high-level synthesis (HLS) tools mitigate the challenges of programming domain-specific accelerators (DSAs) by raising the abstraction level, optimizing hardware directive parameters remains a significant hurdle. Existing heuristic and learning-based methods struggle with adaptability and sample efficiency. We present LLM-DSE, a multi-agent framework designed specifically for optimizing HLS directives. Combining LLM with design space exploration (DSE), our explorer coordinates four agents: Router, Specialists, Arbitrator, and Critic. These multi-agent components interact with various tools to accelerate the optimization process. LLM-DSE leverages essential domain knowledge to identify efficient parameter combinations while maintaining adaptability through verbal learning from online interactions. Evaluations on the HLSyn dataset demonstrate that LLM-DSE achieves substantial $2.55\times$ performance gains over state-of-the-art methods, uncovering novel designs while reducing runtime. Ablation studies validate the effectiveness and necessity of the proposed agent interactions. Our code is open-sourced here: https://github.com/Nozidoali/LLM-DSE.

Submitted 20 May, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

arXiv:2505.11939

Fine-Grained ECG-Text Contrastive Learning via Waveform Understanding Enhancement

Authors: Haitao Li, Che Liu, Zhengyao Ding, Ziyi Liu, Zhengxing Huang

Abstract: Electrocardiograms (ECGs) are essential for diagnosing cardiovascular diseases. While previous ECG-text contrastive learning methods have shown promising results, they often overlook the incompleteness of the reports. Given an ECG, the report is generated by first identifying key waveform features and then inferring the final diagnosis through these features. Despite their importance, these waveform features are often not recorded in the report as intermediate results. Aligning ECGs with such incomplete reports impedes the model's ability to capture the ECG's waveform features and limits its understanding of diagnostic reasoning based on those features. To address this, we propose FG-CLEP (Fine-Grained Contrastive Language ECG Pre-training), which aims to recover these waveform features from incomplete reports with the help of large language models (LLMs), under the challenges of hallucinations and the non-bijective relationship between waveform features and diagnoses. Additionally, considering the frequent false negatives due to the prevalence of common diagnoses in ECGs, we introduce a semantic similarity matrix to guide contrastive learning. Furthermore, we adopt a sigmoid-based loss function to accommodate the multi-label nature of ECG-related tasks. Experiments on six datasets demonstrate that FG-CLEP outperforms state-of-the-art methods in both zero-shot prediction and linear probing across these datasets.

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2505.11921

DC-Seg: Disentangled Contrastive Learning for Brain Tumor Segmentation with Missing Modalities

Authors: Haitao Li, Ziyu Li, Yiheng Mao, Zhengyao Ding, Zhengxing Huang

Abstract: Accurate segmentation of brain images typically requires the integration of complementary information from multiple image modalities. However, clinical data for all modalities may not be available for every patient, creating a significant challenge. To address this, previous studies encode multiple modalities into a shared latent space. While somewhat effective, this approach remains suboptimal, as each modality contains distinct and valuable information. In this study, we propose DC-Seg (Disentangled Contrastive Learning for Segmentation), a new method that explicitly disentangles images into a modality-invariant anatomical representation and a modality-specific representation, using anatomical contrastive learning and modality contrastive learning, respectively. This solution improves the separation of anatomical and modality-specific features by considering the modality gaps, leading to more robust representations. Furthermore, we introduce a segmentation-based regularizer that enhances the model's robustness to missing modalities. Extensive experiments on the BraTS 2020 dataset and a private white matter hyperintensity (WMH) segmentation dataset demonstrate that DC-Seg outperforms state-of-the-art methods in handling incomplete multimodal brain tumor segmentation tasks with varying missing modalities, while also demonstrating strong generalizability in WMH segmentation. The code is available at https://github.com/CuCl-2/DC-Seg.

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2505.11893

RLAP: A Reinforcement Learning Enhanced Adaptive Planning Framework for Multi-step NLP Task Solving

Authors: Zepeng Ding, Dixuan Wang, Ziqin Luo, Guochao Jiang, Deqing Yang, Jiaqing Liang

Abstract: Multi-step planning has been widely employed to enhance the performance of large language models (LLMs) on downstream natural language processing (NLP) tasks; it decomposes the original task into multiple subtasks and guides LLMs to solve them sequentially without additional training. When addressing task instances, existing methods either preset the order of steps or attempt multiple paths at each step. However, these methods overlook instances' linguistic features and rely on the intrinsic planning capabilities of LLMs to evaluate intermediate feedback and then select subtasks, resulting in suboptimal outcomes. To better solve multi-step NLP tasks with LLMs, in this paper we propose a Reinforcement Learning enhanced Adaptive Planning framework (RLAP). In our framework, we model an NLP task as a Markov decision process (MDP) and incorporate an LLM directly into the environment. In particular, a lightweight Actor model is trained to estimate Q-values for natural language sequences consisting of states and actions through reinforcement learning. Therefore, during sequential planning, the linguistic features of each sequence in the MDP can be taken into account, and the Actor model interacts with the LLM to determine the optimal order of subtasks for each task instance. We apply RLAP to three different types of NLP tasks and conduct extensive experiments on multiple datasets to verify RLAP's effectiveness and robustness.

Submitted 17 May, 2025; originally announced May 2025.

Showing 1–50 of 1,100 results for author: Ding, Z