Distributed Composite Optimization with Sub-Weibull Noises

Authors: Zhan Yu, Zhongjie Shi, Deming Yuan

Abstract: With the rapid development of multi-agent distributed optimization (MA-DO) theory over the past decade, the distributed stochastic gradient method (DSGM) occupies an important position. Although the theory of different DSGMs has been widely established, the mainstream results of existing work are still derived under the condition of light-tailed stochastic gradient noises. A growing number of recent examples from various fields indicate that the light-tailed noise model is overly idealized in many practical instances, failing to capture the complexity and variability of noises in real-world scenarios, such as the presence of outliers or extreme values from data science and statistical learning. To address this issue, we propose a new DSGM framework that incorporates stochastic gradients under sub-Weibull randomness. We study a distributed composite stochastic mirror descent scheme with sub-Weibull gradient noise (DCSMD-SW) for solving a distributed composite optimization (DCO) problem over the time-varying multi-agent network. By investigating sub-Weibull randomness in DCSMD for the first time, we show that the algorithm is applicable in common heavy-tailed noise environments while also guaranteeing good convergence properties. We comprehensively study the convergence performance of DCSMD-SW. Satisfactory high probability convergence rates are derived for DCSMD-SW without any smoothness requirement. The work also offers a unified analysis framework for several critical cases of both algorithms and noise environments.

Submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.06526 [pdf, ps, other]

Prompting Wireless Networks: Reinforced In-Context Learning for Power Control

Authors: Hao Zhou, Chengming Hu, Dun Yuan, Ye Yuan, Di Wu, Xue Liu, Jianzhong, Zhang

Abstract: To manage and optimize constantly evolving wireless networks, existing machine learning (ML)-based studies operate as black-box models, leading to increased computational costs during training and a lack of transparency in decision-making, which limits their practical applicability in wireless networks. Motivated by recent advancements in large language model (LLM)-enabled wireless networks, this paper proposes ProWin, a novel framework that leverages reinforced in-context learning to design task-specific demonstration Prompts for Wireless Network optimization, relying on the inference capabilities of LLMs without the need for dedicated model training or finetuning. The task-specific prompts are designed to incorporate natural language descriptions of the task and its formulation, enhancing interpretability and eliminating the need for specialized expertise in network optimization. We further propose a reinforced in-context learning scheme that incorporates a set of advisable examples into task-specific prompts, wherein informative examples capturing historical environment states and decisions are adaptively selected to guide current decision-making. Evaluations on a case study of base station power control show that the proposed ProWin outperforms reinforcement learning (RL)-based methods, highlighting the potential for next-generation wireless network optimization.

Submitted 6 June, 2025; originally announced June 2025.

Comments: arXiv admin note: substantial text overlap with arXiv:2408.00214

arXiv:2505.18254 [pdf, other]

Time independence does not limit information flow. II. The case with ancillas

Authors: T. C. Mooney, Dong Yuan, Adam Ehrenberg, Christopher L. Baldwin, Alexey V. Gorshkov, Andrew M. Childs

Abstract: While the impact of locality restrictions on quantum dynamics and algorithmic complexity has been well studied in the general case of time-dependent Hamiltonians, the capabilities of time-independent protocols are less well understood. Using clock constructions, we show that the light cone for time-independent Hamiltonians, as captured by Lieb-Robinson bounds, is the same as that for time-dependent systems when local ancillas are allowed. More specifically, we develop time-independent protocols for approximate quantum state transfer with the same run-times as their corresponding time-dependent protocols. Given any piecewise-continuous Hamiltonian, our construction gives a time-independent Hamiltonian that implements its dynamics in the same time, up to error $\varepsilon$, at the cost of introducing a number of local ancilla qubits for each data qubit that is polylogarithmic in the number of qubits, the norm of the Hamiltonian and its derivative (if it exists), the run time, and $1/\varepsilon$. We apply this construction to state transfer for systems with power-law-decaying interactions and one-dimensional nearest-neighbor systems with disordered interaction strengths. In both cases, this gives time-independent protocols with the same optimal light-cone-saturating run-times as their time-dependent counterparts.

Submitted 23 May, 2025; originally announced May 2025.

Comments: 28 pages, 2 figures

arXiv:2505.18249 [pdf, ps, other]

Time Independence Does Not Limit Information Flow. I. The Free-Particle Case

Authors: Dong Yuan, Chao Yin, T. C. Mooney, Christopher L. Baldwin, Andrew M. Childs, Alexey V. Gorshkov

Abstract: The speed of information propagation in long-range interacting quantum systems is limited by Lieb-Robinson-type bounds, whose tightness can be established by finding specific quantum state-transfer protocols. Previous works have given quantum state-transfer protocols that saturate the corresponding Lieb-Robinson bounds using time-dependent Hamiltonians. Are speed limits for quantum information propagation different for time-independent Hamiltonians? In a step towards addressing this question, we present and analyze two optimal time-independent state-transfer protocols for free-particle systems, which utilize continuous-time single-particle quantum walks with hopping strength decaying as a power law. We rigorously prove and numerically confirm that our protocols achieve quantum state transfer, with controllable error over an arbitrarily long distance in any spatial dimension, at the speed limits set by the free-particle Lieb-Robinson bounds. This shows that time independence does not limit information flow for long-range free-particle Hamiltonians.

Submitted 23 May, 2025; originally announced May 2025.

Comments: 7+15 pages, 2+4 figures

arXiv:2505.17683 [pdf, ps, other]

Dual Attention Residual U-Net for Accurate Brain Ultrasound Segmentation in IVH Detection

Authors: Dan Yuan, Yi Feng, Ziyun Tang

Abstract: Intraventricular hemorrhage (IVH) is a severe neurological complication among premature infants, necessitating early and accurate detection from brain ultrasound (US) images to improve clinical outcomes. While recent deep learning methods offer promise for computer-aided diagnosis, challenges remain in capturing both local spatial details and global contextual dependencies critical for segmenting brain anatomies. In this work, we propose an enhanced Residual U-Net architecture incorporating two complementary attention mechanisms: the Convolutional Block Attention Module (CBAM) and a Sparse Attention Layer (SAL). The CBAM improves the model's ability to refine spatial and channel-wise features, while the SAL introduces a dual-branch design: sparse attention filters out low-confidence query-key pairs to suppress noise, and dense attention ensures comprehensive information propagation. Extensive experiments on the Brain US dataset demonstrate that our method achieves state-of-the-art segmentation performance, with a Dice score of 89.04% and IoU of 81.84% for ventricle region segmentation. These results highlight the effectiveness of integrating spatial refinement and attention sparsity for robust brain anatomy detection. Code is available at: https://github.com/DanYuan001/BrainImgSegment.

Submitted 10 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

Comments: 10 pages, 6 figures, and 3 tables

arXiv:2505.12284 [pdf, other]

Efficient RL Training for Reasoning Models via Length-Aware Optimization

Authors: Danlong Yuan, Tian Xie, Shaohan Huang, Zhuocheng Gong, Huishuai Zhang, Chong Luo, Furu Wei, Dongyan Zhao

Abstract: Large reasoning models, such as OpenAI o1 or DeepSeek R1, have demonstrated remarkable performance on reasoning tasks but often incur a long reasoning path with significant memory and time costs. Existing methods primarily aim to shorten reasoning paths by introducing additional training data and stages. In this paper, we propose three critical reward designs integrated directly into the reinforcement learning process of large reasoning models, which reduce the response length without extra training stages. Experiments on four settings show that our method significantly decreases response length while maintaining or even improving performance. Specifically, in a logic reasoning setting, we achieve a 40% reduction in response length averaged by steps alongside a 14% gain in performance. For math problems, we reduce response length averaged by steps by 33% while preserving performance.

Submitted 18 May, 2025; originally announced May 2025.

Comments: Under review

arXiv:2505.03612 [pdf, other]

Backstepping Reach-avoid Controller Synthesis for Multi-input Multi-output Systems with Mixed Relative Degrees

Authors: Jianqiang Ding, Dingran Yuan, Shankar A. Deka

Abstract: Designing controllers with provable formal guarantees has become an urgent requirement for cyber-physical systems in safety-critical scenarios. Beyond addressing scalability in high-dimensional implementations, controller synthesis methodologies separating safety and reachability objectives may risk optimization infeasibility due to conflicting constraints, thereby significantly undermining their applicability in practice. In this paper, by leveraging feedback linearization and backstepping techniques, we present a novel framework for constructing provable reach-avoid formal certificates tailored to multi-input multi-output systems. Based on this, we develop a systematic synthesis approach for controllers with reach-avoid guarantees, which ensures that the outputs of the system eventually enter the predefined target set while staying within the required safe set. Finally, we demonstrate the effectiveness of our method through simulations.

Submitted 6 May, 2025; originally announced May 2025.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2504.19417 [pdf, ps, other]

A Real-Time Event-Based Normal Flow Estimator

Authors: Dehao Yuan, Cornelia Fermüller

Abstract: This paper presents a real-time, asynchronous, event-based normal flow estimator. It follows the same algorithm as Learning Normal Flow Directly From Event Neighborhoods, but with a more optimized implementation. The original method treats event slices as 3D point clouds, encodes each event's local geometry into a fixed-length vector, and uses a multi-layer perceptron to predict normal flow. It constructs representations by multiplying an adjacency matrix with a feature matrix, resulting in quadratic time complexity with respect to the number of events. In contrast, we leverage the fact that event coordinates are integers and reformulate the representation step as a pooling operation. This achieves the same effect as the adjacency matrix but with much lower computational cost. As a result, our method supports real-time normal flow prediction on event cameras. Our estimator uses 1 GB of CUDA memory and runs at 4 million normal flows per second on an RTX 3070, or 6 million per second on an RTX A5000. We release the CUDA implementation along with a Python interface at https://github.com/dhyuan99/VecKM_flow_cpp.

Submitted 27 April, 2025; originally announced April 2025.

arXiv:2504.09416 [pdf, other]

Spatially Directional Dual-Attention GAT for Spatial Fluoride Health Risk Modeling

Authors: Da Yuan

Abstract: Environmental exposure to fluoride is a major public health concern, particularly in regions with naturally elevated fluoride concentrations. Accurate modeling of fluoride-related health risks, such as dental fluorosis, requires spatially aware learning frameworks capable of capturing both geographic and semantic heterogeneity. In this work, we propose Spatially Directional Dual-Attention Graph Attention Network (SDD-GAT), a novel spatial graph neural network designed for fine-grained health risk prediction. SDD-GAT introduces a dual-graph architecture that disentangles geographic proximity and attribute similarity, and incorporates a directional attention mechanism that explicitly encodes spatial orientation and distance into the message passing process. To further enhance spatial coherence, we introduce a spatial smoothness regularization term that enforces consistency in predictions across neighboring locations. We evaluate SDD-GAT on a large-scale dataset covering over 50,000 fluoride monitoring samples and fluorosis records across Guizhou Province, China. Results show that SDD-GAT significantly outperforms traditional models and state-of-the-art GNNs in both regression and classification tasks, while also exhibiting improved spatial autocorrelation as measured by Moran's I. Our framework provides a generalizable foundation for spatial health risk modeling and geospatial learning under complex environmental settings.

Submitted 12 April, 2025; originally announced April 2025.

arXiv:2504.08321 [pdf, ps, other]

doi 10.1029/2025JA033772

Exploring the origin of multi-periodic pulsations during a white-light flare

Authors: Dong Li, Ding Yuan, Jingye Yan, Xinhua Zhao, Zhao Wu, Jincheng Wang, Zhenyong Hou, Chuan Li, Haisheng Zhao, Libo Fu, Lin Wu, Li Deng

Abstract: We explored the quasi-periodic pulsations (QPPs) at multiple periods during an X4.0 flare on 2024 May 10 (SOL2024-05-10T06:27), which occurred in the complex active region of NOAA 13664. The flare radiation reveals five prominent periods in multiple wavelengths. An 8-min QPP is simultaneously detected in wavelengths of HXR, radio, UV/EUV, Lya, and white light, which may be associated with nonthermal electrons periodically accelerated by intermittent magnetic reconnection that is modulated by the slow wave. A quasi-period at 14 minutes is observed in the SXR and high-temperature EUV wavebands, and it may be caused by repeatedly heated plasmas in hot flare loops. A quasi-period at about 18 minutes is only observed by STIX, with reconstructed SXR images suggesting that the 18-min period pulsations should be considered as different flares. Meanwhile, a 3-min QPP is simultaneously detected in wavelengths of HXR, radio, and UV/EUV, which is directly modulated by the slow magnetoacoustic wave leaking from sunspot umbrae. Finally, a 2-min QPP is simultaneously detected in HXR and radio emissions during the pre-flare phase, which is possibly generated by a quasi-periodic regime of magnetic reconnection that is triggered by the kink wave.

Submitted 11 April, 2025; originally announced April 2025.

Comments: 32 pages, 11 figures, accepted to Journal of Geophysical Research-Space Physics

arXiv:2504.06129 [pdf, other]

Knowledge Graph Completion with Relation-Aware Anchor Enhancement

Authors: Duanyang Yuan, Sihang Zhou, Xiaoshu Chen, Dong Wang, Ke Liang, Xinwang Liu, Jian Huang

Abstract: Text-based knowledge graph completion methods take advantage of pre-trained language models (PLM) to enhance intrinsic semantic connections of raw triplets with detailed text descriptions. Typical methods in this branch map an input query (textual descriptions associated with an entity and a relation) and its candidate entities into feature vectors, respectively, and then maximize the probability of valid triples. These methods are gaining promising performance and increasing attention for the rapid development of large language models. According to the property of the language models, the more related and specific context information the input query provides, the more discriminative the resultant embedding will be. In this paper, through observation and validation, we find a neglected fact that the relation-aware neighbors of the head entities in queries could act as effective contexts for more precise link prediction. Driven by this finding, we propose a relation-aware anchor enhanced knowledge graph completion method (RAA-KGC). Specifically, in our method, to provide a reference of what the target entity might be like, we first generate anchor entities within the relation-aware neighborhood of the head entity. Then, by pulling the query embedding towards the neighborhoods of the anchors, it is tuned to be more discriminative for target entity matching.
The results of our extensive experiments not only validate the efficacy of RAA-KGC but also reveal that by integrating our relation-aware anchor enhancement strategy, the performance of current leading methods can be notably enhanced without substantial modifications.

Submitted 30 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

arXiv:2503.24245 [pdf, other]

Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation

Authors: Dun Yuan, Hao Zhou, Di Wu, Xue Liu, Hao Chen, Yan Xin, Jianzhong, Zhang

Abstract: Large language models (LLMs) have made significant progress in general-purpose natural language processing tasks. However, LLMs still face challenges when applied to domain-specific areas like telecommunications, which demand specialized expertise and adaptability to evolving standards. This paper presents a novel framework that combines knowledge graph (KG) and retrieval-augmented generation (RAG) techniques to enhance LLM performance in the telecom domain. The framework leverages a KG to capture structured, domain-specific information about network protocols, standards, and other telecom-related entities, comprehensively representing their relationships. By integrating KG with RAG, LLMs can dynamically access and utilize the most relevant and up-to-date knowledge during response generation. This hybrid approach bridges the gap between structured knowledge representation and the generative capabilities of LLMs, significantly enhancing accuracy, adaptability, and domain-specific comprehension. Our results demonstrate the effectiveness of the KG-RAG framework in addressing complex technical queries with precision. The proposed KG-RAG model attained an accuracy of 88% for question answering tasks on a frequently used telecom-specific dataset, compared to 82% for the RAG-only and 48% for the LLM-only approaches.

Submitted 21 May, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

arXiv:2503.22070 [pdf, ps, other]

Quantum Quasi-neutral Limits and Isothermal Euler Equations

Authors: Immanuel Ben Porat, Gui-Qiang G. Chen, Difan Yuan

Abstract: We provide a rigorous justification of the semiclassical quasi-neutral and the quantum many-body limits to the isothermal Euler equations. We consider the nonlinear Schrödinger-Poisson-Boltzmann system under a quasi-neutral scaling and establish the convergence of its solutions to the isothermal Euler equations. Different from the previous results that dealt with the linear Poisson equations, the system under our consideration accounts for the exponential nonlinearity in the potential. A modulated energy method is adopted, allowing us to derive the stability estimates and asymptotics. Furthermore, we focus our analysis on the many-body quantum problem via the von Neumann equation and establish a mean-field limit in one dimension by using Serfaty's functional inequalities, thus connecting the quantum many-body dynamics with the macroscopic hydrodynamic equations. A refined analysis of the quasi-neutral scaling for the massless systems is presented, and the well-posedness of the underlying quantum dynamics is established. Moreover, the construction of general admissible initial data is obtained. Our results provide a rigorous mathematical analysis for the derivation of quantum hydrodynamic models and their limits, contributing to the broader understanding of interactions between quantum mechanics and compressible fluid dynamics.

Submitted 27 March, 2025; originally announced March 2025.

Comments: 50 pages

arXiv:2503.21506 [pdf]

Yin-Yang vortex on UTe2 (011) surface

Authors: Ruotong Yin, Yuanji Li, Zengyi Du, Dengpeng Yuan, Shiyuan Wang, Jiashuo Gong, Mingzhe Li, Ziyuan Chen, Jiakang Zhang, Yuguang Wang, Ziwei Xue, Xinchun Lai, Shiyong Tan, Da Wang, Qiang-Hua Wang, Dong-Lai Feng, Ya-Jun Yan

Abstract: UTe2 is a promising candidate for spin-triplet superconductor, yet its exact superconducting order parameter remains highly debated. Here, via scanning tunneling microscopy/spectroscopy, we observe a novel type of magnetic vortex with distinct dark-bright contrast in local density of states on UTe2 (011) surface under a perpendicular magnetic field, resembling the conjugate structure of Yin-Yang diagram in Taoism. Each Yin-Yang vortex contains a quantized magnetic flux, and the boundary between the Yin and Yang parts aligns with the crystallographic a-axis of UTe2. The vortex states exhibit intriguing behaviors -- a sharp zero-energy conductance peak exists at the Yang part, while a superconducting gap with pronounced coherence peaks exists at the Yin part, which is even sharper than those measured far from the vortex core or in the absence of magnetic field. By theoretical modeling, we show that the Yin-Yang vortices on UTe2 (011) surface can be explained by the asymmetric vortex-derived local distortion of the zero-energy surface states associated with spin-triplet pairing with appropriate d-vectors. Therefore, the observation of Yin-Yang vortex confirms the spin-triplet pairing in UTe2 and imposes constraints on the candidate d-vector for the spin-triplet pairing.

Submitted 21 May, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

Comments: 13 pages, 4 figures

arXiv:2503.16758 [pdf, other]

Nonlinear stability of compressible vortex sheets in three-dimensional elastodynamics

Authors: Robin Ming Chen, Feimin Huang, Dehua Wang, Difan Yuan

Abstract: We investigate the nonlinear stability of compressible vortex sheet solutions for three-dimensional (3D) isentropic elastic flows. Building upon previous results on the weakly linear stability of elastic vortex sheets [19], we perform a detailed study of the roots of the Lopatinskii determinant and identify a geometric stability condition associated with the deformation gradient. We employ an upper triangularization technique that isolates the outgoing modes into a closed system, where they appear only at the leading order. This enables us to derive energy estimates despite derivative loss. The major novelty of our approach includes the following two key aspects: (1) For the 3D compressible Euler vortex sheets, the front symbol exhibits degenerate ellipticity in certain frequency directions, which makes it challenging to ensure the front's regularity using standard energy estimates. Our analysis reveals that the non-parallel structure of the deformation gradient tensor plays a crucial role in recovering ellipticity in the front symbol, thereby enhancing the regularity of the free interface. (2) Another significant challenge in 3D arises from the strong degeneracy caused by the collision of repeated roots and poles. Unlike in 2D, where such interactions are absent, we encounter a co-dimension one set in frequency space where a double root coincides with a double pole.
To resolve this, we refine Coulombel's diagonalization framework [21] and construct a suitable transformation that reduces the degeneracy order of the Lopatinskii matrix, enabling the use of localized Garding-type estimates to control the characteristic components. Finally, we employ a Nash-Moser iteration scheme to establish the local existence and nonlinear stability of vortex sheets under small initial perturbations, showing stability within a subsonic regime.

Submitted 20 March, 2025; originally announced March 2025.

MSC Class: 35Q51; 35Q35; 74F10; 76E17; 76N99

arXiv:2503.16300 [pdf, other]

Localized Heating and Dynamics of the Solar Corona due to a Symbiosis of Waves and Reconnection

Authors: A. K. Srivastava, Sripan Mondal, Eric R. Priest, Sudheer K. Mishra, David I. Pontin, R. Y. Kwon, Ding Yuan, K. Murawski, Ayumi Asai

Abstract: The Sun's outer atmosphere, the corona, is maintained at mega-Kelvin temperatures and fills the heliosphere with a supersonic outflowing wind. The dissipation of magnetic waves and direct electric currents are likely to be the most significant processes for heating the corona, but a lively debate exists on their relative roles. Here, we suggest that the two are often intrinsically linked, since magnetic waves may trigger current dissipation, and impulsive reconnection can launch magnetic waves. We present a study of the first of these processes by using a 2D physics-based numerical simulation using the Adaptive Mesh Refined (AMR) Versatile Advection Code (VAC). Magnetic waves such as fast magnetoacoustic waves are often observed to propagate in the large-scale corona and interact with local magnetic structures. The present numerical simulations show how the propagation of magnetic disturbances towards a null point or separator can lead to the accumulation of the electric currents. Lorentz forces can laterally push and vertically stretch the magnetic fields, forming a current sheet with a strong magnetic-field gradient. The magnetic field lines then break and reconnect, and so contribute towards coronal heating. Numerical results are presented that support these ideas and support the concept of a symbiosis between waves and reconnection in heating the solar corona.

Submitted 20 March, 2025; originally announced March 2025.

Comments: 13 pages, 6 figures; Accepted for the publication in ApJ

arXiv:2503.07669 [pdf, other]

WECAR: An End-Edge Collaborative Inference and Training Framework for WiFi-Based Continuous Human Activity Recognition

Authors: Rong Li, Tao Deng, Siwei Feng, He Huang, Juncheng Jia, Di Yuan, Keqin Li

Abstract: WiFi-based human activity recognition (HAR) holds significant promise for ubiquitous sensing in smart environments. A critical challenge lies in enabling systems to dynamically adapt to evolving scenarios, learning new activities without catastrophic forgetting of prior knowledge, while adhering to the stringent computational constraints of edge devices. Current approaches struggle to reconcile these requirements due to prohibitive storage demands for retaining historical data and inefficient parameter utilization. We propose WECAR, an end-edge collaborative inference and training framework for WiFi-based continuous HAR, which decouples computational workloads to overcome these limitations. In this framework, edge devices handle model training, lightweight optimization, and updates, while end devices perform efficient inference. WECAR introduces two key innovations, i.e., dynamic continual learning with parameter efficiency and hierarchical distillation for end deployment. For the former, we propose a transformer-based architecture enhanced by task-specific dynamic model expansion and stability-aware selective retraining. For the latter, we propose a dual-phase distillation mechanism that includes multi-head self-attention relation distillation and prefix relation distillation. We implement WECAR based on heterogeneous hardware using Jetson Nano as edge devices and the ESP32 as end devices, respectively. Our experiments across three public WiFi datasets reveal that WECAR not only outperforms several state-of-the-art methods in performance and parameter efficiency, but also achieves a substantial reduction in the model's parameter count post-optimization without sacrificing accuracy. This validates its practicality for resource-constrained environments.

Submitted 8 March, 2025; originally announced March 2025.

Comments: arXiv admin note: text overlap with arXiv:2502.17483

arXiv:2503.06468 [pdf, other]

Mobility-Aware Multi-Task Decentralized Federated Learning for Vehicular Networks: Modeling, Analysis, and Optimization

Authors: Dongyu Chen, Tao Deng, He Huang, Juncheng Jia, Mianxiong Dong, Di Yuan, Keqin Li

Abstract: Federated learning (FL) is a promising paradigm that can enable collaborative model training between vehicles while protecting data privacy, thereby significantly improving the performance of intelligent transportation systems (ITSs). In vehicular networks, due to mobility, resource constraints, and the concurrent execution of multiple training tasks, how to allocate limited resources effectively to achieve optimal model training of multiple tasks is an extremely challenging issue. In this paper, we propose a mobility-aware multi-task decentralized federated learning (MMFL) framework for vehicular networks. By this framework, we address task scheduling, subcarrier allocation, and leader selection, as a joint optimization problem, termed as TSLP. For the case with a single FL task, we derive the convergence bound of model training. For general cases, we first model TSLP as a resource allocation game, and prove the existence of a Nash equilibrium (NE). Then, based on this proof, we reformulate the game as a decentralized partially observable Markov decision process (DEC-POMDP), and develop an algorithm based on heterogeneous-agent proximal policy optimization (HAPPO) to solve DEC-POMDP. Finally, numerical results are used to demonstrate the effectiveness of the proposed algorithm.

Submitted 9 March, 2025; originally announced March 2025.

Comments: Submitted to IEEE for possible publication

arXiv:2503.06443 [pdf, other]

doi 10.1016/j.comnet.2025.111232

Mobility-Aware Decentralized Federated Learning with Joint Optimization of Local Iteration and Leader Selection for Vehicular Networks

Authors: Dongyu Chen, Tao Deng, Juncheng Jia, Siwei Feng, Di Yuan

Abstract: Federated learning (FL) emerges as a promising approach to empower vehicular networks, composed of intelligent connected vehicles equipped with advanced sensing, computing, and communication capabilities. While previous studies have explored the application of FL in vehicular networks, they have largely overlooked the intricate challenges arising from the mobility of vehicles and resource constraints. In this paper, we propose a framework of mobility-aware decentralized federated learning (MDFL) for vehicular networks. In this framework, nearby vehicles train an FL model collaboratively, yet in a decentralized manner. We formulate a local iteration and leader selection joint optimization problem (LSOP) to improve the training efficiency of MDFL. For problem solving, we first reformulate LSOP as a decentralized partially observable Markov decision process (Dec-POMDP), and then develop an effective optimization algorithm based on multi-agent proximal policy optimization (MAPPO) to solve Dec-POMDP. Finally, we verify the performance of the proposed algorithm by comparing it with other algorithms.

Submitted 11 March, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

Comments: Preprint submitted to Computer Networks; Corrected a missing space in arXiv abstract to ensure proper formatting

arXiv:2503.01116 [pdf, other]

Large AI Model for Delay-Doppler Domain Channel Prediction in 6G OTFS-Based Vehicular Networks

Authors: Jianzhe Xue, Dongcheng Yuan, Zhanxi Ma, Tiankai Jiang, Yu Sun, Haibo Zhou, Xuemin Shen

Abstract: Channel prediction is crucial for high-mobility vehicular networks, as it enables the anticipation of future channel conditions and the proactive adjustment of communication strategies. However, achieving accurate vehicular channel prediction is challenging due to significant Doppler effects and rapid channel variations resulting from high-speed vehicle movement and complex propagation environments. In this paper, we propose a novel delay-Doppler (DD) domain channel prediction framework tailored for high-mobility vehicular networks. By transforming the channel representation into the DD domain, we obtain an intuitive, sparse, and stable depiction that closely aligns with the underlying physical propagation processes, effectively reducing the complex vehicular channel to a set of time-series parameters with enhanced predictability. Furthermore, we leverage the large artificial intelligence (AI) model to predict these DD-domain time-series parameters, capitalizing on their advanced ability to model temporal correlations. The zero-shot capability of the pre-trained large AI model facilitates accurate channel predictions without requiring task-specific training, while subsequent fine-tuning on specific vehicular channel data further improves prediction accuracy. Extensive simulation results demonstrate the effectiveness of our DD-domain channel prediction framework and the superior accuracy of the large AI model in predicting time-series channel parameters, thereby highlighting the potential of our approach for robust vehicular communication systems.

Submitted 8 May, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

Comments: This manuscript has been accepted by SCIENCE CHINA Information Sciences

arXiv:2502.13094 [pdf, other]

Global Existence and Nonlinear Stability of Finite-Energy Solutions of the Compressible Euler-Riesz Equations with Large Initial Data of Spherical Symmetry

Authors: José A. Carrillo, Samuel R. Charles, Gui-Qiang G. Chen, Difan Yuan

Abstract: The compressible Euler-Riesz equations are fundamental with wide applications in astrophysics, plasma physics, and mathematical biology. In this paper, we are concerned with the global existence and nonlinear stability of finite-energy solutions of the multidimensional Euler-Riesz equations with large initial data of spherical symmetry. We consider both attractive and repulsive interactions for a wide range of Riesz and logarithmic potentials for dimensions larger than or equal to two. This is achieved by the inviscid limit of the solutions of the corresponding Cauchy problem for the Navier-Stokes-Riesz equations. The strong convergence of the vanishing viscosity solutions is achieved through delicate uniform estimates in $L^p$. It is observed that, even if the attractive potential is super-Coulomb, no concentration is formed near the origin in the inviscid limit. Moreover, we prove that the nonlinear stability of global finite-energy solutions for the Euler-Riesz equations is unconditional under a spherically symmetric perturbation around the steady solutions. Unlike the Coulomb case where the potential can be represented locally, the singularity and regularity of the nonlocal radial Riesz potential near the origin require careful analysis, which is a crucial step. Finally, unlike the Coulomb case, a Grönwall type estimate is required to overcome the difficulty of the appearance of boundary terms in the sub-Coulomb case and the singularity of the super-Coulomb potential. Furthermore, we prove the nonlinear stability of global finite-energy solutions for the compressible Euler-Riesz equations around steady states by employing concentration compactness arguments. Steady states properties are obtained by variational arguments connecting to recent advances in aggregation-diffusion equations.

Submitted 18 February, 2025; originally announced February 2025.

Comments: 68 pages, 1 figure

MSC Class: 35Q35; 35Q31; 35B25; 35B44; 35L65; 35L67; 76N10; 35R09; 35R35; 35D30; 76X05; 76N17

arXiv:2501.13794 [pdf, ps, other]

Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction

Authors: Zhi Sheng, Daisy Yuan, Jingtao Ding, Yong Li

Abstract: Accurate prediction of mobile traffic, i.e., network traffic from cellular base stations, is crucial for optimizing network performance and supporting urban development. However, the non-stationary nature of mobile traffic, driven by human activity and environmental changes, leads to both regular patterns and abrupt variations. Diffusion models excel in capturing such complex temporal dynamics due to their ability to capture the inherent uncertainties. Most existing approaches prioritize designing novel denoising networks but often neglect the critical role of noise itself, potentially leading to sub-optimal performance. In this paper, we introduce a novel perspective by emphasizing the role of noise in the denoising process. Our analysis reveals that noise fundamentally shapes mobile traffic predictions, exhibiting distinct and consistent patterns. We propose NPDiff, a framework that decomposes noise into prior and residual components, with the prior derived from data dynamics, enhancing the model's ability to capture both regular and abrupt variations. NPDiff can seamlessly integrate with various diffusion-based prediction models, delivering predictions that are effective, efficient, and robust. Extensive experiments demonstrate that it achieves superior performance with an improvement over 30%, offering a new perspective on leveraging diffusion models in this domain. We provide code and data at https://github.com/tsinghua-fib-lab/NPDiff.

Submitted 26 June, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.07879 [pdf, ps, other]

Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal

Authors: Deheng Yuan, Tao Guo, Zhongyi Huang

Abstract: Consider the communication-constrained problem of nonparametric function estimation, in which each distributed terminal holds multiple i.i.d. samples. Under certain regularity assumptions, we characterize the minimax optimal rates for all regimes, and identify phase transitions of the optimal rates as the samples per terminal vary from sparse to dense. This fully solves the problem left open by previous works, whose scopes are limited to regimes with either dense samples or a single sample per terminal. To achieve the optimal rates, we design a layered estimation protocol by exploiting protocols for the parametric density estimation problem. We show the optimality of the protocol using information-theoretic methods and strong data processing inequalities, and incorporating the classic balls and bins model. The optimal rates are immediate for various special cases such as density estimation, Gaussian, binary, Poisson and heteroskedastic regression models.

Submitted 14 January, 2025; originally announced January 2025.

arXiv:2501.06255 [pdf, other]

Progressive Supervision via Label Decomposition: An Long-Term and Large-Scale Wireless Traffic Forecasting Method

Authors: Daojun Liang, Haixia Zhang, Dongfeng Yuan

Abstract: Long-term and Large-scale Wireless Traffic Forecasting (LL-WTF) is pivotal for strategic network management and comprehensive planning on a macro scale. However, LL-WTF poses greater challenges than short-term ones due to the pronounced non-stationarity of extended wireless traffic and the vast number of nodes distributed at the city scale. To cope with this, we propose a Progressive Supervision method based on Label Decomposition (PSLD). Specifically, we first introduce a Random Subgraph Sampling (RSS) algorithm designed to sample a tractable subset from large-scale traffic data, thereby enabling efficient network training. Then, PSLD employs label decomposition to obtain multiple easy-to-learn components, which are learned progressively at shallow layers and combined at deep layers to effectively cope with the non-stationary problem raised by LL-WTF tasks. Finally, we compare the proposed method with various state-of-the-art (SOTA) methods on three large-scale WT datasets. Extensive experimental results demonstrate that the proposed PSLD significantly outperforms existing methods, with an average 2%, 4%, and 11% performance improvement on three WT datasets, respectively. In addition, we built an open source library for WT forecasting (WTFlib) to facilitate related research, which contains numerous SOTA methods and provides a strong benchmark. Experiments can be reproduced through https://github.com/Anoise/WTFlib.

Submitted 8 January, 2025; originally announced January 2025.

Comments: Published at Knowledge-Based Systems. arXiv admin note: substantial text overlap with arXiv:2412.00108

arXiv:2501.04688 [pdf, other]

Observation of topological prethermal strong zero modes

Authors: Feitong Jin, Si Jiang, Xuhao Zhu, Zehang Bao, Fanhao Shen, Ke Wang, Zitian Zhu, Shibo Xu, Zixuan Song, Jiachen Chen, Ziqi Tan, Yaozu Wu, Chuanyu Zhang, Yu Gao, Ning Wang, Yiren Zou, Aosai Zhang, Tingting Li, Jiarun Zhong, Zhengyi Cui, Yihang Han, Yiyang He, Han Wang, Jianan Yang, Yanzhe Wang, et al. (20 additional authors not shown)

Abstract: Symmetry-protected topological phases cannot be described by any local order parameter and are beyond the conventional symmetry-breaking paradigm for understanding quantum matter. They are characterized by topological boundary states robust against perturbations that respect the protecting symmetry. In a clean system without disorder, these edge modes typically only occur for the ground states of systems with a bulk energy gap and would not survive at finite temperatures due to mobile thermal excitations. Here, we report the observation of a distinct type of topological edge modes, which are protected by emergent symmetries and persist even up to infinite temperature, with an array of 100 programmable superconducting qubits. In particular, through digital quantum simulation of the dynamics of a one-dimensional disorder-free "cluster" Hamiltonian, we observe robust long-lived topological edge modes over up to 30 cycles at a wide range of temperatures. By monitoring the propagation of thermal excitations, we show that despite the free mobility of these excitations, their interactions with the edge modes are substantially suppressed in the dimerized regime due to an emergent U(1)$\times$U(1) symmetry, resulting in an unusually prolonged lifetime of the topological edge modes even at infinite temperature. In addition, we exploit these topological edge modes as logical qubits and prepare a logical Bell state, which exhibits persistent coherence in the dimerized and off-resonant regime, despite the system being disorder-free and far from its ground state. Our results establish a viable digital simulation approach to experimentally exploring a variety of finite-temperature topological phases and demonstrate a potential route to construct long-lived robust boundary qubits that survive to infinite temperature in disorder-free systems.

Submitted 8 January, 2025; originally announced January 2025.

arXiv:2412.19906 [pdf, other]

Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM

Authors: Dong Yuan, Eti Rastogi, Fen Zhao, Sagar Goyal, Gautam Naik, Sree Prasanna Rajagopal

Abstract: Due to the exponential growth of information and the need for efficient information consumption, the task of summarization has gained paramount importance. Evaluating summarization accurately and objectively presents significant challenges, particularly when dealing with long and unstructured texts rich in content. Existing methods, such as ROUGE (Lin, 2004) and embedding similarities, often yield scores that have low correlation with human judgements and are also not intuitively understandable, making it difficult to gauge the true quality of the summaries. LLMs can mimic humans in giving subjective reviews, but subjective scores are hard to interpret and justify. They can be easily manipulated by altering the models and the tones of the prompts. In this paper, we introduce a novel evaluation methodology and tooling designed to address these challenges, providing a more comprehensive, accurate and interpretable assessment of summarization outputs. Our method (SumAutoEval) proposes and evaluates metrics at varying granularity levels, giving objective scores on 4 key dimensions such as completeness, correctness, alignment and readability. We empirically demonstrate that SumAutoEval enhances the understanding of output quality with better human correlation.

Submitted 27 December, 2024; originally announced December 2024.

arXiv:2412.19191 [pdf, other]

Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models

Authors: Haonan He, Yuchen Ren, Yining Tang, Ziyang Xu, Junxian Li, Minghao Yang, Di Zhang, Dong Yuan, Tao Chen, Shufei Zhang, Yuqiang Li, Nanqing Dong, Wanli Ouyang, Dongzhan Zhou, Peng Ye

Abstract: Large language models have already demonstrated their formidable capabilities in general domains, ushering in a revolutionary transformation. However, exploring and exploiting the extensive knowledge of these models to comprehend multi-omics biology remains underexplored. To fill this research gap, we first introduce Biology-Instructions, the first large-scale multi-omics biological sequences-related instruction-tuning dataset including DNA, RNA, proteins, and multi-molecules, designed to bridge the gap between large language models (LLMs) and complex biological sequences-related tasks. This dataset can enhance the versatility of LLMs by integrating diverse biological sequence-based prediction tasks with advanced reasoning capabilities, while maintaining conversational fluency. Additionally, we reveal significant performance limitations in even state-of-the-art LLMs on biological sequence-related multi-omics tasks without specialized pre-training and instruction-tuning. We further develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline, demonstrating the powerful ability to understand biology by using Biology-Instructions. Biology-Instructions and ChatMultiOmics are publicly available and crucial resources for enabling more effective integration of LLMs with multi-omics sequence analysis.

Submitted 26 December, 2024; originally announced December 2024.

arXiv:2412.11540 [pdf, other]

SP$^2$T: Sparse Proxy Attention for Dual-stream Point Transformer

Authors: Jiaxu Wan, Hong Zhang, Ziqi He, Qishu Wang, Ding Yuan, Yifan Yang

Abstract: In 3D understanding, point transformers have yielded significant advances in broadening the receptive field. However, further enhancement of the receptive field is hindered by the constraints of grouping attention. The proxy-based model, as a hot topic in image and language feature extraction, uses global or local proxies to expand the model's receptive field. But global proxy-based methods fail to precisely determine proxy positions and are not suited for tasks like segmentation and detection in the point cloud, and existing local proxy-based methods for images face difficulties in global-local balance, proxy sampling in various point clouds, and parallel cross-attention computation for sparse association. In this paper, we present SP$^2$T, a local proxy-based dual stream point transformer, which promotes global receptive field while maintaining a balance between local and global information. To tackle robust 3D proxy sampling, we propose a spatial-wise proxy sampling with vertex-based point proxy associations, ensuring robust point-cloud sampling in many scales of point cloud. To resolve economical association computation, we introduce sparse proxy attention combined with table-based relative bias, which enables low-cost and precise interactions between proxy and point features. Comprehensive experiments across multiple datasets reveal that our model achieves SOTA performance in downstream tasks. The code has been released at https://github.com/TerenceWallel/Sparse-Proxy-Point-Transformer .

Submitted 16 December, 2024; originally announced December 2024.

Comments: 13 pages, 14 figures, 14 tables

arXiv:2412.11284 [pdf, other]

Learning Normal Flow Directly From Event Neighborhoods

Authors: Dehao Yuan, Levi Burner, Jiayi Wu, Minghui Liu, Jingxi Chen, Yiannis Aloimonos, Cornelia Fermüller

Abstract: Event-based motion field estimation is an important task. However, current optical flow methods face challenges: learning-based approaches, often frame-based and relying on CNNs, lack cross-domain transferability, while model-based methods, though more robust, are less accurate. To address the limitations of optical flow estimation, recent works have focused on normal flow, which can be more reliably measured in regions with limited texture or strong edges. However, existing normal flow estimators are predominantly model-based and suffer from high errors. In this paper, we propose a novel supervised point-based method for normal flow estimation that overcomes the limitations of existing event learning-based approaches. Using a local point cloud encoder, our method directly estimates per-event normal flow from raw events, offering multiple unique advantages: 1) It produces temporally and spatially sharp predictions. 2) It supports more diverse data augmentation, such as random rotation, to improve robustness across various domains. 3) It naturally supports uncertainty quantification via ensemble inference, which benefits downstream tasks. 4) It enables training and inference on undistorted data in normalized camera coordinates, improving transferability across cameras. Extensive experiments demonstrate our method achieves better and more consistent performance than state-of-the-art methods when transferred across different datasets. Leveraging this transferability, we train our model on the union of datasets and release it for public use. Finally, we introduce an egomotion solver based on a maximum-margin problem that uses normal flow and IMU to achieve strong performance in challenging scenarios.

Submitted 15 December, 2024; originally announced December 2024.

arXiv:2412.10347 [pdf, other]

COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models

Authors: Yuchen Ren, Wenwei Han, Qianyuan Zhang, Yining Tang, Weiqiang Bai, Yuchen Cai, Lifeng Qiao, Hao Jiang, Dong Yuan, Tao Chen, Siqi Sun, Pan Tan, Wanli Ouyang, Nanqing Dong, Xinzhu Ma, Peng Ye

Abstract: As key elements within the central dogma, DNA, RNA, and proteins play crucial roles in maintaining life by guaranteeing accurate genetic expression and implementation. Although research on these molecules has profoundly impacted fields like medicine, agriculture, and industry, the diversity of machine learning approaches, from traditional statistical methods to deep learning models and large language models, poses challenges for researchers in choosing the most suitable models for specific tasks, especially for cross-omics and multi-omics tasks due to the lack of comprehensive benchmarks. To address this, we introduce the first comprehensive multi-omics benchmark COMET (Benchmark for Biological COmprehensive Multi-omics Evaluation Tasks and Language Models), designed to evaluate models across single-omics, cross-omics, and multi-omics tasks. First, we curate and develop a diverse collection of downstream tasks and datasets covering key structural and functional aspects in DNA, RNA, and proteins, including tasks that span multiple omics levels. Then, we evaluate existing foundational language models for DNA, RNA, and proteins, as well as the newly proposed multi-omics method, offering valuable insights into their performance in integrating and analyzing data from different biological modalities. This benchmark aims to define critical issues in multi-omics research and guide future directions, ultimately promoting advancements in understanding biological processes through integrated and different omics data analysis.

Submitted 13 December, 2024; originally announced December 2024.

arXiv:2412.10182 [pdf, other]

doi 10.1109/TPAMI.2024.3522298

Multi-Head Encoding for Extreme Label Classification

Authors: Daojun Liang, Haixia Zhang, Dongfeng Yuan, Minggao Zhang

Abstract: The number of categories of instances in the real world is normally huge, and each instance may contain multiple labels. To distinguish these massive labels utilizing machine learning, eXtreme Label Classification (XLC) has been established. However, as the number of categories increases, the number of parameters and nonlinear operations in the classifier also rises. This results in a Classifier Computational Overload Problem (CCOP). To address this, we propose a Multi-Head Encoding (MHE) mechanism, which replaces the vanilla classifier with a multi-head classifier. During the training process, MHE decomposes extreme labels into the product of multiple short local labels, with each head trained on these local labels. During testing, the predicted labels can be directly calculated from the local predictions of each head. This reduces the computational load geometrically. Then, according to the characteristics of different XLC tasks, e.g., single-label, multi-label, and model pretraining tasks, three MHE-based implementations, i.e., Multi-Head Product, Multi-Head Cascade, and Multi-Head Sampling, are proposed to more effectively cope with CCOP. Moreover, we theoretically demonstrate that MHE can achieve performance approximately equivalent to that of the vanilla classifier by generalizing the low-rank approximation problem from Frobenius-norm to Cross-Entropy. Experimental results show that the proposed methods achieve state-of-the-art performance while significantly streamlining the training and inference processes of XLC tasks.
The source code has been made public at https://github.com/Anoise/MHE.

Submitted 13 December, 2024; originally announced December 2024.

Comments: 20 pages, 12 figs, Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2024

arXiv:2412.07761 [pdf, other]

Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

Authors: Jingxi Chen, Brandon Y. Feng, Haoming Cai, Tianfu Wang, Levi Burner, Dehao Yuan, Cornelia Fermuller, Christopher A. Metzler, Yiannis Aloimonos

Abstract: Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate video. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance. This guidance allows EVFI methods to significantly outperform frame-only methods. However, to date, EVFI methods have relied on a limited set of paired event-frame training data, severely limiting their performance and generalization capabilities. In this work, we overcome the limited data challenge by adapting pre-trained video diffusion models trained on internet-scale datasets to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing methods and generalizes across cameras far better than existing approaches.

Submitted 25 March, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

Comments: Accepted to CVPR 2025

arXiv:2412.00108 [pdf, other]

Act Now: A Novel Online Forecasting Framework for Large-Scale Streaming Data

Authors: Daojun Liang, Haixia Zhang, Jing Wang, Dongfeng Yuan, Minggao Zhang

Abstract: In this paper, we find that existing online forecasting methods have the following issues: 1) They do not consider the update frequency of streaming data and directly use labels (future signals) to update the model, leading to information leakage. 2) Eliminating information leakage can exacerbate concept drift and online parameter updates can damage prediction accuracy. 3) Leaving out a validation set cuts off the model's continued learning. 4) Existing GPU devices cannot support online learning of large-scale streaming data. To address the above issues, we propose a novel online learning framework, Act-Now, to improve the online prediction on large-scale streaming data. Firstly, we introduce a Random Subgraph Sampling (RSS) algorithm designed to enable efficient model training. Then, we design a Fast Stream Buffer (FSB) and a Slow Stream Buffer (SSB) to update the model online. FSB updates the model immediately with the consistent pseudo- and partial labels to avoid information leakage. SSB updates the model in parallel using complete labels from earlier times. Further, to address concept drift, we propose a Label Decomposition model (Lade) with statistical and normalization flows. Lade forecasts both the statistical variations and the normalized future values of the data, integrating them through a combiner to produce the final predictions. Finally, we propose to perform online updates on the validation set to ensure the consistency of model learning on streaming data.
Extensive experiments demonstrate that the proposed Act-Now framework performs well on large-scale streaming data, with an average 28.4% and 19.5% performance improvement, respectively. Experiments can be reproduced via https://github.com/Anoise/Act-Now.

Submitted 27 November, 2024; originally announced December 2024.

Comments: 12 pages, 8 figures

arXiv:2411.08043 [pdf, other]

Graph-GIC: A Smart and Parallelized Geomagnetically Induced Current Modelling Algorithm Based on Graph Theory for Space Weather Applications

Authors: Wen Chen, Ding Yuan, Xueshang Feng, Stefaan Poedts, Zhengyang Zou, Song Feng, Yuxuan Zhu, Tong Yin

Abstract: Geomagnetically Induced Current (GIC) refers to the electromagnetic response of the Earth and its conductive modern infrastructures to space weather and would pose a significant threat to high-voltage power grids designed for alternating current operation. To assess the impact of space weather on the power grid, one needs to calculate the GIC on a national or continental scale. In this study, we developed a smart and parallelized GIC modelling algorithm, Graph GIC. This algorithm deploys a graph representing a power grid in a single-line diagram, in which substations/transformers act as nodes and transmission lines as edges. With these denotations, a power grid and its electric parameters are mathematically represented with an adjacency matrix and an admittance matrix. We used sparse matrix and parallelisation techniques to expedite the intensive computation in cases of large-scale power grids. The Graph GIC was validated with a benchmark grid, applied to the GIC calculation of the 500 kV power grid of Guangdong, China, and used for a preliminary analysis of the grid's susceptibility to geomagnetic storms. The Graph GIC algorithm has the advantage of an intuitive and highly scalable graph representation of a power grid at any scale. It achieves high-accuracy calculation and a speedup of about 18 times after parallelisation. This algorithm could be applied to assess the impact of space weather on a power grid up to continental scales and could be incorporated into global space weather modelling frameworks.

Submitted 29 October, 2024; originally announced November 2024.

Comments: 19 pages, 10 figures

arXiv:2411.04136 [pdf, other]

Large Language Models for Wireless Networks: An Overview from the Prompt Engineering Perspective

Authors: Hao Zhou, Chengming Hu, Dun Yuan, Ye Yuan, Di Wu, Xi Chen, Hina Tabassum, Xue Liu

Abstract: Recently, large language models (LLMs) have been successfully applied to many fields, showing outstanding comprehension and reasoning capabilities. Despite their great potential, LLMs usually require dedicated pre-training and fine-tuning for domain-specific applications such as wireless networks. These adaptations can be extremely demanding for computational resources and datasets, while most network devices have limited computation power, and there are a limited number of high-quality networking datasets. To this end, this work explores LLM-enabled wireless networks from the prompt engineering perspective, i.e., designing prompts to guide LLMs to generate desired output without updating LLM parameters. Compared with other LLM-driven methods, prompt engineering can better align with the demands of wireless network devices, e.g., higher deployment flexibility, rapid response time, and lower requirements on computation power. In particular, this work first introduces LLM fundamentals and compares different prompting techniques such as in-context learning, chain-of-thought, and self-refinement. Then we propose two novel prompting schemes for network applications: iterative prompting for network optimization, and self-refined prompting for network prediction.
The case studies show that the proposed schemes can achieve performance comparable to that of conventional machine learning techniques, and our proposed prompting-based methods avoid the complexity of dedicated model training and fine-tuning, which is one of the key bottlenecks of existing machine learning techniques.

Submitted 27 December, 2024; v1 submitted 26 October, 2024; originally announced November 2024.

arXiv:2411.02180 [pdf, other]

Generation of fast magnetoacoustic waves in the corona by impulsive bursty reconnection

Authors: Sripan Mondal, A. K. Srivastava, David I. Pontin, Eric R. Priest, R. Kwon, Ding Yuan

Abstract: Fast-mode magnetohydrodynamic (MHD) waves in the solar corona are often known to be produced by solar flares and eruptive prominences. We here simulate the effect of the interaction of an external perturbation on a magnetic null in the solar corona which results in the formation of a current sheet (CS). Once the CS undergoes a sufficient extension in its length and squeezing of its width, it may go unstable to the formation of multiple impulsive plasmoids. Eventually, the plasmoids merge with one another to form larger plasmoids and/or are expelled from the sheet. The formation, motion and coalescence of plasmoids with each other and with magnetic Y-points at the outer periphery of the extended CS are found to generate wave-like perturbations. An analysis of the resultant quasi-periodic variations of pressure, density, velocity and magnetic field at certain locations in the model corona indicates that these waves are predominantly fast-mode magnetoacoustic waves. For typical coronal parameters, the resultant propagating waves carry an energy flux of $10^{5}~\mathrm{erg~cm^{-2}~s^{-1}}$ to a large distance of at least 60 Mm away from the current sheet. In general, we suggest that both waves and reconnection play a role in heating the solar atmosphere and driving the solar wind and may interact with one another in a manner that we refer to as a "Symbiosis of WAves and Reconnection (SWAR)".

Submitted 4 November, 2024; originally announced November 2024.

Comments: 24 pages, 13 figures, Accepted for publication in The Astrophysical Journal

arXiv:2411.01915 [pdf, other]

RoboCrowd: Scaling Robot Data Collection through Crowdsourcing

Authors: Suvir Mirchandani, David D. Yuan, Kaylee Burns, Md Sazzad Islam, Tony Z. Zhao, Chelsea Finn, Dorsa Sadigh

Abstract: In recent years, imitation learning from large-scale human demonstrations has emerged as a promising paradigm for training robot policies. However, the burden of collecting large quantities of human demonstrations is significant in terms of collection time and the need for access to expert operators. We introduce a new data collection paradigm, RoboCrowd, which distributes the workload by utilizing crowdsourcing principles and incentive design. RoboCrowd helps enable scalable data collection and facilitates more efficient learning of robot policies. We build RoboCrowd on top of ALOHA (Zhao et al. 2023) -- a bimanual platform that supports data collection via puppeteering -- to explore the design space for crowdsourcing in-person demonstrations in a public environment. We propose three classes of incentive mechanisms to appeal to users' varying sources of motivation for interacting with the system: material rewards, intrinsic interest, and social comparison. We instantiate these incentives through tasks that include physical rewards, engaging or challenging manipulations, as well as gamification elements such as a leaderboard. We conduct a large-scale, two-week field experiment in which the platform is situated in a university cafe. We observe significant engagement with the system -- over 200 individuals independently volunteered to provide a total of over 800 interaction episodes. Our findings validate the proposed incentives as mechanisms for shaping users' data quantity and quality.
Further, we demonstrate that the crowdsourced data can serve as useful pre-training data for policies fine-tuned on expert demonstrations -- boosting performance up to 20% compared to when this data is not available. These results suggest the potential for RoboCrowd to reduce the burden of robot data collection by carefully implementing crowdsourcing and incentive design principles.

Submitted 21 May, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

Comments: 21 pages, 25 figures. International Conference on Robotics and Automation (ICRA) 2025

arXiv:2410.23752 [pdf, other]

A Peaceman-Rachford Splitting Approach with Deep Equilibrium Network for Channel Estimation

Authors: Dingli Yuan, Shitong Wu, Haoran Tang, Lu Yang, Chenghui Peng

Abstract: Multiple-input multiple-output (MIMO) is pivotal for wireless systems, yet its high-dimensional, stochastic channel poses significant challenges for accurate estimation, highlighting the critical need for robust estimation techniques. In this paper, we introduce a novel channel estimation method for the MIMO system. The main idea is to construct a fixed-point equation for channel estimation, which can be implemented into the deep equilibrium (DEQ) model with a fixed network. Specifically, the Peaceman-Rachford (PR) splitting method is applied to the dual form of the regularized minimization problem to construct a fixed-point equation with the non-expansive property. Then, the fixed-point equation is implemented into the DEQ model with a fixed layer, leveraging its advantage of low training complexity. Moreover, we provide a rigorous theoretical analysis, demonstrating the convergence and optimality of our approach. Additionally, simulations of hybrid far- and near-field channels demonstrate that our approach yields favorable results, indicating its ability to advance channel estimation in MIMO systems.

Submitted 7 January, 2025; v1 submitted 31 October, 2024; originally announced October 2024.

arXiv:2410.16947 [pdf, ps, other]

ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images

Authors: Nabil Jabareen, Dongsheng Yuan, Sören Lukassen

Abstract: This paper demonstrates that spatial information can be used to learn interpretable representations in medical images using Self-Supervised Learning (SSL). Our proposed method, ISImed, is based on the observation that medical images exhibit a much lower variability among different images compared to classic data vision benchmarks. By leveraging this resemblance of human body structures across multiple images, we establish a self-supervised objective that creates a latent representation capable of capturing its location in the physical realm. More specifically, our method involves sampling image crops and creating a distance matrix that compares the learned representation vectors of all possible combinations of these crops to the true distance between them. The intuition is that the learned latent space is a positional encoding for a given image crop. We hypothesize that by learning these positional encodings, comprehensive image representations have to be generated. To test this hypothesis and evaluate our method, we compare our learned representation with two state-of-the-art SSL benchmarking methods on two publicly available medical imaging datasets. We show that our method can efficiently learn representations that capture the underlying structure of the data and can be used to transfer to a downstream classification task.

Submitted 22 October, 2024; originally announced October 2024.

Comments: 11 pages, 4 figures

arXiv:2410.15455 [pdf, other]

Observation of quantum information collapse-and-revival in a strongly-interacting Rydberg atom array

Authors: De-Sheng Xiang, Yao-Wen Zhang, Hao-Xiang Liu, Peng Zhou, Dong Yuan, Kuan Zhang, Shun-Yao Zhang, Biao Xu, Lu Liu, Yitong Li, Lin Li

Abstract: Interactions of isolated quantum many-body systems typically scramble local information into the entire system and make it unrecoverable. Ergodicity-breaking systems possess the potential to exhibit fundamentally different information scrambling dynamics beyond this paradigm. For many-body localized systems with strong ergodicity breaking, local transport vanishes and information scrambles logarithmically slowly. In Rydberg atom arrays, by contrast, local qubit flips induce dynamical retardation on surrounding qubits through the Rydberg blockade effect, giving rise to quantum many-body scars that weakly break ergodicity, and resulting in the predicted unconventional quantum information spreading behaviours. Here, we present the first measurements of out-of-time-ordered correlators and Holevo information in a Rydberg atom array, enabling us to precisely track quantum information scrambling and transport dynamics. By leveraging these tools, we observe a novel spatio-temporal collapse-and-revival behaviour of quantum information, which differs from both typical chaotic and many-body localized systems. Our experiment sheds light on the unique information dynamics in many-body systems with kinetic constraints, and demonstrates an effective digital-analogue approach to coherently reverse time evolution and steer information propagation in near-term quantum devices.

Submitted 20 October, 2024; originally announced October 2024.

Comments: 12 pages, 6 figures + Supplementary Information 37 pages, 24 figures

arXiv:2410.14741 [pdf, other]

CAKD: A Correlation-Aware Knowledge Distillation Framework Based on Decoupling Kullback-Leibler Divergence

Authors: Zao Zhang, Huaming Chen, Pei Ning, Nan Yang, Dong Yuan

Abstract: In knowledge distillation, a primary focus has been on transforming and balancing multiple distillation components. In this work, we emphasize the importance of thoroughly examining each distillation component, as we observe that not all elements are equally crucial. From this perspective, we decouple the Kullback-Leibler (KL) divergence into three unique elements: Binary Classification Divergence (BCD), Strong Correlation Divergence (SCD), and Weak Correlation Divergence (WCD). Each of these elements presents varying degrees of influence. Leveraging these insights, we present the Correlation-Aware Knowledge Distillation (CAKD) framework. CAKD is designed to prioritize the facets of the distillation components that have the most substantial influence on predictions, thereby optimizing knowledge transfer from teacher to student models. Our experiments demonstrate that adjusting the effect of each element enhances the effectiveness of knowledge transformation. Furthermore, evidence shows that our novel CAKD framework consistently outperforms the baseline across diverse models and datasets. Our work further highlights the importance and effectiveness of closely examining the impact of different parts of the distillation process.

Submitted 17 October, 2024; originally announced October 2024.

Report number: DM741

Journal ref: IEEE International Conference on Data Mining 2024

arXiv:2410.10366 [pdf, other]

Affinity-Graph-Guided Contractive Learning for Pretext-Free Medical Image Segmentation with Minimal Annotation

Authors: Zehua Cheng, Di Yuan, Thomas Lukasiewicz

Abstract: The combination of semi-supervised learning (SemiSL) and contrastive learning (CL) has been successful in medical image segmentation with limited annotations. However, these works often rely on pretext tasks that lack the specificity required for pixel-level segmentation, and still face overfitting issues due to insufficient supervision signals resulting from too few annotations. Therefore, this paper proposes an affinity-graph-guided semi-supervised contrastive learning framework (Semi-AGCL) by establishing additional affinity-graph-based supervision signals between the student and teacher network, to achieve medical image segmentation with minimal annotations without pretext. The framework first designs an average-patch-entropy-driven inter-patch sampling method, which can provide a robust initial feature space without relying on pretext tasks. Furthermore, the framework designs an affinity-graph-guided loss function, which can improve the quality of the learned representation and the model generalization ability by exploiting the inherent structure of the data, thus mitigating overfitting. Our experiments indicate that with merely 10% of the complete annotation set, our model approaches the accuracy of the fully annotated baseline, manifesting a marginal deviation of only 2.52%.
Under the stringent conditions where only 5% of the annotations are employed, our model exhibits a significant enhancement in performance surpassing the second best baseline by 23.09% on the dice metric and achieving an improvement of 26.57% on the notably arduous CRAG and ACDC datasets.

Submitted 14 October, 2024; originally announced October 2024.

Comments: BIBM 2024

arXiv:2410.08799 [pdf, ps, other]

Online Learning for Intelligent Thermal Management of Interference-coupled and Passively Cooled Base Stations

Authors: Zhanwei Yu, Yi Zhao, Xiaoli Chu, Di Yuan

Abstract: Passively cooled base stations (PCBSs) have emerged to deliver better cost and energy efficiency. However, passive cooling necessitates intelligent thermal control via traffic management, i.e., the instantaneous data traffic or throughput of a PCBS directly impacts its thermal performance. This is particularly challenging for outdoor deployment of PCBSs because the heat dissipation efficiency is uncertain and fluctuates over time. What is more, the PCBSs are interference-coupled in multi-cell scenarios. Thus, a higher-throughput PCBS leads to higher interference to the other PCBSs, which, in turn, would require more resource consumption to meet their respective throughput targets. In this paper, we address online decision-making for maximizing the total downlink throughput for a multi-PCBS system subject to constraints related to operating temperature. We demonstrate that a reinforcement learning (RL) approach, specifically soft actor-critic (SAC), can successfully perform throughput maximization while keeping the PCBSs cool, by adapting the throughput to time-varying heat dissipation conditions. Furthermore, we design a denial and reward mechanism that effectively mitigates the risk of overheating during the exploration phase of RL. Simulation results show that our approach achieves up to 88.6% of the global optimum. This is very promising, as our approach operates without prior knowledge of future heat dissipation efficiency, which is required by the global optimum.

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.06884 [pdf, ps, other]

Adaptive Refinement Protocols for Distributed Distribution Estimation under $\ell^p$-Losses

Authors: Deheng Yuan, Tao Guo, Zhongyi Huang

Abstract: Consider the communication-constrained estimation of discrete distributions under $\ell^p$ losses, where each distributed terminal holds multiple independent samples and uses a limited number of bits to describe the samples. We obtain the minimax optimal rates of the problem in most parameter regimes. An elbow effect of the optimal rates at $p=2$ is clearly identified. To show the optimal rates, we first design estimation protocols to achieve them. The key ingredient of these protocols is to introduce adaptive refinement mechanisms, which first generate a rough estimate from partial information and then establish a refined estimate in subsequent steps guided by the rough estimate. The protocols leverage successive refinement, sample compression, thresholding and random hashing methods to achieve the optimal rates in different parameter regimes. The optimality of the protocols is shown by deriving compatible minimax lower bounds.

Submitted 8 November, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

arXiv:2409.15505 [pdf, other]

Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs

Authors: Angelos Mavrogiannis, Dehao Yuan, Yiannis Aloimonos

Abstract: There has been a lot of interest in grounding natural language to physical entities through visual context. While Vision Language Models (VLMs) can ground linguistic instructions to visual sensory information, they struggle with grounding non-visual attributes, like the weight of an object. Our key insight is that non-visual attribute detection can be effectively achieved by active perception guided by visual reasoning. To this end, we present a perception-action API that consists of VLMs and Large Language Models (LLMs) as backbones, together with a set of robot control functions. When prompted with this API and a natural language query, an LLM generates a program to actively identify attributes given an input image. Offline testing on the Odd-One-Out dataset demonstrates that our framework outperforms vanilla VLMs in detecting attributes like relative object location, size, and weight. Online testing in realistic household scenes on AI2-THOR and a real robot demonstration on a DJI RoboMaster EP robot highlight the efficacy of our approach.

Submitted 6 March, 2025; v1 submitted 23 September, 2024; originally announced September 2024.

Comments: ICRA 2025

arXiv:2409.00002 [pdf, ps, other]

Distributed Optimization by Network Flows with Spatio-Temporal Compression

Authors: Zihao Ren, Lei Wang, Xinlei Yi, Xi Wang, Deming Yuan, Tao Yang, Zhengguang Wu, Guodong Shi

Abstract: Several data compressors have been proposed in distributed optimization frameworks of network systems to reduce communication overhead in large-scale applications. In this paper, we demonstrate that effective information compression may occur over time or space during sequences of node communications in distributed algorithms, leading to the concept of spatio-temporal compressors. This abstraction classifies existing compressors as spatio-temporal compressors, with their effectiveness described by constructive stability criteria from nonlinear system theory. Subsequently, we apply these spatio-temporal compressors to standard continuous-time consensus flows and distributed primal-dual flows, establishing conditions ensuring convergence. Additionally, we introduce a novel observer-based distributed primal-dual continuous flow integrated with spatio-temporal compressors, which provides broader convergence conditions. These continuous flows achieve exponential convergence to the global optimum when the objective function is strongly convex and can be discretized using Euler approximations. Finally, numerical simulations illustrate the versatility of the proposed spatio-temporal compressors and verify the convergence of algorithms.

Submitted 5 March, 2025; v1 submitted 14 August, 2024; originally announced September 2024.

Comments: arXiv admin note: text overlap with arXiv:2408.02332

arXiv:2408.15569 [pdf, other]

Temporal Attention for Cross-View Sequential Image Localization

Authors: Dong Yuan, Frederic Maire, Feras Dayoub

Abstract: This paper introduces a novel approach to enhancing cross-view localization, focusing on the fine-grained, sequential localization of street-view images within a single known satellite image patch, a significant departure from traditional one-to-one image retrieval methods. By expanding to sequential image fine-grained localization, our model, equipped with a novel Temporal Attention Module (TAM), leverages contextual information to significantly improve sequential image localization accuracy. Our method shows substantial reductions in both mean and median localization errors on the Cross-View Image Sequence (CVIS) dataset, outperforming current state-of-the-art single-image localization techniques. Additionally, by adapting the KITTI-CVL dataset into sequential image sets, we not only offer a more realistic dataset for future research but also demonstrate our model's robust generalization capabilities across varying times and areas, evidenced by a 75.3% reduction in mean distance error in cross-view sequential image localization.

Submitted 28 August, 2024; originally announced August 2024.

Comments: Accepted to IROS 2024

arXiv:2408.15496 [pdf, other]

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Authors: Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao

Abstract: While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models. In this study, we investigate the long-context efficiency issues of the Mamba models and propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process, incurring minimal additional inference overhead. Experimental results on the LongBench and L-Eval benchmarks demonstrate ReMamba's efficacy, improving over the baselines by 3.2 and 1.6 points, respectively, and attaining performance almost on par with same-size transformer models.

Submitted 1 January, 2025; v1 submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.12086 [pdf, other]

Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy

Authors: Hong Zhang, Yixuan Lyu, Qian Yu, Hanyang Liu, Huimin Ma, Ding Yuan, Yifan Yang

Abstract: In the domain of Camouflaged Object Segmentation (COS), despite continuous improvements in segmentation performance, the underlying mechanisms of effective camouflage remain poorly understood, akin to a black box. To address this gap, we present the first comprehensive study to examine the impact of camouflage attributes on the effectiveness of camouflage patterns, offering a quantitative framework for the evaluation of camouflage designs. To support this analysis, we have compiled the first dataset comprising descriptions of camouflaged objects and their attribute contributions, termed COD-Text And X-attributions (COD-TAX). Moreover, drawing inspiration from the hierarchical process by which humans process information (from high-level textual descriptions of overarching scenarios, through mid-level summaries of local areas, to low-level pixel data for detailed analysis), we have developed a robust framework that combines textual and visual information for the task of COS, named Attribution CUe Modeling with Eye-fixation Network (ACUMEN). ACUMEN demonstrates superior performance, outperforming nine leading methods across three widely-used datasets. We conclude by highlighting key insights derived from the attributes identified in our study. Code: https://github.com/lyu-yx/ACUMEN.

Submitted 21 August, 2024; originally announced August 2024.

Comments: Accepted by ECCV 2024

arXiv:2408.02549 [pdf, other]

Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning

Authors: Hao Zhou, Chengming Hu, Dun Yuan, Ye Yuan, Di Wu, Xue Liu, Zhu Han, Charlie Zhang

Abstract: Generative artificial intelligence (GAI) is a promising technique for 6G networks, and generative foundation models such as large language models (LLMs) have attracted considerable interest from academia and the telecom industry. This work considers a novel edge-cloud deployment of foundation models in 6G networks. Specifically, it aims to minimize the service delay of foundation models by radio resource allocation and task offloading, i.e., offloading diverse content generation tasks to proper LLMs at the network edge or cloud. In particular, we first introduce the communication system model, i.e., allocating radio resources and calculating link capacity to support generated content transmission, and then we present the LLM inference model to calculate the delay of content generation. After that, we propose a novel in-context learning method to optimize the task offloading decisions. It utilizes the LLM's inference capabilities and avoids the difficulty of dedicated model training or fine-tuning as in conventional machine learning algorithms. Finally, the simulations demonstrate that the proposed edge-cloud deployment and in-context learning task offloading method can achieve satisfactory generation service quality without dedicated model training or fine-tuning.

Submitted 21 March, 2025; v1 submitted 5 August, 2024; originally announced August 2024.

Comments: This paper has been accepted by IEEE Wireless Communications Letters

Showing 1–50 of 315 results for author: Yuan, D