Search | arXiv e-print repository

Fluid Antenna System-Assisted Self-Interference Cancellation for In-Band Full Duplex Communications

Authors: Hanjiang Hong, Kai-Kit Wong, Hao Xu, Yiyan Wu, Sai Xu, Chan-Byoung Chae, Baiyang Liu, Kin-Fai Tong

Abstract: In-band full-duplex (IBFD) systems are expected to double the spectral efficiency compared to half-duplex systems, provided that loopback self-interference (SI) can be effectively suppressed. The inherent interference mitigation capabilities of the emerging fluid antenna system (FAS) technology make it a promising candidate for addressing the SI challenge in IBFD systems. This paper thus proposes… ▽ More In-band full-duplex (IBFD) systems are expected to double the spectral efficiency compared to half-duplex systems, provided that loopback self-interference (SI) can be effectively suppressed. The inherent interference mitigation capabilities of the emerging fluid antenna system (FAS) technology make it a promising candidate for addressing the SI challenge in IBFD systems. This paper thus proposes a FAS-assisted self-interference cancellation (SIC) framework, which leverages a receiver-side FAS to dynamically select an interference-free port. Analytical results include a lower bound and an approximation of the residual SI (RSI) power, both derived for rich-scattering channels by considering the joint spatial correlation amongst the FAS ports. Simulations of RSI power and forward link rates validate the analysis, showing that the SIC performance improves with the number of FAS ports. Additionally, simulations under practical conditions, such as finite-scattering environments and wideband integrated access and backhaul (IAB) channels, reveal that the proposed approach offers superior SIC capability and significant forward rate gains over conventional IBFD SIC schemes. △ Less

Submitted 5 June, 2025; originally announced June 2025.

arXiv:2506.02197 [pdf, ps, other]

NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution

Authors: Marcos V. Conde, Radu Timofte, Zihao Lu, Xiangyu Kong, Xiaoxia Xing, Fan Wang, Suejin Han, MinKyu Park, Tianyu Zhang, Xin Luo, Yeda Chen, Dong Liu, Li Pang, Yuhang Yang, Hongzhong Wang, Xiangyong Cao, Ruixuan Jiang, Senyan Xu, Siyuan Jiang, Xueyang Fu, Zheng-Jun Zha, Tianyu Hao, Yuhong He, Ruoqi Li, Yueqi Yang , et al. (14 additional authors not shown)

Abstract: This paper reviews the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Restoration and Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is two fold, (i) restore RAW images with blur and… ▽ More This paper reviews the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Restoration and Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is two fold, (i) restore RAW images with blur and noise degradations, (ii) upscale RAW Bayer images by 2x, considering unknown noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during thee challenge period. This report presents the current state-of-the-art in RAW Restoration. △ Less

Submitted 4 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

Comments: CVPR 2025 - New Trends in Image Restoration and Enhancement (NTIRE)

arXiv:2506.00045 [pdf, other]

ACE-Step: A Step Towards Music Generation Foundation Model

Authors: Junmin Gong, Sean Zhao, Sen Wang, Shengyuan Xu, Joe Guo

Abstract: We introduce ACE-Step, a novel open-source foundation model for music generation that overcomes key limitations of existing approaches and achieves state-of-the-art performance through a holistic architectural design. Current methods face inherent trade-offs between generation speed, musical coherence, and controllability. For example, LLM-based models (e.g. Yue, SongGen) excel at lyric alignment… ▽ More We introduce ACE-Step, a novel open-source foundation model for music generation that overcomes key limitations of existing approaches and achieves state-of-the-art performance through a holistic architectural design. Current methods face inherent trade-offs between generation speed, musical coherence, and controllability. For example, LLM-based models (e.g. Yue, SongGen) excel at lyric alignment but suffer from slow inference and structural artifacts. Diffusion models (e.g. DiffRhythm), on the other hand, enable faster synthesis but often lack long-range structural coherence. ACE-Step bridges this gap by integrating diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer. It also leverages MERT and m-hubert to align semantic representations (REPA) during training, allowing rapid convergence. As a result, our model synthesizes up to 4 minutes of music in just 20 seconds on an A100 GPU-15x faster than LLM-based baselines-while achieving superior musical coherence and lyric alignment across melody, harmony, and rhythm metrics. Moreover, ACE-Step preserves fine-grained acoustic details, enabling advanced control mechanisms such as voice cloning, lyric editing, remixing, and track generation (e.g. lyric2vocal, singing2accompaniment). Rather than building yet another end-to-end text-to-music pipeline, our vision is to establish a foundation model for music AI: a fast, general-purpose, efficient yet flexible architecture that makes it easy to train subtasks on top of it. This paves the way for the development of powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. In short, our goal is to build a stable diffusion moment for music. The code, the model weights and the demo are available at: https://ace-step.github.io/. △ Less

Submitted 28 May, 2025; originally announced June 2025.

Comments: 14 pages, 5 figures, ace-step's tech report

arXiv:2505.23408 [pdf, ps, other]

doi 10.1109/TMI.2025.3570226

Self-supervised feature learning for cardiac Cine MR image reconstruction

Authors: Siying Xu, Marcel Früh, Kerstin Hammernik, Andreas Lingg, Jens Kübler, Patrick Krumm, Daniel Rueckert, Sergios Gatidis, Thomas Küstner

Abstract: We propose a self-supervised feature learning assisted reconstruction (SSFL-Recon) framework for MRI reconstruction to address the limitation of existing supervised learning methods. Although recent deep learning-based methods have shown promising performance in MRI reconstruction, most require fully-sampled images for supervised learning, which is challenging in practice considering long acquisit… ▽ More We propose a self-supervised feature learning assisted reconstruction (SSFL-Recon) framework for MRI reconstruction to address the limitation of existing supervised learning methods. Although recent deep learning-based methods have shown promising performance in MRI reconstruction, most require fully-sampled images for supervised learning, which is challenging in practice considering long acquisition times under respiratory or organ motion. Moreover, nearly all fully-sampled datasets are obtained from conventional reconstruction of mildly accelerated datasets, thus potentially biasing the achievable performance. The numerous undersampled datasets with different accelerations in clinical practice, hence, remain underutilized. To address these issues, we first train a self-supervised feature extractor on undersampled images to learn sampling-insensitive features. The pre-learned features are subsequently embedded in the self-supervised reconstruction network to assist in removing artifacts. Experiments were conducted retrospectively on an in-house 2D cardiac Cine dataset, including 91 cardiovascular patients and 38 healthy subjects. The results demonstrate that the proposed SSFL-Recon framework outperforms existing self-supervised MRI reconstruction methods and even exhibits comparable or better performance to supervised learning up to $16\times$ retrospective undersampling. The feature learning strategy can effectively extract global representations, which have proven beneficial in removing artifacts and increasing generalization ability during reconstruction. △ Less

Submitted 29 May, 2025; originally announced May 2025.

Comments: Accepted to IEEE Transactions on Medical Imaging (TMI), 2025

arXiv:2505.21928 [pdf]

Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology

Authors: Lianghui Zhu, Xitong Ling, Minxi Ouyang, Xiaoping Liu, Tian Guan, Mingxi Fu, Zhiqiang Cheng, Fanglei Fu, Maomao Zeng, Liming Liu, Song Duan, Qiang Huang, Ying Xiao, Jianming Li, Shanming Lu, Zhenghua Piao, Mingxi Zhu, Yibo Jin, Shan Xu, Qiming He, Yizhi Wang, Junru Cheng, Xuanyu Wang, Luxi Xie, Houqiang Li , et al. (2 additional authors not shown)

Abstract: Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we develop Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterati… ▽ More Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we develop Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterative optimization strategy combining pretraining with fine-screening, specifically designed to address the detection of sparsely distributed lesion areas in whole-slide images. Digepath is pretrained on over 353 million multi-scale images from 210,043 H&E-stained slides of GI diseases. It attains state-of-the-art performance on 33 out of 34 tasks related to GI pathology, including pathological diagnosis, protein expression status prediction, gene mutation prediction, and prognosis evaluation. We further translate the intelligent screening module for early GI cancer and achieve near-perfect 99.70% sensitivity across nine independent medical institutions. This work not only advances AI-driven precision pathology for GI diseases but also bridge critical gaps in histopathological practice. △ Less

Submitted 6 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.21767 [pdf, ps, other]

Beyond 1D: Vision Transformers and Multichannel Signal Images for PPG-to-ECG Reconstruction

Authors: Xiaoyan Li, Shixin Xu, Faisal Habib, Arvind Gupta, Huaxiong Huang

Abstract: Reconstructing ECG from PPG is a promising yet challenging task. While recent advancements in generative models have significantly improved ECG reconstruction, accurately capturing fine-grained waveform features remains a key challenge. To address this, we propose a novel PPG-to-ECG reconstruction method that leverages a Vision Transformer (ViT) as the core network. Unlike conventional approaches… ▽ More Reconstructing ECG from PPG is a promising yet challenging task. While recent advancements in generative models have significantly improved ECG reconstruction, accurately capturing fine-grained waveform features remains a key challenge. To address this, we propose a novel PPG-to-ECG reconstruction method that leverages a Vision Transformer (ViT) as the core network. Unlike conventional approaches that rely on single-channel PPG, our method employs a four-channel signal image representation, incorporating the original PPG, its first-order difference, second-order difference, and area under the curve. This multi-channel design enriches feature extraction by preserving both temporal and physiological variations within the PPG. By leveraging the self-attention mechanism in ViT, our approach effectively captures both inter-beat and intra-beat dependencies, leading to more robust and accurate ECG reconstruction. Experimental results demonstrate that our method consistently outperforms existing 1D convolution-based approaches, achieving up to 29% reduction in PRD and 15% reduction in RMSE. The proposed approach also produces improvements in other evaluation metrics, highlighting its robustness and effectiveness in reconstructing ECG signals. Furthermore, to ensure a clinically relevant evaluation, we introduce new performance metrics, including QRS area error, PR interval error, RT interval error, and RT amplitude difference error. Our findings suggest that integrating a four-channel signal image representation with the self-attention mechanism of ViT enables more effective extraction of informative PPG features and improved modeling of beat-to-beat variations for PPG-to-ECG mapping. Beyond demonstrating the potential of PPG as a viable alternative for heart activity monitoring, our approach opens new avenues for cyclic signal analysis and prediction. △ Less

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.20961 [pdf, ps, other]

Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization

Authors: Yiyuan Yang, Shitong Xu, Niki Trigoni, Andrew Markham

Abstract: Sound source localization (SSL) is a critical technology for determining the position of sound sources in complex environments. However, existing methods face challenges such as high computational costs and precise calibration requirements, limiting their deployment in dynamic or resource-constrained environments. This paper introduces a novel 3D SSL framework, which uses sparse cross-attention, p… ▽ More Sound source localization (SSL) is a critical technology for determining the position of sound sources in complex environments. However, existing methods face challenges such as high computational costs and precise calibration requirements, limiting their deployment in dynamic or resource-constrained environments. This paper introduces a novel 3D SSL framework, which uses sparse cross-attention, pretraining, and adaptive signal coherence metrics, to achieve accurate and computationally efficient localization with fewer input microphones. The framework is also fault-tolerant to unreliable or even unknown microphone position inputs, ensuring its applicability in real-world scenarios. Preliminary experiments demonstrate its scalability for multi-source localization without requiring additional hardware. This work advances SSL by balancing the model's performance and efficiency and improving its robustness for real-world scenarios. △ Less

Submitted 27 May, 2025; originally announced May 2025.

Comments: Accepted by Interspeech 2025 Conference

arXiv:2505.19048 [pdf, other]

Movable-Element STARS-Assisted Near-Field Wideband Communications

Authors: Guangyu Zhu, Xidong Mu, Li Guo, Ao Huang, Shibiao Xu

Abstract: A novel movable-element simultaneously transmitting and reflecting surface (ME-STARS)-assisted near-field wideband communication framework is proposed. In particular, the position of each STARS element can be adjusted to combat the significant wideband beam squint issue in the near field instead of using costly true-time delay components. Four practical ME-STARS element movement modes are proposed… ▽ More A novel movable-element simultaneously transmitting and reflecting surface (ME-STARS)-assisted near-field wideband communication framework is proposed. In particular, the position of each STARS element can be adjusted to combat the significant wideband beam squint issue in the near field instead of using costly true-time delay components. Four practical ME-STARS element movement modes are proposed, namely region-based (RB), horizontal-based (HB), vertical-based (VB), and diagonal-based (DB) modes. Based on this, a near-field wideband multi-user downlink communication scenario is considered, where a sum rate maximization problem is formulated by jointly optimizing the base station (BS) precoding, ME-STARS beamforming, and element positions. To solve this intractable problem, a two-layer algorithm is developed. For the inner layer, the block coordinate descent optimization framework is utilized to solve the BS precoding and ME-STARS beamforming in an iterative manner. For the outer layer, the particle swarm optimization-based heuristic search method is employed to determine the desired element positions. Numerical results show that:1) the ME-STARSs can effectively address the beam squint for near-field wideband communications compared to conventional STARSs with fixed element positions; 2) the RB mode achieves the most efficient beam squint effect mitigation, while the DB mode achieves the best trade-off between performance gain and hardware overhead; and 3) an increase in the number of ME-STARS elements or BS subcarriers substantially improves the system performance. △ Less

Submitted 25 May, 2025; originally announced May 2025.

arXiv:2505.17568 [pdf, ps, other]

JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models

Authors: Zifan Peng, Yule Liu, Zhen Sun, Mingchen Li, Zeren Luo, Jingyi Zheng, Wenhan Dong, Xinlei He, Xuechao Wang, Yingjie Xue, Shengmin Xu, Xinyi Huang

Abstract: Audio Language Models (ALMs) have made significant progress recently. These models integrate the audio modality directly into the model, rather than converting speech into text and inputting text to Large Language Models (LLMs). While jailbreak attacks on LLMs have been extensively studied, the security of ALMs with audio modalities remains largely unexplored. Currently, there is a lack of an adve… ▽ More Audio Language Models (ALMs) have made significant progress recently. These models integrate the audio modality directly into the model, rather than converting speech into text and inputting text to Large Language Models (LLMs). While jailbreak attacks on LLMs have been extensively studied, the security of ALMs with audio modalities remains largely unexplored. Currently, there is a lack of an adversarial audio dataset and a unified framework specifically designed to evaluate and compare attacks and ALMs. In this paper, we present JALMBench, the \textit{first} comprehensive benchmark to assess the safety of ALMs against jailbreak attacks. JALMBench includes a dataset containing 2,200 text samples and 51,381 audio samples with over 268 hours. It supports 12 mainstream ALMs, 4 text-transferred and 4 audio-originated attack methods, and 5 defense methods. Using JALMBench, we provide an in-depth analysis of attack efficiency, topic sensitivity, voice diversity, and attack representations. Additionally, we explore mitigation strategies for the attacks at both the prompt level and the response level. △ Less

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.12089 [pdf, ps, other]

NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results

Authors: Sangmin Lee, Eunpil Park, Angel Canelo, Hyunhee Park, Youngjo Kim, Hyung-Ju Chun, Xin Jin, Chongyi Li, Chun-Le Guo, Radu Timofte, Qi Wu, Tianheng Qiu, Yuchun Dong, Shenglin Ding, Guanghua Pan, Weiyu Zhou, Tao Hu, Yixu Feng, Duwei Dai, Yu Cao, Peng Wu, Wei Dong, Yanning Zhang, Qingsen Yan, Simon J. Larsen , et al. (11 additional authors not shown)

Abstract: This paper reviews the NTIRE 2025 Efficient Burst HDR and Restoration Challenge, which aims to advance efficient multi-frame high dynamic range (HDR) and restoration techniques. The challenge is based on a novel RAW multi-frame fusion dataset, comprising nine noisy and misaligned RAW frames with various exposure levels per scene. Participants were tasked with developing solutions capable of effect… ▽ More This paper reviews the NTIRE 2025 Efficient Burst HDR and Restoration Challenge, which aims to advance efficient multi-frame high dynamic range (HDR) and restoration techniques. The challenge is based on a novel RAW multi-frame fusion dataset, comprising nine noisy and misaligned RAW frames with various exposure levels per scene. Participants were tasked with developing solutions capable of effectively fusing these frames while adhering to strict efficiency constraints: fewer than 30 million model parameters and a computational budget under 4.0 trillion FLOPs. A total of 217 participants registered, with six teams finally submitting valid solutions. The top-performing approach achieved a PSNR of 43.22 dB, showcasing the potential of novel methods in this domain. This paper provides a comprehensive overview of the challenge, compares the proposed solutions, and serves as a valuable reference for researchers and practitioners in efficient burst HDR and restoration. △ Less

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2504.13670 [pdf, other]

Pinching-Antenna Systems (PASS)-enabled Secure Wireless Communications

Authors: Guangyu Zhu, Xidong Mu, Li Guo, Shibiao Xu, Yuanwei Liu, Naofal Al-Dhahir

Abstract: A novel pinching-antenna systems (PASS)-enabled secure wireless communication framework is proposed. By dynamically adjusting the positions of dielectric particles, namely pinching antennas (PAs), along the waveguides, PASS introduces a novel concept of pinching beamforming to enhance the performance of physical layer security. A fundamental PASS-enabled secure communication system is considered w… ▽ More A novel pinching-antenna systems (PASS)-enabled secure wireless communication framework is proposed. By dynamically adjusting the positions of dielectric particles, namely pinching antennas (PAs), along the waveguides, PASS introduces a novel concept of pinching beamforming to enhance the performance of physical layer security. A fundamental PASS-enabled secure communication system is considered with one legitimate user and one eavesdropper. Both single-waveguide and multiple-waveguide scenarios are studied. 1) For the single-waveguide scenario, the secrecy rate (SR) maximization is formulated to optimize the pinching beamforming. A PA-wise successive tuning (PAST) algorithm is proposed, which ensures constructive signal superposition at the legitimate user while inducing a destructive legitimate signal at the eavesdropper. 2) For the multiple-waveguide scenario, artificial noise (AN) is employed to further improve secrecy performance. A pair of practical transmission architectures are developed: waveguide division (WD) and waveguide multiplexing (WM). The key difference lies in whether each waveguide carries a single type of signal or a mixture of signals with baseband beamforming. For the SR maximization problem under the WD case, a two-stage algorithm is developed, where the pinching beamforming is designed with the PAST algorithm and the baseband power allocation among AN and legitimate signals is solved using successive convex approximation (SCA). For the WM case, an alternating optimization algorithm is developed, where the baseband beamforming is optimized with SCA and the pinching beamforming is designed employing particle swarm optimization. △ Less

Submitted 14 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

arXiv:2504.09248 [pdf, ps, other]

Asymptotic stabilization under homomorphic encryption: A re-encryption free method

Authors: Shuai Feng, Qian Ma, Junsoo Kim, Shengyuan Xu

Abstract: In this paper, we propose methods to encrypted a pre-given dynamic controller with homomorphic encryption, without re-encrypting the control inputs. We first present a preliminary result showing that the coefficients in a pre-given dynamic controller can be scaled up into integers by the zooming-in factor in dynamic quantization, without utilizing re-encryption. However, a sufficiently small zoomi… ▽ More In this paper, we propose methods to encrypted a pre-given dynamic controller with homomorphic encryption, without re-encrypting the control inputs. We first present a preliminary result showing that the coefficients in a pre-given dynamic controller can be scaled up into integers by the zooming-in factor in dynamic quantization, without utilizing re-encryption. However, a sufficiently small zooming-in factor may not always exist because it requires that the convergence speed of the pre-given closed-loop system should be sufficiently fast. Then, as the main result, we design a new controller approximating the pre-given dynamic controller, in which the zooming-in factor is decoupled from the convergence rate of the pre-given closed-loop system. Therefore, there always exist a (sufficiently small) zooming-in factor of dynamic quantization scaling up all the controller's coefficients to integers, and a finite modulus preventing overflow in cryptosystems. The process is asymptotically stable and the quantizer is not saturated. △ Less

Submitted 12 April, 2025; originally announced April 2025.

arXiv:2504.03769 [pdf, ps, other]

doi 10.1109/RadarConf2458775.2024.10548509

Optimal Sensor Placement Using Combinations of Hybrid Measurements for Source Localization

Authors: Kang Tang, Sheng Xu, Yuqi Yang, He Kong, Yongsheng Ma

Abstract: This paper focuses on static source localization employing different combinations of measurements, including time-difference-of-arrival (TDOA), received-signal-strength (RSS), angle-of-arrival (AOA), and time-of-arrival (TOA) measurements. Since sensor-source geometry significantly impacts localization accuracy, the strategies of optimal sensor placement are proposed systematically using combinati… ▽ More This paper focuses on static source localization employing different combinations of measurements, including time-difference-of-arrival (TDOA), received-signal-strength (RSS), angle-of-arrival (AOA), and time-of-arrival (TOA) measurements. Since sensor-source geometry significantly impacts localization accuracy, the strategies of optimal sensor placement are proposed systematically using combinations of hybrid measurements. Firstly, the relationship between sensor placement and source estimation accuracy is formulated by a derived Cramér-Rao bound (CRB). Secondly, the A-optimality criterion, i.e., minimizing the trace of the CRB, is selected to calculate the smallest reachable estimation mean-squared-error (MSE) in a unified manner. Thirdly, the optimal sensor placement strategies are developed to achieve the optimal estimation bound. Specifically, the specific constraints of the optimal geometries deduced by specific measurement, i.e., TDOA, AOA, RSS, and TOA, are found and discussed theoretically. Finally, the new findings are verified by simulation studies. △ Less

Submitted 9 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

Journal ref: IEEE Radar Conference 2024, Denver, CO, USA, pp. 1-6, 2024

arXiv:2503.13252 [pdf, other]

Digital Beamforming Enhanced Radar Odometry

Authors: Jingqi Jiang, Shida Xu, Kaicheng Zhang, Jiyuan Wei, Jingyang Wang, Sen Wang

Abstract: Radar has become an essential sensor for autonomous navigation, especially in challenging environments where camera and LiDAR sensors fail. 4D single-chip millimeter-wave radar systems, in particular, have drawn increasing attention thanks to their ability to provide spatial and Doppler information with low hardware cost and power consumption. However, most single-chip radar systems using traditio… ▽ More Radar has become an essential sensor for autonomous navigation, especially in challenging environments where camera and LiDAR sensors fail. 4D single-chip millimeter-wave radar systems, in particular, have drawn increasing attention thanks to their ability to provide spatial and Doppler information with low hardware cost and power consumption. However, most single-chip radar systems using traditional signal processing, such as Fast Fourier Transform, suffer from limited spatial resolution in radar detection, significantly limiting the performance of radar-based odometry and Simultaneous Localization and Mapping (SLAM) systems. In this paper, we develop a novel radar signal processing pipeline that integrates spatial domain beamforming techniques, and extend it to 3D Direction of Arrival estimation. Experiments using public datasets are conducted to evaluate and compare the performance of our proposed signal processing pipeline against traditional methodologies. These tests specifically focus on assessing structural precision across diverse scenes and measuring odometry accuracy in different radar odometry systems. This research demonstrates the feasibility of achieving more accurate radar odometry by simply replacing the standard FFT-based processing with the proposed pipeline. The codes are available at GitHub*. △ Less

Submitted 17 March, 2025; originally announced March 2025.

arXiv:2503.00941 [pdf, other]

C2S-AE: CSI to Sensing enabled by an Auto-Encoder-based Framework

Authors: Jun Jiang, Shugong Xu, Wenjun Yu, Yuan Gao

Abstract: Next-generation mobile networks are set to utilize integrated sensing and communication (ISAC) as a critical technology, providing significant support for sectors like the industrial Internet of Things (IIoT), extended reality (XR), and smart home applications. A key challenge in ISAC implementation is the extraction of sensing parameters from radio signals, a task that conventional methods strugg… ▽ More Next-generation mobile networks are set to utilize integrated sensing and communication (ISAC) as a critical technology, providing significant support for sectors like the industrial Internet of Things (IIoT), extended reality (XR), and smart home applications. A key challenge in ISAC implementation is the extraction of sensing parameters from radio signals, a task that conventional methods struggle to achieve due to the complexity of acquiring sensing channel data. In this paper, we introduce a novel auto-encoder (AE)-based framework to acquire sensing information using channel state information (CSI). Specifically, our framework, termed C2S (CSI to sensing)-AE, learns the relationship between CSI and the delay power spectrum (DPS), from which the range information can be readily accessed. To validate our framework's performance, we conducted measurements of DPS and CSI in real-world scenarios and introduced the dataset 'SHU7'. Our extensive experiments demonstrate that the framework excels in C2S extrapolation, surpassing existing methods in terms of accuracy for both delay and signal strength of individual paths. This innovative approach holds the potential to greatly enhance sensing capabilities in future mobile networks, paving the way for more robust and versatile ISAC applications. △ Less

Submitted 2 March, 2025; originally announced March 2025.

arXiv:2502.18766 [pdf, other]

MTCA: Multi-Task Channel Analysis for Wireless Communication

Authors: Jun Jiang, Wenjun Yu, Yuan Gao, Shugong Xu

Abstract: In modern wireless communication systems, the effective processing of Channel State Information (CSI) is crucial for enhancing communication quality and reliability. However, current methods often handle different tasks in isolation, thereby neglecting the synergies among various tasks and leading to extract CSI features inadequately for subsequent analysis. To address these limitations, this pape… ▽ More In modern wireless communication systems, the effective processing of Channel State Information (CSI) is crucial for enhancing communication quality and reliability. However, current methods often handle different tasks in isolation, thereby neglecting the synergies among various tasks and leading to extract CSI features inadequately for subsequent analysis. To address these limitations, this paper introduces a novel Multi-Task Channel Analysis framework named MTCA, aimed at improving the performance of wireless communication even sensing. MTCA is designed to handle four critical tasks, including channel prediction, antenna-domain channel extrapolation, channel identification, and scenario classification. Experiments conducted on a multi-scenario, multi-antenna dataset tailored for UAV-based communications demonstrate that the proposed MTCA exhibits superior comprehension of CSI, achieving enhanced performance across all evaluated tasks. Notably, MTCA reached 100% prediction accuracy in channel identification and scenario classification. Compared to the previous state-of-the-art methods, MTCA improved channel prediction performance by 20.1% and antenna-domain extrapolation performance by 54.5%. △ Less

Submitted 25 February, 2025; originally announced February 2025.

arXiv:2502.17536 [pdf, other]

CLEP-GAN: An Innovative Approach to Subject-Independent ECG Reconstruction from PPG Signals

Authors: Xiaoyan Li, Shixin Xu, Faisal Habib, Neda Aminnejad, Arvind Gupta, Huaxiong Huang

Abstract: This study addresses the challenge of reconstructing unseen ECG signals from PPG signals, a critical task for non-invasive cardiac monitoring. While numerous public ECG-PPG datasets are available, they lack the diversity seen in image datasets, and data collection processes often introduce noise, complicating ECG reconstruction from PPG even with advanced machine learning models. To tackle these c… ▽ More This study addresses the challenge of reconstructing unseen ECG signals from PPG signals, a critical task for non-invasive cardiac monitoring. While numerous public ECG-PPG datasets are available, they lack the diversity seen in image datasets, and data collection processes often introduce noise, complicating ECG reconstruction from PPG even with advanced machine learning models. To tackle these challenges, we first introduce a novel synthetic ECG-PPG data generation technique using an ODE model to enhance training diversity. Next, we develop a novel subject-independent PPG-to-ECG reconstruction model that integrates contrastive learning, adversarial learning, and attention gating, achieving results comparable to or even surpassing existing approaches for unseen ECG reconstruction. Finally, we examine factors such as sex and age that impact reconstruction accuracy, emphasizing the importance of considering demographic diversity during model training and dataset augmentation. △ Less

Submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.16611 [pdf, ps, other]

Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments

Authors: Shitong Xu, Yiyuan Yang, Niki Trigoni, Andrew Markham

Abstract: Target speaker extraction focuses on isolating a specific speaker's voice from an audio mixture containing multiple speakers. To provide information about the target speaker's identity, prior works have utilized clean audio samples as conditioning inputs. However, such clean audio examples are not always readily available. For instance, obtaining a clean recording of a stranger's voice at a cockta… ▽ More Target speaker extraction focuses on isolating a specific speaker's voice from an audio mixture containing multiple speakers. To provide information about the target speaker's identity, prior works have utilized clean audio samples as conditioning inputs. However, such clean audio examples are not always readily available. For instance, obtaining a clean recording of a stranger's voice at a cocktail party without leaving the noisy environment is generally infeasible. Limited prior research has explored extracting the target speaker's characteristics from noisy enrollments, which may contain overlapping speech from interfering speakers. In this work, we explore a novel enrollment strategy that encodes target speaker information from the noisy enrollment by comparing segments where the target speaker is talking (Positive Enrollments) with segments where the target speaker is silent (Negative Enrollments). Experiments show the effectiveness of our model architecture, which achieves over 2.1 dB higher SI-SNRi compared to prior works in extracting the monaural speech from the mixture of two speakers. Additionally, the proposed two-stage training strategy accelerates convergence, reducing the number of optimization steps required to reach 3 dB SNR by 60\%. Overall, our method achieves state-of-the-art performance in the monaural target speaker extraction conditioned on noisy enrollments. △ Less

Submitted 17 June, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

Comments: 11 pages, 6 figures

arXiv:2502.11965 [pdf, other]

A MIMO Wireless Channel Foundation Model via CIR-CSI Consistency

Authors: Jun Jiang, Wenjun Yu, Yunfan Li, Yuan Gao, Shugong Xu

Abstract: In the field of artificial intelligence, self-supervised learning has demonstrated superior generalization capabilities by leveraging large-scale unlabeled datasets for pretraining, which is especially critical for wireless communication models to adapt to a variety of scenarios. This paper innovatively treats Channel State Information (CSI) and Channel Impulse Response (CIR) as naturally aligned… ▽ More In the field of artificial intelligence, self-supervised learning has demonstrated superior generalization capabilities by leveraging large-scale unlabeled datasets for pretraining, which is especially critical for wireless communication models to adapt to a variety of scenarios. This paper innovatively treats Channel State Information (CSI) and Channel Impulse Response (CIR) as naturally aligned multi-modal data and proposes the first MIMO wireless channel foundation model, named CSI-CLIP. By effectively capturing the joint representations of both CIR and CSI, CSI-CLIP exhibits remarkable adaptability across scenarios and robust feature extraction capabilities. Experimental results show that in positioning task, CSI-CLIP reduces the mean error distance by 22%; in beam management task, it increases accuracy by 1% compared to traditional supervised methods, as well as in the channel identification task. These improvements not only highlight the potential and value of CSI-CLIP in integrating sensing and communication but also demonstrate its significant advantages over existing techniques. Moreover, viewing CSI and CIR as multi-modal pairs and contrastive learning for wireless channel foundation model open up new research directions in the domain of MIMO wireless communications. △ Less

Submitted 1 March, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

Comments: 6 pages, 2025 ICMLCN accepted

arXiv:2502.07230 [pdf, ps, other]

Physics-Informed Recurrent Network for State-Space Modeling of Gas Pipeline Networks

Authors: Siyuan Wang, Wenchuan Wu, Chenhui Lin, Qi Wang, Shuwei Xu, Binbin Chen

Abstract: As a part of the integrated energy system (IES), gas pipeline networks can provide additional flexibility to power systems through coordinated optimal dispatch. An accurate pipeline network model is critical for the optimal operation and control of IESs. However, inaccuracies or unavailability of accurate pipeline parameters often introduce errors in the state-space models of such networks. This p… ▽ More As a part of the integrated energy system (IES), gas pipeline networks can provide additional flexibility to power systems through coordinated optimal dispatch. An accurate pipeline network model is critical for the optimal operation and control of IESs. However, inaccuracies or unavailability of accurate pipeline parameters often introduce errors in the state-space models of such networks. This paper proposes a physics-informed recurrent network (PIRN) to identify the state-space model of gas pipelines. It fuses sparse measurement data with fluid-dynamic behavior expressed by partial differential equations. By embedding the physical state-space model within the recurrent network, parameter identification becomes an end-to-end PIRN training task. The model can be realized in PyTorch through modifications to a standard RNN backbone. Case studies demonstrate that our proposed PIRN can accurately estimate gas pipeline models from sparse terminal node measurements, providing robust performance and significantly higher parameter efficiency. Furthermore, the identified state-space model of the pipeline network can be seamlessly integrated into optimization frameworks. △ Less

Submitted 19 June, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

Comments: 9 Pages

arXiv:2502.01003 [pdf, other]

Near-Field Integrated Sensing and Communications for Secure UAV Networks

Authors: Jingjing Zhao, Songtao Xue, Kaiquan Cai, Xidong Mu, Yuanwei Liu, Yanbo Zhu

Abstract: A novel near-field integrated sensing and communications framework for secure unmanned aerial vehicle (UAV) networks with high time efficiency is proposed. A ground base station (GBS) with large aperture size communicates with one communication UAV (C-UAV) under the existence of one eavesdropping UAV (E-UAV), where the artificial noise (AN) is employed for both jamming and sensing purpose. Given t… ▽ More A novel near-field integrated sensing and communications framework for secure unmanned aerial vehicle (UAV) networks with high time efficiency is proposed. A ground base station (GBS) with large aperture size communicates with one communication UAV (C-UAV) under the existence of one eavesdropping UAV (E-UAV), where the artificial noise (AN) is employed for both jamming and sensing purpose. Given that the E-UAV's motion model is unknown at the GBS, we first propose a near-field localization and trajectory tracking scheme. Specifically, exploiting the variant Doppler shift observations over the spatial domain in the near field, the E-UAV's three-dimensional (3D) velocities are estimated from echo signals. To provide the timely correction of location prediction errors, the extended Kalman filter (EKF) is adopted to fuse the predicted states and the measured ones. Subsequently, based on the real-time predicated location of the E-UAV, we further propose a joint GBS beamforming and C-UAV trajectory design scheme for maximizing the instantaneous secrecy rate, while guaranteeing the sensing accuracy constraint. To solve the resultant non-convex problem, an alternating optimization approach is developed, where the near-field GBS beamforming and the C-UAV trajectory design subproblems are iteratively solved by exploiting the successive convex approximation method. Finally, our numerical results unveil that: 1) the E-UAV's 3D velocities and location can be accurately estimated in real time with our proposed framework by exploiting the near-field spherical wave propagation; and 2) the proposed framework achieves superior secrecy rate compared to benchmark schemes and closely approaches the performance when the E-UAV trajectory is perfectly known. △ Less

Submitted 2 February, 2025; originally announced February 2025.

arXiv:2501.16023 [pdf, other]

TransPathNet: A Novel Two-Stage Framework for Indoor Radio Map Prediction

Authors: Xin Li, Ran Liu, Saihua Xu, Sirajudeen Gulam Razul, Chau Yuen

Abstract: Accurate indoor pathloss prediction is crucial for optimizing wireless communication in indoor settings, where diverse materials and complex electromagnetic interactions pose significant modeling challenges. This paper introduces TransPathNet, a novel two-stage deep learning framework that leverages transformer-based feature extraction and multiscale convolutional attention decoding to generate hi… ▽ More Accurate indoor pathloss prediction is crucial for optimizing wireless communication in indoor settings, where diverse materials and complex electromagnetic interactions pose significant modeling challenges. This paper introduces TransPathNet, a novel two-stage deep learning framework that leverages transformer-based feature extraction and multiscale convolutional attention decoding to generate high-precision indoor radio pathloss maps. TransPathNet demonstrates state-of-the-art performance in the ICASSP 2025 Indoor Pathloss Radio Map Prediction Challenge, achieving an overall Root Mean Squared Error (RMSE) of 10.397 dB on the challenge full test set and 9.73 dB on the challenge Kaggle test set, showing excellent generalization capabilities across different indoor geometries, frequencies, and antenna patterns. Our project page, including the associated code, is available at https://lixin.ai/TransPathNet/. △ Less

Submitted 27 January, 2025; originally announced January 2025.

Comments: Accepted to ICASSP 2025

arXiv:2501.14970 [pdf, other]

AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges

Authors: Guangjin Pan, Yuan Gao, Yilin Gao, Zhiyong Zhong, Xiaoyu Yang, Xinyu Guo, Shugong Xu

Abstract: Wireless positioning technologies hold significant value for applications in autonomous driving, extended reality (XR), unmanned aerial vehicles (UAVs), and more. With the advancement of artificial intelligence (AI), leveraging AI to enhance positioning accuracy and robustness has emerged as a field full of potential. Driven by the requirements and functionalities defined in the 3rd Generation Par… ▽ More Wireless positioning technologies hold significant value for applications in autonomous driving, extended reality (XR), unmanned aerial vehicles (UAVs), and more. With the advancement of artificial intelligence (AI), leveraging AI to enhance positioning accuracy and robustness has emerged as a field full of potential. Driven by the requirements and functionalities defined in the 3rd Generation Partnership Project (3GPP) standards, AI/machine learning (ML)-based positioning is becoming a key technology to overcome the limitations of traditional methods. This paper begins with an introduction to the fundamentals of AI and wireless positioning, covering AI models, algorithms, positioning applications, emerging wireless technologies, and the basics of positioning techniques. Subsequently, focusing on standardization progress, we provide a comprehensive review of the evolution of 3GPP positioning standards, with an emphasis on the integration of AI/ML technologies in recent and upcoming releases. Based on the AI/ML-assisted positioning and direct AI/ML positioning schemes outlined in the standards, we conduct an in-depth investigation of related research. we focus on state-of-the-art (SOTA) research in AI-based line-of-sight (LOS)/non-line-of-sight (NLOS) detection, time of arrival (TOA)/time difference of arrival (TDOA) estimation, and angle estimation techniques. For Direct AI/ML Positioning, we explore SOTA advancements in fingerprint-based positioning, knowledge-assisted AI positioning, and channel charting-based positioning. Furthermore, we introduce publicly available datasets for wireless positioning and conclude by summarizing the challenges and opportunities of AI-driven wireless positioning. △ Less

Submitted 24 January, 2025; originally announced January 2025.

Comments: 32 pages. This work has been submitted to the IEEE for possible publication

arXiv:2501.02572 [pdf, other]

Energy Optimization of Multi-task DNN Inference in MEC-assisted XR Devices: A Lyapunov-Guided Reinforcement Learning Approach

Authors: Yanzan Sun, Jiacheng Qiu, Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaoyun Wang, Shuangfeng Han

Abstract: Extended reality (XR), blending virtual and real worlds, is a key application of future networks. While AI advancements enhance XR capabilities, they also impose significant computational and energy challenges on lightweight XR devices. In this paper, we developed a distributed queue model for multi-task DNN inference, addressing issues of resource competition and queue coupling. In response to th… ▽ More Extended reality (XR), blending virtual and real worlds, is a key application of future networks. While AI advancements enhance XR capabilities, they also impose significant computational and energy challenges on lightweight XR devices. In this paper, we developed a distributed queue model for multi-task DNN inference, addressing issues of resource competition and queue coupling. In response to the challenges posed by the high energy consumption and limited resources of XR devices, we designed a dual time-scale joint optimization strategy for model partitioning and resource allocation, formulated as a bi-level optimization problem. This strategy aims to minimize the total energy consumption of XR devices while ensuring queue stability and adhering to computational and communication resource constraints. To tackle this problem, we devised a Lyapunov-guided Proximal Policy Optimization algorithm, named LyaPPO. Numerical results demonstrate that the LyaPPO algorithm outperforms the baselines, achieving energy conservation of 24.79% to 46.14% under varying resource capacities. Specifically, the proposed algorithm reduces the energy consumption of XR devices by 24.29% to 56.62% compared to baseline algorithms. △ Less

Submitted 5 January, 2025; originally announced January 2025.

Comments: 13 pages, 7 figures. This work has been submitted to the IEEE for possible publication

arXiv:2412.17268 [pdf, ps, other]

STAR-RIS Assisted SWIPT Systems: Active or Passive?

Authors: Guangyu Zhu, Xidong Mu, Li Guo, Ao Huang, Shibiao Xu

Abstract: A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted simultaneous wireless information and power transfer (SWIPT) system is investigated. Both active and passive STAR-RISs are considered. Passive STAR-RISs can be cost-efficiently fabricated to large aperture sizes with significant near-field regions, but the design flexibility is limited by the couple… ▽ More A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted simultaneous wireless information and power transfer (SWIPT) system is investigated. Both active and passive STAR-RISs are considered. Passive STAR-RISs can be cost-efficiently fabricated to large aperture sizes with significant near-field regions, but the design flexibility is limited by the coupled phase-shifts. Active STAR-RISs can further amplify signals and have independent phase-shifts, but their aperture sizes are relatively small due to the high cost. To characterize and compare their performance, a power consumption minimization problem is formulated by jointly designing the beamforming at the access point (AP) and the STAR-RIS, subject to both the power and information quality-of-service requirements. To solve the resulting highly-coupled non-convex problem, the original problem is first decomposed into simpler subproblems and then an alternating optimization framework is proposed. For the passive STAR-RIS, the coupled phase-shift constraint is tackled by employing a vector-driven weight penalty method. While for the active STAR-RIS, the independent phase-shift is optimized with AP beamforming via matrix-driven semidefinite programming, and the amplitude matrix is updated using convex optimization techniques in each iteration. Numerical results show that: 1) given the same aperture sizes, the active STAR-RIS exhibits superior performance over the passive one when the aperture size is small, but the performance gap decreases with the increase in aperture size; and 2) given identical power budgets, the passive STAR-RIS is generally preferred, whereas the active STAR-RIS typically suffers performance loss for balancing between the hardware power and the amplification power. △ Less

Submitted 22 December, 2024; originally announced December 2024.

arXiv:2412.12742 [pdf, other]

Subspace Implicit Neural Representations for Real-Time Cardiac Cine MR Imaging

Authors: Wenqi Huang, Veronika Spieker, Siying Xu, Gastao Cruz, Claudia Prieto, Julia Schnabel, Kerstin Hammernik, Thomas Kuestner, Daniel Rueckert

Abstract: Conventional cardiac cine MRI methods rely on retrospective gating, which limits temporal resolution and the ability to capture continuous cardiac dynamics, particularly in patients with arrhythmias and beat-to-beat variations. To address these challenges, we propose a reconstruction framework based on subspace implicit neural representations for real-time cardiac cine MRI of continuously sampled… ▽ More Conventional cardiac cine MRI methods rely on retrospective gating, which limits temporal resolution and the ability to capture continuous cardiac dynamics, particularly in patients with arrhythmias and beat-to-beat variations. To address these challenges, we propose a reconstruction framework based on subspace implicit neural representations for real-time cardiac cine MRI of continuously sampled radial data. This approach employs two multilayer perceptrons to learn spatial and temporal subspace bases, leveraging the low-rank properties of cardiac cine MRI. Initialized with low-resolution reconstructions, the networks are fine-tuned using spoke-specific loss functions to recover spatial details and temporal fidelity. Our method directly utilizes the continuously sampled radial k-space spokes during training, thereby eliminating the need for binning and non-uniform FFT. This approach achieves superior spatial and temporal image quality compared to conventional binned methods at the acceleration rate of 10 and 20, demonstrating potential for high-resolution imaging of dynamic cardiac events and enhancing diagnostic capability. △ Less

Submitted 17 December, 2024; originally announced December 2024.

arXiv:2412.07555 [pdf, other]

GSM: A GNN-based Space-MIMO Framework for Direct-to-Cell Communications

Authors: Sai Xu, Yanan Du, Gaojie Chen, Rahim Tafazolli

Abstract: This paper proposes a graph neural network (GNN)-based space multiple-input multiple-output (MIMO) framework, named GSM, for direct-to-cell communications, aiming to achieve distributed coordinated beamforming for low Earth orbit (LEO) satellites. Firstly, a system model for LEO multi-satellite communications is established, where multiple LEO satellites collaborate to perform distributed beamform… ▽ More This paper proposes a graph neural network (GNN)-based space multiple-input multiple-output (MIMO) framework, named GSM, for direct-to-cell communications, aiming to achieve distributed coordinated beamforming for low Earth orbit (LEO) satellites. Firstly, a system model for LEO multi-satellite communications is established, where multiple LEO satellites collaborate to perform distributed beamforming and communicate with terrestrial user terminals coherently. Based on the system model, a weighted sum rate maximization problem is formulated. Secondly, a GNN-based method is developed to address the optimization problem. Particularly, the adopted neural network is composed of multiple identical GNNs, which are trained together and then deployed individually on each LEO satellite. Finally, the trained GNN is quantized and deployed on a field-programmable gate array (FPGA) to accelerate the inference by customizing the microarchitecture. Simulation results demonstrate that the proposed GNN scheme outperforms the benchmark ones including maximum ratio transmission, zero forcing and minimum mean square error. Furthermore, experimental results show that the FPGA-based accelerator achieves remarkably low inference latency, ranging from 3.863 to 5.883 ms under a 10-ns target clock period with 8-bit fixed-point data representation. △ Less

Submitted 10 December, 2024; originally announced December 2024.

arXiv:2412.04201 [pdf, other]

Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image

Authors: Shuang Xu, Zixiang Zhao, Haowen Bai, Chang Yu, Jiangjun Peng, Xiangyong Cao, Deyu Meng

Abstract: Hyperspectral images (HSIs) are frequently noisy and of low resolution due to the constraints of imaging devices. Recently launched satellites can concurrently acquire HSIs and panchromatic (PAN) images, enabling the restoration of HSIs to generate clean and high-resolution imagery through fusing PAN images for denoising and super-resolution. However, previous studies treated these two tasks as in… ▽ More Hyperspectral images (HSIs) are frequently noisy and of low resolution due to the constraints of imaging devices. Recently launched satellites can concurrently acquire HSIs and panchromatic (PAN) images, enabling the restoration of HSIs to generate clean and high-resolution imagery through fusing PAN images for denoising and super-resolution. However, previous studies treated these two tasks as independent processes, resulting in accumulated errors. This paper introduces \textbf{H}yperspectral \textbf{I}mage Joint \textbf{Pand}enoising \textbf{a}nd Pan\textbf{s}harpening (Hipandas), a novel learning paradigm that reconstructs HRHS images from noisy low-resolution HSIs (LRHS) and high-resolution PAN images. The proposed zero-shot Hipandas framework consists of a guided denoising network, a guided super-resolution network, and a PAN reconstruction network, utilizing an HSI low-rank prior and a newly introduced detail-oriented low-rank prior. The interconnection of these networks complicates the training process, necessitating a two-stage training strategy to ensure effective training. Experimental results on both simulated and real-world datasets indicate that the proposed method surpasses state-of-the-art algorithms, yielding more accurate and visually pleasing HRHS images. △ Less

Submitted 5 December, 2024; originally announced December 2024.

arXiv:2412.01035 [pdf, other]

Adaptive Traffic Element-Based Streetlight Control Using Neighbor Discovery Algorithm Based on IoT Events

Authors: Yupeng Tan, Sheng Xu, Chengyue Su

Abstract: Intelligent streetlight systems divide the streetlight network into multiple sectors, activating only the streetlights in the corresponding sectors when traffic elements pass by, rather than all streetlights, effectively reducing energy waste. This strategy requires streetlights to understand their neighbor relationships to illuminate only the streetlights in their respective sectors. However, man… ▽ More Intelligent streetlight systems divide the streetlight network into multiple sectors, activating only the streetlights in the corresponding sectors when traffic elements pass by, rather than all streetlights, effectively reducing energy waste. This strategy requires streetlights to understand their neighbor relationships to illuminate only the streetlights in their respective sectors. However, manually configuring the neighbor relationships for a large number of streetlights in complex large-scale road streetlight networks is cumbersome and prone to errors. Due to the crisscrossing nature of roads, it is also difficult to determine the neighbor relationships using GPS or communication positioning. In response to these issues, this article proposes a systematic approach to model the streetlight network as a social network and construct a neighbor relationship probabilistic graph using IoT event records of streetlights detecting traffic elements. Based on this, a multi-objective genetic algorithm based probabilistic graph clustering method is designed to discover the neighbor relationships of streetlights. Considering the characteristic that pedestrians and vehicles usually move at a constant speed on a section of a road, speed consistency is introduced as an optimization objective, which, together with traditional similarity measures, forms a multi-objective function, enhancing the accuracy of neighbor relationship discovery. Extensive experiments on simulation datasets were conducted, comparing the proposed algorithm with other probabilistic graph clustering algorithms. The results demonstrate that the proposed algorithm can more accurately identify the neighbor relationships of streetlights compared to other algorithms, effectively achieving adaptive streetlight control for traffic elements. △ Less

Submitted 1 December, 2024; originally announced December 2024.

arXiv:2411.13305 [pdf, ps, other]

Mutual Information-oriented ISAC Beamforming Design for Large Dimensional Antenna Array

Authors: Shanfeng Xu, Yanshuo Cheng, Siqiang Wang, Xinyi Wang, Zhong Zheng, Zesong Fei

Abstract: Existing integrated sensing and communication (ISAC) beamforming design were mostly designed under perfect instantaneous channel state information (CSI), limiting their use in practical dynamic environments. In this paper, we study the beamforming design for multiple-input multiple-output (MIMO) ISAC systems based on statistical CSI, with the weighted mutual information (MI) comprising sensing and… ▽ More Existing integrated sensing and communication (ISAC) beamforming design were mostly designed under perfect instantaneous channel state information (CSI), limiting their use in practical dynamic environments. In this paper, we study the beamforming design for multiple-input multiple-output (MIMO) ISAC systems based on statistical CSI, with the weighted mutual information (MI) comprising sensing and communication perspectives adopted as the performance metric. In particular, the operator-valued free probability theory is utilized to derive the closed-form expression for the weighted MI under statistical CSI. Subsequently, an efficient projected gradient ascent (PGA) algorithm is proposed to optimize the transmit beamforming matrix with the aim of maximizing the weighted MI.Numerical results validate that the derived closed-form expression matches well with the Monte Carlo simulation results and the proposed optimization algorithm is able to improve the weighted MI significantly. We also illustrate the trade-off between sensing and communication MI. △ Less

Submitted 17 June, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

Comments: 17 pages, 5 figures

arXiv:2411.07603 [pdf, other]

$\mathscr{H}_2$ Model Reduction for Linear Quantum Systems

Authors: G. P. Wu, S. Xue, G. F. Zhang, I. R. Petersen

Abstract: In this paper, an $\mathscr{H}_2$ norm-based model reduction method for linear quantum systems is presented, which can obtain a physically realizable model with a reduced order for closely approximating the original system. The model reduction problem is described as an optimization problem, whose objective is taken as an $\mathscr{H}_2$ norm of the difference between the transfer function of the… ▽ More In this paper, an $\mathscr{H}_2$ norm-based model reduction method for linear quantum systems is presented, which can obtain a physically realizable model with a reduced order for closely approximating the original system. The model reduction problem is described as an optimization problem, whose objective is taken as an $\mathscr{H}_2$ norm of the difference between the transfer function of the original system and that of the reduced one. Different from classical model reduction problems, physical realizability conditions for guaranteeing that the reduced-order system is also a quantum system should be taken as nonlinear constraints in the optimization. To solve the optimization problem with such nonlinear constraints, we employ a matrix inequality approach to transform nonlinear inequality constraints into readily solvable linear matrix inequalities (LMIs) and nonlinear equality constraints, so that the optimization problem can be solved by a lifting variables approach. We emphasize that different from existing work, which only introduces a criterion to evaluate the performance after model reduction, we guide our method to obtain an optimal reduced model with respect to the $\mathscr{H}_2$ norm. In addition, the above approach for model reduction is extended to passive linear quantum systems. Finally, examples of active and passive linear quantum systems validate the efficacy of the proposed method. △ Less

Submitted 19 November, 2024; v1 submitted 12 November, 2024; originally announced November 2024.

Comments: 13 pages,3 figures

arXiv:2410.23154 [pdf, other]

Nested ResNet: A Vision-Based Method for Detecting the Sensing Area of a Drop-in Gamma Probe

Authors: Songyu Xu, Yicheng Hu, Jionglong Su, Daniel Elson, Baoru Huang

Abstract: Purpose: Drop-in gamma probes are widely used in robotic-assisted minimally invasive surgery (RAMIS) for lymph node detection. However, these devices only provide audio feedback on signal intensity, lacking the visual feedback necessary for precise localisation. Previous work attempted to predict the sensing area location using laparoscopic images, but the prediction accuracy was unsatisfactory. I… ▽ More Purpose: Drop-in gamma probes are widely used in robotic-assisted minimally invasive surgery (RAMIS) for lymph node detection. However, these devices only provide audio feedback on signal intensity, lacking the visual feedback necessary for precise localisation. Previous work attempted to predict the sensing area location using laparoscopic images, but the prediction accuracy was unsatisfactory. Improvements are needed in the deep learning-based regression approach. Methods: We introduce a three-branch deep learning framework to predict the sensing area of the probe. Specifically, we utilise the stereo laparoscopic images as input for the main branch and develop a Nested ResNet architecture. The framework also incorporates depth estimation via transfer learning and orientation guidance through probe axis sampling. The combined features from each branch enhanced the accuracy of the prediction. Results: Our approach has been evaluated on a publicly available dataset, demonstrating superior performance over previous methods. In particular, our method resulted in a 22.10\% decrease in 2D mean error and a 41.67\% reduction in 3D mean error. Additionally, qualitative comparisons further demonstrated the improved precision of our approach. Conclusion: With extensive evaluation, our solution significantly enhances the accuracy and reliability of sensing area predictions. This advancement enables visual feedback during the use of the drop-in gamma probe in surgery, providing surgeons with more accurate and reliable localisation.} △ Less

Submitted 30 October, 2024; originally announced October 2024.

arXiv:2410.21351 [pdf, other]

LinFormer: A Linear-based Lightweight Transformer Architecture For Time-Aware MIMO Channel Prediction

Authors: Yanliang Jin, Yifan Wu, Yuan Gao, Shunqing Zhang, Shugong Xu, Cheng-Xiang Wang

Abstract: The emergence of 6th generation (6G) mobile networks brings new challenges in supporting high-mobility communications, particularly in addressing the issue of channel aging. While existing channel prediction methods offer improved accuracy at the expense of increased computational complexity, limiting their practical application in mobile networks. To address these challenges, we present LinFormer… ▽ More The emergence of 6th generation (6G) mobile networks brings new challenges in supporting high-mobility communications, particularly in addressing the issue of channel aging. While existing channel prediction methods offer improved accuracy at the expense of increased computational complexity, limiting their practical application in mobile networks. To address these challenges, we present LinFormer, an innovative channel prediction framework based on a scalable, all-linear, encoder-only Transformer model. Our approach, inspired by natural language processing (NLP) models such as BERT, adapts an encoder-only architecture specifically for channel prediction tasks. We propose replacing the computationally intensive attention mechanism commonly used in Transformers with a time-aware multi-layer perceptron (TMLP), significantly reducing computational demands. The inherent time awareness of TMLP module makes it particularly suitable for channel prediction tasks. We enhance LinFormer's training process by employing a weighted mean squared error loss (WMSELoss) function and data augmentation techniques, leveraging larger, readily available communication datasets. Our approach achieves a substantial reduction in computational complexity while maintaining high prediction accuracy, making it more suitable for deployment in cost-effective base stations (BS). Comprehensive experiments using both simulated and measured data demonstrate that LinFormer outperforms existing methods across various mobility scenarios, offering a promising solution for future wireless communication systems. △ Less

Submitted 28 October, 2024; originally announced October 2024.

arXiv:2410.19779 [pdf, other]

EEGPT: Unleashing the Potential of EEG Generalist Foundation Model by Autoregressive Pre-training

Authors: Tongtian Yue, Shuning Xue, Xuange Gao, Yepeng Tang, Longteng Guo, Jie Jiang, Jing Liu

Abstract: Electroencephalogram (EEG) signals are pivotal in providing insights into spontaneous brain activity, highlighting their significant importance in neuroscience research. However, the exploration of versatile EEG models is constrained by diverse data formats, outdated pre-training paradigms, and limited transfer learning methods, only leading to specialist models on single dataset. In this paper, w… ▽ More Electroencephalogram (EEG) signals are pivotal in providing insights into spontaneous brain activity, highlighting their significant importance in neuroscience research. However, the exploration of versatile EEG models is constrained by diverse data formats, outdated pre-training paradigms, and limited transfer learning methods, only leading to specialist models on single dataset. In this paper, we introduce EEGPT, the first generalist EEG foundation model designed to address these challenges. First, we propose an electrode-wise modeling strategy that treats each electrode as a fundamental unit, enabling the integration of diverse EEG datasets collected from up to 138 electrodes, amassing 37.5M pre-training samples. Second, we develop the first autoregressive EEG pre-trained model, moving away from traditional masked autoencoder approaches to a next signal prediction task that better captures the sequential and temporal dependencies of EEG data. We also explore scaling laws with model up to 1.1B parameters: the largest in EEG research to date. Third, we introduce a multi-task transfer learning paradigm using a learnable electrode graph network shared across tasks, which for the first time confirms multi-task compatibility and synergy. As the first generalist EEG foundation model, EEGPT shows broad compatibility with various signal acquisition devices, subjects, and tasks. It supports up to 138 electrodes and any combination thereof as input. Furthermore, we simultaneously evaluate it on 5 distinct tasks across 12 benchmarks. EEGPT consistently outperforms existing specialist models across all downstream tasks, with its effectiveness further validated through extensive ablation studies. This work sets a new direction for generalist EEG modeling, offering improved scalability, transferability, and adaptability for a wide range of EEG applications. The code and models will be released. △ Less

Submitted 14 October, 2024; originally announced October 2024.

arXiv:2410.12218 [pdf, other]

Exploring Dual-Sniffer Passive Localization: Algorithm Design and Experimental Results

Authors: Tuo Wu, Lingyu Hou, Hong Niu, Saihua Xu, Sirajudeen Gulam Razul, Chau Yuen

Abstract: In this paper, we explore a dual-sniffer passive localization system that detects the timing difference of signals from both commercial base station (eNb) and user equipment (UE) to the sniffers. We design two localization schemes for UE localization: a time of arrival (ToA) based scheme and a time difference of arrival (TDoA) based scheme. In the ToA-based scheme, we derive two ellipse equations… ▽ More In this paper, we explore a dual-sniffer passive localization system that detects the timing difference of signals from both commercial base station (eNb) and user equipment (UE) to the sniffers. We design two localization schemes for UE localization: a time of arrival (ToA) based scheme and a time difference of arrival (TDoA) based scheme. In the ToA-based scheme, we derive two ellipse equations from measured arrival times at two sniffers, enabling direct numerical computation of the estimated position. For the TDoA-based scheme, we relocate one sniffer to a different position to obtain two sets of TDoA measurements, resulting in hyperbola equations. We then apply a least squares (LS) algorithm to analytically estimate the UE's position. Simulation results validate the effectiveness of the proposed TDoA-based scheme, demonstrating improved accuracy in UE positioning.We build a platform based on the considered localization system and conduct real-world experiments. The experimental results confirm the accuracy and practicality of the TDoA-based dual-sniffer localization scheme, demonstrating improved precision in passive localization. △ Less

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.01330 [pdf, ps, other]

Enhancing User Fairness in Wireless Powered Communication Networks with STAR-RIS

Authors: Guangyu Zhu, Xidong Mu, Li Guo, Ao Huang, Shibiao Xu

Abstract: A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted wireless powered communication network (WPCN) is proposed, where two energy-limited devices first harvest energy from a hybrid access point (HAP) and then use that energy to transmit information back. To fully eliminate the doubly-near-far effect in WPCNs, two STAR-RIS operating protocol-driven tran… ▽ More A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted wireless powered communication network (WPCN) is proposed, where two energy-limited devices first harvest energy from a hybrid access point (HAP) and then use that energy to transmit information back. To fully eliminate the doubly-near-far effect in WPCNs, two STAR-RIS operating protocol-driven transmission strategies, namely energy splitting non-orthogonal multiple access (ES- NOMA) and time switching time division multiple access (TS- TDMA) are proposed. For each strategy, the corresponding optimization problem is formulated to maximize the minimum throughput by jointly optimizing time allocation, user transmit power, active HAP beamforming, and passive STAR-RIS beamforming. For ES-NOMA, the resulting intractable problem is solved via a two-layer algorithm, which exploits the one-dimensional search and block coordinate descent methods in an iterative manner. For TS-TDMA, the optimal active beamforming and passive beamforming are first determined according to the maximum-ratio transmission beamformer. Then, the optimal solution of the time allocation variables is obtained by solving a standard convex problem. Numerical results show that: 1) the STAR-RIS can achieve considerable performance improvements for both strategies compared to the conventional RIS; 2) TS- TDMA is preferred for single-antenna scenarios, whereas ES- NOMA is better suited for multi-antenna scenarios; and 3) the superiority of ES-NOMA over TS-TDMA is enhanced as the number of STAR-RIS elements increases. △ Less

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.19370 [pdf, ps, other]

MambaEviScrib: Mamba and Evidence-Guided Consistency Enhance CNN Robustness for Scribble-Based Weakly Supervised Ultrasound Image Segmentation

Authors: Xiaoxiang Han, Xinyu Li, Jiang Shang, Yiman Liu, Keyan Chen, Shugong Xu, Qiaohong Liu, Qi Zhang

Abstract: Segmenting anatomical structures and lesions from ultrasound images contributes to disease assessment. Weakly supervised learning (WSL) based on sparse annotation has achieved encouraging performance and demonstrated the potential to reduce annotation costs. This study attempts to introduce scribble-based WSL into ultrasound image segmentation tasks. However, ultrasound images often suffer from po… ▽ More Segmenting anatomical structures and lesions from ultrasound images contributes to disease assessment. Weakly supervised learning (WSL) based on sparse annotation has achieved encouraging performance and demonstrated the potential to reduce annotation costs. This study attempts to introduce scribble-based WSL into ultrasound image segmentation tasks. However, ultrasound images often suffer from poor contrast and unclear edges, coupled with insufficient supervison signals for edges, posing challenges to edge prediction. Uncertainty modeling has been proven to facilitate models in dealing with these issues. Nevertheless, existing uncertainty estimation paradigms are not robust enough and often filter out predictions near decision boundaries, resulting in unstable edge predictions. Therefore, we propose leveraging predictions near decision boundaries effectively. Specifically, we introduce Dempster-Shafer Theory (DST) of evidence to design an Evidence-Guided Consistency strategy. This strategy utilizes high-evidence predictions, which are more likely to occur near high-density regions, to guide the optimization of low-evidence predictions that may appear near decision boundaries. Furthermore, the diverse sizes and locations of lesions in ultrasound images pose a challenge for CNNs with local receptive fields, as they struggle to model global information. Therefore, we introduce Visual Mamba based on structured state space sequence models, which achieves long-range dependency with linear computational complexity, and we construct a novel hybrid CNN-Mamba framework. During training, the collaboration between the CNN branch and the Mamba branch in the proposed framework draws inspiration from each other based on the EGC strategy. Experiments demonstrate the competitiveness of the proposed method. Dataset and code will be available on https://github.com/GtLinyer/MambaEviScrib. △ Less

Submitted 31 October, 2024; v1 submitted 28 September, 2024; originally announced September 2024.

arXiv:2409.15742 [pdf, other]

Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample

Authors: Zhiyong Chen, Zhiqi Ai, Xinnuo Li, Shugong Xu

Abstract: This paper introduces a novel framework for open-set speaker identification in household environments, playing a crucial role in facilitating seamless human-computer interactions. Addressing the limitations of current speaker models and classification approaches, our work integrates an pretrained WavLM frontend with a few-shot rapid tuning neural network (NN) backend for enrollment, employing task… ▽ More This paper introduces a novel framework for open-set speaker identification in household environments, playing a crucial role in facilitating seamless human-computer interactions. Addressing the limitations of current speaker models and classification approaches, our work integrates an pretrained WavLM frontend with a few-shot rapid tuning neural network (NN) backend for enrollment, employing task-optimized Speaker Reciprocal Points Learning (SRPL) to enhance discrimination across multiple target speakers. Furthermore, we propose an enhanced version of SRPL (SRPL+), which incorporates negative sample learning with both speech-synthesized and real negative samples to significantly improve open-set SID accuracy. Our approach is thoroughly evaluated across various multi-language text-dependent speaker recognition datasets, demonstrating its effectiveness in achieving high usability for complex household multi-speaker recognition scenarios. The proposed system enhanced open-set performance by up to 27\% over the directly use of efficient WavLM base+ model. △ Less

Submitted 24 September, 2024; originally announced September 2024.

Comments: IEEE Spoken Language Technology Workshop 2024

Journal ref: IEEE Spoken Language Technology Workshop 2024

arXiv:2409.15741 [pdf, other]

StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis

Authors: Zhiyong Chen, Xinnuo Li, Zhiqi Ai, Shugong Xu

Abstract: We introduce StyleFusion-TTS, a prompt and/or audio referenced, style and speaker-controllable, zero-shot text-to-speech (TTS) synthesis system designed to enhance the editability and naturalness of current research literature. We propose a general front-end encoder as a compact and effective module to utilize multimodal inputs including text prompts, audio references, and speaker timbre reference… ▽ More We introduce StyleFusion-TTS, a prompt and/or audio referenced, style and speaker-controllable, zero-shot text-to-speech (TTS) synthesis system designed to enhance the editability and naturalness of current research literature. We propose a general front-end encoder as a compact and effective module to utilize multimodal inputs including text prompts, audio references, and speaker timbre references in a fully zero-shot manner and produce disentangled style and speaker control embeddings. Our novel approach also leverages a hierarchical conformer structure for the fusion of style and speaker control embeddings, aiming to achieve optimal feature fusion within the current advanced TTS architecture. StyleFusion-TTS is evaluated through multiple metrics, both subjectively and objectively. The system shows promising performance across our evaluations, suggesting its potential to contribute to the advancement of the field of zero-shot text-to-speech synthesis. △ Less

Submitted 24 September, 2024; originally announced September 2024.

Comments: The 7th Chinese Conference on Pattern Recognition and Computer Vision PRCV 2024

Journal ref: The 7th Chinese Conference on Pattern Recognition and Computer Vision PRCV 2024

arXiv:2409.14688 [pdf, other]

A Generalized Control Revision Method for Autonomous Driving Safety

Authors: Zehang Zhu, Yuning Wang, Tianqi Ke, Zeyu Han, Shaobing Xu, Qing Xu, John M. Dolan, Jianqiang Wang

Abstract: Safety is one of the most crucial challenges of autonomous driving vehicles, and one solution to guarantee safety is to employ an additional control revision module after the planning backbone. Control Barrier Function (CBF) has been widely used because of its strong mathematical foundation on safety. However, the incompatibility with heterogeneous perception data and incomplete consideration of t… ▽ More Safety is one of the most crucial challenges of autonomous driving vehicles, and one solution to guarantee safety is to employ an additional control revision module after the planning backbone. Control Barrier Function (CBF) has been widely used because of its strong mathematical foundation on safety. However, the incompatibility with heterogeneous perception data and incomplete consideration of traffic scene elements make existing systems hard to be applied in dynamic and complex real-world scenarios. In this study, we introduce a generalized control revision method for autonomous driving safety, which adopts both vectorized perception and occupancy grid map as inputs and comprehensively models multiple types of traffic scene constraints based on a new proposed barrier function. Traffic elements are integrated into one unified framework, decoupled from specific scenario settings or rules. Experiments on CARLA, SUMO, and OnSite simulator prove that the proposed algorithm could realize safe control revision under complicated scenes, adapting to various planning backbones, road topologies, and risk types. Physical platform validation also verifies the real-world application feasibility. △ Less

Submitted 17 March, 2025; v1 submitted 22 September, 2024; originally announced September 2024.

arXiv:2409.12470 [pdf, other]

HSIGene: A Foundation Model For Hyperspectral Image Generation

Authors: Li Pang, Xiangyong Cao, Datao Tang, Shuang Xu, Xueru Bai, Feng Zhou, Deyu Meng

Abstract: Hyperspectral image (HSI) plays a vital role in various fields such as agriculture and environmental monitoring. However, due to the expensive acquisition cost, the number of hyperspectral images is limited, degenerating the performance of downstream tasks. Although some recent studies have attempted to employ diffusion models to synthesize HSIs, they still struggle with the scarcity of HSIs, affe… ▽ More Hyperspectral image (HSI) plays a vital role in various fields such as agriculture and environmental monitoring. However, due to the expensive acquisition cost, the number of hyperspectral images is limited, degenerating the performance of downstream tasks. Although some recent studies have attempted to employ diffusion models to synthesize HSIs, they still struggle with the scarcity of HSIs, affecting the reliability and diversity of the generated images. Some studies propose to incorporate multi-modal data to enhance spatial diversity, but the spectral fidelity cannot be ensured. In addition, existing HSI synthesis models are typically uncontrollable or only support single-condition control, limiting their ability to generate accurate and reliable HSIs. To alleviate these issues, we propose HSIGene, a novel HSI generation foundation model which is based on latent diffusion and supports multi-condition control, allowing for more precise and reliable HSI generation. To enhance the spatial diversity of the training data while preserving spectral fidelity, we propose a new data augmentation method based on spatial super-resolution, in which HSIs are upscaled first, and thus abundant training patches could be obtained by cropping the high-resolution HSIs. In addition, to improve the perceptual quality of the augmented data, we introduce a novel two-stage HSI super-resolution framework, which first applies RGB bands super-resolution and then utilizes our proposed Rectangular Guided Attention Network (RGAN) for guided HSI super-resolution. Experiments demonstrate that the proposed model is capable of generating a vast quantity of realistic HSIs for downstream tasks such as denoising and super-resolution. The code and models are available at https://github.com/LiPang/HSIGene. △ Less

Submitted 1 November, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.08652 [pdf, other]

SkinFormer: Learning Statistical Texture Representation with Transformer for Skin Lesion Segmentation

Authors: Rongtao Xu, Changwei Wang, Jiguang Zhang, Shibiao Xu, Weiliang Meng, Xiaopeng Zhang

Abstract: Accurate skin lesion segmentation from dermoscopic images is of great importance for skin cancer diagnosis. However, automatic segmentation of melanoma remains a challenging task because it is difficult to incorporate useful texture representations into the learning process. Texture representations are not only related to the local structural information learned by CNN, but also include the global… ▽ More Accurate skin lesion segmentation from dermoscopic images is of great importance for skin cancer diagnosis. However, automatic segmentation of melanoma remains a challenging task because it is difficult to incorporate useful texture representations into the learning process. Texture representations are not only related to the local structural information learned by CNN, but also include the global statistical texture information of the input image. In this paper, we propose a trans\textbf{Former} network (\textbf{SkinFormer}) that efficiently extracts and fuses statistical texture representation for \textbf{Skin} lesion segmentation. Specifically, to quantify the statistical texture of input features, a Kurtosis-guided Statistical Counting Operator is designed. We propose Statistical Texture Fusion Transformer and Statistical Texture Enhance Transformer with the help of Kurtosis-guided Statistical Counting Operator by utilizing the transformer's global attention mechanism. The former fuses structural texture information and statistical texture information, and the latter enhances the statistical texture of multi-scale features. {Extensive experiments on three publicly available skin lesion datasets validate that our SkinFormer outperforms other SOAT methods, and our method achieves 93.2\% Dice score on ISIC 2018. It can be easy to extend SkinFormer to segment 3D images in the future.} Our code is available at https://github.com/Rongtao-Xu/SkinFormer. △ Less

Submitted 13 September, 2024; originally announced September 2024.

Comments: 12 pages, 8 figures, published to JBHI

arXiv:2409.02497 [pdf, other]

A Learnable Color Correction Matrix for RAW Reconstruction

Authors: Anqi Liu, Shiyi Mu, Shugong Xu

Abstract: Autonomous driving algorithms usually employ sRGB images as model input due to their compatibility with the human visual system. However, visually pleasing sRGB images are possibly sub-optimal for downstream tasks when compared to RAW images. The availability of RAW images is constrained by the difficulties in collecting real-world driving data and the associated challenges of annotation. To addre… ▽ More Autonomous driving algorithms usually employ sRGB images as model input due to their compatibility with the human visual system. However, visually pleasing sRGB images are possibly sub-optimal for downstream tasks when compared to RAW images. The availability of RAW images is constrained by the difficulties in collecting real-world driving data and the associated challenges of annotation. To address this limitation and support research in RAW-domain driving perception, we design a novel and ultra-lightweight RAW reconstruction method. The proposed model introduces a learnable color correction matrix (CCM), which uses only a single convolutional layer to approximate the complex inverse image signal processor (ISP). Experimental results demonstrate that simulated RAW (simRAW) images generated by our method provide performance improvements equivalent to those produced by more complex inverse ISP methods when pretraining RAW-domain object detectors, which highlights the effectiveness and practicality of our approach. △ Less

Submitted 4 September, 2024; originally announced September 2024.

Comments: Accepted by BMVC2024

arXiv:2408.06645 [pdf]

Dynamic Pricing of Electric Vehicle Charging Station Alliances Under Information Asymmetry

Authors: Zeyu Liu, Yun Zhou, Donghan Feng, Shaolun Xu, Yin Yi, Hengjie Li, Haojing Wang

Abstract: Due to the centralization of charging stations (CSs), CSs are organized as charging station alliances (CSAs) in the commercial competition. Under this situation, this paper studies the profit-oriented dynamic pricing strategy of CSAs. As the practicability basis, a privacy-protected bidirectional real-time information interaction framework is designed, under which the status of EVs is utilized as… ▽ More Due to the centralization of charging stations (CSs), CSs are organized as charging station alliances (CSAs) in the commercial competition. Under this situation, this paper studies the profit-oriented dynamic pricing strategy of CSAs. As the practicability basis, a privacy-protected bidirectional real-time information interaction framework is designed, under which the status of EVs is utilized as the reference for pricing, and the prices of CSs are the reference for charging decisions. Based on this framework, the decision-making models of EVs and CSs are established, in which the uncertainty caused by the information asymmetry between EVs and CSs and the bounded rationality of EV users are integrated. To solve the pricing decision model, the evolutionary game theory is adopted to describe the dynamic pricing game among CSAs, the equilibrium of which gives the optimal pricing strategy. Finally, the case study results in a real urban area in Shanghai, China verifies the practicability of the framework and the effectiveness of the dynamic pricing strategy. △ Less

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2407.20852 [pdf, other]

Optimizing 5G-Advanced Networks for Time-critical Applications: The Role of L4S

Authors: Guangjin Pan, Shugong Xu, Pin Jiang

Abstract: As 5G networks strive to support advanced time-critical applications, such as immersive Extended Reality (XR), cloud gaming, and autonomous driving, the demand for Real-time Broadband Communication (RTBC) grows. In this article, we present the main mechanisms of Low Latency, Low Loss, and Scalable Throughput (L4S). Subsequently, we investigate the support and challenges of L4S technology in the la… ▽ More As 5G networks strive to support advanced time-critical applications, such as immersive Extended Reality (XR), cloud gaming, and autonomous driving, the demand for Real-time Broadband Communication (RTBC) grows. In this article, we present the main mechanisms of Low Latency, Low Loss, and Scalable Throughput (L4S). Subsequently, we investigate the support and challenges of L4S technology in the latest 3GPP 5G-Advanced Release 18 (R18) standard. Our case study, using a prototype system for a real-time communication (RTC) application, demonstrates the superiority of L4S technology. The experimental results show that, compared with the GCC algorithm, the proposed L4S-GCC algorithm can reduce the stalling rate by 1.51%-2.80% and increase the bandwidth utilization by 11.4%-31.4%. The results emphasize the immense potential of L4S technology in enhancing transmission performance in time-critical applications. △ Less

Submitted 30 July, 2024; originally announced July 2024.

Comments: 7 pages, 3 figures. This work has been submitted to the IEEE for possible publication

arXiv:2407.20518 [pdf, other]

High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Authors: Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min

Abstract: Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, exi… ▽ More Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, existing methods struggle to capture rich image features effectively or rely on low-dimensional positional coordinates, making it difficult to accurately predict high-resolution gene expression profiles. To address these limitations, we developed HisToSGE, a method that employs a Pathology Image Large Model (PILM) to extract rich image features from histological images and utilizes a feature learning module to robustly generate high-resolution gene expression profiles. We evaluated HisToSGE on four ST datasets, comparing its performance with five state-of-the-art baseline methods. The results demonstrate that HisToSGE excels in generating high-resolution gene expression profiles and performing downstream tasks such as spatial domain identification. All code and public datasets used in this paper are available at https://github.com/wenwenmin/HisToSGE and https://zenodo.org/records/12792163. △ Less

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.08509 [pdf, other]

Haar Nuclear Norms with Applications to Remote Sensing Imagery Restoration

Authors: Shuang Xu, Chang Yu, Jiangjun Peng, Xiangyong Cao, Deyu Meng

Abstract: Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named the Haar nuclear norm (HNN), for efficient and effective remote sensing image restoration. It leverages the low-rank properties of wavelet coefficients derived… ▽ More Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named the Haar nuclear norm (HNN), for efficient and effective remote sensing image restoration. It leverages the low-rank properties of wavelet coefficients derived from the 2-D frontal slice-wise Haar discrete wavelet transform, effectively modeling the low-rank prior for separated coarse-grained structure and fine-grained textures in the image. Experimental evaluations conducted on hyperspectral image inpainting, multi-temporal image cloud removal, and hyperspectral image denoising have revealed the HNN's potential. Typically, HNN achieves a performance improvement of 1-4 dB and a speedup of 10-28x compared to some state-of-the-art methods (e.g., tensor correlated total variation, and fully-connected tensor network) for inpainting tasks. △ Less

Submitted 16 December, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.06064 [pdf, other]

Pan-denoising: Guided Hyperspectral Image Denoising via Weighted Represent Coefficient Total Variation

Authors: Shuang Xu, Qiao Ke, Jiangjun Peng, Xiangyong Cao, Zixiang Zhao

Abstract: This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, which is termed \textit{pan-denoising}. In a given scene, panchromatic (PAN) images capture similar structures and textures to HSIs but with less noise. This enables the utilization of PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential t… ▽ More This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, which is termed \textit{pan-denoising}. In a given scene, panchromatic (PAN) images capture similar structures and textures to HSIs but with less noise. This enables the utilization of PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential to uncover underlying structures and details beyond the internal information modeling of traditional HSI denoising methods. However, the proper modeling of this additional prior poses a significant challenge. To alleviate this issue, the paper proposes a novel regularization term, Panchromatic Weighted Representation Coefficient Total Variation (PWRCTV). It employs the gradient maps of PAN images to automatically assign different weights of TV regularization for each pixel, resulting in larger weights for smooth areas and smaller weights for edges. This regularization forms the basis of a pan-denoising model, which is solved using the Alternating Direction Method of Multipliers. Extensive experiments on synthetic and real-world datasets demonstrate that PWRCTV outperforms several state-of-the-art methods in terms of metrics and visual quality. Furthermore, an HSI classification experiment confirms that PWRCTV, as a preprocessing method, can enhance the performance of downstream classification tasks. The code and data are available at https://github.com/shuangxu96/PWRCTV. △ Less

Submitted 9 September, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, art. no. 5528714, 2024

arXiv:2407.03308 [pdf, other]

Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method

Authors: Sijie Xu, Shenyan Zong, Chang-Sheng Mei, Guofeng Shen, Yueran Zhao, He Wang

Abstract: Proton resonance frequency (PRF) based MR thermometry is essential for focused ultrasound (FUS) thermal ablation therapies. This work aims to enhance temporal resolution in dynamic MR temperature map reconstruction using an improved deep learning method. The training-optimized methods and five classical neural networks were applied on the 2-fold and 4-fold under-sampling k-space data to reconstruc… ▽ More Proton resonance frequency (PRF) based MR thermometry is essential for focused ultrasound (FUS) thermal ablation therapies. This work aims to enhance temporal resolution in dynamic MR temperature map reconstruction using an improved deep learning method. The training-optimized methods and five classical neural networks were applied on the 2-fold and 4-fold under-sampling k-space data to reconstruct the temperature maps. The enhanced training modules included offline/online data augmentations, knowledge distillation, and the amplitude-phase decoupling loss function. The heating experiments were performed by a FUS transducer on phantom and ex vivo tissues, respectively. These data were manually under-sampled to imitate acceleration procedures and trained in our method to get the reconstruction model. The additional dozen or so testing datasets were separately obtained for evaluating the real-time performance and temperature accuracy. Acceleration factors of 1.9 and 3.7 were found for 2 times and 4 times k-space under-sampling strategies and the ResUNet-based deep learning reconstruction performed exceptionally well. In 2-fold acceleration scenario, the RMSE of temperature map patches provided the values of 0.888 degree centigrade and 1.145 degree centigrade on phantom and ex vivo testing datasets. The DICE value of temperature areas enclosed by 43 degree centigrade isotherm was 0.809, and the Bland-Altman analysis showed a bias of -0.253 degree centigrade with the apart of plus or minus 2.16 degree centigrade. In 4 times under-sampling case, these evaluating values decreased by approximately 10%. This study demonstrates that deep learning-based reconstruction can significantly enhance the accuracy and efficiency of MR thermometry for clinical FUS thermal therapies. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03034 [pdf, ps, other]

Attention Incorporated Network for Sharing Low-rank, Image and K-space Information during MR Image Reconstruction to Achieve Single Breath-hold Cardiac Cine Imaging

Authors: Siying Xu, Kerstin Hammernik, Andreas Lingg, Jens Kuebler, Patrick Krumm, Daniel Rueckert, Sergios Gatidis, Thomas Kuestner

Abstract: Cardiac Cine Magnetic Resonance Imaging (MRI) provides an accurate assessment of heart morphology and function in clinical practice. However, MRI requires long acquisition times, with recent deep learning-based methods showing great promise to accelerate imaging and enhance reconstruction quality. Existing networks exhibit some common limitations that constrain further acceleration possibilities,… ▽ More Cardiac Cine Magnetic Resonance Imaging (MRI) provides an accurate assessment of heart morphology and function in clinical practice. However, MRI requires long acquisition times, with recent deep learning-based methods showing great promise to accelerate imaging and enhance reconstruction quality. Existing networks exhibit some common limitations that constrain further acceleration possibilities, including single-domain learning, reliance on a single regularization term, and equal feature contribution. To address these limitations, we propose to embed information from multiple domains, including low-rank, image, and k-space, in a novel deep learning network for MRI reconstruction, which we denote as A-LIKNet. A-LIKNet adopts a parallel-branch structure, enabling independent learning in the k-space and image domain. Coupled information sharing layers realize the information exchange between domains. Furthermore, we introduce attention mechanisms into the network to assign greater weights to more critical coils or important temporal frames. Training and testing were conducted on an in-house dataset, including 91 cardiovascular patients and 38 healthy subjects scanned with 2D cardiac Cine using retrospective undersampling. Additionally, we evaluated A-LIKNet on the real-time 8x prospectively undersampled data from the OCMR dataset. The results demonstrate that our proposed A-LIKNet outperforms existing methods and provides high-quality reconstructions. The network can effectively reconstruct highly retrospectively undersampled dynamic MR images up to 24x accelerations, indicating its potential for single breath-hold imaging. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Showing 1–50 of 223 results for author: Xu, S