Search | arXiv e-print repository

Layer Decomposition and Morphological Reconstruction for Task-Oriented Infrared Image Enhancement

Authors: Siyuan Chai, Xiaodong Guo, Tong Liu

Abstract: Infrared image helps improve the perception capabilities of autonomous driving in complex weather conditions such as fog, rain, and low light. However, infrared image often suffers from low contrast, especially in non-heat-emitting targets like bicycles, which significantly affects the performance of downstream high-level vision tasks. Furthermore, achieving contrast enhancement without amplifying… ▽ More Infrared image helps improve the perception capabilities of autonomous driving in complex weather conditions such as fog, rain, and low light. However, infrared image often suffers from low contrast, especially in non-heat-emitting targets like bicycles, which significantly affects the performance of downstream high-level vision tasks. Furthermore, achieving contrast enhancement without amplifying noise and losing important information remains a challenge. To address these challenges, we propose a task-oriented infrared image enhancement method. Our approach consists of two key components: layer decomposition and saliency information extraction. First, we design an layer decomposition method for infrared images, which enhances scene details while preserving dark region features, providing more features for subsequent saliency information extraction. Then, we propose a morphological reconstruction-based saliency extraction method that effectively extracts and enhances target information without amplifying noise. Our method improves the image quality for object detection and semantic segmentation tasks. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods. △ Less

Submitted 29 June, 2025; originally announced June 2025.

arXiv:2506.16304 [pdf, ps, other]

A Tractable Approach to Massive Communication and Ubiquitous Connectivity in 6G Standardization

Authors: Junyi Jiang, Wei Chen, Xin Guo, Shenghui Song, Ying Jun, Zhang, Zhu Han, Merouane Debbah, Khaled B. Letaief

Abstract: The full-scale 6G standardization has attracted considerable recent attention, especially since the first 3GPP-wide 6G workshop held in March 2025. To understand the practical and fundamental values of 6G and facilitate its standardization, it is crucial to explore the theoretical limits of spectrum, energy, and coverage efficiency considering practical hardware and signaling constraints. In this… ▽ More The full-scale 6G standardization has attracted considerable recent attention, especially since the first 3GPP-wide 6G workshop held in March 2025. To understand the practical and fundamental values of 6G and facilitate its standardization, it is crucial to explore the theoretical limits of spectrum, energy, and coverage efficiency considering practical hardware and signaling constraints. In this paper, we present a mean-field-approximation-based investigation on two out of six use case scenarios defined by IMT-2030, namely, massive communication and ubiquitous connectivity. Being aware of the limitation in interference cancellation owing to constrained cost and hardware complexity, we investigate the spectrum reuse architecture in both usage scenarios. We propose a tractable spectrum reuse with low signaling overhead consumed for channel estimation and channel state information (CSI) feedback. Our analysis indicates that the massive communication over cellular and device-to-device (D2D) networks can benefit from channel orthogonalization, while it is unnecessary to share the CSI of interfering links. Moreover, deploying relays or movable base stations, e.g. unmanned aerial vehicle, yields substantial energy and spectrum gain for ubiquitous connectivity, despite introducing interference. As such, the mean-field-optimization-based evaluation is expected to positively impact 6G and NextG standardization in 3GPP and other standardization bodies. △ Less

Submitted 19 June, 2025; originally announced June 2025.

arXiv:2506.13317 [pdf, ps, other]

A Contemporary Survey on Fluid Antenna Systems: Fundamentals and Networking Perspectives

Authors: Hanjiang Hong, Kai-Kit Wong, Hao Xu, Xinghao Guo, Farshad Rostami Ghadi, Yu Chen, Yin Xu, Chan-Byoung Chae, Baiyang Liu, Kin-Fai Tong, Yangyang Zhang

Abstract: The explosive growth of teletraffic, fueled by the convergence of cyber-physical systems and data-intensive applications, such as the Internet of Things (IoT), autonomous systems, and immersive communications, demands a multidisciplinary suite of innovative solutions across the physical and network layers. Fluid antenna systems (FAS) represent a transformative advancement in antenna design, offeri… ▽ More The explosive growth of teletraffic, fueled by the convergence of cyber-physical systems and data-intensive applications, such as the Internet of Things (IoT), autonomous systems, and immersive communications, demands a multidisciplinary suite of innovative solutions across the physical and network layers. Fluid antenna systems (FAS) represent a transformative advancement in antenna design, offering enhanced spatial degrees of freedom through dynamic reconfigurability. By exploiting spatial flexibility, FAS can adapt to varying channel conditions and optimize wireless performance, making it a highly promising candidate for next-generation communication networks. This paper provides a comprehensive survey of the state of the art in FAS research. We begin by examining key application scenarios in which FAS offers significant advantages. We then present the fundamental principles of FAS, covering channel measurement and modeling, single-user configurations, and the multi-user fluid antenna multiple access (FAMA) framework. Following this, we delve into key network-layer techniques such as quality-of-service (QoS) provisioning, power allocation, and content placement strategies. We conclude by identifying prevailing challenges and outlining future research directions to support the continued development of FAS in next-generation wireless networks. △ Less

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.07362 [pdf, ps, other]

Fluid Antenna-Empowered Receive Spatial Modulation

Authors: Xinghao Guo, Yin Xu, Dazhi He, Cixiao Zhang, Hanjiang Hong, Kai-Kit Wong, Chan-Byoung Chae, Wenjun Zhang, Yiyan Wu

Abstract: Fluid antenna (FA), as an emerging antenna technology, fully exploits spatial diversity. This paper integrates FA with the receive spatial modulation (RSM) scheme and proposes a novel FA-empowered RSM (FA-RSM) system. In this system, the transmitter is equipped with an FA that simultaneously activates multiple ports to transmit precoded signals. We address three key challenges in the FA-RSM system… ▽ More Fluid antenna (FA), as an emerging antenna technology, fully exploits spatial diversity. This paper integrates FA with the receive spatial modulation (RSM) scheme and proposes a novel FA-empowered RSM (FA-RSM) system. In this system, the transmitter is equipped with an FA that simultaneously activates multiple ports to transmit precoded signals. We address three key challenges in the FA-RSM system: port selection, theoretical analysis, and detection. First, for port selection, an optimal algorithm from a capacity maximization perspective are proposed, followed by two low-complexity alternatives. Second, for theoretical analysis, performance evaluation metrics are provided for port selection, which demonstrate that increasing the number of activated ports enhances system performance. Third, regarding detection, two low-complexity detectors are proposed. Simulation results confirm that the FA-RSM system significantly outperforms the conventional RSM system. The proposed low-complexity port selection algorithms facilitate minimal performance degradation. Moreover, while activating additional ports improves performance, the gain gradually saturates due to inherent spatial correlation, highlighting the importance of effective port selection in reducing system complexity and cost. Finally, both proposed detectors achieve near-optimal detection performance with low computational complexity, emphasizing the receiver-friendly nature of the FA-RSM system. △ Less

Submitted 8 June, 2025; originally announced June 2025.

Comments: 12 pages, submitted to IEEE Journal

arXiv:2505.14717 [pdf, ps, other]

Aneumo: A Large-Scale Multimodal Aneurysm Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks

Authors: Xigui Li, Yuanye Zhou, Feiyang Xiao, Xin Guo, Chen Jiang, Tan Pan, Xingmeng Zhang, Cenyu Liu, Zeyun Miao, Jianchao Ge, Xiansheng Wang, Qimeng Wang, Yichi Zhang, Wenbo Zhang, Fengping Zhu, Limei Han, Yuan Qi, Chensen Lin, Yuan Cheng

Abstract: Intracranial aneurysms (IAs) are serious cerebrovascular lesions found in approximately 5\% of the general population. Their rupture may lead to high mortality. Current methods for assessing IA risk focus on morphological and patient-specific factors, but the hemodynamic influences on IA development and rupture remain unclear. While accurate for hemodynamic studies, conventional computational flui… ▽ More Intracranial aneurysms (IAs) are serious cerebrovascular lesions found in approximately 5\% of the general population. Their rupture may lead to high mortality. Current methods for assessing IA risk focus on morphological and patient-specific factors, but the hemodynamic influences on IA development and rupture remain unclear. While accurate for hemodynamic studies, conventional computational fluid dynamics (CFD) methods are computationally intensive, hindering their deployment in large-scale or real-time clinical applications. To address this challenge, we curated a large-scale, high-fidelity aneurysm CFD dataset to facilitate the development of efficient machine learning algorithms for such applications. Based on 427 real aneurysm geometries, we synthesized 10,660 3D shapes via controlled deformation to simulate aneurysm evolution. The authenticity of these synthetic shapes was confirmed by neurosurgeons. CFD computations were performed on each shape under eight steady-state mass flow conditions, generating a total of 85,280 blood flow dynamics data covering key parameters. Furthermore, the dataset includes segmentation masks, which can support tasks that use images, point clouds or other multimodal data as input. Additionally, we introduced a benchmark for estimating flow parameters to assess current modeling methods. This dataset aims to advance aneurysm research and promote data-driven approaches in biofluids, biomedical engineering, and clinical risk assessment. The code and dataset are available at: https://github.com/Xigui-Li/Aneumo. △ Less

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.13818 [pdf, ps, other]

RainfalLTE: A Zero-effect Rainfall Sensing System Utilizing Existing LTE Infrastructure

Authors: Xianbin Jiang, Fei Shang, Haohua Du, Panlong Yang, Xing Guo, Lihong Liang, Yuanting Zhang, Xiang-Yang Li

Abstract: Environmental sensing is an important research topic in the integrated sensing and communication (ISAC) system. Current works often focus on static environments, such as buildings and terrains. However, dynamic factors like rainfall can cause serious interference to wireless signals. In this paper, we propose a system called RainfalLTE that utilizes the downlink signal of LTE base stations for dev… ▽ More Environmental sensing is an important research topic in the integrated sensing and communication (ISAC) system. Current works often focus on static environments, such as buildings and terrains. However, dynamic factors like rainfall can cause serious interference to wireless signals. In this paper, we propose a system called RainfalLTE that utilizes the downlink signal of LTE base stations for device-independent rain sensing. In articular, it is fully compatible with current communication modes and does not require any additional hardware. We evaluate it with LTE data and rainfall information provided by a weather radar in Badaling Town, Beijing The results show that for 10 classes of rainfall, RainfalLTE achieves over 97% identification accuracy. Our case study shows that the assistance of rainfall information can bring more than 40% energy saving, which provides new opportunities for the design and optimization of ISAC systems. △ Less

Submitted 25 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.07687 [pdf, ps, other]

ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation

Authors: Feng Yuan, Yifan Gao, Wenbin Wu, Keqing Wu, Xiaotong Guo, Jie Jiang, Xin Gao

Abstract: Accurate multi-modal medical image translation requires ha-rmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for pr… ▽ More Accurate multi-modal medical image translation requires ha-rmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for preserving modality-specific edge and texture details, and Mamba's selective state-space modeling for efficient long- and short-range feature dependencies. Structurally, our dual-resolution framework leverages SAM2's image encoder to capture organ-scale semantics from high-resolution inputs, while a parallel CNNs branch extracts fine-grained local features. The Robust Feature Fusion Network (RFFN) integrates these epresentations, and the Bidirectional Mamba Residual Network (BMRN) models spatial dependencies using spiral scanning and bidirectional state-space dynamics. A three-stage skip fusion decoder enhances edge and texture fidelity. We employ Efficient Low-Rank Adaptation (LoRA+) fine-tuning to enable precise domain specialization while maintaining the foundational capabilities of the pre-trained components. Extensive experimental validation on the SynthRAD2023 and BraTS2019 datasets demonstrates that ABS-Mamba outperforms state-of-the-art methods, delivering high-fidelity cross-modal synthesis that preserves anatomical semantics and structural details to enhance diagnostic accuracy in clinical applications. The code is available at https://github.com/gatina-yone/ABS-Mamba △ Less

Submitted 12 May, 2025; originally announced May 2025.

Comments: MICCAI 2025(under view)

arXiv:2505.07449 [pdf, ps, other]

Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model

Authors: Wei Li, Ming Hu, Guoan Wang, Lihao Liu, Kaijin Zhou, Junzhi Ning, Xin Guo, Zongyuan Ge, Lixu Gu, Junjun He

Abstract: In ophthalmic surgery, developing an AI system capable of interpreting surgical videos and predicting subsequent operations requires numerous ophthalmic surgical videos with high-quality annotations, which are difficult to collect due to privacy concerns and labor consumption. Text-guided video generation (T2V) emerges as a promising solution to overcome this issue by generating ophthalmic surgica… ▽ More In ophthalmic surgery, developing an AI system capable of interpreting surgical videos and predicting subsequent operations requires numerous ophthalmic surgical videos with high-quality annotations, which are difficult to collect due to privacy concerns and labor consumption. Text-guided video generation (T2V) emerges as a promising solution to overcome this issue by generating ophthalmic surgical videos based on surgeon instructions. In this paper, we present Ophora, a pioneering model that can generate ophthalmic surgical videos following natural language instructions. To construct Ophora, we first propose a Comprehensive Data Curation pipeline to convert narrative ophthalmic surgical videos into a large-scale, high-quality dataset comprising over 160K video-instruction pairs, Ophora-160K. Then, we propose a Progressive Video-Instruction Tuning scheme to transfer rich spatial-temporal knowledge from a T2V model pre-trained on natural video-text datasets for privacy-preserved ophthalmic surgical video generation based on Ophora-160K. Experiments on video quality evaluation via quantitative analysis and ophthalmologist feedback demonstrate that Ophora can generate realistic and reliable ophthalmic surgical videos based on surgeon instructions. We also validate the capability of Ophora for empowering downstream tasks of ophthalmic surgical workflow understanding. Code is available at https://github.com/mar-cry/Ophora. △ Less

Submitted 26 June, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

Comments: Early accepted in MICCAI25

arXiv:2504.02628 [pdf, ps, other]

Towards Computation- and Communication-efficient Computational Pathology

Authors: Chu Han, Bingchao Zhao, Jiatai Lin, Shanshan Lyu, Longfei Wang, Tianpeng Deng, Cheng Lu, Changhong Liang, Hannah Y. Wen, Xiaojing Guo, Zhenwei Shi, Zaiyi Liu

Abstract: Despite the impressive performance across a wide range of applications, current computational pathology models face significant diagnostic efficiency challenges due to their reliance on high-magnification whole-slide image analysis. This limitation severely compromises their clinical utility, especially in time-sensitive diagnostic scenarios and situations requiring efficient data transfer. To add… ▽ More Despite the impressive performance across a wide range of applications, current computational pathology models face significant diagnostic efficiency challenges due to their reliance on high-magnification whole-slide image analysis. This limitation severely compromises their clinical utility, especially in time-sensitive diagnostic scenarios and situations requiring efficient data transfer. To address these issues, we present a novel computation- and communication-efficient framework called Magnification-Aligned Global-Local Transformer (MAG-GLTrans). Our approach significantly reduces computational time, file transfer requirements, and storage overhead by enabling effective analysis using low-magnification inputs rather than high-magnification ones. The key innovation lies in our proposed magnification alignment (MAG) mechanism, which employs self-supervised learning to bridge the information gap between low and high magnification levels by effectively aligning their feature representations. Through extensive evaluation across various fundamental CPath tasks, MAG-GLTrans demonstrates state-of-the-art classification performance while achieving remarkable efficiency gains: up to 10.7 times reduction in computational time and over 20 times reduction in file transfer and storage requirements. Furthermore, we highlight the versatility of our MAG framework through two significant extensions: (1) its applicability as a feature extractor to enhance the efficiency of any CPath architecture, and (2) its compatibility with existing foundation models and histopathology-specific encoders, enabling them to process low-magnification inputs with minimal information loss. These advancements position MAG-GLTrans as a particularly promising solution for time-sensitive applications, especially in the context of intraoperative frozen section diagnosis where both accuracy and efficiency are paramount. △ Less

Submitted 3 June, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

arXiv:2503.22202 [pdf, other]

mmHRR: Monitoring Heart Rate Recovery with Millimeter Wave Radar

Authors: Ziheng Mao, Yuan He, Jia Zhang, Yimiao Sun, Yadong Xie, Xiuzhen Guo

Abstract: Heart rate recovery (HRR) within the initial minute following exercise is a widely utilized metric for assessing cardiac autonomic function in individuals and predicting mortality risk in patients with cardiovascular disease. However, prevailing solutions for HRR monitoring typically involve the use of specialized medical equipment or contact wearable sensors, resulting in high costs and poor user… ▽ More Heart rate recovery (HRR) within the initial minute following exercise is a widely utilized metric for assessing cardiac autonomic function in individuals and predicting mortality risk in patients with cardiovascular disease. However, prevailing solutions for HRR monitoring typically involve the use of specialized medical equipment or contact wearable sensors, resulting in high costs and poor user experience. In this paper, we propose a contactless HRR monitoring technique, mmHRR, which achieves accurate heart rate (HR) estimation with a commercial mmWave radar. Unlike HR estimation at rest, the HR varies quickly after exercise and the heartbeat signal entangles with the respiration harmonics. To overcome these hurdles and effectively estimate the HR from the weak and non-stationary heartbeat signal, we propose a novel signal processing pipeline, including dynamic target tracking, adaptive heartbeat signal extraction, and accurate HR estimation with composite sliding windows. Real-world experiments demonstrate that mmHRR exhibits exceptional robustness across diverse environmental conditions, and achieves an average HR estimation error of 3.31 bpm (beats per minute), 71% lower than that of the state-of-the-art method. △ Less

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2503.21165 [pdf, other]

Extending Silicon Lifetime: A Review of Design Techniques for Reliable Integrated Circuits

Authors: Shaik Jani Babu, Fan Hu, Linyu Zhu, Sonal Singhal, Xinfei Guo

Abstract: Reliability has become an increasing concern in modern computing. Integrated circuits (ICs) are the backbone of modern computing devices across industries, including artificial intelligence (AI), consumer electronics, healthcare, automotive, industrial, and aerospace. Moore Law has driven the semiconductor IC industry toward smaller dimensions, improved performance, and greater energy efficiency.… ▽ More Reliability has become an increasing concern in modern computing. Integrated circuits (ICs) are the backbone of modern computing devices across industries, including artificial intelligence (AI), consumer electronics, healthcare, automotive, industrial, and aerospace. Moore Law has driven the semiconductor IC industry toward smaller dimensions, improved performance, and greater energy efficiency. However, as transistors shrink to atomic scales, aging-related degradation mechanisms such as Bias Temperature Instability (BTI), Hot Carrier Injection (HCI), Time-Dependent Dielectric Breakdown (TDDB), Electromigration (EM), and stochastic aging-induced variations have become major reliability threats. From an application perspective, applications like AI training and autonomous driving require continuous and sustainable operation to minimize recovery costs and enhance safety. Additionally, the high cost of chip replacement and reproduction underscores the need for extended lifespans. These factors highlight the urgency of designing more reliable ICs. This survey addresses the critical aging issues in ICs, focusing on fundamental degradation mechanisms and mitigation strategies. It provides a comprehensive overview of aging impact and the methods to counter it, starting with the root causes of aging and summarizing key monitoring techniques at both circuit and system levels. A detailed analysis of circuit-level mitigation strategies highlights the distinct aging characteristics of digital, analog, and SRAM circuits, emphasizing the need for tailored solutions. The survey also explores emerging software approaches in design automation, aging characterization, and mitigation, which are transforming traditional reliability optimization. Finally, it outlines the challenges and future directions for improving aging management and ensuring the long-term reliability of ICs across diverse applications. △ Less

Submitted 27 March, 2025; originally announced March 2025.

Comments: This work is under review by ACM

arXiv:2503.10287 [pdf, other]

MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment

Authors: Hao Zhou, Xiaobao Guo, Yuzhe Zhu, Adams Wai-Kin Kong

Abstract: Propelled by the breakthrough in deep generative models, audio-to-image generation has emerged as a pivotal cross-model task that converts complex auditory signals into rich visual representations. However, previous works only focus on single-source audio inputs for image generation, ignoring the multi-source characteristic in natural auditory scenes, thus limiting the performance in generating co… ▽ More Propelled by the breakthrough in deep generative models, audio-to-image generation has emerged as a pivotal cross-model task that converts complex auditory signals into rich visual representations. However, previous works only focus on single-source audio inputs for image generation, ignoring the multi-source characteristic in natural auditory scenes, thus limiting the performance in generating comprehensive visual content. To bridge this gap, a method called MACS is proposed to conduct multi-source audio-to-image generation. This is the first work that explicitly separates multi-source audio to capture the rich audio components before image generation. MACS is a two-stage method. In the first stage, multi-source audio inputs are separated by a weakly supervised method, where the audio and text labels are semantically aligned by casting into a common space using the large pre-trained CLAP model. We introduce a ranking loss to consider the contextual significance of the separated audio signals. In the second stage, efficient image generation is achieved by mapping the separated audio signals to the generation condition using only a trainable adapter and a MLP layer. We preprocess the LLP dataset as the first full multi-source audio-to-image generation benchmark. The experiments are conducted on multi-source, mixed-source, and single-source audio-to-image generation tasks. The proposed MACS outperforms the current state-of-the-art methods in 17 of the 21 evaluation indexes on all tasks and delivers superior visual quality. The code will be publicly available. △ Less

Submitted 13 March, 2025; originally announced March 2025.

arXiv:2503.03971 [pdf, other]

Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge

Authors: Fanwen Wang, Zi Wang, Yan Li, Jun Lyu, Chen Qin, Shuo Wang, Kunyuan Guo, Mengting Sun, Mingkai Huang, Haoyu Zhang, Michael Tänzer, Qirong Li, Xinran Chen, Jiahao Huang, Yinzhe Wu, Kian Anvari Hamedani, Yuntong Lyu, Longyu Sun, Qing Li, Ziqiang Xu, Bingyu Xin, Dimitris N. Metaxas, Narges Razizadeh, Shahabedin Nabavi, George Yiasemis , et al. (34 additional authors not shown)

Abstract: Cardiovascular magnetic resonance (CMR) imaging offers diverse contrasts for non-invasive assessment of cardiac function and myocardial characterization. However, CMR often requires the acquisition of many contrasts, and each contrast takes a considerable amount of time. The extended acquisition time will further increase the susceptibility to motion artifacts. Existing deep learning-based reconst… ▽ More Cardiovascular magnetic resonance (CMR) imaging offers diverse contrasts for non-invasive assessment of cardiac function and myocardial characterization. However, CMR often requires the acquisition of many contrasts, and each contrast takes a considerable amount of time. The extended acquisition time will further increase the susceptibility to motion artifacts. Existing deep learning-based reconstruction methods have been proven to perform well in image reconstruction tasks, but most of them are designed for specific acquisition modality or dedicated imaging parameter, which limits their ability to generalize across a variety of scan scenarios. To address this issue, the CMRxRecon2024 challenge consists of two specific tasks: Task 1 focuses on a modality-universal setting, evaluating the out-of-distribution generalization of existing learning-based models, while Task 2 follows a k-space sampling-universal setting, assessing the all-in-one adaptability of universal models. Main contributions of this challenge include providing the largest publicly available multi-modality, multi-view cardiac k-space dataset; and developing an open benchmarking platform for algorithm evaluation and shared code library for data processing. In addition, through a detailed analysis of the results submitted to the challenge, we have also made several findings, including: 1) adaptive prompt-learning embedding is an effective means for achieving strong generalization in reconstruction models; 2) enhanced data consistency based on physics-informed networks is also an effective pathway toward a universal model; 3) traditional evaluation metrics have limitations when assessing ground-truth references with moderate or lower image quality, highlighting the need for subjective evaluation methods. This challenge attracted 200 participants from 18 countries, aimed at promoting their translation into clinical practice. △ Less

Submitted 13 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

arXiv:2503.02410 [pdf, ps, other]

Neuroverse3D: Developing In-Context Learning Universal Model for Neuroimaging in 3D

Authors: Jiesi Hu, Chenfei Ye, Yanwu Yang, Xutao Guo, Yang Shang, Pengcheng Shi, Hanyang Peng, Ting Ma

Abstract: In-context learning (ICL), a type of universal model, demonstrates exceptional generalization across a wide range of tasks without retraining by leveraging task-specific guidance from context, making it particularly effective for the intricate demands of neuroimaging. However, current ICL models, limited to 2D inputs and thus exhibiting suboptimal performance, struggle to extend to 3D inputs due t… ▽ More In-context learning (ICL), a type of universal model, demonstrates exceptional generalization across a wide range of tasks without retraining by leveraging task-specific guidance from context, making it particularly effective for the intricate demands of neuroimaging. However, current ICL models, limited to 2D inputs and thus exhibiting suboptimal performance, struggle to extend to 3D inputs due to the high memory demands of ICL. In this regard, we introduce Neuroverse3D, an ICL model capable of performing multiple neuroimaging tasks in 3D (e.g., segmentation, denoising, inpainting). Neuroverse3D overcomes the large memory consumption associated with 3D inputs through adaptive parallel-sequential context processing and a U-shaped fusion strategy, allowing it to handle an unlimited number of context images. Additionally, we propose an optimized loss function to balance multi-task training and enhance focus on anatomical boundaries. Our study incorporates 43,674 3D multi-modal scans from 19 neuroimaging datasets and evaluates Neuroverse3D on 14 diverse tasks using held-out test sets. The results demonstrate that Neuroverse3D significantly outperforms existing ICL models and closely matches task-specific models, enabling flexible adaptation to medical center variations without retraining. The code and model weights are publicly available at https://github.com/jiesihu/Neuroverse3D. △ Less

Submitted 4 July, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

arXiv:2501.17884 [pdf]

Ranging Performance Analysis in Automotive DToF Lidars

Authors: Xiao Guo

Abstract: In recent years, achieving full autonomy in driving has emerged as a paramount objective for both the industry and academia. Among various perception technologies, Lidar (Light detection and ranging) stands out for its high-precision and high-resolution capabilities based on the principle of light propagation and coupling ranging module and imaging module. Lidar is a sophisticated system that inte… ▽ More In recent years, achieving full autonomy in driving has emerged as a paramount objective for both the industry and academia. Among various perception technologies, Lidar (Light detection and ranging) stands out for its high-precision and high-resolution capabilities based on the principle of light propagation and coupling ranging module and imaging module. Lidar is a sophisticated system that integrates multiple technologies such as optics, mechanics, circuits, and algorithms. Therefore, there are various feasible Lidar schemes to meet the needs of autonomous driving in different scenarios. The ranging performance of Lidar is a key factor that determines the overall performance of autonomous driving systems. As such, it is necessary to conduct a systematic analysis of the ranging performance of different Lidar schemes. In this paper, we present the ranging performance analysis methods corresponding to different optical designs, device selec-tions and measurement mechanisms. By using these methods, we compare the ranging perfor-mance of several typical commercial Lidars. Our findings provide a reference framework for de-signing Lidars with various trade-offs between cost and performance, and offer insights into the advancement towards improving Lidar schemes. △ Less

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.14970 [pdf, other]

AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges

Authors: Guangjin Pan, Yuan Gao, Yilin Gao, Zhiyong Zhong, Xiaoyu Yang, Xinyu Guo, Shugong Xu

Abstract: Wireless positioning technologies hold significant value for applications in autonomous driving, extended reality (XR), unmanned aerial vehicles (UAVs), and more. With the advancement of artificial intelligence (AI), leveraging AI to enhance positioning accuracy and robustness has emerged as a field full of potential. Driven by the requirements and functionalities defined in the 3rd Generation Par… ▽ More Wireless positioning technologies hold significant value for applications in autonomous driving, extended reality (XR), unmanned aerial vehicles (UAVs), and more. With the advancement of artificial intelligence (AI), leveraging AI to enhance positioning accuracy and robustness has emerged as a field full of potential. Driven by the requirements and functionalities defined in the 3rd Generation Partnership Project (3GPP) standards, AI/machine learning (ML)-based positioning is becoming a key technology to overcome the limitations of traditional methods. This paper begins with an introduction to the fundamentals of AI and wireless positioning, covering AI models, algorithms, positioning applications, emerging wireless technologies, and the basics of positioning techniques. Subsequently, focusing on standardization progress, we provide a comprehensive review of the evolution of 3GPP positioning standards, with an emphasis on the integration of AI/ML technologies in recent and upcoming releases. Based on the AI/ML-assisted positioning and direct AI/ML positioning schemes outlined in the standards, we conduct an in-depth investigation of related research. we focus on state-of-the-art (SOTA) research in AI-based line-of-sight (LOS)/non-line-of-sight (NLOS) detection, time of arrival (TOA)/time difference of arrival (TDOA) estimation, and angle estimation techniques. For Direct AI/ML Positioning, we explore SOTA advancements in fingerprint-based positioning, knowledge-assisted AI positioning, and channel charting-based positioning. Furthermore, we introduce publicly available datasets for wireless positioning and conclude by summarizing the challenges and opportunities of AI-driven wireless positioning. △ Less

Submitted 24 January, 2025; originally announced January 2025.

Comments: 32 pages. This work has been submitted to the IEEE for possible publication

arXiv:2501.11842 [pdf, other]

Harnessing Rydberg Atomic Receivers: From Quantum Physics to Wireless Communications

Authors: Yuanbin Chen, Xufeng Guo, Chau Yuen, Yufei Zhao, Yong Liang Guan, Chong Meng Samson See, Merouane Débbah, Lajos Hanzo

Abstract: The intrinsic integration of Rydberg atomic receivers into wireless communication systems is proposed, by harnessing the principles of quantum physics in wireless communications. More particularly, we conceive a pair of Rydberg atomic receivers, one incorporates a local oscillator (LO), referred to as an LO-dressed receiver, while the other operates without an LO and is termed an LO-free receiver.… ▽ More The intrinsic integration of Rydberg atomic receivers into wireless communication systems is proposed, by harnessing the principles of quantum physics in wireless communications. More particularly, we conceive a pair of Rydberg atomic receivers, one incorporates a local oscillator (LO), referred to as an LO-dressed receiver, while the other operates without an LO and is termed an LO-free receiver. The appropriate wireless model is developed for each configuration, elaborating on the receiver's responses to the radio frequency (RF) signal, on the potential noise sources, and on the system performance. Next, we investigate the association distortion effects that might occur, specifically demonstrating the boundaries of linear dynamic regions, which provides critical insights into its practical implementations in wireless systems. Extensive simulation results are provided for characterizing the performance of wireless systems, harnessing this pair of Rydberg atomic receivers. Our results demonstrate that they deliver complementary benefits: LO-free systems excel in proximity operations, while LO-dressed systems are eminently suitable for long-distance sensing at extremely low power levels. More specifically, LO-dressed systems achieve a significant signal-to-noise ratio (SNR) gain of approximately 44 dB over conventional RF receivers, exhibiting an effective coverage range extension over conventional RF receivers by a factor of 150. Furthermore, LO-dressed systems support higher-order quadrature amplitude modulation (QAM) at reduced symbol error rates (SER) compared to conventional RF receivers, hence significantly enhancing wireless communication performance. △ Less

Submitted 20 January, 2025; originally announced January 2025.

Comments: This manuscript has been submitted to IEEE journal, with 13 pages of body and 2 pages of supplementary material

arXiv:2412.18459 [pdf, other]

Underwater Image Restoration via Polymorphic Large Kernel CNNs

Authors: Xiaojiao Guo, Yihang Dong, Xuhang Chen, Weiwen Chen, Zimeng Li, FuChen Zheng, Chi-Man Pun

Abstract: Underwater Image Restoration (UIR) remains a challenging task in computer vision due to the complex degradation of images in underwater environments. While recent approaches have leveraged various deep learning techniques, including Transformers and complex, parameter-heavy models to achieve significant improvements in restoration effects, we demonstrate that pure CNN architectures with lightweigh… ▽ More Underwater Image Restoration (UIR) remains a challenging task in computer vision due to the complex degradation of images in underwater environments. While recent approaches have leveraged various deep learning techniques, including Transformers and complex, parameter-heavy models to achieve significant improvements in restoration effects, we demonstrate that pure CNN architectures with lightweight parameters can achieve comparable results. In this paper, we introduce UIR-PolyKernel, a novel method for underwater image restoration that leverages Polymorphic Large Kernel CNNs. Our approach uniquely combines large kernel convolutions of diverse sizes and shapes to effectively capture long-range dependencies within underwater imagery. Additionally, we introduce a Hybrid Domain Attention module that integrates frequency and spatial domain attention mechanisms to enhance feature importance. By leveraging the frequency domain, we can capture hidden features that may not be perceptible to humans but are crucial for identifying patterns in both underwater and on-air images. This approach enhances the generalization and robustness of our UIR model. Extensive experiments on benchmark datasets demonstrate that UIR-PolyKernel achieves state-of-the-art performance in underwater image restoration tasks, both quantitatively and qualitatively. Our results show that well-designed pure CNN architectures can effectively compete with more complex models, offering a balance between performance and computational efficiency. This work provides new insights into the potential of CNN-based approaches for challenging image restoration tasks in underwater environments. The code is available at \href{https://github.com/CXH-Research/UIR-PolyKernel}{https://github.com/CXH-Research/UIR-PolyKernel}. △ Less

Submitted 24 December, 2024; originally announced December 2024.

Comments: Accepted by ICASSP2025

arXiv:2412.16573 [pdf, other]

A Generalizable 3D Diffusion Framework for Low-Dose and Few-View Cardiac SPECT

Authors: Huidong Xie, Weijie Gan, Wei Ji, Xiongchao Chen, Alaa Alashi, Stephanie L. Thorn, Bo Zhou, Qiong Liu, Menghua Xia, Xueqi Guo, Yi-Hwa Liu, Hongyu An, Ulugbek S. Kamilov, Ge Wang, Albert J. Sinusas, Chi Liu

Abstract: Myocardial perfusion imaging using SPECT is widely utilized to diagnose coronary artery diseases, but image quality can be negatively affected in low-dose and few-view acquisition settings. Although various deep learning methods have been introduced to improve image quality from low-dose or few-view SPECT data, previous approaches often fail to generalize across different acquisition settings, lim… ▽ More Myocardial perfusion imaging using SPECT is widely utilized to diagnose coronary artery diseases, but image quality can be negatively affected in low-dose and few-view acquisition settings. Although various deep learning methods have been introduced to improve image quality from low-dose or few-view SPECT data, previous approaches often fail to generalize across different acquisition settings, limiting their applicability in reality. This work introduced DiffSPECT-3D, a diffusion framework for 3D cardiac SPECT imaging that effectively adapts to different acquisition settings without requiring further network re-training or fine-tuning. Using both image and projection data, a consistency strategy is proposed to ensure that diffusion sampling at each step aligns with the low-dose/few-view projection measurements, the image data, and the scanner geometry, thus enabling generalization to different low-dose/few-view settings. Incorporating anatomical spatial information from CT and total variation constraint, we proposed a 2.5D conditional strategy to allow the DiffSPECT-3D to observe 3D contextual information from the entire image volume, addressing the 3D memory issues in diffusion model. We extensively evaluated the proposed method on 1,325 clinical 99mTc tetrofosmin stress/rest studies from 795 patients. Each study was reconstructed into 5 different low-count and 5 different few-view levels for model evaluations, ranging from 1% to 50% and from 1 view to 9 view, respectively. Validated against cardiac catheterization results and diagnostic comments from nuclear cardiologists, the presented results show the potential to achieve low-dose and few-view SPECT imaging without compromising clinical performance. Additionally, DiffSPECT-3D could be directly applied to full-dose SPECT images to further improve image quality, especially in a low-dose stress-first cardiac SPECT imaging protocol. △ Less

Submitted 21 December, 2024; originally announced December 2024.

Comments: 13 pages, 6 figures, 2 tables. Paper under review. Oral presentation at IEEE MIC 2024

arXiv:2412.08671 [pdf, other]

A Deep Semantic Segmentation Network with Semantic and Contextual Refinements

Authors: Zhiyan Wang, Deyin Liu, Lin Yuanbo Wu, Song Wang, Xin Guo, Lin Qi

Abstract: Semantic segmentation is a fundamental task in multimedia processing, which can be used for analyzing, understanding, editing contents of images and videos, among others. To accelerate the analysis of multimedia data, existing segmentation researches tend to extract semantic information by progressively reducing the spatial resolutions of feature maps. However, this approach introduces a misalignm… ▽ More Semantic segmentation is a fundamental task in multimedia processing, which can be used for analyzing, understanding, editing contents of images and videos, among others. To accelerate the analysis of multimedia data, existing segmentation researches tend to extract semantic information by progressively reducing the spatial resolutions of feature maps. However, this approach introduces a misalignment problem when restoring the resolution of high-level feature maps. In this paper, we design a Semantic Refinement Module (SRM) to address this issue within the segmentation network. Specifically, SRM is designed to learn a transformation offset for each pixel in the upsampled feature maps, guided by high-resolution feature maps and neighboring offsets. By applying these offsets to the upsampled feature maps, SRM enhances the semantic representation of the segmentation network, particularly for pixels around object boundaries. Furthermore, a Contextual Refinement Module (CRM) is presented to capture global context information across both spatial and channel dimensions. To balance dimensions between channel and space, we aggregate the semantic maps from all four stages of the backbone to enrich channel context information. The efficacy of these proposed modules is validated on three widely used datasets-Cityscapes, Bdd100K, and ADE20K-demonstrating superior performance compared to state-of-the-art methods. Additionally, this paper extends these modules to a lightweight segmentation network, achieving an mIoU of 82.5% on the Cityscapes validation set with only 137.9 GFLOPs. △ Less

Submitted 10 December, 2024; originally announced December 2024.

Comments: Accept by tmm

arXiv:2412.08670 [pdf, other]

A feature refinement module for light-weight semantic segmentation network

Authors: Zhiyan Wang, Xin Guo, Song Wang, Peixiao Zheng, Lin Qi

Abstract: Low computational complexity and high segmentation accuracy are both essential to the real-world semantic segmentation tasks. However, to speed up the model inference, most existing approaches tend to design light-weight networks with a very limited number of parameters, leading to a considerable degradation in accuracy due to the decrease of the representation ability of the networks. To solve th… ▽ More Low computational complexity and high segmentation accuracy are both essential to the real-world semantic segmentation tasks. However, to speed up the model inference, most existing approaches tend to design light-weight networks with a very limited number of parameters, leading to a considerable degradation in accuracy due to the decrease of the representation ability of the networks. To solve the problem, this paper proposes a novel semantic segmentation method to improve the capacity of obtaining semantic information for the light-weight network. Specifically, a feature refinement module (FRM) is proposed to extract semantics from multi-stage feature maps generated by the backbone and capture non-local contextual information by utilizing a transformer block. On Cityscapes and Bdd100K datasets, the experimental results demonstrate that the proposed method achieves a promising trade-off between accuracy and computational cost, especially for Cityscapes test set where 80.4% mIoU is achieved and only 214.82 GFLOPs are required. △ Less

Submitted 10 December, 2024; originally announced December 2024.

Comments: Accept by icip 2023

arXiv:2412.04877 [pdf, other]

Fluid Antenna Index Modulation for MIMO Systems: Robust Transmission and Low-Complexity Detection

Authors: Xinghao Guo, Yin Xu, Dazhi He, Cixiao Zhang, Hanjiang Hong, Kai-Kit Wong, Wenjun Zhang, Yiyan Wu

Abstract: The fluid antenna (FA) index modulation (IM)-enabled multiple-input multiple-output (MIMO) system, referred to as FA-IM, significantly enhances spectral efficiency (SE) compared to the conventional FA-assisted MIMO system. To improve robustness against the high spatial correlation among multiple activated ports of the fluid antenna, this paper proposes an innovative FA grouping-based IM (FAG-IM) s… ▽ More The fluid antenna (FA) index modulation (IM)-enabled multiple-input multiple-output (MIMO) system, referred to as FA-IM, significantly enhances spectral efficiency (SE) compared to the conventional FA-assisted MIMO system. To improve robustness against the high spatial correlation among multiple activated ports of the fluid antenna, this paper proposes an innovative FA grouping-based IM (FAG-IM) system. A block grouping scheme is employed based on the spatial correlation model and the distribution structure of the ports. Then, a closed-form expression for the average bit error probability (ABEP) upper bound of the FAG-IM system is derived. To reduce the complexity of the receiver, the message passing architecture is incorporated into the FAG-IM system. Building on this, an efficient approximate message passing (AMP) detector, named structured AMP (S-AMP) detector, is proposed by exploiting the structural characteristics of the transmitted signals. Simulation results confirm that the proposed FAG-IM system significantly outperforms the existing FA-IM system in the presence of spatial correlation, achieving more robust transmission. Furthermore, it is demonstrated that the proposed low-complexity S-AMP detector not only reduces time complexity to a linear scale but also substantially improves bit error rate (BER) performance compared to the minimum mean square error (MMSE) detector, thereby enhancing the practical feasibility of the FAG-IM system. △ Less

Submitted 30 December, 2024; v1 submitted 6 December, 2024; originally announced December 2024.

Comments: Submitted to an IEEE journal

arXiv:2411.08509 [pdf, other]

Sum Rate Maximization for Movable Antenna-Aided Downlink RSMA Systems

Authors: Cixiao Zhang, Size Peng, Yin Xu, Qingqing Wu, Xiaowu Ou, Xinghao Guo, Dazhi He, Wenjun Zhang

Abstract: Rate splitting multiple access (RSMA) is regarded as a crucial and powerful physical layer (PHY) paradigm for next-generation communication systems. Particularly, users employ successive interference cancellation (SIC) to decode part of the interference while treating the remainder as noise. However, conventional RSMA systems rely on fixed-position antenna arrays, limiting their ability to fully e… ▽ More Rate splitting multiple access (RSMA) is regarded as a crucial and powerful physical layer (PHY) paradigm for next-generation communication systems. Particularly, users employ successive interference cancellation (SIC) to decode part of the interference while treating the remainder as noise. However, conventional RSMA systems rely on fixed-position antenna arrays, limiting their ability to fully exploit spatial diversity. This constraint reduces beamforming gain and significantly impairs RSMA performance. To address this problem, we propose a movable antenna (MA)-aided RSMA scheme that allows the antennas at the base station (BS) to dynamically adjust their positions. Our objective is to maximize the system sum rate of common and private messages by jointly optimizing the MA positions, beamforming matrix, and common rate allocation. To tackle the formulated non-convex problem, we apply fractional programming (FP) and develop an efficient two-stage, coarse-to-fine-grained searching (CFGS) algorithm to obtain high-quality solutions. Numerical results demonstrate that, with optimized antenna adjustments, the MA-enabled system achieves substantial performance and reliability improvements in RSMA over fixed-position antenna setups. △ Less

Submitted 14 November, 2024; v1 submitted 13 November, 2024; originally announced November 2024.

arXiv:2410.19811 [pdf, other]

ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise

Authors: Xingang Guo, Darioush Keivan, Usman Syed, Lianhui Qin, Huan Zhang, Geir Dullerud, Peter Seiler, Bin Hu

Abstract: Control system design is a crucial aspect of modern engineering with far-reaching applications across diverse sectors including aerospace, automotive systems, power grids, and robotics. Despite advances made by Large Language Models (LLMs) in various domains, their application in control system design remains limited due to the complexity and specificity of control theory. To bridge this gap, we i… ▽ More Control system design is a crucial aspect of modern engineering with far-reaching applications across diverse sectors including aerospace, automotive systems, power grids, and robotics. Despite advances made by Large Language Models (LLMs) in various domains, their application in control system design remains limited due to the complexity and specificity of control theory. To bridge this gap, we introduce ControlAgent, a new paradigm that automates control system design via novel integration of LLM agents and control-oriented domain expertise. ControlAgent encodes expert control knowledge and emulates human iterative design processes by gradually tuning controller parameters to meet user-specified requirements for stability, performance, and robustness. ControlAgent integrates multiple collaborative LLM agents, including a central agent responsible for task distribution and task-specific agents dedicated to detailed controller design for various types of systems and requirements. ControlAgent also employs a Python computation agent that performs complex calculations and controller evaluations based on standard design information provided by task-specified LLM agents. Combined with a history and feedback module, the task-specific LLM agents iteratively refine controller parameters based on real-time feedback from prior designs. Overall, ControlAgent mimics the design processes used by (human) practicing engineers, but removes all the human efforts and can be run in a fully automated way to give end-to-end solutions for control system design with user-specified requirements. To validate ControlAgent's effectiveness, we develop ControlEval, an evaluation dataset that comprises 500 control tasks with various specific design goals. The effectiveness of ControlAgent is demonstrated via extensive comparative evaluations between LLM-based and traditional human-involved toolbox-based baselines. △ Less

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2409.19217 [pdf]

Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter

Authors: Wei Wang, Chenyang Li, Zhaoxi Chen, Wenyu Zhang, Zetao Wang, Xi Guo, Jian Guan, Gang Li

Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a sleep-related breathing disorder associated with significant morbidity and mortality worldwide. The gold standard for OSAHS diagnosis, polysomnography (PSG), faces challenges in popularization due to its high cost and complexity. Recently, radar has shown potential in detecting sleep apnea-hypopnea events (SAE) with the advantages of low cost… ▽ More Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a sleep-related breathing disorder associated with significant morbidity and mortality worldwide. The gold standard for OSAHS diagnosis, polysomnography (PSG), faces challenges in popularization due to its high cost and complexity. Recently, radar has shown potential in detecting sleep apnea-hypopnea events (SAE) with the advantages of low cost and non-contact monitoring. However, existing studies, especially those using deep learning, employ segment-based classification approach for SAE detection, making the task of event quantity estimation difficult. Additionally, radar-based SAE detection is susceptible to interference from body movements and the environment. Oxygen saturation (SpO2) can offer valuable information about OSAHS, but it also has certain limitations and cannot be used alone for diagnosis. In this study, we propose a method using millimeterwave radar and pulse oximeter to detect SAE, called ROSA. It fuses information from both sensors, and directly predicts the temporal localization of SAE. Experimental results demonstrate a high degree of consistency (ICC=0.9864) between AHI from ROSA and PSG. This study presents an effective method with lowload device for the diagnosis of OSAHS. △ Less

Submitted 25 April, 2025; v1 submitted 27 September, 2024; originally announced September 2024.

arXiv:2409.15816 [pdf, other]

Diffusion Models for Intelligent Transportation Systems: A Survey

Authors: Mingxing Peng, Kehua Chen, Xusen Guo, Qiming Zhang, Hui Zhong, Meixin Zhu, Hai Yang

Abstract: Intelligent Transportation Systems (ITS) are vital in modern traffic management and optimization, significantly enhancing traffic efficiency and safety. Recently, diffusion models have emerged as transformative tools for addressing complex challenges within ITS. In this paper, we present a comprehensive survey of diffusion models for ITS, covering both theoretical and practical aspects. First, we… ▽ More Intelligent Transportation Systems (ITS) are vital in modern traffic management and optimization, significantly enhancing traffic efficiency and safety. Recently, diffusion models have emerged as transformative tools for addressing complex challenges within ITS. In this paper, we present a comprehensive survey of diffusion models for ITS, covering both theoretical and practical aspects. First, we introduce the theoretical foundations of diffusion models and their key variants, including conditional diffusion models and latent diffusion models, highlighting their suitability for modeling complex, multi-modal traffic data and enabling controllable generation. Second, we outline the primary challenges in ITS and the corresponding advantages of diffusion models, providing readers with a deeper understanding of the intersection between ITS and diffusion models. Third, we offer a multi-perspective investigation of current applications of diffusion models in ITS domains, including autonomous driving, traffic simulation, trajectory prediction, and traffic safety. Finally, we discuss state-of-the-art diffusion model techniques and highlight key ITS research directions that warrant further investigation. Through this structured overview, we aim to provide researchers with a comprehensive understanding of diffusion models for ITS, thereby advancing their future applications in the transportation domain. △ Less

Submitted 8 May, 2025; v1 submitted 24 September, 2024; originally announced September 2024.

Comments: 7 figures

arXiv:2409.13440 [pdf, other]

Differentially Private Multimodal Laplacian Dropout (DP-MLD) for EEG Representative Learning

Authors: Xiaowen Fu, Bingxin Wang, Xinzhou Guo, Guoqing Liu, Yang Xiang

Abstract: Recently, multimodal electroencephalogram (EEG) learning has shown great promise in disease detection. At the same time, ensuring privacy in clinical studies has become increasingly crucial due to legal and ethical concerns. One widely adopted scheme for privacy protection is differential privacy (DP) because of its clear interpretation and ease of implementation. Although numerous methods have be… ▽ More Recently, multimodal electroencephalogram (EEG) learning has shown great promise in disease detection. At the same time, ensuring privacy in clinical studies has become increasingly crucial due to legal and ethical concerns. One widely adopted scheme for privacy protection is differential privacy (DP) because of its clear interpretation and ease of implementation. Although numerous methods have been proposed under DP, it has not been extensively studied for multimodal EEG data due to the complexities of models and signal data considered there. In this paper, we propose a novel Differentially Private Multimodal Laplacian Dropout (DP-MLD) scheme for multimodal EEG learning. Our approach proposes a novel multimodal representative learning model that processes EEG data by language models as text and other modal data by vision transformers as images, incorporating well-designed cross-attention mechanisms to effectively extract and integrate cross-modal features. To achieve DP, we design a novel adaptive feature-level Laplacian dropout scheme, where randomness allocation and performance are dynamically optimized within given privacy budgets. In the experiment on an open-source multimodal dataset of Freezing of Gait (FoG) in Parkinson's Disease (PD), our proposed method demonstrates an approximate 4\% improvement in classification accuracy, and achieves state-of-the-art performance in multimodal EEG learning under DP. △ Less

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.13283 [pdf, other]

MIMO Precoding Exploiting Extra Degrees of Freedom (DoF) in the Wavenumber Domain

Authors: Yuanbin Chen, Xufeng Guo, Tianqi Mao, Qingqing Wu, Zhaocheng Wang, Chau Yuen

Abstract: In this paper, we propose an emerging wavenumber-domain precoding scheme to break the limitations of rank-1 channels that merely supports single-stream transmission, enabling simultaneous transmission of multiple data streams. The proposed wavenumber-domain precoding scheme also breaks the Rayleigh distance demarcation, regardless of the far-field and near-field contexts. Specifically, by characte… ▽ More In this paper, we propose an emerging wavenumber-domain precoding scheme to break the limitations of rank-1 channels that merely supports single-stream transmission, enabling simultaneous transmission of multiple data streams. The proposed wavenumber-domain precoding scheme also breaks the Rayleigh distance demarcation, regardless of the far-field and near-field contexts. Specifically, by characterizing the channel response as the superposition of a series of Fourier harmonics specified by different wavenumbers, the degree of freedom (DoF) is dependent on the cardinality of the wavenumber support, based on which the extra DoF is presented. This representation is applicable for both far-field and near-field. Different wavenumber atoms, determined within this support, constitute the codebook for MIMO precoding, in which each atom allows for the transmission of a data stream. Then, to maximize the capacity, it is required to select the wavenumbers associated with the optimal transmission direction, and optimize its power allocation. Finally, our simulation results demonstrate the significant superiority in comparison to the conventional spatial division schemes, with the potential of approaching the theoretical performance upper bound achieved by singular value decomposition (SVD). △ Less

Submitted 20 September, 2024; originally announced September 2024.

Comments: This paper has been accepted in 2024 IEEE Globecom Workshop

arXiv:2409.11543 [pdf, other]

Noise-aware Dynamic Image Denoising and Positron Range Correction for Rubidium-82 Cardiac PET Imaging via Self-supervision

Authors: Huidong Xie, Liang Guo, Alexandre Velo, Zhao Liu, Qiong Liu, Xueqi Guo, Bo Zhou, Xiongchao Chen, Yu-Jung Tsai, Tianshun Miao, Menghua Xia, Yi-Hwa Liu, Ian S. Armstrong, Ge Wang, Richard E. Carson, Albert J. Sinusas, Chi Liu

Abstract: Rb-82 is a radioactive isotope widely used for cardiac PET imaging. Despite numerous benefits of 82-Rb, there are several factors that limits its image quality and quantitative accuracy. First, the short half-life of 82-Rb results in noisy dynamic frames. Low signal-to-noise ratio would result in inaccurate and biased image quantification. Noisy dynamic frames also lead to highly noisy parametric… ▽ More Rb-82 is a radioactive isotope widely used for cardiac PET imaging. Despite numerous benefits of 82-Rb, there are several factors that limits its image quality and quantitative accuracy. First, the short half-life of 82-Rb results in noisy dynamic frames. Low signal-to-noise ratio would result in inaccurate and biased image quantification. Noisy dynamic frames also lead to highly noisy parametric images. The noise levels also vary substantially in different dynamic frames due to radiotracer decay and short half-life. Existing denoising methods are not applicable for this task due to the lack of paired training inputs/labels and inability to generalize across varying noise levels. Second, 82-Rb emits high-energy positrons. Compared with other tracers such as 18-F, 82-Rb travels a longer distance before annihilation, which negatively affect image spatial resolution. Here, the goal of this study is to propose a self-supervised method for simultaneous (1) noise-aware dynamic image denoising and (2) positron range correction for 82-Rb cardiac PET imaging. Tested on a series of PET scans from a cohort of normal volunteers, the proposed method produced images with superior visual quality. To demonstrate the improvement in image quantification, we compared image-derived input functions (IDIFs) with arterial input functions (AIFs) from continuous arterial blood samples. The IDIF derived from the proposed method led to lower AUC differences, decreasing from 11.09% to 7.58% on average, compared to the original dynamic frames. The proposed method also improved the quantification of myocardium blood flow (MBF), as validated against 15-O-water scans, with mean MBF differences decreased from 0.43 to 0.09, compared to the original dynamic frames. We also conducted a generalizability experiment on 37 patient scans obtained from a different country using a different scanner. △ Less

Submitted 17 September, 2024; originally announced September 2024.

Comments: 15 Pages, 10 Figures, 5 tables. Paper Under review. Oral Presentation at IEEE MIC 2023

arXiv:2409.10958 [pdf, other]

Towards Effective User Attribution for Latent Diffusion Models via Watermark-Informed Blending

Authors: Yongyang Pan, Xiaohong Liu, Siqi Luo, Yi Xin, Xiao Guo, Xiaoming Liu, Xiongkuo Min, Guangtao Zhai

Abstract: Rapid advancements in multimodal large language models have enabled the creation of hyper-realistic images from textual descriptions. However, these advancements also raise significant concerns about unauthorized use, which hinders their broader distribution. Traditional watermarking methods often require complex integration or degrade image quality. To address these challenges, we introduce a nov… ▽ More Rapid advancements in multimodal large language models have enabled the creation of hyper-realistic images from textual descriptions. However, these advancements also raise significant concerns about unauthorized use, which hinders their broader distribution. Traditional watermarking methods often require complex integration or degrade image quality. To address these challenges, we introduce a novel framework Towards Effective user Attribution for latent diffusion models via Watermark-Informed Blending (TEAWIB). TEAWIB incorporates a unique ready-to-use configuration approach that allows seamless integration of user-specific watermarks into generative models. This approach ensures that each user can directly apply a pre-configured set of parameters to the model without altering the original model parameters or compromising image quality. Additionally, noise and augmentation operations are embedded at the pixel level to further secure and stabilize watermarked images. Extensive experiments validate the effectiveness of TEAWIB, showcasing the state-of-the-art performance in perceptual quality and attribution accuracy. △ Less

Submitted 15 December, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

Comments: 9 pages, 7 figures

arXiv:2409.10123 [pdf, other]

Wavenumber-Domain Near-Field Channel Estimation: Beyond the Fresnel Bound

Authors: Xufeng Guo, Yuanbin Chen, Ying Wang, Zhaocheng Wang, Chau Yuen

Abstract: In the near-field context, the Fresnel approximation is typically employed to mathematically represent solvable functions of spherical waves. However, these efforts may fail to take into account the significant increase in the lower limit of the Fresnel approximation, known as the Fresnel distance. The lower bound of the Fresnel approximation imposes a constraint that becomes more pronounced as th… ▽ More In the near-field context, the Fresnel approximation is typically employed to mathematically represent solvable functions of spherical waves. However, these efforts may fail to take into account the significant increase in the lower limit of the Fresnel approximation, known as the Fresnel distance. The lower bound of the Fresnel approximation imposes a constraint that becomes more pronounced as the array size grows. Beyond this constraint, the validity of the Fresnel approximation is broken. As a potential solution, the wavenumber-domain paradigm characterizes the spherical wave using a spectrum composed of a series of linear orthogonal bases. However, this approach falls short of covering the effects of the array geometry, especially when using Gaussian-mixed-model (GMM)-based von Mises-Fisher distributions to approximate all spectra. To fill this gap, this paper introduces a novel wavenumber-domain ellipse fitting (WDEF) method to tackle these challenges. Particularly, the channel is accurately estimated in the near-field region, by maximizing the closed-form likelihood function of the wavenumber-domain spectrum conditioned on the scatterers' geometric parameters. Simulation results are provided to demonstrate the robustness of the proposed scheme against both the distance and angles of arrival. △ Less

Submitted 16 September, 2024; originally announced September 2024.

Comments: This paper has been accepted by IEEE Globecom 2024

arXiv:2408.09931 [pdf, other]

Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation

Authors: Qianhui Men, Xiaoqing Guo, Aris T. Papageorghiou, J. Alison Noble

Abstract: 3D pose estimation from a 2D cross-sectional view enables healthcare professionals to navigate through the 3D space, and such techniques initiate automatic guidance in many image-guided radiology applications. In this work, we investigate how estimating 3D fetal pose from freehand 2D ultrasound scanning can guide a sonographer to locate a head standard plane. Fetal head pose is estimated by the pr… ▽ More 3D pose estimation from a 2D cross-sectional view enables healthcare professionals to navigate through the 3D space, and such techniques initiate automatic guidance in many image-guided radiology applications. In this work, we investigate how estimating 3D fetal pose from freehand 2D ultrasound scanning can guide a sonographer to locate a head standard plane. Fetal head pose is estimated by the proposed Pose-GuideNet, a novel 2D/3D registration approach to align freehand 2D ultrasound to a 3D anatomical atlas without the acquisition of 3D ultrasound. To facilitate the 2D to 3D cross-dimensional projection, we exploit the prior knowledge in the atlas to align the standard plane frame in a freehand scan. A semantic-aware contrastive-based approach is further proposed to align the frames that are off standard planes based on their anatomical similarity. In the experiment, we enhance the existing assessment of freehand image localization by comparing the transformation of its estimated pose towards standard plane with the corresponding probe motion, which reflects the actual view change in 3D anatomy. Extensive results on two clinical head biometry tasks show that Pose-GuideNet not only accurately predicts pose but also successfully predicts the direction of the fetal head. Evaluations with probe motions further demonstrate the feasibility of adopting Pose-GuideNet for freehand ultrasound-assisted navigation in a sensor-free environment. △ Less

Submitted 19 August, 2024; originally announced August 2024.

Comments: Accepted by MICCAI2024

arXiv:2408.05358 [pdf, other]

GesturePrint: Enabling User Identification for mmWave-based Gesture Recognition Systems

Authors: Lilin Xu, Keyi Wang, Chaojie Gu, Xiuzhen Guo, Shibo He, Jiming Chen

Abstract: The millimeter-wave (mmWave) radar has been exploited for gesture recognition. However, existing mmWave-based gesture recognition methods cannot identify different users, which is important for ubiquitous gesture interaction in many applications. In this paper, we propose GesturePrint, which is the first to achieve gesture recognition and gesture-based user identification using a commodity mmWave… ▽ More The millimeter-wave (mmWave) radar has been exploited for gesture recognition. However, existing mmWave-based gesture recognition methods cannot identify different users, which is important for ubiquitous gesture interaction in many applications. In this paper, we propose GesturePrint, which is the first to achieve gesture recognition and gesture-based user identification using a commodity mmWave radar sensor. GesturePrint features an effective pipeline that enables the gesture recognition system to identify users at a minor additional cost. By introducing an efficient signal preprocessing stage and a network architecture GesIDNet, which employs an attention-based multilevel feature fusion mechanism, GesturePrint effectively extracts unique gesture features for gesture recognition and personalized motion pattern features for user identification. We implement GesturePrint and collect data from 17 participants performing 15 gestures in a meeting room and an office, respectively. GesturePrint achieves a gesture recognition accuracy (GRA) of 98.87% with a user identification accuracy (UIA) of 99.78% in the meeting room, and 98.22% GRA with 99.26% UIA in the office. Extensive experiments on three public datasets and a new gesture dataset show GesturePrint's superior performance in enabling effective user identification for gesture recognition systems. △ Less

Submitted 25 July, 2024; originally announced August 2024.

Comments: Accepted to the 44th IEEE International Conference on Distributed Computing Systems (ICDCS 2024)

arXiv:2408.05117 [pdf, other]

doi 10.1016/j.media.2025.103513

Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images

Authors: Shouyue Liu, Ziyi Zhang, Yuanyuan Gu, Jinkui Hao, Yonghuai Liu, Huazhu Fu, Xinyu Guo, Hong Song, Shuting Zhang, Yitian Zhao

Abstract: Early detection of dementia, such as Alzheimer's disease (AD) or mild cognitive impairment (MCI), is essential to enable timely intervention and potential treatment. Accurate detection of AD/MCI is challenging due to the high complexity, cost, and often invasive nature of current diagnostic techniques, which limit their suitability for large-scale population screening. Given the shared embryologic… ▽ More Early detection of dementia, such as Alzheimer's disease (AD) or mild cognitive impairment (MCI), is essential to enable timely intervention and potential treatment. Accurate detection of AD/MCI is challenging due to the high complexity, cost, and often invasive nature of current diagnostic techniques, which limit their suitability for large-scale population screening. Given the shared embryological origins and physiological characteristics of the retina and brain, retinal imaging is emerging as a potentially rapid and cost-effective alternative for the identification of individuals with or at high risk of AD. In this paper, we present a novel PolarNet+ that uses retinal optical coherence tomography angiography (OCTA) to discriminate early-onset AD (EOAD) and MCI subjects from controls. Our method first maps OCTA images from Cartesian coordinates to polar coordinates, allowing approximate sub-region calculation to implement the clinician-friendly early treatment of diabetic retinopathy study (ETDRS) grid analysis. We then introduce a multi-view module to serialize and analyze the images along three dimensions for comprehensive, clinically useful information extraction. Finally, we abstract the sequence embedding into a graph, transforming the detection task into a general graph classification problem. A regional relationship module is applied after the multi-view module to excavate the relationship between the sub-regions. Such regional relationship analyses validate known eye-brain links and reveal new discriminative patterns. △ Less

Submitted 12 March, 2025; v1 submitted 9 August, 2024; originally announced August 2024.

arXiv:2407.14815 [pdf, ps, other]

Unified Far-Field and Near-Field in Holographic MIMO: A Wavenumber-Domain Perspective

Authors: Yuanbin Chen, Xufeng Guo, Gui Zhou, Shi Jin, Derrick Wing Kwan Ng, Zhaocheng Wang

Abstract: This article conceives a unified representation for near-field and far-field holographic multiple-input multiple-output (HMIMO) channels, addressing a practical design dilemma: "Why does the angular-domain representation no longer function effectively?" To answer this question, we pivot from the angular domain to the wavenumber domain and present a succinct overview of its underlying philosophy. I… ▽ More This article conceives a unified representation for near-field and far-field holographic multiple-input multiple-output (HMIMO) channels, addressing a practical design dilemma: "Why does the angular-domain representation no longer function effectively?" To answer this question, we pivot from the angular domain to the wavenumber domain and present a succinct overview of its underlying philosophy. In re-examining the Fourier plane-wave series expansion that recasts spherical propagation waves into a series of plane waves represented by Fourier harmonics, we characterize the HMIMO channel employing these Fourier harmonics having different wavenumbers. This approach, referred to as the wavenumebr-domain representation, facilitates a unified view across the far-field and the near-field. Furthermore, the limitations of the DFT basis are demonstrated when identifying the sparsity inherent to the HMIMO channel, motivating the development of a wavenumber-domain basis as an alternative. We then present some preliminary applications of the proposed wavenumber-domain basis in signal processing across both the far-field and near-field, along with several prospects for future HMIMO system designs based on the wavenumber domain. △ Less

Submitted 20 July, 2024; originally announced July 2024.

Comments: This article has been accepted for publication in IEEE Commag (7 pages, 5 figures)

arXiv:2407.11651 [pdf, other]

Fluid Antenna Grouping Index Modulation Design for MIMO Systems

Authors: Xinghao Guo, Yin Xu, Dazhi He, Cixiao Zhang, Wenjun Zhang, Yi-yan Wu

Abstract: Index modulation (IM) significantly enhances the spectral efficiency of fluid antennas (FAs) enabled multiple-input multiple-output (MIMO) systems, which is named FA-IM. However, due to the dense distribution of ports on the FA, the wireless channel exhibits a high spatial correlation, leading to severe performance degradation in the existing FA-IM-assisted MIMO systems. To tackle this issue, this… ▽ More Index modulation (IM) significantly enhances the spectral efficiency of fluid antennas (FAs) enabled multiple-input multiple-output (MIMO) systems, which is named FA-IM. However, due to the dense distribution of ports on the FA, the wireless channel exhibits a high spatial correlation, leading to severe performance degradation in the existing FA-IM-assisted MIMO systems. To tackle this issue, this paper proposes a novel fluid antenna grouping index modulation (FA-GIM) scheme to mitigate the high correlation between the activated ports. Specifically, considering the characteristics of the FA two-dimensional (2D) surface structure and the spatially correlated channel model in FA-assisted MIMO systems, a block grouping method is adopted, where adjacent ports are assigned to the same group. Consequently, different groups independently perform port index selection and constellation symbol mapping, with only one port being activated within each group during each transmission interval. Then, a closed-form average bit error probability (ABEP) upper bound for the proposed scheme is derived. Numerical results show that, compared to state-of-the-art schemes, the proposed FA-GIM scheme consistently achieves significant bit error rate (BER) performance gains under various conditions. The proposed scheme is both efficient and robust, enhancing the performance of FA-assisted MIMO systems. △ Less

Submitted 16 August, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

Comments: A longer version with more details will be submitted to an IEEE journal

arXiv:2406.18985 [pdf, other]

Exploiting Structured Sparsity in Near Field: From the Perspective of Decomposition

Authors: Xufeng Guo, Yuanbin Chen, Ying Wang, Chau Yuen

Abstract: The structured sparsity can be leveraged in traditional far-field channels, greatly facilitating efficient sparse channel recovery by compressing the complexity of overheads to the level of the scatterer number. However, when experiencing a fundamental shift from planar-wave-based far-field modeling to spherical-wave-based near-field modeling, whether these benefits persist in the near-field regim… ▽ More The structured sparsity can be leveraged in traditional far-field channels, greatly facilitating efficient sparse channel recovery by compressing the complexity of overheads to the level of the scatterer number. However, when experiencing a fundamental shift from planar-wave-based far-field modeling to spherical-wave-based near-field modeling, whether these benefits persist in the near-field regime remains an open issue. To answer this question, this article delves into structured sparsity in the near-field realm, examining its peculiarities and challenges. In particular, we present the key features of near-field structured sparsity in contrast to the far-field counterpart, drawing from both physical and mathematical perspectives. Upon unmasking the theoretical bottlenecks, we resort to bypassing them by decoupling the geometric parameters of the scatterers, termed the triple parametric decomposition (TPD) framework. It is demonstrated that our novel TPD framework can achieve robust recovery of near-field sparse channels by applying the potential structured sparsity and avoiding the curse of complexity and overhead. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: This aricle has been accepted for publication in IEEE Commag

arXiv:2406.14064 [pdf, ps, other]

PAPR Reduction with Pre-chirp Selection for Affine Frequency Division Multiplexing

Authors: Haozhi Yuan, Yin Xu, Xinghao Guo, Yao Ge, Tianyao Ma, Haoyang Li, Dazhi He, Wenjun Zhang

Abstract: Affine frequency division multiplexing (AFDM) is a promising new multicarrier technique for high-mobility communications based on discrete affine Fourier transform (DAFT). By properly tuning two parameters in the DAFT module, the effective channel in the DAFT domain can completely circumvent path overlap, thereby constituting a full representation of delay-Doppler profile. However, AFDM has a cruc… ▽ More Affine frequency division multiplexing (AFDM) is a promising new multicarrier technique for high-mobility communications based on discrete affine Fourier transform (DAFT). By properly tuning two parameters in the DAFT module, the effective channel in the DAFT domain can completely circumvent path overlap, thereby constituting a full representation of delay-Doppler profile. However, AFDM has a crucial problem of high peak-to-average power ratio (PAPR), stemming from randomness of modulated symbols. To reduce the PAPR of AFDM, a novel algorithm named grouped pre-chirp selection (GPS) is proposed in this letter. The GPS varying the pre-chirp parameter across subcarrier groups in a non-enumerated manner and then selects the signal with the smallest PAPR among all candidate signals. We detail the operational procedures of the GPS algorithm, analyzing GPS from four aspects: PAPR performance, computational complexity, spectral efficiency, and bit error rate (BER) performance. Simulations indicate the effectiveness of the proposed GPS in reducing PAPR. At the cost of slight communication performance, AFDM with GPS can achieve better PAPR performance than orthogonal time frequency space (OTFS) while maintaining a lower modulation complexity. △ Less

Submitted 20 December, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.08374 [pdf, other]

2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

Authors: Tianqi Chen, Jun Hou, Yinchi Zhou, Huidong Xie, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, James S. Duncan, Chi Liu, Bo Zhou

Abstract: Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate t… ▽ More Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate the non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, better than traditional CNN-based methods. However, due to the high computation cost and memory burden, it is largely limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation with application on NAC-LDPET to AC-SDPET translation. Specifically, MADM employs separate diffusion models for axial, coronal, and sagittal views, whose outputs are averaged in each sampling step to ensure the 3D generation quality from multiple views. To accelerate the 3D sampling process, we also proposed a strategy to use the CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggested that MADM can generate high-quality 3D translation images, outperforming previous CNN-based and Diffusion-based baseline methods. △ Less

Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 15 pages, 7 figures

arXiv:2406.02422 [pdf, other]

IterMask2: Iterative Unsupervised Anomaly Segmentation via Spatial and Frequency Masking for Brain Lesions in MRI

Authors: Ziyun Liang, Xiaoqing Guo, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', as they exhibit patterns that deviate from those in 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They inten… ▽ More Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', as they exhibit patterns that deviate from those in 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They intentionally corrupt an input image, reconstruct it to follow the learned 'normal' distribution, and subsequently segment anomalies based on reconstruction error. Corrupting an input image, however, inevitably leads to suboptimal reconstruction even of normal regions, causing false positives. To alleviate this, we propose a novel iterative spatial mask-refining strategy IterMask2. We iteratively mask areas of the image, reconstruct them, and update the mask based on reconstruction error. This iterative process progressively adds information about areas that are confidently normal as per the model. The increasing content guides reconstruction of nearby masked areas, improving reconstruction of normal tissue under these areas, reducing false positives. We also use high-frequency image content as an auxiliary input to provide additional structural information for masked areas. This further improves reconstruction error of normal in comparison to anomalous areas, facilitating segmentation of the latter. We conduct experiments on several brain lesion datasets and demonstrate effectiveness of our method. Code is available at: https://github.com/ZiyunLiang/IterMask2 △ Less

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02164 [pdf, other]

Sparse Recovery for Holographic MIMO Channels: Leveraging the Clustered Sparsity

Authors: Yuqing Guo, Xufeng Guo, Yuanbin Chen, Ying Wang

Abstract: Envisioned as the next-generation transceiver technology, the holographic multiple-input-multiple-output (HMIMO) garners attention for its superior capabilities of fabricating electromagnetic (EM) waves. However, the densely packed antenna elements significantly increase the dimension of the HMIMO channel matrix, rendering traditional channel estimation methods inefficient. While the dimension cur… ▽ More Envisioned as the next-generation transceiver technology, the holographic multiple-input-multiple-output (HMIMO) garners attention for its superior capabilities of fabricating electromagnetic (EM) waves. However, the densely packed antenna elements significantly increase the dimension of the HMIMO channel matrix, rendering traditional channel estimation methods inefficient. While the dimension curse can be relieved to avoid the proportional increase with the antenna density using the state-of-the-art wavenumber-domain sparse representation, the sparse recovery complexity remains tied to the order of non-zero elements in the sparse channel, which still considerably exceeds the number of scatterers. By modeling the inherent clustered sparsity using a Gaussian mixed model (GMM)-based von Mises-Fisher (vMF) distribution, the to-be-estimated channel characteristics can be compressed to the scatterer level. Upon the sparsity extraction, a novel wavenumber-domain expectation-maximization (WD-EM) algorithm is proposed to implement the cluster-by-cluster variational inference, thus significantly reducing the computational complexity. Simulation results verify the robustness of the proposed scheme across overheads and signal-to-noise ratio (SNR). △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: This manuscript has been submitted to IEEE journal, 5 pages, 3 figures

arXiv:2405.19659 [pdf, other]

CSANet: Channel Spatial Attention Network for Robust 3D Face Alignment and Reconstruction

Authors: Yilin Liu, Xuezhou Guo, Xinqi Wang, Fangzhou Du

Abstract: Our project proposes an end-to-end 3D face alignment and reconstruction network. The backbone of our model is built by Bottle-Neck structure via Depth-wise Separable Convolution. We integrate Coordinate Attention mechanism and Spatial Group-wise Enhancement to extract more representative features. For more stable training process and better convergence, we jointly use Wing loss and the Weighted Pa… ▽ More Our project proposes an end-to-end 3D face alignment and reconstruction network. The backbone of our model is built by Bottle-Neck structure via Depth-wise Separable Convolution. We integrate Coordinate Attention mechanism and Spatial Group-wise Enhancement to extract more representative features. For more stable training process and better convergence, we jointly use Wing loss and the Weighted Parameter Distance Cost to learn parameters for 3D Morphable model and 3D vertices. Our proposed model outperforms all baseline models both quantitatively and qualitatively. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: 10 pages, 6 figures

arXiv:2405.12996 [pdf, ps, other]

Dose-aware Diffusion Model for 3D PET Image Denoising: Multi-institutional Validation with Reader Study and Real Low-dose Data

Authors: Huidong Xie, Weijie Gan, Reimund Bayerlein, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Kuan-Yin Ko, Der-Shiun Wang, Benjamin A. Spencer, Wei Ji, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang , et al. (2 additional authors not shown)

Abstract: Reducing scan times, radiation dose, and enhancing image quality for lower-performance scanners, are critical in low-dose PET imaging. Deep learning techniques have been investigated for PET image denoising. However, existing models have often resulted in compromised image quality when achieving low-count/low-dose PET and have limited generalizability to different image noise-levels, acquisition p… ▽ More Reducing scan times, radiation dose, and enhancing image quality for lower-performance scanners, are critical in low-dose PET imaging. Deep learning techniques have been investigated for PET image denoising. However, existing models have often resulted in compromised image quality when achieving low-count/low-dose PET and have limited generalizability to different image noise-levels, acquisition protocols, and patient populations. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for medical imaging tasks. However, for low-dose PET imaging, existing diffusion models failed to generate consistent 3D reconstructions, unable to generalize across varying noise-levels, often produced visually-appealing but distorted image details, and produced images with biased tracer uptake. Here, we develop DDPET-3D, a dose-aware diffusion model for 3D low-dose PET imaging to address these challenges. Collected from 4 medical centers globally with different scanners and clinical protocols, we evaluated the proposed model using a total of 9,783 18F-FDG studies with low-dose levels ranging from 1% to 50%. With a cross-center, cross-scanner validation, the proposed DDPET-3D demonstrated its potential to generalize to different low-dose levels, different scanners, and different clinical protocols. As confirmed with reader studies performed by board-certified nuclear medicine physicians, experienced readers judged the images to be similar or superior to the full-dose images and previous DL baselines based on qualitative visual impression. Lesion-level quantitative accuracy was evaluated using a Monte Carlo simulation study and a lesion segmentation network. The presented results show the potential to achieve low-dose PET while maintaining image quality. Real low-dose scans was also included for evaluation. △ Less

Submitted 16 June, 2025; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 18 Pages, 16 Figures, 5 Tables. Paper under review. First-place Freek J. Beekman Young Investigator Award at SNMMI 2024. Code available after paper publication. arXiv admin note: substantial text overlap with arXiv:2311.04248

arXiv:2405.06995 [pdf, other]

Benchmarking Cross-Domain Audio-Visual Deception Detection

Authors: Xiaobao Guo, Zitong Yu, Nithish Muthuchamy Selvaraj, Bingquan Shen, Adams Wai-Kin Kong, Alex C. Kot

Abstract: Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features d… ▽ More Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, that enables us to assess how well these methods generalize for use in real-world scenarios. We used widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further exploit the impacts using data from multiple source domains for training, we investigate three types of domain sampling strategies, including domain-simultaneous, domain-alternating, and domain-by-domain for multi-to-single domain generalization evaluation. We also propose an algorithm to enhance the generalization performance by maximizing the gradient inner products between modality encoders, named ``MM-IDGM". Furthermore, we proposed the Attention-Mixer fusion method to improve performance, and we believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection. △ Less

Submitted 5 October, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

Comments: 12 pages

arXiv:2404.09131 [pdf, other]

Design of Artificial Interference Signals for Covert Communication Aided by Multiple Friendly Nodes

Authors: Xuyang Zhao. Wei Guo, Yongchao Wang

Abstract: In this paper, we consider a scenario of covert communication aided by multiple friendly interference nodes. The objective is to conceal the legitimate communication link under the surveillance of a warden. The main content is as follows: first, we propose a novel strategy for generating artificial noise signals in the considered covert scenario. Then, we leverage the statistical information of ch… ▽ More In this paper, we consider a scenario of covert communication aided by multiple friendly interference nodes. The objective is to conceal the legitimate communication link under the surveillance of a warden. The main content is as follows: first, we propose a novel strategy for generating artificial noise signals in the considered covert scenario. Then, we leverage the statistical information of channel coefficients to optimize the basis matrix of the artificial noise signals space in the absence of accurate channel fading information between the friendly interference nodes and the legitimate receiver. The optimization problem aims to design artificial noise signals within the space to facilitate covert communication while minimizing the impact on the performance of legitimate communication. Second, a customized Rimannian Stochastic Variance Reduced Gradient (R-SVRG) algorithm is proposed to solve the non-convex problem. In the algorithm, we employ the Riemannian optimization framework to analyze the geometric structure of the basis matrix constraints and transform the original non-convex optimization problem into an unconstrained problem on the complex Stiefel manifold for solution. Third, we theoretically prove the convergence of the proposed algorithm to a stationary point. In the end, we evaluate the performance of the proposed strategy for generating artificial noise signals through numerical simulations. The results demonstrate that our approach significantly outperforms the Gaussian artificial noise strategy without optimization. △ Less

Submitted 9 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.03869 [pdf, other]

Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

Authors: Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan

Abstract: The emergence of multi-agent reinforcement learning (MARL) is significantly transforming various fields like autonomous vehicle networks. However, real-world multi-agent systems typically contain multiple roles, and the scale of these systems dynamically fluctuates. Consequently, in order to achieve zero-shot scalable collaboration, it is essential that strategies for different roles can be update… ▽ More The emergence of multi-agent reinforcement learning (MARL) is significantly transforming various fields like autonomous vehicle networks. However, real-world multi-agent systems typically contain multiple roles, and the scale of these systems dynamically fluctuates. Consequently, in order to achieve zero-shot scalable collaboration, it is essential that strategies for different roles can be updated flexibly according to the scales, which is still a challenge for current MARL frameworks. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), integrating heterogeneity into parameter-shared PPO-based MARL networks. We first leverage a latent network to learn strategy patterns for each agent adaptively. Second, we introduce a heterogeneous layer to be inserted into decision-making networks, whose parameters are specifically generated by the learned latent variables. Our approach is scalable as all the parameters are shared except for the heterogeneous layer, and gains both inter-individual and temporal heterogeneity, allowing SHPPO to adapt effectively to varying scales. SHPPO exhibits superior performance in classic MARL environments like Starcraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability, and offering insights into the learned latent variables' impact on team performance by visualization. △ Less

Submitted 2 October, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.19002 [pdf, other]

Robust Active Speaker Detection in Noisy Environments

Authors: Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian

Abstract: This paper addresses the issue of active speaker detection (ASD) in noisy environments and formulates a robust active speaker detection (rASD) problem. Existing ASD approaches leverage both audio and visual modalities, but non-speech sounds in the surrounding environment can negatively impact performance. To overcome this, we propose a novel framework that utilizes audio-visual speech separation a… ▽ More This paper addresses the issue of active speaker detection (ASD) in noisy environments and formulates a robust active speaker detection (rASD) problem. Existing ASD approaches leverage both audio and visual modalities, but non-speech sounds in the surrounding environment can negatively impact performance. To overcome this, we propose a novel framework that utilizes audio-visual speech separation as guidance to learn noise-free audio features. These features are then utilized in an ASD model, and both tasks are jointly optimized in an end-to-end framework. Our proposed framework mitigates residual noise and audio quality reduction issues that can occur in a naive cascaded two-stage framework that directly uses separated speech for ASD, and enables the two tasks to be optimized simultaneously. To further enhance the robustness of the audio features and handle inherent speech noises, we propose a dynamic weighted loss approach to train the speech separator. We also collected a real-world noise audio dataset to facilitate investigations. Experiments demonstrate that non-speech audio noises significantly impact ASD models, and our proposed approach improves ASD performance in noisy environments. The framework is general and can be applied to different ASD approaches to improve their robustness. Our code, models, and data will be released. △ Less

Submitted 30 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: 15 pages, 5 figures

arXiv:2403.11071 [pdf, other]

Wavenumber Domain Sparse Channel Estimation in Holographic MIMO

Authors: Xufeng Guo, Yuanbin Chen, Ying Wang, Zhaocheng Wang, Zhu Han

Abstract: In this paper, we investigate the sparse channel estimation in holographic multiple-input multiple-output (HMIMO) systems. The conventional angular-domain representation fails to capture the continuous angular power spectrum characterized by the spatially-stationary electromagnetic random field, thus leading to the ambiguous detection of the significant angular power, which is referred to as the p… ▽ More In this paper, we investigate the sparse channel estimation in holographic multiple-input multiple-output (HMIMO) systems. The conventional angular-domain representation fails to capture the continuous angular power spectrum characterized by the spatially-stationary electromagnetic random field, thus leading to the ambiguous detection of the significant angular power, which is referred to as the power leakage. To tackle this challenge, the HMIMO channel is represented in the wavenumber domain for exploring its cluster-dominated sparsity. Specifically, a finite set of Fourier harmonics acts as a series of sampling probes to encapsulate the integral of the power spectrum over specific angular regions. This technique effectively eliminates power leakage resulting from power mismatches induced by the use of discrete angular-domain probes. Next, the channel estimation problem is recast as a sparse recovery of the significant angular power spectrum over the continuous integration region. We then propose an accompanying graph-cut-based swap expansion (GCSE) algorithm to extract beneficial sparsity inherent in HMIMO channels. Numerical results demonstrate that this wavenumber-domainbased GCSE approach achieves robust performance with rapid convergence. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: This paper has been accepted in 2024 ICC

arXiv:2402.09567 [pdf, other]

TAI-GAN: A Temporally and Anatomically Informed Generative Adversarial Network for early-to-late frame conversion in dynamic cardiac PET inter-frame motion correction

Authors: Xueqi Guo, Luyao Shi, Xiongchao Chen, Qiong Liu, Bo Zhou, Huidong Xie, Yi-Hwa Liu, Richard Palyo, Edward J. Miller, Albert J. Sinusas, Lawrence H. Staib, Bruce Spottiswoode, Chi Liu, Nicha C. Dvornek

Abstract: Inter-frame motion in dynamic cardiac positron emission tomography (PET) using rubidium-82 (82-Rb) myocardial perfusion imaging impacts myocardial blood flow (MBF) quantification and the diagnosis accuracy of coronary artery diseases. However, the high cross-frame distribution variation due to rapid tracer kinetics poses a considerable challenge for inter-frame motion correction, especially for ea… ▽ More Inter-frame motion in dynamic cardiac positron emission tomography (PET) using rubidium-82 (82-Rb) myocardial perfusion imaging impacts myocardial blood flow (MBF) quantification and the diagnosis accuracy of coronary artery diseases. However, the high cross-frame distribution variation due to rapid tracer kinetics poses a considerable challenge for inter-frame motion correction, especially for early frames where intensity-based image registration techniques often fail. To address this issue, we propose a novel method called Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) that utilizes an all-to-one mapping to convert early frames into those with tracer distribution similar to the last reference frame. The TAI-GAN consists of a feature-wise linear modulation layer that encodes channel-wise parameters generated from temporal information and rough cardiac segmentation masks with local shifts that serve as anatomical information. Our proposed method was evaluated on a clinical 82-Rb PET dataset, and the results show that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, the motion estimation accuracy and subsequent myocardial blood flow (MBF) quantification with both conventional and deep learning-based motion correction methods were improved compared to using the original frames. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: Under revision at Medical Image Analysis

arXiv:2401.14285 [pdf, other]

POUR-Net: A Population-Prior-Aided Over-Under-Representation Network for Low-Count PET Attenuation Map Generation

Authors: Bo Zhou, Jun Hou, Tianqi Chen, Yinchi Zhou, Xiongchao Chen, Huidong Xie, Qiong Liu, Xueqi Guo, Yu-Jung Tsai, Vladimir Y. Panin, Takuya Toyonaga, James S. Duncan, Chi Liu

Abstract: Low-dose PET offers a valuable means of minimizing radiation exposure in PET imaging. However, the prevalent practice of employing additional CT scans for generating attenuation maps (u-map) for PET attenuation correction significantly elevates radiation doses. To address this concern and further mitigate radiation exposure in low-dose PET exams, we propose POUR-Net - an innovative population-prio… ▽ More Low-dose PET offers a valuable means of minimizing radiation exposure in PET imaging. However, the prevalent practice of employing additional CT scans for generating attenuation maps (u-map) for PET attenuation correction significantly elevates radiation doses. To address this concern and further mitigate radiation exposure in low-dose PET exams, we propose POUR-Net - an innovative population-prior-aided over-under-representation network that aims for high-quality attenuation map generation from low-dose PET. First, POUR-Net incorporates an over-under-representation network (OUR-Net) to facilitate efficient feature extraction, encompassing both low-resolution abstracted and fine-detail features, for assisting deep generation on the full-resolution level. Second, complementing OUR-Net, a population prior generation machine (PPGM) utilizing a comprehensive CT-derived u-map dataset, provides additional prior information to aid OUR-Net generation. The integration of OUR-Net and PPGM within a cascade framework enables iterative refinement of $μ$-map generation, resulting in the production of high-quality $μ$-maps. Experimental results underscore the effectiveness of POUR-Net, showing it as a promising solution for accurate CT-free low-count PET attenuation correction, which also surpasses the performance of previous baseline methods. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: 10 pages, 5 figures

Showing 1–50 of 131 results for author: Guo, X