Search | arXiv e-print repository

arXiv:2507.07512 [pdf]

Demonstration of TFTs 3D Monolithically Integrated on GaN HEMTs using Cascode Configuration with High Breakdown Voltage (>1900V)

Authors: Tian-Li Wu, Hsin-Jou Ho, Chia-Wei Liu, Yi-Chen Chen

Abstract: This study demonstrates 3D monolithic integration of amorphous indium-gallium-zinc oxide (a-IGZO) thin-film transistors (TFTs) on Gallium Nitride (GaN) high electron mobility transistors (HEMTs) in a cascode configuration, achieving high breakdown voltage capabilities exceeding 1900 V. Two device configurations, differing in a-IGZO channel thickness (30 nm / 10 nm), are fabricated and evaluated. Sample B, with a 10 nm a-IGZO channel, demonstrates superior electrical performance, including a high ON/OFF current ratio (~10^7), low subthreshold swing (SS), and a high breakdown voltage exceeding 1900 V comparable to standalone GaN power HEMTs. The results highlight the feasibility and potential of 3D integrated TFT on GaN power HEMTs, paving the way for new opportunities for the TFTs for high voltage applications.

Submitted 10 July, 2025; originally announced July 2025.

Comments: 3 pages, 5 figures

arXiv:2507.00209 [pdf, ps, other]

SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

Authors: Fengyi Jiang, Xiaorui Zhang, Lingbo Jin, Ruixing Liang, Yuxin Chen, Adi Chola Venkatesh, Jason Culman, Tiantian Wu, Lirong Shao, Wenqing Sun, Cong Gao, Hallie McNamara, Jingpei Lu, Omid Mohareri

Abstract: High-resolution imaging is crucial for enhancing visual clarity and enabling precise computer-assisted guidance in minimally invasive surgery (MIS). Despite the increasing adoption of 4K endoscopic systems, there remains a significant gap in publicly available native 4K datasets tailored specifically for robotic-assisted MIS. We introduce SurgiSR4K, the first publicly accessible surgical imaging and video dataset captured at a native 4K resolution, representing realistic conditions of robotic-assisted procedures. SurgiSR4K comprises diverse visual scenarios including specular reflections, tool occlusions, bleeding, and soft tissue deformations, meticulously designed to reflect common challenges faced during laparoscopic and robotic surgeries. This dataset opens up possibilities for a broad range of computer vision tasks that might benefit from high resolution data, such as super resolution (SR), smoke removal, surgical instrument detection, 3D tissue reconstruction, monocular depth estimation, instance segmentation, novel view synthesis, and vision-language model (VLM) development. SurgiSR4K provides a robust foundation for advancing research in high-resolution surgical imaging and fosters the development of intelligent imaging technologies aimed at enhancing performance, safety, and usability in image-guided robotic surgeries.

Submitted 7 July, 2025; v1 submitted 30 June, 2025; originally announced July 2025.

arXiv:2506.06679 [pdf, ps, other]

Controlled Reach-avoid Set Computation for Discrete-time Polynomial Systems via Convex Optimization

Authors: Taoran Wu, Yiling Xue, Dejin Ren, Arvind Easwaran, Martin Fränzle, Bai Xue

Abstract: This paper addresses the computation of controlled reach-avoid sets (CRASs) for discrete-time polynomial systems subject to control inputs. A CRAS is a set encompassing initial states from which there exist control inputs driving the system into a target set while avoiding unsafe sets. However, efficiently computing CRASs remains an open problem, especially for discrete-time systems. In this paper, we propose a novel framework for computing CRASs which takes advantage of a probabilistic perspective. This framework transforms the fundamentally nonlinear problem of computing CRASs into a computationally tractable convex optimization problem. By regarding control inputs as disturbances obeying certain probability distributions, a CRAS can be equivalently treated as a 0-reach-avoid set in the probabilistic sense, which consists of initial states from which the probability of eventually entering the target set while remaining within the safe set is greater than zero. Thus, we can employ the convex optimization method of computing 0-reach-avoid sets to estimate CRASs. Furthermore, inspired by the $ε$-greedy strategy widely used in reinforcement learning, we propose an approach that iteratively updates the aforementioned probability distributions imposed on control inputs to compute larger CRASs. We demonstrate the effectiveness of the proposed method on extensive examples.

Submitted 7 June, 2025; originally announced June 2025.

arXiv:2506.01496 [pdf, ps, other]

Continual Speech Learning with Fused Speech Features

Authors: Guitao Wang, Jinming Zhao, Hao Yang, Guilin Qi, Tongtong Wu, Gholamreza Haffari

Abstract: Rapid growth in speech data demands adaptive models, as traditional static methods fail to keep pace with dynamic and diverse speech information. We introduce continuous speech learning, a new set-up aimed at bridging the adaptation gap in current speech models. We use the encoder-decoder Whisper model to standardize speech tasks into a generative format. We integrate a learnable gated-fusion layer on top of the encoder to dynamically select task-specific features for downstream tasks. Our approach improves accuracy significantly over traditional methods in six speech processing tasks, demonstrating gains in adapting to new speech tasks without full retraining.

Submitted 3 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

Comments: Accepted to Interspeech 2025

arXiv:2505.20638 [pdf, ps, other]

Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs

Authors: Wenhao You, Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Zhongyu Ouyang, Chiyu Ma, Tingxuan Wu, Noah Wei, Zong Ke, Ming Cheng, Soroush Vosoughi, Jiang Gui

Abstract: While recent Multimodal Large Language Models exhibit impressive capabilities for general multimodal tasks, specialized domains like music necessitate tailored approaches. Music Audio-Visual Question Answering (Music AVQA) particularly underscores this, presenting unique challenges with its continuous, densely layered audio-visual content, intricate temporal dynamics, and the critical need for domain-specific knowledge. Through a systematic analysis of Music AVQA datasets and methods, this position paper identifies that specialized input processing, architectures incorporating dedicated spatial-temporal designs, and music-specific modeling strategies are critical for success in this domain. Our study provides valuable insights for researchers by highlighting effective design patterns empirically linked to strong performance, proposing concrete future directions for incorporating musical priors, and aiming to establish a robust foundation for advancing multimodal musical understanding. This work is intended to inspire broader attention and further research, supported by a continuously updated anonymous GitHub repository of relevant papers: https://github.com/xid32/Survey4MusicAVQA.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.09511 [pdf, ps, other]

Design of a Formation Control System to Assist Human Operators in Flying a Swarm of Robotic Blimps

Authors: Tianfu Wu, Jiaqi Fu, Wugang Meng, Sungjin Cho, Huanzhe Zhan, Fumin Zhang

Abstract: Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm and enhancing overall formation stability. Only the leader blimp is manually controlled by a human operator, while follower blimps use onboard monocular cameras and a laser altimeter for relative position and altitude estimation. A leader-switching scheme is proposed to assist the human operator in maintaining the stability of the swarm, especially when a sharp turn is performed. Experimental results confirm that the leader-switching mechanism effectively maintains stable formations and adapts to dynamic indoor environments while assisting the human operator.

Submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.04453 [pdf, ps, other]

Meta-Learning Driven Lightweight Phase Shift Compression for IRS-Assisted Wireless Systems

Authors: Xianhua Yu, Dong Li, Bowen Gu, Xiaoye Jing, Wen Wu, Tuo Wu, Kan Yu

Abstract: The phase shift information (PSI) overhead poses a critical challenge to enabling real-time intelligent reflecting surface (IRS)-assisted wireless systems, particularly under dynamic and resource-constrained conditions. In this paper, we propose a lightweight PSI compression framework, termed meta-learning-driven compression and reconstruction network (MCRNet). By leveraging a few-shot adaptation strategy via model-agnostic meta-learning (MAML), MCRNet enables rapid generalization across diverse IRS configurations with minimal retraining overhead. Furthermore, a novel depthwise convolutional gating (DWCG) module is incorporated into the decoder to achieve adaptive local feature modulation with low computational cost, significantly improving decoding efficiency. Extensive simulations demonstrate that MCRNet achieves competitive normalized mean square error performance compared to state-of-the-art baselines across various compression ratios, while substantially reducing model size and inference latency. These results validate the effectiveness of the proposed asymmetric architecture and highlight the practical scalability and real-time applicability of MCRNet for dynamic IRS-assisted wireless deployments.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2504.18271 [pdf, other]

LEAM: A Prompt-only Large Language Model-enabled Antenna Modeling Method

Authors: Tao Wu, Kexue Fu, Qiang Hua, Xinxin Liu, Muhammad Ali Imran, Bo Liu

Abstract: Antenna modeling is a time-consuming and complex process, decreasing the speed of antenna analysis and design. In this paper, a large language model (LLM)-enabled antenna modeling method, called LEAM, is presented to address this challenge. LEAM enables automatic antenna model generation based on language descriptions via prompt input, images, descriptions from academic papers, patents, and technical reports (either one or multiple). The effectiveness of LEAM is demonstrated by three examples: a Vivaldi antenna generated from a complete user description, a slotted patch antenna generated from an incomplete user description and the operating frequency, and a monopole slotted antenna generated from images and descriptions scanned from the literature. For all the examples, correct antenna models are generated in a few minutes. The code can be accessed via https://github.com/TaoWu974/LEAM.

Submitted 25 April, 2025; originally announced April 2025.

Comments: Code is available at https://github.com/TaoWu974/LEAM

arXiv:2504.13455 [pdf, other]

Modular XL-Array-Enabled 3-D Localization based on Hybrid Spherical-Planar Wave Model in Terahertz Systems

Authors: Yang Zhang, Ruidong Li, Cunhua Pan, Hong Ren, Tuo Wu, Changhong Wang

Abstract: This work considers the three-dimensional (3-D) positioning problem in a Terahertz (THz) system enabled by a modular extra-large (XL) array with sub-connected architecture. Our purpose is to estimate the Cartesian coordinates of multiple user equipments (UEs) with the received signal of the RF chains while considering the spatial non-stationarity (SNS). We apply the hybrid spherical-planar wave model (HSPWM) as the channel model owing to the structural feature of the modular array, and propose a 3-D localization algorithm with relatively high accuracy and low complexity. Specifically, we first distinguish the visible sub-arrays (SAs) located in the visibility region (VR) and estimate the angles-of-arrival (AoAs) from each UE to typical visible SAs with the largest receive power via a compressed sensing (CS) method. In addition, we apply the weighted least squares (WLS) method to obtain a coarse 3-D position estimation of each UE according to the AoA estimations. Then, we estimate the AoAs of the other SAs with a reduced dictionary (RD)-CS-based method for lower computational complexity, and utilize all the efficient AoA estimations to derive a fine position estimation. Simulation results indicate that the proposed positioning framework based on modular XL-array can achieve satisfactory accuracy with evident reduction in complexity. Furthermore, the deployment of SAs and the allocation of antenna elements need to be specially designed for better positioning performance.

Submitted 18 April, 2025; originally announced April 2025.

Comments: 13 pages, 11 figures

arXiv:2504.12711 [pdf, other]

NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou , et al. (112 additional authors not shown)

Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which include day raindrop-focused, day background-focused, night raindrop-focused, and night background-focused degradations. This dataset is divided into three subsets for competition: 14,139 images for training, 240 images for validation, and 731 images for testing. The primary objective of this challenge is to establish a new and powerful benchmark for the task of removing raindrops under varying lighting and focus conditions. A total of 361 participants joined the competition, and 32 teams submitted valid solutions and fact sheets for the final testing phase. These submissions achieved state-of-the-art (SOTA) performance on the Raindrop Clarity dataset. The project can be found at https://lixinustc.github.io/CVPR-NTIRE2025-RainDrop-Competition.github.io/.

Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

arXiv:2504.01638 [pdf, other]

Convex Computations for Controlled Safety Invariant Sets of Black-box Discrete-time Dynamical Systems

Authors: Taoran Wu, Yiling Xue, Jingduo Pan, Dejin Ren, Arvind Easwaran, Bai Xue

Abstract: Identifying controlled safety invariant sets (CSISs) is essential in safety-critical applications. This paper tackles the problem of identifying CSISs for black-box discrete-time systems, where the model is unknown and only limited simulation data is accessible. Traditionally, a CSIS is defined as a subset of a safe set, encompassing initial states for which a control input exists that keeps the system within the set at the next time step; this is referred to as the one-step invariance property. However, the requirement for one-step invariance can be equivalently translated into a stricter condition of "always-invariance", meaning that there exist control inputs capable of keeping the system within this set indefinitely. Such a condition may prove overly stringent or impractical for black-box systems, where predictions can become unreliable beyond a single time step or a limited number of finite time steps. To overcome the challenges posed by black-box systems, we reformulate the one-step invariance property in a "Probably Approximately Correct" (PAC) sense. This approach allows us to assess the probability that a control input exists to keep the system within the CSIS at the next time step, with a predefined level of confidence. If the system successfully remains within the set at the next time step, we can then reapply the invariance evaluation to the new state, thereby facilitating a recursive assurance of invariance. Our method employs barrier functions and scenario optimization, resulting in a linear programming method to estimate PAC CSISs. Finally, the effectiveness of our approach is demonstrated on several examples.

Submitted 2 April, 2025; originally announced April 2025.

Comments: 15 pages

arXiv:2503.12698 [pdf, other]

A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT

Authors: Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Dandan Zheng, Heng Guo, Puyang Wang, Ke Yan, Yirui Wang, Qinji Yu, Zi Li, Minfeng Xu, Jianfeng Zhang, Haoshen Li, Jia Ge, Tsung-Ying Ho, Bing-Shen Huang, Tashan Ai, Kuaile Zhao, Na Shen, Qifeng Wang, Yun Bian, Tingyu Wu, Peng Du, Hua Zhang, Feng-Ming Kong , et al. (9 additional authors not shown)

Abstract: Precision medicine in the quantitative management of chronic diseases and oncology would be greatly improved if the Computed Tomography (CT) scan of any patient could be segmented, parsed and analyzed in a precise and detailed way. However, there is no such fully annotated CT dataset with all anatomies delineated for training because of the exceptionally high manual cost, the need for specialized clinical expertise, and the time required to finish the task. To this end, we proposed a novel continual learning-driven CT model that can segment complete anatomies presented using dozens of previously partially labeled datasets, dynamically expanding its capacity to segment new ones without compromising previously learned organ knowledge. Existing multi-dataset approaches are not able to dynamically segment new anatomies without catastrophic forgetting and would encounter optimization difficulty or infeasibility when segmenting hundreds of anatomies across the whole range of body regions. Our single unified CT segmentation model, CL-Net, can highly accurately segment a clinically comprehensive set of 235 fine-grained whole-body anatomies. Composed of a universal encoder, multiple optimized and pruned decoders, CL-Net is developed using 13,952 CT scans from 20 public and 16 private high-quality partially labeled CT datasets of various vendors, different contrast phases, and pathologies. Extensive evaluation demonstrates that CL-Net consistently outperforms the upper limit of an ensemble of 36 specialist nnUNets trained per dataset with the complexity of 5% model size and significantly surpasses the segmentation accuracy of recent leading Segment Anything-style medical image foundation models by large margins. Our continual learning-driven CL-Net model would lay a solid foundation to facilitate many downstream tasks of oncology and chronic diseases using the most widely adopted CT imaging.

Submitted 16 March, 2025; originally announced March 2025.

arXiv:2502.16669 [pdf, other]

Holographic MIMO Multi-Cell Communications

Authors: Kangda Zhi, Tianyu Yang, Shuangyang Li, Yi Song, Tuo Wu, Giuseppe Caire

Abstract: Metamaterial antennas are appealing for next-generation wireless networks due to their simplified hardware and much-reduced size, power, and cost. This paper investigates the holographic multiple-input multiple-output (HMIMO)-aided multi-cell systems with practical per-radio frequency (RF) chain power constraints. With multiple antennas at both base stations (BSs) and users, we design the baseband digital precoder and the tuning response of HMIMO metamaterial elements to maximize the weighted sum user rate. Specifically, under the framework of block coordinate descent (BCD) and weighted minimum mean square error (WMMSE) techniques, we derive the low-complexity closed-form solution for baseband precoder without requiring bisection search and matrix inversion. Then, for the design of HMIMO metamaterial elements under binary tuning constraints, we first propose a low-complexity suboptimal algorithm with closed-form solutions by exploiting the hidden convexity (HC) in the quadratic problem and then further propose an accelerated sphere decoding (SD)-based algorithm which yields global optimal solution in the iteration. For HMIMO metamaterial element design under the Lorentzian-constrained phase model, we propose a maximization-minorization (MM) algorithm with closed-form solutions at each iteration step. Furthermore, in a simplified multiple-input single-output (MISO) scenario, we derive the scaling law of downlink signal-to-noise ratio (SNR) for HMIMO with binary and Lorentzian tuning constraints and theoretically compare it with conventional fully digital/hybrid arrays. Simulation results demonstrate the effectiveness of our algorithms compared to benchmarks and the benefits of HMIMO compared to conventional arrays.

Submitted 23 February, 2025; originally announced February 2025.

Comments: 13 pages

arXiv:2502.06710 [pdf, other]

Learning Musical Representations for Music Performance Question Answering

Authors: Xingjian Diao, Chunhui Zhang, Tingxuan Wu, Ming Cheng, Zhongyu Ouyang, Weiyi Wu, Jiang Gui

Abstract: Music performances are representative scenarios for audio-visual modeling. Unlike common scenarios with sparse audio, music performances continuously involve dense audio signals throughout. While existing multimodal learning methods on the audio-video QA demonstrate impressive capabilities in general scenarios, they are incapable of dealing with fundamental problems within the music performances: they underexplore the interaction between the multimodal signals in performance and fail to consider the distinctive characteristics of instruments and music. Therefore, existing methods tend to answer questions regarding musical performances inaccurately. To bridge the above research gaps, (i) given the intricate multimodal interconnectivity inherent to music data, our primary backbone is designed to incorporate multimodal interactions within the context of music; (ii) to enable the model to learn music characteristics, we annotate and release rhythmic and music sources in the current music datasets; (iii) for time-aware audio-visual modeling, we align the model's music predictions with the temporal dimension. Our experiments show state-of-the-art effects on the Music AVQA datasets. Our code is available at https://github.com/xid32/Amuse.

Submitted 10 February, 2025; originally announced February 2025.

Comments: Accepted at EMNLP 2024

arXiv:2502.04307 [pdf, other]

DexterityGen: Foundation Controller for Unprecedented Dexterity

Authors: Zhao-Heng Yin, Changhao Wang, Luis Pineda, Francois Hogan, Krishna Bodduluri, Akash Sharma, Patrick Lancaster, Ishita Prasad, Mrinal Kalakrishnan, Jitendra Malik, Mike Lambeta, Tingfan Wu, Pieter Abbeel, Mustafa Mukadam

Abstract: Teaching robots dexterous manipulation skills, such as tool use, presents a significant challenge. Current approaches can be broadly categorized into two strategies: human teleoperation (for imitation learning) and sim-to-real reinforcement learning. The first approach is difficult as it is hard for humans to produce safe and dexterous motions on a different embodiment without touch feedback. The second RL-based approach struggles with the domain gap and involves highly task-specific reward engineering on complex tasks. Our key insight is that RL is effective at learning low-level motion primitives, while humans excel at providing coarse motion commands for complex, long-horizon tasks. Therefore, the optimal solution might be a combination of both approaches. In this paper, we introduce DexterityGen (DexGen), which uses RL to pretrain large-scale dexterous motion primitives, such as in-hand rotation or translation. We then leverage this learned dataset to train a dexterous foundational controller. In the real world, we use human teleoperation as a prompt to the controller to produce highly dexterous behavior. We evaluate the effectiveness of DexGen in both simulation and real world, demonstrating that it is a general-purpose controller that can realize input dexterous manipulation commands and significantly improves stability by 10-100x measured as duration of holding objects across diverse tasks. Notably, with DexGen we demonstrate unprecedented dexterous skills including diverse object reorientation and dexterous tool use such as pen, syringe, and screwdriver for the first time.

Submitted 6 February, 2025; originally announced February 2025.

Comments: Project: https://zhaohengyin.github.io/dexteritygen

arXiv:2501.18378 [pdf, other]

A Hybrid Dynamic Subarray Architecture for Efficient DOA Estimation in THz Ultra-Massive Hybrid MIMO Systems

Authors: Ye Tian, Jiaji Ren, Tuo Wu, Wei Liu, Chau Yuen, Merouane Debbah, Naofal Al-Dhahir, Matthew C. Valenti, Hing Cheung So, Yonina C. Eldar

Abstract: Terahertz (THz) communication combined with ultra-massive multiple-input multiple-output (UM-MIMO) technology is promising for 6G wireless systems, where fast and precise direction-of-arrival (DOA) estimation is crucial for effective beamforming. However, finding DOAs in THz UM-MIMO systems faces significant challenges: while reducing hardware complexity, the hybrid analog-digital (HAD) architecture introduces inherent difficulties in spatial information acquisition; the large-scale antenna array causes significant deviations in eigenvalue decomposition results; and conventional two-dimensional DOA estimation methods incur prohibitively high computational overhead, hindering fast and accurate realization. To address these challenges, we propose a hybrid dynamic subarray (HDS) architecture that strategically divides antenna elements into subarrays, ensuring phase differences between subarrays correlate exclusively with single-dimensional DOAs. Leveraging this architectural innovation, we develop two efficient algorithms for DOA estimation: a reduced-dimension MUSIC (RD-MUSIC) algorithm that enables fast processing by correcting large-scale array estimation bias, and an improved version that further accelerates estimation by exploiting THz channel sparsity to obtain initial closed-form solutions through specialized two-RF-chain configuration. Furthermore, we develop a theoretical framework through Cramér-Rao lower bound analysis, providing fundamental insights for different HDS configurations. Extensive simulations demonstrate that our solution achieves both superior estimation accuracy and computational efficiency, making it particularly suitable for practical THz UM-MIMO systems.

Submitted 30 January, 2025; originally announced January 2025.

arXiv:2501.16854 [pdf, ps, other]

From Partial Calibration to Full Potential: A Two-Stage Sparse DOA Estimation for Incoherently-Distributed Sources with Gain-Phase Uncertainty

Authors: He Xu, Tuo Wu, Wei Liu, Maged Elkashlan, Naofal Al-Dhahir, Merouane Debbah, Chau Yuen, Hing Cheung So

Abstract: Direction-of-arrival (DOA) estimation for incoherently distributed (ID) sources is essential in multipath wireless communication scenarios, yet it remains challenging due to the combined effects of angular spread and gain-phase uncertainties in antenna arrays. This paper presents a two-stage sparse DOA estimation framework, transitioning from partial calibration to full potential, under the generalized array manifold (GAM) framework. In the first stage, coarse DOA estimates are obtained by exploiting the output from a subset of partly-calibrated arrays (PCAs). In the second stage, these estimates are utilized to determine and compensate for gain-phase uncertainties across all array elements. Then a sparse total least-squares optimization problem is formulated and solved via alternating descent to refine the DOA estimates. Simulation results demonstrate that the proposed method attains improved estimation accuracy compared to existing approaches, while maintaining robustness against both noise and angular spread effects in practical multipath environments.

Submitted 28 January, 2025; originally announced January 2025.

arXiv:2501.12473 [pdf, other]

RIS-Aided Monitoring With Cooperative Jamming: Design and Performance Analysis

Authors: Shuying Lin, Yulong Zou, Zhiyang Li, Tong Wu, Eduard E. Bahingayi, Le-Nam Tran

Abstract: We investigate a reconfigurable intelligent surface (RIS) aided wireless surveillance system. In this system, a monitor not only receives signals from a suspicious transmitter via a RIS-enhanced legitimate surveillance (LS) link but also simultaneously takes control of multiple jammers to degrade the quality of the received suspicious signal. Under this setup, enhancing monitoring performance requires improvements of both the received signal quality at the monitor and the cooperative jamming (CJ). Considering that the surveillance system is aided by one RIS, whose phase shift optimization involves both channel state information (CSI) of the LS and CJ links, we utilize partial CSI to alleviate the CSI acquisition burden in our design. We propose two RIS-aided monitoring schemes with optimal jammer selection (OJS), and derive their closed-form expressions of surveillance success probability (SSP), respectively. Furthermore, we consider RIS-aided monitoring schemes with random jammer selection as corresponding benchmarks. Thereafter, we analyze special cases where the jammers are using power control to avoid being found, making it appear like passive monitoring. Also, the effect of RIS is highlighted by considering an asymptotically large number of RIS elements. Numerical results verify that the proposed OJS strategy further enhances the RIS-aided monitoring performance compared with non-jammer-selection RISLR and RISCR schemes, where the superiority comes at the cost of CSI knowledge and becomes marginal in the region of high jamming power. In addition, the RISLO shows a surveillance performance advantage over RISCO when the suspicious power is low or when the number of RIS elements is large.

Submitted 25 February, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

Comments: submitted to IEEE Transactions on Communications

arXiv:2501.08680 [pdf, other]

Digital Twin Online Channel Modeling: Challenges, Principles, and Applications

Authors: Junling Li, Cheng-Xiang Wang, Chen Huang, Tianrun Qi, Tong Wu

Abstract: Different from traditional offline channel modeling, digital twin online channel modeling can sense and accurately characterize dynamic wireless channels in real time, and can therefore greatly assist 6G network optimization. This article proposes a novel promising framework and a step-by-step design procedure of digital twin online channel models (DTOCM). By enabling continuous visualization and accurate prediction of dynamic channel variations, DTOCM can synchronize the performance between simulated and real networks. We first explore the evolution and conceptual advancements of DTOCM, highlighting its visions and associated challenges. Then, we explain its operational principles, construction mechanisms, and applications to typical 6G scenarios. Subsequently, the real-time channel information provisioning and visualization capabilities of DTOCM are illustrated through our DTOCM platform based on practical scenarios. Finally, future research directions and open issues are discussed.

Submitted 15 January, 2025; originally announced January 2025.

arXiv:2501.01281 [pdf, other]

Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems

Authors: Shunxing Yang, Junteng Yao, Jie Tang, Tuo Wu, Maged Elkashlan, Chau Yuen, Merouane Debbah, Hyundong Shin, Matthew Valenti

Abstract: Fluid antenna systems (FAS) enable dynamic antenna positioning, offering new opportunities to enhance integrated sensing and communication (ISAC) performance. However, existing studies primarily focus on communication enhancement or single-target sensing, leaving multi-target scenarios underexplored. Additionally, the joint optimization of beamforming and antenna positions poses a highly non-convex problem, with traditional methods becoming impractical as the number of fluid antennas increases. To address these challenges, this letter proposes a block coordinate descent (BCD) framework integrated with a deep reinforcement learning (DRL)-based approach for intelligent antenna positioning. By leveraging the deep deterministic policy gradient (DDPG) algorithm, the proposed framework efficiently balances sensing and communication performance. Simulation results demonstrate the scalability and effectiveness of the proposed approach.

Submitted 2 January, 2025; originally announced January 2025.

arXiv:2412.15843 [pdf, other]

Rethinking Hardware Impairments in Multi-User Systems: Can FAS Make a Difference?

Authors: Junteng Yao, Tuo Wu, Liaoshi Zhou, Ming Jin, Cunhua Pan, Maged Elkashlan, Fumiyuki Adachi, George K. Karagiannidis, Naofal Al-Dhahir, Chau Yuen

Abstract: In this paper, we analyze the role of fluid antenna systems (FAS) in multi-user systems with hardware impairments (HIs). Specifically, we investigate a scenario where a base station (BS) equipped with multiple fluid antennas communicates with multiple users (CUs), each equipped with a single fluid antenna. Our objective is to maximize the minimum communication rate among all users by jointly optimizing the BS's transmit beamforming, the positions of its transmit fluid antennas, and the positions of the CUs' receive fluid antennas. To address this non-convex problem, we propose a block coordinate descent (BCD) algorithm integrating semidefinite relaxation (SDR), rank-one constraint relaxation (SRCR), successive convex approximation (SCA), and majorization-minimization (MM). Simulation results demonstrate that FAS significantly enhances system performance and robustness, with notable gains when both the BS and CUs are equipped with fluid antennas. Even under low transmit power conditions, deploying FAS at the BS alone yields substantial performance gains. However, the effectiveness of FAS depends on the availability of sufficient movement space, as space constraints may limit its benefits compared to fixed antenna strategies. Our findings highlight the potential of FAS to mitigate HIs and enhance multi-user system performance, while emphasizing the need for practical deployment considerations.

Submitted 20 December, 2024; originally announced December 2024.

arXiv:2412.03839 [pdf, other]

Fluid Antenna Systems Enabling 6G: Principles, Applications, and Research Directions

Authors: Tuo Wu, Kangda Zhi, Junteng Yao, Xiazhi Lai, Jianchao Zheng, Hong Niu, Maged Elkashlan, Kai-Kit Wong, Chan-Byoung Chae, Zhiguo Ding, George K. Karagiannidis, Merouane Debbah, Chau Yuen

Abstract: Fluid antenna system (FAS), as a new version of reconfigurable antenna technologies promoting shape and position flexibility, has emerged as an exciting and possibly transformative technology for wireless communications systems. FAS represents any software-controlled fluidic, conductive or dielectric structure that can dynamically alter the antenna's shape and position to change the gain, the radiation pattern, the operating frequency, and other critical radiation characteristics. With its capability, it is highly anticipated that FAS can contribute greatly to the upcoming sixth generation (6G) wireless networks. This article substantiates this thought by addressing four major questions: 1) Is FAS crucial to 6G? 2) How to characterize FAS? 3) What are the applications of FAS? 4) What are the relevant challenges and future research directions? In particular, five promising research directions that underscore the potential of FAS are discussed. We conclude this article by showcasing the impressive performance of FAS.

Submitted 4 December, 2024; originally announced December 2024.

arXiv:2412.02282 [pdf, other]

Exploring Evolutionary Spectral Clustering for Temporal-Smoothed Clustered Cell-Free Networking

Authors: Junyuan Wang, Tianyao Wu, Ouyang Zhou, Yaping Zhu

Abstract: Clustered cell-free networking, which dynamically partitions the whole network into nonoverlapping subnetworks, has been recently proposed to mitigate the cell-edge problem in cellular networks. However, prior works only focused on optimizing clustered cell-free networking in static scenarios with fixed users. This could lead to a large number of handovers in the practical dynamic environment with moving users, seriously hindering the implementation of clustered cell-free networking in practice. This paper considers user mobility and aims to simultaneously maximize the sum rate and minimize the number of handovers. By transforming the multi-objective optimization problem into a time-varying graph partitioning problem and exploring evolutionary spectral clustering, a temporal-smoothed clustered cell-free networking algorithm is proposed, which is shown to be effective in smoothing network partitions over time and reducing handovers while maintaining similar sum rate.

Submitted 3 December, 2024; originally announced December 2024.

Comments: 5 pages, 3 figures

arXiv:2411.16997 [pdf, other]

doi 10.1109/GLOBECOM52923.2024.10901379

Channel Modeling for Ultraviolet Non-Line-of-Sight Communications Incorporating an Obstacle

Authors: Tianfeng Wu, Fang Yang, Tian Cao, Ling Cheng, Yupeng Chen, Jian Song, Julian Cheng, Zhu Han

Abstract: Existing studies on ultraviolet (UV) non-line-of-sight (NLoS) channel modeling primarily focus on scenarios without any obstacle, which makes them unsuitable for small transceiver elevation angles in most cases. To address this issue, a UV NLoS channel model incorporating an obstacle was investigated in this paper, where the impacts of atmospheric scattering and obstacle reflection on UV signals were both taken into account. To validate the proposed model, we compared it to the related Monte-Carlo photon-tracing (MCPT) model that had been verified by outdoor experiments. Numerical results manifest that the path loss curves obtained by the proposed model agree well with those determined by the MCPT model, while its computational complexity is lower than that of the MCPT model. This work discloses that obstacle reflection can effectively reduce the channel path loss of UV NLoS communication systems.

Submitted 8 November, 2024; originally announced November 2024.

Comments: Accepted by IEEE Global Communications Conference (GLOBECOM) 2024. arXiv admin note: substantial text overlap with arXiv:2411.15154

arXiv:2411.15154 [pdf, other]

Modeling of UV NLoS Communication Channels: From Atmospheric Scattering and Obstacle Reflection Perspectives

Authors: Tianfeng Wu, Fang Yang, Tian Cao, Ling Cheng, Yupeng Chen, Jian Song, Julian Cheng, Zhu Han

Abstract: As transceiver elevation angles increase from small to large, existing ultraviolet (UV) non-line-of-sight (NLoS) models encounter two challenges: i) cannot estimate the channel characteristics of UV NLoS communication scenarios when there exists an obstacle in the overlap volume between the transmitter beam and the receiver field-of-view (FoV), and ii) cannot evaluate the channel path loss for the wide beam and wide FoV scenarios with existing simplified single-scattering path loss models. To address these challenges, a UV NLoS scattering model incorporating an obstacle was investigated, where the obstacle's orientation angle, coordinates, and geometric dimensions were taken into account to approach actual application environments. Then, a UV NLoS reflection model was developed combined with specific geometric diagrams. Further, a simplified single-scattering path loss model was proposed with a closed-form expression. Finally, the proposed models were validated by comparing them with the Monte-Carlo photon-tracing model, the exact single-scattering model, and the latest simplified single-scattering model. Numerical results show that the path loss curves obtained by the proposed models agree well with those attained by related NLoS models under identical parameter settings, and avoiding obstacles is not always a good option for UV NLoS communications. Moreover, the accuracy of the proposed simplified model is superior to that of the existing simplified model for all kinds of transceiver FoV angles.

Submitted 7 November, 2024; originally announced November 2024.

Comments: Accepted by IEEE Journal on Selected Areas in Communications

arXiv:2411.11110 [pdf, other]

Retinal Vessel Segmentation via Neuron Programming

Authors: Tingting Wu, Ruyi Min, Peixuan Song, Hengtao Guo, Tieyong Zeng, Feng-Lei Fan

Abstract: The accurate segmentation of retinal blood vessels plays a crucial role in the early diagnosis and treatment of various ophthalmic diseases. Designing a network model for this task requires meticulous tuning and extensive experimentation to handle the tiny and intertwined morphology of retinal blood vessels. To tackle this challenge, Neural Architecture Search (NAS) methods are developed to fully explore the space of potential network architectures and go after the most powerful one. Inspired by neuronal diversity which is the biological foundation of all kinds of intelligent behaviors in our brain, this paper introduces a novel and foundational approach to neural network design, termed "neuron programming", to automatically search neuronal types into a network to enhance a network's representation ability at the neuronal level, which is complementary to architecture-level enhancement done by NAS. Additionally, to mitigate the time and computational intensity of neuron programming, we develop a hypernetwork that leverages the search-derived architectural information to predict optimal neuronal configurations. Comprehensive experiments validate that neuron programming can achieve competitive performance in retinal blood segmentation, demonstrating the strong potential of neuronal diversity in medical image analysis.

Submitted 17 November, 2024; originally announced November 2024.

arXiv:2411.09235 [pdf, ps, other]

FAS for Secure and Covert Communications

Authors: Junteng Yao, Liangxiao Xin, Tuo Wu, Ming Jin, Kai-Kit Wong, Chau Yuen, Hyundong Shin

Abstract: This letter considers a fluid antenna system (FAS)-aided secure and covert communication system, where the transmitter adjusts multiple fluid antennas' positions to achieve secure and covert transmission under the threat of an eavesdropper and the detection of a warden. This letter aims to maximize the secrecy rate while satisfying the covertness constraint. Unfortunately, the optimization problem is non-convex due to the coupled variables. To tackle this, we propose an alternating optimization (AO) algorithm to alternately optimize the optimization variables in an iterative manner. In particular, we use a penalty-based method and the majorization-minimization (MM) algorithm to optimize the transmit beamforming and fluid antennas' positions, respectively. Simulation results show that FAS can significantly improve the performance of secrecy and covertness compared to the fixed-position antenna (FPA)-based schemes.

Submitted 14 November, 2024; originally announced November 2024.

arXiv:2411.08618 [pdf, other]

Robust Optimal Power Flow Against Adversarial Attacks: A Tri-Level Optimization Approach

Authors: Saman Mazaheri Khamaneh, Tong Wu

Abstract: In power systems, unpredictable events like extreme weather, equipment failures, and cyberattacks present significant challenges to ensuring safety and reliability. Ensuring resilience in the face of these uncertainties is crucial for reliable and efficient operations. This paper presents a tri-level optimization approach for robust power system operations that effectively addresses worst-case attacks. The first stage focuses on optimizing economic dispatch under normal operating conditions, aiming to minimize generation costs while maintaining the supply-demand balance. The second stage introduces an adversarial attack model, identifying worst-case scenarios that maximize the system's vulnerability by targeting distributed generation (DG). In the third stage, mitigation strategies are developed using fast-response energy storage systems (ESS) to minimize disruptions caused by these attacks. By integrating economic dispatch, vulnerability assessment, and mitigation into a unified framework, this approach provides a robust solution for enhancing power system resilience and safety against evolving adversarial threats. The approach is validated using the IEEE-33 node distribution system to demonstrate its effectiveness in achieving both cost efficiency and system resilience.

Submitted 13 November, 2024; originally announced November 2024.

Comments: This work has been submitted for possible publication

arXiv:2411.08386 [pdf, ps, other]

A Secure Beamforming Design: When Fluid Antenna Meets NOMA

Authors: Lifeng Mai, Junteng Yao, Jie Tang, Tuo Wu, Kai-Kit Wong, Hyundong Shin, Fumiyuki Adachi

Abstract: This letter proposes a secure beamforming design for downlink non-orthogonal multiple access (NOMA) systems utilizing fluid antenna systems (FAS). We consider a setup where a base station (BS) with $M$ fluid antennas (FAs) communicates to a cell-center user (CU) and a cell-edge user (CEU), each with a FA. The CU is the intended recipient while the CEU is regarded as a potential eavesdropper. Our aim is to maximize the achievable secrecy rate by jointly optimizing the secure beamforming vectors and the positions of FAs. To tackle this, we adopt an alternating optimization (AO) algorithm that optimizes secure beamforming and the positions of the FAs iteratively while keeping the other variables fixed. Numerical results illustrate that when FAs meet NOMA, the proposed scheme greatly enhances the secrecy rate compared to conventional multiple-input single-output (MISO) fixed antenna NOMA systems and other benchmark schemes.

Submitted 13 November, 2024; originally announced November 2024.

arXiv:2411.08383 [pdf, other]

FAS-Driven Spectrum Sensing for Cognitive Radio Networks

Authors: Junteng Yao, Ming Jin, Tuo Wu, Maged Elkashlan, Chau Yuen, Kai-Kit Wong, George K. Karagiannidis, Hyundong Shin

Abstract: Cognitive radio (CR) networks face significant challenges in spectrum sensing, especially under spectrum scarcity. Fluid antenna systems (FAS) can offer an unorthodox solution due to their ability to dynamically adjust antenna positions for improved channel gain. In this letter, we study a FAS-driven CR setup where a secondary user (SU) adjusts the positions of fluid antennas to detect signals from the primary user (PU). We aim to maximize the detection probability under the constraints of the false alarm probability and the received beamforming of the SU. To address this problem, we first derive a closed-form expression for the optimal detection threshold and reformulate the problem to find its solution. Then an alternating optimization (AO) scheme is proposed to decompose the problem into several sub-problems, addressing both the received beamforming and the antenna positions at the SU. The beamforming subproblem is addressed using a closed-form solution, while the fluid antenna positions are solved by successive convex approximation (SCA). Simulation results reveal that the proposed algorithm provides significant improvements over traditional fixed-position antenna (FPA) schemes in terms of spectrum sensing performance.

Submitted 13 November, 2024; originally announced November 2024.

arXiv:2411.05363 [pdf, other]

Path Loss Modeling for NLoS Ultraviolet Channels Incorporating Scattering and Reflection Effects

Authors: Tianfeng Wu, Fang Yang, Fei Li, Renzhi Yuan, Tian Cao, Ling Cheng, Jian Song, Julian Cheng, Zhu Han

Abstract: This paper tackles limitations in existing non-line-of-sight (NLoS) ultraviolet (UV) channel models, where conventional approaches assume obstacle-free propagation or uniform radiation intensity. In this paper, we develop a path loss model incorporating scattering and reflection, and then propose an obstacle-boundary approximation method to achieve computational tractability. Our framework systematically incorporates spatial obstacle properties, including dimensions, coordinates, contours, and orientation angles, while employing the Lambertian radiation pattern for source modeling. Additionally, the proposed path loss model is validated by comparing it with the Monte-Carlo photon-tracing model and analytical integral model via numerical results, which indicate that when obstacle reflection is prominent, an approximation treatment of obstacle boundaries has a negligible influence on the path loss estimation of NLoS UV communication channels.

Submitted 18 March, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

Comments: Submitted to IEEE Global Communications Conference (GLOBECOM) 2025

arXiv:2411.01400 [pdf, ps, other]

Unlocking FAS-RIS Security Analysis with Block-Correlation Model

Authors: Jianchao Zheng, Xiazhi Lai, Tuo Wu, Maged Elkashlan, Daniel Benevides da Costa, Chau Yuen, Fumiyuki Adachi

Abstract: In this letter, we investigate the security of fluid antenna system (FAS)-reconfigurable intelligent surfaces (RIS) communication systems. The base station (BS) employs a single fixed-position antenna, while both the legitimate receiver and the eavesdropper are equipped with fluid antennas. By utilizing the block-correlation model and the central limit theorem (CLT), we derive approximate expressions for the average secrecy capacity and secrecy outage probability (SOP). Our analysis, validated by simulation results, demonstrates the effectiveness of the block-correlation model in accurately assessing the security performance. Moreover, simulation results reveal that the FAS-RIS system significantly outperforms other systems in terms of security, further underscoring its potential in secure communication applications.

Submitted 2 November, 2024; originally announced November 2024.

arXiv:2411.01398 [pdf, ps, other]

Paving the Way to 6G: Outage Probability Analysis for FAS-ARIS Systems

Authors: Jianchao Zheng, Xiazhi Lai, Junteng Yao, Jie Tang, Yijin Pan, Tuo Wu, Chau Yuen

Abstract: In this paper, we pave the way to sixth-generation (6G) by investigating the outage probability (OP) of fluid antenna system (FAS)-active reconfigurable intelligent surface (ARIS) communication systems. We consider a FAS-ARIS setup consisting of a base station (BS) with a single fixed-position antenna and a receiver equipped with a fluid antenna (FA). Utilizing the block-correlation model, we derive a closed-form expression for the OP. Our analysis, supported by numerical results, confirms the accuracy and effectiveness of the derivation. Furthermore, the results demonstrate that the FAS-ARIS system significantly outperforms other configurations in terms of OP, highlighting its potential to enhance communication performance and reliability in future 6G networks.

Submitted 2 November, 2024; originally announced November 2024.

arXiv:2410.17609 [pdf, other]

Exploring the Impact of RIS on Cooperative NOMA URLLC Systems: A Theoretical Perspective

Authors: Jianchao Zheng, Tuo Wu, Junteng Yao, Chau Yuen, Zhiguo Ding, Fumiyuki Adachi

Abstract: In this paper, we conduct a theoretical analysis of how to integrate reconfigurable intelligent surfaces (RIS) with cooperative non-orthogonal multiple access (NOMA), considering URLLC. We consider a downlink two-user cooperative NOMA system employing short-packet communications, where the two users are denoted by the central user (CU) and the cell-edge user (CEU), respectively, and an RIS is deployed to enhance signal quality. Specifically, compared to the CEU, the CU lies nearer to the BS and enjoys higher channel gains. Closed-form expressions for the CU's average block error rate (BLER) are derived. Furthermore, we evaluate the CEU's BLER performance utilizing selective combining (SC) and derive a tight lower bound under maximum ratio combining (MRC). Simulation results are provided to validate our analyses and demonstrate that the RIS-assisted system significantly outperforms its counterpart without RIS in terms of BLER. Notably, MRC achieves a squared multiple of the diversity gain of the SC, leading to more reliable performance, especially for the CEU. Furthermore, by dividing the RIS into two zones, each dedicated to a specific user, the average BLER can be further reduced, particularly for the CEU.

Submitted 23 October, 2024; originally announced October 2024.

arXiv:2410.12218 [pdf, other]

Exploring Dual-Sniffer Passive Localization: Algorithm Design and Experimental Results

Authors: Tuo Wu, Lingyu Hou, Hong Niu, Saihua Xu, Sirajudeen Gulam Razul, Chau Yuen

Abstract: In this paper, we explore a dual-sniffer passive localization system that detects the timing difference of signals from both commercial base station (eNb) and user equipment (UE) to the sniffers. We design two localization schemes for UE localization: a time of arrival (ToA) based scheme and a time difference of arrival (TDoA) based scheme. In the ToA-based scheme, we derive two ellipse equations from measured arrival times at two sniffers, enabling direct numerical computation of the estimated position. For the TDoA-based scheme, we relocate one sniffer to a different position to obtain two sets of TDoA measurements, resulting in hyperbola equations. We then apply a least squares (LS) algorithm to analytically estimate the UE's position. Simulation results validate the effectiveness of the proposed TDoA-based scheme, demonstrating improved accuracy in UE positioning. We build a platform based on the considered localization system and conduct real-world experiments. The experimental results confirm the accuracy and practicality of the TDoA-based dual-sniffer localization scheme, demonstrating improved precision in passive localization.

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.06115 [pdf, other]

A physics-based perspective for understanding and utilizing spatial resources of wireless channels

Authors: Hui Xu, Jun Wei Wu, Zhen Jie Qi, Hao Tian Wu, Rui Wen Shao, Qiang Cheng, Jieao Zhu, Linglong Dai, Tie Jun Cui

Abstract: To satisfy the increasing demands for transmission rates of wireless communications, it is necessary to use spatial resources of electromagnetic (EM) waves. In this context, EM information theory (EIT) has become a hot topic by integrating the theoretical framework of deterministic mathematics and stochastic statistics to explore the transmission mechanisms of continuous EM waves. However, the previous studies were primarily focused on frame analysis, with limited exploration of practical applications and a comprehensive understanding of its essential physical characteristics. In this paper, we present a three-dimensional (3-D) line-of-sight channel capacity formula that captures the vector EM physics and accommodates both near- and far-field scenes. Based on the rigorous mathematical equation and the physical mechanism of fast multipole expansion, a channel model is established, and the finite angular spectral bandwidth feature of scattered waves is revealed. To adapt to the feature of the channel, an optimization problem is formulated for determining the mode currents on the transmitter, aiming to obtain the optimal design of the precoder and combiner. We make comprehensive analyses to investigate the relationship among the spatial degree of freedom, noise, and transmitted power, thereby establishing a rigorous upper bound of channel capacity. A series of simulations are conducted to validate the theoretical model and numerical method. This work offers a novel perspective and methodology for understanding and leveraging EIT, and provides a theoretical foundation for the design and optimization of future wireless communications.

Submitted 8 October, 2024; originally announced October 2024.

Comments: 31 pages, 8 figures

arXiv:2409.12962 [pdf, other]

CLAIR-A: Leveraging Large Language Models to Judge Audio Captions

Authors: Tsung-Han Wu, Joseph E. Gonzalez, Trevor Darrell, David M. Chan

Abstract: The Automated Audio Captioning (AAC) task asks models to generate natural language descriptions of an audio input. Evaluating these machine-generated audio captions is a complex task that requires considering diverse factors, among them, auditory scene understanding, sound-object inference, temporal coherence, and the environmental context of the scene. While current methods focus on specific aspects, they often fail to provide an overall score that aligns well with human judgment. In this work, we propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models (LLMs) to evaluate candidate audio captions by directly asking LLMs for a semantic distance score. In our evaluations, CLAIR-A better predicts human judgements of quality compared to traditional metrics, with a 5.8% relative accuracy improvement compared to the domain-specific FENSE metric and up to 11% over the best general-purpose measure on the Clotho-Eval dataset. Moreover, CLAIR-A offers more transparency by allowing the language model to explain the reasoning behind its scores, with these explanations rated up to 30% better by human evaluators than those provided by baseline methods. CLAIR-A is made publicly available at https://github.com/DavidMChan/clair-a.

Submitted 19 September, 2024; originally announced September 2024.

Comments: Code is publicly available at https://github.com/DavidMChan/clair-a

arXiv:2408.13447 [pdf, ps, other]

FAS-RIS Communication: Model, Analysis, and Optimization

Authors: Junteng Yao, Jianchao Zheng, Tuo Wu, Ming Jin, Chau Yuen, Kai-Kit Wong, Fumiyuki Adachi

Abstract: This correspondence investigates the novel fluid antenna system (FAS) technology, combining with reconfigurable intelligent surface (RIS) for wireless communications, where a base station (BS) communicates with a FAS-enabled user with the assistance of a RIS. To analyze this technology, we derive the outage probability based on the block-diagonal matrix approximation (BDMA) model. With this, we obtain the upper bound, lower bound, and asymptotic approximation of the outage probability to gain more insights. Moreover, we design the phase shift matrix of the RIS in order to minimize the system outage probability. Simulation results confirm the accuracy of our approximations and that the proposed schemes outperform benchmarks significantly.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.13444 [pdf, ps, other]

FAS-RIS: A Block-Correlation Model Analysis

Authors: Xiazhi Lai, Junteng Yao, Kangda Zhi, Tuo Wu, David Morales-Jimenez, Kai-Kit Wong

Abstract: In this correspondence, we analyze the performance of a reconfigurable intelligent surface (RIS)-aided communication system that involves a fluid antenna system (FAS)-enabled receiver. By applying the central limit theorem (CLT), we derive approximate expressions for the system outage probability when the RIS has a large number of elements. Also, we adopt the block-correlation channel model to simplify the outage probability expressions, reducing the computational complexity and shedding light on the impact of the number of ports. Numerical results validate the effectiveness of our analysis, especially in scenarios with a large number of RIS elements.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.09067 [pdf, ps, other]

FAS vs. ARIS: Which Is More Important for FAS-ARIS Communication Systems?

Authors: Junteng Yao, Liaoshi Zhou, Tuo Wu, Ming Jin, Chongwen Huang, Chau Yuen

Abstract: In this paper, we investigate the question of which technology, fluid antenna systems (FAS) or active reconfigurable intelligent surfaces (ARIS), plays a more crucial role in FAS-ARIS wireless communication systems. To address this, we develop a comprehensive system model and explore the problem from an optimization perspective. We introduce an alternating optimization (AO) algorithm incorporating majorization-minimization (MM), successive convex approximation (SCA), and sequential rank-one constraint relaxation (SRCR) to tackle the non-convex challenges inherent in these systems. Specifically, for the optimization of the transmit beamforming of the BS, we propose a closed-form rank-one solution with low complexity. For the optimization of the positions of the fluid antennas (FAs) of the BS, the Taylor expansions and MM algorithm are utilized to construct the effective lower bounds and upper bounds of the objective function and constraints, transforming the non-convex optimization problem into a convex one. Furthermore, we use the SCA and SRCR to optimize the reflection coefficient matrix of the ARIS and effectively solve the rank-one constraint. Simulation results reveal that the relative importance of FAS and ARIS varies depending on the scenario: FAS proves more critical in simpler models with fewer reflecting elements or limited transmission paths, while ARIS becomes more significant in complex scenarios with a higher number of reflecting elements or transmission paths. Ultimately, the integration of both FAS and ARIS creates a win-win scenario, resulting in a more robust and efficient communication system. This study underscores the importance of combining FAS with ARIS, as their complementary use provides the most substantial benefits across different communication environments.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.03124 [pdf, other]

CL-DiffPhyCon: Closed-loop Diffusion Control of Complex Physical Systems

Authors: Long Wei, Haodong Feng, Yuchen Yang, Ruiqi Feng, Peiyan Hu, Xiang Zheng, Tao Zhang, Dixia Fan, Tailin Wu

Abstract: The control problems of complex physical systems have broad applications in science and engineering. Previous studies have shown that generative control methods based on diffusion models offer significant advantages for solving these problems. However, existing generative control approaches face challenges in both performance and efficiency when extended to the closed-loop setting, which is essential for effective control. In this paper, we propose an efficient Closed-Loop Diffusion method for Physical systems Control (CL-DiffPhyCon). By employing an asynchronous denoising framework for different physical time steps, CL-DiffPhyCon generates control signals conditioned on real-time feedback from the system with significantly reduced computational cost during sampling. Additionally, the control process could be further accelerated by incorporating fast sampling techniques, such as DDIM. We evaluate CL-DiffPhyCon on two tasks: 1D Burgers' equation control and 2D incompressible fluid control. The results demonstrate that CL-DiffPhyCon achieves superior control performance with significant improvements in sampling efficiency. The code can be found at https://github.com/AI4Science-WestlakeU/CL_DiffPhyCon.

Submitted 22 February, 2025; v1 submitted 31 July, 2024; originally announced August 2024.

Comments: Published as a conference paper at ICLR 2025

arXiv:2407.19663 [pdf, other]

Short-Term Photovoltaic Forecasting Model for Qualifying Uncertainty during Hazy Weather

Authors: Xuan Yang, Yunxuan Dong, Lina Yang, Thomas Wu

Abstract: Solar energy is one of the most promising renewable energy resources. Forecasting photovoltaic power generation is an important way to increase photovoltaic penetration. However, the difficulty in qualifying the uncertainty of PV power generation, especially during hazy weather, makes forecasting challenging. This paper proposes a novel model to address the issue. We introduce a modified entropy to qualify uncertainty during hazy weather while clustering and attention mechanisms are employed to reduce computational costs and enhance forecasting accuracy, respectively. Hyperparameters were adjusted using an optimization algorithm. Experiments on two datasets related to hazy weather demonstrate that our model significantly improves forecasting accuracy compared to existing models.

Submitted 7 October, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

Comments: The manuscript was submitted to Applied Energy on August 29, 2024

arXiv:2407.11307 [pdf, ps, other]

Fluid Antenna-Assisted Simultaneous Wireless Information and Power Transfer Systems

Authors: Liaoshi Zhou, Junteng Yao, Tuo Wu, Ming Jin, Chau Yuen, Fumiyuki Adachi

Abstract: This paper examines a fluid antenna (FA)-assisted simultaneous wireless information and power transfer (SWIPT) system. Unlike traditional SWIPT systems with fixed-position antennas (FPAs), our FA-assisted system enables dynamic reconfiguration of the radio propagation environment by adjusting the positions of FAs. This capability enhances both energy harvesting and communication performance. The system comprises a base station (BS) equipped with multiple FAs that transmit signals to an energy receiver (ER) and an information receiver (IR), both equipped with a single FA. Our objective is to maximize the communication rate between the BS and the IR while satisfying the harvested power requirement of the ER. This involves jointly optimizing the BS's transmit beamforming and the positions of all FAs. To address this complex convex optimization problem, we employ an alternating optimization (AO) approach, decomposing it into three sub-problems and solving them iteratively using first and second-order Taylor expansions. Simulation results validate the effectiveness of our proposed FA-assisted SWIPT system, demonstrating significant performance improvements over traditional FPA-based systems.

Submitted 23 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.08141 [pdf, ps, other]

A Framework of FAS-RIS Systems: Performance Analysis and Throughput Optimization

Authors: Junteng Yao, Xiazhi Lai, Kangda Zhi, Tuo Wu, Ming Jin, Cunhua Pan, Maged Elkashlan, Chau Yuen, Kai-Kit Wong

Abstract: In this paper, we investigate reconfigurable intelligent surface (RIS)-assisted communication systems which involve a fixed-antenna base station (BS) and a mobile user (MU) that is equipped with fluid antenna system (FAS). Specifically, the RIS is utilized to enable communication for the user whose direct link from the base station is blocked by obstacles. We propose a comprehensive framework that provides transmission design for both static scenarios with the knowledge of channel state information (CSI) and harsh environments where CSI is hard to acquire. It leads to two approaches: a CSI-based scheme where CSI is available, and a CSI-free scheme when CSI is inaccessible. Given the complex spatial correlations in FAS, we employ block-diagonal matrix approximation and independent antenna equivalent models to simplify the derivation of outage probabilities in both cases. Based on the derived outage probabilities, we then optimize the throughput of the FAS-RIS system. For the CSI-based scheme, we first propose a gradient ascent-based algorithm to obtain a near-optimal solution. Then, to address the possible high computational complexity in the gradient algorithm, we approximate the objective function and confirm a unique optimal solution accessible through a bisection search method. For the CSI-free scheme, we apply the partial gradient ascent algorithm, reducing complexity further than full gradient algorithms. We also approximate the objective function and derive a locally optimal closed-form solution to maximize throughput. Simulation results validate the effectiveness of the proposed framework for the transmission design in FAS-RIS systems.

Submitted 10 July, 2024; originally announced July 2024.

Comments: submitted to IEEE journal for possible publication
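
The abstract above mentions reaching a unique optimum through a bisection search. A minimal Python sketch of that general technique follows; it is not the authors' algorithm. The objective `toy_throughput` (rate multiplied by an assumed success probability exp(-r)) and every function name here are hypothetical, chosen only so that the optimum is known in closed form (r = 1).

```python
import math


def toy_throughput(r):
    # Hypothetical objective: rate times an assumed success probability exp(-r).
    return r * math.exp(-r)


def toy_throughput_derivative(r):
    # Derivative of r * exp(-r); positive for r < 1, negative for r > 1.
    return (1.0 - r) * math.exp(-r)


def bisect_root(g, lo, hi, tol=1e-9, max_iter=200):
    """Find a root of g in [lo, hi], assuming g changes sign on the interval."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0 or hi - lo < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid  # the sign change lies in the left half
        else:
            lo, g_lo = mid, g_mid  # the sign change lies in the right half
    return 0.5 * (lo + hi)


# The maximizer of a unimodal objective is the root of its derivative.
best_rate = bisect_root(toy_throughput_derivative, 0.0, 5.0)
```

Bisection applies here because the derivative of a unimodal objective changes sign exactly once, so halving the bracket always keeps the optimum inside it.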

arXiv:2407.07720 [pdf, other]

Exploiting Scale-Variant Attention for Segmenting Small Medical Objects

Authors: Wei Dai, Rui Liu, Zixuan Wu, Tianyi Wu, Min Wang, Junxian Zhou, Yixuan Yuan, Jun Liu

Abstract: Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. Identifying mild syndrome with small pathological regions serves as an ominous warning and is fundamental in the early diagnosis of diseases. While deep learning algorithms, particularly convolutional neural networks (CNNs), have shown promise in segmenting medical objects, analyzing small areas in medical images remains challenging. This difficulty arises due to information losses and compression defects from convolution and pooling operations in CNNs, which become more pronounced as the network deepens, especially for small medical objects. To address these challenges, we propose a novel scale-variant attention-based network (SvANet) for accurately segmenting small-scale objects in medical images. The SvANet consists of scale-variant attention, cross-scale guidance, Monte Carlo attention, and vision transformer, which incorporates cross-scale features and alleviates compression artifacts for enhancing the discrimination of small medical objects. Quantitative experimental results demonstrate the superior performance of SvANet, achieving 96.12%, 96.11%, 89.79%, 84.15%, 80.25%, 73.05%, and 72.58% in mean Dice coefficient for segmenting kidney tumors, skin lesions, hepatic tumors, polyps, surgical excision cells, retinal vasculatures, and sperms, which occupy less than 1% of the image areas in KiTS23, ISIC 2018, ATLAS, PolypGen, TissueNet, FIVES, and SpermHealth datasets, respectively.

Submitted 5 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: 14 pages, 9 figures, under review
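
The SvANet abstract above reports results as mean Dice coefficients. For readers unfamiliar with the metric, here is a minimal Python sketch of the standard Dice definition, 2|A ∩ B| / (|A| + |B|), on flat binary masks. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks count as a perfect match
    return 2.0 * intersection / total


# One overlapping pixel, two predicted and one true foreground pixel: 2*1 / (2+1).
example_score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

Because the score is normalized by the foreground sizes rather than the image size, a single misclassified pixel moves it noticeably when the object covers only a small fraction of the image.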

arXiv:2407.05643 [pdf, ps, other]

doi 10.1109/TSP.2024.3512575

Revisiting XL-MIMO Channel Estimation: When Dual-Wideband Effects Meet Near Field

Authors: Anzheng Tang, Jun-Bo Wang, Yijin Pan, Tuo Wu, Yijian Chen, Hongkang Yu, Maged Elkashlan

Abstract: The deployment of extremely large antenna arrays (ELAAs) and operation at higher frequency bands in wideband extremely large-scale multiple-input-multiple-output (XL-MIMO) systems introduce significant near-field effects, such as spherical wavefront propagation and spatially non-stationary (SnS) properties. Combined with dual-wideband impacts, these effects fundamentally reshape the sparsity patterns of wideband XL-MIMO channels in the angular-delay domain, making existing sparsity-based channel estimation methods inadequate. To address these challenges, this paper revisits the channel estimation problem for wideband XL-MIMO systems, considering dual-wideband effects, spherical wavefront, and SnS properties. By leveraging the spatial-chirp property of near-field array responses, we quantitatively characterize the sparsity patterns of wideband XL-MIMO channels in the angular-delay domain, revealing global block sparsity and local common-delay sparsity. Building on this structured sparsity, we formulate the wideband XL-MIMO channel estimation problem as a multiple measurement vector (MMV)-based Bayesian inference task and propose a novel column-wise hierarchical prior model to effectively capture the sparsity characteristics. To enable efficient channel reconstruction, we develop an MMV-based variational message passing (MMV-VMP) algorithm, tailored to the complex factor graph induced by the hierarchical prior. Simulation results validate the proposed algorithm, demonstrating its convergence and superior performance compared to existing methods, thus establishing its effectiveness in addressing the challenges of wideband XL-MIMO channel estimation under complex near-field conditions.

Submitted 16 June, 2025; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: A major revision version has been submitted to IEEE journal for possible publication

arXiv:2407.05289 [pdf, other]

DM-MIMO: Diffusion Models for Robust Semantic Communications over MIMO Channels

Authors: Yiheng Duan, Tong Wu, Zhiyong Chen, Meixia Tao

Abstract: This paper investigates robust semantic communications over multiple-input multiple-output (MIMO) fading channels. Current semantic communications over MIMO channels mainly focus on channel adaptive encoding and decoding, which lacks exploration of signal distribution. To leverage the potential of signal distribution in signal space denoising, we develop a diffusion model over MIMO channels (DM-MIMO), a plugin module at the receiver side in conjunction with singular value decomposition (SVD) based precoding and equalization. Specifically, due to the significant variations in effective noise power over distinct sub-channels, we determine the effective sampling steps accordingly and devise a joint sampling algorithm. Utilizing a three-stage training algorithm, DM-MIMO learns the distribution of the encoded signal, which enables noise elimination over all sub-channels. Experimental results demonstrate that the DM-MIMO effectively reduces the mean square errors (MSE) of the equalized signal and the DM-MIMO semantic communication system (DM-MIMO-JSCC) outperforms the JSCC-based semantic communication system in image reconstruction.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.00304 [pdf, ps, other]

doi 10.1109/JPROC.2025.3584656

A Review of Safe Reinforcement Learning Methods for Modern Power Systems

Authors: Tong Su, Tong Wu, Junbo Zhao, Anna Scaglione, Le Xie

Abstract: Given the availability of more comprehensive measurement data in modern power systems, reinforcement learning (RL) has gained significant interest in operation and control. Conventional RL relies on trial-and-error interactions with the environment and reward feedback, which often leads to exploring unsafe operating regions and executing unsafe actions, especially when deployed in real-world power systems. To address these challenges, safe RL has been proposed to optimize operational objectives while ensuring safety constraints are met, keeping actions and states within safe regions throughout both training and deployment. Rather than relying solely on manually designed penalty terms for unsafe actions, as is common in conventional RL, safe RL methods reviewed here primarily leverage advanced and proactive mechanisms. These include techniques such as Lagrangian relaxation, safety layers, and theoretical guarantees like Lyapunov functions to rigorously enforce safety boundaries. This paper provides a comprehensive review of safe RL methods and their applications across various power system operations and control domains, including security control, real-time operation, operational planning, and emerging areas. It summarizes existing safe RL techniques, evaluates their performance, analyzes suitable deployment scenarios, and examines algorithm benchmarks and application environments. The paper also highlights real-world implementation cases and identifies critical challenges such as scalability in large-scale systems and robustness under uncertainty, providing potential solutions and outlining future directions to advance the reliable integration and deployment of safe RL in modern power systems.

Submitted 25 June, 2025; v1 submitted 28 June, 2024; originally announced July 2024.

Journal ref: Proceedings of the IEEE, 2025

arXiv:2406.16990 [pdf, other]

AND: Audio Network Dissection for Interpreting Deep Acoustic Models

Authors: Tung-Yu Wu, Yu-Xiang Lin, Tsui-Wei Weng

Abstract: Neuron-level interpretations aim to explain network behaviors and properties by investigating neurons responsive to specific perceptual or structural input patterns. Although there is emerging work in the vision and language domains, none is explored for acoustic models. To bridge the gap, we introduce $\textit{AND}$, the first $\textbf{A}$udio $\textbf{N}$etwork $\textbf{D}$issection framework that automatically establishes natural language explanations of acoustic neurons based on highly-responsive audio. $\textit{AND}$ features the use of LLMs to summarize mutual acoustic features and identities among audio. Extensive experiments are conducted to verify $\textit{AND}$'s precise and informative descriptions. In addition, we demonstrate a potential use of $\textit{AND}$ for audio machine unlearning by conducting concept-specific pruning based on the generated descriptions. Finally, we highlight two acoustic model behaviors with analysis by $\textit{AND}$: (i) models discriminate audio with a combination of basic acoustic features rather than high-level abstract concepts; (ii) training strategies affect model behaviors and neuron interpretability -- supervised training guides neurons to gradually narrow their attention, while self-supervised learning encourages neurons to be polysemantic for exploring high-level features.

Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted by ICML'24

Journal ref: Forty-first International Conference on Machine Learning (2024)

arXiv:2406.16876 [pdf, other]

Near-Field Mobile Tracking: A Framework of Using XL-RIS Information

Authors: Tuo Wu, Cunhua Pan, Kangda Zhi, Junteng Yao, Hong Ren, Maged Elkashlan, Chau Yuen

Abstract: This paper introduces a novel mobile tracking framework leveraging the high-dimensional signal received from extremely large-scale (XL) reconfigurable intelligent surfaces (RIS). This received signal, named XL-RIS information, has a much larger data dimension and therefore offers a richer feature set compared to the traditional base station (BS) received signal, i.e., BS information, enabling more accurate tracking of mobile users (MUs). As the first step, we present an XL-RIS information reconstruction (XL-RIS-IR) algorithm to reconstruct the high-dimensional XL-RIS information from the low-dimensional BS information. Building on this, this paper proposes a comprehensive framework for mobile tracking, consisting of a Feature Extraction Module and a Mobile Tracking Module. The Feature Extraction Module incorporates a convolutional neural network (CNN) extractor for spatial features, a time and frequency (T$\&$F) extractor for domain features, and a near-field angles of arrival (AoAs) extractor for capturing AoA features within the XL-RIS. These features are combined into a comprehensive feature vector, forming a time-varying sequence fed into the Mobile Tracking Module, which employs an Auto-encoder (AE) with a stacked bidirectional long short-term memory (Bi-LSTM) encoder and a standard LSTM decoder to predict MUs' positions in the upcoming time slot. Simulation results confirm that the tracking accuracy of our proposed framework is significantly enhanced by using reconstructed XL-RIS information and exhibits substantial robustness to signal-to-noise ratio (SNR) variations.

Submitted 5 August, 2024; v1 submitted 3 April, 2024; originally announced June 2024.

Showing 1–50 of 136 results for author: Wu, T