Enhancing Environment Generalizability for Deep Learning-Based CSI Feedback

Authors: Haoyu Wang, Shuangfeng Han, Xiaoyun Wang, Zhi Sun

Abstract: Accurate and low-overhead channel state information (CSI) feedback is essential to boost the capacity of frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) systems. Deep learning-based CSI feedback significantly outperforms conventional approaches. Nevertheless, current deep learning-based CSI feedback algorithms exhibit limited generalizability to unseen environments, which obviously increases the deployment cost. In this paper, we first model the distribution shift of CSI across different environments, which is composed of the distribution shift of multipath structure and a single path. Then, EG-CsiNet is proposed as a novel CSI feedback learning framework to enhance environment-generalizability. Explicitly, EG-CsiNet comprises the modules of multipath decoupling and fine-grained alignment, which can address the distribution shift of multipath structure and a single path. Based on extensive simulations, the proposed EG-CsiNet can robustly enhance the generalizability in unseen environments compared to the state-of-the-art, especially in challenging conditions with a single source environment.

Submitted 9 July, 2025; originally announced July 2025.

arXiv:2507.03609 [pdf, ps, other]

Implicit Neural Representation of Beamforming for Continuous Aperture Array (CAPA) System

Authors: Shiyong Chen, Jia Guo, Shengqian Han

Abstract: In this paper, a learning-based approach for optimizing downlink beamforming in continuous aperture array (CAPA) systems is proposed, where a MIMO scenario that both the base station (BS) and the user are equipped with CAPA is considered. As the beamforming in the CAPA system is a function that maps a coordinate on the aperture to the beamforming weight at the coordinate, a DNN called BeaINR is proposed to parameterize this function, which is called implicit neural representation (INR). We further find that the optimal beamforming function lies in the subspace of channel function, i.e., it can be expressed as a weighted integral of channel function. Based on this finding, we propose another DNN called CoefINR to learn the weighting coefficient with INR, which has lower complexity than learning the beamforming function with BeaINR. Simulation results show that the proposed INR-based methods outperform numerical baselines in both spectral efficiency (SE) and inference time, with CoefINR offering additional training efficiency.

Submitted 4 July, 2025; originally announced July 2025.

Comments: 5 pages, 3 figures

arXiv:2506.16741 [pdf, ps, other]

RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching

Authors: Hyun Joon Park, Jeongmin Liu, Jin Sob Kim, Jeong Yeol Yang, Sung Won Han, Eunwoo Song

Abstract: We introduce RapFlow-TTS, a rapid and high-fidelity TTS acoustic model that leverages velocity consistency constraints in flow matching (FM) training. Although ordinary differential equation (ODE)-based TTS generation achieves natural-quality speech, it typically requires a large number of generation steps, resulting in a trade-off between quality and inference speed. To address this challenge, RapFlow-TTS enforces consistency in the velocity field along the FM-straightened ODE trajectory, enabling consistent synthetic quality with fewer generation steps. Additionally, we introduce techniques such as time interval scheduling and adversarial learning to further enhance the quality of the few-step synthesis. Experimental results show that RapFlow-TTS achieves high-fidelity speech synthesis with a 5- and 10-fold reduction in synthesis steps compared with the conventional FM- and score-based approaches, respectively.

Submitted 20 June, 2025; originally announced June 2025.

Comments: Accepted on Interspeech 2025

arXiv:2506.09375 [pdf, ps, other]

CoLMbo: Speaker Language Model for Descriptive Profiling

Authors: Massa Baali, Shuo Han, Syed Abdul Hannan, Purusottam Samal, Karanveer Singh, Soham Deshmukh, Rita Singh, Bhiksha Raj

Abstract: Speaker recognition systems are often limited to classification tasks and struggle to generate detailed speaker characteristics or provide context-rich descriptions. These models primarily extract embeddings for speaker identification but fail to capture demographic attributes such as dialect, gender, and age in a structured manner. This paper introduces CoLMbo, a Speaker Language Model (SLM) that addresses these limitations by integrating a speaker encoder with prompt-based conditioning. This allows for the creation of detailed captions based on speaker embeddings. CoLMbo utilizes user-defined prompts to adapt dynamically to new speaker characteristics and provides customized descriptions, including regional dialect variations and age-related traits. This innovative approach not only enhances traditional speaker profiling but also excels in zero-shot scenarios across diverse datasets, marking a significant advancement in the field of speaker recognition.

Submitted 10 June, 2025; originally announced June 2025.

arXiv:2506.06400 [pdf, ps, other]

ResPF: Residual Poisson Flow for Efficient and Physically Consistent Sparse-View CT Reconstruction

Authors: Changsheng Fang, Yongtong Liu, Bahareh Morovati, Shuo Han, Yu Shi, Li Zhou, Shuyi Fan, Hengyong Yu

Abstract: Sparse-view computed tomography (CT) is a practical solution to reduce radiation dose, but the resulting ill-posed inverse problem poses significant challenges for accurate image reconstruction. Although deep learning and diffusion-based methods have shown promising results, they often lack physical interpretability or suffer from high computational costs due to iterative sampling starting from random noise. Recent advances in generative modeling, particularly Poisson Flow Generative Models (PFGM), enable high-fidelity image synthesis by modeling the full data distribution. In this work, we propose Residual Poisson Flow (ResPF) Generative Models for efficient and accurate sparse-view CT reconstruction. Based on PFGM++, ResPF integrates conditional guidance from sparse measurements and employs a hijacking strategy to significantly reduce sampling cost by skipping redundant initial steps. However, skipping early stages can degrade reconstruction quality and introduce unrealistic structures. To address this, we embed a data-consistency into each iteration, ensuring fidelity to sparse-view measurements. Yet, PFGM sampling relies on a fixed ordinary differential equation (ODE) trajectory induced by electrostatic fields, which can be disrupted by step-wise data consistency, resulting in unstable or degraded reconstructions. Inspired by ResNet, we introduce a residual fusion module to linearly combine generative outputs with data-consistent reconstructions, effectively preserving trajectory continuity.
To the best of our knowledge, this is the first application of Poisson flow models to sparse-view CT. Extensive experiments on synthetic and clinical datasets demonstrate that ResPF achieves superior reconstruction quality, faster inference, and stronger robustness compared to state-of-the-art iterative, learning-based, and diffusion models.

Submitted 5 June, 2025; originally announced June 2025.

arXiv:2506.02197 [pdf, ps, other]

NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution

Authors: Marcos V. Conde, Radu Timofte, Zihao Lu, Xiangyu Kong, Xiaoxia Xing, Fan Wang, Suejin Han, MinKyu Park, Tianyu Zhang, Xin Luo, Yeda Chen, Dong Liu, Li Pang, Yuhang Yang, Hongzhong Wang, Xiangyong Cao, Ruixuan Jiang, Senyan Xu, Siyuan Jiang, Xueyang Fu, Zheng-Jun Zha, Tianyu Hao, Yuhong He, Ruoqi Li, Yueqi Yang , et al. (14 additional authors not shown)

Abstract: This paper reviews the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Restoration and Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is twofold: (i) restore RAW images with blur and noise degradations, (ii) upscale RAW Bayer images by 2x, considering unknown noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. This report presents the current state-of-the-art in RAW Restoration.

Submitted 4 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

Comments: CVPR 2025 - New Trends in Image Restoration and Enhancement (NTIRE)

arXiv:2506.01460 [pdf, ps, other]

Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement

Authors: Seungu Han, Sungho Lee, Juheon Lee, Kyogu Lee

Abstract: Deep generative models have recently been employed for speech enhancement to generate perceptually valid clean speech on large-scale datasets. Several diffusion models have been proposed, and more recently, a tractable Schrödinger Bridge has been introduced to transport between the clean and noisy speech distributions. However, these models often suffer from an iterative reverse process and require a large number of sampling steps -- more than 50. Our investigation reveals that the performance of baseline models significantly degrades when the number of sampling steps is reduced, particularly under low-SNR conditions. We propose integrating Schrödinger Bridge with GANs to effectively mitigate this issue, achieving high-quality outputs on full-band datasets while substantially reducing the required sampling steps. Experimental results demonstrate that our proposed model outperforms existing baselines, even with a single inference step, in both denoising and dereverberation tasks.

Submitted 2 June, 2025; originally announced June 2025.

Comments: Accepted to Interspeech 2025

arXiv:2505.13867 [pdf, other]

Generalizable Learning for Frequency-Domain Channel Extrapolation under Distribution Shift

Authors: Haoyu Wang, Zhi Sun, Shuangfeng Han, Xiaoyun Wang, Zhaocheng Wang

Abstract: Frequency-domain channel extrapolation is effective in reducing pilot overhead for massive multiple-input multiple-output (MIMO) systems. Recently, the deep learning (DL)-based channel extrapolator has become a promising candidate for modeling complex frequency-domain dependency. Nevertheless, current DL extrapolators fail to operate in unseen environments under distribution shift, which poses challenges for large-scale deployment. In this paper, environment generalizable learning for channel extrapolation is achieved by realizing distribution alignment from a physics perspective. Firstly, the distribution shift of wireless channels is rigorously analyzed, which comprises the distribution shift of multipath structure and single-path response. Secondly, a physics-based progressive distribution alignment strategy is proposed to address the distribution shift, which includes successive path-oriented design and path alignment. Path-oriented DL extrapolator decomposes multipath channel extrapolation into parallel extrapolations of the extracted path, which can mitigate the distribution shift of multipath structure. Path alignment is proposed to address the distribution shift of single-path response in path-oriented DL extrapolators, which eventually enables generalizable learning for channel extrapolation. In the simulation, distinct wireless environments are generated using the precise ray-tracing tool.
Based on extensive evaluations, the proposed path-oriented DL extrapolator with path alignment can reduce extrapolation error by more than 6 dB in unseen environments compared to the state of the art.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2504.19522 [pdf, other]

A Model-based DNN for Learning HMIMO Beamforming

Authors: Shiyong Chen, Shengqian Han

Abstract: Holographic MIMO (HMIMO) is a promising technique for large-scale MIMO systems to enhance spectral efficiency while maintaining low hardware cost and power consumption. Existing alternating optimization algorithms can effectively optimize the hybrid beamforming of HMIMO to improve the system performance, while their high computational complexity hinders real-time application. In this paper, we propose a model-based deep neural network (MB-DNN), which leverages permutation equivalent properties and the optimal beamforming structure to jointly optimize the holographic and digital beamforming. Simulation results demonstrate that the proposed MB-DNN outperforms benchmark schemes and requires much less inference time than existing alternating optimization algorithms.

Submitted 28 April, 2025; originally announced April 2025.

Comments: 5 pages, 4 figures

MSC Class: 68T07; 90B18; 94A05

arXiv:2504.13276 [pdf, other]

Strategic Planning of Stealthy Backdoor Attacks in Markov Decision Processes

Authors: Xinyi Wei, Shuo Han, Ahmed H. Hemida, Charles A. Kamhoua, Jie Fu

Abstract: This paper investigates backdoor attack planning in stochastic control systems modeled as Markov Decision Processes (MDPs). In a backdoor attack, the adversary provides a control policy that behaves well in the original MDP to pass the testing phase. However, when such a policy is deployed with a trigger policy, which perturbs the system dynamics at runtime, it optimizes the attacker's objective instead. To solve jointly the control policy and its trigger, we formulate the attack planning problem as a constrained optimal planning problem in an MDP with augmented state space, with the objective to maximize the attacker's total rewards in the system with an activated trigger, subject to the constraint that the control policy is near optimal in the original MDP. We then introduce a gradient-based optimization method to solve the optimal backdoor attack policy as a pair of coordinated control and trigger policies. Experimental results from a case study validate the effectiveness of our approach in achieving stealthy backdoor attacks.

Submitted 17 April, 2025; originally announced April 2025.

arXiv:2504.04097 [pdf, other]

Risk-Aware Robot Control in Dynamic Environments Using Belief Control Barrier Functions

Authors: Shaohang Han, Matti Vahs, Jana Tumova

Abstract: Ensuring safety for autonomous robots operating in dynamic environments can be challenging due to factors such as unmodeled dynamics, noisy sensor measurements, and partial observability. To account for these limitations, it is common to maintain a belief distribution over the true state. This belief could be a non-parametric, sample-based representation to capture uncertainty more flexibly. In this paper, we propose a novel form of Belief Control Barrier Functions (BCBFs) specifically designed to ensure safety in dynamic environments under stochastic dynamics and a sample-based belief about the environment state. Our approach incorporates provable concentration bounds on tail risk measures into BCBFs, effectively addressing possible multimodal and skewed belief distributions represented by samples. Moreover, the proposed method demonstrates robustness against distributional shifts up to a predefined bound. We validate the effectiveness and real-time performance (approximately 1kHz) of the proposed method through two simulated underwater robotic applications: object tracking and dynamic collision avoidance.

Submitted 5 April, 2025; originally announced April 2025.

arXiv:2504.00361 [pdf, ps, other]

Adaptive Radar Detection in joint Range and Azimuth based on the Hierarchical Latent Variable Model

Authors: Linjie Yan, Chengpeng Hao, Sudan Han, Giuseppe Ricci, Zhanhao Hu, Danilo Orlando

Abstract: This paper focuses on the design of a robust decision scheme capable of operating in target-rich scenarios with unknown signal signatures (including their range positions, angles of arrival, and number) in a background of Gaussian disturbance. To solve the problem at hand, a novel estimation procedure is conceived resorting to the expectation-maximization algorithm in conjunction with the hierarchical latent variable model that are exploited to come up with a maximum a posteriori rule for reliable signal classification and angle of arrival estimation. The estimates returned by the procedure are then used to build up an adaptive detection architecture in range and azimuth based on the likelihood ratio test with enhanced detection performance. Remarkably, it is shown that the new decision scheme can maintain constant the false alarm rate when the interference parameters vary in the considered range of values. The performance assessment, conducted by means of Monte Carlo simulation, highlights that the proposed detector exhibits superior detection performance in comparison with the existing GLRT-based competitors.

Submitted 31 March, 2025; originally announced April 2025.

arXiv:2503.20490 [pdf, other]

Model Predictive Control for Tracking Bounded References With Arbitrary Dynamics

Authors: Shibo Han, Bonan Hou, Yuhao Zhang, Xiaotong Shi, Xingwei Zhao

Abstract: In this article, a model predictive control (MPC) method is proposed for constrained linear systems to track bounded references with arbitrary dynamics. Besides control inputs to be determined, artificial reference is introduced as additional decision variable, which serves as an intermediate target to cope with sudden changes of reference and enlarges domain of attraction. Cost function penalizes both artificial state error and reference error, while terminal constraint is imposed on artificial state error and artificial reference. We specify the requirements for terminal constraint and cost function to guarantee recursive feasibility of the proposed method and asymptotic stability of tracking error. Then, periodic and non-periodic references are considered and the method to determine required cost function and terminal constraint is proposed. Finally, the efficiency of the proposed MPC controller is demonstrated with simulation examples.

Submitted 26 March, 2025; originally announced March 2025.

arXiv:2503.12308 [pdf]

AI-driven 6G Air Interface: Technical Usage Scenarios and Balanced Design Methodology

Authors: Xiaoyun Wang, Shuangfeng Han, Zhiming Liu, Qixing Wang, Jiangzhou Wang, Chih-Lin I

Abstract: This paper systematically analyzes the typical application scenarios and key technical challenges of AI in 6G air interface transmission, covering important areas such as performance enhancement of single functional modules, joint optimization of multiple functional modules, and low-complexity solutions to complex mathematical problems. Innovatively, a three-dimensional joint optimization design criterion is proposed, which comprehensively considers AI capability, quality, and cost. By maximizing the ratio of multi-scenario communication capability to comprehensive cost, a triangular equilibrium is achieved, effectively addressing the lack of consideration for quality and cost dimensions in existing design criteria. The effectiveness of the proposed method is validated through multiple design examples, and the technical pathways and challenges for air interface AI standardization are thoroughly discussed. This provides significant references for the theoretical research and engineering practice of 6G air interface AI technology.

Submitted 15 March, 2025; originally announced March 2025.

Comments: 19 pages, in Chinese language, 1 figure, 20 references

arXiv:2503.11787 [pdf, ps, other]

ECLARE: Efficient cross-planar learning for anisotropic resolution enhancement

Authors: Samuel W. Remedios, Shuwen Wei, Shuo Han, Jinwei Zhang, Aaron Carass, Kurt G. Schilling, Dzung L. Pham, Jerry L. Prince, Blake E. Dewey

Abstract: In clinical imaging, magnetic resonance (MR) image volumes are often acquired as stacks of 2D slices with decreased scan times, improved signal-to-noise ratio, and image contrasts unique to 2D MR pulse sequences. While this is sufficient for clinical evaluation, automated algorithms designed for 3D analysis perform poorly on multi-slice 2D MR volumes, especially those with thick slices and gaps between slices. Super-resolution (SR) methods aim to address this problem, but previous methods do not address all of the following: slice profile shape estimation, slice gap, domain shift, and non-integer or arbitrary upsampling factors. In this paper, we propose ECLARE (Efficient Cross-planar Learning for Anisotropic Resolution Enhancement), a self-SR method that addresses each of these factors. ECLARE uses a slice profile estimated from the multi-slice 2D MR volume, trains a network to learn the mapping from low-resolution to high-resolution in-plane patches from the same volume, and performs SR with anti-aliasing. We compared ECLARE to cubic B-spline interpolation, SMORE, and other contemporary SR methods. We used realistic and representative simulations so that quantitative performance against ground truth can be computed, and ECLARE outperformed all other methods in both signal recovery and downstream tasks. Importantly, as ECLARE does not use external training data it cannot suffer from domain shift between training and testing. Our code is open-source and available at https://www.github.com/sremedios/eclare.

Submitted 21 May, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

arXiv:2503.09398 [pdf, other]

Precoder Learning by Leveraging Unitary Equivariance Property

Authors: Yilun Ge, Shuyao Liao, Shengqian Han, Chenyang Yang

Abstract: Incorporating mathematical properties of a wireless policy to be learned into the design of deep neural networks (DNNs) is effective for enhancing learning efficiency. Multi-user precoding policy in multi-antenna system, which is the mapping from channel matrix to precoding matrix, possesses a permutation equivariance property, which has been harnessed to design the parameter sharing structure of the weight matrix of DNNs. In this paper, we study a stronger property than permutation equivariance, namely unitary equivariance, for precoder learning. We first show that a DNN with unitary equivariance designed by further introducing parameter sharing into a permutation equivariant DNN is unable to learn the optimal precoder. We proceed to develop a novel non-linear weighting process satisfying unitary equivariance and then construct a joint unitary and permutation equivariant DNN. Simulation results demonstrate that the proposed DNN not only outperforms existing learning methods in learning performance and generalizability but also reduces training complexity.

Submitted 12 March, 2025; originally announced March 2025.

arXiv:2503.08125 [pdf, other]

Quantization Design for Deep Learning-Based CSI Feedback

Authors: Manru Yin, Shengqian Han, Chenyang Yang

Abstract: Deep learning-based autoencoders have been employed to compress and reconstruct channel state information (CSI) in frequency-division duplex systems. Practical implementations require judicious quantization of encoder outputs for digital transmission. In this paper, we propose a novel quantization module with bit allocation among encoder outputs and develop a method for jointly training the module and the autoencoder. To enhance learning performance, we design a loss function that adaptively weights the quantization loss and the logarithm of reconstruction loss. Simulation results show the performance gain of the proposed method over existing baselines.

Submitted 11 March, 2025; originally announced March 2025.

arXiv:2503.06875 [pdf, other]

Distributed Resource Block Allocation for Wideband Cell-free System

Authors: Yang Ma, Shengqian Han, Chenyang Yang

Abstract: This paper studies distributed resource block (RB) allocation in wideband orthogonal frequency-division multiplexing (OFDM) cell-free systems. We propose a novel distributed sequential algorithm and its two variants, which optimize RB allocation based on the information obtained through over-the-air (OTA) transmissions between access points (APs) and user equipments, enabling local decision updates at each AP. To reduce the overhead of OTA transmission, we further develop a distributed deep learning (DL)-based method to learn the RB allocation policy. Simulation results demonstrate that the proposed distributed algorithms perform close to the centralized algorithm, while the DL-based method outperforms existing baseline methods.

Submitted 9 March, 2025; originally announced March 2025.

arXiv:2503.06638 [pdf, other]

Learning of Uplink Resource Allocation with Multiuser QoS Constraints

Authors: Manru Yin, Shengqian Han, Chenyang Yang

Abstract: In this paper, the joint optimization of uplink multiuser power and resource block (RB) allocation is studied, where each user has quality of service (QoS) constraints on both long- and short-blocklength transmissions. The objective is to minimize the consumption of RBs for meeting the QoS requirements, leading to a mixed-integer nonlinear programming (MINLP) problem. We resort to deep learning to solve the problem with low inference complexity. To provide a performance benchmark for learning-based methods, we propose a hierarchical algorithm to find the global optimal solution in the single-user scenario, which is then extended to the multiuser scenario. The design of the learning method, however, is challenging due to the discrete policy to be learned, which results in either vanishing or exploding gradients during neural network training. We introduce two types of smoothing functions to approximate the involved discretizing processes and propose a smoothing parameter adaption method. Another critical challenge lies in guaranteeing the QoS constraints. To address it, we design a nonlinear function to intensify the penalties for minor constraint violations. Simulation results demonstrate the advantages of the proposed method in reducing the number of occupied RBs and satisfying QoS constraints reliably.

Submitted 9 March, 2025; originally announced March 2025.

arXiv:2503.06077 [pdf, other]

Gradient-Driven Graph Neural Networks for Learning Digital and Hybrid Precoder

Authors: Lin Zhang, Shengqian Han, Chenyang Yang

Abstract: The optimization of multi-user multi-input multi-output (MU-MIMO) precoders is a widely recognized challenging problem. Existing work has demonstrated the potential of graph neural networks (GNNs) in learning precoding policies. However, existing GNNs often exhibit poor generalizability for the numbers of users or antennas. In this paper, we develop a gradient-driven GNN design method for the learning of fully digital and hybrid precoding policies. The proposed GNNs leverage two kinds of knowledge, namely the gradient of signal-to-interference-plus-noise ratio (SINR) to the precoders and the permutation equivariant property of the precoding policy. To demonstrate the flexibility of the proposed method for accommodating different optimization objectives and different precoding policies, we first apply the proposed method to learn the fully digital precoding policies. We study two precoder optimization problems for spectral efficiency (SE) maximization and log-SE maximization to achieve proportional fairness. We then apply the proposed method to learn the hybrid precoding policy, where the gradients to analog and digital precoders are exploited for the design of the GNN. Simulation results show the effectiveness of the proposed methods for learning different precoding policies and better generalization performance to the numbers of both users and antennas compared to baseline GNNs.

Submitted 8 March, 2025; originally announced March 2025.

arXiv:2503.04497 [pdf, other]

Precoder Learning for Weighted Sum Rate Maximization

Authors: Mingyu Deng, Shengqian Han

Abstract: Weighted sum rate maximization (WSRM) for precoder optimization effectively balances performance and fairness among users. Recent studies have demonstrated the potential of deep learning in precoder optimization for sum rate maximization. However, the WSRM problem necessitates a redesign of neural network architectures to incorporate user weights into the input. In this paper, we propose a novel deep neural network (DNN) to learn the precoder for WSRM. Compared to existing DNNs, the proposed DNN leverages the joint unitary and permutation equivariant property inherent in the optimal precoding policy, effectively enhancing learning performance while reducing training complexity. Simulation results demonstrate that the proposed method significantly outperforms baseline learning methods in terms of both learning and generalization performance while maintaining low training and inference complexity.

Submitted 6 March, 2025; originally announced March 2025.

arXiv:2503.04233 [pdf, other]

Learning Wideband User Scheduling and Hybrid Precoding with Graph Neural Networks

Authors: Shengjie Liu, Chenyang Yang, Shengqian Han

Abstract: Spatial-frequency scheduling and hybrid precoding in wideband multi-user multi-antenna systems have never been learned jointly due to the challenges arising from the massive user combinations on resource blocks (RBs) and the shared analog precoder among RBs. In this paper, we strive to jointly learn the scheduling and precoding policies with graph neural networks (GNNs), which have emerged as a powerful tool for optimizing resource allocation thanks to their potential in generalizing across problem scales. By reformulating the joint optimization problem into an equivalent functional optimization problem for the scheduling and precoding policies, we propose a GNN-based architecture consisting of two cascaded modules to learn the two policies. We discover a same-parameter same-decision (SPSD) property for wireless policies defined on sets, revealing that a GNN cannot well learn the optimal scheduling policy when users have similar channels. This motivates us to develop a sequence of GNNs to enhance the scheduling module. Furthermore, by analyzing the SPSD property, we find when linear aggregators in GNNs impede size generalization. Based on the observation, we devise a novel attention mechanism for information aggregation in the precoder module. Simulation results demonstrate that the proposed architecture achieves satisfactory spectral efficiency with short inference time and low training complexity, and is generalizable to the numbers of users, RBs, and antennas at the base station and users.

Submitted 6 March, 2025; originally announced March 2025.

arXiv:2502.20311 [pdf, other]

Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications

Authors: Marcus Yu Zhe Wee, Justin Juin Hng Wong, Lynus Lim, Joe Yu Wei Tan, Prannaya Gupta, Dillion Lim, En Hao Tew, Aloysius Keng Siew Han, Yong Zhi Lim

Abstract: Effective communication in Air Traffic Control (ATC) is critical to maintaining aviation safety, yet the challenges posed by accented English remain largely unaddressed in Automatic Speech Recognition (ASR) systems. Existing models struggle with transcription accuracy for Southeast Asian-accented (SEA-accented) speech, particularly in noisy ATC environments. This study presents the development of ASR models fine-tuned specifically for Southeast Asian accents using a newly created dataset. Our research achieves significant improvements, achieving a Word Error Rate (WER) of 0.0982 or 9.82% on SEA-accented ATC speech. Additionally, the paper highlights the importance of region-specific datasets and accent-focused training, offering a pathway for deploying ASR systems in resource-constrained military operations. The findings emphasize the need for noise-robust training techniques and region-specific datasets to improve transcription accuracy for non-Western accents in ATC communications.

Submitted 27 February, 2025; originally announced February 2025.

arXiv:2502.18777 [pdf, other]

Hyperspectral image reconstruction by deep learning with super-Rayleigh speckles

Authors: Ziyan Chen, Zhentao Liu, Jianrong Wu, Shensheng Han

Abstract: Ghost imaging via sparsity constraints (GISC) spectral camera modulates the three-dimensional (3D) hyperspectral image into a two-dimensional (2D) compressive image with speckles in a single shot. It obtains a 3D hyperspectral image (HSI) by reconstruction algorithms. The rapid development of deep learning has provided a new method for 3D HSI reconstruction. Moreover, the imaging performance of the GISC spectral camera can be improved by optimizing the speckle modulation. In this paper, we propose an end-to-end GISCnet with super-Rayleigh speckle modulation to improve the imaging quality of the GISC spectral camera. The structure of GISCnet is very simple but effective, and we can easily adjust the network structure parameters to improve the image reconstruction quality. Relative to Rayleigh speckles, our super-Rayleigh speckles modulation exhibits a wealth of detail in reconstructing 3D HSIs. After evaluating 648 3D HSIs, it was found that the average peak signal-to-noise ratio increased from 27 dB to 31 dB. Overall, the proposed GISCnet with super-Rayleigh speckle modulation can effectively improve the imaging quality of the GISC spectral camera by taking advantage of both optimized super-Rayleigh modulation and deep-learning image reconstruction, inspiring joint optimization of light-field modulation and image reconstruction to improve ghost imaging performance.

Submitted 25 February, 2025; originally announced February 2025.

arXiv:2502.17502 [pdf, ps, other]

Complex Electromagnetic Space Combat System-of-systems Modeling and Key Node Identification Method

Authors: Xiao Liu, Sudan Han, Jinlin Peng

Abstract: With the application of advanced science and technology in the military field, modern warfare has developed into a confrontation between systems. The combat system-of-systems (CSoS) has numerous nodes, multiple attributes and complex interactions, and its research and analysis are facing great difficulties. Electromagnetic space is an important dimension of modern warfare. Modeling and analyzing the CSoS from this perspective is of great significance to studying modern warfare and can provide a reference for the research of electromagnetic warfare. In this study, the types of nodes and relationships in the complex electromagnetic space of CSoS are first divided, the important attributes of the combat nodes are extracted, and the relationship weights are normalized to establish a networked model. On this basis, the calculation method of CSoS combat effectiveness based on the combat cycle is proposed, and then the identification and sorting of key nodes can be realized by the node deletion method. Finally, by constructing an instance of aircraft carrier fleet confrontation, the feasibility of this method has been verified, and the experimental results have been compared with classical algorithms to demonstrate the advanced nature of this method.

Submitted 21 February, 2025; originally announced February 2025.

Comments: conference paper, already accepted but not published

arXiv:2502.07065 [pdf, other]

Active Inference through Incentive Design in Markov Decision Processes

Authors: Xinyi Wei, Chongyang Shi, Shuo Han, Ahmed H. Hemida, Charles A. Kamhoua, Jie Fu

Abstract: We present a method for active inference with partial observations in stochastic systems through incentive design, also known as the leader-follower game. Consider a leader agent who aims to infer a follower agent's type given a finite set of possible types. Different types of followers differ in either the dynamical model, the reward function, or both. We assume the leader can partially observe a follower's behavior in the stochastic system modeled as a Markov decision process, in which the follower takes an optimal policy to maximize a total reward. To improve inference accuracy and efficiency, the leader can offer side payments (incentives) to the followers such that different types of them, under the incentive design, can exhibit diverging behaviors that facilitate the leader's inference task. We show the problem of active inference through incentive design can be formulated as a special class of leader-follower games, where the leader's objective is to balance the information gain and cost of incentive design. The information gain is measured by the entropy of the estimated follower's type given partial observations. Furthermore, we demonstrate that this problem can be solved by reducing it to a single-level optimization through softmax temporal consistency between followers' policies and value functions. This reduction allows us to develop an efficient gradient-based algorithm. We utilize observable operators in the hidden Markov model (HMM) to compute the necessary gradients and demonstrate the effectiveness of our approach through experiments in stochastic grid world environments.

Submitted 10 February, 2025; originally announced February 2025.

Comments: 8 pages

arXiv:2502.04476 [pdf, other]

ADIFF: Explaining audio difference using natural language

Authors: Soham Deshmukh, Shuo Han, Rita Singh, Bhiksha Raj

Abstract: Understanding and explaining differences between audio recordings is crucial for fields like audio forensics, quality assessment, and audio generation. This involves identifying and describing audio events, acoustic scenes, signal characteristics, and their emotional impact on listeners. This paper stands out as the first work to comprehensively study the task of explaining audio differences and then propose a benchmark and baselines for the task. First, we present two new datasets for audio difference explanation derived from the AudioCaps and Clotho audio captioning datasets. Using Large Language Models (LLMs), we generate three levels of difference explanations: (1) concise descriptions of audio events and objects, (2) brief sentences about audio events, acoustic scenes, and signal properties, and (3) comprehensive explanations that include semantics and listener emotions. For the baseline, we use prefix tuning where audio embeddings from two audio files are used to prompt a frozen language model. Our empirical analysis and ablation studies reveal that the naive baseline struggles to distinguish perceptually similar sounds and generate detailed tier 3 explanations. To address these limitations, we propose ADIFF, which introduces a cross-projection module, position captioning, and a three-step training process to enhance the model's ability to produce detailed explanations. We evaluate our model using objective metrics and human evaluation and show our model enhancements lead to significant improvements in performance over naive baseline and SoTA Audio-Language Model (ALM) Qwen Audio. Lastly, we conduct multiple ablation studies to study the effects of cross-projection, language model parameters, position captioning, third stage fine-tuning, and present our findings. Our benchmarks, findings, and strong baseline pave the way for nuanced and human-like explanations of audio differences.

Submitted 6 February, 2025; originally announced February 2025.

Comments: Accepted at ICLR 2025. Dataset and checkpoints are available at: https://github.com/soham97/ADIFF

arXiv:2501.15116 [pdf, other]

Path Evolution Model for Endogenous Channel Digital Twin towards 6G Wireless Networks

Authors: Haoyu Wang, Zhi Sun, Shuangfeng Han, Xiaoyun Wang, Shidong Zhou, Zhaocheng Wang

Abstract: Massive Multiple Input Multiple Output (MIMO) is critical for boosting 6G wireless network capacity. Nevertheless, high dimensional Channel State Information (CSI) acquisition becomes the bottleneck of 6G massive MIMO system. Recently, Channel Digital Twin (CDT), which replicates physical entities in wireless channels, has been proposed, providing site-specific prior knowledge for CSI acquisition. However, external devices (e.g., cameras and GPS devices) cannot always be integrated into existing communication systems, nor are they universally available across all scenarios. Moreover, the trained CDT model cannot be directly applied in new environments, which lacks environmental generalizability. To this end, Path Evolution Model (PEM) is proposed as an alternative CDT to reflect physical path evolutions from consecutive channel measurements. Compared to existing CDTs, PEM demonstrates virtues of full endogeneity, self-sustainability and environmental generalizability. Firstly, PEM only requires existing channel measurements, which is free of other hardware devices and can be readily deployed. Secondly, self-sustaining maintenance of PEM can be achieved in dynamic channel by progressive updates. Thirdly, environmental generalizability can greatly reduce deployment costs in dynamic environments. To facilitate the implementation of PEM, an intelligent and light-weighted operation framework is firstly designed. Then, the environmental generalizability of PEM is rigorously analyzed. Next, efficient learning approaches are proposed to reduce the amount of training data practically. Extensive simulation results reveal that PEM can simultaneously achieve high-precision and low-overhead CSI acquisition, which can serve as a fundamental CDT for 6G wireless networks.

Submitted 25 January, 2025; originally announced January 2025.

arXiv:2501.09877 [pdf, other]

CLAP-S: Support Set Based Adaptation for Downstream Fiber-optic Acoustic Recognition

Authors: Jingchen Sun, Shaobo Han, Wataru Kohno, Changyou Chen

Abstract: Contrastive Language-Audio Pretraining (CLAP) models have demonstrated unprecedented performance in various acoustic signal recognition tasks. Fiber-optic-based acoustic recognition is one of the most important downstream tasks and plays a significant role in environmental sensing. Adapting CLAP for fiber-optic acoustic recognition has become an active research area. As a non-conventional acoustic sensor, fiber-optic acoustic recognition presents a challenging, domain-specific, low-shot deployment environment with significant domain shifts due to unique frequency response and noise characteristics. To address these challenges, we propose a support-based adaptation method, CLAP-S, which linearly interpolates a CLAP Adapter with the Support Set, leveraging both implicit knowledge through fine-tuning and explicit knowledge retrieved from memory for cross-domain generalization. Experimental results show that our method delivers competitive performance on both laboratory-recorded fiber-optic ESC-50 datasets and a real-world fiber-optic gunshot-firework dataset. Our research also provides valuable insights for other downstream acoustic recognition tasks. The code and gunshot-firework dataset are available at https://github.com/Jingchensun/clap-s.

Submitted 16 January, 2025; originally announced January 2025.

Comments: Accepted to ICASSP 2025

arXiv:2501.06176 [pdf, other]

GR-WiFi: A GNU Radio based WiFi Platform with Single-User and Multi-User MIMO Capability

Authors: Natong Lin, Zelin Yun, Shengli Zhou, Song Han

Abstract: Since its first release, WiFi has been highly successful in providing wireless local area networks. The ever-evolving IEEE 802.11 standards continue to add new features to keep up with the trend of increasing numbers of mobile devices and the growth of Internet of Things (IoT) applications. Unfortunately, the lack of open-source IEEE 802.11 testbeds in the community limits the development and performance evaluation of those new features. Motivated by an existing popular open-source software-defined radio (SDR) package for single-user single-stream transmission based on the IEEE 802.11a/g/p standards, in this paper we present GR-WiFi, an open-source package for single-user and multi-user multi-input multi-output (MIMO) transmissions based on the 802.11n and 802.11ac standards. The distinct features of GR-WiFi include the support of parallel data streams to single or multiple users, and the compatible preamble processing to allow the co-existence of conventional, high-throughput (HT) and very-high-throughput (VHT) traffic. The performance of GR-WiFi is evaluated through both extensive simulation and real-world experiments.

Submitted 10 January, 2025; originally announced January 2025.

Comments: 11 pages, 18 figures

arXiv:2501.02572 [pdf, other]

Energy Optimization of Multi-task DNN Inference in MEC-assisted XR Devices: A Lyapunov-Guided Reinforcement Learning Approach

Authors: Yanzan Sun, Jiacheng Qiu, Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaoyun Wang, Shuangfeng Han

Abstract: Extended reality (XR), blending virtual and real worlds, is a key application of future networks. While AI advancements enhance XR capabilities, they also impose significant computational and energy challenges on lightweight XR devices. In this paper, we developed a distributed queue model for multi-task DNN inference, addressing issues of resource competition and queue coupling. In response to the challenges posed by the high energy consumption and limited resources of XR devices, we designed a dual time-scale joint optimization strategy for model partitioning and resource allocation, formulated as a bi-level optimization problem. This strategy aims to minimize the total energy consumption of XR devices while ensuring queue stability and adhering to computational and communication resource constraints. To tackle this problem, we devised a Lyapunov-guided Proximal Policy Optimization algorithm, named LyaPPO. Numerical results demonstrate that the LyaPPO algorithm outperforms the baselines, achieving energy conservation of 24.79% to 46.14% under varying resource capacities. Specifically, the proposed algorithm reduces the energy consumption of XR devices by 24.29% to 56.62% compared to baseline algorithms.

Submitted 5 January, 2025; originally announced January 2025.

Comments: 13 pages, 7 figures. This work has been submitted to the IEEE for possible publication

arXiv:2501.00842 [pdf, other]

A Survey of Secure Semantic Communications

Authors: Rui Meng, Song Gao, Dayu Fan, Haixiao Gao, Yining Wang, Xiaodong Xu, Bizhu Wang, Suyu Lv, Zhidi Zhang, Mengying Sun, Shujun Han, Chen Dong, Xiaofeng Tao, Ping Zhang

Abstract: Semantic communication (SemCom) is regarded as a promising and revolutionary technology in 6G, aiming to transcend the constraints of "Shannon's trap" by filtering out redundant information and extracting the core of effective data. Compared to traditional communication paradigms, SemCom offers several notable advantages, such as reducing the burden on data transmission, enhancing network management efficiency, and optimizing resource allocation. Numerous researchers have extensively explored SemCom from various perspectives, including network architecture, theoretical analysis, potential technologies, and future applications. However, as SemCom continues to evolve, a multitude of security and privacy concerns have arisen, posing threats to the confidentiality, integrity, and availability of SemCom systems. This paper presents a comprehensive survey of the technologies that can be utilized to secure SemCom. Firstly, we elaborate on the entire life cycle of SemCom, which includes the model training, model transfer, and semantic information transmission phases. Then, we identify the security and privacy issues that emerge during these three stages. Furthermore, we summarize the techniques available to mitigate these security and privacy threats, including data cleaning, robust learning, defensive strategies against backdoor attacks, adversarial training, differential privacy, cryptography, blockchain technology, model compression, and physical-layer security. Lastly, this paper outlines future research directions to guide researchers in related fields.

Submitted 26 March, 2025; v1 submitted 1 January, 2025; originally announced January 2025.

Comments: 160 pages, 27 figures

arXiv:2412.05322 [pdf, other]

$ρ$-NeRF: Leveraging Attenuation Priors in Neural Radiance Field for 3D Computed Tomography Reconstruction

Authors: Li Zhou, Changsheng Fang, Bahareh Morovati, Yongtong Liu, Shuo Han, Yongshun Xu, Hengyong Yu

Abstract: This paper introduces $ρ$-NeRF, a self-supervised approach that sets a new standard in novel view synthesis (NVS) and computed tomography (CT) reconstruction by modeling a continuous volumetric radiance field enriched with physics-based attenuation priors. The $ρ$-NeRF represents a three-dimensional (3D) volume through a fully-connected neural network that takes a single continuous four-dimensional (4D) coordinate, spatial location $(x, y, z)$ and an initialized attenuation value ($ρ$), and outputs the attenuation coefficient at that position. By querying these 4D coordinates along X-ray paths, the classic forward projection technique is applied to integrate attenuation data across the 3D space. By matching and refining pre-initialized attenuation values derived from traditional reconstruction algorithms like Feldkamp-Davis-Kress algorithm (FDK) or conjugate gradient least squares (CGLS), the enriched schema delivers superior fidelity in both projection synthesis and image recognition.

Submitted 3 December, 2024; originally announced December 2024.

Comments: The paper was submitted to CVPR 2025

arXiv:2412.02985 [pdf, other]

Robust Model Predictive Control for Constrained Uncertain Systems Based on Concentric Container and Varying Tube

Authors: Shibo Han, Yuhao Zhang, Xiaotong Shi, Xingwei Zhao

Abstract: This paper proposes a novel robust model predictive control (RMPC) method for the stabilization of constrained systems subject to additive disturbance (AD) and multiplicative disturbance (MD). Concentric containers are introduced to facilitate the characterization of MD, and varying tubes are constructed to bound reachable states. By restricting states and the corresponding inputs in containers with free sizes and a fixed shape, feasible MDs, which are the products of model uncertainty with states and inputs, are restricted into polytopes with free sizes. Then, tubes with different centers and shapes are constructed based on the nominal dynamics and the knowledge of AD and MD. The free sizes of containers allow for a more accurate characterization of MD, while the fixed shape reduces online computational burden, making the proposed method less conservative and computationally efficient. Moreover, the shape of containers is optimized to further reduce conservativeness. Compared to the RMPC method using homothetic tubes, the proposed method has a larger region of attraction while involving fewer decision variables and constraints in the online optimization problem.

Submitted 3 December, 2024; originally announced December 2024.

Comments: 13 pages, 6 figures

arXiv:2411.12776 [pdf, other]

Cross-Layer Encrypted Semantic Communication Framework for Panoramic Video Transmission

Authors: Haixiao Gao, Mengying Sun, Xiaodong Xu, Bingxuan Xu, Shujun Han, Bizhu Wang, Sheng Jiang, Chen Dong, Ping Zhang

Abstract: In this paper, we propose a cross-layer encrypted semantic communication (CLESC) framework for panoramic video transmission, incorporating feature extraction, encoding, encryption, cyclic redundancy check (CRC), and retransmission processes to achieve compatibility between semantic communication and traditional communication systems. Additionally, we propose an adaptive cross-layer transmission mechanism that dynamically adjusts CRC, channel coding, and retransmission schemes based on the importance of semantic information. This ensures that important information is prioritized under poor transmission conditions. To verify the aforementioned framework, we also design an end-to-end adaptive panoramic video semantic transmission (APVST) network that leverages a deep joint source-channel coding (Deep JSCC) structure and attention mechanism, integrated with a latitude adaptive module that facilitates adaptive semantic feature extraction and variable-length encoding of panoramic videos. The proposed CLESC is also applicable to the transmission of other modal data. Simulation results demonstrate that the proposed CLESC effectively achieves compatibility and adaptation between semantic communication and traditional communication systems, improving both transmission efficiency and channel adaptability. Compared to traditional cross-layer transmission schemes, the CLESC framework can reduce bandwidth consumption by 85% while showing significant advantages under low signal-to-noise ratio (SNR) conditions.

Submitted 19 November, 2024; originally announced November 2024.

arXiv:2411.09906 [pdf, other]

A Survey of Machine Learning-based Physical-Layer Authentication in Wireless Communications

Authors: Rui Meng, Bingxuan Xu, Xiaodong Xu, Mengying Sun, Bizhu Wang, Shujun Han, Suyu Lv, Ping Zhang

Abstract: To ensure secure and reliable communication in wireless systems, authenticating the identities of numerous nodes is imperative. Traditional cryptography-based authentication methods suffer from issues such as low compatibility, reliability, and high complexity. Physical-Layer Authentication (PLA) is emerging as a promising complement due to its exploitation of unique properties in wireless environments. Recently, Machine Learning (ML)-based PLA has gained attention for its intelligence, adaptability, universality, and scalability compared to non-ML approaches. However, a comprehensive overview of state-of-the-art ML-based PLA and its foundational aspects is lacking. This paper presents a comprehensive survey of characteristics and technologies that can be used in the ML-based PLA. We categorize existing ML-based PLA schemes into two main types: multi-device identification and attack detection schemes. In deep learning-based multi-device identification schemes, Deep Neural Networks are employed to train models, avoiding complex processing and expert feature transformation. Deep learning-based multi-device identification schemes are further subdivided, with schemes based on Convolutional Neural Networks being extensively researched. In ML-based attack detection schemes, receivers utilize intelligent ML techniques to set detection thresholds automatically, eliminating the need for manual calculation or knowledge of channel models. ML-based attack detection schemes are categorized into three sub-types: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. Additionally, we summarize open-source datasets used for PLA, encompassing Radio Frequency fingerprints and channel fingerprints. Finally, this paper outlines future research directions to guide researchers in related fields.

Submitted 3 December, 2024; v1 submitted 14 November, 2024; originally announced November 2024.

Comments: 111 pages, 9 figures

arXiv:2411.04833 [pdf, other]

Finding Control Invariant Sets via Lipschitz Constants of Linear Programs

Authors: Matti Vahs, Shaohang Han, Jana Tumova

Abstract: Control invariant sets play an important role in safety-critical control and find broad application in numerous fields such as obstacle avoidance for mobile robots. However, finding valid control invariant sets of dynamical systems under input limitations is notoriously difficult. We present an approach to safely expand an initial set while always guaranteeing that the set is control invariant. Specifically, we define an expansion law for the boundary of a set and check for control invariance using Linear Programs (LPs). To verify control invariance on a continuous domain, we leverage recently proposed Lipschitz constants of LPs to transform the problem of continuous verification into a finite number of LPs. Using concepts from differentiable optimization, we derive the safe expansion law of the control invariant set and show how it can be interpreted as a second invariance problem in the space of possible boundaries. Finally, we show how the obtained set can be used to obtain a minimally invasive safety filter in a Control Barrier Function (CBF) framework. Our work is supported by theoretical results as well as numerical examples.

Submitted 7 November, 2024; originally announced November 2024.

arXiv:2410.12160 [pdf, other]

When to Trust Your Data: Enhancing Dyna-Style Model-Based Reinforcement Learning With Data Filter

Authors: Yansong Li, Zeyu Dong, Ertai Luo, Yu Wu, Shuo Wu, Shuo Han

Abstract: Reinforcement learning (RL) algorithms can be divided into two classes: model-free algorithms, which are sample-inefficient, and model-based algorithms, which suffer from model bias. Dyna-style algorithms combine these two approaches by using simulated data from an estimated environmental model to accelerate model-free training. However, their efficiency is compromised when the estimated model is inaccurate. Previous works address this issue by using model ensembles or pretraining the estimated model with data collected from the real environment, increasing computational and sample complexity. To tackle this issue, we introduce an out-of-distribution (OOD) data filter that removes simulated data from the estimated model that significantly diverges from data collected in the real environment. We show theoretically that this technique enhances the quality of simulated data. With the help of the OOD data filter, the data simulated from the estimated model better mimics the data collected by interacting with the real model. This improvement is evident in the critic updates compared to using the simulated data without the OOD data filter. Our experiment integrates the data filter into the model-based policy optimization (MBPO) algorithm. The results demonstrate that our method requires fewer interactions with the real environment to achieve a higher level of optimality than MBPO, even without a model ensemble.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.10758 [pdf]

Arrhythmia Classification Using Graph Neural Networks Based on Correlation Matrix

Authors: Seungwoo Han

Abstract: With the advancements in graph neural networks, there has been increasing interest in applying these networks to ECG signal analysis. In this study, we generated an adjacency matrix using the correlation matrix of extracted features and applied a graph neural network to classify arrhythmias. The proposed model was compared with existing approaches from the literature. The results demonstrated that precision and recall for all arrhythmia classes exceeded 50%, suggesting that this method can be considered an approach for arrhythmia classification.

Submitted 10 February, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

Comments: Corrected typos

arXiv:2410.06767 [pdf, ps, other]

On the Achievable Error Rate Performance of Pilot-Aided Simultaneous Communication and Localisation

Authors: Shuaishuai Han, Emad Alsusa, Mohammad Ahmad Al-Jarrah, Mahmoud AlaaEldin

Abstract: This paper investigates the symbol error rate (SER) performance of the pilot-aided simultaneous communication and localisation (PASCAL) system. A scenario where multiple drones transmit communication signals to a base station (BS), which needs to simultaneously decode the signals and continuously locate the drones' positions during the communication session, is considered. The BS operates in two stages: first, it estimates the drones' location parameters using pilot signals; second, it performs data detection by reconstructing the channel response based on the estimated location parameters. The theoretical analysis presented demonstrates that the estimated location parameters follow Gaussian distributions with means equal to the actual values and variances determined by the root mean square error (RMSE) of the estimator. Using these distributions, the average SER is derived to quantify the impact of localisation errors on decoding performance. This analysis highlights the synergy between communication and localisation, providing valuable insights into the influence of localisation inaccuracies on the performance of location-aware communication systems. Simulations are conducted to validate the theoretical derivations.

Submitted 26 November, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

Comments: 13 pages, 10 figures

arXiv:2409.16439 [pdf, other]

Active Perception with Initial-State Uncertainty: A Policy Gradient Method

Authors: Chongyang Shi, Shuo Han, Michael Dorothy, Jie Fu

Abstract: This paper studies the synthesis of an active perception policy that maximizes the information leakage of the initial state in a stochastic system modeled as a hidden Markov model (HMM). Specifically, the emission function of the HMM is controllable with a set of perception or sensor query actions. Given the goal is to infer the initial state from partial observations in the HMM, we use Shannon conditional entropy as the planning objective and develop a novel policy gradient method with convergence guarantees. By leveraging a variant of observable operators in HMMs, we prove several important properties of the gradient of the conditional entropy with respect to the policy parameters, which allow efficient computation of the policy gradient and stable and fast convergence. We demonstrate the effectiveness of our solution by applying it to an inference problem in a stochastic grid world environment.

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.15596 [pdf, other]

Computational Ghost Imaging with Low-Density Parity-Check Code

Authors: Shuang Liu, Yunkai Hu, Jinquan Qi, Shensheng Han, Zihuai Lin

Abstract: Ghost imaging (GI) is a high-resolution imaging technology that has been a subject of interest to many fields in the past 20 years. Most GI researchers focus on the reconstruction of signal under-sampling; nevertheless, how to use information redundancy to improve the result's belief in a complex environment has hardly been studied. Motivated by this, we propose a computational GI system based on the low-density parity-check (LDPC) coded radiation field by exploiting the signal redundancy. The non-ideal factors generated within the imaging process can be eliminated by setting up the matching fading channel model. We have derived the analytical lower bound on the bit error rate for the proposed LDPC-coded GI system. The effectiveness and performance of the LDPC-coded GI system are further validated through numerical and experimental results.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.15353 [pdf, other]

Contextualization of ASR with LLM using phonetic retrieval-based augmentation

Authors: Zhihong Lei, Xingyu Na, Mingbin Xu, Ernest Pusateri, Christophe Van Gysel, Yuanyuan Zhang, Shiyi Han, Zhen Huang

Abstract: Large language models (LLMs) have shown superb capability of modeling multimodal signals including audio and text, allowing the model to generate spoken or textual response given a speech input. However, it remains a challenge for the model to recognize personal named entities, such as contacts in a phone book, when the input modality is speech. In this work, we start with a speech recognition task and propose a retrieval-based solution to contextualize the LLM: we first let the LLM detect named entities in speech without any context, then use this named entity as a query to retrieve phonetically similar named entities from a personal database and feed them to the LLM, and finally run context-aware LLM decoding. In a voice assistant task, our solution achieved up to 30.2% relative word error rate reduction and 73.6% relative named entity error rate reduction compared to a baseline system without contextualization. Notably, our solution by design avoids prompting the LLM with the full named entity database, making it highly efficient and applicable to large named entity databases.

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.07770 [pdf, other]

Universal Pooling Method of Multi-layer Features from Pretrained Models for Speaker Verification

Authors: Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Sung Won Han

Abstract: Recent advancements in automatic speaker verification (ASV) studies have been achieved by leveraging large-scale pretrained networks. In this study, we analyze the approaches toward such a paradigm and underline the significance of interlayer information processing as a result. Accordingly, we present a novel approach for exploiting the multilayered nature of pretrained models for ASV, which comprises a layer/frame-level network and two steps of pooling architectures for each layer and frame axis. Specifically, we let a convolutional architecture directly process a stack of layer outputs. Then, we present a channel attention-based scheme of gauging layer significance and squeeze the layer level with the most representative value. Finally, attentive statistics over frame-level representations yield a single vector speaker embedding. Comparative experiments are designed using versatile data environments and diverse pretraining models to validate the proposed approach. The experimental results demonstrate the stability of the approach using multi-layer outputs in leveraging pretrained architectures. Then, we verify the superiority of the proposed ASV backend structure, which involves layer-wise operations, in terms of performance improvement along with cost efficiency compared to the conventional method. The ablation study shows how the proposed interlayer processing aids in maximizing the advantage of utilizing pretrained models.

Submitted 12 September, 2024; originally announced September 2024.

Comments: Preprint

arXiv:2409.07467 [pdf, other]

Flexible Control in Symbolic Music Generation via Musical Metadata

Authors: Sangjun Han, Jiwon Ham, Chaeeun Lee, Heejin Kim, Soojong Do, Sihyuk Yi, Jun Seo, Seoyoon Kim, Yountae Jung, Woohyung Lim

Abstract: In this work, we introduce the demonstration of symbolic music generation, focusing on providing short musical motifs that serve as the central theme of the narrative. For the generation, we adopt an autoregressive model which takes musical metadata as inputs and generates 4 bars of multitrack MIDI sequences. During training, we randomly drop tokens from the musical metadata to guarantee flexible control. It provides users with the freedom to select input types while maintaining generative performance, enabling greater flexibility in music composition. We validate the effectiveness of the strategy through experiments in terms of model capacity, musical fidelity, diversity, and controllability. Additionally, we scale up the model and compare it with other music generation models through a subjective test. Our results indicate its superiority in both control and music quality. We provide a URL link https://www.youtube.com/watch?v=-0drPrFJdMQ to our demonstration video.

Submitted 28 August, 2024; originally announced September 2024.

arXiv:2409.06137 [pdf, other]

doi 10.21437/Interspeech.2024-2180

DeWinder: Single-Channel Wind Noise Reduction using Ultrasound Sensing

Authors: Kuang Yuan, Shuo Han, Swarun Kumar, Bhiksha Raj

Abstract: The quality of audio recordings in outdoor environments is often degraded by the presence of wind. Mitigating the impact of wind noise on the perceptual quality of single-channel speech remains a significant challenge due to its non-stationary characteristics. Prior work in noise suppression treats wind noise as a general background noise without explicit modeling of its characteristics. In this paper, we leverage ultrasound as an auxiliary modality to explicitly sense the airflow and characterize the wind noise. We propose a multi-modal deep-learning framework to fuse the ultrasonic Doppler features and speech signals for wind noise reduction. Our results show that DeWinder can significantly improve the noise reduction capabilities of state-of-the-art speech enhancement models.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.01465 [pdf, other]

doi 10.2514/1.G007903

Terminal Soft Landing Guidance Law Using Analytic Gravity Turn Trajectory

Authors: Seungyeop Han, Byeong-Un Jo, Koki Ho

Abstract: This paper presents an innovative terminal landing guidance law that utilizes an analytic solution derived from the gravity turn trajectory. The characteristics of the derived solution are thoroughly investigated, and the solution is employed to generate a reference velocity vector that satisfies terminal landing conditions. A nonlinear control law is applied to effectively track the reference velocity vector within a finite time, and its robustness against disturbances is studied. Furthermore, the guidance law is expanded to incorporate ground collision avoidance by considering the shape of the gravity turn trajectory. The proposed method's fuel efficiency, robustness, and practicality are demonstrated through comprehensive numerical simulations, and its performance is compared with existing methods.

Submitted 2 September, 2024; originally announced September 2024.

Journal ref: Journal of Guidance, Control, and Dynamics, 47(6), 2024, 1-14

arXiv:2409.00986 [pdf, other]

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

Authors: Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro

Abstract: Lip reading aims to predict spoken language by analyzing lip movements. Despite advancements in lip reading technologies, performance degrades when models are applied to unseen speakers due to their sensitivity to variations in visual information such as lip appearances. To address this challenge, speaker adaptive lip reading technologies have advanced by focusing on effectively adapting a lip reading model to target speakers in the visual modality. However, the effectiveness of adapting language information, such as vocabulary choice, of the target speaker has not been explored in previous works. Additionally, existing datasets for speaker adaptation have limited vocabulary sizes and pose variations, which restrict the validation of previous speaker-adaptive methods in real-world scenarios. To address these issues, we propose a novel speaker-adaptive lip reading method that adapts a pre-trained model to target speakers at both vision and language levels. Specifically, we integrate prompt tuning and the LoRA approach, applying them to a pre-trained lip reading model to effectively adapt the model to target speakers. Furthermore, to validate its effectiveness in real-world scenarios, we introduce a new dataset, VoxLRS-SA, derived from VoxCeleb2 and LRS3. It contains a vocabulary of approximately 100K words, offers diverse pose variations, and enables the validation of adaptation methods in the wild, sentence-level lip reading for the first time in English. Through various experiments, we demonstrate that the existing speaker-adaptive method also improves performance in the wild at the sentence level. Moreover, we show that the proposed method achieves larger improvements compared to the previous works.

Submitted 1 January, 2025; v1 submitted 2 September, 2024; originally announced September 2024.

Comments: Code available: https://github.com/JeongHun0716/Personalized-Lip-Reading

arXiv:2408.15789 [pdf, other]

A Stochastic Robust Adaptive Systems Level Approach to Stabilizing Large-Scale Uncertain Markovian Jump Linear Systems

Authors: SooJean Han, Minwoo M. Kim, Ieun Choo

Abstract: We propose a unified framework for robustly and adaptively stabilizing large-scale networked uncertain Markovian jump linear systems (MJLS) under external disturbances and mode switches that can change the network's topology. Adaptation is achieved by using minimal information on the disturbance to identify modes that are consistent with observable data. Robust control is achieved by extending the system level synthesis (SLS) approach, which allows us to pose the problem of simultaneously stabilizing multiple plants as a two-step convex optimization procedure. Our control pipeline computes a likelihood distribution of the system's current mode, uses them as probabilistic weights during simultaneous stabilization, then updates the likelihood via Bayesian inference. Because of this "softer" probabilistic approach to robust stabilization, our control pipeline does not suffer from abrupt destabilization issues due to changes in the system's true mode, which were observed in a previous method. Separability of SLS also lets us compute localized robust controllers for each subsystem, allowing for network scalability; we use several information consensus methods so that mode estimation can also be done locally. We apply our algorithms to disturbance-rejection on two sample dynamic power grid networks, a small-scale system with 7 nodes and a large-scale grid of 25 nodes.

Submitted 28 August, 2024; originally announced August 2024.

Comments: Full version of accepted paper to 63rd IEEE Conference on Decision and Control (CDC) 2024

arXiv:2408.10235 [pdf, other]

doi 10.1016/j.bspc.2024.107337

Multi-Source EEG Emotion Recognition via Dynamic Contrastive Domain Adaptation

Authors: Yun Xiao, Yimeng Zhang, Xiaopeng Peng, Shuzheng Han, Xia Zheng, Dingyi Fang, Xiaojiang Chen

Abstract: Electroencephalography (EEG) provides reliable indications of human cognition and mental states. Accurate emotion recognition from EEG remains challenging due to signal variations among individuals and across measurement sessions. We introduce a multi-source dynamic contrastive domain adaptation method (MS-DCDA) based on differential entropy (DE) features, in which coarse-grained inter-domain and fine-grained intra-class adaptations are modeled through a multi-branch contrastive neural network and contrastive sub-domain discrepancy learning. Leveraging domain knowledge from each individual source and a complementary source ensemble, our model uses dynamically weighted learning to achieve an optimal tradeoff between domain transferability and discriminability. The proposed MS-DCDA model was evaluated using the SEED and SEED-IV datasets, achieving respectively the highest mean accuracies of $90.84\%$ and $78.49\%$ in cross-subject experiments as well as $95.82\%$ and $82.25\%$ in cross-session experiments. Our model outperforms several alternative domain adaptation methods in recognition accuracy, inter-class margin, and intra-class compactness. Our study also suggests greater emotional sensitivity in the frontal and parietal brain lobes, providing insights for mental health interventions, personalized medicine, and preventive strategies.

Submitted 23 December, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

Journal ref: Biomedical Signal Processing and Control, vol. 102, p. 107337, Apr. 2025

Showing 1–50 of 200 results for author: Han, S