Search | arXiv e-print repository

Channel Knowledge Map-assisted Dual-domain Tracking and Predictive Beamforming for High-Mobility Wireless Networks

Authors: Ruolin Du, Zhiqiang Wei, Zai Yang, Lei Yang, Yong Zeng, Derrick Wing Kwan Ng, Jinhong Yuan

Abstract: This paper introduces a novel channel knowledge map (CKM)-assisted dual-domain tracking and predictive beamforming scheme for high-mobility wireless networks. The central premise is that the CKM integrates both the coordinate and beam domains, thereby enabling tracking in one domain by treating the other domain's input as priors or measurements. In the coordinate domain (C-Domain), an extended Kalman filter (EKF) is employed to predict and track the state (i.e., location and velocity) of a moving communication receiver across time slots under both line-of-sight (LoS)-present and LoS-absent conditions, where the CKM provides a prior mapping from multipath channel parameters to potential target locations. In the beam domain (B-Domain), the updated location of the receiver is fed back to the CKM to offer a priori information on angle of arrival (AoA) variations, which is incorporated to establish beam transition models for effective beam tracking, depending on the angular variation situation of each path. Then, we analyze the Cramér-Rao Bound (CRB) for AoA estimation for each path in the considered system and propose a joint predictive beamforming and power allocation design to minimize AoA estimation errors, directly enhancing multipath beam tracking accuracy and indirectly improving target tracking performance. Simulation results demonstrate that the proposed scheme achieves significant improvements in both target and beam tracking performance compared to state-of-the-art approaches, particularly in AoA tracking of non-line-of-sight (NLoS) paths, highlighting the potential gain of CKM in facilitating both target and beam tracking in high-mobility communications.

Submitted 28 June, 2025; originally announced June 2025.

arXiv:2506.22710

LightBSR: Towards Lightweight Blind Super-Resolution via Discriminative Implicit Degradation Representation Learning

Authors: Jiang Yuan, JI Ma, Bo Wang, Guanzhou Ke, Weiming Hu

Abstract: Implicit degradation estimation-based blind super-resolution (IDE-BSR) hinges on extracting the implicit degradation representation (IDR) of the LR image and adapting it to LR image features to guide HR detail restoration. Although IDE-BSR has shown potential in dealing with noise interference and complex degradations, existing methods ignore the importance of IDR discriminability for BSR and instead over-complicate the adaptation process to improve performance, resulting in a significant increase in the model's parameters and computations. In this paper, we focus on the discriminability optimization of IDR and propose a new powerful and lightweight BSR model termed LightBSR. Specifically, we employ a knowledge distillation-based learning framework. We first introduce a well-designed degradation-prior-constrained contrastive learning technique during the teacher stage to make the model more focused on distinguishing different degradation types. Then we utilize a feature alignment technique to transfer the degradation-related knowledge acquired by the teacher to the student for practical inference. Extensive experiments demonstrate the effectiveness of IDR discriminability-driven BSR model design. The proposed LightBSR can achieve outstanding performance with minimal complexity across a range of blind SR tasks. Our code is accessible at: https://github.com/MJ-NCEPU/LightBSR.

Submitted 27 June, 2025; originally announced June 2025.

Journal ref: International Conference on Computer Vision (ICCV) 2025

arXiv:2505.19626

Decoding Speaker-Normalized Pitch from EEG for Mandarin Perception

Authors: Jiaxin Chen, Yiming Wang, Ziyu Zhang, Jiayang Han, Yin-Long Liu, Rui Feng, Xiuyuan Liang, Zhen-Hua Ling, Jiahong Yuan

Abstract: The same speech content produced by different speakers exhibits significant differences in pitch contour, yet listeners' semantic perception remains unaffected. This phenomenon may stem from the brain's perception of pitch contours being independent of individual speakers' pitch ranges. In this work, we recorded electroencephalogram (EEG) while participants listened to Mandarin monosyllables with varying tones, phonemes, and speakers. The CE-ViViT model is proposed to decode raw or speaker-normalized pitch contours directly from EEG. Experimental results demonstrate that the proposed model can decode pitch contours with modest errors, achieving performance comparable to state-of-the-art EEG regression methods. Moreover, speaker-normalized pitch contours were decoded more accurately, supporting the neural encoding of relative pitch.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.19448

Beyond Manual Transcripts: The Potential of Automated Speech Recognition Errors in Improving Alzheimer's Disease Detection

Authors: Yin-Long Liu, Rui Feng, Jia-Xin Chen, Yi-Ming Wang, Jia-Hong Yuan, Zhen-Hua Ling

Abstract: Recent breakthroughs in Automatic Speech Recognition (ASR) have enabled fully automated Alzheimer's Disease (AD) detection using ASR transcripts. Nonetheless, the impact of ASR errors on AD detection remains poorly understood. This paper fills the gap. We conduct a comprehensive study on AD detection using transcripts from various ASR models and their synthesized speech on the ADReSS dataset. Experimental results reveal that certain ASR transcripts (ASR-synthesized speech) outperform manual transcripts (manual-synthesized speech) in detection accuracy, suggesting that ASR errors may provide valuable cues for improving AD detection. Additionally, we propose a cross-attention-based interpretability model that not only identifies these cues but also achieves superior or comparable performance to the baseline. Furthermore, we utilize this model to unveil AD-related patterns within pre-trained embeddings. Our study offers novel insights into the potential of ASR models for AD detection.

Submitted 25 May, 2025; originally announced May 2025.

Comments: Accepted by Interspeech 2025

arXiv:2505.19446

Leveraging Cascaded Binary Classification and Multimodal Fusion for Dementia Detection through Spontaneous Speech

Authors: Yin-Long Liu, Yuanchao Li, Rui Feng, Liu He, Jia-Xin Chen, Yi-Ming Wang, Yu-Ang Chen, Yan-Han Peng, Jia-Hong Yuan, Zhen-Hua Ling

Abstract: This paper presents our submission to the PROCESS Challenge 2025, focusing on spontaneous speech analysis for early dementia detection. For the three-class classification task (Healthy Control, Mild Cognitive Impairment, and Dementia), we propose a cascaded binary classification framework that fine-tunes pre-trained language models and incorporates pause encoding to better capture disfluencies. This design streamlines multi-class classification and addresses class imbalance by restructuring the decision process. For the Mini-Mental State Examination score regression task, we develop an enhanced multimodal fusion system that combines diverse acoustic and linguistic features. Separate regression models are trained on individual feature sets, with ensemble learning applied through score averaging. Experimental results on the test set outperform the baselines provided by the organizers in both tasks, demonstrating the robustness and effectiveness of our approach.

Submitted 26 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

Comments: Accepted by Interspeech 2025

arXiv:2504.09655

OmniMamba4D: Spatio-temporal Mamba for longitudinal CT lesion segmentation

Authors: Justin Namuk Kim, Yiqiao Liu, Rajath Soans, Keith Persson, Sarah Halek, Michal Tomaszewski, Jianda Yuan, Gregory Goldmacher, Antong Chen

Abstract: Accurate segmentation of longitudinal CT scans is important for monitoring tumor progression and evaluating treatment responses. However, existing 3D segmentation models focus solely on spatial information. To address this gap, we propose OmniMamba4D, a novel segmentation model designed for 4D medical images (3D images over time). OmniMamba4D utilizes a spatio-temporal tetra-orientated Mamba block to effectively capture both spatial and temporal features. Unlike traditional 3D models, which analyze single time points, OmniMamba4D processes 4D CT data, providing comprehensive spatio-temporal information on lesion progression. Evaluated on an internal dataset comprising 3,252 CT scans, OmniMamba4D achieves a competitive Dice score of 0.682, comparable to state-of-the-art (SOTA) models, while maintaining computational efficiency and better detecting disappeared lesions. This work demonstrates a new framework to leverage spatio-temporal information for longitudinal CT lesion segmentation.

Submitted 24 April, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

Comments: Accepted at IEEE International Symposium on Biomedical Imaging (ISBI) 2025

arXiv:2503.19703

High-Quality Spatial Reconstruction and Orthoimage Generation Using Efficient 2D Gaussian Splatting

Authors: Qian Wang, Zhihao Zhan, Jialei He, Zhituo Tu, Xiang Zhu, Jie Yuan

Abstract: Highly accurate geometric precision and dense image features characterize True Digital Orthophoto Maps (TDOMs), which are in great demand for applications such as urban planning, infrastructure management, and environmental monitoring. Traditional TDOM generation methods need sophisticated processes, such as Digital Surface Models (DSM) and occlusion detection, which are computationally expensive and prone to errors. This work presents an alternative technique rooted in 2D Gaussian Splatting (2DGS), free of explicit DSM and occlusion detection. With depth map generation, spatial information for every pixel within the TDOM is retrieved and can reconstruct the scene with high precision. A divide-and-conquer strategy achieves excellent GS training and rendering with high-resolution TDOMs at a lower resource cost, which preserves higher rendering quality on complex terrain and thin structures without a decrease in efficiency. Experimental results demonstrate the efficiency of large-scale scene reconstruction and high-precision terrain modeling. This approach provides accurate spatial data, which assists users in better planning and decision-making based on maps.

Submitted 13 May, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

arXiv:2503.12380

doi 10.24251/HICSS.2025.364

A Unified Approach to Enforce Non-Negativity Constraint in Neural Network Approximation for Optimal Voltage Regulation (preprint)

Authors: Jiaqi Wu, Jingyi Yuan, Yang Weng, Guangwen Wang

Abstract: Power system voltage regulation is crucial to maintain power quality while integrating intermittent renewable resources in distribution grids. However, the system model on the grid edge is often unknown, making it difficult to model physical equations for optimal control. Therefore, previous work proposes structured data-driven methods like input convex neural networks (ICNN) for "optimal" control without relying on a physical model. While ICNNs offer theoretical guarantees based on restrictive assumptions of non-negative neural network parameters, can one improve the approximation power with an extra step on negative duplication of inputs? We show that such an added mirroring step fails to improve accuracy, as a linear combination of the original input and duplicated input is equivalent to a linear operation of ICNN's input without duplication. While this design cannot improve performance, we propose a unified approach to embed the non-negativity constraint as a regularized optimization of the neural network, contrary to the existing methods, which added a loosely integrated second step for post-processing on parameter negation. Our integration directly ties back-propagation to simultaneously minimizing the approximation error while enforcing the convexity constraints. Numerical experiments validate the issues of the mirroring method and show that our integrated objective can avoid problems such as unstable training and non-convergence existing in other methods for optimal control.

Submitted 6 May, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

Comments: Submitted to the 58th Hawaii International Conference on System Sciences (HICSS-58)

Journal ref: HICSS'58 (2025) 3018-3027

arXiv:2503.10060

Sum-Rate Maximization for Pinching Antenna-assisted NOMA Systems with Multiple Dielectric Waveguides

Authors: Shaokang Hu, Ruotong Zhao, Yihuan Liao, Derrick Wing Kwan Ng, Jinhong Yuan

Abstract: This paper investigates the resource allocation design for a pinching antenna (PA)-assisted multiuser multiple-input single-output (MISO) non-orthogonal multiple access (NOMA) system featuring multiple dielectric waveguides. To enhance model accuracy, we propose a novel frequency-dependent power attenuation model for the dielectric waveguides in PA-assisted systems. By jointly optimizing the precoder vector and the PA placement, we aim to maximize the system's sum-rate while accounting for the power attenuation across the dielectric waveguides. The design is formulated as a non-convex optimization problem. To effectively address the problem at hand, we introduce an alternating optimization-based algorithm to obtain a suboptimal solution in polynomial time. Our results demonstrate that the proposed PA-assisted system not only significantly outperforms the conventional system but also surpasses a naive PA-assisted system that disregards power attenuation. The performance gain compared to the naive PA-assisted system becomes more pronounced at high carrier frequencies, emphasizing the importance of considering power attenuation in system design.

Submitted 6 April, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

Comments: 7 pages, 3 figures, conference

arXiv:2503.01202

A Multi-Sensor Fusion Approach for Rapid Orthoimage Generation in Large-Scale UAV Mapping

Authors: Jialei He, Zhihao Zhan, Zhituo Tu, Xiang Zhu, Jie Yuan

Abstract: Rapid generation of large-scale orthoimages from Unmanned Aerial Vehicles (UAVs) has been a long-standing focus of research in the field of aerial mapping. A multi-sensor UAV system, integrating the Global Positioning System (GPS), Inertial Measurement Unit (IMU), 4D millimeter-wave radar and camera, can provide an effective solution to this problem. In this paper, we utilize multi-sensor data to overcome the limitations of conventional orthoimage generation methods in terms of temporal performance, system robustness, and geographic reference accuracy. A prior-pose-optimized feature matching method is introduced to enhance matching speed and accuracy, reducing the number of required features and providing precise references for the Structure from Motion (SfM) process. The proposed method exhibits robustness in low-texture scenes like farmlands, where feature matching is difficult. Experiments show that our approach achieves accurate feature matching and orthoimage generation in a short time. The proposed drone system effectively aids in farmland detection and management.

Submitted 4 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

arXiv:2502.03497

SLCGC: A lightweight Self-supervised Low-pass Contrastive Graph Clustering Network for Hyperspectral Images

Authors: Yao Ding, Zhili Zhang, Aitao Yang, Yaoming Cai, Xiongwu Xiao, Danfeng Hong, Junsong Yuan

Abstract: Self-supervised hyperspectral image (HSI) clustering remains a fundamental yet challenging task due to the absence of labeled data and the inherent complexity of spatial-spectral interactions. While recent advancements have explored innovative approaches, existing methods face critical limitations in clustering accuracy, feature discriminability, computational efficiency, and robustness to noise, hindering their practical deployment. In this paper, a self-supervised efficient low-pass contrastive graph clustering (SLCGC) is introduced for HSIs. Our approach begins with homogeneous region generation, which aggregates pixels into spectrally consistent regions to preserve local spatial-spectral coherence while drastically reducing graph complexity. We then construct a structural graph using an adjacency matrix A and introduce a low-pass graph denoising mechanism to suppress high-frequency noise in the graph topology, ensuring stable feature propagation. A dual-branch graph contrastive learning module is developed, where Gaussian noise perturbations generate augmented views through two multilayer perceptrons (MLPs), and a cross-view contrastive loss enforces structural consistency between views to learn noise-invariant representations. Finally, latent embeddings optimized by this process are clustered via K-means. Extensive experiments and repeated comparative analysis have verified that our SLCGC achieves high clustering accuracy, low computational complexity, and strong robustness. The source code will be available at https://github.com/DY-HYX.

Submitted 6 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

Comments: 12 pages, 9 figures

arXiv:2502.01078

Parallel Coding for Orthogonal Delay-Doppler Division Multiplexing

Authors: Qi Li, Jinhong Yuan, Min Qiu

Abstract: This paper proposes a novel parallel coding transmission strategy and an iterative detection and decoding receiver signal processing technique for orthogonal delay-Doppler division multiplexing (ODDM) modulation. Specifically, the proposed approach employs a parallel channel encoding (PCE) scheme that consists of multiple short-length codewords for each delay-Doppler multicarrier (DDMC) symbol. Building upon such a PCE transmission framework, we then introduce an iterative detection and decoding algorithm incorporating a successive decoding feedback (SDF) technique, which enables instant information exchange between the detector and decoder for each DDMC symbol. To characterize the error performance of the proposed scheme, we perform density evolution analysis considering the finite blocklength effects. Our analysis results, coupled with extensive simulations, demonstrate that the proposed PCE scheme with the SDF algorithm not only showcases a better overall performance but also requires much less decoding complexity to implement, compared to the conventional benchmark scheme that relies on a single long channel code for coding the entire ODDM frame.

Submitted 3 February, 2025; originally announced February 2025.

Comments: 12 pages, 12 figures, accepted by IEEE Transactions on Communications

arXiv:2501.08026

Orthogonal Delay-Doppler Division Multiplexing Modulation with Hierarchical Mode-Based Index Modulation

Authors: Kehan Huang, Min Qiu, Jinhong Yuan

Abstract: The orthogonal time frequency space with index modulation (OTFS-IM) offers flexible tradeoffs between spectral efficiency (SE) and bit error rate (BER) in doubly selective fading channels. While OTFS-IM schemes demonstrated such potential, a persistent challenge lies in the detection complexity. To address this problem, we propose the hierarchical mode-based index modulation (HMIM). HMIM introduces a novel approach to modulate information bits by IM patterns, significantly simplifying the complexity of maximum a posteriori (MAP) estimation with Gaussian noise. Further, we incorporate HMIM with the recently proposed orthogonal delay-Doppler division multiplexing (ODDM) modulation, namely ODDM-HMIM, to exploit the full diversity of the delay-Doppler (DD) channel. The BER performance of ODDM-HMIM is analyzed considering a maximum likelihood (ML) detector. Our numerical results reveal that, with the same SE, HMIM can outperform conventional IM in terms of both BER and computational complexity. In addition, we propose a successive interference cancellation-based minimum mean square error (SIC-MMSE) detector for ODDM-HMIM, which enables low-complexity detection with large frame sizes.

Submitted 14 January, 2025; originally announced January 2025.

arXiv:2501.03689

MAJL: A Model-Agnostic Joint Learning Framework for Music Source Separation and Pitch Estimation

Authors: Haojie Wei, Jun Yuan, Rui Zhang, Quanyu Dai, Yueguo Chen

Abstract: Music source separation and pitch estimation are two vital tasks in music information retrieval. Typically, the input of pitch estimation is obtained from the output of music source separation. Therefore, existing methods have tried to perform these two tasks simultaneously, so as to leverage the mutually beneficial relationship between both tasks. However, these methods still face two critical challenges that limit the improvement of both tasks: the lack of labeled data and joint learning optimization. To address these challenges, we propose a Model-Agnostic Joint Learning (MAJL) framework for both tasks. MAJL is a generic framework and can use variant models for each task. It includes a two-stage training method and a dynamic weighting method named Dynamic Weights on Hard Samples (DWHS), which addresses the lack of labeled data and joint learning optimization, respectively. Experimental results on public music datasets show that MAJL outperforms state-of-the-art methods on both tasks, with significant improvements of 0.92 in Signal-to-Distortion Ratio (SDR) for music source separation and 2.71% in Raw Pitch Accuracy (RPA) for pitch estimation. Furthermore, comprehensive studies not only validate the effectiveness of each component of MAJL, but also indicate the great generality of MAJL in adapting to different model architectures.

Submitted 7 January, 2025; originally announced January 2025.

arXiv:2412.13216

On the Time-Frequency Localization Characteristics of the Delay-Doppler Plane Orthogonal Pulse

Authors: Akram Shafie, Jinhong Yuan, Nan Yang, Hai Lin

Abstract: In this work, we study the time-frequency (TF) localization characteristics of the prototype pulse of orthogonal delay-Doppler (DD) division multiplexing modulation, namely, the DD plane orthogonal pulse (DDOP). The TF localization characteristics examine how concentrated or spread out the energy of a pulse is in the joint TF domain, the time domain (TD), and the frequency domain (FD). We first derive the TF localization metrics of the DDOP, including its TF area, its time and frequency dispersions, and its direction parameter. Based on these results, we demonstrate that the DDOP exhibits a high energy spread in the TD, FD, and the joint TF domain, while adhering to the Heisenberg uncertainty principle. Thereafter, we discuss the potential advantages brought by the energy spread of the DDOP, especially with regard to harnessing both time and frequency diversities and enabling fine-resolution sensing. Subsequently, we examine the relationships between the time and frequency dispersions of the DDOP and those of the envelope functions of DDOP's TD and FD representations, paving the way for simplified determination of the TF localization metrics for more generalized variants of the DDOP and the pulses used in other DD domain modulation schemes. Finally, using numerical results, we validate our analysis and find further insights.

Submitted 14 December, 2024; originally announced December 2024.

Comments: This paper has been accepted for publication in an IEEE Journal

arXiv:2412.11325

Sonicmesh: Enhancing 3D Human Mesh Reconstruction in Vision-Impaired Environments With Acoustic Signals

Authors: Xiaoxuan Liang, Wuyang Zhang, Hong Zhou, Zhaolong Wei, Sicheng Zhu, Yansong Li, Rui Yin, Jiantao Yuan, Jeremy Gummeson

Abstract: 3D Human Mesh Reconstruction (HMR) from 2D RGB images faces challenges in environments with poor lighting, privacy concerns, or occlusions. These weaknesses of RGB imaging can be complemented by acoustic signals, which are widely available, easy to deploy, and capable of penetrating obstacles. However, no existing methods effectively combine acoustic signals with RGB data for robust 3D HMR. The primary challenges include the low-resolution images generated by acoustic signals and the lack of dedicated processing backbones. We introduce SonicMesh, a novel approach combining acoustic signals with RGB images to reconstruct 3D human mesh. To address the challenges of low resolution and the absence of dedicated processing backbones in images generated by acoustic signals, we modify an existing method, HRNet, for effective feature extraction. We also integrate a universal feature embedding technique to enhance the precision of cross-dimensional feature alignment, enabling SonicMesh to achieve high accuracy. Experimental results demonstrate that SonicMesh accurately reconstructs 3D human mesh in challenging environments such as occlusions, non-line-of-sight scenarios, and poor lighting.

Submitted 15 December, 2024; originally announced December 2024.

arXiv:2412.07074

Channel Spreading Function-Inspired Channel Transfer Function Estimation for OFDM Systems with High-Mobility

Authors: Yiyan Ma, Bo Ai, Guoyu Ma, Akram Shafie, Qingqing Cheng, Mi Yang, Jingli Li, Xuebo Pang, Jinhong Yuan, Zhangdui Zhong

Abstract: In this letter, we propose a novel channel transfer function (CTF) estimation approach for orthogonal frequency division multiplexing (OFDM) systems in high-mobility scenarios that leverages the stationary properties of the delay-Doppler domain channel spreading function (CSF). First, we develop a CSF estimation model for OFDM systems that relies solely on discrete pilot symbols in the time-frequency (TF) domain, positioned at predefined resource elements. We then present theorems to elucidate the relationship between CSF compactness and pilot spacing in the TF domain for accurate CSF acquisition. Based on the estimated CSF, we finally estimate the CTF for data symbols. Numerical results show that, in high-mobility scenarios, the proposed approach outperforms traditional interpolation-based methods and closely matches the optimal estimator in terms of estimation accuracy. This work may pave the way for CSF estimation in commercial OFDM systems, benefiting high-mobility communications, integrated sensing and communications, and related applications.

Submitted 9 December, 2024; originally announced December 2024.

arXiv:2412.06259 [pdf, other]

Leveraging Prompt Learning and Pause Encoding for Alzheimer's Disease Detection

Authors: Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling

Abstract: Compared to other clinical screening techniques, speech-and-language-based automated Alzheimer's disease (AD) detection methods are characterized by their non-invasiveness, cost-effectiveness, and convenience. Previous studies have demonstrated the efficacy of fine-tuning pre-trained language models (PLMs) for AD detection. However, the objective of this traditional fine-tuning method, which involves inputting only transcripts, is inconsistent with the masked language modeling (MLM) task used during the pre-training phase of PLMs. In this paper, we investigate prompt-based fine-tuning of PLMs, converting the classification task into an MLM task by inserting prompt templates into the transcript inputs. We also explore the impact of incorporating pause information from forced alignment into manual transcripts. Additionally, we compare the performance of various automatic speech recognition (ASR) models and select the Whisper model to generate ASR-based transcripts for comparison with manual transcripts. Furthermore, majority voting and ensemble techniques are applied across different PLMs (BERT and RoBERTa) using different random seeds. Ultimately, we obtain a maximum detection accuracy of 95.8% (with mean 87.9%, std 3.3%) using manual transcripts, achieving state-of-the-art performance for AD detection using only transcripts on the ADReSS test set.
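As a rough illustration of the majority-voting step mentioned in the abstract above, the sketch below combines per-sample labels from several runs. It is not the authors' implementation: the function name `majority_vote`, the binary labels, and the example predictions are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-sample labels from several models/seeds by majority vote.

    predictions: list of label lists, one list per model run, all the same
    length. Hypothetical helper for illustration only.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        # Count the labels that every run assigned to sample i.
        votes = Counter(run[i] for run in predictions)
        # most_common(1) returns the label with the highest count; with an
        # odd number of runs and binary labels there are no ties.
        voted.append(votes.most_common(1)[0][0])
    return voted


# Made-up predictions from three runs (1 = AD, 0 = control).
runs = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
print(majority_vote(runs))
```

Using an odd number of runs avoids ties for binary labels; with an even number, a tie-breaking rule would have to be chosen.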

Submitted 9 December, 2024; originally announced December 2024.

Comments: Accepted by ISCSLP 2024

arXiv:2412.00058 [pdf]

Real-time volumetric free-hand ultrasound imaging for large-sized organs: A study of imaging the whole spine

Authors: Caozhe Li, Enxiang Shen, Haoyang Wang, Yuxin Wang, Jie Yuan, Li Gong, Di Zhao, Weijing Zhang, Zhibin Jin

Abstract: Three-dimensional (3D) ultrasound imaging can overcome the limitations of conventional two-dimensional (2D) ultrasound imaging in structural observation and measurement. However, conducting volumetric ultrasound imaging for large-sized organs still faces difficulties including long acquisition time, inevitable patient movement, and 3D feature recognition. In this study, we proposed a real-time volumetric free-hand ultrasound imaging system optimized for the above issues and applied it to the clinical diagnosis of scoliosis. This study employed an incremental imaging method coupled with algorithmic acceleration to enable real-time processing and visualization of the large amounts of data generated when scanning large-sized organs. Furthermore, to deal with the difficulty of image feature recognition, we proposed two tissue segmentation algorithms to reconstruct and visualize the spinal anatomy in 3D space by approximating the depth at which the bone structures are located and segmenting the ultrasound images at different depths. We validated the adaptability of our system by deploying it to multiple models of ultrasound equipment and conducting experiments using different types of ultrasound probes. We also conducted experiments on 6 scoliosis patients and 10 normal volunteers to evaluate the performance of our proposed method. Ultrasound imaging of a volunteer spine from shoulder to crotch (more than 500 mm) was performed in 2 minutes, and the 3D imaging results displayed in real-time were compared with the corresponding X-ray images with a correlation coefficient of 0.96 in spinal curvature. Our proposed volumetric ultrasound imaging system might hold the potential to be clinically applied to other large-sized organs.

Submitted 25 November, 2024; originally announced December 2024.

arXiv:2411.15529 [pdf, other]

Uplink Multiple Access with Heterogeneous Blocklength and Reliability Constraints: Discrete Signaling with Treating Interference as Noise

Authors: Min Qiu, Yu-Chih Huang, Jinhong Yuan

Abstract: We consider the uplink multiple access of heterogeneous users, e.g., ultra-reliable low-latency communications (URLLC) and enhanced mobile broadband (eMBB) users. Each user has its own reliability requirement and blocklength constraint, and users transmitting longer blocks suffer from heterogeneous interference. On top of that, the decoding of URLLC messages cannot leverage successive interference cancellation (SIC) owing to the stringent latency requirements. This can significantly degrade the spectral efficiency of all URLLC users when the interference is strong. To overcome this issue, we propose a new multiple access scheme employing discrete signaling and treating interference as noise (TIN) decoding, i.e., without SIC. Specifically, to handle heterogeneous interference while maintaining the single-user encoding and decoding complexities, each user uses a single channel code and maps its coded bits onto sub-blocks of symbols, where the underlying constellations can be different. We demonstrate theoretically and numerically that the proposed scheme employing quadrature amplitude modulations and TIN decoding can perform very close to the benchmark scheme based on Gaussian signaling with perfect SIC decoding. Interestingly, we show that the proposed scheme does not need to use all the transmit power budget and can sometimes even outperform the benchmark scheme.

Submitted 23 November, 2024; originally announced November 2024.

Comments: 14 pages, 7 figures, accepted by IEEE Transactions on Communications. arXiv admin note: text overlap with arXiv:2308.08883

arXiv:2411.12985 [pdf, other]

Disco Intelligent Omni-Surfaces: 360-degree Fully-Passive Jamming Attacks

Authors: Huan Huang, Hongliang Zhang, Jide Yuan, Luyao Sun, Yitian Wang, Weidong Mei, Boya Di, Yi Cai, Zhu Han

Abstract: Intelligent omni-surfaces (IOSs) with 360-degree electromagnetic radiation significantly improve the performance of wireless systems, while an adversarial IOS also poses a significant potential risk for physical layer security. In this paper, we propose a "DISCO" IOS (DIOS) based fully-passive jammer (FPJ) that can launch omnidirectional fully-passive jamming attacks. In the proposed DIOS-based FPJ, the interrelated refractive and reflective (R&R) coefficients of the adversarial IOS are randomly generated, acting like a "DISCO" that distributes wireless energy radiated by the base station. By introducing active channel aging (ACA) during channel coherence time, the DIOS-based FPJ can perform omnidirectional fully-passive jamming with neither jamming power nor channel knowledge of legitimate users (LUs). To characterize the impact of the DIOS-based FPJ, we derive the statistical characteristics of DIOS-jammed channels based on two widely-used IOS models, i.e., the constant-amplitude model and the variable-amplitude model. Consequently, the asymptotic analysis of the ergodic achievable sum rates under the DIOS-based omnidirectional fully-passive jamming is given based on the derived stochastic characteristics for both IOS models. Based on the derived analysis, the omnidirectional jamming impact of the proposed DIOS-based FPJ implemented by a constant-amplitude IOS does not depend on either the quantization number or the stochastic distribution of the DIOS coefficients, while the conclusion does not hold when a variable-amplitude IOS is used. Numerical results based on one-bit quantization of the IOS phase shifts are provided to verify the effectiveness of the derived theoretical analysis. The proposed DIOS-based FPJ can not only launch omnidirectional fully-passive jamming, but also improve the jamming impact by about 55% at 10 dBm transmit power per LU.

Submitted 19 November, 2024; originally announced November 2024.

Comments: This paper has been submitted to IEEE TWC for possible publication

arXiv:2411.08570 [pdf, other]

doi 10.1109/LAWP.2025.3549313

Electromagnetic Modeling and Capacity Analysis of Rydberg Atom-Based MIMO System

Authors: Shuai S. A. Yuan, Xinyi Y. I. Xu, Jinpeng Yuan, Guoda Xie, Chongwen Huang, Xiaoming Chen, Zhixiang Huang, Wei E. I. Sha

Abstract: Rydberg atom-based antennas exploit the quantum properties of highly excited Rydberg atoms, providing unique advantages over classical antennas, such as high sensitivity, broad frequency range, and compact size. Despite the increasing interest in their applications in antenna and communication engineering, two key properties, involving the lack of polarization multiplexing and isotropic reception without mutual coupling, remain unexplored in the analysis of Rydberg atom-based spatial multiplexing, i.e., multiple-input and multiple-output (MIMO), communications. Generally, the design considerations for any antenna, even for atomic ones, can be extracted to factors such as radiation patterns, efficiency, and polarization, allowing them to be seamlessly integrated into existing system models. In this letter, we extract the antenna properties from relevant quantum characteristics, enabling electromagnetic modeling and capacity analysis of Rydberg MIMO systems in both far-field and near-field scenarios. By employing a ray-based method for far-field analysis and the dyadic Green's function for near-field calculation, our results indicate that Rydberg atom-based antenna arrays offer specific advantages over classical dipole-type arrays in single-polarization MIMO communications.

Submitted 13 November, 2024; originally announced November 2024.

Comments: in IEEE Antennas and Wireless Propagation Letters, 2025

arXiv:2410.17556 [pdf, other]

Performance of orthogonal delay-doppler division multiplexing modulation with imperfect channel estimation

Authors: Kehan Huang, Min Qiu, Jun Tong, Jinhong Yuan, Hai Lin

Abstract: The orthogonal delay-Doppler division multiplexing (ODDM) modulation is a recently proposed multi-carrier modulation that features a realizable pulse orthogonal with respect to the delay-Doppler (DD) plane's fine resolutions. In this paper, we investigate the performance of ODDM systems with imperfect channel estimation considering three detectors, namely the message passing algorithm (MPA) detector, iterative maximum-ratio combining (MRC) detector, and successive interference cancellation with minimum mean square error (SIC-MMSE) detector. We derive the post-equalization signal-to-interference-plus-noise ratio (SINR) for MRC and SIC-MMSE and analyze their bit error rate (BER) performance. Based on this analysis, we propose the MRC with subtractive dither (MRC-SD) and soft SIC-MMSE initialized MRC (SSMI-MRC) detector to improve the BER of iterative MRC. Our results demonstrate that soft SIC-MMSE consistently outperforms the other detectors in BER performance under perfect and imperfect CSI. While MRC exhibits a BER floor above $10^{-5}$, MRC-SD effectively lowers the BER with a negligible increase in detection complexity. SSMI-MRC achieves better BER than hard SIC-MMSE with the same detection complexity order. Additionally, we show that MPA has an error floor and is sensitive to imperfect CSI.

Submitted 23 October, 2024; originally announced October 2024.

arXiv:2410.15358 [pdf, ps, other]

A New Adaptive Balanced Augmented Lagrangian Method with Application to ISAC Beamforming Design

Authors: Jiageng Wu, Bo Jiang, Xinxin Li, Ya-Feng Liu, Jianhua Yuan

Abstract: In this paper, we consider a class of convex programming problems with linear equality constraints, which finds broad applications in machine learning and signal processing. We propose a new adaptive balanced augmented Lagrangian (ABAL) method for solving these problems. The proposed ABAL method adaptively selects the stepsize parameter and enjoys a low per-iteration complexity, involving only the computation of a proximal mapping of the objective function and the solution of a linear equation. These features make the proposed method well-suited to large-scale problems. We then custom-apply the ABAL method to solve the ISAC beamforming design problem, which is formulated as a nonlinear semidefinite program in a previous work. This customized application requires careful exploitation of the problem's special structure such as the property that all of its signal-to-interference-and-noise-ratio (SINR) constraints hold with equality at the solution and an efficient computation of the proximal mapping of the objective function. Simulation results demonstrate the efficiency of the proposed ABAL method.

Submitted 20 October, 2024; originally announced October 2024.

Comments: 7 pages, 1 table

arXiv:2410.03682 [pdf, other]

Delay Alignment Modulation with Hybrid Analog/Digital Beamforming for Millimeter Wave and Terahertz Communications

Authors: Jieni Zhang, Yong Zeng, Xiangbin Yu, Shi Jin, Jinhong Yuan, Ying-Chang Liang, Rui Zhang

Abstract: For millimeter wave (mmWave) or Terahertz (THz) communications, by leveraging the high spatial resolution offered by large antenna arrays and the multi-path sparsity of mmWave/THz channels, a novel inter-symbol interference (ISI) mitigation technique called delay alignment modulation (DAM) has been recently proposed. The key ideas of DAM are delay pre-compensation and path-based beamforming. However, existing research on DAM is mainly based on fully digital beamforming, which requires the number of radio frequency (RF) chains to be equal to the number of antennas. This paper proposes the hybrid analog/digital beamforming based DAM, including both fully and partially connected structures. The analog and digital beamforming matrices are designed to achieve performance close to DAM based on fully digital beamforming. While DAM was considered for the path-based channel model with integer delays in the previous work, this paper extends DAM to a more general tap-based model that accounts for fractional path delays. To further reduce the cost of channel estimation and improve the performance for wireless channels with fractional delays, DAM with codebook-based beam alignment and DAM-orthogonal frequency division multiplexing (DAM-OFDM) with hybrid beamforming are proposed. The effectiveness of the proposed techniques is verified by extensive simulation results.

Submitted 20 September, 2024; originally announced October 2024.

arXiv:2409.16920 [pdf, other]

Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models

Authors: Zhichen Han, Tianqi Geng, Hui Feng, Jiahong Yuan, Korin Richmond, Yuanchao Li

Abstract: Utilizing Self-Supervised Learning (SSL) models for Speech Emotion Recognition (SER) has proven effective, yet limited research has explored cross-lingual scenarios. This study presents a comparative analysis between human performance and SSL models, beginning with a layer-wise analysis and an exploration of parameter-efficient fine-tuning strategies in monolingual, cross-lingual, and transfer learning contexts. We further compare the SER ability of models and humans at both utterance- and segment-levels. Additionally, we investigate the impact of dialect on cross-lingual SER through human evaluation. Our findings reveal that models, with appropriate knowledge transfer, can adapt to the target language and achieve performance comparable to native speakers. We also demonstrate the significant effect of dialect on SER for individuals without prior linguistic and paralinguistic background. Moreover, both humans and models exhibit distinct behaviors across different emotions. These results offer new insights into the cross-lingual SER capabilities of SSL models, underscoring both their similarities to and differences from human emotion perception.

Submitted 30 April, 2025; v1 submitted 25 September, 2024; originally announced September 2024.

Comments: Accepted to ICASSP 2025

arXiv:2409.01694 [pdf, other]

A novel and efficient parameter estimation of the Lognormal-Rician turbulence model based on k-Nearest Neighbor and data generation method

Authors: Maoke Miao, Xinyu Zhang, Bo Liu, Rui Yin, Jiantao Yuan, Feng Gao, Xiao-Yu Chen

Abstract: In this paper, we propose a novel and efficient parameter estimator based on $k$-Nearest Neighbor ($k$NN) and data generation method for the Lognormal-Rician turbulence channel. The Kolmogorov-Smirnov (KS) goodness-of-fit statistical tools are employed to investigate the validity of $k$NN approximation under different channel conditions and it is shown that the choice of $k$ plays a significant role in the approximation accuracy. We present several numerical results to illustrate that solving the constructed objective function can provide a reasonable estimate for the actual values. The accuracy of the proposed estimator is investigated in terms of the mean square error. The simulation results show that increasing the number of generation samples by two orders of magnitude does not lead to a significant improvement in estimation performance when solving the optimization problem by the gradient descent algorithm. However, the estimation performance under the genetic algorithm (GA) approaches that of the saddlepoint approximation and expectation-maximization estimators. Therefore, combined with the GA, we demonstrate that the proposed estimator achieves the best tradeoff between the computational complexity and the accuracy.

Submitted 13 February, 2025; v1 submitted 3 September, 2024; originally announced September 2024.

arXiv:2408.05440 [pdf]

doi 10.1109/TIP.2025.3558442

Content-decoupled Contrastive Learning-based Implicit Degradation Modeling for Blind Image Super-Resolution

Authors: Jiang Yuan, Ji Ma, Bo Wang, Weiming Hu

Abstract: Implicit degradation modeling-based blind super-resolution (SR) has attracted increasing attention in the community due to its excellent generalization to complex degradation scenarios and wide application range. How to extract more discriminative degradation representations and fully adapt them to specific image features is the key to this task. In this paper, we propose a new Content-decoupled Contrastive Learning-based blind image super-resolution (CdCL) framework following the typical blind SR pipeline. This framework introduces a negative-free contrastive learning technique for the first time to model the implicit degradation representation, in which a new cyclic shift sampling strategy is designed to ensure decoupling between content features and degradation features from the data perspective, thereby improving the purity and discriminability of the learned implicit degradation space. In addition, we propose a detail-aware implicit degradation adapting module that can better adapt degradation representations to specific LR features by enhancing the basic adaptation unit's perception of image details, significantly reducing the overall SR model complexity. Extensive experiments on synthetic and real data show that our method achieves highly competitive quantitative and qualitative results in various degradation settings while noticeably reducing parameters and computational costs, validating the feasibility of designing practical and lightweight blind SR tools.

Submitted 1 April, 2025; v1 submitted 10 August, 2024; originally announced August 2024.

Report number: TIP-33069-2024

Journal ref: IEEE Transactions on Image Processing (2025)

arXiv:2408.02074 [pdf]

Applying Conditional Generative Adversarial Networks for Imaging Diagnosis

Authors: Haowei Yang, Yuxiang Hu, Shuyao He, Ting Xu, Jiajie Yuan, Xingxin Gu

Abstract: This study introduces an innovative application of Conditional Generative Adversarial Networks (C-GAN) integrated with Stacked Hourglass Networks (SHGN) aimed at enhancing image segmentation, particularly in the challenging environment of medical imaging. We address the problem of overfitting, common in deep learning models applied to complex imaging datasets, by augmenting data through rotation and scaling. A hybrid loss function combining L1 and L2 reconstruction losses, enriched with adversarial training, is introduced to refine segmentation processes in intravascular ultrasound (IVUS) imaging. Our approach is unique in its capacity to accurately delineate distinct regions within medical images, such as tissue boundaries and vascular structures, without extensive reliance on domain-specific knowledge. The algorithm was evaluated using a standard medical image library, showing superior performance metrics compared to existing methods, thereby demonstrating its potential in enhancing automated medical diagnostics through deep learning.
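To make the idea of a hybrid L1/L2 reconstruction loss concrete, here is a minimal sketch that mixes the mean absolute error and the mean squared error. The abstract does not give the paper's actual formulation, so the weighting parameter `alpha`, the function name `hybrid_l1_l2_loss`, and the sample values are assumptions; the adversarial term is omitted.

```python
def hybrid_l1_l2_loss(pred, target, alpha=0.5):
    """Weighted sum of mean absolute (L1) and mean squared (L2) error.

    alpha is a hypothetical mixing weight in [0, 1]; the adversarial term
    mentioned in the abstract is not modeled here.
    """
    n = len(pred)
    l1 = sum(abs(p - t) for p, t in zip(pred, target)) / n
    l2 = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    return alpha * l1 + (1 - alpha) * l2


# Made-up per-pixel segmentation probabilities and ground-truth mask.
pred = [0.0, 1.0, 0.5, 1.0]
target = [0.0, 1.0, 1.0, 0.0]
loss = hybrid_l1_l2_loss(pred, target)
print(loss)
```

Setting `alpha=1.0` recovers a pure L1 loss and `alpha=0.0` a pure L2 loss, which makes the contribution of each term easy to inspect.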

Submitted 17 July, 2024; originally announced August 2024.

arXiv:2407.21514 [pdf]

Wireless Communications in Doubly Selective Channels with Domain Adaptivity

Authors: J. Andrew Zhang, Hongyang Zhang, Kai Wu, Xiaojing Huang, Jinhong Yuan, Y. Jay Guo

Abstract: Wireless communications are significantly impacted by the propagation environment, particularly in doubly selective channels with variations in both time and frequency domains. Orthogonal Time Frequency Space (OTFS) modulation has emerged as a promising solution; however, its high equalization complexity, if performed in the delay-Doppler domain, limits its universal application. This article explores domain-adaptive system design, with an emphasis on adaptive equalization, while also discussing modulation and pilot placement strategies. It investigates the dynamic selection of best-fit domains based on channel conditions to enhance performance across diverse environments. We examine channel domain connections, signal designs, and equalization techniques with domain adaptivity, and highlight future research opportunities.

Submitted 30 October, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

Comments: Magazine article, 7 pages, 4 figures, 2 tables

arXiv:2407.06580 [pdf, other]

Off-grid Channel Estimation for Orthogonal Delay-Doppler Division Multiplexing Using Grid Refinement and Adjustment

Authors: Yaru Shan, Akram Shafie, Jinhong Yuan, Fanggang Wang

Abstract: Orthogonal delay-Doppler (DD) division multiplexing (ODDM) has been recently proposed as a promising multicarrier modulation scheme to tackle Doppler spread in high-mobility environments. Accurate channel estimation is of paramount importance to guarantee reliable communication for the ODDM, especially when the delays and Dopplers of the propagation paths are off-grid. In this paper, we propose a novel grid refinement and adjustment-based sparse Bayesian inference (GRASBI) scheme for DD domain channel estimation. The GRASBI involves first formulating the channel estimation problem as a sparse signal recovery through the introduction of a virtual DD grid. Then, an iterative process is proposed that involves (i) sparse Bayesian learning to estimate the channel parameters and (ii) a novel grid refinement and adjustment process to adjust the virtual grid points. The grid adjustment in GRASBI relies on the maximum likelihood principle to attain the adjustment and utilizes refined grids that have much higher resolution than the virtual grid. Moreover, a low-complexity grid refinement and adjustment-based channel estimation scheme is proposed that provides a good tradeoff between the estimation accuracy and the complexity. Finally, numerical results are provided to demonstrate the accuracy and efficiency of the proposed channel estimation schemes.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.05391 [pdf, other]

Interference Management in MIMO-ISAC Systems: A Transceiver Design Approach

Authors: Yangyang Niu, Zhiqing Wei, Dingyou Ma, Xiaoyu Yang, Huici Wu, Zhiyong Feng, Jianhua Yuan

Abstract: The integrated sensing and communication (ISAC) system under multi-input multi-output (MIMO) architecture achieves dual functionalities of sensing and communication on the same platform by utilizing spatial gain, which provides a feasible paradigm facing spectrum congestion. However, the dual functionalities of sensing and communication operating simultaneously in the same platform bring severe interference in the ISAC systems. Facing this challenge, we propose a joint optimization framework for transmit beamforming and receive filter design for ISAC systems with MIMO architecture. We aim to maximize the signal-to-clutter-plus-noise ratio (SCNR) at the receiver while considering various constraints such as waveform similarity, power budget, and communication performance requirements to ensure the integration of the dual functionalities. In particular, the overall transmit beamforming is refined into sensing beamforming and communication beamforming, and a quadratic transformation (QT) is introduced to relax and convert the complex non-convex optimization objective. An efficient algorithm based on covariance matrix tapers (CMT) is proposed to restructure the clutter covariance matrix considering the mismatched steering vector, thereby improving the robustness of the ISAC transceiver design. Numerical simulations are provided to demonstrate the effectiveness of the proposed algorithm.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2406.18592 [pdf, ps, other]

On the Coexistence of OTFS Modulation with OFDM-based Communication Systems

Authors: Akram Shafie, Jinhong Yuan, Paul Fitzpatrick, Taka Sakurai, Yuting Fang

Abstract: We investigate the coexistence of orthogonal time-frequency space (OTFS) modulation with current fourth- and fifth-generation (4G/5G) communication systems that primarily use orthogonal frequency-division multiplexing (OFDM) waveforms. We first derive the input-output-relation of OTFS in the considered coexisting system. In this derivation, we consider (i) the inclusion of multiple cyclic prefixes (CPs) with unequal lengths to the OTFS signal and (ii) edge carrier unloading (ECU), to account for the impacts of CP length, frame structure, and subcarrier arrangement described in 3GPP standards for 4G/5G systems. Our analysis reveals that the inclusion of multiple CPs to the OTFS signal and ECU lead to the channel response exhibiting spreading effects/leakage along the Doppler and delay dimensions, respectively. Consequently, the effective sampled delay-Doppler (DD) domain channel model for OTFS in coexisting systems may exhibit reduced sparsity. We also show that the effective DD domain channel coefficients for OTFS in coexisting systems are influenced by the unequal lengths of CPs. Subsequently, we propose an interference cancellation-based channel estimation (CE) technique for OTFS in coexisting systems. Through numerical results, we validate our analysis, highlight the importance of not ignoring the unequal lengths of CPs during signal detection, and show the significance of the proposed CE technique.

Submitted 8 June, 2024; originally announced June 2024.

Comments: This paper has been submitted for publication in an IEEE Journal. arXiv admin note: text overlap with arXiv:2311.06850

arXiv:2406.18548 [pdf]

Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis

Authors: Yuxiang Hu, Haowei Yang, Ting Xu, Shuyao He, Jiajie Yuan, Haozhang Deng

Abstract: The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is added to the network for processing. The brain glioma MRI image dataset provided by cancer imaging archives was experimentally verified. A multi-scale segmentation method based on a weighted least squares filter was used to complete the 3D reconstruction of brain tumors. Thus, the accuracy of three-dimensional reconstruction is further improved. Experiments show that the local texture features obtained by the proposed algorithm are similar to those obtained by laser scanning. The algorithm is improved by using the U-Net method and an accuracy of 0.9851 is obtained. This approach significantly enhances the precision of image segmentation and boosts the efficiency of image classification.

Submitted 23 May, 2024; originally announced June 2024.

arXiv:2406.07410 [pdf, other]

Clever Hans Effect Found in Automatic Detection of Alzheimer's Disease through Speech

Authors: Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling

Abstract: We uncover an underlying bias present in the audio recordings produced from the picture description task of the Pitt corpus, the largest publicly accessible database for Alzheimer's Disease (AD) detection research. Even by solely utilizing the silent segments of these audio recordings, we achieve nearly 100% accuracy in AD detection. However, employing the same methods to other datasets and preprocessed Pitt recordings results in typical levels (approximately 80%) of AD detection accuracy. These results demonstrate a Clever Hans effect in AD detection on the Pitt corpus. Our findings emphasize the crucial importance of maintaining vigilance regarding inherent biases in datasets utilized for training deep learning models, and highlight the necessity for a better understanding of the models' performance.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.04776 [pdf, ps, other]

OFDM-Standard Compatible SC-NOFS Waveforms for Low-Latency and Jitter-Tolerance Industrial IoT Communications

Authors: Tongyang Xu, Shuangyang Li, Jinhong Yuan

Abstract: Traditional communications focus on regular and orthogonal signal waveforms for simplified signal processing and improved spectral efficiency. In contrast, the next-generation communications would aim for irregular and non-orthogonal signal waveforms to introduce new capabilities. This work proposes a spectrally efficient irregular Sinc (irSinc) shaping technique, revisiting the traditional Sinc back to 1924, with the aim of enhancing performance in industrial Internet of things (IIoT). In time-critical IIoT applications, low-latency and time-jitter tolerance are two critical factors that significantly impact the performance and reliability. Recognizing the inevitability of latency and jitter in practice, this work aims to propose a waveform technique to mitigate these effects via reducing latency and enhancing the system robustness under time jitter effects. The utilization of irSinc yields a signal with increased spectral efficiency without sacrificing error performance. Integrating the irSinc in a two-stage framework, a single-carrier non-orthogonal frequency shaping (SC-NOFS) waveform is developed, showcasing perfect compatibility with 5G standards, enabling the direct integration of irSinc in existing industrial IoT setups. Through 5G standard signal configuration, our signal achieves faster data transmission within the same spectral bandwidth. Hardware experiments validate an 18% saving in timing resources, leading to either reduced latency or enhanced jitter tolerance.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.02126 [pdf, other]

CityLight: A Universal Model for Coordinated Traffic Signal Control in City-scale Heterogeneous Intersections

Authors: Jinwei Zeng, Chao Yu, Xinyi Yang, Wenxuan Ao, Qianyue Hao, Jian Yuan, Yong Li, Yu Wang, Huazhong Yang

Abstract: The increasingly severe congestion problem in modern cities strengthens the significance of developing city-scale traffic signal control (TSC) methods for traffic efficiency enhancement. While reinforcement learning has been widely explored in TSC, most existing methods still target small-scale optimization and cannot directly scale to the city level due to unbearable resource demand. Only a few of them manage to tackle city-level optimization, namely a thousand-scale optimization, by incorporating parameter-sharing mechanisms, but they have hardly tackled the heterogeneity of intersections and the intricate between-intersection interactions inherent in real-world city road networks. To fill the gap, we target two important challenges in adopting parameter-sharing paradigms to solve TSC: inconsistency of inner state representations for intersections heterogeneous in configuration, scale, and orders of available traffic phases; and intricacy of impacts from neighborhood intersections that have various relative traffic relationships due to inconsistent phase orders and diverse relative positioning. Our method, CityLight, features a universal representation module that not only aligns the state representations of intersections by reindexing their phases based on their semantics and designing heterogeneity-preserving observations, but also encodes the narrowed relative traffic relation types to project the neighborhood intersections onto a uniform relative traffic impact space. We further attentively fuse neighborhood representations based on their competing relations and incorporate neighborhood-integrated rewards to boost coordination. Extensive experiments with hundreds to tens of thousands of intersections validate the surprising effectiveness and generalizability of CityLight, with an overall performance gain of 11.68% and a 22.59% improvement in transfer scenarios in throughput.

Submitted 28 August, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.08295 [pdf, other]

SpeechVerse: A Large-scale Generalizable Audio Language Model

Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sravan Bodapati, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore develop SpeechVerse, a robust multi-task training and curriculum learning framework that combines pre-trained speech and text foundation models via a small set of learnable parameters, while keeping the pre-trained models frozen during training. The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions. We perform extensive benchmarking that includes comparing our model performance against traditional baselines across several datasets and tasks. Furthermore, we evaluate the model's capability for generalized instruction following by testing on out-of-domain datasets, novel prompts, and unseen tasks. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.

Submitted 24 March, 2025; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: Single column, 13 pages

arXiv:2405.08288 [pdf, other]

doi 10.1109/TCOMM.2024.3519545

Orthogonal Delay-Doppler Division Multiplexing Modulation with Tomlinson-Harashima Precoding

Authors: Yiyan Ma, Akram Shafie, Jinhong Yuan, Guoyu Ma, Zhangdui Zhong, Bo Ai

Abstract: The orthogonal delay-Doppler (DD) division multiplexing (ODDM) modulation has been recently proposed as a promising modulation scheme for next-generation communication systems with high mobility. Despite its benefits, ODDM modulation and other DD domain modulation schemes face the challenge of excessive equalization complexity. To address this challenge, we propose time domain Tomlinson-Harashima precoding (THP) for the ODDM transmitter, to make the DD domain single-tap equalizer feasible, thereby reducing the equalization complexity. In our design, we first pre-cancel the inter-symbol interference (ISI) using the linear time-varying (LTV) channel information. Second, different from classical THP designs, we introduce a modified modulo operation with an adaptive modulus, by which the joint DD domain data multiplexing and time domain ISI pre-cancellation can be realized without excessively increasing the bit errors. We then analytically study the losses encountered in this design, namely the power loss, the modulo noise loss, and the modulo signal loss. Based on this analysis, BER lower bounds of the ODDM system with time domain THP are derived when 4-QAM or 16-QAM modulations are adopted for symbol mapping in the DD domain. Finally, through numerical results, we validate our analysis and then demonstrate that the ODDM system with time domain THP is a promising solution to realize better BER performance over LTV channels compared to orthogonal frequency division multiplexing systems with single-tap equalizer and ODDM systems with maximum ratio combining.

Submitted 13 December, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07547 [pdf, other]

Channel Coding Toward 6G: Technical Overview and Outlook

Authors: Mohammad Rowshan, Min Qiu, Yixuan Xie, Xinyi Gu, Jinhong Yuan

Abstract: Channel coding plays a pivotal role in ensuring reliable communication over wireless channels. With the growing need for ultra-reliable communication in emerging wireless use cases, the significance of channel coding has amplified. Furthermore, minimizing decoding latency is crucial for critical-mission applications, while optimizing energy efficiency is paramount for mobile and the Internet of Things (IoT) communications. As the fifth generation (5G) of mobile communications is currently in operation and 5G-advanced is on the horizon, the objective of this paper is to assess prominent channel coding schemes in the context of recent advancements and the anticipated requirements for the sixth generation (6G). In this paper, after considering the potential impact of channel coding on key performance indicators (KPIs) of wireless networks, we review the evolution of mobile communication standards and the organizations involved in the standardization, from the first generation (1G) to the current 5G, highlighting the technologies integral to achieving targeted KPIs such as reliability, data rate, latency, energy efficiency, spectral efficiency, connection density, and traffic capacity. Following this, we delve into the anticipated requirements for potential use cases in 6G. The subsequent sections of the paper focus on a comprehensive review of three primary coding schemes utilized in past generations and their recent advancements: low-density parity-check (LDPC) codes, turbo codes (including convolutional codes), polar codes (alongside Reed-Muller codes). Additionally, we examine alternative coding schemes like Fountain codes and sparse regression codes. Our evaluation includes a comparative analysis of error correction performance and the performance of hardware implementation for these coding schemes, providing insights into their potential and suitability for the upcoming 6G era.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 102 pages, 87 figures, IEEE Open Journal of the Communications Society (invited paper)

arXiv:2404.16253 [pdf, other]

Mitigating Automotive Radar Interference using Onboard Intelligent Reflective Surface

Authors: Shree Prasad Maruthi, Karrthik G. K., Vijaya Krishna A., Mahbub Hassan, Jinhong Yuan

Abstract: The use of automotive radars is gaining popularity as a means to enhance a vehicle's sensing capabilities. However, these radars can suffer from interference caused by transmissions from other radars mounted on nearby vehicles. To address this issue, we investigate the use of an onboard intelligent reflective surface (IRS) to artificially increase a vehicle's effective radar cross section (RCS), or its "electromagnetic visibility." Our proposed method utilizes the IRS's ability to form a coherent reflection of the incident radar waveform back towards the source radar, thereby improving radar performance under interference. We evaluated both passive and active IRS options. Passive IRS, which does not support reflection amplification, was found to be counter-productive and actually decreased the vehicle's effective RCS instead of enhancing it. In contrast, active IRS, which can amplify the reflection power of individual elements, effectively combats all types of automotive radar interference when the reflective elements are configured with a 15-35 dB reflection gain.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 7 pages, 9 Figures

arXiv:2403.14192 [pdf, ps, other]

Fundamentals of Delay-Doppler Communications: Practical Implementation and Extensions to OTFS

Authors: Shuangyang Li, Peter Jung, Weijie Yuan, Zhiqiang Wei, Jinhong Yuan, Baoming Bai, Giuseppe Caire

Abstract: The recently proposed orthogonal time frequency space (OTFS) modulation, which is a typical Delay-Doppler (DD) communication scheme, has attracted significant attention thanks to its appealing performance over doubly-selective channels. In this paper, we present the fundamentals of general DD communications from the viewpoint of the Zak transform. We start our study by constructing DD domain basis functions aligning with the time-frequency (TF)-consistency condition, which are globally quasi-periodic and locally twisted-shifted. We unveil that these features are translated to unique signal structures in both time and frequency, which are beneficial for communication purposes. Then, we focus on the practical implementations of DD Nyquist communications, where we show that rectangular windows achieve perfect DD orthogonality, while truncated periodic signals can obtain sufficient DD orthogonality. Particularly, smoothed rectangular window with excess bandwidth can result in a slightly worse orthogonality but better pulse localization in the DD domain. Furthermore, we present a practical pulse shaping framework for general DD communications and derive the corresponding input-output relation under various shaping pulses. Our numerical results agree with our derivations and also demonstrate advantages of DD communications over conventional orthogonal frequency-division multiplexing (OFDM).

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.10323 [pdf, ps, other]

Joint Optimization for Achieving Covertness in MIMO Over-the-Air Computation Networks

Authors: Junteng Yao, Tuo Wu, Ming Jin, Cunhua Pan, Quanzhong Li, Jinhong Yuan

Abstract: This paper investigates covert data transmission within a multiple-input multiple-output (MIMO) over-the-air computation (AirComp) network, where sensors transmit data to the access point (AP) while guaranteeing covertness to the warden (Willie). Simultaneously, the AP introduces artificial noise (AN) to confuse Willie, meeting the covert requirement. We address the challenge of minimizing mean-square-error (MSE) of the AP, while considering transmit power constraints at both the AP and the sensors, as well as ensuring the covert transmission to Willie with a low detection error probability (DEP). However, obtaining globally optimal solutions for the investigated non-convex problem is challenging due to the interdependence of optimization variables. To tackle this problem, we introduce an exact penalty algorithm and transform the optimization problem into a difference-of-convex (DC) form problem to find a locally optimal solution. Simulation results showcase the superior performance of our proposed scheme in comparison to the benchmark schemes.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.02012 [pdf, other]

OTFS vs OFDM: Which is Superior in Multiuser LEO Satellite Communications

Authors: Yu Liu, Ming Chen, Cunhua Pan, Tantao Gong, Jinhong Yuan, Jiangzhou Wang

Abstract: Orthogonal time frequency space (OTFS) modulation, a delay-Doppler (DD) domain communication scheme exhibiting strong robustness against Doppler shifts, has the potential to be employed in LEO satellite communications. However, the performance comparison with the orthogonal frequency division multiplexing (OFDM) modulation and the resource allocation scheme for multiuser OTFS-based LEO satellite communication systems have rarely been investigated. In this paper, we conduct a performance comparison under various channel conditions between the OTFS and OFDM modulations, encompassing evaluations of sum-rate and bit error ratio (BER). Additionally, we investigate the joint optimal allocation of power and delay-Doppler resource blocks aiming at maximizing sum-rate for multiuser downlink OTFS-based LEO satellite communication systems. Unlike the conventional modulations relying on complex input-output relations within the Time-Frequency (TF) domain, the OTFS modulation exploits both time and frequency diversities, i.e., delay and Doppler shifts remain constant during an OTFS frame, which facilitates a simple DD domain input-output relation for our investigation. We transform the resulting non-convex and combinatorial optimization problem into an equivalent difference of convex problem by decoupling the conditional constraints, and solve the transformed problem via penalty convex-concave procedure algorithm. Simulation results demonstrate that the OTFS modulation is robust to carrier frequency offsets (CFO) caused by high-mobility of LEO satellites, and has superior performance to the OFDM modulation. Moreover, numerical results indicate that our proposed resource allocation scheme has higher sum-rate than existing schemes for the OTFS modulation, such as delay divided multiple access and Doppler divided multiple access, especially in the high signal-to-noise ratio (SNR) regime.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 13 pages, 9 figures

arXiv:2402.12127 [pdf, other]

Rate-Splitting Multiple Access for Transmissive Reconfigurable Intelligent Surface Transceiver Empowered ISAC System

Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Jinhong Yuan, Shanshan Zhang, Zhendong Li, Jun Li

Abstract: In this paper, a novel transmissive reconfigurable intelligent surface (TRIS) transceiver empowered integrated sensing and communications (ISAC) system is proposed for future multi-demand terminals. To address interference management, we implement rate-splitting multiple access (RSMA), where the common stream is independently designed for the sensing service. We introduce the sensing quality of service (QoS) criteria based on this structure and construct an optimization problem with the sensing QoS criteria as the objective function to optimize the sensing stream precoding matrix and the communication stream precoding matrix. Due to the coupling of optimization variables, the formulated problem is a non-convex optimization problem that cannot be solved directly. To tackle this challenging problem, alternating optimization (AO) is utilized to decouple the optimization variables. Specifically, the problem is decoupled into three subproblems about the sensing stream precoding matrix, the communication stream precoding matrix, and the auxiliary variables, which are solved alternately through AO until convergence is reached. For solving the problem, successive convex approximation (SCA) is applied to deal with the sum-rate threshold constraints on communications, and difference-of-convex (DC) programming is utilized to solve rank-one non-convex constraints. Numerical simulation results verify the superiority of the proposed scheme in terms of improving the communication and sensing QoS.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.15164 [pdf, other]

AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations

Authors: Naresh Kumar Devulapally, Sidharth Anand, Sreyasee Das Bhattacharjee, Junsong Yuan, Yu-Ping Chang

Abstract: Analyzing individual emotions during group conversation is crucial in developing intelligent agents capable of natural human-machine interaction. While reliable emotion recognition techniques depend on different modalities (text, audio, video), the inherent heterogeneity between these modalities and the dynamic cross-modal interactions influenced by an individual's unique behavioral patterns make the task of emotion recognition very challenging. This difficulty is compounded in group settings, where the emotion and its temporal evolution are not only influenced by the individual but also by external contexts like audience reaction and context of the ongoing conversation. To meet this challenge, we propose a Multimodal Attention Network that captures cross-modal interactions at various levels of spatial abstraction by jointly learning its interactive bunch of mode-specific Peripheral and Central networks. The proposed MAN injects cross-modal attention via its Peripheral key-value pairs within each layer of a mode-specific Central query network. The resulting cross-attended mode-specific descriptors are then combined using an Adaptive Fusion technique that enables the model to integrate the discriminative and complementary mode-specific data patterns within an instance-specific multimodal descriptor. Given a dialogue represented by a sequence of utterances, the proposed AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level. This helps not only in delivering better classification performance (3-5% improvement in Weighted-F1 and 5-7% improvement in Accuracy) in large-scale public datasets but also helps the users in understanding the reasoning behind each emotion prediction made by the model via its Multimodal Explainability Visualization module.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.11058 [pdf, ps, other]

Low Complexity Turbo SIC-MMSE Detection for Orthogonal Time Frequency Space Modulation

Authors: Qi Li, Jinhong Yuan, Min Qiu, Shuangyang Li, Yixuan Xie

Abstract: Recently, orthogonal time frequency space (OTFS) modulation has garnered considerable attention due to its robustness against doubly-selective wireless channels. In this paper, we propose a low-complexity iterative successive interference cancellation based minimum mean squared error (SIC-MMSE) detection algorithm for zero-padded OTFS (ZP-OTFS) modulation. In the proposed algorithm, signals are detected based on layers processed by multiple SIC-MMSE linear filters for each sub-channel, with interference on the targeted signal layer being successively canceled either by hard or soft information. To reduce the complexity of computing individual layer filter coefficients, we also propose a novel filter coefficients recycling approach in place of generating the exact form of MMSE filter weights. Moreover, we design a joint detection and decoding algorithm for ZP-OTFS to enhance error performance. Compared to the conventional SIC-MMSE detection, our proposed algorithms outperform other linear detectors, e.g., maximal ratio combining (MRC), for ZP-OTFS with up to 3 dB gain while maintaining comparable computation complexity.

Submitted 19 January, 2024; originally announced January 2024.

Comments: 15 pages, 12 figures, accepted by IEEE Transactions on Communications

arXiv:2401.01433 [pdf, other]

Multiple Access Techniques for Intelligent and Multi-Functional 6G: Tutorial, Survey, and Outlook

Authors: Bruno Clerckx, Yijie Mao, Zhaohui Yang, Mingzhe Chen, Ahmed Alkhateeb, Liang Liu, Min Qiu, Jinhong Yuan, Vincent W. S. Wong, Juan Montojo

Abstract: Multiple access (MA) is a crucial part of any wireless system and refers to techniques that make use of the resource dimensions to serve multiple users/devices/machines/services, ideally in the most efficient way. Given the needs of multi-functional wireless networks for integrated communications, sensing, localization, and computing, coupled with the surge of machine learning / artificial intelligence (AI) in wireless networks, MA techniques are expected to experience a paradigm shift in 6G and beyond. In this paper, we provide a tutorial, survey and outlook of past, emerging and future MA techniques and pay particular attention to how wireless network intelligence and multi-functionality will lead to a re-thinking of those techniques. The paper starts with an overview of orthogonal, physical layer multicasting, space domain, power domain, rate-splitting, code domain MAs, and other domains, and highlights the importance of researching universal multiple access to shrink instead of grow the knowledge tree of MA schemes by providing a unified understanding of MA schemes across all resource dimensions. It then moves on to rethinking MA schemes in the era of wireless network intelligence, covering AI for MA such as AI-empowered resource allocation, optimization, channel estimation, receiver designs, user behavior predictions, and MA for AI such as federated learning/edge intelligence and over the air computation. We then discuss MA for network multi-functionality and the interplay between MA and integrated sensing, localization, and communications. We finish with studying MA for emerging intelligent applications before presenting a roadmap toward 6G standardization. We also point out numerous directions that are promising for future research.

Submitted 2 January, 2024; originally announced January 2024.

Comments: submitted for publication in Proceedings of the IEEE

arXiv:2311.15556 [pdf, other]

PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images

Authors: Jiquan Yuan, Xinyan Cao, Changjin Li, Fanyi Yang, Jinlong Lin, Xixin Cao

Abstract: As image generation technology advances, AI-based image generation has been applied in various fields and Artificial Intelligence Generated Content (AIGC) has garnered widespread attention. However, the development of AI-based image generative models also brings new problems and challenges. A significant challenge is that AI-generated images (AIGI) may exhibit unique distortions compared to natural images, and not all generated images meet the requirements of the real world. Therefore, it is of great significance to evaluate AIGIs more comprehensively. Although previous work has established several human perception-based AIGC image quality assessment (AIGCIQA) databases for text-generated images, AI image generation technology includes scenarios such as text-to-image and image-to-image, and assessing only the images generated by text-to-image models is insufficient. To address this issue, we establish a human perception-based image-to-image AIGCIQA database, named PKU-I2IQA. We conduct a well-organized subjective experiment to collect quality labels for AIGIs and then conduct a comprehensive analysis of the PKU-I2IQA database. Furthermore, we have proposed two benchmark models: NR-AIGCIQA based on the no-reference image quality assessment method and FR-AIGCIQA based on the full-reference image quality assessment method. Finally, leveraging this database, we conduct benchmark experiments and compare the performance of the proposed benchmark models. The PKU-I2IQA database and benchmarks will be released at https://github.com/jiquan123/I2IQA to facilitate future research.

Submitted 29 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 18 pages

arXiv:2311.13787 [pdf, other]

A Fast Power Spectrum Sensing Solution for Generalized Coprime Sampling

Authors: Kaili Jiang, Dechang Wang, Kailun Tian, Hancong Feng, Yuxin Zhao, Junyu Yuan, Bin Tang

Abstract: With the growing scarcity of spectrum resources, wideband spectrum sensing is required to process a prohibitive volume of data at a high sampling rate. For some applications, spectrum estimation only requires second-order statistics. For this case, a fast power spectrum sensing solution is proposed based on generalized coprime sampling. By exploring the inherent structure of the sensing vector, the autocorrelation sequence of inputs can be reconstructed from sub-Nyquist samples by only utilizing the parallel Fourier transform and simple multiplication operations. Thus, it takes less time than the state-of-the-art methods while maintaining the same performance, and it achieves higher performance than the existing methods within the same execution time, without the need for pre-estimating the number of inputs. Furthermore, model mismatch has only a minor impact on the estimation performance, which allows for more efficient use of the spectrum resource in a distributed swarm scenario. Simulation results demonstrate the low complexity in sampling and computation, making it a more practical solution for real-time and distributed wideband spectrum sensing applications.

Submitted 22 November, 2023; originally announced November 2023.

Showing 1–50 of 174 results for author: Yuan, J