-
A Scalable Factorization Approach for High-Order Structured Tensor Recovery
Authors:
Zhen Qin,
Michael B. Wakin,
Zhihui Zhu
Abstract:
Tensor decompositions, which represent an $N$-order tensor using approximately $N$ factors of much smaller dimensions, can significantly reduce the number of parameters. This is particularly beneficial for high-order tensors, as the number of entries in a tensor grows exponentially with the order. Consequently, they are widely used in signal recovery and data analysis across domains such as signal processing, machine learning, and quantum physics. A computationally and memory-efficient approach to these problems is to optimize directly over the factors using local search algorithms such as gradient descent, a strategy known as the factorization approach in matrix and tensor optimization. However, the resulting optimization problems are highly nonconvex due to the multiplicative interactions between factors, posing significant challenges for convergence analysis and recovery guarantees.
In this paper, we present a unified framework for the factorization approach to solving various tensor decomposition problems. Specifically, by leveraging the canonical form of tensor decompositions--where most factors are constrained to be orthonormal to mitigate scaling ambiguity--we apply Riemannian gradient descent (RGD) to optimize these orthonormal factors on the Stiefel manifold. Under a mild condition on the loss function, we establish a Riemannian regularity condition for the factorized objective and prove that RGD converges to the ground-truth tensor at a linear rate when properly initialized. Notably, both the initialization requirement and the convergence rate scale polynomially rather than exponentially with $N$, improving upon existing results for Tucker and tensor-train format tensors.
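Below is a minimal NumPy sketch of one Riemannian gradient step on the Stiefel manifold, the kind of update that RGD applies to the orthonormal factors. It assumes the embedded (Euclidean) metric and a polar retraction. The objective is a hypothetical toy, f(X) = -0.5 tr(X^T A X), standing in for the paper's tensor loss, and the step size is illustrative rather than the paper's.

```python
import numpy as np


def stiefel_rgd_step(X, euclid_grad, step):
    """One Riemannian gradient step on the Stiefel manifold {X : X^T X = I}."""
    # Tangent-space projection (embedded metric): G - X sym(X^T G).
    XtG = X.T @ euclid_grad
    riem_grad = euclid_grad - X @ ((XtG + XtG.T) / 2)
    # Step along the negative Riemannian gradient, then apply the polar
    # retraction U V^T (from a thin SVD) to return to the manifold.
    U, _, Vt = np.linalg.svd(X - step * riem_grad, full_matrices=False)
    return U @ Vt


def toy_loss(X, A):
    # Hypothetical objective standing in for the tensor loss:
    # f(X) = -0.5 * tr(X^T A X), whose Euclidean gradient is -A X.
    return -0.5 * np.trace(X.T @ A @ X)


rng = np.random.default_rng(0)
n, r = 8, 2
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
X0, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal initialization
X = X0
step = 0.1 / np.linalg.norm(A, 2)  # illustrative step size
for _ in range(500):
    X = stiefel_rgd_step(X, -A @ X, step)
print(toy_loss(X0, A), toy_loss(X, A))
```

The polar retraction keeps every iterate exactly orthonormal, which is the property the canonical form relies on to remove scaling ambiguity.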
Submitted 19 June, 2025;
originally announced June 2025.
-
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
Authors:
Helin Wang,
Jiarui Hai,
Dading Chong,
Karan Thakkar,
Tiantian Feng,
Dongchao Yang,
Junhyeok Lee,
Laureano Moro Velazquez,
Jesus Villalba,
Zengyi Qin,
Shrikanth Narayanan,
Mounya Elhiali,
Najim Dehak
Abstract:
Recent advancements in generative artificial intelligence have significantly transformed the field of style-captioned text-to-speech synthesis (CapTTS). However, adapting CapTTS to real-world applications remains challenging due to the lack of standardized, comprehensive datasets and limited research on downstream tasks built upon CapTTS. To address these gaps, we introduce CapSpeech, a new benchmark designed for a series of CapTTS-related tasks, including style-captioned text-to-speech synthesis with sound events (CapTTS-SE), accent-captioned TTS (AccCapTTS), emotion-captioned TTS (EmoCapTTS), and text-to-speech synthesis for chat agent (AgentTTS). CapSpeech comprises over 10 million machine-annotated audio-caption pairs and nearly 0.36 million human-annotated audio-caption pairs. In addition, we introduce two new datasets collected and recorded by a professional voice actor and experienced audio engineers, specifically for the AgentTTS and CapTTS-SE tasks. Alongside the datasets, we conduct comprehensive experiments using both autoregressive and non-autoregressive models on CapSpeech. Our results demonstrate high-fidelity and highly intelligible speech synthesis across a diverse range of speaking styles. To the best of our knowledge, CapSpeech is the largest available dataset offering comprehensive annotations for CapTTS-related tasks. The experiments and findings further provide valuable insights into the challenges of developing CapTTS systems.
Submitted 3 June, 2025;
originally announced June 2025.
-
MAAM: A Lightweight Multi-Agent Aggregation Module for Efficient Image Classification Based on the MindSpore Framework
Authors:
Zhenkai Qin,
Feng Zhu,
Huan Zeng,
Xunyi Nong
Abstract:
The demand for lightweight models in image classification tasks under resource-constrained environments necessitates a balance between computational efficiency and robust feature representation. Traditional attention mechanisms, despite their strong feature modeling capability, often struggle with high computational complexity and structural rigidity, limiting their applicability in scenarios with limited computational resources (e.g., edge devices or real-time systems). To address this, we propose the Multi-Agent Aggregation Module (MAAM), a lightweight attention architecture integrated with the MindSpore framework. MAAM employs three parallel agent branches with independently parameterized operations to extract heterogeneous features, adaptively fused via learnable scalar weights, and refined through a convolutional compression layer. Leveraging MindSpore's dynamic computational graph and operator fusion, MAAM achieves 87.0% accuracy on the CIFAR-10 dataset, significantly outperforming conventional CNN (58.3%) and MLP (49.6%) models, while improving training efficiency by 30%. Ablation studies confirm the critical role of agent attention (accuracy drops to 32.0% if removed) and compression modules (25.5% if omitted), validating their necessity for maintaining discriminative feature learning. The framework's hardware acceleration capabilities and minimal memory footprint further demonstrate its practicality, offering a deployable solution for image classification in resource-constrained scenarios without compromising accuracy.
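The fusion pattern described above (three parallel branches, learnable scalar fusion weights, then a convolutional compression layer) can be sketched in NumPy. This is not the authors' MindSpore code. The branch operations are hypothetical stand-ins (a 1x1 convolution followed by ReLU), the scalars are used unnormalized, and the compression layer is assumed to be a 1x1 convolution.

```python
import numpy as np


def conv1x1(x, weight):
    # x: (C_in, H, W); weight: (C_out, C_in). A 1x1 convolution is the same
    # linear map over channels applied at every pixel.
    return np.einsum("oc,chw->ohw", weight, x)


def maam_like_block(x, branch_weights, alphas, compress_weight):
    """Hypothetical MAAM-style block: three branches, scalar fusion, compression."""
    # Placeholder branches: each is a 1x1 convolution followed by ReLU.
    branches = [np.maximum(conv1x1(x, w), 0.0) for w in branch_weights]
    # Fuse with scalar weights (learnable in training, fixed here).
    fused = sum(a * b for a, b in zip(alphas, branches))
    # Convolutional compression to fewer output channels.
    return conv1x1(fused, compress_weight)


rng = np.random.default_rng(0)
C = 16
x = rng.standard_normal((C, 8, 8))
branch_weights = [rng.standard_normal((C, C)) for _ in range(3)]
compress_weight = rng.standard_normal((4, C))
out = maam_like_block(x, branch_weights, [0.5, 0.3, 0.2], compress_weight)
print(out.shape)  # (4, 8, 8)
```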
Submitted 18 April, 2025;
originally announced April 2025.
-
IRS Assisted Decentralized Learning for Wideband Spectrum Sensing
Authors:
Sicheng Liu,
Qun Wang,
Zhuwei Qin,
Weishan Zhang,
Jingyi Wang,
Xiang Ma
Abstract:
The increasing demand for reliable connectivity in industrial environments necessitates effective spectrum utilization strategies, especially in the context of shared spectrum bands.
However, the dynamic spectrum-sharing mechanisms often lead to significant interference and critical failures, creating a trade-off between spectrum scarcity and under-utilization.
This paper addresses these challenges by proposing a novel Intelligent Reflecting Surface (IRS)-assisted spectrum sensing framework integrated with decentralized deep learning.
The proposed model overcomes partial observation constraints and minimizes communication overhead while leveraging IRS technology to enhance spectrum sensing accuracy.
Through comprehensive simulations, the framework demonstrates its ability to monitor wideband spectrum occupancy effectively, even under challenging signal-to-noise ratio (SNR) conditions.
This approach offers a scalable and robust solution for spectrum management in next-generation wireless networks.
Submitted 2 April, 2025;
originally announced April 2025.
-
Multi-Task Semantic Communications via Large Models
Authors:
Wanli Ni,
Zhijin Qin,
Haofeng Sun,
Xiaoming Tao,
Zhu Han
Abstract:
Artificial intelligence (AI) promises to revolutionize the design, optimization and management of next-generation communication systems. In this article, we explore the integration of large AI models (LAMs) into semantic communications (SemCom) by leveraging their multi-modal data processing and generation capabilities. Although LAMs bring unprecedented abilities to extract semantics from raw data, this integration entails multifaceted challenges including high resource demands, model complexity, and the need for adaptability across diverse modalities and tasks. To overcome these challenges, we propose a LAM-based multi-task SemCom (MTSC) architecture, which includes an adaptive model compression strategy and a federated split fine-tuning approach to facilitate the efficient deployment of LAM-based semantic models in resource-limited networks. Furthermore, a retrieval-augmented generation scheme is implemented to synthesize the most recent local and global knowledge bases to enhance the accuracy of semantic extraction and content generation, thereby improving the inference performance. Finally, simulation results demonstrate the efficacy of the proposed LAM-based MTSC architecture, highlighting the performance enhancements across various downstream tasks under varying channel conditions.
Submitted 27 March, 2025;
originally announced March 2025.
-
Estimating Control Barriers from Offline Data
Authors:
Hongzhan Yu,
Seth Farrell,
Ryo Yoshimitsu,
Zhizhen Qin,
Henrik I. Christensen,
Sicun Gao
Abstract:
Learning-based methods for constructing control barrier functions (CBFs) are gaining popularity for ensuring safe robot control. A major limitation of existing methods is their reliance on extensive sampling over the state space or online system interaction in simulation. In this work we propose a novel framework for learning neural CBFs through a fixed, sparsely-labeled dataset collected prior to training. Our approach introduces new annotation techniques based on out-of-distribution analysis, enabling efficient knowledge propagation from the limited labeled data to the unlabeled data. We also eliminate the dependency on a high-performance expert controller, and allow multiple sub-optimal policies or even manual control during data collection. We evaluate the proposed method on real-world platforms. With a limited amount of offline data, it achieves state-of-the-art performance for dynamic obstacle avoidance, demonstrating statistically safer and less conservative maneuvers compared to existing methods.
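For readers unfamiliar with how a barrier is fitted to labeled data, here is a sketch of hinge-style penalties often used when training neural CBFs. Safe states should have h(x) > 0, unsafe states should have h(x) < 0, and recorded transitions should satisfy a discrete-time barrier condition. The margin, the decay rate `gamma`, and the function names are assumptions. The paper's out-of-distribution annotation scheme is not reproduced here.

```python
def relu(v):
    return max(0.0, v)


def cbf_hinge_loss(h, safe, unsafe, transitions, gamma=0.1, margin=0.01):
    """Generic hinge penalties for a candidate barrier h (hypothetical helper).

    h: callable mapping a state to a scalar barrier value.
    safe / unsafe: lists of labeled states.
    transitions: list of (x, x_next) pairs taken from offline data.
    """
    loss = 0.0
    # Safe states should satisfy h(x) >= margin.
    loss += sum(relu(margin - h(x)) for x in safe)
    # Unsafe states should satisfy h(x) <= -margin.
    loss += sum(relu(margin + h(x)) for x in unsafe)
    # Discrete-time barrier condition: h(x_next) >= (1 - gamma) * h(x).
    loss += sum(relu((1 - gamma) * h(x) - h(xn)) for x, xn in transitions)
    return loss


def h_good(x):
    # Toy 1-D barrier: safe inside |x| < 1.
    return 1.0 - x * x


safe = [0.0, 0.5]
unsafe = [1.5, -2.0]
transitions = [(0.5, 0.4)]
print(cbf_hinge_loss(h_good, safe, unsafe, transitions))  # 0.0
```

A zero loss means the candidate h satisfies every labeled constraint; in a learned setting, h would be a neural network and this loss would be minimized by gradient descent.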
Submitted 20 February, 2025;
originally announced March 2025.
-
CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking
Authors:
Yiming Li,
Kaiying Yan,
Shuo Shao,
Tongqing Zhai,
Shu-Tao Xia,
Zhan Qin,
Dacheng Tao
Abstract:
With the increasing adoption of deep learning in speaker verification, large-scale speech datasets have become valuable intellectual property. To audit and prevent the unauthorized usage of these valuable released datasets, especially in commercial or open-source scenarios, we propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW), enabling dataset owners to determine whether a suspicious third-party model has been trained on a protected dataset under a black-box setting. The CBW method consists of two key stages: dataset watermarking and ownership verification. During watermarking, we implant multiple trigger patterns in the dataset to make similar samples (measured by their feature similarities) close to the same trigger while dissimilar samples are near different triggers. This ensures that any model trained on the watermarked dataset exhibits specific misclassification behaviors when exposed to trigger-embedded inputs. To verify dataset ownership, we design a hypothesis-test-based framework that statistically evaluates whether a suspicious model exhibits the expected backdoor behavior. We conduct extensive experiments on benchmark datasets, verifying the effectiveness and robustness of our method against potential adaptive attacks. The code for reproducing main experiments is available at https://github.com/Radiant0726/CBW
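The abstract does not specify the test statistic, so the following is only a generic stand-in. It runs an exact one-sided binomial test of whether a suspicious model produces the watermark behavior on trigger-embedded queries more often than a chance rate would explain. `chance_rate`, `alpha`, and the function names are hypothetical.

```python
from math import comb


def binomial_tail_pvalue(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): exact one-sided binomial test."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def verify_ownership(hits, trials, chance_rate, alpha=0.01):
    """Hypothetical decision rule for dataset ownership verification.

    hits: trigger-embedded queries that produced the expected watermark
    prediction; chance_rate: assumed rate for a model NOT trained on the
    watermarked dataset. Returns (flagged, p_value).
    """
    p = binomial_tail_pvalue(hits, trials, chance_rate)
    return p < alpha, p


flagged, p = verify_ownership(60, 100, 0.1)
print(flagged, p)
```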
Submitted 5 April, 2025; v1 submitted 1 March, 2025;
originally announced March 2025.
-
SpikACom: A Neuromorphic Computing Framework for Green Communications
Authors:
Yanzhen Liu,
Zhijin Qin,
Yongxu Zhu,
Geoffrey Ye Li
Abstract:
The ever-growing power consumption of wireless communication systems necessitates more energy-efficient algorithms. This paper introduces SpikACom ({Spik}ing {A}daptive {Com}munication), a neuromorphic computing-based framework for power-intensive wireless communication tasks. SpikACom leverages brain-inspired spiking neural networks (SNNs) for efficient signal processing. It is designed for dynamic wireless environments, helping to mitigate catastrophic forgetting and facilitate adaptation to new circumstances. Moreover, SpikACom is customizable, allowing flexible integration of domain knowledge to enhance its interpretability and efficacy. We validate its performance on fundamental wireless communication tasks, including task-oriented semantic communication, multiple-input multiple-output (MIMO) beamforming, and orthogonal frequency-division multiplexing (OFDM) channel estimation. The simulation results show that SpikACom significantly reduces power consumption while matching or exceeding the performance of conventional algorithms. This study highlights the potential of SNNs for enabling greener and smarter wireless communication systems.
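SNNs such as those used in SpikACom are built from spiking neurons. A textbook discrete-time leaky integrate-and-fire (LIF) neuron, sketched below, shows the basic mechanism: the membrane potential leaks, integrates input, and emits a binary spike when it crosses a threshold. The leak factor `beta`, the threshold, and the hard reset are illustrative choices, not necessarily the neuron model used in the paper.

```python
def lif_neuron(inputs, beta=0.9, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron with reset to zero."""
    v = 0.0
    spikes = []
    for current in inputs:
        v = beta * v + current  # leak, then integrate the input current
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


print(lif_neuron([0.3] * 8))  # [0, 0, 0, 1, 0, 0, 0, 1]
```

Because information is carried by sparse binary events rather than dense activations, neuromorphic hardware can skip work when no spike occurs, which is the source of the energy savings SNNs aim for.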
Submitted 24 February, 2025;
originally announced February 2025.
-
Adaptive Sampling and Joint Semantic-Channel Coding under Dynamic Channel Environment
Authors:
Zhiyuan Qi,
Yulong Feng,
Zhijin Qin
Abstract:
Deep learning enabled semantic communications are attracting extensive attention. However, most works ignore the data acquisition process and suffer from robustness issues in dynamic channel environments. In this paper, we propose an adaptive joint sampling-semantic-channel coding (Adaptive-JSSCC) framework. Specifically, we propose a semantic-aware sampling and reconstruction method to optimize the number of samples dynamically for each region of the images. According to semantic significance, we optimize the sampling matrix for each region individually and obtain a semantic sampling ratio distribution map that is shared with the receiver. Guided by this map, high-quality reconstruction is achieved. Meanwhile, an attention-based channel adaptive module (ACAM) is designed to overcome the neural network model mismatch between the training and testing channel environments during sampling-reconstruction and encoding-decoding. To this end, the signal-to-noise ratio (SNR) is employed as an extra input parameter to integrate and reorganize intermediate features. Simulation results show that the proposed Adaptive-JSSCC effectively reduces the amount of data acquisition without degrading the reconstruction performance compared to the state of the art, and it is highly adaptable and adjustable to dynamic channel environments.
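One simple way to turn per-region significance scores into a sampling ratio map under a fixed measurement budget is proportional allocation with largest-remainder rounding, sketched below. This rule and the function name are assumptions for illustration. The paper optimizes the sampling matrices themselves, which this sketch does not attempt.

```python
import math


def allocate_samples(budget, significance):
    """Hypothetical rule: split a total sample budget across regions in
    proportion to their semantic significance (largest-remainder rounding)."""
    total = sum(significance)
    raw = [budget * s / total for s in significance]
    counts = [math.floor(r) for r in raw]
    leftover = budget - sum(counts)
    # Give the remaining samples to the regions with the largest fractional
    # parts; sorted() is stable, so ties keep the original region order.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


print(allocate_samples(10, [1, 1, 2]))  # [3, 2, 5]
```

Dividing each count by its region's size would then give the per-region sampling ratio that a map of this kind records.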
Submitted 10 February, 2025;
originally announced February 2025.
-
Predictive Lagrangian Optimization for Constrained Reinforcement Learning
Authors:
Tianqi Zhang,
Puzhen Yuan,
Guojian Zhan,
Ziyu Lin,
Yao Lyu,
Zhenzhi Qin,
Jingliang Duan,
Liping Zhang,
Shengbo Eben Li
Abstract:
Constrained optimization is common in reinforcement learning for addressing complex control tasks. From a dynamical-systems perspective, iteratively solving a constrained optimization problem can be framed as the temporal evolution of a feedback control system. Classical constrained optimization methods, such as penalty and Lagrangian approaches, inherently use proportional and integral feedback controllers. In this paper, we propose a more generic equivalence framework that connects constrained optimization with feedback control systems, with the aim of developing more effective constrained RL algorithms. First, we define each step of the system evolution as determining the Lagrange multiplier by solving a multiplier feedback optimal control problem (MFOCP). In this problem, the control input is the multiplier, the state is the policy parameters, the dynamics are described by policy gradient descent, and the objective is to minimize constraint violations. Then, we introduce a multiplier guided policy learning (MGPL) module to update the policy parameters. We prove that the resulting optimal policy, obtained by alternating between MFOCP and MGPL, aligns with the solution of the primal constrained RL problem, thereby establishing our equivalence framework. Furthermore, we point out that the existing PID Lagrangian is merely one special case within our framework that uses a PID controller. The framework also accommodates various other feedback controllers, facilitating the development of new algorithms. As a representative example, we employ model predictive control (MPC) as the feedback controller and propose a new algorithm called predictive Lagrangian optimization (PLO). Numerical experiments demonstrate its superiority over the PID Lagrangian method, achieving up to a 7.2% larger feasible region with a comparable average reward.
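To make the feedback-control view concrete, the sketch below implements one common form of the PID Lagrangian baseline that the framework generalizes. The multiplier is the nonnegative output of a PID controller acting on the constraint violation. The clamping choices, gains, and class name are assumptions. PLO itself replaces this controller with MPC, which is not shown.

```python
class PIDLagrangian:
    """Hypothetical PID-controlled Lagrange multiplier.

    error = constraint_cost - limit; the returned multiplier is kept >= 0.
    """

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_cost = None

    def update(self, constraint_cost, limit):
        error = constraint_cost - limit
        # Integral term, clamped at zero to avoid negative wind-up.
        self.integral = max(0.0, self.integral + error)
        # Derivative term on the constraint cost, counting only increases.
        if self.prev_cost is None:
            deriv = 0.0
        else:
            deriv = max(0.0, constraint_cost - self.prev_cost)
        self.prev_cost = constraint_cost
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * deriv)


pid = PIDLagrangian(kp=0.0, ki=0.5, kd=0.0)
multipliers = [pid.update(constraint_cost=2.0, limit=1.0) for _ in range(3)]
print(multipliers)  # [0.5, 1.0, 1.5]
```

With only the integral gain active, a constant violation makes the multiplier grow linearly, which is exactly the classical Lagrangian (gradient-ascent) update viewed as an integral controller.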
Submitted 25 January, 2025;
originally announced January 2025.
-
Large Model Empowered Streaming Speech Semantic Communications
Authors:
Zhenzi Weng,
Zhijin Qin,
Geoffrey Ye Li
Abstract:
In this paper, we introduce a large model-empowered streaming semantic communication system for speech transmission across various languages, named LSSC-ST. Specifically, we devise an edge-device collaborative semantic communication architecture by offloading the intricate semantic extraction and channel coding modules to edge servers, thereby reducing the computational burden on local devices. To support multilingual speech transmission, pre-trained large speech models are utilized to learn unified semantic features from speech in different languages, breaking the constraint of a single input language and enhancing the practicality of the LSSC-ST. Moreover, the input speech is sequentially streamed into the developed system as short speech segments, which enables low transmission latency without degrading the quality of the produced speech. A novel dynamic speech segmentation algorithm is proposed to further reduce the transmission latency by adaptively adjusting the duration of speech segments. According to simulation results, the LSSC-ST provides more accurate speech transmission and operates in a streaming manner with lower latency compared to existing non-streaming semantic communication systems.
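The abstract does not describe the segmentation rule, so the following is a hypothetical energy-based sketch of adaptive segmentation. A segment is closed at the first low-energy frame once a minimum length is reached, or forcibly at a maximum length. The threshold and length bounds are illustrative parameters, and the frame energies are assumed to be precomputed.

```python
def segment_stream(energies, threshold, min_len, max_len):
    """Hypothetical adaptive segmentation over per-frame energies.

    Returns half-open (start, end) frame index pairs.
    """
    segments = []
    start = 0
    for i, energy in enumerate(energies):
        length = i - start + 1
        # Cut at a quiet frame once the segment is long enough,
        # or unconditionally when it reaches the maximum length.
        if (length >= min_len and energy < threshold) or length >= max_len:
            segments.append((start, i + 1))
            start = i + 1
    if start < len(energies):
        segments.append((start, len(energies)))  # flush the trailing frames
    return segments


print(segment_stream([5, 5, 0, 5, 5, 5, 5, 5, 0], threshold=1, min_len=2, max_len=4))
# [(0, 3), (3, 7), (7, 9)]
```

Shorter segments can be sent sooner, so a rule of this kind trades segment length (context for the model) against latency.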
Submitted 21 February, 2025; v1 submitted 10 January, 2025;
originally announced January 2025.
-
Optimal Error Analysis of Channel Estimation for IRS-assisted MIMO Systems
Authors:
Zhen Qin,
Zhihui Zhu
Abstract:
As intelligent reflecting surface (IRS) has emerged as a new and promising technology capable of configuring the wireless environment favorably, channel estimation for IRS-assisted multiple-input multiple-output (MIMO) systems has garnered extensive attention in recent years. While various algorithms have been proposed to address this challenge, there is a lack of rigorous theoretical error analysis. This paper aims to address this gap by providing theoretical guarantees in terms of stable recovery of channel matrices from noisy measurements. We begin by establishing the equivalence between IRS-assisted MIMO systems and a compact tensor train (TT)-based tensor-on-tensor (ToT) regression. Building on this equivalence, we then investigate the restricted isometry property (RIP) for complex-valued subgaussian measurements. Our analysis reveals that successful recovery hinges on the relationship between the number of user terminals (in the uplink scenario) or base stations (in the downlink scenario) and the number of time slots during which channel matrices remain invariant. Utilizing the RIP condition, we analyze the theoretical recovery error for the solution to a constrained least-squares optimization problem, including an upper error bound and a minimax lower bound, demonstrating that the error decreases inversely with the number of time slots and increases proportionally with the number of unknown elements in the channel matrices. In addition, we extend our error analysis to two more specialized IRS-assisted MIMO systems, incorporating low-rank channel matrices or an unknown IRS. Furthermore, we explore a multi-hop IRS scheme and analyze the corresponding recovery errors. Finally, we introduce and implement two nonconvex optimization algorithms, alternating least squares and alternating gradient descent, to validate our conclusions through simulations.
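The recovery problem studied here is a constrained least-squares fit. As a much simpler point of reference, the sketch below shows the ordinary least-squares channel estimate for a plain MIMO model Y = HX + N with known pilots X. It ignores the IRS, the tensor-train structure, and the constraint, so it illustrates only the least-squares principle, not the paper's estimator. The function name and dimensions are illustrative.

```python
import numpy as np


def ls_channel_estimate(Y, X):
    """Least-squares H minimizing ||Y - H X||_F, for Y = H X + N.

    Y: (n_rx, T) received samples; X: (n_tx, T) known pilots, T >= n_tx.
    Solves the transposed system X^T H^T = Y^T with numpy's lstsq.
    """
    return np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T


rng = np.random.default_rng(1)
n_rx, n_tx, T = 4, 3, 16
H = rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))
X = rng.standard_normal((n_tx, T)) + 1j * rng.standard_normal((n_tx, T))
noise = 0.01 * (rng.standard_normal((n_rx, T)) + 1j * rng.standard_normal((n_rx, T)))
H_hat = ls_channel_estimate(H @ X + noise, X)
print(np.linalg.norm(H_hat - H) / np.linalg.norm(H))  # small relative error
```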
Submitted 21 December, 2024;
originally announced December 2024.
-
On Privacy, Security, and Trustworthiness in Distributed Wireless Large AI Models (WLAM)
Authors:
Zhaohui Yang,
Wei Xu,
Le Liang,
Yuanhao Cui,
Zhijin Qin,
Merouane Debbah
Abstract:
Combining wireless communication with large artificial intelligence (AI) models can open up a myriad of novel application scenarios. In sixth generation (6G) networks, ubiquitous communication and computing resources allow large AI models to deliver democratized AI services that enable real-time applications such as autonomous vehicles, smart cities, and Internet of Things (IoT) ecosystems. However, security considerations and constraints on sustainable communication resources limit the deployment of large AI models over distributed wireless networks. This paper provides a comprehensive overview of privacy, security, and trustworthiness for distributed wireless large AI models (WLAM). In particular, a detailed privacy and security analysis for distributed WLAM is first presented. The classifications and theoretical findings about privacy and security in distributed WLAM are discussed. Then, the trustworthiness and ethics of implementing distributed WLAM are described. Finally, the comprehensive applications of distributed WLAM are presented in the context of electromagnetic signal processing.
Submitted 4 December, 2024; v1 submitted 3 December, 2024;
originally announced December 2024.
-
Optimal Allocation of Pauli Measurements for Low-rank Quantum State Tomography
Authors:
Zhen Qin,
Casey Jameson,
Zhexuan Gong,
Michael B. Wakin,
Zhihui Zhu
Abstract:
The process of reconstructing quantum states from experimental measurements, accomplished through quantum state tomography (QST), plays a crucial role in verifying and benchmarking quantum devices. A key challenge of QST is to understand how the accuracy of the reconstruction depends on the number of state copies used in the measurements. When multiple measurement settings are used, the total number of state copies is the number of measurement settings multiplied by the number of repeated measurements for each setting. Due to statistical noise intrinsic to quantum measurements, a large number of repeated measurements is often used in practice. However, recent studies have shown that even with single-sample measurements, where only one measurement sample is obtained for each measurement setting, high-accuracy QST can still be achieved with a sufficiently large number of different measurement settings. In this paper, we establish a theoretical understanding of the trade-off between the number of measurement settings and the number of repeated measurements per setting in QST. Our focus is primarily on low-rank density matrix recovery using Pauli measurements. We delve into the global landscape underlying the low-rank QST problem and demonstrate that the joint consideration of measurement settings and repeated measurements ensures a bounded recovery error for all second-order critical points, to which optimization algorithms tend to converge. This finding suggests the advantage of minimizing the number of repeated measurements per setting when the total number of state copies is held fixed. Additionally, we prove that the Wirtinger gradient descent algorithm can converge to the region of second-order critical points with a linear convergence rate. We have also performed numerical experiments to support our theoretical findings.
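The budget trade-off can be made concrete with a small simulator. Each measurement setting is a Pauli string P, and each shot returns +1 with probability (1 + tr(ρP))/2 and -1 otherwise. A fixed total number of copies can therefore be spread over many settings with one shot each, or over fewer settings with more shots each. The helper names below are hypothetical, and the sketch only simulates measurements. It does not implement the paper's recovery algorithm.

```python
from functools import reduce

import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_operator(label):
    """Tensor product of single-qubit Paulis, e.g. 'XZ' -> X (x) Z."""
    return reduce(np.kron, (PAULIS[c] for c in label))


def estimate_pauli(rho, label, shots, rng):
    """Empirical mean of `shots` +/-1 outcomes for one Pauli setting."""
    expectation = np.real(np.trace(rho @ pauli_operator(label)))
    p_plus = (1 + expectation) / 2
    outcomes = np.where(rng.random(shots) < p_plus, 1, -1)
    return outcomes.mean()


rho = np.zeros((4, 4), dtype=complex)
rho[0, 0] = 1.0  # the two-qubit state |00><00|
rng = np.random.default_rng(0)
print(estimate_pauli(rho, "ZZ", shots=1, rng=rng))  # 1.0
```

Under a budget of N copies, calling `estimate_pauli` with `shots=1` on N distinct settings corresponds to the single-sample regime discussed above.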
Submitted 7 November, 2024;
originally announced November 2024.
-
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering
Authors:
Zhilin Zhang,
Jie Wang,
Zhanghao Qin,
Ruiqi Zhu,
Xiaoliang Gong
Abstract:
Medical Visual Question Answering (MedVQA) has attracted growing interest at the intersection of medical image understanding and natural language processing for clinical applications. By interpreting medical images and providing precise answers to relevant clinical inquiries, MedVQA has the potential to support diagnostic decision-making and reduce workload across various fields like radiology. While recent approaches rely heavily on unified large pre-trained Visual-Language Models, research on more efficient fusion mechanisms remains relatively limited in this domain. In this paper, we introduce a fusion model, OMniBAN, that integrates Orthogonality loss, Multi-head attention, and a Bilinear Attention Network to achieve high computational efficiency as well as solid performance. We conduct comprehensive experiments and demonstrate how bilinear attention fusion can approximate the performance of larger fusion models like cross-modal Transformer. Our results show that OMniBAN requires fewer parameters (approximately 2/3 of Transformer-based Co-Attention) and substantially lower FLOPs (approximately 1/4), while achieving comparable overall performance and even slight improvements on closed-ended questions on two key MedVQA benchmarks. This balance between efficiency and accuracy suggests that OMniBAN could be a viable option for real-world medical image question answering, where computational resources are often constrained.
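For context, the sketch below shows a simplified single-glimpse low-rank bilinear attention between image-region and question-token features, in the spirit of Bilinear Attention Networks. The shapes, ReLU projections, and function names are assumptions. OMniBAN's orthogonality loss and multi-head attention are omitted.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def low_rank_bilinear_attention(X, Y, U, V, p):
    """Single-glimpse low-rank bilinear attention (hypothetical simplification).

    X: (n_img, d_x) image-region features; Y: (n_q, d_y) question-token features.
    U: (d_x, k), V: (d_y, k) projections; p: (k,) pooling vector.
    logits[i, j] = p^T (relu(U^T x_i) * relu(V^T y_j)).
    """
    Xp = np.maximum(X @ U, 0.0)  # (n_img, k)
    Yp = np.maximum(Y @ V, 0.0)  # (n_q, k)
    logits = (Xp * p) @ Yp.T  # (n_img, n_q)
    att = softmax(logits.ravel()).reshape(logits.shape)
    # Joint representation: attention-weighted bilinear pooling, shape (k,).
    joint = np.einsum("ij,ik,jk->k", att, Xp, Yp)
    return att, joint


rng = np.random.default_rng(0)
k = 4
X = rng.standard_normal((5, 8))
Y = rng.standard_normal((7, 6))
U = rng.standard_normal((8, k))
V = rng.standard_normal((6, k))
p = rng.standard_normal(k)
att, joint = low_rank_bilinear_attention(X, Y, U, V, p)
print(att.shape, joint.shape)  # (5, 7) (4,)
```

The low-rank factorization keeps the parameter count linear in the feature dimensions, which is the main reason bilinear attention can be cheaper than a full cross-modal Transformer.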
Submitted 11 May, 2025; v1 submitted 28 October, 2024;
originally announced October 2024.
-
SEEV: Synthesis with Efficient Exact Verification for ReLU Neural Barrier Functions
Authors:
Hongchao Zhang,
Zhizhen Qin,
Sicun Gao,
Andrew Clark
Abstract:
Neural Control Barrier Functions (NCBFs) have shown significant promise in enforcing safety constraints on nonlinear autonomous systems. State-of-the-art exact approaches to verifying safety of NCBF-based controllers exploit the piecewise-linear structure of ReLU neural networks; however, such approaches still rely on enumerating all of the activation regions of the network near the safety boundary, thus incurring high computation cost. In this paper, we propose a framework for Synthesis with Efficient Exact Verification (SEEV). Our framework consists of two components, namely (i) an NCBF synthesis algorithm that introduces a novel regularizer to reduce the number of activation regions at the safety boundary, and (ii) a verification algorithm that exploits tight over-approximations of the safety conditions to reduce the cost of verifying each piecewise-linear segment. Our simulations show that SEEV significantly improves verification efficiency while maintaining the CBF quality across various benchmark systems and neural network structures. Our code is available at https://github.com/HongchaoZhang-HZ/SEEV.
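The cost that SEEV targets comes from the number of activation regions an exact verifier must visit. The sketch below counts the distinct ReLU activation patterns of a single hidden layer over a set of sample points (in practice one would sample near the safety boundary). It illustrates that quantity only, not SEEV's regularizer or verification algorithm, and the function name is hypothetical.

```python
import numpy as np


def activation_patterns(points, W, b):
    """Distinct on/off patterns of one ReLU layer over sample points.

    For a single hidden layer, each distinct pattern identifies one linear
    region of the network that the samples visit.
    """
    pre = points @ W.T + b  # (n_points, n_hidden) pre-activations
    return {tuple(row) for row in (pre > 0).astype(int)}


W = np.eye(2)
b = np.zeros(2)
pts = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [2.0, 3.0]])
patterns = activation_patterns(pts, W, b)
print(len(patterns))  # 3
```

A regularizer that shrinks this count near the boundary directly reduces the number of linear pieces an exact verifier has to check.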
Submitted 26 October, 2024;
originally announced October 2024.
-
EEG-DIF: Early Warning of Epileptic Seizures through Generative Diffusion Model-based Multi-channel EEG Signals Forecasting
Authors:
Zekun Jiang,
Wei Dai,
Qu Wei,
Ziyuan Qin,
Kang Li,
Le Zhang
Abstract:
Multi-channel EEG signals are commonly used for the diagnosis and assessment of diseases such as epilepsy. Various deep-learning-based EEG diagnostic algorithms have been developed. However, most research focuses solely on diagnosing and classifying current signal data and does not consider predicting future trends for early warning. Additionally, since multi-channel EEG can essentially be regarded as spatio-temporal signal data received by detectors at different locations in the brain, constructing spatio-temporal representations of EEG signals that facilitate future trend prediction for multi-channel EEG becomes an important problem. This study proposes a multi-signal prediction algorithm based on generative diffusion models (EEG-DIF), which transforms the multi-signal forecasting task into an image completion task, allowing for comprehensive representation and learning of the spatio-temporal correlations and future developmental patterns of multi-channel EEG signals. We employ a publicly available epilepsy EEG dataset to construct and validate EEG-DIF. The results demonstrate that our method can accurately predict future trends for multiple EEG channels simultaneously. Furthermore, the early warning accuracy for epileptic seizures based on the generated EEG data reaches 0.89. Overall, EEG-DIF provides a novel approach for characterizing multi-channel EEG signals and an innovative early warning algorithm for epileptic seizures, aiding in optimizing and enhancing the clinical diagnosis process. The code is available at https://github.com/JZK00/EEG-DIF.
Submitted 22 October, 2024;
originally announced October 2024.
-
Robust Low-rank Tensor Train Recovery
Authors:
Zhen Qin,
Zhihui Zhu
Abstract:
Tensor train (TT) decomposition represents an $N$-order tensor using $O(N)$ matrices (i.e., factors) of small dimensions, achieved through products among these factors. Due to its compact representation, TT decomposition has found wide applications, including various tensor recovery problems in signal processing and quantum information. In this paper, we study the problem of reconstructing a TT format tensor from measurements that are contaminated by outliers with arbitrary values. Given the vulnerability of smooth formulations to corruptions, we use an $\ell_1$ loss function to enhance robustness against outliers. We first establish the $\ell_1/\ell_2$-restricted isometry property (RIP) for Gaussian measurement operators, demonstrating that the information in the TT format tensor can be preserved using a number of measurements that grows linearly with $N$. We also prove the sharpness property for the $\ell_1$ loss function optimized over TT format tensors. Building on the $\ell_1/\ell_2$-RIP and sharpness property, we then propose two complementary methods to recover the TT format tensor from the corrupted measurements: the projected subgradient method (PSubGM), which optimizes over the entire tensor, and the factorized Riemannian subgradient method (FRSubGM), which optimizes directly over the factors. Compared to PSubGM, the factorized approach FRSubGM significantly reduces the memory cost at the expense of a slightly slower convergence rate. Nevertheless, we show that both methods, with diminishing step sizes, converge linearly to the ground-truth tensor given an appropriate initialization, which can be obtained by a truncated spectral method.
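As a minimal sketch of the TT format, assuming the common convention that the k-th factor is stored as an array of shape (r_{k-1}, d_k, r_k) with r_0 = r_N = 1, each tensor entry is a product of N small matrices, so storage grows linearly in N while the full tensor grows exponentially. The order, mode size, and rank below are hypothetical, and the code only evaluates a TT tensor; it is not the paper's PSubGM or FRSubGM recovery method.

```python
import numpy as np

def tt_entry(cores, index):
    """Entry X[i1, ..., iN] as the product X1[:, i1, :] @ ... @ XN[:, iN, :]."""
    out = np.ones((1, 1))
    for core, i in zip(cores, index):
        out = out @ core[:, i, :]
    return out[0, 0]

def tt_full(cores):
    """Contract all TT cores into the full d1 x ... x dN array."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape(full.shape[1:-1])  # drop the boundary ranks r_0 = r_N = 1

rng = np.random.default_rng(0)
N, d, r = 6, 3, 2  # hypothetical order, mode size, and TT rank
ranks = [1] + [r] * (N - 1) + [1]
cores = [rng.standard_normal((ranks[k], d, ranks[k + 1])) for k in range(N)]

X = tt_full(cores)
idx = (0, 2, 1, 1, 0, 2)
print(np.isclose(X[idx], tt_entry(cores, idx)))  # True
print(sum(c.size for c in cores), X.size)        # 60 729
```

Even at this tiny size the factors hold 60 numbers against 729 entries; the gap widens exponentially as N grows.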
Submitted 19 October, 2024;
originally announced October 2024.
-
Partial reciprocity-based precoding matrix prediction in FDD massive MIMO with mobility
Authors:
Ziao Qin,
Haifan Yin
Abstract:
The timely precoding of frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) systems is a substantial challenge in practice, especially in mobile environments. To improve precoding performance and reduce precoding complexity, we propose a partial reciprocity-based precoding matrix prediction scheme and further reduce its complexity by exploiting the channel Gram matrix. We prove that the precoders can be predicted through a closed-form eigenvector interpolation based on periodic eigenvector samples. Numerical results validate the performance improvements of our schemes over conventional schemes at moving speeds from 30 km/h to 500 km/h.
Submitted 6 October, 2024;
originally announced October 2024.
-
Sample-Efficient Quantum State Tomography for Structured Quantum States in One Dimension
Authors:
Zhen Qin,
Casey Jameson,
Alireza Goldar,
Michael B. Wakin,
Zhexuan Gong,
Zhihui Zhu
Abstract:
While quantum state tomography (QST) remains the gold standard for benchmarking and verifying quantum devices, it requires an exponentially large number of measurements and classical computational resources for generic quantum many-body systems, making it impractical even for intermediate-size quantum devices. Fortunately, many physical quantum states exhibit low-dimensional structures that enable the development of efficient QST. A notable example is the class of states represented by matrix product operators (MPOs) with a finite matrix/bond dimension, which includes most physical states in one dimension and for which the number of independent parameters describing the state grows only linearly with the number of qubits. Whether a sample-efficient quantum state tomography protocol, in which the number of required state copies scales only linearly with the number of parameters describing the state, exists for a generic MPO state remains an important open question.
In this paper, we answer this fundamental question affirmatively by using a class of informationally complete positive operator-valued measures (IC-POVMs) -- including symmetric IC-POVMs (SIC-POVMs) and spherical $t$-designs -- focusing on sample complexity while not accounting for the implementation complexity of the measurement settings. For SIC-POVMs and (approximate) spherical 2-designs, we show that the number of state copies required to guarantee bounded recovery error of an MPO state with a constrained least-squares estimator depends on the probability distribution of the MPO under the POVM but scales only linearly with $n$ when the distribution is approximately uniform. For spherical $t$-designs with $t\geq 3$, we prove that a number of state copies proportional to the number of independent parameters in the MPO suffices for guaranteed recovery of any state represented by an MPO.
Submitted 1 May, 2025; v1 submitted 3 October, 2024;
originally announced October 2024.
-
Meta Learning Based Adaptive Cooperative Perception in Nonstationary Vehicular Networks
Authors:
Kaige Qu,
Zixiong Qin,
Weihua Zhuang
Abstract:
To accommodate high network dynamics in real-time cooperative perception (CP), reinforcement learning (RL) based adaptive CP schemes have been proposed to allow adaptive switching between CP and stand-alone perception modes among connected and autonomous vehicles. The traditional offline-training, online-execution RL framework suffers from performance degradation under nonstationary network conditions. To achieve fast and efficient model adaptation, we formulate a set of Markov decision processes for adaptive CP decisions in each stationary local vehicular network (LVN). We propose a meta RL solution that trains a meta RL model capturing the general features shared among LVNs, thus facilitating fast model adaptation for each LVN with the meta RL model as an initial point. Simulation results show the superiority of meta RL in terms of convergence speed without reward degradation. We also evaluate the impact of the customization level of meta models on model adaptation performance.
Submitted 1 October, 2024;
originally announced October 2024.
-
VSLLaVA: a pipeline of large multimodal foundation model for industrial vibration signal analysis
Authors:
Qi Li,
Jinfeng Huang,
Hongliang He,
Xinran Zhang,
Feibin Zhang,
Zhaoye Qin,
Fulei Chu
Abstract:
Large multimodal foundation models have been extensively used for instruction-guided image recognition tasks, yet domain expertise in industrial vibration signal analysis remains scarce. This paper presents a pipeline named VSLLaVA that leverages a large language model to integrate expert knowledge for signal parameter identification and fault diagnosis. Within this pipeline, we first introduce an expert rule-assisted signal generator. The generator merges signals provided by vibration analysis experts with domain-specific parameter identification and fault diagnosis question-answer pairs to build signal-question-answer triplets. We then use these triplets to apply low-rank adaptation methods for fine-tuning the linear layers of the Contrastive Language-Image Pretraining (CLIP) model and the large language model, injecting multimodal signal processing knowledge. Finally, the fine-tuned model is assessed through the combined efforts of a large language model and expert rules to evaluate answer accuracy and relevance, showing enhanced performance in identifying and analyzing various signal parameters and in diagnosing faults. These enhancements indicate the potential of this pipeline to build a foundational model for future industrial signal analysis and monitoring.
Submitted 3 September, 2024;
originally announced September 2024.
-
Synchronous Multi-modal Semantic Communication System with Packet-level Coding
Authors:
Yun Tian,
Jingkai Ying,
Zhijin Qin,
Ye Jin,
Xiaoming Tao
Abstract:
Although semantic communication with joint semantic-channel coding has shown promising performance in transmitting data of different modalities over physical-layer channels, the synchronization and packet-level forward error correction of multimodal semantics have not been well studied. Because semantic encoders are designed independently, synchronizing multimodal features in both the semantic and time domains is a challenging problem. In this paper, we take facial video and speech transmission as an example and propose a Synchronous Multimodal Semantic Communication System (SyncSC) with Packet-Level Coding. To achieve semantic and time synchronization, 3D Morphable Model (3DMM) coefficients and text are transmitted as semantics, and we propose a semantic codec that achieves reconstruction and synchronization quality similar to traditional methods with lower bandwidth. To protect semantic packets over the erasure channel, we propose a packet-level Forward Error Correction (FEC) method, called PacSC, that maintains a certain level of visual quality even at high packet loss rates. In particular, for text packets, a text packet loss concealment module, called TextPC, based on Bidirectional Encoder Representations from Transformers (BERT) is proposed, which significantly improves the performance of traditional FEC methods. The simulation results show that the proposed SyncSC reduces transmission overhead and achieves high-quality synchronous transmission of video and speech over a packet-loss network.
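To illustrate what packet-level FEC over an erasure channel means, here is a minimal sketch of the simplest such code: a single XOR parity packet that lets the receiver rebuild any one lost packet. This is a generic textbook baseline swapped in for illustration, not PacSC or TextPC, and the packet contents are hypothetical.

```python
from functools import reduce

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(packets):
    """Append one parity packet equal to the XOR of all data packets."""
    if len({len(p) for p in packets}) != 1:
        raise ValueError("packets must have equal length")
    return list(packets) + [reduce(xor_bytes, packets)]

def recover(received):
    """Rebuild at most one erased packet (marked None) and return the data packets."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can repair only one erasure")
    repaired = list(received)
    if missing:
        # XOR of all packets (data + parity) is zero, so the lost one is the XOR of the rest
        repaired[missing[0]] = reduce(xor_bytes, [p for p in received if p is not None])
    return repaired[:-1]

data = [b"sem0", b"sem1", b"sem2"]
sent = encode(data)
sent[1] = None  # simulate one erased packet
print(recover(sent))  # [b'sem0', b'sem1', b'sem2']
```

A single parity packet fails once two packets in a group are lost, which is one reason learned concealment such as TextPC can add value at high loss rates.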
Submitted 10 August, 2024; v1 submitted 8 August, 2024;
originally announced August 2024.
-
A Secure and Efficient Distributed Semantic Communication System for Heterogeneous Internet of Things
Authors:
Weihao Zeng,
Xinyu Xu,
Qianyun Zhang,
Jiting Shi,
Zhenyu Guan,
Shufeng Li,
Zhijin Qin
Abstract:
Semantic communications are expected to improve transmission efficiency in Internet of Things (IoT) networks. However, the distributed nature of these networks and the heterogeneity of devices challenge the secure use of semantic communication systems. In this paper, we develop a distributed semantic communication system that achieves security and efficiency during both the update and usage phases. A blockchain-based trust scheme for system updates is designed to continuously train and synchronize the system in dynamic IoT environments. To improve update efficiency, we propose a flexible semantic coding method based on compressive semantic knowledge bases. It greatly reduces the amount of data shared among devices for system updates, and enables flexible adjustment of the size of the knowledge bases and the number of transmitted signal symbols in the model training and inference stages. In the usage phase, a signature mechanism for lossy semantics is introduced to guarantee the integrity and authenticity of the transmitted semantics in lossy semantic communications. We further design a noise-aware differential privacy mechanism, which introduces optimized noise based on the different channel information available to heterogeneous devices. Experiments on text transmission tasks show that the proposed system protects the integrity and privacy of the exchanged semantics, and reduces the data to be transmitted by about $35\%$ to $88\%$ in the update phase and by $60\%$ in the usage phase compared with related works.
Submitted 11 December, 2024; v1 submitted 19 July, 2024;
originally announced July 2024.
-
DreamVoice: Text-Guided Voice Conversion
Authors:
Jiarui Hai,
Karan Thakkar,
Helin Wang,
Zengyi Qin,
Mounya Elhilali
Abstract:
Generative voice technologies are rapidly evolving, offering opportunities for more personalized and inclusive experiences. Traditional one-shot voice conversion (VC) requires a target recording during inference, limiting ease of use in generating desired voice timbres. Text-guided generation offers an intuitive solution to convert voices to desired "DreamVoices" according to users' needs. Our paper presents two major contributions to VC technology: (1) DreamVoiceDB, a robust dataset of voice timbre annotations for 900 speakers from VCTK and LibriTTS; and (2) two text-guided VC methods: DreamVC, an end-to-end diffusion-based text-guided VC model, and DreamVG, a versatile text-to-voice generation plugin that can be combined with any one-shot VC model. The experimental results demonstrate that our proposed methods, trained on the DreamVoiceDB dataset, generate voice timbres accurately aligned with the text prompt and achieve high-quality VC.
Submitted 24 June, 2024;
originally announced June 2024.
-
Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition
Authors:
Zhen Qin,
Zhihui Zhu
Abstract:
Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios like scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (TT)-based ToT model proving efficient in practice due to reduced memory requirements, enhanced computational efficiency, and decreased sampling complexity. Despite these practical benefits, a gap exists between theoretical analysis and real-world performance. In this paper, we study the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem. This analysis includes an upper error bound and a minimax lower bound, revealing that these error bounds depend polynomially on the order $N+M$. To efficiently find solutions meeting these error bounds, we propose two optimization algorithms: the iterative hard thresholding (IHT) algorithm, which employs gradient descent with TT singular value decomposition (TT-SVD), and the factorization approach using the Riemannian gradient descent (RGD) algorithm. When RIP is satisfied, spectral initialization provides a proper initialization, and we establish the linear convergence rate of both IHT and RGD.
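The IHT algorithm alternates a gradient step with a projection onto low-TT-rank tensors via TT-SVD. Below is a minimal NumPy sketch of the standard sequential-SVD form of TT-SVD with a single maximum-rank cap, plus one IHT step built on it. The function names, rank cap, step size, and toy least-squares gradient are assumptions for illustration, not the paper's implementation or its RIP-based measurement operator.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Sequential truncated SVDs (TT-SVD); every TT rank is capped at max_rank."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor
    for k in range(len(dims) - 1):
        mat = mat.reshape(r_prev * dims[k], -1)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, s.size)
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        mat = s[:r, None] * Vt[:r]  # carry the remainder on to the next mode
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_full(cores):
    """Contract TT cores back into the full array."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape(full.shape[1:-1])

def iht_step(X, grad_fn, step, max_rank):
    """One IHT iteration: gradient step, then projection onto low TT rank via TT-SVD."""
    return tt_full(tt_svd(X - step * grad_fn(X), max_rank))

rng = np.random.default_rng(1)
shape, r = (4, 5, 3, 4), 2  # hypothetical sizes and TT rank
ranks = [1, r, r, r, 1]
true_cores = [rng.standard_normal((ranks[k], shape[k], ranks[k + 1])) for k in range(len(shape))]
X_true = tt_full(true_cores)

# A tensor whose TT ranks are already <= 2 is reproduced exactly by TT-SVD at rank 2
print(np.allclose(tt_full(tt_svd(X_true, max_rank=2)), X_true))  # True

# Toy denoising loss 0.5 * ||X - Y||^2 with gradient X - Y; Y is a noisy observation
Y = X_true + 0.01 * rng.standard_normal(shape)
X_hat = iht_step(np.zeros(shape), lambda X: X - Y, step=1.0, max_rank=2)
print(np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))
```

The factorized RGD alternative avoids ever forming the full tensor that `iht_step` materializes, which is where its memory savings come from.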
Submitted 1 May, 2025; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Hybrid Digital-Analog Semantic Communications
Authors:
Huiqiang Xie,
Zhijin Qin,
Zhu Han,
Khaled B. Letaief
Abstract:
Digital and analog semantic communications (SemCom) face inherent limitations, such as data security concerns in analog SemCom and the leveling-off and cliff-edge effects in digital SemCom. To overcome these challenges, we propose a novel SemCom framework and a corresponding system called HDA-DeepSC, which leverages a hybrid digital-analog approach for multimedia transmission. This is achieved through the introduction of digital-analog allocation and fusion modules. To strike a balance between data rate and distortion, we design new loss functions that take into account long-distance dependencies in the semantic distortion constraint, essential information recovery in the channel distortion constraint, and optimal bit stream generation in the rate constraint. Additionally, we propose denoising diffusion-based signal detection techniques, which involve carefully designed variance schedules and sampling algorithms to refine transmitted signals. Through extensive numerical experiments, we demonstrate that HDA-DeepSC is robust to channel variations and capable of supporting various communication scenarios. Our proposed framework outperforms existing benchmarks in terms of peak signal-to-noise ratio and multi-scale structural similarity, showcasing its superiority in semantic communication quality.
Submitted 27 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Semantic MIMO Systems for Speech-to-Text Transmission
Authors:
Zhenzi Weng,
Zhijin Qin,
Huiqiang Xie,
Xiaoming Tao,
Khaled B. Letaief
Abstract:
Semantic communications have been used to execute numerous intelligent tasks by transmitting task-related semantic information instead of bits. In this article, we propose a semantic-aware speech-to-text transmission system, named SAC-ST, for single-user multiple-input multiple-output (MIMO) and multi-user MIMO communication scenarios. In particular, a semantic communication system serving the speech-to-text task at the receiver is first designed, which compresses the semantic information and generates low-dimensional semantic features by leveraging the transformer module. In addition, a novel semantic-aware network is proposed to facilitate transmission with high semantic fidelity by identifying the critical semantic information and guaranteeing its accurate recovery. Furthermore, we extend SAC-ST with a neural network-enabled channel estimation network to mitigate the dependence on accurate channel state information and validate the feasibility of SAC-ST in practical communication environments. Simulation results show that the proposed SAC-ST outperforms the communication framework without the semantic-aware network for speech-to-text transmission over MIMO channels in terms of speech-to-text metrics, especially in the low signal-to-noise ratio regime. Moreover, SAC-ST with the developed channel estimation network performs comparably to SAC-ST with perfect channel state information.
Submitted 5 October, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Hybrid Bit and Semantic Communications
Authors:
Kaiwen Yu,
Renhe Fan,
Gang Wu,
Zhijin Qin
Abstract:
Semantic communication technology is regarded as a method that surpasses the Shannon limit of bit transmission and can effectively enhance transmission efficiency. However, current approaches that directly map content to transmission symbols are difficult to deploy in practice, which significantly limits the development of semantic communication. To address this challenge, we propose a hybrid bit and semantic communication system, named HybridBSC, in which encoded semantic information is inserted into bit information for transmission via conventional digital communication systems using the same spectrum resources. The system can be easily deployed using existing communication architectures to achieve both bit and semantic information transmission. In particular, we design a semantic insertion and extraction scheme to implement this strategy. Furthermore, we conduct experimental validation on a Pluto-based software-defined radio (SDR) platform over a real wireless channel, demonstrating that the proposed strategy can simultaneously transmit semantic and bit information.
Submitted 30 April, 2024;
originally announced April 2024.
-
Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics
Authors:
Xiaoshuai Wu,
Xin Liao,
Bo Ou,
Yuling Liu,
Zheng Qin
Abstract:
AI-generated content has accelerated the development of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before such potentially threatening face images are released, one promising forensic solution is to inject robust watermarks to track their provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, which leverages robust watermarking to fool Deepfake detectors and can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on harmless proactive forensics against Deepfake.
Submitted 27 April, 2024;
originally announced April 2024.
-
A Robust Semantic Communication System for Image
Authors:
Xiang Peng,
Zhijin Qin,
Xiaoming Tao,
Jianhua Lu,
Khaled B. Letaief
Abstract:
Semantic communications have gained significant attention as a promising approach to addressing the transmission bottleneck, especially with the continuous development of 6G techniques. Unlike the well-investigated physical channel impairments, this paper focuses on semantic impairments in images, particularly those arising from adversarial perturbations. Specifically, we propose a novel metric for quantifying the intensity of semantic impairment and develop a semantic impairment dataset. Furthermore, we introduce a deep learning-enabled semantic communication system, termed DeepSC-RI, to enhance the robustness of image transmission; it incorporates a multi-scale semantic extractor with a dual-branch architecture that extracts semantics at varying granularity, thereby improving the robustness of the system. The fine-grained branch incorporates a semantic importance evaluation module to identify and prioritize crucial semantics, while the coarse-grained branch adopts a hierarchical approach for capturing robust semantics. These two streams of semantics are seamlessly integrated via an advanced cross-attention-based semantic fusion module. Experimental results demonstrate the superior performance of DeepSC-RI under various levels of semantic impairment intensity.
Submitted 14 March, 2024;
originally announced March 2024.
-
Robust Semantic Communications for Speech Transmission
Authors:
Zhenzi Weng,
Zhijin Qin,
Geoffrey Ye Li
Abstract:
In this paper, we propose a robust semantic communication system for speech transmission, named Ross-S2T, that delivers the essential semantic information. Specifically, we take speech-to-text translation (S2TT) as the transmission goal. First, a new deep semantic encoder is developed to convert speech in the source language to textual features associated with the target language, facilitating the end-to-end semantic exchange to perform the S2TT task and reducing the transmission data without performance degradation. To mitigate semantic impairments inherent in corrupted speech, a novel generative adversarial network (GAN)-enabled deep semantic compensator is established to estimate the lost semantic information within the speech and extract deep semantic features simultaneously, which enables robust semantic transmission for corrupted speech. Furthermore, a semantic probe-aided compensator is devised to enhance the semantic fidelity of recovered semantic features and improve the understandability of the target text. According to simulation results, the proposed Ross-S2T exhibits superior S2TT performance compared to conventional approaches and high robustness against semantic impairments.
Submitted 4 July, 2025; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Computational Offloading in Semantic-Aware Cloud-Edge-End Collaborative Networks
Authors:
Zelin Ji,
Zhijin Qin
Abstract:
The trend toward massive connectivity is driving explosive growth in the number of end devices. The emergence of various applications has created demand for pervasive connectivity and more efficient computing paradigms. On the other hand, the limited computational capacity of end devices restricts the implementation of intelligent applications and becomes a bottleneck for multiple access in supporting massive connectivity. Mobile cloud computing (MCC) and mobile edge computing (MEC) techniques enable end devices to offload local computation-intensive tasks to servers over the network. In this paper, we consider cloud-edge-end collaborative networks to utilize distributed computing resources. Furthermore, we apply task-oriented semantic communications to tackle the fast-varying channel between the end devices and MEC servers and to reduce the communication cost. To minimize long-term energy consumption subject to queue stability and computational delay constraints, a Lyapunov-guided deep reinforcement learning hybrid (DRLH) framework is proposed to solve the resulting mixed integer non-linear programming (MINLP) problem. The long-term energy consumption minimization problem is transformed into a deterministic problem in each time frame. The DRLH framework integrates a model-free deep reinforcement learning algorithm with a model-based mathematical optimization algorithm to mitigate computational complexity and leverage scenario information, thereby improving convergence performance. Numerical results demonstrate that the proposed DRLH framework achieves near-optimal energy consumption while stabilizing all queues.
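The abstract turns a long-term energy objective with queue-stability constraints into a per-frame deterministic problem via Lyapunov optimization. A minimal sketch of the standard drift-plus-penalty rule behind that step is shown below, assuming a single task queue, a small set of hypothetical offloading options (each with an energy cost and an amount of served data), and a trade-off weight `v`. It is not the DRLH framework, which the abstract says combines deep reinforcement learning with model-based optimization.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Option:
    """A hypothetical per-frame offloading decision."""
    name: str
    energy: float  # energy spent in this frame
    served: float  # amount of queued task data processed in this frame


def update_queue(q: float, arrivals: float, served: float) -> float:
    """Queue dynamics: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    return max(q - served, 0.0) + arrivals


def drift_plus_penalty_choice(q: float, options: list[Option], v: float) -> Option:
    """Pick the option minimizing V * energy - Q * served for this frame only."""
    return min(options, key=lambda o: v * o.energy - q * o.served)


def simulate(arrivals: list[float], options: list[Option], v: float) -> list[float]:
    """Apply the per-frame rule and record the backlog after each frame."""
    q, history = 0.0, []
    for a in arrivals:
        choice = drift_plus_penalty_choice(q, options, v)
        q = update_queue(q, a, choice.served)
        history.append(q)
    return history


options = [Option("idle", 0.0, 0.0), Option("offload", 1.0, 5.0)]
print(simulate([3.0, 3.0, 3.0, 3.0], options, v=10.0))
```

A larger `v` weights energy more heavily, while a growing backlog `q` makes serving data more attractive; this is how the per-frame rule keeps the queue bounded without knowledge of future arrivals.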
Submitted 19 May, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Towards Intelligent Communications: Large Model Empowered Semantic Communications
Authors:
Huiqiang Xie,
Zhijin Qin,
Xiaoming Tao,
Zhu Han
Abstract:
Deep learning-enabled semantic communications have shown great potential to significantly improve transmission efficiency and alleviate spectrum scarcity by effectively exchanging the semantics behind the data. Recently, the emergence of large models with billions of parameters has unveiled remarkable human-like intelligence, offering a promising avenue for advancing semantic communication by enhancing semantic and contextual understanding. This article systematically investigates large model-empowered semantic communication systems, from potential applications to system design. First, we propose a new semantic communication architecture that seamlessly integrates large models into semantic communication through the introduction of a memory module. Then, typical applications are illustrated to show the benefits of the new architecture. Besides, we discuss the key designs in implementing the new semantic communication systems, from module design to system training. Finally, potential research directions are identified to boost large model-empowered semantic communications.
Submitted 19 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery
Authors:
Zhen Qin,
Michael B. Wakin,
Zhihui Zhu
Abstract:
In this paper, we provide the first convergence guarantee for the factorization approach. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format which enforces orthonormality among most of the factors. To ensure the orthonormal structure, we utilize the Riemannian gradient descent (RGD) for optimizing those factors over the Stiefel manifold. We first delve into the TT factorization problem and establish the local linear convergence of RGD. Notably, the rate of convergence only experiences a linear decline as the tensor order increases. We then study the sensing problem that aims to recover a TT format tensor from linear measurements. Assuming the sensing operator satisfies the restricted isometry property (RIP), we show that with a proper initialization, which could be obtained through spectral initialization, RGD also converges to the ground-truth tensor at a linear rate. Furthermore, we expand our analysis to encompass scenarios involving Gaussian noise in the measurements. We prove that RGD can reliably recover the ground truth at a linear rate, with the recovery error exhibiting only polynomial growth in relation to the tensor order. We conduct various experiments to validate our theoretical findings.
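In the abstract, RGD keeps most TT factors on the Stiefel manifold (matrices with orthonormal columns). A minimal sketch of one RGD step for a single factor is shown below, assuming the Euclidean gradient `g` of the loss with respect to that factor is already available. It projects `g` onto the tangent space via $g - U\,\mathrm{sym}(U^\top g)$ and then retracts with a thin QR decomposition; QR is one common retraction and is an assumption here, as are the step size and the plain-list matrices used to keep the example dependency-free.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def axpy(a, b, scale):
    """Return a + scale * b elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def riemannian_grad(u, g):
    """Project g onto the tangent space of the Stiefel manifold at u:
    g - u * sym(u^T g), with sym(A) = (A + A^T) / 2 (embedded metric)."""
    utg = matmul(transpose(u), g)
    r = len(utg)
    sym = [[(utg[i][j] + utg[j][i]) / 2 for j in range(r)] for i in range(r)]
    return axpy(g, matmul(u, sym), -1.0)


def qr_retraction(a):
    """Map a full-rank n x r matrix back to the manifold: the Q factor of its
    thin QR, computed by modified Gram-Schmidt (R has a positive diagonal)."""
    q_cols = []
    for col in transpose(a):
        w = list(col)
        for q in q_cols:
            proj = sum(x * y for x, y in zip(q, w))
            w = [x - proj * y for x, y in zip(w, q)]
        norm = math.sqrt(sum(x * x for x in w))
        q_cols.append([x / norm for x in w])
    return transpose(q_cols)


def rgd_step(u, g, step):
    """One RGD iteration: step along the negative Riemannian gradient, then retract."""
    xi = riemannian_grad(u, g)
    return qr_retraction(axpy(u, xi, -step))


u0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
g0 = [[0.3, 0.1], [0.2, -0.4], [0.5, 0.7]]  # hypothetical Euclidean gradient
print(rgd_step(u0, g0, step=0.1))
```

The retraction is what keeps every iterate exactly orthonormal, which is how the left-orthogonal format removes the scaling ambiguity between factors.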
Submitted 21 December, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
-
Federated Multi-View Synthesizing for Metaverse
Authors:
Yiyu Guo,
Zhijin Qin,
Xiaoming Tao,
Geoffrey Ye Li
Abstract:
The metaverse is expected to provide immersive entertainment, education, and business applications. However, virtual reality (VR) transmission over wireless networks is data- and computation-intensive, making it critical to introduce novel solutions that meet stringent quality-of-service requirements. With recent advances in edge intelligence and deep learning, we have developed a novel multi-view synthesizing framework that can efficiently provide computation, storage, and communication resources for wireless content delivery in the metaverse. We propose a three-dimensional (3D)-aware generative model that uses collections of single-view images. These single-view images are transmitted to a group of users with overlapping fields of view, which avoids massive content transmission compared to transmitting tiles or whole 3D models. We then present a federated learning approach to guarantee an efficient learning process. The training performance can be improved by characterizing the vertical and horizontal data samples with a large latent feature space, while low-latency communication can be achieved with a reduced number of transmitted parameters during federated learning. We also propose a federated transfer learning framework to enable fast domain adaptation to different target domains. Simulation results have demonstrated the effectiveness of our proposed federated multi-view synthesizing framework for VR content delivery.
Submitted 18 December, 2023;
originally announced January 2024.
-
OpenVoice: Versatile Instant Voice Cloning
Authors:
Zengyi Qin,
Wenliang Zhao,
Xumin Yu,
Xin Sun
Abstract:
We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are neither directly copied from nor constrained by the style of the reference speaker. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require an extensive massive-speaker multi-lingual (MSML) dataset for all languages, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that offer inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results on our demo website. OpenVoice has been used by more than 2M users worldwide as the voice engine of MyShell.ai.
Submitted 18 August, 2024; v1 submitted 3 December, 2023;
originally announced December 2023.
-
An End-Cloud Computing Enabled Surveillance Video Transmission System
Authors:
Dingxi Yang,
Zhijin Qin,
Liting Wang,
Xiaoming Tao,
Fang Cui,
Hengjiang Wang
Abstract:
The enormous data volume of video poses a significant burden on the network. Particularly, transferring high-definition surveillance videos to the cloud consumes a significant amount of spectrum resources. To address these issues, we propose a surveillance video transmission system enabled by end-cloud computing. Specifically, the cameras actively down-sample the original video and then a redundant frame elimination module is employed to further reduce the data volume of surveillance videos. Then we develop a key-frame assisted video super-resolution model to reconstruct the high-quality video at the cloud side. Moreover, we propose a strategy of extracting key frames from source videos for better reconstruction performance by utilizing the peak signal-to-noise ratio (PSNR) of adjacent frames to measure the propagation distance of key frame information. Simulation results show that the developed system can effectively reduce the data volume by the end-cloud collaboration and outperforms existing video super-resolution models significantly in terms of PSNR and structural similarity index (SSIM).
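The abstract selects key frames by using the PSNR of adjacent frames to measure how far key-frame information propagates. The exact rule is not given, so the sketch below is a hypothetical reading: PSNR is the standard $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, and the propagation distance counts how many successive frames after a key frame keep an adjacent-frame PSNR at or above an assumed threshold. Frames are flattened pixel lists to keep the example dependency-free.

```python
import math


def psnr(a, b, max_val=255.0):
    """PSNR in dB between two equally sized frames given as flat pixel lists."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


def propagation_distance(frames, key_index, threshold_db):
    """Hypothetical rule: count frames after the key frame while the PSNR
    between each pair of adjacent frames stays at or above threshold_db."""
    distance = 0
    for i in range(key_index, len(frames) - 1):
        if psnr(frames[i], frames[i + 1]) < threshold_db:
            break  # a large change between neighbours ends the propagation
        distance += 1
    return distance


clip = [[100] * 4, [101] * 4, [102] * 4, [200] * 4]  # toy 4-pixel frames
print(propagation_distance(clip, key_index=0, threshold_db=30.0))
```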
Submitted 8 November, 2023;
originally announced November 2023.
-
Compression Ratio Learning and Semantic Communications for Video Imaging
Authors:
Bowen Zhang,
Zhijin Qin,
Geoffrey Ye Li
Abstract:
Camera sensors have been widely used in intelligent robotic systems. Developing camera sensors with high sensing efficiency is important for reducing power, memory, and other related resources. Inspired by recent successes in programmable sensors and deep optics methods, we design a novel video compressed sensing system with spatially-variant compression ratios, which achieves higher imaging quality than existing snapshot compressed imaging methods at the same sensing cost. In this article, we also investigate data transmission methods for programmable sensors, where the performance of the communication system is evaluated by the reconstructed images or videos rather than by the transmitted sensor data itself. Usually, different reconstruction algorithms are designed for applications such as high dynamic range imaging, video compressive sensing, or motion deblurring. This task-aware property inspires a semantic communication framework for programmable sensors. In this work, a policy-gradient-based reinforcement learning method is introduced to achieve an explicit trade-off between the compression (or transmission) rate and image distortion. Numerical results show the superiority of the proposed methods over existing baselines.
Submitted 9 October, 2023;
originally announced October 2023.
-
Low-complexity eigenvector prediction-based precoding matrix prediction in massive MIMO with mobility
Authors:
Ziao Qin,
Haifan Yin,
Weidong Li
Abstract:
In practical massive multiple-input multiple-output (MIMO) systems, the precoding matrix is often obtained from the eigenvectors of channel matrices and is challenging to update in time due to finite computation resources at the base station, especially in mobile scenarios. To reduce precoding complexity while enhancing spectral efficiency (SE), a novel precoding matrix prediction method based on eigenvector prediction (EGVP) is proposed. The basic idea is to decompose the periodic uplink channel eigenvector samples into a linear combination of the channel state information (CSI) and channel weights. We further prove that the channel weights can be interpolated by an exponential model corresponding to the Doppler characteristics of the CSI. A fast matrix pencil prediction (FMPP) method is also devised to predict the CSI. We also prove that our scheme achieves asymptotically error-free precoder prediction with a distinct complexity advantage. Simulation results show that with perfect, non-delayed CSI, the proposed EGVP method reduces floating point operations by 80% without losing SE performance compared to the traditional full-time precoding scheme. In more realistic cases with CSI delays, the proposed EGVP-FMPP scheme achieves clear SE gains over the precoding scheme widely used in current communication systems.
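The abstract models the channel weights as exponentials tied to the Doppler characteristics and predicts them with a matrix pencil method. A minimal sketch of the single-path special case is shown below, assuming one dominant Doppler component: with pencil parameter 1, the matrix pencil reduces to a least-squares estimate of one pole $z$ from $x[n+1] \approx z\,x[n]$, which is then used to extrapolate. This is not the authors' FMPP, which handles multiple components; the normalized Doppler and sample values are hypothetical.

```python
import cmath
import math


def one_pole_pencil(samples):
    """Least-squares pole estimate z for x[n+1] ~ z * x[n].

    With pencil parameter 1, Y1 = x[0:N-1] and Y2 = x[1:N] are column vectors
    and the pencil Y2 - z * Y1 is solved in least squares:
    z = (Y1^H Y2) / (Y1^H Y1).
    """
    y1, y2 = samples[:-1], samples[1:]
    num = sum(a.conjugate() * b for a, b in zip(y1, y2))
    den = sum(abs(a) ** 2 for a in y1)
    return num / den


def predict(samples, steps):
    """Extrapolate the next `steps` samples with the estimated pole."""
    z = one_pole_pencil(samples)
    return [samples[-1] * z ** k for k in range(1, steps + 1)]


# A hypothetical single-path channel weight with normalized Doppler 0.05.
doppler = 0.05
history = [2.0 * cmath.exp(2j * math.pi * doppler * n) for n in range(8)]
print(predict(history, 2))
```

For a noiseless single exponential the estimated pole equals $e^{j2\pi f_D}$ exactly, so the extrapolation is error-free; with several Doppler paths a larger pencil and an SVD-based rank reduction would be needed.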
Submitted 30 June, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Semantic-Aware Image Compressed Sensing
Authors:
Bowen Zhang,
Zhijin Qin,
Geoffrey Ye Li
Abstract:
Deep learning-based image compressed sensing (CS) has achieved great success. However, existing CS systems mainly apply a fixed measurement matrix to all images, ignoring the fact that the optimal number of measurements and the optimal bases differ across images. To further improve sensing efficiency, we propose a novel semantic-aware image CS system. In our system, the encoder first uses a fixed number of base CS measurements to sense different images. Based on the base CS results, the encoder then employs a policy network to analyze the semantic information in images and determine the measurement matrix for different image areas. At the decoder side, a semantic-aware initial reconstruction network is developed to handle changes in the measurement matrices used at the encoder. A rate-distortion training loss is further introduced to dynamically adjust the average compression ratio of the semantic-aware CS system, and the policy network is trained jointly with the encoder and the decoder in an end-to-end manner by using proxy functions. Numerical results show that the proposed semantic-aware image CS system is superior to traditional ones with fixed measurement matrices.
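The abstract introduces a rate-distortion training loss to adjust the average compression ratio. Its exact form is not given; a minimal sketch, assuming mean squared error as the distortion and the mean per-block measurement ratio as the rate, weighted by a hypothetical trade-off parameter `lam`, is shown below. The proxy functions that make the policy's discrete choices trainable end to end are omitted.

```python
def mse(original, reconstructed):
    """Mean squared error between two flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def rate_distortion_loss(original, reconstructed, block_ratios, lam):
    """Hypothetical loss: distortion (MSE) + lam * average compression ratio.

    block_ratios holds the measurement ratio (measurements / pixels) that a
    policy assigned to each image block; lam sets the rate-distortion trade-off.
    """
    avg_ratio = sum(block_ratios) / len(block_ratios)
    return mse(original, reconstructed) + lam * avg_ratio


print(rate_distortion_loss([1, 2, 3, 4], [1, 2, 3, 2], block_ratios=[0.1, 0.3], lam=0.5))
```

Raising `lam` penalizes measurements more heavily, so training under this loss would push the policy toward lower average compression ratios at the cost of higher distortion.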
Submitted 10 July, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Meta Federated Reinforcement Learning for Distributed Resource Allocation
Authors:
Zelin Ji,
Zhijin Qin,
Xiaoming Tao
Abstract:
In cellular networks, resource allocation is usually performed in a centralized way, which imposes heavy computational complexity on the base station (BS) and incurs high transmission overhead. This paper explores a distributed resource allocation method that aims to maximize energy efficiency (EE) while ensuring the quality of service (QoS) for users. Specifically, to cope with wireless channel conditions, we propose a robust meta federated reinforcement learning (MFRL) framework that allows local users to optimize transmit power and assign channels using locally trained neural network models, offloading the computational burden from the cloud server to the local users and reducing the transmission overhead associated with local channel state information. The BS performs the meta learning procedure to initialize a general global model, enabling rapid adaptation to different environments with improved EE performance. The federated learning technique, based on decentralized reinforcement learning, promotes collaboration and mutual benefits among users. Analysis and numerical results demonstrate that the proposed MFRL framework accelerates the reinforcement learning process, decreases transmission overhead, and offloads computation, while outperforming the conventional decentralized reinforcement learning algorithm in terms of convergence speed and EE performance across various scenarios.
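The federated step in the abstract aggregates locally trained models into a global one. The aggregation rule is not stated; a minimal sketch of the widely used FedAvg weighting (each user's parameters weighted by its local sample count) is shown below as an assumption, with models flattened to plain parameter lists. The meta-learning initialization performed at the BS is not shown.

```python
def fed_avg(local_params, num_samples):
    """Average users' parameter vectors, weighting each by its local sample count."""
    total = sum(num_samples)
    dim = len(local_params[0])
    return [
        sum(params[i] * n for params, n in zip(local_params, num_samples)) / total
        for i in range(dim)
    ]


# Two hypothetical users: the second has three times as much local experience.
print(fed_avg([[1.0, 2.0], [3.0, 6.0]], num_samples=[1, 3]))
```

Only parameter vectors cross the air interface in this scheme, not raw channel state information, which is the source of the transmission-overhead reduction the abstract describes.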
Submitted 9 July, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Quantum State Tomography for Matrix Product Density Operators
Authors:
Zhen Qin,
Casey Jameson,
Zhexuan Gong,
Michael B. Wakin,
Zhihui Zhu
Abstract:
The reconstruction of quantum states from experimental measurements, often achieved using quantum state tomography (QST), is crucial for the verification and benchmarking of quantum devices. However, performing QST for a generic unstructured quantum state requires an enormous number of state copies that grows exponentially with the number of individual quanta in the system, even with optimal measurement settings. Fortunately, many physical quantum states, such as states generated by noisy, intermediate-scale quantum computers, are usually structured. In one dimension, such states are expected to be well approximated by matrix product operators (MPOs) with a finite matrix/bond dimension independent of the number of qubits, therefore enabling efficient state representation. Nevertheless, it remains unclear whether efficient QST can be performed for these states in general.
In this paper, we attempt to bridge this gap and establish theoretical guarantees for the stable recovery of MPOs using tools from compressive sensing and the theory of empirical processes. We begin by studying two types of random measurement settings: Gaussian measurements and Haar random rank-one Positive Operator Valued Measures (POVMs). We show that the information contained in an MPO with a finite bond dimension can be preserved using a number of random measurements that depends only linearly on the number of qubits, assuming no statistical error in the measurements. We then study MPO-based QST with physical quantum measurements through Haar random rank-one POVMs that can be implemented on quantum computers. We prove that only a polynomial number of state copies in the number of qubits is required to guarantee bounded recovery error of an MPO state.
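To make the linear-versus-exponential contrast in the abstract concrete, the sketch below counts the entries of an MPO with open boundaries and uniform bond dimension $D$ (site $k$ holds a $D_{k-1}\times 2\times 2\times D_k$ tensor with $D_0 = D_n = 1$) against the $2^n \times 2^n$ density matrix. It counts stored (complex) entries only and ignores gauge freedom and Hermiticity, so it is a rough illustration rather than the paper's measurement bound.

```python
def mpo_param_count(num_qubits, bond_dim, local_dim=2):
    """Entries of an MPO with uniform bond dimension and open boundaries.

    Site k stores a tensor of shape (D_{k-1}, d, d, D_k) with D_0 = D_n = 1.
    """
    bonds = [1] + [bond_dim] * (num_qubits - 1) + [1]
    return sum(bonds[k] * local_dim ** 2 * bonds[k + 1] for k in range(num_qubits))


def dense_param_count(num_qubits, local_dim=2):
    """Entries of the full (local_dim^n x local_dim^n) density matrix."""
    return (local_dim ** num_qubits) ** 2


for n in (5, 10, 20):
    print(n, mpo_param_count(n, bond_dim=2), dense_param_count(n))
```

For 10 qubits and $D = 2$ the MPO stores 144 entries versus 1,048,576 for the dense density matrix, and the MPO count grows linearly in $n$.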
Submitted 18 February, 2024; v1 submitted 15 June, 2023;
originally announced June 2023.
-
QoE-based Semantic-Aware Resource Allocation for Multi-Task Networks
Authors:
Lei Yan,
Zhijin Qin,
Chunfeng Li,
Rui Zhang,
Yongzhao Li,
Xiaoming Tao
Abstract:
By transmitting only task-related information, semantic communications yield significant performance gains over conventional communications. However, the lack of a mature semantic theory for semantic information quantification and performance evaluation makes it challenging to perform resource allocation for semantic communications, especially when multiple tasks coexist in the network. To cope with this challenge, we propose a quality-of-experience (QoE) based semantic-aware resource allocation method for multi-task networks. First, semantic entropy is defined to quantify the semantic information of different tasks, and the relationship between semantic entropy and Shannon entropy is analyzed. Then, we develop a novel QoE model to formulate semantic-aware resource allocation in terms of semantic compression, channel assignment, and transmit power. The compatibility of the formulated problem with conventional communications is further demonstrated. To solve this problem, we decouple it into two subproblems and solve them with a deep Q-network (DQN) based method and a proposed low-complexity matching algorithm, respectively. Finally, simulation results validate the effectiveness and superiority of the proposed method, as well as its compatibility with conventional communications.
Submitted 8 April, 2024; v1 submitted 10 May, 2023;
originally announced May 2023.
-
A manifold learning-based CSI feedback framework for FDD massive MIMO
Authors:
Yandi Cao,
Haifan Yin,
Ziao Qin,
Weidong Li,
Weimin Wu,
Mérouane Debbah
Abstract:
Massive multi-input multi-output (MIMO) in Frequency Division Duplex (FDD) mode suffers from heavy feedback overhead for Channel State Information (CSI). In this paper, a novel manifold learning-based CSI feedback framework (MLCF) is proposed to reduce the feedback overhead and improve the spectral efficiency of FDD massive MIMO. Manifold learning (ML) is an effective method for dimensionality reduction. However, most ML algorithms focus only on data compression and lack corresponding recovery methods. Moreover, their computational complexity is high when dealing with incremental data. To exploit the intrinsic manifold structure on which the CSI samples reside, we propose a landmark selection algorithm to describe the topological skeleton of this manifold. Based on the learned skeleton, the local patch of incremental CSI on the manifold can be easily determined by its nearest landmarks. This motivates us to propose an incremental CSI compression and reconstruction scheme that keeps the local geometric relationships with landmarks invariant. We theoretically prove the convergence of the proposed landmark selection algorithm. Meanwhile, an upper bound on the error of approximating CSI with landmarks is derived. Simulation results under an industrial 3GPP channel model demonstrate that the proposed MLCF outperforms existing deep learning-based algorithms.
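The abstract compresses a new CSI sample by keeping its local geometric relationship with its nearest landmarks invariant. The paper's exact encoding is not given; the sketch below assumes an LLE-style (locally linear embedding) encoding, in which the sample is represented by the indices of its `k` nearest landmarks and affine weights (summing to one) that best reconstruct it, so that only indices and weights would need to be fed back. Landmark selection is not reproduced, vectors are real-valued (complex CSI would be split into real and imaginary parts), and the small regularizer `reg` is an assumption needed when `k` exceeds the local dimension.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def encode(x, landmarks, k, reg=1e-9):
    """Return (indices, weights): the k nearest landmarks and affine weights."""
    order = sorted(range(len(landmarks)), key=lambda i: math.dist(x, landmarks[i]))
    idx = order[:k]
    diffs = [[xi - li for xi, li in zip(x, landmarks[i])] for i in idx]
    # Local Gram matrix of the differences x - L_i
    gram = [[sum(a * b for a, b in zip(u, v)) for v in diffs] for u in diffs]
    trace = sum(gram[i][i] for i in range(k))
    for i in range(k):
        gram[i][i] += reg * trace  # keeps the system solvable when gram is singular
    raw = solve(gram, [1.0] * k)
    total = sum(raw)
    return idx, [r / total for r in raw]


def decode(idx, weights, landmarks):
    """Reconstruct the sample as the weighted sum of its landmarks."""
    dim = len(landmarks[0])
    return [sum(w * landmarks[i][d] for i, w in zip(idx, weights)) for d in range(dim)]


marks = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # toy landmarks
ids, ws = encode([0.25, 0.25], marks, k=3)
print(ids, ws, decode(ids, ws, marks))
```

Feeding back `k` indices and `k` weights instead of the full vector is where the compression comes from in this reading; reconstruction only requires the shared landmark set.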
Submitted 23 August, 2024; v1 submitted 27 April, 2023;
originally announced April 2023.
-
Dynamic Compressive Sensing based on RLS for Underwater Acoustic Communications
Authors:
Zhen Qin
Abstract:
Sparse structures are widely recognized and utilized in channel estimation. Two typical mechanisms, namely proportionate updating (PU) and zero-attracting (ZA) techniques, achieve better performance, but their computational complexity is higher than that of their non-sparse counterparts. In this paper, we propose a dynamic compressive sensing (DCS) technique based on the recursive least squares (RLS) algorithm that simultaneously achieves improved performance and reduced computational complexity. Specifically, we develop the sparse adaptive subspace pursuit-improved RLS (SpAdSP-IRLS) algorithm, which updates only the sparse structure in the IRLS to track significant coefficients. The complexity of the SpAdSP-IRLS algorithm is reduced to $\mathcal{O}(L^2+2L(s+1)+10s)$, compared with $\mathcal{O}(3L^2+4L)$ for the standard RLS. Here, $L$ denotes the length of the channel, and $s$ denotes the size of the support set. Our experiments on both synthetic and real data show the superiority of the proposed SpAdSP-IRLS, even though only $s$ elements are updated in the channel estimation.
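For reference, the $\mathcal{O}(L^2)$ cost quoted for the standard RLS comes from the $L\times L$ matrix-vector products in each update. A minimal sketch of that baseline, the standard exponentially weighted RLS channel estimator (not SpAdSP-IRLS), is shown below for real-valued signals; the forgetting factor `lam` and initialization `delta` are hypothetical defaults.

```python
def rls_estimate(inputs, desired, taps, lam=0.99, delta=1e-2):
    """Standard exponentially weighted RLS estimate of an FIR channel.

    inputs: known transmitted symbols x[n]; desired: received samples d[n].
    lam: forgetting factor; delta: P is initialized to I / delta.
    """
    w = [0.0] * taps
    p = [[(1.0 / delta if i == j else 0.0) for j in range(taps)] for i in range(taps)]
    buf = [0.0] * taps  # tap-delay line, newest sample first
    for x_n, d_n in zip(inputs, desired):
        buf = [x_n] + buf[:-1]
        # P x: an O(L^2) matrix-vector product
        px = [sum(p[i][j] * buf[j] for j in range(taps)) for i in range(taps)]
        denom = lam + sum(b * v for b, v in zip(buf, px))
        gain = [v / denom for v in px]
        # a priori error and weight update
        err = d_n - sum(wi * b for wi, b in zip(w, buf))
        w = [wi + gi * err for wi, gi in zip(w, gain)]
        # P <- (P - gain (P x)^T) / lam, using the symmetry of P; another O(L^2) step
        p = [[(p[i][j] - gain[i] * px[j]) / lam for j in range(taps)] for i in range(taps)]
    return w
```

Every entry of `w` and `p` is touched in each iteration, which is exactly the dense update that a support-restricted scheme updating only $s$ coefficients avoids.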
Submitted 4 May, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Patching Approximately Safe Value Functions Leveraging Local Hamilton-Jacobi Reachability Analysis
Authors:
Sander Tonkens,
Alex Toofanian,
Zhizhen Qin,
Sicun Gao,
Sylvia Herbert
Abstract:
Safe value functions, such as control barrier functions, characterize a safe set and synthesize a safety filter that overrides unsafe actions for a dynamical system. While function approximators like neural networks can synthesize approximately safe value functions, they typically lack formal guarantees. In this paper, we propose a local dynamic programming-based approach to "patch" approximately safe value functions to obtain a safe value function. This algorithm, HJ-Patch, produces a novel value function that provides formal safety guarantees yet retains the global structure of the initial value function. HJ-Patch modifies an approximately safe value function at states that are both (i) near the safety boundary and (ii) potentially unsafe. We iteratively update both this set of "active" states and the value function until convergence. This approach bridges the gap between value function approximation methods and formal safety through Hamilton-Jacobi (HJ) reachability, offering a framework for integrating various safety methods. We provide simulation results on analytic and learned examples, demonstrating that HJ-Patch reduces computational complexity by two orders of magnitude compared with standard HJ reachability. Additionally, we demonstrate the perils of using approximately safe value functions directly and showcase improved safety using HJ-Patch.
Submitted 6 September, 2024; v1 submitted 19 April, 2023;
originally announced April 2023.
-
Semantic Communication with Memory
Authors:
Huiqiang Xie,
Zhijin Qin,
Geoffrey Ye Li
Abstract:
While semantic communication transmits efficiently owing to its strong capability to extract essential semantic information, it is still far from intelligent or human-like communication. In this paper, we introduce an essential component, memory, into semantic communications to mimic human communication. Particularly, we investigate a deep learning (DL) based semantic communication system with memory, named Mem-DeepSC, by considering a scenario-based question answering task. We exploit a universal Transformer-based transceiver to extract the semantic information and introduce a memory module to process the context information. Moreover, we derive the relationship between the length of the semantic signal and the channel noise to validate the feasibility of dynamic transmission. Specifically, we propose two dynamic transmission methods that enhance transmission reliability and reduce communication overhead by masking unessential elements, which are identified by training the model with mutual information. Numerical results show that the proposed Mem-DeepSC outperforms benchmarks in terms of answer accuracy and transmission efficiency, i.e., the number of transmitted symbols.
Submitted 22 March, 2023;
originally announced March 2023.
-
A review of codebooks for CSI feedback in 5G new radio and beyond
Authors:
Ziao Qin,
Haifan Yin
Abstract:
Codebooks have been indispensable to wireless communication standards since the first release of Long-Term Evolution in 2009. They offer an efficient way to acquire channel state information (CSI) for multiple-antenna systems. Nowadays, a codebook is no longer limited to a set of pre-defined precoders; it refers to an increasingly sophisticated CSI feedback framework. In this paper, we review the codebooks in 5G New Radio (NR) standards. The codebook timeline and evolution trend are presented. Each codebook is elaborated with its motivation, the corresponding feedback mechanism, and the format of the precoding matrix indicator. Insights are given to help readers grasp the underlying reasons and intuitions behind these codebooks. Finally, we point out unresolved challenges of the codebooks for the future evolution of the standards. In general, this paper provides a comprehensive review of the codebooks in 5G NR and aims to help researchers understand CSI feedback schemes from a standard and industrial perspective.
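A codebook maps measured CSI to a precoding matrix indicator (PMI). The sketch below is a deliberately simplified illustration, not an NR codebook: a single-polarized, one-dimensional oversampled DFT beam set for an $N$-antenna uniform linear array, with the PMI chosen to maximize the beamforming gain $|h^{H} w|$. Actual NR Type I and Type II codebooks use two-dimensional DFT beams, polarization co-phasing, and standardized index formats that are not reproduced here; `num_antennas` and `oversampling` are hypothetical parameters.

```python
import cmath
import math


def dft_codebook(num_antennas, oversampling):
    """Oversampled DFT beams w_m[n] = exp(j 2 pi m n / (O N)) / sqrt(N), m = 0, ..., O N - 1."""
    size = num_antennas * oversampling
    return [
        [cmath.exp(2j * math.pi * m * n / size) / math.sqrt(num_antennas) for n in range(num_antennas)]
        for m in range(size)
    ]


def select_pmi(h, codebook):
    """Return the index of the codeword that maximizes |h^H w|."""
    def gain(w):
        return abs(sum(hn.conjugate() * wn for hn, wn in zip(h, w)))
    return max(range(len(codebook)), key=lambda m: gain(codebook[m]))


codebook = dft_codebook(num_antennas=4, oversampling=4)
channel = [3.0 * c for c in codebook[5]]  # hypothetical channel aligned with beam 5
print(select_pmi(channel, codebook))
```

Only the index, not the channel itself, is fed back; oversampling trades a few extra PMI bits for finer beam pointing.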
Submitted 13 June, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Semantic Communications with Variable-Length Coding for Extended Reality
Authors:
Bowen Zhang,
Zhijin Qin,
Geoffrey Ye Li
Abstract:
Wireless extended reality (XR) has attracted wide attention as a promising technology to improve users' mobility and quality of experience. However, the ultra-high data rate requirement of wireless XR has hindered its development for many years. To overcome this challenge, we develop a semantic communication framework in which semantically unimportant information is highly compressed or discarded in semantic coders, significantly improving transmission efficiency. Moreover, since some source content may carry less semantic information or have higher tolerance to channel noise, we propose a universal variable-length semantic-channel coding method. In particular, we first use a rate allocation network to estimate the best code length for semantic information and then adjust the coding process accordingly. By adopting proxy functions, the whole framework is trained in an end-to-end manner. Numerical results show that our semantic system significantly outperforms traditional transmission methods and that the proposed variable-length coding scheme is superior to fixed-length coding methods.
Submitted 11 March, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.