Mitigation of Active Power Oscillation in Multi-VSG Grids: An Impedance-Based Perspective

Authors: Junjie Xiao, Lu Wang, Xiong Du, Pedro Rodriguez, Zian Qin

Abstract: Active power oscillations (APOs) frequently arise in inverter-dominated power systems with multiple converters operating under Virtual Synchronous Generator control, posing risks to system stability and protection coordination. While various mitigation strategies have been proposed, many rely on prior knowledge of system parameters, offer limited damping performance, or involve complex models that lack physical interpretability, making them difficult to apply in practice. To address these challenges, this paper first introduces a physically intuitive RLC equivalent circuit model to explain the root causes of APOs in both stand-alone (SA) and grid-connected (GC) modes. By mapping inertia, damping, and feeder impedance to capacitive, resistive, and inductive elements, respectively, the model reveals how mismatches among converters lead to inter-unit oscillations characterized by LC resonance. Building on this insight, we propose two mode-specific mitigation strategies: in SA mode, a graph-theory-based impedance control ensures proportional reactive power sharing and effectively suppresses APOs; and in GC mode, adaptive inertia and damping control with feedforward filtering is designed to reshape transient power dynamics while preserving frequency stability. The proposed methods are validated through extensive simulations and real-time hardware-in-the-loop experiments, demonstrating their effectiveness in suppressing oscillations and enhancing the robustness of multi-converter power systems.

Submitted 25 September, 2025; originally announced September 2025.

arXiv:2509.19754 [pdf, ps, other]

Timeliness-Aware Joint Source and Channel Coding for Adaptive Image Transmission

Authors: Xiaolei Yang, Zijing Wang, Zhijin Qin, Xiaoming Tao

Abstract: Accurate and timely image transmission is critical for emerging time-sensitive applications such as remote sensing in satellite-assisted Internet of Things. However, the bandwidth limitation poses a significant challenge in existing wireless systems, making it difficult to fulfill the requirements of both high-fidelity and low-latency image transmission. Semantic communication is expected to break through the performance bottleneck by focusing on the transmission of goal-oriented semantic information rather than raw data. In this paper, we employ a new timeliness metric named the value of information (VoI) and propose an adaptive joint source and channel coding (JSCC) method for image transmission that simultaneously considers both reconstruction quality and timeliness. Specifically, we first design a JSCC framework for image transmission with adaptive code length. Next, we formulate a VoI maximization problem by optimizing the transmission code length of the adaptive JSCC under the reconstruction quality constraint. Then, a deep reinforcement learning-based algorithm is proposed to solve the optimization problem efficiently. Experimental results show that the proposed method significantly outperforms baseline schemes in terms of reconstruction quality and timeliness, particularly in low signal-to-noise ratio conditions, offering a promising solution for efficient and robust image transmission in time-sensitive wireless networks.

Submitted 24 September, 2025; originally announced September 2025.

Comments: 6 pages, 7 figures, accepted at IEEE GLOBECOM Workshops 2025

arXiv:2509.19331 [pdf, ps, other]

Holographic Transformers for Complex-Valued Signal Processing: Integrating Phase Interference into Self-Attention

Authors: Enhao Huang, Zhiyu Zhang, Tianxiang Xu, Chunshu Xia, Kaichun Hu, Yuchen Yang, Tongtong Pan, Dong Dong, Zhan Qin

Abstract: Complex-valued signals encode both amplitude and phase, yet most deep models treat attention as real-valued correlation, overlooking interference effects. We introduce the Holographic Transformer, a physics-inspired architecture that incorporates wave interference principles into self-attention. Holographic attention modulates interactions by relative phase and coherently superimposes values, ensuring consistency between amplitude and phase. A dual-headed decoder simultaneously reconstructs the input and predicts task outputs, preventing phase collapse when losses prioritize magnitude over phase. We demonstrate that holographic attention implements a discrete interference operator and maintains phase consistency under linear mixing. Experiments on PolSAR image classification and wireless channel prediction show strong performance, achieving high classification accuracy and F1 scores, low regression error, and increased robustness to phase perturbations. These results highlight that enforcing physical consistency in attention leads to generalizable improvements in complex-valued learning and provides a unified, physics-based framework for coherent signal modeling. The code is available at https://github.com/EonHao/Holographic-Transformers.

Submitted 14 September, 2025; originally announced September 2025.

arXiv:2509.16852 [pdf, ps, other]

Quantum State Tomography for Tensor Networks in Two Dimensions

Authors: Zhen Qin, Zhihui Zhu

Abstract: Recent work has shown that for one-dimensional quantum states that can be effectively approximated by matrix product operators (MPOs), a polynomial number of copies of the state suffices for reconstruction. Compared to MPOs in one dimension, projected entangled-pair states (PEPSs) and projected entangled-pair operators (PEPOs), which represent typical low-dimensional structures in two dimensions, are more prevalent as a looped tensor network. However, a formal analysis of the sample complexity required for estimating PEPS or PEPO has yet to be established. In this paper, we aim to address this gap by providing theoretical guarantees for the stable recovery of PEPS and PEPO. Our analysis primarily focuses on two quantum measurement schemes: $(i)$ informationally complete positive operator valued measures (IC-POVMs), specifically the spherical $t$-designs ($t \geq 3$), and $(ii)$ projective rank-one measurements, in particular Haar random projective measurements. We first establish stable embeddings for PEPSs (or PEPOs) to ensure that the information contained in the states can be preserved under these two measurement schemes. We then show that a constrained least-squares estimator achieves stable recovery for PEPSs (or PEPOs), with the recovery error bounded when the number of state copies scales linearly under spherical $t$-designs and polynomially under Haar-random projective measurements with respect to the number of qudits. These results provide theoretical support for the reliable use of PEPS and PEPO in practical quantum information processing.

Submitted 20 September, 2025; originally announced September 2025.

arXiv:2509.10834 [pdf, ps, other]

Landscape Analysis of Simultaneous Blind Deconvolution and Phase Retrieval via Structured Low-Rank Tensor Recovery

Authors: Xiao Liang, Zhen Qin, Zhihui Zhu, Shuang Li

Abstract: This paper presents a geometric analysis of the simultaneous blind deconvolution and phase retrieval (BDPR) problem via a structured low-rank tensor recovery framework. Due to the highly complicated structure of the associated sensing tensor, directly characterizing its optimization landscape is intractable. To address this, we introduce a tensor sensing problem as a tractable surrogate that preserves the essential structural features of the target low-rank tensor while enabling rigorous theoretical analysis. As a first step toward understanding this surrogate model, we study the corresponding population risk, which captures key aspects of the underlying low-rank tensor structure. We characterize the global landscape of the population risk on the unit sphere and show that Riemannian gradient descent (RGD) converges linearly under mild conditions. We then extend the analysis to the tensor sensing problem, establishing local geometric properties, proving convergence guarantees for RGD, and quantifying robustness under measurement noise. Our theoretical results are further supported by extensive numerical experiments. These findings offer foundational insights into the optimization landscape of the structured low-rank tensor recovery problem, which equivalently characterizes the original BDPR problem, thereby providing principled guidance for solving the original BDPR problem.

Submitted 13 September, 2025; originally announced September 2025.

Comments: 17 pages, 18 figures

arXiv:2507.19742 [pdf, ps, other]

DOA: A Degeneracy Optimization Agent with Adaptive Pose Compensation Capability based on Deep Reinforcement Learning

Authors: Yanbin Li, Canran Xiao, Hongyang He, Shenghai Yuan, Zong Ke, Jiajie Yu, Zixiong Qin, Zhiguo Zhang, Wenzheng Chi, Wei Zhang

Abstract: Particle filter-based 2D-SLAM is widely used in indoor localization tasks due to its efficiency. However, indoor environments such as long straight corridors can cause severe degeneracy problems in SLAM. In this paper, we use Proximal Policy Optimization (PPO) to train an adaptive degeneracy optimization agent (DOA) to address the degeneracy problem. We propose a systematic methodology to address three critical challenges in traditional supervised learning frameworks: (1) data acquisition bottlenecks in degenerate datasets, (2) inherent quality deterioration of training samples, and (3) ambiguity in annotation protocol design. We design a specialized reward function to guide the agent in developing perception capabilities for degenerate environments. Using the output degeneracy factor as a reference weight, the agent can dynamically adjust the contribution of different sensors to pose optimization. Specifically, the observation distribution is shifted towards the motion model distribution, with the step size determined by a linear interpolation formula related to the degeneracy factor. In addition, we employ a transfer learning module to endow the agent with generalization capabilities across different environments and address the inefficiency of training in degenerate environments. Finally, we conduct ablation studies to demonstrate the rationality of our model design and the role of transfer learning. We also compare the proposed DOA with SOTA methods to prove its superior degeneracy detection and optimization capabilities across various environments.

Submitted 25 July, 2025; originally announced July 2025.

Comments: 10 pages, 9 figures
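
Illustrative note on the abstract above: the phrase "shifted towards the motion model distribution, with the step size determined by a linear interpolation formula" can be pictured with a minimal sketch. Everything here is an assumption for illustration, not the authors' formula: the function name `blend_toward_motion_model` is hypothetical, the degeneracy factor is assumed to lie in [0, 1], only the distribution means are blended, and heading wrap-around is ignored.

```python
from typing import Sequence, Tuple


def blend_toward_motion_model(
    obs_mean: Sequence[float],
    motion_mean: Sequence[float],
    degeneracy: float,
) -> Tuple[float, ...]:
    """Linearly interpolate from the observation mean to the motion-model mean.

    Assumption: degeneracy = 0 trusts the observation fully, and
    degeneracy = 1 falls back entirely on the motion model.
    """
    if not 0.0 <= degeneracy <= 1.0:
        raise ValueError("degeneracy factor is assumed to lie in [0, 1]")
    if len(obs_mean) != len(motion_mean):
        raise ValueError("means must have the same dimension")
    # Step from the observation toward the motion model by a fraction
    # equal to the degeneracy factor.
    return tuple(o + degeneracy * (m - o) for o, m in zip(obs_mean, motion_mean))
```

In a long corridor, where scan matching is poorly constrained along the corridor axis, a larger factor would lean the pose estimate on odometry; how the paper computes and applies the factor is not stated in the abstract.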

arXiv:2506.16032 [pdf, ps, other]

A Scalable Factorization Approach for High-Order Structured Tensor Recovery

Authors: Zhen Qin, Michael B. Wakin, Zhihui Zhu

Abstract: Tensor decompositions, which represent an $N$-order tensor using approximately $N$ factors of much smaller dimensions, can significantly reduce the number of parameters. This is particularly beneficial for high-order tensors, as the number of entries in a tensor grows exponentially with the order. Consequently, they are widely used in signal recovery and data analysis across domains such as signal processing, machine learning, and quantum physics. A computationally and memory-efficient approach to these problems is to optimize directly over the factors using local search algorithms such as gradient descent, a strategy known as the factorization approach in matrix and tensor optimization. However, the resulting optimization problems are highly nonconvex due to the multiplicative interactions between factors, posing significant challenges for convergence analysis and recovery guarantees. In this paper, we present a unified framework for the factorization approach to solving various tensor decomposition problems. Specifically, by leveraging the canonical form of tensor decompositions--where most factors are constrained to be orthonormal to mitigate scaling ambiguity--we apply Riemannian gradient descent (RGD) to optimize these orthonormal factors on the Stiefel manifold. Under a mild condition on the loss function, we establish a Riemannian regularity condition for the factorized objective and prove that RGD converges to the ground-truth tensor at a linear rate when properly initialized. Notably, both the initialization requirement and the convergence rate scale polynomially rather than exponentially with $N$, improving upon existing results for Tucker and tensor-train format tensors.

Submitted 19 June, 2025; originally announced June 2025.

arXiv:2506.02863 [pdf, ps, other]

CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech

Authors: Helin Wang, Jiarui Hai, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Thomas Thebaud, Laureano Moro Velazquez, Jesus Villalba, Zengyi Qin, Shrikanth Narayanan, Mounya Elhiali, Najim Dehak

Abstract: Recent advancements in generative artificial intelligence have significantly transformed the field of style-captioned text-to-speech synthesis (CapTTS). However, adapting CapTTS to real-world applications remains challenging due to the lack of standardized, comprehensive datasets and limited research on downstream tasks built upon CapTTS. To address these gaps, we introduce CapSpeech, a new benchmark designed for a series of CapTTS-related tasks, including style-captioned text-to-speech synthesis with sound events (CapTTS-SE), accent-captioned TTS (AccCapTTS), emotion-captioned TTS (EmoCapTTS), and text-to-speech synthesis for chat agent (AgentTTS). CapSpeech comprises over 10 million machine-annotated audio-caption pairs and nearly 0.36 million human-annotated audio-caption pairs. In addition, we introduce two new datasets collected and recorded by a professional voice actor and experienced audio engineers, specifically for the AgentTTS and CapTTS-SE tasks. Alongside the datasets, we conduct comprehensive experiments using both autoregressive and non-autoregressive models on CapSpeech. Our results demonstrate high-fidelity and highly intelligible speech synthesis across a diverse range of speaking styles. To the best of our knowledge, CapSpeech is the largest available dataset offering comprehensive annotations for CapTTS-related tasks. The experiments and findings further provide valuable insights into the challenges of developing CapTTS systems.

Submitted 26 September, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

arXiv:2504.13574 [pdf, other]

MAAM: A Lightweight Multi-Agent Aggregation Module for Efficient Image Classification Based on the MindSpore Framework

Authors: Zhenkai Qin, Feng Zhu, Huan Zeng, Xunyi Nong

Abstract: The demand for lightweight models in image classification tasks under resource-constrained environments necessitates a balance between computational efficiency and robust feature representation. Traditional attention mechanisms, despite their strong feature modeling capability, often struggle with high computational complexity and structural rigidity, limiting their applicability in scenarios with limited computational resources (e.g., edge devices or real-time systems). To address this, we propose the Multi-Agent Aggregation Module (MAAM), a lightweight attention architecture integrated with the MindSpore framework. MAAM employs three parallel agent branches with independently parameterized operations to extract heterogeneous features, adaptively fused via learnable scalar weights, and refined through a convolutional compression layer. Leveraging MindSpore's dynamic computational graph and operator fusion, MAAM achieves 87.0% accuracy on the CIFAR-10 dataset, significantly outperforming conventional CNN (58.3%) and MLP (49.6%) models, while improving training efficiency by 30%. Ablation studies confirm the critical role of agent attention (accuracy drops to 32.0% if removed) and compression modules (25.5% if omitted), validating their necessity for maintaining discriminative feature learning. The framework's hardware acceleration capabilities and minimal memory footprint further demonstrate its practicality, offering a deployable solution for image classification in resource-constrained scenarios without compromising accuracy.

Submitted 18 April, 2025; originally announced April 2025.

arXiv:2504.01344 [pdf, other]

IRS Assisted Decentralized Learning for Wideband Spectrum Sensing

Authors: Sicheng Liu, Qun Wang, Zhuwei Qin, Weishan Zhang, Jingyi Wang, Xiang Ma

Abstract: The increasing demand for reliable connectivity in industrial environments necessitates effective spectrum utilization strategies, especially in the context of shared spectrum bands. However, the dynamic spectrum-sharing mechanisms often lead to significant interference and critical failures, creating a trade-off between spectrum scarcity and under-utilization. This paper addresses these challenges by proposing a novel Intelligent Reflecting Surface (IRS)-assisted spectrum sensing framework integrated with decentralized deep learning. The proposed model overcomes partial observation constraints and minimizes communication overhead while leveraging IRS technology to enhance spectrum sensing accuracy. Through comprehensive simulations, the framework demonstrates its ability to monitor wideband spectrum occupancy effectively, even under challenging signal-to-noise ratio (SNR) conditions. This approach offers a scalable and robust solution for spectrum management in next-generation wireless networks.

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2503.22064 [pdf, other]

Multi-Task Semantic Communications via Large Models

Authors: Wanli Ni, Zhijin Qin, Haofeng Sun, Xiaoming Tao, Zhu Han

Abstract: Artificial intelligence (AI) promises to revolutionize the design, optimization and management of next-generation communication systems. In this article, we explore the integration of large AI models (LAMs) into semantic communications (SemCom) by leveraging their multi-modal data processing and generation capabilities. Although LAMs bring unprecedented abilities to extract semantics from raw data, this integration entails multifaceted challenges including high resource demands, model complexity, and the need for adaptability across diverse modalities and tasks. To overcome these challenges, we propose a LAM-based multi-task SemCom (MTSC) architecture, which includes an adaptive model compression strategy and a federated split fine-tuning approach to facilitate the efficient deployment of LAM-based semantic models in resource-limited networks. Furthermore, a retrieval-augmented generation scheme is implemented to synthesize the most recent local and global knowledge bases to enhance the accuracy of semantic extraction and content generation, thereby improving the inference performance. Finally, simulation results demonstrate the efficacy of the proposed LAM-based MTSC architecture, highlighting the performance enhancements across various downstream tasks under varying channel conditions.

Submitted 27 March, 2025; originally announced March 2025.

Comments: 7 pages, 6 figures

arXiv:2503.10641 [pdf, other]

Estimating Control Barriers from Offline Data

Authors: Hongzhan Yu, Seth Farrell, Ryo Yoshimitsu, Zhizhen Qin, Henrik I. Christensen, Sicun Gao

Abstract: Learning-based methods for constructing control barrier functions (CBFs) are gaining popularity for ensuring safe robot control. A major limitation of existing methods is their reliance on extensive sampling over the state space or online system interaction in simulation. In this work we propose a novel framework for learning neural CBFs through a fixed, sparsely-labeled dataset collected prior to training. Our approach introduces new annotation techniques based on out-of-distribution analysis, enabling efficient knowledge propagation from the limited labeled data to the unlabeled data. We also eliminate the dependency on a high-performance expert controller, and allow multiple sub-optimal policies or even manual control during data collection. We evaluate the proposed method on real-world platforms. With a limited amount of offline data, it achieves state-of-the-art performance for dynamic obstacle avoidance, demonstrating statistically safer and less conservative maneuvers compared to existing methods.

Submitted 20 February, 2025; originally announced March 2025.

Comments: This paper has been accepted to ICRA 2025

arXiv:2503.05794 [pdf, other]

CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking

Authors: Yiming Li, Kaiying Yan, Shuo Shao, Tongqing Zhai, Shu-Tao Xia, Zhan Qin, Dacheng Tao

Abstract: With the increasing adoption of deep learning in speaker verification, large-scale speech datasets have become valuable intellectual property. To audit and prevent the unauthorized usage of these valuable released datasets, especially in commercial or open-source scenarios, we propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW), enabling dataset owners to determine whether a suspicious third-party model has been trained on a protected dataset under a black-box setting. The CBW method consists of two key stages: dataset watermarking and ownership verification. During watermarking, we implant multiple trigger patterns in the dataset to make similar samples (measured by their feature similarities) close to the same trigger while dissimilar samples are near different triggers. This ensures that any model trained on the watermarked dataset exhibits specific misclassification behaviors when exposed to trigger-embedded inputs. To verify dataset ownership, we design a hypothesis-test-based framework that statistically evaluates whether a suspicious model exhibits the expected backdoor behavior. We conduct extensive experiments on benchmark datasets, verifying the effectiveness and robustness of our method against potential adaptive attacks. The code for reproducing main experiments is available at https://github.com/Radiant0726/CBW

Submitted 5 April, 2025; v1 submitted 1 March, 2025; originally announced March 2025.

Comments: 14 pages. The journal extension of our ICASSP'21 paper (arXiv:2010.11607)

arXiv:2502.17168 [pdf, other]

SpikACom: A Neuromorphic Computing Framework for Green Communications

Authors: Yanzhen Liu, Zhijin Qin, Yongxu Zhu, Geoffrey Ye Li

Abstract: The ever-growing power consumption of wireless communication systems necessitates more energy-efficient algorithms. This paper introduces SpikACom (Spiking Adaptive Communication), a neuromorphic computing-based framework for power-intensive wireless communication tasks. SpikACom leverages brain-inspired spiking neural networks (SNNs) for efficient signal processing. It is designed for dynamic wireless environments, helping to mitigate catastrophic forgetting and facilitate adaptation to new circumstances. Moreover, SpikACom is customizable, allowing flexible integration of domain knowledge to enhance its interpretability and efficacy. We validate its performance on fundamental wireless communication tasks, including task-oriented semantic communication, multiple-input multiple-output (MIMO) beamforming, and orthogonal frequency-division multiplexing (OFDM) channel estimation. The simulation results show that SpikACom significantly reduces power consumption while matching or exceeding the performance of conventional algorithms. This study highlights the potential of SNNs for enabling greener and smarter wireless communication systems.

Submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.07236 [pdf, other]

Adaptive Sampling and Joint Semantic-Channel Coding under Dynamic Channel Environment

Authors: Zhiyuan Qi, Yulong Feng, Zhijin Qin

Abstract: Deep learning enabled semantic communications are attracting extensive attention. However, most works normally ignore the data acquisition process and suffer from robustness issues under dynamic channel environments. In this paper, we propose an adaptive joint sampling-semantic-channel coding (Adaptive-JSSCC) framework. Specifically, we propose a semantic-aware sampling and reconstruction method to optimize the number of samples dynamically for each region of the images. According to semantic significance, we optimize sampling matrices for each region individually and obtain a semantic sampling ratio distribution map shared with the receiver. Through the guidance of the map, high-quality reconstruction is achieved. Meanwhile, an attention-based channel adaptive module (ACAM) is designed to overcome the neural network model mismatch between the training and testing channel environments during sampling-reconstruction and encoding-decoding. To this end, signal-to-noise ratio (SNR) is employed as an extra parameter input to integrate and reorganize intermediate characteristics. Simulation results show that the proposed Adaptive-JSSCC effectively reduces the amount of data acquisition without degrading the reconstruction performance in comparison to the state-of-the-art, and it is highly adaptable and adjustable to dynamic channel environments.

Submitted 10 February, 2025; originally announced February 2025.

arXiv:2501.15217 [pdf, other]

Predictive Lagrangian Optimization for Constrained Reinforcement Learning

Authors: Tianqi Zhang, Puzhen Yuan, Guojian Zhan, Ziyu Lin, Yao Lyu, Zhenzhi Qin, Jingliang Duan, Liping Zhang, Shengbo Eben Li

Abstract: Constrained optimization is popularly seen in reinforcement learning for addressing complex control tasks. From the perspective of dynamic system, iteratively solving a constrained optimization problem can be framed as the temporal evolution of a feedback control system. Classical constrained optimization methods, such as penalty and Lagrangian approaches, inherently use proportional and integral feedback controllers. In this paper, we propose a more generic equivalence framework to build the connection between constrained optimization and feedback control system, for the purpose of developing more effective constrained RL algorithms. Firstly, we define that each step of the system evolution determines the Lagrange multiplier by solving a multiplier feedback optimal control problem (MFOCP). In this problem, the control input is multiplier, the state is policy parameters, the dynamics is described by policy gradient descent, and the objective is to minimize constraint violations. Then, we introduce a multiplier guided policy learning (MGPL) module to perform policy parameters updating. And we prove that the resulting optimal policy, achieved through alternating MFOCP and MGPL, aligns with the solution of the primal constrained RL problem, thereby establishing our equivalence framework. Furthermore, we point out that the existing PID Lagrangian is merely one special case within our framework that utilizes a PID controller. We also accommodate the integration of other various feedback controllers, thereby facilitating the development of new algorithms. As a representative, we employ model predictive control (MPC) as the feedback controller and consequently propose a new algorithm called predictive Lagrangian optimization (PLO). Numerical experiments demonstrate its superiority over the PID Lagrangian method, achieving a larger feasible region up to 7.2% and a comparable average reward.
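The feedback-control view of multiplier updates mentioned in this abstract can be illustrated with a minimal sketch of a PID-style Lagrange multiplier update. This is not the paper's PLO algorithm or its MFOCP formulation; the class name, field names, and gain values are hypothetical, and the update rule follows the commonly used form in which the multiplier is a non-negative combination of proportional, integral, and derivative terms of the constraint violation.

```python
from dataclasses import dataclass


@dataclass
class PIDMultiplier:
    """Illustrative PID-style Lagrange multiplier (all names are assumptions)."""
    kp: float                 # proportional gain
    ki: float                 # integral gain
    kd: float                 # derivative gain
    limit: float              # constraint threshold on the expected cost
    integral: float = 0.0     # accumulated (non-negative) violation
    prev_cost: float = 0.0    # cost seen at the previous update

    def update(self, cost: float) -> float:
        """Return the multiplier for the current measured constraint cost."""
        error = cost - self.limit                         # positive when violated
        self.integral = max(0.0, self.integral + error)   # integral term, clipped at 0
        derivative = max(0.0, cost - self.prev_cost)      # react only to rising cost
        self.prev_cost = cost
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, raw)                              # multiplier must stay >= 0
```

With `kp = kd = 0` the update reduces to pure integral feedback, which is the classical Lagrangian (dual ascent) multiplier update the abstract refers to.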

Submitted 25 January, 2025; originally announced January 2025.

arXiv:2501.05859 [pdf, other]

Large Model Empowered Streaming Speech Semantic Communications

Authors: Zhenzi Weng, Zhijin Qin, Geoffrey Ye Li

Abstract: In this paper, we introduce a large model-empowered streaming semantic communication system for speech transmission across various languages, named LSSC-ST. Specifically, we devise an edge-device collaborative semantic communication architecture by offloading the intricate semantic extraction and channel coding modules to edge servers, thereby reducing the computational burden on local devices. To support multilingual speech transmission, pre-trained large speech models are utilized to learn unified semantic features from speech in different languages, breaking the constraint of a single input language and enhancing the practicality of the LSSC-ST. Moreover, the input speech is sequentially streamed into the developed system as short speech segments, which enables low transmission latency without degrading the quality of the produced speech. A novel dynamic speech segmentation algorithm is proposed to further reduce the transmission latency by adaptively adjusting the duration of speech segments. According to simulation results, the LSSC-ST provides more accurate speech transmission and achieves a streaming manner with lower latency compared to the existing non-streaming semantic communication systems.

Submitted 21 February, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

arXiv:2412.16827 [pdf, other]

Optimal Error Analysis of Channel Estimation for IRS-assisted MIMO Systems

Authors: Zhen Qin, Zhihui Zhu

Abstract: As intelligent reflecting surface (IRS) has emerged as a new and promising technology capable of configuring the wireless environment favorably, channel estimation for IRS-assisted multiple-input multiple-output (MIMO) systems has garnered extensive attention in recent years. While various algorithms have been proposed to address this challenge, there is a lack of rigorous theoretical error analysis. This paper aims to address this gap by providing theoretical guarantees in terms of stable recovery of channel matrices for noisy measurements. We begin by establishing the equivalence between IRS-assisted MIMO systems and a compact tensor train (TT)-based tensor-on-tensor (ToT) regression. Building on this equivalence, we then investigate the restricted isometry property (RIP) for complex-valued subgaussian measurements. Our analysis reveals that successful recovery hinges on the relationship between the number of user terminals (in the uplink scenario) or base stations (in the downlink scenario) and the number of time slots during which channel matrices remain invariant. Utilizing the RIP condition, we analyze the theoretical recovery error for the solution to a constrained least-squares optimization problem, including upper error bound and minimax lower bound, demonstrating that the error decreases inversely with the number of time slots and increases proportionally with the number of unknown elements in the channel matrices. In addition, we extend our error analysis to two more specialized IRS-assisted MIMO systems, incorporating low-rank channel matrices or an unknown IRS. Furthermore, we explore a multi-hop IRS scheme and analyze the corresponding recovery errors. Finally, we introduce and implement two nonconvex optimization algorithms--alternating least squares and alternating gradient descent--to validate our conclusions through simulations.

Submitted 21 December, 2024; originally announced December 2024.

arXiv:2412.02538 [pdf, other]

On Privacy, Security, and Trustworthiness in Distributed Wireless Large AI Models (WLAM)

Authors: Zhaohui Yang, Wei Xu, Le Liang, Yuanhao Cui, Zhijin Qin, Merouane Debbah

Abstract: Combining wireless communication with large artificial intelligence (AI) models can open up a myriad of novel application scenarios. In sixth generation (6G) networks, ubiquitous communication and computing resources allow large AI models to serve democratic large AI models-related services to enable real-time applications like autonomous vehicles, smart cities, and Internet of Things (IoT) ecosystems. However, the security considerations and sustainable communication resources limit the deployment of large AI models over distributed wireless networks. This paper provides a comprehensive overview of privacy, security, and trustworthiness for distributed wireless large AI model (WLAM). In particular, a detailed privacy and security analysis for distributed WLAM is first presented. The classifications and theoretical findings about privacy and security in distributed WLAM are discussed. Then the trustworthiness and ethics for implementing distributed WLAM are described. Finally, the comprehensive applications of distributed WLAM are presented in the context of electromagnetic signal processing.

Submitted 4 December, 2024; v1 submitted 3 December, 2024; originally announced December 2024.

Comments: 12 pages, 4 figures

arXiv:2411.04452 [pdf, other]

Optimal Allocation of Pauli Measurements for Low-rank Quantum State Tomography

Authors: Zhen Qin, Casey Jameson, Zhexuan Gong, Michael B. Wakin, Zhihui Zhu

Abstract: The process of reconstructing quantum states from experimental measurements, accomplished through quantum state tomography (QST), plays a crucial role in verifying and benchmarking quantum devices. A key challenge of QST is to find out how the accuracy of the reconstruction depends on the number of state copies used in the measurements. When multiple measurement settings are used, the total number of state copies is determined by multiplying the number of measurement settings with the number of repeated measurements for each setting. Due to statistical noise intrinsic to quantum measurements, a large number of repeated measurements is often used in practice. However, recent studies have shown that even with single-sample measurements--where only one measurement sample is obtained for each measurement setting--high accuracy QST can still be achieved with a sufficiently large number of different measurement settings. In this paper, we establish a theoretical understanding of the trade-off between the number of measurement settings and the number of repeated measurements per setting in QST. Our focus is primarily on low-rank density matrix recovery using Pauli measurements. We delve into the global landscape underlying the low-rank QST problem and demonstrate that the joint consideration of measurement settings and repeated measurements ensures a bounded recovery error for all second-order critical points, to which optimization algorithms tend to converge. This finding suggests the advantage of minimizing the number of repeated measurements per setting when the total number of state copies is held fixed. Additionally, we prove that the Wirtinger gradient descent algorithm can converge to the region of second-order critical points with a linear convergence rate. We have also performed numerical experiments to support our theoretical findings.

Submitted 7 November, 2024; originally announced November 2024.

arXiv:2410.21000 [pdf, other]

Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering

Authors: Zhilin Zhang, Jie Wang, Zhanghao Qin, Ruiqi Zhu, Xiaoliang Gong

Abstract: Medical Visual Question Answering (MedVQA) has attracted growing interest at the intersection of medical image understanding and natural language processing for clinical applications. By interpreting medical images and providing precise answers to relevant clinical inquiries, MedVQA has the potential to support diagnostic decision-making and reduce workload across various fields like radiology. While recent approaches rely heavily on unified large pre-trained Visual-Language Models, research on more efficient fusion mechanisms remains relatively limited in this domain. In this paper, we introduce a fusion model, OMniBAN, that integrates Orthogonality loss, Multi-head attention, and a Bilinear Attention Network to achieve high computational efficiency as well as solid performance. We conduct comprehensive experiments and demonstrate how bilinear attention fusion can approximate the performance of larger fusion models like cross-modal Transformer. Our results show that OMniBAN requires fewer parameters (approximately 2/3 of Transformer-based Co-Attention) and substantially lower FLOPs (approximately 1/4), while achieving comparable overall performance and even slight improvements on closed-ended questions on two key MedVQA benchmarks. This balance between efficiency and accuracy suggests that OMniBAN could be a viable option for real-world medical image question answering, where computational resources are often constrained.

Submitted 11 May, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

Comments: To be published in 2025 International Joint Conference on Neural Networks (IJCNN)

arXiv:2410.20326 [pdf, other]

SEEV: Synthesis with Efficient Exact Verification for ReLU Neural Barrier Functions

Authors: Hongchao Zhang, Zhizhen Qin, Sicun Gao, Andrew Clark

Abstract: Neural Control Barrier Functions (NCBFs) have shown significant promise in enforcing safety constraints on nonlinear autonomous systems. State-of-the-art exact approaches to verifying safety of NCBF-based controllers exploit the piecewise-linear structure of ReLU neural networks; however, such approaches still rely on enumerating all of the activation regions of the network near the safety boundary, thus incurring high computation cost. In this paper, we propose a framework for Synthesis with Efficient Exact Verification (SEEV). Our framework consists of two components, namely (i) an NCBF synthesis algorithm that introduces a novel regularizer to reduce the number of activation regions at the safety boundary, and (ii) a verification algorithm that exploits tight over-approximations of the safety conditions to reduce the cost of verifying each piecewise-linear segment. Our simulations show that SEEV significantly improves verification efficiency while maintaining the CBF quality across various benchmark systems and neural network structures. Our code is available at https://github.com/HongchaoZhang-HZ/SEEV.

Submitted 26 October, 2024; originally announced October 2024.

arXiv:2410.17343 [pdf]

EEG-DIF: Early Warning of Epileptic Seizures through Generative Diffusion Model-based Multi-channel EEG Signals Forecasting

Authors: Zekun Jiang, Wei Dai, Qu Wei, Ziyuan Qin, Kang Li, Le Zhang

Abstract: Multi-channel EEG signals are commonly used for the diagnosis and assessment of diseases such as epilepsy. Currently, various EEG diagnostic algorithms based on deep learning have been developed. However, most research efforts focus solely on diagnosing and classifying current signal data but do not consider the prediction of future trends for early warning. Additionally, since multi-channel EEG can be essentially regarded as the spatio-temporal signal data received by detectors at different locations in the brain, how to construct spatio-temporal information representations of EEG signals to facilitate future trend prediction for multi-channel EEG becomes an important problem. This study proposes a multi-signal prediction algorithm based on generative diffusion models (EEG-DIF), which transforms the multi-signal forecasting task into an image completion task, allowing for comprehensive representation and learning of the spatio-temporal correlations and future developmental patterns of multi-channel EEG signals. Here, we employ a publicly available epilepsy EEG dataset to construct and validate the EEG-DIF. The results demonstrate that our method can accurately predict future trends for multi-channel EEG signals simultaneously. Furthermore, the early warning accuracy for epilepsy seizures based on the generated EEG data reaches 0.89. In general, EEG-DIF provides a novel approach for characterizing multi-channel EEG signals and an innovative early warning algorithm for epilepsy seizures, aiding in optimizing and enhancing the clinical diagnosis process. The code is available at https://github.com/JZK00/EEG-DIF.

Submitted 22 October, 2024; originally announced October 2024.

Comments: 9 pages, 4 figures, 3 tables, accepted by ACM BCB 2024

arXiv:2410.15224 [pdf, other]

Robust Low-rank Tensor Train Recovery

Authors: Zhen Qin, Zhihui Zhu

Abstract: Tensor train (TT) decomposition represents an $N$-order tensor using $O(N)$ matrices (i.e., factors) of small dimensions, achieved through products among these factors. Due to its compact representation, TT decomposition has found wide applications, including various tensor recovery problems in signal processing and quantum information. In this paper, we study the problem of reconstructing a TT format tensor from measurements that are contaminated by outliers with arbitrary values. Given the vulnerability of smooth formulations to corruptions, we use an $\ell_1$ loss function to enhance robustness against outliers. We first establish the $\ell_1/\ell_2$-restricted isometry property (RIP) for Gaussian measurement operators, demonstrating that the information in the TT format tensor can be preserved using a number of measurements that grows linearly with $N$. We also prove the sharpness property for the $\ell_1$ loss function optimized over TT format tensors. Building on the $\ell_1/\ell_2$-RIP and sharpness property, we then propose two complementary methods to recover the TT format tensor from the corrupted measurements: the projected subgradient method (PSubGM), which optimizes over the entire tensor, and the factorized Riemannian subgradient method (FRSubGM), which optimizes directly over the factors. Compared to PSubGM, the factorized approach FRSubGM significantly reduces the memory cost at the expense of a slightly slower convergence rate. Nevertheless, we show that both methods, with diminishing step sizes, converge linearly to the ground-truth tensor given an appropriate initialization, which can be obtained by a truncated spectral method.
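The TT representation described in this abstract (an $N$-order tensor stored as $N$ small factors whose product reproduces the tensor) can be sketched with the standard TT-SVD procedure. This is only an illustration of the TT format, not the paper's PSubGM or FRSubGM methods; the function names `tt_svd` and `tt_reconstruct` and the single `max_rank` cap are assumptions made for this example.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Split a dense tensor into TT cores of shape (r_prev, dim, r_next)."""
    dims = tensor.shape
    cores = []
    rank = 1
    mat = tensor
    for d in dims[:-1]:
        # Unfold: rows combine the incoming rank with the current mode.
        mat = mat.reshape(rank * d, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(s))          # truncate to the rank cap
        cores.append(u[:, :new_rank].reshape(rank, d, new_rank))
        # Carry the remaining factor (singular values times V^T) forward.
        mat = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
    cores.append(mat.reshape(rank, dims[-1], 1))  # last core closes with rank 1
    return cores


def tt_reconstruct(cores):
    """Contract TT cores back into the full dense tensor."""
    result = cores[0]
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading one.
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([c.shape[1] for c in cores])
```

For a tensor whose TT ranks do not exceed `max_rank`, the reconstruction matches the original up to floating-point error, while storage grows with the number of cores rather than with the full tensor size.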

Submitted 19 October, 2024; originally announced October 2024.

arXiv:2410.04475 [pdf, other]

Partial reciprocity-based precoding matrix prediction in FDD massive MIMO with mobility

Authors: Ziao Qin, Haifan Yin

Abstract: The timely precoding of frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) systems is a substantial challenge in practice, especially in mobile environments. In order to improve the precoding performance and reduce the precoding complexity, we propose a partial reciprocity-based precoding matrix prediction scheme and further reduce its complexity by exploiting the channel Gram matrix. We prove that the precoders can be predicted through a closed-form eigenvector interpolation based on the periodic eigenvector samples. Numerical results validate the performance improvements of our schemes over the conventional schemes from 30 km/h to 500 km/h of moving speed.

Submitted 6 October, 2024; originally announced October 2024.

Comments: 5 pages, 4 figures, 1 tabs

arXiv:2410.02583 [pdf, other]

Sample-Efficient Quantum State Tomography for Structured Quantum States in One Dimension

Authors: Zhen Qin, Casey Jameson, Alireza Goldar, Michael B. Wakin, Zhexuan Gong, Zhihui Zhu

Abstract: While quantum state tomography (QST) remains the gold standard for benchmarking and verifying quantum devices, it requires an exponentially large number of measurements and classical computational resources for generic quantum many-body systems, making it impractical even for intermediate-size quantum devices. Fortunately, many physical quantum states often exhibit certain low-dimensional structures that enable the development of efficient QST. A notable example is the class of states represented by matrix product operators (MPOs) with a finite matrix/bond dimension, which include most physical states in one dimension and where the number of independent parameters describing the states only grows linearly with the number of qubits. Whether a sample efficient quantum state tomography protocol, where the number of required state copies scales only linearly as the number of parameters describing the state, exists for a generic MPO state still remains an important open question. In this paper, we answer this fundamental question affirmatively by using a class of informationally complete positive operator-valued measures (IC-POVMs) -- including symmetric IC-POVMs (SIC-POVMs) and spherical $t$-designs -- focusing on sample complexity while not accounting for the implementation complexity of the measurement settings. For SIC-POVMs and (approximate) spherical 2-designs, we show that the number of state copies to guarantee bounded recovery error of an MPO state with a constrained least-squares estimator depends on the probability distribution of the MPO under the POVM but scales only linearly with $n$ when the distribution is approximately uniform. For spherical $t$-designs with $t\geq 3$, we prove that only a number of state copies proportional to the number of independent parameters in the MPO is sufficient for a guaranteed recovery of any state represented by an MPO.

Submitted 1 May, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

arXiv:2410.01070 [pdf, other]

Meta Learning Based Adaptive Cooperative Perception in Nonstationary Vehicular Networks

Authors: Kaige Qu, Zixiong Qin, Weihua Zhuang

Abstract: To accommodate high network dynamics in real-time cooperative perception (CP), reinforcement learning (RL) based adaptive CP schemes have been proposed, to allow adaptive switchings between CP and stand-alone perception modes among connected and autonomous vehicles. The traditional offline-training online-execution RL framework suffers from performance degradation under nonstationary network conditions. To achieve fast and efficient model adaptation, we formulate a set of Markov decision processes for adaptive CP decisions in each stationary local vehicular network (LVN). A meta RL solution is proposed, which trains a meta RL model that captures the general features among LVNs, thus facilitating fast model adaptation for each LVN with the meta RL model as an initial point. Simulation results show the superiority of meta RL in terms of the convergence speed without reward degradation. The impact of the customization level of meta models on the model adaptation performance has also been evaluated.

Submitted 1 October, 2024; originally announced October 2024.

arXiv:2409.07482 [pdf, ps, other]

VSLLaVA: a pipeline of large multimodal foundation model for industrial vibration signal analysis

Authors: Qi Li, Xinran Zhang, Jinfeng Huang, Hongliang He, Feibin Zhang, Zhaoye Qin, Fulei Chu

Abstract: While Large Multimodal Models (LMMs) excel in general multimodal tasks, they lack the domain-specific knowledge for industrial vibration signal analysis. This paper introduces VSLLaVA, a comprehensive pipeline that utilizes expert knowledge-guided instruction tuning and evaluation to create an end-to-end LMM for signal analysis. To achieve this, we construct a novel Signal-Question-Answer (SQA) dataset using an expert rule-based signal generator. This dataset facilitates a two-stage learning procedure. The first step is efficient instruction fine-tuning with Low-Rank Adaptation (LoRA), which imparts specialized signal identification capabilities. Subsequently, we designed a tailored Group Relative Policy Optimization (GRPO) to refine the reasoning capabilities and enhance classification robustness. Then, a dual-mode evaluation framework is proposed, combining an LLM referee with expert rules for semantic assessment using quantitative metrics for numerical and textual accuracy, which reveals that VSLLaVA significantly improves performance in signal type identification and parameter analysis, and makes progress in the identification and parameter analysis of fault-related signals. This research demonstrates a viable approach for developing specialized foundational models for complex industrial applications and marks a transition from conventional task-specific systems to a cohesive, interactive foundational model.

Submitted 1 September, 2025; v1 submitted 3 September, 2024; originally announced September 2024.

arXiv:2408.04535 [pdf, other]

Synchronous Multi-modal Semantic Communication System with Packet-level Coding

Authors: Yun Tian, Jingkai Ying, Zhijin Qin, Ye Jin, Xiaoming Tao

Abstract: Although the semantic communication with joint semantic-channel coding design has shown promising performance in transmitting data of different modalities over physical layer channels, the synchronization and packet-level forward error correction of multimodal semantics have not been well studied. Due to the independent design of semantic encoders, synchronizing multimodal features in both the semantic and time domains is a challenging problem. In this paper, we take the facial video and speech transmission as an example and propose a Synchronous Multimodal Semantic Communication System (SyncSC) with Packet-Level Coding. To achieve semantic and time synchronization, 3D Morphable Model (3DMM) coefficients and text are transmitted as semantics, and we propose a semantic codec that achieves similar quality of reconstruction and synchronization with lower bandwidth, compared to traditional methods. To protect semantic packets under the erasure channel, we propose a packet-level Forward Error Correction (FEC) method, called PacSC, that maintains a certain visual quality performance even at high packet loss rates. Particularly, for text packets, a text packet loss concealment module, called TextPC, based on Bidirectional Encoder Representations from Transformers (BERT) is proposed, which significantly improves the performance of traditional FEC methods. The simulation results show that our proposed SyncSC reduces transmission overhead and achieves high-quality synchronous transmission of video and speech over the packet loss network.

Submitted 10 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

Comments: 12 pages, 9 figures

arXiv:2407.14140 [pdf, other]

A Secure and Efficient Distributed Semantic Communication System for Heterogeneous Internet of Things

Authors: Weihao Zeng, Xinyu Xu, Qianyun Zhang, Jiting Shi, Zhenyu Guan, Shufeng Li, Zhijin Qin

Abstract: Semantic communications are expected to improve the transmission efficiency in Internet of Things (IoT) networks. However, the distributed nature of networks and heterogeneity of devices challenge the secure utilization of semantic communication systems. In this paper, we develop a distributed semantic communication system that achieves the security and efficiency during update and usage phases. A blockchain-based trust scheme for update is designed to continuously train and synchronize the system in dynamic IoT environments. To improve the updating efficiency, we propose a flexible semantic coding method based on compressive semantic knowledge bases. It greatly reduces the amount of data shared among devices for system update, and realizes the flexible adjustment of the size of knowledge bases and the number of transmitted signal symbols in model training and inference stages. In the usage phase, a signature mechanism for lossy semantics is introduced to guarantee the integrity and authenticity of the transmitted semantics in lossy semantic communications. We further design a noise-aware differential privacy mechanism, which introduces optimized noise based on the different channel information available to heterogeneous devices. Experiments on text transmission tasks show that the proposed system achieves the protection of the integrity and privacy for exchanged semantics, and reduces the data to be transmitted in the update phase by about $35\%$ to $88\%$, and in the usage phase by $60\%$ compared with related works.

Submitted 11 December, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

arXiv:2406.16314 [pdf, other]

DreamVoice: Text-Guided Voice Conversion

Authors: Jiarui Hai, Karan Thakkar, Helin Wang, Zengyi Qin, Mounya Elhilali

Abstract: Generative voice technologies are rapidly evolving, offering opportunities for more personalized and inclusive experiences. Traditional one-shot voice conversion (VC) requires a target recording during inference, limiting ease of usage in generating desired voice timbres. Text-guided generation offers an intuitive solution to convert voices to desired "DreamVoices" according to the users' needs. Our paper presents two major contributions to VC technology: (1) DreamVoiceDB, a robust dataset of voice timbre annotations for 900 speakers from VCTK and LibriTTS. (2) Two text-guided VC methods: DreamVC, an end-to-end diffusion-based text-guided VC model; and DreamVG, a versatile text-to-voice generation plugin that can be combined with any one-shot VC models. The experimental results demonstrate that our proposed methods trained on the DreamVoiceDB dataset generate voice timbres accurately aligned with the text prompt and achieve high-quality VC.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted at INTERSPEECH 2024

arXiv:2406.06002 [pdf, other]

Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition

Authors: Zhen Qin, Zhihui Zhu

Abstract: Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios like scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (TT)-based ToT model proving efficient in practice due to reduced memory requirements, enhanced computational efficiency, and decreased sampling complexity. Despite these practical benefits, a disparity exists between theoretical analysis and real-world performance. In this paper, we delve into the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem. This analysis includes an upper error bound and a minimax lower bound, revealing that such error bounds depend polynomially on the order $N+M$. To efficiently find solutions meeting such error bounds, we propose two optimization algorithms: the iterative hard thresholding (IHT) algorithm (employing gradient descent with TT-singular value decomposition (TT-SVD)) and the factorization approach using the Riemannian gradient descent (RGD) algorithm. When RIP is satisfied, spectral initialization facilitates proper initialization, and we establish the linear convergence rate of both IHT and RGD.

Submitted 1 May, 2025; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.02592

arXiv:2405.12580 [pdf, other]

Hybrid Digital-Analog Semantic Communications

Authors: Huiqiang Xie, Zhijin Qin, Zhu Han, Khaled B. Letaief

Abstract: Digital and analog semantic communications (SemCom) face inherent limitations such as data security concerns in analog SemCom, as well as leveling-off and cliff-edge effects in digital SemCom. In order to overcome these challenges, we propose a novel SemCom framework and a corresponding system called HDA-DeepSC, which leverages a hybrid digital-analog approach for multimedia transmission. This is achieved through the introduction of digital-analog allocation and fusion modules. To strike a balance between data rate and distortion, we design new loss functions that take into account long-distance dependencies in the semantic distortion constraint, essential information recovery in the channel distortion constraint, and optimal bit stream generation in the rate constraint. Additionally, we propose denoising diffusion-based signal detection techniques, which involve carefully designed variance schedules and sampling algorithms to refine transmitted signals. Through extensive numerical experiments, we will demonstrate that HDA-DeepSC exhibits robustness to channel variations and is capable of supporting various communication scenarios. Our proposed framework outperforms existing benchmarks in terms of peak signal-to-noise ratio and multi-scale structural similarity, showcasing its superiority in semantic communication quality.

Submitted 27 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: 13 pages, 8 figures

arXiv:2405.08096 [pdf, ps, other]

Semantic MIMO Systems for Speech-to-Text Transmission

Authors: Zhenzi Weng, Zhijin Qin, Huiqiang Xie, Xiaoming Tao, Khaled B. Letaief

Abstract: Semantic communications have been utilized to execute numerous intelligent tasks by transmitting task-related semantic information instead of bits. In this article, we propose a semantic-aware speech-to-text transmission system for the single-user multiple-input multiple-output (MIMO) and multi-user MIMO communication scenarios, named SAC-ST. Particularly, a semantic communication system to serve the speech-to-text task at the receiver is first designed, which compresses the semantic information and generates the low-dimensional semantic features by leveraging the transformer module. In addition, a novel semantic-aware network is proposed to facilitate transmission with high semantic fidelity by identifying the critical semantic information and guaranteeing its accurate recovery. Furthermore, we extend the SAC-ST with a neural network-enabled channel estimation network to mitigate the dependence on accurate channel state information and validate the feasibility of SAC-ST in practical communication environments. Simulation results will show that the proposed SAC-ST outperforms the communication framework without the semantic-aware network for speech-to-text transmission over the MIMO channels in terms of the speech-to-text metrics, especially in the low signal-to-noise regime. Moreover, the SAC-ST with the developed channel estimation network is comparable to the SAC-ST with perfect channel state information.

Submitted 5 October, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.19477 [pdf, other]

Hybrid Bit and Semantic Communications

Authors: Kaiwen Yu, Renhe Fan, Gang Wu, Zhijin Qin

Abstract: Semantic communication technology is regarded as a method surpassing the Shannon limit of bit transmission, capable of effectively enhancing transmission efficiency. However, current approaches that directly map content to transmission symbols are challenging to deploy in practice, imposing significant limitations on the development of semantic communication. To address this challenge, we propose a hybrid bit and semantic communication system, named HybridBSC, in which encoded semantic information is inserted into bit information for transmission via conventional digital communication systems utilizing the same spectrum resources. The system can be easily deployed using existing communication architecture to achieve bit and semantic information transmission. Particularly, we design a semantic insertion and extraction scheme to implement this strategy. Furthermore, we conduct experimental validation based on the pluto-based software defined radio (SDR) platform in a real wireless channel, demonstrating that the proposed strategy can simultaneously transmit semantic and bit information.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.17867 [pdf, other]

Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

Authors: Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin

Abstract: AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on the harmless proactive forensics against Deepfake.

Submitted 27 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2403.09222 [pdf, other]

A Robust Semantic Communication System for Image

Authors: Xiang Peng, Zhijin Qin, Xiaoming Tao, Jianhua Lu, Khaled B. Letaief

Abstract: Semantic communications have gained significant attention as a promising approach to address the transmission bottleneck, especially with the continuous development of 6G techniques. Distinct from the well-investigated physical channel impairments, this paper focuses on semantic impairments in images, particularly those arising from adversarial perturbations. Specifically, we propose a novel metric for quantifying the intensity of semantic impairment and develop a semantic impairment dataset. Furthermore, we introduce a deep learning enabled semantic communication system, termed DeepSC-RI, to enhance the robustness of image transmission, which incorporates a multi-scale semantic extractor with a dual-branch architecture for extracting semantics with varying granularity, thereby improving the robustness of the system. The fine-grained branch incorporates a semantic importance evaluation module to identify and prioritize crucial semantics, while the coarse-grained branch adopts a hierarchical approach for capturing the robust semantics. These two streams of semantics are seamlessly integrated via an advanced cross-attention-based semantic fusion module. Experimental results demonstrate the superior performance of DeepSC-RI under various levels of semantic impairment intensity.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 6 pages

arXiv:2403.05187 [pdf, ps, other]

Robust Semantic Communications for Speech Transmission

Authors: Zhenzi Weng, Zhijin Qin, Geoffrey Ye Li

Abstract: In this paper, we propose a robust semantic communication system for speech transmission, named Ross-S2T, by delivering the essential semantic information. Specifically, we consider the speech-to-text translation (S2TT) as the transmission goal. First, a new deep semantic encoder is developed to convert speech in the source language to textual features associated with the target language, facilitating the end-to-end semantic exchange to perform the S2TT task and reducing the transmission data without performance degradation. To mitigate semantic impairments inherent in the corrupted speech, a novel generative adversarial network (GAN)-enabled deep semantic compensator is established to estimate the lost semantic information within the speech and extract deep semantic features simultaneously, which enables robust semantic transmission for corrupted speech. Furthermore, a semantic probe-aided compensator is devised to enhance the semantic fidelity of recovered semantic features and improve the understandability of the target text. According to simulation results, the proposed Ross-S2T exhibits superior S2TT performance compared to conventional approaches and high robustness against semantic impairments.

Submitted 4 July, 2025; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.18183 [pdf, ps, other]

doi 10.1109/JSTSP.2024.3433387

Computational Offloading in Semantic-Aware Cloud-Edge-End Collaborative Networks

Authors: Zelin Ji, Zhijin Qin

Abstract: The trend of massive connectivity pushes forward the explosive growth of end devices. The emergence of various applications has prompted a demand for pervasive connectivity and more efficient computing paradigms. On the other hand, the lack of computational capacity of the end devices restricts the implementation of intelligent applications, and becomes a bottleneck of the multiple access for supporting massive connectivity. Mobile cloud computing (MCC) and mobile edge computing (MEC) techniques enable end devices to offload local computation-intensive tasks to servers over networks. In this paper, we consider the cloud-edge-end collaborative networks to utilize distributed computing resources. Furthermore, we apply task-oriented semantic communications to tackle the fast-varying channel between the end devices and MEC servers and reduce the communication cost. To minimize long-term energy consumption subject to constraints on queue stability and computational delay, a Lyapunov-guided deep reinforcement learning hybrid (DRLH) framework is proposed to solve the mixed integer non-linear programming (MINLP) problem. The long-term energy consumption minimization problem is transformed into a deterministic problem in each time frame. The DRLH framework integrates a model-free deep reinforcement learning algorithm with a model-based mathematical optimization algorithm to mitigate computational complexity and leverage the scenario information, thereby improving the convergence performance. Numerical results demonstrate that the proposed DRLH framework achieves near-optimal performance on energy consumption while stabilizing all queues.

Submitted 19 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Submitted to IEEE JSTSP

arXiv:2402.13073 [pdf, other]

Towards Intelligent Communications: Large Model Empowered Semantic Communications

Authors: Huiqiang Xie, Zhijin Qin, Xiaoming Tao, Zhu Han

Abstract: Deep learning enabled semantic communications have shown great potential to significantly improve transmission efficiency and alleviate spectrum scarcity, by effectively exchanging the semantics behind the data. Recently, the emergence of large models, boasting billions of parameters, has unveiled remarkable human-like intelligence, offering a promising avenue for advancing semantic communication by enhancing semantic understanding and contextual understanding. This article systematically investigates the large model-empowered semantic communication systems from potential applications to system design. First, we propose a new semantic communication architecture that seamlessly integrates large models into semantic communication through the introduction of a memory module. Then, the typical applications are illustrated to show the benefits of the new architecture. Besides, we discuss the key designs in implementing the new semantic communication systems from module design to system training. Finally, the potential research directions are identified to boost the large model-empowered semantic communications.

Submitted 19 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: 7 pages, 6 figures

arXiv:2401.02592 [pdf, ps, other]

Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery

Authors: Zhen Qin, Michael B. Wakin, Zhihui Zhu

Abstract: In this paper, we provide the first convergence guarantee for the factorization approach. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format which enforces orthonormality among most of the factors. To ensure the orthonormal structure, we utilize the Riemannian gradient descent (RGD) for optimizing those factors over the Stiefel manifold. We first delve into the TT factorization problem and establish the local linear convergence of RGD. Notably, the rate of convergence only experiences a linear decline as the tensor order increases. We then study the sensing problem that aims to recover a TT format tensor from linear measurements. Assuming the sensing operator satisfies the restricted isometry property (RIP), we show that with a proper initialization, which could be obtained through spectral initialization, RGD also converges to the ground-truth tensor at a linear rate. Furthermore, we expand our analysis to encompass scenarios involving Gaussian noise in the measurements. We prove that RGD can reliably recover the ground truth at a linear rate, with the recovery error exhibiting only polynomial growth in relation to the tensor order. We conduct various experiments to validate our theoretical findings.

Submitted 28 August, 2025; v1 submitted 4 January, 2024; originally announced January 2024.

Journal ref: Journal of Machine Learning Research (December 2024)

arXiv:2401.00859 [pdf, ps, other]

Federated Multi-View Synthesizing for Metaverse

Authors: Yiyu Guo, Zhijin Qin, Xiaoming Tao, Geoffrey Ye Li

Abstract: The metaverse is expected to provide immersive entertainment, education, and business applications. However, virtual reality (VR) transmission over wireless networks is data- and computation-intensive, making it critical to introduce novel solutions that meet stringent quality-of-service requirements. With recent advances in edge intelligence and deep learning, we have developed a novel multi-view synthesizing framework that can efficiently provide computation, storage, and communication resources for wireless content delivery in the metaverse. We propose a three-dimensional (3D)-aware generative model that uses collections of single-view images. These single-view images are transmitted to a group of users with overlapping fields of view, which avoids massive content transmission compared to transmitting tiles or whole 3D models. We then present a federated learning approach to guarantee an efficient learning process. The training performance can be improved by characterizing the vertical and horizontal data samples with a large latent feature space, while low-latency communication can be achieved with a reduced number of transmitted parameters during federated learning. We also propose a federated transfer learning framework to enable fast domain adaptation to different target domains. Simulation results have demonstrated the effectiveness of our proposed federated multi-view synthesizing framework for VR content delivery.

Submitted 18 December, 2023; originally announced January 2024.

arXiv:2312.01479 [pdf, other]

OpenVoice: Versatile Instant Voice Cloning

Authors: Zengyi Qin, Wenliang Zhao, Xumin Yu, Xin Sun

Abstract: We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are not directly copied from and constrained by the style of the reference speaker. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require an extensive massive-speaker multi-lingual (MSML) dataset for all languages, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that offer even inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results on our demo website. OpenVoice has been used by more than 2M users worldwide as the voice engine of MyShell.ai.

Submitted 18 August, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

Comments: Technical Report

arXiv:2311.04685 [pdf, other]

An End-Cloud Computing Enabled Surveillance Video Transmission System

Authors: Dingxi Yang, Zhijin Qin, Liting Wang, Xiaoming Tao, Fang Cui, Hengjiang Wang

Abstract: The enormous data volume of video poses a significant burden on the network. Particularly, transferring high-definition surveillance videos to the cloud consumes a significant amount of spectrum resources. To address these issues, we propose a surveillance video transmission system enabled by end-cloud computing. Specifically, the cameras actively down-sample the original video and then a redundant frame elimination module is employed to further reduce the data volume of surveillance videos. Then we develop a key-frame assisted video super-resolution model to reconstruct the high-quality video at the cloud side. Moreover, we propose a strategy of extracting key frames from source videos for better reconstruction performance by utilizing the peak signal-to-noise ratio (PSNR) of adjacent frames to measure the propagation distance of key frame information. Simulation results show that the developed system can effectively reduce the data volume by the end-cloud collaboration and outperforms existing video super-resolution models significantly in terms of PSNR and structural similarity index (SSIM).

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.06246 [pdf, other]

Compression Ratio Learning and Semantic Communications for Video Imaging

Authors: Bowen Zhang, Zhijin Qin, Geoffrey Ye Li

Abstract: Camera sensors have been widely used in intelligent robotic systems. Developing camera sensors with high sensing efficiency has always been important to reduce the power, memory, and other related resources. Inspired by recent success on programmable sensors and deep optic methods, we design a novel video compressed sensing system with spatially-variant compression ratios, which achieves higher imaging quality than the existing snapshot compressed imaging methods with the same sensing costs. In this article, we also investigate the data transmission methods for programmable sensors, where the performance of communication systems is evaluated by the reconstructed images or videos rather than the transmission of sensor data itself. Usually, different reconstruction algorithms are designed for applications in high dynamic range imaging, video compressive sensing, or motion deblurring. This task-aware property inspires a semantic communication framework for programmable sensors. In this work, a policy-gradient based reinforcement learning method is introduced to achieve the explicit trade-off between the compression (or transmission) rate and the image distortion. Numerical results show the superiority of the proposed methods over existing baselines.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2308.12619 [pdf, other]

Low-complexity eigenvector prediction-based precoding matrix prediction in massive MIMO with mobility

Authors: Ziao Qin, Haifan Yin, Weidong Li

Abstract: In practical massive multiple-input multiple-output (MIMO) systems, the precoding matrix is often obtained from the eigenvectors of channel matrices and is challenging to update in time due to finite computation resources at the base station, especially in mobile scenarios. In order to reduce the precoding complexity while enhancing the spectral efficiency (SE), a novel precoding matrix prediction method based on the eigenvector prediction (EGVP) is proposed. The basic idea is to decompose the periodic uplink channel eigenvector samples into a linear combination of the channel state information (CSI) and channel weights. We further prove that the channel weights can be interpolated by an exponential model corresponding to the Doppler characteristics of the CSI. A fast matrix pencil prediction (FMPP) method is also devised to predict the CSI. We also prove that our scheme achieves asymptotically error-free precoder prediction with a distinct complexity advantage. Simulation results show that under the perfect non-delayed CSI, the proposed EGVP method reduces floating point operations by 80\% without losing SE performance compared to the traditional full-time precoding scheme. In more realistic cases with CSI delays, the proposed EGVP-FMPP scheme has clear SE performance gains compared to the precoding scheme widely used in current communication systems.

Submitted 30 June, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 13 pages, 8 figures, 1 table, journal

arXiv:2307.03246 [pdf, other]

Semantic-Aware Image Compressed Sensing

Authors: Bowen Zhang, Zhijin Qin, Geoffrey Ye Li

Abstract: Deep learning based image compressed sensing (CS) has achieved great success. However, existing CS systems mainly apply a fixed measurement matrix to images, ignoring the fact that the optimal measurement numbers and bases are different for different images. To further improve the sensing efficiency, we propose a novel semantic-aware image CS system. In our system, the encoder first uses a fixed number of base CS measurements to sense different images. According to the base CS results, the encoder then employs a policy network to analyze the semantic information in images and determines the measurement matrix for different image areas. At the decoder side, a semantic-aware initial reconstruction network is developed to deal with the changes of measurement matrices used at the encoder. A rate-distortion training loss is further introduced to dynamically adjust the average compression ratio for the semantic-aware CS system and the policy network is trained jointly with the encoder and the decoder in an end-to-end manner by using some proxy functions. Numerical results show that the proposed semantic-aware image CS system is superior to the traditional ones with fixed measurement matrices.

Submitted 10 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Modified version

arXiv:2307.02900 [pdf, ps, other]

doi 10.1109/TWC.2023.3345363

Meta Federated Reinforcement Learning for Distributed Resource Allocation

Authors: Zelin Ji, Zhijin Qin, Xiaoming Tao

Abstract: In cellular networks, resource allocation is usually performed in a centralized way, which brings huge computation complexity to the base station (BS) and high transmission overhead. This paper explores a distributed resource allocation method that aims to maximize energy efficiency (EE) while ensuring the quality of service (QoS) for users. Specifically, in order to address wireless channel conditions, we propose a robust meta federated reinforcement learning (MFRL) framework that allows local users to optimize transmit power and assign channels using locally trained neural network models, so as to offload computational burden from the cloud server to the local users, reducing transmission overhead associated with local channel state information. The BS performs the meta learning procedure to initialize a general global model, enabling rapid adaptation to different environments with improved EE performance. The federated learning technique, based on decentralized reinforcement learning, promotes collaboration and mutual benefits among users. Analysis and numerical results demonstrate that the proposed MFRL framework accelerates the reinforcement learning process, decreases transmission overhead, and offloads computation, while outperforming the conventional decentralized reinforcement learning algorithm in terms of convergence speed and EE performance across various scenarios.

Submitted 9 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Submitted to TWC

arXiv:2306.09432 [pdf, other]

doi 10.1109/TIT.2024.3360951

Quantum State Tomography for Matrix Product Density Operators

Authors: Zhen Qin, Casey Jameson, Zhexuan Gong, Michael B. Wakin, Zhihui Zhu

Abstract: The reconstruction of quantum states from experimental measurements, often achieved using quantum state tomography (QST), is crucial for the verification and benchmarking of quantum devices. However, performing QST for a generic unstructured quantum state requires an enormous number of state copies that grows exponentially with the number of individual quanta in the system, even for the most optimal measurement settings. Fortunately, many physical quantum states, such as states generated by noisy, intermediate-scale quantum computers, are usually structured. In one dimension, such states are expected to be well approximated by matrix product operators (MPOs) with a finite matrix/bond dimension independent of the number of qubits, therefore enabling efficient state representation. Nevertheless, it is still unclear whether efficient QST can be performed for these states in general. In this paper, we attempt to bridge this gap and establish theoretical guarantees for the stable recovery of MPOs using tools from compressive sensing and the theory of empirical processes. We begin by studying two types of random measurement settings: Gaussian measurements and Haar random rank-one Positive Operator Valued Measures (POVMs). We show that the information contained in an MPO with a finite bond dimension can be preserved using a number of random measurements that depends only linearly on the number of qubits, assuming no statistical error of the measurements. We then study MPO-based QST with physical quantum measurements through Haar random rank-one POVMs that can be implemented on quantum computers. We prove that only a polynomial number of state copies in the number of qubits is required to guarantee bounded recovery error of an MPO state.

Submitted 18 February, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Journal ref: IEEE Transactions on Information Theory, vol. 70, pp. 5030-5056, July 2024

arXiv:2305.06543 [pdf, other]

QoE-based Semantic-Aware Resource Allocation for Multi-Task Networks

Authors: Lei Yan, Zhijin Qin, Chunfeng Li, Rui Zhang, Yongzhao Li, Xiaoming Tao

Abstract: By transmitting task-related information only, semantic communications yield significant performance gains over conventional communications. However, the lack of a mature semantic theory for semantic information quantification and performance evaluation makes it challenging to perform resource allocation for semantic communications, especially when multiple tasks coexist in the network. To cope with this challenge, we propose a quality-of-experience (QoE) based semantic-aware resource allocation method for multi-task networks in this paper. First, semantic entropy is defined to quantify the semantic information for different tasks, and the relationship between semantic entropy and Shannon entropy is analyzed. Then, we develop a novel QoE model to formulate the semantic-aware resource allocation in terms of semantic compression, channel assignment, and transmit power. The compatibility of the formulated problem with conventional communications is further demonstrated. To solve this problem, we decouple it into two subproblems and solve them with a deep Q-network (DQN) based method and a low-complexity matching algorithm, respectively. Finally, simulation results validate the effectiveness and superiority of the proposed method, as well as its compatibility with conventional communications.

Submitted 8 April, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: This work has been accepted by IEEE Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2205.14530

Showing 1–50 of 110 results for author: Qin, Z