Visual Language Model based Cross-modal Semantic Communication Systems

Jiang, Feibo; Tang, Chuanguo; Dong, Li; Wang, Kezhi; Yang, Kun; Pan, Cunhua

arXiv:2407.00020 (cs)

[Submitted on 6 May 2024]

Abstract: Semantic Communication (SC) has emerged as a novel communication paradigm in recent years, successfully transcending the Shannon physical capacity limits through innovative semantic transmission concepts. Nevertheless, extant Image Semantic Communication (ISC) systems face several challenges in dynamic environments, including low semantic density, catastrophic forgetting, and uncertain Signal-to-Noise Ratio (SNR). To address these challenges, we propose a novel Vision-Language Model-based Cross-modal Semantic Communication (VLM-CSC) system. The VLM-CSC comprises three novel components: (1) Cross-modal Knowledge Base (CKB) is used to extract high-density textual semantics from the semantically sparse image at the transmitter and reconstruct the original image based on textual semantics at the receiver. The transmission of high-density semantics contributes to alleviating bandwidth pressure. (2) Memory-assisted Encoder and Decoder (MED) employ a hybrid long/short-term memory mechanism, enabling the semantic encoder and decoder to overcome catastrophic forgetting in dynamic environments when there is a drift in the distribution of semantic features. (3) Noise Attention Module (NAM) employs attention mechanisms to adaptively adjust the semantic coding and the channel coding based on SNR, ensuring the robustness of the CSC system. The experimental simulations validate the effectiveness, adaptability, and robustness of the CSC system.

Comments:	12 pages, 10 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Theory (cs.IT); Machine Learning (cs.LG)
Cite as:	arXiv:2407.00020 [cs.CV]
	(or arXiv:2407.00020v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.00020

Submission history

From: Feibo Jiang
[v1] Mon, 6 May 2024 08:59:16 UTC (2,918 KB)
