MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

Li, Chunyi; Lu, Guo; Feng, Donghui; Wu, Haoning; Zhang, Zicheng; Liu, Xiaohong; Zhai, Guangtao; Lin, Weisi; Zhang, Wenjun

Computer Science > Computer Vision and Pattern Recognition

arXiv:2402.16749 (cs)

[Submitted on 26 Feb 2024 (v1), last revised 17 Apr 2024 (this version, v3)]

Abstract: With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanded topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrates. In recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To solve this problem, this paper proposes a method called Multimodal Image Semantic Compression (MISC), which consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to those semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information. Experimental results show that the proposed MISC is suitable for compressing both traditional Natural Sense Images (NSIs) and emerging AI-Generated Images (AIGIs). It can achieve optimal consistency and perception results while saving 50% bitrate, which gives it strong potential for applications in the next generation of storage and communication. The code will be released on this https URL.
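
The abstract names three encoder outputs that together form the compressed representation: semantic information, a region map, and an extremely compressed image bitstream. The following is a minimal sketch of how the combined bitrate of such a payload could be accounted for in bits per pixel. It is not the authors' implementation: the class `MiscPayload`, the function `bits_per_pixel`, the field names, and all byte sizes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MiscPayload:
    """Hypothetical container for the three encoder outputs named in the abstract."""

    semantic_text: bytes    # LMM encoder output: semantic description of the image
    region_map: bytes       # map encoder output: where each semantic item is located
    image_bitstream: bytes  # image encoder output: extremely compressed image

    def total_bits(self) -> int:
        """Total payload size in bits, summed over all three components."""
        total_bytes = (
            len(self.semantic_text)
            + len(self.region_map)
            + len(self.image_bitstream)
        )
        return 8 * total_bytes


def bits_per_pixel(payload: MiscPayload, width: int, height: int) -> float:
    """Bitrate of the whole payload relative to the image resolution."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return payload.total_bits() / (width * height)


# Illustrative sizes only; these numbers are assumptions, not results from the paper.
payload = MiscPayload(
    semantic_text=b"a dog on a beach",  # 16 bytes
    region_map=bytes(64),               # 64 bytes
    image_bitstream=bytes(400),         # 400 bytes
)
bpp = bits_per_pixel(payload, 512, 512)
print(f"{payload.total_bits()} bits, {bpp:.4f} bpp")
```

Counting all three components toward the bitrate matters at ultra-low bitrates, where side information such as text and maps can be a noticeable share of the total.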

Comments:	13 pages, 11 figures, 4 tables
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Cite as:	arXiv:2402.16749 [cs.CV]
	(or arXiv:2402.16749v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.16749

Submission history

From: Chunyi Li
[v1] Mon, 26 Feb 2024 17:11:11 UTC (45,904 KB)
[v2] Thu, 29 Feb 2024 16:53:20 UTC (48,616 KB)
[v3] Wed, 17 Apr 2024 14:06:28 UTC (44,713 KB)
