MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Zheng, Chuanxia; Vuong, Long Tung; Cai, Jianfei; Phung, Dinh

Computer Science > Computer Vision and Pattern Recognition

arXiv:2209.09002 (cs)

[Submitted on 19 Sep 2022]



Abstract: Although two-stage Vector Quantized (VQ) generative models can synthesize high-fidelity, high-resolution images, their quantization operator encodes similar patches within an image into the same index, which produces repeated artifacts in similar adjacent regions with existing decoder architectures. To address this issue, we propose incorporating spatially conditional normalization to modulate the quantized vectors, inserting spatially variant information into the embedded index maps and encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of the model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than a conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN greatly improves reconstructed image quality and provides high-fidelity image generation.
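The two first-stage ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that multichannel quantization splits each latent vector into equal chunks looked up in one shared codebook, and that the modulation has the common form "normalize the decoder features, then scale and shift them with maps predicted from the quantized vectors". The function names, the plain matrix projections `w_gamma` and `w_beta` (standing in for learned layers), and the matching spatial resolution of features and codes are all hypothetical choices made for illustration.

```python
import numpy as np


def quantize_multichannel(z, codebook, num_chunks):
    """Quantize every spatial vector chunk by chunk with one shared codebook.

    z:        (H, W, C) continuous encoder output.
    codebook: (K, C // num_chunks) code vectors.
    Returns indices of shape (H, W, num_chunks) and z_q of shape (H, W, C).
    """
    H, W, C = z.shape
    d = C // num_chunks
    chunks = z.reshape(H, W, num_chunks, d)
    # Squared Euclidean distance from each chunk to each code: (H, W, num_chunks, K).
    dists = ((chunks[..., None, :] - codebook) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=-1)
    # Look up the nearest codes and concatenate the chunks back together.
    z_q = codebook[idx].reshape(H, W, C)
    return idx, z_q


def modulate(features, z_q, w_gamma, w_beta, eps=1e-5):
    """Normalize decoder features, then scale and shift them per spatial position.

    features:        (H, W, F) decoder activations (assumed same H, W as z_q).
    z_q:             (H, W, C) quantized vectors.
    w_gamma, w_beta: (C, F) hypothetical projections replacing learned layers.
    """
    # Per-channel statistics over the spatial dimensions.
    mean = features.mean(axis=(0, 1), keepdims=True)
    std = features.std(axis=(0, 1), keepdims=True)
    normalized = (features - mean) / (std + eps)
    gamma = z_q @ w_gamma  # spatially varying scale, (H, W, F)
    beta = z_q @ w_beta    # spatially varying shift, (H, W, F)
    return gamma * normalized + beta


# Illustrative usage with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 4, 8))
codebook = rng.normal(size=(16, 2))
idx, z_q = quantize_multichannel(z, codebook, num_chunks=4)
out = modulate(rng.normal(size=(4, 4, 6)), z_q,
               rng.normal(size=(8, 6)), rng.normal(size=(8, 6)))
```

In this sketch each spatial position is described by `num_chunks` indices rather than one, so a codebook of K entries can express up to K^num_chunks combinations per position without growing the codebook itself. Because `gamma` and `beta` are computed per position from `z_q`, the scale and shift applied to the normalized features vary across the image.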

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2209.09002 [cs.CV]
	(or arXiv:2209.09002v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2209.09002

Submission history

From: Chuanxia Zheng
[v1] Mon, 19 Sep 2022 13:26:51 UTC (4,360 KB)
