VipDiff: Towards Coherent and Diverse Video Inpainting via Training-free Denoising Diffusion Models

Xie, Chaohao; Han, Kai; Wong, Kwan-Yee K.

arXiv:2501.12267 (cs)

[Submitted on 21 Jan 2025]

Abstract: Recent video inpainting methods have achieved encouraging improvements by leveraging optical flow to guide pixel propagation from reference frames in either the image space or the feature space. However, they produce severe artifacts in the mask center when the masked area is too large and no pixel correspondences can be found for the center. Recently, diffusion models have demonstrated impressive performance in generating diverse and high-quality images, and have been exploited in a number of works for image inpainting. These methods, however, cannot be applied directly to videos to produce temporally coherent inpainting results. In this paper, we propose a training-free framework, named VipDiff, that conditions a diffusion model during the reverse diffusion process to produce temporally coherent inpainting results without requiring any training data or fine-tuning of the pre-trained diffusion model. VipDiff takes optical flow as guidance to extract valid pixels from reference frames, which serve as constraints in optimizing the randomly sampled Gaussian noise, and uses the generated results for further pixel propagation and conditional generation. VipDiff also allows for generating diverse video inpainting results from different sampled noise. Experiments demonstrate that VipDiff can substantially outperform state-of-the-art video inpainting methods in terms of both spatial-temporal coherence and fidelity.
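The idea of optimizing sampled noise under pixel constraints can be illustrated with a toy sketch. This is not the authors' implementation: the frozen diffusion model is replaced here by a fixed orthogonal linear map, and the names `toy_generator`, `optimize_noise`, `target`, and `valid` are hypothetical, chosen only for illustration. The sketch assumes that `target` holds pixel values already propagated from reference frames and that `valid` marks where such values exist. Only the noise vector is updated; the generator weights stay untouched, which mirrors the training-free setting described in the abstract.

```python
import numpy as np

def toy_generator(z, W):
    """Stand-in for a frozen pre-trained model: a fixed linear map (assumption)."""
    return W @ z

def optimize_noise(z0, W, target, valid, lr=0.5, steps=100):
    """Gradient descent on the noise z so that generated pixels match
    `target` wherever `valid` is 1. The weights W are never modified."""
    z = z0.copy()
    for _ in range(steps):
        residual = valid * (toy_generator(z, W) - target)  # error on known pixels only
        grad = W.T @ residual  # gradient of 0.5 * ||residual||^2 with respect to z
        z -= lr * grad
    return z

rng = np.random.default_rng(0)
n = 16

# Fixed orthogonal matrix playing the role of the frozen generator (assumption).
W, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Hypothetical propagated pixel values and validity mask:
# the first 10 pixels have a correspondence, the last 6 form the hole.
target = rng.uniform(0.0, 1.0, size=n)
valid = np.zeros(n)
valid[:10] = 1.0

# Two different Gaussian noise samples give two different completions.
out_a = toy_generator(optimize_noise(rng.standard_normal(n), W, target, valid), W)
out_b = toy_generator(optimize_noise(rng.standard_normal(n), W, target, valid), W)

print("max error on known pixels:", np.abs((out_a - target)[valid == 1]).max())
print("hole contents differ:", not np.allclose(out_a[valid == 0], out_b[valid == 0]))
```

In this toy setting both outputs agree with the known pixels, while the unconstrained region depends on the initial noise sample, which is the source of diversity the abstract refers to. A real diffusion model is nonlinear and multi-step, so the actual optimization in the paper may differ substantially from this sketch.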

Comments:	10 pages, 5 Figures (Accepted at WACV 2025)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.12267 [cs.CV]
	(or arXiv:2501.12267v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.12267

Submission history

From: Chaohao Xie
[v1] Tue, 21 Jan 2025 16:39:09 UTC (42,612 KB)
