HumanGif: Single-View Human Diffusion with Generative Prior

Hu, Shoukang; Narihira, Takuya; Fukuda, Kazumi; Sawata, Ryosuke; Shibuya, Takashi; Mitsufuji, Yuki

arXiv:2502.12080 (cs)

[Submitted on 17 Feb 2025 (v1), last revised 29 Jun 2025 (this version, v3)]

Abstract: Previous 3D human creation methods have made significant progress in synthesizing view-consistent and temporally aligned results from sparse-view images or monocular videos. However, it remains challenging to produce perceptually realistic, view-consistent, and temporally coherent human avatars from a single image, as limited information is available in the single-view input setting. Motivated by the success of 2D character animation, we propose HumanGif, a single-view human diffusion model with generative prior. Specifically, we formulate the single-view-based 3D human novel view and pose synthesis as a single-view-conditioned human diffusion process, utilizing generative priors from foundational diffusion models to complement the missing information. To ensure fine-grained and consistent novel view and pose synthesis, we introduce a Human NeRF module in HumanGif to learn spatially aligned features from the input image, implicitly capturing the relative camera and human pose transformation. Furthermore, we introduce an image-level loss during optimization to bridge the gap between latent and image spaces in diffusion models. Extensive experiments on RenderPeople, DNA-Rendering, THuman 2.1, and TikTok datasets demonstrate that HumanGif achieves the best perceptual performance, with better generalizability for novel view and pose synthesis.
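
The abstract does not specify the form of the image-level loss. As a rough illustration only, one common way to add an image-space term to latent diffusion training is to recover a one-step estimate of the clean latent from the noise prediction, decode it, and compare the result with the target image. The sketch below assumes a standard DDPM noise-prediction objective; the function names, the `decode` callable, the `image_weight` parameter, and the use of mean squared error are all hypothetical and are not taken from HumanGif.

```python
import numpy as np


def predict_clean_latent(z_t, eps_pred, alpha_bar_t):
    """One-step estimate of the clean latent under the DDPM forward process
    z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps."""
    return (z_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)


def combined_loss(z0, eps, eps_pred, alpha_bar_t, decode, target_image,
                  image_weight=1.0):
    """Latent-space noise-prediction loss plus an image-space loss.

    `decode` is a placeholder for a latent-to-image decoder (assumption);
    `image_weight` is a hypothetical balancing weight.
    """
    # Noisy latent at the sampled timestep (forward diffusion).
    z_t = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps
    # Usual latent-space objective: match the injected noise.
    latent_loss = np.mean((eps_pred - eps) ** 2)
    # Image-space objective: decode the estimated clean latent and compare.
    z0_hat = predict_clean_latent(z_t, eps_pred, alpha_bar_t)
    image_loss = np.mean((decode(z0_hat) - target_image) ** 2)
    return latent_loss + image_weight * image_loss, latent_loss, image_loss
```

In this sketch, a perfect noise prediction drives both terms to zero, while an imperfect one is penalized in both latent and image space; the paper's actual formulation may differ.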

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2502.12080 [cs.CV]
	(or arXiv:2502.12080v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.12080

Submission history

From: Shoukang Hu
[v1] Mon, 17 Feb 2025 17:55:27 UTC (16,780 KB)
[v2] Fri, 21 Feb 2025 16:03:54 UTC (16,772 KB)
[v3] Sun, 29 Jun 2025 10:53:42 UTC (19,630 KB)
