PromptHMR: Promptable Human Mesh Recovery

Wang, Yufu; Sun, Yu; Patel, Priyanka; Daniilidis, Kostas; Black, Michael J.; Kocabas, Muhammed

arXiv:2504.06397 (cs)

[Submitted on 8 Apr 2025 (v1), last revised 24 May 2025 (this version, v2)]

Abstract: Human pose and shape (HPS) estimation presents challenges in diverse scenarios such as crowded scenes, person-person interactions, and single-view reconstruction. Existing approaches lack mechanisms to incorporate auxiliary "side information" that could enhance reconstruction accuracy in such challenging scenarios. Furthermore, the most accurate methods rely on cropped person detections and cannot exploit scene context, while methods that process the whole image often fail to detect people and are less accurate than methods that use crops. While recent language-based methods explore HPS reasoning through large language or vision-language models, their metric accuracy is well below the state of the art. In contrast, we present PromptHMR, a transformer-based promptable method that reformulates HPS estimation through spatial and semantic prompts. Our method processes full images to maintain scene context and accepts multiple input modalities: spatial prompts like bounding boxes and masks, and semantic prompts like language descriptions or interaction labels. PromptHMR demonstrates robust performance across challenging scenarios: estimating people from bounding boxes as small as faces in crowded scenes, improving body shape estimation through language descriptions, modeling person-person interactions, and producing temporally coherent motions in videos. Experiments on benchmarks show that PromptHMR achieves state-of-the-art performance while offering flexible prompt-based control over the HPS estimation process.

Comments:	CVPR 2025. Project website: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.06397 [cs.CV]
	(or arXiv:2504.06397v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2504.06397

Submission history

From: Yufu Wang
[v1] Tue, 8 Apr 2025 19:38:04 UTC (6,864 KB)
[v2] Sat, 24 May 2025 03:21:11 UTC (6,865 KB)
