WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion

Shin, Soyong; Kim, Juyong; Halilaj, Eni; Black, Michael J.

arXiv:2312.07531 (cs)

[Submitted on 12 Dec 2023 (v1), last revised 18 Apr 2024 (this version, v2)]

Abstract: The estimation of 3D human motion from video has progressed rapidly, but current methods still have several key limitations. First, most methods estimate the human in camera coordinates. Second, prior work on estimating humans in global coordinates often assumes a flat ground plane and produces foot sliding. Third, the most accurate methods rely on computationally expensive optimization pipelines, limiting their use to offline applications. Finally, existing video-based methods are surprisingly less accurate than single-frame methods. We address these limitations with WHAM (World-grounded Humans with Accurate Motion), which accurately and efficiently reconstructs 3D human motion in a global coordinate system from video. WHAM learns to lift 2D keypoint sequences to 3D using motion capture data and fuses this with video features, integrating motion context and visual information. WHAM exploits camera angular velocity estimated from a SLAM method together with human motion to estimate the body's global trajectory. We combine this with a contact-aware trajectory refinement method that lets WHAM capture human motion in diverse conditions, such as climbing stairs. WHAM outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks. Code will be available for research purposes at this http URL
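To make the idea of a global trajectory with contact-aware refinement more concrete, the following is a minimal hand-written sketch, not the authors' method. WHAM uses learned networks for these steps; here the rollout and the foot-slide correction are plain arithmetic. All function names, the z-up axis convention, the yaw-only rotation, and the per-frame inputs (yaw rate, body-frame root velocity, foot velocity, binary contact flag) are assumptions introduced for illustration only.

```python
import math


def rotate_yaw(yaw, v):
    """Rotate a 3D vector v = (x, y, z) about the vertical z axis by yaw radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def integrate_trajectory(yaw_rates, local_velocities, dt=1.0):
    """Roll out global root positions from per-frame yaw rates and
    root velocities expressed in the body frame (hypothetical inputs)."""
    yaw = 0.0
    pos = (0.0, 0.0, 0.0)
    trajectory = [pos]
    for w, v_local in zip(yaw_rates, local_velocities):
        # Express the body-frame velocity in world coordinates, then step.
        v_world = rotate_yaw(yaw, v_local)
        pos = tuple(p + dt * v for p, v in zip(pos, v_world))
        trajectory.append(pos)
        # Update the heading for the next frame.
        yaw += dt * w
    return trajectory


def remove_foot_slide(root_velocities, foot_velocities, contacts):
    """When a foot is flagged as in contact, subtract its residual world
    velocity from the root velocity so the stance foot stays planted."""
    corrected = []
    for v_root, v_foot, in_contact in zip(root_velocities, foot_velocities, contacts):
        if in_contact:
            v_root = tuple(r - f for r, f in zip(v_root, v_foot))
        corrected.append(v_root)
    return corrected
```

Because the vertical velocity component is integrated like the others, this sketch does not assume a flat ground plane, which is the property the abstract highlights with the stair-climbing example.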

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.07531 [cs.CV]
	(or arXiv:2312.07531v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.07531

Submission history

From: Soyong Shin
[v1] Tue, 12 Dec 2023 18:57:46 UTC (9,702 KB)
[v2] Thu, 18 Apr 2024 19:43:25 UTC (12,699 KB)
