Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

Ma, Lingni; Ye, Yuting; Hong, Fangzhou; Guzov, Vladimir; Jiang, Yifeng; Postyeni, Rowan; Pesqueira, Luis; Gamino, Alexander; Baiyya, Vijay; Kim, Hyo Jin; Bailey, Kevin; Fosas, David Soriano; Liu, C. Karen; Liu, Ziwei; Engel, Jakob; De Nardi, Renzo; Newcombe, Richard

arXiv:2406.09905v1 (cs)

[Submitted on 14 Jun 2024 (this version), latest version 20 Sep 2024 (v2)]

Abstract: We introduce Nymeria, a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices. The dataset comes with a) full-body 3D motion ground truth; b) egocentric multimodal recordings from Project Aria devices with RGB, grayscale, and eye-tracking cameras, IMUs, magnetometer, barometer, and microphones; and c) an additional "observer" device providing a third-person viewpoint. We compute world-aligned 6DoF transformations for all sensors, across devices and capture sessions. The dataset also provides 3D scene point clouds and calibrated gaze estimation. We derive a protocol to annotate hierarchical language descriptions of in-context human motion, from fine-grained pose narrations to atomic actions and activity summarization. To the best of our knowledge, the Nymeria dataset is the world's largest in-the-wild collection of human motion with natural and diverse activities; the first of its kind to provide synchronized and localized multi-device multimodal egocentric data; and the world's largest dataset with motion-language descriptions. It contains 1200 recordings totaling 300 hours of daily activities from 264 participants across 50 locations, traveling a total of 399 km. The motion-language descriptions provide 310.5K sentences in 8.64M words from a vocabulary of 6545 words. To demonstrate the potential of the dataset, we define key research tasks for egocentric body tracking, motion synthesis, and action recognition, and evaluate several state-of-the-art baseline algorithms. Data and code will be open-sourced.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Cite as:	arXiv:2406.09905 [cs.CV]
	(or arXiv:2406.09905v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.09905

Submission history

From: Lingni Ma
[v1] Fri, 14 Jun 2024 10:23:53 UTC (47,653 KB)
[v2] Fri, 20 Sep 2024 01:18:11 UTC (47,727 KB)
