PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI

He, Yingchen; Weilbach, Christian D.; Wojciechowska, Martyna E.; Zhang, Yuxuan; Wood, Frank

arXiv:2505.12707 (cs)

[Submitted on 19 May 2025]

Abstract: Advances in deep generative modelling have made it increasingly plausible to train human-level embodied agents. Yet progress has been limited by the absence of large-scale, real-time, multi-modal, and socially interactive datasets that reflect the sensory-motor complexity of natural environments. To address this, we present PLAICraft, a novel data collection platform and dataset capturing multiplayer Minecraft interactions across five time-aligned modalities: video, game output audio, microphone input audio, mouse, and keyboard actions. Each modality is logged with millisecond time precision, enabling the study of synchronous, embodied behaviour in a rich, open-ended world. The dataset comprises over 10,000 hours of gameplay from more than 10,000 global participants. (We have done a privacy review for the public release of an initial 200-hour subset of the dataset, with plans to release most of the dataset over time.) Alongside the dataset, we provide an evaluation suite for benchmarking model capabilities in object recognition, spatial awareness, language grounding, and long-term memory. PLAICraft opens a path toward training and evaluating agents that act fluently and purposefully in real time, paving the way for truly embodied artificial intelligence.

Comments:	9 pages, 8 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as:	arXiv:2505.12707 [cs.LG]
	(or arXiv:2505.12707v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2505.12707

Submission history

From: Christian Weilbach
[v1] Mon, 19 May 2025 05:00:47 UTC (44,561 KB)