$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

Physical Intelligence; Black, Kevin; Brown, Noah; Darpinian, James; Dhabalia, Karan; Driess, Danny; Esmail, Adnan; Equi, Michael; Finn, Chelsea; Fusai, Niccolo; Galliker, Manuel Y.; Ghosh, Dibya; Groom, Lachy; Hausman, Karol; Ichter, Brian; Jakubczak, Szymon; Jones, Tim; Ke, Liyiming; LeBlanc, Devin; Levine, Sergey; Li-Bell, Adrian; Mothukuri, Mohith; Nair, Suraj; Pertsch, Karl; Ren, Allen Z.; Shi, Lucy Xiaoyang; Smith, Laura; Springenberg, Jost Tobias; Stachowicz, Kyle; Tanner, James; Vuong, Quan; Walke, Homer; Walling, Anna; Wang, Haohuan; Yu, Lili; Zhilinsky, Ury

Abstract: In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $\pi_{0.5}$, a new model based on $\pi_{0}$ that uses co-training on heterogeneous tasks to enable broad generalization. $\pi_{0.5}$ uses data from multiple robots, high-level semantic prediction, web data, and other sources to enable broadly generalizable real-world robotic manipulation. Our system uses a combination of co-training and hybrid multi-modal examples that combine image observations, language commands, object detections, semantic subtask prediction, and low-level actions. Our experiments show that this kind of knowledge transfer is essential for effective generalization, and we demonstrate for the first time that an end-to-end learning-enabled robotic system can perform long-horizon and dexterous manipulation skills, such as cleaning a kitchen or bedroom, in entirely new homes.

Subjects:	Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2504.16054 [cs.LG]
	(or arXiv:2504.16054v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2504.16054
