RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World

Mao, Weixin; Zhong, Weiheng; Jiang, Zhou; Fang, Dong; Zhang, Zhongyue; Lan, Zihan; Li, Haosheng; Jia, Fan; Wang, Tiancai; Fan, Haoqiang; Yoshie, Osamu

arXiv:2412.00171 (cs)

[Submitted on 29 Nov 2024 (v1), last revised 25 Mar 2025 (this version, v3)]

Abstract: Existing robot policies predominantly adopt the task-centric approach, requiring end-to-end task data collection. This results in limited generalization to new tasks and difficulties in pinpointing errors within long-horizon, multi-stage tasks. To address this, we propose RoboMatrix, a skill-centric hierarchical framework designed for scalable robot task planning and execution in open-world environments. RoboMatrix extracts general meta-skills from diverse complex tasks, enabling the completion of unseen tasks through skill composition. Its architecture consists of a high-level scheduling layer that utilizes large language models (LLMs) for task decomposition, an intermediate skill layer housing meta-skill models, and a low-level hardware layer for robot control. A key innovation of our work is the introduction of the first unified vision-language-action (VLA) model capable of seamlessly integrating both movement and manipulation within one model. This is achieved by combining vision and language prompts to generate discrete actions. Experimental results demonstrate that RoboMatrix achieves a 50% higher success rate than task-centric baselines when applied to unseen objects, scenes, and tasks. To advance open-world robotics research, we will open-source code, hardware designs, model weights, and datasets at this https URL.
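As a rough illustration of the three-layer structure described in the abstract (scheduling layer, skill layer, hardware layer), the flow from a task instruction to discrete actions might be sketched as follows. This is a toy sketch under stated assumptions, not the authors' implementation: `toy_scheduler` is a hard-coded stand-in for the LLM, the skill names `move_to`, `grasp`, and `release` are hypothetical, and the action strings are invented for illustration. The abstract describes a single unified VLA model for both movement and manipulation; the separate skill functions here are placeholders only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SkillCall:
    """One step of a decomposed task: a meta-skill name plus its text prompt."""
    skill: str
    prompt: str


class HardwareLayer:
    """Stand-in for low-level robot control; it only records commands."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def send(self, command: str) -> None:
        self.log.append(command)


# Skill layer (hypothetical): each meta-skill maps a prompt to discrete actions.
SkillFn = Callable[[str], List[str]]


def move_to(prompt: str) -> List[str]:
    return [f"MOVE({prompt})"]


def grasp(prompt: str) -> List[str]:
    return [f"GRASP({prompt})"]


def release(prompt: str) -> List[str]:
    return [f"RELEASE({prompt})"]


SKILLS: Dict[str, SkillFn] = {"move_to": move_to, "grasp": grasp, "release": release}


def toy_scheduler(task: str) -> List[SkillCall]:
    """Placeholder for the LLM scheduling layer.

    Assumption: only handles tasks of the form "put <object> in <target>".
    """
    prefix = "put "
    if not task.startswith(prefix):
        raise ValueError(f"unsupported task: {task!r}")
    obj, sep, target = task[len(prefix):].partition(" in ")
    if not sep or not obj or not target:
        raise ValueError(f"unsupported task: {task!r}")
    return [
        SkillCall("move_to", obj),
        SkillCall("grasp", obj),
        SkillCall("move_to", target),
        SkillCall("release", obj),
    ]


def run_task(task: str, hardware: HardwareLayer) -> List[str]:
    """Decompose the task, run each meta-skill, forward actions to hardware."""
    for call in toy_scheduler(task):
        for action in SKILLS[call.skill](call.prompt):
            hardware.send(action)
    return hardware.log


if __name__ == "__main__":
    hw = HardwareLayer()
    for action in run_task("put the cup in the box", hw):
        print(action)
```

The point of the sketch is the composition: a task not seen as a whole is still executable as long as the scheduler can express it as a sequence of known skills.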

Comments:	17 pages, 16 figures
Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.00171 [cs.RO]
	(or arXiv:2412.00171v3 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2412.00171

Submission history

From: Weixin Mao
[v1] Fri, 29 Nov 2024 17:36:03 UTC (4,779 KB)
[v2] Tue, 10 Dec 2024 10:02:45 UTC (4,777 KB)
[v3] Tue, 25 Mar 2025 09:43:25 UTC (7,257 KB)
