Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation

Eze, Chrisantus; Crick, Christopher

arXiv:2402.07127 (cs)

[Submitted on 11 Feb 2024 (v1), last revised 18 Sep 2024 (this version, v2)]

Abstract: Robot learning of manipulation skills is hindered by the scarcity of diverse, unbiased datasets. While curated datasets can help, challenges remain in generalizability and real-world transfer. Meanwhile, large-scale "in-the-wild" video datasets have driven progress in computer vision through self-supervised techniques. Translating this to robotics, recent works have explored learning manipulation skills by passively watching abundant videos sourced online. Showing promising results, such video-based learning paradigms provide scalable supervision while reducing dataset bias. This survey reviews foundations such as video feature representation learning techniques, object affordance understanding, 3D hand/body modeling, and large-scale robot resources, as well as emerging techniques for acquiring robot manipulation skills from uncontrolled video demonstrations. We discuss how learning only from observing large-scale human videos can enhance generalization and sample efficiency for robotic manipulation. The survey summarizes video-based learning approaches, analyses their benefits over standard datasets, surveys metrics and benchmarks, and discusses open challenges and future directions in this nascent domain at the intersection of computer vision, natural language processing, and robot learning.

Comments:	Submitted at IEEE Access
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2402.07127 [cs.RO]
	(or arXiv:2402.07127v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2402.07127

Submission history

From: Chrisantus Eze
[v1] Sun, 11 Feb 2024 08:41:42 UTC (371 KB)
[v2] Wed, 18 Sep 2024 19:20:42 UTC (2,862 KB)
