Visual-Policy Learning through Multi-Camera View to Single-Camera View Knowledge Distillation for Robot Manipulation Tasks

Acar, Cihan; Binici, Kuluhan; Tekirdağ, Alp; Wu, Yan

doi:10.1109/LRA.2023.3336245

arXiv:2303.07026 (cs)

[Submitted on 13 Mar 2023 (v1), last revised 2 Dec 2023 (this version, v2)]

Abstract: The use of multi-camera views simultaneously has been shown to improve the generalization capabilities and performance of visual policies. However, the hardware cost and design constraints in real-world scenarios can potentially make it challenging to use multiple cameras. In this study, we present a novel approach to enhance the generalization performance of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks. Our proposed method involves utilizing a technique known as knowledge distillation, in which a pre-trained "teacher" policy trained with multiple camera viewpoints guides a "student" policy in learning from a single camera viewpoint. To enhance the student policy's robustness against camera location perturbations, it is trained using data augmentation and extreme viewpoint changes. As a result, the student policy learns robust visual features that allow it to locate the object of interest accurately and consistently, regardless of the camera viewpoint. The efficacy and efficiency of the proposed method were evaluated both in simulation and real-world environments. The results demonstrate that the single-view visual student policy can successfully learn to grasp and lift a challenging object, which was not possible with a single-view policy alone. Furthermore, the student policy demonstrates zero-shot transfer capability, where it can successfully grasp and lift objects in real-world scenarios for unseen visual configurations.
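
The teacher-to-student idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the averaging `teacher_policy`, the linear `student_policy`, the mean-squared-error action-matching loss, the noise-based `augment` function, and all numeric values are hypothetical stand-ins chosen only to keep the example small and runnable. It shows a student that sees one perturbed view being regressed onto the action produced by a teacher that sees several views.

```python
import random


def teacher_policy(views):
    """Hypothetical stand-in for a pre-trained multi-view teacher.

    It simply averages the per-view feature vectors to produce an action.
    """
    n = len(views)
    return [sum(v[i] for v in views) / n for i in range(len(views[0]))]


def student_policy(weights, view):
    """Hypothetical linear student: one scalar weight per action dimension."""
    return [w * x for w, x in zip(weights, view)]


def distillation_loss(student_action, teacher_action):
    """Mean squared error between student and teacher actions (an assumption)."""
    n = len(teacher_action)
    return sum((s - t) ** 2 for s, t in zip(student_action, teacher_action)) / n


def augment(view, rng, scale=0.05):
    """Toy stand-in for viewpoint perturbation: add small uniform noise."""
    return [x + rng.uniform(-scale, scale) for x in view]


def distill_step(weights, view, teacher_action, lr=0.1):
    """One gradient-descent step on the MSE loss for the linear student."""
    n = len(weights)
    pred = student_policy(weights, view)
    grads = [2.0 * (p - t) * x / n for p, t, x in zip(pred, teacher_action, view)]
    return [w - lr * g for w, g in zip(weights, grads)]


def train_student(steps=200, seed=0):
    """Distill the multi-view teacher into a student that sees only view 0."""
    rng = random.Random(seed)
    views = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]  # three made-up camera features
    target = teacher_policy(views)  # teacher uses all views
    weights = [0.0, 0.0]
    losses = []
    for _ in range(steps):
        obs = augment(views[0], rng)  # student sees one perturbed view
        losses.append(distillation_loss(student_policy(weights, obs), target))
        weights = distill_step(weights, obs, target)
    return weights, losses


if __name__ == "__main__":
    weights, losses = train_student()
    print("first loss:", losses[0], "last loss:", losses[-1])
```

In this toy setting the loss falls as the student's output approaches the teacher's action despite the perturbed input. A real visual policy would replace the linear student with an image encoder and the noise with actual camera-pose changes and image augmentations.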

Comments:	IEEE Robotics and Automation Letters
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2303.07026 [cs.RO]
	(or arXiv:2303.07026v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2303.07026
Related DOI:	https://doi.org/10.1109/LRA.2023.3336245

Submission history

From: Cihan Acar
[v1] Mon, 13 Mar 2023 11:42:38 UTC (45,139 KB)
[v2] Sat, 2 Dec 2023 06:34:41 UTC (4,976 KB)
