Keyframe Demonstration Seeded and Bayesian Optimized Policy Search

Tore, Onur Berk; Negahbani, Farzin; Akgun, Baris

Abstract: This paper introduces a novel Learning from Demonstration framework that learns robotic skills from keyframe demonstrations using a Dynamic Bayesian Network (DBN), and a Bayesian Optimized Policy Search approach that improves the learned skills. The DBN learns the robot motion, the perceptual change in the object of interest (i.e., the skill sub-goals), and the relation between them. The rewards are also learned from the perceptual part of the DBN. The policy search part is a semi-black-box algorithm, which we call BO-PI2. It utilizes the action-perception relation to focus the high-level exploration, uses Gaussian Processes to model the expected return, and performs Upper Confidence Bound type low-level exploration for sampling the rollouts. BO-PI2 is compared against a state-of-the-art method on three different skills in a real robot setting with expert and naive user demonstrations. The results show that our approach successfully focuses the exploration on the failed sub-goals, and that the addition of reward-predictive exploration outperforms the state-of-the-art approach on cumulative reward, skill success, and termination time metrics.

Subjects:	Robotics (cs.RO)
Cite as:	arXiv:2301.08184 [cs.RO]
	(or arXiv:2301.08184v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2301.08184
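
The abstract mentions modelling the expected return with a Gaussian Process and sampling rollouts with Upper Confidence Bound type exploration. The following is only a minimal sketch of that general idea, not the authors' BO-PI2: it omits the PI2 update, the DBN, and the sub-goal focusing entirely. The function names (`rbf_kernel`, `gp_posterior`, `select_rollouts_ucb`), the zero-mean GP prior, the squared-exponential kernel, and all hyperparameters (`length_scale`, `noise_var`, `beta`) are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Zero-mean GP posterior mean and std of the return at x_query (assumed model)."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K + noise * I)^-1 y, computed via the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_tq.T @ alpha
    v = np.linalg.solve(chol, k_tq)
    var = np.diag(rbf_kernel(x_query, x_query)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def select_rollouts_ucb(x_train, y_train, candidates, n_select, beta=2.0):
    """Pick the n_select candidate policy parameters with the highest UCB score.

    x_train: parameters of rollouts already executed, shape (n, d)
    y_train: observed returns of those rollouts, shape (n,)
    candidates: sampled parameter perturbations to choose from, shape (m, d)
    """
    mean, std = gp_posterior(x_train, y_train, candidates)
    ucb = mean + beta * std  # optimistic estimate of the expected return
    best = np.argsort(-ucb)[:n_select]
    return candidates[best], ucb[best]
```

In this sketch, a larger `beta` favours candidates whose return is uncertain, while a smaller `beta` favours candidates the model already predicts to be good; how the paper balances the two is not stated in the abstract.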

