Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

Yang, Chao; Ma, Xiaojian; Huang, Wenbing; Sun, Fuchun; Liu, Huaping; Huang, Junzhou; Gan, Chuang

Computer Science > Machine Learning

arXiv:1910.04417 (cs)

[Submitted on 10 Oct 2019 (v1), last revised 18 Nov 2019 (this version, v4)]

Abstract: This paper studies Learning from Observations (LfO) for imitation learning with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, LfO is more practical for leveraging previously inapplicable resources (e.g., videos), yet more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and its difference from LfD from both theoretical and practical perspectives. We first prove that, if following the modeling approach of GAIL, the gap between LfD and LfO lies in the disagreement of inverse dynamics models between the imitator and the expert. More importantly, the upper bound of this gap is revealed by a negative causal entropy, which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM); it enhances the conventional LfO method by further bridging the gap to LfD. Extensive empirical results on challenging benchmarks indicate that our method attains consistent improvements over other LfO counterparts.
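
The gap claim in the abstract can be checked numerically on a toy example. The sketch below is a hedged illustration, not the authors' code. It assumes a small discrete MDP in which the imitator and the expert share the same transition dynamics P(s'|s,a). It measures the LfD objective as a KL divergence between occupancy measures over (s, a), GAIL-style, and the LfO objective as a KL divergence over (s, s'). It also uses randomly generated, hypothetical occupancy measures rather than ones induced by actual policies. Under these assumptions the shared dynamics cancel, and the chain rule of KL divergence gives KL(rho_pi(s,a) || rho_E(s,a)) = KL(rho_pi(s,s') || rho_E(s,s')) + E_{rho_pi(s,s')}[KL(rho_pi(a|s,s') || rho_E(a|s,s'))]. The last term is the expected inverse dynamics disagreement. The names decompose_gap, kl, and random_dist are illustrative only.

```python
import math
import random


def kl(p, q):
    """KL(p || q) for discrete distributions stored as {outcome: probability}."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def normalize(weights):
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def random_dist(keys, rng):
    """Strictly positive random distribution over `keys` (avoids log(0) and division by zero)."""
    return normalize({k: rng.uniform(0.1, 1.0) for k in keys})


def decompose_gap(n_states=3, n_actions=2, seed=0):
    """Return (lfd_gap, lfo_gap, inverse_dynamics_disagreement) for a random toy MDP."""
    rng = random.Random(seed)
    states, actions = list(range(n_states)), list(range(n_actions))
    pairs = [(s, a) for s in states for a in actions]

    # Hypothetical occupancy measures rho(s, a) for the imitator (pi) and the expert (E).
    rho_pi = random_dist(pairs, rng)
    rho_e = random_dist(pairs, rng)
    # Shared transition dynamics P(s' | s, a): the same environment for both agents.
    dynamics = {sa: random_dist(states, rng) for sa in pairs}

    def joint(rho):
        # rho(s, a, s') = rho(s, a) * P(s' | s, a)
        return {(s, a, s2): rho[(s, a)] * dynamics[(s, a)][s2]
                for (s, a) in pairs for s2 in states}

    def transition_marginal(j):
        # rho(s, s') = sum over a of rho(s, a, s')
        m = {}
        for (s, _a, s2), p in j.items():
            m[(s, s2)] = m.get((s, s2), 0.0) + p
        return m

    j_pi, j_e = joint(rho_pi), joint(rho_e)
    m_pi, m_e = transition_marginal(j_pi), transition_marginal(j_e)

    lfd_gap = kl(rho_pi, rho_e)  # GAIL-style LfD objective over (s, a)
    lfo_gap = kl(m_pi, m_e)      # LfO objective over (s, s')

    # Expected inverse dynamics disagreement:
    # E_{rho_pi(s, s')} KL(rho_pi(a | s, s') || rho_E(a | s, s'))
    disagreement = 0.0
    for (s, s2), weight in m_pi.items():
        inv_pi = {a: j_pi[(s, a, s2)] / weight for a in actions}
        inv_e = {a: j_e[(s, a, s2)] / m_e[(s, s2)] for a in actions}
        disagreement += weight * kl(inv_pi, inv_e)

    return lfd_gap, lfo_gap, disagreement


if __name__ == "__main__":
    lfd, lfo, idd = decompose_gap()
    print(f"LfD gap {lfd:.6f}  LfO gap {lfo:.6f}  inverse dynamics disagreement {idd:.6f}")
    print(f"LfO gap + disagreement = {lfo + idd:.6f}")
```

Because the disagreement term is a weighted sum of KL divergences, it is non-negative, so in this toy setting the LfO objective never exceeds the LfD objective. Matching state-transition distributions alone does not directly drive the inverse dynamics term to zero. The negative-causal-entropy upper bound described in the abstract is not reproduced here.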

Comments:	Accepted to NeurIPS 2019 as a spotlight. Chao Yang and Xiaojian Ma contributed equally to this work.
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
Cite as:	arXiv:1910.04417 [cs.LG]
	(or arXiv:1910.04417v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.04417

Submission history

From: Xiaojian Ma
[v1] Thu, 10 Oct 2019 08:07:17 UTC (616 KB)
[v2] Mon, 14 Oct 2019 20:10:53 UTC (942 KB)
[v3] Sun, 20 Oct 2019 19:21:27 UTC (942 KB)
[v4] Mon, 18 Nov 2019 00:17:12 UTC (942 KB)
