Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

Zanette, Andrea; Lazaric, Alessandro; Kochenderfer, Mykel J.; Brunskill, Emma

arXiv:2008.07737 (cs)

[Submitted on 18 Aug 2020 (v1), last revised 22 Oct 2020 (this version, v2)]

Abstract: There has been growing progress on theoretical analyses of provably efficient learning in MDPs with linear function approximation, but much of the existing work makes strong assumptions so that conventional exploration frameworks can be applied. These assumptions are typically stronger than what is needed to find good solutions in the batch setting. In this work, we show that under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, we can provide strong PAC guarantees on learning a near-optimal value function, provided that the linear space is sufficiently "explorable". We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function, which is revealed only once learning has completed. If this reward function is also estimated from the samples gathered during pure exploration, our results provide same-order PAC guarantees on the performance of the resulting policy for this setting as well.
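The two-phase structure described in the abstract (collect transitions without rewards, then plan once a linear reward is revealed) can be illustrated with a minimal sketch. This is not the paper's algorithm: the exploration phase is replaced here by exhaustive coverage of a toy three-state chain, and the one-hot feature map, the helper names (`phi`, `step`, `explore`, `lsvi`, `greedy_action`), and the ridge parameter are all assumptions made for illustration. Only the planning step, backward least-squares value iteration on a fixed batch, corresponds to the family of methods the abstract mentions.

```python
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 3, 2, 3  # toy sizes, chosen for illustration


def phi(s, a):
    """One-hot feature map (an illustrative choice of linear features)."""
    v = np.zeros(N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v


def step(s, a):
    """Toy deterministic chain: action 1 moves right, action 0 stays put."""
    return min(s + a, N_STATES - 1)


def explore():
    """Stand-in for the reward-free phase: record (s, a, s') only, no rewards.

    Assumption: every state-action pair is visited once, which is trivially
    possible in this toy chain but is the hard part in general.
    """
    return [(s, a, step(s, a)) for s in range(N_STATES) for a in range(N_ACTIONS)]


def lsvi(batch, reward_w, ridge=1e-6):
    """Backward least-squares value iteration on a fixed batch of transitions."""
    d = N_STATES * N_ACTIONS
    X = np.array([phi(s, a) for s, a, _ in batch])
    gram = X.T @ X + ridge * np.eye(d)  # regularized Gram matrix
    theta = [np.zeros(d) for _ in range(HORIZON + 1)]  # theta[HORIZON] stays 0
    for t in reversed(range(HORIZON)):
        # Regression target: revealed linear reward plus greedy next-step value.
        y = np.array([
            phi(s, a) @ reward_w
            + max(phi(s2, b) @ theta[t + 1] for b in range(N_ACTIONS))
            for s, a, s2 in batch
        ])
        theta[t] = np.linalg.solve(gram, X.T @ y)
    return theta


def greedy_action(theta_t, s):
    """Action maximizing the linear Q-estimate at state s."""
    return max(range(N_ACTIONS), key=lambda a: phi(s, a) @ theta_t)


batch = explore()                       # phase 1: no reward observed
reward_w = np.zeros(N_STATES * N_ACTIONS)
reward_w[2 * N_ACTIONS:] = 1.0          # phase 2: reward 1 in state 2 is revealed
theta = lsvi(batch, reward_w)
v0 = max(phi(0, a) @ theta[0] for a in range(N_ACTIONS))  # estimated value of state 0
```

Because the batch is gathered before `reward_w` exists, the same transitions can be reused with a different reward vector by calling `lsvi` again, which is the practical appeal of the reward-free setting.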

Comments:	Minor update; appears in NeurIPS
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2008.07737 [cs.LG]
	(or arXiv:2008.07737v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2008.07737

Submission history

From: Andrea Zanette
[v1] Tue, 18 Aug 2020 04:34:21 UTC (43 KB)
[v2] Thu, 22 Oct 2020 02:30:08 UTC (43 KB)
