Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

Yan, Kai; Schwing, Alexander G.; Wang, Yu-xiong

Computer Science > Machine Learning

arXiv:2311.01331v2 (cs)

[Submitted on 2 Nov 2023 (v1), revised 21 Nov 2023 (this version, v2), latest version 9 Jun 2024 (v3)]

Title: Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

Authors: Kai Yan, Alexander G. Schwing, Yu-xiong Wang


Abstract: In real-world scenarios, arbitrary interactions with the environment can often be costly, and actions of expert demonstrations are not always available. To reduce the need for both, Offline Learning from Observations (LfO) is extensively studied, where the agent learns to solve a task with only expert states and \textit{task-agnostic} non-expert state-action pairs. The state-of-the-art DIstribution Correction Estimation (DICE) methods minimize the state occupancy divergence between the learner and expert policies. However, they are limited to either $f$-divergences (KL and $\chi^2$) or Wasserstein distance with Rubinstein duality, the latter of which constrains the underlying distance metric crucial to the performance of Wasserstein-based solutions. To address this problem, we propose Primal Wasserstein DICE (PW-DICE), which minimizes the primal Wasserstein distance between the expert and learner state occupancies with a pessimistic regularizer and leverages a contrastively learned distance as the underlying metric for the Wasserstein distance. Theoretically, we prove that our framework is a generalization of the state-of-the-art, SMODICE, and unifies $f$-divergence and Wasserstein minimization. Empirically, we find that PW-DICE improves upon several state-of-the-art methods on multiple testbeds.
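
To make the "primal" form mentioned in the abstract concrete, the sketch below computes a primal Wasserstein cost between two small sets of states, $\min_{\pi} \sum_{s,s'} \pi(s,s')\,c(s,s')$ over couplings $\pi$ with the given marginals. It is an illustration only, not the PW-DICE algorithm: it assumes equal-size, uniformly weighted samples (where an optimal coupling can be taken to be a one-to-one matching, so brute force over permutations is exact), and the function names, the toy states, and the hand-written cost `first_coord_gap` are hypothetical stand-ins for the contrastively learned distance described in the abstract.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]
Cost = Callable[[State, State], float]


def primal_wasserstein_uniform(
    expert: Sequence[State], learner: Sequence[State], cost: Cost
) -> float:
    """Exact primal transport cost between two equal-size, uniformly weighted samples.

    With uniform weights and equal sizes, an optimal coupling of the transport
    linear program can be chosen as a permutation, so we enumerate all of them.
    Only practical for a handful of points.
    """
    if len(expert) != len(learner):
        raise ValueError("this sketch assumes samples of equal size")
    n = len(expert)
    best = float("inf")
    for perm in permutations(range(n)):
        # Average cost of matching expert state i to learner state perm[i].
        total = sum(cost(expert[i], learner[perm[i]]) for i in range(n)) / n
        best = min(best, total)
    return best


def euclidean(a: State, b: State) -> float:
    """Standard Euclidean distance between two states."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def first_coord_gap(a: State, b: State) -> float:
    """Hypothetical task-specific cost: only the first coordinate matters."""
    return abs(a[0] - b[0])


# Toy data (assumed for illustration): learner states are the expert states
# shifted by 1 in the second coordinate.
expert_states = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
learner_states = [(2.0, 1.0), (0.0, 1.0), (1.0, 1.0)]

w_euclid = primal_wasserstein_uniform(expert_states, learner_states, euclidean)
w_custom = primal_wasserstein_uniform(expert_states, learner_states, first_coord_gap)
print(w_euclid, w_custom)
```

The same routine accepts any cost function, which is the property the primal form offers: under the Euclidean cost the two samples are one unit apart, while under the hypothetical first-coordinate cost they coincide.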

Comments: 23 pages. Accepted to the Optimal Transport and Machine Learning Workshop at NeurIPS 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2311.01331 [cs.LG]
(or arXiv:2311.01331v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2311.01331

Submission history

From: Kai Yan
[v1] Thu, 2 Nov 2023 15:41:57 UTC (6,658 KB)
[v2] Tue, 21 Nov 2023 18:50:49 UTC (6,657 KB)
[v3] Sun, 9 Jun 2024 18:43:27 UTC (20,868 KB)
