DPO: A Differential and Pointwise Control Approach to Reinforcement Learning

Nguyen, Minh; Bajaj, Chandrajit

arXiv:2404.15617 (cs)

[Submitted on 24 Apr 2024 (v1), last revised 21 May 2025 (this version, v3)]

Abstract: Reinforcement learning (RL) in continuous state-action spaces remains challenging in scientific computing due to poor sample efficiency and lack of pathwise physical consistency. We introduce Differential Reinforcement Learning (Differential RL), a novel framework that reformulates RL from a continuous-time control perspective via a differential dual formulation. This induces a Hamiltonian structure that embeds physics priors and ensures consistent trajectories without requiring explicit constraints. To implement Differential RL, we develop Differential Policy Optimization (DPO), a pointwise, stage-wise algorithm that refines local movement operators along the trajectory for improved sample efficiency and dynamic alignment. We establish pointwise convergence guarantees, a property not available in standard RL, and derive a competitive theoretical regret bound of $O(K^{5/6})$. Empirically, DPO outperforms standard RL baselines on representative scientific computing tasks, including surface modeling, grid control, and molecular dynamics, under low-data and physics-constrained conditions.
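As a reading aid for the regret claim, the short derivation below shows why a bound of $O(K^{5/6})$ is sublinear. It assumes, since the abstract does not define it, that $K$ counts episodes and that $\mathrm{Regret}(K)$ is the cumulative regret after $K$ of them; the constant $C$ is a placeholder that holds for sufficiently large $K$, not a value taken from the paper.

```latex
\mathrm{Regret}(K) \le C\,K^{5/6}
\quad\Longrightarrow\quad
\frac{\mathrm{Regret}(K)}{K} \le C\,K^{-1/6} \xrightarrow[K\to\infty]{} 0 .
```

Under these assumptions, the average regret per episode vanishes as $K$ grows, although slowly: multiplying $K$ by $10^6$ shrinks the per-episode bound by a factor of $(10^6)^{1/6} = 10$.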

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST)
Cite as:	arXiv:2404.15617 [cs.LG]
	(or arXiv:2404.15617v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2404.15617

Submission history

From: Minh Phuong Nguyen
[v1] Wed, 24 Apr 2024 03:11:12 UTC (242 KB)
[v2] Tue, 13 Aug 2024 03:47:38 UTC (270 KB)
[v3] Wed, 21 May 2025 03:15:48 UTC (384 KB)