Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints

Mguni, David; Sootla, Aivar; Ziomek, Juliusz; Slumbers, Oliver; Dai, Zipeng; Shao, Kun; Wang, Jun

arXiv:2205.15953 (cs)

[Submitted on 31 May 2022 (v1), last revised 4 Jun 2023 (this version, v4)]

Abstract: Many real-world settings involve costs for performing actions; transaction costs in financial systems and fuel costs are common examples. In these settings, performing actions at each time step quickly accumulates costs, leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces wear and tear and, ultimately, damage. Determining \textit{when to act} is crucial for achieving successful outcomes, and yet the challenge of efficiently \textit{learning} to behave optimally when actions incur minimally bounded costs remains unresolved. In this paper, we introduce a reinforcement learning (RL) framework named \textbf{L}earnable \textbf{I}mpulse \textbf{C}ontrol \textbf{R}einforcement \textbf{A}lgorithm (LICRA) for learning to optimally select both when to act and which actions to take when actions incur costs. At the core of LICRA is a nested structure that combines RL and a form of policy known as \textit{impulse control}, which learns to maximise objectives when actions incur costs. We prove that LICRA, which seamlessly adopts any RL method, converges to policies that optimally select when to perform actions and their optimal magnitudes. We then augment LICRA to handle problems in which the agent can perform at most $k<\infty$ actions and, more generally, faces a budget constraint. We show that LICRA learns the optimal value function and ensures budget constraints are satisfied almost surely. We empirically demonstrate LICRA's superior performance against benchmark RL methods in OpenAI Gym's \textit{Lunar Lander} and \textit{Highway} environments, and on a variant of the Merton portfolio problem within finance.
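
The abstract's central idea, choosing when to act as well as what to do when each action has a cost and the number of actions is capped, can be illustrated on a toy problem. The sketch below is not LICRA and is not taken from the paper: it is plain tabular value iteration on a hypothetical drifting chain, and every name and parameter in it (`solve_impulse_control`, `cost`, `budget`, the drift dynamics, the reset action) is an assumption made for illustration. It only shows the kind of comparison an impulse-control policy makes: intervene when the value of paying the cost and acting exceeds the value of waiting.

```python
import numpy as np

def solve_impulse_control(n_states=7, target=0, cost=1.5, budget=2,
                          gamma=0.9, n_iters=500):
    """Value iteration on a hypothetical chain with costly, budgeted actions.

    The state drifts one step away from `target` per time step unless the
    agent intervenes. An intervention costs `cost`, resets the state to
    `target`, and uses up one of at most `budget` allowed actions.
    """
    # V[s, b]: value of state s with b interventions remaining.
    V = np.zeros((n_states, budget + 1))
    # act[s, b]: True where intervening beats waiting.
    act = np.zeros((n_states, budget + 1), dtype=bool)

    def drift(s):
        # Uncontrolled dynamics (assumed): move right, capped at the last state.
        return min(s + 1, n_states - 1)

    for _ in range(n_iters):
        V_new = np.empty_like(V)
        for s in range(n_states):
            for b in range(budget + 1):
                reward = -abs(s - target)  # penalty for distance from target
                # Option 1: wait and let the system drift.
                q_wait = reward + gamma * V[drift(s), b]
                # Option 2: pay the cost, reset, and spend one unit of budget.
                q_act = -np.inf
                if b > 0:
                    q_act = reward - cost + gamma * V[target, b - 1]
                V_new[s, b] = max(q_wait, q_act)
                act[s, b] = q_act > q_wait
        V = V_new
    return V, act

if __name__ == "__main__":
    V, act = solve_impulse_control()
    # Rows are states, columns are remaining budget (0, 1, 2).
    print(act.astype(int))
```

In this toy setting the budget is enforced by construction, because intervening is simply unavailable once no actions remain; the paper's guarantee of almost-sure budget satisfaction concerns the learned setting and is not reproduced here.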

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2205.15953 [cs.LG]
	(or arXiv:2205.15953v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2205.15953

Submission history

From: David Mguni
[v1] Tue, 31 May 2022 16:50:46 UTC (725 KB)
[v2] Mon, 6 Jun 2022 12:26:45 UTC (725 KB)
[v3] Wed, 17 May 2023 01:41:04 UTC (1,823 KB)
[v4] Sun, 4 Jun 2023 23:59:39 UTC (1,817 KB)