Search | arXiv e-print repository

Taming Infinity one Chunk at a Time: Concisely Represented Strategies in One-Counter MDPs

Authors: Michal Ajdarów, James C. A. Main, Petr Novotný, Mickael Randour

Abstract: Markov decision processes (MDPs) are a canonical model to reason about decision making within a stochastic environment. We study a fundamental class of infinite MDPs: one-counter MDPs (OC-MDPs). They extend finite MDPs via an associated counter taking natural values, thus inducing an infinite MDP over the set of configurations (current state and counter value). We consider two characteristic objec… ▽ More Markov decision processes (MDPs) are a canonical model to reason about decision making within a stochastic environment. We study a fundamental class of infinite MDPs: one-counter MDPs (OC-MDPs). They extend finite MDPs via an associated counter taking natural values, thus inducing an infinite MDP over the set of configurations (current state and counter value). We consider two characteristic objectives: reaching a target state (state-reachability), and reaching a target state with counter value zero (selective termination). The synthesis problem for the latter is not known to be decidable and connected to major open problems in number theory. Furthermore, even seemingly simple strategies (e.g., memoryless ones) in OC-MDPs might be impossible to build in practice (due to the underlying infinite configuration space): we need finite, and preferably small, representations. To overcome these obstacles, we introduce two natural classes of concisely represented strategies based on a (possibly infinite) partition of counter values in intervals. For both classes, and both objectives, we study the verification problem (does a given strategy ensure a high enough probability for the objective?), and two synthesis problems (does there exist such a strategy?): one where the interval partition is fixed as input, and one where it is only parameterized. We develop a generic approach based on a compression of the induced infinite MDP that yields decidability in all cases, with all complexities within PSPACE. △ Less

Submitted 2 March, 2025; originally announced March 2025.

Comments: 55 pages

arXiv:2502.18296 [pdf, ps, other]

Mixing Any Cocktail with Limited Ingredients: On the Structure of Payoff Sets in Multi-Objective MDPs and its Impact on Randomised Strategies

Authors: James C. A. Main, Mickael Randour

Abstract: We consider multi-dimensional payoff functions in Markov decision processes, and ask whether a given expected payoff vector can be achieved or not. In general, pure strategies (i.e., not resorting to randomisation) do not suffice for this problem. We study the structure of the set of expected payoff vectors of all strategies given a multi-dimensional payoff function and its consequences regardin… ▽ More We consider multi-dimensional payoff functions in Markov decision processes, and ask whether a given expected payoff vector can be achieved or not. In general, pure strategies (i.e., not resorting to randomisation) do not suffice for this problem. We study the structure of the set of expected payoff vectors of all strategies given a multi-dimensional payoff function and its consequences regarding randomisation requirements for strategies. In particular, we prove that for any payoff for which the expectation is well-defined under all strategies, it is sufficient to mix (i.e., randomly select a pure strategy at the start of a play and committing to it for the rest of the play) finitely many pure strategies to approximate any expected payoff vector up to any precision. Furthermore, for any payoff for which the expected payoff is finite under all strategies, any expected payoff can be obtained exactly by mixing finitely many strategies. △ Less

Submitted 25 February, 2025; originally announced February 2025.

Comments: 64 pages

arXiv:2310.02142 [pdf, other]

Arena-independent Memory Bounds for Nash Equilibria in Reachability Games

Authors: James C. A. Main

Abstract: We study the memory requirements of Nash equilibria in turn-based multiplayer games on possibly infinite graphs with reachability, shortest path and Büchi objectives. We present constructions for finite-memory Nash equilibria in these games that apply to arbitrary game graphs, bypassing the finite-arena requirement that is central in existing approaches. We show that, for these three types of ga… ▽ More We study the memory requirements of Nash equilibria in turn-based multiplayer games on possibly infinite graphs with reachability, shortest path and Büchi objectives. We present constructions for finite-memory Nash equilibria in these games that apply to arbitrary game graphs, bypassing the finite-arena requirement that is central in existing approaches. We show that, for these three types of games, from any Nash equilibrium, we can derive another Nash equilibrium where all strategies are finite-memory such that the same players accomplish their objective, without increasing their cost for shortest path games. Furthermore, we provide memory bounds that are independent of the size of the game graph for reachability and shortest path games. These bounds depend only on the number of players. To the best of our knowledge, we provide the first results pertaining to finite-memory constrained Nash equilibria in infinite arenas and the first arena-independent memory bounds for Nash equilibria. △ Less

Submitted 3 October, 2023; originally announced October 2023.

Comments: 32 pages

arXiv:2205.04197 [pdf, other]

Timed Games with Bounded Window Parity Objectives

Authors: James C. A. Main, Mickael Randour, Jeremy Sproston

Abstract: The window mechanism, introduced by Chatterjee et al. for mean-payoff and total-payoff objectives in two-player turn-based games on graphs, refines long-term objectives with time bounds. This mechanism has proven useful in a variety of settings, and most recently in timed systems. In the timed setting, the so-called fixed timed window parity objectives have been studied. A fixed timed window par… ▽ More The window mechanism, introduced by Chatterjee et al. for mean-payoff and total-payoff objectives in two-player turn-based games on graphs, refines long-term objectives with time bounds. This mechanism has proven useful in a variety of settings, and most recently in timed systems. In the timed setting, the so-called fixed timed window parity objectives have been studied. A fixed timed window parity objective is defined with respect to some time bound and requires that, at all times, we witness a time frame, i.e., a window, of size less than the fixed bound in which the smallest priority is even. In this work, we focus on the bounded timed window parity objective. Such an objective is satisfied if there exists some bound for which the fixed objective is satisfied. The satisfaction of bounded objectives is robust to modeling choices such as constants appearing in constraints, unlike fixed objectives, for which the choice of constants may affect the satisfaction for a given bound. We show that verification of bounded timed window objectives in timed automata can be performed in polynomial space, and that timed games with these objectives can be solved in exponential time, even for multi-objective extensions. This matches the complexity classes of the fixed case. We also provide a comparison of the different variants of window parity objectives. △ Less

Submitted 9 May, 2022; originally announced May 2022.

Comments: 44 pages

arXiv:2201.10825 [pdf, ps, other]

Different Strokes in Randomised Strategies: Revisiting Kuhn's Theorem under Finite-Memory Assumptions

Authors: James C. A. Main, Mickael Randour

Abstract: Two-player (antagonistic) games on (possibly stochastic) graphs are a prevalent model in theoretical computer science, notably as a framework for reactive synthesis. Optimal strategies may require randomisation when dealing with inherently probabilistic goals, balancing multiple objectives, or in contexts of partial information. There is no unique way to define randomised strategies. For instanc… ▽ More Two-player (antagonistic) games on (possibly stochastic) graphs are a prevalent model in theoretical computer science, notably as a framework for reactive synthesis. Optimal strategies may require randomisation when dealing with inherently probabilistic goals, balancing multiple objectives, or in contexts of partial information. There is no unique way to define randomised strategies. For instance, one can use so-called mixed strategies or behavioural ones. In the most general setting, these two classes do not share the same expressiveness. A seminal result in game theory -- Kuhn's theorem -- asserts their equivalence in games of perfect recall. This result crucially relies on the possibility for strategies to use infinite memory, i.e., unlimited knowledge of all past observations. However, computer systems are finite in practice. Hence it is pertinent to restrict our attention to finite-memory strategies, defined as automata with outputs. Randomisation can be implemented in these in different ways: the initialisation, outputs or transitions can be randomised or deterministic respectively. Depending on which aspects are randomised, the expressiveness of the corresponding class of finite-memory strategies differs. In this work, we study two-player concurrent stochastic games and provide a complete taxonomy of the classes of finite-memory strategies obtained by varying which of the three aforementioned components are randomised. Our taxonomy holds in games of perfect and imperfect information with perfect recall, and in games with more than two players. We also provide an adapted taxonomy for games with imperfect recall. △ Less

Submitted 22 November, 2024; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: Extended version, preprint of Information and Computation article, 36 pages

arXiv:2105.06686 [pdf, other]

Time Flies When Looking out of the Window: Timed Games with Window Parity Objectives

Authors: James C. A. Main, Mickael Randour, Jeremy Sproston

Abstract: The window mechanism was introduced by Chatterjee et al. to reinforce mean-payoff and total-payoff objectives with time bounds in two-player turn-based games on graphs. It has since proved useful in a variety of settings, including parity objectives in games and both mean-payoff and parity objectives in Markov decision processes. We study window parity objectives in timed automata and timed game… ▽ More The window mechanism was introduced by Chatterjee et al. to reinforce mean-payoff and total-payoff objectives with time bounds in two-player turn-based games on graphs. It has since proved useful in a variety of settings, including parity objectives in games and both mean-payoff and parity objectives in Markov decision processes. We study window parity objectives in timed automata and timed games: given a bound on the window size, a path satisfies such an objective if, in all states along the path, we see a sufficiently small window in which the smallest priority is even. We show that checking that all time-divergent paths of a timed automaton satisfy such a window parity objective can be done in polynomial space, and that the corresponding timed games can be solved in exponential time. This matches the complexity class of timed parity games, while adding the ability to reason about time bounds. We also consider multi-dimensional objectives and show that the complexity class does not increase. To the best of our knowledge, this is the first study of the window mechanism in a real-time setting. △ Less

Submitted 11 August, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

Comments: Full version of CONCUR 2021 conference paper, 36 pages

Showing 1–6 of 6 results for author: Main, J C A