-
Constrained Average-Reward Intermittently Observable MDPs
Authors:
Konstantin Avrachenkov,
Madhu Dhiman,
Veeraruna Kavitha
Abstract:
In Markov Decision Processes (MDPs) with intermittent state information, decision-making becomes challenging due to periods of missing observations. Linear programming (LP) methods can play a crucial role in solving MDPs, particularly constrained ones. However, the resulting belief MDPs lead to infinite-dimensional LPs, even when the original MDP has finite state and action spaces, so verifying strong duality becomes non-trivial. This paper investigates the conditions under which there is no duality gap in average-reward finite MDPs with intermittent state observations. We first establish that in such MDPs, the belief MDP is unichain if the original Markov chain is recurrent. We then establish strong duality of the problem under the same assumption. Finally, we provide a wireless channel example in which the belief state depends on the last channel state received and the age of that channel state. Our numerical results indicate interesting properties of the solution.
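To illustrate how a belief state can be indexed by the last received channel state and its age, here is a minimal sketch. It assumes an uncontrolled two-state channel with a hypothetical transition matrix `P`; the function names and the numbers are illustrative assumptions and are not taken from the paper. Under these assumptions, the belief after observing state `s` exactly `k` steps ago is row `s` of the `k`-th power of `P`.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def belief(P, last_state, age):
    """Distribution of the current state, given that `last_state` was
    observed `age` steps ago and nothing has been observed since."""
    n = len(P)
    # Start from the identity matrix, i.e. P to the power 0.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(age):
        power = mat_mul(power, P)
    return power[last_state]


# Hypothetical good/bad channel; the probabilities are illustrative only.
P = [[0.9, 0.1],
     [0.3, 0.7]]

for age in (0, 1, 2, 10):
    print(age, belief(P, 0, age))
```

With a finite state space, the pair (last state, age) ranges over a countable set, which is one way to see why the belief MDP is no longer finite even though the original chain is.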
Submitted 18 April, 2025;
originally announced April 2025.
-
Punitive policies to combat misreporting in dynamic supply chains
Authors:
Madhu Dhiman,
Atul Maurya,
Veeraruna Kavitha,
Priyank Sinha
Abstract:
Wholesale price contracts are known to be associated with double marginalization effects, which prevent supply chains from achieving their true market share. In a dynamic setting under information asymmetry, these inefficiencies manifest as misreporting of the market potential by the manufacturer to the supplier, again leading to a loss of market share. We pose the interaction between the supplier and the manufacturer as a Stackelberg game and develop theoretical results for optimal punitive strategies that the supplier can implement to ensure that the manufacturer truthfully reveals the market potential in the single-stage setting. We then validate these results through randomly generated numerical examples based on Monte Carlo simulation.
Submitted 11 May, 2025; v1 submitted 18 April, 2025;
originally announced April 2025.
-
Optimal Control with $L^{\infty}$ cost: incorporating peak minimization
Authors:
Madhu Dhiman,
Veeraruna Kavitha,
Nandyala Hemachandra
Abstract:
Inventory and queueing systems are often designed by controlling a weighted combination of time-averaged performance metrics (such as cumulative holding, shortage, server-utilization or congestion costs), but real-world constraints, such as fixed storage or limited waiting space, require attention to the peak levels reached during the operating period.
This work formulates such control problems, whose objective is an arbitrary weighted combination of integral cost terms and an $L^{\infty}$ (peak-level) term. The resulting control problem does not fall into the standard control framework, nor does it have a standard solution in terms of partial differential equations. We introduce an auxiliary state variable to track the instantaneous peak levels, enabling reformulation into the classical framework. We then propose a smooth approximation to handle the resulting discontinuities, and show that a unique value function exists and that it uniquely solves the corresponding Hamilton-Jacobi-Bellman equation. We apply this framework to two key applications to obtain an optimal design that includes control of the peak levels. Surprisingly, the numerical results show that peak inventory can be minimized with a small revenue loss (under 6%); without peak control, the peak levels were significantly higher. The peak-optimal policies for the queueing system can reduce peak congestion by up to 27%, but at the expense of higher cumulative congestion costs. Thus, for inventory control the performance of the average terms did not degrade much, while the same is not true for the queueing system. Hence, any application requires a judiciously chosen weighted design of all the costs involved, including the peak levels, and such a design can now be derived numerically using the proposed framework.
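The auxiliary-variable idea can be sketched in generic notation. Everything below is an illustrative assumption rather than the authors' formulation: a scalar state $x$ with dynamics $\dot{x} = f(x,u)$, running cost $c$, peak weight $w \ge 0$, horizon $T$, and auxiliary peak variable $m$.

```latex
% Original objective: an integral cost plus a weighted peak (L-infinity) term.
J(u) = \int_0^T c\bigl(x(t), u(t)\bigr)\,dt + w \sup_{0 \le t \le T} x(t)

% Auxiliary state: the running peak.
m(t) = \sup_{0 \le s \le t} x(s), \qquad m(0) = x(0)

% The running peak grows only while x sits at the peak and is increasing,
% which makes the augmented dynamics discontinuous.
\dot{m}(t) = \bigl[ f\bigl(x(t), u(t)\bigr) \bigr]^{+} \, \mathbf{1}\{ x(t) = m(t) \}

% On the augmented state (x, m) the peak term becomes a terminal cost.
J(u) = \int_0^T c\bigl(x(t), u(t)\bigr)\,dt + w\, m(T)
```

In this sketch the indicator $\mathbf{1}\{x(t) = m(t)\}$ is the source of the discontinuity that a smooth approximation would need to handle.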
Submitted 28 June, 2025; v1 submitted 8 June, 2024;
originally announced June 2024.
-
Integrative Modeling and Analysis of the Interplay Between Epidemic and News Propagation Processes
Authors:
Madhu Dhiman,
Chen Peng,
Veeraruna Kavitha,
Quanyan Zhu
Abstract:
The COVID-19 pandemic has revealed the role of online social networks (OSNs) in the spread of infectious diseases. A rise in the severity of the epidemic increases the need for proper guidelines, but also promotes the propagation of fake news items. The popularity of a news item can reshape public health behaviors and affect the epidemic process. There is a clear interdependency between the epidemic process and the spreading of news items. This work creates an integrative framework to understand this interplay. We first develop a population-dependent "saturated branching process" to continually track the propagation of trending news items on OSNs. A two-time-scale dynamical system is obtained by integrating the news-propagation model with the SIRS epidemic model, in order to analyze the holistic system. We observe that a pattern of periodic infections emerges under a linear behavioral influence, which explains the waves of infection and reinfection experienced in the pandemic. We use numerical experiments to corroborate the results, and use Twitter and COVID-19 data sets to recreate the historical infection curve using the integrative model.
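For readers unfamiliar with the SIRS component, the following is a minimal sketch of the standard SIRS dynamics alone, integrated with a forward Euler step. It does not include the news-propagation coupling or the two-time-scale structure described in the abstract, and the function name and all parameter values are illustrative assumptions.

```python
def simulate_sirs(beta, gamma, omega, s0, i0, steps, dt=0.01):
    """Forward-Euler integration of the standard SIRS model on population
    fractions: S -> I at rate beta*S*I, I -> R at rate gamma*I,
    R -> S at rate omega*R (waning immunity)."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    path = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        waned = omega * r * dt
        # The three flows cancel in the sum, so S + I + R stays constant.
        s += waned - new_infections
        i += new_infections - recoveries
        r += recoveries - waned
        path.append((s, i, r))
    return path


# Hypothetical parameters, chosen only for illustration.
path = simulate_sirs(beta=0.5, gamma=0.1, omega=0.02,
                     s0=0.99, i0=0.01, steps=20000)
print(path[-1])
```

The return flow from R to S is what distinguishes SIRS from SIR and makes reinfection possible in the model.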
Submitted 8 March, 2023;
originally announced March 2023.