Search | arXiv e-print repository

Sequential Stochastic Optimization in Separable Learning Environments

Authors: R. Reid Bishop, Chelsea C. White III

Abstract: We consider a class of sequential decision-making problems under uncertainty that can encompass various types of supervised learning concepts. These problems have a completely observed state process and a partially observed modulation process, where the state process is affected by the modulation process only through an observation process, the observation process observes only the modulation process, and the modulation process is exogenous to control. We model this broad class of problems as a partially observed Markov decision process (POMDP). The belief function for the modulation process is control invariant, which separates the estimation of the modulation process from the control of the state process. We call this specially structured POMDP the separable POMDP, or SEP-POMDP, and show that it (i) can serve as a model for a broad class of application areas, e.g., inventory control, finance, and healthcare systems, (ii) inherits value function and optimal policy structure from a set of completely observed MDPs, (iii) can serve as a bridge between classical models of sequential decision making under uncertainty that have fully specified model artifacts and models that are not fully specified and require predictive methods from statistics and machine learning, and (iv) allows for specialized approximate solution procedures.

Submitted 21 August, 2021; originally announced August 2021.

Comments: 30 pages (Main), 12 pages (Figures, References, Appendices), 5 figures
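To make the control-invariance claim concrete, here is a minimal sketch, assuming a finite modulation state space with a known transition matrix P and observation likelihood matrix Q (hypothetical names, not taken from the paper) and assuming each observation is generated by the next modulation state. The function is a standard hidden Markov model (Bayesian) filter; it takes no action argument, so the belief can be updated independently of how the state process is controlled.

```python
def update_belief(belief, obs, P, Q):
    """One Bayesian filtering step for a hidden modulation process.

    belief: probabilities over modulation states (list of floats summing to 1)
    obs:    index of the observed symbol
    P:      P[i][j] = Pr(next modulation state = j | current state = i)
    Q:      Q[j][k] = Pr(observation = k | modulation state = j)

    No control/action argument appears: the update is control invariant.
    """
    n = len(belief)
    # Prediction step: push the belief through the modulation transition matrix.
    predicted = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Correction step: weight each state by the likelihood of the observation.
    unnormalized = [predicted[j] * Q[j][obs] for j in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    # Normalize so the updated belief sums to 1.
    return [u / total for u in unnormalized]
```

For example, update_belief([0.5, 0.5], 1, [[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]]) returns approximately [0.379, 0.621].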

arXiv:1902.08773

Managing mobile production-inventory systems influenced by a modulation process

Authors: Satya S. Malladi, Alan L. Erera, Chelsea C. White III

Abstract: We investigate the potential added value of being able to relocate production capacity, relative to fixed production capacity, in a network of multiple, geographically distributed manufacturing sites. There is growing interest in production capacity that can be geographically relocated, e.g., modular units for pharmaceutical intermediates. Relocatable capacity shows promise for enabling fast fulfillment in a distributed network while reducing total inventory and total production capacity, relative to a distributed network with fixed production capacity, without sacrificing customer service levels or total system resilience. Allowing also for transshipment, we model a production-inventory system with L production sites and Y units of relocatable production capacity, develop efficient and effective heuristic solution methods for dynamic relocation and multi-location inventory control, and analyze the potential added value. We describe the (L, Y) problem as a problem of sequential decision making under uncertainty to determine transshipment, mobile production capacity relocation, and replenishment decisions at each decision epoch. To enhance model realism, we use a partially observed stochastic process, the modulation process, to model the exogenous and partially observable forces (e.g., the macro-economy) that affect demand. We then model the (L, Y) problem as a partially observed Markov decision process. Because of the considerable computational challenges of solving this model exactly, we propose two efficient, high-quality heuristics.
We show, for an instance set with five locations, that production capacity mobility and transshipment can improve system performance by as much as 41% on average over the no-flexibility (fixed production capacity) case, and that production capacity mobility can yield as much as 10% more savings than when only transshipment is permitted.

Submitted 25 June, 2021; v1 submitted 23 February, 2019; originally announced February 2019.

Comments: 37 pages (including appendices), 15 tables

arXiv:1901.01483

Worst-Case Analysis for a Leader-follower Partially Observable Stochastic Game

Authors: Yanling Chang, Chelsea C. White III

Abstract: Partially observable stochastic games provide a rich mathematical paradigm for modeling multi-agent dynamic decision making under uncertainty and partial information. However, they generally do not admit closed-form solutions and are notoriously difficult to solve. Also, in reality, each agent often does not have complete knowledge of the other agent. This paper studies a leader-follower partially observable stochastic game where the leader has little knowledge of the adversarial follower's reward structure, level of rationality, and process for gathering and transmitting data relevant for decision making. We introduce worst-case analysis into the partially observable stochastic game to cope with this lack of knowledge and to determine the leader's best worst-case value function. The resulting problem from the leader's perspective has a simple sufficient statistic; however, unlike in a classical partially observable Markov decision process, the value function of the resulting problem may not be convex. We design a viable and computationally attractive solution procedure for computing a lower bound on the leader's value function, as well as its associated control policy, over a finite planning horizon. We illustrate the use of the proposed approach in a liquid egg production security problem.

Submitted 13 April, 2020; v1 submitted 5 January, 2019; originally announced January 2019.

arXiv:1901.01464

The Value of Misinformation and Disinformation

Authors: Yanling Chang, Matthew F. Keblis, Ran Li, Eleftherios Iakovou, Chelsea C. White III

Abstract: Information is a critical dimension in warfare. Inaccurate information, such as misinformation or disinformation, further complicates military operations. In this paper, we examine the value of misinformation and disinformation to a military leader who, through investment in people, programs, and technology, is able to affect the accuracy of information communicated between other actors. We model the problem as a partially observable stochastic game with three agents: a leader and two followers. We determine the value to the leader of misinformation or disinformation being communicated between two (i) adversarial followers and (ii) allied followers. We demonstrate that the prevalent intuition, that the leader would benefit from less (more) accurate communication between adversarial (allied) followers, is valid only under certain conditions. We analyze why the intuition may fail and show that a holistic paradigm, taking into account both the reward structures and the policies of agents, is necessary to correctly determine the value of misinformation and disinformation. Our research identifies efficient targeted investments that affect the accuracy of information communicated between followers to the leader's advantage.

Submitted 5 January, 2019; originally announced January 2019.

arXiv:1803.06742

Inventory Control with Modulated Demand and a Partially Observed Modulation Process

Authors: Satya S. Malladi, Alan L. Erera, Chelsea C. White III

Abstract: We consider a periodic review inventory control problem having an underlying modulation process that affects demand and that is partially observed by the uncensored demand process and a novel additional observation data (AOD) process. We present an attainability condition, AC, that guarantees the existence of an optimal myopic base stock policy if the reorder cost $K=0$ and the existence of an optimal $(s, S)$ policy if $K>0$, where both policies depend on the belief function of the modulation process. Assuming AC holds, we show that (i) when $K=0$, the value of the optimal base stock level is constant within regions of the belief space and that each region can be described by two linear inequalities and (ii) when $K>0$, the values of $s$ and $S$ and upper and lower bounds on these values are constant within regions of the belief space and that these regions can be described by a finite set of linear inequalities. A heuristic and bounds for the $K=0$ case are presented when AC does not hold. Special cases of this inventory control problem include problems considered in the Markov-modulated demand and Bayesian updating literatures.

Submitted 12 February, 2022; v1 submitted 18 March, 2018; originally announced March 2018.
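The belief-dependent policy structure described above can be sketched as follows, assuming the regions of the belief space and their (s, S) values have already been computed (the Region class and all numbers below are hypothetical, for illustration only). Within a region, the (s, S) rule orders up to S whenever the inventory position falls below s; setting s = S recovers a base stock policy, the structure described for K = 0.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A hypothetical region of the belief space with its own (s, S) parameters."""
    A: list   # rows a of the linear inequalities a . belief <= c
    c: list   # right-hand sides of those inequalities
    s: float  # reorder point used while the belief lies in this region
    S: float  # order-up-to level used while the belief lies in this region


def in_region(region, belief, tol=1e-12):
    """Check every linear inequality a . belief <= c defining the region."""
    return all(
        sum(a_k * b_k for a_k, b_k in zip(row, belief)) <= rhs + tol
        for row, rhs in zip(region.A, region.c)
    )


def order_quantity(inventory_position, belief, regions):
    """Apply the (s, S) rule of the first region that contains the belief."""
    for region in regions:
        if in_region(region, belief):
            if inventory_position < region.s:
                return region.S - inventory_position  # order up to S
            return 0  # at or above the reorder point: do not order
    raise ValueError("belief is not covered by any region")
```

For instance, with one region covering beliefs whose first component is at most 0.5 (s = 10, S = 30) and another covering beliefs whose first component is at least 0.5 (s = 20, S = 50), an inventory position of 15 yields no order under belief [0.3, 0.7] and an order of 35 under belief [0.8, 0.2].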

arXiv:1605.01442

2-Approximation Algorithms for Perishable Inventory Control When FIFO Is an Optimal Issuing Policy

Authors: Can Zhang, Turgay Ayer, Chelsea C. White III

Abstract: We consider a periodic-review, fixed-lifetime perishable inventory control problem where demand is a general stochastic process. Computing an optimal solution for this problem is intractable due to the "curse of dimensionality". In this paper, we first present a computationally efficient algorithm, which we call the marginal-cost dual-balancing policy, for the perishable inventory control problem. We then prove that a myopic policy under the so-called marginal-cost accounting scheme provides a lower bound on the optimal ordering quantity. By combining the specific lower bound we derive and any upper bound on the optimal ordering quantity with the marginal-cost dual-balancing policy, we present a more general class of algorithms that we call the truncated-balancing policy. We prove that when first-in-first-out (FIFO) is an optimal issuing policy, both of our proposed algorithms admit a worst-case performance guarantee of two, i.e., the expected total cost of our policy is at most twice that of an optimal ordering policy. We further present sufficient conditions that ensure the optimality of the FIFO issuing policy. Finally, we conduct numerical analyses based on real data and show that both of our algorithms perform much better than the worst-case performance guarantee, and that the truncated-balancing policy yields a significant performance improvement over the balancing policy.

Submitted 9 May, 2016; v1 submitted 4 May, 2016; originally announced May 2016.
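To illustrate what FIFO issuing means for fixed-lifetime stock, here is a minimal one-period sketch, assuming on-hand inventory is tracked as a list of unit counts indexed by age in periods, with the list length equal to the product lifetime (an illustrative representation, not the paper's). It does not implement the dual-balancing or truncated-balancing ordering policies; it only shows the issuing and outdating step. Whether unmet demand is backlogged or lost is left to the caller.

```python
def fifo_issue(stock_by_age, demand):
    """Serve one period of demand oldest-first, then age and outdate the stock.

    stock_by_age[a] = units of age a (0 = newest); len(stock_by_age) = lifetime.
    Returns (stock aged by one period, units outdated, unmet demand).
    """
    if not stock_by_age:
        raise ValueError("lifetime must be at least one period")
    stock = list(stock_by_age)
    unmet = demand
    # FIFO issuing: draw from the oldest age class first.
    for age in reversed(range(len(stock))):
        issued = min(stock[age], unmet)
        stock[age] -= issued
        unmet -= issued
    # Leftover units in the oldest age class reach the lifetime and are outdated.
    outdated = stock[-1]
    # Everything else ages by one period; slot 0 is left for the next order.
    aged = [0] + stock[:-1]
    return aged, outdated, unmet
```

For example, with a three-period lifetime, fifo_issue([5, 0, 3], 1) serves the single unit from the oldest stock, outdates the remaining 2 old units, and returns ([0, 5, 0], 2, 0).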

arXiv:1404.4388

Partially Observed, Multi-objective Markov Games

Authors: Yanling Chang, Alan L. Erera, Chelsea C. White III

Abstract: The intent of this research is to generate a set of non-dominated policies from which one of two agents (the leader) can select a most preferred policy to control a dynamic system that is also affected by the control decisions of the other agent (the follower). The problem is described by an infinite-horizon, partially observed Markov game (POMG). At each decision epoch, each agent knows its past and present states, its past actions, and noise-corrupted observations of the other agent's past and present states. The actions of each agent are determined at each decision epoch based on these data. The leader considers multiple objectives in selecting its policy. The follower considers a single objective in selecting its policy, with complete knowledge of, and in response to, the policy selected by the leader. This leader-follower assumption allows the POMG to be transformed into a specially structured, partially observed Markov decision process (POMDP). This POMDP is used to determine the follower's best response policy. A multi-objective genetic algorithm (MOGA) is used to create the next generation of leader policies based on the fitness measures of each leader policy in the current generation. Computing a fitness measure for a leader policy requires a value determination calculation, given the leader policy and the follower's best response policy. The policies from which the leader can select a most preferred policy are the non-dominated policies of the final generation of leader policies created by the MOGA.
An example is presented that illustrates how these results can be used to support a manager of a liquid egg production process (the leader) in selecting a sequence of actions to best control this process over time, given that there is an attacker (the follower) who seeks to contaminate the liquid egg production process with a chemical or biological toxin.

Submitted 16 April, 2014; originally announced April 2014.
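The final selection step described above can be sketched as a Pareto (non-dominance) filter, assuming each leader policy in the final generation has already been summarized by a tuple of objective values in which larger is better (an assumption; the POMDP best-response, value-determination, and MOGA steps are not shown). Policy names and scores in the example are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (maximization):
    a is at least as good in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(policies):
    """Return the names of policies not dominated by any other policy.

    policies: dict mapping a policy name to its tuple of objective values.
    """
    return [
        name
        for name, f in policies.items()
        if not any(dominates(g, f) for other, g in policies.items() if other != name)
    ]
```

For example, among policies scoring (3, 1), (2, 2), and (1, 1), only the first two are returned, since (1, 1) is dominated by both and neither of the others dominates the other.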

Showing 1–7 of 7 results for author: White, C C