-
On Pioneering Works of Albert Shiryaev on Markov Decision Processes and Some Later Developments
Authors:
Eugene A. Feinberg
Abstract:
This article is dedicated to three fundamental papers on Markov Decision Processes and on control with incomplete observations published by Albert Shiryaev approximately sixty years ago. One of these papers was coauthored with O.V. Viskov. We discuss some of the results and some of the many rich ideas presented in these papers and survey some later developments. At the end we mention some recent studies of Albert Shiryaev on Kolmogorov's equations for jump Markov processes and on control of continuous-time jump Markov processes.
Submitted 5 June, 2025;
originally announced June 2025.
-
Properties of Turnpike Functions for Discounted Finite MDPs
Authors:
Eugene A. Feinberg,
Gaojin He
Abstract:
This paper studies discounted Markov Decision Processes (MDPs) with finite sets of states and actions. Value iteration is one of the major methods for finding optimal policies. For each discount factor, starting from a finite number of iterations, which is called the turnpike integer, value iteration algorithms always generate decision rules, which are deterministic optimal policies for the infinite-horizon problems. This fact justifies the rolling horizon approach for computing infinite-horizon optimal policies by conducting a finite number of value iterations. This paper describes properties of turnpike integers and provides their upper bounds.
Submitted 7 February, 2025;
originally announced February 2025.
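As a rough illustration of the turnpike phenomenon described in this abstract, the sketch below runs value iteration on a small discounted MDP and records the greedy decision rule at each iteration. The transition probabilities `P`, rewards `R`, discount factor `BETA`, and the helper `last_change` are all hypothetical and are not taken from the paper; `last_change` is only an empirical proxy observed over a finite run, not the turnpike integer or any bound from the paper.

```python
# Hypothetical two-state, two-action discounted MDP (all numbers are illustrative).
# P[s][a] lists next-state probabilities; R[s][a] is the one-step reward.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
R = [
    [1.0, 0.0],
    [0.0, 2.0],
]
BETA = 0.9  # discount factor (assumed)


def q_values(v):
    """One-step lookahead values Q(s, a) for the current value estimate v."""
    return [
        [R[s][a] + BETA * sum(p * v[t] for t, p in enumerate(P[s][a]))
         for a in range(len(R[s]))]
        for s in range(len(R))
    ]


def value_iteration(n_iter):
    """Run n_iter value iterations from v = 0; return the values and greedy rules."""
    v = [0.0] * len(R)
    rules = []
    for _ in range(n_iter):
        q = q_values(v)
        # Greedy decision rule: a maximizing action in every state.
        rules.append(tuple(max(range(len(row)), key=row.__getitem__) for row in q))
        v = [max(row) for row in q]
    return v, rules


def last_change(rules):
    """1-based iteration from which the greedy rule stays constant within this run."""
    n = len(rules)
    while n > 1 and rules[n - 2] == rules[-1]:
        n -= 1
    return n


if __name__ == "__main__":
    values, rules = value_iteration(50)
    print("greedy rule at the end of the run:", rules[-1])
    print("rule is constant from iteration:", last_change(rules))
```

In this toy instance the greedy rule changes once early in the run and then stays fixed, which is the behavior that makes a rolling-horizon computation with finitely many iterations sensible.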
-
Average-Cost MDPs with Infinite State and Action Sets: New Sufficient Conditions for Optimality Inequalities and Equations
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Liliia S. Paliichuk
Abstract:
This paper studies discrete-time average-cost infinite-horizon Markov decision processes (MDPs) with Borel state and action sets. It introduces new sufficient conditions for the validity of optimality inequalities and optimality equations for MDPs with weakly and setwise continuous transition probabilities. These inequalities and equations imply the existence of deterministic optimal policies.
Submitted 26 January, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Continuity of Filters for Discrete-Time Control Problems Defined by Explicit Equations
Authors:
Eugene A. Feinberg,
Sayaka Ishizawa,
Pavlo O. Kasyanov,
David N. Kraemer
Abstract:
Discrete time control systems whose dynamics and observations are described by stochastic equations are common in engineering, operations research, health care, and economics. For example, stochastic filtering problems are usually defined via stochastic equations. These problems can be reduced to Markov decision processes (MDPs) whose states are posterior state distributions, and transition probabilities for such MDPs are sometimes called filters. This paper investigates sufficient conditions on transition and observation functions for the original problems to guarantee weak continuity of the filter. Under mild conditions on cost functions, weak continuity implies the existence of optimal policies minimizing the expected total costs, the validity of optimality equations, and convergence of value iterations to optimal values. This paper uses recent results on weak continuity of filters for partially observable MDPs defined by transition and observation probabilities. It develops a criterion of weak continuity of transition probabilities and a sufficient condition for continuity in total variation of transition probabilities. The results are illustrated with applications to filtering problems.
Submitted 3 February, 2025; v1 submitted 20 November, 2023;
originally announced November 2023.
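The reduction to an MDP whose states are posterior distributions can be illustrated in the simplest finite case. The sketch below is a standard Bayes filter step for a finite POMDP, not the construction from the paper, which works with Borel spaces and stochastic equations. The function name `belief_update` and the arrays `T` and `O` are hypothetical.

```python
def belief_update(belief, T, O, action, obs):
    """Posterior over next states given a prior belief, an action, and an observation.

    T[a][s][t] is the probability of moving from state s to state t under action a.
    O[a][t][y] is the probability of observing y after action a leads to state t.
    """
    n = len(belief)
    # Prediction step: push the prior belief through the transition probabilities.
    predicted = [sum(belief[s] * T[action][s][t] for s in range(n)) for t in range(n)]
    # Correction step: weight each predicted state by the observation likelihood.
    unnormalized = [predicted[t] * O[action][t][obs] for t in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Illustrative numbers only: two states, one action, two observations.
    T = [[[0.7, 0.3], [0.2, 0.8]]]
    O = [[[0.9, 0.1], [0.3, 0.7]]]
    print(belief_update([0.5, 0.5], T, O, action=0, obs=0))
```

The map from (belief, action, observation) to the new belief, combined with the distribution of the observation, is what determines the transition probability of the belief MDP whose weak continuity the abstract discusses.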
-
Sequential Optimization of CVaR
Authors:
Rui Ding,
Eugene A. Feinberg
Abstract:
This paper studies optimization of the Conditional Value at Risk (CVaR) for a discounted total-cost Markov Decision Process (MDP) with finite state and action sets. This CVaR optimization problem can be reformulated as a Robust MDP (RMDP) with a compact state space. States in this RMDP are the original states of the problem augmented with tail risk levels, and the Decision Maker (DM) knows only the initial tail risk level at the initial state and time. Thus, in order to find an optimal policy following this approach, the DM needs to solve an RMDP with incomplete state observations because after the first move, the DM observes the states of the system, but the tail risk levels are unknown. This paper shows that for the CVaR optimization problem the corresponding RMDP can be solved by using the methods of convex analysis. This paper introduces the algorithm for computing and implementing an optimal CVaR policy by using the value function for the version of this RMDP with completely observable tail risk levels at all states. This algorithm and the major results of the paper are presented for a more general problem of optimization of the sum of a mean and CVaR for possibly different cost functions.
Submitted 14 February, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
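For readers unfamiliar with CVaR itself, the sketch below evaluates it for a single discrete cost distribution using the well-known Rockafellar-Uryasev variational formula, CVaR_α(X) = min_z { z + E[(X − z)^+] / (1 − α) }. It is not the sequential algorithm from the paper. The function name `cvar` and the convention that `alpha` is a confidence level in [0, 1) with larger costs being worse are assumptions made here; the paper's tail risk levels may be parameterized differently.

```python
def cvar(costs, probs, alpha):
    """CVaR at confidence level alpha of a discrete cost distribution.

    Larger costs are worse; alpha = 0 gives the plain expectation.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")

    def objective(z):
        # z + E[(X - z)^+] / (1 - alpha)
        excess = sum(p * max(c - z, 0.0) for c, p in zip(costs, probs))
        return z + excess / (1.0 - alpha)

    # The objective is convex and piecewise linear in z with kinks at the
    # atoms of the distribution, so its minimum is attained at one of them.
    return min(objective(z) for z in costs)


if __name__ == "__main__":
    # Illustrative distribution: the worst 10% of outcomes cost 10 or 100.
    print(cvar([0.0, 10.0, 100.0], [0.9, 0.05, 0.05], alpha=0.9))
```

With these illustrative numbers the worst 10% of outcomes are split evenly between costs 10 and 100, so the value is their average.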
-
Epi-Convergence of Expectation Functions under Varying Measures and Integrands
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Johannes O. Royset
Abstract:
For expectation functions on metric spaces, we provide sufficient conditions for epi-convergence under varying probability measures and integrands, and examine applications in the area of sieve estimators, mollifier smoothing, PDE-constrained optimization, and stochastic optimization with expectation constraints. As a stepping stone to epi-convergence of independent interest, we develop parametric Fatou's lemmas under mild integrability assumptions. In the setting of Suslin metric spaces, the assumptions are expressed in terms of Pasch-Hausdorff envelopes. For general metric spaces, the assumptions shift to semicontinuity of integrands also on the sample space, which then is assumed to be a metric space.
Submitted 7 August, 2022;
originally announced August 2022.
-
Equivalent Conditions for Weak Continuity of Nonlinear Filters
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov
Abstract:
This paper studies weak continuity of nonlinear filters. It is well-known that Borel measurability of transition probabilities for problems with incomplete state observations is preserved when the original discrete-time process is replaced with the process whose states are belief probabilities. It is also known that the similar preservation may not hold for weak continuity of transition probabilities. In this paper we show that the sufficient condition for weak continuity of transition probabilities for beliefs introduced by Kara, Saldi, and Yuksel (2019) is a necessary and sufficient condition for semi-uniform Feller continuity of transition probabilities. The property of semi-uniform Feller continuity was introduced in Feinberg, Kasyanov, and Zgurovsky (2021), and, if the original transition probability has this property, then the transition probability of the process, whose state is a pair consisting of the belief probability and observation, also has this property. Thus, this property implies weak continuity of nonlinear filters. This paper also reviews several necessary and sufficient conditions for semi-uniform Feller continuity.
Submitted 22 March, 2023; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Continuity of Discounted Values and the Structure of Optimal Policies for Periodic-Review Inventory Control with Setup Costs
Authors:
Eugene A. Feinberg,
David N. Kraemer
Abstract:
This paper proves continuity of value functions in discounted periodic-review single-commodity total-cost inventory control problems with continuous inventory levels, fixed ordering costs, possibly bounded inventory storage capacity, and possibly bounded order sizes for finite and infinite horizons. In each of these constrained models, the finite and infinite-horizon value functions are continuous, there exist deterministic Markov optimal finite-horizon policies, and there exist stationary deterministic Markov optimal infinite-horizon policies. For models with bounded inventory storage and unbounded order sizes, this paper also characterizes the conditions under which $(s_t, S_t)$ policies are optimal in the finite horizon and an $(s,S)$ policy is optimal in the infinite horizon.
Submitted 26 July, 2022; v1 submitted 29 December, 2021;
originally announced December 2021.
-
Continuity of Parametric Optima for Possibly Discontinuous Functions and Noncompact Decision Sets
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
David N. Kraemer
Abstract:
This paper investigates continuity properties of value functions and solutions for parametric optimization problems. These problems are important in operations research, control, and economics because optimality equations are their particular cases. The classic fact, Berge's maximum theorem, gives sufficient conditions for continuity of value functions and upper semicontinuity of solution multifunctions. Berge's maximum theorem assumes that the objective function is continuous and the multifunction of feasible sets is compact-valued. These assumptions are not satisfied in many applied problems, which historically has limited the relevance of the theorem. This paper generalizes Berge's maximum theorem in three directions: (i) the objective function may not be continuous, (ii) the multifunction of feasible sets may not be compact-valued, and (iii) necessary and sufficient conditions are provided. To illustrate the main theorem, this paper provides applications to inventory control and to the analysis of robust optimization over possibly noncompact action sets and discontinuous objective functions.
Submitted 13 September, 2021;
originally announced September 2021.
-
Kolmogorov's Equations for Jump Markov Processes and their Applications to Control Problems
Authors:
Eugene A. Feinberg,
Albert N. Shiryaev
Abstract:
This paper describes the structure of solutions to Kolmogorov's equations for nonhomogeneous jump Markov processes and applications of these results to control of jump stochastic systems. These equations were studied by Feller (1940), who clarified in 1945 in the errata to that paper that some of its results covered only nonexplosive Markov processes. In this work, which is largely of a survey nature, the case of explosive processes is also considered. This paper is based on the invited talk presented by the authors at the conference "Chebyshev-200", and it describes the results of their joint studies with Manasa Mandava (1984-2019).
Submitted 7 November, 2021; v1 submitted 10 September, 2021;
originally announced September 2021.
-
Markov Decision Processes with Incomplete Information and Semi-Uniform Feller Transition Probabilities
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
This paper deals with control of partially observable discrete-time stochastic systems. It introduces and studies Markov Decision Processes with Incomplete Information and with semi-uniform Feller transition probabilities. The important feature of these models is that their classic reduction to Completely Observable Markov Decision Processes with belief states preserves semi-uniform Feller continuity of transition probabilities. Under mild assumptions on cost functions, optimal policies exist, optimality equations hold, and value iterations converge to optimal values for these models. In particular, for Partially Observable Markov Decision Processes the results of this paper imply new and generalize several known sufficient conditions on transition and observation probabilities for weak continuity of transition probabilities for Markov Decision Processes with belief states, the existence of optimal policies, validity of optimality equations defining optimal policies, and convergence of value iterations to optimal values.
Submitted 26 August, 2022; v1 submitted 20 August, 2021;
originally announced August 2021.
-
Semi-Uniform Feller Stochastic Kernels
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
This paper studies transition probabilities from a Borel subset of a Polish space to a product of two Borel subsets of Polish spaces. For such transition probabilities it introduces and studies the property of semi-uniform Feller continuity. This paper provides several equivalent definitions of semi-uniform Feller continuity and establishes its preservation under integration. The motivation for this study came from the theory of Markov decision processes with incomplete information, and this paper provides fundamental results useful for this theory.
Submitted 5 January, 2023; v1 submitted 5 July, 2021;
originally announced July 2021.
-
Average Cost Markov Decision Processes with Semi-Uniform Feller Transition Probabilities
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
This paper studies average-cost Markov decision processes with semi-uniform Feller transition probabilities. This class of MDPs was recently introduced by the authors to study MDPs with incomplete information. This paper studies the validity of optimality inequalities, the existence of optimal policies, and the approximations of optimal policies by policies optimizing total discounted costs.
Submitted 12 August, 2021; v1 submitted 24 March, 2021;
originally announced March 2021.
-
MDPs with Setwise Continuous Transition Probabilities
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov
Abstract:
This paper describes the structure of optimal policies for infinite-state Markov Decision Processes with setwise continuous transition probabilities. The action sets may be noncompact. The objective criteria are either the expected total discounted and undiscounted costs or average costs per unit time. The analysis of optimality equations and inequalities is based on the optimal selection theorem for inf-compact functions introduced in this paper.
Submitted 30 July, 2021; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Sufficiency of Markov Policies for Continuous-Time Jump Markov Decision Processes
Authors:
Eugene A. Feinberg,
Manasa Mandava,
Albert N. Shiryaev
Abstract:
This paper extends to Continuous-Time Jump Markov Decision Processes (CTJMDP) the classic result for Markov Decision Processes stating that, for a given initial state distribution, for every policy there is a (randomized) Markov policy, which can be defined in a natural way, such that at each time instance the marginal distributions of state-action pairs for these two policies coincide. It is shown in this paper that this equality takes place for a CTJMDP if the corresponding Markov policy defines a nonexplosive jump Markov process. If this Markov process is explosive, then at each time instance the marginal probability, that a state-action pair belongs to a measurable set of state-action pairs, is not greater for the described Markov policy than the same probability for the original policy. These results are used in this paper to prove that for expected discounted total costs and for average costs per unit time, for a given initial state distribution, for each policy for a CTJMDP the described Markov policy has the same or better performance.
Submitted 14 May, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
-
Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming
Authors:
Eugene A. Feinberg,
Gaojin He
Abstract:
This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given discount factor, magnitude of the reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of the provided upper bounds has the property that it is a non-decreasing function of the value of the discount factor.
Submitted 28 January, 2020;
originally announced January 2020.
-
Step Change Improvement in ADMET Prediction with PotentialNet Deep Featurization
Authors:
Evan N. Feinberg,
Robert Sheridan,
Elizabeth Joshi,
Vijay S. Pande,
Alan C. Cheng
Abstract:
The Absorption, Distribution, Metabolism, Elimination, and Toxicity (ADMET) properties of drug candidates are estimated to account for up to 50% of all clinical trial failures. Predicting ADMET properties has therefore been of great interest to the cheminformatics and medicinal chemistry communities in recent decades. Traditional cheminformatics approaches, whether the learner is a random forest or a deep neural network, leverage fixed fingerprint feature representations of molecules. In contrast, in this paper, we learn the features most relevant to each chemical task at hand by representing each molecule explicitly as a graph, where each node is an atom and each edge is a bond. By applying graph convolutions to this explicit molecular representation, we achieve, to our knowledge, unprecedented accuracy in prediction of ADMET properties. By challenging our methodology with rigorous cross-validation procedures and prospective analyses, we show that deep featurization better enables molecular predictors to not only interpolate but also extrapolate to new regions of chemical space.
Submitted 28 March, 2019;
originally announced March 2019.
-
A Class of Solvable Markov Decision Models with Incomplete Information
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
This paper investigates natural conditions for the existence of optimal policies for a Markov decision process with incomplete information (MDPII) and with expected total costs. The MDPII is the classic model of a controlled stochastic process with incomplete state observations which is more general than Partially Observable Markov Decision Processes (POMDPs). For MDPIIs we introduce the notion of a semi-uniform Feller transition probability, which is stronger than the notion of a weakly continuous transition probability. We show that an MDPII has a semi-uniform Feller transition probability if and only if the corresponding belief MDP also has a semi-uniform Feller transition probability. This fact has several corollaries. In particular, it provides new and implies all known sufficient conditions for the existence of optimal policies for POMDPs with expected total costs.
Submitted 28 September, 2021; v1 submitted 27 March, 2019;
originally announced March 2019.
-
Fatou's Lemma in Its Classic Form and Lebesgue's Convergence Theorems for Varying Measures with Applications to MDPs
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Yan Liang
Abstract:
The classic Fatou lemma states that the lower limit of a sequence of integrals of functions is greater than or equal to the integral of the lower limit. It is known that Fatou's lemma for a sequence of weakly converging measures states a weaker inequality because the integral of the lower limit is replaced with the integral of the lower limit in two parameters, where the second parameter is the argument of the functions. This paper provides sufficient conditions when Fatou's lemma holds in its classic form for a sequence of weakly converging measures. The functions can take both positive and negative values. The paper also provides similar results for sequences of setwise converging measures. It also provides Lebesgue's and monotone convergence theorems for sequences of weakly and setwise converging measures. The obtained results are used to prove broad sufficient conditions for the validity of optimality equations for average-cost Markov decision processes.
Submitted 17 June, 2019; v1 submitted 4 February, 2019;
originally announced February 2019.
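The two inequalities contrasted in this abstract can be written out as follows. This is a sketch of the standard statements for nonnegative measurable functions $f_n$ on a metric space $S$, with notation chosen here rather than taken from the paper; the paper's own results concern functions of both signs under additional conditions.

```latex
% Classic Fatou lemma for a fixed measure \mu and nonnegative measurable f_n:
\int_S \liminf_{n\to\infty} f_n(s)\,\mu(ds)
  \;\le\; \liminf_{n\to\infty} \int_S f_n(s)\,\mu(ds).

% Version for measures \mu_n converging weakly to \mu: the lower limit on the
% left is taken jointly in n and in the argument s', which can only decrease it,
% so this inequality is weaker than the classic form.
\int_S \liminf_{n\to\infty,\; s'\to s} f_n(s')\,\mu(ds)
  \;\le\; \liminf_{n\to\infty} \int_S f_n(s)\,\mu_n(ds).
```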
-
Fatou's Lemma for Weakly Converging Measures under the Uniform Integrability Condition
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Yan Liang
Abstract:
This note describes Fatou's lemma and Lebesgue's dominated convergence theorem for a sequence of measures converging weakly to a finite measure and for a sequence of functions whose negative parts are uniformly integrable with respect to these measures. The note also provides new formulations of uniform Fatou's lemma, uniform Lebesgue convergence theorem, the Dunford-Pettis theorem, and the fundamental theorem for Young measures based on the equivalence of uniform integrability and the apparently weaker property of asymptotic uniform integrability for sequences of functions and finite measures.
Submitted 27 March, 2019; v1 submitted 20 July, 2018;
originally announced July 2018.
-
Sufficiency of Deterministic Policies for Atomless Discounted and Uniformly Absorbing MDPs with Multiple Criteria
Authors:
Eugene A. Feinberg,
Aleksey B. Piunovskiy
Abstract:
This paper studies Markov Decision Processes (MDPs) with atomless initial state distributions and atomless transition probabilities. Such MDPs are called atomless. The initial state distribution is considered to be fixed. We show that for discounted MDPs with bounded one-step reward vector-functions, for each policy there exists a deterministic (that is, nonrandomized and stationary) policy with the same performance vector. This fact is proved in the paper for a more general class of uniformly absorbing MDPs with expected total costs, and then it is extended under certain assumptions to MDPs with unbounded rewards. For problems with multiple criteria and constraints, the results of this paper imply that for atomless MDPs studied in this paper it is sufficient to consider only deterministic policies, while without the atomless assumption it is well-known that randomized policies can outperform deterministic ones. We also provide an example of an MDP demonstrating that, if a vector measure is defined on a standard Borel space, then Lyapunov's convexity theorem is a special case of the described results.
Submitted 25 October, 2018; v1 submitted 15 June, 2018;
originally announced June 2018.
-
Constrained discounted Markov decision processes with Borel state spaces
Authors:
Eugene A. Feinberg,
Anna Jaśkiewicz,
Andrzej S. Nowak
Abstract:
We study discrete-time discounted constrained Markov decision processes (CMDPs) on Borel spaces with unbounded reward functions. In our approach the transition probability functions are weakly or set-wise continuous. The reward functions are upper semicontinuous in state-action pairs or semicontinuous in actions. Our aim is to study models with unbounded reward functions, which are often encountered in applications, e.g., in consumption/investment problems. We provide some general assumptions under which the optimization problems in CMDPs are solvable in the class of stationary randomized policies. Then, we indicate that if the initial distribution and transition probabilities are non-atomic, then using a general purification result of Feinberg and Piunovskiy, stationary optimal policies can be deterministic. Our main results are illustrated by five examples.
Submitted 27 March, 2019; v1 submitted 1 June, 2018;
originally announced June 2018.
-
Binding Pathway of Opiates to $μ$ Opioid Receptors Revealed by Unsupervised Machine Learning
Authors:
Amir Barati Farimani,
Evan N. Feinberg,
Vijay S. Pande
Abstract:
Many important analgesics relieve pain by binding to the $μ$-Opioid Receptor ($μ$OR), which makes the $μ$OR among the most clinically relevant proteins of the G Protein Coupled Receptor (GPCR) family. Despite previous studies on the activation pathways of the GPCRs, the mechanism of opiate binding and the selectivity of $μ$OR are largely unknown. We performed extensive molecular dynamics (MD) simulation and analysis to find the selective allosteric binding sites of the $μ$OR and the path opiates take to bind to the orthosteric site. In this study, we predicted that the allosteric site is responsible for the attraction and selection of opiates. Using Markov state models and machine learning, we traced the pathway of opiates in binding to the orthosteric site, the main binding pocket. Our results have important implications in designing novel analgesics.
Submitted 22 April, 2018;
originally announced April 2018.
-
Machine Learning Harnesses Molecular Dynamics to Discover New $μ$ Opioid Chemotypes
Authors:
Evan N. Feinberg,
Amir Barati Farimani,
Rajendra Uprety,
Amanda Hunkele,
Gavril W. Pasternak,
Susruta Majumdar,
Vijay S. Pande
Abstract:
Computational chemists typically assay drug candidates by virtually screening compounds against crystal structures of a protein despite the fact that some targets, like the $μ$ Opioid Receptor and other members of the GPCR family, traverse many non-crystallographic states. We discover new conformational states of $μ$OR with molecular dynamics simulation and then machine learn ligand-structure relationships to predict opioid ligand function. These artificial intelligence models identified a novel $μ$ opioid chemotype.
Submitted 12 March, 2018;
originally announced March 2018.
-
PotentialNet for Molecular Property Prediction
Authors:
Evan N. Feinberg,
Debnil Sur,
Zhenqin Wu,
Brooke E. Husic,
Huanghao Mai,
Yang Li,
Saisai Sun,
Jianyi Yang,
Bharath Ramsundar,
Vijay S. Pande
Abstract:
The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning---instead of feature engineering---deep neural networks promise to outperform both traditional physics-based and knowledge-based machine learning models for predicting molecular properties pertinent to drug discovery. To this end, we present the PotentialNet family of graph convolutions. These models are specifically designed for and achieve state-of-the-art performance for protein-ligand binding affinity. We further validate these deep neural networks by setting new standards of performance in several ligand-based tasks. In parallel, we introduce a new metric, the Regression Enrichment Factor $EF_χ^{(R)}$, to measure the early enrichment of computational models for chemical data. Finally, we introduce a cross-validation strategy based on structural homology clustering that can more accurately measure model generalizability, which crucially distinguishes the aims of machine learning for drug discovery from standard machine learning tasks.
Submitted 22 October, 2018; v1 submitted 12 March, 2018;
originally announced March 2018.
-
An example showing that A-lower semi-continuity is essential for minimax continuity theorems
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
Recently Feinberg et al. [arXiv:1609.03990] established results on continuity properties of minimax values and solution sets for a function of two variables depending on a parameter. Such minimax problems appear in games with perfect information, when the second player knows the move of the first one, in turn-based games, and in robust optimization. Some of the results in [arXiv:1609.03990] are proved under the assumption that the multifunction, defining the domains of the second variable, is $A$-lower semi-continuous. The $A$-lower semi-continuity property is stronger than lower semi-continuity, but in several important cases these properties coincide. This note provides an example demonstrating that in general the $A$-lower semi-continuity assumption cannot be relaxed to lower semi-continuity.
Submitted 10 February, 2018; v1 submitted 6 February, 2018;
originally announced February 2018.
-
Reduction of total-cost and average-cost MDPs with weakly continuous transition probabilities to discounted MDPs
Authors:
Eugene A. Feinberg,
Jefferson Huang
Abstract:
This note describes sufficient conditions under which total-cost and average-cost Markov decision processes (MDPs) with general state and action spaces, and with weakly continuous transition probabilities, can be reduced to discounted MDPs. For undiscounted problems, these reductions imply the validity of optimality equations and the existence of stationary optimal policies. The reductions also provide methods for computing optimal policies. The results are applied to a capacitated inventory control problem with fixed costs and lost sales.
Submitted 17 November, 2017;
originally announced November 2017.
-
Stochastic Setup-Cost Inventory Model with Backorders and Quasiconvex Cost Functions
Authors:
Eugene A. Feinberg,
Yan Liang
Abstract:
In this paper we study a periodic-review single-commodity setup-cost inventory model with backorders and holding/backlog costs satisfying quasiconvexity assumptions. We show that the Markov decision process for this inventory model satisfies the assumptions that lead to the validity of optimality equations for discounted and average-cost problems and to the existence of optimal $(s,S)$ policies. In particular, we prove the equicontinuity of the family of discounted value functions and the convergence of optimal discounted lower thresholds to the optimal average-cost one for some sequences of discount factors converging to $1$. If an arbitrary nonnegative amount of inventory can be ordered, we establish stronger convergence properties: (i) the optimal discounted lower thresholds $s_α$ converge to the optimal average-cost lower threshold $s$; and (ii) the discounted relative value functions converge to the average-cost relative value function. These convergence results previously were known only for subsequences of discount factors even for problems with convex holding/backlog costs. The results of this paper hold for problems with deterministic positive lead times.
Submitted 7 November, 2017; v1 submitted 18 May, 2017;
originally announced May 2017.
-
Solutions for Zero-Sum Two-Player Games with Noncompact Decision Sets and Unbounded Payoffs
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
This paper provides sufficient conditions for the existence of solutions for two-person zero-sum games with inf/sup-compact payoff functions and with possibly noncompact decision sets for both players. Payoff functions may be unbounded, and we do not assume any convexity/concavity-type conditions. For such games expected payoff may not exist for some pairs of strategies. The results of this paper imply several classic facts. The paper also provides sufficient conditions for the existence of a value and solutions for each player. The results of this paper are illustrated with the number guessing game.
Submitted 20 December, 2021; v1 submitted 14 April, 2017;
originally announced April 2017.
-
Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
Authors:
Joseph Gomes,
Bharath Ramsundar,
Evan N. Feinberg,
Vijay S. Pande
Abstract:
Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction.
Submitted 30 March, 2017;
originally announced March 2017.
-
Irradiation of Materials with Short, Intense Ion pulses at NDCX-II
Authors:
P. A. Seidl,
Q. Ji,
A. Persaud,
E. Feinberg,
B. Ludewigt,
M. Silverman,
A. Sulyman,
W. L. Waldron,
T. Schenkel,
J. J. Barnard,
A. Friedman,
D. P. Grote,
E. P. Gilson,
I. D. Kaganovich,
A. D. Stepanov,
F. Treffert,
M. Zimmer
Abstract:
We present an overview of the performance of the Neutralized Drift Compression Experiment-II (NDCX-II) accelerator at Berkeley Lab, and report on recent target experiments on beam-driven melting and transmission ion energy loss measurements with nanosecond and millimeter-scale ion beam pulses and thin tin foils. Bunches with around 10^11 ions, 1-mm radius, and 2-30 ns FWHM duration have been created with corresponding fluences in the range of 0.1 to 0.7 J/cm^2. To achieve these short pulse durations and mm-scale focal spot radii, the 1.1 MeV He+ ion beam is neutralized in a drift compression section, which removes the space charge defocusing effect during final compression and focusing. The beam space charge and drift compression techniques resemble necessary beam conditions and manipulations in heavy ion inertial fusion accelerators. Quantitative comparison of detailed particle-in-cell simulations with the experiment plays an important role in optimizing accelerator performance.
Submitted 12 April, 2017; v1 submitted 16 March, 2017;
originally announced March 2017.
-
MoleculeNet: A Benchmark for Molecular Machine Learning
Authors:
Zhenqin Wu,
Bharath Ramsundar,
Evan N. Feinberg,
Joseph Gomes,
Caleb Geniesse,
Aneesh S. Pappu,
Karl Leswing,
Vijay Pande
Abstract:
Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
Submitted 25 October, 2018; v1 submitted 1 March, 2017;
originally announced March 2017.
-
Staging of RF-accelerating units in a MEMS-based ion accelerator
Authors:
A. Persaud,
P. A. Seidl,
Q. Ji,
E. Feinberg,
W. L. Waldron,
T. Schenkel,
S. Ardanuc,
K. B. Vinayakumar,
A. Lal
Abstract:
Multiple Electrostatic Quadrupole Array Linear Accelerators (MEQALACs) provide an opportunity to realize compact radio-frequency (RF) accelerator structures that can deliver very high beam currents. MEQALACs have been previously realized with acceleration gap distances and beam aperture sizes of the order of centimeters. Through advances in Micro-Electro-Mechanical Systems (MEMS) fabrication, MEQALACs can now be scaled down to the sub-millimeter regime and batch processed on wafer substrates. In this paper, we show first results from using three RF stages in a compact MEMS-based ion accelerator. The results presented show proof-of-concept with accelerator structures formed from printed circuit boards using a 3x3 beamlet arrangement and noble gas ions at 10 keV. We present a simple model to describe the measured results. The model is then used to examine some of the aspects of this approach, such as possible effects of alignment errors. We also discuss some of the scaling behaviour of a compact MEQALAC. The MEMS-based approach enables a low-cost, highly versatile accelerator covering a wide range of beam energies and currents. Applications include ion-beam analysis, mass spectrometry, materials processing, and at very high beam powers, plasma heating.
Submitted 31 October, 2017; v1 submitted 1 February, 2017;
originally announced February 2017.
-
A compact linear accelerator based on a scalable microelectromechanical-system RF-structure
Authors:
A. Persaud,
Q. Ji,
E. Feinberg,
P. A. Seidl,
W. L. Waldron,
A. Lal,
K. B. Vinayakumar,
S. Ardanuc,
D. A. Hammer,
T. Schenkel
Abstract:
A new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using printed circuit boards (PCBs), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number of parallel beamlets. Using scalable microelectromechanical systems (MEMS) fabrication approaches, we are working on further reducing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application-specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using PCB. In this paper, we present proof-of-concept results of the principal components using PCB: RF acceleration and ESQ focusing. Ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.
△ Less
Submitted 29 June, 2017; v1 submitted 30 October, 2016;
originally announced October 2016.
-
Recent Experiments at NDCX-II: Irradiation of Materials Using Short, Intense Ion Beams
Authors:
P. A. Seidl,
Q. Ji,
A. Persaud,
E. Feinberg,
B. Ludewigt,
M. Silverman,
A. Sulyman,
W. L. Waldron,
T. Schenkel,
J. J. Barnard,
A. Friedman,
D. P. Grote,
E. P. Gilson,
I. D. Kaganovich,
A. Stepanov,
F. Treffert,
M. Zimmer
Abstract:
We present an overview of the performance of the Neutralized Drift Compression Experiment-II (NDCX-II) accelerator at Berkeley Lab, and summarize recent studies of material properties created with nanosecond and millimeter-scale ion beam pulses. The scientific topics being explored include the dynamics of ion-induced damage in materials, materials synthesis far from equilibrium, warm dense matter and intense beam-plasma physics. We summarize the improved accelerator performance, diagnostics and results of beam-induced irradiation of thin samples of, e.g., tin and silicon. Bunches with over 3x10^10 ions, 1-mm radius, and 2-30 ns FWHM duration have been created. To achieve these short pulse durations and mm-scale focal spot radii, the 1.2 MeV He+ ion beam is neutralized in a drift compression section, which removes the space charge defocusing effect during final compression and focusing. Quantitative comparison of detailed particle-in-cell simulations with the experiment plays an important role in optimizing accelerator performance; these keep pace with the accelerator repetition rate of ~1/minute.
Submitted 17 October, 2016;
originally announced October 2016.
-
On the Optimality Equation for Average Cost Markov Decision Processes and its Validity for Inventory Control
Authors:
Eugene A. Feinberg,
Yan Liang
Abstract:
As is well known, average-cost optimality inequalities imply the existence of stationary optimal policies for Markov Decision Processes with average costs per unit time, and these inequalities hold under broad natural conditions. This paper provides sufficient conditions for the validity of the average-cost optimality equation for an infinite state problem with weakly continuous transition probabilities and with possibly unbounded one-step costs and noncompact action sets. These conditions also imply the convergence of sequences of discounted relative value functions to average-cost relative value functions and the continuity of average-cost relative value functions. As shown in the paper, the classic periodic-review inventory control problem satisfies these conditions. Therefore, the optimality inequality holds in the form of an equality with a continuous average-cost relative value function for this problem. In addition, the $K$-convexity of discounted relative value functions and their convergence to average-cost relative value functions, when the discount factor increases to 1, imply the $K$-convexity of average-cost relative value functions. This implies that average-cost optimal $(s,S)$ policies for the inventory control problem can be derived from the average-cost optimality equation.
Submitted 2 October, 2016; v1 submitted 27 September, 2016;
originally announced September 2016.
-
Continuity of Equilibria for Two-Person Zero-Sum Games with Noncompact Action Sets and Unbounded Payoffs
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
This paper extends Berge's maximum theorem for possibly noncompact action sets and unbounded cost functions to minimax problems and studies applications of these extensions to two-player zero-sum games with possibly noncompact action sets and unbounded payoffs. For games with perfect information, also known under the name of turn-based games, this paper establishes continuity properties of value functions and solution multifunctions. For games with simultaneous moves, it provides results on the existence of lopsided values (the values in the asymmetric form) and solutions. This paper also establishes continuity properties of the lopsided values and solution multifunctions.
Submitted 12 September, 2017; v1 submitted 13 September, 2016;
originally announced September 2016.
-
Structure of Optimal Solutions to Periodic-Review Total-Cost Inventory Control Models with Convex Costs and Backorders for all Values of Discount Factors
Authors:
Eugene A. Feinberg,
Yan Liang
Abstract:
This paper describes the structure of optimal policies for discounted periodic-review single-commodity total-cost inventory control problems with fixed ordering costs for finite and infinite horizons. There are known conditions in the literature for the optimality of $(s_t,S_t)$ policies for finite-horizon problems and the optimality of $(s,S)$ policies for infinite-horizon problems. The results of this paper cover the situation when such conditions may not hold. This paper describes a parameter, which, together with the value of the discount factor and the horizon length, defines the structure of an optimal policy. For the infinite horizon, depending on the values of this parameter and the discount factor, an optimal policy either is an $(s,S)$ policy or never orders inventory. For a finite horizon, depending on the values of this parameter, the discount factor, and the horizon length, there are three possible structures of an optimal policy: (i) it is an $(s_t,S_t)$ policy, (ii) it is an $(s_t,S_t)$ policy at earlier stages and then does not order inventory, or (iii) it never orders inventory. The paper also establishes continuity of optimal value functions and describes alternative optimal actions at states $s_t$ and $s$.
Submitted 28 May, 2017; v1 submitted 13 September, 2016;
originally announced September 2016.
-
Optimality Conditions for Inventory Control
Authors:
Eugene A. Feinberg
Abstract:
This tutorial describes recently developed general optimality conditions for Markov Decision Processes that have significant applications to inventory control. In particular, these conditions imply the validity of optimality equations and inequalities. They also imply the convergence of value iteration algorithms. For total discounted-cost problems only two mild conditions on the continuity of transition probabilities and lower semi-continuity of one-step costs are needed. For average-cost problems, a single additional assumption on the finiteness of relative values is required. The general results are applied to periodic-review inventory control problems with discounted and average-cost criteria without any assumptions on demand distributions. The case of partially observable states is also discussed.
Submitted 2 June, 2016;
originally announced June 2016.
-
Kolmogorov's Equations for Jump Markov Processes with Unbounded Jump Rates
Authors:
Eugene A. Feinberg,
Manasa Mandava,
Albert N. Shiryaev
Abstract:
As is well known, transition probabilities of jump Markov processes satisfy Kolmogorov's backward and forward equations. In his seminal 1940 paper, William Feller investigated solutions of Kolmogorov's equations for jump Markov processes. Recently the authors solved the problem studied by Feller and showed that the minimal solution of Kolmogorov's backward and forward equations is the transition probability of the corresponding jump Markov process if the transition rate at each state is bounded. This paper presents more general results. For Kolmogorov's backward equation, the sufficient condition for the described property of the minimal solution is that the transition rate at each state is locally integrable, and for Kolmogorov's forward equation the corresponding sufficient condition is that the transition rate at each state is locally bounded.
Submitted 6 December, 2016; v1 submitted 7 March, 2016;
originally announced March 2016.
-
On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of $(s,S)$ Inventory Policies
Authors:
Eugene A. Feinberg,
Mark E. Lewis
Abstract:
This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic-review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs with possibly noncompact action sets and unbounded cost functions: (i) convergence of value iterations to optimal values for discounted problems with possibly non-zero terminal costs, (ii) convergence of optimal finite-horizon actions to optimal infinite-horizon actions for total discounted costs, as the time horizon tends to infinity, and (iii) convergence of optimal discount-cost actions to optimal average-cost actions for infinite-horizon problems, as the discount factor tends to 1.
When applied to the setup-cost inventory control problem, the general results on MDPs imply the optimality of $(s,S)$ policies and convergence properties of optimal thresholds. In particular, this paper analyzes the setup-cost inventory control problem without two assumptions often used in the literature: (a) the demand is either discrete or continuous or (b) the backordering cost is higher than the cost of backordered inventory if the amount of backordered inventory is large.
Submitted 20 March, 2017; v1 submitted 17 July, 2015;
originally announced July 2015.
-
On the Reduction of Total-Cost and Average-Cost MDPs to Discounted MDPs
Authors:
Eugene A. Feinberg,
Jefferson Huang
Abstract:
This paper provides conditions under which total-cost and average-cost Markov decision processes (MDPs) can be reduced to discounted ones. Results are given for transient total-cost MDPs with transition rates whose values may be greater than one, as well as for average-cost MDPs with transition probabilities satisfying the condition that there is a state such that the expected time to reach it is uniformly bounded for all initial states and stationary policies. In particular, these reductions imply sufficient conditions for the validity of optimality equations and the existence of stationary optimal policies for MDPs with undiscounted total cost and average-cost criteria. When the state and action sets are finite, these reductions lead to linear programming formulations and complexity estimates for MDPs under the aforementioned criteria.
Submitted 3 May, 2017; v1 submitted 2 July, 2015;
originally announced July 2015.
-
Uniform Fatou's Lemma
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
Fatou's lemma is a classic fact in real analysis that states that the limit inferior of integrals of functions is greater than or equal to the integral of the inferior limit. This paper introduces a stronger inequality that holds uniformly for integrals on measurable subsets of a measurable space. The necessary and sufficient condition, under which this inequality holds for a sequence of finite measures converging in total variation, is provided. This statement is called the uniform Fatou's lemma, and it holds under the minor assumption that all the integrals are well-defined. The uniform Fatou's lemma improves the classic Fatou's lemma in the following directions: the uniform Fatou's lemma states a more precise inequality, it provides the necessary and sufficient condition, and it deals with variable measures. Various corollaries of the uniform Fatou's lemma are formulated. The examples in this paper demonstrate that: (a) the uniform Fatou's lemma may indeed provide a more accurate inequality than the classic Fatou's lemma; (b) the uniform Fatou's lemma does not hold if convergence of measures in total variation is relaxed to setwise convergence.
Submitted 7 April, 2015;
originally announced April 2015.
-
Continuity of Minima: Local Results
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov
Abstract:
This paper compares and generalizes Berge's maximum theorem for noncompact image sets established in Feinberg, Kasyanov and Voorneveld (2014) and the local maximum theorem established in Bonnans and Shapiro (2000).
Submitted 6 August, 2014;
originally announced August 2014.
-
Convergence of Probability Measures and Markov Decision Models with Incomplete Information
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
This paper deals with three major types of convergence of probability measures on metric spaces: weak convergence, setwise convergence, and convergence in the total variation. First, it describes and compares necessary and sufficient conditions for these types of convergence, some of which are well-known, in terms of convergence of probabilities of open and closed sets and, for the probabilities on the real line, in terms of convergence of distribution functions. Second, it provides convenient criteria for weak and setwise convergence of probability measures and continuity of stochastic kernels in terms of convergence of probabilities defined on the base of the topology generated by the metric. Third, it provides applications to control of Partially Observable Markov Decision Processes and, in particular, to Markov Decision Models with incomplete information.
Submitted 3 July, 2014;
originally announced July 2014.
-
Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Michael Z. Zgurovsky
Abstract:
This paper describes sufficient conditions for the existence of optimal policies for Partially Observable Markov Decision Processes (POMDPs) with Borel state, observation, and action sets and with the expected total costs. Action sets may not be compact and one-step cost functions may be unbounded. The introduced conditions are also sufficient for the validity of optimality equations, semi-continuity of value functions, and convergence of value iterations to optimal values. Since POMDPs can be reduced to Completely Observable Markov Decision Processes (COMDPs), whose states are posterior state distributions, this paper focuses on the validity of the above-mentioned optimality properties for COMDPs. The central question is whether transition probabilities for a COMDP are weakly continuous. We introduce sufficient conditions for this and show that the transition probabilities for a COMDP are weakly continuous if transition probabilities of the underlying Markov Decision Process are weakly continuous and observation probabilities for the POMDP are continuous in the total variation. Moreover, the continuity in the total variation of the observation probabilities cannot be weakened to setwise continuity. The results are illustrated with counterexamples and examples.
Submitted 1 July, 2014; v1 submitted 9 January, 2014;
originally announced January 2014.
-
The Value Iteration Algorithm is Not Strongly Polynomial for Discounted Dynamic Programming
Authors:
Eugene A. Feinberg,
Jefferson Huang
Abstract:
This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iterations, the value iteration algorithm is not strongly polynomial for discounted dynamic programming.
Submitted 19 December, 2013;
originally announced December 2013.
-
Examples Concerning Abelian and Cesaro Limits
Authors:
Christopher J. Bishop,
Eugene A. Feinberg,
Junyu Zhang
Abstract:
This note provides examples of all possible equality and strict inequality relations between upper and lower Abelian and Cesaro limits of sequences bounded above or below.
Submitted 1 July, 2014; v1 submitted 4 October, 2013;
originally announced October 2013.
-
Berge's Maximum Theorem for Noncompact Image Sets
Authors:
Eugene A. Feinberg,
Pavlo O. Kasyanov,
Mark Voorneveld
Abstract:
This note generalizes Berge's maximum theorem to noncompact image sets. It also clarifies the results from E.A. Feinberg, P.O. Kasyanov, N.V. Zadoianchuk, "Berge's theorem for noncompact image sets," J. Math. Anal. Appl. 397(1)(2013), pp. 255-259 on the extension to noncompact image sets of another theorem of Berge, which states semi-continuity of value functions. Here we explain that the notion of a $\mathbb{K}$-inf-compact function introduced there is applicable to metrizable topological spaces and to more general compactly generated topological spaces. For Hausdorff topological spaces we introduce the notion of a $\mathbb{K}\mathbb{N}$-inf-compact function ($\mathbb{N}$ stands for "nets" in $\mathbb{K}$-inf-compactness), which coincides with $\mathbb{K}$-inf-compactness for compactly generated and, in particular, for metrizable topological spaces.
Submitted 29 September, 2013;
originally announced September 2013.
-
Optimal Switching On and Off the Entire Service Capacity of a Parallel Queue
Authors:
Eugene Feinberg,
Xiaoxuan Zhang
Abstract:
This paper studies optimal switching on and off of the entire service capacity of an M/M/Infinity queue with holding, running and switching costs where the running costs depend only on whether the system is running or not. The goal is to minimize average costs per unit time. The main result is that an average-optimal policy either always runs the system or is an (M, N)-policy defined by two thresholds M and N, such that the system is switched on upon an arrival epoch when the system size accumulates to N and is switched off upon a departure epoch when the system size decreases to M. It is shown that this optimization problem can be reduced to a problem with a finite number of states and actions, and an average-optimal policy can be computed via linear programming. An example, in which the optimal (M, N)-policy outperforms the best (0, N)-policy, is provided. Thus, unlike the case of single-server queues studied in the literature, (0, N)-policies may not be average-optimal.
Submitted 26 November, 2013; v1 submitted 31 July, 2013;
originally announced August 2013.