-
Optimizing Input Data Collection for Ranking and Selection
Authors:
Eunhye Song,
Taeho Kim
Abstract:
We study a ranking and selection (R&S) problem when all solutions share common parametric Bayesian input models updated with the data collected from multiple independent data-generating sources. Our objective is to identify the best system by designing a sequential sampling algorithm that collects input and simulation data given a budget. We adopt the most probable best (MPB) as the estimator of the optimum and show that its posterior probability of optimality converges to one at an exponential rate as the sampling budget increases. Assuming that the input parameters belong to a finite set, we characterize the $ε$-optimal static sampling ratios for input and simulation data that maximize the convergence rate. Using these ratios as guidance, we propose the optimal sampling algorithm for R&S (OSAR) that achieves the $ε$-optimal ratios almost surely in the limit. We further extend OSAR by adopting the kernel ridge regression to improve the simulation output mean prediction. This not only improves OSAR's finite-sample performance, but also lets us tackle the case where the input parameters lie in a continuous space with a strong consistency guarantee for finding the optimum. We numerically demonstrate that OSAR outperforms a state-of-the-art competitor.
Submitted 23 February, 2025;
originally announced February 2025.
-
Efficient Input Uncertainty Quantification for Ratio Estimator
Authors:
Linyun He,
Ben Feng,
Eunhye Song
Abstract:
We study the construction of a confidence interval (CI) for a simulation output performance measure that accounts for input uncertainty when the input models are estimated from finite data. In particular, we focus on performance measures that can be expressed as a ratio of two dependent simulation outputs' means. We adopt the parametric bootstrap method to mimic input data sampling and construct the percentile bootstrap CI after estimating the ratio at each bootstrap sample. The standard estimator, which takes the ratio of two sample averages, tends to exhibit large finite-sample bias and variance, leading to overcoverage of the percentile bootstrap CI. To address this, we propose two new ratio estimators that replace the sample averages with pooled mean estimators via the $k$-nearest neighbor ($k$NN) regression: the $k$NN estimator and the $k$LR estimator. The $k$NN estimator performs well in low dimensions but its theoretical performance guarantee degrades as the dimension increases. The $k$LR estimator combines the likelihood ratio (LR) method with the $k$NN regression, leveraging the strengths of both while mitigating their weaknesses; the LR method removes dependence on dimension, while the variance inflation introduced by the LR is controlled by $k$NN. Based on asymptotic analyses and finite-sample heuristics, we propose an experiment design that maximizes the efficiency of the proposed estimators and demonstrate their empirical performances using three examples including one in the enterprise risk management application.
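As a rough illustration of the baseline this abstract describes (the parametric bootstrap with the standard ratio-of-sample-averages estimator and a percentile CI), the sketch below uses a toy setup. The exponential input model, the `toy_simulation` function, and all parameter values are assumptions made for illustration; this is not the authors' experiment design and does not include the proposed $k$NN or $k$LR estimators.

```python
import numpy as np

def toy_simulation(rate, n_reps, rng):
    """Hypothetical simulator returning two dependent outputs per replication."""
    x = rng.exponential(1.0 / rate, size=n_reps)
    return x ** 2, x  # numerator output, denominator output

def percentile_bootstrap_ci(data, n_boot=200, n_reps=500, level=0.9, seed=0):
    """Percentile bootstrap CI for a ratio of two simulation output means."""
    rng = np.random.default_rng(seed)
    rate_hat = 1.0 / np.mean(data)  # MLE of the exponential rate from input data
    ratios = np.empty(n_boot)
    for b in range(n_boot):
        # Mimic input data sampling by drawing from the fitted input model.
        boot_data = rng.exponential(1.0 / rate_hat, size=len(data))
        rate_b = 1.0 / np.mean(boot_data)
        # Standard estimator: ratio of two sample averages at the bootstrap parameter.
        num, den = toy_simulation(rate_b, n_reps, rng)
        ratios[b] = num.mean() / den.mean()
    alpha = 1.0 - level
    lo, hi = np.quantile(ratios, [alpha / 2, 1 - alpha / 2])
    return lo, hi, ratios

# Toy input data: 50 observations from an exponential distribution with rate 2.
data = np.random.default_rng(1).exponential(0.5, size=50)
lo, hi, ratios = percentile_bootstrap_ci(data)
```

In this toy model the ratio equals 2 divided by the rate, so the spread of `ratios` reflects both input uncertainty and the simulation noise in each ratio estimate.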
Submitted 6 October, 2024;
originally announced October 2024.
-
Selection of the Most Probable Best
Authors:
Taeho Kim,
Kyoung-kuk Kim,
Eunhye Song
Abstract:
We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution. We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution and design an efficient sequential sampling algorithm to learn the MPB when the parameter has a finite support. We derive the large deviations rate of the probability of falsely selecting the MPB and formulate an optimal computing budget allocation problem to find the rate-maximizing static sampling ratios. The problem is then relaxed to obtain a set of optimality conditions that are interpretable and computationally efficient to verify. We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates and prove the algorithms' sampling ratios achieve the conditions as the simulation budget increases. Furthermore, we show that the empirical performances of the algorithms can be significantly improved by adopting the kernel ridge regression for mean estimation while achieving the same asymptotic convergence results. The algorithms are benchmarked against a state-of-the-art contextual R&S algorithm and demonstrated to have superior empirical performances.
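The MPB definition for a parameter with finite support can be sketched in a few lines. The example below assumes the conditional means are known exactly and that larger is better; the function name `most_probable_best` and the numbers are hypothetical, and the sequential sampling algorithms of the paper are not reproduced here.

```python
import numpy as np

def most_probable_best(means, probs):
    """Return the MPB index and each solution's probability of being optimal.

    means[i, j] is the mean output of solution i at the j-th parameter value;
    probs[j] is the probability of that parameter value.
    Assumption: a larger mean is better.
    """
    means = np.asarray(means, dtype=float)
    probs = np.asarray(probs, dtype=float)
    best_per_param = means.argmax(axis=0)  # conditional optimum at each parameter value
    prob_optimal = np.bincount(best_per_param, weights=probs,
                               minlength=means.shape[0])
    return int(prob_optimal.argmax()), prob_optimal

means = [[3.0, 0.2, 0.1],   # solution 0
         [0.5, 0.6, 0.7],   # solution 1
         [0.4, 0.5, 0.3]]   # solution 2
probs = [0.4, 0.35, 0.25]
mpb, prob_optimal = most_probable_best(means, probs)
```

Here solution 0 has the largest mean averaged over the parameter distribution, yet solution 1 is optimal with probability 0.6 and is therefore the MPB, which shows how the two criteria can disagree.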
Submitted 20 April, 2024; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Sequential Bayesian Risk Set Inference for Robust Discrete Optimization via Simulation
Authors:
Eunhye Song
Abstract:
Optimization via simulation (OvS) procedures that assume the simulation inputs are generated from the real-world distributions are subject to the risk of selecting a suboptimal solution when the distributions are substituted with input models estimated from finite real-world data -- known as input model risk. Focusing on discrete OvS, this paper proposes a new Bayesian framework for analyzing the input model risk of implementing an arbitrary solution, $x$, where uncertainty about the input models is captured by a posterior distribution. We define the $α$-level risk set of solution $x$ as the set of solutions whose expected performance is better than $x$ by a practically meaningful margin ($>δ$) given common input models with significant probability ($>α$) under the posterior distribution. The user-specified parameters, $δ$ and $α$, control the robustness of the procedure to the desired level and guard against unnecessary conservatism. An empty risk set implies that there is no practically better solution than $x$ with significant probability even though the real-world input distributions are unknown. For efficient estimation of the risk set, the conditional mean performance of a solution given a set of input distributions is modeled as a Gaussian process (GP) that takes the solution-distributions pair as an input. In particular, our GP model allows both parametric and nonparametric input models. We propose the sequential risk set inference procedure that estimates the risk set and selects the next solution-distributions pair to simulate using the posterior GP at each iteration. We show that simulating the pair expected to change the risk set estimate the most in the next iteration is the asymptotic one-step optimal sampling rule that minimizes the number of incorrectly classified solutions, if the procedure runs without stopping.
Submitted 19 January, 2021;
originally announced January 2021.
-
Efficient Nested Simulation Experiment Design via the Likelihood Ratio Method
Authors:
Mingbin Ben Feng,
Eunhye Song
Abstract:
In the nested simulation literature, a common assumption is that the experimenter can choose the number of outer scenarios to sample. This paper considers the case when the experimenter is given a fixed set of outer scenarios from an external entity. We propose a nested simulation experiment design that pools inner replications from one scenario to estimate another scenario's conditional mean via the likelihood ratio method. Given the outer scenarios, we decide how many inner replications to run at each outer scenario as well as how to pool the inner replications by solving a bi-level optimization problem that minimizes the total simulation effort. We provide asymptotic analyses on the convergence rates of the performance measure estimators computed from the optimized experiment design. Under some assumptions, the optimized design achieves $\mathcal{O}(Γ^{-1})$ mean squared error of the estimators given simulation budget $Γ$. Numerical experiments demonstrate that our design outperforms a state-of-the-art design that pools replications via regression.
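The pooling idea, reusing inner replications drawn at one scenario to estimate the conditional mean at another via likelihood ratio weights, can be illustrated with a toy example. The sketch assumes unit-variance normal inner samples whose mean is the scenario; the helper names and the numbers are hypothetical, and the bi-level optimization that chooses replication counts and pooling weights is not shown.

```python
import numpy as np

def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def lr_pooled_mean(samples, source_mean, target_mean, g):
    """Estimate E[g(X)] under the target scenario from source-scenario samples."""
    # Likelihood ratio: target density over source density at each sample.
    weights = normal_pdf(samples, target_mean) / normal_pdf(samples, source_mean)
    return float(np.mean(g(samples) * weights))

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=200_000)  # inner replications at scenario 0.0
estimate = lr_pooled_mean(samples, source_mean=0.0, target_mean=0.5,
                          g=lambda x: x ** 2)
# Under the target scenario N(0.5, 1), E[X^2] = 1 + 0.5**2 = 1.25.
```

The estimator is unbiased when the source density is positive wherever the target density is, but its variance grows as the two scenarios move apart, which is why the choice of which scenarios to pool matters.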
Submitted 13 May, 2024; v1 submitted 30 August, 2020;
originally announced August 2020.
-
A sparse semismooth Newton based augmented Lagrangian method for large-scale support vector machines
Authors:
Dunbiao Niu,
Chengjing Wang,
Peipei Tang,
Qingsong Wang,
Enbin Song
Abstract:
Support vector machines (SVMs) are successful modeling and prediction tools with a variety of applications. Previous work has demonstrated the superiority of SVMs in dealing with high-dimensional, low-sample-size problems. However, the numerical difficulties of SVMs become severe as the sample size increases. Although there exist many solvers for SVMs, only a few of them are designed to exploit the special structures of SVMs. In this paper, we propose a highly efficient sparse semismooth Newton based augmented Lagrangian method for solving a large-scale convex quadratic programming problem with a linear equality constraint and a simple box constraint, which is generated from the dual problems of the SVMs. By leveraging the primal-dual error bound result, the fast local convergence rate of the augmented Lagrangian method can be guaranteed. Furthermore, by exploiting the second-order sparsity of the problem when using the semismooth Newton method, the algorithm can efficiently solve the aforementioned difficult problems. Finally, numerical comparisons demonstrate that the proposed algorithm outperforms the current state-of-the-art solvers for large-scale SVMs.
Submitted 3 February, 2021; v1 submitted 3 October, 2019;
originally announced October 2019.