Adaptive Data Depth via Multi-Armed Bandits

Authors: Tavor Z. Baharav, Tze Leung Lai

Abstract: Data depth, introduced by Tukey (1975), is an important tool in data science, robust statistics, and computational geometry. One chief barrier to its broader practical utility is that many common measures of depth are computationally intensive, requiring on the order of $n^d$ operations to exactly compute the depth of a single point within a data set of $n$ points in $d$-dimensional space. Often however, we are not directly interested in the absolute depths of the points, but rather in their relative ordering. For example, we may want to find the most central point in a data set (a generalized median), or to identify and remove all outliers (points on the fringe of the data set with low depth). With this observation, we develop a novel and instance-adaptive algorithm for adaptive data depth computation by reducing the problem of exactly computing $n$ depths to an $n$-armed stochastic multi-armed bandit problem which we can efficiently solve. We focus our exposition on simplicial depth, developed by Liu (1990), which has emerged as a promising notion of depth due to its interpretability and asymptotic properties. We provide general instance-dependent theoretical guarantees for our proposed algorithms, which readily extend to many other common measures of data depth including majority depth, Oja depth, and likelihood depth. When specialized to the case where the gaps in the data follow a power law distribution with parameter $\alpha<2$, we show that we can reduce the complexity of identifying the deepest point in the data set (the simplicial median) from $O(n^d)$ to $\tilde{O}(n^{d-(d-1)\alpha/2})$, where $\tilde{O}$ suppresses logarithmic factors. We corroborate our theoretical results with numerical experiments on synthetic data, showing the practical utility of our proposed methods.

Submitted 9 November, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: Keywords: multi-armed bandits, data depth, adaptivity, large-scale computation, simplicial depth
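
As a rough illustration of the simplicial depth discussed in the abstract above, the following Python sketch computes the depth of a point in the plane both exactly (by enumerating every triangle formed by the data) and approximately (by sampling random triangles). The function names, the restriction to $d=2$, and the uniform sampling scheme are assumptions made for illustration. This is not the authors' bandit algorithm, only the kind of randomized depth estimate that an adaptive method could build on.

```python
import itertools
import random


def _cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(x, a, b, c):
    """Return True if x lies in the closed triangle abc.

    Assumes a, b, c are not collinear (points in general position).
    """
    d1, d2, d3 = _cross(a, b, x), _cross(b, c, x), _cross(c, a, x)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def exact_depth(x, points):
    """Fraction of all data triangles that contain x (cubic cost in n)."""
    triples = list(itertools.combinations(points, 3))
    return sum(in_triangle(x, *t) for t in triples) / len(triples)


def sampled_depth(x, points, num_samples, rng):
    """Unbiased Monte Carlo estimate of the depth from random triangles."""
    hits = sum(in_triangle(x, *rng.sample(points, 3)) for _ in range(num_samples))
    return hits / num_samples
```

In a bandit formulation, each data point would play the role of an arm and each sampled triangle the role of one noisy pull; how many pulls each arm receives is where an adaptive scheme would differ from the uniform sampling shown here.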

arXiv:1402.2550 [pdf, ps, other]

A New Approach to Designing Phase I-II Cancer Trials for Cytotoxic Chemotherapies

Authors: Jay Bartroff, Tze Leung Lai, Balasubramanian Narasimhan

Abstract: Recently there has been much work on early phase cancer designs that incorporate both toxicity and efficacy data, called Phase I-II designs because they combine elements of both phases. However, they do not explicitly address the Phase II hypothesis test of $H_0: p\le p_0$, where $p$ is the probability of efficacy at the estimated maximum tolerated dose (MTD) $\widehat{\eta}$ from Phase I and $p_0$ is the baseline efficacy rate. Standard practice for Phase II remains to treat $p$ as a fixed, unknown parameter and to use Simon's 2-stage design with all patients dosed at $\widehat{\eta}$. We propose a Phase I-II design that addresses the uncertainty in the estimate $p=p(\widehat{\eta})$ in $H_0$ by using sequential generalized likelihood theory. Combining this with a Phase I design that incorporates efficacy data, the Phase I-II design provides a common framework that can be used all the way from the first dose of Phase I through the final accept/reject decision about $H_0$ at the end of Phase II, utilizing both toxicity and efficacy data throughout. Efficient group sequential testing is used in Phase II that allows for early stopping to show treatment effect or futility. The proposed Phase I-II design thus removes the artificial barrier between Phase I and Phase II, and fulfills the objectives of searching for the MTD and testing if the treatment has an acceptable response rate to enter into a Phase III trial.

Submitted 11 February, 2014; originally announced February 2014.

arXiv:1108.1223 [pdf, ps, other]

Incorporating Individual and Collective Ethics into Phase I Cancer Trial Designs

Authors: Jay Bartroff, Tze Leung Lai

Abstract: A general framework is proposed for Bayesian model-based designs of Phase I cancer trials, in which a general criterion for coherence (Cheung, 2005) of a design is also developed. This framework can incorporate both "individual" and "collective" ethics into the design of the trial. We propose a new design which minimizes a risk function composed of two terms, with one representing the individual risk of the current dose and the other representing the collective risk. The performance of this design, which is measured in terms of the accuracy of the estimated target dose at the end of the trial, the toxicity and overdose rates, and certain loss functions reflecting the individual and collective ethics, is studied and compared with existing Bayesian model-based designs and is shown to have better performance than existing designs.

Submitted 4 August, 2011; originally announced August 2011.

Journal ref: Biometrics 67 (2011) p. 596-603

arXiv:1108.0996 [pdf, ps, other]

doi 10.1214/10-AOAS422

Mean--variance portfolio optimization when means and covariances are unknown

Authors: Tze Leung Lai, Haipeng Xing, Zehao Chen

Abstract: Markowitz's celebrated mean--variance portfolio optimization theory assumes that the means and covariances of the underlying asset returns are known. In practice, they are unknown and have to be estimated from historical data. Plugging the estimates into the efficient frontier that assumes known parameters has led to portfolios that may perform poorly and have counter-intuitive asset allocation weights; this has been referred to as the "Markowitz optimization enigma." After reviewing different approaches in the literature to address these difficulties, we explain the root cause of the enigma and propose a new approach to resolve it. Not only is the new approach shown to provide substantial improvements over previous methods, but it also allows flexible modeling to incorporate dynamic features and fundamental analysis of the training sample of historical data, as illustrated in simulation and empirical studies.

Submitted 4 August, 2011; originally announced August 2011.

Comments: Published at http://dx.doi.org/10.1214/10-AOAS422 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS422

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 2A, 798-823
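
To make the "plug-in" idea in the abstract above concrete, here is a minimal Python sketch for two assets. It estimates means, variances, and the covariance from historical returns and substitutes them into the closed-form weights that maximize $w^\top\mu - (\lambda/2)\,w^\top\Sigma w$ subject to the weights summing to one. The function names, the two-asset restriction, and the risk-aversion parameter `risk_aversion` are assumptions for illustration. This shows the naive plug-in procedure that the abstract describes as problematic, not the authors' proposed approach.

```python
def sample_moments(returns_a, returns_b):
    """Sample means, variances and covariance of two return series."""
    n = len(returns_a)
    mean_a = sum(returns_a) / n
    mean_b = sum(returns_b) / n
    var_a = sum((r - mean_a) ** 2 for r in returns_a) / (n - 1)
    var_b = sum((r - mean_b) ** 2 for r in returns_b) / (n - 1)
    cov_ab = sum(
        (ra - mean_a) * (rb - mean_b) for ra, rb in zip(returns_a, returns_b)
    ) / (n - 1)
    return mean_a, mean_b, var_a, var_b, cov_ab


def plug_in_weights(returns_a, returns_b, risk_aversion=1.0):
    """Plug-in mean-variance weights (w_a, w_b) with w_a + w_b = 1.

    Treats the sample estimates as if they were the true parameters.
    Assumes the two series do not differ by a constant, so the
    denominator (the variance of the return difference) is positive.
    """
    mean_a, mean_b, var_a, var_b, cov_ab = sample_moments(returns_a, returns_b)
    denom = var_a + var_b - 2.0 * cov_ab
    w_a = ((mean_a - mean_b) / risk_aversion + var_b - cov_ab) / denom
    return w_a, 1.0 - w_a
```

Because the estimated mean difference enters the numerator directly, small estimation errors in the means can move the weights substantially when the denominator is small, which is one way to see the sensitivity the abstract refers to.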

arXiv:1107.1919 [pdf, ps, other]

Multistage tests of multiple hypotheses

Authors: Jay Bartroff, Tze Leung Lai

Abstract: Conventional multiple hypothesis tests use step-up, step-down, or closed testing methods to control the overall error rates. We will discuss marrying these methods with adaptive multistage sampling rules and stopping rules to perform efficient multiple hypothesis testing in sequential experimental designs. The result is a multistage step-down procedure that adaptively tests multiple hypotheses while preserving the family-wise error rate, and extends Holm's (1979) step-down procedure to the sequential setting, yielding substantial savings in sample size with small loss in power.

Submitted 10 July, 2011; originally announced July 2011.
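
For readers unfamiliar with the step-down procedure of Holm (1979) mentioned in the abstract above, the following Python sketch shows the classical fixed-sample version: p-values are sorted in increasing order, the $k$-th smallest (counting from zero) is compared with $\alpha/(m-k)$, and testing stops at the first non-rejection. The function name and signature are assumptions for illustration, and the multistage sequential extension proposed in the paper is not reproduced here.

```python
def holm_step_down(p_values, alpha=0.05):
    """Classical Holm step-down procedure for m hypotheses.

    Returns a list of booleans, in the original order of p_values,
    where True means the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the hypotheses sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Threshold tightens for small p-values and relaxes as we step down.
        if p_values[idx] > alpha / (m - rank):
            break  # first failure: retain this and all remaining hypotheses
        reject[idx] = True
    return reject
```

For example, with p-values `[0.01, 0.04, 0.03, 0.005]` and `alpha=0.05`, the thresholds are 0.0125, 0.0167, 0.025, and 0.05, so the two smallest p-values are rejected and testing stops at 0.03.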

arXiv:1106.2559 [pdf, other]

doi 10.1007/s11336-007-9053-9

Modern Sequential Analysis and its Applications to Computerized Adaptive Testing

Authors: Jay Bartroff, Matthew Finkelman, Tze Leung Lai

Abstract: After a brief review of recent advances in sequential analysis involving sequential generalized likelihood ratio tests, we discuss their use in psychometric testing and extend the asymptotic optimality theory of these sequential tests to the case of sequentially generated experiments, of particular interest in computerized adaptive testing. We then show how these methods can be used to design adaptive mastery tests, which are asymptotically optimal and are also shown to provide substantial improvements over currently used sequential and fixed length tests.

Submitted 13 June, 2011; originally announced June 2011.

Journal ref: Psychometrika 73 (2008) 473-486

arXiv:1105.4667 [pdf, ps, other]

Generalized Likelihood Ratio Statistics and Uncertainty Adjustments in Efficient Adaptive Design of Clinical Trials

Authors: Jay Bartroff, Tze Leung Lai

Abstract: A new approach to adaptive design of clinical trials is proposed in a general multiparameter exponential family setting, based on generalized likelihood ratio statistics and optimal sequential testing theory. These designs are easy to implement, maintain the prescribed Type I error probability, and are asymptotically efficient. Practical issues involved in clinical trials allowing mid-course adaptation and the large literature on this subject are discussed, and comparisons between the proposed and existing designs are presented in extensive simulation studies of their finite-sample performance, measured in terms of the expected sample size and power functions.

Submitted 23 May, 2011; originally announced May 2011.

MSC Class: 62L10; 62F03; 62P10

arXiv:1105.3280 [pdf, ps, other]

Efficient adaptive designs with mid-course sample size adjustment in clinical trials

Authors: Jay Bartroff, Tze Leung Lai

Abstract: Adaptive designs have been proposed for clinical trials in which the nuisance parameters or alternative of interest are unknown or likely to be misspecified before the trial. Whereas most previous works on adaptive designs and mid-course sample size re-estimation have focused on two-stage or group sequential designs in the normal case, we consider here a new approach that involves at most three stages and is developed in the general framework of multiparameter exponential families. Not only does this approach maintain the prescribed type I error probability, but it also provides a simple but asymptotically efficient sequential test whose finite-sample performance, measured in terms of the expected sample size and power functions, is shown to be comparable to the optimal sequential design, determined by dynamic programming, in the simplified normal mean case with known variance and prespecified alternative, and superior to the existing two-stage designs and also to adaptive group sequential designs when the alternative or nuisance parameters are unknown or misspecified.

Submitted 17 May, 2011; originally announced May 2011.

MSC Class: 62L10; 62F03; 62P10

arXiv:1011.6509 [pdf, ps, other]

doi 10.1214/10-STS317

Approximate Dynamic Programming and Its Applications to the Design of Phase I Cancer Trials

Authors: Jay Bartroff, Tze Leung Lai

Abstract: Optimal design of a Phase I cancer trial can be formulated as a stochastic optimization problem. By making use of recent advances in approximate dynamic programming to tackle the problem, we develop an approximation of the Bayesian optimal design. The resulting design is a convex combination of a "treatment" design, such as Babb et al.'s (1998) escalation with overdose control, and a "learning" design, such as Haines et al.'s (2003) $c$-optimal design, thus directly addressing the treatment versus experimentation dilemma inherent in Phase I trials and providing a simple and intuitive design for clinical use. Computational details are given and the proposed design is compared to existing designs in a simulation study. The design can also be readily modified to include a first stage that cautiously escalates doses similarly to traditional nonparametric step-up/down schemes, while validating the Bayesian parametric model for the efficient model-based design in the second stage.

Submitted 30 November, 2010; originally announced November 2010.

Comments: Published at http://dx.doi.org/10.1214/10-STS317 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS317

Journal ref: Statistical Science 2010, Vol. 25, No. 2, 245-257

Showing 1–9 of 9 results for author: Lai, T L