-
Constrained Classification and Policy Learning
Authors:
Toru Kitagawa,
Shosei Sakaguchi,
Aleksey Tetenov
Abstract:
Modern machine learning approaches to classification, including AdaBoost, support vector machines, and deep neural networks, utilize surrogate loss techniques to circumvent the computational complexity of minimizing empirical classification risk. These techniques are also useful for causal policy learning problems, since estimation of individualized treatment rules can be cast as a weighted (cost-sensitive) classification problem. Consistency of the surrogate loss approaches studied in Zhang (2004) and Bartlett et al. (2006) crucially relies on the assumption of correct specification, meaning that the specified set of classifiers is rich enough to contain a first-best classifier. This assumption is, however, less credible when the set of classifiers is constrained by interpretability or fairness, leaving the applicability of surrogate loss based algorithms unknown in such second-best scenarios. This paper studies consistency of surrogate loss procedures under a constrained set of classifiers without assuming correct specification. We show that in the setting where the constraint restricts the classifier's prediction set only, hinge losses (i.e., $\ell_1$-support vector machines) are the only surrogate losses that preserve consistency in second-best scenarios. If the constraint additionally restricts the functional form of the classifier, consistency of a surrogate loss approach is not guaranteed even with hinge loss. We therefore characterize conditions for the constrained set of classifiers that can guarantee consistency of hinge risk minimizing classifiers. Exploiting our theoretical results, we develop robust and computationally attractive hinge loss based procedures for a monotone classification problem.
Submitted 24 July, 2023; v1 submitted 24 June, 2021;
originally announced June 2021.
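To make the surrogate loss idea in the abstract above concrete, the following minimal sketch compares a weighted (cost-sensitive) 0-1 classification risk with its hinge surrogate on a handful of observations. The function names, the convention that a score of exactly zero predicts +1, and the use of the standard hinge $\max(0, 1 - yf)$ are assumptions made for illustration; the paper's own notation and normalization may differ.

```python
from typing import Sequence


def weighted_zero_one_risk(
    scores: Sequence[float], labels: Sequence[int], weights: Sequence[float]
) -> float:
    """Average weighted misclassification loss for labels in {-1, +1}."""
    total = 0.0
    for f, y, w in zip(scores, labels, weights):
        pred = 1 if f >= 0 else -1  # assumption: a zero score predicts +1
        total += w * (pred != y)
    return total / len(scores)


def weighted_hinge_risk(
    scores: Sequence[float], labels: Sequence[int], weights: Sequence[float]
) -> float:
    """Average weighted hinge loss w * max(0, 1 - y * f)."""
    total = 0.0
    for f, y, w in zip(scores, labels, weights):
        total += w * max(0.0, 1.0 - y * f)
    return total / len(scores)


# Hypothetical data: classifier scores f(x), labels y, nonnegative weights w.
scores = [0.5, -2.0, 1.0]
labels = [1, -1, -1]
weights = [1.0, 2.0, 0.5]

zero_one = weighted_zero_one_risk(scores, labels, weights)
hinge = weighted_hinge_risk(scores, labels, weights)
```

With nonnegative weights, the hinge risk is never smaller than the 0-1 risk, and it is convex in the scores, which is what makes it computationally attractive. Whether minimizing it over a constrained class still recovers the best constrained classifier is the question the paper addresses.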
-
Statistical Decision Properties of Imprecise Trials Assessing COVID-19 Drugs
Authors:
Charles F. Manski,
Aleksey Tetenov
Abstract:
As the COVID-19 pandemic progresses, researchers are reporting findings of randomized trials comparing standard care with care augmented by experimental drugs. The trials have small sample sizes, so estimates of treatment effects are imprecise. Seeing imprecision, clinicians reading research articles may find it difficult to decide when to treat patients with experimental drugs. Whatever decision criterion one uses, there is always some probability that random variation in trial outcomes will lead to prescribing sub-optimal treatments. A conventional practice when comparing standard care and an innovation is to choose the innovation only if the estimated treatment effect is positive and statistically significant. This practice defers to standard care as the status quo. To evaluate decision criteria, we use the concept of near-optimality, which jointly considers the probability and magnitude of decision errors. An appealing decision criterion from this perspective is the empirical success rule, which chooses the treatment with the highest observed average patient outcome in the trial. Considering the design of recent and ongoing COVID-19 trials, we show that the empirical success rule yields treatment results that are much closer to optimal than those generated by prevailing decision criteria based on hypothesis tests.
Submitted 30 May, 2020;
originally announced June 2020.
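The comparison described in the abstract above can be illustrated with a small exact calculation. The sketch below assumes a two-arm trial with binary outcomes and equal arm sizes, uses expected regret (probability of choosing the inferior treatment times the size of the loss) as the performance measure, breaks ties in favor of standard care, and implements the significance-based criterion as a one-sided pooled z-test at the 5% level. All of these choices, and all function names, are illustrative assumptions rather than the paper's exact setup.

```python
from math import comb, sqrt


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def empirical_success(s_a: int, s_b: int, n: int) -> bool:
    """Choose the innovation iff its observed success count is strictly higher."""
    return s_b > s_a


def significance_rule(s_a: int, s_b: int, n: int, z_crit: float = 1.645) -> bool:
    """Choose the innovation iff a one-sided pooled z-test rejects (assumed form)."""
    pooled = (s_a + s_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False  # no variation observed: defer to standard care
    return (s_b - s_a) / n / se > z_crit


def expected_regret(rule, p_a: float, p_b: float, n: int) -> float:
    """Exact expected regret of `rule`; p_a is standard care, p_b the innovation."""
    prob_innovation = sum(
        binom_pmf(i, n, p_a) * binom_pmf(j, n, p_b)
        for i in range(n + 1)
        for j in range(n + 1)
        if rule(i, j, n)
    )
    prob_error = prob_innovation if p_b < p_a else 1.0 - prob_innovation
    return abs(p_b - p_a) * prob_error


# Hypothetical state of nature: the innovation is better (0.7 vs 0.5), 20 per arm.
regret_es = expected_regret(empirical_success, 0.5, 0.7, 20)
regret_test = expected_regret(significance_rule, 0.5, 0.7, 20)
```

When the innovation is truly better, the significance rule adopts it on a strict subset of the outcomes on which the empirical success rule does, so its regret is larger. When standard care is better, the ordering reverses. Evaluating both rules across states of nature, for example by their maximum regret, is what separates them.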
-
Clinical trial design enabling ε-optimal treatment rules
Authors:
Charles F. Manski,
Aleksey Tetenov
Abstract:
Medical research has evolved conventions for choosing sample size in randomized clinical trials that rest on the theory of hypothesis testing. Bayesians have argued that trials should be designed to maximize subjective expected utility in settings of clinical interest. This perspective is compelling given a credible prior distribution on treatment response, but Bayesians have struggled to provide guidance on specification of priors. We use the frequentist statistical decision theory of Wald (1950) to study design of trials under ambiguity. We show that ε-optimal rules exist when trials have large enough sample size. An ε-optimal rule has expected welfare within ε of the welfare of the best treatment in every state of nature. Equivalently, it has maximum regret no larger than ε. We consider trials that draw predetermined numbers of subjects at random within groups stratified by covariates and treatments. The principal analytical findings are simple sufficient conditions on sample sizes that ensure existence of ε-optimal treatment rules when outcomes are bounded. These conditions are obtained by application of Hoeffding (1963) large deviations inequalities to evaluate the performance of empirical success rules.
Submitted 25 September, 2015;
originally announced September 2015.
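As a rough illustration of how a Hoeffding bound can yield a sufficient sample size of the kind described in the abstract above, consider the simplest case: two treatments, no covariates, $n$ subjects per arm, and outcomes bounded in an interval of width $M$. If the true mean gap is $\delta$, Hoeffding's inequality bounds the probability that the empirical success rule picks the inferior arm by $\exp(-n\delta^2/M^2)$, so regret is at most $\delta \exp(-n\delta^2/M^2)$. Maximizing over $\delta$ gives $M(2en)^{-1/2}$, which is no larger than $\varepsilon$ once $n \ge M^2/(2e\varepsilon^2)$. The sketch below encodes this textbook derivation; the function names and the two-arm setup are assumptions for illustration, and the paper's conditions for stratified designs may take a different form.

```python
from math import ceil, e, sqrt


def max_regret_bound(n_per_arm: int, outcome_range: float = 1.0) -> float:
    """Hoeffding-based upper bound on maximum regret of the empirical success rule.

    Assumes two arms with n_per_arm subjects each and outcomes bounded in an
    interval of width outcome_range.
    """
    return outcome_range / sqrt(2 * e * n_per_arm)


def min_n_per_arm(epsilon: float, outcome_range: float = 1.0) -> int:
    """Smallest per-arm sample size for which the bound above is at most epsilon."""
    return ceil(outcome_range**2 / (2 * e * epsilon**2))


# Example: outcomes in [0, 1] and a regret tolerance of 0.05.
n = min_n_per_arm(0.05)
```

The resulting sample size is sufficient but not necessary, since Hoeffding's inequality holds for every bounded distribution and can be loose for any particular one.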