-
Targeted tuning of random forests for quantile estimation and prediction intervals
Authors:
Matthew Berkowitz,
Rachel MacKay Altman,
Thomas M. Loughin
Abstract:
We present a novel tuning procedure for random forests (RFs) that improves the accuracy of estimated quantiles and produces valid, relatively narrow prediction intervals. While RFs are typically used to estimate mean responses (conditional on covariates), they can also be used to estimate quantiles by estimating the full distribution of the response. However, standard approaches for building RFs often result in excessively biased quantile estimates. To reduce this bias, our proposed tuning procedure minimizes "quantile coverage loss" (QCL), which we define as the estimated bias of the marginal quantile coverage probability estimate based on the out-of-bag sample. We adapt QCL tuning to handle censored data and demonstrate its use with random survival forests. We show that QCL tuning results in quantile estimates with more accurate coverage probabilities than those achieved using default parameter values or traditional tuning (using MSPE for uncensored data and C-index for censored data), while also reducing the estimated MSE of these coverage probabilities. We discuss how the superior performance of QCL tuning is linked to its alignment with the estimation goal. Finally, we explore the validity and width of prediction intervals created using this method.
Submitted 2 July, 2025;
originally announced July 2025.
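One way to read the QCL definition in the abstract above is as the gap between the nominal quantile level and the share of out-of-bag responses that fall at or below their estimated quantiles. The sketch below follows that reading only. It is not the authors' implementation: the use of an absolute value, the function names, and the idea of passing precomputed out-of-bag quantile estimates are all assumptions made for illustration.

```python
def quantile_coverage_loss(y_oob, q_oob, tau):
    """Assumed form of QCL: |empirical OOB coverage - tau|.

    y_oob: observed responses.
    q_oob: out-of-bag estimates of the tau-quantile, one per observation
           (how these are produced by the forest is outside this sketch).
    tau:   nominal quantile level, strictly between 0 and 1.
    """
    if len(y_oob) != len(q_oob) or len(y_oob) == 0:
        raise ValueError("y_oob and q_oob must be non-empty and equally long")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    # Share of responses at or below their estimated quantile.
    covered = sum(1 for y, q in zip(y_oob, q_oob) if y <= q)
    coverage = covered / len(y_oob)
    return abs(coverage - tau)


def pick_by_qcl(candidates, y_oob, tau):
    """Return the candidate setting with the smallest assumed QCL.

    candidates: dict mapping a (hypothetical) tuning-parameter value to the
                OOB quantile estimates obtained under that value.
    """
    return min(
        candidates,
        key=lambda k: quantile_coverage_loss(y_oob, candidates[k], tau),
    )
```

Under this reading, a forest would be refit for each candidate parameter value, and the value whose out-of-bag coverage sits closest to the target level would be kept.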
-
Alpha-Trimming: Locally Adaptive Tree Pruning for Random Forests
Authors:
Nikola Surjanovic,
Andrew Henrey,
Thomas M. Loughin
Abstract:
We demonstrate that adaptively controlling the size of individual regression trees in a random forest can improve predictive performance, contrary to the conventional wisdom that trees should be fully grown. A fast pruning algorithm, alpha-trimming, is proposed as an effective approach to pruning trees within a random forest, where more aggressive pruning is performed in regions with a low signal-to-noise ratio. The amount of overall pruning is controlled by adjusting the weight on an information criterion penalty as a tuning parameter, with the standard random forest being a special case of our alpha-trimmed random forest. A remarkable feature of alpha-trimming is that its tuning parameter can be adjusted without refitting the trees in the random forest after they have been fully grown once. In a benchmark suite of 46 example data sets, mean squared prediction error is often substantially lowered by using our pruning algorithm and is never substantially increased compared to a random forest with fully-grown trees at default parameter settings.
Submitted 13 August, 2024;
originally announced August 2024.
-
Improving the Hosmer-Lemeshow Goodness-of-Fit Test in Large Models with Replicated Trials
Authors:
Nikola Surjanovic,
Thomas M. Loughin
Abstract:
The Hosmer-Lemeshow (HL) test is a commonly used global goodness-of-fit (GOF) test that assesses the quality of the overall fit of a logistic regression model. In this paper, we give results from simulations showing that the type 1 error rate (and hence power) of the HL test decreases as model complexity grows, provided that the sample size remains fixed and binary replicates are present in the data. We demonstrate that the generalized version of the HL test by Surjanovic et al. (2020) can offer some protection against this power loss. We conclude with a brief discussion explaining the behaviour of the HL test, along with some guidance on how to choose between the two tests.
Submitted 27 October, 2023; v1 submitted 25 February, 2021;
originally announced February 2021.
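For orientation, the classic HL statistic that the abstract above refers to can be sketched as follows. This is the textbook form (sort by fitted probability, split into groups, compare observed and expected event counts), not the generalized test of Surjanovic et al. (2020). The function name, the equal-size grouping rule, and the handling of degenerate groups are assumptions made for illustration.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Classic Hosmer-Lemeshow statistic and its usual degrees of freedom.

    y: binary outcomes (0 or 1).
    p: fitted probabilities from a logistic regression, one per outcome.
    Returns (statistic, groups - 2); the statistic is conventionally
    compared with a chi-square distribution on groups - 2 degrees of freedom.
    """
    if len(y) != len(p) or len(y) == 0:
        raise ValueError("y and p must be non-empty and equally long")
    # Order observations by fitted probability.
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        # Roughly equal-sized groups (an assumed grouping rule).
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        p_bar = expected / n_g
        variance = n_g * p_bar * (1.0 - p_bar)
        if variance == 0.0:
            # Group with all fitted probabilities at 0 or 1; skipped here.
            continue
        stat += (observed - expected) ** 2 / variance
    return stat, groups - 2
```

A large statistic relative to the reference distribution suggests lack of fit. The abstract's point is that this reference distribution becomes unreliable as model complexity grows with the sample size held fixed.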
-
A Generalized Hosmer-Lemeshow Goodness-of-Fit Test for a Family of Generalized Linear Models
Authors:
Nikola Surjanovic,
Richard Lockhart,
Thomas M. Loughin
Abstract:
Generalized linear models (GLMs) are used within a vast number of application domains. However, formal goodness-of-fit (GOF) tests for the overall fit of the model (so-called "global" tests) seem to be in wide use only for certain classes of GLMs. In this paper we develop and apply a new global goodness-of-fit test, similar to the well-known and commonly used Hosmer-Lemeshow (HL) test, that can be used with a wide variety of GLMs. The test statistic is a variant of the HL test statistic, but we rigorously derive an asymptotically correct sampling distribution of the test statistic using methods of Stute and Zhu (2002). Our new test is relatively straightforward to implement and interpret. We demonstrate the test on a real data set, and compare the performance of our new test with other global GOF tests for GLMs, finding that our test provides competitive or comparable power in various simulation settings. Our test also avoids the use of kernel-based estimators, used in various GOF tests for regression, thereby avoiding the issues of bandwidth selection and the curse of dimensionality. Since the asymptotic sampling distribution is known, a bootstrap procedure for the calculation of a p-value is also not necessary, and we therefore find that performing our test is computationally efficient.
Submitted 25 February, 2021; v1 submitted 21 July, 2020;
originally announced July 2020.
-
A Comparison of Methods for Identifying Location Effects in Unreplicated Fractional Factorials in the Presence of Dispersion Effects
Authors:
Thomas M. Loughin,
Yan Zhang
Abstract:
Most methods for identifying location effects in unreplicated fractional factorial designs assume homoscedasticity of the response values. However, dispersion effects in the underlying process may create heteroscedasticity in the response values. This heteroscedasticity may go undetected when identification of location effects is pursued. Indeed, methods for identifying dispersion effects typically require first modeling location effects. Therefore, it is imperative to understand how methods for identifying location effects function in the presence of undetected dispersion effects. We used simulation studies to examine the robustness of four different methods for identifying location effects (Box and Meyer (1986), Lenth (1989), Berk and Picard (1991), and Loughin and Noble (1997)) under models with one, two, or three dispersion effects of varying sizes. We found that the first three methods usually performed acceptably with respect to error rates and power, but the Loughin-Noble method lost control of the individual error rate when moderate-to-large dispersion effects were present.
Submitted 24 April, 2019;
originally announced April 2019.
-
Display advertising: Estimating conversion probability efficiently
Authors:
Abdollah Safari,
Rachel MacKay Altman,
Thomas M. Loughin
Abstract:
The goal of online display advertising is to entice users to "convert" (i.e., take a pre-defined action such as making a purchase) after clicking on the ad. An important measure of the value of an ad is the probability of conversion. The focus of this paper is the development of a computationally efficient, accurate, and precise estimator of conversion probability. The challenges associated with this estimation problem are the delays in observing conversions and the size of the data set (both number of observations and number of predictors). Two models have previously been considered as a basis for estimation: a logistic regression model and a joint model for observed conversion statuses and delay times. Fitting the former is simple, but ignoring the delays in conversion leads to an underestimate of conversion probability. On the other hand, the latter is less biased but computationally expensive to fit. Our proposed estimator is a compromise between these two estimators. We apply our results to a data set from Criteo, a commerce marketing company that personalizes online display advertisements for users.
Submitted 23 October, 2017;
originally announced October 2017.