-
The risk of bias in denoising methods
Authors:
Kendrick Kay
Abstract:
Experimental datasets are growing rapidly in size, scope, and detail, but the value of these datasets is limited by unwanted measurement noise. It is therefore tempting to apply analysis techniques that attempt to reduce noise and enhance signals of interest. In this paper, we draw attention to the possibility that denoising methods may introduce bias and lead to incorrect scientific inferences. T…
▽ More
Experimental datasets are growing rapidly in size, scope, and detail, but the value of these datasets is limited by unwanted measurement noise. It is therefore tempting to apply analysis techniques that attempt to reduce noise and enhance signals of interest. In this paper, we draw attention to the possibility that denoising methods may introduce bias and lead to incorrect scientific inferences. To present our case, we first review the basic statistical concepts of bias and variance. Denoising techniques typically reduce variance observed across repeated measurements, but this can come at the expense of introducing bias to the average expected outcome. We then conduct three simple simulations that provide concrete examples of how bias may manifest in everyday situations. These simulations reveal several findings that may be surprising and counterintuitive: (i) different methods can be equally effective at reducing variance but some incur bias while others do not, (ii) identifying methods that better recover ground truth does not guarantee the absence of bias, (iii) bias can arise even if one has specific knowledge of properties of the signal of interest. We suggest that researchers should consider and possibly quantify bias before deploying denoising methods on important research data.
△ Less
Submitted 25 January, 2022; v1 submitted 23 January, 2022;
originally announced January 2022.
-
Fractional ridge regression: a fast, interpretable reparameterization of ridge regression
Authors:
Ariel Rokem,
Kendrick Kay
Abstract:
Ridge regression (RR) is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using RR is the need to set a hyperparameter ($α$) that controls the amount of regularization. Cross-validation is typically used to select the best $α$ from a set of candidates. However, efficient and appropriate selection of $α$ can be challenging, par…
▽ More
Ridge regression (RR) is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using RR is the need to set a hyperparameter ($α$) that controls the amount of regularization. Cross-validation is typically used to select the best $α$ from a set of candidates. However, efficient and appropriate selection of $α$ can be challenging, particularly where large amounts of data are analyzed. Because the selected $α$ depends on the scale of the data and predictors, it is not straightforwardly interpretable. Here, we propose to reparameterize RR in terms of the ratio $γ$ between the L2-norms of the regularized and unregularized coefficients. This approach, called fractional RR (FRR), has several benefits: the solutions obtained for different $γ$ are guaranteed to vary, guarding against wasted calculations, and automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. We provide an algorithm to solve FRR, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems, and delivers results that are straightforward to interpret and compare across models and datasets.
△ Less
Submitted 6 May, 2020;
originally announced May 2020.
-
Encoding and decoding V1 fMRI responses to natural images with sparse nonparametric models
Authors:
Vincent Q. Vu,
Pradeep Ravikumar,
Thomas Naselaris,
Kendrick N. Kay,
Jack L. Gallant,
Bin Yu
Abstract:
Functional MRI (fMRI) has become the most common method for investigating the human brain. However, fMRI data present some complications for statistical analysis and modeling. One recently developed approach to these data focuses on estimation of computational encoding models that describe how stimuli are transformed into brain activity measured in individual voxels. Here we aim at building encodi…
▽ More
Functional MRI (fMRI) has become the most common method for investigating the human brain. However, fMRI data present some complications for statistical analysis and modeling. One recently developed approach to these data focuses on estimation of computational encoding models that describe how stimuli are transformed into brain activity measured in individual voxels. Here we aim at building encoding models for fMRI signals recorded in the primary visual cortex of the human brain. We use residual analyses to reveal systematic nonlinearity across voxels not taken into account by previous models. We then show how a sparse nonparametric method [J. Roy. Statist. Soc. Ser. B 71 (2009b) 1009-1030] can be used together with correlation screening to estimate nonlinear encoding models effectively. Our approach produces encoding models that predict about 25% more accurately than models estimated using other methods [Nature 452 (2008a) 352--355]. The estimated nonlinearity impacts the inferred properties of individual voxels, and it has a plausible biological interpretation. One benefit of quantitative encoding models is that estimated models can be used to decode brain activity, in order to identify which specific image was seen by an observer. Encoding models estimated by our approach also improve such image identification by about 12% when the correct image is one of 11,500 possible images.
△ Less
Submitted 25 July, 2011; v1 submitted 14 April, 2011;
originally announced April 2011.