-
To Ensemble or Not Ensemble: When does End-To-End Training Fail?
Authors:
Andrew M. Webb,
Charles Reynolds,
Wenlin Chen,
Henry Reeve,
Dan-Andrei Iliescu,
Mikel Lujan,
Gavin Brown
Abstract:
End-to-End training (E2E) is becoming more and more popular to train complex Deep Network architectures. An interesting question is whether this trend will continue: are there any clear failure cases for E2E training? We study this question in depth, for the specific case of E2E training an ensemble of networks. Our strategy is to blend the gradient smoothly in between two extremes: from independent training of the networks, up to full E2E training. We find clear failure cases, where over-parameterized models cannot be trained E2E. A surprising result is that the optimum can sometimes lie in between the two, neither an ensemble nor an E2E system. The work also uncovers links to Dropout, and raises questions around the nature of ensemble diversity and multi-branch networks.
Submitted 6 August, 2020; v1 submitted 12 February, 2019;
originally announced February 2019.
-
FMRI Clustering and False Positive Rates
Authors:
Robert W. Cox,
Gang Chen,
Daniel R. Glen,
Richard C. Reynolds,
Paul A. Taylor
Abstract:
Recently, Eklund et al. (2016) analyzed clustering methods in standard FMRI packages: AFNI (which we maintain), FSL, and SPM [1]. They claimed: 1) false positive rates (FPRs) in traditional approaches are greatly inflated, questioning the validity of "countless published fMRI studies"; 2) nonparametric methods produce valid, but slightly conservative, FPRs; 3) a common flawed assumption is that the spatial autocorrelation function (ACF) of FMRI noise is Gaussian-shaped; and 4) a 15-year-old bug in AFNI's 3dClustSim significantly contributed to producing "particularly high" FPRs compared to other software. We repeated simulations from [1] (Beijing-Zang data [2], see [3]), and comment on each point briefly.
Submitted 15 February, 2017;
originally announced February 2017.
-
FMRI Clustering in AFNI: False Positive Rates Redux
Authors:
Robert W. Cox,
Gang Chen,
Daniel R. Glen,
Richard C. Reynolds,
Paul A. Taylor
Abstract:
Recent reports of inflated false positive rates (FPRs) in FMRI group analysis tools by Eklund et al. (2016) have become a large topic within (and outside) neuroimaging. They concluded that: existing parametric methods for determining statistically significant clusters had greatly inflated FPRs ("up to 70%," mainly due to the faulty assumption that the noise spatial autocorrelation function is Gaussian-shaped and stationary), calling into question potentially "countless" previous results; in contrast, nonparametric methods, such as their approach, accurately reflected nominal 5% FPRs. They also stated that AFNI showed "particularly high" FPRs compared to other software, largely due to a bug in 3dClustSim. We comment on these points using their own results and figures and by repeating some of their simulations. Briefly, while parametric methods show some FPR inflation in those tests (and assumptions of Gaussian-shaped spatial smoothness also appear to be generally incorrect), their emphasis on reporting the single worst result from thousands of simulation cases greatly exaggerated the scale of the problem. Importantly, FPR statistics depend on "task" paradigm and voxelwise p-value threshold; as such, we show how results of their study provide useful suggestions for FMRI study design and analysis, rather than simply a catastrophic downgrading of the field's earlier results. Regarding AFNI (which we maintain), the effect of 3dClustSim's bug was greatly overstated: their own results show that AFNI results were not "particularly" worse than others. We describe further updates in AFNI for characterizing spatial smoothness more appropriately (greatly reducing FPRs, though some remain >5%); additionally, we outline two newly implemented permutation/randomization-based approaches producing FPRs clustered much more tightly about 5% for voxelwise p<=0.01.
Submitted 15 February, 2017;
originally announced February 2017.