Additive Bayesian variable selection under censoring and misspecification

Rossell, David; Rubio, Francisco Javier

arXiv:1907.13563v3 (stat)

[Submitted on 31 Jul 2019 (v1), revised 6 May 2020 (this version, v3), latest version 12 Nov 2021 (v5)]

Abstract: We study the interplay of two important issues in Bayesian model selection (BMS): censoring and model misspecification. We consider additive accelerated failure time (AAFT), Cox proportional hazards and probit models, and a more general concave log-likelihood structure. A fundamental question is what solution one can hope BMS will provide when models are (inevitably) misspecified. We show that asymptotically BMS keeps any covariate with predictive power for either the outcome or censoring times, and discards other covariates. Misspecification refers to assuming the wrong model or functional effect on the response, including using a finite basis for a truly non-parametric effect, or omitting truly relevant covariates. We argue for using simple models that are computationally practical yet attain good power to detect potentially complex effects, despite misspecification. Misspecification and censoring both have an asymptotically negligible effect on (suitably defined) false positives, but their impact on power is exponential. We portray these issues via simple descriptions of early/late censoring and the drop in predictive accuracy due to misspecification. From a methods point of view, we consider local priors and a novel structure that combines local and non-local priors to enforce sparsity. We develop algorithms that capitalize on the tractability of the AAFT model, approximations to the AAFT and probit likelihoods that give significant computational gains, a simple augmented Gibbs sampler to hierarchically explore linear and non-linear effects, and an implementation in the R package mombf. We illustrate the proposed methods, and others based on likelihood penalties, via extensive simulations under misspecification and censoring. We present two applications concerning the effect of gene expression on colon and breast cancer.

Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO)
Cite as: arXiv:1907.13563 [stat.ME] (or arXiv:1907.13563v3 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.1907.13563

Submission history

From: Francisco Javier Rubio
[v1] Wed, 31 Jul 2019 15:43:40 UTC (176 KB)
[v2] Wed, 2 Oct 2019 22:19:24 UTC (207 KB)
[v3] Wed, 6 May 2020 18:39:21 UTC (212 KB)
[v4] Tue, 30 Mar 2021 15:18:10 UTC (236 KB)
[v5] Fri, 12 Nov 2021 10:35:57 UTC (230 KB)
