Prior-preconditioned conjugate gradient method for accelerated Gibbs sampling in "large $n$ & large $p$" Bayesian sparse regression

Nishimura, Akihiko; Suchard, Marc A.

doi:10.1080/01621459.2022.2057859

arXiv:1810.12437 (stat)

[Submitted on 29 Oct 2018 (v1), last revised 3 Mar 2022 (this version, v6)]

Abstract: In a modern observational study based on healthcare databases, the numbers of observations and predictors typically range on the order of $10^5$ to $10^6$ and $10^4$ to $10^5$, respectively. Despite the large sample size, the data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being Bayesian methods based on shrinkage priors. In the "large $n$ & large $p$" setting, however, posterior computation encounters a major bottleneck in repeated sampling from a high-dimensional Gaussian distribution whose precision matrix $\Phi$ is expensive to compute and factorize. In this article, we present a novel algorithm to speed up this bottleneck based on the following observation: we can cheaply generate a random vector $b$ such that the solution to the linear system $\Phi \beta = b$ has the desired Gaussian distribution. We can then solve the linear system with the conjugate gradient (CG) algorithm, which requires only matrix-vector multiplications by $\Phi$ and involves no explicit factorization or calculation of $\Phi$ itself. Rapid convergence of CG in this context is guaranteed by the theory of prior-preconditioning we develop. We apply our algorithm to a clinically relevant large-scale observational study with n = 72,489 patients and p = 22,175 clinical covariates, designed to assess the relative risk of adverse events from two alternative blood anti-coagulants. Our algorithm demonstrates an order of magnitude speed-up in posterior inference, in our case cutting the computation time from two weeks to less than a day.
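The observation in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' software package, and it rests on assumptions the abstract does not spell out: the precision matrix is taken to have the form $\Phi = X^T \Omega X + D^{-2}$, where $\Omega$ is a diagonal weight matrix and $D$ holds the conditional prior standard deviations of the coefficients, and the target distribution is $N(\Phi^{-1} X^T \Omega y, \Phi^{-1})$. The names `sample_gaussian_via_cg`, `omega`, and `prior_sd` are hypothetical. The random vector $b$ is built to have mean $X^T \Omega y$ and covariance $\Phi$, so that $\Phi^{-1} b$ has the target distribution, and the system is solved by CG preconditioned with the prior precision $D^{-2}$.

```python
import numpy as np


def sample_gaussian_via_cg(X, y, omega, prior_sd, rng, rtol=1e-8, max_iter=500):
    """Draw beta ~ N(Phi^{-1} X^T Omega y, Phi^{-1}) without forming Phi.

    Assumed form (not stated in the abstract):
        Phi = X^T diag(omega) X + diag(prior_sd ** -2).
    """
    n, p = X.shape

    def phi_matvec(v):
        # Multiply by Phi using only matrix-vector products with X.
        return X.T @ (omega * (X @ v)) + v / prior_sd**2

    # Random right-hand side with mean X^T Omega y and covariance Phi.
    eta = rng.standard_normal(n)
    delta = rng.standard_normal(p)
    b = X.T @ (omega * y) + X.T @ (np.sqrt(omega) * eta) + delta / prior_sd

    # Preconditioned CG. The preconditioner is the prior precision, so
    # applying its inverse is an elementwise multiplication by prior_sd^2.
    beta = np.zeros(p)
    r = b.copy()  # residual b - Phi @ beta at the starting point beta = 0
    z = prior_sd**2 * r
    d = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        phi_d = phi_matvec(d)
        alpha = rz / (d @ phi_d)
        beta += alpha * d
        r -= alpha * phi_d
        if np.linalg.norm(r) <= rtol * b_norm:
            break
        z = prior_sd**2 * r
        rz_new = r @ z
        d = z + (rz_new / rz) * d
        rz = rz_new
    return beta, b, n_iter


# Small synthetic example: a few weakly shrunk coefficients, the rest
# strongly shrunk toward zero (values chosen only for illustration).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
omega = np.ones(n)
prior_sd = np.where(np.arange(p) < 5, 1.0, 0.01)
beta, b, n_iter = sample_gaussian_via_cg(X, y, omega, prior_sd, rng)
print("CG iterations:", n_iter)
```

In this sketch each CG iteration costs two matrix-vector products with $X$, and $\Phi$ is never formed or factorized. The iteration count it reports on toy data says nothing about the convergence behavior analyzed in the paper.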

Comments:	36 pages, 7 figures + Supplement; Software package available --- see documentation at this https URL and source code at this https URL
Subjects:	Computation (stat.CO); Machine Learning (stat.ML)
Cite as:	arXiv:1810.12437 [stat.CO]
	(or arXiv:1810.12437v6 [stat.CO] for this version)
	https://doi.org/10.48550/arXiv.1810.12437
Related DOI:	https://doi.org/10.1080/01621459.2022.2057859

Submission history

From: Akihiko Nishimura
[v1] Mon, 29 Oct 2018 22:21:56 UTC (484 KB)
[v2] Sun, 9 Dec 2018 15:24:01 UTC (1,229 KB)
[v3] Mon, 4 Mar 2019 16:44:40 UTC (1,231 KB)
[v4] Fri, 17 Jan 2020 17:35:58 UTC (1,224 KB)
[v5] Tue, 20 Jul 2021 18:01:22 UTC (940 KB)
[v6] Thu, 3 Mar 2022 19:05:48 UTC (1,789 KB)
