-
The R Package BHAM: Fast and Scalable Bayesian Hierarchical Additive Model for High-dimensional Data
Authors:
Boyi Guo,
Nengjun Yi
Abstract:
BHAM is a freely available R package that implements Bayesian hierarchical additive models for high-dimensional clinical and genomic data. The package includes functions that fit generalized additive models and Cox additive models with the spike-and-slab LASSO prior. These functions implement scalable and stable algorithms to estimate parameters. BHAM also provides utility functions to construct additive models in high-dimensional settings, select optimal models, summarize bi-level variable selection results, and visualize nonlinear effects. The package can facilitate flexible modeling of large-scale molecular data, i.e., detecting susceptible variables and inferring disease diagnosis and prognosis. In this article, we describe the models, algorithms, and related features implemented in BHAM. The package is freely available via the public GitHub repository https://github.com/boyiguo1/BHAM.
Submitted 5 July, 2022;
originally announced July 2022.
-
A scalable and flexible Cox proportional hazards model for high-dimensional survival prediction and functional selection
Authors:
Boyi Guo,
Nengjun Yi
Abstract:
The Cox proportional hazards (PH) model is one of the most popular models in biomedical data analysis. There have been continuing efforts to improve the flexibility of such models for complex signal detection, for example, via additive functions. Nevertheless, extending Cox additive models to accommodate high-dimensional data is nontrivial. When estimating additive functions, commonly used group sparse regularization may introduce excess smoothing shrinkage on additive functions, damaging predictive performance. Moreover, an "all-in-all-out" approach to functional selection makes it difficult to determine whether nonlinear effects exist. We develop an additive Cox PH model to address these challenges in high-dimensional data analysis. Notably, we impose a novel spike-and-slab LASSO prior that motivates bi-level selection on additive functions. A deterministic algorithm, EM-Coordinate Descent, is designed for scalable model fitting. We compare the predictive and computational performance against state-of-the-art models in simulation studies and metabolomics data analysis. The proposed model is broadly applicable to various fields of research, e.g., genomics and population health, via the freely available R package BHAM (https://boyiguo1.github.io/BHAM/).
Submitted 23 May, 2022;
originally announced May 2022.
-
Spike-and-Slab LASSO Generalized Additive Models and Scalable Algorithms for High-Dimensional Data Analysis
Authors:
Boyi Guo,
Byron C. Jaeger,
A. K. M. Fazlur Rahman,
D. Leann Long,
Nengjun Yi
Abstract:
There are proposals that extend the classical generalized additive models (GAMs) to accommodate high-dimensional data ($p \gg n$) using group sparse regularization. However, the sparse regularization may induce excess shrinkage when estimating smooth functions, damaging predictive performance. Moreover, most of these GAMs take an "all-in-all-out" approach to functional selection, making it difficult to determine whether nonlinear effects are necessary. While some Bayesian models can address these shortcomings, using Markov chain Monte Carlo algorithms for model fitting creates a new challenge: scalability. Hence, we propose Bayesian hierarchical generalized additive models as a solution: we consider the smoothing penalty for proper shrinkage of curve interpolation via reparameterization. A novel two-part spike-and-slab LASSO prior for smooth functions is developed to address the sparsity of signals while providing extra flexibility to select the linear or nonlinear components of smooth functions. A scalable and deterministic algorithm, EM-Coordinate Descent, is implemented in an open-source R package, BHAM. Simulation studies and metabolomics data analyses demonstrate improved predictive and computational performance against state-of-the-art models. Functional selection performance suggests that trade-offs exist regarding the effect hierarchy assumption.
Submitted 16 May, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.