-
Semiparametric quantile functional regression analysis of adolescent physical activity distributions in the presence of missing data
Authors:
Benny Ren,
Ian Barnett,
Haochang Shou,
Jeremy Rubin,
Hongxiao Zhu,
Terry Conway,
Kelli Cain,
Brian Saelens,
Karen Glanz,
James Sallis,
Jeffrey S. Morris
Abstract:
In the age of digital healthcare, passively collected physical activity profiles from wearable sensors are a preeminent tool for evaluating health outcomes. In order to fully leverage the vast amounts of data collected through wearable accelerometers, we propose to use quantile functional regression to model activity profiles as distributional outcomes through quantile responses, which can be used to evaluate activity level differences across covariates based on any desired distributional summary. Our proposed framework addresses two key problems not handled in existing distributional regression literature. First, we use spline mixed model formulations in the basis space to model nonparametric effects of continuous predictors on the distributional response. Second, we address the underlying missingness problem that is common in these types of wearable data but typically not addressed. We show that the missingness can induce bias in the subject-specific distributional summaries that leads to biased distributional regression estimates and can even bias the frequently used scalar summary measures, and introduce a nonparametric function-on-function modeling approach that adjusts for each subject's missingness profile to address this problem. We evaluate our nonparametric modeling and missing data adjustment using simulation studies based on realistically simulated activity profiles and use it to gain insights into adolescent activity profiles from the Teen Environment and Neighborhood study.
Submitted 19 November, 2024;
originally announced November 2024.
-
Covariance Assisted Multivariate Penalized Additive Regression (CoMPAdRe)
Authors:
Neel Desai,
Veerabhadran Baladandayuthapani,
Russell T. Shinohara,
Jeffrey S. Morris
Abstract:
We propose a new method for the simultaneous selection and estimation of multivariate sparse additive models with correlated errors. Our method called Covariance Assisted Multivariate Penalized Additive Regression (CoMPAdRe) simultaneously selects among null, linear, and smooth non-linear effects for each predictor while incorporating joint estimation of the sparse residual structure among responses, with the motivation that accounting for inter-response correlation structure can lead to improved accuracy in variable selection and estimation efficiency. CoMPAdRe is constructed in a computationally efficient way that allows the selection and estimation of linear and non-linear covariates to be conducted in parallel across responses. Compared to single-response approaches that marginally select linear and non-linear covariate effects, we demonstrate in simulation studies that the joint multivariate modeling leads to gains in both estimation efficiency and selection accuracy, of greater magnitude in settings where signal is moderate relative to the level of noise. We apply our approach to protein-mRNA expression levels from multiple breast cancer pathways obtained from The Cancer Proteome Atlas and characterize both mRNA-protein associations and protein-protein subnetworks for each pathway. We find non-linear mRNA-protein associations for the Core Reactive, EMT, PIK-AKT, and RTK pathways.
Submitted 18 November, 2023; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Early-Phase Local-Area Model for Pandemics Using Limited Data: A SARS-CoV-2 Application
Authors:
Jiasheng Shi,
Jeffrey S. Morris,
David M. Rubin,
Jing Huang
Abstract:
The emergence of novel infectious agents presents challenges to statistical models of disease transmission. These challenges arise from limited, poor-quality data and an incomplete understanding of the agent. Moreover, outbreaks manifest differently across regions due to various factors, making it imperative for models to factor in regional specifics. In this work, we offer a model that effectively utilizes constrained data resources to estimate disease transmission rates at the local level, especially during the early outbreak phase when primarily infection counts and aggregated local characteristics are accessible. This model merges a pathogen transmission methodology based on daily infection numbers with regression techniques, drawing correlations between disease transmission and local-area factors, such as demographics, health policies, behavior, and even climate, to estimate and forecast daily infections. We incorporate the quasi-score method and an error term to navigate potential data concerns and mistaken assumptions. Additionally, we introduce an online estimator that facilitates real-time data updates, complemented by an iterative algorithm for parameter estimation. This approach facilitates real-time analysis of disease transmission when data quality is suboptimal and knowledge of the infectious pathogen is limited. It is particularly useful in the early stages of outbreaks, providing support for local decision-making.
Submitted 18 March, 2024; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Novel Bayesian method for simultaneous detection of activation signatures and background connectivity for task fMRI data
Authors:
Michelle F. Miranda,
Jeffrey S. Morris
Abstract:
In this paper, we introduce a new Bayesian approach for analyzing task fMRI data that simultaneously detects activation signatures and background connectivity. Our modeling involves a new hybrid tensor spatial-temporal basis strategy that enables scalable computing yet captures nearby and distant intervoxel correlation and long-memory temporal correlation. The spatial basis involves a composite hybrid transform with two levels: the first accounts for within-ROI correlation, and the second for between-ROI distant correlation. We demonstrate in simulations how our basis space regression modeling strategy increases sensitivity for identifying activation signatures, partly driven by the induced background connectivity that itself can be summarized to reveal biological insights. This strategy leads to computationally scalable fully Bayesian inference at the voxel or ROI level that adjusts for multiple testing. We apply this model to Human Connectome Project data to reveal insights into brain activation patterns and background connectivity related to working memory tasks.
Submitted 1 December, 2023; v1 submitted 31 August, 2021;
originally announced September 2021.
-
Bayesian functional graphical models
Authors:
Lin Zhang,
Veera Baladandayuthapani,
Quinton Neville,
Karina Quevedo,
Jeffrey S. Morris
Abstract:
We develop a Bayesian graphical modeling framework for functional data for correlated multivariate random variables observed over a continuous domain. Our method leads to graphical Markov models for functional data which allows the graphs to vary over the functional domain. The model involves estimation of graphical models that evolve functionally in a nonparametric fashion while accounting for within-functional correlations and borrowing strength across functional positions so contiguous locations are encouraged but not forced to have similar graph structure and edge strength. We utilize a strategy that combines nonparametric basis function modeling with modified Bayesian graphical regularization techniques, which induces a new class of hypoexponential normal scale mixture distributions that not only leads to adaptively shrunken estimators of the conditional cross-covariance but also facilitates a thorough theoretical investigation of the shrinkage properties. Our approach scales up to large functional datasets collected on a fine grid. We show through simulations and real data analysis that the Bayesian functional graphical model can efficiently reconstruct the functionally-evolving graphical models by accounting for within-function correlations.
Submitted 11 August, 2021;
originally announced August 2021.
-
Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer
Authors:
Zeya Wang,
Veera Baladandayuthapani,
Ahmed O. Kaseb,
Hesham M. Amin,
Manal M. Hassan,
Wenyi Wang,
Jeffrey S. Morris
Abstract:
Graphical models are commonly used to discover associations within gene or protein networks for complex diseases such as cancer. Most existing methods estimate a single graph for a population, while in many cases, researchers are interested in characterizing the heterogeneity of individual networks across subjects with respect to subject-level covariates. Examples include assessments of how the network varies with patient-specific prognostic scores or comparisons of tumor and normal graphs while accounting for tumor purity as a continuous predictor. In this paper, we propose a novel edge regression model for undirected graphs, which estimates conditional dependencies as a function of subject-level covariates. Bayesian shrinkage algorithms are used to induce sparsity in the underlying graphical models. We assess our model performance through simulation studies focused on comparing tumor and normal graphs while adjusting for tumor purity and a case study assessing how blood protein networks in hepatocellular carcinoma patients vary with severity of disease, measured by HepatoScore, a novel biomarker signature measuring disease severity.
Submitted 23 January, 2021;
originally announced January 2021.
-
Scalable Function-on-Scalar Quantile Regression for Densely Sampled Functional Data
Authors:
Yusha Liu,
Meng Li,
Jeffrey S. Morris
Abstract:
Functional quantile regression (FQR) is a useful alternative to mean regression for functional data as it provides a comprehensive understanding of how scalar predictors influence the conditional distribution of functional responses. In this article, we study the FQR model for densely sampled, high-dimensional functional data without relying on parametric error or independent stochastic process assumptions, with the focus on statistical inference under this challenging regime along with scalable implementation. This is achieved by a simple but powerful distributed strategy, in which we first perform separate quantile regression to compute $M$-estimators at each sampling location, and then carry out estimation and inference for the entire coefficient functions by properly exploiting the uncertainty quantification and dependence structure of $M$-estimators. We derive a uniform Bahadur representation and a strong Gaussian approximation result for the $M$-estimators on the discrete sampling grid, leading to dimension reduction and serving as the basis for inference. An interpolation-based estimator with minimax optimality is proposed, and large sample properties for point and simultaneous interval estimators are established. The obtained minimax optimal rate under the FQR model shows an interesting phase transition phenomenon that has been previously observed in functional mean regression. The proposed methods are illustrated via simulations and an application to a mass spectrometry proteomics dataset.
Submitted 6 November, 2023; v1 submitted 9 February, 2020;
originally announced February 2020.
-
Ordinal Probit Functional Outcome Regression with Application to Computer-Use Behavior in Rhesus Monkeys
Authors:
Mark J. Meyer,
Jeffrey S. Morris,
Regina Paxton Gazes,
Brent A. Coull
Abstract:
Research in functional regression has made great strides in expanding to non-Gaussian functional outcomes, but exploration of ordinal functional outcomes remains limited. Motivated by a study of computer-use behavior in rhesus macaques (Macaca mulatta), we introduce the Ordinal Probit Functional Outcome Regression model (OPFOR). OPFOR models can be fit using one of several basis functions including penalized B-splines, wavelets, and O'Sullivan splines -- the last of which typically performs best. Simulation using a variety of underlying covariance patterns shows that the model performs reasonably well in estimation under multiple basis functions with near nominal coverage for joint credible intervals. Finally, in application, we use Bayesian model selection criteria adapted to functional outcome regression to best characterize the relation between several demographic factors of interest and the monkeys' computer use over the course of a year. In comparison with a standard ordinal longitudinal analysis, OPFOR outperforms a cumulative-link mixed-effects model in simulation and provides additional and more nuanced information on the nature of the monkeys' computer-use behavior.
Submitted 18 March, 2021; v1 submitted 23 January, 2019;
originally announced January 2019.
-
Regression Analyses of Distributions using Quantile Functional Regression
Authors:
Hojin Yang,
Veerabhadran Baladandayuthapani,
Arvind U. K. Rao,
Jeffrey S. Morris
Abstract:
Radiomics involves the study of tumor images to identify quantitative markers explaining cancer heterogeneity. The predominant approach is to extract hundreds to thousands of image features, including histogram features comprised of summaries of the marginal distribution of pixel intensities, which leads to multiple testing problems and can miss out on insights not contained in the selected features. In this paper, we present methods to model the entire marginal distribution of pixel intensities via the quantile function as functional data, regressed on a set of demographic, clinical, and genetic predictors. We call this approach quantile functional regression, regressing subject-specific marginal distributions across repeated measurements on a set of covariates, allowing us to assess which covariates are associated with the distribution in a global sense, as well as to identify distributional features characterizing these differences, including mean, variance, skewness, and various upper and lower quantiles. To account for smoothness in the quantile functions, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set. We fit this model using a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and provides fully Bayesian inference after fitting a Markov chain Monte Carlo. We demonstrate the benefit of the basis space modeling through simulation studies, and apply the method to a magnetic resonance imaging (MRI)-based radiomic dataset from glioblastoma multiforme to relate imaging-based quantile functions to demographic, clinical, and genetic predictors, finding specific differences in tumor pixel intensity distribution between males and females and between tumors with and without DDIT3 mutations.
Submitted 4 October, 2018;
originally announced October 2018.
-
Function-on-Scalar Quantile Regression with Application to Mass Spectrometry Proteomics Data
Authors:
Yusha Liu,
Meng Li,
Jeffrey S. Morris
Abstract:
Mass spectrometry proteomics, characterized by spiky, spatially heterogeneous functional data, can be used to identify potential cancer biomarkers. Existing mass spectrometry analyses utilize mean regression to detect spectral regions that are differentially expressed across groups. However, given the inter-patient heterogeneity that is a key hallmark of cancer, many biomarkers are only present at aberrant levels for a subset of, not all, cancer samples. Differences in these biomarkers can easily be missed by mean regression, but might be more easily detected by quantile-based approaches. Thus, we propose a unified Bayesian framework to perform quantile regression on functional responses. Our approach utilizes an asymmetric Laplace working likelihood, represents the functional coefficients with basis representations which enable borrowing of strength from nearby locations, and places a global-local shrinkage prior on the basis coefficients to achieve adaptive regularization. Different types of basis transform and continuous shrinkage priors can be used in our framework. A scalable Gibbs sampler is developed to generate posterior samples that can be used to perform Bayesian estimation and inference while accounting for multiple testing. Our framework performs quantile regression and coefficient regularization in a unified manner, allowing them to inform each other and leading to improvement in performance over competing methods as demonstrated by simulation studies. We also introduce an adjustment procedure to the model to improve its frequentist properties of posterior inference. We apply our model to identify proteomic biomarkers of pancreatic cancer that are differentially expressed for a subset of cancer patients compared to the normal controls, which were missed by previous mean-regression based approaches. Supplementary materials for this article are available online.
Submitted 3 October, 2019; v1 submitted 1 September, 2018;
originally announced September 2018.
-
Bayesian Semiparametric Functional Mixed Models for Serially Correlated Functional Data, with Application to Glaucoma Data
Authors:
Wonyul Lee,
Michelle F. Miranda,
Philip Rausch,
Veerabhadran Baladandayuthapani,
Massimo Fazio,
J. Crawford Downs,
Jeffrey S. Morris
Abstract:
Glaucoma, a leading cause of blindness, is characterized by optic nerve damage related to intraocular pressure (IOP), but its full etiology is unknown. Researchers at UAB have devised a custom device to measure scleral strain continuously around the eye under fixed levels of IOP, which here is used to assess how strain varies around the posterior pole, with IOP, and across glaucoma risk factors such as age. The hypothesis is that scleral strain decreases with age, which could alter biomechanics of the optic nerve head and cause damage that could eventually lead to glaucoma. To evaluate this hypothesis, we adapted Bayesian Functional Mixed Models to model these complex data consisting of correlated functions on the spherical scleral surface, with nonparametric age effects allowed to vary in magnitude and smoothness across the scleral surface, multi-level random effect functions to capture within-subject correlation, and functional growth curve terms to capture serial correlation across IOPs that can vary around the scleral surface. Our method yields fully Bayesian inference on the scleral surface or any aggregation or transformation thereof, and reveals interesting insights into the biomechanical etiology of glaucoma. The general modeling framework described is very flexible and applicable to many complex, high-dimensional functional data.
Submitted 7 May, 2018; v1 submitted 23 February, 2018;
originally announced February 2018.
-
Quantile Functional Regression using Quantlets
Authors:
Hojin Yang,
Veerabhadran Baladandayuthapani,
Jeffrey S. Morris
Abstract:
In this paper, we develop a quantile functional regression modeling framework that models the distribution of a set of common repeated observations from a subject through the quantile function, which is regressed on a set of covariates to determine how these factors affect various aspects of the underlying subject-specific distribution. To account for smoothness in the quantile functions, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set and containing a Gaussian subspace so non-Gaussianness can be assessed. While these quantlets could be used within various functional regression frameworks, we build a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and allows fully Bayesian inferences after fitting a Markov chain Monte Carlo. Specifically, we apply global tests to assess which covariates have any effect on the distribution at all, followed by local tests to identify at which specific quantiles the differences lie while adjusting for multiple testing, and to assess whether the covariate affects certain major aspects of the distribution, including location, scale, skewness, Gaussianness, or tails. If the difference lies in these commonly-used summaries, our approach can still detect them, but our systematic modeling strategy can also detect effects on other aspects of the distribution that might be missed if one restricted attention to pre-chosen summaries. We demonstrate the benefit of the basis space modeling through simulation studies, and illustrate the method using a biomedical imaging data set in which we relate the distribution of pixel intensities from a tumor image to various demographic, clinical, and genetic characteristics.
Submitted 31 October, 2017;
originally announced November 2017.
-
Functional Regression
Authors:
Jeffrey S. Morris
Abstract:
Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article will focus on functional regression, the area of FDA that has received the most attention in applications and methodological development. First will be an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization will be discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. At the end is a brief discussion describing potential areas of future development in this field.
Submitted 16 June, 2014;
originally announced June 2014.
-
Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data
Authors:
Jeffrey S. Morris,
Veerabhadran Baladandayuthapani,
Richard C. Herrick,
Pietro Sanna,
Howard Gutstein
Abstract:
Image data are increasingly encountered and are of growing importance in many areas of science. Much of these data are quantitative image data, which are characterized by intensities that represent some measurement of interest in the scanned images. The data typically consist of multiple images on the same domain and the goal of the research is to combine the quantitative information across images to make inference about populations or interventions. In this paper we present a unified analysis framework for the analysis of quantitative image data using a Bayesian functional mixed model approach. This framework is flexible enough to handle complex, irregular images with many local features, and can model the simultaneous effects of multiple factors on the image intensities and account for the correlation between images induced by the design. We introduce a general isomorphic modeling approach to fitting the functional mixed model, of which the wavelet-based functional mixed model is one special case. With suitable modeling choices, this approach leads to efficient calculations and can result in flexible modeling and adaptive smoothing of the salient features in the data. The proposed method has the following advantages: it can be run automatically, it produces inferential plots indicating which regions of the image are associated with each factor, it simultaneously considers the practical and statistical significance of findings, and it controls the false discovery rate.
Submitted 19 August, 2011;
originally announced August 2011.
-
Online Variational Bayes Inference for High-Dimensional Correlated Data
Authors:
Sylvie Tchumtchoua,
David B. Dunson,
Jeffrey S. Morris
Abstract:
High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. The analysis of such data poses many computational challenges, especially when the observations are correlated over time and/or across space. In this paper we propose flexible hierarchical regression models for analyzing such data that accommodate serial and/or spatial correlation. We address the computational challenges involved in fitting these models by adopting an approximate inference framework. We develop an online variational Bayes algorithm that works by incrementally reading the data into memory one portion at a time. The performance of the method is assessed through simulation studies. We applied the methodology to analyze signal intensity in MRI images of subjects with knee osteoarthritis, using data from the Osteoarthritis Initiative.
Submitted 4 August, 2011;
originally announced August 2011.