-
Bayesian nonparametric mixtures of Archimedean copulas
Authors:
Ruyi Pan,
Luis E. Nieto-Barajas,
Radu V. Craiu
Abstract:
Copula-based dependence modeling often relies on parametric formulations. This is mathematically convenient, but can be statistically inefficient when the parametric families are not suitable for the data and model in focus. A Bayesian nonparametric mixture of Archimedean copulas is introduced to increase the flexibility of copula-based dependence modeling. Specifically, the Poisson-Dirichlet process is used as a mixing distribution over the Archimedean copulas' parameter. Properties of the mixture model are studied for the main Archimedean families, and posterior distributions are sampled via their full conditional distributions. The performance of the model is illustrated via numerical experiments involving simulated and real data.
Submitted 29 April, 2025; v1 submitted 12 December, 2024;
originally announced December 2024.
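Here is a minimal Python sketch of the mixing idea. It is not the authors' implementation. It builds a truncated Poisson-Dirichlet (Pitman-Yor) stick-breaking mixture over the parameter of one Archimedean family, the Clayton copula. Several pieces are illustrative assumptions: the Clayton family, the Gamma(2, 1) base measure, the truncation at 20 atoms, and the discount and concentration values. The final line relies on a general fact: any mixture of copulas keeps uniform margins, so C(u, 1) = u.

```python
import random


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v; theta) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def pitman_yor_weights(discount, concentration, n_atoms, rng):
    """Truncated stick-breaking weights of a Poisson-Dirichlet (Pitman-Yor) process.

    V_k ~ Beta(1 - discount, concentration + k * discount) and
    w_k = V_k * prod_{j<k} (1 - V_j); weights are renormalised after truncation.
    """
    weights, remaining = [], 1.0
    for k in range(1, n_atoms + 1):
        v = rng.betavariate(1.0 - discount, concentration + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    total = sum(weights)
    return [w / total for w in weights]


def mixture_cdf(u, v, weights, thetas):
    """A mixture of copulas is a copula: C(u, v) = sum_k w_k * C(u, v; theta_k)."""
    return sum(w * clayton_cdf(u, v, t) for w, t in zip(weights, thetas))


rng = random.Random(2024)
weights = pitman_yor_weights(discount=0.25, concentration=1.0, n_atoms=20, rng=rng)
thetas = [rng.gammavariate(2.0, 1.0) for _ in weights]  # hypothetical base measure
print(mixture_cdf(0.3, 1.0, weights, thetas))  # uniform margin: 0.3 up to rounding
```

Because every Clayton component with theta > 0 is positively dependent, the mixture CDF lies between the independence copula uv and the upper bound min(u, v).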
-
Rare Event Classification with Weighted Logistic Regression for Identifying Repeating Fast Radio Bursts
Authors:
Antonio Herrera-Martin,
Radu V. Craiu,
Gwendolyn M. Eadie,
David C. Stenning,
Derek Bingham,
Bryan M. Gaensler,
Ziggy Pleunis,
Paul Scholz,
Ryan Mckinven,
Bikash Kharel,
Kiyoshi W. Masui
Abstract:
An important task in the study of fast radio bursts (FRBs) remains the automatic classification of repeating and non-repeating sources based on their morphological properties. We propose a statistical model that uses a modified logistic regression to classify FRB sources. The classical logistic regression model is modified to accommodate the small proportion of repeaters in the data, a feature that is likely due to the sampling procedure and duration and is not a characteristic of the population of FRB sources. The weighted logistic regression hinges on the choice of a tuning parameter that represents the true proportion $τ$ of repeating FRB sources in the entire population. The proposed method has a sound statistical foundation, direct interpretability, and operates with only 5 parameters, enabling quicker retraining with added data. Using the CHIME/FRB Collaboration sample of repeating and non-repeating FRBs and numerical experiments, we achieve a classification accuracy for repeaters of nearly 75% or higher when $τ$ is set in the range of 50% to 60%. This tentatively implies a high proportion of repeaters, which is surprising but also in agreement with recent estimates of $τ$ obtained using other methods.
Submitted 22 October, 2024;
originally announced October 2024.
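The following minimal sketch fits a weighted logistic regression whose case weights rescale the sample share of repeaters to an assumed population proportion tau. Several choices are assumptions made for illustration and are not the CHIME/FRB features or necessarily the weighting used in the paper. These are the weights w1 = tau / ybar and w0 = (1 - tau) / (1 - ybar), the single covariate, the Newton-Raphson fitter and the toy data.

```python
import math


def population_weights(y, tau):
    """Case weights that rescale the sample share of repeaters (y = 1) to tau.

    Assumed prior-correction weighting: w1 = tau / ybar, w0 = (1 - tau) / (1 - ybar).
    """
    ybar = sum(y) / len(y)
    return [tau / ybar if yi == 1 else (1.0 - tau) / (1.0 - ybar) for yi in y]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logistic(x, y, w, n_iter=50):
    """Weighted MLE of P(y = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = sigmoid(b0 + b1 * xi)
            r = wi * (yi - p)        # weighted score contribution
            s = wi * p * (1.0 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += s
            h01 += s * xi
            h11 += s * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step beta <- beta + I^{-1} g for the 2x2 information matrix I.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data: one morphological feature x, label y = 1 for repeaters.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]
for tau in (0.4, 0.5, 0.6):
    w = population_weights(y, tau)
    b0, b1 = fit_weighted_logistic(x, y, w)
    print(tau, round(b0, 3), round(b1, 3))
```

With an intercept in the model, the intercept's score equation forces sum_i w_i (y_i - p_i) = 0 at the weighted MLE. Under these weights, the weighted average of the fitted probabilities therefore equals tau. This is one way to see how tau acts as the tuning parameter.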
-
Perfecting MCMC Sampling: Recipes and Reservations
Authors:
Radu V. Craiu,
Xiao-Li Meng
Abstract:
This review paper is intended for the Handbook of Markov chain Monte Carlo's second edition. The authors will be grateful for any suggestions that could perfect it.
Submitted 4 January, 2024;
originally announced January 2024.
-
Approximate Methods for Bayesian Computation
Authors:
Radu V. Craiu,
Evgeny Levi
Abstract:
Rich data-generating mechanisms are ubiquitous in this age of information and require complex statistical models to draw meaningful inference. While Bayesian analysis has seen enormous development in the last 30 years, benefitting from the impetus given by the successful application of Markov chain Monte Carlo (MCMC) sampling, the combination of big data and complex models conspires to produce significant challenges for traditional MCMC algorithms. We review modern algorithmic developments addressing these challenges and compare their performance using numerical experiments.
Submitted 19 October, 2022; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Copula Modelling of Serially Correlated Multivariate Data with Hidden Structures
Authors:
Robert Zimmerman,
Radu V. Craiu,
Vianey Leos-Barajas
Abstract:
We propose a copula-based extension of the hidden Markov model (HMM) which applies when the observations recorded at each time in the sample are multivariate. The joint model produced by the copula extension allows decoding of the hidden states based on information from multiple observations. However, unlike the case of independent marginals, the copula dependence structure embedded into the likelihood poses additional computational challenges. We tackle these challenges using a theoretically justified variation of the EM algorithm developed within the framework of inference functions for margins. We illustrate the method using numerical experiments and an analysis of house occupancy.
Submitted 8 July, 2022;
originally announced July 2022.
-
Six Statistical Senses
Authors:
Radu V. Craiu,
Ruobin Gong,
Xiao-Li Meng
Abstract:
This article proposes a set of categories, each one representing a particular distillation of important statistical ideas. Each category is labeled a "sense" because we think of these as essential in helping every statistical mind connect in constructive and insightful ways with statistical theory, methodologies, and computation, toward the ultimate goal of building statistical phronesis. The illustration of each sense with statistical principles and methods provides a sensical tour of the conceptual landscape of statistics, as a leading discipline in the data science ecosystem.
Submitted 18 September, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Measuring the severity of multi-collinearity in high dimensions
Authors:
Wei Q. Deng,
Radu V. Craiu,
Lei Sun
Abstract:
Multi-collinearity is a widespread phenomenon in modern statistical applications and, when ignored, can negatively impact model selection and statistical inference. Classic tools and measures that were developed for "$n>p$" data are neither applicable nor interpretable in the high-dimensional regime. Here we propose 1) new individualized measures that can be used to visualize patterns of multi-collinearity, and subsequently 2) global measures to assess the overall burden of multi-collinearity without limiting the observed data dimensions. We applied these measures to genomic data to investigate patterns of multi-collinearity in genetic variations across individuals with diverse ancestral backgrounds. The measures were able to visually distinguish genomic regions of excessive multi-collinearity and contrast the level of multi-collinearity between different continental populations.
Submitted 19 March, 2022;
originally announced March 2022.
-
Living on the Edge: An Unified Approach to Antithetic Sampling
Authors:
Roberto Casarin,
Radu V. Craiu,
Lorenzo Frattarolo,
Christian P. Robert
Abstract:
We identify recurrent ingredients in the antithetic sampling literature that lead to a unified sampling framework. We introduce a new class of antithetic schemes that includes the most widely used antithetic proposals. This perspective enables the derivation of new properties of the sampling schemes: i) optimality in the Kullback-Leibler sense; ii) closed-form multivariate Kendall's $τ$ and Spearman's $ρ$; iii) ranking in concordance order; and iv) a central limit theorem that characterizes the stochastic behavior of Monte Carlo estimators when the sample size tends to infinity. Finally, we provide applications to Monte Carlo integration and Markov chain Monte Carlo Bayesian estimation.
Submitted 6 December, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
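Here is the classical special case that antithetic schemes generalize: pairing each uniform draw U with 1 - U. This is a minimal Python sketch of that textbook scheme, not the paper's general class. The integrand exp(u), whose mean is E[exp(U)] = e - 1, and the replication settings are illustrative choices. For a monotone integrand, f(U) and f(1 - U) are negatively correlated, so each pair average has a much smaller variance than two independent draws.

```python
import math
import random
import statistics


def plain_mc(f, n, rng):
    """Plain Monte Carlo estimate of E[f(U)], U ~ Uniform(0, 1), using n draws."""
    return sum(f(rng.random()) for _ in range(n)) / n


def antithetic_mc(f, n, rng):
    """Antithetic estimate from n // 2 pairs (U, 1 - U): the same budget of n evaluations."""
    pairs = n // 2
    total = 0.0
    for _ in range(pairs):
        u = rng.random()
        total += 0.5 * (f(u) + f(1.0 - u))
    return total / pairs


rng = random.Random(7)
truth = math.e - 1.0  # E[exp(U)] for U ~ Uniform(0, 1)
plain = [plain_mc(math.exp, 100, rng) for _ in range(500)]
anti = [antithetic_mc(math.exp, 100, rng) for _ in range(500)]
print("plain:      bias", statistics.mean(plain) - truth, "variance", statistics.variance(plain))
print("antithetic: bias", statistics.mean(anti) - truth, "variance", statistics.variance(anti))
```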
-
Double Happiness: Enhancing the Coupled Gains of L-lag Coupling via Control Variates
Authors:
Radu V. Craiu,
Xiao-Li Meng
Abstract:
The recently proposed L-lag coupling for unbiased Markov chain Monte Carlo (MCMC) calls for a joint celebration by MCMC practitioners and theoreticians. For practitioners, it circumvents the thorny issue of deciding the burn-in period or when to terminate an MCMC sampling process, and opens the door for safe parallel implementation. For theoreticians, it provides a powerful tool to establish elegant and easily estimable bounds on the exact error of an MCMC approximation at any finite number of iterates. A serendipitous observation about the bias-correcting term leads us to introduce naturally available control variates into the L-lag coupling estimators. In turn, this extension enhances the coupled gains of L-lag coupling, because it results in more efficient unbiased estimators, as well as a better bound on the total variation error of MCMC iterations, albeit the gains diminish as L increases. Specifically, the new upper bound is theoretically guaranteed to never exceed the one given previously. We also argue that L-lag coupling represents a coupling for the future, breaking from the coupling-from-the-past type of perfect sampling, by reducing the generally unachievable requirement of being perfect to one of being unbiased, a worthwhile trade-off for ease of implementation in most practical situations. The theoretical analysis is supported by numerical experiments that show tighter bounds and a gain in efficiency when control variates are introduced.
Submitted 13 April, 2021; v1 submitted 28 August, 2020;
originally announced August 2020.
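The L-lag coupling idea can be illustrated on a small finite-state chain. The sketch below is a toy under stated assumptions, not the control-variate estimator proposed in the paper. It implements the baseline total variation bound of Biswas, Jacob and Vanetti, d_TV(pi_t, pi) <= E[max(0, ceil((tau^(L) - L - t) / L))]. Here tau^(L) is the meeting time of two chains, one running L steps ahead of the other, coupled through common random numbers. The 3-state matrix P is hypothetical. It is doubly stochastic, so its stationary distribution is uniform and the exact distance is available for comparison.

```python
import math
import random

# Hypothetical 3-state transition matrix; it is doubly stochastic, so pi is uniform.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]


def step(state, u):
    """Inverse-CDF transition driven by the uniform u."""
    cum = 0.0
    for j, p in enumerate(P[state]):
        cum += p
        if u < cum:
            return j
    return len(P) - 1


def meeting_time(lag, rng, x0=0):
    """Sample tau^(L) = inf{t > L : X_t = Y_{t-L}}, both chains started at x0."""
    x = y = x0
    for _ in range(lag):        # X runs L steps ahead on its own
        x = step(x, rng.random())
    t = lag
    while True:
        t += 1
        u = rng.random()        # common random number couples X_t and Y_{t-L}
        x, y = step(x, u), step(y, u)
        if x == y:              # once equal, the chains move together forever
            return t


def tv_upper_bound(t, lag, taus):
    """Monte Carlo estimate of E[max(0, ceil((tau^(L) - L - t) / L))]."""
    return sum(max(0, math.ceil((tau - lag - t) / lag)) for tau in taus) / len(taus)


def exact_tv(t, x0=0):
    """Exact d_TV(pi_t, pi) for comparison, using the uniform stationary distribution."""
    n = len(P)
    dist = [1.0 if j == x0 else 0.0 for j in range(n)]
    for _ in range(t):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return 0.5 * sum(abs(d - 1.0 / n) for d in dist)


rng = random.Random(11)
taus = [meeting_time(1, rng) for _ in range(20000)]
for t in range(4):
    print(t, round(exact_tv(t), 4), round(tv_upper_bound(t, 1, taus), 4))
```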
-
Assessing Data Support for the Simplifying Assumption in Bivariate Conditional Copulas
Authors:
Evgeny Levi,
Radu V Craiu
Abstract:
The paper considers the problem of establishing data support for the simplifying assumption (SA) in a bivariate conditional copula model. It is known that SA greatly simplifies inference for a conditional copula model, but standard tools and methods for testing SA tend not to provide reliable results. After splitting the observed data into training and test sets, the proposed method uses a flexible Bayesian fit to the training data to define tests based on randomization and standard asymptotic theory. Theoretical justification for the method is provided and its performance is studied using simulated data. The paper also discusses implementations in alternative models of interest, e.g., Gaussian, logistic and quantile regressions.
Submitted 27 September, 2019;
originally announced September 2019.
-
Finding our Way in the Dark: Approximate MCMC for Approximate Bayesian Methods
Authors:
Evgeny Levi,
Radu V. Craiu
Abstract:
With larger data at their disposal, scientists are emboldened to tackle complex questions that require sophisticated statistical models. It is not unusual for the latter to have likelihood functions that elude analytical formulations. Even under such adversity, when one can simulate from the sampling distribution, Bayesian analysis can be conducted using approximate methods such as Approximate Bayesian Computation (ABC) or Bayesian Synthetic Likelihood (BSL). A significant drawback of these methods is that the number of required simulations can be prohibitively large, thus severely limiting their scope. In this paper we design perturbed MCMC samplers that can be used within the ABC and BSL paradigms to significantly accelerate computation while maintaining control over computational efficiency. The proposed strategy relies on recycling samples from the chain's past. The algorithmic design is supported by a theoretical analysis, while practical performance is examined via a series of simulation examples and data analyses.
Submitted 16 May, 2019;
originally announced May 2019.
-
The X Factor: A Robust and Powerful Approach to X-chromosome-Inclusive Whole-genome Association Studies
Authors:
Bo Chen,
Radu V. Craiu,
Lisa J. Strug,
Lei Sun
Abstract:
The X-chromosome is often excluded from genome-wide association studies because of analytical challenges. Some of the problems, such as uncertainty about whether the X-inactivation model is random, skewed or absent, have been investigated. Other considerations have received little to no attention, such as the value in considering non-additive and gene-sex interaction effects, and the inferential consequence of choosing different baseline alleles (i.e., the reference vs. the alternative allele). Here we propose a unified and flexible regression-based association test for X-chromosomal variants. We provide theoretical justifications for its robustness in the presence of various model uncertainties, as well as for its improved power when compared with the existing approaches under certain scenarios. For completeness, we also revisit the autosomes and show that the proposed framework leads to a more robust approach than the standard method. Finally, we provide supporting evidence by revisiting several published association studies. Supplementary materials for this article are available online.
Submitted 14 May, 2021; v1 submitted 2 November, 2018;
originally announced November 2018.
-
Exploring dimension learning via a penalized probabilistic principal component analysis
Authors:
Wei Q. Deng,
Radu V. Craiu
Abstract:
Establishing a low-dimensional representation of the data leads to efficient data learning strategies. In many cases, the reduced dimension needs to be explicitly stated and estimated from the data. We explore the estimation of dimension in finite samples as a constrained optimization problem, where the estimated dimension is a maximizer of a penalized profile likelihood criterion within the framework of a probabilistic principal components analysis. Unlike other penalized maximization problems that require an "optimal" penalty tuning parameter, we propose a data-averaging procedure whereby the estimated dimension emerges as the most favourable choice over a range of plausible penalty parameters. The proposed heuristic is compared to a large number of alternative criteria in simulations and in an application to gene expression data. Extensive simulation studies reveal that none of the methods uniformly dominates the others and highlight the importance of subject-specific knowledge in choosing statistical methods for dimension learning. Our application results also suggest that gene expression data have a higher intrinsic dimension than previously thought. Overall, our proposed heuristic strikes a good balance and is the method of choice when model assumptions deviate moderately.
Submitted 8 February, 2022; v1 submitted 20 March, 2018;
originally announced March 2018.
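A minimal sketch of dimension selection by penalized PPCA profile likelihood follows. The profile log-likelihood uses the closed form from Tipping and Bishop's probabilistic PCA, with sigma^2 profiled out as the mean of the discarded eigenvalues. Four elements stand in for the paper's data-averaging procedure and are assumptions here. These are the penalty form c * n_params(k), the parameter count, the grid of c values, and the majority vote across penalties.

```python
import math


def ppca_profile_loglik(eigvals, n, k):
    """Profile log-likelihood of a k-dimensional PPCA model (Tipping-Bishop closed form).

    eigvals are sample-covariance eigenvalues in decreasing order; sigma^2 is profiled
    out as the mean of the p - k discarded eigenvalues.
    """
    p = len(eigvals)
    sigma2 = sum(eigvals[k:]) / (p - k)
    return -0.5 * n * (sum(math.log(lam) for lam in eigvals[:k])
                       + (p - k) * math.log(sigma2)
                       + p * math.log(2.0 * math.pi) + p)


def n_params(p, k):
    """Assumed parameter count: loadings up to rotation, plus sigma^2."""
    return p * k - k * (k - 1) // 2 + 1


def select_dimension(eigvals, n, penalties):
    """Pick argmax_k loglik(k) - c * n_params(k) for each c, then return the modal k."""
    p = len(eigvals)
    votes = {}
    for c in penalties:
        scores = [ppca_profile_loglik(eigvals, n, k) - c * n_params(p, k) for k in range(p)]
        k_hat = max(range(p), key=lambda k: scores[k])  # first maximiser = smallest k
        votes[k_hat] = votes.get(k_hat, 0) + 1
    return max(sorted(votes), key=lambda k: votes[k])  # ties go to the smaller dimension


eigvals = [10.0, 8.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical spectrum with two strong directions
print(select_dimension(eigvals, n=100, penalties=[0.5, 1.0, 2.0, 4.0, 8.0]))
```

Averaging over a grid, rather than tuning a single c, means a dimension is reported only if it wins across many plausible penalty strengths. A very large penalty pushes the choice toward k = 0.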
-
Bayesian Model Averaging for the X-Chromosome Inactivation Dilemma in Genetic Association Study
Authors:
Bo Chen,
Radu V. Craiu,
Lei Sun
Abstract:
The X-chromosome is often excluded from so-called 'whole-genome' association studies due to its intrinsic differences between males and females. One particular analytical challenge is the unknown status of X-inactivation, where one of the two X-chromosome variants in females may be randomly selected to be silenced. In the absence of biological evidence in favour of one specific model, we consider a Bayesian model averaging framework that offers a principled way to account for the inherent model uncertainty, providing model averaging-based posterior density intervals and Bayes factors. We examine the inferential properties of the proposed methods via extensive simulation studies, and we apply the methods to a genetic association study of an intestinal disease occurring in about twenty percent of Cystic Fibrosis patients. Compared with the results previously reported assuming the presence of inactivation, we show that the proposed Bayesian methods provide more feature-rich quantities that are useful in practice.
Submitted 25 June, 2017; v1 submitted 4 April, 2017;
originally announced April 2017.
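Bayesian model averaging over candidate X-inactivation models amounts to weighting each model's inference by its posterior probability. The sketch below computes p(M_k | D) from log marginal likelihoods using a log-sum-exp step. It then forms a model-averaged posterior mean and a Bayes factor. The two models, their equal prior probabilities and all numerical values are hypothetical.

```python
import math


def posterior_model_probs(log_marglik, prior):
    """p(M_k | D) proportional to p(D | M_k) p(M_k), normalised with a log-sum-exp shift."""
    logs = [lm + math.log(pk) for lm, pk in zip(log_marglik, prior)]
    top = max(logs)
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def bma_mean(post, model_means):
    """Model-averaged posterior mean: sum_k p(M_k | D) * E[beta | D, M_k]."""
    return sum(pk * m for pk, m in zip(post, model_means))


# Hypothetical log marginal likelihoods for two candidate X-inactivation models.
log_marglik = [-1203.4, -1202.3]
prior = [0.5, 0.5]
post = posterior_model_probs(log_marglik, prior)
bayes_factor = math.exp(log_marglik[1] - log_marglik[0])  # evidence for model 2 vs model 1
print(post, bayes_factor, bma_mean(post, [0.21, 0.30]))
```

Working on the log scale matters because marginal likelihoods of realistic genetic data underflow double precision long before their ratios become extreme.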
-
A scalable and efficient covariate selection criterion for mixed effects regression models with unknown random effects structure
Authors:
Radu V. Craiu,
Thierry Duchesne
Abstract:
We propose a new model selection criterion for mixed effects regression models that is computable when the model is fitted with a two-step method, even when the structure and the distribution of the random effects are unknown. The criterion is especially useful in the early stage of the model building process when one needs to decide which covariates should be included in a mixed effects regression model but has no knowledge of the random effect structure. This is particularly relevant in substantive fields where variable selection is guided by information criteria rather than regularization. The calculation of the criterion requires only the evaluation of cluster-level log-likelihoods and does not rely on heavy numerical integration. We provide theoretical and numerical arguments to justify the method and we illustrate its usefulness by analyzing data on a socio-economic study of young American Indians.
Submitted 22 March, 2017; v1 submitted 2 December, 2016;
originally announced December 2016.
-
Likelihood Inflating Sampling Algorithm
Authors:
Reihaneh Entezari,
Radu V. Craiu,
Jeffrey S. Rosenthal
Abstract:
Markov Chain Monte Carlo (MCMC) sampling from a posterior distribution corresponding to a massive data set can be computationally prohibitive since producing one sample requires a number of operations that is linear in the data size. In this paper, we introduce a new communication-free parallel method, the Likelihood Inflating Sampling Algorithm (LISA), that significantly reduces computational costs by randomly splitting the dataset into smaller subsets and running MCMC methods independently in parallel on each subset using different processors. Each processor will be used to run an MCMC chain that samples sub-posterior distributions which are defined using an "inflated" likelihood function. We develop a strategy for combining the draws from different sub-posteriors to study the full posterior of the Bayesian Additive Regression Trees (BART) model. The performance of the method is tested using both simulated and real data.
Submitted 30 June, 2017; v1 submitted 6 May, 2016;
originally announced May 2016.
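The likelihood-inflation idea can be checked in the simplest conjugate setting. Assume the inflation raises each subset's likelihood to the power K, the number of subsets. In a normal-mean model with known variance, each sub-posterior then has exactly the full posterior's variance whenever the subsets have equal size. The toy data, the prior, the fixed (rather than random) split and the averaging of sub-posterior means are illustrative only. The paper's combination strategy is developed for BART.

```python
def full_posterior(data, sigma2, prior_var):
    """Posterior (mean, variance) of a normal mean: known variance sigma2, prior N(0, prior_var)."""
    precision = 1.0 / prior_var + len(data) / sigma2
    return sum(data) / sigma2 / precision, 1.0 / precision


def inflated_subposterior(subset, n_subsets, sigma2, prior_var):
    """Same model, but the subset likelihood is raised to the power n_subsets (inflated)."""
    precision = 1.0 / prior_var + n_subsets * len(subset) / sigma2
    return n_subsets * sum(subset) / sigma2 / precision, 1.0 / precision


data = [1.2, 0.7, 2.3, 1.9, 0.4, 1.1, 1.6, 2.8, 0.9, 1.4, 2.0, 1.3]  # hypothetical sample
K = 3
subsets = [data[k::K] for k in range(K)]  # fixed split here; LISA splits the data at random
subs = [inflated_subposterior(s, K, sigma2=1.0, prior_var=10.0) for s in subsets]
full_mean, full_var = full_posterior(data, sigma2=1.0, prior_var=10.0)
print("full:", full_mean, full_var)
print("sub-posteriors:", subs)
```

Without inflation each sub-posterior would be roughly K times too wide. Raising the subset likelihood to the power K restores the full-data scale, so each processor can run its chain independently.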
-
Nonparametric imputation method for nonresponse in surveys
Authors:
Caren Hasler,
Radu V. Craiu
Abstract:
Many imputation methods are based on statistical models that assume that the variable of interest is a noisy observation of a function of the auxiliary variables or covariates. Misspecification of this model may lead to severe errors in estimates and to misleading conclusions. A new imputation method for item nonresponse in surveys is proposed based on a nonparametric estimation of the functional dependence between the variable of interest and the auxiliary variables. We consider the use of smoothing spline estimation within an additive model framework to flexibly build an imputation model in the case of multiple auxiliary variables. The performance of our method is assessed via numerical experiments involving simulated and real data.
Submitted 6 February, 2017; v1 submitted 16 March, 2016;
originally announced March 2016.
-
Adaptive Component-wise Multiple-Try Metropolis Sampling
Authors:
Jinyoung Yang,
Evgeny Levi,
Radu V. Craiu,
Jeffrey S. Rosenthal
Abstract:
One of the most widely used samplers in practice is the component-wise Metropolis-Hastings (CMH) sampler, which updates in turn the components of a vector-valued Markov chain using accept-reject moves generated from a proposal distribution. When the target distribution of a Markov chain is irregularly shaped, a 'good' proposal distribution for one part of the state space might be a 'poor' one for another part of the state space. We consider a component-wise multiple-try Metropolis (CMTM) algorithm that can automatically choose from a set of candidate moves sampled from different distributions. The computational efficiency is increased using an adaptation rule for the CMTM algorithm that dynamically builds a better set of proposal distributions as the Markov chain runs. The ergodicity of the adaptive chain is demonstrated theoretically. The performance is studied via simulations and real data examples.
Submitted 21 March, 2017; v1 submitted 10 March, 2016;
originally announced March 2016.
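A minimal, non-adaptive sketch of one component-wise multiple-try Metropolis sweep with several proposal scales follows. On component i, each try j uses its own symmetric Gaussian random-walk proposal. The selection weights are the target density itself, a valid choice for symmetric proposals. Reference points are drawn around the selected value, and the slot of the selected try is filled with the current value. The bivariate normal target, the three scales and the run length are hypothetical. The adaptation rule described in the abstract is not implemented.

```python
import math
import random


def log_target(x):
    """Hypothetical target: standard bivariate normal, log density up to a constant."""
    return -0.5 * (x[0] ** 2 + x[1] ** 2)


def replace(x, i, value):
    """Copy of x with component i set to value."""
    z = list(x)
    z[i] = value
    return z


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def cmtm_sweep(x, scales, rng):
    """One component-wise multiple-try Metropolis sweep; try j uses N(0, scales[j]^2) steps."""
    x = list(x)
    for i in range(len(x)):
        # One candidate per proposal distribution, with selection weights w_j = pi(y_j).
        tries = [x[i] + rng.gauss(0.0, s) for s in scales]
        logw = [log_target(replace(x, i, y)) for y in tries]
        top = max(logw)
        j = rng.choices(range(len(scales)), weights=[math.exp(l - top) for l in logw])[0]
        y = tries[j]
        # Reference set: draws around y from the other proposals, plus the current x_i in slot j.
        refs = [x[i] if k == j else y + rng.gauss(0.0, s) for k, s in enumerate(scales)]
        logw_ref = [log_target(replace(x, i, r)) for r in refs]
        log_ratio = log_sum_exp(logw) - log_sum_exp(logw_ref)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x[i] = y
    return x


rng = random.Random(1)
scales = [0.1, 1.0, 5.0]  # hypothetical small, medium and large proposal scales
state = [3.0, -3.0]
draws = []
for it in range(20000):
    state = cmtm_sweep(state, scales, rng)
    if it >= 2000:  # discard burn-in
        draws.append(state)
print(sum(d[0] for d in draws) / len(draws))
```

Mixing scales lets the selection step favour whichever proposal suits the current region of the state space. An adaptive version would tune `scales` as the chain runs.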
-
Gaussian Process Single Index Models for Conditional Copulas
Authors:
Evgeny Levi,
Radu V. Craiu
Abstract:
Parametric conditional copula models allow the copula parameters to vary with a set of covariates according to an unknown calibration function. Flexible Bayesian inference for the calibration function of a bivariate conditional copula is proposed via a sparse Gaussian process (GP) prior distribution over the set of smooth calibration functions for the single index model (SIM). The estimation of parameters from the marginal distributions and the calibration function is done jointly via Markov Chain Monte Carlo sampling from the full posterior distribution. A new Conditional Cross Validated Pseudo-Marginal (CCVML) criterion is introduced in order to perform copula selection and is modified using a permutation-based procedure to assess data support for the simplifying assumption. The performance of the estimation method and model selection criteria is studied via a series of simulations using correct and misspecified models with Clayton, Frank and Gaussian copulas and a numerical application involving red wine features.
Submitted 25 May, 2017; v1 submitted 9 March, 2016;
originally announced March 2016.
-
Embarrassingly Parallel Sequential Markov-chain Monte Carlo for Large Sets of Time Series
Authors:
Roberto Casarin,
Radu V. Craiu,
Fabrizio Leisen
Abstract:
Bayesian computation crucially relies on Markov chain Monte Carlo (MCMC) algorithms. In the case of massive data sets, running the Metropolis-Hastings sampler to draw from the posterior distribution becomes prohibitive due to the large number of likelihood terms that need to be calculated at each iteration. In order to perform Bayesian inference for a large set of time series, we consider an algorithm that combines "divide and conquer" ideas previously used to design MCMC algorithms for big data with a sequential MCMC strategy. The performance of the method is illustrated using a large set of financial data.
Submitted 4 December, 2015;
originally announced December 2015.
-
Additive Models for Conditional Copulas
Authors:
Avideh Sabeti,
Mian Wei,
Radu V. Craiu
Abstract:
Conditional copulas are flexible statistical tools that couple joint conditional and marginal conditional distributions. In a linear regression setting with more than one covariate and two dependent outcomes, we propose the use of additive models for conditional bivariate copula models and discuss computation and model selection tools for performing Bayesian inference. The method is illustrated using simulations and a real example.
Submitted 30 July, 2014;
originally announced July 2014.
-
Bayesian Latent Variable Modeling of Longitudinal Family Data for Genetic Pleiotropy Studies
Authors:
Lizhen Xu,
Radu V. Craiu,
Lei Sun
Abstract:
Motivated by genetic association studies of pleiotropy, we propose here a Bayesian latent variable approach to jointly study multiple outcomes or phenotypes. The proposed method models both continuous and binary phenotypes, and it accounts for serial and familial correlations when longitudinal and pedigree data have been collected. We present a Bayesian estimation method for the model parameters, and we develop a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample the posterior distribution. We discuss phenotype and model selection in the Bayesian setting, and we study the performance of two selection strategies based on Bayes factors and spike-and-slab priors. We evaluate the proposed method via extensive simulations and demonstrate its utility with an application to a genome-wide association study of various complication phenotypes related to type 1 diabetes.
Submitted 6 November, 2012;
originally announced November 2012.
-
Statistical Testing for Conditional Copulas
Authors:
Elif F. Acar,
Radu V. Craiu,
Fang Yao
Abstract:
In conditional copula models, the copula parameter is deterministically linked to a covariate via the calibration function. The latter is of central interest for inference and is usually estimated nonparametrically. However, when a parametric model for the calibration function is appropriate, the resulting estimator exhibits significant gains in statistical efficiency and requires lower computational cost. We develop methodology for testing a parametric formulation of the calibration function against a general alternative and propose a generalized likelihood ratio-type test that enables conditional copula model diagnostics. We derive the asymptotic null distribution of the proposed test and study its finite sample performance using simulations. The method is applied to two data examples.
Submitted 30 April, 2012;
originally announced April 2012.
-
Interacting Multiple Try Algorithms with Different Proposal Distributions
Authors:
Roberto Casarin,
Radu V. Craiu,
Fabrizio Leisen
Abstract:
We propose a new class of interacting Markov chain Monte Carlo (MCMC) algorithms designed to increase the efficiency of a modified multiple-try Metropolis (MTM) algorithm. The extension with respect to the existing MCMC literature is twofold. The proposed sampler extends the basic MTM algorithm by allowing different proposal distributions in the multiple-try generation step. We exploit the structure of the MTM algorithm with different proposal distributions to naturally introduce an interacting MTM mechanism (IMTM) that expands the class of population Monte Carlo methods. We show the validity of the algorithm and discuss the choice of the selection weights and of the different proposals. We provide numerical studies which show that the new algorithm can perform better than the basic MTM algorithm and that the interaction mechanism allows the IMTM to efficiently explore the state space.
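The multiple-try move with a different proposal per try can be sketched as follows. This is a minimal sketch of the basic, non-interacting MTM step only. It assumes a one-dimensional standard normal target, symmetric Gaussian proposals T_j with different scales, and selection weights w_j(y, x) = π(y), which is what the general weight π(y) T_j(y, x) λ_j(x, y) reduces to under the assumed choice λ_j(x, y) = 1 / T_j(x, y). The target, the scales, and this weight choice are illustrative assumptions; the interacting IMTM population mechanism is not shown.

```python
import numpy as np

def log_target(x):
    # Hypothetical target: standard normal log-density, up to a constant.
    return -0.5 * np.asarray(x) ** 2

def mtm_step(x, scales, rng):
    """One MTM step using a different symmetric Gaussian proposal for each try."""
    scales = np.asarray(scales, dtype=float)
    # Draw one candidate y_j ~ T_j(x, .) from each proposal.
    ys = x + scales * rng.standard_normal(len(scales))
    log_w = log_target(ys)                 # w_j(y_j, x) = pi(y_j) under the assumed lambda_j
    # Select candidate k with probability proportional to its weight.
    p = np.exp(log_w - log_w.max())
    k = rng.choice(len(ys), p=p / p.sum())
    y = ys[k]
    # Reference points: x_j* ~ T_j(y, .) for j != k, and x_k* = x.
    xs = y + scales * rng.standard_normal(len(scales))
    xs[k] = x
    log_w_ref = log_target(xs)
    # Generalized Metropolis-Hastings ratio: sum_j w_j(y_j, x) / sum_j w_j(x_j*, y).
    log_ratio = np.logaddexp.reduce(log_w) - np.logaddexp.reduce(log_w_ref)
    if np.log(rng.uniform()) < log_ratio:
        return float(y)
    return float(x)

rng = np.random.default_rng(0)
x, chain = 0.0, []
for _ in range(5000):
    x = mtm_step(x, scales=[0.5, 2.0, 5.0], rng=rng)
    chain.append(x)
chain = np.array(chain)
print(chain.mean(), chain.var())
```

Mixing small and large proposal scales lets one step combine local refinement with occasional long jumps. Which scale is useful varies across the state space, and this is the kind of heterogeneity the different-proposal construction is meant to exploit.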
Submitted 4 November, 2010;
originally announced November 2010.
-
Bayesian methods to overcome the winner's curse in genetic studies
Authors:
Lizhen Xu,
Radu V. Craiu,
Lei Sun
Abstract:
Parameter estimates for associated genetic variants, reported in the initial discovery samples, are often grossly inflated compared to the values observed in the follow-up replication samples. This type of bias is a consequence of the sequential procedure in which the estimated effect of an associated genetic marker must first pass a stringent significance threshold. We propose a hierarchical Bayes method in which a spike-and-slab prior is used to account for the possibility that the significant test result may be due to chance. We examine the robustness of the method using different priors corresponding to different degrees of confidence in the testing results and propose a Bayesian model averaging procedure to combine estimates produced by different models. The Bayesian estimators yield smaller variance compared to the conditional likelihood estimator and outperform the latter in studies with low power. We investigate the performance of the method with simulations and applications to four real data examples.
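The combination of a selection-adjusted likelihood with a spike-and-slab prior can be illustrated with a simplified sketch. This is not the authors' exact model. It assumes a single variant, a known standard error `se`, and a two-sided threshold `c` on the z-statistic. It also assumes a prior π0 δ0 + (1 − π0) N(0, τ²), with the posterior mean computed by grid integration. All names and numerical values are hypothetical, and the Bayesian model averaging step is not shown.

```python
import math
import numpy as np

def norm_cdf(x):
    # Standard normal CDF via the complementary error function.
    return 0.5 * np.vectorize(math.erfc)(-np.asarray(x, dtype=float) / math.sqrt(2.0))

def norm_pdf(x):
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def posterior_mean(beta_hat, se, c, pi0, tau, grid_size=4001):
    """Posterior mean of beta under a spike-and-slab prior and a
    likelihood conditional on the test passing: |beta_hat / se| > c."""
    def cond_lik(beta):
        beta = np.asarray(beta, dtype=float)
        dens = norm_pdf((beta_hat - beta) / se) / se
        mu = beta / se
        p_sel = norm_cdf(mu - c) + norm_cdf(-c - mu)   # P(|Z + mu| > c)
        return dens / p_sel
    # Slab component N(0, tau^2): integrate on a grid.
    grid = np.linspace(-8.0 * tau, 8.0 * tau, grid_size)
    dx = grid[1] - grid[0]
    slab = cond_lik(grid) * norm_pdf(grid / tau) / tau
    m_slab = slab.sum() * dx                        # marginal likelihood under the slab
    mean_slab = (grid * slab).sum() * dx / m_slab   # posterior mean given the slab
    m_spike = cond_lik(0.0)                         # marginal likelihood under the spike at 0
    p_slab = (1.0 - pi0) * m_slab / ((1.0 - pi0) * m_slab + pi0 * m_spike)
    return float(p_slab * mean_slab)

# Hypothetical discovery estimate that passed a two-sided 5% threshold.
pm = posterior_mean(beta_hat=0.2, se=0.07, c=1.96, pi0=0.5, tau=0.1)
print(pm)
```

Dividing by the selection probability gives effects near zero relatively more likelihood, because a small true effect needs luck to cross the threshold. The spike then adds the explicit possibility that the association is spurious. Both mechanisms pull the estimate below the naive `beta_hat`.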
Submitted 14 April, 2011; v1 submitted 16 July, 2009;
originally announced July 2009.
-
Nonparametric Covariate Adjustment for Receiver Operating Characteristic Curves
Authors:
Fang Yao,
Radu V. Craiu,
Benjamin Reiser
Abstract:
The accuracy of a diagnostic test is typically characterised using the receiver operating characteristic (ROC) curve. Summarising indexes such as the area under the ROC curve (AUC) are used to compare different tests as well as to measure the difference between two populations. Often additional information is available on some of the covariates which are known to influence the accuracy of such measures. We propose nonparametric methods for covariate adjustment of the AUC. Models with normal errors and non-normal errors are discussed and analysed separately. Nonparametric regression is used for estimating mean and variance functions in both scenarios. In the general noise case we propose a covariate-adjusted Mann-Whitney estimator for AUC estimation which effectively uses available data to construct working samples at any covariate value of interest and is computationally efficient for implementation. This provides a generalisation of the Mann-Whitney approach for comparing two populations by taking covariate effects into account. We derive asymptotic properties for the AUC estimators in both settings, including asymptotic normality, optimal strong uniform convergence rates and MSE consistency. The usefulness of the proposed methods is demonstrated through simulated and real data examples.
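The Mann-Whitney estimator of the AUC, the unadjusted starting point for the covariate-adjusted version, is the proportion of (diseased, healthy) pairs ranked correctly, with ties counted as one half. The sketch below shows that estimator and a hypothetical covariate-adjusted variant. The variant assumes working samples are formed as m(x0) + s(x0)·ε, from standardized residuals ε and user-supplied mean and standard deviation estimates at a covariate value x0. The nonparametric estimation of the mean and variance functions is not shown, and the helper names are illustrative.

```python
import numpy as np

def mann_whitney_auc(diseased, healthy):
    """Empirical AUC: P(Y_D > Y_H) + 0.5 * P(Y_D == Y_H) over all pairs."""
    d = np.asarray(diseased, dtype=float)[:, None]
    h = np.asarray(healthy, dtype=float)[None, :]
    return float(np.mean((d > h) + 0.5 * (d == h)))

def working_sample_auc(res_d, res_h, mu_d, sd_d, mu_h, sd_h):
    # Hypothetical helper: res_* are standardized residuals (y - m(x)) / s(x),
    # and mu_*, sd_* are estimated mean and sd functions evaluated at x0.
    work_d = mu_d + sd_d * np.asarray(res_d, dtype=float)
    work_h = mu_h + sd_h * np.asarray(res_h, dtype=float)
    return mann_whitney_auc(work_d, work_h)

# 9 pairs: 8 ranked correctly and 1 tie, so AUC = 8.5 / 9.
print(mann_whitney_auc([3, 4, 5], [1, 2, 3]))
```

Reusing all residuals at every x0 is what lets the adjusted estimator use the full data, rather than only the observations whose covariates happen to lie near x0.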
Submitted 4 May, 2009;
originally announced May 2009.
-
A Mixture-Based Approach to Regional Adaptation for MCMC
Authors:
Radu V. Craiu,
Antonio Fabio Di Narzo
Abstract:
Recent advances in adaptive Markov chain Monte Carlo (AMCMC) point to the need for regional adaptation in situations where the optimal transition kernel differs across regions of the sample space. Motivated by these findings, we propose a mixture-based approach to determine the partition needed for regional AMCMC. The mixture model is fitted using an online EM algorithm (see Andrieu and Moulines, 2006), which allows us both to bypass the heavy computational load and to implement the regional adaptive algorithm with online recursion (RAPTOR). The method is tested on simulated as well as real data examples.
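The online EM step that makes the mixture fit cheap can be sketched for a one-dimensional, two-component Gaussian mixture. This is a minimal sketch, not RAPTOR itself. Running sufficient statistics are updated by stochastic approximation with step size γ_n = (n + 1)^(−α); the parameters are then recomputed from them, and a point is assigned to the region of its most likely component. The mixture size, the step-size exponent `alpha`, the initial values, and the simulated stream are assumptions, and the adaptive sampler that would use the regions is not shown.

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def online_em(stream, means, variances, weights, alpha=0.6):
    """Online EM for a 1-D Gaussian mixture, one observation at a time."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Running sufficient statistics per component: E[z], E[z x], E[z x^2].
    s0 = weights.copy()
    s1 = weights * means
    s2 = weights * (variances + means ** 2)
    for n, x in enumerate(stream, start=1):
        # E-step for the new point: component responsibilities.
        r = weights * normal_pdf(x, means, variances)
        r = r / r.sum()
        gamma = (n + 1) ** (-alpha)   # decreasing step size
        s0 += gamma * (r - s0)
        s1 += gamma * (r * x - s1)
        s2 += gamma * (r * x ** 2 - s2)
        # M-step: parameters as functions of the current statistics.
        weights = s0 / s0.sum()
        means = s1 / s0
        variances = np.maximum(s2 / s0 - means ** 2, 1e-6)
    return means, variances, weights

def region(x, means, variances, weights):
    # Assign x to the region of its most probable mixture component.
    return int(np.argmax(weights * normal_pdf(x, means, variances)))

rng = np.random.default_rng(1)
n = 4000
from_left = rng.random(n) < 0.5
data = np.where(from_left, rng.normal(-3.0, 1.0, n), rng.normal(3.0, 1.0, n))
means, variances, weights = online_em(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)
print(region(-2.5, means, variances, weights), region(2.5, means, variances, weights))
```

Each update costs O(K) per new sample and never revisits past draws. This is what makes refitting the partition during the chain's run affordable, compared with rerunning batch EM on the full history.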
Submitted 30 March, 2009; v1 submitted 30 March, 2009;
originally announced March 2009.