-
Estimation of ratios of normalizing constants using stochastic approximation : the SARIS algorithm
Authors:
Tom Guédon,
Charlotte Baey,
Estelle Kuhn
Abstract:
Computing ratios of normalizing constants plays an important role in statistical modeling. Two important examples are hypothesis testing in latent variable models and model comparison in Bayesian statistics. In both examples, the likelihood ratio and the Bayes factor are defined as ratios of normalizing constants of posterior distributions. We propose in this article a novel methodology that estimates this ratio using the stochastic approximation principle. Our estimator is consistent and asymptotically Gaussian. Its asymptotic variance is smaller than that of the popular optimal bridge sampling estimator. Furthermore, it is much more robust to little overlap between the two unnormalized distributions considered. Thanks to its online definition, our procedure can be integrated into an estimation process in latent variable models, thereby reducing the computational effort. The performance of the estimator is illustrated through a simulation study and compared to that of two other estimators: the ratio importance sampling and the optimal bridge sampling estimators.
Submitted 23 August, 2024;
originally announced August 2024.
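To make the target quantity of the abstract above concrete, here is a minimal sketch of the plain importance sampling identity Z1/Z0 = E_{p0}[q1(X)/q0(X)]. It is not the SARIS algorithm, nor the ratio importance sampling or bridge sampling estimators mentioned in the abstract. The two unnormalized Gaussian densities `q0` and `q1`, the sample size, and the function name `importance_sampling_ratio` are assumptions chosen for illustration because the exact ratio is known in closed form.

```python
import math
import random

def q0(x):
    # Unnormalized N(0, 1) density; its normalizing constant is sqrt(2*pi).
    return math.exp(-0.5 * x * x)

def q1(x):
    # Unnormalized N(0, 1/2) density; its normalizing constant is sqrt(pi).
    return math.exp(-x * x)

def importance_sampling_ratio(n, rng):
    """Estimate Z1/Z0 by averaging q1(X)/q0(X) over draws X ~ p0 = N(0, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        total += q1(x) / q0(x)
    return total / n

rng = random.Random(0)  # fixed seed so the run is reproducible
ratio_hat = importance_sampling_ratio(100_000, rng)
ratio_exact = math.sqrt(math.pi) / math.sqrt(2.0 * math.pi)  # = 1/sqrt(2)
print(ratio_hat, ratio_exact)
```

In this toy case the two densities overlap well, so the plain estimator behaves nicely. When the overlap is small, the weights q1/q0 become highly variable, which is the kind of situation the abstract refers to when it discusses robustness.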
-
Bootstrap test procedure for variance components in nonlinear mixed effects models in the presence of nuisance parameters and a singular Fisher Information Matrix
Authors:
Tom Guédon,
Charlotte Baey,
Estelle Kuhn
Abstract:
We examine the problem of variance components testing in general mixed effects models using the likelihood ratio test. We account for the presence of nuisance parameters, i.e. the fact that some untested variances might also be equal to zero. Two main issues arise in this context, leading to a non-regular setting. First, under the null hypothesis the true parameter value lies on the boundary of the parameter space. Moreover, due to the presence of nuisance parameters, the exact location of these boundary points is not known, which prevents the use of the classical asymptotic theory of maximum likelihood estimation. Second, in the specific context of nonlinear mixed-effects models, the Fisher information matrix is singular at the true parameter value. We address these two points by proposing a shrinked parametric bootstrap procedure, which is straightforward to apply even for nonlinear models. We show that the procedure is consistent, solving both the boundary and the singularity issues, and we provide a verifiable criterion for the applicability of our theoretical results. We show through a simulation study that, compared to the asymptotic approach, our procedure has better small-sample performance and is more robust to the presence of nuisance parameters. A real data application is also provided.
Submitted 24 May, 2024; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Modeling dependent survival data through random effects with spatial correlation at the subject level
Authors:
Ajmal Oodally,
Estelle Kuhn,
Klara Goethals,
Luc Duchateau
Abstract:
Dynamical phenomena such as infectious diseases are often investigated by following up subjects longitudinally, thus generating time-to-event data. The spatial aspect of such data is also of primary importance, as many infectious diseases are transmitted from one subject to another. In this paper, a spatially correlated frailty model is introduced that accommodates the correlation between subjects based on the distance between them. Estimates are obtained through a stochastic approximation version of the Expectation Maximization algorithm combined with a Monte-Carlo Markov Chain, for which convergence is proven. The novelty of this model is that spatial correlation is introduced for survival data at the subject level, each subject having its own frailty. This univariate spatially correlated frailty model is used to analyze spatially dependent malaria data, and its results are compared with those of other standard models.
Submitted 12 October, 2020;
originally announced October 2020.
-
varTestnlme: an R package for Variance Components Testing in Linear and Nonlinear Mixed-effects Models
Authors:
Charlotte Baey,
Estelle Kuhn
Abstract:
The issue of variance components testing arises naturally when building mixed-effects models, to decide which effects should be modeled as fixed or random. While tests for fixed effects are available in R for models fitted with lme4, tools are missing when it comes to random effects. The varTestnlme package for R aims at filling this gap. It allows testing whether any subset of the variances and covariances corresponding to any subset of the random effects is equal to zero, using asymptotic properties of the likelihood ratio test statistic. It also offers the possibility of testing simultaneously for fixed effects and variance components. It can be used for linear, generalized linear or nonlinear mixed-effects models fitted via lme4, nlme or saemix. Theoretical properties of the likelihood ratio test are recalled, the numerical methods used to implement the test procedure are detailed, and examples based on different real datasets using different mixed models are provided.
Submitted 17 May, 2021; v1 submitted 9 July, 2020;
originally announced July 2020.
-
Convergent stochastic algorithm for parameter estimation in frailty models using integrated partial likelihood
Authors:
Oodally Ajmal,
Luc Duchateau,
Estelle Kuhn
Abstract:
Frailty models are often the model of choice for heterogeneous survival data. A frailty model contains both random effects and fixed effects, with the random effects accounting for the correlation in the data. Different estimation procedures have been proposed for the fixed effects and for the variances of and covariances between the random effects. Especially with an unspecified baseline hazard, i.e., the Cox model, the few available methods deal only with a specific correlation structure. In this paper, an estimation procedure based on the integrated partial likelihood is introduced, which can deal with any kind of correlation structure. The new approach, namely the maximisation of the integrated partial likelihood combined with a stochastic estimation procedure, also allows for a wide choice of distributions for the random effects. First, we demonstrate the almost sure convergence of the stochastic algorithm towards a critical point of the integrated partial likelihood. Second, numerical convergence properties are evaluated by simulation. Third, the advantage of using an unspecified baseline hazard is demonstrated through an application to cancer clinical trial data.
Submitted 16 September, 2019;
originally announced September 2019.
-
Estimating Fisher Information Matrix in Latent Variable Models based on the Score Function
Authors:
Maud Delattre,
Estelle Kuhn
Abstract:
The Fisher information matrix (FIM) is a key quantity in statistics as it is required, for example, for evaluating asymptotic precisions of parameter estimates, for computing test statistics or asymptotic distributions in statistical testing, and for evaluating post model selection inference results or optimality criteria in experimental designs. However, its exact computation is often not trivial. In particular, in many latent variable models it is intricate due to the presence of unobserved variables. Therefore the observed FIM is usually considered in this context to estimate the FIM. Several methods have been proposed to approximate the observed FIM when it cannot be evaluated analytically. Among the most frequently used approaches are Monte-Carlo methods or iterative algorithms derived from the missing information principle. All these methods require computing second derivatives of the complete data log-likelihood, which leads to some disadvantages from a computational point of view. In this paper, we present a new approach to estimate the FIM in latent variable models. The advantage of our method is that only the first derivatives of the log-likelihood are needed, contrary to other approaches based on the observed FIM. Indeed, we consider the empirical estimate of the covariance matrix of the score. We prove that this estimate of the Fisher information matrix is unbiased, consistent and asymptotically Gaussian. Moreover, we highlight that neither of the two estimates is better than the other in terms of asymptotic covariance matrix. When the proposed estimate cannot be evaluated analytically, we present a stochastic approximation estimation algorithm to compute it. This algorithm provides this estimate of the FIM as a by-product of the parameter estimates. We emphasize that the proposed algorithm only requires computing the first derivatives of the complete data log-likelihood with respect to the parameters.
We prove that the estimation algorithm is consistent and asymptotically Gaussian when the number of iterations goes to infinity. We evaluate the finite sample size properties of the proposed estimate and of the observed FIM through simulation studies in linear mixed effects models and mixture models. We also investigate the convergence properties of the estimation algorithm in nonlinear mixed effects models. We compare the performance of the proposed algorithm to that of other existing methods.
Submitted 6 February, 2023; v1 submitted 13 September, 2019;
originally announced September 2019.
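The score-based estimate described in the abstract above, the empirical second moment of the per-observation scores, can be sketched on a toy model with no latent variables, where the score is available in closed form. This is only an illustration under stated assumptions: the Gaussian model, the function names `score` and `score_based_fim`, and the choice to evaluate the scores at the true parameter are not taken from the paper, and the stochastic approximation algorithm for latent variable models is not reproduced here.

```python
import random

def score(y, mu, sigma2):
    """Gradient of log N(y; mu, sigma2) with respect to (mu, sigma2)."""
    r = y - mu
    return (r / sigma2, -0.5 / sigma2 + 0.5 * r * r / sigma2 ** 2)

def score_based_fim(ys, mu, sigma2):
    """Average of the outer products s_i s_i^T of the individual scores."""
    n = len(ys)
    fim = [[0.0, 0.0], [0.0, 0.0]]
    for y in ys:
        s = score(y, mu, sigma2)
        for a in range(2):
            for b in range(2):
                fim[a][b] += s[a] * s[b] / n
    return fim

rng = random.Random(0)  # fixed seed so the run is reproducible
mu, sigma2 = 0.0, 2.0   # hypothetical true parameter values
ys = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(50_000)]

fim_hat = score_based_fim(ys, mu, sigma2)
# Exact per-observation FIM of the Gaussian model: diag(1/sigma2, 1/(2*sigma2^2)).
fim_exact = [[1.0 / sigma2, 0.0], [0.0, 1.0 / (2.0 * sigma2 ** 2)]]
print(fim_hat)
print(fim_exact)
```

Only first derivatives of the log-density appear in `score`, which is the computational point the abstract makes when contrasting this estimate with the observed FIM.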
-
Properties of the Stochastic Approximation EM Algorithm with Mini-batch Sampling
Authors:
Tabea Rebafka,
Estelle Kuhn,
Catherine Matias
Abstract:
To deal with very large datasets, a mini-batch version of the Monte Carlo Markov Chain Stochastic Approximation Expectation-Maximization algorithm for general latent variable models is proposed. For exponential models the algorithm is shown to be convergent under classical conditions as the number of iterations increases. Numerical experiments illustrate the performance of the mini-batch algorithm in various models. In particular, we highlight that mini-batch sampling results in an important speed-up of the convergence of the sequence of estimators generated by the algorithm. Moreover, insights on the effect of the mini-batch size on the limit distribution are presented. Finally, we illustrate how to use mini-batch sampling in practice to improve results when a constraint on the computing time is given.
Submitted 12 May, 2020; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Likelihood ratio test for variance components in nonlinear mixed effects models
Authors:
Charlotte Baey,
Paul-Henry Cournède,
Estelle Kuhn
Abstract:
Mixed effects models are widely used to describe heterogeneity in a population. A crucial issue when fitting such a model to data consists in identifying fixed and random effects. From a statistical point of view, this amounts to testing whether the variances of a given subset of random effects are equal to zero. Some authors have proposed to use the likelihood ratio test and have established its asymptotic distribution in some particular cases. Nevertheless, to the best of our knowledge, no general variance components testing procedure has been fully investigated yet. In this paper, we study the properties of the likelihood ratio test for testing that the variances of a general subset of the random effects are equal to zero in both linear and nonlinear mixed effects models, extending the existing results. We prove that the asymptotic distribution of the test statistic is a chi-bar-square distribution, that is to say a mixture of chi-square distributions, and we identify the corresponding weights. We highlight in particular that the limiting distribution depends on the presence of correlations between the random effects but not on the linear or nonlinear structure of the mixed effects model. We illustrate the finite sample size properties of the test procedure through simulation studies and apply the test procedure to two real datasets of dental growth and of coucal growth.
Submitted 22 December, 2017;
originally announced December 2017.
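As a small worked example of a chi-bar-square p-value of the kind described in the abstract above, the sketch below uses the textbook special case of testing a single variance component, where the limiting distribution is usually taken to be an equal mixture of a point mass at zero and a chi-square with one degree of freedom. The weights `{0: 0.5, 1: 0.5}`, the observed statistic, and the function names are assumptions for illustration only; the weights for a general subset of random effects are the ones identified in the paper and are not reproduced here.

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable X with df in {0, 1, 2}."""
    if x <= 0:
        return 1.0
    if df == 0:
        return 0.0  # chi-square with 0 degrees of freedom is a point mass at 0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df in {0, 1, 2}")

def chi_bar_square_pvalue(lrt, weights):
    """p-value under a chi-square mixture; weights maps df -> mixture weight."""
    return sum(w * chi2_sf(lrt, df) for df, w in weights.items())

weights = {0: 0.5, 1: 0.5}  # assumed weights for the single-variance case
lrt_obs = 2.7055            # hypothetical observed likelihood ratio statistic
p_mixture = chi_bar_square_pvalue(lrt_obs, weights)
p_naive = chi2_sf(lrt_obs, 1)
print(p_mixture, p_naive)
```

For any positive statistic, the mixture p-value in this special case is half the naive chi-square(1) p-value, which shows why ignoring the boundary makes the test conservative.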
-
Stochastic Algorithm For Parameter Estimation For Dense Deformable Template Mixture Model
Authors:
Stéphanie Allassonnière,
Estelle Kuhn
Abstract:
Estimating probabilistic deformable template models is a new approach in the fields of computer vision and probabilistic atlases in computational anatomy. A first coherent statistical framework modelling the variability as a hidden random variable has been given by Allassonnière, Amit and Trouvé in [1] for simple and mixture deformable template models. A consistent stochastic algorithm has been introduced in [2] to address the problem encountered in [1] for the convergence of the estimation algorithm for the one-component model in the presence of noise. We propose here to continue in this direction, using an "SAEM-like" algorithm to approximate the MAP estimator in the general Bayesian setting of mixtures of deformable template models. We also prove the convergence of this algorithm toward a critical point of the penalised likelihood of the observations and illustrate this with handwritten digit images.
Submitted 16 January, 2009; v1 submitted 11 February, 2008;
originally announced February 2008.
-
Construction of Bayesian Deformable Models via Stochastic Approximation Algorithm: A Convergence Study
Authors:
Stéphanie Allassonnière,
Estelle Kuhn,
Alain Trouvé
Abstract:
The problem of the definition and the estimation of generative models based on deformable templates from raw data is of particular importance for modelling non-aligned data affected by various types of geometrical variability. This is especially true in shape modelling in the computer vision community or in probabilistic atlas building for Computational Anatomy (CA). A first coherent statistical framework modelling the geometrical variability as hidden variables has been given by Allassonnière, Amit and Trouvé (JRSS 2006). Setting the problem in a Bayesian context, they proved the consistency of the MAP estimator and provided a simple iterative deterministic algorithm with an EM flavour leading to some reasonable approximations of the MAP estimator under low noise conditions. In this paper we present a stochastic algorithm for approximating the MAP estimator in the spirit of the SAEM algorithm. We prove its convergence to a critical point of the observed likelihood, with an illustration on images of handwritten digits.
Submitted 16 January, 2009; v1 submitted 6 June, 2007;
originally announced June 2007.