-
kDGLM: a R package for Bayesian analysis of Generalized Dynamic Linear Models
Authors:
Silvaneo V. dos Santos Jr.,
Mariane Branco Alves,
Helio S. Migon
Abstract:
This paper introduces kDGLM, an R package designed for Bayesian analysis of Generalized Dynamic Linear Models (GDLM), with a primary focus on both uni- and multivariate exponential families. Emphasizing sequential inference for time series data, the kDGLM package provides comprehensive support for fitting, smoothing, monitoring, and feed-forward interventions. The methodology employed by kDGLM, as proposed in Alves et al. (2024), seamlessly integrates with well-established techniques from the literature, particularly those used in (Gaussian) Dynamic Models. These include discount strategies, autoregressive components, transfer functions, and more. Leveraging key properties of the Kalman filter and smoothing, kDGLM exhibits remarkable computational efficiency, enabling virtually instantaneous fitting times that scale linearly with the length of the time series. This characteristic makes it an exceptionally powerful tool for the analysis of extended time series. For example, when modeling monthly hospital admissions in Brazil due to gastroenteritis from 2010 to 2022, the fitting process took a mere 0.11s. Even in a spatio-temporal variant of the model (27 outcomes, 110 latent states, and 156 months, yielding 17,160 parameters), the fitting time was only 4.24s. Currently, the kDGLM package supports a range of distributions, including univariate Normal (unknown mean and observational variance), bivariate Normal (unknown means, observational variances, and correlation), Poisson, Gamma (known shape and unknown mean), and Multinomial (known number of trials and unknown event probabilities). Additionally, kDGLM allows the joint modeling of multiple time series, provided each series follows one of the supported distributions. Ongoing efforts aim to continuously expand the supported distributions.
Submitted 19 March, 2024;
originally announced March 2024.
-
Unsupervised Bayesian classification for models with scalar and functional covariates
Authors:
Nancy L. Garcia,
Mariana Rodrigues-Motta,
Helio S. Migon,
Eva Petkova,
Thaddeus Tarpey,
R. Todd Ogden,
Julio O. Giodano,
Martin Matias Perez
Abstract:
We consider unsupervised classification by means of a latent multinomial variable which categorizes a scalar response into one of L components of a mixture model. This process can be thought of as a hierarchical model: the first level models a scalar response according to a mixture of parametric distributions, and the second level models the mixture probabilities by means of a generalised linear model with functional and scalar covariates. The traditional approach of treating functional covariates as vectors not only suffers from the curse of dimensionality, since functional covariates can be measured at very small intervals, leading to a highly parametrised model, but also does not take into account the nature of the data. We use basis expansion to reduce the dimensionality and a Bayesian approach to estimate the parameters while providing predictions of the latent classification vector. By means of a simulation study we investigate the behaviour of our approach considering a normal mixture model and a zero-inflated mixture of Poisson distributions. We also compare the performance of the classical Gibbs sampling approach with Variational Bayes Inference.
Submitted 8 February, 2022;
originally announced February 2022.
-
An Efficient Sequential Approach for k-Parametric Dynamic Generalised Linear Models
Authors:
Mariane Branco Alves,
Helio S. Migon,
Silvaneo V. Santos Jr,
Raíra Marotta
Abstract:
A novel sequential inferential method for Bayesian dynamic generalised linear models is presented, addressing both univariate and multivariate $k$-parametric exponential families. It efficiently handles diverse responses, including multinomial, gamma, normal, and Poisson distributed outcomes, by leveraging the conjugate and predictive structure of the exponential family. The approach integrates information geometry concepts, such as the projection theorem and Kullback-Leibler divergence, and aligns with recent advances in variational inference. Applications to both synthetic and real datasets highlight its computational efficiency and scalability, surpassing alternative methods. The approach supports the strategic integration of new information, facilitating monitoring, intervention, and the application of discount factors, which are typical in sequential analyses. The R package kDGLM is available for direct use by applied researchers, facilitating the implementation of the method for specific k-parametric dynamic generalised models.
Submitted 13 January, 2025; v1 submitted 14 January, 2022;
originally announced January 2022.
-
Bayesian estimation of dynamic weights in Gaussian mixture models
Authors:
Michel H. Montoril,
Leandro T. Correia,
Helio S. Migon
Abstract:
This paper proposes a generalization of Gaussian mixture models, where the mixture weight is allowed to behave as an unknown function of time. This model is capable of successfully capturing the features of the data, as demonstrated by simulated and real datasets. It can be useful in studies such as clustering, change-point detection and process control. In order to estimate the mixture weight function, we propose two new Bayesian nonlinear dynamic approaches for polynomial models, which can be extended to other problems involving polynomial nonlinear dynamic models. One of the methods, called here component-wise Metropolis-Hastings, applies the Metropolis-Hastings algorithm to each local level component of the state equation. It is more general and can be used in any situation where the observation and state equations are nonlinearly connected. The other method tends to be faster, but applies specifically to binary data (using the probit link function). The performance of these estimation methods, in the context of the proposed dynamic Gaussian mixture model, is evaluated through simulated datasets. Also, an application to an array Comparative Genomic Hybridization (aCGH) dataset from glioblastoma cancer illustrates our proposal, highlighting the ability of the method to detect chromosome aberrations.
Submitted 8 September, 2022; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Variational Full Bayes Lasso: Knots Selection in Regression Splines
Authors:
Larissa Alves,
Ronaldo Dias,
Helio S. Migon
Abstract:
We develop a fully automatic Bayesian Lasso via variational inference. This is a scalable procedure for approximating the posterior distribution. Special attention is given to knot selection in regression splines. To carry through our proposal, a fully automatic variational Bayesian Lasso, a Jeffreys prior is proposed for the hyperparameters and a decision-theoretic approach is introduced to decide whether a knot is selected. Extensive simulation studies were carried out to verify the effectiveness of the proposed algorithms. The performance of the algorithms was also tested on some real data sets, including data from the Covid-19 pandemic. Again, the algorithms performed very well in capturing the data structure.
Submitted 26 February, 2021;
originally announced February 2021.
-
The effects of degrees of freedom estimation in the Asymmetric GARCH model with Student-t Innovations
Authors:
T. C. O. Fonseca,
V. S. Cerqueira,
H. S. Migon,
C. A. C. Torres
Abstract:
This work investigates the effects of using the independent Jeffreys prior for the degrees of freedom parameter of a Student-t model in the asymmetric generalised autoregressive conditional heteroskedasticity (GARCH) model. To capture asymmetry in the reaction to past shocks, smooth transition models are assumed for the variance. We adopt the fully Bayesian approach for inference, prediction and model selection. We discuss problems related to the estimation of degrees of freedom in the Student-t model and propose a solution based on independent Jeffreys priors which correct problems in the likelihood function. A simulation study is presented to investigate how the estimation of model parameters in the Student-t GARCH model is affected by small sample sizes, prior distributions and misspecification regarding the sampling distribution. An application to the Dow Jones stock market data illustrates the usefulness of the asymmetric GARCH model with Student-t errors.
Submitted 3 October, 2019;
originally announced October 2019.
-
Reference Bayesian analysis for hierarchical models
Authors:
Thaís C. O. Fonseca,
Helio S. Migon,
Heudson Mirandola
Abstract:
This paper proposes an alternative approach for constructing invariant Jeffreys prior distributions tailored for hierarchical or multilevel models. In particular, our proposal is based on a flexible decomposition of the Fisher information for hierarchical models which overcomes the marginalization step of the likelihood of model parameters. The Fisher information matrix for the hierarchical model is derived from the Hessian of the Kullback-Leibler (KL) divergence for the model in a neighborhood of the parameter value of interest. Properties of the KL divergence are used to prove the proposed decomposition. Our proposal takes advantage of the hierarchy and leads to an alternative way of computing Jeffreys priors for the hyperparameters and an upper bound for the prior information. While the Jeffreys prior gives the minimum information about parameters, the proposed bound gives an upper limit for the information put in any prior distribution. A prior with information above that limit may be considered too informative. From a practical point of view, the proposed prior may be evaluated computationally as part of an MCMC algorithm. This property might be essential for modeling setups with many levels in which analytic marginalization is not feasible. We illustrate the usefulness of our proposal with examples in mixture models, in model selection priors such as lasso and in the Student-t model.
Submitted 25 April, 2019;
originally announced April 2019.
-
Dynamic quantile linear models: a Bayesian approach
Authors:
Kelly C. M. Gonçalves,
Helio S. Migon,
Leonardo S. Bastos
Abstract:
A new class of models, named dynamic quantile linear models, is presented. It combines dynamic linear models with distribution-free quantile regression, producing a robust statistical method. Bayesian inference for dynamic quantile linear models can be performed using an efficient Markov chain Monte Carlo algorithm. A fast sequential procedure suited for high-dimensional predictive modeling applications with massive data, in which the generating process is itself changing over time, is also proposed. The proposed model is evaluated using synthetic and well-known time series data. The model is also applied to predict annual incidence of tuberculosis in Rio de Janeiro state for future years and compared with global strategy targets set by the World Health Organization.
Submitted 18 February, 2018; v1 submitted 31 October, 2017;
originally announced November 2017.
-
A Hierarchical Dynamic Beta Regression Model of School Performance in the Brazilian Mathematical Olympiads for Public Schools
Authors:
Alexandra M. Schmidt,
Caroline P. de Moraes,
Helio S. Migon
Abstract:
The Brazilian Mathematical Olympiads for Public Schools (OBMEP) has been held every year since 2005. In the 2013 edition there were over 47,000 schools registered, involving nearly 19.2 million students. The Brazilian public educational system is structured into three administrative levels: federal, state and municipal. Students participating in the OBMEP come from three educational levels, two in primary and one in secondary school. We aim to study the performance of Brazilian public schools which took part in the OBMEP from 2006 until 2013. We propose a standardization of the mean scores of schools per year and educational level which is modeled through a hierarchical dynamic beta regression model. Both the mean and precision of the beta distribution are modeled as a function of covariates whose effects evolve smoothly with time. Results show that, regardless of the educational level, federal schools have better performance than municipal or state schools. The mean performance of schools increases with the human development index (HDI) of the municipality the school is located in. Moreover, the difference in mean performance between federal and state or municipal schools tends to increase with the HDI. Schools with a higher proportion of boys tend to have better mean performance in the second and third educational levels of OBMEP.
Submitted 2 July, 2015;
originally announced July 2015.