Search | arXiv e-print repository

Finite mixture representations of zero-&-$N$-inflated distributions for count-compositional data

Authors: André F. B. Menezes, Andrew C. Parnell, Keefe Murphy

Abstract: We provide novel probabilistic portrayals of two multivariate models designed to handle zero-inflation in count-compositional data. We develop a new unifying framework that represents both as finite mixture distributions. One of these distributions, based on Dirichlet-multinomial components, has been studied before, but has not yet been properly characterised as a sampling distribution of the counts. The other, based on multinomial components, is a new contribution. Using our finite mixture representations enables us to derive key statistical properties, including moments, marginal distributions, and special cases for both distributions. We develop enhanced Bayesian inference schemes with efficient Gibbs sampling updates, wherever possible, for parameters and auxiliary variables, demonstrating improvements over existing methods in the literature. We conduct simulation studies to evaluate the efficiency of the Bayesian inference procedures and to illustrate the practical utility of the proposed distributions.

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2412.14946 [pdf, other]

Joint Models for Handling Non-Ignorable Missing Data using Bayesian Additive Regression Trees: Application to Leaf Photosynthetic Traits Data

Authors: Yong Chen Goh, Wuu Kuang Soh, Andrew C. Parnell, Keefe Murphy

Abstract: Dealing with missing data poses significant challenges in predictive analysis, often leading to biased conclusions when oversimplified assumptions about the missing data process are made. In cases where the data are missing not at random (MNAR), jointly modeling the data and missing data indicators is essential. Motivated by a real data application with partially missing multivariate outcomes related to leaf photosynthetic traits and several environmental covariates, we propose two methods under a selection model framework for handling data with missingness in the response variables suitable for recovering various missingness mechanisms. Both approaches use a multivariate extension of Bayesian additive regression trees (BART) to flexibly model the outcomes. The first approach simultaneously uses a probit regression model to jointly model the missingness. In scenarios where the relationship between the missingness and the data is more complex or non-linear, we propose a second approach using a probit BART model to characterize the missing data process, thereby employing two BART models simultaneously. Both models also effectively handle ignorable covariate missingness. The efficacy of both models compared to existing missing data approaches is demonstrated through extensive simulations, in both univariate and multivariate settings, and through the aforementioned application to the leaf photosynthetic trait data.

Submitted 19 December, 2024; originally announced December 2024.

arXiv:2408.17230 [pdf, other]

cosimmr: an R package for fast fitting of Stable Isotope Mixing Models with covariates

Authors: Emma Govan, Andrew L Jackson, Stuart Bearhop, Richard Inger, Brian C Stock, Brice X Semmens, Eric J Ward, Andrew C Parnell

Abstract: The study of animal diets and the proportional contribution that different foods make to their diets is an important task in ecology. Stable Isotope Mixing Models (SIMMs) are an important tool for studying an animal's diet and understanding how the animal interacts with its environment. We present cosimmr, a new R package designed to include covariates when estimating diet proportions in SIMMs, with simple functions to produce plots and summary statistics. The inclusion of covariates allows for users to perform a more in-depth analysis of their system and to gain new insights into the diets of the organisms being studied. A common problem with the previous generation of SIMMs is that they are very slow to produce a posterior distribution of dietary estimates, especially for more complex model structures, such as when covariates are included. The widely-used Markov chain Monte Carlo (MCMC) algorithm used by many traditional SIMMs often requires a very large number of iterations to reach convergence. In contrast, cosimmr uses Fixed Form Variational Bayes (FFVB), which we demonstrate gives up to an order of magnitude speed improvement with no discernible loss of accuracy. We provide a full mathematical description of the model, which includes corrections for trophic discrimination and concentration dependence, and evaluate its performance against the state of the art MixSIAR model.
Whilst MCMC is guaranteed to converge to the posterior distribution in the long term, FFVB converges to an approximation of the posterior distribution, which may lead to sub-optimal performance. However we show that the package produces equivalent results in a fraction of the time for all the examples on which we test. The package is designed to be user-friendly and is based on the existing simmr framework.

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2404.02228 [pdf, other]

Seemingly unrelated Bayesian additive regression trees for cost-effectiveness analyses in healthcare

Authors: Jonas Esser, Mateus Maia, Andrew C. Parnell, Judith Bosmans, Hanneke van Dongen, Thomas Klausch, Keefe Murphy

Abstract: In recent years, theoretical results and simulation evidence have shown Bayesian additive regression trees to be a highly-effective method for nonparametric regression. Motivated by cost-effectiveness analyses in health economics, where interest lies in jointly modelling the costs of healthcare treatments and the associated health-related quality of life experienced by a patient, we propose a multivariate extension of BART which is applicable in regression analyses with several dependent outcome variables. Our framework allows for continuous or binary outcomes and overcomes some key limitations of existing multivariate BART models by allowing each individual response to be associated with different ensembles of trees, while still handling dependencies between the outcomes. In the case of continuous outcomes, our model is essentially a nonparametric version of seemingly unrelated regression. Likewise, our proposal for binary outcomes is a nonparametric generalisation of the multivariate probit model. We give suggestions for easily interpretable prior distributions, which allow specification of both informative and uninformative priors. We provide detailed discussions of MCMC sampling methods to conduct posterior inference. Our methods are implemented in the R package "subart". We showcase their performance through extensive simulation experiments and an application to an empirical case study from health economics.
By also accommodating propensity scores in a manner befitting a causal analysis, we find substantial evidence for a novel trauma care intervention's cost-effectiveness.

Submitted 26 February, 2025; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2306.07817 [pdf, other]

simmr: A package for fitting Stable Isotope Mixing Models in R

Authors: Emma Govan, Andrew L. Jackson, Richard Inger, Stuart Bearhop, Andrew C. Parnell

Abstract: We introduce an R package for fitting Stable Isotope Mixing Models (SIMMs) via both Markov chain Monte Carlo and Variational Bayes. The package is mainly used for estimating dietary contributions from food sources taken via measurements of stable isotope ratios from animals. It can also be used to estimate proportional contributions of a mixture from known sources, for example apportionment of river sediment, amongst many other use cases. The package contains a simple structure which allows non-expert users to interface with the package, with most of the computational complexity hidden behind the main fitting functions. In this paper we detail the background to these functions and provide case studies on how the package should be used. Further examples are available in the online package vignettes.

Submitted 13 June, 2023; originally announced June 2023.

Comments: 27 pages, 9 figures

arXiv:2301.03655 [pdf, other]

Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials

Authors: Antonia A. L. Dos Santos, Danilo A. Sarti, Rafael A. Moral, Andrew C. Parnell

Abstract: We propose a Bayesian tensor regression model to accommodate the effect of multiple factors on phenotype prediction. We adopt a set of prior distributions that resolve identifiability issues that may arise between the parameters in the model. Simulation experiments show that our method out-performs previous related models and machine learning algorithms under different sample sizes and degrees of complexity. We further explore the applicability of our model by analysing real-world data related to wheat production across Ireland from 2010 to 2019. Our model performs competitively and overcomes key limitations found in other analogous approaches. Finally, we adapt a set of visualisations for the posterior distribution of the tensor effects that facilitate the identification of optimal interactions between the tensor variables whilst accounting for the uncertainty in the posterior distribution.

Submitted 9 January, 2023; originally announced January 2023.

arXiv:2207.00011 [pdf, other]

Variational Inference for Additive Main and Multiplicative Interaction Effects Models

Authors: Antônia A. L. Dos Santos, Rafael A. Moral, Danilo A. Sarti, Andrew C. Parnell

Abstract: In plant breeding the presence of a genotype by environment (GxE) interaction has a strong impact on cultivation decision making and the introduction of new crop cultivars. The combination of linear and bilinear terms has been shown to be very useful in modelling this type of data. A widely-used approach to identify GxE is the Additive Main Effects and Multiplicative Interaction Effects (AMMI) model. However, as data frequently can be high-dimensional, Markov chain Monte Carlo (MCMC) approaches can be computationally infeasible. In this article, we consider a variational inference approach for such a model. We derive variational approximations for estimating the parameters and we compare the approximations to MCMC using both simulated and real data. The new inferential framework we propose is on average two times faster whilst maintaining the same predictive performance as MCMC.

Submitted 29 June, 2022; originally announced July 2022.
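
Note: the abstract above names the AMMI model without stating it. As a reading aid, the form commonly given in the AMMI literature is sketched below. The symbols ($\mu$, $g_i$, $e_j$, $\lambda_q$, $\gamma_{iq}$, $\delta_{jq}$, $Q$) are the conventional ones and are an assumption here, not notation taken from this paper.

```latex
% Response of genotype i in environment j:
% additive main effects plus Q bilinear (multiplicative) interaction terms.
y_{ij} = \mu + g_i + e_j + \sum_{q=1}^{Q} \lambda_q \, \gamma_{iq} \, \delta_{jq} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

The $g_i + e_j$ part corresponds to the "linear" terms mentioned in the abstract and the sum over $q$ to the "bilinear" terms.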

arXiv:2204.02112 [pdf, other]

GP-BART: a novel Bayesian additive regression trees approach using Gaussian processes

Authors: Mateus Maia, Keefe Murphy, Andrew C. Parnell

Abstract: The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines "weak" tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and the absence of an explicit covariance structure over the observations in standard BART can yield poor performance in cases where such assumptions would be necessary. The Gaussian processes Bayesian additive regression trees (GP-BART) model is an extension of BART which addresses this limitation by assuming Gaussian process (GP) priors for the predictions of each terminal node among all trees. The model's effectiveness is demonstrated through applications to simulated and real-world data, surpassing the performance of traditional modeling approaches in various scenarios.

Submitted 14 September, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

arXiv:2108.07636 [pdf, other]

Accounting for shared covariates in semi-parametric Bayesian additive regression trees

Authors: Estevão B. Prado, Andrew C. Parnell, Keefe Murphy, Nathan McJames, Ann O'Shea, Rafael A. Moral

Abstract: We propose some extensions to semi-parametric models based on Bayesian additive regression trees (BART). In the semi-parametric BART paradigm, the response variable is approximated by a linear predictor and a BART model, where the linear component is responsible for estimating the main effects and BART accounts for non-specified interactions and non-linearities. Previous semi-parametric models based on BART have assumed that the set of covariates in the linear predictor and the BART model are mutually exclusive in an attempt to avoid poor coverage properties and reduce bias in the estimates of the parameters in the linear predictor. The main novelty in our approach lies in the way we change the tree-generation moves in BART to deal with this bias and resolve non-identifiability issues between the parametric and non-parametric components, even when they have covariates in common. This allows us to model complex interactions involving the covariates of primary interest, both among themselves and with those in the BART component. Our novel method is developed with a view to analysing data from an international education assessment, where certain predictors of students' achievements in mathematics are of particular interpretational interest. Through additional simulation studies and another application to a well-known benchmark dataset, we also show competitive performance when compared to regression models, alternative formulations of semi-parametric BART, and other tree-based methods.
The implementation of the proposed method is available at https://github.com/ebprado/CSP-BART.

Submitted 30 July, 2024; v1 submitted 17 August, 2021; originally announced August 2021.

Comments: 48 pages, 8 tables, 10 figures

arXiv:2007.04177 [pdf, other]

Modelling excess zeros in count data: A new perspective on modelling approaches

Authors: John Haslett, Andrew C. Parnell, John Hinde, Rafael A. Moral

Abstract: We consider the analysis of count data in which the observed frequency of zero counts is unusually large, typically with respect to the Poisson distribution. We focus on two alternative modelling approaches: Over-Dispersion (OD) models, and Zero-Inflation (ZI) models, both of which can be seen as generalisations of the Poisson distribution; we refer to these as Implicit and Explicit ZI models, respectively. Although sometimes seen as competing approaches, they can be complementary; OD is a consequence of ZI modelling, and ZI is a by-product of OD modelling. The central objective in such analyses is often concerned with inference on the effect of covariates on the mean, in light of the apparent excess of zeros in the counts. Typically the modelling of the excess zeros per se is a secondary objective and there are choices to be made between, and within, the OD and ZI approaches. The contribution of this paper is primarily conceptual. We contrast, descriptively, the impact on zeros of the two approaches. We further offer a novel descriptive characterisation of alternative ZI models, including the classic hurdle and mixture models, by providing a unifying theoretical framework for their comparison. This in turn leads to a novel and technically simpler ZI model. We develop the underlying theory for univariate counts and touch on its implication for multivariate count data.

Submitted 29 July, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

Comments: 41 pages, 3 figures, 1 table
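
Note: the entry above contrasts the classic mixture and hurdle zero-inflation models. The sketch below shows the textbook Poisson versions of both, to make the difference concrete. It is not the paper's unifying framework or its new model, and the function and parameter names (`zip_pmf`, `hurdle_pmf`, `pi`, `pi0`) are illustrative assumptions.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Standard Poisson probability mass function."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def zip_pmf(y: int, lam: float, pi: float) -> float:
    """Mixture (zero-inflated) Poisson: with probability pi the count is a
    structural zero, otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


def hurdle_pmf(y: int, lam: float, pi0: float) -> float:
    """Hurdle Poisson: zero occurs with probability pi0; positive counts
    follow a zero-truncated Poisson(lam)."""
    if y == 0:
        return pi0
    return (1.0 - pi0) * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))
```

In the mixture model zeros come from two sources (structural and Poisson), whereas in the hurdle model every zero comes from the single hurdle component, so `pi0` is the total zero probability.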

arXiv:2006.07493 [pdf, other]

doi 10.1007/s11222-021-09997-3

Bayesian Additive Regression Trees with Model Trees

Authors: Estevão B. Prado, Rafael A. Moral, Andrew C. Parnell

Abstract: Bayesian Additive Regression Trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners and is very flexible for predicting in the presence of non-linearity and high-order interactions. In this paper, we introduce an extension of BART, called Model Trees BART (MOTR-BART), that considers piecewise linear functions at node levels instead of piecewise constants. In MOTR-BART, rather than having a unique value at node level for the prediction, a linear predictor is estimated considering the covariates that have been used as the split variables in the corresponding tree. In our approach, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance than BART. Via simulation studies and real data applications, we compare MOTR-BART to its main competitors. R code for MOTR-BART implementation is available at https://github.com/ebprado/MOTR-BART.

Submitted 10 March, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

Journal ref: Statistics and Computing 31, 20 (2021)

arXiv:1906.06744 [pdf, other]

Bayesian spatial extreme value analysis of maximum temperatures in County Dublin, Ireland

Authors: John O'Sullivan, Conor Sweeney, Andrew C. Parnell

Abstract: In this study, we begin a comprehensive characterisation of temperature extremes in Ireland for the period 1981-2010. We produce return levels of anomalies of daily maximum temperature extremes for an area over Ireland, for the 30-year period 1981-2010. We employ extreme value theory (EVT) to model the data using the generalised Pareto distribution (GPD) as part of a three-level Bayesian hierarchical model. We use predictive processes in order to solve the computationally difficult problem of modelling data over a very dense spatial field. To our knowledge, this is the first study to combine predictive processes and EVT in this manner. The model is fit using Markov chain Monte Carlo (MCMC) algorithms. Posterior parameter estimates and return level surfaces are produced, in addition to specific site analysis at synoptic stations, including Casement Aerodrome and Dublin Airport. Observational data from the period 2011-2018 are included in this site analysis to determine if there is evidence of a change in the observed extremes. An increase in the frequency of extreme anomalies, but not the severity, is observed for this period. We found that the frequency of observed extreme anomalies from 2011-2018 at the Casement Aerodrome and Phoenix Park synoptic stations exceeds the upper bounds of the credible intervals from the model by 20% and 7% respectively.

Submitted 16 June, 2019; originally announced June 2019.

arXiv:1508.02010 [pdf, other]

doi 10.5194/cp-12-525-2016

A Bayesian Hierarchical Model for Reconstructing Sea Levels: From Raw Data to Rates of Change

Authors: Niamh Cahill, Andrew C. Kemp, Benjamin P. Horton, Andrew C. Parnell

Abstract: We present a holistic Bayesian hierarchical model for reconstructing the continuous and dynamic evolution of relative sea-level (RSL) change with fully quantified uncertainty. The reconstruction is produced from biological (foraminifera) and geochemical (δ13C) sea-level indicators preserved in dated cores of salt-marsh sediment. Our model is comprised of three modules: (1) A Bayesian transfer function for the calibration of foraminifera into tidal elevation, which is flexible enough to formally accommodate additional proxies (in this case bulk-sediment δ13C values); (2) A chronology developed from an existing Bchron age-depth model, and (3) An existing errors-in-variables integrated Gaussian process (EIV-IGP) model for estimating rates of sea-level change. We illustrate our approach using a case study of Common Era sea-level variability from New Jersey, U.S.A. We develop a new Bayesian transfer function (B-TF), with and without the δ13C proxy and compare our results to those from a widely-used weighted-averaging transfer function (WA-TF). The formal incorporation of a second proxy into the B-TF model results in smaller vertical uncertainties and improved accuracy for reconstructed RSL. The vertical uncertainty from the multi-proxy B-TF is ~28% smaller on average compared to the WA-TF. When evaluated against historic tide-gauge measurements, the multi-proxy B-TF most accurately reconstructs the RSL changes observed in the instrumental record (MSE = 0.003).
The holistic model provides a single, unifying framework for reconstructing and analysing sea level through time. This approach is suitable for reconstructing other paleoenvironmental variables using biological proxies.

Submitted 9 August, 2015; originally announced August 2015.

Comments: 27 pages, 7 figures

arXiv:1507.00181 [pdf, other]

Bayesian Additive Regression Trees using Bayesian Model Averaging

Authors: Belinda Hernández, Adrian E. Raftery, Stephen R. Pennington, Andrew C. Parnell

Abstract: Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for data sets where the number of variables $p$ is large (e.g. $p>5,000$) the algorithm can become prohibitively expensive, computationally. Another method which is popular for high dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, as it is not a statistical model, it cannot produce probabilistic estimates or predictions. We propose an alternative algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to produce a model which is much more efficient than BART for datasets with large $p$. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small $n$ large $p$" scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments; one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results to their main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git

Submitted 8 July, 2015; v1 submitted 1 July, 2015; originally announced July 2015.

arXiv:1407.6242 [pdf, ps, other]

Frequency behaviour for multinomial counts of fisheries discards via a nested wavelet zero and N inflated binomial model

Authors: Andrew C. Parnell, Norman Graham, Andrew L. Jackson, Mafalda Viana

Abstract: In this paper we identify the changing frequency behaviour of multinomial counts of fish species discarded by vessels in the Irish Sea. We use a Bayesian hierarchical model which captures dynamic frequency changes via a shrinkage model applied to wavelet basis functions. Wavelets are known for capturing data features at different temporal scales; we use a recently-proposed shrinkage prior from the factor analysis literature so that features at the finest levels of detail exhibit the greatest shrinkage. Rather than using a multinomial distribution for monitoring the changes in discards over time, which can be slow to fit and inflexible, we use a nested zero-and-N inflated (ZaNI) binomial distribution which enables much faster computation with no obvious deterioration in model flexibility. Our results show that seasonal behaviour in these data is not regular and occurs at different frequencies. We also show that the nested ZaNI binomial distribution is a good fit to multinomial count data of this sort when an informative nested structure is applied.

Submitted 23 July, 2014; originally announced July 2014.

Comments: 24 pages, 9 figures

arXiv:1407.0064 [pdf, ps, other]

The zero & $N$-inflated binomial distribution with applications

Authors: James Sweeney, John Haslett, Andrew C. Parnell

Abstract: In this article we consider the distribution arising when two zero-inflated Poisson count processes are constrained by their sum total, resulting in a novel zero & $N$-inflated binomial distribution. This result motivates a general class of model for applications in which a sum-constrained count response is subject to multiple sources of heterogeneity, principally an excess of zeroes and $N$'s in the underlying count generating process. Two examples from the ecological regression literature are used to illustrate the wide applicability of the proposed model, and serve to detail its substantial superiority in modelling performance as compared to competing models. We also present an extension to the modelling framework for more complex cases, considering a gender study dataset which is overdispersed relative to the new likelihood, and conclude the article with the description of a general framework for a zero & $N$-inflated multinomial distribution.

Submitted 17 February, 2016; v1 submitted 30 June, 2014; originally announced July 2014.
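
Note: the entry above describes a binomial distribution with excess mass at both $0$ and $N$. One simple way to picture such a distribution is as a three-component mixture: a point mass at $0$, a point mass at $N$, and an ordinary binomial. The sketch below uses that mixture form. The parameterisation (`p_zero`, `p_n`, `theta`) is an assumption for illustration and may differ from the one the paper derives from sum-constrained zero-inflated Poisson processes.

```python
import math


def binom_pmf(y: int, n: int, theta: float) -> float:
    """Standard binomial probability mass function."""
    return math.comb(n, y) * theta**y * (1.0 - theta) ** (n - y)


def zani_binom_pmf(y: int, n: int, theta: float,
                   p_zero: float, p_n: float) -> float:
    """Zero & N-inflated binomial, written as a mixture (illustrative form):
    extra mass p_zero at y = 0, extra mass p_n at y = n, and the remaining
    mass 1 - p_zero - p_n spread according to Binomial(n, theta)."""
    if not 0 <= y <= n:
        return 0.0
    prob = (1.0 - p_zero - p_n) * binom_pmf(y, n, theta)
    if y == 0:
        prob += p_zero
    if y == n:
        prob += p_n
    return prob
```

Setting `p_zero = p_n = 0` recovers the ordinary binomial, which is a quick sanity check on any implementation of this kind.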

arXiv:1402.3014 [pdf, other]

Joint Inference of Misaligned Irregular Time Series with Application to Greenland Ice Core Data

Authors: Thinh K. Doan, Andrew C. Parnell, John Haslett

Abstract: Ice cores provide insight into the past climate over many millennia. Due to ice compaction, the raw data for any single core are irregular in time. Multiple cores have different irregularities; jointly these series are misaligned. After processing, such data are made available to researchers as regular time series: a data product. Typically, these cores are independently processed. In this paper, we consider a fast Bayesian method for the joint processing of multiple irregular series. This is shown to be more efficient. Further, our approach permits a realistic modelling of the impact of the multiple sources of uncertainty. The methodology is illustrated with the analysis of a pair of ice cores (GISP2 and GRIP). Our data products, in the form of marginal posterior distributions on an arbitrary temporal grid, are finite Gaussian mixtures. We can also produce sample paths from the joint posterior distribution to study non-linear functionals of interest. More generally, the concept of joint analysis via a hierarchical Gaussian process model can be widely extended as the models used can be viewed within the larger context of continuous space-time processes.

Submitted 22 September, 2014; v1 submitted 12 February, 2014; originally announced February 2014.

Comments: 14 pages, 8 figures

arXiv:1312.6761 [pdf, ps, other]

doi 10.1214/15-AOAS824

Modeling sea-level change using errors-in-variables integrated Gaussian processes

Authors: Niamh Cahill, Andrew C. Kemp, Benjamin P. Horton, Andrew C. Parnell

Abstract: We perform Bayesian inference on historical and late Holocene (last 2000 years) rates of sea-level change. The input data to our model are tide-gauge measurements and proxy reconstructions from cores of coastal sediment. These data are complicated by multiple sources of uncertainty, some of which arise as part of the data collection exercise. Notably, the proxy reconstructions include temporal uncertainty from dating of the sediment core using techniques such as radiocarbon. The model we propose places a Gaussian process prior on the rate of sea-level change, which is then integrated and set in an errors-in-variables framework to take account of age uncertainty. The resulting model captures the continuous and dynamic evolution of sea-level change with full consideration of all sources of uncertainty. We demonstrate the performance of our model using two real (and previously published) example data sets. The global tide-gauge data set indicates that sea-level rise increased from a rate with a posterior mean of 1.13 mm$/$yr in 1880 AD (0.89 to 1.28 mm$/$yr 95% credible interval for the posterior mean) to a posterior mean rate of 1.92 mm$/$yr in 2009 AD (1.84 to 2.03 mm$/$yr 95% credible interval for the posterior mean). The proxy reconstruction from North Carolina (USA) after correction for land-level change shows the 2000 AD rate of rise to have a posterior mean of 2.44 mm$/$yr (1.91 to 3.01 mm$/$yr 95% credible interval). This is unprecedented in at least the last 2000 years.

Submitted 11 September, 2015; v1 submitted 24 December, 2013; originally announced December 2013.

Comments: Published at http://dx.doi.org/10.1214/15-AOAS824 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS824

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 547-571

arXiv:1209.6457 [pdf, other]

Bayesian Stable Isotope Mixing Models

Authors: Andrew C. Parnell, Donald L. Phillips, Stuart Bearhop, Brice X. Semmens, Eric J. Ward, Jonathan W. Moore, Andrew L. Jackson, Richard Inger

Abstract: In this paper we review recent advances in Stable Isotope Mixing Models (SIMMs) and place them into an over-arching Bayesian statistical framework which allows for several useful extensions. SIMMs are used to quantify the proportional contributions of various sources to a mixture. The most widely used application is quantifying the diet of organisms based on the food sources they have been observed to consume. At the centre of the multivariate statistical model we propose is a compositional mixture of the food sources corrected for various metabolic factors. The compositional component of our model is based on the isometric log ratio (ilr) transform of Egozcue (2003). Through this transform we can apply a range of time series and non-parametric smoothing relationships. We illustrate our models with 3 case studies based on real animal dietary behaviour.

Submitted 28 September, 2012; originally announced September 2012.

Comments: 16 pages, 9 Figures, 1 Table

arXiv:1206.5009 [pdf, other]

On Bayesian Modelling of the Uncertainties in Palaeoclimate Reconstruction

Authors: Andrew C. Parnell, James Sweeney, Thinh K. Doan, Michael Salter-Townshend, Judy R. M. Allen, Brian Huntley, John Haslett

Abstract: We outline a model and algorithm to perform inference on the palaeoclimate and palaeoclimate volatility from pollen proxy data. We use a novel multivariate non-linear non-Gaussian state space model consisting of an observation equation linking climate to proxy data and an evolution equation driving climate change over time. The link from climate to proxy data is defined by a pre-calibrated forward model, as developed in Salter-Townshend and Haslett (2012) and Sweeney (2012). Climatic change is represented by a temporally-uncertain Normal-Inverse Gaussian Levy process, being able to capture large jumps in multivariate climate whilst remaining temporally consistent. The pre-calibrated nature of the forward model allows us to cut feedback between the observation and evolution equations and thus integrate out the state variable entirely whilst making minimal simplifying assumptions. A key part of this approach is the creation of mixtures of marginal data posteriors representing the information obtained about climate from each individual time point. Our approach allows for an extremely efficient MCMC algorithm, which we demonstrate with a pollen core from Sluggan Bog, County Antrim, Northern Ireland.

Submitted 21 June, 2012; originally announced June 2012.

Comments: 25 pages, 7 figures

Showing 1–20 of 20 results for author: Parnell, A C