-
Robust estimation of microbial diversity in theory and in practice
Authors:
Bart Haegeman,
Jérôme Hamelin,
John Moriarty,
Peter Neal,
Jonathan Dushoff,
Joshua S. Weitz
Abstract:
Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reli…
▽ More
Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao's estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics ("Hill diversities"), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao's estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.
△ Less
Submitted 23 February, 2013; v1 submitted 15 February, 2013;
originally announced February 2013.
-
Function-Valued Traits in Evolution
Authors:
Pantelis Z. Hadjipantelis,
Nick S. Jones,
John Moriarty,
David A. Springate,
Christopher G. Knight
Abstract:
Many biological characteristics of evolutionary interest are not scalar variables but continuous functions. Given a dataset of function-valued traits generated by evolution, we develop a practical statistical approach to infer ancestral function-valued traits, and estimate the generative evolutionary process. We do this by combining dimension reduction and phylogenetic Gaussian process regressio…
▽ More
Many biological characteristics of evolutionary interest are not scalar variables but continuous functions. Given a dataset of function-valued traits generated by evolution, we develop a practical statistical approach to infer ancestral function-valued traits, and estimate the generative evolutionary process. We do this by combining dimension reduction and phylogenetic Gaussian process regression, a nonparametric procedure which explicitly accounts for known phylogenetic relationships. We test the methods' performance on simulated function-valued data generated from a stochastic evolutionary model. The methods are applied assuming that only the phylogeny and the function-valued traits of taxa at its tips are known. Our method is robust and applicable to a wide range of function-valued data, and also offers a phylogenetically aware method for estimating the autocorrelation of function-valued traits.
△ Less
Submitted 15 December, 2012;
originally announced December 2012.
-
Ancestral Inference from Functional Data: Statistical Methods and Numerical Examples
Authors:
Pantelis Z. Hadjipantelis,
Nick S. Jones,
John Moriarty,
David Springate,
Christopher G. Knight
Abstract:
Many biological characteristics of evolutionary interest are not scalar variables but continuous functions. Here we use phylogenetic Gaussian process regression to model the evolution of simulated function-valued traits. Given function-valued data only from the tips of an evolutionary tree and utilising independent principal component analysis (IPCA) as a method for dimension reduction, we constru…
▽ More
Many biological characteristics of evolutionary interest are not scalar variables but continuous functions. Here we use phylogenetic Gaussian process regression to model the evolution of simulated function-valued traits. Given function-valued data only from the tips of an evolutionary tree and utilising independent principal component analysis (IPCA) as a method for dimension reduction, we construct distributional estimates of ancestral function-valued traits, and estimate parameters describing their evolutionary dynamics.
△ Less
Submitted 2 August, 2012;
originally announced August 2012.
-
Evolutionary Inference for Function-valued Traits: Gaussian Process Regression on Phylogenies
Authors:
Nick S. Jones,
John Moriarty
Abstract:
Biological data objects often have both of the following features: (i) they are functions rather than single numbers or vectors, and (ii) they are correlated due to phylogenetic relationships. In this paper we give a flexible statistical model for such data, by combining assumptions from phylogenetics with Gaussian processes. We describe its use as a nonparametric Bayesian prior distribution, both…
▽ More
Biological data objects often have both of the following features: (i) they are functions rather than single numbers or vectors, and (ii) they are correlated due to phylogenetic relationships. In this paper we give a flexible statistical model for such data, by combining assumptions from phylogenetics with Gaussian processes. We describe its use as a nonparametric Bayesian prior distribution, both for prediction (placing posterior distributions on ancestral functions) and model selection (comparing rates of evolution across a phylogeny, or identifying the most likely phylogenies consistent with the observed data). Our work is integrative, extending the popular phylogenetic Brownian Motion and Ornstein-Uhlenbeck models to functional data and Bayesian inference, and extending Gaussian Process regression to phylogenies. We provide a brief illustration of the application of our method.
△ Less
Submitted 3 August, 2012; v1 submitted 26 April, 2010;
originally announced April 2010.