-
Brownian motion tree models are toric
Authors:
Bernd Sturmfels,
Caroline Uhler,
Piotr Zwiernik
Abstract:
Felsenstein's classical model for Gaussian distributions on a phylogenetic tree is shown to be a toric variety in the space of concentration matrices. We present an exact semialgebraic characterization of this model, and we demonstrate how the toric structure leads to exact methods for maximum likelihood estimation. Our results also give new insights into the geometry of ultrametric matrices.
Submitted 26 February, 2019;
originally announced February 2019.
-
Does Antibiotic Resistance Evolve in Hospitals?
Authors:
Anna Seigal,
Portia Mira,
Bernd Sturmfels,
Miriam Barlow
Abstract:
Nosocomial outbreaks of bacteria are well-documented. Based on these incidents, and the heavy usage of antibiotics in hospitals, it has been assumed that antibiotic resistance evolves in hospital environments. To test this assumption, we studied resistance phenotypes of bacteria collected from patient isolates at a community hospital over a 2.5-year period. A graphical model analysis shows no association between resistance and patient information other than time of arrival. This allows us to focus on time course data.
We introduce a Hospital Transmission Model, based on negative binomial delay. Our main contribution is a statistical hypothesis test called the Nosocomial Evolution of Resistance Detector (NERD). It calculates the significance of resistance trends occurring in a hospital. It can inform hospital staff about the effects of various practices and interventions, can help detect clonal outbreaks, and is available as an R-package.
We applied the NERD method to each of the 16 antibiotics in the study via 16 hypothesis tests. For 13 of the antibiotics, we found that the hospital environment had no significant effect upon the evolution of resistance; the hospital is merely a piece of the larger picture. The p-values obtained for the other three antibiotics (Cefepime, Ceftazidime and Gentamicin) indicate that particular care should be taken in hospital practices with these antibiotics. One of the three, Ceftazidime, was significant after accounting for multiple hypotheses, indicating a trend of decreased resistance for this drug.
Submitted 18 October, 2016; v1 submitted 11 May, 2016;
originally announced May 2016.
-
Convexity in Tree Spaces
Authors:
Bo Lin,
Bernd Sturmfels,
Xiaoxian Tang,
Ruriko Yoshida
Abstract:
We study the geometry of metrics and convexity structures on the space of phylogenetic trees, which is here realized as the tropical linear space of all ultrametrics. The ${\rm CAT}(0)$-metric of Billera-Holmes-Vogtman arises from the theory of orthant spaces. While its geodesics can be computed by the Owen-Provan algorithm, geodesic triangles are complicated. We show that the dimension of such a triangle can be arbitrarily high. Tropical convexity and the tropical metric behave better. They exhibit properties desirable for geometric statistics, such as geodesics of small depth.
Submitted 14 June, 2016; v1 submitted 29 October, 2015;
originally announced October 2015.
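The abstract above realizes tree space as the space of ultrametrics. As a minimal sketch, assuming the standard three-point definition of an ultrametric (d(i,k) <= max(d(i,j), d(j,k)) for all leaves i, j, k), which the abstract itself does not spell out, the check below tests whether a small dissimilarity matrix is an ultrametric. The function name and both example matrices are hypothetical illustrations, not taken from the paper.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """Return True if the square matrix d is symmetric, has a zero diagonal,
    and satisfies d[i][k] <= max(d[i][j], d[j][k]) for all distinct i, j, k."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
        for j in range(n):
            if abs(d[i][j] - d[j][i]) > tol:
                return False
    return all(
        d[i][k] <= max(d[i][j], d[j][k]) + tol
        for i, j, k in permutations(range(n), 3)
    )


# Hypothetical data: leaves 0 and 1 are close, leaf 2 is equally far from both.
tree_like = [[0, 2, 6],
             [2, 0, 6],
             [6, 6, 0]]

# Hypothetical data: the two largest distances in the triple differ.
not_tree_like = [[0, 2, 6],
                 [2, 0, 5],
                 [6, 5, 0]]

print(is_ultrametric(tree_like))
print(is_ultrametric(not_tree_like))
```

The three-point condition is equivalent to saying that in every triple the two largest distances coincide, which is the property equidistant trees impose on leaf-to-leaf distances.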
-
Algebraic Systems Biology: A Case Study for the Wnt Pathway
Authors:
Elizabeth Gross,
Heather A. Harrington,
Zvi Rosen,
Bernd Sturmfels
Abstract:
Steady state analysis of dynamical systems for biological networks gives rise to algebraic varieties in high-dimensional spaces whose study is of interest in their own right. We demonstrate this for the shuttle model of the Wnt signaling pathway. Here the variety is described by a polynomial system in 19 unknowns and 36 parameters. Current methods from computational algebraic geometry and combinatorics are applied to analyze this model.
Submitted 10 February, 2015;
originally announced February 2015.
-
Rational Design of Antibiotic Treatment Plans
Authors:
Portia M. Mira,
Kristina Crona,
Devin Greene,
Juan C. Meza,
Bernd Sturmfels,
Miriam Barlow
Abstract:
The development of reliable methods for restoring susceptibility after antibiotic resistance arises has proven elusive. A greater understanding of the relationship between antibiotic administration and the evolution of resistance is key to overcoming this challenge. Here we present a data-driven mathematical approach for developing antibiotic treatment plans that can reverse the evolution of antibiotic resistance determinants. We have generated adaptive landscapes for 16 genotypes of the TEM beta-lactamase that vary from the wild type genotype TEM-1 through all combinations of four amino acid substitutions. We determined the growth rate of each genotype when treated with each of 15 beta-lactam antibiotics. By using growth rates as a measure of fitness, we computed the probability of each amino acid substitution in each beta-lactam treatment using two different models named the Correlated Probability Model (CPM) and the Equal Probability Model (EPM). We then performed an exhaustive search through the 15 treatments for substitution paths leading from each of the 16 genotypes back to the wild type TEM-1. We identified those treatment paths that returned the highest probabilities of selecting for reversions of amino acid substitutions and returning TEM to the wild type state. For the CPM model, the optimized probabilities ranged between 0.6 and 1.0. For the EPM model, the optimized probabilities ranged between 0.38 and 1.0. For cyclical CPM treatment plans in which the starting and ending genotype was the wild type, the probabilities were between 0.62 and 0.7. Overall this study shows that there is promise for reversing the evolution of resistance through antibiotic treatment plans.
Submitted 5 June, 2014;
originally announced June 2014.
-
Siphons in chemical reaction networks
Authors:
Anne Shiu,
Bernd Sturmfels
Abstract:
Siphons in a chemical reaction system are subsets of the species that have the potential of being absent in a steady state. We present a characterization of minimal siphons in terms of primary decomposition of binomial ideals, we explore the underlying geometry, and we demonstrate the effective computation of siphons using computer algebra software. This leads to a new method for determining whether given initial concentrations allow for various boundary steady states.
Submitted 5 January, 2010; v1 submitted 29 April, 2009;
originally announced April 2009.
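The siphon abstract above describes siphons as species sets that can be absent in a steady state. As a rough sketch, assuming the usual combinatorial definition (a nonempty set Z of species is a siphon if every reaction that produces a species in Z also consumes a species in Z), minimal siphons of a tiny network can be found by brute force. This is not the primary-decomposition method of the paper; the function names and the two-reaction network A + B <-> C are hypothetical.

```python
from itertools import combinations


def is_siphon(candidate, reactions):
    """Check the combinatorial siphon condition for a set of species.

    reactions is a list of (reactants, products) pairs of species names.
    """
    z = set(candidate)
    if not z:
        return False
    # Every reaction producing something in z must consume something in z.
    return all(
        z & set(reactants)
        for reactants, products in reactions
        if z & set(products)
    )


def minimal_siphons(species, reactions):
    """Enumerate inclusion-minimal siphons by increasing subset size."""
    found = []
    for size in range(1, len(species) + 1):
        for subset in combinations(species, size):
            s = frozenset(subset)
            # Skip supersets of siphons already found: they are not minimal.
            if any(m <= s for m in found):
                continue
            if is_siphon(s, reactions):
                found.append(s)
    return found


# Hypothetical network: A + B -> C and C -> A + B.
species = ["A", "B", "C"]
reactions = [(("A", "B"), ("C",)), (("C",), ("A", "B"))]

for siphon in minimal_siphons(species, reactions):
    print(sorted(siphon))
```

The enumeration is exponential in the number of species, so it is only meant for toy networks; the paper's algebraic approach is what makes larger computations practical.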
-
Reconstructing Spatiotemporal Gene Expression Data from Partial Observations
Authors:
Dustin A. Cartwright,
Siobhan M. Brady,
David A. Orlando,
Bernd Sturmfels,
Philip N. Benfey
Abstract:
Developmental transcriptional networks in plants and animals operate in both space and time. To understand these transcriptional networks it is essential to obtain whole-genome expression data at high spatiotemporal resolution. Substantial amounts of spatial and temporal microarray expression data previously have been obtained for the Arabidopsis root; however, these two dimensions of data have not been integrated thoroughly. Complicating this integration is the fact that these data are heterogeneous and incomplete, with observed expression levels representing complex spatial or temporal mixtures. Given these partial observations, we present a novel method for reconstructing integrated high resolution spatiotemporal data. Our method is based on a new iterative algorithm for finding approximate roots to systems of bilinear equations.
Submitted 24 March, 2009;
originally announced March 2009.
-
Computer algebra in systems biology
Authors:
Reinhard Laubenbacher,
Bernd Sturmfels
Abstract:
Systems biology focuses on the study of entire biological systems rather than on their individual components. With the emergence of high-throughput data generation technologies for molecular biology and the development of advanced mathematical modeling techniques, this field promises to provide important new insights. At the same time, with the availability of increasingly powerful computers, computer algebra has developed into a useful tool for many applications. This article illustrates the use of computer algebra in systems biology by way of a well-known gene regulatory network, the Lac Operon in the bacterium E. coli.
Submitted 18 December, 2008; v1 submitted 27 December, 2007;
originally announced December 2007.
-
The Cyclohedron Test for Finding Periodic Genes in Time Course Expression Studies
Authors:
Jason Morton,
Lior Pachter,
Anne Shiu,
Bernd Sturmfels
Abstract:
The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present a new approach to this problem based on the cyclohedron test, which is a rank test inspired by recent advances in algebraic combinatorics. The test has the advantage of being robust to measurement errors, and can be used to ascertain the significance of top-ranked genes. We apply the test to recently published measurements of gene expression during mouse somitogenesis and find 32 genes that collectively are significant. Among these are previously identified periodic genes involved in the Notch/FGF and Wnt signaling pathways, as well as novel candidate genes that may play a role in regulating the segmentation clock. These results confirm that there is an abundance of exceptionally periodic genes expressed during somitogenesis. The emphasis of this paper is on the statistics and combinatorics that underlie the cyclohedron test and its implementation within a multiple testing framework.
Submitted 22 May, 2007; v1 submitted 23 February, 2007;
originally announced February 2007.
-
Towards the Human Genotope
Authors:
Peter Huggins,
Lior Pachter,
Bernd Sturmfels
Abstract:
The human genotope is the convex hull of all allele frequency vectors that can be obtained from the genotypes present in the human population. In this paper we take a few initial steps towards a description of this object, which may be fundamental for future population-based genetics studies. Here we use data from the HapMap Project, restricted to two ENCODE regions, to study a subpolytope of the human genotope. We study three different approaches for obtaining informative low-dimensional projections of this subpolytope. The projections are specified by projection onto a few tag SNPs, principal component analysis, and archetypal analysis. We describe the application of our geometric approach to identifying structure in populations based on single nucleotide polymorphisms.
Submitted 25 December, 2006; v1 submitted 9 November, 2006;
originally announced November 2006.
-
Epistasis and Shapes of Fitness Landscapes
Authors:
Niko Beerenwinkel,
Lior Pachter,
Bernd Sturmfels
Abstract:
The relationship between the shape of a fitness landscape and the underlying gene interactions, or epistasis, has been extensively studied in the two-locus case. Gene interactions among multiple loci are usually reduced to two-way interactions. We present a geometric theory of shapes of fitness landscapes for multiple loci. A central concept is the genotope, which is the convex hull of all possible allele frequencies in populations. Triangulations of the genotope correspond to different shapes of fitness landscapes and reveal all the gene interactions. The theory is applied to fitness data from HIV and Drosophila melanogaster. In both cases, our findings refine earlier analyses and reveal previously undetected gene interactions.
Submitted 14 April, 2006; v1 submitted 29 March, 2006;
originally announced March 2006.
-
Parametric Alignment of Drosophila Genomes
Authors:
Colin Dewey,
Peter Huggins,
Kevin Woods,
Bernd Sturmfels,
Lior Pachter
Abstract:
The classic algorithms of Needleman-Wunsch and Smith-Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). In order to process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces which are suitable for Needleman-Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists.
Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters.
The alignment polytopes, software, and supplementary material can be downloaded at http://bio.math.berkeley.edu/parametric/.
Submitted 2 December, 2005;
originally announced December 2005.
-
Evolution on distributive lattices
Authors:
Niko Beerenwinkel,
Nicholas Eriksson,
Bernd Sturmfels
Abstract:
We consider the directed evolution of a population after an intervention that has significantly altered the underlying fitness landscape. We model the space of genotypes as a distributive lattice; the fitness landscape is a real-valued function on that lattice. The risk of escape from intervention, i.e., the probability that the population develops an escape mutant before extinction, is encoded in the risk polynomial. Tools from algebraic combinatorics are applied to compute the risk polynomial in terms of the fitness landscape. In an application to the development of drug resistance in HIV, we study the risk of viral escape from treatment with the protease inhibitors ritonavir and indinavir.
Submitted 12 May, 2006; v1 submitted 23 November, 2005;
originally announced November 2005.
-
The Mathematics of Phylogenomics
Authors:
Lior Pachter,
Bernd Sturmfels
Abstract:
The grand challenges in biology today are being shaped by powerful high-throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes and detailed information about variation within populations. We are therefore able to ask, for the first time, fundamental questions about the evolution of genomes, the structure of genes and their regulation, and the connections between genotypes and phenotypes of individuals. The answers to these questions are all predicated on progress in a variety of computational, statistical, and mathematical fields.
The rapid growth in the characterization of genomes has led to the advancement of a new discipline called Phylogenomics. This discipline results from the combination of two major fields in the life sciences: Genomics, i.e., the study of the function and structure of genes and genomes; and Molecular Phylogenetics, i.e., the study of the hierarchical evolutionary relationships among organisms and their genomes. The objective of this article is to offer mathematicians a first introduction to this emerging field, and to discuss specific mathematical problems and developments arising from phylogenomics.
Submitted 27 September, 2005; v1 submitted 8 September, 2004;
originally announced September 2004.
-
Phylogenetic Algebraic Geometry
Authors:
Nicholas Eriksson,
Kristian Ranestad,
Bernd Sturmfels,
Seth Sullivant
Abstract:
Phylogenetic algebraic geometry is concerned with certain complex projective algebraic varieties derived from finite trees. Real positive points on these varieties represent probabilistic models of evolution. For small trees, we recover classical geometric objects, such as toric and determinantal varieties and their secant varieties, but larger trees lead to new and largely unexplored territory. This paper gives a self-contained introduction to this subject and offers numerous open problems for algebraic geometers.
Submitted 2 July, 2004;
originally announced July 2004.
-
Resultants in Genetic Linkage Analysis
Authors:
Ingileif B. Hallgrimsdottir,
Bernd Sturmfels
Abstract:
Statistical models for genetic linkage analysis of k-locus diseases are k-dimensional subvarieties of a (3^k-1)-dimensional probability simplex. We determine the algebraic invariants of these models with general characteristics for k = 1; in particular, we recover and generalize the Hardy-Weinberg curve. For k = 2, the invariants are presented as determinants of 32x32 matrices of linear forms in 9 unknowns, a suitable format for computations with numerical data.
Submitted 17 November, 2004; v1 submitted 1 May, 2004;
originally announced May 2004.
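The linkage-analysis abstract above mentions the Hardy-Weinberg curve as the k = 1 case, a curve inside the 2-dimensional simplex of genotype frequencies. As an illustration, assuming the classical parametrization (p_AA, p_Aa, p_aa) = (t^2, 2t(1-t), (1-t)^2) with allele frequency t, the sketch below checks with exact rational arithmetic that such points satisfy the invariant p_Aa^2 - 4 p_AA p_aa = 0. The function names are hypothetical, and the generalized models of the paper are not reproduced here.

```python
from fractions import Fraction


def hardy_weinberg(t):
    """Genotype frequencies (p_AA, p_Aa, p_aa) for allele frequency t."""
    t = Fraction(t)
    return t ** 2, 2 * t * (1 - t), (1 - t) ** 2


def hw_invariant(p_AA, p_Aa, p_aa):
    """Polynomial that vanishes exactly on the Hardy-Weinberg curve."""
    return p_Aa ** 2 - 4 * p_AA * p_aa


# Points on the curve: the invariant is zero and the frequencies sum to one.
for k in range(11):
    p = hardy_weinberg(Fraction(k, 10))
    print(k, sum(p), hw_invariant(*p))

# A point of the simplex off the curve (no heterozygotes): invariant is nonzero.
off_curve = (Fraction(1, 2), Fraction(0), Fraction(1, 2))
print(hw_invariant(*off_curve))
```

Using `Fraction` avoids floating-point noise, so the vanishing of the invariant on the curve is checked exactly rather than up to a tolerance.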
-
Toric ideals of phylogenetic invariants
Authors:
Bernd Sturmfels,
Seth Sullivant
Abstract:
Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an abelian group. Their phylogenetic invariants form a toric ideal in the Fourier coordinates. We determine generators and Gröbner bases for these toric ideals. For the Jukes-Cantor and Kimura models on a binary tree, our Gröbner bases consist of certain explicitly constructed polynomials of degree at most four.
Submitted 7 February, 2004;
originally announced February 2004.
-
Parametric Inference for Biological Sequence Analysis
Authors:
Lior Pachter,
Bernd Sturmfels
Abstract:
One of the major successes in computational biology has been the unification, using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied towards these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems associated with different statistical models. This paper introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
Submitted 25 January, 2004;
originally announced January 2004.
-
Tropical Geometry of Statistical Models
Authors:
Lior Pachter,
Bernd Sturmfels
Abstract:
This paper presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. The question addressed here is how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. A key role is played by the Newton polytope of a statistical model. Our results are applied to the hidden Markov model and to the general Markov model on a binary tree.
Submitted 25 January, 2004; v1 submitted 8 November, 2003;
originally announced November 2003.
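The tropical-geometry abstract above relates the sum-product algorithm to tropical algebra. A minimal sketch of that idea, assuming a toy two-state hidden Markov model with made-up parameters: the same forward recursion is run once over the ordinary semiring (+, x), giving the probability of an observation, and once over the tropical semiring (min, +) on negative log parameters, giving the cost of the most likely hidden path. All names and numbers below are hypothetical and are not taken from the paper.

```python
import math
import operator
from functools import reduce


def forward(obs, init, trans, emit, add, mul):
    """Forward recursion for an HMM over a semiring given by add and mul."""
    states = range(len(init))
    # alpha[s] aggregates all hidden paths that end in state s.
    alpha = [mul(init[s], emit[s][obs[0]]) for s in states]
    for symbol in obs[1:]:
        alpha = [
            reduce(add, (mul(mul(alpha[s], trans[s][t]), emit[t][symbol])
                         for s in states))
            for t in states
        ]
    return reduce(add, alpha)


def neglog(x):
    """Apply -log entrywise to a number or a nested list of numbers."""
    if isinstance(x, list):
        return [neglog(v) for v in x]
    return -math.log(x)


# Hypothetical two-state, two-symbol model.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1]

# Ordinary semiring: sum over all hidden paths of products of parameters.
prob = forward(obs, init, trans, emit, operator.add, operator.mul)

# Tropical semiring: min over all hidden paths of sums of -log parameters.
best_cost = forward(obs, neglog(init), neglog(trans), neglog(emit),
                    min, operator.add)

print(prob)
print(math.exp(-best_cost))
```

Only the two semiring operations change between the calls; the recursion itself is shared, which is the sense in which the tropical version is a direct analogue of the sum-product algorithm.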