-
Optimal transport and barycenters for dendritic measures
Authors:
Young-Heon Kim,
Brendan Pass,
David J. Schneider
Abstract:
We introduce and study a variant of the Wasserstein distance on the space of probability measures, specially designed to deal with measures whose support has a dendritic, or treelike structure with a particular direction of orientation. Our motivation is the comparison of and interpolation between plants' root systems. We characterize barycenters with respect to this metric, and establish that the interpolations of root-like measures, using this new metric, are also root like, in a certain sense; this property fails for conventional Wasserstein barycenters. We also establish geodesic convexity with respect to this metric for a variety of functionals, some of which we expect to have biological importance.
Submitted 17 October, 2019;
originally announced October 2019.
-
Restriction enzymes use a 24 dimensional coding space to recognize 6 base long DNA sequences
Authors:
Thomas D. Schneider,
Vishnu Jejjala
Abstract:
Restriction enzymes recognize and bind to specific sequences on invading bacteriophage DNA. Like a key in a lock, these proteins require many contacts to specify the correct DNA sequence. Using information theory, we develop an equation that defines the number of independent contacts, which is the dimensionality of the binding. We show that EcoRI, which binds to the sequence GAATTC, functions in 24 dimensions. Information theory represents messages as spheres in high-dimensional spaces. Better sphere packing leads to better communication systems. The densest known packing of hyperspheres occurs on the Leech lattice in 24 dimensions. We suggest that the single protein EcoRI molecule employs a Leech lattice in its operation. Optimizing the density of sphere packing explains why 6-base restriction enzymes are so common.
Submitted 29 October, 2019; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Registering the evolutionary history in individual-based models of speciation
Authors:
Carolina L. N. Costa,
Flavia M. D. Marquitti,
S. Ivan Perez,
David M. Schneider,
Marlon F. Ramos,
Marcus A. M. de Aguiar
Abstract:
Understanding the emergence of biodiversity patterns in nature is a central problem in biology. Theoretical models of speciation have addressed this question at the macroecological scale, but little has been investigated in the macroevolutionary context. Knowledge of the evolutionary history allows the study of patterns underlying the processes considered in these models, revealing their signatures and the role of speciation and extinction in shaping macroevolutionary patterns. In this paper we introduce two algorithms to record the evolutionary history of populations in individual-based models of speciation, from which genealogies and phylogenies can be constructed. The first algorithm relies on saving ancestor-descendant relationships, generating a matrix that contains the times to the most recent common ancestor between all pairs of individuals at every generation (the Most Recent Common Ancestor Time matrix, MRCAT). The second algorithm directly records all speciation and extinction events throughout the evolutionary process, generating a matrix with the true phylogeny of species (the Sequential Speciation and Extinction Events, SSEE). We illustrate the use of these algorithms in a spatially explicit individual-based model of speciation. We compare the trees generated via the MRCAT and SSEE algorithms with trees inferred by methods that use only genetic distance among extant species, commonly used in empirical studies and applied here to simulated genetic data. Comparisons between trees are performed with metrics describing the overall topology, branch length distribution and imbalance of trees. We observe that both MRCAT and distance-based trees differ from the true phylogeny, with the first being closer to the true tree than the second.
Submitted 19 December, 2017; v1 submitted 13 September, 2017;
originally announced September 2017.
-
Two-strain competition in quasi-neutral stochastic disease dynamics
Authors:
Oleg Kogan,
Michael Khasin,
Baruch Meerson,
David Schneider,
Christopher R. Myers
Abstract:
We develop a new perturbation method for studying quasi-neutral competition in a broad class of stochastic competition models, and apply it to the analysis of fixation of competing strains in two epidemic models. The first model is a two-strain generalization of the stochastic Susceptible-Infected-Susceptible (SIS) model. Here we extend previous results due to Parsons and Quince (2007), Parsons et al. (2008), and Lin, Kim and Doering (2012). The second model, a two-strain generalization of the stochastic Susceptible-Infected-Recovered (SIR) model with population turnover, has not been studied previously. In each of the two models, when the basic reproduction numbers of the two strains are identical, a system with an infinite population size approaches a point on the deterministic coexistence line (CL): a straight line of fixed points in the phase space of sub-population sizes. Shot noise drives one of the strain populations to fixation, and the other to extinction, on a time scale proportional to the total population size. Our perturbation method explicitly tracks the dynamics of the probability distribution of the sub-populations in the vicinity of the CL. We argue that, whereas the slow strain has a competitive advantage for mathematically "typical" initial conditions, it is the fast strain that is more likely to win in the important situation when a few infectives of both strains are introduced into a susceptible population.
Submitted 3 October, 2014; v1 submitted 1 August, 2014;
originally announced August 2014.
-
Some mathematical tools for the Lenski experiment
Authors:
Bernard Ycart,
Agnès Hamon,
Joël Gaffé,
Dominique Schneider
Abstract:
The Lenski experiment is a long-term daily reproduction of Escherichia coli that has evidenced phenotypic and genetic evolution over the years. Some mathematical models that could be useful in understanding the results of that experiment are reviewed here: stochastic and deterministic growth, mutation appearance and fixation, and competition of species.
Submitted 2 October, 2013;
originally announced October 2013.
-
Evolutionary consequences of assortativeness in haploid genotypes
Authors:
David M. Schneider,
Ayana B. Martins,
Eduardo do Carmo,
Marcus A. M. de Aguiar
Abstract:
We study the evolution of allele frequencies in a large population where random mating is violated in a particular way that is related to recent works on speciation. Specifically, we consider non-random encounters in haploid organisms described by biallelic genes at two loci and assume that individuals whose alleles differ at both loci are incompatible. We show that evolution under these conditions leads to the disappearance of one of the alleles and substantially reduces the diversity of the population. The allele that disappears, and the other allele frequencies at equilibrium, depend only on their initial values, and so does the time to equilibration. However, certain combinations of allele frequencies remain constant during the process, revealing the emergence of strong correlation between the two loci promoted by the epistatic mechanism of incompatibility. We determine the geometrical structure of the haplotype frequency space and solve the dynamical equations, obtaining a simple rule to determine the equilibrium solution from the initial conditions. We show that our results are equivalent to selection against double heterozygotes for a population of diploid individuals and discuss the relevance of our findings to speciation.
Submitted 3 September, 2013;
originally announced September 2013.
-
The structure of infectious disease outbreaks across the animal-human interface
Authors:
Sarabjeet Singh,
David J. Schneider,
Christopher R. Myers
Abstract:
Despite the enormous relevance of zoonotic infections to worldwide public health, and despite much effort in modeling individual zoonoses, a fundamental understanding of the disease dynamics and the nature of outbreaks arising in such systems is still lacking. We introduce a simple stochastic model of susceptible-infected-recovered dynamics in a coupled animal-human metapopulation, and solve analytically for several important properties of the coupled outbreaks. At early timescales, we solve for the probability and time of spillover, and the disease prevalence in the animal population at spillover as a function of model parameters. At long times, we characterize the distribution of outbreak sizes and the critical threshold for a large human outbreak, both of which show a strong dependence on the basic reproduction number in the animal population. The coupling of animal and human infection dynamics has several crucial implications, most importantly allowing for the possibility of large human outbreaks even when human-to-human transmission is subcritical.
Submitted 16 July, 2013;
originally announced July 2013.
-
Epidemic fronts in complex networks with metapopulation structure
Authors:
Jason Hindes,
Sarabjeet Singh,
Christopher R. Myers,
David J. Schneider
Abstract:
Infection dynamics have been studied extensively on complex networks, yielding insight into the effects of heterogeneity in contact patterns on disease spread. Somewhat separately, metapopulations have provided a paradigm for modeling systems with spatially extended and "patchy" organization. In this paper we expand on the use of multitype networks for combining these paradigms, such that simple contagion models can include complexity in the agent interactions and multiscale structure. We first present a generalization of the Volz-Miller mean-field approximation for Susceptible-Infected-Recovered (SIR) dynamics on multitype networks. We then use this technique to study the special case of epidemic fronts propagating on a one-dimensional lattice of interconnected networks - representing a simple chain of coupled population centers - as a necessary first step in understanding how macro-scale disease spread depends on micro-scale topology. Using the formalism of front propagation into unstable states, we derive the effective transport coefficients of the linear spreading: asymptotic speed, characteristic wavelength, and diffusion coefficient for the leading edge of the pulled fronts, and analyze their dependence on the underlying graph structure. We also derive the epidemic threshold for the system and study the front profile for various network configurations. To our knowledge, this is the first such application of front propagation concepts to random network models.
Submitted 20 April, 2013; v1 submitted 15 April, 2013;
originally announced April 2013.
-
Robustness Against Extinction by Stochastic Sex Determination in Small Populations
Authors:
David M. Schneider,
Eduardo do Carmo,
Yaneer Bar-Yam,
Marcus A. M. de Aguiar
Abstract:
Sexually reproducing populations with a small number of individuals may go extinct by stochastic fluctuations in sex determination, causing all their members to become male or female in a generation. In this work we calculate the time to extinction of isolated populations with a fixed number $N$ of individuals that are updated according to the Moran birth and death process. At each time step, one individual is randomly selected and replaced by its offspring resulting from mating with another individual of the opposite sex; the offspring can be male or female with equal probability. A set of $N$ time steps is called a generation, the average time it takes for the entire population to be replaced. The number $k$ of females fluctuates in time, similarly to a random walk, and extinction, which is the only asymptotic possibility, occurs when $k=0$ or $k=N$. We show that it takes only one generation for an arbitrary initial distribution of males and females to approach the binomial distribution. This distribution, however, is unstable and the population eventually goes extinct in $2^N/N$ generations. We also discuss the robustness of these results against bias in the determination of the sex of the offspring, a characteristic promoted by infection by the bacterium Wolbachia in some arthropod species or by temperature in reptiles.
Submitted 18 September, 2012; v1 submitted 22 April, 2012;
originally announced April 2012.
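The update rule described in this abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' code: the function name `generations_to_extinction`, the `max_steps` safety cap, and the choice to pick the replaced individual uniformly at random are all assumptions made for illustration. It tracks the number of females $k$ and stops when $k=0$ or $k=N$.

```python
import random


def generations_to_extinction(n, k0, rng, max_steps=10_000_000):
    """Simulate the number of females k under a Moran-type update.

    Hypothetical helper for illustration only.
    n   : fixed population size N
    k0  : initial number of females
    rng : a random.Random instance (for reproducibility)
    Returns (generations elapsed, final k); one generation = n time steps.
    """
    k = k0
    steps = 0
    # Mating needs both sexes, so the walk stops at k == 0 or k == n.
    while 0 < k < n and steps < max_steps:
        # The individual to be replaced is chosen uniformly at random,
        # so it is female with probability k / n.
        replaced_is_female = rng.random() < k / n
        # The offspring is male or female with equal probability.
        offspring_is_female = rng.random() < 0.5
        if replaced_is_female and not offspring_is_female:
            k -= 1  # a female is replaced by a male
        elif not replaced_is_female and offspring_is_female:
            k += 1  # a male is replaced by a female
        steps += 1
    return steps / n, k


if __name__ == "__main__":
    rng = random.Random(0)
    gens, final_k = generations_to_extinction(n=8, k0=4, rng=rng)
    print(f"extinct after {gens:.1f} generations with k = {final_k}")
```

Averaging the returned generation count over many runs for several small values of `n` is one way to compare the simulation against the $2^N/N$ scaling quoted in the abstract.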
-
The Context Sensitivity Problem in Biological Sequence Segmentation
Authors:
Siew-Ann Cheong,
Paul Stodghill,
David J. Schneider,
Samuel W. Cartinhour,
Christopher R. Myers
Abstract:
In this paper, we describe the context sensitivity problem encountered in partitioning a heterogeneous biological sequence into statistically homogeneous segments. After showing signatures of the problem in the bacterial genomes of Escherichia coli K-12 MG1655 and Pseudomonas syringae DC3000, when these are segmented using two entropic segmentation schemes, we clarify the contextual origins of these signatures through mean-field analyses of the segmentation schemes. Finally, we explain why we believe all sequence segmentation schemes are plagued by the context sensitivity problem.
Submitted 17 April, 2009;
originally announced April 2009.
-
Extending the Recursive Jensen-Shannon Segmentation of Biological Sequences
Authors:
Siew-Ann Cheong,
Paul Stodghill,
David J. Schneider,
Samuel W. Cartinhour,
Christopher R. Myers
Abstract:
In this paper, we extend a previously developed recursive entropic segmentation scheme for applications to biological sequences. Instead of Bernoulli chains, we model the statistically stationary segments in a biological sequence as Markov chains, and define a generalized Jensen-Shannon divergence for distinguishing between two Markov chains. We then undertake a mean-field analysis, based on which we identify pitfalls associated with the recursive Jensen-Shannon segmentation scheme. Following this, we explain the need for segmentation optimization, and describe two local optimization schemes for improving the positions of domain walls discovered at each recursion stage. We also develop a new termination criterion for recursive Jensen-Shannon segmentation based on the strength of statistical fluctuations up to a minimum statistically reliable segment length, avoiding the need for unrealistic null and alternative segment models of the target sequence. Finally, we compare the extended scheme against the original scheme by recursively segmenting the Escherichia coli K-12 MG1655 genome.
Submitted 16 April, 2009;
originally announced April 2009.
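For readers unfamiliar with entropic segmentation, a single step of the simpler Bernoulli-chain (order-0) scheme that this abstract takes as its starting point can be sketched as follows. This is an illustrative assumption, not the authors' implementation: it uses the standard length-weighted Jensen-Shannon divergence between the symbol frequencies of two segments, and the helper names `entropy`, `js_divergence`, and `best_cut` are hypothetical. The Markov-chain generalization, the optimization schemes, and the termination criterion described in the abstract are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(counts, total):
    """Shannon entropy (in bits) of the empirical symbol distribution."""
    return -sum((c / total) * log2(c / total) for c in counts.values() if c)


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two segments.

    Each segment is modelled as an order-0 (Bernoulli) chain, i.e. only
    single-symbol frequencies are used. Both segments must be non-empty.
    """
    n_l, n_r = len(left), len(right)
    n = n_l + n_r
    c_l, c_r = Counter(left), Counter(right)
    h_all = entropy(c_l + c_r, n)  # entropy of the pooled segment
    return h_all - (n_l / n) * entropy(c_l, n_l) - (n_r / n) * entropy(c_r, n_r)


def best_cut(seq):
    """Index i (0 < i < len(seq)) maximizing the divergence of seq[:i] vs seq[i:]."""
    return max(range(1, len(seq)), key=lambda i: js_divergence(seq[:i], seq[i:]))


if __name__ == "__main__":
    seq = "AAAAAAAATTTTTTTT"
    cut = best_cut(seq)
    print(cut, js_divergence(seq[:cut], seq[cut:]))
```

A recursive scheme would apply `best_cut` again to each resulting segment until some stopping rule is met; which rule to use is exactly the kind of question the abstract addresses.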