-
Branching with selection and mutation II: Mutant fitness of Gumbel type
Authors:
Su-Chan Park,
Joachim Krug,
Peter Mörters
Abstract:
We study a model of a branching process subject to selection, modeled by giving each family an individual fitness acting as a branching rate, and mutation, modeled by resampling the fitness of a proportion of offspring in each generation. For two large classes of fitness distributions of Gumbel type we determine the growth of the population, almost surely on survival. We then study the empirical fitness distribution in a simplified model, which is numerically indistinguishable from the original model, and show the emergence of a Gaussian travelling wave.
Submitted 20 February, 2025;
originally announced February 2025.
-
Complexity and accessibility of random landscapes
Authors:
Sakshi Pahujani,
Joachim Krug
Abstract:
These notes introduce probabilistic landscape models defined on high-dimensional discrete sequence spaces. The models are motivated primarily by fitness landscapes in evolutionary biology, but links to statistical physics and computer science are mentioned where appropriate. Elementary and advanced results on the structure of landscapes are described with a focus on features that are relevant to evolutionary searches, such as the number of local maxima and the existence of fitness-monotonic paths. The recent discovery of submodularity as a biologically meaningful property of fitness landscapes and its consequences for their accessibility is discussed in detail.
Submitted 9 February, 2025;
originally announced February 2025.
-
Evolutionary accessibility of random and structured fitness landscapes
Authors:
Joachim Krug,
Daniel Oros
Abstract:
Biological evolution can be conceptualized as a search process in the space of gene sequences guided by the fitness landscape, a mapping that assigns a measure of reproductive value to each genotype. Here we discuss probabilistic models of fitness landscapes with a focus on their evolutionary accessibility, where a path in a fitness landscape is said to be accessible if the fitness values encountered along the path increase monotonically. For uncorrelated (random) landscapes with independent and identically distributed fitness values, the probability of existence of accessible paths between genotypes at a distance linear in the sequence length $L$ becomes nonzero at a nontrivial threshold value of the fitness difference between the initial and final genotype, which can be explicitly computed for large classes of genotype graphs. The behaviour in uncorrelated random landscapes is contrasted with landscape models that display additional, biologically motivated structural features. In particular, landscapes defined by a tradeoff between adaptation to environmental extremes have been found to display a combinatorially large number of accessible paths to all local fitness maxima. We show that this property is characteristic of a broad class of models that satisfy a certain global constraint, and provide further examples from this class.
Submitted 28 February, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Branching with selection and mutation I: Mutant fitness of Fréchet type
Authors:
Su-Chan Park,
Joachim Krug,
Léo Touzo,
Peter Mörters
Abstract:
We investigate two stochastic models of a growing population subject to selection and mutation. In our models each individual carries a fitness which determines its mean offspring number. Many of these offspring inherit their parent's fitness, but some are mutants and obtain a fitness randomly sampled from a distribution in the domain of attraction of the Fréchet distribution. We give a rigorous proof for the precise rate of superexponential growth of these stochastic processes and support the argument by a heuristic and numerical study of the mechanism underlying this growth.
Submitted 5 December, 2022;
originally announced December 2022.
-
Unpredictable repeatability in molecular evolution
Authors:
Suman G Das,
Joachim Krug
Abstract:
The extent of parallel evolution at the genotypic level is quantitatively linked to the distribution of beneficial fitness effects (DBFE) of mutations. The standard view, based on light-tailed distributions (i.e. distributions with finite moments), is that the probability of parallel evolution in duplicate populations is inversely proportional to the number of available mutations, and moreover that the DBFE is sufficient to determine the probability when the number of available mutations is large. Here we show that when the DBFE is heavy-tailed, as found in several recent experiments, these expectations are defied. The probability of parallel evolution decays anomalously slowly in the number of mutations or even becomes independent of it, implying higher repeatability of evolution. At the same time, the probability of parallel evolution is non-self-averaging, that is, it does not converge to its mean value even when a large number of mutations are involved. This behavior arises because the evolutionary process is dominated by only a few mutations of high weight. Consequently, the probability varies widely across systems with the same DBFE. Contrary to the standard view, the DBFE is no longer sufficient to determine the extent of parallel evolution, making it much less predictable. We illustrate these ideas theoretically and through analysis of empirical data on antibiotic resistance evolution.
Submitted 30 September, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Geometry of fitness landscapes: Peaks, shapes and universal positive epistasis
Authors:
Kristina Crona,
Joachim Krug,
Malvika Srivastava
Abstract:
Darwinian evolution is driven by random mutations, genetic recombination (gene shuffling) and selection that favors genotypes with high fitness. For systems where each genotype can be represented as a bitstring of length $L$, an overview of possible evolutionary trajectories is provided by the $L$-cube graph with nodes labeled by genotypes and edges directed toward the genotype with higher fitness. Peaks (sinks in the graphs) are important since a population can get stranded at a suboptimal peak. The fitness landscape is defined by the fitness values of all genotypes in the system. Some notion of curvature is necessary for a more complete analysis of the landscapes, including the effect of recombination. The shape approach uses triangulations (shapes) induced by fitness landscapes. The main topic for this work is the interplay between peak patterns and shapes. Because of constraints on the shapes for $L=3$ imposed by peaks, there are in total 25 possible combinations of peak patterns and shapes. Similar constraints exist for higher $L$. Specifically, we show that the constraints induced by the staircase triangulation can be formulated as a condition of {\emph{universal positive epistasis}}, an order relation on the fitness effects of arbitrary sets of mutations that respects the inclusion relation between the corresponding genetic backgrounds. We apply the concept to a large protein fitness landscape for an immunoglobulin-binding protein expressed in Streptococcal bacteria.
Submitted 10 April, 2023; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Accessibility Percolation on Cartesian Power Graphs
Authors:
Benjamin Schmiegelt,
Joachim Krug
Abstract:
A fitness landscape is a mapping from a space of discrete genotypes to the real numbers. A path in a fitness landscape is a sequence of genotypes connected by single mutational steps. Such a path is said to be accessible if the fitness values of the genotypes encountered along the path increase monotonically. We study accessible paths on random fitness landscapes of the House-of-Cards type, on which fitness values are independent, identically and continuously distributed random variables. The genotype space is taken to be a Cartesian power graph $\mathcal{A}^L$, where $L$ is the number of genetic loci and the allele graph $\mathcal{A}$ encodes the possible allelic states and mutational transitions on one locus. The probability of existence of accessible paths between two genotypes at a distance linear in $L$ displays a transition from 0 to a positive value at a threshold $β_c$ for the fitness difference between the initial and final genotype. We derive a lower bound on $β_c$ for general $\mathcal{A}$ and show that this bound is tight for a large class of allele graphs. Our results generalize previous results for accessibility percolation on the biallelic hypercube, and compare favorably to published numerical results for multiallelic Hamming graphs.
Submitted 20 February, 2023; v1 submitted 17 December, 2019;
originally announced December 2019.
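The definitions in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' method: it assumes the biallelic hypercube (the simplest allele graph), i.i.d. uniform House-of-Cards fitness values, and only shortest (direct) paths from the all-zeros to the all-ones genotype. All function names are hypothetical.

```python
import itertools
import random
from functools import lru_cache


def house_of_cards(L, rng):
    """Assign i.i.d. uniform fitness values to all 2**L binary genotypes."""
    return {g: rng.random() for g in itertools.product((0, 1), repeat=L)}


def is_accessible(path, fitness):
    """Return True if fitness increases strictly along the genotype path."""
    values = [fitness[g] for g in path]
    return all(a < b for a, b in zip(values, values[1:]))


def count_accessible_direct_paths(L, fitness):
    """Count shortest paths 00...0 -> 11...1 with strictly increasing fitness."""
    target = (1,) * L

    @lru_cache(maxsize=None)
    def count_from(g):
        if g == target:
            return 1
        total = 0
        for i in range(L):
            if g[i] == 0:
                # Single mutational step: flip locus i from 0 to 1.
                h = g[:i] + (1,) + g[i + 1:]
                if fitness[h] > fitness[g]:
                    total += count_from(h)
        return total

    return count_from((0,) * L)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
    landscape = house_of_cards(6, rng)
    print(count_accessible_direct_paths(6, landscape))
```

On a purely additive landscape, where fitness equals the number of 1-alleles, every one of the L! direct paths is accessible, which gives a quick sanity check for the counting routine.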
-
Accessibility percolation in random fitness landscapes
Authors:
Joachim Krug
Abstract:
The fitness landscape encodes the mapping of genotypes to fitness and provides a succinct representation of possible trajectories followed by an evolving population. Evolutionary accessibility is quantified by the existence of fitness-monotonic paths connecting far away genotypes. Studies of accessibility percolation use probabilistic fitness landscape models to explore the emergence of such paths as a function of the initial fitness, the parameters of the landscape or the structure of the genotype graph. This chapter reviews these studies and discusses their implications for the predictability of evolutionary processes.
Submitted 29 June, 2021; v1 submitted 28 March, 2019;
originally announced March 2019.
-
An exactly solvable record model for rainfall
Authors:
Satya N. Majumdar,
Philipp von Bomhard,
Joachim Krug
Abstract:
Daily precipitation time series are composed of null entries corresponding to dry days and nonzero entries that describe the rainfall amounts on wet days. Assuming that wet days follow a Bernoulli process with success probability $p$, we show that the presence of dry days induces negative correlations between record-breaking precipitation events. The resulting non-monotonic behavior of the Fano factor of the record counting process is recovered in empirical data. We derive the full probability distribution $P(R,n)$ of the number of records $R_n$ up to time $n$, and show that for large $n$, its large deviation form coincides with that of a Poisson distribution with parameter $\ln(p\,n)$. We also study in detail the joint limit $p \to 0$, $n \to \infty$, which yields a random record model in continuous time $t = pn$.
Submitted 27 August, 2018;
originally announced August 2018.
-
Accessibility percolation on n-trees
Authors:
Stefan Nowak,
Joachim Krug
Abstract:
Accessibility percolation is a new type of percolation problem inspired by evolutionary biology. To each vertex of a graph a random number is assigned and a path through the graph is called accessible if all numbers along the path are in ascending order. For the case when the random variables are independent and identically distributed, we derive an asymptotically exact expression for the probability that there is at least one accessible path from the root to the leaves in an $n$-tree. This probability tends to 1 (0) if the branching number is increased with the height of the tree faster (slower) than linearly. When the random variables are biased such that the mean value increases linearly with the distance from the root, a percolation threshold emerges at a finite value of the bias.
Submitted 3 April, 2013; v1 submitted 6 February, 2013;
originally announced February 2013.
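The percolation problem described in the abstract above can be explored numerically. The following is a minimal Monte Carlo sketch under stated assumptions, not the paper's asymptotic calculation: values are i.i.d. uniform on [0, 1), the root's own value counts as the start of the path, and the tree is generated lazily so that subtrees below a failed vertex are never drawn. The function names and the `draw` callable are hypothetical.

```python
import random


def has_accessible_path(branching, height, draw, parent_value=None):
    """Return True if some root-to-leaf path has strictly ascending values.

    `draw` is a zero-argument callable returning the random number of the
    vertex currently being visited.
    """
    value = draw()
    if parent_value is not None and value <= parent_value:
        return False  # the ascending order is broken at this vertex
    if height == 0:
        return True  # reached a leaf along an ascending path
    # any() stops at the first accessible child; unexplored siblings are
    # independent of everything drawn so far, so skipping them is harmless.
    return any(
        has_accessible_path(branching, height - 1, draw, value)
        for _ in range(branching)
    )


def estimate_probability(branching, height, trials, rng):
    """Estimate the probability that at least one accessible path exists."""
    hits = sum(
        has_accessible_path(branching, height, rng.random)
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    for n in (2, 8, 32):
        print(n, estimate_probability(n, 8, 2000, rng))
```

Varying the branching number at fixed height, as in the usage loop, gives a rough picture of how the existence probability depends on the ratio of the two.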
-
Correlations between record events in sequences of random variables with a linear trend
Authors:
Gregor Wergen,
Jasper Franke,
Joachim Krug
Abstract:
The statistics of records in sequences of independent, identically distributed random variables is a classic subject of study. One of the earliest results concerns the stochastic independence of record events. Recently, records statistics beyond the case of i.i.d. random variables have received much attention, but the question of independence of record events has not been addressed systematically. In this paper, we study this question in detail for the case of independent, non-identically distributed random variables, specifically, for random variables with a linearly moving mean. We find a rich pattern of positive and negative correlations, and show how their asymptotics is determined by the universality classes of extreme value statistics.
Submitted 23 September, 2011; v1 submitted 19 May, 2011;
originally announced May 2011.
-
Records and sequences of records from random variables with a linear trend
Authors:
Jasper Franke,
Gregor Wergen,
Joachim Krug
Abstract:
We consider records and sequences of records drawn from discrete time series of the form $X_{n}=Y_{n}+cn$, where the $Y_{n}$ are independent and identically distributed random variables and $c$ is a constant drift. For very small and very large drift velocities, we investigate the asymptotic behavior of the probability $p_n(c)$ of a record occurring in the $n$th step and the probability $P_N(c)$ that all $N$ entries are records, i.e. that $X_1 < X_2 < ... < X_N$. Our work is motivated by the analysis of temperature time series in climatology, and by the study of mutational pathways in evolutionary biology.
Submitted 9 August, 2010;
originally announced August 2010.
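The series $X_{n}=Y_{n}+cn$ from the abstract above is easy to simulate. The sketch below is illustrative only and assumes the $Y_{n}$ are uniform on [0, 1), one arbitrary choice among the distributions such a study could cover. The function names are hypothetical.

```python
import random


def drifting_series(n, c, rng):
    """Return X_1..X_n with X_k = Y_k + c*k, Y_k i.i.d. uniform on [0, 1)."""
    return [rng.random() + c * k for k in range(1, n + 1)]


def count_records(xs):
    """Count upper records: entries strictly larger than all previous ones."""
    records = 0
    best = float("-inf")
    for x in xs:
        if x > best:
            records += 1
            best = x
    return records


def all_records(xs):
    """Return True if every entry is a record, i.e. X_1 < X_2 < ... < X_N."""
    return all(a < b for a, b in zip(xs, xs[1:]))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    for c in (0.0, 0.01, 0.1):
        xs = drifting_series(1000, c, rng)
        print(c, count_records(xs), all_records(xs))
```

For $c = 0$ the entries are i.i.d. and the $n$th entry is a record with probability $1/n$, so the record count grows only logarithmically. With uniform $Y_{n}$ and $c \geq 1$, every entry is necessarily a record, because consecutive $Y$ values differ by less than 1.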
-
A pedestrian's view on interacting particle systems, KPZ universality, and random matrices
Authors:
Thomas Kriecherbauer,
Joachim Krug
Abstract:
These notes are based on lectures delivered by the authors at a Langeoog seminar of SFB/TR12 "Symmetries and universality in mesoscopic systems" to a mixed audience of mathematicians and theoretical physicists. After a brief outline of the basic physical concepts of equilibrium and nonequilibrium states, the one-dimensional simple exclusion process is introduced as a paradigmatic nonequilibrium interacting particle system. The stationary measure on the ring is derived and the idea of the hydrodynamic limit is sketched. We then introduce the phenomenological Kardar-Parisi-Zhang (KPZ) equation and explain the associated universality conjecture for surface fluctuations in growth models. This is followed by a detailed exposition of a seminal paper of Johansson that relates the current fluctuations of the totally asymmetric simple exclusion process (TASEP) to the Tracy-Widom distribution of random matrix theory. The implications of this result are discussed within the framework of the KPZ conjecture.
Submitted 29 July, 2010; v1 submitted 19 March, 2008;
originally announced March 2008.
-
Records in a changing world
Authors:
Joachim Krug
Abstract:
In the context of this paper, a record is an entry in a sequence of random variables (RV's) that is larger or smaller than all previous entries. After a brief review of the classic theory of records, which is largely restricted to sequences of independent and identically distributed (i.i.d.) RV's, new results for sequences of independent RV's with distributions that broaden or sharpen with time are presented. In particular, we show that when the width of the distribution grows as a power law in time $n$, the mean number of records is asymptotically of order $\ln n$ for distributions with a power law tail (the \textit{Fréchet class} of extremal value statistics), of order $(\ln n)^2$ for distributions of exponential type (\textit{Gumbel class}), and of order $n^{1/(ν+1)}$ for distributions of bounded support (\textit{Weibull class}), where the exponent $ν$ describes the behaviour of the distribution at the upper (or lower) boundary. Simulations are presented which indicate that, in contrast to the i.i.d. case, the sequence of record breaking events is correlated in such a way that the variance of the number of records is asymptotically smaller than the mean.
Submitted 6 February, 2007;
originally announced February 2007.