-
Mathematically tractable models of random phylogenetic networks: an overview of some recent developments
Authors:
François Bienvenu
Abstract:
Models of random phylogenetic networks have been used since the inception of the field, but the introduction and rigorous study of mathematically tractable models is a much more recent topic that has gained momentum in the last 5 years. This manuscript discusses some recent developments in the field through a selection of examples. The emphasis is on the techniques rather than on the results themselves, and on probabilistic tools rather than on combinatorial ones.
Submitted 18 November, 2024; v1 submitted 17 October, 2024;
originally announced October 2024.
-
The $B_2$ index of galled trees
Authors:
François Bienvenu,
Jean-Jil Duchamps,
Michael Fuchs,
Tsan-Cheng Yu
Abstract:
In recent years, there has been an effort to extend the classical notion of phylogenetic balance, originally defined in the context of trees, to networks. One of the most natural ways to do this is with the so-called $B_2$ index. In this paper, we study the $B_2$ index for a prominent class of phylogenetic networks: galled trees. We show that the $B_2$ index of a uniform leaf-labeled galled tree converges in distribution as the network becomes large. We characterize the corresponding limiting distribution, and show that its expected value is 2.707911858984... This is the first time that a balance index has been studied to this level of detail for a random phylogenetic network.
One specificity of this work is that we use two different and independent approaches, each with its advantages: analytic combinatorics, and local limits. The analytic combinatorics approach is more direct, as it relies on standard tools; but it involves slightly more complex calculations. Because it has not previously been used to study such questions, the local limit approach requires developing an extensive framework beforehand; however, this framework is interesting in itself and can be used to tackle other similar problems.
Submitted 28 July, 2024;
originally announced July 2024.
-
0-1 laws for pattern occurrences in phylogenetic trees and networks
Authors:
François Bienvenu,
Mike Steel
Abstract:
In a recent paper, the question of determining the fraction of binary trees that contain a fixed pattern known as the snowflake was posed. We show that this fraction goes to 1, providing two very different proofs: a purely combinatorial one that is quantitative and specific to this problem; and a proof using branching process techniques that is less explicit, but also much more general, as it applies to any fixed pattern and can be extended to other trees and networks. In particular, it follows immediately from our second proof that the fraction of $d$-ary trees (resp. level-$k$ networks) that contain a fixed $d$-ary tree (resp. level-$k$ network) tends to $1$ as the number of leaves grows.
Submitted 25 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
A branching process with coalescence to model random phylogenetic networks
Authors:
François Bienvenu,
Jean-Jil Duchamps
Abstract:
We introduce a biologically natural, mathematically tractable model of random phylogenetic networks to describe evolution in the presence of hybridization. One of the features of this model is that the hybridization rate of the lineages correlates negatively with their phylogenetic distance. We give formulas and characterizations for quantities of biological interest that make them straightforward to compute in practice. We show that the appropriately rescaled network, seen as a metric space, converges to the Brownian continuum random tree, and that the uniformly rooted network has a local weak limit, which we describe explicitly.
Submitted 10 October, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Revisiting Shao and Sokal's $B_2$ index of phylogenetic balance
Authors:
François Bienvenu,
Gabriel Cardona,
Celine Scornavacca
Abstract:
Measures of phylogenetic balance, such as the Colless and Sackin indices, play an important role in phylogenetics. Unfortunately, these indices are specifically designed for phylogenetic trees, and do not extend naturally to phylogenetic networks (which are increasingly used to describe reticulate evolution). This led us to consider a lesser-known balance index, whose definition is based on a probabilistic interpretation that is equally applicable to trees and to networks. This index, known as the $B_2$ index, was first proposed by Shao and Sokal in 1990. Surprisingly, it does not seem to have been studied mathematically since. Likewise, it is used only sporadically in the biological literature, where it tends to be viewed as arcane. In this paper, we study mathematical properties of $B_2$ such as its expectation and variance under the most common models of random trees and its extremal values over various classes of phylogenetic networks. We also assess its relevance in biological applications, and find it to be comparable to that of the Colless and Sackin indices. Altogether, our results call for a reevaluation of the status of this somewhat forgotten measure of phylogenetic balance.
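The probabilistic interpretation mentioned in this abstract can be illustrated with a short sketch. It assumes the usual reading of the $B_2$ index as the base-2 Shannon entropy of the leaf reached by a random walk that starts at the root and moves to a uniformly chosen child at each step. The function name `b2_index` and the dictionary encoding of the network are hypothetical choices made for this illustration, not something taken from the paper.

```python
import math


def b2_index(children, root):
    """Entropy (base 2) of the leaf hit by a uniform random walk from the root.

    `children` maps a node to the list of its children; leaves may map to an
    empty list or be absent. The graph is assumed to be a DAG (tree or network).
    """
    # Reverse DFS postorder is a topological order of the nodes reachable
    # from the root, so every node is processed after all of its parents.
    order, visited = [], set()

    def visit(u):
        if u in visited:
            return
        visited.add(u)
        for v in children.get(u, []):
            visit(v)
        order.append(u)

    visit(root)
    order.reverse()

    # prob[u] = probability that the random walk passes through node u.
    prob = {u: 0.0 for u in order}
    prob[root] = 1.0
    b2 = 0.0
    for u in order:
        kids = children.get(u, [])
        if kids:
            for v in kids:
                prob[v] += prob[u] / len(kids)
        elif prob[u] > 0:
            # u is a leaf: add its contribution -p * log2(p) to the entropy.
            b2 -= prob[u] * math.log2(prob[u])
    return b2


# A three-leaf caterpillar tree and a small network with one reticulation "h".
caterpillar = {"r": ["a", "x"], "x": ["b", "c"]}
network = {"r": ["u", "v"], "u": ["a", "h"], "v": ["h", "b"], "h": ["c"]}
print(b2_index(caterpillar, "r"), b2_index(network, "r"))
```

The same function handles trees and networks because it only propagates walk probabilities along edges, which is the reason the abstract gives for preferring this index in the network setting.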
Submitted 8 September, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Combinatorial and stochastic properties of ranked tree-child networks
Authors:
François Bienvenu,
Amaury Lambert,
Mike Steel
Abstract:
Tree-child networks are a recently-described class of directed acyclic graphs that have risen to prominence in phylogenetics (the study of evolutionary trees and networks). Although these networks have a number of attractive mathematical properties, many combinatorial questions concerning them remain intractable. In this paper, we show that endowing these networks with a biologically relevant ranking structure yields mathematically tractable objects, which we term ranked tree-child networks (RTCNs). We explain how to derive exact and explicit combinatorial results concerning the enumeration and generation of these networks. We also explore probabilistic questions concerning the properties of RTCNs when they are sampled uniformly at random. These questions include the lengths of random walks between the root and leaves (both from the root to the leaves and from a leaf to the root); the distribution of the number of cherries in the network; and sampling RTCNs conditional on displaying a given tree. We also formulate a conjecture regarding the scaling limit of the process that counts the number of lineages in the ancestry of a leaf. The main idea in this paper, namely using ranking as a way to achieve combinatorial tractability, may also extend to other classes of networks.
Submitted 27 May, 2021; v1 submitted 19 July, 2020;
originally announced July 2020.
-
The split-and-drift random graph, a null model for speciation
Authors:
François Bienvenu,
Florence Débarre,
Amaury Lambert
Abstract:
We introduce a new random graph model motivated by biological questions relating to speciation. This random graph is defined as the stationary distribution of a Markov chain on the space of graphs on $\{1, \ldots, n\}$. The dynamics of this Markov chain is governed by two types of events: vertex duplication, where at constant rate a pair of vertices is sampled uniformly and one of these vertices loses its incident edges and is rewired to the other vertex and its neighbors; and edge removal, where each edge disappears at constant rate. Besides the number of vertices $n$, the model has a single parameter $r_n$.
Using a coalescent approach, we obtain explicit formulas for the first moments of several graph invariants such as the number of edges or the number of complete subgraphs of order $k$. These are then used to identify five non-trivial regimes depending on the asymptotics of the parameter $r_n$. We derive an explicit expression for the degree distribution, and show that under appropriate rescaling it converges to classical distributions when the number of vertices goes to infinity. Finally, we give asymptotic bounds for the number of connected components, and show that in the sparse regime the number of edges is Poissonian.
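The two types of events described in this abstract can be sketched as a continuous-time simulation. The abstract only says that the events occur "at constant rate", so the parametrization below is an assumption: each ordered pair of vertices triggers a duplication at rate 1, and each edge disappears at rate `r`. The function name `simulate_split_and_drift`, the empty initial graph, and the stopping time `t_max` are likewise hypothetical choices; running the chain for a long time only approximates its stationary distribution.

```python
import random


def simulate_split_and_drift(n, r, t_max, seed=None):
    """Run the duplication / edge-removal chain on vertices 0..n-1 until t_max.

    Assumed rates (not stated in the abstract): rate 1 per ordered pair for
    duplication, rate r per edge for removal. Requires n >= 2.
    Returns the adjacency sets of the graph at time t_max.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}  # start from the empty graph
    n_edges = 0
    dup_total = n * (n - 1)  # total duplication rate over ordered pairs
    t = 0.0
    while True:
        total = dup_total + r * n_edges
        t += rng.expovariate(total)  # exponential holding time
        if t > t_max:
            return adj
        if rng.random() * total < dup_total:
            # Duplication: j loses its edges, then copies i and i's neighbors.
            i, j = rng.sample(range(n), 2)
            for k in adj[j]:
                adj[k].discard(j)
            n_edges -= len(adj[j])
            new_neighbors = (adj[i] | {i}) - {j}
            adj[j] = set(new_neighbors)
            for k in new_neighbors:
                adj[k].add(j)
            n_edges += len(new_neighbors)
        else:
            # Edge removal: a uniformly chosen edge disappears.
            edges = [(u, v) for u in adj for v in adj[u] if u < v]
            u, v = rng.choice(edges)
            adj[u].discard(v)
            adj[v].discard(u)
            n_edges -= 1


graph = simulate_split_and_drift(n=20, r=5.0, t_max=10.0, seed=1)
print(sum(len(nb) for nb in graph.values()) // 2, "edges")
```

Rebuilding the edge list at every removal keeps the sketch short at the cost of speed; a larger simulation would maintain the edge set incrementally.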
Submitted 17 March, 2018; v1 submitted 3 June, 2017;
originally announced June 2017.
-
A General Formula for the Generation Time
Authors:
François Bienvenu,
Lloyd Demetrius,
Stéphane Legendre
Abstract:
We show that the generation time -- a notion usually described in a biological context -- can be defined in a general way as a return time in a conveniently constructed finite Markov chain. The simple formula we obtain agrees with previous results derived for structured populations projected in discrete time, and makes it possible to define the generation time of any process described by a weighted directed graph whose matrix is primitive.
Submitted 7 October, 2013; v1 submitted 25 July, 2013;
originally announced July 2013.