-
An nth-cousin mating model and the n-anacci numbers
Authors:
Elisa Heinrich Mora,
Noah A. Rosenberg
Abstract:
In seeking to understand the size of inbred pedigrees, J. Lachance (J. Theor. Biol. 261, 238-247, 2009) studied a population model in which, for a fixed value of $n$, each mating occurs between $n$th cousins. We explain a connection between the second-cousin case of the model ($n=2$) and the Fibonacci sequence, and more generally, between the $n$th-cousin case and the $n$-anacci sequence $(n \geq 2)$. For a model with $n$th-cousin mating $(n \geq 1)$, we obtain the generating function describing the size of the pedigree $t$ generations back from the present, and we use it to evaluate the asymptotic growth of the pedigree size. In particular, we show that the growth of the pedigree asymptotically follows the growth rate of the $n$-anacci sequence -- the golden ratio $φ= (1 + \sqrt{5})/2 \approx 1.6180$ in the second-cousin case $n=2$ -- and approaches 2 as $n$ increases. The computations explain the appearance of familiar numerical sequences and constants in a pedigree model. They also recall similar appearances of such sequences and constants in studies of population biology more generally.
Submitted 19 June, 2025;
originally announced June 2025.
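The growth-rate statement in the abstract above can be checked numerically. The following is a minimal sketch, assuming the usual definition of the $n$-anacci sequence (each term is the sum of the previous $n$ terms) and a seed of $n-1$ zeros followed by a 1. The function names are illustrative and are not taken from the paper, and the code works with the $n$-anacci sequence only, not with the pedigree model itself. The growth rate is obtained as the unique positive root of $x^n = x^{n-1} + \cdots + x + 1$, located by bisection on $[1, 2]$.

```python
def nanacci_terms(n, count):
    """Return the first `count` terms of the n-anacci sequence.

    Assumed seed convention: n-1 zeros followed by a single 1.
    """
    terms = [0] * (n - 1) + [1]
    while len(terms) < count:
        terms.append(sum(terms[-n:]))  # each term is the sum of the previous n
    return terms[:count]


def nanacci_growth_rate(n, tol=1e-12):
    """Return the positive root of x^n = x^(n-1) + ... + x + 1 by bisection."""

    def f(x):
        return x**n - sum(x**k for k in range(n))

    # f(1) = 1 - n <= 0 and f(2) = 1 > 0, so the root lies in [1, 2].
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 10):
        print(n, nanacci_terms(n, 10), round(nanacci_growth_rate(n), 6))
```

For $n=2$ the root is the golden ratio, and the roots increase toward 2 as $n$ grows, in line with the asymptotic behavior the abstract describes.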
-
The space of multifurcating ranked tree shapes: enumeration, lattice structure, and Markov chains
Authors:
Julie Zhang,
Noah A. Rosenberg,
Julia A. Palacios
Abstract:
Coalescent models of bifurcating genealogies are used to infer evolutionary parameters from molecular data. However, there are many situations where bifurcating genealogies do not accurately reflect the true underlying ancestral history of samples, and a multifurcating genealogy is required. The space of multifurcating genealogical trees, where nodes can have more than two descendants, is largely underexplored in the setting of coalescent inference. In this paper, we examine the space of rooted, ranked, and unlabeled multifurcating trees. We recursively enumerate the space and then construct a partial ordering which induces a lattice on the space of multifurcating ranked tree shapes. The lattice structure lends itself naturally to defining Markov chains that permit exploration on the space of multifurcating ranked tree shapes. Finally, we prove theoretical bounds for the mixing time of two Markov chains defined on the lattice, and we present simulation results comparing the distribution of trees and tree statistics under various coalescent models to the uniform distribution on this tree space.
Submitted 12 June, 2025;
originally announced June 2025.
-
Enumerative combinatorics of unlabeled and labeled time-consistent galled trees
Authors:
Lily Agranat-Tamir,
Michael Fuchs,
Bernhard Gittenberger,
Noah A. Rosenberg
Abstract:
In mathematical phylogenetics, the time-consistent galled trees provide a simple class of rooted binary network structures that can be used to represent a variety of different biological phenomena. We study the enumerative combinatorics of unlabeled and labeled time-consistent galled trees. We present a new derivation via the symbolic method of the number of unlabeled time-consistent galled trees with a fixed number of leaves and a fixed number of galls. We also derive new generating functions and asymptotics for labeled time-consistent galled trees.
Submitted 22 April, 2025;
originally announced April 2025.
-
Enumeration of rooted binary perfect phylogenies
Authors:
Chloe E. Shiff,
Noah A. Rosenberg
Abstract:
Rooted binary perfect phylogenies provide a generalization of rooted binary unlabeled trees in which each leaf is assigned a positive integer value that corresponds in a biological setting to the count of the number of indistinguishable lineages associated with the leaf. For the rooted binary unlabeled trees, these integers equal 1. We address a variety of enumerative problems concerning rooted binary perfect phylogenies with sample size $s$: the rooted binary unlabeled trees in which a sample of size $s$ lineages is distributed across the leaves of an unlabeled tree with $n$ leaves, $1 \leq n \leq s$. The enumerations further characterize the rooted binary perfect phylogenies, which include the rooted binary unlabeled trees, and which can provide a set of structures useful for various biological contexts.
Submitted 18 October, 2024;
originally announced October 2024.
-
Combinatorics of a dissimilarity measure for pairs of draws from discrete probability vectors on finite sets of objects
Authors:
Zarif Ahsan,
Xiran Liu,
Noah A. Rosenberg
Abstract:
Motivated by a problem in population genetics, we examine the combinatorics of dissimilarity for pairs of random unordered draws of multiple objects, with replacement, from a collection of distinct objects. Consider two draws of size $K$ taken with replacement from a set of $I$ objects, where the two draws represent samples from potentially distinct probability distributions over the set of $I$ objects. We define the set of \emph{identity states} for pairs of draws via a series of actions by permutation groups, describing the enumeration of all such states for a given $K \geq 2$ and $I \geq 2$. Given two probability vectors for the $I$ objects, we compute the probability of each identity state. From the set of all such probabilities, we obtain the expectation for a dissimilarity measure, finding that it has a simple form that generalizes a result previously obtained for the case of $K=2$. We determine when the expected dissimilarity between two draws from the same probability distribution exceeds that of two draws taken from different probability distributions. We interpret the results in the setting of the genetics of polyploid organisms, those whose genetic material contains many copies of the genome ($K > 2$).
Submitted 30 September, 2024;
originally announced October 2024.
-
Tree height and the asymptotic mean of the Colijn-Plazzotta rank of unlabeled binary rooted trees
Authors:
Luc Devroye,
Michael R. Doboli,
Noah A. Rosenberg,
Stephan Wagner
Abstract:
The Colijn--Plazzotta ranking is a bijective encoding of the unlabeled binary rooted trees with positive integers. We show that the rank $f(t)$ of a tree $t$ is closely related to its height $h$, the length of the longest path from a leaf to the root. We consider the rank $f(τ_n)$ of a random $n$-leaf tree $τ_n$ under each of three models: (i) uniformly random unlabeled unordered binary rooted trees, or unlabeled topologies; (ii) uniformly random leaf-labeled binary trees, or labeled topologies under the uniform model; and (iii) random binary search trees, or labeled topologies under the Yule--Harding model. Relying on the close relationship between tree rank and tree height, we obtain results concerning the asymptotic properties of $\log \log f(τ_n)$. In particular, we find $\mathbb{E} \{\log_2 \log f(τ_n)\} \sim 2 \sqrt{πn}$ for uniformly random unlabeled ordered binary rooted trees and uniformly random leaf-labeled binary trees, and for a constant $α\approx 4.31107$, $\mathbb{E}\{\log_2 \log f(τ_n)\} \sim α\log n $ for leaf-labeled binary trees under the Yule--Harding model. We show that the mean of $f(τ_n)$ itself under the three models is largely determined by the rank $c_{n-1}$ of the highest-ranked tree -- the caterpillar -- obtaining an asymptotic relationship with $π_n c_{n-1}$, where $π_n$ is a model-specific function of $n$. The results resolve open problems, providing a new class of results on an encoding useful in mathematical phylogenetics.
Submitted 27 September, 2024;
originally announced September 2024.
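To make the encoding in the abstract above concrete, here is a minimal sketch of the Colijn--Plazzotta rank. The recursion is not stated in the abstract and is assumed here from the standard definition in the literature: a single leaf has rank 1, and a tree whose two root subtrees have ranks $a \geq b$ has rank $a(a-1)/2 + b + 1$. The tree representation (`None` for a leaf, a pair for an internal node) and the function names are illustrative choices.

```python
def cp_rank(tree):
    """Colijn-Plazzotta rank of an unlabeled binary rooted tree.

    A leaf is represented by None; an internal node is a pair (left, right).
    Assumed recursion: f(leaf) = 1, f(tree) = a*(a-1)/2 + b + 1 with a >= b.
    """
    if tree is None:
        return 1
    left, right = tree
    a, b = sorted((cp_rank(left), cp_rank(right)), reverse=True)
    return a * (a - 1) // 2 + b + 1


def height(tree):
    """Length of the longest path from a leaf to the root."""
    if tree is None:
        return 0
    return 1 + max(height(tree[0]), height(tree[1]))


def caterpillar(n):
    """Caterpillar tree with n leaves: each internal node has a leaf child."""
    tree = None
    for _ in range(n - 1):
        tree = (tree, None)
    return tree


if __name__ == "__main__":
    for n in range(1, 9):
        t = caterpillar(n)
        print(n, height(t), cp_rank(t))
```

The caterpillar rank roughly squares with each added leaf, which illustrates why the abstract works with $\log \log f(τ_n)$ and why rank is so closely tied to height.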
-
The distributions under two species-tree models of the total number of ancestral configurations for matching gene trees and species trees
Authors:
Filippo Disanto,
Michael Fuchs,
Chun-Yen Huang,
Ariel R. Paningbatan,
Noah A. Rosenberg
Abstract:
Given a gene-tree labeled topology $G$ and a species tree $S$, the "ancestral configurations" at an internal node $k$ of $S$ represent the combinatorially different sets of gene lineages that can be present at $k$ when all possible realizations of $G$ in $S$ are considered. Ancestral configurations have been introduced as a data structure for evaluating the conditional probability of a gene-tree labeled topology given a species tree, and their enumeration assists in describing the complexity of this computation. In the case that the gene-tree labeled topology $G=t$ matches that of the species tree $S$, by techniques of analytic combinatorics, we study distributional properties of the "total" number of ancestral configurations measured across the different nodes of a random labeled topology $t$ selected under the uniform and the Yule probability models. Under both of these probabilistic scenarios, we show that the total number $T_n$ of ancestral configurations of a random labeled topology of $n$ taxa asymptotically follows a lognormal distribution. Over uniformly distributed labeled topologies, the asymptotic growth of the mean and the variance of $T_n$ are found to satisfy $\mathbb{E}_{\rm U}[T_n] \sim 2.449 \cdot 1.333^n$ and $\mathbb{V}_{\rm U}[T_n] \sim 5.050 \cdot 1.822^n$, respectively. Under the Yule model, which assigns higher probabilities to more balanced labeled topologies, we obtain the mean $\mathbb{E}_{\rm Y}[T_n] \sim 1.425^n$ and the variance $\mathbb{V}_{\rm Y}[T_n] \sim 2.045^n$.
Submitted 7 May, 2023;
originally announced May 2023.
-
A lattice structure for ancestral configurations arising from the relationship between gene trees and species trees
Authors:
Egor Lappo,
Noah A. Rosenberg
Abstract:
To a given gene tree topology $G$ and species tree topology $S$ with leaves labeled bijectively from a fixed set $X$, one can associate a set of ancestral configurations, each of which encodes a set of gene lineages that can be found at a given node of the species tree. We introduce a lattice structure on ancestral configurations, studying the directed graphs that provide graphical representations of lattices of ancestral configurations. For a matching gene tree topology and species tree topology $G=S$, we present a method for defining the digraph of ancestral configurations from the tree topology by using iterated cartesian products of graphs. We show that a specific set of paths on the digraph of ancestral configurations is in bijection with the set of labeled histories -- a well-known phylogenetic object that enumerates possible temporal orderings of the coalescences of a tree. For each of a series of tree families, we obtain closed-form expressions for the number of labeled histories by using this bijection to count paths on associated digraphs. Finally, we prove that our lattice construction extends to nonmatching tree pairs, and we use it to characterize pairs $(G,S)$ having the maximal number of ancestral configurations for a fixed $G$. We discuss how the construction provides new methods for performing enumerations of combinatorial aspects of gene and species trees.
Submitted 6 September, 2023; v1 submitted 19 November, 2021;
originally announced November 2021.
-
Enumeration of binary trees compatible with a perfect phylogeny
Authors:
Julia A. Palacios,
Anand Bhaskar,
Filippo Disanto,
Noah A. Rosenberg
Abstract:
Evolutionary models used for describing molecular sequence variation suppose that at a non-recombining genomic segment, sequences share ancestry that can be represented as a genealogy--a rooted, binary, timed tree, with tips corresponding to individual sequences. Under the infinitely-many-sites mutation model, mutations are randomly superimposed along the branches of the genealogy, so that every mutation occurs at a chromosomal site that has not previously mutated; if a mutation occurs at an interior branch, then all individuals descending from that branch carry the mutation. The implication is that observed patterns of molecular variation from this model impose combinatorial constraints on the hidden state space of genealogies. In particular, observed molecular variation can be represented in the form of a perfect phylogeny, a tree structure that fully encodes the mutational differences among sequences. For a sample of n sequences, a perfect phylogeny might not possess n distinct leaves, and hence might be compatible with many possible binary tree structures that could describe the evolutionary relationships among the n sequences. Here, we investigate enumerative properties of the set of binary ranked and unranked tree shapes that are compatible with a perfect phylogeny, and hence, the binary ranked and unranked tree shapes conditioned on an observed pattern of mutations under the infinitely-many-sites mutation model. We provide a recursive enumeration of these shapes. We consider both perfect phylogenies that can be represented as binary and those that are multifurcating. The results have implications for computational aspects of the statistical inference of evolutionary parameters that underlie sets of molecular sequences.
Submitted 10 August, 2021;
originally announced August 2021.
-
Enumeration of coalescent histories for caterpillar species trees and $p$-pseudocaterpillar gene trees
Authors:
Egor Alimpiev,
Noah A. Rosenberg
Abstract:
For a fixed set $X$ containing $n$ taxon labels, an ordered pair consisting of a gene tree topology $G$ and a species tree $S$ bijectively labeled with the labels of $X$ possesses a set of coalescent histories -- mappings from the set of internal nodes of $G$ to the set of edges of $S$ describing possible lists of edges in $S$ on which the coalescences in $G$ take place. Enumerations of coalescent histories for gene trees and species trees have produced suggestive results regarding the pairs $(G,S)$ that, for a fixed $n$, have the largest number of coalescent histories. We define a class of 2-cherry binary tree topologies that we term $p$-pseudocaterpillars, examining coalescent histories for non-matching pairs $(G,S)$, in the case in which $S$ has a caterpillar shape and $G$ has a $p$-pseudocaterpillar shape. Using a construction that associates coalescent histories for $(G,S)$ with a class of "roadblocked" monotonic paths, we identify the $p$-pseudocaterpillar labeled gene tree topology that, for a fixed caterpillar labeled species tree topology, gives rise to the largest number of coalescent histories. The shape that maximizes the number of coalescent histories places the "second" cherry of the $p$-pseudocaterpillar equidistantly from the root of the "first" cherry and from the tree root. A symmetry in the numbers of coalescent histories for $p$-pseudocaterpillar gene trees and caterpillar species trees is seen to exist around the maximizing value of the parameter $p$. The results provide insight into the factors that influence the number of coalescent histories possible for a given gene tree and species tree.
Submitted 24 March, 2021;
originally announced March 2021.
-
The distributions under two species-tree models of the number of root ancestral configurations for matching gene trees and species trees
Authors:
Filippo Disanto,
Michael Fuchs,
Ariel R. Paningbatan,
Noah A. Rosenberg
Abstract:
For a pair consisting of a gene tree and a species tree, the ancestral configurations at an internal node of the species tree are the distinct sets of gene lineages that can be present at that node. Ancestral configurations appear in computations of gene tree probabilities under evolutionary models conditional on fixed species trees, and the enumeration of root ancestral configurations -- ancestral configurations at the root of the species tree -- assists in describing the complexity of these computations. In the case that the gene tree matches the species tree in topology, we study the distribution of the number of root ancestral configurations of a random labeled tree topology under each of two models.
Submitted 16 June, 2020;
originally announced June 2020.
-
Coalescent histories for lodgepole species trees
Authors:
Filippo Disanto,
Noah A. Rosenberg
Abstract:
Coalescent histories are combinatorial structures that describe for a given gene tree and species tree the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the \emph{lodgepole} species trees $(λ_n)_{n\geq 0}$, in which tree $λ_n$ has $m=2n+1$ taxa. We determine the number of coalescent histories for the lodgepole species trees, in the case that the gene tree matches the species tree, showing that this number grows with $m!!$ in the number of taxa $m$. This computation demonstrates the existence of tree families in which the growth in the number of coalescent histories is faster than exponential. Further, it provides a substantial improvement on the lower bound for the ratio of the largest number of matching coalescent histories to the smallest number of matching coalescent histories for trees with $m$ taxa, increasing a previous bound of $(\sqrt{π} / 32)[(5m-12)/(4m-6)] m \sqrt{m}$ to $[ \sqrt{m-1}/(4 \sqrt{e}) ]^{m}$. We discuss the implications of our enumerative results for phylogenetic computations.
Submitted 11 March, 2015;
originally announced March 2015.
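The two lower bounds quoted in the abstract above are easy to compare numerically. The sketch below evaluates the formulas exactly as written there, together with the double factorial $m!!$ for odd $m$. The function names are illustrative, and the code does not enumerate coalescent histories.

```python
import math


def double_factorial(m):
    """Return m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def previous_bound(m):
    """Earlier lower bound: (sqrt(pi)/32) * [(5m-12)/(4m-6)] * m * sqrt(m)."""
    return (math.sqrt(math.pi) / 32) * ((5 * m - 12) / (4 * m - 6)) * m * math.sqrt(m)


def lodgepole_bound(m):
    """Lower bound from the lodgepole family: [sqrt(m-1) / (4*sqrt(e))]^m."""
    return (math.sqrt(m - 1) / (4 * math.sqrt(math.e))) ** m


if __name__ == "__main__":
    for n in (2, 10, 25, 50, 100):
        m = 2 * n + 1  # number of taxa in the lodgepole tree λ_n
        print(m, double_factorial(m), previous_bound(m), lodgepole_bound(m))
```

The base $\sqrt{m-1}/(4\sqrt{e})$ exceeds 1 only when $m - 1 > 16e \approx 43.5$, so the newer bound is the smaller of the two for small $m$ and overtakes the polynomial bound only for larger trees, after which it grows superexponentially.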