-
Bounding the SNPR distance between two tree-child networks using generalised agreement forests
Authors:
Steven Kelk,
Simone Linz,
Charles Semple
Abstract:
Agreement forests continue to play a central role in the comparison of phylogenetic trees since their introduction more than 25 years ago. More specifically, they are used to characterise several distances that are based on tree rearrangement operations and related quantifiers of dissimilarity between phylogenetic trees. In addition, the concept of agreement forests continues to underlie most advancements in the development of algorithms that exactly compute the aforementioned measures. In this paper, we introduce agreement digraphs, a concept that generalises agreement forests for two phylogenetic trees to two phylogenetic networks. Analogous to the way in which agreement forests compute the subtree prune and regraft distance between two phylogenetic trees but inherently more complex, we then use agreement digraphs to bound the subnet prune and regraft distance between two tree-child networks from above and below and show that our bounds are tight.
Submitted 12 March, 2025;
originally announced March 2025.
-
Detachable pairs in $3$-connected matroids and simple $3$-connected graphs
Authors:
Nick Brettell,
Charles Semple,
Gerry Toft
Abstract:
Let $M$ be a $3$-connected matroid. A pair $\{e,f\}$ in $M$ is detachable if $M \backslash e \backslash f$ or $M / e / f$ is $3$-connected. Williams (2015) proved that if $M$ has at least 13 elements, then at least one of the following holds: $M$ has a detachable pair, $M$ has a $3$-element circuit or cocircuit, or $M$ is a spike. We address the case where $M$ has a $3$-element circuit or cocircuit, to obtain a characterisation of when a matroid with at least 13 elements has a detachable pair. As a consequence, we characterise when a simple $3$-connected graph $G$ with $|E(G)| \ge 13$ has a pair of edges $\{e,f\}$ such that $G/e/f$ or $G \backslash e\backslash f$ is simple and $3$-connected.
Submitted 20 September, 2024;
originally announced September 2024.
-
When is a set of phylogenetic trees displayed by a normal network?
Authors:
Magnus Bordewich,
Simone Linz,
Charles Semple
Abstract:
A normal network is uniquely determined by the set of phylogenetic trees that it displays. Given a set $\mathcal{P}$ of rooted binary phylogenetic trees, this paper presents a polynomial-time algorithm that reconstructs the unique binary normal network whose set of displayed binary trees is $\mathcal{P}$, if such a network exists. Additionally, we show that any two rooted phylogenetic trees can be displayed by a normal network and show that this result does not extend to more than two trees. This is in contrast to tree-child networks where it has been previously shown that any collection of rooted phylogenetic trees can be displayed by a tree-child network. Lastly, we introduce a type of cherry-picking sequence that characterises when a collection $\mathcal{P}$ of rooted phylogenetic trees can be displayed by a normal network and, further, characterise the minimum number of reticulations needed over all normal networks that display $\mathcal{P}$. We then exploit these sequences to show that, for all $n\ge 3$, there exist two rooted binary phylogenetic trees on $n$ leaves that can be displayed by a tree-child network with a single reticulation, but cannot be displayed by a normal network with fewer than $n-2$ reticulations.
Submitted 9 July, 2024;
originally announced July 2024.
-
Clonal cores and flexipaths in matroids
Authors:
Nick Brettell,
James Oxley,
Charles Semple,
Geoff Whittle
Abstract:
A partitioned matroid $(M, \{X_1,X_2,\dots,X_n\})$ consists of a matroid $M$ and a partition $\{X_1,X_2,\dots,X_n\}$ of its ground set. As such structures arise frequently in structural matroid theory, this paper introduces a general technique for analyzing those special properties of partitioned matroids that depend solely on the values of the connectivities $λ(X_i)$, the local connectivities $\sqcap(\cup_{j\in J}X_j, \cup_{k\in K}X_k)$, and the dual local connectivities $\sqcap^*(\cup_{h\in H}X_h, \cup_{g\in G}X_g)$. In particular, we consider those partitioned matroids in which each $X_i$ is an independent, coindependent set of clones of cardinality $λ(X_i)$. Calling such partitioned matroids clonal-core matroids, we show that special results of the above type for partitioned matroids can be verified in general by proving them just for clonal-core matroids. Aiming at the long-term goal of finding the unavoidable minors of $4$-connected matroids, we illustrate this technique by studying $4$-paths. These are sequences $(L,P_1,P_2,\ldots, P_n,R)$ of sets that partition the ground set of a matroid so that the union of any proper initial segment of parts is $4$-separating. Viewing the ends $L$ and $R$ as fixed, we call such a partition a $4$-flexipath if $(L,Q_1,Q_2,\ldots, Q_n,R)$ is a $4$-path for all permutations $(Q_1,Q_2,\ldots, Q_n)$ of $(P_1,P_2,\ldots, P_n)$. A straightforward simplification enables us to focus on $(4,c)$-flexipaths for some $c$ in $\{1,2,3\}$, that is, those $4$-flexipaths for which $λ(Q_i) = c$ and $λ(Q_i \cup Q_j) > c$ for all distinct $i$ and $j$. Our main result for $4$-paths is that the only non-trivial case that arises here is when $c=2$. In that case, there are essentially only two possible dual pairs of $(4,c)$-flexipaths when $n \ge 5$.
Submitted 15 April, 2025; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Phylogenetic trees defined by at most three characters
Authors:
Katharina T. Huber,
Simone Linz,
Vincent Moulton,
Charles Semple
Abstract:
In evolutionary biology, phylogenetic trees are commonly inferred from a set of characters (partitions) of a collection of biological entities (e.g., species or individuals in a population). Such characters naturally arise from molecular sequences or morphological data. Interestingly, it has been known for some time that any binary phylogenetic tree can be (convexly) defined by a set of at most four characters, and that there are binary phylogenetic trees for which three characters are not enough. Thus, it is of interest to characterise those phylogenetic trees that are defined by a set of at most three characters. In this paper, we provide such a characterisation, in particular proving that a binary phylogenetic tree $T$ is defined by a set of at most three characters precisely if $T$ has no internal subtree isomorphic to a certain tree.
Submitted 15 November, 2023;
originally announced November 2023.
-
What is a 4-connected matroid?
Authors:
Nick Brettell,
Susan Jowett,
James Oxley,
Charles Semple,
Geoff Whittle
Abstract:
The {\em breadth} of a tangle $\mathcal{T}$ in a matroid is the size of the largest spanning uniform submatroid of the tangle matroid of $\mathcal{T}$. A matroid $M$ is {\em weakly $4$-connected} if it is 3-connected and whenever $(X,Y)$ is a partition of $E(M)$ with $|X|,|Y|>4$, then $λ(X)\geq 3$. We prove that if $\mathcal{T}$ is a tangle of order $k\geq 4$ and breadth $l$ in a matroid $M$, then $M$ has a weakly 4-connected minor $N$ with a tangle $\mathcal{T}_N$ of order $k$ and breadth $l$, with the property that $\mathcal{T}$ is the tangle in $M$ induced by $\mathcal{T}_N$. A set $Z$ of elements of a matroid $M$ is $4$-{\em connected} if $λ(A)\geq\min\{|A\cap Z|,|Z-A|,3\}$ for all $A\subseteq E(M)$. As a corollary of our theorems on tangles we prove that if $M$ contains an $n$-element $4$-connected set where $n\geq 7$, then $M$ has a weakly $4$-connected minor that contains an $n$-element $4$-connected set.
Submitted 3 March, 2025; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Quantifying the difference between phylogenetic diversity and diversity indices
Authors:
Magnus Bordewich,
Charles Semple
Abstract:
Phylogenetic diversity is a popular measure for quantifying the biodiversity of a collection $Y$ of species, while phylogenetic diversity indices provide a way to apportion phylogenetic diversity to individual species. Typically, for some specific diversity index, the phylogenetic diversity of $Y$ is not equal to the sum of the diversity indices of the species in $Y$. In this paper, we investigate the extent of this difference for two commonly-used indices: Fair Proportion and Equal Splits. In particular, we determine the maximum value of this difference under various instances including when the associated rooted phylogenetic tree is allowed to vary across all rooted phylogenetic trees with the same leaf set and whose edge lengths are constrained by either their total sum or their maximum value.
Submitted 20 April, 2023;
originally announced April 2023.
-
The excluded minors for the intersection of bicircular and lattice path matroids
Authors:
Emma Hogan,
Charles Semple
Abstract:
The classes of bicircular matroids and lattice path matroids are closed under minors. The complete list of excluded minors for the class of lattice path matroids is known, and it has been recently shown that the analogous list for the class of bicircular matroids is finite. In this paper, we establish the complete list of excluded minors for the class of matroids that is the intersection of these two classes. This resolves a recently posed open problem.
Submitted 12 January, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Hypercubes and Hamilton cycles of display sets of rooted phylogenetic networks
Authors:
Janosch Döcker,
Simone Linz,
Charles Semple
Abstract:
In the context of reconstructing phylogenetic networks from a collection of phylogenetic trees, several characterisations and subsequently algorithms have been established to reconstruct a phylogenetic network that collectively embeds all trees in the input in some minimum way. For many instances, however, the resulting network also embeds additional phylogenetic trees that are not part of the input, and little is known about these inferred trees. In this paper, we explore the relationships among all phylogenetic trees that are embedded in a given phylogenetic network. First, we investigate some combinatorial properties of the collection P of all rooted binary phylogenetic trees that are embedded in a rooted binary phylogenetic network N. To this end, we associate a particular graph G, which we call the rSPR graph, with the elements in P and show that, if |P|=2^k, where k is the number of vertices with in-degree two in N, then G has a Hamiltonian cycle. Second, by exploiting rSPR graphs and properties of hypercubes, we turn to the well-studied class of rooted binary level-1 networks and give necessary and sufficient conditions for when a set of rooted binary phylogenetic trees can be embedded in a level-1 network without inferring any additional trees. Lastly, we show how these conditions translate into a polynomial-time algorithm to reconstruct such a network if it exists.
Submitted 18 August, 2023; v1 submitted 11 November, 2022;
originally announced November 2022.
-
A splitter theorem for elastic elements in $3$-connected matroids
Authors:
George Drummond,
Charles Semple
Abstract:
An element $e$ of a $3$-connected matroid $M$ is elastic if ${\rm si}(M/e)$, the simplification of $M/e$, and ${\rm co}(M\backslash e)$, the cosimplification of $M\backslash e$, are both $3$-connected. It was recently shown that if $|E(M)|\geq 4$, then $M$ has at least four elastic elements provided $M$ has no $4$-element fans and no member of a specific family of $3$-separators. In this paper, we extend this wheels-and-whirls type result to a splitter theorem, where the removal of elements is with respect to elasticity and keeping a specified $3$-connected minor. We also prove that if $M$ has exactly four elastic elements, then it has path-width three. Lastly, we resolve a question of Whittle and Williams, and show that past analogous results, where the removal of elements is relative to a fixed basis, are consequences of this work.
Submitted 18 July, 2022;
originally announced July 2022.
-
The excluded minors for 2- and 3-regular matroids
Authors:
Nick Brettell,
James Oxley,
Charles Semple,
Geoff Whittle
Abstract:
The class of 2-regular matroids is a natural generalisation of regular and near-regular matroids. We prove an excluded-minor characterisation for the class of 2-regular matroids. The class of 3-regular matroids coincides with the class of matroids representable over the Hydra-5 partial field, and the 3-connected matroids in the class with a $U_{2,5}$- or $U_{3,5}$-minor are precisely those with six inequivalent representations over GF(5). We also prove that an excluded minor for this class has at most 15 elements.
Submitted 6 September, 2023; v1 submitted 30 June, 2022;
originally announced June 2022.
-
Excluded minors are almost fragile II: essential elements
Authors:
Nick Brettell,
James Oxley,
Charles Semple,
Geoff Whittle
Abstract:
Let $M$ be an excluded minor for the class of $\mathbb{P}$-representable matroids for some partial field $\mathbb{P}$, let $N$ be a $3$-connected strong $\mathbb{P}$-stabilizer that is non-binary, and suppose $M$ has a pair of elements $\{a,b\}$ such that $M\backslash a,b$ is $3$-connected with an $N$-minor. Suppose also that $|E(M)| \geq |E(N)|+11$ and $M \backslash a,b$ is not $N$-fragile. In the prequel to this paper, we proved that $M \backslash a,b$ is at most five elements away from an $N$-fragile minor. An element $e$ in a matroid $M'$ is $N$-essential if neither $M'/e$ nor $M' \backslash e$ has an $N$-minor. In this paper, we prove that, under mild assumptions, $M \backslash a,b$ is one element away from a minor having at least $r(M)-2$ elements that are $N$-essential.
Submitted 14 August, 2023; v1 submitted 26 June, 2022;
originally announced June 2022.
-
Cyclic matroids
Authors:
Nick Brettell,
Charles Semple,
Gerry Toft
Abstract:
For all positive integers $s$ and $t$ exceeding one, a matroid $M$ on $n$ elements is {\em nearly $(s, t)$-cyclic} if there is a cyclic ordering $σ$ of its ground set such that every $s-1$ consecutive elements of $σ$ are contained in an $s$-element circuit and every $t-1$ consecutive elements of $σ$ are contained in a $t$-element cocircuit. In the case $s=t$, nearly $(s, s)$-cyclic matroids have been studied previously. In this paper, we show that if $M$ is nearly $(s, t)$-cyclic and $n$ is sufficiently large, then these $s$-element circuits and $t$-element cocircuits are consecutive in $σ$ in a prescribed way, that is, $M$ is "$(s, t)$-cyclic". Furthermore, we show that, given $s$ and $t$ where $t\ge s$, every $(s, t)$-cyclic matroid on $n > s+t-2$ elements is a weak-map image of the $\left(\frac{t-s}{2}\right)$-th truncation of a certain $(s, s)$-cyclic matroid. If $s=3$, this certain matroid is the rank-$\frac{n}{2}$ whirl, and if $s=4$, this certain matroid is the rank-$\frac{n}{2}$ free swirl.
Submitted 23 June, 2022; v1 submitted 29 December, 2021;
originally announced December 2021.
-
Counting and optimising maximum phylogenetic diversity sets
Authors:
Kerry Manson,
Charles Semple,
Mike Steel
Abstract:
In conservation biology, phylogenetic diversity (PD) provides a way to quantify the impact of the current rapid extinction of species on the evolutionary `Tree of Life'. This approach recognises that extinction not only removes species but also the branches of the tree on which unique features shared by the extinct species arose. In this paper, we investigate three questions that are relevant to PD. The first asks how many sets of species of given size $k$ preserve the maximum possible amount of PD in a given tree. The number of such maximum PD sets can be very large, even for moderate-sized phylogenies. We provide a combinatorial characterisation of maximum PD sets, focusing on the setting where the branch lengths are ultrametric (e.g. proportional to time). This leads to a polynomial-time algorithm for calculating the number of maximum PD sets of size $k$ by applying a generating function; we also investigate the types of tree shapes that harbour the most (or fewest) maximum PD sets of size $k$. Our second question concerns optimising a linear function on the species (regarded as leaves of the phylogenetic tree) across all the maximum PD sets of a given size. Using the characterisation result from the first question, we show how this optimisation problem can be solved in polynomial time, even though the number of maximum PD sets can grow exponentially. Our third question considers a dual problem: If $k$ species were to become extinct, then what is the largest possible {\em loss} of PD in the resulting tree? For this question, we describe a polynomial-time solution based on dynamic programming.
Submitted 11 April, 2022; v1 submitted 14 December, 2021;
originally announced December 2021.
-
On the Complexity of Optimising Variants of Phylogenetic Diversity on Phylogenetic Networks
Authors:
Magnus Bordewich,
Charles Semple,
Kristina Wicke
Abstract:
Phylogenetic Diversity (PD) is a prominent quantitative measure of the biodiversity of a collection of present-day species (taxa). This measure is based on the evolutionary distance among the species in the collection. Loosely speaking, if $\mathcal{T}$ is a rooted phylogenetic tree whose leaf set $X$ represents a set of species and whose edges have real-valued lengths (weights), then the PD score of a subset $S$ of $X$ is the sum of the weights of the edges of the minimal subtree of $\mathcal{T}$ connecting the species in $S$. In this paper, we define several natural variants of the PD score for a subset of taxa which are related by a known rooted phylogenetic network. Under these variants, we explore, for a positive integer $k$, the computational complexity of determining the maximum PD score over all subsets of taxa of size $k$ when the input is restricted to different classes of rooted phylogenetic networks.
Submitted 16 July, 2021;
originally announced July 2021.
-
Non-essential arcs in phylogenetic networks
Authors:
Simone Linz,
Charles Semple
Abstract:
In the study of rooted phylogenetic networks, analyzing the set of rooted phylogenetic trees that are embedded in such a network is a recurring task. From an algorithmic viewpoint, this analysis almost always requires an exhaustive search of a particular multiset $S$ of rooted phylogenetic trees that are embedded in a rooted phylogenetic network $\mathcal{N}$. Since the size of $S$ is exponential in the number of reticulations of $\mathcal{N}$, it is consequently of interest to keep this number as small as possible but without losing any element of $S$. In this paper, we take a first step towards this goal by introducing the notion of a non-essential arc of $\mathcal{N}$, which is an arc whose deletion from $\mathcal{N}$ results in a rooted phylogenetic network $\mathcal{N}'$ such that the sets of rooted phylogenetic trees that are embedded in $\mathcal{N}$ and $\mathcal{N}'$ are the same. We investigate the popular class of tree-child networks and characterize which arcs are non-essential. This characterization is based on a family of directed graphs. Using this novel characterization, we show that identifying and deleting all non-essential arcs in a tree-child network takes time that is cubic in the number of leaves of the network. Moreover, we show that deciding if a given arc of an arbitrary phylogenetic network is non-essential is $Π_2^P$-complete.
Submitted 16 July, 2021;
originally announced July 2021.
-
Trinets encode orchard phylogenetic networks
Authors:
Charles Semple,
Gerry Toft
Abstract:
Rooted triples, rooted binary phylogenetic trees on three leaves, are sufficient to encode rooted binary phylogenetic trees. That is, if $\mathcal T$ and $\mathcal T'$ are rooted binary phylogenetic $X$-trees that infer the same set of rooted triples, then $\mathcal T$ and $\mathcal T'$ are isomorphic. However, in general, this sufficiency does not extend to rooted binary phylogenetic networks. In this paper, we show that trinets, phylogenetic network analogues of rooted triples, are sufficient to encode rooted binary orchard networks. Rooted binary orchard networks naturally generalise rooted binary tree-child networks. Moreover, we present a polynomial-time algorithm for building a rooted binary orchard network from its set of trinets. As a consequence, this algorithm affirmatively answers a previously-posed question of whether there is a polynomial-time algorithm for building a rooted binary tree-child network from the set of trinets it infers.
Submitted 3 December, 2020;
originally announced December 2020.
-
Defining phylogenetic networks using ancestral profiles
Authors:
Allan Bai,
Peter Erdos,
Charles Semple,
Mike Steel
Abstract:
Rooted phylogenetic networks provide a more complete representation of the ancestral relationship between species than phylogenetic trees when reticulate evolutionary processes are at play. One way to reconstruct a phylogenetic network is to consider its `ancestral profile' (the number of paths from each ancestral vertex to each leaf). In general, this information does not uniquely determine the underlying phylogenetic network. A recent paper considered a new class of phylogenetic networks called `orchard networks' where this uniqueness was claimed to hold. Here we show that an additional restriction on the network, that of being `stack-free', is required in order for the original uniqueness claim to hold. On the other hand, if the additional stack-free restriction is lifted, we establish an alternative result; namely, there is uniqueness within the class of orchard networks up to the resolution of vertices of high in-degree.
Submitted 30 November, 2020;
originally announced December 2020.
-
Elastic elements in 3-connected matroids
Authors:
George Drummond,
Zachary Gershkoff,
Susan Jowett,
Charles Semple,
Jagdeep Singh
Abstract:
It follows by Bixby's Lemma that if $e$ is an element of a $3$-connected matroid $M$, then either ${\rm co}(M\backslash e)$, the cosimplification of $M\backslash e$, or ${\rm si}(M/e)$, the simplification of $M/e$, is $3$-connected. A natural question to ask is whether $M$ has an element $e$ such that both ${\rm co}(M\backslash e)$ and ${\rm si}(M/e)$ are $3$-connected. Calling such an element "elastic", in this paper we show that if $|E(M)|\ge 4$, then $M$ has at least four elastic elements provided $M$ has no $4$-element fans and, up to duality, $M$ has no $3$-separating set $S$ that is the disjoint union of a rank-$2$ subset and a corank-$2$ subset of $E(M)$ such that $M|S$ is isomorphic to a member or a single-element deletion of a member of a certain family of matroids.
Submitted 22 November, 2021; v1 submitted 5 October, 2020;
originally announced October 2020.
-
On the maximum agreement subtree conjecture for balanced trees
Authors:
Magnus Bordewich,
Simone Linz,
Megan Owen,
Katherine St. John,
Charles Semple,
Kristina Wicke
Abstract:
We give a counterexample to the conjecture of Martin and Thatte that two balanced rooted binary leaf-labelled trees on $n$ leaves have a maximum agreement subtree (MAST) of size at least $n^{\frac{1}{2}}$. In particular, we show that for any $c>0$, there exist two balanced rooted binary leaf-labelled trees on $n$ leaves such that any MAST for these two trees has size less than $c n^{\frac{1}{2}}$. We also improve the lower bound of the size of such a MAST to $n^{\frac{1}{6}}$.
Submitted 15 May, 2020;
originally announced May 2020.
-
Caterpillars on three and four leaves are sufficient to reconstruct normal networks
Authors:
Simone Linz,
Charles Semple
Abstract:
While every rooted binary phylogenetic tree is determined by its set of displayed rooted triples, such a result does not hold for an arbitrary rooted binary phylogenetic network. In particular, there exist two non-isomorphic rooted binary temporal normal networks that display the same set of rooted triples. Moreover, without any structural constraint on the rooted phylogenetic networks under consideration, similarly negative results have also been established for binets and trinets, which are rooted subnetworks on two and three leaves, respectively. Hence, in general, piecing together a rooted phylogenetic network from such a set of small building blocks appears insurmountable. In contrast to these results, in this paper, we show that a rooted binary normal network is determined by its sets of displayed caterpillars (a particular type of subtree) on three and four leaves. The proof is constructive and realises a polynomial-time algorithm that takes the sets of caterpillars on three and four leaves displayed by a rooted binary normal network and, up to isomorphism, reconstructs this network.
Submitted 2 February, 2020;
originally announced February 2020.
-
Placing quantified variants of 3-SAT and Not-All-Equal 3-SAT in the polynomial hierarchy
Authors:
Janosch Döcker,
Britta Dorn,
Simone Linz,
Charles Semple
Abstract:
The complexity of variants of 3-SAT and Not-All-Equal 3-SAT is well studied. However, in contrast, very little is known about the complexity of the problems' quantified counterparts. In the first part of this paper, we show that $\forall \exists$ 3-SAT is $Π_2^P$-complete even if (1) each variable appears exactly twice unnegated and exactly twice negated, (2) each clause is a disjunction of exactly three distinct variables, and (3) the number of universal variables is equal to the number of existential variables. Furthermore, we show that the problem remains $Π_2^P$-complete if (1a) each universal variable appears exactly once unnegated and exactly once negated, (1b) each existential variable appears exactly twice unnegated and exactly twice negated, and (2) and (3) remain unchanged. On the other hand, the problem becomes NP-complete for certain variants in which each universal variable appears exactly once. In the second part of the paper, we establish $Π_2^P$-completeness for $\forall \exists$ Not-All-Equal 3-SAT even if (1') the Boolean formula is linear and monotone, (2') each universal variable appears exactly once and each existential variable appears exactly three times, and (3') each clause is a disjunction of exactly three distinct variables that contains at most one universal variable. On the positive side, we uncover variants of $\forall \exists$ Not-All-Equal 3-SAT that are co-NP-complete or solvable in polynomial time.
Submitted 21 August, 2019; v1 submitted 14 August, 2019;
originally announced August 2019.
-
Orienting undirected phylogenetic networks
Authors:
Katharina T. Huber,
Leo van Iersel,
Remie Janssen,
Mark Jones,
Vincent Moulton,
Yukihiro Murakami,
Charles Semple
Abstract:
This paper studies the relationship between undirected (unrooted) and directed (rooted) phylogenetic networks. We describe a polynomial-time algorithm for deciding whether an undirected nonbinary phylogenetic network, given the locations of the root and reticulation vertices, can be oriented as a directed nonbinary phylogenetic network. Moreover, we characterize when this is possible and show that, in such instances, the resulting directed nonbinary phylogenetic network is unique. In addition, without being given the location of the root and the reticulation vertices, we describe an algorithm for deciding whether an undirected binary phylogenetic network $N$ can be oriented as a directed binary phylogenetic network of a certain class. The algorithm is fixed-parameter tractable (FPT) when the parameter is the level of $N$ and is applicable to classes of directed phylogenetic networks that satisfy certain conditions. As an example, we show that the well-studied class of binary tree-child networks satisfies these conditions.
Submitted 29 September, 2023; v1 submitted 18 June, 2019;
originally announced June 2019.
-
Display sets of normal and tree-child networks
Authors:
Janosch Doecker,
Simone Linz,
Charles Semple
Abstract:
Phylogenetic trees canonically arise as embeddings of phylogenetic networks. We recently showed that the problem of deciding if two phylogenetic networks embed the same sets of phylogenetic trees is computationally hard; in particular, we showed it to be $Π^P_2$-complete. In this paper, we establish a polynomial-time algorithm for this decision problem if the initial two networks consist of a normal network and a tree-child network. The running time of the algorithm is quadratic in the size of the leaf sets.
Submitted 20 January, 2019;
originally announced January 2019.
-
Displaying trees across two phylogenetic networks
Authors:
Janosch Döcker,
Simone Linz,
Charles Semple
Abstract:
Phylogenetic networks are a generalization of phylogenetic trees to leaf-labeled directed acyclic graphs that represent ancestral relationships between species whose past includes non-tree-like events such as hybridization and horizontal gene transfer. Indeed, each phylogenetic network embeds a collection of phylogenetic trees. Referring to the collection of trees that a given phylogenetic network $N$ embeds as the display set of $N$, several questions in the context of the display set of $N$ have recently been analyzed. For example, the widely studied Tree-Containment problem asks if a given phylogenetic tree is contained in the display set of a given network. This paper focuses on two questions that naturally arise in comparing the display sets of two phylogenetic networks. First, we analyze the problem of deciding if the display sets of two phylogenetic networks have a tree in common. Surprisingly, this problem turns out to be NP-complete even for two temporal normal networks. Second, we investigate the question of whether or not the display sets of two phylogenetic networks are equal. While we recently showed that this problem is polynomial-time solvable for a normal and a tree-child network, it is computationally hard in the general case. In establishing hardness, we show that the problem is contained in the second level of the polynomial-time hierarchy. Specifically, it is $Π_2^P$-complete. Along the way, we show that two other problems are also $Π_2^P$-complete, one of which is a generalization of Tree-Containment.
Submitted 19 January, 2019;
originally announced January 2019.
-
A class of phylogenetic networks reconstructable from ancestral profiles
Authors:
Peter L. Erdos,
Charles Semple,
Mike Steel
Abstract:
Rooted phylogenetic networks provide an explicit representation of the evolutionary history of a set $X$ of sampled species. In contrast to phylogenetic trees which show only speciation events, networks can also accommodate reticulate processes (for example, hybrid evolution, endosymbiosis, and lateral gene transfer). A major goal in systematic biology is to infer evolutionary relationships, and while phylogenetic trees can be uniquely determined from various simple combinatorial data on $X$, for networks the reconstruction question is much more subtle. Here we ask when a network can be uniquely reconstructed from its `ancestral profile' (the number of paths from each ancestral vertex to each element in $X$). We show that reconstruction holds (even within the class of all networks) for a class of networks we call `orchard networks', and we provide a polynomial-time algorithm for reconstructing any orchard network from its ancestral profile. Our approach relies on establishing a structural theorem for orchard networks, which also provides for a fast (polynomial-time) algorithm to test if any given network is of orchard type. Since the class of orchard networks includes tree-sibling tree-consistent networks and tree-child networks, our results generalise reconstruction results from 2008 and 2009. Orchard networks allow for an unbounded number $k$ of reticulation vertices, in contrast to tree-sibling tree-consistent networks and tree-child networks for which $k$ is at most $2|X|-4$ and $|X|-1$, respectively.
Submitted 1 May, 2019; v1 submitted 13 January, 2019;
originally announced January 2019.
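Illustration (not from the paper): the ancestral profile described in the abstract above counts directed paths from each ancestral vertex to each element of $X$. A minimal sketch of that count follows, assuming the network is given as a dictionary from each vertex to its list of children, and reading "ancestral vertex" as "non-leaf vertex". The function name `ancestral_profile` and the example network are hypothetical, and this is not the reconstruction algorithm of the paper.

```python
from functools import lru_cache


def ancestral_profile(children, leaves):
    """Count directed paths from every non-leaf vertex to every leaf.

    children: dict mapping a vertex to the list of its children
              (assumed to describe a directed acyclic graph).
    leaves:   iterable of leaf labels (the set X).
    """

    @lru_cache(maxsize=None)
    def paths(v, x):
        # A vertex reaches itself by exactly one (trivial) path.
        if v == x:
            return 1
        # Otherwise sum the path counts over all children of v.
        return sum(paths(c, x) for c in children.get(v, ()))

    return {
        v: {x: paths(v, x) for x in leaves}
        for v, kids in children.items()
        if kids  # skip leaves, which have no children
    }


# Hypothetical example: one reticulation h with parents a and b.
network = {
    "r": ["a", "b"],
    "a": ["x", "h"],
    "b": ["h", "y"],
    "h": ["z"],
}
profile = ancestral_profile(network, ["x", "y", "z"])
print(profile["r"])
```

In this example the root reaches $z$ by two paths (one through each parent of the reticulation) and reaches $x$ and $y$ by one path each.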
-
Matroids with a cyclic arrangement of circuits and cocircuits
Authors:
Nick Brettell,
Deborah Chun,
Tara Fife,
Charles Semple
Abstract:
For all positive integers $t$ exceeding one, a matroid has the cyclic $(t-1,t)$-property if its ground set has a cyclic ordering $σ$ such that every set of $t-1$ consecutive elements in $σ$ is contained in a $t$-element circuit and $t$-element cocircuit. We show that if $M$ has the cyclic $(t-1,t)$-property and $|E(M)|$ is sufficiently large, then these $t$-element circuits and $t$-element cocircuits are arranged in a prescribed way in $σ$, which, for odd $t$, is analogous to how 3-element circuits and cocircuits appear in wheels and whirls, and, for even $t$, is analogous to how 4-element circuits and cocircuits appear in swirls. Furthermore, we show that any appropriate concatenation $Φ$ of $σ$ is a flower. If $t$ is odd, then $Φ$ is a daisy, but if $t$ is even, then, depending on $M$, it is possible for $Φ$ to be either an anemone or a daisy.
Submitted 10 June, 2018;
originally announced June 2018.
-
Attaching leaves and picking cherries to characterise the hybridisation number for a set of phylogenies
Authors:
Simone Linz,
Charles Semple
Abstract:
Throughout the last decade, we have seen much progress towards characterising and computing the minimum hybridisation number for a set P of rooted phylogenetic trees. Roughly speaking, this minimum quantifies the number of hybridisation events needed to explain a set of phylogenetic trees by simultaneously embedding them into a phylogenetic network. From a mathematical viewpoint, the notion of agreement forests is the underpinning concept for almost all results that are related to calculating the minimum hybridisation number for when |P|=2. However, despite various attempts, characterising this number in terms of agreement forests for |P|>2 remains elusive. In this paper, we characterise the minimum hybridisation number for when P is of arbitrary size and consists of not necessarily binary trees. Building on our previous work on cherry-picking sequences, we first establish a new characterisation to compute the minimum hybridisation number in the space of tree-child networks. Subsequently, we show how this characterisation extends to the space of all rooted phylogenetic networks. Moreover, we establish a particular hardness result that gives new insight into some of the limitations of agreement forests.
Submitted 4 January, 2019; v1 submitted 12 December, 2017;
originally announced December 2017.
-
Recovering tree-child networks from shortest inter-taxa distance information
Authors:
Magnus Bordewich,
Katharina T Huber,
Vincent Moulton,
Charles Semple
Abstract:
Phylogenetic networks are a type of leaf-labelled, acyclic, directed graph used by biologists to represent the evolutionary history of species whose past includes reticulation events. A phylogenetic network is tree-child if each non-leaf vertex is the parent of a tree vertex or a leaf. Up to a certain equivalence, it has been recently shown that, under two different types of weightings, edge-weighted tree-child networks are determined by their collection of distances between each pair of taxa. However, the size of these collections can be exponential in the size of the taxa set. In this paper, we show that, if we ignore redundant edges, the same results are obtained with only a quadratic number of inter-taxa distances by using the shortest distance between each pair of taxa. The proofs are constructive and give cubic-time algorithms in the size of the taxa sets for building such weighted networks.
Submitted 23 November, 2017;
originally announced November 2017.
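Illustration (not from the paper): the abstract above defines a network as tree-child if each non-leaf vertex is the parent of a tree vertex or a leaf. A minimal sketch of that test follows, assuming the network is given as a dictionary from each vertex to its list of children and that a vertex with exactly one parent is a tree vertex or a leaf. The function name `is_tree_child` and both example networks are hypothetical.

```python
from collections import defaultdict


def is_tree_child(children):
    """Return True if every non-leaf vertex has a child with exactly one parent.

    children: dict mapping a vertex to the list of its children
              (leaves map to an empty list or are omitted as keys).
    """
    indegree = defaultdict(int)
    for kids in children.values():
        for c in kids:
            indegree[c] += 1

    for kids in children.values():
        if not kids:
            continue  # leaves impose no condition
        # Reticulations have in-degree at least 2; tree vertices and leaves have 1.
        if not any(indegree[c] == 1 for c in kids):
            return False
    return True


# Hypothetical example 1: every non-leaf vertex has a tree or leaf child.
good = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "y"], "h": ["z"]}

# Hypothetical example 2: both children of a (and of b) are reticulations.
bad = {"r": ["a", "b"], "a": ["h1", "h2"], "b": ["h1", "h2"],
       "h1": ["x"], "h2": ["y"]}

print(is_tree_child(good), is_tree_child(bad))
```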
-
Quarnet inference rules for level-1 networks
Authors:
Katharine T. Huber,
Vincent Moulton,
Charles Semple,
Taoyang Wu
Abstract:
An important problem in phylogenetics is the construction of phylogenetic trees. One way to approach this problem, known as the supertree method, involves inferring a phylogenetic tree with leaves consisting of a set $X$ of species from a collection of trees, each having leaf-set some subset of $X$. In the 1980s, characterizations, in terms of certain inference rules, were given for when a collection of 4-leaved trees, one for each 4-element subset of $X$, can all be simultaneously displayed by a single supertree with leaf-set $X$. Recently, it has become of interest to extend such results to phylogenetic networks. These are a generalization of phylogenetic trees which can be used to represent reticulate evolution (where species can come together to form a new species). It has been shown that a certain type of phylogenetic network, called a level-1 network, can essentially be constructed from 4-leaved trees. However, the problem of providing appropriate inference rules for such networks remains unresolved. Here we show that by considering 4-leaved networks, called quarnets, as opposed to 4-leaved trees, it is possible to provide such rules. In particular, we show that these rules can be used to characterize when a collection of quarnets, one for each 4-element subset of $X$, can all be simultaneously displayed by a level-1 network with leaf-set $X$. The rules are an intriguing mixture of tree inference rules, and an inference rule for building up a cyclic ordering of $X$ from orderings on subsets of $X$ of size 4. This opens up several new directions of research for inferring phylogenetic networks from smaller ones, which could yield new algorithms for solving the supernetwork problem in phylogenetics.
Submitted 17 November, 2017;
originally announced November 2017.
-
Tree-based networks: characterisations, metrics, and support trees
Authors:
Joan Carles Pons,
Charles Semple,
Mike Steel
Abstract:
Phylogenetic networks generalise phylogenetic trees and allow for the accurate representation of the evolutionary history of a set of present-day species whose past includes reticulate events such as hybridisation and lateral gene transfer. One way to obtain such a network is by starting with a (rooted) phylogenetic tree $T$, called a base tree, and adding arcs between arcs of $T$. The class of phylogenetic networks that can be obtained in this way is called tree-based networks and includes the prominent classes of tree-child and reticulation-visible networks. Initially defined for binary phylogenetic networks, tree-based networks naturally extend to arbitrary phylogenetic networks. In this paper, we generalise recent tree-based characterisations and associated proximity measures for binary phylogenetic networks to arbitrary phylogenetic networks. These characterisations are in terms of matchings in bipartite graphs, path partitions, and antichains. Some of the generalisations are straightforward to establish using the original approach, while others require a very different approach. Furthermore, for an arbitrary tree-based network $N$, we characterise the support trees of $N$, that is, the tree-based embeddings of $N$. We use this characterisation to give an explicit formula for the number of support trees of $N$ when $N$ is binary. This formula is written in terms of the components of a bipartite graph.
Submitted 4 September, 2018; v1 submitted 21 October, 2017;
originally announced October 2017.
-
A universal tree-based network with the minimum number of reticulations
Authors:
Magnus Bordewich,
Charles Semple
Abstract:
A tree-based network $\mathcal N$ on $X$ is universal if every rooted binary phylogenetic $X$-tree is a base tree for $\mathcal N$. Hayamizu and, independently, Zhang constructively showed that, for all positive integers $n$, there exists a universal tree-based network on $n$ leaves. For all $n$, Hayamizu's construction contains $Θ(n!)$ reticulations, while Zhang's construction contains $Θ(n^2)$ reticulations. A simple counting argument shows that a universal tree-based network has $Ω(n\log n)$ reticulations. With this in mind, Hayamizu as well as Steel posed the problem of determining whether or not such networks exist with $O(n\log n)$ reticulations. In this paper, we show that, for all $n$, there exists a universal tree-based network on $n$ leaves with $O(n\log n)$ reticulations.
Submitted 21 December, 2017; v1 submitted 25 July, 2017;
originally announced July 2017.
-
A splitter theorem for 3-connected 2-polymatroids
Authors:
James Oxley,
Charles Semple,
Geoff Whittle
Abstract:
Seymour's Splitter Theorem is a basic inductive tool for dealing with $3$-connected matroids. This paper proves a generalization of that theorem for the class of $2$-polymatroids. Such structures include matroids, and they model both sets of points and lines in a projective space and sets of edges in a graph. A series compression in such a structure is an analogue of contracting an edge of a graph that is in a series pair. A $2$-polymatroid $N$ is an s-minor of a $2$-polymatroid $M$ if $N$ can be obtained from $M$ by a sequence of contractions, series compressions, and dual-contractions, where the last are modified deletions. The main result proves that if $M$ and $N$ are $3$-connected $2$-polymatroids such that $N$ is an s-minor of $M$, then $M$ has a $3$-connected s-minor $M'$ that has an s-minor isomorphic to $N$ and has $|E(M)| - 1$ elements unless $M$ is a whirl or the cycle matroid of a wheel. In the exceptional case, such an $M'$ can be found with $|E(M)| - 2$ elements.
Submitted 24 June, 2017;
originally announced June 2017.
-
On the information content of discrete phylogenetic characters
Authors:
Magnus Bordewich,
Ina Maria Deutschmann,
Mareike Fischer,
Elisa Kasbohm,
Charles Semple,
Mike Steel
Abstract:
Phylogenetic inference aims to reconstruct the evolutionary relationships of different species based on genetic (or other) data. Discrete characters are a particular type of data, which contain information on how the species should be grouped together. However, it has long been known that some characters contain more information than others. For instance, a character that assigns the same state to each species groups all of them together and so provides no insight into the relationships of the species considered. At the other extreme, a character that assigns a different state to each species also conveys no phylogenetic signal. In this manuscript, we study a natural combinatorial measure of the information content of an individual character and analyse properties of characters that provide the maximum phylogenetic information, particularly, the number of states such a character uses and how the different states have to be distributed among the species or taxa of the phylogenetic tree.
Submitted 19 December, 2017; v1 submitted 14 March, 2017;
originally announced March 2017.
-
New Characterisations of Tree-Based Networks and Proximity Measures
Authors:
Andrew Francis,
Charles Semple,
Mike Steel
Abstract:
Phylogenetic networks are a type of directed acyclic graph that represent how a set $X$ of present-day species are descended from a common ancestor by processes of speciation and reticulate evolution. In the absence of reticulate evolution, such networks are simply phylogenetic (evolutionary) trees. Moreover, phylogenetic networks that are not trees can sometimes be represented as phylogenetic trees with additional directed edges placed between their edges. Such networks are called tree based, and the class of phylogenetic networks that are tree based has recently been characterised. In this paper, we establish a number of new characterisations of tree-based networks in terms of path partitions and antichains (in the spirit of Dilworth's theorem), as well as via matchings in a bipartite graph. We also show that a temporal network is tree based if and only if it satisfies an antichain-to-leaf condition. In the second part of the paper, we define three indices that measure the extent to which an arbitrary phylogenetic network deviates from being tree based. We show how these three indices can be described exactly and computed efficiently using classical results concerning maximum-sized matchings in bipartite graphs.
Submitted 10 August, 2017; v1 submitted 13 November, 2016;
originally announced November 2016.
-
Excluded minors are almost fragile
Authors:
Nick Brettell,
Ben Clark,
James Oxley,
Charles Semple,
Geoff Whittle
Abstract:
Let $M$ be an excluded minor for the class of $\mathbb{P}$-representable matroids for some partial field $\mathbb P$, and let $N$ be a $3$-connected strong $\mathbb{P}$-stabilizer that is non-binary. We prove that either $M$ is bounded relative to $N$, or, up to replacing $M$ by a $Δ$-$Y$-equivalent excluded minor, we can choose a pair of elements $\{a,b\}$ such that either $M\backslash \{a,b\}$ is $N$-fragile, or $M^* \backslash \{a,b\}$ is $N^*$-fragile.
Submitted 19 September, 2018; v1 submitted 31 March, 2016;
originally announced March 2016.
-
Reticulation-visible networks
Authors:
Magnus Bordewich,
Charles Semple
Abstract:
Let $X$ be a finite set, $\mathcal N$ be a reticulation-visible network on $X$, and $\mathcal T$ be a rooted binary phylogenetic tree. We show that there is a polynomial-time algorithm for deciding whether or not $\mathcal N$ displays $\mathcal T$. Furthermore, for all $|X|\ge 1$, we show that $\mathcal N$ has at most $8|X|-7$ vertices in total and at most $3|X|-3$ reticulation vertices, and that these upper bounds are sharp.
Submitted 24 June, 2017; v1 submitted 21 August, 2015;
originally announced August 2015.
-
On the Quirks of Maximum Parsimony and Likelihood on Phylogenetic Networks
Authors:
Christopher Bryant,
Mareike Fischer,
Simone Linz,
Charles Semple
Abstract:
Maximum parsimony is one of the most frequently-discussed tree reconstruction methods in phylogenetic estimation. However, in recent years it has become more and more apparent that phylogenetic trees are often not sufficient to describe evolution accurately. For instance, processes like hybridization or lateral gene transfer that are commonplace in many groups of organisms and result in mosaic patterns of relationships cannot be represented by a single phylogenetic tree. This is why phylogenetic networks, which can display such events, are becoming of more and more interest in phylogenetic research. It is therefore necessary to extend concepts like maximum parsimony from phylogenetic trees to networks. Several suggestions for possible extensions can be found in recent literature, for instance the softwired and the hardwired parsimony concepts. In this paper, we analyze the so-called big parsimony problem under these two concepts, i.e. we investigate maximum parsimonious networks and analyze their properties. In particular, we show that finding a softwired maximum parsimony network is possible in polynomial time. We also show that the set of maximum parsimony networks for the hardwired definition always contains at least one phylogenetic tree. Lastly, we investigate some parallels of parsimony to different likelihood concepts on phylogenetic networks.
Submitted 24 October, 2016; v1 submitted 26 May, 2015;
originally announced May 2015.
-
Representing Partitions on Trees
Authors:
Katharina T. Huber,
Vincent Moulton,
Charles Semple,
Taoyang Wu
Abstract:
In evolutionary biology, biologists often face the problem of constructing a phylogenetic tree on a set $X$ of species from a multiset $Π$ of partitions corresponding to various attributes of these species. One approach that is used to solve this problem is to try instead to associate a tree (or even a network) to the multiset $Σ_Π$ consisting of all those bipartitions $\{A,X-A\}$ with $A$ a part of some partition in $Π$. The rationale behind this approach is that a phylogenetic tree with leaf set $X$ can be uniquely represented by the set of bipartitions of $X$ induced by its edges. Motivated by these considerations, given a multiset $Σ$ of bipartitions corresponding to a phylogenetic tree on $X$, in this paper we introduce and study the set $P(Σ)$ consisting of those multisets of partitions $Π$ of $X$ with $Σ_Π=Σ$. More specifically, we characterize when $P(Σ)$ is non-empty, and also identify some partitions in $P(Σ)$ that are of maximum and minimum size. We also show that it is NP-complete to decide when $P(Σ)$ is non-empty in case $Σ$ is an arbitrary multiset of bipartitions of $X$. Ultimately, we hope that by gaining a better understanding of the mapping that takes an arbitrary partition system $Π$ to the multiset $Σ_Π$, we will obtain new insights into the use of median networks and, more generally, split-networks to visualize sets of partitions.
Submitted 9 May, 2014;
originally announced May 2014.
-
Locating a tree in a phylogenetic network
Authors:
Leo van Iersel,
Charles Semple,
Mike Steel
Abstract:
Phylogenetic trees and networks are leaf-labelled graphs that are used to describe evolutionary histories of species. The Tree Containment problem asks whether a given phylogenetic tree is embedded in a given phylogenetic network. Given a phylogenetic network and a cluster of species, the Cluster Containment problem asks whether the given cluster is a cluster of some phylogenetic tree embedded in the network. Both problems are known to be NP-complete in general. In this article, we consider the restriction of these problems to several well-studied classes of phylogenetic networks. We show that Tree Containment is polynomial-time solvable for normal networks, for binary tree-child networks, and for level-$k$ networks. On the other hand, we show that, even for tree-sibling, time-consistent, regular networks, both Tree Containment and Cluster Containment remain NP-complete.
Submitted 15 June, 2010;
originally announced June 2010.
-
Quantifying the Extent of Lateral Gene Transfer Required to Avert a `Genome of Eden'
Authors:
Leo van Iersel,
Charles Semple,
Mike Steel
Abstract:
The complex pattern of presence and absence of many genes across different species provides tantalising clues as to how genes evolved through the processes of gene genesis, gene loss and lateral gene transfer (LGT). The extent of LGT, particularly in prokaryotes, and its implications for creating a `network of life' rather than a `tree of life' is controversial. In this paper, we formally model the problem of quantifying LGT, and provide exact mathematical bounds, and new computational results. In particular, we investigate the computational complexity of quantifying the extent of LGT under the simple models of gene genesis, loss and transfer on which a recent heuristic analysis of biological data relied. Our approach takes advantage of a relationship between LGT optimization and graph-theoretical concepts such as tree width and network flow.
Submitted 5 November, 2009;
originally announced November 2009.