-
Which Phylogenetic Networks are Level-k Networks with Additional Arcs? Structure and Algorithms
Authors:
Takatora Suzuki,
Momoko Hayamizu
Abstract:
Reticulate evolution gives rise to complex phylogenetic networks, making their interpretation challenging. A typical approach is to extract trees within such networks. Since Francis and Steel's seminal paper, "Which Phylogenetic Networks are Merely Trees with Additional Arcs?" (2015), tree-based phylogenetic networks and their support trees (spanning trees with the same root and leaf set as a given network) have been extensively studied. However, not all phylogenetic networks are tree-based, and for the study of reticulate evolution, it is often more biologically relevant to identify support networks rather than trees. This study generalizes Hayamizu's structure theorem for rooted binary phylogenetic networks, which yielded optimal algorithms for various computational problems on support trees, to extend the theoretical framework for support trees to support networks. This allows us to obtain a direct-product characterization of each of three sets: all, minimal, and minimum support networks, for a given network. Each characterization yields optimal algorithms for counting and generating the support networks of each type. Applications include a linear-time algorithm for finding a support network with the fewest reticulations (i.e., the minimum tier). We also provide exact and heuristic algorithms for finding a support network with the minimum level, both running in exponential time but practical across a reasonably wide range of reticulation numbers.
Submitted 17 May, 2025;
originally announced May 2025.
-
Rooted Almost-binary Phylogenetic Networks for which the Maximum Covering Subtree Problem is Solvable in Linear Time
Authors:
Takatora Suzuki,
Han Guo,
Momoko Hayamizu
Abstract:
Phylogenetic networks are a flexible model of evolution that can represent reticulate evolution and handle complex data. Tree-based networks, which are phylogenetic networks that have a spanning tree with the same root and leaf set as the network itself, have been well studied. However, not all networks are tree-based. Francis-Semple-Steel (2018) thus introduced several indices to measure the deviation of a rooted binary phylogenetic network $N$ from being tree-based, such as the minimum number $\delta^\ast(N)$ of additional leaves needed to make $N$ tree-based, and the minimum difference $\eta^\ast(N)$ between the number of vertices of $N$ and the number of vertices of a subtree of $N$ that shares the root and leaf set with $N$. Hayamizu (2021) established a canonical decomposition of almost-binary phylogenetic networks $N$, called the maximal zig-zag trail decomposition, which has many implications, including a linear-time algorithm for computing $\delta^\ast(N)$. The Maximum Covering Subtree Problem (MCSP) is the problem of computing $\eta^\ast(N)$, and Davidov et al. (2022) showed that this can be solved in polynomial time (in cubic time when $N$ is binary) by an algorithm for the minimum cost flow problem. In this paper, under the assumption that $N$ is almost-binary (i.e. each internal vertex has in-degree and out-degree at most two), we show that $\delta^\ast(N)\leq \eta^\ast(N)$ holds, which is tight, and give a characterisation of the phylogenetic networks $N$ that satisfy $\delta^\ast(N)=\eta^\ast(N)$. Our approach uses the canonical decomposition of $N$ and focuses on how the maximal W-fences (i.e. the forbidden subgraphs of tree-based networks) are connected to maximal M-fences in the network $N$. Our results introduce a new class of phylogenetic networks for which MCSP can be solved in linear time, which can be seen as a generalisation of tree-based networks.
Submitted 24 May, 2023;
originally announced May 2023.
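The almost-binary condition quoted in the abstract above (each internal vertex has in-degree and out-degree at most two) can be checked directly from an arc list. The following is a minimal sketch, not taken from the paper: the function name `is_almost_binary`, the arc-list input format, and the reading of "internal vertex" as a vertex with at least one incoming and one outgoing arc are all assumptions made for illustration.

```python
from collections import Counter


def is_almost_binary(arcs):
    """Check the degree condition on a directed graph given as (tail, head) arcs.

    Assumption: a vertex is "internal" when it has at least one incoming
    and at least one outgoing arc (i.e. it is neither the root nor a leaf).
    """
    indeg = Counter(head for _, head in arcs)
    outdeg = Counter(tail for tail, _ in arcs)
    vertices = set(indeg) | set(outdeg)
    internal = [x for x in vertices if indeg[x] > 0 and outdeg[x] > 0]
    # Every internal vertex must have in-degree <= 2 and out-degree <= 2.
    return all(indeg[x] <= 2 and outdeg[x] <= 2 for x in internal)


# A small network with one reticulation vertex "h" (hypothetical example data).
network = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
           ("a", "x"), ("b", "y"), ("h", "z")]
print(is_almost_binary(network))
```

The check runs in time linear in the number of arcs. It tests only the degree condition and does not verify that the input is a valid rooted phylogenetic network.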
-
Orienting undirected phylogenetic networks to tree-child network
Authors:
Shunsuke Maeda,
Yusuke Kaneko,
Hideaki Muramatsu,
Yukihiro Murakami,
Momoko Hayamizu
Abstract:
Phylogenetic networks are used to represent the evolutionary history of species. They are versatile when compared to traditional phylogenetic trees, as they capture more complex evolutionary events such as hybridization and horizontal gene transfer. Distance-based methods such as the Neighbor-Net algorithm are widely used to compute phylogenetic networks from data. However, the output is necessarily an undirected graph, posing a great challenge to deduce the direction of genetic flow in order to infer the true evolutionary history. Recently, Huber et al. investigated two different computational problems relevant to orienting undirected phylogenetic networks into directed ones. In this paper, we consider the problem of orienting an undirected binary network into a tree-child network. We give some necessary conditions for determining the tree-child orientability, such as a tight upper bound on the size of tree-child orientable graphs, as well as many interesting examples. In addition, we introduce new families of undirected phylogenetic networks, the jellyfish graphs and ladder graphs, that are orientable but not tree-child orientable. We also prove that any ladder graph can be made tree-child orientable by adding extra leaves, and describe a simple algorithm for orienting a ladder graph to a tree-child network with the minimum number of extra leaves. We pose many open problems as well.
Submitted 17 May, 2023;
originally announced May 2023.
-
Recognizing and realizing cactus metrics
Authors:
Momoko Hayamizu,
Katharina T. Huber,
Vincent Moulton,
Yukihiro Murakami
Abstract:
The problem of realizing finite metric spaces in terms of weighted graphs has many applications. For example, the mathematical and computational properties of metrics that can be realized by trees have been well-studied and such research has laid the foundation of the reconstruction of phylogenetic trees from evolutionary distances. However, as trees may be too restrictive to accurately represent real-world data or phenomena, it is important to understand the relationship between more general graphs and distances. In this paper, we introduce a new type of metric called a cactus metric, that is, a metric that can be realized by a cactus graph. We show that, just as with tree metrics, a cactus metric has a unique optimal realization. In addition, we describe an algorithm that can recognize whether or not a metric is a cactus metric and, if so, compute its optimal realization in $O(n^3)$ time, where $n$ is the number of points in the space.
Submitted 7 February, 2020; v1 submitted 5 August, 2019;
originally announced August 2019.
-
Ranking top-k trees in tree-based phylogenetic networks
Authors:
Momoko Hayamizu,
Kazuhisa Makino
Abstract:
'Tree-based' phylogenetic networks, proposed by Francis and Steel, have attracted much attention from theoretical biologists in the last few years. At the heart of the definition of tree-based phylogenetic networks is the notion of 'support trees', about which there are numerous algorithmic problems that are important for evolutionary data analysis. Recently, Hayamizu (arXiv:1811.05849 [math.CO]) proved a structure theorem for tree-based phylogenetic networks and obtained linear-time and linear-delay algorithms for many basic problems on support trees, such as counting, optimisation, and enumeration. In the present paper, we consider the following fundamental problem in statistical data analysis: given a tree-based phylogenetic network $N$ whose arcs are associated with probabilities, rank the top-$k$ support trees of $N$ by their likelihood values. We provide a linear-delay (and hence optimal) algorithm for the problem and thus reveal the interesting property of tree-based phylogenetic networks that ranking top-$k$ support trees is as computationally easy as picking $k$ arbitrary support trees.
Submitted 28 April, 2019;
originally announced April 2019.
-
A structure theorem for rooted binary phylogenetic networks and its implications for tree-based networks
Authors:
Momoko Hayamizu
Abstract:
Attempting to recognize a tree inside a phylogenetic network is a fundamental undertaking in evolutionary analysis. In the last few years, therefore, tree-based phylogenetic networks, which are defined by a spanning tree called a subdivision tree, have attracted the attention of theoretical biologists. However, the application of such networks is still not easy, due to many problems whose time complexities are not clearly understood. In this paper, we provide a general framework for solving those various old or new problems from a coherent perspective, rather than analyzing the complexity of each individual problem or developing an algorithm one by one. More precisely, we establish a structure theorem that gives a way to canonically decompose any rooted binary phylogenetic network N into maximal zig-zag trails that are uniquely determined, and use it to characterize the set of subdivision trees of N in the form of a direct product, in a way reminiscent of the structure theorem for finitely generated Abelian groups. From the main results, we derive a series of linear-time and linear-delay algorithms for the following problems: given a rooted binary phylogenetic network N, 1) determine whether or not N has a subdivision tree and find one if any exists; 2) measure the deviation of N from being tree-based; 3) compute the number of subdivision trees of N; 4) list all subdivision trees of N; and 5) find a subdivision tree to maximize or minimize a prescribed objective function. All algorithms proposed here are optimal in terms of time complexity. Our results not only imply and unify various known results, but also answer many open questions and enable novel applications, such as the estimation of a maximum likelihood tree underlying a tree-based network. The results and algorithms in this paper still hold true for a special class of rooted non-binary phylogenetic networks.
Submitted 27 September, 2020; v1 submitted 14 November, 2018;
originally announced November 2018.
-
On the existence of infinitely many universal tree-based networks
Authors:
Momoko Hayamizu
Abstract:
A tree-based network on a set $X$ of $n$ leaves is said to be universal if any rooted binary phylogenetic tree on $X$ can be its base tree. Francis and Steel showed that there is a universal tree-based network on $X$ in the case of $n=3$, and asked whether such a network exists in general. We settle this problem by proving that there are infinitely many universal tree-based networks for any $n>1$.
Submitted 21 March, 2016; v1 submitted 8 December, 2015;
originally announced December 2015.
-
On minimum spanning tree-like metric spaces
Authors:
Momoko Hayamizu,
Kenji Fukumizu
Abstract:
We attempt to shed new light on the notion of 'tree-like' metric spaces by focusing on an approach that does not use the four-point condition. Our key question is: given a metric space $M$ on $n$ points, when does a fully labelled positive-weighted tree $T$ exist on the same $n$ vertices that precisely realises $M$ using its shortest path metric? We prove that if a spanning tree representation, $T$, of $M$ exists, then it is isomorphic to the unique minimum spanning tree in the weighted complete graph associated with $M$, and we introduce a fourth-point condition that is necessary and sufficient to ensure the existence of $T$ whenever each distance in $M$ is unique. In other words, a finite median graph, in which each geodesic distance is distinct, is simply a tree. Provided that the tie-breaking assumption holds, the fourth-point condition serves as a criterion for measuring the goodness-of-fit of the minimum spanning tree to $M$, i.e., the spanning tree-likeness of $M$. It is also possible to evaluate the spanning path-likeness of $M$. These quantities can be measured in $O(n^4)$ and $O(n^3)$ time, respectively.
Submitted 5 December, 2015; v1 submitted 22 May, 2015;
originally announced May 2015.
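The abstract above states that if a spanning tree on the same $n$ points realises $M$, it must be the minimum spanning tree of the complete graph weighted by $M$. One consequence can be sketched in code: build the minimum spanning tree and compare its path distances with $M$. This is only an illustration under assumptions, not the paper's fourth-point condition: the function names `mst_edges`, `tree_distances`, and `mst_realises` are hypothetical, the input is assumed to be a symmetric matrix with zero diagonal and pairwise distinct off-diagonal distances, and Prim's algorithm is a choice made here.

```python
from collections import deque


def mst_edges(d):
    """Prim's algorithm on the complete graph weighted by distance matrix d."""
    n = len(d)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, d[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                parent[v] = u
    return edges


def tree_distances(n, edges):
    """All-pairs path lengths in a weighted tree, by one traversal per source."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [[0.0] * n for _ in range(n)]
    for s in range(n):
        seen = [False] * n
        seen[s] = True
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, w in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    dist[s][v] = dist[s][u] + w
                    queue.append(v)
    return dist


def mst_realises(d, tol=1e-9):
    """True if the path metric of the minimum spanning tree equals d."""
    n = len(d)
    t = tree_distances(n, mst_edges(d))
    return all(abs(t[i][j] - d[i][j]) <= tol for i in range(n) for j in range(n))


path_metric = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]   # realised by the path 0-1-2
triangle = [[0, 2, 3], [2, 0, 4], [3, 4, 0]]      # not realised by any spanning tree
print(mst_realises(path_metric), mst_realises(triangle))
```

This brute-force comparison takes $O(n^2)$ time after the tree is built. It answers only the yes/no realisability question and does not compute the goodness-of-fit measures mentioned in the abstract.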