-
The k-Robinson-Foulds Dissimilarity Measures for Comparison of Labeled Trees
Authors:
Elahe Khayatian,
Gabriel Valiente,
Louxin Zhang
Abstract:
Understanding the mutational history of tumor cells is a critical endeavor in unraveling the mechanisms underlying cancer. Since the modeling of tumor cell evolution employs labeled trees, researchers are motivated to develop different methods to assess and compare mutation trees and other labeled trees. While the Robinson-Foulds distance is a widely utilized metric for comparing phylogenetic trees, its application to labeled trees reveals certain limitations. This paper introduces the $k$-Robinson-Foulds dissimilarity measures, tailored to address the challenges of labeled tree comparison. The Robinson-Foulds distance is succinctly expressed as $n$-RF in the space of labeled trees with $n$ nodes. Like the Robinson-Foulds distance, the $k$-Robinson-Foulds dissimilarity is a pseudometric for multiset-labeled trees and becomes a metric in the space of 1-labeled trees. By setting $k$ to a small value, the $k$-Robinson-Foulds dissimilarity can capture analogous local regions in two labeled trees of different sizes or with different labels.
Submitted 16 November, 2023; v1 submitted 14 January, 2023;
originally announced January 2023.
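For orientation, the classical Robinson-Foulds distance that the abstract takes as its starting point can be sketched as follows. This is only an illustration of the standard cluster-based distance on rooted, leaf-labeled phylogenetic trees, not of the $k$-Robinson-Foulds measures introduced in the paper. The nested-tuple tree encoding and the names `clusters` and `rf_distance` are assumptions made for this example.

```python
def clusters(tree):
    """Collect the cluster (set of leaf labels) below every internal node.

    Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        leaf_set = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaf_set)
        return leaf_set

    leaves_below(tree)
    return found


def rf_distance(t1, t2):
    """Number of clusters that appear in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


# Two rooted trees on the leaves a, b, c, d that disagree on one cherry.
t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(rf_distance(t1, t2))
```

Here the trees share the clusters {a, b, c} and {a, b, c, d} and differ in {a, b} versus {a, c}, so the symmetric difference has two elements.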
-
AligNet: Alignment of Protein-Protein Interaction Networks
Authors:
Ricardo Alberich,
Adrià Alcala,
Mercè Llabrés,
Francesc Rosselló,
Gabriel Valiente
Abstract:
One of the most difficult problems in systems biology is to discover protein-protein interactions as well as their associated functions. The analysis and alignment of protein-protein interaction networks (PPIN), which are the standard model to describe protein-protein interactions, have become a key ingredient to obtain functional orthologs as well as evolutionarily conserved pathways and protein complexes. Several methods have been proposed to solve the PPIN alignment problem, aiming to match conserved subnetworks or functionally related proteins. However, the right balance between considering network topology and biological information is one of the most difficult and key points in any PPIN alignment algorithm and, unfortunately, remains unsolved. Therefore, in this work, we propose AligNet, a new method and software tool for the pairwise global alignment of PPIN that produces biologically meaningful alignments and more efficient computations than state-of-the-art methods and tools, by achieving a good balance between structural matching and protein function conservation as well as reasonable running times.
Submitted 19 February, 2019;
originally announced February 2019.
-
A balance index for phylogenetic trees based on rooted quartets
Authors:
Tomás M. Coronado,
Arnau Mir,
Francesc Rosselló,
Gabriel Valiente
Abstract:
We define a new balance index for rooted phylogenetic trees based on the symmetry of the evolutive history of every set of 4 leaves. This index makes sense for multifurcating trees and it can be computed in time linear in the number of leaves. We determine its maximum and minimum values for arbitrary and bifurcating trees, and we provide exact formulas for its expected value and variance on bifurcating trees under Ford's $α$-model and Aldous' $β$-model and on arbitrary trees under the $α$-$γ$-model.
Submitted 22 March, 2019; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Comparison of Galled Trees
Authors:
Gabriel Cardona,
Merce Llabres,
Francesc Rossello,
Gabriel Valiente
Abstract:
Galled trees, directed acyclic graphs that model evolutionary histories with isolated hybridization events, have become very popular due to both their biological significance and the existence of polynomial time algorithms for their reconstruction. In this paper we establish to what extent several distance measures for the comparison of evolutionary networks are metrics for galled trees, and hence when they can be safely used to evaluate galled tree reconstruction methods.
Submitted 5 June, 2009;
originally announced June 2009.
-
All that Glisters is not Galled
Authors:
Francesc Rossello,
Gabriel Valiente
Abstract:
Galled trees, evolutionary networks with isolated reticulation cycles, have appeared under several slightly different definitions in the literature. In this paper we establish the actual relationships between the four main alternative definitions: namely, the original galled trees, level-1 networks, nested networks with nesting depth 1, and evolutionary networks with arc-disjoint reticulation cycles.
Submitted 16 April, 2009;
originally announced April 2009.
-
The comparison of tree-sibling time consistent phylogenetic networks is graph isomorphism-complete
Authors:
Gabriel Cardona,
Merce Llabres,
Francesc Rossello,
Gabriel Valiente
Abstract:
In a previous work, we gave a metric on the class of semibinary tree-sibling time consistent phylogenetic networks that is computable in polynomial time; in particular, the problem of deciding if two networks of this kind are isomorphic is in P. In this paper, we show that if we remove the semibinarity condition above, then the problem becomes much harder. More precisely, we prove that the isomorphism problem for generic tree-sibling time consistent phylogenetic networks is polynomially equivalent to the graph isomorphism problem. Since the latter is believed to be neither in P nor NP-complete, the chances are that it is impossible to define a metric on the class of all tree-sibling time consistent phylogenetic networks that can be computed in polynomial time.
Submitted 26 February, 2009;
originally announced February 2009.
-
On Nakhleh's latest metric for phylogenetic networks
Authors:
Gabriel Cardona,
Merce Llabres,
Francesc Rossello,
Gabriel Valiente
Abstract:
We prove that Nakhleh's latest dissimilarity measure for phylogenetic networks is a metric on the classes of tree-child phylogenetic networks, of semi-binary time consistent tree-sibling phylogenetic networks, and of multi-labeled phylogenetic trees. We also prove that it distinguishes phylogenetic networks with different reduced versions. In this way, it becomes the dissimilarity measure for phylogenetic networks with the strongest separation power available so far.
Submitted 31 August, 2008;
originally announced September 2008.
-
Path lengths in tree-child time consistent hybridization networks
Authors:
Gabriel Cardona,
Merce Llabres,
Francesc Rossello,
Gabriel Valiente
Abstract:
Hybridization networks are representations of evolutionary histories that allow for the inclusion of reticulate events like recombinations, hybridizations, or lateral gene transfers. The recent growth in the number of hybridization network reconstruction algorithms has led to an increasing interest in the definition of metrics for their comparison that can be used to assess the accuracy or robustness of these methods. In this paper we establish some basic results that make it possible to generalize to tree-child time consistent (TCTC) hybridization networks some of the oldest known metrics for phylogenetic trees: those based on the comparison of the vectors of path lengths between leaves. More specifically, we associate to each hybridization network a suitably defined vector of `splitted' path lengths between its leaves, and we prove that if two TCTC hybridization networks have the same such vectors, then they must be isomorphic. Thus, comparing these vectors by means of a metric for real-valued vectors defines a metric for TCTC hybridization networks. We also consider the case of fully resolved hybridization networks, where we prove that simpler, `non-splitted' vectors can be used.
Submitted 1 July, 2008;
originally announced July 2008.
-
Nodal distances for rooted phylogenetic trees
Authors:
Gabriel Cardona,
Merce Llabres,
Francesc Rossello,
Gabriel Valiente
Abstract:
Dissimilarity measures for (possibly weighted) phylogenetic trees based on the comparison of their vectors of path lengths between pairs of taxa have been present in the systematics literature since the early seventies. But, as far as rooted phylogenetic trees go, these vectors can only separate non-weighted binary trees, and therefore these dissimilarity measures are metrics only on this class. In this paper we overcome this problem by splitting, in a suitable way, each path length between two taxa into two lengths. We prove that the resulting splitted path lengths matrices single out arbitrary rooted phylogenetic trees with nested taxa and arcs weighted in the set of positive real numbers. This allows the definition of metrics on this general class by comparing these matrices by means of metrics in spaces of real-valued $n\times n$ matrices. We conclude this paper by establishing some basic facts about the metrics for non-weighted phylogenetic trees defined in this way using $L^p$ metrics on these spaces of matrices.
Submitted 12 June, 2008;
originally announced June 2008.
-
A Distance Metric for Tree-Sibling Time Consistent Phylogenetic Networks
Authors:
Gabriel Cardona,
Merce Llabres,
Francesc Rossello,
Gabriel Valiente
Abstract:
The presence of reticulate evolutionary events in phylogenies turns phylogenetic trees into phylogenetic networks. These events imply in particular that there may exist multiple evolutionary paths from a non-extant species to an extant one, and this multiplicity makes the comparison of phylogenetic networks much more difficult than the comparison of phylogenetic trees. In fact, all attempts to define a sound distance measure on the class of all phylogenetic networks have failed so far. Thus, the only practical solutions have been either the use of rough estimates of similarity (based on comparison of the trees embedded in the networks), or narrowing the class of phylogenetic networks to a certain class where such a distance is known and can be efficiently computed. The first approach has the problem that one may identify two networks as equivalent when they are not; the second one has the drawback that there may not exist algorithms to reconstruct such networks from biological sequences.
We present in this paper a distance measure on the class of tree-sibling time consistent phylogenetic networks, which generalize tree-child time consistent phylogenetic networks, and thus also galled trees. The practical interest of this distance measure is twofold: it can be computed in polynomial time by means of simple algorithms, and there also exist polynomial-time algorithms for reconstructing networks of this class from DNA sequence data.
The Perl package Bio::PhyloNetwork, included in the BioPerl bundle, implements many algorithms on phylogenetic networks, including the computation of the distance presented in this paper.
Submitted 19 March, 2008;
originally announced March 2008.
-
Two metrics for general phylogenetic networks
Authors:
Gabriel Cardona,
Merce Llabres,
Francesc Rossello,
Gabriel Valiente
Abstract:
We prove that Nakhleh's latest dissimilarity measure for phylogenetic networks separates distinguishable phylogenetic networks, and that a slight modification of it provides a true distance on the class of all phylogenetic networks.
Submitted 15 January, 2008;
originally announced January 2008.
-
A Perl Package and an Alignment Tool for Phylogenetic Networks
Authors:
Gabriel Cardona,
Francesc Rossello,
Gabriel Valiente
Abstract:
Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of evolutionary events acting at the population level, like recombination between genes, hybridization between lineages, and lateral gene transfer. While most phylogenetics tools implement a wide range of algorithms on phylogenetic trees, there exist only a few applications to work with phylogenetic networks, and there are no open-source libraries either.
In order to improve this situation, we have developed a Perl package that relies on the BioPerl bundle and implements many algorithms on phylogenetic networks. We have also developed a Java applet that makes use of the aforementioned Perl package and allows the user to make simple experiments with phylogenetic networks without having to develop a program or Perl script by herself.
The Perl package has been accepted as part of the BioPerl bundle. It can be downloaded from http://dmi.uib.es/~gcardona/BioInfo/Bio-PhyloNetwork.tgz. The web-based application is available at http://dmi.uib.es/~gcardona/BioInfo/. The Perl package includes full documentation of all its features.
Submitted 22 November, 2007;
originally announced November 2007.
-
Comparison of Tree-Child Phylogenetic Networks
Authors:
Gabriel Cardona,
Francesc Rossello,
Gabriel Valiente
Abstract:
Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of non-treelike evolutionary events, like recombination, hybridization, or lateral gene transfer. In this paper, we present and study a new class of phylogenetic networks, called tree-child phylogenetic networks, where every non-extant species has some descendant through mutation. We provide an injective representation of these networks as multisets of vectors of natural numbers, their path multiplicity vectors, and we use this representation to define a distance on this class and to give an alignment method for pairs of these networks. To the best of our knowledge, they are respectively the first true distance and the first alignment method defined on a meaningful class of phylogenetic networks strictly extending the class of phylogenetic trees. We provide simple, polynomial algorithms for reconstructing a tree-child phylogenetic network from its path multiplicity vectors, for computing the distance between two tree-child phylogenetic networks, and for aligning a pair of tree-child phylogenetic networks. They have been implemented as a Perl package and a Java applet, which are available at http://bioinfo.uib.es/~recerca/phylonetworks/mudistance
Submitted 27 August, 2007;
originally announced August 2007.
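The representation by path multiplicity vectors described in this abstract can be illustrated with a minimal sketch. It assumes that entry $i$ of a node's vector counts the directed paths from that node to leaf $i$, and that the distance is the size of the symmetric difference of the two multisets of vectors; both readings are assumptions of this example rather than statements from the abstract. The dictionary encoding of a network and the names `mu_vectors` and `mu_distance` are hypothetical and are not the interface of the Perl package.

```python
from collections import Counter


def mu_vectors(children, leaves):
    """Multiset of path multiplicity vectors of a rooted DAG.

    Assumed encoding: `children` maps every node to the list of its children
    (leaves map to an empty list); `leaves` fixes the order of the coordinates.
    """
    index = {leaf: i for i, leaf in enumerate(leaves)}
    memo = {}

    def mu(node):
        if node in memo:
            return memo[node]
        if node in index:
            # A leaf reaches only itself, by the trivial path.
            vec = tuple(1 if i == index[node] else 0 for i in range(len(leaves)))
        else:
            # Paths from an internal node are the paths from its children, summed.
            vec = tuple(sum(col) for col in zip(*(mu(c) for c in children[node])))
        memo[node] = vec
        return vec

    return Counter(mu(node) for node in children)


def mu_distance(net1, net2, leaves):
    """Size of the symmetric difference of the two multisets of vectors."""
    m1, m2 = mu_vectors(net1, leaves), mu_vectors(net2, leaves)
    return sum(((m1 - m2) + (m2 - m1)).values())


leaves = ["a", "b", "c"]
# A network with one hybrid node h (parents u and v) whose only child is leaf b.
network = {"r": ["u", "v"], "u": ["a", "h"], "v": ["h", "c"], "h": ["b"],
           "a": [], "b": [], "c": []}
# A plain tree on the same leaves.
tree = {"r": ["u", "c"], "u": ["a", "b"], "a": [], "b": [], "c": []}
print(mu_distance(network, tree, leaves))
```

In the example, the root of the network has vector (1, 2, 1) because leaf b can be reached through either parent of the hybrid node, whereas the root of the tree has vector (1, 1, 1).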
-
Tripartitions do not always discriminate phylogenetic networks
Authors:
Gabriel Cardona,
Francesc Rossello,
Gabriel Valiente
Abstract:
Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of non-treelike evolutionary events, like recombination, hybridization, or lateral gene transfer. In a recent series of papers devoted to the study of reconstructibility of phylogenetic networks, Moret, Nakhleh, Warnow and collaborators introduced the so-called tripartition metric for phylogenetic networks. In this paper we show that, in fact, this tripartition metric does not satisfy the separation axiom of distances (zero distance means isomorphism, or, in a more relaxed version, zero distance means indistinguishability in some specific sense) in any of the subclasses of phylogenetic networks where it is claimed to do so. We also present a subclass of phylogenetic networks whose members can be singled out by means of their sets of tripartitions (or even clusters), and hence where the latter can be used to define a meaningful metric.
Submitted 16 July, 2007;
originally announced July 2007.
-
The transposition distance for phylogenetic trees
Authors:
Francesc Rossello,
Gabriel Valiente
Abstract:
The search for similarity and dissimilarity measures on phylogenetic trees has been motivated by the computation of consensus trees, the search by similarity in phylogenetic databases, and the assessment of clustering results in bioinformatics. The transposition distance for fully resolved phylogenetic trees is a recent addition to the extensive collection of available metrics for comparing phylogenetic trees. In this paper, we generalize the transposition distance from fully resolved to arbitrary phylogenetic trees, through a construction that involves an embedding of the set of phylogenetic trees with a fixed number of labeled leaves into a symmetric group and a generalization of Reidys-Stadler's involution metric for RNA contact structures. We also present simple linear-time algorithms for computing it.
Submitted 18 April, 2006;
originally announced April 2006.
-
On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa
Authors:
Merce Llabres,
Jairo Rocha,
Francesc Rossello,
Gabriel Valiente
Abstract:
Compatibility of phylogenetic trees is the most important concept underlying widely-used methods for assessing the agreement of different phylogenetic trees with overlapping taxa and combining them into common supertrees to reveal the tree of life. The notion of ancestral compatibility of phylogenetic trees with nested taxa was introduced by Semple et al. in 2004. In this paper we analyze in detail the meaning of this compatibility from the points of view of the local structure of the trees, of the existence of embeddings into a common supertree, and of the joint properties of their cluster representations. Our analysis leads to a very simple polynomial-time algorithm for testing this compatibility, which we have implemented and which is freely available for download from the BioPerl collection of Perl modules for computational biology.
Submitted 31 May, 2005;
originally announced May 2005.