Revealing missing parts of the interactome
Authors:
Ryan W. Solava,
Tijana Milenković
Abstract:
Protein interaction networks (PINs) are often used to "learn" new biological function from their topology. Since current PINs are noisy, their computational de-noising via link prediction (LP) could improve the learning accuracy. LP uses the existing PIN topology to predict missing and spurious links. Many existing LP methods rely on shared immediate neighborhoods of the nodes to be linked. As such, they have limitations. Thus, to study comprehensively which topological properties of nodes in PINs dictate whether the nodes should be linked, we introduce novel, sensitive LP measures that overcome the limitations of the existing methods.
We systematically evaluate the new and existing LP measures by introducing "synthetic" noise to PINs and measuring how well the different measures reconstruct the original PINs. Our main findings are: 1) LP measures that favor nodes that are both "topologically similar" and have large shared extended neighborhoods are superior; 2) using more network topology often, though not always, improves LP accuracy; and 3) our new LP measures are superior to the existing measures. After evaluating the different methods, we use them to de-noise PINs. Importantly, de-noising improves the biological correctness of the PINs, as measured by "enrichment" of the predicted interactions in Gene Ontology terms. Furthermore, we validate a statistically significant portion of the predicted interactions in independent, external PIN data sources.
Software executables are freely available upon request.
Submitted 12 July, 2013;
originally announced July 2013.
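The evaluation protocol described in the abstract above (add synthetic noise, then measure how well a measure reconstructs the original network) can be illustrated with a minimal Python sketch. This is not the authors' method or software: it uses a plain shared-immediate-neighbors count as the LP measure, which is the kind of baseline the abstract says has limitations, and it models noise only as randomly hidden edges (the abstract also mentions spurious links). All function names, the `fraction` and `seed` parameters, and the top-k precision score are assumptions made for illustration.

```python
import random
from itertools import combinations


def build_adjacency(nodes, edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors_score(adj, u, v):
    """Baseline LP measure (assumed here): number of shared immediate neighbors."""
    return len(adj[u] & adj[v])


def evaluate_by_edge_removal(nodes, edges, fraction=0.2, seed=0):
    """Hide a random fraction of edges, rank the non-edges of the noisy
    network by the baseline score, and return the precision of the top-k
    predictions, where k is the number of hidden edges.
    Assumes unique, undirected edges without self-loops."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    k = max(1, int(fraction * len(edges)))
    hidden = set(rng.sample(edges, k))
    kept = [e for e in edges if e not in hidden]
    adj = build_adjacency(nodes, kept)
    # Candidate links: every node pair that is not linked in the noisy network.
    candidates = [(u, v) for u, v in combinations(sorted(nodes), 2)
                  if v not in adj[u]]
    ranked = sorted(candidates,
                    key=lambda p: common_neighbors_score(adj, *p),
                    reverse=True)
    top = ranked[:k]
    return sum(1 for p in top if p in hidden) / k
```

Swapping `common_neighbors_score` for another scoring function would let different measures be compared under the same noise, which is the comparison the abstract describes at a high level.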
Identifying edge clusters in networks via edge graphlet degree vectors (edge-GDVs) and edge-GDV-similarities
Authors:
Ryan W. Solava,
Ryan P. Michaels,
Tijana Milenković
Abstract:
Inference of new biological knowledge, e.g., prediction of protein function, from protein-protein interaction (PPI) networks has received attention in the post-genomic era. A popular strategy has been to cluster the network into functionally coherent groups of proteins and predict protein function from the clusters. Traditionally, network research has focused on clustering of nodes. However, why favor nodes over edges, when clustering of edges may be preferred? For example, nodes belong to multiple functional groups, but clustering of nodes typically cannot capture the group overlap, while clustering of edges can. Clustering of adjacent edges that share many neighbors was proposed recently, outperforming different node clustering methods. However, since some biological processes can have characteristic "signatures" throughout the network, not just locally, it may be of interest to consider edges that are not necessarily adjacent. Hence, we design a sensitive measure of the "topological similarity" of edges that can deal with edges that are not necessarily adjacent. We cluster edges that are similar according to our measure in different baker's yeast PPI networks, outperforming existing node and edge clustering approaches.
Submitted 10 April, 2012;
originally announced April 2012.
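To make concrete why an edge similarity limited to adjacent edges cannot compare distant edges, here is a minimal Python sketch of one common way to score adjacent edges by their shared neighbors. It is not the edge-GDV similarity from the abstract above, whose definition is not given here. The Jaccard form over inclusive neighborhoods and all function names are assumptions for illustration.

```python
def inclusive_neighbors(adj, n):
    """Neighbors of n together with n itself."""
    return adj[n] | {n}


def adjacent_edge_similarity(adj, e1, e2):
    """Jaccard overlap of the inclusive neighborhoods of the two endpoints
    that the edges do NOT share. Edges that do not share exactly one node
    score 0.0, which is the limitation the abstract points out.
    Assumes undirected edges without self-loops."""
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return 0.0
    (i,) = set(e1) - shared
    (j,) = set(e2) - shared
    ni = inclusive_neighbors(adj, i)
    nj = inclusive_neighbors(adj, j)
    return len(ni & nj) / len(ni | nj)
```

With an adjacency dict such as `{0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}`, the adjacent edges `(0, 1)` and `(0, 2)` receive a positive score, while the non-adjacent edges `(0, 1)` and `(2, 3)` receive 0.0 regardless of how similar their wider topology is.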