-
An expanded evaluation of protein function prediction methods shows an improvement in accuracy
Authors:
Yuxiang Jiang,
Tal Ronnen Oron,
Wyatt T Clark,
Asma R Bankapur,
Daniel D'Andrea,
Rosalba Lepore,
Christopher S Funk,
Indika Kahanda,
Karin M Verspoor,
Asa Ben-Hur,
Emily Koo,
Duncan Penfold-Brown,
Dennis Shasha,
Noah Youngs,
Richard Bonneau,
Alexandra Lin,
Sayed ME Sahraeian,
Pier Luigi Martelli,
Giuseppe Profiti,
Rita Casadio,
Renzhi Cao,
Zhaolong Zhong,
Jianlin Cheng,
Adrian Altenhoff,
Nives Skunca
, et al. (122 additional authors not shown)
Abstract:
Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to the discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our a…
▽ More
Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to the discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our ability to understand the molecular underpinnings of life is the assignment of function to biological macromolecules, especially proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, accurately assessing methods for protein function prediction and tracking progress in the field remain challenging. Methodology: We have conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. One hundred twenty-six methods from 56 research groups were evaluated for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3,681 proteins from 18 species. CAFA2 featured significantly expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis also compared the best methods participating in CAFA1 to those of CAFA2. Conclusions: The top performing methods in CAFA2 outperformed the best methods from CAFA1, demonstrating that computational function prediction is improving. This increased accuracy can be attributed to the combined effect of the growing number of experimental annotations and improved methods for function prediction.
△ Less
Submitted 2 January, 2016;
originally announced January 2016.
-
Data Intensive High Energy Physics Analysis in a Distributed Cloud
Authors:
R. J. Sobie,
A. Agarwal,
M. Anderson,
P. Armstrong,
K. Fransham,
I. Gable,
D. Harris,
C. Leavett-Brown,
M. Paterson,
D. Penfold-Brown,
M. Vliet,
A. Charbonneau,
R. Impey,
W. Podaima
Abstract:
We show that distributed Infrastructure-as-a-Service (IaaS) compute clouds can be effectively used for the analysis of high energy physics data. We have designed a distributed cloud system that works with any application using large input data sets requiring a high throughput computing environment. The system uses IaaS-enabled science and commercial clusters in Canada and the United States. We des…
▽ More
We show that distributed Infrastructure-as-a-Service (IaaS) compute clouds can be effectively used for the analysis of high energy physics data. We have designed a distributed cloud system that works with any application using large input data sets requiring a high throughput computing environment. The system uses IaaS-enabled science and commercial clusters in Canada and the United States. We describe the process in which a user prepares an analysis virtual machine (VM) and submits batch jobs to a central scheduler. The system boots the user-specific VM on one of the IaaS clouds, runs the jobs and returns the output to the user. The user application accesses a central database for calibration data during the execution of the application. Similarly, the data is located in a central location and streamed by the running application. The system can easily run one hundred simultaneous jobs in an efficient manner and should scale to many hundreds and possibly thousands of user jobs.
△ Less
Submitted 1 January, 2011;
originally announced January 2011.
-
Cloud Scheduler: a resource manager for distributed compute clouds
Authors:
P. Armstrong,
A. Agarwal,
A. Bishop,
A. Charbonneau,
R. Desmarais,
K. Fransham,
N. Hill,
I. Gable,
S. Gaudet,
S. Goliath,
R. Impey,
C. Leavett-Brown,
J. Ouellete,
M. Paterson,
C. Pritchet,
D. Penfold-Brown,
W. Podaima,
D. Schade,
R. J. Sobie
Abstract:
The availability of Infrastructure-as-a-Service (IaaS) computing clouds gives researchers access to a large set of new resources for running complex scientific applications. However, exploiting cloud resources for large numbers of jobs requires significant effort and expertise. In order to make it simple and transparent for researchers to deploy their applications, we have developed a virtual mach…
▽ More
The availability of Infrastructure-as-a-Service (IaaS) computing clouds gives researchers access to a large set of new resources for running complex scientific applications. However, exploiting cloud resources for large numbers of jobs requires significant effort and expertise. In order to make it simple and transparent for researchers to deploy their applications, we have developed a virtual machine resource manager (Cloud Scheduler) for distributed compute clouds. Cloud Scheduler boots and manages the user-customized virtual machines in response to a user's job submission. We describe the motivation and design of the Cloud Scheduler and present results on its use on both science and commercial clouds.
△ Less
Submitted 30 June, 2010;
originally announced July 2010.