
Search v0.5.6 released 2020-02-24

arXiv:1002.3724

cs.CE cs.DS q-bio.QM

doi 10.1016/j.jprot.2010.02.006

An Optimized Data Structure for High Throughput 3D Proteomics Data: mzRTree

Authors: Sara Nasso, Francesco Silvestri, Francesco Tisiot, Barbara Di Camillo, Andrea Pietracaprina, Gianna Maria Toffolo

Abstract: As an emerging field, MS-based proteomics still requires software tools for efficiently storing and accessing experimental data. In this work, we focus on the management of LC-MS data, which are typically made available in standard XML-based portable formats. The structures that are currently employed to manage these data can be highly inefficient, especially when dealing with high-throughput profile data. LC-MS datasets are usually accessed through 2D range queries. Optimizing this type of operation could dramatically reduce the complexity of data analysis. We propose a novel data structure for LC-MS datasets, called mzRTree, which embodies a scalable index based on the R-tree data structure. mzRTree can be efficiently created from the XML-based data formats and it is suitable for handling very large datasets. We experimentally show that, on all range queries, mzRTree outperforms other known structures used for LC-MS data, even on those queries these structures are optimized for. Besides, mzRTree is also more space efficient. As a result, mzRTree reduces data analysis computational costs for very large profile datasets.
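
To make the notion of a 2D range query on LC-MS data concrete, the following Python sketch may help. It is not the authors' mzRTree and not a full R-tree: it is a deliberately simplified, one-level bounding-box index, which only illustrates the pruning idea that R-tree-style structures build on. The tuple layout `(retention_time, mz, intensity)`, the names `Block`, `build_blocks`, and `range_query`, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record layout: (retention_time, mz, intensity).
Peak = Tuple[float, float, float]


@dataclass
class Block:
    """A group of peaks plus the bounding rectangle that encloses them."""
    rt_min: float
    rt_max: float
    mz_min: float
    mz_max: float
    peaks: List[Peak]


def build_blocks(peaks: List[Peak], block_size: int = 4) -> List[Block]:
    """Sort peaks by retention time and pack them into fixed-size blocks."""
    ordered = sorted(peaks)  # tuples sort by retention time first, then m/z
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append(Block(
            rt_min=chunk[0][0],
            rt_max=chunk[-1][0],
            mz_min=min(p[1] for p in chunk),
            mz_max=max(p[1] for p in chunk),
            peaks=chunk,
        ))
    return blocks


def range_query(blocks: List[Block], rt_lo: float, rt_hi: float,
                mz_lo: float, mz_hi: float) -> List[Peak]:
    """Return every peak inside the closed rectangle [rt_lo, rt_hi] x [mz_lo, mz_hi]."""
    hits = []
    for b in blocks:
        # Skip blocks whose bounding rectangle cannot intersect the query.
        if b.rt_max < rt_lo or b.rt_min > rt_hi or b.mz_max < mz_lo or b.mz_min > mz_hi:
            continue
        hits.extend(p for p in b.peaks
                    if rt_lo <= p[0] <= rt_hi and mz_lo <= p[1] <= mz_hi)
    return hits


# Made-up sample data, for illustration only.
sample_peaks = [
    (1.0, 400.0, 10.0), (1.0, 500.0, 5.0), (2.0, 400.5, 7.0),
    (3.0, 600.0, 1.0), (4.0, 401.0, 2.0), (5.0, 700.0, 9.0),
]
sample_blocks = build_blocks(sample_peaks, block_size=2)
print(range_query(sample_blocks, 1.0, 4.0, 400.0, 401.0))
```

An R-tree nests such bounding rectangles hierarchically, so that whole subtrees can be skipped rather than individual blocks; the flat version above only shows why a rectangle test can avoid inspecting most of the data.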

Submitted 22 February, 2010; v1 submitted 19 February, 2010; originally announced February 2010.

Comments: Paper details: 10 pages, 7 figures, 2 tables. To be published in Journal of Proteomics. Source code available at http://www.dei.unipd.it/mzrtree

ACM Class: J.3; E.2

Journal ref: Journal of Proteomics 73(6) (2010) 1176-1182
