Search | arXiv e-print repository

doi 10.1093/bioinformatics/bts707

Minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers

Authors: Davide Albanese, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman, Cesare Furlanello

Abstract: We introduce a novel implementation in ANSI C of the MINE family of algorithms for computing maximal information-based measures of dependence between two variables in large datasets, with the aim of a low memory footprint and ease of integration within bioinformatics pipelines. We provide the libraries minerva (with the R interface) and minepy for Python, MATLAB, Octave and C++. The C solution red… ▽ More We introduce a novel implementation in ANSI C of the MINE family of algorithms for computing maximal information-based measures of dependence between two variables in large datasets, with the aim of a low memory footprint and ease of integration within bioinformatics pipelines. We provide the libraries minerva (with the R interface) and minepy for Python, MATLAB, Octave and C++. The C solution reduces the large memory requirement of the original Java implementation, has good upscaling properties, and offers a native parallelization for the R interface. Low memory requirements are demonstrated on the MINE benchmarks as well as on large (n=1340) microarray and Illumina GAII RNA-seq transcriptomics datasets. Availability and Implementation: Source code and binaries are freely available for download under GPL3 licence at http://minepy.sourceforge.net for minepy and through the CRAN repository http://cran.r-project.org for the R package minerva. All software is multiplatform (MS Windows, Linux and OSX). △ Less

Submitted 10 December, 2012; v1 submitted 21 August, 2012; originally announced August 2012.

Comments: Bioinformatics 2012, in press

arXiv:1202.6548 [pdf, ps, other]

mlpy: Machine Learning Python

Authors: Davide Albanese, Roberto Visintainer, Stefano Merler, Samantha Riccadonna, Giuseppe Jurman, Cesare Furlanello

Abstract: mlpy is a Python Open Source Machine Learning library built on top of NumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, it works with Pytho… ▽ More mlpy is a Python Open Source Machine Learning library built on top of NumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, it works with Python 2 and 3 and it is distributed under GPL3 at the website http://mlpy.fbk.eu. △ Less

Submitted 1 March, 2012; v1 submitted 29 February, 2012; originally announced February 2012.

Comments: Corrected a few typos; rephrased two sentences in the Overview section

arXiv:1004.1341 [pdf, ps, other]

doi 10.1371/journal.pone.0036540

Algebraic Comparison of Partial Lists in Bioinformatics

Authors: Giuseppe Jurman, Samantha Riccadonna, Roberto Visintainer, Cesare Furlanello

Abstract: The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or just within a meta-analysis comparison, instead of one list it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained.… ▽ More The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or just within a meta-analysis comparison, instead of one list it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset. △ Less

Submitted 8 April, 2010; originally announced April 2010.

Journal ref: PLoS ONE 7(5): e36540 (2012)

Showing 1–3 of 3 results for author: Riccadonna, S