Convolutional neural networks for structured omics: OmicsCNN and the OmicsConv layer
Authors:
Giuseppe Jurman,
Valerio Maggio,
Diego Fioravanti,
Ylenia Giarratano,
Isotta Landi,
Margherita Francescatto,
Claudio Agostinelli,
Marco Chierici,
Manlio De Domenico,
Cesare Furlanello
Abstract:
Convolutional Neural Networks (CNNs) are a popular deep learning architecture widely applied in different domains, in particular to image classification, for which the concept of convolution with a filter comes naturally. Unfortunately, the requirement of a distance (or, at least, of a neighbourhood function) in the input feature space has so far prevented their direct use on data types such as omics data. However, several omics data types are metrizable, i.e., they can be endowed with a metric structure, enabling the adoption of a convolution-based deep learning framework, e.g., for prediction. We propose a generalized solution for CNNs on omics data, implemented through a dedicated Keras layer. In particular, for metagenomics data, a metric can be derived from the patristic distance on the phylogenetic tree. For transcriptomics data, we combine Gene Ontology (GO) semantic similarity and gene co-expression to define a distance; the distance is defined through a multilayer network in which three layers are given by GO mutual semantic similarity and the fourth by gene co-expression. As a general tool, feature distance on omics data is enabled by OmicsConv, a novel Keras layer, yielding OmicsCNN, a dedicated deep learning framework. Here we demonstrate OmicsCNN on gut microbiota 16S sequencing data for Inflammatory Bowel Disease (IBD), first on synthetic data and then on a metagenomics collection of gut microbiota from 222 IBD patients.
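The abstract does not spell out how OmicsConv uses the feature distance. Purely as an illustration, the sketch below assumes one simple reading: compute patristic distances (the sum of branch lengths on the path joining two leaves) on a toy phylogenetic tree, then convolve each feature with a shared filter applied to its k nearest features under that metric. The toy tree, the function names `patristic_distances` and `metric_conv`, and the nearest-neighbour rule are all hypothetical; this is not the OmicsConv Keras layer.

```python
# Hypothetical illustration: a metric-driven "convolution" over omics features.
# This is NOT the OmicsConv layer; the tree, names and neighbourhood rule are assumptions.


def patristic_distances(parent, leaves):
    """Patristic distance between two leaves: sum of branch lengths on the path joining them.

    `parent` maps every non-root node to (parent_name, branch_length).
    """
    def path_to_root(node):
        # Cumulative distance from `node` to each of its ancestors (itself at 0).
        dist = {node: 0.0}
        total = 0.0
        while node in parent:
            node, length = parent[node]
            total += length
            dist[node] = total
        return dist

    paths = {leaf: path_to_root(leaf) for leaf in leaves}
    D = {}
    for a in leaves:
        for b in leaves:
            pa, pb = paths[a], paths[b]
            # Lowest common ancestor: the shared ancestor closest to `a`.
            lca = min((n for n in pa if n in pb), key=lambda n: pa[n])
            D[a, b] = pa[lca] + pb[lca]
    return D


def metric_conv(values, leaves, D, kernel, bias=0.0):
    """Apply one shared filter to each feature's k nearest features under D (k = len(kernel))."""
    k = len(kernel)
    order = {g: i for i, g in enumerate(leaves)}
    out = []
    for f in leaves:
        # Neighbourhood sorted by distance from f (ties broken by input order),
        # so f itself comes first whenever all other distances are positive.
        neigh = sorted(leaves, key=lambda g: (D[f, g], order[g]))[:k]
        out.append(bias + sum(w * values[g] for w, g in zip(kernel, neigh)))
    return out


# Toy tree: root -> n1 (0.5) -> {A (0.1), B (0.2)}; root -> C (0.4).
parent = {"A": ("n1", 0.1), "B": ("n1", 0.2), "n1": ("root", 0.5), "C": ("root", 0.4)}
leaves = ["A", "B", "C"]
D = patristic_distances(parent, leaves)
abundances = {"A": 1.0, "B": 2.0, "C": 4.0}
print(metric_conv(abundances, leaves, D, kernel=[1.0, 0.5]))  # → [2.0, 2.5, 4.5]
```

Sorting neighbours by distance stands in for the fixed spatial ordering that an image convolution gets for free from the pixel grid; in a trained network the kernel weights would be learned rather than fixed by hand.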
Submitted 16 October, 2017;
originally announced October 2017.
DGEclust: differential expression analysis of clustered count data
Authors:
Dimitrios V Vavoulis,
Margherita Francescatto,
Peter Heutink,
Julian Gough
Abstract:
Most published studies on the statistical analysis of count data generated by next-generation sequencing technologies have paid surprisingly little attention to cluster analysis. We present a statistical methodology (DGEclust) for clustering digital expression data, which (contrary to alternative methods) simultaneously addresses the problem of model selection (i.e., how many clusters are supported by the data) and uncertainty in parameter estimation. We show how this methodology can be utilised in differential expression analysis, and we demonstrate its applicability to a more general class of problems and its higher accuracy compared to popular alternatives. DGEclust is freely available at https://bitbucket.org/DimitrisVavoulis/dgeclust
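The abstract does not describe DGEclust's statistical model. To illustrate only the problem it targets (grouping count data while deciding how many clusters the data support), the sketch below swaps in a much simpler technique: a finite Poisson mixture fitted by expectation-maximisation (EM), with the number of clusters chosen by BIC. It does not reproduce DGEclust and, unlike the methodology described above, does not quantify uncertainty in the parameter estimates. The function names and toy counts are hypothetical.

```python
import math

# Hypothetical illustration: a Poisson mixture + BIC baseline, NOT the DGEclust model.


def poisson_logpmf(x, lam):
    # log P(X = x) for X ~ Poisson(lam)
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_poisson_mixture(counts, k, n_iter=200):
    """Fit a k-component Poisson mixture by EM; return (weights, rates, log-likelihood)."""
    n = len(counts)
    xs = sorted(counts)
    # Deterministic start: one rate per evenly spaced quantile of the data.
    rates = [max(float(xs[(2 * j + 1) * n // (2 * k)]), 1e-3) for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from each component.
        resp = []
        for x in counts:
            logs = [math.log(w) + poisson_logpmf(x, r) for w, r in zip(weights, rates)]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate mixing weights and Poisson rates (clamped away from 0).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, 1e-12)
            rates[j] = max(sum(r[j] * x for r, x in zip(resp, counts)) / max(nj, 1e-12), 1e-3)
    loglik = sum(
        logsumexp([math.log(w) + poisson_logpmf(x, r) for w, r in zip(weights, rates)])
        for x in counts
    )
    return weights, rates, loglik


def select_k(counts, max_k=3):
    """Pick the number of clusters by BIC (lower is better); k components have 2k - 1 free parameters."""
    n = len(counts)
    best = None
    for k in range(1, max_k + 1):
        _, _, ll = fit_poisson_mixture(counts, k)
        bic = -2.0 * ll + (2 * k - 1) * math.log(n)
        if best is None or bic < best[1]:
            best = (k, bic)
    return best[0]


counts = [2, 3, 1, 2, 4, 3, 50, 52, 48, 51, 49, 55]
print(select_k(counts))
```

Here BIC trades goodness of fit against the number of parameters to settle how many clusters to keep; it yields a single point estimate, which is exactly the kind of alternative whose lack of uncertainty handling the abstract contrasts DGEclust with.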
Submitted 4 May, 2014;
originally announced May 2014.