-
The C-SHIFT algorithm for normalizing covariances
Authors:
Evgenia Chunikhina,
Paul Logan,
Yevgeniy Kovchegov,
Anatoly Yambartsev,
Debashis Mondal,
Andrey Morgun
Abstract:
Omics technologies are powerful tools for analyzing patterns in gene expression data for thousands of genes. Because of systematic variation between experiments, raw gene expression data is often obscured by undesirable technical noise. Various normalization techniques have been designed to remove these non-biological errors before any statistical analysis. One reason for normalizing data is the need to recover the covariance matrix used in gene network analysis. In this paper, we introduce a novel normalization technique called the covariance shift (C-SHIFT) method. This normalization algorithm uses optimization techniques together with the blessing-of-dimensionality philosophy and an energy minimization hypothesis to recover the covariance matrix under additive noise (known in biology as the bias). Thus, it is perfectly suited for the analysis of logarithmic gene expression data. Numerical experiments on synthetic data demonstrate the method's advantage over classical normalization techniques, namely Rank, Quantile, cyclic LOESS (locally estimated scatterplot smoothing), and MAD (median absolute deviation) normalization. We also evaluate the performance of the C-SHIFT algorithm on real biological data.
Submitted 5 August, 2021; v1 submitted 28 March, 2020;
originally announced March 2020.
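The phrase "covariance matrix recovery under additive noise" in the C-SHIFT abstract can be made concrete with a small numerical sketch. This is not the C-SHIFT algorithm. It assumes a hypothetical noise model in which each sample carries a single bias term that is added to every gene's log expression and is independent of the true signal. Under that assumption, Cov(Y_i + B, Y_k + B) = Cov(Y_i, Y_k) + Var(B), so every entry of the covariance matrix moves by about the same amount. All names and parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples = 5, 20000

# Hypothetical "true" log expression: rows are genes, columns are samples.
true_log_expr = rng.normal(size=(n_genes, n_samples))

# Assumed noise model: one additive bias per sample, shared by all genes.
bias_sd = 0.7
sample_bias = rng.normal(scale=bias_sd, size=n_samples)
observed = true_log_expr + sample_bias  # broadcasts the bias across genes

# np.cov treats each row as a variable, so both results are gene-by-gene.
cov_true = np.cov(true_log_expr)
cov_observed = np.cov(observed)

# Entry-wise difference; each entry should sit near Var(bias) = bias_sd ** 2.
shift = cov_observed - cov_true
print(np.round(shift, 2))
```

A normalization method aimed at covariance recovery has to undo a distortion of this kind; how C-SHIFT does so is described in the paper itself, not in this sketch.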
-
Stochastic Ising model with plastic interactions
Authors:
Eugene Pechersky,
Guillem Via,
Anatoly Yambartsev
Abstract:
We propose a new model based on the Ising model with the aim of studying synaptic plasticity phenomena in neural networks. It is now well established in biology that the synapses, or connections, between certain types of neurons are strengthened when the neurons are co-active, a form of so-called synaptic plasticity. Such a mechanism is believed to mediate the formation and maintenance of memories. The proposed model describes some features of that phenomenon. In addition to the spin-flip dynamics, the coupling constants in our model are also subject to stochastic dynamics, so that they interact with each other. The evolution of the system is described by a continuous-time Markov jump process.
Keywords: Markov chain, stochastic Ising model, synaptic plasticity, neural networks, transience
Submitted 20 July, 2016;
originally announced July 2016.
-
Reverse enGENEering of regulatory networks from Big Data: a guide for a biologist
Authors:
Xiaoxi Dong,
Anatoly Yambartsev,
Stephen Ramsey,
Lina Thomas,
Natalia Shulzhenko,
Andrey Morgun
Abstract:
Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by such omics datasets is how to transform this data into biological knowledge. For example, how can this data be used to answer questions such as: Which functional pathways are involved in cell differentiation? Which genes should we target to stop cancer? Network analysis is a powerful and general approach to this problem and consists of two fundamental stages: network reconstruction and network interrogation. Herein, we provide an overview of network analysis, including a step-by-step guide on how to perform and use this approach to investigate a biological question. In this guide, we also include the software packages that we and others employ for each step of a network analysis workflow.
Submitted 3 November, 2014;
originally announced November 2014.
-
Unexpected links reflect the noise in networks
Authors:
Anatoly Yambartsev,
Michael Perlin,
Yevgeniy Kovchegov,
Natalia Shulzhenko,
Karina L. Mine,
Xiaoxi Dong,
Andrey Morgun
Abstract:
Gene covariation networks are commonly used to study biological processes. The inference of gene covariation networks from observational data can be challenging, especially considering the large number of players involved and the small number of biological replicates available for analysis. We propose a new statistical method for estimating the number of erroneous edges in reconstructed networks that strongly enhances commonly used inference approaches. This method is based on a special relationship between the sign of correlation (positive/negative) and the directionality (up/down) of gene regulation, and allows for the identification and removal of approximately half of all erroneous edges. Using the mathematical model of Bayesian networks and positive correlation inequalities, we establish a mathematical foundation for our method. Analyzing existing biological datasets, we find a strong correlation between the results of our method and the false discovery rate (FDR). Furthermore, simulation analysis demonstrates that our method provides a more accurate estimate of network error than FDR.
Submitted 25 September, 2015; v1 submitted 30 October, 2013;
originally announced October 2013.
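The sign/direction relationship described in the "Unexpected links" abstract can be illustrated with a minimal sketch. The abstract does not spell out the rule, so the code assumes one plausible reading: an edge is "expected" when two genes regulated in the same direction are positively correlated, or two genes regulated in opposite directions are negatively correlated, and "unexpected" otherwise. The factor of two in the error estimate is also an assumption, taken from the statement that the method removes approximately half of all erroneous edges. The function name, gene labels, and data are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (gene_a, gene_b, correlation coefficient)


def split_by_sign_consistency(
    edges: List[Edge], direction: Dict[str, int]
) -> Tuple[List[Edge], List[Edge]]:
    """Split edges into (expected, unexpected).

    `direction` maps each gene to +1 (up-regulated) or -1 (down-regulated).
    Assumed rule: a positive correlation is expected between genes with the
    same direction, a negative one between genes with opposite directions.
    """
    expected: List[Edge] = []
    unexpected: List[Edge] = []
    for gene_a, gene_b, corr in edges:
        same_direction = direction[gene_a] == direction[gene_b]
        if (corr > 0) == same_direction:
            expected.append((gene_a, gene_b, corr))
        else:
            unexpected.append((gene_a, gene_b, corr))
    return expected, unexpected


# Toy data, invented for illustration only.
direction = {"A": 1, "B": 1, "C": -1, "D": -1}
edges = [
    ("A", "B", 0.8),   # same direction, positive correlation
    ("A", "C", -0.6),  # opposite directions, negative correlation
    ("B", "C", 0.5),   # opposite directions, positive correlation
    ("C", "D", 0.7),   # same direction, positive correlation
    ("A", "D", 0.4),   # opposite directions, positive correlation
]

expected, unexpected = split_by_sign_consistency(edges, direction)

# Assumption: erroneous edges land on either sign with equal probability,
# so the unexpected edges account for about half of them.
estimated_erroneous = 2 * len(unexpected)
print(len(expected), len(unexpected), estimated_erroneous)
```

Dropping the `unexpected` list from the network corresponds to the removal step mentioned in the abstract; the erroneous edges that happen to carry the expected sign cannot be identified this way.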