The C-SHIFT algorithm for normalizing covariances
Authors:
Evgenia Chunikhina,
Paul Logan,
Yevgeniy Kovchegov,
Anatoly Yambartsev,
Debashis Mondal,
Andrey Morgun
Abstract:
Omics technologies are powerful tools for analyzing patterns in gene expression data for thousands of genes. Due to a number of systematic variations in experiments, the raw gene expression data is often obfuscated by undesirable technical noises. Various normalization techniques were designed in an attempt to remove these non-biological errors prior to any statistical analysis. One of the reasons…
▽ More
Omics technologies are powerful tools for analyzing patterns in gene expression data for thousands of genes. Due to a number of systematic variations in experiments, the raw gene expression data is often obfuscated by undesirable technical noises. Various normalization techniques were designed in an attempt to remove these non-biological errors prior to any statistical analysis. One of the reasons for normalizing data is the need for recovering the covariance matrix used in gene network analysis. In this paper, we introduce a novel normalization technique, called the covariance shift (C-SHIFT) method. This normalization algorithm uses optimization techniques together with the blessing of dimensionality philosophy and energy minimization hypothesis for covariance matrix recovery under additive noise (in biology, known as the bias). Thus, it is perfectly suited for the analysis of logarithmic gene expression data. Numerical experiments on synthetic data demonstrate the method's advantage over the classical normalization techniques. Namely, the comparison is made with Rank, Quantile, cyclic LOESS (locally estimated scatterplot smoothing), and MAD (median absolute deviation) normalization methods. We also evaluate the performance of C-SHIFT algorithm on real biological data.
△ Less
Submitted 5 August, 2021; v1 submitted 28 March, 2020;
originally announced March 2020.
Unexpected links reflect the noise in networks
Authors:
Anatoly Yambartsev,
Michael Perlin,
Yevgeniy Kovchegov,
Natalia Shulzhenko,
Karina L. Mine,
Xiaoxi Dong,
Andrey Morgun
Abstract:
Gene covariation networks are commonly used to study biological processes. The inference of gene covariation networks from observational data can be challenging, especially considering the large number of players involved and the small number of biological replicates available for analysis. We propose a new statistical method for estimating the number of erroneous edges in reconstructed networks t…
▽ More
Gene covariation networks are commonly used to study biological processes. The inference of gene covariation networks from observational data can be challenging, especially considering the large number of players involved and the small number of biological replicates available for analysis. We propose a new statistical method for estimating the number of erroneous edges in reconstructed networks that strongly enhances commonly used inference approaches. This method is based on a special relationship between sign of correlation (positive/negative) and directionality (up/down) of gene regulation, and allows for the identification and removal of approximately half of all erroneous edges. Using the mathematical model of Bayesian networks and positive correlation inequalities we establish a mathematical foundation for our method. Analyzing existing biological datasets, we find a strong correlation between the results of our method and false discovery rate (FDR). Furthermore, simulation analysis demonstrates that our method provides a more accurate estimate of network error than FDR.
△ Less
Submitted 25 September, 2015; v1 submitted 30 October, 2013;
originally announced October 2013.