-
DRAGON: Determining Regulatory Associations using Graphical models on multi-Omic Networks
Authors:
Katherine H. Shutta,
Deborah Weighill,
Rebekka Burkholz,
Marouen Ben Guebila,
Dawn L. DeMeo,
Helena U. Zacharias,
John Quackenbush,
Michael Altenbuchinger
Abstract:
The increasing quantity of multi-omics data, such as methylomic and transcriptomic profiles, collected on the same specimen, or even on the same cell, provides a unique opportunity to explore the complex interactions that define cell phenotype and govern cellular responses to perturbations. We propose a network approach based on Gaussian Graphical Models (GGMs) that facilitates the joint analysis of paired omics data. This method, called DRAGON (Determining Regulatory Associations using Graphical models on multi-Omic Networks), calibrates its parameters to achieve an optimal trade-off between the network's complexity and estimation accuracy, while explicitly accounting for the characteristics of each of the assessed omics "layers." In simulation studies, we show that DRAGON adapts to edge density and feature size differences between omics layers, improving model inference and edge recovery compared to state-of-the-art methods. We further demonstrate in an analysis of joint transcriptome-methylome data from TCGA breast cancer specimens that DRAGON can identify key molecular mechanisms such as gene regulation via promoter methylation. In particular, we identify Transcription Factor AP-2 Beta (TFAP2B) as a potential multi-omic biomarker for basal-type breast cancer. DRAGON is available as open-source code in Python through the Network Zoo package (netZooPy v0.8; netzoo.github.io).
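As a rough illustration of the Gaussian Graphical Model idea this abstract builds on, the sketch below derives partial correlations from an inverse covariance (precision) matrix. It is not DRAGON itself and does not use the netZooPy API: the layer-specific regularization and parameter calibration described in the abstract are omitted, and the toy covariance matrix and the helper names `partial_correlations` and `correlations` are assumptions made only for this example.

```python
import numpy as np

# Covariance of a toy chain x -> y -> z with unit-variance noise at each step.
# These numbers are illustrative assumptions, not data from the paper.
sigma = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.0, 2.0, 3.0],
])

def partial_correlations(cov):
    """Partial correlations computed from the precision (inverse covariance) matrix."""
    theta = np.linalg.inv(cov)
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def correlations(cov):
    """Ordinary (marginal) correlations, for comparison."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

pcor = partial_correlations(sigma)
cor = correlations(sigma)

# x and z are marginally correlated, but their partial correlation vanishes
# once y is conditioned on, so a GGM would draw no x-z edge.
print("marginal cor(x, z):", cor[0, 2])
print("partial  cor(x, z):", pcor[0, 2])
```

In this toy chain the GGM keeps only the direct x-y and y-z edges, which is the property that makes partial correlations attractive for separating direct from indirect associations.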
Submitted 21 September, 2022; v1 submitted 4 April, 2021;
originally announced April 2021.
-
Fully integrative data analysis of NMR metabolic fingerprints with comprehensive patient data: a case report based on the German Chronic Kidney Disease (GCKD) study
Authors:
Helena U. Zacharias,
Michael Altenbuchinger,
Stefan Solbrig,
Andreas Schäfer,
Mustafa Buyukozkan,
Ulla T. Schultheiß,
Fruzsina Kotsis,
Anna Köttgen,
Jan Krumsiek,
Fabian J. Theis,
Rainer Spang,
Peter J. Oefner,
Wolfram Gronwald,
GCKD study investigators
Abstract:
Omics data facilitate novel insights into the pathophysiology of diseases and, consequently, their diagnosis, treatment, and prevention. To that end, it is necessary to integrate omics data with other data types such as clinical, phenotypic, and demographic parameters of categorical or continuous nature. Here, we exemplify this data integration issue for a study on chronic kidney disease (CKD), where complex clinical and demographic parameters were assessed together with one-dimensional (1D) 1H NMR metabolic fingerprints. Routine analysis screens for associations of single metabolic features with clinical parameters, which requires that confounding variables, typically chosen by expert knowledge, be taken into account. This knowledge can be incomplete or unavailable. The results of this article are manifold. We introduce a framework for data integration that intrinsically adjusts for confounding variables. We give its mathematical and algorithmic foundation, provide a state-of-the-art implementation, and give several sanity checks. In particular, we show that the discovered associations remain significant after variable adjustment based on expert knowledge. In contrast, we illustrate that the discovery of associations in routine analysis can be biased by incorrect or incomplete expert knowledge in univariate screening approaches. Finally, we exemplify how our data integration approach reveals important associations between CKD comorbidities and metabolites. Moreover, we evaluate the predictive performance of the estimated models on independent validation data and contrast the results with a naive screening approach.
Submitted 8 October, 2018;
originally announced October 2018.
-
Loss-function learning for digital tissue deconvolution
Authors:
Franziska Görtler,
Stefan Solbrig,
Tilo Wettig,
Peter J. Oefner,
Rainer Spang,
Michael Altenbuchinger
Abstract:
The gene expression profile of a tissue averages the expression profiles of all cells in this tissue. Digital tissue deconvolution (DTD) addresses the following inverse problem: Given the expression profile $y$ of a tissue, what is the cellular composition $c$ of that tissue? If $X$ is a matrix whose columns are reference profiles of individual cell types, the composition $c$ can be computed by minimizing $\mathcal L(y-Xc)$ for a given loss function $\mathcal L$. Current methods use predefined all-purpose loss functions. They successfully quantify the dominating cells of a tissue, while often falling short in detecting small cell populations.
Here we learn the loss function $\mathcal L$ along with the composition $c$. This allows us to adapt to application-specific requirements such as focusing on small cell populations or distinguishing phenotypically similar cell populations. Our method quantifies large cell fractions as accurately as existing methods and significantly improves the detection of small cell populations and the distinction of similar cell types.
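The deconvolution problem stated above, minimizing $\mathcal L(y-Xc)$ over $c$, can be sketched for one simple family of losses. The example assumes a per-gene weighted squared error, $\mathcal L(r) = \sum_i g_i r_i^2$; this parametrization, the toy reference matrix, and the helper name `deconvolve` are assumptions for illustration and are not given in the abstract. The weights are set by hand here rather than learned.

```python
import numpy as np

def deconvolve(X, y, g):
    """Minimize sum_i g_i * (y_i - (X c)_i)^2 over c via the weighted normal equations."""
    W = np.diag(g)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Hypothetical reference profiles: 4 genes (rows) x 2 cell types (columns).
X = np.array([
    [10.0, 1.0],
    [1.0, 10.0],
    [5.0, 5.0],
    [2.0, 8.0],
])
c_true = np.array([0.9, 0.1])   # one dominating and one small cell population

y = X @ c_true
y[0] += 1.0                     # corrupt the measurement of gene 0

c_uniform = deconvolve(X, y, np.ones(4))                        # all-purpose loss
c_weighted = deconvolve(X, y, np.array([0.0, 1.0, 1.0, 1.0]))   # gene 0 down-weighted

print("uniform weights :", c_uniform)
print("adapted weights :", c_weighted)
```

With uniform weights the corrupted gene distorts the estimate, while a loss that ignores that gene recovers the true composition in this toy case. Choosing such weights from training data, rather than by hand, is the kind of adaptation the abstract describes.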
Submitted 25 January, 2018;
originally announced January 2018.
-
Scale-invariant biomarker discovery in urine and plasma metabolite fingerprints
Authors:
Helena U. Zacharias,
Thorsten Rehberg,
Sebastian Mehrl,
Daniel Richtmann,
Tilo Wettig,
Peter J. Oefner,
Rainer Spang,
Wolfram Gronwald,
Michael Altenbuchinger
Abstract:
Motivation: Metabolomics data is typically scaled to a common reference like a constant volume of body fluid, a constant creatinine level, or a constant area under the spectrum. Such normalization of the data, however, may affect the selection of biomarkers and the biological interpretation of results in unforeseen ways.
Results: First, we study how the outcome of hypothesis tests for differential metabolite concentration is affected by the choice of scale. Furthermore, we observe this interdependence also for different classification approaches. Second, to overcome this problem and establish a scale-invariant biomarker discovery algorithm, we extend linear zero-sum regression to the logistic regression framework and show in two applications to ${}^1$H NMR-based metabolomics data how this approach overcomes the scaling problem.
Availability: Logistic zero-sum regression is available as an R package as well as a high-performance computing implementation that can be downloaded at https://github.com/rehbergT/zeroSum
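The scale invariance claimed above can be checked with a few lines of arithmetic. The sketch assumes that metabolite features enter the model on a log scale (the abstract does not state this) and uses made-up coefficients; it does not use the zeroSum package, and `linear_score` and `probability` are hypothetical helper names. If the coefficients sum to zero, multiplying a whole sample by a constant $s$ adds $\log(s)\sum_j \beta_j = 0$ to the linear predictor.

```python
import math

def linear_score(x, beta, beta0=0.0):
    """Linear predictor on log-transformed features."""
    return beta0 + sum(b * math.log(v) for b, v in zip(beta, x))

def probability(x, beta, beta0=0.0):
    """Logistic link applied to the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_score(x, beta, beta0)))

sample = [2.0, 4.0, 8.0]            # hypothetical metabolite intensities
rescaled = [3.0 * v for v in sample]  # same sample under a different normalization

beta_zero_sum = [1.5, -0.5, -1.0]   # coefficients sum to zero
beta_free = [1.5, -0.5, 0.5]        # coefficients sum to 1.5

p_a = probability(sample, beta_zero_sum)
p_b = probability(rescaled, beta_zero_sum)
shift_free = linear_score(rescaled, beta_free) - linear_score(sample, beta_free)

print("zero-sum model, original vs rescaled:", p_a, p_b)
print("unconstrained model, shift in score :", shift_free)
```

The zero-sum model returns the same probability for both versions of the sample, whereas the unconstrained score shifts by $\log(3)\cdot 1.5$, so its prediction depends on the chosen reference scale.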
Submitted 22 March, 2017;
originally announced March 2017.