-
On the combination of omics data for prediction of binary outcomes
Authors:
Mar Rodríguez-Girondo,
Alexia Kakourou,
Perttu Salo,
Markus Perola,
Wilma E. Mesker,
Rob A. E. M. Tollenaar,
Jeanine Houwing-Duistermaat,
Bart J. A. Mertens
Abstract:
Enrichment of predictive models with new biomolecular markers is an important task in high-dimensional omic applications. Increasingly, clinical studies include several sets of such omics markers for each patient, measuring different levels of biological variation. As a result, one of the main challenges in predictive research is the integration of different sources of omic biomarkers for the prediction of health traits. We review several approaches for the combination of omic markers in the context of binary outcome prediction, all based on double cross-validation and regularized regression models. We evaluate their performance in terms of calibration and discrimination and compare it with that of single-omic-source predictions. We illustrate the methods through the analysis of two real datasets. On the one hand, we consider the combination of two fractions of proteomic mass spectrometry for the calibration of a diagnostic rule for the detection of early-stage breast cancer. On the other hand, we consider transcriptomics and metabolomics as predictors of obesity using data from the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study, a population-based cohort from Finland.
Submitted 14 October, 2016;
originally announced October 2016.
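The double cross-validation scheme with a regularized model can be illustrated with a minimal, dependency-free Python sketch. It assumes ridge-penalised logistic regression as the regularized model and, purely for illustration, combines two simulated omic sources by concatenating their features; this is only one simple combination strategy and is not claimed to be any of the specific approaches reviewed in the paper. All function names (`fit_ridge_logistic`, `double_cv`, etc.), the penalty grid and the simulated data are hypothetical. The inner loop selects the penalty using the outer-training data only, so the outer predictions stay out-of-sample; AUC summarizes discrimination, and the Brier score gives a rough calibration-related summary.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam, lr=0.1, n_iter=200):
    """Ridge (L2-penalised) logistic regression by batch gradient descent.
    The intercept b is not penalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r
            for j in range(p):
                gw[j] += r * xi[j]
        b -= lr * gb / n
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, gw)]
    return w, b


def predict(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def log_loss(y, p, eps=1e-12):
    return -sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
                for yi, pi in zip(y, p)) / len(y)


def make_folds(idx, k, rng):
    idx = list(idx)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def double_cv(X, y, lambdas, k_outer=4, k_inner=3, seed=0):
    """Outer loop: out-of-sample predicted probabilities for assessment.
    Inner loop (outer-training data only): choice of the ridge penalty."""
    rng = random.Random(seed)
    preds = [None] * len(y)
    for test in make_folds(range(len(y)), k_outer, rng):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        inner_folds = make_folds(train, k_inner, rng)

        def inner_loss(lam):
            total = 0.0
            for val in inner_folds:
                vset = set(val)
                fit = [i for i in train if i not in vset]
                m = fit_ridge_logistic([X[i] for i in fit], [y[i] for i in fit], lam)
                total += log_loss([y[i] for i in val], predict(m, [X[i] for i in val]))
            return total / len(inner_folds)

        best = min(lambdas, key=inner_loss)
        model = fit_ridge_logistic([X[i] for i in train], [y[i] for i in train], best)
        for i, p in zip(test, predict(model, [X[i] for i in test])):
            preds[i] = p
    return preds


def auc(y, p):
    # Discrimination: P(case scored above control), ties counted as 1/2.
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    s = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return s / (len(pos) * len(neg))


def brier(y, p):
    # Mean squared error of predicted probabilities (mixes calibration and discrimination).
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


# Simulated example: two hypothetical omic sources A and B, combined by concatenation.
rng = random.Random(1)
n = 80
A = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
B = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(1.5 * a[0] + 1.5 * b[0]) else 0 for a, b in zip(A, B)]
X = [a + b for a, b in zip(A, B)]
preds = double_cv(X, y, lambdas=[0.01, 0.1, 1.0])
print(f"AUC={auc(y, preds):.3f}  Brier={brier(y, preds):.3f}")
```

Running `double_cv` separately on `A` or `B` gives the corresponding single-source predictions, so the same AUC and Brier summaries can be compared with those of the combined model.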
-
Adapting censored regression methods to adjust for the limit of detection in the calibration of diagnostic rules for clinical mass spectrometry proteomic data
Authors:
Alexia Kakourou,
Werner Vach,
Bart Mertens
Abstract:
Despite the recent advances in mass spectrometry (MS), summarizing and analyzing high-throughput mass-spectrometry data remains a challenging task. This is due, on the one hand, to the complexity of the measured spectral signal and, on the other, to the limit of detection (LOD). The LOD reflects the inability of instruments to measure markers present at relatively low levels. As a consequence, the data set produced by the quantification step of proteomic analysis often consists of a reduced list of peaks, in which any peak intensities below the detection threshold are reported as missing. In this work, we propose the use of censored-data methodology to handle spectral measurements in the presence of the LOD, recognizing that measurements of low-abundance proteins are left-censored. We apply this approach to the particular problem of calibrating prediction rules through prior estimation of the average isotope expression in MALDI-FTICR mass-spectrometry data collected in the context of a pancreatic cancer case-control study. Our idea is to replace the set of incomplete spectral measurements with the estimated average intensities and to use these as input to a prediction model. We evaluate the predictive ability of the proposed methods by comparing their performance with that achieved using the complete information and with that of alternative methods for dealing with the LOD.
Submitted 29 June, 2016;
originally announced June 2016.
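The censored-data idea can be sketched, in its simplest intercept-only form, as maximum-likelihood estimation of the mean of a normal distribution from left-censored data. The sketch below assumes that (log) peak intensities are normally distributed and that every non-detected value lies below a known LOD; it fits the model with the EM algorithm, using the moments of the truncated normal distribution in the E-step. The function name `censored_normal_em` and the simulated data are hypothetical, and the censored regression models used in the paper may differ.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def censored_normal_em(observed, n_censored, lod, n_iter=500, tol=1e-10):
    """ML estimates (mu, sigma) of a normal distribution when n_censored values
    are only known to lie below the limit of detection `lod` (left-censoring).
    EM algorithm: the E-step uses truncated-normal moments for the non-detects."""
    n = len(observed) + n_censored
    s1 = sum(observed)
    s2 = sum(x * x for x in observed)
    # Start from the naive mean/SD of the detected values.
    mu = s1 / len(observed)
    sigma = math.sqrt(max(s2 / len(observed) - mu * mu, 1e-12))
    for _ in range(n_iter):
        a = (lod - mu) / sigma
        lam = STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)      # inverse Mills ratio for X < lod
        ex = mu - sigma * lam                            # E[X | X < lod]
        varx = sigma ** 2 * (1.0 - a * lam - lam ** 2)   # Var[X | X < lod]
        ex2 = varx + ex ** 2                             # E[X^2 | X < lod]
        # M-step: complete-data normal MLE with expected sufficient statistics.
        new_mu = (s1 + n_censored * ex) / n
        new_sigma = math.sqrt((s2 + n_censored * ex2) / n - new_mu ** 2)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


# Simulated log-intensities with a hypothetical LOD; values below it are "missing".
rng = random.Random(42)
true = [rng.gauss(10.0, 2.0) for _ in range(2000)]
LOD = 9.0
obs = [x for x in true if x >= LOD]
mu_hat, sigma_hat = censored_normal_em(obs, len(true) - len(obs), LOD)
naive = sum(obs) / len(obs)
print(f"censored MLE mean={mu_hat:.2f} sd={sigma_hat:.2f}; naive mean={naive:.2f}")
```

In this simulation the plain mean of the detected values is biased upwards because the non-detects are simply dropped, whereas the censored-likelihood estimate uses the information that roughly 30% of the values lie below the LOD.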
-
Statistical development and assessment of summary measures to account for isotopic clustering of Fourier transform mass spectrometry data in clinical diagnostic studies
Authors:
Alexia Kakourou,
Werner Vach,
Simone Nicolardi,
Yuri van der Burgt,
Bart Mertens
Abstract:
Mass-spectrometry-based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data. Statistical methods for optimally processing proteomic data are a growing field of research. In this paper, we present simple yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data collected in a case-control design, while dealing with the statistical challenges that accompany such data. The known statistical properties of the isotopic distribution of peptide molecules are used to preprocess the spectra and translate the proteomic expression into a condensed data set. Information on either the intensity level or the shape of the identified isotopic clusters is used to derive summary measures on which diagnostic rules for disease-status allocation are based. Results indicate that both the shape of the identified isotopic clusters and the overall intensity level carry information on the class outcome and can be used to predict the presence or absence of disease.
Submitted 9 February, 2016;
originally announced February 2016.
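Summary measures of the intensity level and the shape of an isotopic cluster can be sketched as follows. The example assumes, as a common simplification, that the isotopic envelope follows a Poisson distribution in the isotope index (monoisotopic peak at index 0); the mean isotope index then serves as a moment estimate of the Poisson rate, and the squared deviation of the observed peak proportions from the fitted Poisson profile acts as a crude shape-discrepancy measure. These particular measures and the function `cluster_summaries` are hypothetical illustrations, not the summary measures defined in the paper.

```python
import math


def cluster_summaries(intensities):
    """Hypothetical summary measures for one isotopic cluster.
    intensities[k] is the intensity of the k-th isotopic peak (monoisotopic first)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("cluster must have positive total intensity")
    # Intensity level: log of the summed peak intensities.
    log_total = math.log(total)
    # Shape: relative peak intensities, independent of the overall level.
    props = [x / total for x in intensities]
    # Under a Poisson approximation, the mean isotope index estimates the rate.
    rate = sum(k * p for k, p in enumerate(props))
    fitted = [math.exp(-rate) * rate ** k / math.factorial(k) for k in range(len(props))]
    # Crude discrepancy between the observed and the fitted Poisson shape.
    deviation = sum((p - f) ** 2 for p, f in zip(props, fitted))
    return {
        "log_total": log_total,
        "proportions": props,
        "poisson_rate": rate,
        "shape_deviation": deviation,
    }


# Usage: an idealised Poisson-shaped cluster versus a distorted one.
ideal = [1000 * math.exp(-1.5) * 1.5 ** k / math.factorial(k) for k in range(12)]
s_ideal = cluster_summaries(ideal)
s_odd = cluster_summaries([100.0, 10.0, 200.0])
print(round(s_ideal["poisson_rate"], 3), round(s_odd["shape_deviation"], 3))
```

Separating a level measure (`log_total`) from shape measures (`proportions`, `poisson_rate`, `shape_deviation`) makes it possible to use either kind of information, or both, as input to a diagnostic rule.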