On the influence of several factors on pathway enrichment analysis
Authors:
Sarah Mubeen,
Alpha Tom Kodamullil,
Martin Hofmann-Apitius,
Daniel Domingo-Fernández
Abstract:
Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors that may not be accounted for can impact the results of such an analysis. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish set guidelines, meaningful results are still hampered by a lack of consensus or gold standards around how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview of the influence of these factors. Our work covers a broad spectrum of factors, spanning from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting, or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges that originate from the outlined factors.
Submitted 14 January, 2022;
originally announced January 2022.
Integrative Data Semantics through a Model-enabled Data Stewardship
Authors:
Philipp Wegner,
Sebastian Schaaf,
Mischa Uebachs,
Daniel Domingo-Fernández,
Yasamin Salimi,
Stephan Gebel,
Astghik Sargsyan,
Colin Birkenbihl,
Stephan Springstubbe,
Thomas Klockgether,
Juliane Fluck,
Martin Hofmann-Apitius,
Alpha Tom Kodamullil
Abstract:
Motivation: The importance of clinical data in understanding the pathophysiology of complex disorders has prompted the launch of multiple initiatives designed to generate patient-level data from various modalities. While these studies can reveal important findings relevant to the disease, each study captures different yet complementary aspects and modalities which, when combined, generate a more comprehensive picture of disease aetiology. However, achieving this requires a global integration of data across studies, which proves to be challenging given the lack of interoperability of cohort datasets. Results: Here, we present the Data Steward Tool (DST), an application that allows for semi-automatic semantic integration of clinical data into ontologies, global data models, and data standards. We demonstrate the applicability of the tool in the field of dementia research by establishing a Clinical Data Model (CDM) in this domain. The CDM currently consists of 277 common variables covering demographics (e.g. age and gender), diagnostics, neuropsychological tests, and biomarker measurements. The DST, combined with this disease-specific data model, shows how interoperability between multiple, heterogeneous dementia datasets can be achieved.
Submitted 17 November, 2021;
originally announced November 2021.