-
The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications
Authors:
Philippe Brouillard,
Chandler Squires,
Jonas Wahl,
Konrad P. Kording,
Karen Sachs,
Alexandre Drouin,
Dhanya Sridhar
Abstract:
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientific disciplines. However, its real-world applications remain limited. Current methods often rely on unrealistic assumptions and are evaluated only on simple synthetic toy datasets, often with inadequate evaluation metrics. In this paper, we substantiate these cl…
▽ More
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientific disciplines. However, its real-world applications remain limited. Current methods often rely on unrealistic assumptions and are evaluated only on simple synthetic toy datasets, often with inadequate evaluation metrics. In this paper, we substantiate these claims by performing a systematic review of the recent causal discovery literature. We present applications in biology, neuroscience, and Earth sciences - fields where causal discovery holds promise for addressing key challenges. We highlight available simulated and real-world datasets from these domains and discuss common assumption violations that have spurred the development of new methods. Our goal is to encourage the community to adopt better evaluation practices by utilizing realistic datasets and more adequate metrics.
△ Less
Submitted 13 June, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Do-calculus enables estimation of causal effects in partially observed biomolecular pathways
Authors:
Sara Mohammad-Taheri,
Jeremy Zucker,
Charles Tapley Hoyt,
Karen Sachs,
Vartika Tewari,
Robert Ness,
and Olga Vitek
Abstract:
Estimating causal queries, such as changes in protein abundance in response to a perturbation, is a fundamental task in the analysis of biomolecular pathways. The estimation requires experimental measurements on the pathway components. However, in practice many pathway components are left unobserved (latent) because they are either unknown, or difficult to measure. Latent variable models (LVMs) ar…
▽ More
Estimating causal queries, such as changes in protein abundance in response to a perturbation, is a fundamental task in the analysis of biomolecular pathways. The estimation requires experimental measurements on the pathway components. However, in practice many pathway components are left unobserved (latent) because they are either unknown, or difficult to measure. Latent variable models (LVMs) are well-suited for such estimation. Unfortunately, LVM-based estimation of causal queries can be inaccurate when parameters of the latent variables are not uniquely identified, or when the number of latent variables is misspecified. This has limited the use of LVMs for causal inference in biomolecular pathways. In this manuscript, we propose a general and practical approach for LVM-based estimation of causal queries. We prove that, despite the challenges above, LVM-based estimators of causal queries are accurate if the queries are identifiable according to Pearl's do-calculus, and describe an algorithm for its estimation. We illustrate the breadth and the practical utility of this approach for estimating causal queries in four synthetic and two experimental case studies, where structures of biomolecular pathways challenge the existing methods for causal query estimation. The code and the data documenting all the case studies are available at \url{https://github.com/srtaheri/LVMwithDoCalculus}
△ Less
Submitted 24 October, 2022; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Graph-Sparse Logistic Regression
Authors:
Alexander LeNail,
Ludwig Schmidt,
Johnathan Li,
Tobias Ehrenberger,
Karen Sachs,
Stefanie Jegelka,
Ernest Fraenkel
Abstract:
We introduce Graph-Sparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We val- idate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experim…
▽ More
We introduce Graph-Sparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We val- idate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package.
△ Less
Submitted 14 December, 2017;
originally announced December 2017.
-
Cloud Usage Patterns: A Formalism for Description of Cloud Usage Scenarios
Authors:
Aleksandar Milenkoski,
Alexandru Iosup,
Samuel Kounev,
Kai Sachs,
Piotr Rygielski,
Jason Ding,
Walfredo Cirne,
Florian Rosenberg
Abstract:
Cloud computing is becoming an increasingly lucrative branch of the existing information and communication technologies (ICT). Enabling a debate about cloud usage scenarios can help with attracting new customers, sharing best-practices, and designing new cloud services. In contrast to previous approaches, which have attempted mainly to formalize the common service delivery models (i.e., Infrastruc…
▽ More
Cloud computing is becoming an increasingly lucrative branch of the existing information and communication technologies (ICT). Enabling a debate about cloud usage scenarios can help with attracting new customers, sharing best-practices, and designing new cloud services. In contrast to previous approaches, which have attempted mainly to formalize the common service delivery models (i.e., Infrastructure-as-a-Service, Platform-as-a-Service, and Software-as-a-Service), in this work, we propose a formalism for describing common cloud usage scenarios referred to as cloud usage patterns. Our formalism takes a structuralist approach allowing decomposition of a cloud usage scenario into elements corresponding to the common cloud service delivery models. Furthermore, our formalism considers several cloud usage patterns that have recently emerged, such as hybrid services and value chains in which mediators are involved, also referred to as value chains with mediators. We propose a simple yet expressive textual and visual language for our formalism, and we show how it can be used in practice for describing a variety of real-world cloud usage scenarios. The scenarios for which we demonstrate our formalism include resource provisioning of global providers of infrastructure and/or platform resources, online social networking services, user-data processing services, online customer and ticketing services, online asset management and banking applications, CRM (Customer Relationship Management) applications, and online social gaming applications.
△ Less
Submitted 5 October, 2014;
originally announced October 2014.