Causal Discovery over High-Dimensional Structured Hypothesis Spaces with Causal Graph Partitioning

Authors: Ashka Shah, Adela DePavia, Nathaniel Hudson, Ian Foster, Rick Stevens

Abstract: The aim in many sciences is to understand the mechanisms that underlie the observed distribution of variables, starting from a set of initial hypotheses. Causal discovery allows us to infer mechanisms as sets of cause and effect relationships in a generalized way -- without necessarily tailoring to a specific domain. Causal discovery algorithms search over a structured hypothesis space, defined by the set of directed acyclic graphs, to find the graph that best explains the data. For high-dimensional problems, however, this search becomes intractable and scalable algorithms for causal discovery are needed to bridge the gap. In this paper, we define a novel causal graph partition that allows for divide-and-conquer causal discovery with theoretical guarantees. We leverage the idea of a superstructure -- a set of learned or existing candidate hypotheses -- to partition the search space. We prove under certain assumptions that learning with a causal graph partition always yields the Markov Equivalence Class of the true causal graph. We show our algorithm achieves comparable accuracy and a faster time to solution for biologically-tuned synthetic networks and networks up to $10^4$ variables. This makes our method applicable to gene regulatory network inference and other domains with high-dimensional structured hypothesis spaces.

Submitted 3 March, 2025; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: TMLR 03/2025

arXiv:2305.13402

Error-Tolerant Exact Query Learning of Finite Set Partitions with Same-Cluster Oracle

Authors: Adela Frances DePavia, Olga Medrano Martín del Campo, Erasmo Tani

Abstract: This paper initiates the study of active learning for exact recovery of partitions exclusively through access to a same-cluster oracle in the presence of bounded adversarial error. We first highlight a novel connection between learning partitions and correlation clustering. Then we use this connection to build a Rényi-Ulam style analytical framework for this problem, and prove upper and lower bounds on its worst-case query complexity. Further, we bound the expected performance of a relevant randomized algorithm. Finally, we study the relationship between adaptivity and query complexity for this problem and related variants.

Submitted 16 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 28 pages, 2 figures
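
Illustrative note: the query model in the abstract above can be sketched for the error-free case only. This is a simple baseline that compares each element against one representative per discovered cluster; it is not the paper's error-tolerant method, and the names `learn_partition`, `same_cluster`, and the toy ground truth are assumptions made for illustration.

```python
def learn_partition(elements, same_cluster):
    """Recover a partition using only same-cluster queries.

    Assumes the oracle never errs (the paper studies the harder case of
    bounded adversarial error, which this baseline does not handle).
    Returns the clusters found and the number of queries issued.
    """
    clusters = []  # clusters[i][0] serves as the representative of cluster i
    queries = 0
    for x in elements:
        for cluster in clusters:
            queries += 1
            if same_cluster(x, cluster[0]):
                cluster.append(x)
                break
        else:
            # x matched no known representative, so it starts a new cluster
            clusters.append([x])
    return clusters, queries


# Hypothetical ground truth, hidden from the learner behind the oracle.
truth = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}


def oracle(u, v):
    return truth[u] == truth[v]


clusters, queries = learn_partition(list(truth), oracle)
```

With n elements and k clusters, this baseline issues at most n·k queries, since each element is compared with at most one representative per cluster.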

arXiv:2003.09969

Spectral Clustering Revisited: Information Hidden in the Fiedler Vector

Authors: Adela DePavia, Stefan Steinerberger

Abstract: We are interested in the clustering problem on graphs: it is known that if there are two underlying clusters, then the signs of the eigenvector corresponding to the second largest eigenvalue of the adjacency matrix can reliably reconstruct the two clusters. We argue that the vertices for which the eigenvector has the largest and the smallest entries, respectively, are unusually strongly connected to their own cluster and more reliably classified than the rest. This can be regarded as a discrete version of the Hot Spots conjecture and should be useful in applications. We give a rigorous proof for the stochastic block model and several examples.

Submitted 22 March, 2020; originally announced March 2020.
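
Illustrative note: the sign-based clustering described in the abstract above can be sketched on a small deterministic graph. The example below assumes a toy graph of two 5-cliques joined by a single bridge edge (not the stochastic block model analyzed in the paper); the helper names are hypothetical, and only NumPy's `numpy.linalg.eigh` is used.

```python
import numpy as np


def two_cliques_with_bridge(k):
    """Adjacency matrix of two k-cliques joined by the edge (k-1, k)."""
    n = 2 * k
    A = np.zeros((n, n))
    A[:k, :k] = 1.0
    A[k:, k:] = 1.0
    np.fill_diagonal(A, 0.0)          # no self-loops
    A[k - 1, k] = A[k, k - 1] = 1.0   # the single bridge edge
    return A


def second_eigenvector(A):
    """Eigenvector for the second largest eigenvalue of a symmetric matrix."""
    # eigh returns eigenvalues in ascending order, so column -2 is the
    # eigenvector of the second largest eigenvalue.
    _, vecs = np.linalg.eigh(A)
    return vecs[:, -2]


A = two_cliques_with_bridge(5)
v = second_eigenvector(A)

# Cluster assignment from the signs of the entries (defined up to a global flip).
labels = (v > 0).astype(int)

# Vertices sorted from most to least extreme entry; the abstract argues the
# most extreme entries belong to the most reliably classified vertices.
confidence_order = np.argsort(-np.abs(v))
```

In this toy graph the two bridge endpoints (vertices 4 and 5) have the smallest entries in absolute value, which matches the intuition that vertices with edges leaving their cluster are the least strongly tied to it.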

Showing 1–3 of 3 results for author: DePavia, A