Search | arXiv e-print repository

Controlling for discrete unmeasured confounding in nonlinear causal models

Authors: Patrick Burauel, Frederick Eberhardt, Michel Besserve

Abstract: Unmeasured confounding is a major challenge for identifying causal relationships from non-experimental data. Here, we propose a method that can accommodate unmeasured discrete confounding. Extending recent identifiability results in deep latent variable models, we show theoretically that confounding can be detected and corrected under the assumption that the observed data is a piecewise affine transformation of a latent Gaussian mixture model and that the identity of the mixture components is confounded. We provide a flow-based algorithm to estimate this model and perform deconfounding. Experimental results on synthetic and real-world data provide support for the effectiveness of our approach.

Submitted 10 August, 2024; originally announced August 2024.

arXiv:2312.13438 [pdf, ps, other]

Independent Mechanism Analysis and the Manifold Hypothesis

Authors: Shubhangi Ghosh, Luigi Gresele, Julius von Kügelgen, Michel Besserve, Bernhard Schölkopf

Abstract: Independent Mechanism Analysis (IMA) seeks to address non-identifiability in nonlinear Independent Component Analysis (ICA) by assuming that the Jacobian of the mixing function has orthogonal columns. As is typical in ICA, previous work focused on the case with an equal number of latent components and observed mixtures. Here, we extend IMA to settings with a larger number of mixtures that reside on a manifold embedded in a space of higher dimension than the latent space -- in line with the manifold hypothesis in representation learning. For this setting, we show that IMA still circumvents several non-identifiability issues, suggesting that it can also be a beneficial principle for higher-dimensional observations when the manifold hypothesis holds. Further, we prove that the IMA principle is approximately satisfied with high probability (increasing with the number of observed mixtures) when the directions along which the latent components influence the observations are chosen independently at random. This provides a new and rigorous statistical interpretation of IMA.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 6 pages, Accepted at NeurIPS Causal Representation Learning 2023

arXiv:2311.18639 [pdf, other]

Targeted Reduction of Causal Models

Authors: Armin Kekić, Bernhard Schölkopf, Michel Besserve

Abstract: Why does a phenomenon occur? Addressing this question is central to most scientific inquiries and often relies on simulations of scientific models. As models become more intricate, deciphering the causes behind phenomena in high-dimensional spaces of interconnected variables becomes increasingly challenging. Causal Representation Learning (CRL) offers a promising avenue to uncover interpretable causal patterns within these simulations through an interventional lens. However, developing general CRL frameworks suitable for practical applications remains an open challenge. We introduce Targeted Causal Reduction (TCR), a method for condensing complex intervenable models into a concise set of causal factors that explain a specific target phenomenon. We propose an information theoretic objective to learn TCR from interventional data of simulations, establish identifiability for continuous variables under shift interventions and present a practical algorithm for learning TCRs. Its ability to generate interpretable high-level explanations from complex models is demonstrated on toy and mechanical systems, illustrating its potential to assist scientists in the study of complex phenomena in a broad range of disciplines.

Submitted 3 June, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

arXiv:2306.00907 [pdf, other]

A site-site interaction two-dimensional model with water like structural properties

Authors: Tangi Baré, Maxime Besserve, Tomaz Urbic, Aurélien Perera

Abstract: A site-site interaction model is proposed for water in two dimensions, as an alternative to the traditional Mercedes-Benz (MB) model. In the MB model, water molecules are modeled as 2-dimensional Lennard-Jones disks with three hydrogen bonding arms arranged symmetrically, resembling the Mercedes-Benz logo. The MB model qualitatively predicts both the anomalous properties of pure water and the anomalous solvation thermodynamics of non-polar molecules. One of the features of this earlier model was a pair correlation function with a first peak for the Lennard-Jones contact distinct from that corresponding to the hydrogen bonding, which is very different from real water, which has a single first peak but a dual peak for the structure factor. The site-site model proposed here reproduces this typical feature of real water, both in real and reciprocal space. It also reproduces several of the known anomalies of real water, such as the density maximum. In addition, because of the screened Coulomb interaction between the sites, the new model appears to exhibit more homogeneity than the MB models and their variants, the latter's inhomogeneity being highlighted by a k=0 increase of their structure factors. The new model transfers the usual bond order paradigm into a charge order paradigm, enforcing atom-atom interactions over orientational interactions.

Submitted 1 June, 2023; originally announced June 2023.

Comments: 24 pages, 17 figures

arXiv:2306.00542 [pdf, other]

Nonparametric Identifiability of Causal Representations from Unknown Interventions

Authors: Julius von Kügelgen, Michel Besserve, Liang Wendong, Luigi Gresele, Armin Kekić, Elias Bareinboim, David M. Blei, Bernhard Schölkopf

Abstract: We study causal representation learning, the task of inferring latent causal variables and their causal relations from high-dimensional mixtures of the variables. Prior work relies on weak supervision, in the form of counterfactual pre- and post-intervention views or temporal structure; places restrictive assumptions, such as linearity, on the mixing function or latent causal model; or requires partial knowledge of the generative process, such as the causal graph or intervention targets. We instead consider the general setting in which both the causal model and the mixing function are nonparametric. The learning signal takes the form of multiple datasets, or environments, arising from unknown interventions in the underlying causal model. Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data. We study the fundamental setting of two causal variables and prove that the observational distribution and one perfect intervention per node suffice for identifiability, subject to a genericity condition. This condition rules out spurious solutions that involve fine-tuning of the intervened and observational distributions, mirroring similar conditions for nonlinear cause-effect inference. For an arbitrary number of variables, we show that at least one pair of distinct perfect interventional domains per node guarantees identifiability. Further, we demonstrate that the strengths of causal influences among the latent variables are preserved by all equivalent solutions, rendering the inferred representation appropriate for drawing causal conclusions from new data. Our study provides the first identifiability results for the general nonparametric setting with unknown interventions, and elucidates what is possible and impossible for causal representation learning without more direct supervision.

Submitted 28 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 camera-ready version; 36 pages, 4 figures

MSC Class: 68T05 ACM Class: I.2.6

arXiv:2305.17225 [pdf, other]

Causal Component Analysis

Authors: Liang Wendong, Armin Kekić, Julius von Kügelgen, Simon Buchholz, Michel Besserve, Luigi Gresele, Bernhard Schölkopf

Abstract: Independent Component Analysis (ICA) aims to recover independent latent variables from observed mixtures thereof. Causal Representation Learning (CRL) aims instead to infer causally related (thus often statistically dependent) latent variables, together with the unknown graph encoding their causal relationships. We introduce an intermediate problem termed Causal Component Analysis (CauCA). CauCA can be viewed as a generalization of ICA, modelling the causal dependence among the latent components, and as a special case of CRL. In contrast to CRL, it presupposes knowledge of the causal graph, focusing solely on learning the unmixing function and the causal mechanisms. Any impossibility results regarding the recovery of the ground truth in CauCA also apply for CRL, while possibility results may serve as a stepping stone for extensions to CRL. We characterize CauCA identifiability from multiple datasets generated through different types of interventions on the latent causal variables. As a corollary, this interventional perspective also leads to new identifiability results for nonlinear ICA -- a special case of CauCA with an empty graph -- requiring strictly fewer datasets than previous results. We introduce a likelihood-based approach using normalizing flows to estimate both the unmixing function and the causal mechanisms, and demonstrate its effectiveness through extensive synthetic experiments in the CauCA and ICA setting.

Submitted 17 January, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023 final camera-ready version

arXiv:2209.07508 [pdf, other]

Information Theoretic Measures of Causal Influences during Transient Neural Events

Authors: Kaidi Shao, Nikos K. Logothetis, Michel Besserve

Abstract: Transient phenomena play a key role in coordinating brain activity at multiple scales; however, their underlying mechanisms remain largely unknown. A key challenge for neural data science is thus to characterize the network interactions at play during these events. Using the formalism of Structural Causal Models and their graphical representation, we investigate the theoretical and empirical properties of Information Theory based causal strength measures in the context of recurring spontaneous transient events. After showing the limitations of Transfer Entropy and Dynamic Causal Strength in such a setting, we introduce a novel measure, relative Dynamic Causal Strength, and provide theoretical and empirical support for its benefits. These methods are applied to simulated and experimentally recorded neural time series, and provide results in agreement with our current understanding of the underlying brain circuits.

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2208.06406 [pdf, other]

Function Classes for Identifiable Nonlinear Independent Component Analysis

Authors: Simon Buchholz, Michel Besserve, Bernhard Schölkopf

Abstract: Unsupervised learning of latent variable models (LVMs) is widely used to represent data in machine learning. When such models reflect the ground truth factors and the mechanisms mapping them to observations, there is reason to expect that they allow generalization in downstream tasks. It is however well known that such identifiability guarantees are typically not achievable without putting constraints on the model class. This is notably the case for nonlinear Independent Component Analysis, in which the LVM maps statistically independent variables to observations via a deterministic nonlinear function. Several families of spurious solutions that fit the data perfectly but do not correspond to the ground truth factors can be constructed in generic settings. However, recent work suggests that constraining the function class of such models may promote identifiability. Specifically, function classes with constraints on their partial derivatives, gathered in the Jacobian matrix, have been proposed, such as orthogonal coordinate transformations (OCT), which impose orthogonality of the Jacobian columns. In the present work, we prove that a subclass of these transformations, conformal maps, is identifiable and provide novel theoretical results suggesting that OCTs have properties that prevent families of spurious solutions from spoiling identifiability in a generic setting.

Submitted 12 August, 2022; originally announced August 2022.

Comments: 43 pages

Journal ref: NeurIPS 2022

arXiv:2207.12067 [pdf, other]

Homomorphism Autoencoder -- Learning Group Structured Representations from Observed Transitions

Authors: Hamza Keurti, Hsiao-Ru Pan, Michel Besserve, Benjamin F. Grewe, Bernhard Schölkopf

Abstract: How agents can learn internal models that veridically represent interactions with the real world is a largely open question. As machine learning is moving towards representations containing not just observational but also interventional knowledge, we study this problem using tools from representation learning and group theory. We propose methods enabling an agent acting upon the world to learn internal representations of sensory information that are consistent with actions that modify it. We use an autoencoder equipped with a group representation acting on its latent space, trained using an equivariance-derived loss in order to enforce a suitable homomorphism property on the group representation. In contrast to existing work, our approach does not require prior knowledge of the group and does not restrict the set of actions the agent can perform. We motivate our method theoretically, and show empirically that it can learn a group representation of the actions, thereby capturing the structure of the set of transformations applied to the environment. We further show that this allows agents to predict the effect of sequences of future actions with improved accuracy.

Submitted 2 July, 2024; v1 submitted 25 July, 2022; originally announced July 2022.

Comments: Accepted at ICML2023, Presented at the Symmetry and Geometry in Neural Representations Workshop (NeurReps) @ NeurIPS2022, 26 pages, 17 figures

arXiv:2206.02416 [pdf, other]

Embrace the Gap: VAEs Perform Independent Mechanism Analysis

Authors: Patrik Reizinger, Luigi Gresele, Jack Brady, Julius von Kügelgen, Dominik Zietlow, Bernhard Schölkopf, Georg Martius, Wieland Brendel, Michel Besserve

Abstract: Variational autoencoders (VAEs) are a popular framework for modeling complex data distributions; they can be efficiently trained via variational inference by maximizing the evidence lower bound (ELBO), at the expense of a gap to the exact (log-)marginal likelihood. While VAEs are commonly used for representation learning, it is unclear why ELBO maximization would yield useful representations, since unregularized maximum likelihood estimation cannot invert the data-generating process. Yet, VAEs often succeed at this task. We seek to elucidate this apparent paradox by studying nonlinear VAEs in the limit of near-deterministic decoders. We first prove that, in this regime, the optimal encoder approximately inverts the decoder -- a commonly used but unproven conjecture -- which we refer to as self-consistency. Leveraging self-consistency, we show that the ELBO converges to a regularized log-likelihood. This allows VAEs to perform what has recently been termed independent mechanism analysis (IMA): it adds an inductive bias towards decoders with column-orthogonal Jacobians, which helps recover the true latent factors. The gap between ELBO and log-likelihood is therefore welcome, since it bears unanticipated benefits for nonlinear representation learning. In experiments on synthetic and image data, we show that VAEs uncover the true latent factors when the data generating process satisfies the IMA assumption.

Submitted 27 January, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: NeurIPS2022 final version

arXiv:2204.14096 [pdf, other]

Bayesian Information Criterion for Event-based Multi-trial Ensemble data

Authors: Kaidi Shao, Nikos K. Logothetis, Michel Besserve

Abstract: Transient recurring phenomena are ubiquitous in many scientific fields like neuroscience and meteorology. Time inhomogeneous Vector Autoregressive Models (VAR) may be used to characterize peri-event system dynamics associated with such phenomena, and can be learned by exploiting multi-dimensional data gathering samples of the evolution of the system in multiple time windows, each associated with one occurrence of the transient phenomenon, which we call a "trial". However, optimal VAR model order selection methods, commonly relying on the Akaike or Bayesian Information Criteria (AIC/BIC), are typically not designed for multi-trial data. Here we derive BIC methods for multi-trial ensemble data which are gathered after the detection of the events. We show using simulated bivariate AR models that the multi-trial BIC is able to recover the real model order. We also demonstrate with simulated transient events and real data that the multi-trial BIC is able to estimate a sufficiently small model order for dynamic system modeling.

Submitted 29 April, 2022; originally announced April 2022.

Comments: 12 pages, 4 figures

arXiv:2202.06844 [pdf, other]

On Pitfalls of Identifiability in Unsupervised Learning. A Note on: "Desiderata for Representation Learning: A Causal Perspective"

Authors: Shubhangi Ghosh, Luigi Gresele, Julius von Kügelgen, Michel Besserve, Bernhard Schölkopf

Abstract: Model identifiability is a desirable property in the context of unsupervised representation learning. In absence thereof, different models may be observationally indistinguishable while yielding representations that are nontrivially related to one another, thus making the recovery of a ground truth generative model fundamentally impossible, as often shown through suitably constructed counterexamples. In this note, we discuss one such construction, illustrating a potential failure case of an identifiability result presented in "Desiderata for Representation Learning: A Causal Perspective" by Wang & Jordan (2021). The construction is based on the theory of nonlinear independent component analysis. We comment on implications of this and other counterexamples for identifiable representation learning.

Submitted 14 February, 2022; originally announced February 2022.

Comments: 5 pages, 1 figure

arXiv:2112.05729 [pdf, other]

Learning soft interventions in complex equilibrium systems

Authors: Michel Besserve, Bernhard Schölkopf

Abstract: Complex systems often contain feedback loops that can be described as cyclic causal models. Intervening in such systems may lead to counterintuitive effects, which cannot be inferred directly from the graph structure. After establishing a framework for differentiable soft interventions based on Lie groups, we take advantage of modern automatic differentiation techniques and their application to implicit functions in order to optimize interventions in cyclic causal models. We illustrate the use of this framework by investigating scenarios of transition to sustainable economies.

Submitted 14 December, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

arXiv:2110.15595 [pdf, other]

Cause-effect inference through spectral independence in linear dynamical systems: theoretical foundations

Authors: Michel Besserve, Naji Shajarisales, Dominik Janzing, Bernhard Schölkopf

Abstract: Distinguishing between cause and effect using time series observational data is a major challenge in many scientific fields. A new perspective has been provided based on the principle of Independence of Causal Mechanisms (ICM), leading to the Spectral Independence Criterion (SIC), postulating that the power spectral density (PSD) of the cause time series is uncorrelated with the squared modulus of the frequency response of the filter generating the effect. Since SIC rests on methods and assumptions in stark contrast with most causal discovery methods for time series, it raises questions regarding what theoretical grounds justify its use. In this paper, we provide answers covering several key aspects. After providing an information theoretic interpretation of SIC, we present an identifiability result that sheds light on the context for which this approach is expected to perform well. We further demonstrate the robustness of SIC to downsampling - an obstacle that can spoil Granger-based inference. Finally, an invariance perspective allows us to explore the limitations of the spectral independence assumption and how to generalize it. Overall, these results support the postulate that Spectral Independence is a well-grounded leading principle for causal inference based on empirical time series.

Submitted 29 October, 2021; originally announced October 2021.

arXiv:2106.16091 [pdf, other]

Exploring the Latent Space of Autoencoders with Interventional Assays

Authors: Felix Leeb, Stefan Bauer, Michel Besserve, Bernhard Schölkopf

Abstract: Autoencoders exhibit impressive abilities to embed the data manifold into a low-dimensional latent space, making them a staple of representation learning methods. However, without explicit supervision, which is often unavailable, the representation is usually uninterpretable, making analysis and principled progress challenging. We propose a framework, called latent responses, which exploits the locally contractive behavior exhibited by variational autoencoders to explore the learned manifold. More specifically, we develop tools to probe the representation using interventions in the latent space to quantify the relationships between latent variables. We extend the notion of disentanglement to take the learned generative process into account and consequently avoid the limitations of existing metrics that may rely on spurious correlations. Our analyses underscore the importance of studying the causal structure of the representation to improve performance on downstream tasks such as generation, interpolation, and inference of the factors of variation.

Submitted 11 January, 2023; v1 submitted 30 June, 2021; originally announced June 2021.

Comments: Published in NeurIPS 2022 Conference Proceedings

arXiv:2106.05200 [pdf, other]

Independent mechanism analysis, a new concept?

Authors: Luigi Gresele, Julius von Kügelgen, Vincent Stimper, Bernhard Schölkopf, Michel Besserve

Abstract: Independent component analysis provides a principled framework for unsupervised representation learning, with solid theory on the identifiability of the latent code that generated the data, given only observations of mixtures thereof. Unfortunately, when the mixing is nonlinear, the model is provably nonidentifiable, since statistical independence alone does not sufficiently constrain the problem. Identifiability can be recovered in settings where additional, typically observed variables are included in the generative process. We investigate an alternative path and consider instead including assumptions reflecting the principle of independent causal mechanisms exploited in the field of causality. Specifically, our approach is motivated by thinking of each source as independently influencing the mixing process. This gives rise to a framework which we term independent mechanism analysis. We provide theoretical and empirical evidence that our approach circumvents a number of nonidentifiability issues arising in nonlinear blind source separation.

Submitted 9 February, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 final camera-ready version

arXiv:2106.04619 [pdf, other]

Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style

Authors: Julius von Kügelgen, Yash Sharma, Luigi Gresele, Wieland Brendel, Bernhard Schölkopf, Michel Besserve, Francesco Locatello

Abstract: Self-supervised representation learning has shown remarkable success in a number of domains. A common practice is to perform data augmentation via hand-crafted transformations intended to leave the semantics of the data invariant. We seek to understand the empirical success of this approach from a theoretical perspective. We formulate the augmentation process as a latent variable model by postulating a partition of the latent representation into a content component, which is assumed invariant to augmentation, and a style component, which is allowed to change. Unlike prior work on disentanglement and independent component analysis, we allow for both nontrivial statistical and causal dependencies in the latent space. We study the identifiability of the latent representation based on pairs of views of the observations and prove sufficient conditions that allow us to identify the invariant content partition up to an invertible mapping in both generative and discriminative settings. We find numerical simulations with dependent latent variables are consistent with our theory. Lastly, we introduce Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, which we use to study the effect of data augmentations performed in practice.

Submitted 14 January, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 final camera-ready revision (with minor corrections)

arXiv:2012.01912 [pdf, other]

Assaying Large-scale Testing Models to Interpret COVID-19 Case Numbers

Authors: Michel Besserve, Simon Buchholz, Bernhard Schölkopf

Abstract: Large-scale testing is considered key to assess the state of the current COVID-19 pandemic. Yet, the link between the reported case numbers and the true state of the pandemic remains elusive. We develop mathematical models based on competing hypotheses regarding this link, thereby providing different prevalence estimates based on case numbers, and validate them by predicting SARS-CoV-2-attributed death rate trajectories. Assuming that individuals were tested based solely on a predefined risk of being infectious implies that the absolute case numbers reflect the prevalence, but this assumption turned out to be a poor predictor, consistently overestimating growth rates at the beginning of two COVID-19 epidemic waves. In contrast, assuming that testing capacity is fully exploited performs better. This leads to using the percent-positive rate as a more robust indicator of epidemic dynamics; however, we find it is subject to a saturation phenomenon that needs to be accounted for as the number of tests becomes larger.

Submitted 3 February, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: 41 pages, 7 figures

arXiv:2010.05375 [pdf, other]

Causal learning with sufficient statistics: an information bottleneck approach

Authors: Daniel Chicharro, Michel Besserve, Stefano Panzeri

Abstract: The inference of causal relationships using observational data from partially observed multivariate systems with hidden variables is a fundamental question in many scientific domains. Methods extracting causal information from conditional independencies between variables of a system are common tools for this purpose, but are limited when such independencies are lacking. To surmount this limitation, we capitalize on the fact that the laws governing the generative mechanisms of a system often result in substructures embodied in the generative functional equation of a variable, which act as sufficient statistics for the influence that other variables have on it. These functional sufficient statistics constitute intermediate hidden variables providing new conditional independencies to be tested. We propose to use the Information Bottleneck method, a technique commonly applied for dimensionality reduction, to find underlying sufficient sets of statistics. Using these statistics we formulate new additional rules of causal orientation that provide causal information not obtainable from standard structure learning algorithms, which exploit only conditional independencies between observable variables. We validate the use of sufficient statistics for structure learning both with simulated systems built to contain specific sufficient statistics and with benchmark data from regulatory rules previously and independently proposed to model biological signal transduction networks.

Submitted 11 October, 2020; originally announced October 2020.

MSC Class: 94A16; 62D20; 62H22; 62Bxx

arXiv:2007.02938 [pdf, other]

Causal Feature Selection via Orthogonal Search

Authors: Ashkan Soleymani, Anant Raj, Stefan Bauer, Bernhard Schölkopf, Michel Besserve

Abstract: The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. However, established approaches often scale at least exponentially with the number of explanatory variables, are difficult to extend to nonlinear relationships, and are difficult to extend to cyclic data. Inspired by Debiased machine learning methods, we study a one-vs.-the-rest feature selection approach to discover the direct causal parent of the response. We propose an algorithm that works for purely observational data while also offering theoretical guarantees, including the case of partially nonlinear relationships possibly under the presence of cycles. As it requires only one estimation for each variable, our approach is applicable even to large graphs. We demonstrate significant improvements compared to established approaches.

Submitted 16 September, 2022; v1 submitted 6 July, 2020; originally announced July 2020.

arXiv:2006.07796 [pdf, other]

Structure by Architecture: Structured Representations without Regularization

Authors: Felix Leeb, Guilia Lanzillotta, Yashas Annadani, Michel Besserve, Stefan Bauer, Bernhard Schölkopf

Abstract: We study the problem of self-supervised structured representation learning using autoencoders for downstream tasks such as generative modeling. Unlike most methods which rely on matching an arbitrary, relatively unstructured, prior distribution for sampling, we propose a sampling technique that relies solely on the independence of latent variables, thereby avoiding the trade-off between reconstruction quality and generative performance typically observed in VAEs. We design a novel autoencoder architecture capable of learning a structured representation without the need for aggressive regularization. Our structural decoders learn a hierarchy of latent variables, thereby ordering the information without any additional regularization or supervision. We demonstrate how these models learn a representation that improves results in a variety of downstream tasks including generation, disentanglement, and extrapolation using several challenging and natural image datasets.

Submitted 15 February, 2024; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: Published at ICLR 2023

arXiv:2005.04034 [pdf, ps, other]

From univariate to multivariate coupling between continuous signals and point processes: a mathematical framework

Authors: Shervin Safavi, Nikos K. Logothetis, Michel Besserve

Abstract: Time series datasets often contain heterogeneous signals, composed of both continuously changing quantities and discretely occurring events. The coupling between these measurements may provide insights into key underlying mechanisms of the systems under study. To better extract this information, we investigate the asymptotic statistical properties of coupling measures between continuous signals and point processes. We first introduce martingale stochastic integration theory as a mathematical model for a family of statistical quantities that include the Phase Locking Value, a classical coupling measure to characterize complex dynamics. Based on the martingale Central Limit Theorem, we can then derive the asymptotic Gaussian distribution of estimates of such coupling measures, which can be exploited for statistical testing. Second, based on multivariate extensions of this result and Random Matrix Theory, we establish a principled way to analyze the low rank coupling between a large number of point processes and continuous signals. For a null hypothesis of no coupling, we establish sufficient conditions for the empirical distribution of squared singular values of the matrix to converge, as the number of measured signals increases, to the well-known Marchenko-Pastur (MP) law, and for the largest squared singular value to converge to the upper end of the MP law's support. This justifies a simple thresholding approach to assess the significance of multivariate coupling. Finally, we illustrate with simulations the relevance of our univariate and multivariate results in the context of neural time series, addressing how to reliably quantify the interplay between multi-channel Local Field Potential signals and the spiking activity of a large population of neurons.

Submitted 8 May, 2020; originally announced May 2020.

Comments: 50 pages

arXiv:2004.00184 [pdf, other]

A theory of independent mechanisms for extrapolation in generative models

Authors: Michel Besserve, Rémy Sun, Dominik Janzing, Bernhard Schölkopf

Abstract: Generative models can be trained to emulate complex empirical data, but are they useful to make predictions in the context of previously unobserved environments? An intuitive idea to promote such extrapolation capabilities is to have the architecture of such a model reflect a causal graph of the true data generating process, such that one can intervene on each node independently of the others. However, the nodes of this graph are usually unobserved, leading to overparameterization and lack of identifiability of the causal structure. We develop a theoretical framework to address this challenging situation by defining a weaker form of identifiability, based on the principle of independence of mechanisms. We demonstrate on toy examples that classical stochastic gradient descent can hinder the model's extrapolation capabilities, suggesting independence of mechanisms should be enforced explicitly during training. Experiments on deep generative models trained on real world data support these insights and illustrate how the extrapolation capabilities of such models can be leveraged.

Submitted 31 December, 2021; v1 submitted 31 March, 2020; originally announced April 2020.

Comments: 21 pages

arXiv:1903.02456

Orthogonal Structure Search for Efficient Causal Discovery from Observational Data

Authors: Anant Raj, Luigi Gresele, Michel Besserve, Bernhard Schölkopf, Stefan Bauer

Abstract: The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. Recent work exploits stability of regression coefficients or invariance properties of models across different experimental conditions for reconstructing the full causal graph. These approaches generally do not scale well with the number of the explanatory variables and are difficult to extend to nonlinear relationships. Contrary to existing work, we propose an approach which even works for observational data alone, while still offering theoretical guarantees including the case of partially nonlinear relationships. Our algorithm requires only one estimation for each variable and in our experiments we apply our causal discovery algorithm even to large graphs, demonstrating significant improvements compared to well established approaches.

Submitted 6 July, 2020; v1 submitted 6 March, 2019; originally announced March 2019.

Comments: first author uploaded a new version as "Causal Feature Selection via Orthogonal Search"

arXiv:1812.03253 [pdf, other]

Counterfactuals uncover the modular structure of deep generative models

Authors: Michel Besserve, Arash Mehrjou, Rémy Sun, Bernhard Schölkopf

Abstract: Deep generative models can emulate the perceptual properties of complex image datasets, providing a latent representation of the data. However, manipulating such a representation to perform meaningful and controllable transformations in the data space remains challenging without some form of supervision. While previous work has focused on exploiting statistical independence to disentangle latent factors, we argue that such a requirement is too restrictive and propose instead a non-statistical framework that relies on counterfactual manipulations to uncover a modular structure of the network composed of disentangled groups of internal variables. Experiments with a variety of generative models trained on complex image datasets show the obtained modules can be used to design targeted interventions. This opens the way to applications such as computationally efficient style transfer and the automated assessment of robustness to contextual changes in pattern recognition systems.

Submitted 12 December, 2019; v1 submitted 7 December, 2018; originally announced December 2018.

Comments: 26 pages, 17 figures

arXiv:1803.06247 [pdf, ps, other]

Coordinating users of shared facilities via data-driven predictive assistants and game theory

Authors: Philipp Geiger, Michel Besserve, Justus Winkelmann, Claudius Proissl, Bernhard Schölkopf

Abstract: We study data-driven assistants that provide congestion forecasts to users of shared facilities (roads, cafeterias, etc.), to support coordination between them, and increase efficiency of such collective systems. Key questions are: (1) when and how much can (accurate) predictions help for coordination, and (2) which assistant algorithms reach optimal predictions? First we lay conceptual ground for this setting where user preferences are a priori unknown and predictions influence outcomes. Addressing (1), we establish conditions under which self-fulfilling prophecies, i.e., "perfect" (probabilistic) predictions of what will happen, solve the coordination problem in the game-theoretic sense of selecting a Bayesian Nash equilibrium (BNE). Next we prove that such prophecies exist even in large-scale settings where only aggregated statistics about users are available. This entails a new (nonatomic) BNE existence result. Addressing (2), we propose two assistant algorithms that sequentially learn from users' reactions, together with optimality/convergence guarantees. We validate one of them in a large real-world experiment.

Submitted 29 July, 2021; v1 submitted 16 March, 2018; originally announced March 2018.

Comments: Extended version, including supplement, of a paper at the 35th Conference on Uncertainty in Artificial Intelligence, 2019

arXiv:1707.06819 [pdf, ps, other]

A central limit like theorem for Fourier sums

Authors: Dominik Janzing, Naji Shajarisales, Michel Besserve

Abstract: We consider the probability distributions of values in the complex plane attained by Fourier sums of the form $\frac{1}{\sqrt{n}} \sum_{j=1}^n a_j \exp(-2\pi i j \nu)$ when the frequency $\nu$ is drawn uniformly at random from an interval of length 1. If the coefficients $a_j$ are i.i.d. drawn with finite third moment, the distance of these distributions to an isotropic two-dimensional Gaussian on $\mathbb{C}$ converges in probability to zero for any pseudometric on the set of distributions for which the distance between empirical distributions and the underlying distribution converges to zero in probability.

Submitted 21 July, 2017; originally announced July 2017.

Comments: 7 pages

MSC Class: 60Fxx

arXiv:1705.02212 [pdf, other]

Group invariance principles for causal generative models

Authors: Michel Besserve, Naji Shajarisales, Bernhard Schölkopf, Dominik Janzing

Abstract: The postulate of independence of cause and mechanism (ICM) has recently led to several new causal discovery algorithms. The interpretation of independence and the way it is utilized, however, varies across these methods. Our aim in this paper is to propose a group theoretic framework for ICM to unify and generalize these approaches. In our setting, the cause-mechanism relationship is assessed by comparing it against a null hypothesis through the application of random generic group transformations. We show that the group theoretic view provides a very general tool to study the structure of data generating mechanisms with direct applications to machine learning.

Submitted 5 May, 2017; originally announced May 2017.

Comments: 16 pages, 6 figures

ACM Class: I.2.6; I.2.10; G.3; I.5.3

arXiv:1503.01299 [pdf, ps, other]

Telling cause from effect in deterministic linear dynamical systems

Authors: Naji Shajarisales, Dominik Janzing, Bernhard Shoelkopf, Michel Besserve

Abstract: Inferring a cause from its effect using observed time series data is a major challenge in natural and social sciences. Assuming the effect is generated by the cause through a linear system, we propose a new approach based on the hypothesis that nature chooses the "cause" and the "mechanism that generates the effect from the cause" independently of each other. We therefore postulate that the power spectrum of the time series being the cause is uncorrelated with the square of the transfer function of the linear filter generating the effect. While most causal discovery methods for time series mainly rely on the noise, our method relies on asymmetries of the power spectral density properties that can be exploited even in the context of deterministic systems. We describe mathematical assumptions in a deterministic model under which the causal direction is identifiable with this approach. We also discuss the method's performance under the additive noise model and its relationship to Granger causality. Experiments show encouraging results on synthetic as well as real-world data. Overall, this suggests that the postulate of Independence of Cause and Mechanism is a promising principle for causal inference on empirical time series.

Submitted 4 March, 2015; originally announced March 2015.

Comments: This article is under review for a peer-reviewed conference

arXiv:1209.5549 [pdf, other]

Towards a learning-theoretic analysis of spike-timing dependent plasticity

Authors: David Balduzzi, Michel Besserve

Abstract: This paper suggests a learning-theoretic perspective on how synaptic plasticity benefits global brain functioning. We introduce a model, the selectron, that (i) arises as the fast time constant limit of leaky integrate-and-fire neurons equipped with spike-timing dependent plasticity (STDP) and (ii) is amenable to theoretical analysis. We show that the selectron encodes reward estimates into spikes and that an error bound on spikes is controlled by a spiking margin and the sum of synaptic weights. Moreover, the efficacy of spikes (their usefulness to other reward maximizing selectrons) also depends on total synaptic strength. Finally, based on our analysis, we propose a regularized version of STDP, and show the regularization improves the robustness of neuronal learning when faced with multiple stimuli.

Submitted 25 September, 2012; originally announced September 2012.

Comments: To appear in Adv. Neural Inf. Proc. Systems

arXiv:1202.4482 [pdf, other]

Metabolic cost as an organizing principle for cooperative learning

Authors: David Balduzzi, Pedro A Ortega, Michel Besserve

Abstract: This paper investigates how neurons can use metabolic cost to facilitate learning at a population level. Although decision-making by individual neurons has been extensively studied, questions regarding how neurons should behave to cooperate effectively remain largely unaddressed. Under assumptions that capture a few basic features of cortical neurons, we show that constraining reward maximization by metabolic cost aligns the information content of actions with their expected reward. Thus, metabolic cost provides a mechanism whereby neurons encode expected reward into their outputs. Further, aside from reducing energy expenditures, imposing a tight metabolic constraint also increases the accuracy of empirical estimates of rewards, increasing the robustness of distributed learning. Finally, we present two implementations of metabolically constrained learning that confirm our theoretical finding. These results suggest that metabolic cost may be an organizing principle underlying the neural code, and may also provide a useful guide to the design and analysis of other cooperating populations.

Submitted 9 February, 2013; v1 submitted 20 February, 2012; originally announced February 2012.

Comments: 14 pages, 2 figures, to appear in Advances in Complex Systems

Showing 1–31 of 31 results for author: Besserve, M