Search | arXiv e-print repository

arXiv:2505.20028 [pdf, ps, other]

Capillary wave formation in conserved active emulsions

Authors: Florian Raßhofer, Simon Bauer, Alexander Ziepke, Ivan Maryshev, Erwin Frey

Abstract: The dynamics of phase-separated interfaces shape the behavior of both passive and active condensates. While surface tension in equilibrium systems minimizes interface length, non-equilibrium fluxes can destabilize flat or constantly curved interfaces, giving rise to complex interface morphologies. Starting from a minimal model that couples a conserved, phase-separating species to a self-generated chemical field, we identify the conditions under which interfacial instabilities may emerge. Specifically, we show that non-reciprocal chemotactic interactions induce two distinct types of instabilities: a stationary (non-oscillatory) instability that promotes interface deformations, and an oscillatory instability that can give rise to persistent capillary waves propagating along the boundaries of phase-separated domains. To characterize these phenomena, we develop a perturbative framework that predicts the onset, wavelength, and velocity of capillary waves, and quantitatively validate these predictions through numerical simulations. Beyond the linear regime, our simulations reveal that capillary waves undergo a secondary instability, leading to either stationary or dynamically evolving superpositions of different wave modes. Finally, we investigate whether capillary waves can facilitate directed mass transport, either along phase boundaries (conveyor belts) or through self-sustained liquid gears crawling along a solid wall. Taken together, our results establish a general framework for interfacial dynamics in active phase-separating systems and suggest new strategies for controlling mass transport in soft matter and biological condensates.

Submitted 26 May, 2025; originally announced May 2025.

Comments: 57 pages, 24 figures

arXiv:2505.13375 [pdf, other]

Minimum-Excess-Work Guidance

Authors: Christopher Kolloff, Tobias Höppe, Emmanouil Angelis, Mathias Jacob Schreiner, Stefan Bauer, Andrea Dittadi, Simon Olsson

Abstract: We propose a regularization framework inspired by thermodynamic work for guiding pre-trained probability flow generative models (e.g., continuous normalizing flows or diffusion models) by minimizing excess work, a concept rooted in statistical mechanics and with strong conceptual connections to optimal transport. Our approach enables efficient guidance in sparse-data regimes common to scientific applications, where only limited target samples or partial density constraints are available. We introduce two strategies: Path Guidance for sampling rare transition states by concentrating probability mass on user-defined subsets, and Observable Guidance for aligning generated distributions with experimental observables while preserving entropy. We demonstrate the framework's versatility on a coarse-grained protein model, guiding it to sample transition configurations between folded/unfolded states and correct systematic biases using experimental data. The method bridges thermodynamic principles with modern generative architectures, offering a principled, efficient, and physics-inspired alternative to standard fine-tuning in data-scarce domains. Empirical results highlight improved sample efficiency and bias reduction, underscoring its applicability to molecular simulations and beyond.

Submitted 23 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

Comments: 30 pages, 18 figures

arXiv:2504.19621 [pdf, other]

AI Alignment in Medical Imaging: Unveiling Hidden Biases Through Counterfactual Analysis

Authors: Haroui Ma, Francesco Quinzan, Theresa Willem, Stefan Bauer

Abstract: Machine learning (ML) systems for medical imaging have demonstrated remarkable diagnostic capabilities, but their susceptibility to biases poses significant risks, since biases may negatively impact generalization performance. In this paper, we introduce a novel statistical framework to evaluate the dependency of medical imaging ML models on sensitive attributes, such as demographics. Our method leverages the concept of counterfactual invariance, measuring the extent to which a model's predictions remain unchanged under hypothetical changes to sensitive attributes. We present a practical algorithm that combines conditional latent diffusion models with statistical hypothesis testing to identify and quantify such biases without requiring direct access to counterfactual data. Through experiments on synthetic datasets and large-scale real-world medical imaging datasets, including CheXpert and MIMIC-CXR, we demonstrate that our approach aligns closely with counterfactual fairness principles and outperforms standard baselines. This work provides a robust tool to ensure that ML diagnostic systems generalize well, e.g., across demographic groups, offering a critical step towards AI safety in healthcare. Code: https://github.com/Neferpitou3871/AI-Alignment-Medical-Imaging.

Submitted 28 April, 2025; originally announced April 2025.

arXiv:2504.01084 [pdf]

Surfactants Screen Slide Electrification

Authors: Xiaomei Li, Zhongyuan Ni, Xiaoteng Zhou, Lisa S. Bauer, Diego Diaz, Gabriele Schäfer, Hans-Jürgen Butt

Abstract: Water drops spontaneously accumulate charges when they move on hydrophobic dielectric surfaces by slide electrification. On the one hand, slide electrification generates electricity with possible applications on tiny devices. On the other hand, the potential of up to 1 kV generated by slide electrification alters wetting and drop motion. Therefore, it is important to know the factors that affect slide electrification. To find out how surfactants affect slide electrification, we measured drop charges of aqueous drops containing cationic CTAB, anionic SDS and neutral C8E3 sliding on different hydrophobic surfaces. We find that the addition of surfactant significantly reduces the spontaneous charging of moving water drops. Based on zeta potential measurements, confocal microscopy of deposited surface-active dyes and drop impact studies, we propose that several factors contribute to this suppression of charge separation: (1) Surfactants tend to lower the contact angles, which reduces charge separation. (2) Surfactant adsorption at the solid-liquid interface can reduce the density of primary ions, particularly for anionic surfactants. (3) Anionic and neutral surfactants are mostly transferred to the liquid-air interface at the rear of the sliding drop, retaining primary ions within the drop. (4) Deposited cationic surfactant directly reduces the charge of the drop.

Submitted 1 April, 2025; originally announced April 2025.

Comments: 13 pages, 4 figures, 50 references

arXiv:2503.20027 [pdf]

A scalable gene network model of regulatory dynamics in single cells

Authors: Paul Bertin, Joseph D. Viviano, Alejandro Tejada-Lapuerta, Weixu Wang, Stefan Bauer, Fabian J. Theis, Yoshua Bengio

Abstract: Single-cell data provide high-dimensional measurements of the transcriptional states of cells, but extracting insights into the regulatory functions of genes, particularly identifying transcriptional mechanisms affected by biological perturbations, remains a challenge. Many perturbations induce compensatory cellular responses, making it difficult to distinguish direct from indirect effects on gene regulation. Modeling how gene regulatory functions shape the temporal dynamics of these responses is key to improving our understanding of biological perturbations. Dynamical models based on differential equations offer a principled way to capture transcriptional dynamics, but their application to single-cell data has been hindered by computational constraints, stochasticity, sparsity, and noise. Existing methods either rely on low-dimensional representations or make strong simplifying assumptions, limiting their ability to model transcriptional dynamics at scale. We introduce a Functional and Learnable model of Cell dynamicS, FLeCS, that incorporates gene network structure into coupled differential equations to model gene regulatory functions. Given (pseudo)time-series single-cell data, FLeCS accurately infers cell dynamics at scale, provides improved functional insights into transcriptional mechanisms perturbed by gene knockouts, both in myeloid differentiation and K562 Perturb-seq experiments, and simulates single-cell trajectories of A549 cells following small-molecule perturbations.

Submitted 25 March, 2025; originally announced March 2025.

Comments: 42 pages, 10 figures

arXiv:2503.19554 [pdf, other]

Causal Bayesian Optimization with Unknown Graphs

Authors: Jean Durand, Yashas Annadani, Stefan Bauer, Sonali Parbhoo

Abstract: Causal Bayesian Optimization (CBO) is a methodology designed to optimize an outcome variable by leveraging known causal relationships through targeted interventions. Traditional CBO methods require a fully and accurately specified causal graph, which is a limitation in many real-world scenarios where such graphs are unknown. To address this, we propose a new method for the CBO framework that operates without prior knowledge of the causal graph. Consistent with causal bandit theory, we demonstrate through theoretical analysis that focusing on the direct causal parents of the target variable is sufficient for optimization, and provide empirical validation in the context of CBO. Furthermore, we introduce a new method that learns a Bayesian posterior over the direct parents of the target variable. This allows us to optimize the outcome variable while simultaneously learning the causal structure. Our contributions include a derivation of the closed-form posterior distribution for the linear case. In the nonlinear case where the posterior is not tractable, we present a Gaussian Process (GP) approximation that still enables CBO by inferring the parents of the outcome variable. The proposed method performs competitively with existing benchmarks and scales well to larger graphs, making it a practical tool for real-world applications where causal information is incomplete.

Submitted 25 March, 2025; originally announced March 2025.

arXiv:2503.17299 [pdf, other]

Preference-Guided Diffusion for Multi-Objective Offline Optimization

Authors: Yashas Annadani, Syrine Belakaria, Stefano Ermon, Stefan Bauer, Barbara E Engelhardt

Abstract: Offline multi-objective optimization aims to identify Pareto-optimal solutions given a dataset of designs and their objective values. In this work, we propose a preference-guided diffusion model that generates Pareto-optimal designs by leveraging a classifier-based guidance mechanism. Our guidance classifier is a preference model trained to predict the probability that one design dominates another, directing the diffusion model toward optimal regions of the design space. Crucially, this preference model generalizes beyond the training distribution, enabling the discovery of Pareto-optimal solutions outside the observed dataset. We introduce a novel diversity-aware preference guidance, augmenting Pareto dominance preference with diversity criteria. This ensures that generated solutions are optimal and well-distributed across the objective space, a capability absent in prior generative methods for offline multi-objective optimization. We evaluate our approach on various continuous offline multi-objective optimization tasks and find that it consistently outperforms other inverse/generative approaches while remaining competitive with forward/surrogate-based optimization methods. Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions that approximate the Pareto front well.

Submitted 21 March, 2025; originally announced March 2025.

arXiv:2503.10845 [pdf, other]

Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation

Authors: Leonard Waldmann, Ando Shah, Yi Wang, Nils Lehmann, Adam J. Stewart, Zhitong Xiong, Xiao Xiang Zhu, Stefan Bauer, John Chuang

Abstract: Earth observation (EO) data features diverse sensing platforms with varying spectral bands, spatial resolutions, and sensing modalities. While most prior work has constrained inputs to fixed sensors, a new class of any-sensor foundation models able to process arbitrary sensors has recently emerged. Contributing to this line of work, we propose Panopticon, an any-sensor foundation model built on the DINOv2 framework. We extend DINOv2 by (1) treating images of the same geolocation across sensors as natural augmentations, (2) subsampling channels to diversify spectral input, and (3) adding a cross attention over channels as a flexible patch embedding mechanism. By encoding the wavelength and modes of optical and synthetic aperture radar sensors, respectively, Panopticon can effectively process any combination of arbitrary channels. In extensive evaluations, we achieve state-of-the-art performance on GEO-Bench, especially on the widely-used Sentinel-1 and Sentinel-2 sensors, while out-competing other any-sensor models, as well as domain adapted fixed-sensor models on unique sensor configurations. Panopticon enables immediate generalization to both existing and future satellite platforms, advancing sensor-agnostic EO.

Submitted 13 March, 2025; originally announced March 2025.

Comments: First two authors contributed equally. Code is available at: https://github.com/Panopticon-FM/panopticon

arXiv:2503.06985 [pdf, other]

Learning Decision Trees as Amortized Structure Inference

Authors: Mohammed Mahfoud, Ghait Boukachab, Michał Koziarski, Alex Hernandez-Garcia, Stefan Bauer, Yoshua Bengio, Nikolay Malkin

Abstract: Building predictive models for tabular data presents fundamental challenges, notably in scaling consistently, i.e., more resources translating to better performance, and generalizing systematically beyond the training data distribution. Designing decision tree models remains especially challenging given the intractably large search space, and most existing methods rely on greedy heuristics, while deep learning inductive biases expect a temporal or spatial structure not naturally present in tabular data. We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data, formulating decision tree construction as a sequential planning problem. We train a deep reinforcement learning (GFlowNet) policy to solve this problem, yielding a generative model that samples decision trees from the Bayesian posterior. We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks derived from real-world data, robustness to distribution shifts, and anomaly detection, all while yielding interpretable models with shorter description lengths. Samples from the trained DT-GFN model can be ensembled to construct a random forest, and we further show that the performance scales consistently in ensemble size, yielding ensembles of predictors that continue to generalize systematically.

Submitted 10 March, 2025; originally announced March 2025.

Comments: Code: https://github.com/GFNOrg/dt-gfn

arXiv:2503.04188 [pdf, other]

Measuring temporal effects of agent knowledge by date-controlled tool use

Authors: R. Patrick Xian, Qiming Cui, Stefan Bauer, Reza Abbasi-Asl

Abstract: Temporal progression is an integral part of knowledge accumulation and update. Web search is frequently adopted as grounding for agent knowledge, yet an improper configuration affects the quality of the agent's responses. Here, we assess the agent behavior using distinct date-controlled tools (DCTs) as a stress test to measure the knowledge variability of large language model (LLM) agents. We demonstrate the temporal effects of an LLM agent as a writing assistant, which uses web search to complete scientific publication abstracts. We show that the temporality of the search engine translates into tool-dependent agent performance but can be alleviated with base model choice and explicit reasoning instructions such as chain-of-thought prompting. Our results indicate that agent design and evaluations should take a dynamical view and implement measures to account for the temporal influence of external resources to ensure reliability.

Submitted 3 April, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

Comments: under review, comments welcome

arXiv:2502.10374 [pdf, other]

Robustness tests for biomedical foundation models should tailor to specification

Authors: R. Patrick Xian, Noah R. Baker, Tom David, Qiming Cui, A. Jay Holmgren, Stefan Bauer, Madhumita Sushil, Reza Abbasi-Asl

Abstract: Existing regulatory frameworks for biomedical AI include robustness as a key component but lack detailed implementational guidance. The recent rise of biomedical foundation models creates new hurdles in testing and certification given their broad capabilities and susceptibility to complex distribution shifts. To balance test feasibility and effectiveness, we suggest a priority-based, task-oriented approach to tailor robustness evaluation objectives to a predefined specification. We urge concrete policies to adopt a granular categorization of robustness concepts in the specification. Our approach promotes the standardization of risk assessment and monitoring, which guides technical developments and mitigation efforts.

Submitted 14 February, 2025; originally announced February 2025.

Comments: under review, comments welcome

arXiv:2412.05081 [pdf, other]

Spinal ligaments detection on vertebrae meshes using registration and 3D edge detection

Authors: Ivanna Kramer, Lara Blomenkamp, Kevin Weirauch, Sabine Bauer, Dietrich Paulus

Abstract: Spinal ligaments are crucial elements in the complex biomechanical simulation models as they transfer forces on the bony structure, guide and limit movements and stabilize the spine. The spinal ligaments encompass seven major groups being responsible for maintaining functional interrelationships among the other spinal components. Determination of the ligament origin and insertion points on the 3D vertebrae models is an essential step in building accurate and complex spine biomechanical models. In our paper, we propose a pipeline that is able to detect 66 spinal ligament attachment points by using a step-wise approach. Our method incorporates a fast vertebra registration that strategically extracts only 15 3D points to compute the transformation, and edge detection for a precise projection of the registered ligaments onto any given patient-specific vertebra model. Our method shows high accuracy, particularly in identifying landmarks on the anterior part of the vertebra with an average distance of 2.24 mm for anterior longitudinal ligament and 1.26 mm for posterior longitudinal ligament landmarks. The landmark detection requires approximately 3.0 seconds per vertebra, providing a substantial improvement over existing methods. Clinical relevance: using the proposed method, the required landmarks that represent origin and insertion points for forces in the biomechanical spine models can be localized automatically in an accurate and time-efficient manner.

Submitted 6 December, 2024; originally announced December 2024.

arXiv:2412.05065 [pdf, other]

Reconstruction of 3D lumbar spine models from incomplete segmentations using landmark detection

Authors: Lara Blomenkamp, Ivanna Kramer, Sabine Bauer, Kevin Weirauch, Dietrich Paulus

Abstract: Patient-specific 3D spine models serve as a foundation for spinal treatment and surgery planning as well as analysis of loading conditions in biomechanical and biomedical research. Despite advancements in imaging technologies, the reconstruction of complete 3D spine models often faces challenges due to limitations in imaging modalities such as planar X-Ray and missing certain spinal structures, such as the spinal or transverse processes, in volumetric medical images and resulting segmentations. In this study, we present a novel accurate and time-efficient method to reconstruct complete 3D lumbar spine models from incomplete 3D vertebral bodies obtained from segmented magnetic resonance images (MRI). In our method, we use an affine transformation to align artificial vertebra models with patient-specific incomplete vertebrae. The transformation matrix is derived from vertebra landmarks, which are automatically detected on the vertebra endplates. The results of our evaluation demonstrate the high accuracy of the performed registration, achieving an average point-to-model distance of 1.95 mm. Additionally, in assessing the morphological properties of the vertebrae and intervertebral characteristics, our method demonstrated a mean absolute error (MAE) of 3.4° in the angles of functional spine units (FSUs), emphasizing its effectiveness in maintaining important spinal features throughout the transformation process of individual vertebrae. Our method achieves the registration of the entire lumbar spine, spanning segments L1 to L5, in just 0.14 seconds, showcasing its time-efficiency. Clinical relevance: the fast and accurate reconstruction of spinal models from incomplete input data such as segmentations provides a foundation for many applications in spine diagnostics, treatment planning, and the development of spinal healthcare solutions.

Submitted 6 December, 2024; originally announced December 2024.

arXiv:2410.08770 [pdf, other]

doi 10.1038/s41591-024-02902-1

Causal machine learning for predicting treatment outcomes

Authors: Stefan Feuerriegel, Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal, Konstantin Hess, Alicia Curth, Stefan Bauer, Niki Kilbertus, Isaac S. Kohane, Mihaela van der Schaar

Abstract: Causal machine learning (ML) offers flexible, data-driven methods for predicting treatment outcomes including efficacy and toxicity, thereby supporting the assessment and safety of drugs. A key benefit of causal ML is that it allows for estimating individualized treatment effects, so that clinical decision-making can be personalized to individual patient profiles. Causal ML can be used in combination with both clinical trial data and real-world data, such as clinical registries and electronic health records, but caution is needed to avoid biased or incorrect predictions. In this Perspective, we discuss the benefits of causal ML (relative to traditional statistical or ML approaches) and outline the key components and steps. Finally, we provide recommendations for the reliable use of causal ML and effective translation into the clinic.

Submitted 11 October, 2024; originally announced October 2024.

Comments: Accepted version; not Version of Record

Journal ref: Nature Medicine, vol. 30, pp. 958-968 (2024)

arXiv:2409.17775 [pdf, other]

UNICORN: A Deep Learning Model for Integrating Multi-Stain Data in Histopathology

Authors: Valentin Koch, Sabine Bauer, Valerio Luppberger, Michael Joner, Heribert Schunkert, Julia A. Schnabel, Moritz von Scheidt, Carsten Marr

Abstract: Background: The integration of multi-stain histopathology images through deep learning poses a significant challenge in digital histopathology. Current multi-modal approaches struggle with data heterogeneity and missing data. This study aims to overcome these limitations by developing a novel transformer model for multi-stain integration that can handle missing data during training as well as inference. Methods: We propose UNICORN (UNiversal modality Integration Network for CORonary classificatioN), a multi-modal transformer capable of processing multi-stain histopathology for atherosclerosis severity class prediction. The architecture comprises a two-stage, end-to-end trainable model with specialized modules utilizing transformer self-attention blocks. The initial stage employs domain-specific expert modules to extract features from each modality. In the subsequent stage, an aggregation expert module integrates these features by learning the interactions between the different data modalities. Results: Evaluation was performed using a multi-class dataset of atherosclerotic lesions from the Munich Cardiovascular Studies Biobank (MISSION), using over 4,000 paired multi-stain whole slide images (WSIs) from 170 deceased individuals on 7 prespecified segments of the coronary tree, each stained according to four histopathological protocols. UNICORN achieved a classification accuracy of 0.67, outperforming other state-of-the-art models. The model effectively identifies relevant tissue phenotypes across stainings and implicitly models disease progression. Conclusion: Our proposed multi-modal transformer model addresses key challenges in medical data analysis, including data heterogeneity and missing modalities. Explainability and the model's effectiveness in predicting atherosclerosis progression underscore its potential for broader applications in medical research.

Submitted 26 September, 2024; originally announced September 2024.

arXiv:2407.15589 [pdf, other]

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

Authors: Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl Henrik Johansson, Stefan Bauer, Andrea Dittadi

Abstract: Object-centric (OC) representations, which model visual scenes as compositions of discrete objects, have the potential to be used in various downstream tasks to achieve systematic compositional generalization and facilitate reasoning. However, these claims have yet to be thoroughly validated empirically. Recently, foundation models have demonstrated unparalleled capabilities across diverse domains, from language to computer vision, positioning them as a potential cornerstone of future research for a wide range of computational tasks. In this paper, we conduct an extensive empirical study on representation learning for downstream Visual Question Answering (VQA), which requires an accurate compositional understanding of the scene. We thoroughly investigate the benefits and trade-offs of OC models and alternative approaches including large pre-trained foundation models on both synthetic and real-world data, ultimately identifying a promising path to leverage the strengths of both paradigms. The extensiveness of our study, encompassing over 600 downstream VQA models and 15 different types of upstream representations, also provides several additional insights that we believe will be of interest to the community at large.

Submitted 3 March, 2025; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: Published at ICLR 2025

arXiv:2407.14930 [pdf, other]

doi 10.1117/12.3017699

Athermal package for OH suppression filters in astronomy part 1: design

Authors: Carlos Enrique Rordriguez Alvarez, Aashia Rahman, Hakan Önel, Frank Dionies, Jens Paschke, Svend-Marian Bauer

Abstract: We present the design of an athermal package for fiber Bragg grating (FBG) filters fabricated at our Institute for use in ground-based near-infrared (NIR) telescopes. Aperiodic multichannel FBG filters combined with photonic lanterns can effectively filter out extremely bright atmospheric hydroxyl (OH) emission lines that severely hinder ground-based NIR observations. While FBGs have the capability of filtering specific wavelengths with high precision, due to their sensitivity to temperature variations, the success in their performance as OH suppression filters depends on a suitable athermal package that can maintain the deviations of the FBG wavelengths from that of the OH emission lines within sub-picometer accuracy over a temperature range of about 40 K (i.e. 263 K to 303 K). We aim to develop an athermal package over the aforementioned temperature range for an optical fiber consisting of multichannel FBGs for a maximum filter length of 110 mm. In this work, we demonstrate the complete design methodology of such a package. First, we developed a custom-built test rig to study a wide range of critical physical properties of the fiber, such as strain and temperature sensitivities, elastic modulus, optimum fiber pre-tension, and adhesion performance. Next, we used these data to confirm the athermal response of an FBG bonded on the test rig from room temperature to 313 K.
Based on this study, we developed a computer-aided design (CAD) model of the package and analyzed its athermal characteristics with a suitable selection of materials and their nominal dimensions using finite element analysis (FEA). We finally discuss the novel aspects of the design to achieve high-precision thermal stabilization of these filters in the temperature range of interest.

Submitted 20 July, 2024; originally announced July 2024.

Comments: 13 Pages, 6 figures, 2 Tables, SPIE Astronomical Telescopes + Instrumentation

arXiv:2407.14601 [pdf, other]

ANDES, the high resolution spectrograph for the ELT: science goals, project overview and future developments

Authors: A. Marconi, M. Abreu, V. Adibekyan, V. Alberti, S. Albrecht, J. Alcaniz, M. Aliverti, C. Allende Prieto, J. D. Alvarado Gómez, C. S. Alves, P. J. Amado, M. Amate, M. I. Andersen, S. Antoniucci, E. Artigau, C. Bailet, C. Baker, V. Baldini, A. Balestra, S. A. Barnes, F. Baron, S. C. C. Barros, S. M. Bauer, M. Beaulieu, O. Bellido-Tirado , et al. (264 additional authors not shown)

Abstract: The first generation of ELT instruments includes an optical-infrared high-resolution spectrograph, indicated as ELT-HIRES and recently christened ANDES (ArmazoNes high Dispersion Echelle Spectrograph). ANDES consists of three fibre-fed spectrographs ([U]BV, RIZ, YJH) providing a spectral resolution of $\sim$100,000 with a minimum simultaneous wavelength coverage of 0.4-1.8 $μ$m with the goal of extending it to 0.35-2.4 $μ$m with the addition of a U arm to the BV spectrograph and a separate K band spectrograph. It operates both in seeing- and diffraction-limited conditions and the fibre feeding allows several, interchangeable observing modes including a single conjugated adaptive optics module and a small diffraction-limited integral field unit in the NIR. Modularity and fibre-feeding allow ANDES to be placed partly on the ELT Nasmyth platform and partly in the Coudé room. ANDES has a wide range of groundbreaking science cases spanning nearly all areas of research in astrophysics and even fundamental physics. Among the top science cases, there are the detection of biosignatures from exoplanet atmospheres, finding the fingerprints of the first generation of stars, tests on the stability of Nature's fundamental couplings, and the direct detection of the cosmic acceleration.
The ANDES project is carried forward by a large international consortium, composed of 35 Institutes from 13 countries, forming a team of almost 300 scientists and engineers which include the majority of the scientific and technical expertise in the field that can be found in ESO member states.

Submitted 19 July, 2024; originally announced July 2024.

Comments: SPIE astronomical telescope and instrumentation 2024, in press

arXiv:2406.18317 [pdf]

ANDES, the high-resolution spectrograph for the ELT: RIZ Spectrograph preliminary design

Authors: Bruno Chazelas, Yevgeniy Ivanisenko, Audrey Lanotte, Pablo Santos Diaz, Ludovic Genolet, Michael Sordet, Ian Hughes, Christophe Lovis, Tobias M. Schmidt, Manuel Amate, José Peñate Castro, Afrodisio Vega Moreno, Fabio Tenegi, Roberto Simoes, Jonay I. González Hernández, María Rosa Zapatero Osorio, Javier Piqueras, Tomás Belenguer Dávila, Rocío Calvo Ortega, Roberto Varas González, Luis Miguel González Fernández, Pedro J. Amado, Jonathan Kern, Frank Dionies, Svend-Marian Bauer , et al. (22 additional authors not shown)

Abstract: We present here the preliminary design of the RIZ module, one of the visible spectrographs of the ANDES instrument 1. It is a fiber-fed high-resolution, high-stability spectrograph. Its design follows the guidelines of successful predecessors such as HARPS and ESPRESSO. In this paper we present the status of the spectrograph at the preliminary design stage. The spectrograph will be a warm, vacuum-operated, thermally controlled and fiber-fed echelle spectrograph. Following the phase A design, the huge etendue of the telescope will be reformed in the instrument with a long slit made of smaller fibers. We discuss the system design of the spectrographs system.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Paper submitted to the SPIE astronomical telescope and instrumentation 2024, conference title : Ground-based and Airborne Instrumentation for Astronomy X, paper reference number : 13096-171

arXiv:2406.03209 [pdf, other]

Challenges and Considerations in the Evaluation of Bayesian Causal Discovery

Authors: Amir Mohammad Karimi Mamaghan, Panagiotis Tigas, Karl Henrik Johansson, Yarin Gal, Yashas Annadani, Stefan Bauer

Abstract: Representing uncertainty in causal discovery is a crucial component for experimental design, and more broadly, for safe and reliable causal decision making. Bayesian Causal Discovery (BCD) offers a principled approach to encapsulating this uncertainty. Unlike non-Bayesian causal discovery, which relies on a single estimated causal graph and model parameters for assessment, evaluating BCD presents challenges due to the nature of its inferred quantity - the posterior distribution. As a result, the research community has proposed various metrics to assess the quality of the approximate posterior. However, there is, to date, no consensus on the most suitable metric(s) for evaluation. In this work, we reexamine this question by dissecting various metrics and understanding their limitations. Through extensive empirical evaluation, we find that many existing metrics fail to exhibit a strong correlation with the quality of approximation to the true posterior, especially in scenarios with low sample sizes where BCD is most desirable. We highlight the suitability (or lack thereof) of these metrics under two distinct factors: the identifiability of the underlying causal model and the quantity of available data. Both factors affect the entropy of the true posterior, indicating that the current metrics are less fitting in settings of higher entropy.
Our findings underline the importance of a more nuanced evaluation of new methods by taking into account the nature of the true posterior, as well as guide and motivate the development of new evaluation procedures for this challenge.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.16718 [pdf, other]

Amortized Active Causal Induction with Deep Reinforcement Learning

Authors: Yashas Annadani, Panagiotis Tigas, Stefan Bauer, Adam Foster

Abstract: We present Causal Amortized Active Structure Learning (CAASL), an active intervention design policy that can select interventions that are adaptive, real-time and that does not require access to the likelihood. This policy, an amortized network based on the transformer, is trained with reinforcement learning on a simulator of the design environment, and a reward function that measures how close the true causal graph is to a causal graph posterior inferred from the gathered data. On synthetic data and a single-cell gene expression simulator, we demonstrate empirically that the data acquired through our policy results in a better estimate of the underlying causal graph than alternative strategies. Our design policy successfully achieves amortized intervention design on the distribution of the training environment while also generalizing well to distribution shifts in test-time design environments. Further, our policy also demonstrates excellent zero-shot generalization to design environments with dimensionality higher than that during training, and to intervention types that it has not been trained on.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.04161 [pdf, other]

Decoding complexity: how machine learning is redefining scientific discovery

Authors: Ricardo Vinuesa, Paola Cinnella, Jean Rabault, Hossein Azizpour, Stefan Bauer, Bingni W. Brunton, Arne Elofsson, Elias Jarlebring, Hedvig Kjellstrom, Stefano Markidis, David Marlevi, Javier Garcia-Martinez, Steven L. Brunton

Abstract: As modern scientific instruments generate vast amounts of data and the volume of information in the scientific literature continues to grow, machine learning (ML) has become an essential tool for organising, analysing, and interpreting these complex datasets. This paper explores the transformative role of ML in accelerating breakthroughs across a range of scientific disciplines. By presenting key examples -- such as brain mapping and exoplanet detection -- we demonstrate how ML is reshaping scientific research. We also explore different scenarios where different levels of knowledge of the underlying phenomenon are available, identifying strategies to overcome limitations and unlock the full potential of ML. Despite its advances, the growing reliance on ML poses challenges for research applications and rigorous validation of discoveries. We argue that even with these challenges, ML is poised to disrupt traditional methodologies and advance the boundaries of knowledge by enabling researchers to tackle increasingly complex problems. Thus, the scientific community can move beyond the necessary traditional oversimplifications to embrace the full complexity of natural systems, ultimately paving the way for interdisciplinary breakthroughs and innovative solutions to humanity's most pressing challenges.

Submitted 25 April, 2025; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.04062 [pdf, other]

Derivative-free tree optimization for complex systems

Authors: Ye Wei, Bo Peng, Ruiwen Xie, Yangtao Chen, Yu Qin, Peng Wen, Stefan Bauer, Po-Yen Tung

Abstract: A tremendous range of design tasks in materials, physics, and biology can be formulated as finding the optimum of an objective function depending on many parameters without knowing its closed-form expression or the derivative. Traditional derivative-free optimization techniques often rely on strong assumptions about objective functions, thereby failing at optimizing non-convex systems beyond 100 dimensions. Here, we present a tree search method for derivative-free optimization that enables accelerated optimal design of high-dimensional complex systems. Specifically, we introduce stochastic tree expansion, dynamic upper confidence bound, and short-range backpropagation mechanism to evade local optimum, iteratively approximating the global optimum using machine learning models. This development effectively confronts the dimensionally challenging problems, achieving convergence to global optima across various benchmark functions up to 2,000 dimensions, surpassing the existing methods by 10- to 20-fold. Our method demonstrates wide applicability to a wide range of real-world complex systems spanning materials, physics, and biology, considerably outperforming state-of-the-art algorithms. This enables efficient autonomous knowledge discovery and facilitates self-driving virtual laboratories. Although we focus on problems within the realm of natural science, the advancements in optimization techniques achieved herein are applicable to a broader spectrum of challenges across all quantitative disciplines.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 39 pages, 3 figures

arXiv:2402.10932 [pdf]

Roadmap on Data-Centric Materials Science

Authors: Stefan Bauer, Peter Benner, Tristan Bereau, Volker Blum, Mario Boley, Christian Carbogno, C. Richard A. Catlow, Gerhard Dehm, Sebastian Eibl, Ralph Ernstorfer, Ádám Fekete, Lucas Foppa, Peter Fratzl, Christoph Freysoldt, Baptiste Gault, Luca M. Ghiringhelli, Sajal K. Giri, Anton Gladyshev, Pawan Goyal, Jason Hattrick-Simpers, Lara Kabalan, Petr Karpov, Mohammad S. Khorrami, Christoph Koch, Sebastian Kokott , et al. (36 additional authors not shown)

Abstract: Science is and always has been based on data, but the terms "data-centric" and the "4th paradigm of" materials research indicate a radical change in how information is retrieved, handled and research is performed. It signifies a transformative shift towards managing vast data collections, digital repositories, and innovative data analytics methods. The integration of Artificial Intelligence (AI) and its subset Machine Learning (ML), has become pivotal in addressing all these challenges. This Roadmap on Data-Centric Materials Science explores fundamental concepts and methodologies, illustrating diverse applications in electronic-structure theory, soft matter theory, microstructure research, and experimental techniques like photoemission, atom probe tomography, and electron microscopy. While the roadmap delves into specific areas within the broad interdisciplinary field of materials science, the provided examples elucidate key concepts applicable to a wider range of topics. The discussed instances offer insights into addressing the multifaceted challenges encountered in contemporary materials research.

Submitted 1 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: Review, outlook, roadmap, perspective

arXiv:2402.06665 [pdf, other]

The Essential Role of Causality in Foundation World Models for Embodied AI

Authors: Tarun Gupta, Wenbo Gong, Chao Ma, Nick Pawlowski, Agrin Hilmkil, Meyer Scetbon, Marc Rigter, Ade Famoti, Ashley Juan Llorens, Jianfeng Gao, Stefan Bauer, Danica Kragic, Bernhard Schölkopf, Cheng Zhang

Abstract: Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for Embodied AI. The study of causality lends itself to the construction of veridical world models, which are crucial for accurately predicting the outcomes of possible interactions. This paper focuses on the prospects of building foundation world models for the upcoming generation of embodied agents and presents a novel viewpoint on the significance of causality within these. We posit that integrating causal considerations is vital to facilitating meaningful physical interactions with the world. Finally, we demystify misconceptions about causality in this context and present our outlook for future research.

Submitted 29 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.01462 [pdf, other]

3D Vertebrae Measurements: Assessing Vertebral Dimensions in Human Spine Mesh Models Using Local Anatomical Vertebral Axes

Authors: Ivanna Kramer, Vinzent Rittel, Lara Blomenkamp, Sabine Bauer, Dietrich Paulus

Abstract: Vertebral morphological measurements are important across various disciplines, including spinal biomechanics and clinical applications, pre- and post-operatively. These measurements also play a crucial role in anthropological longitudinal studies, where spinal metrics are repeatedly documented over extended periods. Traditionally, such measurements have been manually conducted, a process that is time-consuming. In this study, we introduce a novel, fully automated method for measuring vertebral morphology using 3D meshes of lumbar and thoracic spine models. Our experimental results demonstrate the method's capability to accurately measure low-resolution patient-specific vertebral meshes with mean absolute error (MAE) of 1.09 mm and those derived from artificially created lumbar spines, where the average MAE value was 0.7 mm. Our qualitative analysis indicates that measurements obtained using our method on 3D spine models can be accurately reprojected back onto the original medical images if these images are available.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.09558 [pdf]

Molecular causality in the advent of foundation models

Authors: Sebastian Lobentanzer, Pablo Rodriguez-Mier, Stefan Bauer, Julio Saez-Rodriguez

Abstract: Correlation is not causation. As simple as this widely agreed-upon statement may seem, scientifically defining causality and using it to drive our modern biomedical research is immensely challenging. In this perspective, we attempt to synergise the partly disparate fields of systems biology, causal reasoning, and machine learning, to inform future approaches in the field of systems biology and molecular networks.

Submitted 17 January, 2024; originally announced January 2024.

Comments: 22 pages, 0 figures, 87 references; submitted to MSB

arXiv:2312.04064 [pdf, other]

DiscoBAX: Discovery of Optimal Intervention Sets in Genomic Experiment Design

Authors: Clare Lyle, Arash Mehrjou, Pascal Notin, Andrew Jesson, Stefan Bauer, Yarin Gal, Patrick Schwab

Abstract: The discovery of therapeutics to treat genetically-driven pathologies relies on identifying genes involved in the underlying disease mechanisms. Existing approaches search over the billions of potential interventions to maximize the expected influence on the target phenotype. However, to reduce the risk of failure in future stages of trials, practical experiment design aims to find a set of interventions that maximally change a target phenotype via diverse mechanisms. We propose DiscoBAX, a sample-efficient method for maximizing the rate of significant discoveries per experiment while simultaneously probing for a wide range of diverse mechanisms during a genomic experiment campaign. We provide theoretical guarantees of approximate optimality under standard assumptions, and conduct a comprehensive experimental evaluation covering both synthetic as well as real-world experimental design tasks. DiscoBAX outperforms existing state-of-the-art methods for experimental design, selecting effective and diverse perturbations in biological systems.

Submitted 7 December, 2023; originally announced December 2023.

Journal ref: International Conference on Machine Learning, 2023

arXiv:2311.06012 [pdf, other]

Doubly Robust Structure Identification from Temporal Data

Authors: Emmanouil Angelis, Francesco Quinzan, Ashkan Soleymani, Patrick Jaillet, Stefan Bauer

Abstract: Learning the causes of time-series data is a fundamental task in many applications, spanning from finance to earth sciences or bio-medical applications. Common approaches for this task are based on vector auto-regression, and they do not take into account unknown confounding between potential causes. However, in settings with many potential causes and noisy data, these approaches may be substantially biased. Furthermore, potential causes may be correlated in practical applications. Moreover, existing algorithms often do not work with cyclic data. To address these challenges, we propose a new doubly robust method for Structure Identification from Temporal Data (SITD). We provide theoretical guarantees, showing that our method asymptotically recovers the true underlying causal structure. Our analysis extends to cases where the potential causes have cycles and they may be confounded. We further perform extensive experiments to showcase the superior performance of our method.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2311.05421 [pdf, other]

Diffusion Based Causal Representation Learning

Authors: Amir Mohammad Karimi Mamaghan, Andrea Dittadi, Stefan Bauer, Karl Henrik Johansson, Francesco Quinzan

Abstract: Causal reasoning can be considered a cornerstone of intelligent systems. Having access to an underlying causal graph comes with the promise of cause-effect estimation and the identification of efficient and safe interventions. However, learning causal representations remains a major challenge, due to the complexity of many real-world systems. Previous works on causal representation learning have mostly focused on Variational Auto-Encoders (VAE). These methods only provide representations from a point estimate, and they are unsuitable to handle high dimensions. To overcome these problems, we proposed a new Diffusion-based Causal Representation Learning (DCRL) algorithm. This algorithm uses diffusion-based representations for causal discovery. DCRL offers access to infinite dimensional latent codes, which encode different levels of information in the latent code. In a first proof of principle, we investigate the use of DCRL for causal representation learning. We further demonstrate experimentally that this approach performs comparably well in identifying the causal structure and causal variables.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.20647 [pdf, other]

Triggered telecom C-band single-photon source with high brightness, high indistinguishability and sub-GHz spectral linewidth

Authors: Raphael Joos, Stephanie Bauer, Christian Rupp, Sascha Kolatschek, Wolfgang Fischer, Cornelius Nawrath, Ponraj Vijayan, Robert Sittig, Michael Jetter, Simone L. Portalupi, Peter Michler

Abstract: Long-range, terrestrial quantum networks will require high brightness single-photon sources emitting in the telecom C-band for maximum transmission rate. Many applications additionally demand triggered operation with high indistinguishability and narrow spectral linewidth. This would enable the efficient implementation of photonic gate operations and photon storage in quantum memories, as for instance required for a quantum repeater. Especially, semiconductor quantum dots (QDs) have shown these properties in the near-infrared regime. However, the simultaneous demonstration of all these properties in the telecom C-band has been elusive. Here, we present a coherently (incoherently) optically-pumped narrow-band (0.8 GHz) triggered single-photon source in the telecom C-band. The source shows simultaneously high single-photon purity with $g^{(2)}(0) = 0.026$ ($g^{(2)}(0) = 0.014$), high two-photon interference visibility of 0.508 (0.664) and high application-ready rates of 0.75 MHz (1.45 MHz) of polarized photons. The source is based on a QD coupled to a circular Bragg grating cavity combined with spectral filtering. Coherent (incoherent) operation is performed via the novel SUPER scheme (phonon-assisted excitation).

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.14935 [pdf]

Causal machine learning for single-cell genomics

Authors: Alejandro Tejada-Lapuerta, Paul Bertin, Stefan Bauer, Hananeh Aliee, Yoshua Bengio, Fabian J. Theis

Abstract: Advances in single-cell omics allow for unprecedented insights into the transcription profiles of individual cells. When combined with large-scale perturbation screens, through which specific biological mechanisms can be targeted, these technologies allow for measuring the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the causative role of genes in complex biological processes such as gene regulation, disease progression or cellular development. However, the high-dimensional nature of the data, coupled with the intricate complexity of biological systems renders this task nontrivial. Within the machine learning community, there has been a recent increase of interest in causality, with a focus on adapting established causal techniques and algorithms to handle high-dimensional data. In this perspective, we delineate the application of these methodologies within the realm of single-cell genomics and their challenges. We first present the model that underlies most of current causal approaches to single-cell biology and discuss and challenge the assumptions it entails from the biological point of view. We then identify open problems in the application of causal approaches to single-cell data: generalising to unseen environments, learning interpretable models, and learning causal models of dynamics.
For each problem, we discuss how various research directions - including the development of computational approaches and the adaptation of experimental protocols - may offer ways forward, or on the contrary pose some difficulties. With the advent of single cell atlases and increasing perturbation data, we expect causal models to become a crucial tool for informed experimental design.

Submitted 23 October, 2023; originally announced October 2023.

Comments: 35 pages, 7 figures, 3 tables, 1 box

arXiv:2310.11899 [pdf, other]

Highly indistinguishable single photons from droplet-etched GaAs quantum dots integrated in single-mode waveguides and beamsplitters

Authors: Florian Hornung, Ulrich Pfister, Stephanie Bauer, Dee Rocking Cyrlyson's, Dongze Wang, Ponraj Vijayan, Ailton J. Garcia Jr, Saimon Filipe Covre da Silva, Michael Jetter, Simone L. Portalupi, Armando Rastelli, Peter Michler

Abstract: The integration of on-demand quantum emitters into photonic integrated circuits (PICs) has drawn much attention in recent years, as it promises a scalable implementation of quantum information schemes. A central property for several applications is the indistinguishability of the emitted photons. In this regard, GaAs quantum dots (QDs) obtained by droplet etching epitaxy show excellent performance with visibilities close to one for both individual and remote emitters. Therefore, the realization of these QDs into PICs is highly appealing. Here, we show the first implementation in this direction, realizing the key passive elements needed in PICs, i.e. single-mode waveguides (WGs) with integrated GaAs-QDs, which can be coherently controlled, as well as beamsplitters. We study both the statistical distribution of wavelength, linewidth and decay times of the excitonic line of multiple QDs, as well as the quantum optical properties of individual emitters under resonant excitation. Here, we achieve single-photon purities as high as $1-g^{(2)}(0)=0.929\pm0.009$ as well as two-photon interference visibilities of up to $V_{\text{TPI}}=0.939\pm0.004$ for two consecutively emitted photons.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.07434 [pdf, other]

HealthWalk: Promoting Health and Mobility through Sensor-Based Rollator Walker Assistance

Authors: Ivanna Kramer, Kevin Weirauch, Sabine Bauer, Mark Oliver Mints, Peer Neubert

Abstract: Rollator walkers allow people with physical limitations to increase their mobility and give them the confidence and independence to participate in society for longer. However, rollator walker users often have poor posture, leading to further health problems and, in the worst case, falls. Integrating sensors into rollator walker designs can help to address this problem and results in a platform that allows several other interesting use cases. This paper briefly overviews existing systems and the current research directions and challenges in this field. We also present our early HealthWalk rollator walker prototype for data collection with older people, rheumatism, multiple sclerosis and Parkinson patients, and individuals with visual impairments.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2309.10516 [pdf, other]

doi 10.1145/3606464.3606474

Evaluating the Benefits: Quantifying the Effects of TCP Options, QUIC, and CDNs on Throughput

Authors: Simon Bauer, Patrick Sattler, Johannes Zirngibl, Christoph Schwarzenberg, Georg Carle

Abstract: To keep up with increasing demands on quality of experience, assessing and understanding the performance of network connections is crucial for web service providers. While different measures, like TCP options, alternative transport layer protocols like QUIC, or the hosting of services in CDNs, are expected to improve connection performance, no studies are quantifying such impacts on connections on the Internet. This paper introduces an active Internet measurement approach to assess the impacts of mentioned measures on connection performance. We conduct downloads from public web servers considering different vantage points, extract performance indicators like throughput, RTT, and retransmission rate, and survey speed-ups due to TCP option usage. Further, we compare the performance of QUIC-based downloads to TCP-based downloads considering different option configurations. Next to significant throughput improvements due to TCP option usage, in particular TCP window scaling, and QUIC, our study shows significantly increased performance for connections to domains hosted by different giant CDNs.

Submitted 19 September, 2023; originally announced September 2023.

Comments: Presented at the ACM/IRTF Applied Networking Research Workshop 2023 (ANRW23)

arXiv:2308.15922 [pdf, ps, other]

doi 10.1038/s41377-024-01488-0

High-rate intercity quantum key distribution with a semiconductor single-photon source

Authors: Jingzhong Yang, Zenghui Jiang, Frederik Benthin, Joscha Hanel, Tom Fandrich, Raphael Joos, Stephanie Bauer, Sascha Kolatschek, Ali Hreibi, Eddy Patrick Rugeramigabo, Michael Jetter, Simone Luca Portalupi, Michael Zopf, Peter Michler, Stefan Kück, Fei Ding

Abstract: Quantum key distribution (QKD) enables the transmission of information that is secure against general attacks by eavesdroppers. The use of on-demand quantum light sources in QKD protocols is expected to help improve security and maximum tolerable loss. Semiconductor quantum dots (QDs) are a promising building block for quantum communication applications because of the deterministic emission of single photons with high brightness and low multiphoton contribution. Here we report on the first intercity QKD experiment using a bright deterministic single photon source. A BB84 protocol based on polarisation encoding is realised using the high-rate single photons in the telecommunication C-band emitted from a semiconductor QD embedded in a circular Bragg grating structure. Utilising the 79 km long link with 25.49 dB loss (equivalent to 130 km for the direct-connected optical fibre) between the German cities of Hannover and Braunschweig, a record-high secret key bits per pulse of $4.8 \times 10^{-5}$ with an average quantum bit error ratio of ~0.65% are demonstrated. An asymptotic maximum tolerable loss of 28.11 dB is found, corresponding to a length of 144 km of standard telecommunication fibre. Deterministic semiconductor sources therefore challenge state-of-the-art QKD protocols and have the potential to excel in measurement device independent protocols and quantum repeater applications.

Submitted 2 July, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: 10 pages, 4 figures

Journal ref: Light Sci Appl 13, 150 (2024)

arXiv:2308.07741 [pdf, other]

Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World

Authors: Nico Gürtler, Felix Widmaier, Cansu Sancaktar, Sebastian Blaes, Pavel Kolev, Stefan Bauer, Manuel Wüthrich, Markus Wulfmeier, Martin Riedmiller, Arthur Allshire, Qiang Wang, Robert McCarthy, Hangyeol Kim, Jongchan Baek, Wookyong Kwon, Shanliang Qian, Yasunori Toshimitsu, Mike Yan Michelis, Amirhossein Kazemipour, Arman Raayatsanati, Hehui Zheng, Barnabas Gavin Cangan, Bernhard Schölkopf, Georg Martius

Abstract: Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation do not necessarily translate to real robots, in particular for tasks involving complex interactions with the environment. The Real Robot Challenge 2022 therefore served as a bridge between the RL and robotics communities by allowing participants to experiment remotely with a real robot - as easily as in simulation. In recent years, offline reinforcement learning has matured into a promising paradigm for learning from pre-collected datasets, alleviating the reliance on expensive online interactions. We therefore asked the participants to learn two dexterous manipulation tasks involving pushing, grasping, and in-hand orientation from provided real-robot datasets. Extensive software documentation and an initial stage based on a simulation of the real set-up made the competition particularly accessible. By giving each team plenty of access budget to evaluate their offline-learned policies on a cluster of seven identical real TriFinger platforms, we organized an exciting competition for machine learners and roboticists alike. In this work we state the rules of the competition, present the methods used by the winning teams and compare their results with a benchmark of state-of-the-art offline RL algorithms on the challenge datasets.

Submitted 24 November, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: Typo in author list fixed

arXiv:2307.15690 [pdf, other]

Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

Authors: Nico Gürtler, Sebastian Blaes, Pavel Kolev, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Bernhard Schölkopf, Georg Martius

Abstract: Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.

Submitted 28 July, 2023; originally announced July 2023.

Comments: The Eleventh International Conference on Learning Representations. 2022. Published at ICLR 2023. Datasets available at https://github.com/rr-learning/trifinger_rl_datasets

arXiv:2307.13917 [pdf, other]

BayesDAG: Gradient-Based Posterior Inference for Causal Discovery

Authors: Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong

Abstract: Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over the combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on a combination of stochastic gradient Markov Chain Monte Carlo (SG-MCMC) and Variational Inference (VI) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to the permutation-based DAG learning, which opens up possibilities of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.

Submitted 8 December, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023

arXiv:2307.04988 [pdf, other]

Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation

Authors: Chris Chinenye Emezue, Alexandre Drouin, Tristan Deleu, Stefan Bauer, Yoshua Bengio

Abstract: The practical utility of causality in decision-making is widespread and brought about by the intertwining of causal discovery and causal inference. Nevertheless, a notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference. To address this gap, we evaluate seven established baseline causal discovery methods including a newly proposed method based on GFlowNets, on the downstream task of treatment effect estimation. Through the implementation of a distribution-level evaluation, we offer valuable and unique insights into the efficacy of these causal discovery methods for treatment effect estimation, considering both synthetic and real-world scenarios, as well as low-data scenarios. The results of our study demonstrate that some of the algorithms studied are able to effectively capture a wide range of useful and diverse ATE modes, while some tend to learn many low-probability modes which impacts the (unrelaxed) recall and precision.

Submitted 30 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: Peer-reviewed and Accepted to ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling

arXiv:2306.07024 [pdf, other]

DRCFS: Doubly Robust Causal Feature Selection

Authors: Francesco Quinzan, Ashkan Soleymani, Patrick Jaillet, Cristian R. Rojas, Stefan Bauer

Abstract: Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly non-linear and high-dimensional problems.

Submitted 5 July, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.15427 [pdf, other]

doi 10.1103/PhysRevResearch.7.013308

Societal self-regulation induces complex infection dynamics and chaos

Authors: Joel Wagner, Simon Bauer, Sebastian Contreras, Luk Fleddermann, Ulrich Parlitz, Viola Priesemann

Abstract: Classically, endemic infectious diseases are expected to display relatively stable, predictable infection dynamics. Accordingly, basic disease models such as the susceptible-infected-recovered-susceptible model display stable endemic states or recurrent seasonal waves. However, if the human population reacts to high infection numbers by mitigating the spread of the disease, then this delayed behavioral feedback loop can generate infection waves itself, driven by periodic mitigation and subsequent relaxation. We show that such behavioral reactions, together with a seasonal effect of comparable impact, can cause complex and unpredictable infection dynamics, including Arnold tongues, coexisting attractors, and chaos. Importantly, these arise in epidemiologically relevant parameter regions where the costs associated with infections and mitigation are jointly minimized. By comparing our model to data, we find signs that COVID-19 was mitigated in a way that favored complex infection dynamics. Our results challenge the intuition that endemic disease dynamics necessarily implies predictability and seasonal waves and show the emergence of complex infection dynamics when humans optimize their reaction to increasing infection numbers.

Submitted 6 April, 2025; v1 submitted 18 May, 2023; originally announced May 2023.

Journal ref: Phys. Rev. Research 7, 013308 (2025)

arXiv:2305.02172 [pdf, other]

How charges separate when surfaces are dewetted

Authors: Aaron D. Ratschow, Lisa S. Bauer, Pravash Bista, Stefan A. L. Weber, Hans-Jürgen Butt, Steffen Hardt

Abstract: Charge separation at moving three-phase contact lines is observed in nature as well as technological processes. Despite the growing number of experimental investigations in recent years, the physical mechanism behind the charging remains obscure. Here we identify the origin of charge separation as the dewetting of the bound surface charge within the electric double layer by the receding contact line. This charge depends strongly on the local electric double layer structure close to the contact line, which is affected by the gas-liquid interface and the internal flow of the liquid. We summarize the charge separation mechanism in an analytical model that captures parametric dependencies in agreement with our experiments and numerical simulations. Charge separation increases with increasing contact angle and decreases with increasing dewetting velocity. Our findings reveal the universal mechanism of charge separation at receding contact lines, relevant to many dynamic wetting scenarios, and provide a theoretical foundation for both fundamental questions, like contact angle hysteresis, and practical applications.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2304.05524 [pdf, other]

Understanding Causality with Large Language Models: Feasibility and Opportunities

Authors: Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan

Abstract: We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses against three types of causal question. We believe that current LLMs can answer causal questions with existing causal knowledge as combined domain experts. However, they are not yet able to provide satisfactory answers for discovering new knowledge or for high-stakes decision-making tasks with high precision. We discuss possible future directions and opportunities, such as enabling explicit and implicit causal modules as well as deep causal-aware LLMs. These will not only enable LLMs to answer many different types of causal questions for greater impact but also enable LLMs to be more trustworthy and efficient in general.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2302.10607 [pdf, other]

Differentiable Multi-Target Causal Bayesian Experimental Design

Authors: Yashas Annadani, Panagiotis Tigas, Desi R. Ivanova, Andrew Jesson, Yarin Gal, Adam Foster, Stefan Bauer

Abstract: We introduce a gradient-based approach for the problem of Bayesian optimal experimental design to learn causal models in a batch setting -- a critical component for causal discovery from finite data where interventions can be costly or risky. Existing methods rely on greedy approximations to construct a batch of experiments while using black-box methods to optimize over a single target-state pair to intervene with. In this work, we completely dispose of the black-box optimization techniques and greedy heuristics and instead propose a conceptually simple end-to-end gradient-based optimization procedure to acquire a set of optimal intervention target-state pairs. Such a procedure enables parameterization of the design space to efficiently optimize over a batch of multi-target-state interventions, a setting which has hitherto not been explored due to its complexity. We demonstrate that our proposed method outperforms baselines and existing acquisition strategies in both single-target and multi-target settings across a number of synthetic datasets.

Submitted 2 June, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: Camera-ready version ICML 2023

arXiv:2211.13715 [pdf, other]

Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery

Authors: Mateusz Olko, Michał Zając, Aleksandra Nowak, Nino Scherrer, Yashas Annadani, Stefan Bauer, Łukasz Kuciński, Piotr Miłoś

Abstract: Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system's causal structure uniquely. While conducting interventions (i.e., experiments) can improve the identifiability, such samples are usually challenging and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function. We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.

Submitted 3 April, 2024; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: Accepted to 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2211.03846 [pdf, other]

Federated Causal Discovery From Interventions

Authors: Amin Abyaneh, Nino Scherrer, Patrick Schwab, Stefan Bauer, Bernhard Schölkopf, Arash Mehrjou

Abstract: Causal discovery serves a pivotal role in mitigating model uncertainty through recovering the underlying causal mechanisms among variables. In many practical domains, such as healthcare, access to the data gathered by individual entities is limited, primarily for privacy and regulatory constraints. However, the majority of existing causal discovery methods require the data to be available in a centralized location. In response, researchers have introduced federated causal discovery. While previous federated methods consider distributed observational data, the integration of interventional data remains largely unexplored. We propose FedCDI, a federated framework for inferring causal structures from distributed data containing interventional samples. In line with the federated learning framework, FedCDI improves privacy by exchanging belief updates rather than raw samples. Additionally, it introduces a novel intervention-aware method for aggregating individual updates. We analyze scenarios with shared or disjoint intervened covariates, and mitigate the adverse effects of interventional data heterogeneity. The performance and scalability of FedCDI are rigorously tested across a variety of synthetic and real-world graphs.

Submitted 11 February, 2024; v1 submitted 7 November, 2022; originally announced November 2022.

arXiv:2210.13774 [pdf, other]

From Points to Functions: Infinite-dimensional Representations in Diffusion Models

Authors: Sarthak Mittal, Guillaume Lajoie, Stefan Bauer, Arash Mehrjou

Abstract: Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution as opposed to Generative Adversarial Networks (GANs) or the decoder of Variational Autoencoders (VAEs) which produce samples from the target distribution in a single step. Thus, in diffusion models every sample is naturally connected to a random trajectory which is a solution to a learned stochastic differential equation (SDE). Generative models are only concerned with the final state of this trajectory that delivers samples from the desired distribution. Abstreiter et al. showed that these stochastic trajectories can be seen as continuous filters that wash out information along the way. Consequently, it is reasonable to ask if there is an intermediate time step at which the preserved information is optimal for a given downstream task. In this work, we show that a combination of information content from different time steps gives a strictly better representation for the downstream task. We introduce attention- and recurrence-based modules that "learn to mix" information content of various time steps such that the resultant representation leads to superior performance in downstream tasks.

Submitted 25 October, 2022; originally announced October 2022.

arXiv:2210.13583 [pdf, other]

Learning Latent Structural Causal Models

Authors: Jithendaraa Subramanian, Yashas Annadani, Ivaxi Sheth, Nan Rosemary Ke, Tristan Deleu, Stefan Bauer, Derek Nowrouzezahrai, Samira Ebrahimi Kahou

Abstract: Causal learning has long concerned itself with the accurate recovery of underlying causal mechanisms. Such causal modelling enables better explanations of out-of-distribution data. Prior works on causal learning assume that the high-level causal variables are given. However, in machine learning tasks, one often operates on low-level data like image pixels or high-dimensional vectors. In such settings, the entire Structural Causal Model (SCM) -- structure, parameters, and high-level causal variables -- is unobserved and needs to be learnt from low-level data. We treat this problem as Bayesian inference of the latent SCM, given low-level data. For linear Gaussian additive noise SCMs, we present a tractable approximate inference method which performs joint inference over the causal variables, structure and parameters of the latent SCM from random, known interventions. Experiments are performed on synthetic datasets and a causally generated image dataset to demonstrate the efficacy of our approach. We also perform image generation from unseen interventions, thereby verifying out of distribution generalization for the proposed causal model.

Submitted 24 October, 2022; originally announced October 2022.

Comments: 21 pages, 19 figures

arXiv:2210.09511 [pdf, other]

Generalised Gillespie Algorithms for Simulations in a Rule-Based Epidemiological Model Framework

Authors: David Alonso, Steffen Bauer, Markus Kirkilionis, Lisa Maria Kreusser, Luca Sbano

Abstract: Rule-based models have been successfully used to represent different aspects of the COVID-19 pandemic, including age, testing, hospitalisation, lockdowns, immunity, infectivity, behaviour, mobility and vaccination of individuals. These rule-based approaches are motivated by chemical reaction rules which are traditionally solved numerically with the standard Gillespie algorithm proposed in the context of molecular dynamics. When applying reaction system type of approaches to epidemiology, generalisations of the Gillespie algorithm are required due to the time-dependency of the problems. In this article, we present different generalisations of the standard Gillespie algorithm which address discrete subtypes (e.g., incorporating the age structure of the population), time-discrete updates (e.g., incorporating daily imposed change of rates for lockdowns) and deterministic delays (e.g., given waiting time until a specific change in types such as release from isolation occurs). These algorithms are complemented by relevant examples in the context of the COVID-19 pandemic and numerical results.

Submitted 24 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

MSC Class: 92B05; 60G07; ACM Class: J.3; G.3
