Search | arXiv e-print repository

A Comprehensive Benchmark for RNA 3D Structure-Function Modeling

Authors: Luis Wyss, Vincent Mallet, Wissam Karroucha, Karsten Borgwardt, Carlos Oliver

Abstract: The relationship between RNA structure and function has recently attracted interest within the deep learning community, a trend expected to intensify as nucleic acid structure models advance. Despite this momentum, a lack of standardized, accessible benchmarks for applying deep learning to RNA 3D structures hinders progress. To this end, we introduce a collection of seven benchmarking datasets spe… ▽ More The relationship between RNA structure and function has recently attracted interest within the deep learning community, a trend expected to intensify as nucleic acid structure models advance. Despite this momentum, a lack of standardized, accessible benchmarks for applying deep learning to RNA 3D structures hinders progress. To this end, we introduce a collection of seven benchmarking datasets specifically designed to support RNA structure-function prediction. Built on top of the established Python package rnaglib, our library streamlines data distribution and encoding, provides tools for dataset splitting and evaluation, and offers a comprehensive, user-friendly environment for model comparison. The modular and reproducible design of our datasets encourages community contributions and enables rapid customization. To demonstrate the utility of our benchmarks, we report baseline results for all tasks using a relational graph neural network. △ Less

Submitted 20 May, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

arXiv:2502.02216 [pdf, ps, other]

Flatten Graphs as Sequences: Transformers are Scalable Graph Generators

Authors: Dexiong Chen, Markus Krimmel, Karsten Borgwardt

Abstract: We introduce AutoGraph, a scalable autoregressive model for attributed graph generation using decoder-only transformers. By flattening graphs into random sequences of tokens through a reversible process, AutoGraph enables modeling graphs as sequences without relying on additional node features that are expensive to compute, in contrast to diffusion-based approaches. This results in sampling comple… ▽ More We introduce AutoGraph, a scalable autoregressive model for attributed graph generation using decoder-only transformers. By flattening graphs into random sequences of tokens through a reversible process, AutoGraph enables modeling graphs as sequences without relying on additional node features that are expensive to compute, in contrast to diffusion-based approaches. This results in sampling complexity and sequence lengths that scale optimally linearly with the number of edges, making it scalable and efficient for large, sparse graphs. A key success factor of AutoGraph is that its sequence prefixes represent induced subgraphs, creating a direct link to sub-sentences in language modeling. Empirically, AutoGraph achieves state-of-the-art performance on synthetic and molecular benchmarks, with up to 100x faster generation and 3x faster training than leading diffusion models. It also supports substructure-conditioned generation without fine-tuning and shows promising transferability, bridging language modeling and graph generation to lay the groundwork for graph foundation models. Our code is available at https://github.com/BorgwardtLab/AutoGraph. △ Less

Submitted 19 September, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

Comments: To appear at NeurIPS 2025

arXiv:2406.03386 [pdf, other]

Learning Long Range Dependencies on Graphs via Random Walks

Authors: Dexiong Chen, Till Hendrik Schulz, Karsten Borgwardt

Abstract: Message-passing graph neural networks (GNNs) excel at capturing local relationships but struggle with long-range dependencies in graphs. In contrast, graph transformers (GTs) enable global information exchange but often oversimplify the graph structure by representing graphs as sets of fixed-length vectors. This work introduces a novel architecture that overcomes the shortcomings of both approache… ▽ More Message-passing graph neural networks (GNNs) excel at capturing local relationships but struggle with long-range dependencies in graphs. In contrast, graph transformers (GTs) enable global information exchange but often oversimplify the graph structure by representing graphs as sets of fixed-length vectors. This work introduces a novel architecture that overcomes the shortcomings of both approaches by combining the long-range information of random walks with local message passing. By treating random walks as sequences, our architecture leverages recent advances in sequence models to effectively capture long-range dependencies within these walks. Based on this concept, we propose a framework that offers (1) more expressive graph representations through random walk sequences, (2) the ability to utilize any sequence model for capturing long-range dependencies, and (3) the flexibility by integrating various GNN and GT architectures. Our experimental evaluations demonstrate that our approach achieves significant performance improvements on 19 graph and node benchmark datasets, notably outperforming existing methods by up to 13\% on the PascalVoc-SP and COCO-SP datasets. The code is available at https://github.com/BorgwardtLab/NeuralWalker. △ Less

Submitted 7 October, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

arXiv:2305.07580 [pdf, other]

Fisher Information Embedding for Node and Graph Learning

Authors: Dexiong Chen, Paolo Pellizzoni, Karsten Borgwardt

Abstract: Attention-based graph neural networks (GNNs), such as graph attention networks (GATs), have become popular neural architectures for processing graph-structured data and learning node embeddings. Despite their empirical success, these models rely on labeled data and the theoretical properties of these models have yet to be fully understood. In this work, we propose a novel attention-based node embe… ▽ More Attention-based graph neural networks (GNNs), such as graph attention networks (GATs), have become popular neural architectures for processing graph-structured data and learning node embeddings. Despite their empirical success, these models rely on labeled data and the theoretical properties of these models have yet to be fully understood. In this work, we propose a novel attention-based node embedding framework for graphs. Our framework builds upon a hierarchical kernel for multisets of subgraphs around nodes (e.g. neighborhoods) and each kernel leverages the geometry of a smooth statistical manifold to compare pairs of multisets, by "projecting" the multisets onto the manifold. By explicitly computing node embeddings with a manifold of Gaussian mixtures, our method leads to a new attention mechanism for neighborhood aggregation. We provide theoretical insights into generalizability and expressivity of our embeddings, contributing to a deeper understanding of attention-based GNNs. We propose both efficient unsupervised and supervised methods for learning the embeddings. Through experiments on several node classification benchmarks, we demonstrate that our proposed method outperforms existing attention-based graph models like GATs. Our code is available at https://github.com/BorgwardtLab/fisher_information_embedding. △ Less

Submitted 6 June, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: ICML 2023

arXiv:2207.02968 [pdf, other]

Unsupervised Manifold Alignment with Joint Multidimensional Scaling

Authors: Dexiong Chen, Bowen Fan, Carlos Oliver, Karsten Borgwardt

Abstract: We introduce Joint Multidimensional Scaling, a novel approach for unsupervised manifold alignment, which maps datasets from two different domains, without any known correspondences between data instances across the datasets, to a common low-dimensional Euclidean space. Our approach integrates Multidimensional Scaling (MDS) and Wasserstein Procrustes analysis into a joint optimization problem to si… ▽ More We introduce Joint Multidimensional Scaling, a novel approach for unsupervised manifold alignment, which maps datasets from two different domains, without any known correspondences between data instances across the datasets, to a common low-dimensional Euclidean space. Our approach integrates Multidimensional Scaling (MDS) and Wasserstein Procrustes analysis into a joint optimization problem to simultaneously generate isometric embeddings of data and learn correspondences between instances from two different datasets, while only requiring intra-dataset pairwise dissimilarities as input. This unique characteristic makes our approach applicable to datasets without access to the input features, such as solving the inexact graph matching problem. We propose an alternating optimization scheme to solve the problem that can fully benefit from the optimization techniques for MDS and Wasserstein Procrustes. We demonstrate the effectiveness of our approach in several applications, including joint visualization of two datasets, unsupervised heterogeneous domain adaptation, graph matching, and protein structure alignment. The implementation of our work is available at https://github.com/BorgwardtLab/JointMDS △ Less

Submitted 16 February, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: ICLR 2023, see https://openreview.net/forum?id=lUpjsrKItz4

arXiv:2206.01008 [pdf, other]

Approximate Network Motif Mining Via Graph Learning

Authors: Carlos Oliver, Dexiong Chen, Vincent Mallet, Pericles Philippopoulos, Karsten Borgwardt

Abstract: Frequent and structurally related subgraphs, also known as network motifs, are valuable features of many graph datasets. However, the high computational complexity of identifying motif sets in arbitrary datasets (motif mining) has limited their use in many real-world datasets. By automatically leveraging statistical properties of datasets, machine learning approaches have shown promise in several… ▽ More Frequent and structurally related subgraphs, also known as network motifs, are valuable features of many graph datasets. However, the high computational complexity of identifying motif sets in arbitrary datasets (motif mining) has limited their use in many real-world datasets. By automatically leveraging statistical properties of datasets, machine learning approaches have shown promise in several tasks with combinatorial complexity and are therefore a promising candidate for network motif mining. In this work we seek to facilitate the development of machine learning approaches aimed at motif mining. We propose a formulation of the motif mining problem as a node labelling task. In addition, we build benchmark datasets and evaluation metrics which test the ability of models to capture different aspects of motif discovery such as motif number, size, topology, and scarcity. Next, we propose MotiFiesta, a first attempt at solving this problem in a fully differentiable manner with promising results on challenging baselines. Finally, we demonstrate through MotiFiesta that this learning setting can be applied simultaneously to general-purpose data mining and interpretable feature extraction for graph classification tasks. △ Less

Submitted 7 June, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

arXiv:2202.03036 [pdf, other]

Structure-Aware Transformer for Graph Representation Learning

Authors: Dexiong Chen, Leslie O'Bray, Karsten Borgwardt

Abstract: The Transformer architecture has gained growing attention in graph representation learning recently, as it naturally overcomes several limitations of graph neural networks (GNNs) by avoiding their strict structural inductive biases and instead only encoding the graph structure via positional encoding. Here, we show that the node representations generated by the Transformer with positional encoding… ▽ More The Transformer architecture has gained growing attention in graph representation learning recently, as it naturally overcomes several limitations of graph neural networks (GNNs) by avoiding their strict structural inductive biases and instead only encoding the graph structure via positional encoding. Here, we show that the node representations generated by the Transformer with positional encoding do not necessarily capture structural similarity between them. To address this issue, we propose the Structure-Aware Transformer, a class of simple and flexible graph Transformers built upon a new self-attention mechanism. This new self-attention incorporates structural information into the original self-attention by extracting a subgraph representation rooted at each node before computing the attention. We propose several methods for automatically generating the subgraph representation and show theoretically that the resulting representations are at least as expressive as the subgraph representations. Empirically, our method achieves state-of-the-art performance on five graph prediction benchmarks. Our structure-aware framework can leverage any existing GNN to extract the subgraph representation, and we show that it systematically improves performance relative to the base GNN model, successfully combining the advantages of GNNs and Transformers. Our code is available at https://github.com/BorgwardtLab/SAT. △ Less

Submitted 13 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: To appear in ICML 2022

arXiv:2112.09992 [pdf, other]

Weisfeiler and Leman go Machine Learning: The Story so far

Authors: Christopher Morris, Yaron Lipman, Haggai Maron, Bastian Rieck, Nils M. Kriege, Martin Grohe, Matthias Fey, Karsten Borgwardt

Abstract: In recent years, algorithms and neural architectures based on the Weisfeiler--Leman algorithm, a well-known heuristic for the graph isomorphism problem, have emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine-learning setting, focusing on the supervised regime. We discuss the theoretical backgr… ▽ More In recent years, algorithms and neural architectures based on the Weisfeiler--Leman algorithm, a well-known heuristic for the graph isomorphism problem, have emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine-learning setting, focusing on the supervised regime. We discuss the theoretical background, show how to use it for supervised graph and node representation learning, discuss recent extensions, and outline the algorithm's connection to (permutation-)equivariant neural architectures. Moreover, we give an overview of current applications and future directions to stimulate further research. △ Less

Submitted 13 July, 2023; v1 submitted 18 December, 2021; originally announced December 2021.

Comments: Accepted at JMLR

arXiv:2106.01098 [pdf, other]

Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions

Authors: Leslie O'Bray, Max Horn, Bastian Rieck, Karsten Borgwardt

Abstract: Graph generative models are a highly active branch of machine learning. Given the steady development of new models of ever-increasing complexity, it is necessary to provide a principled way to evaluate and compare them. In this paper, we enumerate the desirable criteria for such a comparison metric and provide an overview of the status quo of graph generative model comparison in use today, which p… ▽ More Graph generative models are a highly active branch of machine learning. Given the steady development of new models of ever-increasing complexity, it is necessary to provide a principled way to evaluate and compare them. In this paper, we enumerate the desirable criteria for such a comparison metric and provide an overview of the status quo of graph generative model comparison in use today, which predominantly relies on the maximum mean discrepancy (MMD). We perform a systematic evaluation of MMD in the context of graph generative model comparison, highlighting some of the challenges and pitfalls researchers inadvertently may encounter. After conducting a thorough analysis of the behaviour of MMD on synthetically-generated perturbed graphs as well as on recently-proposed graph generative models, we are able to provide a suitable procedure to mitigate these challenges and pitfalls. We aggregate our findings into a list of practical recommendations for researchers to use when evaluating graph generative models. △ Less

Submitted 18 March, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: Accepted as a Spotlight presentation at ICLR 2022

arXiv:2102.07835 [pdf, other]

Topological Graph Neural Networks

Authors: Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt

Abstract: Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive (in terms the Weisfeiler--Lehm… ▽ More Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive (in terms the Weisfeiler--Lehman graph isomorphism test) than message-passing GNNs. Augmenting GNNs with TOGL leads to improved predictive performance for graph and node classification tasks, both on synthetic data sets, which can be classified by humans using their topology but not by ordinary GNNs, and on real-world data. △ Less

Submitted 17 March, 2022; v1 submitted 15 February, 2021; originally announced February 2021.

Journal ref: Tenth International Conference on Learning Representations (ICLR), 2022

arXiv:2011.03854 [pdf, other]

doi 10.1561/2200000076

Graph Kernels: State-of-the-Art and Future Challenges

Authors: Karsten Borgwardt, Elisabetta Ghisu, Felipe Llinares-López, Leslie O'Bray, Bastian Rieck

Abstract: Graph-structured data are an integral part of many application domains, including chemoinformatics, computational biology, neuroimaging, and social network analysis. Over the last two decades, numerous graph kernels, i.e. kernel functions between graphs, have been proposed to solve the problem of assessing the similarity between graphs, thereby making it possible to perform predictions in both cla… ▽ More Graph-structured data are an integral part of many application domains, including chemoinformatics, computational biology, neuroimaging, and social network analysis. Over the last two decades, numerous graph kernels, i.e. kernel functions between graphs, have been proposed to solve the problem of assessing the similarity between graphs, thereby making it possible to perform predictions in both classification and regression settings. This manuscript provides a review of existing graph kernels, their applications, software plus data resources, and an empirical comparison of state-of-the-art graph kernels. △ Less

Submitted 10 November, 2020; v1 submitted 7 November, 2020; originally announced November 2020.

Comments: Accepted by Foundations and Trends in Machine Learning, 2020

arXiv:2006.07882 [pdf, other]

Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence

Authors: Bastian Rieck, Tristan Yates, Christian Bock, Karsten Borgwardt, Guy Wolf, Nicholas Turk-Browne, Smita Krishnaswamy

Abstract: Functional magnetic resonance imaging (fMRI) is a crucial technology for gaining insights into cognitive processes in humans. Data amassed from fMRI measurements result in volumetric data sets that vary over time. However, analysing such data presents a challenge due to the large degree of noise and person-to-person variation in how information is represented in the brain. To address this challeng… ▽ More Functional magnetic resonance imaging (fMRI) is a crucial technology for gaining insights into cognitive processes in humans. Data amassed from fMRI measurements result in volumetric data sets that vary over time. However, analysing such data presents a challenge due to the large degree of noise and person-to-person variation in how information is represented in the brain. To address this challenge, we present a novel topological approach that encodes each time point in an fMRI data set as a persistence diagram of topological features, i.e. high-dimensional voids present in the data. This representation naturally does not rely on voxel-by-voxel correspondence and is robust to noise. We show that these time-varying persistence diagrams can be clustered to find meaningful groupings between participants, and that they are also useful in studying within-subject brain state trajectories of subjects performing a particular task. Here, we apply both clustering and trajectory analysis techniques to a group of participants watching the movie 'Partly Cloudy'. We observe significant differences in both brain state trajectories and overall topological activity between adults and children watching the same movie. △ Less

Submitted 22 October, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: Accepted at the Conference on Neural Information Processing Systems (NeurIPS) 2020; camera-ready version

arXiv:2005.12359 [pdf, other]

Path Imputation Strategies for Signature Models of Irregular Time Series

Authors: Michael Moor, Max Horn, Christian Bock, Karsten Borgwardt, Bastian Rieck

Abstract: The signature transform is a 'universal nonlinearity' on the space of continuous vector-valued paths, and has received attention for use in machine learning on time series. However, real-world temporal data is typically observed at discrete points in time, and must first be transformed into a continuous path before signature techniques can be applied. We make this step explicit by characterising i… ▽ More The signature transform is a 'universal nonlinearity' on the space of continuous vector-valued paths, and has received attention for use in machine learning on time series. However, real-world temporal data is typically observed at discrete points in time, and must first be transformed into a continuous path before signature techniques can be applied. We make this step explicit by characterising it as an imputation problem, and empirically assess the impact of various imputation strategies when applying signature-based neural nets to irregular time series data. For one of these strategies, Gaussian process (GP) adapters, we propose an extension~(GP-PoM) that makes uncertainty information directly available to the subsequent classifier while at the same time preventing costly Monte-Carlo (MC) sampling. In our experiments, we find that the choice of imputation drastically affects shallow signature models, whereas deeper architectures are more robust. Next, we observe that uncertainty-aware predictions (based on GP-PoM or indicator imputations) are beneficial for predictive performance, even compared to the uncertainty-aware training of conventional GP adapters. In conclusion, we have demonstrated that the path construction is indeed crucial for signature models and that our proposed strategy leads to competitive performance in general, while improving robustness of signature models in particular. △ Less

Submitted 6 June, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

arXiv:1909.12064 [pdf, other]

Set Functions for Time Series

Authors: Max Horn, Michael Moor, Christian Bock, Bastian Rieck, Karsten Borgwardt

Abstract: Despite the eminent successes of deep neural networks, many architectures are often hard to transfer to irregularly-sampled and asynchronous time series that commonly occur in real-world datasets, especially in healthcare applications. This paper proposes a novel approach for classifying irregularly-sampled time series with unaligned measurements, focusing on high scalability and data efficiency.… ▽ More Despite the eminent successes of deep neural networks, many architectures are often hard to transfer to irregularly-sampled and asynchronous time series that commonly occur in real-world datasets, especially in healthcare applications. This paper proposes a novel approach for classifying irregularly-sampled time series with unaligned measurements, focusing on high scalability and data efficiency. Our method SeFT (Set Functions for Time Series) is based on recent advances in differentiable set function learning, extremely parallelizable with a beneficial memory footprint, thus scaling well to large datasets of long time series and online monitoring scenarios. Furthermore, our approach permits quantifying per-observation contributions to the classification outcome. We extensively compare our method with existing algorithms on multiple healthcare time series datasets and demonstrate that it performs competitively whilst significantly reducing runtime. △ Less

Submitted 14 September, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: Accepted at the International Conference on Machine Learning (ICML) 2020

arXiv:1909.05114 [pdf, other]

doi 10.1007/978-3-030-45257-5_18

PaccMann$^{RL}$: Designing anticancer drugs from transcriptomic data via reinforcement learning

Authors: Jannis Born, Matteo Manica, Ali Oskooei, Joris Cadow, Karsten Borgwardt, María Rodríguez Martínez

Abstract: With the advent of deep generative models in computational chemistry, in silico anticancer drug design has undergone an unprecedented transformation. While state-of-the-art deep learning approaches have shown potential in generating compounds with desired chemical properties, they disregard the genetic profile and properties of the target disease. Here, we introduce the first generative model capa… ▽ More With the advent of deep generative models in computational chemistry, in silico anticancer drug design has undergone an unprecedented transformation. While state-of-the-art deep learning approaches have shown potential in generating compounds with desired chemical properties, they disregard the genetic profile and properties of the target disease. Here, we introduce the first generative model capable of tailoring anticancer compounds for a specific biomolecular profile. Using a RL framework, the transcriptomic profiles of cancer cells are used as a context for the generation of candidate molecules. Our molecule generator combines two separately pretrained variational autoencoders (VAEs) - the first VAE encodes transcriptomic profiles into a smooth, latent space which in turn is used to condition a second VAE to generate novel molecular structures on the given transcriptomic profile. The generative process is optimized through PaccMann, a previously developed drug sensitivity prediction model to obtain effective anticancer compounds for the given context (i.e., transcriptomic profile). We demonstrate how the molecule generation can be biased towards compounds with high predicted inhibitory effect against individual cell lines or specific cancer sites. We verify our approach by investigating candidate drugs generated against specific cancer types and find the highest structural similarity to existing compounds with known efficacy against these cancer types. We envision our approach to transform in silico anticancer drug design by leveraging the biomolecular characteristics of the disease in order to increase success rates in lead compound discovery. △ Less

Submitted 16 April, 2020; v1 submitted 29 August, 2019; originally announced September 2019.

Comments: 18 pages total (12 pages main text, 4 pages references, 11 pages appendix) 8 figures

Journal ref: International Conference on Research in Computational Molecular Biology 2020

arXiv:1906.01277 [pdf, other]

Wasserstein Weisfeiler-Lehman Graph Kernels

Authors: Matteo Togninalli, Elisabetta Ghisu, Felipe Llinares-López, Bastian Rieck, Karsten Borgwardt

Abstract: Most graph kernels are an instance of the class of $\mathcal{R}$-Convolution kernels, which measure the similarity of objects by comparing their substructures. Despite their empirical success, most graph kernels use a naive aggregation of the final set of substructures, usually a sum or average, thereby potentially discarding valuable information about the distribution of individual components. Fu… ▽ More Most graph kernels are an instance of the class of $\mathcal{R}$-Convolution kernels, which measure the similarity of objects by comparing their substructures. Despite their empirical success, most graph kernels use a naive aggregation of the final set of substructures, usually a sum or average, thereby potentially discarding valuable information about the distribution of individual components. Furthermore, only a limited instance of these approaches can be extended to continuously attributed graphs. We propose a novel method that relies on the Wasserstein distance between the node feature vector distributions of two graphs, which allows to find subtler differences in data sets by considering graphs as high-dimensional objects, rather than simple means. We further propose a Weisfeiler-Lehman inspired embedding scheme for graphs with continuous node attributes and weighted edges, enhance it with the computed Wasserstein distance, and thus improve the state-of-the-art prediction performance on several graph classification tasks. △ Less

Submitted 30 October, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: Accepted as a Spotlight talk at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

arXiv:1906.00722 [pdf, other]

Topological Autoencoders

Authors: Michael Moor, Max Horn, Bastian Rieck, Karsten Borgwardt

Abstract: We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such tha… ▽ More We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors. △ Less

Submitted 31 May, 2021; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: Accepted at the International Conference on Machine Learning (ICML) 2020; camera-ready version

arXiv:1904.07990 [pdf]

Machine learning for early prediction of circulatory failure in the intensive care unit

Authors: Stephanie L. Hyland, Martin Faltys, Matthias Hüser, Xinrui Lyu, Thomas Gumbsch, Cristóbal Esteban, Christian Bock, Max Horn, Michael Moor, Bastian Rieck, Marc Zimmermann, Dean Bodenham, Karsten Borgwardt, Gunnar Rätsch, Tobias M. Merz

Abstract: Intensive care clinicians are presented with large quantities of patient information and measurements from a multitude of monitoring systems. The limited ability of humans to process such complex information hinders physicians to readily recognize and act on early signs of patient deterioration. We used machine learning to develop an early warning system for circulatory failure based on a high-res… ▽ More Intensive care clinicians are presented with large quantities of patient information and measurements from a multitude of monitoring systems. The limited ability of humans to process such complex information hinders physicians to readily recognize and act on early signs of patient deterioration. We used machine learning to develop an early warning system for circulatory failure based on a high-resolution ICU database with 240 patient years of data. This automatic system predicts 90.0% of circulatory failure events (prevalence 3.1%), with 81.8% identified more than two hours in advance, resulting in an area under the receiver operating characteristic curve of 94.0% and area under the precision-recall curve of 63.0%. The model was externally validated in a large independent patient cohort. △ Less

Submitted 19 April, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

Comments: 5 main figures, 1 main table, 13 supplementary figures, 5 supplementary tables; 250ppi images

arXiv:1902.01659 [pdf, other]

Early Recognition of Sepsis with Gaussian Process Temporal Convolutional Networks and Dynamic Time Warping

Authors: Michael Moor, Max Horn, Bastian Rieck, Damian Roqueiro, Karsten Borgwardt

Abstract: Sepsis is a life-threatening host response to infection associated with high mortality, morbidity, and health costs. Its management is highly time-sensitive since each hour of delayed treatment increases mortality due to irreversible organ damage. Meanwhile, despite decades of clinical research, robust biomarkers for sepsis are missing. Therefore, detecting sepsis early by utilizing the affluence… ▽ More Sepsis is a life-threatening host response to infection associated with high mortality, morbidity, and health costs. Its management is highly time-sensitive since each hour of delayed treatment increases mortality due to irreversible organ damage. Meanwhile, despite decades of clinical research, robust biomarkers for sepsis are missing. Therefore, detecting sepsis early by utilizing the affluence of high-resolution intensive care records has become a challenging machine learning problem. Recent advances in deep learning and data mining promise to deliver a powerful set of tools to efficiently address this task. This empirical study proposes two novel approaches for the early detection of sepsis: a deep learning model and a lazy learner based on time series distances. Our deep learning model employs a temporal convolutional network that is embedded in a Multi-task Gaussian Process Adapter framework, making it directly applicable to irregularly-spaced time series data. Our lazy learner, by contrast, is an ensemble approach that employs dynamic time warping. We frame the timely detection of sepsis as a supervised time series classification task. For this, we derive the most recent sepsis definition in an hourly resolution to provide the first fully accessible early sepsis detection environment. Seven hours before sepsis onset, our methods improve area under the precision--recall curve from 0.25 to 0.35/0.40 over the state of the art. This demonstrates that they are well-suited for detecting sepsis in the crucial earlier stages when management is most effective. △ Less

Submitted 15 October, 2020; v1 submitted 5 February, 2019; originally announced February 2019.

Comments: Accepted at the Machine Learning for Healthcare 2019 Conference (MLHC). Camera-ready version

arXiv:1812.09764 [pdf, other]

doi 10.3929/ethz-b-000327207

Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology

Authors: Bastian Rieck, Matteo Togninalli, Christian Bock, Michael Moor, Max Horn, Thomas Gumbsch, Karsten Borgwardt

Abstract: While many approaches to make neural networks more fathomable have been proposed, they are restricted to interrogating the network with input data. Measures for characterizing and monitoring structural properties, however, have not been developed. In this work, we propose neural persistence, a complexity measure for neural network architectures based on topological data analysis on weighted strati… ▽ More While many approaches to make neural networks more fathomable have been proposed, they are restricted to interrogating the network with input data. Measures for characterizing and monitoring structural properties, however, have not been developed. In this work, we propose neural persistence, a complexity measure for neural network architectures based on topological data analysis on weighted stratified graphs. To demonstrate the usefulness of our approach, we show that neural persistence reflects best practices developed in the deep learning community such as dropout and batch normalization. Moreover, we derive a neural persistence-based stopping criterion that shortens the training process while achieving comparable accuracies as early stopping based on validation loss. △ Less

Submitted 27 September, 2019; v1 submitted 23 December, 2018; originally announced December 2018.

Comments: Published as a conference paper at ICLR 2019

arXiv:1702.08694 [pdf, other]

Finding Statistically Significant Interactions between Continuous Features

Authors: Mahito Sugiyama, Karsten Borgwardt

Abstract: The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as Genetics or Healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging in terms of computational efficiency and proper correction for multiple testing. While recent progress has been made regar… ▽ More The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as Genetics or Healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging in terms of computational efficiency and proper correction for multiple testing. While recent progress has been made regarding this challenge for binary features, we here present the first solution for continuous features. We propose an algorithm which overcomes the combinatorial explosion of the search space of higher-order interactions by deriving a lower bound on the p-value for each interaction, which enables us to massively prune interactions that can never reach significance and to thereby gain more statistical power. In our experiments, our approach efficiently detects all significant interactions in a variety of synthetic and real-world datasets. △ Less

Submitted 10 May, 2019; v1 submitted 28 February, 2017; originally announced February 2017.

Comments: 13 pages, 5 figures, 2 tables, accepted to the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)

arXiv:1508.05803 [pdf, other]

Searching for significant patterns in stratified data

Authors: Felipe Llinares-Lopez, Laetitia Papaxanthos, Dean Bodenham, Karsten Borgwardt

Abstract: Significant pattern mining, the problem of finding itemsets that are significantly enriched in one class of objects, is statistically challenging, as the large space of candidate patterns leads to an enormous multiple testing problem. Recently, the concept of testability was proposed as one approach to correct for multiple testing in pattern mining while retaining statistical power. Still, these s… ▽ More Significant pattern mining, the problem of finding itemsets that are significantly enriched in one class of objects, is statistically challenging, as the large space of candidate patterns leads to an enormous multiple testing problem. Recently, the concept of testability was proposed as one approach to correct for multiple testing in pattern mining while retaining statistical power. Still, these strategies based on testability do not allow one to condition the test of significance on the observed covariates, which severely limits its utility in biomedical applications. Here we propose a strategy and an efficient algorithm to perform significant pattern mining in the presence of categorical covariates with K states. △ Less

Submitted 24 August, 2015; originally announced August 2015.

Comments: 18 pages, 6 figures

arXiv:1502.04315 [pdf, other]

Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing

Authors: Felipe Llinares López, Mahito Sugiyama, Laetitia Papaxanthos, Karsten M. Borgwardt

Abstract: We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, which are statistically significantly enriched in one of two classes. Our method corrects rigorously for multiple hypothesis testing and correlations between patterns through the Westfall-Young permutation procedure, which empirically estimates the null distribution of pattern frequencies in… ▽ More We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, which are statistically significantly enriched in one of two classes. Our method corrects rigorously for multiple hypothesis testing and correlations between patterns through the Westfall-Young permutation procedure, which empirically estimates the null distribution of pattern frequencies in each class via permutations. In our experiments, Westfall-Young light dramatically outperforms the current state-of-the-art approach in terms of both runtime and memory efficiency on popular real-world benchmark datasets for pattern mining. The key to this efficiency is that unlike all existing methods, our algorithm neither needs to solve the underlying frequent itemset mining problem anew for each permutation nor needs to store the occurrence list of all frequent patterns. Westfall-Young light opens the door to significant pattern mining on large datasets that previously led to prohibitive runtime or memory costs. △ Less

Submitted 15 February, 2015; originally announced February 2015.

arXiv:1407.1176 [pdf, other]

Identifying Higher-order Combinations of Binary Features

Authors: Felipe Llinares, Mahito Sugiyama, Karsten M. Borgwardt

Abstract: Finding statistically significant interactions between binary variables is computationally and statistically challenging in high-dimensional settings, due to the combinatorial explosion in the number of hypotheses. Terada et al. recently showed how to elegantly address this multiple testing problem by excluding non-testable hypotheses. Still, it remains unclear how their approach scales to large d… ▽ More Finding statistically significant interactions between binary variables is computationally and statistically challenging in high-dimensional settings, due to the combinatorial explosion in the number of hypotheses. Terada et al. recently showed how to elegantly address this multiple testing problem by excluding non-testable hypotheses. Still, it remains unclear how their approach scales to large datasets. We here proposed strategies to speed up the approach by Terada et al. and evaluate them thoroughly in 11 real-world benchmark datasets. We observe that one approach, incremental search with early stopping, is orders of magnitude faster than the current state-of-the-art approach. △ Less

Submitted 4 July, 2014; originally announced July 2014.

arXiv:1407.0316 [pdf, other]

Significant Subgraph Mining with Multiple Testing Correction

Authors: Mahito Sugiyama, Felipe Llinares López, Niklas Kasenburg, Karsten M. Borgwardt

Abstract: The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. It was shown to lead to greater statistical power, the discovery of more truly significant itemsets, than the sta… ▽ More The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. It was shown to lead to greater statistical power, the discovery of more truly significant itemsets, than the standard Bonferroni correction on real-world datasets. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power. △ Less

Submitted 30 January, 2015; v1 submitted 1 July, 2014; originally announced July 2014.

Comments: 18 pages, 5 figure, accepted to the 2015 SIAM International Conference on Data Mining (SDM15)

arXiv:1212.4788 [pdf, other]

easyGWAS: An integrated interspecies platform for performing genome-wide association studies

Authors: Dominik Grimm, Bastian Greshake, Stefan Kleeberger, Christoph Lippert, Oliver Stegle, Bernhard Schölkopf, Detlef Weigel, Karsten Borgwardt

Abstract: Motivation: The rapid growth in genome-wide association studies (GWAS) in plants and animals has brought about the need for a central resource that facilitates i) performing GWAS, ii) accessing data and results of other GWAS, and iii) enabling all users regardless of their background to exploit the latest statistical techniques without having to manage complex software and computing resources. R… ▽ More Motivation: The rapid growth in genome-wide association studies (GWAS) in plants and animals has brought about the need for a central resource that facilitates i) performing GWAS, ii) accessing data and results of other GWAS, and iii) enabling all users regardless of their background to exploit the latest statistical techniques without having to manage complex software and computing resources. Results: We present easyGWAS, a web platform that provides methods, tools and dynamic visualizations to perform and analyze GWAS. In addition, easyGWAS makes it simple to reproduce results of others, validate findings, and access larger sample sizes through merging of public datasets. Availability: Detailed method and data descriptions as well as tutorials are available in the supplementary materials. easyGWAS is available at http://easygwas.tuebingen.mpg.de/. Contact: [email protected] △ Less

Submitted 19 December, 2012; originally announced December 2012.

arXiv:1211.2315 [pdf, other]

Efficient network-guided multi-locus association mapping with graph cuts

Authors: Chloé-Agathe Azencott, Dominik Grimm, Mahito Sugiyama, Yoshinobu Kawahara, Karsten M. Borgwardt

Abstract: As an increasing number of genome-wide association studies reveal the limitations of attempting to explain phenotypic heritability by single genetic loci, there is growing interest for associating complex phenotypes with sets of genetic loci. While several methods for multi-locus mapping have been proposed, it is often unclear how to relate the detected loci to the growing knowledge about gene pat… ▽ More As an increasing number of genome-wide association studies reveal the limitations of attempting to explain phenotypic heritability by single genetic loci, there is growing interest for associating complex phenotypes with sets of genetic loci. While several methods for multi-locus mapping have been proposed, it is often unclear how to relate the detected loci to the growing knowledge about gene pathways and networks. The few methods that take biological pathways or networks into account are either restricted to investigating a limited number of predetermined sets of loci, or do not scale to genome-wide settings. We present SConES, a new efficient method to discover sets of genetic loci that are maximally associated with a phenotype, while being connected in an underlying network. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints that can be solved exactly and rapidly. SConES outperforms state-of-the-art competitors in terms of runtime, scales to hundreds of thousands of genetic loci, and exhibits higher power in detecting causal SNPs in simulation studies than existing methods. On flowering time phenotypes and genotypes from Arabidopsis thaliana, SConES detects loci that enable accurate phenotype prediction and that are supported by the literature. Matlab code for SConES is available at http://webdav.tuebingen.mpg.de/u/karsten/Forschung/scones/ △ Less

Submitted 18 April, 2013; v1 submitted 10 November, 2012; originally announced November 2012.

Comments: 20 pages, 6 figures, accepted at ISMB (International Conference on Intelligent Systems for Molecular Biology) 2013

arXiv:1205.6986 [pdf, ps, other]

LMM-Lasso: A Lasso Multi-Marker Mixed Model for Association Mapping with Population Structure Correction

Authors: Barbara Rakitsch, Christoph Lippert, Oliver Stegle, Karsten Borgwardt

Abstract: Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In simple cases, single polymorphic loci explain a significant fraction of the phenotype variability. However, many traits of interest appear to be subject to multifactorial control by groups of genetic loci instead. Accurate detection of such multivariate associations is nontrivial and of… ▽ More Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In simple cases, single polymorphic loci explain a significant fraction of the phenotype variability. However, many traits of interest appear to be subject to multifactorial control by groups of genetic loci instead. Accurate detection of such multivariate associations is nontrivial and often hindered by limited power. At the same time, confounding influences such as population structure cause spurious association signals that result in false positive findings if they are not accounted for in the model. Here, we propose LMM-Lasso, a mixed model that allows for both, multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters, effectively controls for population structure and scales to genome-wide datasets. We show practical use in genome-wide association studies and linkage mapping through retrospective analyses. In data from Arabidopsis thaliana and mouse, our method is able to find a genetic cause for significantly greater fractions of phenotype variation in 91% of the phenotypes considered. At the same time, our model dissects this variability into components that result from individual SNP effects and population structure. In addition to this increase of genetic heritability, enrichment of known candidate genes suggests that the associations retrieved by LMM-Lasso are more likely to be genuine. △ Less

Submitted 21 September, 2012; v1 submitted 30 May, 2012; originally announced May 2012.

Showing 1–28 of 28 results for author: Borgwardt, K