Search | arXiv e-print repository

doi 10.1038/s41524-024-01254-1

Uncertainty-biased molecular dynamics for learning uniformly accurate interatomic potentials

Authors: Viktor Zaverkin, David Holzmüller, Henrik Christiansen, Federico Errica, Francesco Alesiani, Makoto Takamoto, Mathias Niepert, Johannes Kästner

Abstract: Efficiently creating a concise but comprehensive data set for training machine-learned interatomic potentials (MLIPs) is an under-explored problem. Active learning, which uses biased or unbiased molecular dynamics (MD) to generate candidate pools, aims to address this objective. Existing biased and unbiased MD-simulation methods, however, are prone to miss either rare events or extrapolative regions -- areas of the configurational space where unreliable predictions are made. This work demonstrates that MD, when biased by the MLIP's energy uncertainty, simultaneously captures extrapolative regions and rare events, which is crucial for developing uniformly accurate MLIPs. Furthermore, exploiting automatic differentiation, we enhance bias-forces-driven MD with the concept of bias stress. We employ calibrated gradient-based uncertainties to yield MLIPs with similar or, sometimes, better accuracy than ensemble-based methods at a lower computational cost. Finally, we apply uncertainty-biased MD to alanine dipeptide and MIL-53(Al), generating MLIPs that represent both configurational spaces more accurately than models trained with conventional MD.

Submitted 2 November, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

Journal ref: npj Comput. Mater. 10, 83 (2024)

arXiv:2307.14151 [pdf, other]

Learning Disentangled Discrete Representations

Authors: David Friede, Christian Reimers, Heiner Stuckenschmidt, Mathias Niepert

Abstract: Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear. We explore the relationship between discrete latent spaces and disentangled representations by replacing the standard Gaussian variational autoencoder (VAE) with a tailored categorical variational autoencoder. We show that the underlying grid structure of categorical distributions mitigates the problem of rotational invariance associated with multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations. We provide both analytical and empirical findings that demonstrate the advantages of discrete VAEs for learning disentangled representations. Furthermore, we introduce the first unsupervised model selection strategy that favors disentangled representations.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2206.11168 [pdf, other]

Ordered Subgraph Aggregation Networks

Authors: Chendi Qian, Gaurav Rattan, Floris Geerts, Christopher Morris, Mathias Niepert

Abstract: Numerous subgraph-enhanced graph neural networks (GNNs) have emerged recently, provably boosting the expressive power of standard (message-passing) GNNs. However, there is a limited understanding of how these approaches relate to each other and to the Weisfeiler-Leman hierarchy. Moreover, current approaches either use all subgraphs of a given size, sample them uniformly at random, or use hand-crafted heuristics instead of learning to select subgraphs in a data-driven manner. Here, we offer a unified way to study such architectures by introducing a theoretical framework and extending the known expressivity results of subgraph-enhanced GNNs. Concretely, we show that increasing subgraph size always increases the expressive power and develop a better understanding of their limitations by relating them to the established $k\text{-}\mathsf{WL}$ hierarchy. In addition, we explore different approaches for learning to sample subgraphs using recent methods for backpropagating through complex discrete probability distributions. Empirically, we study the predictive performance of different subgraph-enhanced GNNs, showing that our data-driven architectures increase prediction accuracy on standard benchmark datasets compared to non-data-driven subgraph-enhanced graph neural networks while reducing computation time.

Submitted 15 October, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: Accepted at NeurIPS 2022. Fixed link to code repository

arXiv:2106.13642 [pdf, other]

VEGN: Variant Effect Prediction with Graph Neural Networks

Authors: Jun Cheng, Carolin Lawrence, Mathias Niepert

Abstract: Genetic mutations can cause disease by disrupting normal gene function. Identifying the disease-causing mutations from millions of genetic variants within an individual patient is a challenging problem. Computational methods which can prioritize disease-causing mutations have, therefore, enormous applications. It is well-known that genes function through a complex regulatory network. However, existing variant effect prediction models only consider a variant in isolation. In contrast, we propose VEGN, which models variant effect prediction using a graph neural network (GNN) that operates on a heterogeneous graph with genes and variants. The graph is created by assigning variants to genes and connecting genes with a gene-gene interaction network. In this context, we explore an approach where a gene-gene graph is given and another where VEGN learns the gene-gene graph and therefore operates both on given and learnt edges. The graph neural network is trained to aggregate information between genes, and between genes and variants. Variants can exchange information via the genes they connect to. This approach improves the performance of existing state-of-the-art models.

Submitted 25 June, 2021; originally announced June 2021.

Comments: Accepted at Workshop on Computational Biology, co-located with the 38th International Conference on Machine Learning

arXiv:2010.05516 [pdf, other]

Explaining Neural Matrix Factorization with Gradient Rollback

Authors: Carolin Lawrence, Timo Sztyler, Mathias Niepert

Abstract: Explaining the predictions of neural black-box models is an important problem, especially when such models are used in applications where user trust is crucial. Estimating the influence of training examples on a learned neural model's behavior allows us to identify training examples most responsible for a given prediction and, therefore, to faithfully explain the output of a black-box model. The most generally applicable existing method is based on influence functions, which scale poorly for larger sample sizes and models. We propose gradient rollback, a general approach for influence estimation, applicable to neural models where each parameter update step during gradient descent touches a smaller number of parameters, even if the overall number of parameters is large. Neural matrix factorization models trained with gradient descent are part of this model class. These models are popular and have found a wide range of applications in industry. Especially knowledge graph embedding methods, which belong to this class, are used extensively. We show that gradient rollback is highly efficient at both training and test time. Moreover, we show theoretically that the difference between gradient rollback's influence approximation and the true influence on a model's behavior is smaller than known bounds on the stability of stochastic gradient descent. This establishes that gradient rollback is robustly estimating example influence. We also conduct experiments which show that gradient rollback provides faithful explanations for knowledge base completion and recommender datasets.

Submitted 15 December, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: 35th AAAI Conference on Artificial Intelligence, 2021. Includes Appendix

arXiv:1908.05915 [pdf, other]

Attending to Future Tokens For Bidirectional Sequence Generation

Authors: Carolin Lawrence, Bhushan Kotnis, Mathias Niepert

Abstract: Neural sequence generation is typically performed token-by-token and left-to-right. Whenever a token is generated, only previously produced tokens are taken into consideration. In contrast, for problems such as sequence classification, bidirectional attention, which takes both past and future tokens into consideration, has been shown to perform much better. We propose to make the sequence generation process bidirectional by employing special placeholder tokens. Treated as a node in a fully connected graph, a placeholder token can take past and future tokens into consideration when generating the actual output token. We verify the effectiveness of our approach experimentally on two conversational tasks where the proposed bidirectional model outperforms competitive baselines by a large margin.

Submitted 17 September, 2019; v1 submitted 16 August, 2019; originally announced August 2019.

Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019, Hong Kong, China

arXiv:1903.11960 [pdf, other]

Learning Discrete Structures for Graph Neural Networks

Authors: Luca Franceschi, Mathias Niepert, Massimiliano Pontil, Xiao He

Abstract: Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.

Submitted 19 June, 2020; v1 submitted 28 March, 2019; originally announced March 2019.

Comments: ICML 2019, code at https://github.com/lucfra/LDS - Revision of Sec. 3

arXiv:1901.08817 [pdf, other]

State-Regularized Recurrent Neural Networks

Authors: Cheng Wang, Mathias Niepert

Abstract: Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, it is difficult to understand what exactly they learn. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) nonregular languages such as balanced parentheses, palindromes, and the copy task where external memory is required; and (3) real-world sequence learning tasks for sentiment analysis, visual object recognition, and language modeling. We show that state-regularization (a) simplifies the extraction of finite state automata modeling an RNN's state transition dynamics; (b) forces RNNs to operate more like automata with external memory and less like finite state machines; (c) makes RNNs have better interpretability and explainability.

Submitted 7 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

Comments: to appear at ICML2019, 20 pages

arXiv:1811.04752 [pdf, other]

Learning Representations of Missing Data for Predicting Patient Outcomes

Authors: Brandon Malone, Alberto Garcia-Duran, Mathias Niepert

Abstract: Extracting actionable insight from Electronic Health Records (EHRs) poses several challenges for traditional machine learning approaches. Patients are often missing data relative to each other; the data comes in a variety of modalities, such as multivariate time series, free text, and categorical demographic information; important relationships among patients can be difficult to detect; and many others. In this work, we propose a novel approach to address these first three challenges using a representation learning scheme based on message passing. We show that our proposed approach is competitive with or outperforms the state of the art for predicting in-hospital mortality (binary classification), the length of hospital visits (regression) and the discharge destination (multiclass classification).

Submitted 12 November, 2018; originally announced November 2018.

arXiv:1810.09227 [pdf, other]

Knowledge Graph Completion to Predict Polypharmacy Side Effects

Authors: Brandon Malone, Alberto García-Durán, Mathias Niepert

Abstract: The polypharmacy side effect prediction problem considers cases in which two drugs taken individually do not result in a particular side effect; however, when the two drugs are taken in combination, the side effect manifests. In this work, we demonstrate that multi-relational knowledge graph completion achieves state-of-the-art results on the polypharmacy side effect prediction problem. Empirical results show that our approach is particularly effective when the protein targets of the drugs are well-characterized. In contrast to prior work, our approach provides more interpretable predictions and hypotheses for wet lab validation.

Submitted 22 October, 2018; originally announced October 2018.

Comments: 13th International Conference on Data Integration in the Life Sciences (DILS2018)

arXiv:1808.06791 [pdf, other]

LRMM: Learning to Recommend with Missing Modalities

Authors: Cheng Wang, Mathias Niepert, Hui Li

Abstract: Multimodal learning has shown promising performance in content-based recommendation due to the auxiliary user and item information of multiple modalities such as text and images. However, the problem of incomplete and missing modalities is rarely explored, and most existing methods fail to learn a recommendation model with missing or corrupted modalities. In this paper, we propose LRMM, a novel framework that mitigates not only the problem of missing modalities but also more generally the cold-start problem of recommender systems. We propose modality dropout (m-drop) and a multimodal sequential autoencoder (m-auto) to learn multimodal representations for complementing and imputing missing modalities. Extensive experiments on real-world Amazon data show that LRMM achieves state-of-the-art performance on rating prediction tasks. More importantly, LRMM is more robust than previous methods in alleviating data-sparsity and the cold-start problem.

Submitted 30 August, 2018; v1 submitted 21 August, 2018; originally announced August 2018.

Comments: 11 pages, EMNLP 2018

arXiv:1806.11391 [pdf, other]

A Comparative Study of Distributional and Symbolic Paradigms for Relational Learning

Authors: Sebastijan Dumancic, Alberto Garcia-Duran, Mathias Niepert

Abstract: Many real-world domains can be expressed as graphs and, more generally, as multi-relational knowledge graphs. Though reasoning and learning with knowledge graphs have traditionally been addressed by symbolic approaches, recent methods in (deep) representation learning have shown promising results for specialized tasks such as knowledge base completion. These approaches abandon the traditional symbolic paradigm by replacing symbols with vectors in Euclidean space. With few exceptions, symbolic and distributional approaches are explored in different communities and little is known about their respective strengths and weaknesses. In this work, we compare representation learning and relational learning on various relational classification and clustering tasks and analyse the complexity of the rules used implicitly by these approaches. Preliminary results reveal possible indicators that could help in choosing one approach over the other for particular knowledge graphs.

Submitted 24 March, 2020; v1 submitted 29 June, 2018; originally announced June 2018.

Comments: corrected version: incorrect evaluation fixed; IJCAI 2019

arXiv:1805.01837 [pdf, other]

Towards a Spectrum of Graph Convolutional Networks

Authors: Mathias Niepert, Alberto Garcia-Duran

Abstract: We present our ongoing work on understanding the limitations of graph convolutional networks (GCNs) as well as our work on generalizations of graph convolutions for representing more complex node attribute dependencies. Based on an analysis of GCNs with the help of the corresponding computation graphs, we propose a generalization of existing GCNs where the aggregation operations are (a) determined by structural properties of the local neighborhood graphs and (b) not restricted to weighted averages. We show that the proposed approach is strictly more expressive while requiring only a modest increase in the number of parameters and computations. We also show that the proposed generalization is identical to standard convolutional layers when applied to regular grid graphs.

Submitted 4 May, 2018; originally announced May 2018.

arXiv:1605.05273 [pdf, other]

Learning Convolutional Neural Networks for Graphs

Authors: Mathias Niepert, Mohamed Ahmed, Konstantin Kutzkov

Abstract: Numerous important problems can be framed as learning from graph data. We propose a framework for learning convolutional neural networks for arbitrary graphs. These graphs may be undirected, directed, and with both discrete and continuous node and edge attributes. Analogous to image-based convolutional networks that operate on locally connected regions of the input, we present a general approach to extracting locally connected regions from graphs. Using established benchmark data sets, we demonstrate that the learned feature representations are competitive with state-of-the-art graph kernels and that their computation is highly efficient.

Submitted 8 June, 2016; v1 submitted 17 May, 2016; originally announced May 2016.

Comments: To be presented at ICML 2016

arXiv:1206.5396 [pdf, other]

Markov Chains on Orbits of Permutation Groups

Authors: Mathias Niepert

Abstract: We present a novel approach to detecting and utilizing symmetries in probabilistic graphical models with two main contributions. First, we present a scalable approach to computing generating sets of permutation groups representing the symmetries of graphical models. Second, we introduce orbital Markov chains, a novel family of Markov chains leveraging model symmetries to reduce mixing times. We establish an insightful connection between model symmetries and rapid mixing of orbital Markov chains. Thus, we present the first lifted MCMC algorithm for probabilistic graphical models. Both analytical and empirical results demonstrate the effectiveness and efficiency of the approach.

Submitted 28 June, 2012; v1 submitted 23 June, 2012; originally announced June 2012.

Comments: To appear in Proceedings of UAI2012

Showing 1–15 of 15 results for author: Niepert, M