-
Open-Source Fermionic Neural Networks with Ionic Charge Initialization
Authors:
Shai Pranesh,
Shang Zhu,
Venkat Viswanathan,
Bharath Ramsundar
Abstract:
Finding accurate solutions to the electronic Schrödinger equation plays an important role in discovering important molecular and material energies and characteristics. Consequently, solving systems with large numbers of electrons has become increasingly important. Variational Monte Carlo (VMC) methods, especially those approximated through deep neural networks, are promising in this regard. In this paper, we aim to integrate one such model called the FermiNet, a post-Hartree-Fock (HF) Deep Neural Network (DNN) model, into a standard and widely used open source library, DeepChem. We also propose novel initialization techniques to overcome the difficulties associated with the assignment of excess or lack of electrons for ions.
Submitted 16 January, 2024;
originally announced January 2024.
-
Differentiable Modeling and Optimization of Battery Electrolyte Mixtures Using Geometric Deep Learning
Authors:
Shang Zhu,
Bharath Ramsundar,
Emil Annevelink,
Hongyi Lin,
Adarsh Dave,
Pin-Wen Guan,
Kevin Gering,
Venkatasubramanian Viswanathan
Abstract:
Electrolytes play a critical role in designing next-generation battery systems, by allowing efficient ion transfer, preventing charge transfer, and stabilizing electrode-electrolyte interfaces. In this work, we develop a differentiable geometric deep learning (GDL) model for chemical mixtures, DiffMix, which is applied in guiding robotic experimentation and optimization towards fast-charging battery electrolytes. In particular, we extend mixture thermodynamic and transport laws by creating GDL-learnable physical coefficients. We evaluate our model with mixture thermodynamics and ion transport properties, where we show improved prediction accuracy and model robustness of DiffMix than its purely data-driven variants. Furthermore, with a robotic experimentation setup, Clio, we improve ionic conductivity of electrolytes by over 18.8% within 10 experimental steps, via differentiable optimization built on DiffMix gradients. By combining GDL, mixture physics laws, and robotic experimentation, DiffMix expands the predictive modeling methods for chemical mixtures and enables efficient optimization in large chemical spaces.
Submitted 1 November, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
FastFlows: Flow-Based Models for Molecular Graph Generation
Authors:
Nathan C. Frey,
Vijay Gadepally,
Bharath Ramsundar
Abstract:
We propose a framework using normalizing-flow based models, SELF-Referencing Embedded Strings, and multi-objective optimization that efficiently generates small molecules. With an initial training set of only 100 small molecules, FastFlows generates thousands of chemically valid molecules in seconds. Because of the efficient sampling, substructure filters can be applied as desired to eliminate compounds with unreasonable moieties. Using easily computable and learned metrics for druglikeness, synthetic accessibility, and synthetic complexity, we perform a multi-objective optimization to demonstrate how FastFlows functions in a high-throughput virtual screening context. Our model is significantly simpler and easier to train than autoregressive molecular generative models, and enables fast generation and identification of druglike, synthesizable molecules.
Submitted 28 January, 2022;
originally announced January 2022.
-
Bringing Atomistic Deep Learning to Prime Time
Authors:
Nathan C. Frey,
Siddharth Samsi,
Bharath Ramsundar,
Connor W. Coley,
Vijay Gadepally
Abstract:
Artificial intelligence has not yet revolutionized the design of materials and molecules. In this perspective, we identify four barriers preventing the integration of atomistic deep learning, molecular science, and high-performance computing. We outline focused research efforts to address the opportunities presented by these challenges.
Submitted 9 December, 2021;
originally announced December 2021.
-
Differentiable Physics: A Position Piece
Authors:
Bharath Ramsundar,
Dilip Krishnamurthy,
Venkatasubramanian Viswanathan
Abstract:
Differentiable physics provides a new approach for modeling and understanding the physical systems by pairing the new technology of differentiable programming with classical numerical methods for physical simulation. We survey the rapidly growing literature of differentiable physics techniques and highlight methods for parameter estimation, learning representations, solving differential equations, and developing what we call scientific foundation models using data and inductive priors. We argue that differentiable physics offers a new paradigm for modeling physical phenomena by combining classical analytic solutions with numerical methodology using the bridge of differentiable programming.
Submitted 14 September, 2021;
originally announced September 2021.
-
ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
Authors:
Seyone Chithrananda,
Gabriel Grand,
Bharath Ramsundar
Abstract:
GNNs and chemical fingerprints are the predominant approaches to representing molecules for property prediction. However, in NLP, transformers have become the de-facto standard for representation learning thanks to their strong downstream task transfer. In parallel, the software ecosystem around transformers is maturing rapidly, with libraries like HuggingFace and BertViz enabling streamlined training and introspection. In this work, we make one of the first attempts to systematically evaluate transformers on molecular property prediction tasks via our ChemBERTa model. ChemBERTa scales well with pretraining dataset size, offering competitive downstream performance on MoleculeNet and useful attention-based visualization modalities. Our results suggest that transformers offer a promising avenue of future work for molecular representation learning and property prediction. To facilitate these efforts, we release a curated dataset of 77M SMILES from PubChem suitable for large-scale self-supervised pretraining.
Submitted 23 October, 2020; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
Authors:
Joseph Gomes,
Bharath Ramsundar,
Evan N. Feinberg,
Vijay S. Pande
Abstract:
Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction.
Submitted 30 March, 2017;
originally announced March 2017.
-
MoleculeNet: A Benchmark for Molecular Machine Learning
Authors:
Zhenqin Wu,
Bharath Ramsundar,
Evan N. Feinberg,
Joseph Gomes,
Caleb Geniesse,
Aneesh S. Pappu,
Karl Leswing,
Vijay Pande
Abstract:
Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
Submitted 25 October, 2018; v1 submitted 1 March, 2017;
originally announced March 2017.