-
Chemellia: An Ecosystem for Atomistic Scientific Machine Learning
Authors:
Anant Thazhemadam,
Dhairya Gandhi,
Venkatasubramanian Viswanathan,
Rachel C. Kurchin
Abstract:
Chemellia is an open-source framework for atomistic machine learning in the Julia programming language. The framework takes advantage of Julia's high speed as well as the ability to share and reuse code and interfaces through the paradigm of multiple dispatch. Chemellia is designed to make use of existing interfaces and avoid "reinventing the wheel" wherever possible. A key aspect of the Chemellia ecosystem is the ChemistryFeaturization interface for defining and encoding features; it is designed to maximize interoperability between featurization schemes and elements thereof, to maintain provenance of encoded features, and to ensure easy decodability and reconfigurability to enable feature engineering experiments. This embodies the overall design principles of the Chemellia ecosystem: separation of concerns, interoperability, and transparency. We illustrate these principles by discussing the implementation of crystal graph convolutional neural networks for material property prediction.
Submitted 19 May, 2023;
originally announced May 2023.
-
Score-Based Generative Models for Molecule Generation
Authors:
Dwaraknath Gnaneshwar,
Bharath Ramsundar,
Dhairya Gandhi,
Rachel Kurchin,
Venkatasubramanian Viswanathan
Abstract:
Recent advances in generative models have made exploring design spaces easier for de novo molecule generation. However, popular generative models like GANs and normalizing flows face challenges such as training instabilities due to adversarial training and architectural constraints, respectively. Score-based generative models sidestep these challenges by modelling the gradient of the log probability density using a score function approximation, as opposed to modelling the density function directly, and sampling from it using annealed Langevin Dynamics. We believe that score-based generative models could open up new opportunities in molecule generation due to their architectural flexibility, such as replacing the score function with an SE(3) equivariant model. In this work, we lay the foundations by testing the efficacy of score-based models for molecule generation. We train a Transformer-based score function on Self-Referencing Embedded Strings (SELFIES) representations of 1.5 million samples from the ZINC dataset and use the Moses benchmarking framework to evaluate the generated samples on a suite of metrics.
Submitted 7 March, 2022;
originally announced March 2022.
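Illustrative sketch (not from the paper): the abstract above describes sampling with annealed Langevin dynamics driven by a score function. The toy example below shows that sampling loop on a one-dimensional Gaussian whose noise-perturbed score is known in closed form, so no trained network is needed. Everything here is an assumption made for illustration: the function names, the Gaussian target, the geometric noise schedule, and the step-size rule alpha = eps * (sigma / sigma_min)^2, which follows a common formulation of the algorithm rather than anything stated in the abstract. The paper itself uses a Transformer-based score model on SELFIES strings.

```python
import math
import random


def gaussian_score(x, sigma, mu=2.0, data_std=0.5):
    """Exact score (gradient of the log density) of N(mu, data_std^2)
    after it has been perturbed with N(0, sigma^2) noise.
    Stands in for a learned score network (toy assumption)."""
    return -(x - mu) / (data_std ** 2 + sigma ** 2)


def geometric_schedule(sigma_max=1.0, sigma_min=0.1, levels=10):
    """Decreasing geometric sequence of noise levels (hypothetical values)."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (levels - 1))
    return [sigma_max * ratio ** i for i in range(levels)]


def annealed_langevin(score_fn, sigmas, eps=2e-3, steps_per_level=100, rng=None):
    """Draw one sample by running Langevin updates at each noise level,
    from the largest sigma to the smallest."""
    rng = rng or random.Random()
    x = rng.gauss(0.0, 1.0)  # arbitrary starting point
    sigma_min = sigmas[-1]
    for sigma in sigmas:
        # Step size shrinks together with the noise level.
        alpha = eps * (sigma / sigma_min) ** 2
        for _ in range(steps_per_level):
            z = rng.gauss(0.0, 1.0)
            # Langevin update: drift along the score plus injected noise.
            x = x + 0.5 * alpha * score_fn(x, sigma) + math.sqrt(alpha) * z
    return x


def draw_samples(n=500, seed=0):
    """Draw n independent samples using one seeded generator."""
    rng = random.Random(seed)
    sigmas = geometric_schedule()
    return [annealed_langevin(gaussian_score, sigmas, rng=rng) for _ in range(n)]


if __name__ == "__main__":
    samples = draw_samples()
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

With these settings the sample mean should land near the target mean of 2.0 and the variance near 0.25 plus the smallest noise variance; in a real molecule-generation setting `gaussian_score` would be replaced by a trained model.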
-
AutoMat: Accelerated Computational Electrochemical systems Discovery
Authors:
Emil Annevelink,
Rachel Kurchin,
Eric Muckley,
Lance Kavalsky,
Vinay I. Hegde,
Valentin Sulzer,
Shang Zhu,
Jiankun Pu,
David Farina,
Matthew Johnson,
Dhairya Gandhi,
Adarsh Dave,
Hongyi Lin,
Alan Edelman,
Bharath Ramsundar,
James Saal,
Christopher Rackauckas,
Viral Shah,
Bryce Meredig,
Venkatasubramanian Viswanathan
Abstract:
Large-scale electrification is vital to addressing the climate crisis, but several scientific and technological challenges remain to fully electrify both the chemical industry and transportation. In both of these areas, new electrochemical materials will be critical, but their development currently relies heavily on human-time-intensive experimental trial and error and computationally expensive first-principles, meso-scale and continuum simulations. We present an automated workflow, AutoMat, that accelerates these computational steps by introducing both automated input generation and management of simulations across scales from first principles to continuum device modeling. Furthermore, we show how to seamlessly integrate multi-fidelity predictions such as machine learning surrogates or automated robotic experiments "in-the-loop". The automated framework is implemented with design space search techniques to dramatically accelerate the overall materials discovery pipeline by implicitly learning design features that optimize device performance across several metrics. We discuss the benefits of AutoMat using examples in electrocatalysis and energy storage and highlight lessons learned.
Submitted 13 May, 2022; v1 submitted 3 November, 2020;
originally announced November 2020.