-
Genetic Algorithm-Accelerated Computational Discovery of Liquid Crystal Polymers with Enhanced Optical Properties
Authors:
Jianing Zhou,
Yuge Huang,
Arman Boromand,
Keian Noori,
Lafe Purvis,
Chulwoo Oh,
Lu Lu,
Zachary W. Ulissi,
Vahe Gharakhanyan,
Xinyue Zhang
Abstract:
Liquid crystal polymers with exceptional optical properties are highly promising for next-generation virtual, augmented, and mixed reality (VR/AR/MR) technologies, serving as high-performance, compact, lightweight, and cost-effective optical components. However, the growing demands for optical transparency and high refractive index in advanced optical devices present a challenge for material discovery. In this study, we develop a novel approach that integrates first-principles calculations with genetic algorithms to accelerate the discovery of liquid crystal polymers with low visible absorption and high refractive index. By iterating within a predefined space of molecular building blocks, our approach rapidly identifies reactive mesogens that meet target specifications. Additionally, it provides valuable insights into the relationships between molecular structure and properties. This strategy not only accelerates material screening but also uncovers key molecular design principles, offering a systematic and scalable alternative to traditional trial-and-error methods.
Submitted 9 May, 2025;
originally announced May 2025.
-
Open Catalyst Experiments 2024 (OCx24): Bridging Experiments and Computational Models
Authors:
Jehad Abed,
Jiheon Kim,
Muhammed Shuaibi,
Brook Wander,
Boris Duijf,
Suhas Mahesh,
Hyeonseok Lee,
Vahe Gharakhanyan,
Sjoerd Hoogland,
Erdem Irtem,
Janice Lan,
Niels Schouten,
Anagha Usha Vijayakumar,
Jason Hattrick-Simpers,
John R. Kitchin,
Zachary W. Ulissi,
Aaike van Vugt,
Edward H. Sargent,
David Sinton,
C. Lawrence Zitnick
Abstract:
The search for low-cost, durable, and effective catalysts is essential for green hydrogen production and carbon dioxide upcycling to help in the mitigation of climate change. Discovery of new catalysts is currently limited by the gap between what AI-accelerated computational models predict and what experimental studies produce. To make progress, large and diverse experimental datasets are needed that are reproducible and tested at industrially-relevant conditions. We address these needs by utilizing a comprehensive high-throughput characterization and experimental pipeline to create the Open Catalyst Experiments 2024 (OCX24) dataset. The dataset contains 572 samples synthesized using both wet and dry methods with X-ray fluorescence and X-ray diffraction characterization. We prepared 441 gas diffusion electrodes, including replicates, and evaluated them using zero-gap electrolysis for carbon dioxide reduction (CO$_2$RR) and hydrogen evolution reactions (HER) at current densities up to $300$ mA/cm$^2$. To find correlations with experimental outcomes and to perform computational screens, DFT-verified adsorption energies for six adsorbates were calculated on $\sim$20,000 inorganic materials requiring 685 million AI-accelerated relaxations. Remarkably from this large set of materials, a data driven Sabatier volcano independently identified Pt as being a top candidate for HER without having any experimental measurements on Pt or Pt-alloy samples. We anticipate the availability of experimental data generated specifically for AI training, such as OCX24, will significantly improve the utility of computational models in selecting materials for experimental screening.
Submitted 18 November, 2024;
originally announced November 2024.
-
Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models
Authors:
Luis Barroso-Luque,
Muhammed Shuaibi,
Xiang Fu,
Brandon M. Wood,
Misko Dzamba,
Meng Gao,
Ammar Rizvi,
C. Lawrence Zitnick,
Zachary W. Ulissi
Abstract:
The ability to discover new materials with desirable properties is critical for numerous applications from helping mitigate climate change to advances in next generation computing hardware. AI has the potential to accelerate materials discovery and design by more effectively exploring the chemical space compared to other computational methods or by trial-and-error. While substantial progress has been made on AI for materials data, benchmarks, and models, a barrier that has emerged is the lack of publicly available training data and open pre-trained models. To address this, we present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard and are capable of predicting ground-state stability and formation energies to an F1 score above 0.9 and an accuracy of 20 meV/atom, respectively. We explore the impact of model size, auxiliary denoising objectives, and fine-tuning on performance across a range of datasets including OMat24, MPtraj, and Alexandria. The open release of the OMat24 dataset and models enables the research community to build upon our efforts and drive further advancements in AI-assisted materials science.
Submitted 16 October, 2024;
originally announced October 2024.
-
CatTSunami: Accelerating Transition State Energy Calculations with Pre-trained Graph Neural Networks
Authors:
Brook Wander,
Muhammed Shuaibi,
John R. Kitchin,
Zachary W. Ulissi,
C. Lawrence Zitnick
Abstract:
Direct access to transition state energies at low computational cost unlocks the possibility of accelerating catalyst discovery. We show that the top performing graph neural network potential trained on the OC20 dataset, a related but different task, is able to find transition states energetically similar (within 0.1 eV) to density functional theory (DFT) 91% of the time with a 28x speedup. This speaks to the generalizability of the models: having never been explicitly trained on reactions, the machine learned potential approximates the potential energy surface well enough to be performant for this auxiliary task. We introduce the Open Catalyst 2020 Nudged Elastic Band (OC20NEB) dataset, which is made of 932 DFT nudged elastic band calculations, to benchmark machine learned model performance on transition state energies. To demonstrate the efficacy of this approach, we replicated a well-known, large reaction network with 61 intermediates and 174 dissociation reactions at DFT resolution (40 meV). In this case of dense NEB enumeration, we realize even more computational cost savings and used just 12 GPU days of compute, where DFT would have taken 52 GPU years, a 1500x speedup. Similar searches for complete reaction networks could become routine using the approach presented here. Finally, we replicated an ammonia synthesis activity volcano and systematically found lower energy configurations of the transition states and intermediates on six stepped unary surfaces. This scalable approach offers a more complete treatment of configurational space to improve and accelerate catalyst discovery.
Submitted 11 June, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Adapting OC20-trained EquiformerV2 Models for High-Entropy Materials
Authors:
Christian M. Clausen,
Jan Rossmeisl,
Zachary W. Ulissi
Abstract:
Computational high-throughput studies, especially in research on high-entropy materials and catalysts, are hampered by high-dimensional composition spaces and myriad structural microstates. They present bottlenecks to the conventional use of density functional theory calculations, and consequently, the use of machine-learned potentials is becoming increasingly prevalent in atomic structure simulations. In this communication, we show the results of adjusting and fine-tuning the pretrained EquiformerV2 model from the Open Catalyst Project to infer adsorption energies of *OH and *O on the out-of-domain high-entropy alloy Ag-Ir-Pd-Pt-Ru. By applying an energy filter based on the local environment of the binding site the zero-shot inference is markedly improved and through few-shot fine-tuning the model yields state-of-the-art accuracy. It is also found that EquiformerV2, assuming the role of general machine learning potential, is able to inform a smaller, more focused direct inference model. This knowledge distillation setup boosts performance on complex binding sites. Collectively, this shows that foundational knowledge learned from ordered intermetallic structures, can be extrapolated to the highly disordered structures of solid-solutions. With the vastly accelerated computational throughput of these models, hitherto infeasible research in the high-entropy material space is now readily accessible.
Submitted 14 March, 2024;
originally announced March 2024.
-
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
Authors:
Nate Gruver,
Anuroop Sriram,
Andrea Madotto,
Andrew Gordon Wilson,
C. Lawrence Zitnick,
Zachary Ulissi
Abstract:
We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material, infilling of partial structures and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.
Submitted 6 February, 2024;
originally announced February 2024.
-
Generalization of Graph-Based Active Learning Relaxation Strategies Across Materials
Authors:
Xiaoxiao Wang,
Joseph Musielewicz,
Richard Tran,
Sudheesh Kumar Ethirajan,
Xiaoyan Fu,
Hilda Mera,
John R. Kitchin,
Rachel C. Kurchin,
Zachary W. Ulissi
Abstract:
Although density functional theory (DFT) has aided in accelerating the discovery of new materials, such calculations are computationally expensive, especially for high-throughput efforts. This has prompted an explosion in exploration of machine learning assisted techniques to improve the computational efficiency of DFT. In this study, we present a comprehensive investigation of the broader application of Finetuna, an active learning framework to accelerate structural relaxation in DFT with prior information from Open Catalyst Project pretrained graph neural networks. We explore the challenges associated with out-of-domain systems: alcohol ($C_{>2}$) on metal surfaces as larger adsorbates, metal-oxides with spin polarization, and three-dimensional (3D) structures like zeolites and metal-organic-frameworks. By pre-training machine learning models on large datasets and fine-tuning the model along the simulation, we demonstrate the framework's ability to conduct relaxations with fewer DFT calculations. Depending on the similarity of the test systems to the training systems, a more conservative querying strategy is applied. Our best-performing Finetuna strategy reduces the number of DFT single-point calculations by 80% for alcohols and 3D structures, and 42% for oxide systems.
Submitted 3 November, 2023;
originally announced November 2023.
-
The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture
Authors:
Anuroop Sriram,
Sihoon Choi,
Xiaohan Yu,
Logan M. Brabson,
Abhishek Das,
Zachary Ulissi,
Matt Uyttendaele,
Andrew J. Medford,
David S. Sholl
Abstract:
New methods for carbon dioxide removal are urgently needed to combat global climate change. Direct air capture (DAC) is an emerging technology to capture carbon dioxide directly from ambient air. Metal-organic frameworks (MOFs) have been widely studied as potentially customizable adsorbents for DAC. However, discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature. We explore a computational approach benefiting from recent innovations in machine learning (ML) and present a dataset named Open DAC 2023 (ODAC23) consisting of more than 38M density functional theory (DFT) calculations on more than 8,400 MOF materials containing adsorbed $CO_2$ and/or $H_2O$. ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available. In addition to probing properties of adsorbed molecules, the dataset is a rich source of information on structural relaxation of MOFs, which will be useful in many contexts beyond specific applications for DAC. A large number of MOFs with promising properties for DAC are identified directly in ODAC23. We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level. This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications, including DAC.
Submitted 27 November, 2023; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Chemical Properties from Graph Neural Network-Predicted Electron Densities
Authors:
Ethan M. Sunshine,
Muhammed Shuaibi,
Zachary W. Ulissi,
John R. Kitchin
Abstract:
According to density functional theory, any chemical property can be inferred from the electron density, making it the most informative attribute of an atomic structure. In this work, we demonstrate the use of established physical methods to obtain important chemical properties from model-predicted electron densities. We introduce graph neural network architectural choices that provide physically relevant and useful electron density predictions. Despite not training to predict atomic charges, the model is able to predict atomic charges with an order of magnitude lower error than a sum of atomic charge densities. Similarly, the model predicts dipole moments with half the error of the sum of atomic charge densities method. We demonstrate that larger data sets lead to more useful predictions in these tasks. These results pave the way for an alternative path in atomistic machine learning, where data-driven approaches and existing physical methods are used in tandem to obtain a variety of chemical properties in an explainable and self-consistent manner.
Submitted 9 September, 2023;
originally announced September 2023.
-
Beyond Independent Error Assumptions in Large GNN Atomistic Models
Authors:
Janghoon Ock,
Tian Tian,
John Kitchin,
Zachary Ulissi
Abstract:
The practical applications of determining the relative difference in adsorption energies are extensive, such as identifying optimal catalysts, calculating reaction energies, and determining the lowest adsorption energy on a catalytic surface. Although Density Functional Theory (DFT) can effectively calculate relative values through systematic error cancellation, the accuracy of Graph Neural Networks (GNNs) in this regard remains uncertain. To investigate this issue, we analyzed approximately 483 million pairs of energy differences predicted by DFT and GNNs using the Open Catalyst 2020 - Dense dataset. Our analysis revealed that GNNs exhibit a correlated error that can be reduced through subtraction, thereby challenging the naive independent error assumption in GNN predictions and leading to more precise energy difference predictions. To assess the magnitude of error cancellation in chemically similar pairs, we introduced a new metric, the subgroup error cancellation ratio (SECR). Our findings suggest that state-of-the-art GNN models can achieve error reduction up to 77% in these subgroups, comparable to the level of error cancellation observed with DFT. This significant error cancellation allows GNNs to achieve higher accuracy than individual adsorption energy predictions, which can otherwise suffer from amplified error due to random error propagation.
Submitted 19 March, 2023;
originally announced March 2023.
-
WhereWulff: A semi-autonomous workflow for systematic catalyst surface reactivity under reaction conditions
Authors:
Rohan Yuri Sanspeur,
Javier Heras-Domingo,
John R. Kitchin,
Zachary Ulissi
Abstract:
This paper introduces WhereWulff, a semi-autonomous workflow for modeling the reactivity of catalyst surfaces. The workflow begins with a bulk optimization task that takes an initial bulk structure, and returns the optimized bulk geometry and magnetic state, including stability under reaction conditions. The stable bulk structure is the input to a surface chemistry task that enumerates surfaces up to a user-specified maximum Miller index, computes relaxed surface energies for those surfaces, and then prioritizes those for subsequent adsorption energy calculations based on their contribution to the Wulff construction shape. The workflow handles computational resource constraints such as limited wall-time as well as automated job submission and analysis. We illustrate the workflow for oxygen evolution (OER) intermediates on two double perovskites. WhereWulff nearly halved the number of Density Functional Theory (DFT) calculations from ~240 to ~132 by prioritizing terminations, up to a maximum Miller index of 1, based on surface stability. Additionally, it automatically handled the 180 additional re-submission jobs required to successfully converge 120+ atom systems under a 48-hour wall-time cluster constraint. There are four main use cases that we envision for WhereWulff: (1) as a first-principles source of truth to validate and update a closed-loop self-sustaining materials discovery pipeline, (2) as a data generation tool, (3) as an educational tool, allowing users (e.g. experimentalists) unfamiliar with OER modeling to probe materials they might be interested in before doing further in-domain analyses, and (4) as a starting point for users to extend with reactions other than OER, as part of a collaborative software community.
Submitted 27 February, 2023;
originally announced February 2023.
-
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials
Authors:
Janice Lan,
Aini Palizhati,
Muhammed Shuaibi,
Brandon M. Wood,
Brook Wander,
Abhishek Das,
Matt Uyttendaele,
C. Lawrence Zitnick,
Zachary W. Ulissi
Abstract:
Computational catalysis is playing an increasingly significant role in the design of catalysts across a wide range of applications. A common task for many computational methods is the need to accurately compute the adsorption energy for an adsorbate and a catalyst surface of interest. Traditionally, the identification of low energy adsorbate-surface configurations relies on heuristic methods and researcher intuition. As the desire to perform high-throughput screening increases, it becomes challenging to use heuristics and intuition alone. In this paper, we demonstrate machine learning potentials can be leveraged to identify low energy adsorbate-surface configurations more accurately and efficiently. Our algorithm provides a spectrum of trade-offs between accuracy and efficiency, with one balanced option finding the lowest energy configuration 87.36% of the time, while achieving a 2000x speedup in computation. To standardize benchmarking, we introduce the Open Catalyst Dense dataset containing nearly 1,000 diverse surfaces and 100,000 unique configurations.
Submitted 15 September, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
Catlas: an automated framework for catalyst discovery demonstrated for direct syngas conversion
Authors:
Brook Wander,
Kirby Broderick,
Zachary W. Ulissi
Abstract:
Catalyst discovery is paramount to support access to energy and key chemical feedstocks in a post fossil fuel era. Exhaustive computational searches of large material design spaces using ab-initio methods like density functional theory (DFT) are infeasible. We seek to explore large design spaces at relatively low computational cost by leveraging large, generalized, graph-based machine learning (ML) models, which are pretrained and therefore require no upfront data collection or training. We present catlas, a framework that distributes and automates the generation of adsorbate-surface configurations and ML inference of DFT energies to achieve this goal. Catlas is open source, making ML assisted catalyst screenings easy and available to all. To demonstrate its efficacy, we use catlas to explore catalyst candidates for the direct conversion of syngas to multi-carbon oxygenates. For this case study, we explore 947 stable/ metastable binary, transition metal intermetallics as possible catalyst candidates. On this subset of materials, we are able to predict the adsorption energy of key descriptors, *CO and *OH, with near-DFT accuracy (0.16, 0.14 eV MAE, respectively). Using the projected selectivity towards C2+ oxygenates from an existing microkinetic model, we identified 144 candidate materials. For 10 promising candidates, DFT calculations reveal a good correlation with our assessment using ML. Among the top elemental combinations were Pt-Ti, Pd-V, Ni-Nb, and Ti-Zn, all of which appear unexplored experimentally.
Submitted 26 August, 2022;
originally announced August 2022.
-
The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts
Authors:
Richard Tran,
Janice Lan,
Muhammed Shuaibi,
Brandon M. Wood,
Siddharth Goyal,
Abhishek Das,
Javier Heras-Domingo,
Adeesh Kolluru,
Ammar Rizvi,
Nima Shoghi,
Anuroop Sriram,
Felix Therrien,
Jehad Abed,
Oleksandr Voznyy,
Edward H. Sargent,
Zachary Ulissi,
C. Lawrence Zitnick
Abstract:
The development of machine learning models for electrocatalysts requires a broad set of training data to enable their use across a wide variety of materials. One class of materials that currently lacks sufficient training data is oxides, which are critical for the development of OER catalysts. To address this, we developed the OC22 dataset, consisting of 62,331 DFT relaxations (~9,854,504 single point calculations) across a range of oxide materials, coverages, and adsorbates. We define generalized total energy tasks that enable property prediction beyond adsorption energies; we test baseline performance of several graph neural networks; and we provide pre-defined dataset splits to establish clear benchmarks for future efforts. In the most general task, GemNet-OC sees a ~36% improvement in energy predictions when combining the chemically dissimilar OC20 and OC22 datasets via fine-tuning. Similarly, we achieved a ~19% improvement in total energy predictions on OC20 and a ~9% improvement in force predictions in OC22 when using joint training. We demonstrate the practical utility of a top performing model by capturing literature adsorption energies and important OER scaling relationships. We expect OC22 to provide an important benchmark for models seeking to incorporate intricate long-range electrostatic and magnetic interactions in oxide surfaces. Dataset and baseline models are open sourced, and a public leaderboard is available to encourage continued community developments on the total energy tasks and data.
Submitted 7 March, 2023; v1 submitted 17 June, 2022;
originally announced June 2022.
-
Open Challenges in Developing Generalizable Large Scale Machine Learning Models for Catalyst Discovery
Authors:
Adeesh Kolluru,
Muhammed Shuaibi,
Aini Palizhati,
Nima Shoghi,
Abhishek Das,
Brandon Wood,
C. Lawrence Zitnick,
John R Kitchin,
Zachary W Ulissi
Abstract:
The development of machine learned potentials for catalyst discovery has predominantly been focused on very specific chemistries and material compositions. While effective in interpolating between available materials, these approaches struggle to generalize across chemical space. The recent curation of large-scale catalyst datasets has offered the opportunity to build a universal machine learning potential, spanning chemical and composition space. If accomplished, said potential could accelerate the catalyst discovery process across a variety of applications (CO2 reduction, NH3 production, etc.) without additional specialized training efforts that are currently required. The release of the Open Catalyst 2020 (OC20) has begun just that, pushing the heterogeneous catalysis and machine learning communities towards building more accurate and robust models. In this perspective, we discuss some of the challenges and findings of recent developments on OC20. We examine the performance of current models across different materials and adsorbates to identify notably underperforming subsets. We then discuss some of the modeling efforts surrounding energy-conservation, approaches to finding and evaluating the local minima, and augmentation of off-equilibrium data. To complement the community's ongoing developments, we end with an outlook to some of the important challenges that have yet to be thoroughly explored for large-scale catalyst discovery.
Submitted 13 June, 2022; v1 submitted 4 June, 2022;
originally announced June 2022.
-
FINETUNA: Fine-tuning Accelerated Molecular Simulations
Authors:
Joseph Musielewicz,
Xiaoxiao Wang,
Tian Tian,
Zachary Ulissi
Abstract:
Machine learning approaches have the potential to approximate Density Functional Theory (DFT) for atomistic simulations in a computationally efficient manner, which could dramatically increase the impact of computational simulations on real-world problems. However, they are limited by their accuracy and the cost of generating labeled data. Here, we present an online active learning framework for accelerating the simulation of atomic systems efficiently and accurately by incorporating prior physical information learned by large-scale pre-trained graph neural network models from the Open Catalyst Project. Accelerating these simulations enables useful data to be generated more cheaply, allowing better models to be trained and more atomistic systems to be screened. We also present a method of comparing local optimization techniques on the basis of both their speed and accuracy. Experiments on 30 benchmark adsorbate-catalyst systems show that our method of transfer learning to incorporate prior information from pre-trained models accelerates simulations by reducing the number of DFT calculations by 91%, while meeting an accuracy threshold of 0.02 eV 93% of the time. Finally, we demonstrate a technique for leveraging the interactive functionality built into VASP to efficiently compute single-point calculations within our online active learning framework without the significant startup costs. This allows VASP to work in tandem with our framework while requiring 75% fewer self-consistent cycles than conventional single-point calculations. The online active learning implementation, and examples using the VASP interactive code, are available in the open-source FINETUNA package on GitHub.
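The control flow of an online active learning relaxation can be sketched on a toy problem. The example below is an illustrative assumption, not the FINETUNA implementation: the "DFT" oracle is a one-dimensional harmonic force, the surrogate is a linear fit to the labels gathered so far (standing in for a fine-tuned pre-trained graph neural network), and the uncertainty proxy is simply the distance to the nearest labeled point. All names (`oracle_force`, `relax_online`, `trust_radius`) are hypothetical.

```python
import numpy as np

def oracle_force(x):
    """Expensive reference call (stand-in for DFT): force for E(x) = (x - 2)^2."""
    return -2.0 * (x - 2.0)

def relax_online(x0, step=0.1, trust_radius=0.15, fmax=1e-3, max_steps=500):
    """Steepest-descent relaxation that queries the oracle only when needed."""
    xs, fs = [], []          # labeled configurations and forces, gathered on the fly
    x, n_oracle = float(x0), 0

    def query(pos):
        nonlocal n_oracle
        f = oracle_force(pos)
        xs.append(pos)
        fs.append(f)
        n_oracle += 1
        return f

    for n_steps in range(1, max_steps + 1):
        # Uncertainty proxy (assumption): too few labels, or far from every label.
        far = len(xs) < 2 or np.min(np.abs(np.array(xs) - x)) > trust_radius
        if far:
            f, from_oracle = query(x), True
        else:
            # Toy surrogate: linear fit of force vs. position, refit every step.
            slope, intercept = np.polyfit(xs, fs, 1)
            f, from_oracle = slope * x + intercept, False
        if abs(f) < fmax:
            # Never declare convergence on a surrogate force alone: verify it.
            if not from_oracle:
                f = query(x)
            if abs(f) < fmax:
                break
        x += step * f
    return x, n_oracle, n_steps

x_final, n_oracle, n_steps = relax_online(0.0)
print(f"x = {x_final:.4f}, oracle calls = {n_oracle}, optimizer steps = {n_steps}")
```

Because the toy force is exactly linear, the surrogate becomes exact after two labels; the sketch is meant to show only when the oracle is called (early, uncertain steps and the final verification), not how accurate a real surrogate would be.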
Submitted 1 July, 2022; v1 submitted 2 May, 2022;
originally announced May 2022.
-
GemNet-OC: Developing Graph Neural Networks for Large and Diverse Molecular Simulation Datasets
Authors:
Johannes Gasteiger,
Muhammed Shuaibi,
Anuroop Sriram,
Stephan Günnemann,
Zachary Ulissi,
C. Lawrence Zitnick,
Abhishek Das
Abstract:
Recent years have seen the advent of molecular simulation datasets that are orders of magnitude larger and more diverse. These new datasets differ substantially in four aspects of complexity: 1. chemical diversity (number of different elements), 2. system size (number of atoms per sample), 3. dataset size (number of data samples), and 4. domain shift (similarity of the training and test set). Despite these large differences, benchmarks on small and narrow datasets remain the predominant method of demonstrating progress in graph neural networks (GNNs) for molecular simulation, likely due to cheaper training compute requirements. This raises the question: does GNN progress on small and narrow datasets translate to these more complex datasets? This work investigates this question by first developing the GemNet-OC model based on the large Open Catalyst 2020 (OC20) dataset. GemNet-OC outperforms the previous state-of-the-art on OC20 by 16% while reducing training time by a factor of 10. We then compare the impact of 18 model components and hyperparameter choices on performance in multiple datasets. We find that the resulting model would be drastically different depending on the dataset used for making model choices. To isolate the source of this discrepancy, we study six subsets of the OC20 dataset that individually test each of the above-mentioned four dataset aspects. We find that results on the OC-2M subset correlate well with the full OC20 dataset while being substantially cheaper to train on. Our findings challenge the common practice of developing GNNs solely on small datasets, but highlight ways of achieving fast development cycles and generalizable results via moderately-sized, representative datasets such as OC-2M and efficient models such as GemNet-OC. Our code and pretrained model weights are open-sourced.
Submitted 30 September, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
The Open Catalyst 2020 (OC20) Dataset and Community Challenges
Authors:
Lowik Chanussot,
Abhishek Das,
Siddharth Goyal,
Thibaut Lavril,
Muhammed Shuaibi,
Morgane Riviere,
Kevin Tran,
Javier Heras-Domingo,
Caleb Ho,
Weihua Hu,
Aini Palizhati,
Anuroop Sriram,
Brandon Wood,
Junwoong Yoon,
Devi Parikh,
C. Lawrence Zitnick,
Zachary Ulissi
Abstract:
Catalyst discovery and optimization are key to solving many societal and energy challenges including solar fuels synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbate identity/configurations, perhaps because datasets have been smaller in catalysis than in related fields. To address this, we developed the OC20 dataset, consisting of 1,281,040 Density Functional Theory (DFT) relaxations (~264,890,000 single point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short timescale molecular dynamics, and electronic structure analyses. The dataset comprises three central tasks indicative of day-to-day catalyst modeling and comes with pre-defined train/validation/test splits to facilitate direct comparisons with future model development efforts. We applied three state-of-the-art graph neural network models (CGCNN, SchNet, DimeNet++) to each of these tasks as baseline demonstrations for the community to build on. In almost every task, no upper limit on model size was identified, suggesting that even larger models are likely to improve on initial results. The dataset and baseline models are both provided as open resources, as well as a public leaderboard to encourage community contributions to solve these important tasks.
Submitted 24 September, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
-
An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage
Authors:
C. Lawrence Zitnick,
Lowik Chanussot,
Abhishek Das,
Siddharth Goyal,
Javier Heras-Domingo,
Caleb Ho,
Weihua Hu,
Thibaut Lavril,
Aini Palizhati,
Morgane Riviere,
Muhammed Shuaibi,
Anuroop Sriram,
Kevin Tran,
Brandon Wood,
Junwoong Yoon,
Devi Parikh,
Zachary Ulissi
Abstract:
Scalable and cost-effective solutions to renewable energy storage are essential to addressing the world's rising energy needs while reducing climate change. As we increase our reliance on renewable energy sources such as wind and solar, which produce intermittent power, storage is needed to transfer power from times of peak generation to peak demand. This may require the storage of power for hours, days, or months. One solution that offers the potential of scaling to nation-sized grids is the conversion of renewable energy to other fuels, such as hydrogen or methane. To be widely adopted, this process requires cost-effective solutions to running electrochemical reactions. An open challenge is finding low-cost electrocatalysts to drive these reactions at high rates. Through the use of quantum mechanical simulations (density functional theory), new catalyst structures can be tested and evaluated. Unfortunately, the high computational cost of these simulations limits the number of structures that may be tested. The use of machine learning may provide a method to efficiently approximate these calculations, leading to new approaches in finding effective electrocatalysts. In this paper, we provide an introduction to the challenges in finding suitable electrocatalysts, how machine learning may be applied to the problem, and the use of the Open Catalyst Project OC20 dataset for model training.
Submitted 14 October, 2020;
originally announced October 2020.
-
Enabling robust offline active learning for machine learning potentials using simple physics-based priors
Authors:
Muhammed Shuaibi,
Saurabh Sivakumar,
Rui Qi Chen,
Zachary W. Ulissi
Abstract:
Machine learning surrogate models for quantum mechanical simulations have enabled the field to efficiently and accurately study material and molecular systems. Developed models typically rely on a substantial amount of data to make reliable predictions of the potential energy landscape, or on careful active learning and uncertainty estimates. When starting with small datasets, convergence of active learning approaches is a major outstanding challenge, which has limited most demonstrations to online active learning. In this work we demonstrate a $Δ$-machine learning approach that enables stable convergence in offline active learning strategies by avoiding unphysical configurations. We demonstrate our framework's capabilities on a structural relaxation, transition state calculation, and molecular dynamics simulation, with the number of first-principles calculations being cut by 70-90%. The approach is incorporated and developed alongside AMPtorch, an open-source machine learning potential package, along with interactive Google Colab notebook examples.
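The idea behind Δ-machine learning with a physics-based prior can be illustrated with a minimal sketch: the model is fitted to the difference between the reference energy and a cheap prior, and the prediction is the prior plus the learned correction. Everything below is an assumption for illustration and is not taken from AMPtorch: the reference is a toy Morse curve standing in for a first-principles calculation, the prior is a steep pairwise repulsion, and the learned correction is a cubic polynomial rather than a neural network.

```python
import numpy as np

def reference_energy(r, d=1.0, a=1.5, r0=1.2):
    """Toy stand-in for a first-principles energy: Morse curve with minimum at r0."""
    return d * (1.0 - np.exp(-a * (r - r0))) ** 2 - d

def prior_energy(r):
    """Hypothetical physics-based prior: steep short-range repulsion."""
    return (0.9 / r) ** 12

def fit_delta_model(r_train, degree=3):
    """Fit a polynomial to the residual (reference minus prior), not to the energy."""
    residual = reference_energy(r_train) - prior_energy(r_train)
    return np.polyfit(r_train, residual, degree)

def delta_energy(r, coeffs):
    """Delta-ML prediction: prior plus learned correction."""
    return prior_energy(r) + np.polyval(coeffs, r)

# Train only on physically reasonable separations.
r_train = np.linspace(1.0, 2.5, 8)
coeffs = fit_delta_model(r_train)

for r in (0.5, 1.2, 1.5):
    print(f"r = {r:.1f}: delta-ML = {delta_energy(r, coeffs):10.3f}, "
          f"reference = {reference_energy(r):7.3f}")
```

Inside the training range the correction brings the prediction close to the reference. Outside it, at compressed separations the model never saw, the prediction stays strongly repulsive because the prior dominates; that is the mechanism by which a prior of this kind discourages an optimizer or dynamics run from drifting into unphysical configurations. The magnitude of the extrapolated energy is not meaningful, only its sign and steepness.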
Submitted 24 August, 2020;
originally announced August 2020.
-
Methods for comparing uncertainty quantifications for material property predictions
Authors:
Kevin Tran,
Willie Neiswanger,
Junwoong Yoon,
Qingyang Zhang,
Eric Xing,
Zachary W. Ulissi
Abstract:
Data science and informatics tools have been proliferating recently within the computational materials science and catalysis fields. This proliferation has spurred the creation of various frameworks for automated materials screening, discovery, and design. Underpinning these frameworks are surrogate models with uncertainty estimates on their predictions. These uncertainty estimates are instrumental for determining which materials to screen next, but the computational catalysis field does not yet have a standard procedure for judging the quality of such uncertainty estimates. Here we present a suite of figures and performance metrics derived from the machine learning community that can be used to judge the quality of such uncertainty estimates. This suite probes the accuracy, calibration, and sharpness of a model quantitatively. We then show a case study where we judge various methods for predicting density-functional-theory-calculated adsorption energies. Of the methods studied here, we find that the best performer is a model where a convolutional neural network is used to supply features to a Gaussian process regressor, which then makes predictions of adsorption energies along with corresponding uncertainty estimates.
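Calibration and sharpness can be made concrete with a short sketch. The definitions below are common choices and are assumptions here, not necessarily the exact metrics of the paper: predictions are treated as Gaussian, calibration is assessed by comparing the expected coverage of central prediction intervals with the observed coverage, the summary `calibration_error` is a plain mean absolute gap between the two, and sharpness is the root-mean-square predicted standard deviation. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def calibration_curve(y_true, mean, std, n_levels=11):
    """Expected vs. observed coverage of central Gaussian prediction intervals."""
    expected = np.linspace(0.0, 1.0, n_levels)
    z = np.abs(np.asarray(y_true) - np.asarray(mean)) / np.asarray(std)
    observed = []
    for p in expected:
        # Half-width (in units of std) of the central interval with coverage p.
        half_width = NormalDist().inv_cdf(0.5 + p / 2.0) if p < 1.0 else np.inf
        observed.append(np.mean(z <= half_width))
    return expected, np.array(observed)

def calibration_error(y_true, mean, std):
    """Mean absolute gap between observed and expected coverage (0 is ideal)."""
    expected, observed = calibration_curve(y_true, mean, std)
    return float(np.mean(np.abs(observed - expected)))

def sharpness(std):
    """Root-mean-square predicted standard deviation (smaller is sharper)."""
    return float(np.sqrt(np.mean(np.asarray(std) ** 2)))

# Synthetic check: a model whose stated uncertainty matches its true error,
# and an overconfident copy that reports half the true standard deviation.
rng = np.random.default_rng(0)
mean = rng.normal(size=5000)
std = np.full(5000, 0.3)
y_true = mean + std * rng.normal(size=5000)

print("calibrated    :", calibration_error(y_true, mean, std), sharpness(std))
print("overconfident :", calibration_error(y_true, mean, 0.5 * std), sharpness(0.5 * std))
```

The overconfident model looks sharper but is worse calibrated, which is why the two quantities are reported together rather than optimized separately.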
Submitted 20 February, 2020; v1 submitted 20 December, 2019;
originally announced December 2019.