-
SynCoTrain: A Dual Classifier PU-learning Framework for Synthesizability Prediction
Authors:
Sasan Amariamir,
Janine George,
Philipp Benner
Abstract:
Material discovery is a cornerstone of modern science, driving advancements in diverse disciplines from biomedical technology to climate solutions. Predicting synthesizability, a critical factor in realizing novel materials, remains a complex challenge due to the limitations of traditional heuristics and thermodynamic proxies. While stability metrics such as formation energy offer partial insights, they fail to account for kinetic factors and technological constraints that influence synthesis outcomes. These challenges are further compounded by the scarcity of negative data, as failed synthesis attempts are often unpublished or context-specific.
We present SynCoTrain, a semi-supervised machine learning model designed to predict the synthesizability of materials. SynCoTrain employs a co-training framework leveraging two complementary graph convolutional neural networks: SchNet and ALIGNN. By iteratively exchanging predictions between classifiers, SynCoTrain mitigates model bias and enhances generalizability. Our approach uses Positive and Unlabeled (PU) Learning to address the absence of explicit negative data, iteratively refining predictions through collaborative learning.
The model demonstrates robust performance, achieving high recall on internal and leave-out test sets. By focusing on oxide crystals, a well-characterized material family with extensive experimental data, we establish SynCoTrain as a reliable tool for predicting synthesizability while balancing dataset variability and computational efficiency. This work highlights the potential of co-training to advance high-throughput materials discovery and generative research, offering a scalable solution to the challenge of synthesizability prediction.
Submitted 18 November, 2024;
originally announced November 2024.
-
A physics-encoded Fourier neural operator approach for surrogate modeling of divergence-free stress fields in solids
Authors:
Mohammad S. Khorrami,
Pawan Goyal,
Jaber R. Mianroodi,
Bob Svendsen,
Peter Benner,
Dierk Raabe
Abstract:
The purpose of the current work is the development of a so-called physics-encoded Fourier neural operator (PeFNO) for surrogate modeling of the quasi-static equilibrium stress field in solids. Rather than accounting for constraints from physics in the loss function as done in the (now standard) physics-informed approach, the physics-encoded approach incorporates or "encodes" such constraints directly into the network or operator architecture. As a result, in contrast to the physics-informed approach in which only training is physically constrained, both training and output are physically constrained in the physics-encoded approach. For the current constraint of divergence-free stress, a novel encoding approach based on a stress potential is proposed.
As a "proof-of-concept" example application of the proposed PeFNO, a heterogeneous polycrystalline material consisting of isotropic elastic grains subject to uniaxial extension is considered. Stress field data for training are obtained from the numerical solution of a corresponding boundary-value problem for quasi-static mechanical equilibrium. This data is also employed to train an analogous physics-guided FNO (PgFNO) and physics-informed FNO (PiFNO) for comparison. As confirmed by this comparison and as expected on the basis of their differences, the output of the trained PeFNO is significantly more accurate in satisfying mechanical equilibrium than the output of either the trained PgFNO or the trained PiFNO.
Submitted 4 February, 2025; v1 submitted 27 August, 2024;
originally announced August 2024.
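The idea of encoding the divergence-free constraint through a stress potential can be illustrated numerically. The abstract does not say which potential the authors use, so the sketch below assumes the classical two-dimensional Airy stress function on a periodic grid, with derivatives taken spectrally; the function names `stress_from_potential` and `divergence` are hypothetical and not taken from the paper. It shows that any stress field built this way satisfies equilibrium by construction, whatever the potential is.

```python
import numpy as np


def _wavenumbers(n, length):
    """Angular wavenumber grids (kx, ky) for an n x n periodic grid."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    return np.meshgrid(k, k, indexing="ij")


def stress_from_potential(phi, length=2.0 * np.pi):
    """Build a 2D stress field from a scalar potential (Airy form, assumed).

    sxx = d2(phi)/dy2, syy = d2(phi)/dx2, sxy = -d2(phi)/dxdy
    """
    kx, ky = _wavenumbers(phi.shape[0], length)
    phi_hat = np.fft.fft2(phi)
    # In Fourier space d/dx -> i*kx and d/dy -> i*ky.
    sxx = np.real(np.fft.ifft2(-(ky ** 2) * phi_hat))
    syy = np.real(np.fft.ifft2(-(kx ** 2) * phi_hat))
    sxy = np.real(np.fft.ifft2(kx * ky * phi_hat))
    return sxx, syy, sxy


def divergence(sxx, syy, sxy, length=2.0 * np.pi):
    """Row-wise divergence of the symmetric stress tensor."""
    kx, ky = _wavenumbers(sxx.shape[0], length)
    sxx_hat, syy_hat, sxy_hat = (np.fft.fft2(s) for s in (sxx, syy, sxy))
    div_x = np.real(np.fft.ifft2(1j * kx * sxx_hat + 1j * ky * sxy_hat))
    div_y = np.real(np.fft.ifft2(1j * kx * sxy_hat + 1j * ky * syy_hat))
    return div_x, div_y


# Arbitrary smooth periodic potential, chosen only for illustration.
n = 32
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
phi = np.sin(X) * np.cos(2.0 * Y) + 0.5 * np.cos(3.0 * X)

sxx, syy, sxy = stress_from_potential(phi)
div_x, div_y = divergence(sxx, syy, sxy)
max_div = max(np.abs(div_x).max(), np.abs(div_y).max())
print(f"largest divergence component: {max_div:.2e}")
```

In a physics-encoded network of this kind, the learned quantity would be the potential rather than the stress itself, so the output satisfies equilibrium up to floating-point error regardless of how well the network is trained.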
-
Roadmap on Data-Centric Materials Science
Authors:
Stefan Bauer,
Peter Benner,
Tristan Bereau,
Volker Blum,
Mario Boley,
Christian Carbogno,
C. Richard A. Catlow,
Gerhard Dehm,
Sebastian Eibl,
Ralph Ernstorfer,
Ádám Fekete,
Lucas Foppa,
Peter Fratzl,
Christoph Freysoldt,
Baptiste Gault,
Luca M. Ghiringhelli,
Sajal K. Giri,
Anton Gladyshev,
Pawan Goyal,
Jason Hattrick-Simpers,
Lara Kabalan,
Petr Karpov,
Mohammad S. Khorrami,
Christoph Koch,
Sebastian Kokott
, et al. (36 additional authors not shown)
Abstract:
Science is and always has been based on data, but the terms "data-centric" and the "4th paradigm of" materials research indicate a radical change in how information is retrieved and handled and in how research is performed. It signifies a transformative shift towards managing vast data collections, digital repositories, and innovative data analytics methods. The integration of Artificial Intelligence (AI) and its subset Machine Learning (ML) has become pivotal in addressing all these challenges. This Roadmap on Data-Centric Materials Science explores fundamental concepts and methodologies, illustrating diverse applications in electronic-structure theory, soft matter theory, microstructure research, and experimental techniques like photoemission, atom probe tomography, and electron microscopy. While the roadmap delves into specific areas within the broad interdisciplinary field of materials science, the provided examples elucidate key concepts applicable to a wider range of topics. The discussed instances offer insights into addressing the multifaceted challenges encountered in contemporary materials research.
Submitted 1 May, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
A foundation model for atomistic materials chemistry
Authors:
Ilyes Batatia,
Philipp Benner,
Yuan Chiang,
Alin M. Elena,
Dávid P. Kovács,
Janosh Riebesell,
Xavier R. Advincula,
Mark Asta,
Matthew Avaylon,
William J. Baldwin,
Fabian Berger,
Noam Bernstein,
Arghya Bhowmik,
Samuel M. Blau,
Vlad Cărare,
James P. Darby,
Sandip De,
Flaviano Della Pia,
Volker L. Deringer,
Rokas Elijošius,
Zakariya El-Machachi,
Fabio Falcioni,
Edvin Fako,
Andrea C. Ferrari,
Annalena Genreith-Schriever
, et al. (51 additional authors not shown)
Abstract:
Machine-learned force fields have transformed the atomistic modelling of materials by enabling simulations of ab initio quality on unprecedented time and length scales. However, they are currently limited by: (i) the significant computational and human effort that must go into development and validation of potentials for each particular system of interest; and (ii) a general lack of transferability from one chemical system to the next. Here, using the state-of-the-art MACE architecture, we introduce a single general-purpose ML model, trained on a public database of 150k inorganic crystals, that is capable of running stable molecular dynamics on molecules and materials. We demonstrate the power of the MACE-MP-0 model - and its qualitative and at times quantitative accuracy - on a diverse set of problems in the physical sciences, including the properties of solids, liquids, gases, chemical reactions, interfaces and even the dynamics of a small protein. The model can be applied out of the box and as a starting or "foundation model" for any atomistic system of interest and is thus a step towards democratising the revolution of ML force fields by lowering the barriers to entry.
Submitted 1 March, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions
Authors:
Janosh Riebesell,
Rhys E. A. Goodall,
Philipp Benner,
Yuan Chiang,
Bowen Deng,
Gerbrand Ceder,
Mark Asta,
Alpha A. Lee,
Anubhav Jain,
Kristin A. Persson
Abstract:
The rapid adoption of machine learning (ML) in domain sciences necessitates best practices and standardized benchmarking for performance evaluation. We present Matbench Discovery, an evaluation framework for ML energy models, applied as pre-filters for high-throughput searches of stable inorganic crystals. This framework addresses the disconnect between thermodynamic stability and formation energy, as well as retrospective vs. prospective benchmarking in materials discovery. We release a Python package to support model submissions and maintain an online leaderboard, offering insights into performance trade-offs. To identify the best-performing ML methodologies for materials discovery, we benchmarked various approaches, including random forests, graph neural networks (GNNs), one-shot predictors, iterative Bayesian optimizers, and universal interatomic potentials (UIP). Our initial results rank models by test set F1 scores for thermodynamic stability prediction: EquiformerV2 + DeNS > Orb > SevenNet > MACE > CHGNet > M3GNet > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi fingerprint random forest. UIPs emerge as the top performers, achieving F1 scores of 0.57-0.82 and discovery acceleration factors (DAF) of up to 6x on the first 10k stable predictions compared to random selection. We also identify a misalignment between regression metrics and task-relevant classification metrics. Accurate regressors can yield high false-positive rates near the decision boundary at 0 eV/atom above the convex hull. Our results demonstrate UIPs' ability to optimize computational budget allocation for expanding materials databases. However, their limitations remain underexplored in traditional benchmarks. We advocate for task-based evaluation frameworks, as implemented here, to address these limitations and advance ML-guided materials discovery.
Submitted 10 December, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
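The classification metrics named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the Matbench Discovery package API: the function name `stability_metrics` is hypothetical, a material is counted as stable when its energy above the convex hull is at or below 0 eV/atom, and the discovery acceleration factor (DAF) is taken as precision divided by the fraction of truly stable materials, which matches the abstract's "compared to random selection" wording.

```python
import numpy as np


def stability_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """F1 and DAF for stability classification from hull energies (eV/atom)."""
    true_stable = np.asarray(e_hull_true) <= threshold
    pred_stable = np.asarray(e_hull_pred) <= threshold

    tp = int(np.sum(true_stable & pred_stable))    # correctly predicted stable
    fp = int(np.sum(~true_stable & pred_stable))   # predicted stable, actually unstable
    fn = int(np.sum(true_stable & ~pred_stable))   # missed stable materials

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Hit rate of random selection equals the share of stable materials.
    prevalence = float(np.mean(true_stable))
    daf = precision / prevalence if prevalence else float("nan")
    return {"precision": precision, "recall": recall, "f1": f1, "daf": daf}


# Made-up energies for illustration only (not benchmark data).
e_true = [-0.10, 0.05, -0.02, 0.20, 0.01, -0.30]
e_pred = [-0.05, -0.01, 0.03, 0.15, 0.02, -0.20]
metrics = stability_metrics(e_true, e_pred)
print(metrics)
```

The second and third entries of the made-up data have small regression errors (0.06 and 0.05 eV/atom) yet land on the wrong side of the threshold, which is the mismatch between regression and classification metrics that the abstract describes.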
-
A Quantum-Chemical Bonding Database for Solid-State Materials
Authors:
Aakash Ashok Naik,
Christina Ertural,
Nidal Dhamrait,
Philipp Benner,
Janine George
Abstract:
An in-depth insight into the chemistry and nature of the individual chemical bonds is essential for understanding materials. Bonding analysis is thus expected to provide important features for large-scale data analysis and machine learning of material properties. Such chemical bonding information can be computed using the LOBSTER software package, which post-processes modern density functional theory data by projecting the plane wave-based wave functions onto a local, atomic orbital basis. With the help of a fully automatic workflow, the VASP and LOBSTER software packages are used to generate the data. We then perform bonding analyses on 1520 compounds (insulators and semiconductors) and provide the results as a database. The database structure of the bonding analysis database, which allows easy data retrieval, is also explained. The projected densities of states and bonding indicators are benchmarked on standard density-functional theory computations and available heuristics, respectively. Lastly, we illustrate the predictive power of bonding descriptors by constructing a machine-learning model for phononic properties, which shows an increase in prediction accuracies by 27 % (mean absolute errors) compared to a benchmark model differing only by not relying on any quantum-chemical bonding features.
Submitted 21 April, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.
-
"Ultima Ratio": Simulating wide-range X-ray scattering and diffraction
Authors:
Brian R. Pauw,
Sofya Laskina,
Aakash Naik,
Glen J. Smales,
Janine George,
Ingo Breßler,
Philipp Benner
Abstract:
We demonstrate a strategy for simulating wide-range X-ray scattering patterns that span the small and wide scattering angles as well as the scattering angles typically used for Pair Distribution Function (PDF) analysis. Such simulated patterns can be used to test holistic analysis models, and, since the diffraction intensity is on the same scale as the scattering intensity, may offer a novel pathway for determining the degree of crystallinity.
The "Ultima Ratio" strategy is demonstrated on a 64-nm Metal Organic Framework (MOF) particle, calculated from Q < 0.01 1/nm up to Q < 150 1/nm, with a resolution of 0.16 Angstrom. The computations exploit a modified 3D Fast Fourier Transform (3D-FFT), whose modifications enable the transformation of matrices up to at least 8000^3 voxels in size. Several of these modified 3D-FFTs are combined to improve the low-Q behaviour. The resulting curve is compared to a wide-range scattering pattern measured on a polydisperse MOF powder. While computationally intensive, the approach is expected to be useful for simulating scattering from a wide range of realistic, complex structures, from (poly-)crystalline particles to hierarchical, multicomponent structures such as viruses and catalysts.
Submitted 23 March, 2023;
originally announced March 2023.
-
An artificial neural network for surrogate modeling of stress fields in viscoplastic polycrystalline materials
Authors:
Mohammad S. Khorrami,
Jaber R. Mianroodi,
Nima H. Siboni,
Pawan Goyal,
Bob Svendsen,
Peter Benner,
Dierk Raabe
Abstract:
The purpose of this work is the development of an artificial neural network (ANN) for surrogate modeling of the mechanical response of viscoplastic grain microstructures. To this end, a U-Net-based convolutional neural network (CNN) is trained to account for the history dependence of the material behavior. The training data take the form of numerical simulation results for the von Mises stress field under quasi-static tensile loading. The trained CNN (tCNN) can accurately reproduce both the average response and the local von Mises stress field. The tCNN calculates the von Mises stress field of grain microstructures not included in the training dataset about 500 times faster than the numerical solution of the corresponding initial-boundary-value problem with a spectral solver. The tCNN is also successfully applied to other types of microstructure morphologies (e.g., matrix-inclusion type topologies) and loading levels not contained in the training dataset.
Submitted 29 August, 2022;
originally announced August 2022.