-
Towards Informatics-Driven Design of Nuclear Waste Forms
Authors:
Vinay I. Hegde,
Miroslava Peterson,
Sarah I. Allec,
Xiaonan Lu,
Thiruvillamalai Mahadevan,
Thanh Nguyen,
Jayani Kalahe,
Jared Oshiro,
Robert J. Seffens,
Ethan K. Nickerson,
Jincheng Du,
Brian J. Riley,
John D. Vienna,
James E. Saal
Abstract:
Informatics-driven approaches, such as machine learning and sequential experimental design, have shown the potential to drastically impact next-generation materials discovery and design. In this perspective, we present a few guiding principles for applying informatics-based methods towards the design of novel nuclear waste forms. We advocate for adopting a system design approach, and describe the effective usage of data-driven methods in every stage of such a design process. We demonstrate how this approach can optimally leverage physics-based simulations, machine learning surrogates, and experimental synthesis and characterization, within a feedback-driven closed-loop sequential learning framework. We discuss the importance of incorporating domain knowledge into the representation of materials, the construction and curation of datasets, the development of predictive property models, and the design and execution of experiments. We illustrate the application of this approach by successfully designing and validating Na- and Nd-containing phosphate-based ceramic waste forms. Finally, we discuss open challenges in such informatics-driven workflows and present an outlook for their widespread application for the cleanup of nuclear wastes.
Submitted 16 May, 2024;
originally announced May 2024.
-
Evaluation of GlassNet for physics-informed machine learning of glass stability and glass-forming ability
Authors:
Sarah I. Allec,
Xiaonan Lu,
Daniel R. Cassar,
Xuan T. Nguyen,
Vinay I. Hegde,
Thiruvillamalai Mahadevan,
Miroslava Peterson,
Jincheng Du,
Brian J. Riley,
John D. Vienna,
James E. Saal
Abstract:
Glasses form the basis of many modern applications and also hold great potential for future medical and environmental applications. However, their structural complexity and large composition space make design and optimization challenging for certain applications. Of particular importance for glass processing is an estimate of a given composition's glass-forming ability (GFA). However, there remain many open questions regarding the physical mechanisms of glass formation, especially in oxide glasses. It is apparent that a proxy for GFA would be highly useful in glass processing and design, but identifying such a surrogate property has proven difficult. Here, we explore the application of an open-source pre-trained neural network (NN) model, GlassNet, that can predict the characteristic temperatures necessary to compute glass stability (GS), and assess the feasibility of using these physics-informed ML (PIML)-predicted GS parameters to estimate GFA. In doing so, we track the uncertainties at each step of the computation - from the original ML prediction errors, to the compounding of errors during GS estimation, and finally to the final estimation of GFA. While GlassNet exhibits reasonable accuracy on all individual properties, we observe a large compounding of error in the combination of these individual predictions for the prediction of GS, finding that random forest models offer similar accuracy to GlassNet. We also break down the ML performance on different glass families and find that the error in GS prediction is correlated with the error in crystallization peak temperature prediction. Lastly, we utilize this finding to assess the relationship between top-performing GS parameters and GFA for two ternary glass systems: sodium borosilicate and sodium iron phosphate glasses. We conclude that to obtain true ML predictive capability of GFA, significantly more data needs to be collected.
Submitted 19 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
A case study of multi-modal, multi-institutional data management for the combinatorial materials science community
Authors:
Sarah I. Allec,
Eric S. Muckley,
Nathan S. Johnson,
Christopher K. H. Borg,
Dylan J. Kirsch,
Joshua Martin,
Rohit Pant,
Ichiro Takeuchi,
Andrew S. Lee,
James E. Saal,
Logan Ward,
Apurva Mehta
Abstract:
Although the convergence of high-performance computing, automation, and machine learning has significantly altered the materials design timeline, transformative advances in functional materials and acceleration of their design will require addressing the deficiencies that currently exist in materials informatics, particularly a lack of standardized experimental data management. The challenges associated with experimental data management are especially acute in combinatorial materials science, where advancements in automation of experimental workflows have produced datasets that are often too large and too complex for human reasoning. The data management challenge is further compounded by the multi-modal and multi-institutional nature of these datasets, as they tend to be distributed across multiple institutions and can vary substantially in format, size, and content. To adequately map a materials design space from such datasets, an ideal materials data infrastructure would contain data and metadata describing i) synthesis and processing conditions, ii) characterization results, and iii) property and performance measurements. Here, we present a case study for the low-barrier development of a dashboard that enables standardized organization, analysis, and visualization of a large data lake consisting of combinatorial datasets of synthesis and processing conditions, X-ray diffraction patterns, and materials property measurements generated at several different institutions. While this dashboard was developed specifically for data-driven thermoelectric materials discovery, we envision the adaptation of this prototype to other materials applications, and, more ambitiously, future integration into an all-encompassing materials data management infrastructure.
Submitted 6 February, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Interpretable models for extrapolation in scientific machine learning
Authors:
Eric S. Muckley,
James E. Saal,
Bryce Meredig,
Christopher S. Roper,
John H. Martin
Abstract:
Data-driven models are central to scientific discovery. In efforts to achieve state-of-the-art model accuracy, researchers are employing increasingly complex machine learning algorithms that often outperform simple regressions in interpolative settings (e.g. random k-fold cross-validation) but suffer from poor extrapolation performance, portability, and human interpretability, which limits their potential for facilitating novel scientific insight. Here we examine the trade-off between model performance and interpretability across a broad range of science and engineering problems with an emphasis on materials science datasets. We compare the performance of black box random forest and neural network machine learning algorithms to that of single-feature linear regressions which are fitted using interpretable input features discovered by a simple random search algorithm. For interpolation problems, the average prediction errors of linear regressions were twice as high as those of black box models. Remarkably, when prediction tasks required extrapolation, linear models yielded average error only 5% higher than that of black box models, and outperformed black box models in roughly 40% of the tested prediction tasks, which suggests that they may be preferable to complex algorithms in many extrapolation problems because of their superior interpretability, lower computational overhead, and ease of use. The results challenge the common assumption that extrapolative models for scientific machine learning are constrained by an inherent trade-off between performance and interpretability.
Submitted 16 December, 2022;
originally announced December 2022.
-
Quantifying the performance of machine learning models in materials discovery
Authors:
Christopher K. H. Borg,
Eric S. Muckley,
Clara Nyby,
James E. Saal,
Logan Ward,
Apurva Mehta,
Bryce Meredig
Abstract:
The predictive capabilities of machine learning (ML) models used in materials discovery are typically measured using simple statistics such as the root-mean-square error (RMSE) or the coefficient of determination ($r^2$) between ML-predicted materials property values and their known values. A tempting assumption is that models with low error should be effective at guiding materials discovery, and conversely, models with high error should give poor discovery performance. However, we observe that no clear connection exists between a "static" quantity averaged across an entire training set, such as RMSE, and an ML property model's ability to dynamically guide the iterative (and often extrapolative) discovery of novel materials with targeted properties. In this work, we simulate a sequential learning (SL)-guided materials discovery process and demonstrate a decoupling between traditional model error metrics and model performance in guiding materials discoveries. We show that model performance in materials discovery depends strongly on (1) the target range within the property distribution (e.g., whether a 1st or 10th decile material is desired); (2) the incorporation of uncertainty estimates in the SL acquisition function; (3) whether the scientist is interested in one discovery or many targets; and (4) how many SL iterations are allowed. To overcome the limitations of static metrics and robustly capture SL performance, we recommend metrics such as Discovery Yield ($DY$), a measure of how many high-performing materials were discovered during SL, and Discovery Probability ($DP$), a measure of likelihood of discovering high-performing materials at any point in the SL process.
Submitted 24 October, 2022;
originally announced October 2022.
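Note on the entry above: the abstract describes Discovery Yield ($DY$) only as "a measure of how many high-performing materials were discovered during SL". A minimal sketch of one plausible reading follows, assuming $DY$ is the fraction of all above-threshold candidates in the pool that the sequential learning run acquired. The function name `discovery_yield`, the threshold convention, and the toy values are hypothetical and are not taken from the paper.

```python
def discovery_yield(acquired, pool, threshold):
    """Fraction of high-performing candidates found during a run.

    ASSUMPTION: "high-performing" means property value >= threshold,
    and `acquired` holds the property values of candidates picked
    from `pool` without replacement. This is an illustrative reading
    of the abstract, not the paper's exact definition.
    """
    targets = [v for v in pool if v >= threshold]
    if not targets:
        raise ValueError("no candidate in the pool meets the threshold")
    found = [v for v in acquired if v >= threshold]
    return len(found) / len(targets)


# Toy example: five candidates, three of which count as targets.
pool = [0.10, 0.20, 0.90, 0.95, 0.99]
acquired = [0.90, 0.10, 0.95]  # values measured over three iterations
print(discovery_yield(acquired, pool, threshold=0.9))
```

Unlike RMSE, a quantity of this kind depends on the chosen target range and on how many iterations were run, which is consistent with the dependencies the abstract lists.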
-
Quantifying uncertainty in high-throughput density functional theory: a comparison of AFLOW, Materials Project, and OQMD
Authors:
Vinay I. Hegde,
Christopher K. H. Borg,
Zachary del Rosario,
Yoolhee Kim,
Maxwell Hutchinson,
Erin Antono,
Julia Ling,
Paul Saxe,
James E. Saal,
Bryce Meredig
Abstract:
A central challenge in high-throughput density functional theory (HT-DFT) calculations is selecting a combination of input parameters and post-processing techniques that can be used across all materials classes, while also managing accuracy-cost tradeoffs. To investigate the effects of these parameter choices, we consolidate three large HT-DFT databases: Automatic-FLOW (AFLOW), the Materials Project (MP), and the Open Quantum Materials Database (OQMD), and compare reported properties across each pair of databases for materials calculated using the same initial crystal structure. We find that HT-DFT formation energies and volumes are generally more reproducible than band gaps and total magnetizations; for instance, a notable fraction of records disagree on whether a material is metallic (up to 7%) or magnetic (up to 15%). The variance between calculated properties is as high as 0.105 eV/atom (median relative absolute difference, or MRAD, of 6%) for formation energy, 0.65 Å$^3$/atom (MRAD of 4%) for volume, 0.21 eV (MRAD of 9%) for band gap, and 0.15 $μ_{\rm B}$/formula unit (MRAD of 8%) for total magnetization, comparable to the differences between DFT and experiment. We trace some of the larger discrepancies to choices involving pseudopotentials, the DFT+U formalism, and elemental reference states, and argue that further standardization of HT-DFT would be beneficial to reproducibility.
Submitted 5 November, 2022; v1 submitted 3 July, 2020;
originally announced July 2020.
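Note on the entry above: the abstract names the median relative absolute difference (MRAD) without giving its formula. A minimal sketch under a stated assumption follows: each pair of values reported by two databases for the same material is compared as the absolute difference divided by the mean of the two magnitudes, and the median is taken over all pairs. The normalization, the function name `mrad`, and the sample numbers are hypothetical; the paper's own definition may differ.

```python
from statistics import median


def mrad(values_a, values_b):
    """Median relative absolute difference between paired records.

    ASSUMPTION: the relative difference for a pair (x, y) is
    |x - y| / ((|x| + |y|) / 2). Pairs where both values are zero
    are skipped to avoid division by zero.
    """
    ratios = []
    for x, y in zip(values_a, values_b):
        scale = (abs(x) + abs(y)) / 2
        if scale > 0:
            ratios.append(abs(x - y) / scale)
    if not ratios:
        raise ValueError("no comparable pairs")
    return median(ratios)


# Made-up band gaps (eV) for the same three structures in two databases.
db_one = [1.0, 2.0, 4.0]
db_two = [1.0, 2.2, 3.0]
print(f"MRAD = {mrad(db_one, db_two):.1%}")
```

Taking the median rather than the mean keeps a small number of grossly disagreeing records from dominating the summary.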
-
Thermodynamic Stability of Mg-based Ternary Long-Period Stacking Ordered Structures
Authors:
James E. Saal,
C. Wolverton
Abstract:
Mg alloys containing long-period stacking ordered (LPSO) structures exhibit remarkably high tensile yield strength and ductility. They have been found in a variety of ternary Mg systems of the general form Mg-XL-XS, where XL and XS are elements larger and smaller than Mg, respectively. In this work, we examine the thermodynamic stability of these LPSO precipitates with density functional theory, using a newly proposed structure model based on the inclusion of a Mg interstitial atom. We predict the stabilities for 14H and 18R LPSO structures for many Mg-XL-XS ternary systems: 85 systems consisting of XL=rare earths (RE) Sc,Y,La-Lu and XS=Zn,Al,Cu,Co,Ni. We predict thermodynamically stable LPSO phases in all systems where LPSO structures are observed. In addition, we predict several stable LPSO structures in new, as-yet-unobserved Mg-RE-XS systems. Many non-RE XL elements are also explored on the basis of size mismatch between Mg and XL, including Tl,Sb,Pb,Na,Te,Bi,Pa,Ca,Th,K,Sr --- an additional 55 ternary systems. XL=Ca, Sr, and Th are predicted to be most promising to form stable LPSO phases, particularly with XS=Zn. Lastly, several previously observed trends amongst known XL elements are examined. We find that favorable mixing energy between Mg and XL on the FCC lattice and the size mismatch together serve as excellent criteria determining XL LPSO formation.
Submitted 12 September, 2013;
originally announced September 2013.
-
Thermodynamic Stability of Co-Al-W L12 γ'
Authors:
James E. Saal,
Chris Wolverton
Abstract:
Co-based superalloys in the Co-Al-W system exhibit coherent L12 Co3(Al,W) γ' precipitates in an fcc Co γ matrix, analogous to Ni3Al in Ni-based systems. Unlike Ni3Al, however, experimental observations of Co3(Al,W) suggest that it is not a stable phase at 1173 K. Here, we perform an extensive series of density functional theory (DFT) calculations of the γ' Co3(Al,W) phase stability, including point defect energetics and finite-temperature contributions. We first confirm and extend previous DFT calculations of the metastability of L12 Co3(Al0.5W0.5) γ' at 0 K with respect to HCP Co, B2 CoAl, and D019 Co3W using the special quasi-random structure (SQS) approach to describe the Al/W solid solution, employing several exchange/correlation functionals, structures with varying degrees of disorder, and newly developed larger SQS. We expand the validity of this conclusion by considering the formation of antisite and vacancy point defects, predicting defect formation energies similar in magnitude to Ni3Al. However, in contrast to the Ni3Al system, we find that substituting Co on Al sites is thermodynamically favorable at 0 K, consistent with experimental observation of Co excess and Al deficiency in γ' with respect to the Co3(Al0.5W0.5) composition. Lastly, we consider vibrational, electronic, and magnetic contributions to the free energy, finding that they promote the stability of γ', making the phase thermodynamically competitive with the convex hull at elevated temperature. Surprisingly, this is due to the relatively small finite-temperature contributions of one of the γ' decomposition products, B2 CoAl, effectively destabilizing the Co, CoAl, and Co3W three phase mixture, thus stabilizing the γ' phase.
Submitted 4 October, 2012;
originally announced October 2012.