Emulating Expert Insight: A Robust Strategy for Optimal Experimental Design

Authors: Matthew R. Carbone, Hyeong Jin Kim, Chandima Fernando, Shinjae Yoo, Daniel Olds, Howie Joress, Brian DeCost, Bruce Ravel, Yugang Zhang, Phillip M. Maffettone

Abstract: The challenge of optimal design of experiments (DOE) pervades materials science, physics, chemistry, and biology. Bayesian optimization has been used to address this challenge in vast sample spaces, although it requires framing experimental campaigns through the lens of maximizing some observable. This framing is insufficient for epistemic research goals that seek to comprehensively analyze a sample space, without an explicit scalar objective (e.g., the characterization of a wafer or sample library). In this work, we propose a flexible formulation of scientific value that recasts a dataset of input conditions and higher-dimensional observable data into a continuous, scalar metric. Intuitively, the scientific value function measures where observables change significantly, emulating the perspective of experts driving an experiment, and can be used in collaborative analysis tools or as an objective for optimization techniques. We demonstrate this technique by exploring simulated phase boundaries from different observables, autonomously driving a variable temperature measurement of a ferroelectric material, and providing feedback from a nanoparticle synthesis campaign. The method is seamlessly compatible with existing optimization tools, can be extended to multi-modal and multi-fidelity experiments, and can integrate existing models of an experimental system. Because of its flexibility, it can be deployed in a range of experimental settings for autonomous or accelerated experiments.

Submitted 25 July, 2023; originally announced July 2023.
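
Illustrative sketch (not from the paper): the abstract above describes a scalar metric that is large where observables change significantly. One simple way to picture this is to score each measured point by how much its observable differs from those of nearby points in input space. The function name `value_sketch`, the Gaussian neighbor weighting, and the `length_scale` parameter are all assumptions made for this example; the paper's actual formulation may differ.

```python
import numpy as np

def value_sketch(X, Y, length_scale=1.0):
    """Hypothetical score: large where observables Y change quickly with inputs X.

    X: (n, d) array of input conditions.
    Y: (n, m) array of observables (e.g. one spectrum per row).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Pairwise distances in input space and in observable space.
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # Gaussian weights emphasize nearby points; self-pairs get zero weight.
    w = np.exp(-0.5 * (dx / length_scale) ** 2)
    np.fill_diagonal(w, 0.0)
    # Weighted mean observable difference between each point and its neighbors.
    return (w * dy).sum(axis=1) / w.sum(axis=1)

# Toy 1D example: the observable switches abruptly between x = 0.4 and x = 0.5.
x = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
y = np.where(x < 0.45, 0.0, 1.0)
scores = value_sketch(x, y, length_scale=0.1)
```

In this toy case the scores peak at the two points adjacent to the step and fall toward zero far from it, which is the qualitative behavior the abstract describes.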

arXiv:2306.16349 [pdf, other]

Accurate, uncertainty-aware classification of molecular chemical motifs from multi-modal X-ray absorption spectroscopy

Authors: Matthew R. Carbone, Phillip M. Maffettone, Xiaohui Qu, Shinjae Yoo, Deyu Lu

Abstract: Accurate classification of molecular chemical motifs from experimental measurement is an important problem in molecular physics, chemistry and biology. In this work, we present neural network ensemble classifiers for predicting the presence (or lack thereof) of 41 different chemical motifs on small molecules from simulated C, N and O K-edge X-ray absorption near-edge structure (XANES) spectra. Our classifiers not only reach a maximum average class-balanced accuracy of 0.99 but also accurately quantify uncertainty. We also show that including multiple XANES modalities improves predictions notably on average, demonstrating a "multi-modal advantage" over any single modality. In addition to structure refinement, our approach can be generalized for broad applications with molecular design pipelines.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2304.11120 [pdf, other]

What is missing in autonomous discovery: Open challenges for the community

Authors: Phillip M. Maffettone, Pascal Friederich, Sterling G. Baird, Ben Blaiszik, Keith A. Brown, Stuart I. Campbell, Orion A. Cohen, Tantum Collins, Rebecca L. Davis, Ian T. Foster, Navid Haghmoradi, Mark Hereld, Nicole Jung, Ha-Kyung Kwon, Gabriella Pizzuto, Jacob Rintamaki, Casper Steinmann, Luca Torresi, Shijing Sun

Abstract: Self-driving labs (SDLs) leverage combinations of artificial intelligence, automation, and advanced computing to accelerate scientific discovery. The promise of this field has given rise to a rich community of passionate scientists, engineers, and social scientists, as evidenced by the development of the Acceleration Consortium and recent Accelerate Conference. Despite its strengths, this rapidly developing field presents numerous opportunities for growth, challenges to overcome, and potential risks of which to remain aware. This community perspective builds on a discourse instantiated during the first Accelerate Conference, and looks to the future of self-driving labs with a tempered optimism. Incorporating input from academia, government, and industry, we briefly describe the current status of self-driving labs, then turn our attention to barriers, opportunities, and a vision for what is possible. Our field is delivering solutions in technology and infrastructure, artificial intelligence and knowledge generation, and education and workforce development. In the spirit of community, we intend for this work to foster discussion and drive best practices as our field grows.

Submitted 2 May, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

arXiv:2301.09177 [pdf, other]

Self-driving Multimodal Studies at User Facilities

Authors: Phillip M. Maffettone, Daniel B. Allan, Stuart I. Campbell, Matthew R. Carbone, Thomas A. Caswell, Brian L. DeCost, Dmitri Gavrilov, Marcus D. Hanwell, Howie Joress, Joshua Lynch, Bruce Ravel, Stuart B. Wilkins, Jakub Wlodek, Daniel Olds

Abstract: Multimodal characterization is commonly required for understanding materials. User facilities possess the infrastructure to perform these measurements, albeit in serial over days to months. In this paper, we describe a unified multimodal measurement of a single sample library at distant instruments, driven by a concert of distributed agents that use analysis from each modality to inform the direction of the other in real time. Powered by the Bluesky project at the National Synchrotron Light Source II, this experiment is a world's first for beamline science, and provides a blueprint for future approaches to multimodal and multifidelity experiments at user facilities.

Submitted 22 January, 2023; originally announced January 2023.

Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022). AI4Mat Workshop

arXiv:2201.03550 [pdf, other]

Machine learning enabling high-throughput and remote operations at large-scale user facilities

Authors: Tatiana Konstantinova, Phillip M. Maffettone, Bruce Ravel, Stuart I. Campbell, Andi M. Barbour, Daniel Olds

Abstract: Imaging, scattering, and spectroscopy are fundamental in understanding and discovering new functional materials. Contemporary innovations in automation and experimental techniques have led to these measurements being performed much faster and with higher resolution, thus producing vast amounts of data for analysis. These innovations are particularly pronounced at user facilities and synchrotron light sources. Machine learning (ML) methods are regularly developed to process and interpret large datasets in real-time with measurements. However, there remain conceptual barriers to entry for the facility general user community, who often lack expertise in ML, and technical barriers for deploying ML models. Herein, we demonstrate a variety of archetypal ML models for on-the-fly analysis at multiple beamlines at the National Synchrotron Light Source II (NSLS-II). We describe these examples instructively, with a focus on integrating the models into existing experimental workflows, such that the reader can easily include their own ML techniques into experiments at NSLS-II or facilities with a common infrastructure. The framework presented here shows how, with little effort, diverse ML models operate in conjunction with feedback loops via integration into the existing Bluesky Suite for experimental orchestration and data management.

Submitted 9 January, 2022; originally announced January 2022.

Comments: 12 pages, 5 figures

arXiv:2104.04392 [pdf]

Deep learning for visualization and novelty detection in large X-ray diffraction datasets

Authors: Lars Banko, Phillip M. Maffettone, Dennis Naujoks, Daniel Olds, Alfred Ludwig

Abstract: We apply variational autoencoders (VAE) to X-ray diffraction (XRD) data analysis on both simulated and experimental thin-film data. We show that crystal structure representations learned by a VAE reveal latent information, such as the structural similarity of textured diffraction patterns. While other artificial intelligence (AI) agents are effective at classifying XRD data into known phases, a similarly conditioned VAE is uniquely effective at knowing what it does not know, rapidly identifying novel phases and mixtures. These capabilities demonstrate that a VAE is a valuable AI agent for materials discovery and understanding XRD measurements both on-the-fly and during post hoc analysis.

Submitted 9 April, 2021; originally announced April 2021.

arXiv:2104.00864 [pdf, other]

doi 10.1063/5.0052859

Constrained non-negative matrix factorization enabling real-time insights of $\textit{in situ}$ and high-throughput experiments

Authors: Phillip M. Maffettone, Aidan C. Daly, Daniel Olds

Abstract: Non-negative Matrix Factorization (NMF) methods offer an appealing unsupervised learning method for real-time analysis of streaming spectral data in time-sensitive data collection, such as $\textit{in situ}$ characterization of materials. However, canonical NMF methods are optimized to reconstruct a full dataset as closely as possible, with no underlying requirement that the reconstruction produces components or weights representative of the true physical processes. In this work, we demonstrate how constraining NMF weights or components, provided as known or assumed priors, can provide significant improvement in revealing true underlying phenomena. We present a PyTorch based method for efficiently applying constrained NMF and demonstrate this on several synthetic examples. When applied to streaming experimentally measured spectral data, an expert researcher-in-the-loop can provide and dynamically adjust the constraints. This set of interactive priors to the NMF model can, for example, contain known or identified independent components, as well as functional expectations about the mixing of components. We demonstrate this application on measured X-ray diffraction and pair distribution function data from $\textit{in situ}$ beamline experiments. Details of the method are described, and general guidance provided to employ constrained NMF in extraction of critical information and insights during $\textit{in situ}$ and high-throughput experiments.

Submitted 1 April, 2021; originally announced April 2021.

Comments: This article has been submitted to Applied Physics Reviews. After it is published, it will be found at https://aip.scitation.org/journal/are. Copyright (2021) Phillip M. Maffettone, Aidan C. Daly, Daniel Olds
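
Illustrative sketch (not the authors' code): the abstract above describes constraining NMF components with known priors. The example below shows one way such a constraint can work, holding a known component fixed while the remaining component and all weights are fitted. It uses plain NumPy with standard Lee-Seung multiplicative updates rather than the PyTorch method the paper presents; the function name `constrained_nmf` and its `fixed_components` argument are hypothetical.

```python
import numpy as np

def constrained_nmf(X, n_components, fixed_components=None, n_iter=500, seed=0):
    """Factor X ~= W @ H with nonnegative W and H.

    Rows of H supplied in `fixed_components` (hypothetical argument) are
    treated as known priors and never updated.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    eps = 1e-12  # guards against division by zero
    W = rng.random((n_samples, n_components))
    H = rng.random((n_components, n_features))
    n_fixed = 0
    if fixed_components is not None:
        fixed = np.asarray(fixed_components, dtype=float)
        n_fixed = fixed.shape[0]
        H[:n_fixed] = fixed
    for _ in range(n_iter):
        # Multiplicative update for the weights (all rows are free).
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Multiplicative update for the components; keep only the free rows.
        H_new = H * (W.T @ X) / (W.T @ W @ H + eps)
        H[n_fixed:] = H_new[n_fixed:]
    return W, H

# Synthetic data: two Gaussian peaks mixed in linearly varying proportions.
grid = np.linspace(0.0, 10.0, 200)
c1 = np.exp(-(grid - 3.0) ** 2)
c2 = np.exp(-(grid - 7.0) ** 2)
mix = np.linspace(0.0, 1.0, 30)[:, None]
X = mix * c1 + (1.0 - mix) * c2

# Treat the first peak as a known component and fit the rest.
W, H = constrained_nmf(X, 2, fixed_components=c1[None, :])
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Fixing rows of `H` is only one kind of constraint; the abstract also mentions constraining the weights, which could be sketched the same way by skipping the update for selected columns of `W`.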

arXiv:2008.00283 [pdf, other]

doi 10.1038/s43588-021-00059-2

Crystallography companion agent for high-throughput materials discovery

Authors: Phillip M. Maffettone, Lars Banko, Peng Cui, Yury Lysogorskiy, Marc A. Little, Daniel Olds, Alfred Ludwig, Andrew I. Cooper

Abstract: The discovery of new structural and functional materials is driven by phase identification, often using X-ray diffraction (XRD). Automation has accelerated the rate of XRD measurements, greatly outpacing XRD analysis techniques that remain manual, time-consuming, error-prone, and impossible to scale. With the advent of autonomous robotic scientists or self-driving labs, contemporary techniques prohibit the integration of XRD. Here, we describe a computer program for the autonomous characterization of XRD data, driven by artificial intelligence (AI), for the discovery of new materials. Starting from structural databases, we train an ensemble model using a physically accurate synthetic dataset, which outputs probabilistic classifications -- rather than absolutes -- to overcome the overconfidence in traditional neural networks. This AI agent behaves as a companion to the researcher, improving accuracy and offering significant time savings. It was demonstrated on a diverse set of organic and inorganic materials characterization challenges. This innovation is directly applicable to inverse design approaches, robotic discovery systems, and can be immediately considered for other forms of characterization such as spectroscopy and the pair distribution function.

Submitted 17 March, 2021; v1 submitted 1 August, 2020; originally announced August 2020.

Comments: For associated code, see https://github.com/maffettone/xca

Journal ref: Nat. Comput. Sci. 1, 290 (2021)
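
Illustrative sketch (not the xca code): the abstract above describes an ensemble that outputs probabilistic classifications rather than absolutes. A common way to obtain such outputs is to average the class probabilities of the ensemble members and report the entropy of the average as an uncertainty signal. The function names and the toy logits below are assumptions for this example and do not reflect the API of the associated repository.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(member_logits):
    """Average member probabilities and compute predictive entropy.

    member_logits: (n_members, n_samples, n_classes) raw scores,
    one slice per (hypothetical) trained ensemble member.
    """
    probs = softmax(np.asarray(member_logits, dtype=float))
    mean_probs = probs.mean(axis=0)
    # Entropy is low when members agree confidently, high when they disagree.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return mean_probs, entropy

# Three members, two samples, three classes.
# Sample 0: all members agree on class 0. Sample 1: each member picks a different class.
logits = np.array([
    [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]],
    [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]],
    [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]],
])
mean_probs, entropy = ensemble_predict(logits)
```

For the second sample the averaged probabilities are uniform, so the entropy reaches its maximum of ln 3, whereas a single overconfident network would have reported one class with near certainty.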

arXiv:1804.04906 [pdf, other]

doi 10.1103/PhysRevLett.120.265501

Negative Hydration Expansion in ZrW2O8: Microscopic Mechanism, Spaghetti Dynamics, and Negative Thermal Expansion

Authors: Mia Baise, Phillip M. Maffettone, Fabien Trousselet, Nicholas P. Funnell, François-Xavier Coudert, Andrew L. Goodwin

Abstract: We use a combination of X-ray diffraction, total scattering and quantum mechanical calculations to determine the mechanism responsible for hydration-driven contraction in ZrW$_2$O$_8$. Inclusion of H$_2$O molecules within the ZrW$_2$O$_8$ network drives the concerted formation of new W--O bonds to give one-dimensional (--W--O--)$_n$ strings. The topology of the ZrW$_2$O$_8$ network is such that there is no unique choice for the string trajectories: the same local changes in coordination can propagate with a large number of different periodicities. Consequently, ZrW$_2$O$_8$ is heavily disordered, with each configuration of strings forming a dense aperiodic `spaghetti'. This new connectivity contracts the unit cell \emph{via} large shifts in the Zr and W atom positions. Fluctuations of the undistorted parent structure towards this spaghetti phase emerge as the key NTE phonon modes in ZrW$_2$O$_8$ itself. The large relative density of NTE phonon modes in ZrW$_2$O$_8$ actually reflects the degeneracy of volume-contracting spaghetti excitations, itself a function of the particular topology of this remarkable material.

Submitted 13 April, 2018; originally announced April 2018.

Comments: 5 pages, 4 figures

Journal ref: Phys. Rev. Lett. 120, 265501 (2018)

arXiv:1802.07629 [pdf, other]

Extreme cooperative swelling in topologically disordered fibre entanglements

Authors: Alistair R. Overy, Raj Pandya, Phillip M. Maffettone, Philip A. Chater, Arkadiy Simonov, Andrew L. Goodwin

Abstract: Entangled states are ubiquitous amongst fibrous materials, whether naturally occurring (keratin, collagen, DNA) or synthetic (nanotube assemblies, elastane). A key mechanical characteristic of these systems is their ability to reorganise in response to external stimuli, as implicated in e.g. hydration-induced swelling of keratin fibrils in human skin. During swelling, the curvature of individual fibres changes to give a cooperative and reversible structural reorganisation that opens up a pore network. The phenomenon is known to be highly dependent on topology, even if the nature of this dependence is not well understood: certain ordered entanglements (`weavings') can swell to many times their original volume while others are entirely incapable of swelling at all. Given this sensitivity to topology, it is puzzling how the disordered entanglements of many real materials manage to support cooperative dilation mechanisms. Here we use a combination of geometric and lattice-dynamical modelling to study the effect of disorder on swelling behaviour. The model system we devise spans a continuum of disordered topologies and is bounded by ordered states whose swelling behaviour is already known to be either vanishingly small or extreme. We find that while topological disorder often quenches swelling behaviour, certain disordered states possess a surprisingly large swelling capacity. Crucially, we show that the extreme swelling response previously observed only for certain specific weavings can be matched---and even superseded---by that of disordered entanglements. Our results establish a counterintuitive link between topological disorder and mechanical flexibility that has implications not only for polymer science but also for our broader understanding of collective phenomena in disordered systems.

Submitted 21 February, 2018; originally announced February 2018.

Comments: 17 pages, 4 figures

Showing 1–10 of 10 results for author: Maffettone, P M