-
Tackling Data Scarcity with Transfer Learning: A Case Study of Thickness Characterization from Optical Spectra of Perovskite Thin Films
Authors:
Siyu Isaac Parker Tian,
Zekun Ren,
Selvaraj Venkataraj,
Yuanhang Cheng,
Daniil Bash,
Felipe Oviedo,
J. Senthilnath,
Vijila Chellappan,
Yee-Fun Lim,
Armin G. Aberle,
Benjamin P MacLeod,
Fraser G. L. Parlane,
Curtis P. Berlinguette,
Qianxiao Li,
Tonio Buonassisi,
Zhe Liu
Abstract:
Transfer learning is becoming an increasingly important tool for handling the data scarcity often encountered in machine learning. In high-throughput thickness characterization, a downstream process of the high-throughput optimization of optoelectronic thin films with autonomous workflows, data scarcity occurs especially for new materials. To achieve high-throughput thickness characterization, we propose a machine learning model called thicknessML that predicts thickness from UV-Vis spectrophotometry input, together with an overarching transfer learning workflow. We demonstrate the transfer learning workflow from a generic source domain of generic band-gapped materials to a specific target domain of perovskite materials, where the target domain data come only from a limited number (18) of refractive indices from the literature. The target domain can be easily extended to other material classes with a few literature data. Defining thickness prediction accuracy as a within-10% deviation, thicknessML achieves 92.2% accuracy (with a deviation of 3.6%) with transfer learning, compared to 81.8% (with a deviation of 11.7%) without (lower mean and larger standard deviation). Experimental validation on six deposited perovskite films also corroborates the efficacy of the proposed workflow by yielding a 10.5% mean absolute percentage error (MAPE).
Submitted 20 December, 2022; v1 submitted 14 June, 2022;
originally announced July 2022.
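The two metrics named in the abstract above, accuracy within a 10% deviation and MAPE, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice of an inclusive 10% bound, and the sample thickness values are all assumptions made for illustration.

```python
def relative_errors(true_vals, pred_vals):
    """Absolute relative error |pred - true| / |true| for each sample."""
    return [abs(p - t) / abs(t) for t, p in zip(true_vals, pred_vals)]


def within_tolerance_accuracy(true_vals, pred_vals, tol=0.10):
    """Fraction of predictions whose relative error is at most `tol`.

    Assumption: the bound is inclusive (<=); the abstract does not say.
    """
    errs = relative_errors(true_vals, pred_vals)
    return sum(e <= tol for e in errs) / len(errs)


def mape(true_vals, pred_vals):
    """Mean absolute percentage error, in percent."""
    errs = relative_errors(true_vals, pred_vals)
    return 100.0 * sum(errs) / len(errs)


# Hypothetical film thicknesses in nm (illustrative values, not from the paper).
true_nm = [100.0, 200.0, 400.0, 500.0]
pred_nm = [105.0, 170.0, 410.0, 460.0]

accuracy = within_tolerance_accuracy(true_nm, pred_nm)
error_pct = mape(true_nm, pred_nm)
print(f"within-10% accuracy: {accuracy:.2%}, MAPE: {error_pct:.3f}%")
```

The two metrics answer different questions: the within-10% accuracy counts how many predictions are usable at a fixed tolerance, while MAPE averages the size of the error over all samples, so one large miss lowers the first only slightly but can dominate the second.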
-
An invertible crystallographic representation for general inverse design of inorganic crystals with targeted properties
Authors:
Zekun Ren,
Siyu Isaac Parker Tian,
Juhwan Noh,
Felipe Oviedo,
Guangzong Xing,
Jiali Li,
Qiaohao Liang,
Ruiming Zhu,
Armin G. Aberle,
Shijing Sun,
Xiaonan Wang,
Yi Liu,
Qianxiao Li,
Senthilnath Jayavelu,
Kedar Hippalgaonkar,
Yousung Jung,
Tonio Buonassisi
Abstract:
Realizing general inverse design could greatly accelerate the discovery of new materials with user-defined properties. However, state-of-the-art generative models tend to be limited to a specific composition or crystal structure. Herein, we present a framework capable of general inverse design (not limited to a given set of elements or crystal structures), featuring a generalized invertible representation that encodes crystals in both real and reciprocal space, and a property-structured latent space from a variational autoencoder (VAE). In three design cases, the framework generates 142 new crystals with user-defined formation energies, bandgap, thermoelectric (TE) power factor, and combinations thereof. These generated crystals, absent in the training database, are validated by first-principles calculations. The success rates (number of first-principles-validated target-satisfying crystals/number of designed crystals) range between 7.1% and 38.9%. These results represent a significant step toward property-driven general inverse design using generative models, although practical challenges remain when coupled with experimental synthesis.
Submitted 15 December, 2021; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Bridging the gap between photovoltaics R&D and manufacturing with data-driven optimization
Authors:
Felipe Oviedo,
Zekun Ren,
Xue Hansong,
Siyu Isaac Parker Tian,
Kaicheng Zhang,
Mariya Layurova,
Thomas Heumueller,
Ning Li,
Erik Birgersson,
Shijing Sun,
Benji Mayurama,
Ian Marius Peters,
Christoph J. Brabec,
John Fisher III,
Tonio Buonassisi
Abstract:
Novel photovoltaics, such as perovskites and perovskite-inspired materials, have shown great promise due to high efficiency and potentially low manufacturing cost. So far, solar cell R&D has mostly focused on achieving record efficiencies, a process that often results in small batches, large variance, and limited understanding of the physical causes of underperformance. This approach is intensive in time and resources, and ignores many relevant factors for industrial production, particularly the need for high reproducibility and high manufacturing yield, and the accompanying need for physical insights. The record-efficiency paradigm is effective in early-stage R&D, but becomes unsuitable for industrial translation, requiring a repetition of the optimization procedure in the industrial setting. This mismatch between optimization objectives, combined with the complexity of physical root-cause analysis, contributes to decade-long timelines for transferring new technologies to the market. Based on recent machine learning and technoeconomic advances, our perspective articulates a data-driven optimization framework to bridge R&D and manufacturing optimization approaches. We extend the maximum-efficiency optimization paradigm by considering two additional dimensions: a technoeconomic figure of merit and scalable physical inference. Our framework naturally aligns different stages of technology development with shared optimization objectives, and accelerates the optimization process by providing physical insights.
Submitted 28 April, 2020;
originally announced April 2020.
-
Embedding Physics Domain Knowledge into a Bayesian Network Enables Layer-by-Layer Process Innovation for Photovoltaics
Authors:
Zekun Ren,
Felipe Oviedo,
Muang Thway,
Siyu I. P. Tian,
Yue Wang,
Hansong Xue,
Jose Dario Perea,
Mariya Layurova,
Thomas Heumueller,
Erik Birgersson,
Armin Aberle,
Christoph J. Brabec,
Rolf Stangl,
Shijing Sun,
Qianxiao Li,
Fen Lin,
Ian Marius Peters,
Tonio Buonassisi
Abstract:
Process optimization of photovoltaic devices is a time-intensive, trial-and-error endeavor, without full transparency of the underlying physics, and with user-imposed constraints that may or may not lead to a global optimum. Herein, we demonstrate that embedding physics domain knowledge into a Bayesian network enables an optimization approach that identifies the root cause(s) of underperformance with layer-by-layer resolution and reveals alternative optimal process windows beyond global black-box optimization. Our Bayesian-network approach links process conditions to materials descriptors (bulk and interface properties, e.g., bulk lifetime, doping, and surface recombination) and device performance parameters (e.g., cell efficiency), using a Bayesian inference framework with an autoencoder-based surrogate device-physics model that is 100x faster than numerical solvers. With the trained surrogate model, our approach is robust and significantly reduces time-consuming experimentalist intervention, even with small numbers of fabricated samples. To demonstrate our method, we perform layer-by-layer optimization of GaAs solar cells. In a single cycle of learning, we find an improved growth temperature for the GaAs solar cells without any secondary measurements, and demonstrate a 6.5% relative AM1.5G efficiency improvement above baseline and traditional black-box optimization methods.
Submitted 3 November, 2019; v1 submitted 25 July, 2019;
originally announced July 2019.
-
Accelerating Photovoltaic Materials Development via High-Throughput Experiments and Machine-Learning-Assisted Diagnosis
Authors:
Shijing Sun,
Noor T. P. Hartono,
Zekun D. Ren,
Felipe Oviedo,
Antonio M. Buscemi,
Mariya Layurova,
De Xin Chen,
Tofunmi Ogunfunmi,
Janak Thapa,
Savitha Ramasamy,
Charles Settens,
Brian L. DeCost,
Aaron Gilad Kusne,
Zhe Liu,
Siyu I. P. Tian,
I. Marius Peters,
Juan-Pablo Correa-Baena,
Tonio Buonassisi
Abstract:
Accelerating the experimental cycle for new materials development is vital for addressing the grand energy challenges of the 21st century. We fabricate and characterize 75 unique halide perovskite-inspired solution-based thin-film materials within a two-month period, with 87% exhibiting band gaps between 1.2 eV and 2.4 eV that are of interest for energy-harvesting applications. This increased throughput is enabled by streamlining experimental workflows, developing a set of precursors amenable to high-throughput synthesis, and developing machine-learning-assisted diagnosis. We utilize a deep neural network to classify compounds based on experimental X-ray diffraction data into 0D, 2D, and 3D structures more than 10 times faster than human analysis and with 90% accuracy. We validate our methods using lead-halide perovskites and extend the application to novel lead-free compositions. The wider synthesis window and faster cycle of learning enable three noteworthy scientific findings: (1) we realize four inorganic layered perovskites, A3B2Br9 (A = Cs, Rb; B = Bi, Sb) in thin-film form via one-step liquid deposition; (2) we report a multi-site lead-free alloy series that was not previously described in the literature, Cs3(Bi1-xSbx)2(I1-xBrx)9; and (3) we reveal the effect on bandgap (reduction to <2 eV) and structure upon simultaneous alloying on the B-site and X-site of Cs3Bi2I9 with Sb and Br. This study demonstrates that combining an accelerated experimental cycle of learning and machine-learning-based diagnosis represents an important step toward realizing fully automated laboratories for materials discovery and development.
Submitted 25 November, 2018;
originally announced December 2018.
-
Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks
Authors:
Felipe Oviedo,
Zekun Ren,
Shijing Sun,
Charlie Settens,
Zhe Liu,
Noor Titan Putri Hartono,
Ramasamy Savitha,
Brian L. DeCost,
Siyu I. P. Tian,
Giuseppe Romano,
Aaron Gilad Kusne,
Tonio Buonassisi
Abstract:
X-ray diffraction (XRD) data acquisition and analysis are among the most time-consuming steps in the development cycle of novel thin-film materials. We propose a machine-learning-enabled approach to predict crystallographic dimensionality and space group from a limited number of thin-film XRD patterns. We overcome the scarce-data problem intrinsic to novel materials development by coupling a supervised machine learning approach with a model-agnostic, physics-informed data augmentation strategy using simulated data from the Inorganic Crystal Structure Database (ICSD) and experimental data. As a test case, 115 thin-film metal halides spanning 3 dimensionalities and 7 space groups are synthesized and classified. After testing various algorithms, we develop and implement an all-convolutional neural network, with cross-validated accuracies for dimensionality and space-group classification of 93% and 89%, respectively. We propose average class activation maps, computed from a global average pooling layer, to allow high model interpretability by human experimentalists, elucidating the root causes of misclassification. Finally, we systematically evaluate the maximum XRD pattern step size (data acquisition rate) before loss of predictive accuracy occurs, and determine it to be 0.16°, which enables an XRD pattern to be obtained and classified in 5.5 minutes or less.
Submitted 23 April, 2019; v1 submitted 20 November, 2018;
originally announced November 2018.