-
A Self-Supervised Robotic System for Autonomous Contact-Based Spatial Mapping of Semiconductor Properties
Authors:
Alexander E. Siemenn,
Basita Das,
Kangyu Ji,
Fang Sheng,
Tonio Buonassisi
Abstract:
Integrating robotically driven contact-based material characterization techniques into self-driving laboratories can enhance measurement quality, reliability, and throughput. While deep learning models support robust autonomy, current methods lack reliable pixel-precision positioning and require extensive labeled data. To overcome these challenges, we propose an approach for building self-supervis…
▽ More
Integrating robotically driven contact-based material characterization techniques into self-driving laboratories can enhance measurement quality, reliability, and throughput. While deep learning models support robust autonomy, current methods lack reliable pixel-precision positioning and require extensive labeled data. To overcome these challenges, we propose an approach for building self-supervised autonomy into contact-based robotic systems that teach the robot to follow domain expert measurement principles at high-throughputs. Firstly, we design a vision-based, self-supervised convolutional neural network (CNN) architecture that uses differentiable image priors to optimize domain-specific objectives, refining the pixel precision of predicted robot contact poses by 20.0% relative to existing approaches. Secondly, we design a reliable graph-based planner for generating distance-minimizing paths to accelerate the robot measurement throughput and decrease planning variance by 6x. We demonstrate the performance of this approach by autonomously driving a 4-degree-of-freedom robotic probe for 24 hours to characterize semiconductor photoconductivity at 3,025 uniquely predicted poses across a gradient of drop-casted perovskite film compositions, achieving throughputs over 125 measurements per hour. Spatially mapping photoconductivity onto each drop-casted film reveals compositional trends and regions of inhomogeneity, valuable for identifying manufacturing process defects. With this self-supervised CNN-driven robotic system, we enable high-precision and reliable automation of contact-based characterization techniques at high throughputs, thereby allowing the measurement of previously inaccessible yet important semiconductor properties for self-driving laboratories.
△ Less
Submitted 29 December, 2024; v1 submitted 14 November, 2024;
originally announced November 2024.
-
Decreasing the Computing Time of Bayesian Optimization using Generalizable Memory Pruning
Authors:
Alexander E. Siemenn,
Tonio Buonassisi
Abstract:
Bayesian optimization (BO) suffers from long computing times when processing highly-dimensional or large data sets. These long computing times are a result of the Gaussian process surrogate model having a polynomial time complexity with the number of experiments. Running BO on high-dimensional or massive data sets becomes intractable due to this time complexity scaling, in turn, hindering experime…
▽ More
Bayesian optimization (BO) suffers from long computing times when processing highly-dimensional or large data sets. These long computing times are a result of the Gaussian process surrogate model having a polynomial time complexity with the number of experiments. Running BO on high-dimensional or massive data sets becomes intractable due to this time complexity scaling, in turn, hindering experimentation. Alternative surrogate models have been developed to reduce the computing utilization of the BO procedure, however, these methods require mathematical alteration of the inherit surrogate function, pigeonholing use into only that function. In this paper, we demonstrate a generalizable BO wrapper of memory pruning and bounded optimization, capable of being used with any surrogate model and acquisition function. Using this memory pruning approach, we show a decrease in wall-clock computing times per experiment of BO from a polynomially increasing pattern to a sawtooth pattern that has a non-increasing trend without sacrificing convergence performance. Furthermore, we illustrate the generalizability of the approach across two unique data sets, two unique surrogate models, and four unique acquisition functions. All model implementations are run on the MIT Supercloud state-of-the-art computing hardware.
△ Less
Submitted 8 September, 2023;
originally announced September 2023.
-
Using Scalable Computer Vision to Automate High-throughput Semiconductor Characterization
Authors:
Alexander E. Siemenn,
Eunice Aissi,
Fang Sheng,
Armi Tiihonen,
Hamide Kavak,
Basita Das,
Tonio Buonassisi
Abstract:
High-throughput materials synthesis methods have risen in popularity due to their potential to accelerate the design and discovery of novel functional materials, such as solution-processed semiconductors. After synthesis, key material properties must be measured and characterized to validate discovery and provide feedback to optimization cycles. However, with the boom in development of high-throug…
▽ More
High-throughput materials synthesis methods have risen in popularity due to their potential to accelerate the design and discovery of novel functional materials, such as solution-processed semiconductors. After synthesis, key material properties must be measured and characterized to validate discovery and provide feedback to optimization cycles. However, with the boom in development of high-throughput synthesis tools that champion production rates up to $10^4$ samples per hour with flexible form factors, most sample characterization methods are either slow (conventional rates of $10^1$ samples per hour, approximately 1000x slower) or rigid (e.g., designed for standard-size microplates), resulting in a bottleneck that impedes the materials-design process. To overcome this challenge, we propose a set of automated material property characterization (autocharacterization) tools that leverage the adaptive, parallelizable, and scalable nature of computer vision to accelerate the throughput of characterization by 85x compared to the non-automated workflow. We demonstrate a generalizable composition mapping tool for high-throughput synthesized binary material systems as well as two scalable autocharacterization algorithms that (1) autonomously compute the band gap of 200 unique compositions in 6 minutes and (2) autonomously compute the degree of degradation in 200 unique compositions in 20 minutes, generating ultra-high compositional resolution trends of band gap and stability. We demonstrate that the developed band gap and degradation detection autocharacterization methods achieve 98.5% accuracy and 96.9% accuracy, respectively, on the FA$_{1-x}$MA$_{x}$PbI$_3$, $0\leq x \leq 1$ perovskite semiconductor system.
△ Less
Submitted 21 November, 2023; v1 submitted 16 March, 2023;
originally announced April 2023.
-
Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark
Authors:
Vitali Petsiuk,
Alexander E. Siemenn,
Saisamrit Surbehera,
Zad Chin,
Keith Tyser,
Gregory Hunter,
Arvind Raghavan,
Yann Hicke,
Bryan A. Plummer,
Ori Kerret,
Tonio Buonassisi,
Kate Saenko,
Armando Solar-Lezama,
Iddo Drori
Abstract:
We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evaluation comparing the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models. Twenty computer science AI graduate students evaluated the two models, on three tasks, at three difficulty levels, across ten prompts each, providing 3,600 ratings. Text-to-image generation has seen rapid…
▽ More
We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evaluation comparing the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models. Twenty computer science AI graduate students evaluated the two models, on three tasks, at three difficulty levels, across ten prompts each, providing 3,600 ratings. Text-to-image generation has seen rapid progress to the point that many recent models have demonstrated their ability to create realistic high-resolution images for various prompts. However, current text-to-image methods and the broader body of research in vision-language understanding still struggle with intricate text prompts that contain many objects with multiple attributes and relationships. We introduce a new text-to-image benchmark that contains a suite of thirty-two tasks over multiple applications that capture a model's ability to handle different features of a text prompt. For example, asking a model to generate a varying number of the same object to measure its ability to count or providing a text prompt with several objects that each have a different attribute to identify its ability to match objects and attributes correctly. Rather than subjectively evaluating text-to-image results on a set of prompts, our new multi-task benchmark consists of challenge tasks at three difficulty levels (easy, medium, and hard) and human ratings for each generated image.
△ Less
Submitted 22 November, 2022;
originally announced November 2022.
-
Fast Bayesian Optimization of Needle-in-a-Haystack Problems using Zooming Memory-Based Initialization (ZoMBI)
Authors:
Alexander E. Siemenn,
Zekun Ren,
Qianxiao Li,
Tonio Buonassisi
Abstract:
Needle-in-a-Haystack problems exist across a wide range of applications including rare disease prediction, ecological resource management, fraud detection, and material property optimization. A Needle-in-a-Haystack problem arises when there is an extreme imbalance of optimum conditions relative to the size of the dataset. For example, only $0.82\%$ out of $146$k total materials in the open-access…
▽ More
Needle-in-a-Haystack problems exist across a wide range of applications including rare disease prediction, ecological resource management, fraud detection, and material property optimization. A Needle-in-a-Haystack problem arises when there is an extreme imbalance of optimum conditions relative to the size of the dataset. For example, only $0.82\%$ out of $146$k total materials in the open-access Materials Project database have a negative Poisson's ratio. However, current state-of-the-art optimization algorithms are not designed with the capabilities to find solutions to these challenging multidimensional Needle-in-a-Haystack problems, resulting in slow convergence to a global optimum or pigeonholing into a local minimum. In this paper, we present a Zooming Memory-Based Initialization algorithm, entitled ZoMBI. ZoMBI actively extracts knowledge from the previously best-performing evaluated experiments to iteratively zoom in the sampling search bounds towards the global optimum "needle" and then prunes the memory of low-performing historical experiments to accelerate compute times by reducing the algorithm time complexity from $O(n^3)$ to $O(φ^3)$ for $φ$ forward experiments per activation, which trends to a constant $O(1)$ over several activations. Additionally, ZoMBI implements two custom adaptive acquisition functions to further guide the sampling of new experiments toward the global optimum. We validate the algorithm's optimization performance on three real-world datasets exhibiting Needle-in-a-Haystack and further stress-test the algorithm's performance on an additional 174 analytical datasets. The ZoMBI algorithm demonstrates compute time speed-ups of 400x compared to traditional Bayesian optimization as well as efficiently discovering optima in under 100 experiments that are up to 3x more highly optimized than those discovered by similar methods MiP-EGO, TuRBO, and HEBO.
△ Less
Submitted 2 February, 2023; v1 submitted 26 August, 2022;
originally announced August 2022.
-
A Machine Learning and Computer Vision Approach to Rapidly Optimize Multiscale Droplet Generation
Authors:
Alexander E. Siemenn,
Evyatar Shaulsky,
Matthew Beveridge,
Tonio Buonassisi,
Sara M. Hashmi,
Iddo Drori
Abstract:
Generating droplets from a continuous stream of fluid requires precise tuning of a device to find optimized control parameter conditions. It is analytically intractable to compute the necessary control parameter values of a droplet-generating device that produces optimized droplets. Furthermore, as the length scale of the fluid flow changes, the formation physics and optimized conditions that indu…
▽ More
Generating droplets from a continuous stream of fluid requires precise tuning of a device to find optimized control parameter conditions. It is analytically intractable to compute the necessary control parameter values of a droplet-generating device that produces optimized droplets. Furthermore, as the length scale of the fluid flow changes, the formation physics and optimized conditions that induce flow decomposition into droplets also change. Hence, a single proportional integral derivative controller is too inflexible to optimize devices of different length scales or different control parameters, while classification machine learning techniques take days to train and require millions of droplet images. Therefore, the question is posed, can a single method be created that universally optimizes multiple length-scale droplets using only a few data points and is faster than previous approaches? In this paper, a Bayesian optimization and computer vision feedback loop is designed to quickly and reliably discover the control parameter values that generate optimized droplets within different length-scale devices. This method is demonstrated to converge on optimum parameter values using 60 images in only 2.3 hours, 30x faster than previous approaches. Model implementation is demonstrated for two different length-scale devices: a milliscale inkjet device and a microfluidics device.
△ Less
Submitted 15 January, 2022; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Online Preconditioning of Experimental Inkjet Hardware by Bayesian Optimization in Loop
Authors:
Alexander E. Siemenn,
Matthew Beveridge,
Tonio Buonassisi,
Iddo Drori
Abstract:
High-performance semiconductor optoelectronics such as perovskites have high-dimensional and vast composition spaces that govern the performance properties of the material. To cost-effectively search these composition spaces, we utilize a high-throughput experimentation method of rapidly printing discrete droplets via inkjet deposition, in which each droplet is comprised of a unique permutation of…
▽ More
High-performance semiconductor optoelectronics such as perovskites have high-dimensional and vast composition spaces that govern the performance properties of the material. To cost-effectively search these composition spaces, we utilize a high-throughput experimentation method of rapidly printing discrete droplets via inkjet deposition, in which each droplet is comprised of a unique permutation of semiconductor materials. However, inkjet printer systems are not optimized to run high-throughput experimentation on semiconductor materials. Thus, in this work, we develop a computer vision-driven Bayesian optimization framework for optimizing the deposited droplet structures from an inkjet printer such that it is tuned to perform high-throughput experimentation on semiconductor materials. The goal of this framework is to tune to the hardware conditions of the inkjet printer in the shortest amount of time using the fewest number of droplet samples such that we minimize the time and resources spent on setting the system up for material discovery applications. We demonstrate convergence on optimum inkjet hardware conditions in 10 minutes using Bayesian optimization of computer vision-scored droplet structures. We compare our Bayesian optimization results with stochastic gradient descent.
△ Less
Submitted 6 May, 2021;
originally announced May 2021.