-
Revealing the Evolution of Order in Materials Microstructures Using Multi-Modal Computer Vision
Authors:
Arman Ter-Petrosyan,
Michael Holden,
Jenna A. Bilbrey,
Sarah Akers,
Christina Doty,
Kayla H. Yano,
Le Wang,
Rajendra Paudel,
Eric Lang,
Khalid Hattar,
Ryan B. Comes,
Yingge Du,
Bethany E. Matthews,
Steven R. Spurgeon
Abstract:
The development of high-performance materials for microelectronics, energy storage, and extreme environments depends on our ability to describe and direct property-defining microstructural order. Our present understanding is typically derived from laborious manual analysis of imaging and spectroscopy data, which is difficult to scale, challenging to reproduce, and lacks the ability to reveal laten…
▽ More
The development of high-performance materials for microelectronics, energy storage, and extreme environments depends on our ability to describe and direct property-defining microstructural order. Our present understanding is typically derived from laborious manual analysis of imaging and spectroscopy data, which is difficult to scale, challenging to reproduce, and lacks the ability to reveal latent associations needed for mechanistic models. Here, we demonstrate a multi-modal machine learning (ML) approach to describe order from electron microscopy analysis of the complex oxide La$_{1-x}$Sr$_x$FeO$_3$. We construct a hybrid pipeline based on fully and semi-supervised classification, allowing us to evaluate both the characteristics of each data modality and the value each modality adds to the ensemble. We observe distinct differences in the performance of uni- and multi-modal models, from which we draw general lessons in describing crystal order using computer vision.
△ Less
Submitted 14 November, 2024;
originally announced November 2024.
-
Final Report for CHESS: Cloud, High-Performance Computing, and Edge for Science and Security
Authors:
Nathan Tallent,
Jan Strube,
Luanzheng Guo,
Hyungro Lee,
Jesun Firoz,
Sayan Ghosh,
Bo Fang,
Oceane Bel,
Steven Spurgeon,
Sarah Akers,
Christina Doty,
Erol Cromwell
Abstract:
Automating the theory-experiment cycle requires effective distributed workflows that utilize a computing continuum spanning lab instruments, edge sensors, computing resources at multiple facilities, data sets distributed across multiple information sources, and potentially cloud. Unfortunately, the obvious methods for constructing continuum platforms, orchestrating workflow tasks, and curating dat…
▽ More
Automating the theory-experiment cycle requires effective distributed workflows that utilize a computing continuum spanning lab instruments, edge sensors, computing resources at multiple facilities, data sets distributed across multiple information sources, and potentially cloud. Unfortunately, the obvious methods for constructing continuum platforms, orchestrating workflow tasks, and curating datasets over time fail to achieve scientific requirements for performance, energy, security, and reliability. Furthermore, achieving the best use of continuum resources depends upon the efficient composition and execution of workflow tasks, i.e., combinations of numerical solvers, data analytics, and machine learning. Pacific Northwest National Laboratory's LDRD "Cloud, High-Performance Computing (HPC), and Edge for Science and Security" (CHESS) has developed a set of interrelated capabilities for enabling distributed scientific workflows and curating datasets. This report describes the results and successes of CHESS from the perspective of open science.
△ Less
Submitted 21 October, 2024;
originally announced October 2024.
-
Slowly Scaling Per-Record Differential Privacy
Authors:
Brian Finley,
Anthony M Caruso,
Justin C Doty,
Ashwin Machanavajjhala,
Mikaela R Meyer,
David Pujol,
William Sexton,
Zachary Terner
Abstract:
We develop formal privacy mechanisms for releasing statistics from data with many outlying values, such as income data. These mechanisms ensure that a per-record differential privacy guarantee degrades slowly in the protected records' influence on the statistics being released.
Formal privacy mechanisms generally add randomness, or "noise," to published statistics. If a noisy statistic's distrib…
▽ More
We develop formal privacy mechanisms for releasing statistics from data with many outlying values, such as income data. These mechanisms ensure that a per-record differential privacy guarantee degrades slowly in the protected records' influence on the statistics being released.
Formal privacy mechanisms generally add randomness, or "noise," to published statistics. If a noisy statistic's distribution changes little with the addition or deletion of a single record in the underlying dataset, an attacker looking at this statistic will find it plausible that any particular record was present or absent, preserving the records' privacy. More influential records -- those whose addition or deletion would change the statistics' distribution more -- typically suffer greater privacy loss. The per-record differential privacy framework quantifies these record-specific privacy guarantees, but existing mechanisms let these guarantees degrade rapidly (linearly or quadratically) with influence. While this may be acceptable in cases with some moderately influential records, it results in unacceptably high privacy losses when records' influence varies widely, as is common in economic data.
We develop mechanisms with privacy guarantees that instead degrade as slowly as logarithmically with influence. These mechanisms allow for the accurate, unbiased release of statistics, while providing meaningful protection for highly influential records. As an example, we consider the private release of sums of unbounded establishment data such as payroll, where our mechanisms extend meaningful privacy protection even to very large establishments. We evaluate these mechanisms empirically and demonstrate their utility.
△ Less
Submitted 2 May, 2025; v1 submitted 26 September, 2024;
originally announced September 2024.
-
SAM-I-Am: Semantic Boosting for Zero-shot Atomic-Scale Electron Micrograph Segmentation
Authors:
Waqwoya Abebe,
Jan Strube,
Luanzheng Guo,
Nathan R. Tallent,
Oceane Bel,
Steven Spurgeon,
Christina Doty,
Ali Jannesari
Abstract:
Image segmentation is a critical enabler for tasks ranging from medical diagnostics to autonomous driving. However, the correct segmentation semantics - where are boundaries located? what segments are logically similar? - change depending on the domain, such that state-of-the-art foundation models can generate meaningless and incorrect results. Moreover, in certain domains, fine-tuning and retrain…
▽ More
Image segmentation is a critical enabler for tasks ranging from medical diagnostics to autonomous driving. However, the correct segmentation semantics - where are boundaries located? what segments are logically similar? - change depending on the domain, such that state-of-the-art foundation models can generate meaningless and incorrect results. Moreover, in certain domains, fine-tuning and retraining techniques are infeasible: obtaining labels is costly and time-consuming; domain images (micrographs) can be exponentially diverse; and data sharing (for third-party retraining) is restricted. To enable rapid adaptation of the best segmentation technology, we propose the concept of semantic boosting: given a zero-shot foundation model, guide its segmentation and adjust results to match domain expectations. We apply semantic boosting to the Segment Anything Model (SAM) to obtain microstructure segmentation for transmission electron microscopy. Our booster, SAM-I-Am, extracts geometric and textural features of various intermediate masks to perform mask removal and mask merging operations. We demonstrate a zero-shot performance increase of (absolute) +21.35%, +12.6%, +5.27% in mean IoU, and a -9.91%, -18.42%, -4.06% drop in mean false positive masks across images of three difficulty classes over vanilla SAM (ViT-L).
△ Less
Submitted 10 May, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Unsupervised segmentation of irradiation$\unicode{x2010}$induced order$\unicode{x2010}$disorder phase transitions in electron microscopy
Authors:
Arman H Ter-Petrosyan,
Jenna A Bilbrey,
Christina M Doty,
Bethany E Matthews,
Le Wang,
Yingge Du,
Eric Lang,
Khalid Hattar,
Steven R Spurgeon
Abstract:
We present a method for the unsupervised segmentation of electron microscopy images, which are powerful descriptors of materials and chemical systems. Images are oversegmented into overlapping chips, and similarity graphs are generated from embeddings extracted from a domain$\unicode{x2010}$pretrained convolutional neural network (CNN). The Louvain method for community detection is then applied to…
▽ More
We present a method for the unsupervised segmentation of electron microscopy images, which are powerful descriptors of materials and chemical systems. Images are oversegmented into overlapping chips, and similarity graphs are generated from embeddings extracted from a domain$\unicode{x2010}$pretrained convolutional neural network (CNN). The Louvain method for community detection is then applied to perform segmentation. The graph representation provides an intuitive way of presenting the relationship between chips and communities. We demonstrate our method to track irradiation$\unicode{x2010}$induced amorphous fronts in thin films used for catalysis and electronics. This method has potential for "on$\unicode{x2010}$the$\unicode{x2010}$fly" segmentation to guide emerging automated electron microscopes.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
Deep Learning for Automated Experimentation in Scanning Transmission Electron Microscopy
Authors:
Sergei V. Kalinin,
Debangshu Mukherjee,
Kevin M. Roccapriore,
Ben Blaiszik,
Ayana Ghosh,
Maxim A. Ziatdinov,
A. Al-Najjar,
Christina Doty,
Sarah Akers,
Nageswara S. Rao,
Joshua C. Agar,
Steven R. Spurgeon
Abstract:
Machine learning (ML) has become critical for post-acquisition data analysis in (scanning) transmission electron microscopy, (S)TEM, imaging and spectroscopy. An emerging trend is the transition to real-time analysis and closed-loop microscope operation. The effective use of ML in electron microscopy now requires the development of strategies for microscopy-centered experiment workflow design and…
▽ More
Machine learning (ML) has become critical for post-acquisition data analysis in (scanning) transmission electron microscopy, (S)TEM, imaging and spectroscopy. An emerging trend is the transition to real-time analysis and closed-loop microscope operation. The effective use of ML in electron microscopy now requires the development of strategies for microscopy-centered experiment workflow design and optimization. Here, we discuss the associated challenges with the transition to active ML, including sequential data analysis and out-of-distribution drift effects, the requirements for the edge operation, local and cloud data storage, and theory in the loop operations. Specifically, we discuss the relative contributions of human scientists and ML agents in the ideation, orchestration, and execution of experimental workflows and the need to develop universal hyper languages that can apply across multiple platforms. These considerations will collectively inform the operationalization of ML in next-generation experimentation.
△ Less
Submitted 4 April, 2023;
originally announced April 2023.
-
Design of a Graphical User Interface for Few-Shot Machine Learning Classification of Electron Microscopy Data
Authors:
Christina Doty,
Shaun Gallagher,
Wenqi Cui,
Wenya Chen,
Shweta Bhushan,
Marjolein Oostrom,
Sarah Akers,
Steven R. Spurgeon
Abstract:
The recent growth in data volumes produced by modern electron microscopes requires rapid, scalable, and flexible approaches to image segmentation and analysis. Few-shot machine learning, which can richly classify images from a handful of user-provided examples, is a promising route to high-throughput analysis. However, current command-line implementations of such approaches can be slow and unintui…
▽ More
The recent growth in data volumes produced by modern electron microscopes requires rapid, scalable, and flexible approaches to image segmentation and analysis. Few-shot machine learning, which can richly classify images from a handful of user-provided examples, is a promising route to high-throughput analysis. However, current command-line implementations of such approaches can be slow and unintuitive to use, lacking the real-time feedback necessary to perform effective classification. Here we report on the development of a Python-based graphical user interface that enables end users to easily conduct and visualize the output of few-shot learning models. This interface is lightweight and can be hosted locally or on the web, providing the opportunity to reproducibly conduct, share, and crowd-source few-shot analyses.
△ Less
Submitted 21 July, 2021;
originally announced July 2021.