-
Denoising guarantees for optimized sampling schemes in compressed sensing
Authors:
Yaniv Plan,
Matthew S. Scott,
Xia Sheng,
Ozgur Yilmaz
Abstract:
Compressed sensing with subsampled unitary matrices benefits from \emph{optimized} sampling schemes, which feature improved theoretical guarantees and empirical performance relative to uniform subsampling. We provide, in a first of its kind in compressed sensing, theoretical guarantees showing that the error caused by the measurement noise vanishes with an increasing number of measurements for optimized sampling schemes, assuming that the noise is Gaussian. We moreover provide similar guarantees for measurements sampled with-replacement with arbitrary probability weights. All our results hold on prior sets contained in a union of low-dimensional subspaces. Finally, we demonstrate that this denoising behavior appears in empirical experiments with a rate that closely matches our theoretical guarantees when the prior set is the range of a generative ReLU neural network and when it is the set of sparse vectors.
Submitted 31 March, 2025;
originally announced April 2025.
-
Model-adapted Fourier sampling for generative compressed sensing
Authors:
Aaron Berk,
Simone Brugiapaglia,
Yaniv Plan,
Matthew Scott,
Xia Sheng,
Ozgur Yilmaz
Abstract:
We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix (with the DFT as an important special case). It was recently shown that $\textit{O}(kdn\|\boldsymbol{\alpha}\|_{\infty}^{2})$ uniformly random Fourier measurements are sufficient to recover signals in the range of a neural network $G:\mathbb{R}^k \to \mathbb{R}^n$ of depth $d$, where each component of the so-called local coherence vector $\boldsymbol{\alpha}$ quantifies the alignment of a corresponding Fourier vector with the range of $G$. We construct a model-adapted sampling strategy with an improved sample complexity of $\textit{O}(kd\|\boldsymbol{\alpha}\|_{2}^{2})$ measurements. This is enabled by: (1) new theoretical recovery guarantees that we develop for nonuniformly random sampling distributions and then (2) optimizing the sampling distribution to minimize the number of measurements needed for these guarantees. This development offers a sample complexity applicable to natural signal classes, which are often almost maximally coherent with low Fourier frequencies. Finally, we consider a surrogate sampling scheme, and validate its performance in recovery experiments using the CelebA dataset.
Submitted 17 November, 2023; v1 submitted 7 October, 2023;
originally announced October 2023.
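The two sample-complexity factors quoted in this abstract, $n\|\boldsymbol{\alpha}\|_{\infty}^{2}$ for uniform sampling and $\|\boldsymbol{\alpha}\|_{2}^{2}$ for adapted sampling, can be compared numerically. The sketch below is illustrative only: the coherence profile `alpha` is made up, the function names are hypothetical, and the choice of sampling probabilities proportional to $\alpha_i^2$ is an assumption, since the abstract does not state the optimized distribution.

```python
import random


def sampling_summary(alpha):
    """Compare the two coherence factors for a given coherence vector."""
    n = len(alpha)
    sup_norm_sq = max(a * a for a in alpha)
    two_norm_sq = sum(a * a for a in alpha)
    # ASSUMPTION: adapted probabilities are taken proportional to alpha_i^2;
    # the abstract does not specify the optimized distribution.
    probs = [a * a / two_norm_sq for a in alpha]
    return {
        "uniform_factor": n * sup_norm_sq,  # n * ||alpha||_inf^2
        "adapted_factor": two_norm_sq,      # ||alpha||_2^2
        "probs": probs,
    }


def draw_rows(probs, m, seed=0):
    """Draw m row indices with replacement according to probs."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=m)


# Hypothetical coherence profile concentrated on low frequencies.
alpha = [1.0 / (1 + j) for j in range(64)]
summary = sampling_summary(alpha)
rows = draw_rows(summary["probs"], m=16)
```

Because $\|\boldsymbol{\alpha}\|_{2}^{2} \le n\|\boldsymbol{\alpha}\|_{\infty}^{2}$ holds for any vector, the adapted factor is never larger than the uniform one, and the gap widens as the coherence concentrates on a few rows.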
-
A robust synthetic data generation framework for machine learning in High-Resolution Transmission Electron Microscopy (HRTEM)
Authors:
Luis Rangel DaCosta,
Katherine Sytwu,
Catherine Groschner,
Mary Scott
Abstract:
Machine learning techniques are attractive options for developing highly accurate automated analysis tools for nanomaterials characterization, including high-resolution transmission electron microscopy (HRTEM). However, successfully implementing such machine learning tools can be difficult due to the challenges in procuring sufficiently large, high-quality training datasets from experiments. In this work, we introduce Construction Zone, a Python package for rapidly generating complex nanoscale atomic structures, and develop an end-to-end workflow for creating large simulated databases for training neural networks. Construction Zone enables fast, systematic sampling of realistic nanomaterial structures, and can be used as a random structure generator for simulated databases, which is important for generating large, diverse synthetic datasets. Using HRTEM imaging as an example, we train a series of neural networks on various subsets of our simulated databases to segment nanoparticles and holistically study the data curation process to understand how various aspects of the curated simulated data -- including simulation fidelity, the distribution of atomic structures, and the distribution of imaging conditions -- affect model performance across several experimental benchmarks. Using our results, we are able to achieve state-of-the-art segmentation performance on experimental HRTEM images of nanoparticles from several experimental benchmarks and, further, we discuss robust strategies for consistently achieving high performance with machine learning in experimental settings using purely synthetic data.
Submitted 12 September, 2023;
originally announced September 2023.
-
Generalization Across Experimental Parameters in Machine Learning Analysis of High Resolution Transmission Electron Microscopy Datasets
Authors:
Katherine Sytwu,
Luis Rangel DaCosta,
Mary C. Scott
Abstract:
Neural networks are promising tools for high-throughput and accurate transmission electron microscopy (TEM) analysis of nanomaterials, but are known to generalize poorly on data that is "out-of-distribution" from their training data. Given the limited set of image features typically seen in high-resolution TEM imaging, it is unclear which images are considered out-of-distribution from others. Here, we investigate how the choice of metadata features in the training dataset influences neural network performance, focusing on the example task of nanoparticle segmentation. We train and validate neural networks across curated, experimentally-collected high-resolution TEM image datasets of nanoparticles under controlled imaging and material parameters, including magnification, dosage, nanoparticle diameter, and nanoparticle material. Overall, we find that our neural networks are not robust across microscope parameters, but do generalize across certain sample parameters. Additionally, data preprocessing heavily influences the generalizability of neural networks trained on nominally similar datasets. Our results highlight the need to understand how dataset features affect deployment of data-driven algorithms.
Submitted 20 June, 2023;
originally announced June 2023.
-
A coherence parameter characterizing generative compressed sensing with Fourier measurements
Authors:
Aaron Berk,
Simone Brugiapaglia,
Babhru Joshi,
Yaniv Plan,
Matthew Scott,
Özgür Yilmaz
Abstract:
In Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees in the setting where the measurement matrix is Gaussian and the signal structure is the range of a generative neural network (GNN). The problem of compressed sensing with GNNs has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. We move beyond the subgaussian assumption, to measurement matrices that are derived by sampling uniformly at random rows of a unitary matrix (including subsampled Fourier measurements as a special case). Specifically, we prove the first known restricted isometry guarantee for generative compressed sensing with subsampled isometries and provide recovery bounds, addressing an open problem of Scarlett et al. (2022, p. 10). Recovery efficacy is characterized by the coherence, a new parameter, which measures the interplay between the range of the network and the measurement matrix. Our approach relies on subspace counting arguments and ideas central to high-dimensional probability. Furthermore, we propose a regularization strategy for training GNNs to have favourable coherence with the measurement operator. We provide compelling numerical simulations that support this regularized training strategy: our strategy yields low coherence networks that require fewer measurements for signal recovery. This, together with our theoretical results, supports coherence as a natural quantity for characterizing generative compressed sensing with subsampled isometries.
Submitted 9 November, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.
-
Understanding the Influence of Receptive Field and Network Complexity in Neural-Network-Guided TEM Image Analysis
Authors:
Katherine Sytwu,
Catherine Groschner,
Mary C. Scott
Abstract:
Trained neural networks are promising tools to analyze the ever-increasing amount of scientific image data, but it is unclear how to best customize these networks for the unique features in transmission electron micrographs. Here, we systematically examine how neural network architecture choices affect how neural networks segment, or pixel-wise separate, crystalline nanoparticles from amorphous background in transmission electron microscopy (TEM) images. We focus on decoupling the influence of receptive field, or the area of the input image that contributes to the output decision, from network complexity, which dictates the number of trainable parameters. We find that for low-resolution TEM images which rely on amplitude contrast to distinguish nanoparticles from background, the receptive field does not significantly influence segmentation performance. On the other hand, for high-resolution TEM images which rely on a combination of amplitude and phase contrast changes to identify nanoparticles, receptive field is a key parameter for increased performance, especially in images with minimal amplitude contrast. Our results provide insight and guidance as to how to adapt neural networks for applications with TEM datasets.
Submitted 8 April, 2022;
originally announced April 2022.
-
Brain Image Synthesis with Unsupervised Multivariate Canonical CSC$\ell_4$Net
Authors:
Yawen Huang,
Feng Zheng,
Danyang Wang,
Weilin Huang,
Matthew R. Scott,
Ling Shao
Abstract:
Recent advances in neuroscience have highlighted the effectiveness of multi-modal medical data for investigating certain pathologies and understanding human cognition. However, obtaining full sets of different modalities is limited by various factors, such as long acquisition times, high examination costs and artifact suppression. In addition, the complexity, high dimensionality and heterogeneity of neuroimaging data remain another key challenge in leveraging existing randomized scans effectively, as data of the same modality is often measured differently by different machines. There is a clear need to go beyond the traditional imaging-dependent process and synthesize anatomically specific target-modality data from a source input. In this paper, we propose to learn dedicated features that cross both inter- and intra-modal variations using a novel CSC$\ell_4$Net. Through an initial unification of intra-modal data in the feature maps and multivariate canonical adaptation, CSC$\ell_4$Net facilitates feature-level mutual transformation. The positive definite Riemannian manifold-penalized data fidelity term further enables CSC$\ell_4$Net to reconstruct missing measurements according to transformed features. Finally, the $\ell_4$-norm maximization boils down to a computationally efficient optimization problem. Extensive experiments validate the ability and robustness of our CSC$\ell_4$Net compared to the state-of-the-art methods on multiple datasets.
Submitted 22 March, 2021;
originally announced March 2021.
-
Investigating the Role of Renewable Energies in Integrated Energy-water Nexus Planning under Uncertainty Using Fuzzy Logic
Authors:
Afshin Ghassemi,
Michael J Scott
Abstract:
Energy and water systems are highly interconnected. Energy is required to extract, transmit, and treat water and wastewater, and water is needed for cooling energy systems. There is a rapid increase in demand for energy and water due to factors such as population and economic growth. In less than 30 years, the need for energy and water will nearly double globally. As energy and water resources are limited, it is critical to have a sustainable energy-water nexus framework to meet these growing demands. Renewable energies provide substantial opportunities in energy-water nexuses by boosting energy and water reliability and sustainability and can be less water-intensive than conventional technologies. These resources, such as wind and solar power, do not need water inputs. As a result, they can be used as a supplement to the energy-water nexus portfolio. In this paper, renewable energies in the energy-water nexus are investigated for a range of possible scenarios. As renewable energy resources are not deterministic, fuzzy logic is used to model the uncertainty. The results show that renewable energies can significantly improve energy-water nexus planning; however, the power grid's reliance on renewable energy should be aligned with the level of system uncertainty. The gap between the decisions extracted from the fuzzy model and the deterministic model amplifies the importance of considering uncertainty to generate reliable decisions. Keywords: Energy-water Nexus, Renewable Energies, Optimization under Uncertainty, Fuzzy Logic.
Submitted 15 October, 2020;
originally announced October 2020.
-
A Mathematical Approach to Improve Energy-Water Nexus Reliability Using a Novel Multi-Stage Adjustable Fuzzy Robust Approach
Authors:
Afshin Ghassemi,
Michael J Scott
Abstract:
A system-of-systems approach that analyzes energy and water systems simultaneously is called the energy-water nexus. Neglecting the interrelationship between energy and water drives vulnerabilities whereby limits on one resource can cause constraints on the other resource. Power plant energy production directly depends on water availability, and an outage of the power systems will affect the wastewater treatment facility processes. Therefore, it is essential to integrate energy and water planning models. As mathematical energy-water nexus problems are complex, involve many uncertain parameters, and are large-scale, we propose a novel multi-stage adjustable fuzzy robust approach that balances the solutions' robustness against the budget constraints. Scenario-based analysis indicates that the proposed approach generates flexible and robust decisions that avoid excessive costs compared to conservative methods. Keywords: Energy-water Nexus, Renewable Energy, Optimization under Uncertainty, Fuzzy logic, Robust Optimization
Submitted 15 October, 2020;
originally announced October 2020.
-
Machine Learning Pipeline for Segmentation and Defect Identification from High Resolution Transmission Electron Microscopy Data
Authors:
C. K. Groschner,
Christina Choi,
M. C. Scott
Abstract:
In the field of transmission electron microscopy, data interpretation often lags behind acquisition methods, as image processing methods often have to be manually tailored to individual datasets. Machine learning offers a promising approach for fast, accurate analysis of electron microscopy data. Here, we demonstrate a flexible two-step pipeline for analysis of high-resolution transmission electron microscopy data, which uses a U-Net for segmentation followed by a random forest for detection of stacking faults. Our trained U-Net is able to segment nanoparticle regions from amorphous background with a Dice coefficient of 0.8 and significantly outperforms traditional image segmentation methods. Using these segmented regions, we are then able to classify whether nanoparticles contain a visible stacking fault with 86% accuracy. We provide this adaptable pipeline as an open-source tool for the community. The combined output of the segmentation network and classifier offers a way to determine statistical distributions of features of interest, such as size, shape and defect presence, enabling detection of correlations between these features.
Submitted 23 February, 2021; v1 submitted 14 January, 2020;
originally announced January 2020.
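For readers unfamiliar with the Dice coefficient reported in this abstract, it measures the overlap between a predicted mask $A$ and a ground-truth mask $B$ as $2|A \cap B| / (|A| + |B|)$. The following is a minimal sketch on flattened binary masks; the function name, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions for illustration and are not taken from the authors' pipeline.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # ASSUMPTION: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy masks (hypothetical): 1 marks a nanoparticle pixel, 0 marks background.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, truth)  # 3 shared pixels, 4 + 4 total -> 0.75
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground pixels.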