-
Scaling Computational Fluid Dynamics: In Situ Visualization of NekRS using SENSEI
Authors:
Victor A. Mateevitsi,
Mathis Bode,
Nicola Ferrier,
Paul Fischer,
Jens Henrik Göbbert,
Joseph A. Insley,
Yu-Hsiang Lan,
Misun Min,
Michael E. Papka,
Saumil Patel,
Silvio Rizzi,
Jonathan Windgassen
Abstract:
In the realm of Computational Fluid Dynamics (CFD), the demand for memory and computation resources is extreme, necessitating the use of leadership-scale computing platforms for practical domain sizes. This intensive requirement renders traditional checkpointing methods ineffective due to the significant slowdown in simulations while saving state data to disk. As we progress towards exascale and GPU-driven High-Performance Computing (HPC) and confront larger problem sizes, the choice becomes increasingly stark: to compromise data fidelity or to reduce resolution. To navigate this challenge, this study advocates for the use of in situ analysis and visualization techniques. These allow more frequent data "snapshots" to be taken directly from memory, thus avoiding the need for disruptive checkpointing. We detail our approach of instrumenting NekRS, a GPU-focused thermal-fluid simulation code employing the spectral element method (SEM), and describe varied in situ and in transit strategies for data rendering. Additionally, we provide concrete scientific use-cases and report on runs performed on Polaris, Argonne Leadership Computing Facility's (ALCF) 44 Petaflop supercomputer and Jülich Wizard for European Leadership Science (JUWELS) Booster, Jülich Supercomputing Centre's (JSC) 71 Petaflop High Performance Computing (HPC) system, offering practical insight into the implications of our methodology.
Submitted 18 December, 2023; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Adversarial Predictions of Data Distributions Across Federated Internet-of-Things Devices
Authors:
Samir Rajani,
Dario Dematties,
Nathaniel Hudson,
Kyle Chard,
Nicola Ferrier,
Rajesh Sankaran,
Peter Beckman
Abstract:
Federated learning (FL) is increasingly becoming the default approach for training machine learning models across decentralized Internet-of-Things (IoT) devices. A key advantage of FL is that no raw data are communicated across the network, providing an immediate layer of privacy. Despite this, recent works have demonstrated that data reconstruction can be done with the locally trained model updates which are communicated across the network. However, many of these works have limitations with regard to how the gradients are computed in backpropagation. In this work, we demonstrate that the model weights shared in FL can expose revealing information about the local data distributions of IoT devices. This leakage could expose sensitive information to malicious actors in a distributed system. We further discuss results which show that injecting noise into model weights is ineffective at preventing data leakage without seriously harming the global model accuracy.
Submitted 28 August, 2023;
originally announced August 2023.
-
EXSCLAIM! -- An automated pipeline for the construction of labeled materials imaging datasets from literature
Authors:
Eric Schwenker,
Weixin Jiang,
Trevor Spreadbury,
Nicola Ferrier,
Oliver Cossairt,
Maria K. Y. Chan
Abstract:
Due to recent improvements in image resolution and acquisition speed, materials microscopy is experiencing an explosion of published imaging data. The standard publication format, while sufficient for traditional data ingestion scenarios where a select number of images can be critically examined and curated manually, is not conducive to large-scale data aggregation or analysis, hindering data sharing and reuse. Most images in publications are presented as components of a larger figure with their explicit context buried in the main body or caption text, so even if aggregated, collections of images with weak or no digitized contextual labels have limited value. To solve the problem of curating labeled microscopy data from literature, this work introduces the EXSCLAIM! Python toolkit for the automatic EXtraction, Separation, and Caption-based natural Language Annotation of IMages from scientific literature. We highlight the methodology behind the construction of EXSCLAIM! and demonstrate its ability to extract and label open-source scientific images at high volume.
Submitted 19 March, 2021;
originally announced March 2021.
-
A Two-stage Framework for Compound Figure Separation
Authors:
Weixin Jiang,
Eric Schwenker,
Trevor Spreadbury,
Nicola Ferrier,
Maria K. Y. Chan,
Oliver Cossairt
Abstract:
Scientific literature contains large volumes of complex, unstructured figures that are compound in nature (i.e. composed of multiple images, graphs, and drawings). Separation of these compound figures is critical for information retrieval from these figures. In this paper, we propose a new strategy for compound figure separation, which decomposes the compound figures into constituent subfigures while preserving the association between the subfigures and their respective caption components. We propose a two-stage framework to address the proposed compound figure separation problem. In particular, the subfigure label detection module detects all subfigure labels in the first stage. Then, in the subfigure detection module, the detected subfigure labels help to detect the subfigures by optimizing the feature selection process and providing the global layout information as extra features. Extensive experiments are conducted to validate the effectiveness and superiority of the proposed framework, which improves the detection precision by 9%.
Submitted 7 October, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
-
Toward an Automated HPC Pipeline for Processing Large Scale Electron Microscopy Data
Authors:
Rafael Vescovi,
Hanyu Li,
Jeffery Kinnison,
Murat Keceli,
Misha Salim,
Narayanan Kasthuri,
Thomas D. Uram,
Nicola Ferrier
Abstract:
We present a fully modular and scalable software pipeline for processing electron microscope (EM) images of brain slices into 3D visualization of individual neurons and demonstrate an end-to-end segmentation of a large EM volume using a supercomputer. Our pipeline scales multiple packages used by the EM community with minimal changes to the original source codes. We tested each step of the pipeline individually, on a workstation, a cluster, and a supercomputer. Furthermore, we can compose workflows from these operations using a Balsam database that can be triggered during the data acquisition or with the use of different front ends and control the granularity of the pipeline execution. We describe the implementation of our pipeline and modifications required to integrate and scale up existing codes. The modular nature of our environment enables diverse research groups to contribute to the pipeline without disrupting the workflow, i.e. new individual codes can be easily integrated for each step on the pipeline.
Submitted 6 November, 2020;
originally announced November 2020.
-
Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Authors:
Wushi Dong,
Murat Keceli,
Rafael Vescovi,
Hanyu Li,
Corey Adams,
Elise Jennings,
Samuel Flender,
Tom Uram,
Venkatram Vishwanath,
Nicola Ferrier,
Narayanan Kasthuri,
Peter Littlewood
Abstract:
Mapping all the neurons in the brain requires automatic reconstruction of entire cells from volume electron microscopy data. The flood-filling network (FFN) architecture has demonstrated leading performance for segmenting structures from this data. However, the training of the network is computationally expensive. In order to reduce the training time, we implemented synchronous and data-parallel distributed training using the Horovod library, which is different from the asynchronous training scheme used in the published FFN code. We demonstrated that our distributed training scaled well up to 2048 Intel Knights Landing (KNL) nodes on the Theta supercomputer. Our trained models achieved a similar level of inference performance, but took less training time compared to previous methods. Our study on the effects of different batch sizes on FFN training suggests ways to further improve training efficiency. Our findings on optimal learning rate and batch sizes agree with previous works.
Submitted 9 December, 2019; v1 submitted 13 May, 2019;
originally announced May 2019.
-
Training on the Edge: The why and the how
Authors:
Navjot Kukreja,
Alena Shilova,
Olivier Beaumont,
Jan Huckelheim,
Nicola Ferrier,
Paul Hovland,
Gerard Gorman
Abstract:
Edge computing is the natural progression from Cloud computing, where, instead of collecting all data and processing it centrally, like in a cloud computing environment, we distribute the computing power and try to do as much processing as possible, close to the source of the data. There are various reasons this model is being adopted quickly, including privacy, and reduced power and bandwidth requirements on the Edge nodes. While it is common to see inference being done on Edge nodes today, it is much less common to do training on the Edge. The reasons for this range from computational limitations, to it not being advantageous in reducing communications between the Edge nodes. In this paper, we explore some scenarios where it is advantageous to do training on the Edge, as well as the use of checkpointing strategies to save memory.
Submitted 13 February, 2019;
originally announced March 2019.
-
Computational multifocal microscopy
Authors:
Kuan He,
Zihao Wang,
Xiang Huang,
Xiaolei Wang,
Seunghwan Yoo,
Pablo Ruiz,
Itay Gdor,
Alan Selewa,
Nicola J. Ferrier,
Norbert Scherer,
Mark Hereld,
Aggelos K. Katsaggelos,
Oliver Cossairt
Abstract:
Despite recent advances, high performance single-shot 3D microscopy remains an elusive task. By introducing designed diffractive optical elements (DOEs), one is capable of converting a microscope into a 3D "kaleidoscope", in which case the snapshot image consists of an array of tiles and each tile focuses on a different depth. However, the acquired multifocal microscopic (MFM) image suffers from multiple sources of degradation, which prevents MFM from further applications. We propose a unifying computational framework which simplifies the imaging system and achieves 3D reconstruction via computation. Our optical configuration omits the chromatic correction grating and redesigns the multifocal grating to enlarge the tracking area. Our proposed setup features only one single grating in addition to a regular microscope. The aberration correction, along with Poisson and background denoising, is incorporated in our deconvolution-based fully-automated algorithm, which requires no empirical parameter-tuning. In experiments, we achieve spatial resolutions of $0.35\,\mu$m (lateral) and $0.5\,\mu$m (axial), which are comparable to the resolution that can be achieved with confocal deconvolution microscopy. We demonstrate a 3D video of moving bacteria recorded at $25$ frames per second using our proposed computational multifocal microscopy technique.
Submitted 4 September, 2018;
originally announced September 2018.
-
U-SLADS: Unsupervised Learning Approach for Dynamic Dendrite Sampling
Authors:
Yan Zhang,
Xiang Huang,
Nicola Ferrier,
Emine B. Gulsoy,
Charudatta Phatak
Abstract:
Novel data acquisition schemes have been an emerging need for scanning microscopy based imaging techniques to reduce the time in data acquisition and to minimize probing radiation in sample exposure. Various sparse sampling schemes have been studied and are ideally suited for such applications where the images can be reconstructed from a sparse set of measurements. Dynamic sparse sampling methods, particularly supervised learning based iterative sampling algorithms, have shown promising results for sampling pixel locations on the edges or boundaries during imaging. However, dynamic sampling for imaging skeleton-like objects such as metal dendrites remains difficult. Here, we present a new unsupervised learning approach using Hierarchical Gaussian Mixture Models (HGMM) to dynamically sample metal dendrites. This technique is very useful if the users are interested in fast imaging of the primary and secondary arms of metal dendrites in the solidification process in materials science.
Submitted 5 July, 2018;
originally announced July 2018.
-
Reduced Electron Exposure for Energy-Dispersive Spectroscopy using Dynamic Sampling
Authors:
Yan Zhang,
G. M. Dilshan Godaliyadda,
Nicola Ferrier,
Emine B. Gulsoy,
Charles A. Bouman,
Charudatta Phatak
Abstract:
Analytical electron microscopy and spectroscopy of biological specimens, polymers, and other beam sensitive materials has been a challenging area due to irradiation damage. There is a pressing need to develop novel imaging and spectroscopic imaging methods that will minimize such sample damage as well as reduce the data acquisition time. The latter is useful for high-throughput analysis of materials structure and chemistry. In this work, we present a novel machine learning based method for dynamic sparse sampling of EDS data using a scanning electron microscope. Our method, based on the supervised learning approach for dynamic sampling algorithm and neural networks based classification of EDS data, allows a dramatic reduction in the total sampling of up to 90%, while maintaining the fidelity of the reconstructed elemental maps and spectroscopic data. We believe this approach will enable imaging and elemental mapping of materials that would otherwise be inaccessible to these analysis techniques.
Submitted 27 June, 2017;
originally announced July 2017.