-
Dataset of News Articles with Provenance Metadata for Media Relevance Assessment
Authors:
Tomas Peterka,
Matyas Bohacek
Abstract:
Out-of-context and misattributed imagery is the leading form of media manipulation in today's misinformation and disinformation landscape. The existing methods attempting to detect this practice often only consider whether the semantics of the imagery corresponds to the text narrative, missing manipulation so long as the depicted objects or scenes somewhat correspond to the narrative at hand. To tackle this, we introduce the News Media Provenance Dataset, a dataset of news articles with provenance-tagged images. We formulate two tasks on this dataset, location of origin relevance (LOR) and date and time of origin relevance (DTOR), and present baseline results on six large language models (LLMs). We identify that, while the zero-shot performance on LOR is promising, the performance on DTOR lags, leaving room for specialized architectures and future work.
Submitted 11 June, 2025;
originally announced June 2025.
-
Large Language Models and Provenance Metadata for Determining the Relevance of Images and Videos in News Stories
Authors:
Tomas Peterka,
Matyas Bohacek
Abstract:
The most effective misinformation campaigns are multimodal, often combining text with images and videos taken out of context -- or fabricating them entirely -- to support a given narrative. Contemporary methods for detecting misinformation, whether in deepfakes or text articles, often miss the interplay between multiple modalities. Built around a large language model, the system proposed in this paper addresses these challenges. It analyzes both the article's text and the provenance metadata of included images and videos to determine whether they are relevant. We open-source the system prototype and interactive web interface.
Submitted 13 February, 2025;
originally announced February 2025.
-
Make the Fastest Faster: Importance Mask for Interactive Volume Visualization using Reconstruction Neural Networks
Authors:
Jianxin Sun,
David Lenz,
Hongfeng Yu,
Tom Peterka
Abstract:
Visualizing a large-scale volumetric dataset with high resolution is challenging due to the high computational time and space complexity. Recent deep-learning-based image inpainting methods significantly improve rendering latency by reconstructing a high-resolution image for visualization in constant time on GPU from a partially rendered image where only a small portion of pixels go through the expensive rendering pipeline. However, existing methods need to render every pixel of a predefined regular sampling pattern. In this work, we provide Importance Mask Learning (IML) and Synthesis (IMS) networks which are the first attempts to learn importance regions from the sampling pattern to further minimize the number of pixels to render by jointly considering the dataset, user's view parameters, and the downstream reconstruction neural networks. Our solution is a unified framework to handle various image inpainting-based visualization methods through the proposed differentiable compaction/decompaction layers. Experiments show our method can further improve the overall rendering latency of state-of-the-art volume visualization methods using reconstruction neural network for free when rendering scientific volumetric datasets. Our method can also directly optimize the off-the-shelf pre-trained reconstruction neural networks without elongated retraining.
Submitted 9 February, 2025;
originally announced February 2025.
-
Do Large Language Models Speak Scientific Workflows?
Authors:
Orcun Yildiz,
Tom Peterka
Abstract:
With the advent of large language models (LLMs), there is a growing interest in applying LLMs to scientific tasks. In this work, we conduct an experimental study to explore applicability of LLMs for configuring, annotating, translating, explaining, and generating scientific workflows. We use 5 different workflow specific experiments and evaluate several open- and closed-source language models using state-of-the-art workflow systems. Our studies reveal that LLMs often struggle with workflow related tasks due to their lack of knowledge of scientific workflows. We further observe that the performance of LLMs varies across experiments and workflow systems. Our findings can help workflow developers and users in understanding LLMs capabilities in scientific workflows, and motivate further research applying LLMs to workflows.
Submitted 6 January, 2025; v1 submitted 13 December, 2024;
originally announced December 2024.
-
ChatVis: Automating Scientific Visualization with a Large Language Model
Authors:
Tanwi Mallick,
Orcun Yildiz,
David Lenz,
Tom Peterka
Abstract:
We develop an iterative assistant we call ChatVis that can synthetically generate Python scripts for data analysis and visualization using a large language model (LLM). The assistant allows a user to specify the operations in natural language, attempting to generate a Python script for the desired operations, prompting the LLM to revise the script as needed until it executes correctly. The iterations include an error detection and correction mechanism that extracts error messages from the execution of the script and subsequently prompts LLM to correct the error. Our method demonstrates correct execution on five canonical visualization scenarios, comparing results with ground truth. We also compared our results with scripts generated by several other LLMs without any assistance. In every instance, ChatVis successfully generated the correct script, whereas the unassisted LLMs failed to do so. The code is available on GitHub: https://github.com/tanwimallick/ChatVis/.
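The generate, execute, and revise loop described in this abstract can be sketched as follows. This is a minimal illustration, not the ChatVis implementation: `run_script`, `generate_until_runs`, and `stub_llm` are hypothetical names, the LLM is replaced by a stub callable, and the prompt wording is an assumption.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_script(source: str) -> Optional[str]:
    """Execute a script; return None on success or the error text on failure."""
    try:
        exec(compile(source, "<generated>", "exec"), {"__name__": "__generated__"})
        return None
    except Exception:
        # Capture the full traceback so it can be fed back to the model.
        return traceback.format_exc()


def generate_until_runs(
    request: str, llm: Callable[[str], str], max_rounds: int = 5
) -> Tuple[str, int]:
    """Ask the model for a script and re-prompt with the error until it executes."""
    prompt = request
    for round_no in range(1, max_rounds + 1):
        script = llm(prompt)
        error = run_script(script)
        if error is None:
            return script, round_no
        # Hypothetical revision prompt: original request, failed script, error text.
        prompt = (
            f"{request}\n\nThe previous script failed:\n{script}\n\n"
            f"Error:\n{error}\nPlease return a corrected script."
        )
    raise RuntimeError(f"no working script after {max_rounds} rounds")


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: fails once, then returns a fixed script."""
    if "Error:" in prompt:
        return "result = sum(range(4))"
    return "result = undefined_name + 1"


script, rounds = generate_until_runs("Sum the integers 0 to 3.", stub_llm)
print(rounds, script)
```

Running a script without errors is only the first check; the abstract also compares results against ground truth, which this sketch does not attempt.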
Submitted 7 October, 2024;
originally announced October 2024.
-
Adaptive Multi-Resolution Encoding for Interactive Large-Scale Volume Visualization through Functional Approximation
Authors:
Jianxin Sun,
David Lenz,
Hongfeng Yu,
Tom Peterka
Abstract:
Functional approximation as a high-order continuous representation provides a more accurate value and gradient query compared to the traditional discrete volume representation. Volume visualization directly rendered from functional approximation generates high-quality rendering results without high-order artifacts caused by trilinear interpolations. However, querying an encoded functional approximation is computationally expensive, especially when the input dataset is large, making functional approximation impractical for interactive visualization. In this paper, we propose a novel functional approximation multi-resolution representation, Adaptive-FAM, which is lightweight and fast to query. We also design a GPU-accelerated out-of-core multi-resolution volume visualization framework that directly utilizes the Adaptive-FAM representation to generate high-quality rendering with interactive responsiveness. Our method can not only dramatically decrease the caching time, one of the main contributors to input latency, but also effectively improve the cache hit rate through prefetching. Our approach significantly outperforms the traditional function approximation method in terms of input latency while maintaining comparable rendering quality.
Submitted 30 August, 2024;
originally announced September 2024.
-
Critical Point Extraction from Multivariate Functional Approximation
Authors:
Guanqun Ma,
David Lenz,
Tom Peterka,
Hanqi Guo,
Bei Wang
Abstract:
Advances in high-performance computing require new ways to represent large-scale scientific data to support data storage, data transfers, and data analysis within scientific workflows. Multivariate functional approximation (MFA) has recently emerged as a new continuous meshless representation that approximates raw discrete data with a set of piecewise smooth functions. An MFA model of data thus offers a compact representation and supports high-order evaluation of values and derivatives anywhere in the domain. In this paper, we present CPE-MFA, the first critical point extraction framework designed for MFA models of large-scale, high-dimensional data. CPE-MFA extracts critical points directly from an MFA model without the need for discretization or resampling. This is the first step toward enabling continuous implicit models such as MFA to support topological data analysis at scale.
Submitted 23 August, 2024;
originally announced August 2024.
-
Regularized Multi-Decoder Ensemble for an Error-Aware Scene Representation Network
Authors:
Tianyu Xiong,
Skylar W. Wurster,
Hanqi Guo,
Tom Peterka,
Han-Wei Shen
Abstract:
Feature grid Scene Representation Networks (SRNs) have been applied to scientific data as compact functional surrogates for analysis and visualization. As SRNs are black-box lossy data representations, assessing the prediction quality is critical for scientific visualization applications to ensure that scientists can trust the information being visualized. Currently, existing architectures do not support inference time reconstruction quality assessment, as coordinate-level errors cannot be evaluated in the absence of ground truth data. We propose a parameter-efficient multi-decoder SRN (MDSRN) ensemble architecture consisting of a shared feature grid with multiple lightweight multi-layer perceptron decoders. MDSRN can generate a set of plausible predictions for a given input coordinate to compute the mean as the prediction of the multi-decoder ensemble and the variance as a confidence score. The coordinate-level variance can be rendered along with the data to inform the reconstruction quality, or be integrated into uncertainty-aware volume visualization algorithms. To prevent the misalignment between the quantified variance and the prediction quality, we propose a novel variance regularization loss for ensemble learning that promotes the Regularized multi-decoder SRN (RMDSRN) to obtain a more reliable variance that correlates closely to the true model error. We comprehensively evaluate the quality of variance quantification and data reconstruction of Monte Carlo Dropout, Mean Field Variational Inference, Deep Ensemble, and Predicting Variance compared to the proposed MDSRN and RMDSRN across diverse scalar field datasets. We demonstrate that RMDSRN attains the most accurate data reconstruction and competitive variance-error correlation among uncertain SRNs under the same neural network parameter budgets.
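The idea of a shared feature lookup feeding several lightweight decoders, with the ensemble mean as the prediction and the variance as a confidence score, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `shared_features` stands in for the feature grid, the decoders are plain linear maps rather than MLPs, and population variance is used.

```python
import math
from statistics import fmean, pvariance
from typing import Callable, List, Sequence, Tuple

Decoder = Callable[[Sequence[float]], float]


def shared_features(coord: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the shared feature grid lookup."""
    x, y, z = coord
    return [math.sin(x), math.cos(y), z]


def make_decoder(weights: Sequence[float], bias: float) -> Decoder:
    """Build a tiny linear decoder; a real decoder would be a small MLP."""
    def decode(features: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(weights, features)) + bias
    return decode


def ensemble_predict(
    coord: Sequence[float], decoders: Sequence[Decoder]
) -> Tuple[float, float]:
    """Return (mean prediction, variance across decoders) for one coordinate."""
    features = shared_features(coord)  # computed once, shared by all decoders
    outputs = [decode(features) for decode in decoders]
    return fmean(outputs), pvariance(outputs)


decoders = [
    make_decoder((1.0, 1.0, 1.0), 0.0),
    make_decoder((1.0, 2.0, 1.0), 0.0),
    make_decoder((1.0, 3.0, 1.0), 0.0),
]
mean, variance = ensemble_predict((0.0, 0.0, 0.0), decoders)
print(mean, variance)
```

A high variance at a coordinate signals that the decoders disagree there. The variance regularization loss described in the abstract, which ties this variance to the true error, is not modeled here.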
Submitted 5 August, 2024; v1 submitted 26 July, 2024;
originally announced July 2024.
-
Wilkins: HPC In Situ Workflows Made Easy
Authors:
Orcun Yildiz,
Dmitriy Morozov,
Arnur Nigmetov,
Bogdan Nicolae,
Tom Peterka
Abstract:
In situ approaches can accelerate the pace of scientific discoveries by allowing scientists to perform data analysis at simulation time. Current in situ workflow systems, however, face challenges in handling the growing complexity and diverse computational requirements of scientific tasks. In this work, we present Wilkins, an in situ workflow system that is designed for ease-of-use while providing scalable and efficient execution of workflow tasks. Wilkins provides a flexible workflow description interface, employs a high-performance data transport layer based on HDF5, and supports tasks with disparate data rates by providing a flow control mechanism. Wilkins seamlessly couples scientific tasks that already use HDF5, without requiring task code modifications. We demonstrate the above features using both synthetic benchmarks and two science use cases in materials science and cosmology.
Submitted 4 April, 2024;
originally announced April 2024.
-
Scalable Volume Visualization for Big Scientific Data Modeled by Functional Approximation
Authors:
Jianxin Sun,
David Lenz,
Hongfeng Yu,
Tom Peterka
Abstract:
Considering the challenges posed by the space and time complexities in handling extensive scientific volumetric data, various data representations have been developed for the analysis of large-scale scientific data. Multivariate functional approximation (MFA) is an innovative data model designed to tackle substantial challenges in scientific data analysis. It computes values and derivatives with high-order accuracy throughout the spatial domain, mitigating artifacts associated with zero- or first-order interpolation. However, the slow query time through MFA makes it less suitable for interactively visualizing a large MFA model. In this work, we develop the first scalable interactive volume visualization pipeline, MFA-DVV, for the MFA model encoded from large-scale datasets. Our method achieves low input latency through distributed architecture, and its performance can be further enhanced by utilizing a compressed MFA model while still maintaining a high-quality rendering result for scientific datasets. We conduct comprehensive experiments to show that MFA-DVV can decrease the input latency and achieve superior visualization results for big scientific data compared with existing approaches.
Submitted 22 December, 2023;
originally announced December 2023.
-
Adaptively Placed Multi-Grid Scene Representation Networks for Large-Scale Data Visualization
Authors:
Skylar Wolfgang Wurster,
Tianyu Xiong,
Han-Wei Shen,
Hanqi Guo,
Tom Peterka
Abstract:
Scene representation networks (SRNs) have been recently proposed for compression and visualization of scientific data. However, state-of-the-art SRNs do not adapt the allocation of available network parameters to the complex features found in scientific data, leading to a loss in reconstruction quality. We address this shortcoming with an adaptively placed multi-grid SRN (APMGSRN) and propose a domain decomposition training and inference technique for accelerated parallel training on multi-GPU systems. We also release an open-source neural volume rendering application that allows plug-and-play rendering with any PyTorch-based SRN. Our proposed APMGSRN architecture uses multiple spatially adaptive feature grids that learn where to be placed within the domain to dynamically allocate more neural network resources where error is high in the volume, improving state-of-the-art reconstruction accuracy of SRNs for scientific data without requiring expensive octree refining, pruning, and traversal like previous adaptive models. In our domain decomposition approach for representing large-scale data, we train a set of APMGSRNs in parallel on separate bricks of the volume to reduce training time while avoiding overhead necessary for an out-of-core solution for volumes too large to fit in GPU memory. After training, the lightweight SRNs are used for realtime neural volume rendering in our open-source renderer, where arbitrary view angles and transfer functions can be explored. A copy of this paper, all code, all models used in our experiments, and all supplemental materials and videos are available at https://github.com/skywolf829/APMGSRN.
Submitted 6 April, 2024; v1 submitted 16 July, 2023;
originally announced August 2023.
-
TROPHY: A Topologically Robust Physics-Informed Tracking Framework for Tropical Cyclones
Authors:
Lin Yan,
Hanqi Guo,
Thomas Peterka,
Bei Wang,
Jiali Wang
Abstract:
Tropical cyclones (TCs) are among the most destructive weather systems. Realistically and efficiently detecting and tracking TCs are critical for assessing their impacts and risks. Recently, a multilevel robustness framework has been introduced to study the critical points of time-varying vector fields. The framework quantifies the robustness of critical points across varying neighborhoods. By relating the multilevel robustness with critical point tracking, the framework has demonstrated its potential in cyclone tracking. An advantage is that it identifies cyclonic features using only 2D wind vector fields, which is encouraging as most tracking algorithms require multiple dynamic and thermodynamic variables at different altitudes. A disadvantage is that the framework does not scale well computationally for datasets containing a large number of cyclones. This paper introduces a topologically robust physics-informed tracking framework (TROPHY) for TC tracking. The main idea is to integrate physical knowledge of TC to drastically improve the computational efficiency of multilevel robustness framework for large-scale climate datasets. First, during preprocessing, we propose a physics-informed feature selection strategy to filter 90% of critical points that are short-lived and have low stability, thus preserving good candidates for TC tracking. Second, during in-processing, we impose constraints during the multilevel robustness computation to focus only on physics-informed neighborhoods of TCs. We apply TROPHY to 30 years of 2D wind fields from reanalysis data in ERA5 and generate a number of TC tracks. In comparison with the observed tracks, we demonstrate that TROPHY can capture TC characteristics that are comparable to and sometimes even better than a well-validated TC tracking algorithm that requires multiple dynamic and thermodynamic scalar fields.
Submitted 27 July, 2023;
originally announced July 2023.
-
Neural Stream Functions
Authors:
Skylar Wolfgang Wurster,
Hanqi Guo,
Tom Peterka,
Han-Wei Shen
Abstract:
We present a neural network approach to compute stream functions, which are scalar functions with gradients orthogonal to a given vector field. As a result, isosurfaces of the stream function extract stream surfaces, which can be visualized to analyze flow features. Our approach takes a vector field as input and trains an implicit neural representation to learn a stream function for that vector field. The network learns to map input coordinates to a stream function value by minimizing the inner product of the gradient of the neural network's output and the vector field. Since stream function solutions may not be unique, we give optional constraints for the network to learn particular stream functions of interest. Specifically, we introduce regularizing loss functions that can optionally be used to generate stream function solutions whose stream surfaces follow the flow field's curvature, or that can learn a stream function that includes a stream surface passing through a seeding rake. We also discuss considerations for properly visualizing the trained implicit network and extracting artifact-free surfaces. We compare our results with other implicit solutions and present qualitative and quantitative results for several synthetic and simulated vector fields.
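The orthogonality condition behind this abstract can be checked numerically on a toy field. The sketch below is not the paper's network or loss: it assumes a squared inner product averaged over sample points, uses central finite differences instead of automatic differentiation, and evaluates hand-written candidate functions rather than a trained model. All names are hypothetical.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def gradient(f: Callable[[Point], float], p: Point, h: float = 1e-5) -> List[float]:
    """Central finite-difference gradient of a scalar function at point p."""
    grads = []
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        grads.append((f(hi) - f(lo)) / (2.0 * h))
    return grads


def orthogonality_loss(
    psi: Callable[[Point], float],
    field: Callable[[Point], Sequence[float]],
    points: Sequence[Point],
) -> float:
    """Mean squared inner product between grad(psi) and the vector field."""
    total = 0.0
    for p in points:
        dot = sum(g * v for g, v in zip(gradient(psi, p), field(p)))
        total += dot * dot
    return total / len(points)


def rotation_field(p: Point) -> List[float]:
    """Rigid rotation about the z axis: v = (-y, x, 0)."""
    x, y, _ = p
    return [-y, x, 0.0]


samples = [(x, y, z) for x in (-1.0, 0.5, 1.0) for y in (-1.0, 0.5, 1.0) for z in (0.0, 1.0)]

loss_radial = orthogonality_loss(lambda p: p[0] ** 2 + p[1] ** 2, rotation_field, samples)
loss_height = orthogonality_loss(lambda p: p[2], rotation_field, samples)
loss_bad = orthogonality_loss(lambda p: p[0], rotation_field, samples)
print(loss_radial, loss_height, loss_bad)
```

Both `x**2 + y**2` and `z` give a loss near zero for this field, which illustrates the non-uniqueness the abstract mentions, while `x` does not satisfy the condition.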
Submitted 16 July, 2023;
originally announced July 2023.
-
Parallel Domain Decomposition techniques applied to Multivariate Functional Approximation of discrete data
Authors:
Vijay S. Mahadevan,
David Lenz,
Iulian Grindeanu,
Thomas Peterka
Abstract:
Compactly expressing large-scale datasets through Multivariate Functional Approximations (MFA) can be critically important for analysis and visualization to drive scientific discovery. Tackling such problems requires scalable data partitioning approaches to compute MFA representations in amenable wall clock times. We introduce a fully parallel scheme to reduce the total work per task in combination with an overlapping additive Schwarz-based iterative scheme to compute MFA with a tensor expansion of B-spline bases, while preserving full degree continuity across subdomain boundaries. While previous work on MFA has proven effective, the computational complexity of encoding large datasets on a single process can be severely prohibitive. Parallel algorithms for generating reconstructions from the MFA have had to rely on post-processing techniques to blend discontinuities across subdomain boundaries. In contrast, a robust constrained minimization infrastructure to impose higher-order continuity directly on the MFA representation is presented here. We demonstrate the effectiveness of the parallel approach with domain decomposition solvers, to minimize the subdomain error residuals of the decoded MFA, and more specifically to recover continuity across non-matching boundaries at scale. The analysis of the presented scheme for analytical and scientific datasets in 1, 2, and 3 dimensions is presented. Extensive strong and weak scalability performances are also demonstrated for large-scale datasets to evaluate the parallel speedup of the MPI-based algorithm implementation on leadership computing machines.
Submitted 12 October, 2022;
originally announced October 2022.
-
MFA-DVR: Direct Volume Rendering of MFA Models
Authors:
Jianxin Sun,
David Lenz,
Hongfeng Yu,
Tom Peterka
Abstract:
3D volume rendering is widely used to reveal insightful intrinsic patterns of volumetric datasets across many domains. However, the complex structures and varying scales of volumetric data can make efficiently generating high-quality volume rendering results a challenging task. Multivariate functional approximation (MFA) is a new data model that addresses some of the critical challenges: high-order evaluation of both value and derivative anywhere in the spatial domain, compact representation for large-scale volumetric data, and uniform representation of both structured and unstructured data. In this paper, we present MFA-DVR, the first direct volume rendering pipeline utilizing the MFA model, for both structured and unstructured volumetric datasets. We demonstrate improved rendering quality using MFA-DVR on both synthetic and real datasets through a comparative study. We show that MFA-DVR not only generates more faithful volume rendering than using local filters but also performs faster on high-order interpolations on structured and unstructured datasets. MFA-DVR is implemented in the existing volume rendering pipeline of the Visualization Toolkit (VTK) to be accessible by the scientific visualization community.
Submitted 14 October, 2023; v1 submitted 25 April, 2022;
originally announced April 2022.
-
Reinforcement Learning for Load-balanced Parallel Particle Tracing
Authors:
Jiayi Xu,
Hanqi Guo,
Han-Wei Shen,
Mukund Raj,
Skylar W. Wurster,
Tom Peterka
Abstract:
We explore an online reinforcement learning (RL) paradigm to dynamically optimize parallel particle tracing performance in distributed-memory systems. Our method combines three novel components: (1) a work donation algorithm, (2) a high-order workload estimation model, and (3) a communication cost model. First, we design an RL-based work donation algorithm. Our algorithm monitors workloads of processes and creates RL agents to donate data blocks and particles from high-workload processes to low-workload processes to minimize program execution time. The agents learn the donation strategy on the fly based on reward and cost functions designed to consider processes' workload changes and data transfer costs of donation actions. Second, we propose a workload estimation model, helping RL agents estimate the workload distribution of processes in future computations. Third, we design a communication cost model that considers both block and particle data exchange costs, helping RL agents make effective decisions with minimized communication costs. We demonstrate that our algorithm adapts to different flow behaviors in large-scale fluid dynamics, ocean, and weather simulation data. Our algorithm improves parallel particle tracing performance in terms of parallel efficiency, load balance, and costs of I/O and communication for evaluations with up to 16,384 processors.
Submitted 31 January, 2022; v1 submitted 12 September, 2021;
originally announced September 2021.
-
Exact Analytical Parallel Vectors
Authors:
Hanqi Guo,
Tom Peterka
Abstract:
This paper demonstrates that parallel vector curves are piecewise cubic rational curves in 3D piecewise linear vector fields. Parallel vector curves -- loci of points where two vector fields are parallel -- have been widely used to extract features including ridges, valleys, and vortex core lines in scientific data. We define the term \emph{generalized and underdetermined eigensystem} in the form of $\mathbf{A}\mathbf{x}+\mathbf{a}=\lambda(\mathbf{B}\mathbf{x}+\mathbf{b})$ in order to derive the piecewise rational representation of 3D parallel vector curves. We discuss how singularities of the rationals lead to different types of intersections with tetrahedral cells.
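One way to see where a cubic rational form can arise from the eigensystem in this abstract is the short derivation below. It is a sketch written for this listing, not taken from the paper, and it assumes that the matrices are 3×3 and that the matrix $\mathbf{A}-\lambda\mathbf{B}$ is invertible at the value of $\lambda$ considered.

```latex
\begin{align*}
\mathbf{A}\mathbf{x}+\mathbf{a} &= \lambda(\mathbf{B}\mathbf{x}+\mathbf{b}) \\
(\mathbf{A}-\lambda\mathbf{B})\,\mathbf{x} &= \lambda\mathbf{b}-\mathbf{a} \\
\mathbf{x}(\lambda) &=
  \frac{\operatorname{adj}(\mathbf{A}-\lambda\mathbf{B})\,(\lambda\mathbf{b}-\mathbf{a})}
       {\det(\mathbf{A}-\lambda\mathbf{B})}
\end{align*}
```

For 3×3 matrices, each entry of the adjugate has degree at most 2 in $\lambda$, so the numerator has degree at most 3, and the determinant in the denominator also has degree at most 3. Each coordinate of $\mathbf{x}(\lambda)$ is therefore a ratio of cubic polynomials, and the values of $\lambda$ where the determinant vanishes are the singular cases.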
Submitted 6 July, 2021;
originally announced July 2021.
-
Deep Hierarchical Super Resolution for Scientific Data
Authors:
Skylar W. Wurster,
Hanqi Guo,
Han-Wei Shen,
Thomas Peterka,
Jiayi Xu
Abstract:
We present a novel technique for hierarchical super resolution (SR) with neural networks (NNs), which upscales volumetric data represented with an octree data structure to a high-resolution uniform grid with minimal seam artifacts on octree node boundaries. Our method uses existing state-of-the-art SR models and adds flexibility to upscale input data with varying levels of detail across the domain, instead of only uniform grid data that are supported in previous approaches. The key is to use a hierarchy of SR NNs, each trained to perform 2x SR between two levels of detail, with a hierarchical SR algorithm that minimizes seam artifacts by starting from the coarsest level of detail and working up. We show that our hierarchical approach outperforms baseline interpolation and hierarchical upscaling methods, and demonstrate the usefulness of our proposed approach across three use cases including data reduction using hierarchical downsampling+SR instead of uniform downsampling+SR, computation savings for hierarchical finite-time Lyapunov exponent field calculation, and super-resolving low-resolution simulation results for a high-resolution approximation visualization.
Submitted 14 October, 2022; v1 submitted 30 May, 2021;
originally announced July 2021.
-
Workflows Community Summit: Bringing the Scientific Workflows Community Together
Authors:
Rafael Ferreira da Silva,
Henri Casanova,
Kyle Chard,
Dan Laney,
Dong Ahn,
Shantenu Jha,
Carole Goble,
Lavanya Ramakrishnan,
Luc Peterson,
Bjoern Enders,
Douglas Thain,
Ilkay Altintas,
Yadu Babuji,
Rosa M. Badia,
Vivien Bonazzi,
Taina Coleman,
Michael Crusoe,
Ewa Deelman,
Frank Di Natale,
Paolo Di Tommaso,
Thomas Fahringer,
Rosa Filgueira,
Grigori Fursin,
Alex Ganose,
Bjorn Gruning
, et al. (20 additional authors not shown)
Abstract:
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. As a result, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. The "Workflows Community Summit" was held online on January 13, 2021. The overarching goal of the summit was to develop a view of the state of the art and identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both developers of WMSs and users of workflows) helped to identify key challenges in this community that were translated into 6 broad themes for the summit, each of them being the object of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.
Submitted 16 March, 2021;
originally announced March 2021.
-
FTK: A Simplicial Spacetime Meshing Framework for Robust and Scalable Feature Tracking
Authors:
Hanqi Guo,
David Lenz,
Jiayi Xu,
Xin Liang,
Wenbin He,
Iulian R. Grindeanu,
Han-Wei Shen,
Tom Peterka,
Todd Munson,
Ian Foster
Abstract:
We present the Feature Tracking Kit (FTK), a framework that simplifies, scales, and delivers various feature-tracking algorithms for scientific data. The key of FTK is our high-dimensional simplicial meshing scheme that generalizes both regular and unstructured spatial meshes to spacetime while tessellating spacetime mesh elements into simplices. The benefits of using simplicial spacetime meshes include (1) reducing ambiguity cases for feature extraction and tracking, (2) simplifying the handling of degeneracies using symbolic perturbations, and (3) enabling scalable and parallel processing. The use of simplicial spacetime meshing simplifies and improves the implementation of several feature-tracking algorithms for critical points, quantum vortices, and isosurfaces. As a software framework, FTK provides end users with VTK/ParaView filters, Python bindings, a command line interface, and programming interfaces for feature-tracking applications. We demonstrate use cases as well as scalability studies through both synthetic data and scientific applications including Tokamak, fluid dynamics, and superconductivity simulations. We also conduct end-to-end performance studies on the Summit supercomputer. FTK is open-sourced under the MIT license: https://github.com/hguo/ftk
Submitted 12 April, 2021; v1 submitted 17 November, 2020;
originally announced November 2020.
-
Asynchronous and Load-Balanced Union-Find for Distributed and Parallel Scientific Data Visualization and Analysis
Authors:
Jiayi Xu,
Hanqi Guo,
Han-Wei Shen,
Mukund Raj,
Xueyun Wang,
Xueqiao Xu,
Zhehui Wang,
Tom Peterka
Abstract:
We present a novel distributed union-find algorithm that features asynchronous parallelism and k-d tree based load balancing for scalable visualization and analysis of scientific data. Applications of union-find include level set extraction and critical point tracking, but distributed union-find can suffer from high synchronization costs and imbalanced workloads across parallel processes. In this study, we prove that global synchronizations in existing distributed union-find can be eliminated without changing final results, allowing overlapped communications and computations for scalable processing. We also use a k-d tree decomposition to redistribute inputs, in order to improve workload balancing. We benchmark the scalability of our algorithm with up to 1,024 processes using both synthetic and application data. We demonstrate the use of our algorithm in critical point tracking and super-level set extraction with high-speed imaging experiments and fusion plasma simulations, respectively.
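For readers unfamiliar with the underlying data structure, the following is a minimal sequential sketch of union-find applied to super-level set extraction on a 1D array. It is not the asynchronous, distributed algorithm described in the abstract; the class `UnionFind`, the function `super_level_components`, and the sample values are hypothetical names and data chosen only for illustration.

```python
class UnionFind:
    """Sequential union-find with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on tree height per root

    def find(self, x):
        # First pass: locate the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def super_level_components(values, threshold):
    """Group indices of samples >= threshold into connected runs (1D adjacency)."""
    uf = UnionFind(len(values))
    above = [v >= threshold for v in values]
    for i in range(len(values) - 1):
        if above[i] and above[i + 1]:
            uf.union(i, i + 1)
    groups = {}
    for i, keep in enumerate(above):
        if keep:
            groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())


# Illustrative data only, not taken from the paper.
samples = [0.1, 0.9, 0.8, 0.2, 0.7, 0.95, 0.6, 0.1]
components = super_level_components(samples, 0.5)
```

In a distributed setting, the `union` calls would span elements owned by different processes, which is where the synchronization and load-balancing costs mentioned in the abstract come from.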
Submitted 13 April, 2021; v1 submitted 4 March, 2020;
originally announced March 2020.
-
InSituNet: Deep Image Synthesis for Parameter Space Exploration of Ensemble Simulations
Authors:
Wenbin He,
Junpeng Wang,
Hanqi Guo,
Ko-Chih Wang,
Han-Wei Shen,
Mukund Raj,
Youssef S. G. Nashed,
Tom Peterka
Abstract:
We propose InSituNet, a deep learning based surrogate model to support parameter space exploration for ensemble simulations that are visualized in situ. In situ visualization, generating visualizations at simulation time, is becoming prevalent in handling large-scale simulations because of the I/O and storage constraints. However, in situ visualization approaches limit the flexibility of post-hoc exploration because the raw simulation data are no longer available. Although multiple image-based approaches have been proposed to mitigate this limitation, those approaches lack the ability to explore the simulation parameters. Our approach allows flexible exploration of parameter space for large-scale ensemble simulations by taking advantage of the recent advances in deep learning. Specifically, we design InSituNet as a convolutional regression model to learn the mapping from the simulation and visualization parameters to the visualization results. With the trained model, users can generate new images for different simulation parameters under various visualization settings, which enables in-depth analysis of the underlying ensemble simulations. We demonstrate the effectiveness of InSituNet in combustion, cosmology, and ocean simulations through quantitative and qualitative evaluations.
Submitted 16 October, 2019; v1 submitted 1 August, 2019;
originally announced August 2019.
-
The Universe at Extreme Scale: Multi-Petaflop Sky Simulation on the BG/Q
Authors:
Salman Habib,
Vitali Morozov,
Hal Finkel,
Adrian Pope,
Katrin Heitmann,
Kalyan Kumaran,
Tom Peterka,
Joe Insley,
David Daniel,
Patricia Fasel,
Nicholas Frontiere,
Zarija Lukic
Abstract:
Remarkable observational advances have established a compelling cross-validated model of the Universe. Yet, two key pillars of this model -- dark matter and dark energy -- remain mysterious. Sky surveys that map billions of galaxies to explore the `Dark Universe', demand a corresponding extreme-scale simulation capability; the HACC (Hybrid/Hardware Accelerated Cosmology Code) framework has been designed to deliver this level of performance now, and into the future. With its novel algorithmic structure, HACC allows flexible tuning across diverse architectures, including accelerated and multi-core systems.
On the IBM BG/Q, HACC attains unprecedented scalable performance -- currently 13.94 PFlops at 69.2% of peak and 90% parallel efficiency on 1,572,864 cores with an equal number of MPI ranks, and a concurrency of 6.3 million. This level of performance was achieved at extreme problem sizes, including a benchmark run with more than 3.6 trillion particles, significantly larger than any cosmological simulation yet performed.
Submitted 19 November, 2012;
originally announced November 2012.